boax.policies module

Implements functionality for constructing policy functions.

boax.policies

Policy Types

class boax.policies.Policy(*args, **kwargs)

A callable type for policy functions.

A policy function takes a set of parameters of type T, a timestep, and a pseudo-random key as input and returns a selected variant.
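The call shape described above can be sketched as a structural type. This is only an illustration under stated assumptions: the argument names `params`, `timestep`, and `key` are hypothetical, the selected variant is assumed to be an integer index, and `random.Random` stands in for a JAX-style pseudo-random key so the sketch runs with the standard library alone.

```python
import random
from typing import Protocol, Sequence, TypeVar

T = TypeVar("T", contravariant=True)


class Policy(Protocol[T]):
    """Hypothetical stand-in: any callable with this shape counts as a policy."""

    def __call__(self, params: T, timestep: int, key: random.Random) -> int:
        ...


def first_best(params: Sequence[float], timestep: int, key: random.Random) -> int:
    # A trivial policy: ignore the timestep and the key, and return the
    # index of the largest estimate (ties go to the lowest index).
    estimates = list(params)
    return estimates.index(max(estimates))


policy: Policy[Sequence[float]] = first_best
variant = policy([0.1, 0.7, 0.2], timestep=1, key=random.Random(0))
```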

Policies

Action Value Policies

epsilon_greedy(epsilon)

The epsilon-greedy policy function.
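As a rough sketch of the epsilon-greedy idea, not the library's implementation: with probability `epsilon` a variant is drawn uniformly at random, otherwise the variant with the highest estimated value is chosen. The parameter layout (a plain list of value estimates) and the use of `random.Random` in place of a pseudo-random key are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence


def epsilon_greedy(
    epsilon: float,
) -> Callable[[Sequence[float], int, random.Random], int]:
    """Build an epsilon-greedy policy (illustrative sketch only)."""

    def policy(values: Sequence[float], timestep: int, key: random.Random) -> int:
        # Explore: with probability epsilon pick a uniformly random variant.
        if key.random() < epsilon:
            return key.randrange(len(values))
        # Exploit: otherwise pick the variant with the highest estimate.
        estimates = list(values)
        return estimates.index(max(estimates))

    return policy


greedy = epsilon_greedy(0.0)   # never explores
explore = epsilon_greedy(1.0)  # always explores
```

Setting `epsilon` to 0 yields a purely greedy policy, and setting it to 1 yields uniform random selection.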

boltzmann(tau)

The Boltzmann policy function.
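A Boltzmann (softmax) policy samples variant i with probability proportional to exp(value_i / tau), where the temperature `tau` controls how greedy the selection is. The following is a minimal sketch under assumptions: the value estimates are a plain list, `tau` is strictly positive, and `random.Random` replaces a pseudo-random key.

```python
import math
import random
from typing import Callable, Sequence


def boltzmann(tau: float) -> Callable[[Sequence[float], int, random.Random], int]:
    """Build a Boltzmann policy with temperature tau > 0 (illustrative sketch)."""

    def policy(values: Sequence[float], timestep: int, key: random.Random) -> int:
        # Subtract the maximum before exponentiating for numerical stability;
        # this does not change the resulting probabilities.
        top = max(values)
        weights = [math.exp((v - top) / tau) for v in values]
        # Sample one index with probability proportional to its weight.
        return key.choices(range(len(values)), weights=weights)[0]

    return policy


cold = boltzmann(0.01)   # low temperature: close to greedy
hot = boltzmann(100.0)   # high temperature: close to uniform
```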

upper_confidence_bound(confidence)

The upper confidence bound policy function.
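For orientation, here is a sketch of the classic UCB1 rule, which scores each variant as its estimated value plus `confidence * sqrt(ln(timestep) / count)`. Whether the library uses exactly this formula is not stated in the source. The `ActionValues` container, its field names, and the choice to try untested variants first are assumptions.

```python
import math
import random
from typing import Callable, NamedTuple, Sequence


class ActionValues(NamedTuple):
    """Hypothetical parameter container: pull counts and value estimates."""

    counts: Sequence[int]
    values: Sequence[float]


def upper_confidence_bound(
    confidence: float,
) -> Callable[[ActionValues, int, random.Random], int]:
    """Build a UCB1-style policy (illustrative sketch, timestep >= 1)."""

    def policy(params: ActionValues, timestep: int, key: random.Random) -> int:
        scores = []
        for count, value in zip(params.counts, params.values):
            if count == 0:
                # Untried variants get an infinite score so they are tried first.
                scores.append(math.inf)
            else:
                bonus = confidence * math.sqrt(math.log(timestep) / count)
                scores.append(value + bonus)
        # The key is unused: UCB1 is deterministic given its inputs.
        return scores.index(max(scores))

    return policy


params = ActionValues(counts=[10, 1, 10], values=[0.5, 0.4, 0.5])
variant = upper_confidence_bound(2.0)(params, 21, random.Random(0))
```

With a positive `confidence`, the rarely tried variant 1 wins despite its lower estimate; with `confidence` set to 0 the rule reduces to greedy selection.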

Beta Policies

thompson_sampling()

The Thompson sampling policy function.
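Thompson sampling with Beta posteriors draws one sample per variant from its Beta distribution and selects the variant with the largest sample. The sketch below assumes the parameters are a list of `(alpha, beta)` pairs and uses `random.Random.betavariate` in place of a key-based sampler; neither detail is taken from the source.

```python
import random
from typing import Callable, Sequence, Tuple

BetaParams = Sequence[Tuple[float, float]]  # hypothetical layout: (alpha, beta)


def thompson_sampling() -> Callable[[BetaParams, int, random.Random], int]:
    """Build a Thompson sampling policy over Beta posteriors (sketch)."""

    def policy(params: BetaParams, timestep: int, key: random.Random) -> int:
        # Draw one plausible success rate per variant from its posterior.
        samples = [key.betavariate(alpha, beta) for alpha, beta in params]
        # Act greedily with respect to the sampled rates.
        return samples.index(max(samples))

    return policy


policy = thompson_sampling()
# Variant 1 has overwhelmingly more observed successes than variant 0.
confident = [(1.0, 1000.0), (1000.0, 1.0)]
```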

boax.policies.believes

Belief Types

class boax.policies.believes.Belief(init: InitFn[T], update: UpdateFn[T, R], best: BestFn[T])

A policy belief.

A belief is defined by three functions: init, update, and best.
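The init/update/best triple can be pictured as a small record of functions over parameters of type T and rewards of type R. The sketch below is a stand-in, not the library's class: the argument order of `update` (parameters, variant index, reward) and the toy `make_totals` belief are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


@dataclass(frozen=True)
class Belief(Generic[T, R]):
    """Hypothetical stand-in for a belief: three functions over parameters T."""

    init: Callable[[], T]             # create the initial parameters
    update: Callable[[T, int, R], T]  # fold one (variant, reward) observation in
    best: Callable[[T], int]          # report the currently best variant


def make_totals(num_variants: int) -> Belief[List[float], float]:
    """Toy belief that tracks the total reward per variant."""

    def init() -> List[float]:
        return [0.0] * num_variants

    def update(params: List[float], variant: int, reward: float) -> List[float]:
        # Return a new list rather than mutating, mirroring functional style.
        new = list(params)
        new[variant] += reward
        return new

    def best(params: List[float]) -> int:
        return params.index(max(params))

    return Belief(init, update, best)


belief = make_totals(3)
params = belief.update(belief.init(), 2, 1.0)
```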

Beliefs

binary(num_variants)

The binary Beta belief.
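A binary Beta belief is commonly realized as one Beta(alpha, beta) distribution per variant, where a reward of 1 increments alpha and a reward of 0 increments beta. The sketch below follows that textbook scheme; the uniform Beta(1, 1) prior, the `(alpha, beta)` tuple layout, and the use of the posterior mean in `best` are assumptions, not details confirmed by the source.

```python
from typing import Callable, List, NamedTuple, Tuple

Params = List[Tuple[float, float]]  # hypothetical layout: (alpha, beta) per variant


class Belief(NamedTuple):
    """Hypothetical stand-in holding the three belief functions."""

    init: Callable[[], Params]
    update: Callable[[Params, int, int], Params]
    best: Callable[[Params], int]


def binary(num_variants: int) -> Belief:
    """Beta-Bernoulli belief over binary rewards (illustrative sketch)."""

    def init() -> Params:
        # Beta(1, 1) is the uniform prior over a success probability.
        return [(1.0, 1.0)] * num_variants

    def update(params: Params, variant: int, reward: int) -> Params:
        alpha, beta = params[variant]
        new = list(params)
        # A success (reward 1) raises alpha, a failure (reward 0) raises beta.
        new[variant] = (alpha + reward, beta + (1 - reward))
        return new

    def best(params: Params) -> int:
        # Rank variants by posterior mean alpha / (alpha + beta).
        means = [a / (a + b) for a, b in params]
        return means.index(max(means))

    return Belief(init, update, best)


belief = binary(2)
params = belief.init()
params = belief.update(params, 0, 1)  # variant 0 succeeded
params = belief.update(params, 1, 0)  # variant 1 failed
```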

continuous(num_variants)

The continuous belief.
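The source does not say which statistics a continuous belief tracks. As one plausible reading, the sketch below keeps a pull count and an incrementally updated mean reward per variant for real-valued rewards. Every name and the `(count, mean)` layout are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple

Params = List[Tuple[int, float]]  # hypothetical layout: (count, mean) per variant


class Belief(NamedTuple):
    """Hypothetical stand-in holding the three belief functions."""

    init: Callable[[], Params]
    update: Callable[[Params, int, float], Params]
    best: Callable[[Params], int]


def continuous(num_variants: int) -> Belief:
    """Running-mean belief over real-valued rewards (illustrative sketch)."""

    def init() -> Params:
        return [(0, 0.0)] * num_variants

    def update(params: Params, variant: int, reward: float) -> Params:
        count, mean = params[variant]
        count += 1
        # Incremental mean: move the estimate toward the new reward by 1/count.
        mean += (reward - mean) / count
        new = list(params)
        new[variant] = (count, mean)
        return new

    def best(params: Params) -> int:
        means = [mean for _, mean in params]
        return means.index(max(means))

    return Belief(init, update, best)


belief = continuous(2)
params = belief.init()
params = belief.update(params, 1, 2.0)
params = belief.update(params, 1, 4.0)
```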