`boax.policies` module

Provides functions for constructing policy functions.

boax.policies

class boax.policies.Policy(*args, **kwargs)

A callable type for policy functions.

A policy function takes a set of parameters of type T, a timestep, and a pseudo-random key as input and returns a selected variant.
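
The calling convention described above can be illustrated with a minimal plain-Python sketch. Everything here is an assumption for illustration and not the library's implementation: the alias `PolicyFn`, the factory `make_epsilon_greedy`, the use of a list of estimated rewards as the parameters, and the use of an integer seed in place of a JAX pseudo-random key are all hypothetical.

```python
import random
from typing import Callable, Sequence

# Hypothetical alias: (params, timestep, key) -> index of the selected variant.
# `params` is a sequence of estimated rewards and `key` is an integer seed
# standing in for a JAX pseudo-random key.
PolicyFn = Callable[[Sequence[float], int, int], int]


def make_epsilon_greedy(epsilon: float) -> PolicyFn:
    """Build an epsilon-greedy policy (illustrative, not the boax code)."""

    def policy(params, timestep, key):
        # `timestep` is part of the signature but unused by this strategy.
        rng = random.Random(key)  # same key -> same decision
        if rng.random() < epsilon:
            # Explore: pick a variant uniformly at random.
            return rng.randrange(len(params))
        # Exploit: pick the variant with the highest estimated reward.
        return max(range(len(params)), key=params.__getitem__)

    return policy


# With epsilon = 0.0 the policy never explores and always exploits.
greedy = make_epsilon_greedy(0.0)
choice = greedy([0.1, 0.7, 0.3], timestep=1, key=0)
```

Passing the key explicitly, rather than relying on global random state, keeps the policy a pure function of its inputs, which is the usual convention in JAX-based code.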

`epsilon_greedy`(epsilon)	The epsilon-greedy policy function.
`boltzmann`(tau)	The Boltzmann policy function.
`upper_confidence_bound`(confidence)	The upper confidence bound policy function.

The Thompson sampling policy function.
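
For orientation, the selection rules usually associated with the Boltzmann and upper confidence bound strategies can be sketched in plain Python. This is a textbook-style sketch under stated assumptions, not the library's code: the helper names `boltzmann_probs`, `boltzmann_select`, and `ucb_select` are hypothetical, the UCB score follows the common UCB1 form, and an integer seed stands in for a pseudo-random key.

```python
import math
import random


def boltzmann_probs(values, tau):
    """Softmax of values / tau; a lower tau gives a greedier distribution."""
    scaled = [v / tau for v in values]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(values, tau, key):
    """Sample a variant index in proportion to its Boltzmann probability."""
    rng = random.Random(key)
    probs = boltzmann_probs(values, tau)
    return rng.choices(range(len(values)), weights=probs)[0]


def ucb_select(values, counts, timestep, confidence):
    """Pick the variant maximizing mean + confidence * sqrt(log(t) / n)."""

    def score(i):
        if counts[i] == 0:
            return math.inf  # untried variants are selected first
        bonus = math.sqrt(math.log(timestep) / counts[i])
        return values[i] + confidence * bonus

    return max(range(len(values)), key=score)
```

In this sketch `timestep` must be at least 1 so that the logarithm is defined. Setting `confidence` to 0 reduces the UCB rule to greedy selection among variants that have been tried at least once.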

class boax.policies.believes.Belief(init: InitFn[T], update: UpdateFn[T, R], best: BestFn[T])

A policy belief.

A belief is defined by three functions: an init function, an update function, and a best function.

`binary`(num_variants)	The binary Beta belief.
`continuous`(num_variants)	The continuous belief.
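
To make the init/update/best triple concrete, the following plain-Python sketch models a binary Beta belief. It is a hedged illustration, not the library's implementation: the names `SketchBelief`, `beta_belief`, and `thompson_select` are hypothetical, the argument lists of the three functions are assumptions, parameters are stored as a list of `(alpha, beta)` pairs, and an integer seed stands in for a pseudo-random key.

```python
import random
from typing import Callable, NamedTuple


class SketchBelief(NamedTuple):
    """Hypothetical stand-in for a belief: three plain functions."""
    init: Callable
    update: Callable
    best: Callable


def beta_belief(num_variants: int) -> SketchBelief:
    def init():
        # One (alpha, beta) pair per variant, starting from a Beta(1, 1) prior.
        return [(1.0, 1.0)] * num_variants

    def update(params, variant, reward):
        # A reward of 1 increments alpha; a reward of 0 increments beta.
        alpha, beta = params[variant]
        new = (alpha + reward, beta + 1 - reward)
        return params[:variant] + [new] + params[variant + 1:]

    def best(params):
        # The best variant is the one with the highest posterior mean.
        means = [a / (a + b) for a, b in params]
        return max(range(len(params)), key=means.__getitem__)

    return SketchBelief(init, update, best)


def thompson_select(params, key):
    """Thompson sampling: draw one sample per posterior, pick the largest."""
    rng = random.Random(key)
    samples = [rng.betavariate(a, b) for a, b in params]
    return max(range(len(samples)), key=samples.__getitem__)


belief = beta_belief(2)
params = belief.init()
params = belief.update(params, variant=1, reward=1)  # variant 1 succeeded
params = belief.update(params, variant=0, reward=0)  # variant 0 failed
```

After these two updates the parameters are `[(1.0, 2.0), (2.0, 1.0)]`, so `belief.best(params)` returns `1`. The `update` function returns a new list instead of mutating its input, mirroring the functional style that JAX code typically follows.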