boax.policies module#
Implements functionality for constructing policy functions.
boax.policies#
Policy Types#
- class boax.policies.Policy(*args, **kwargs)#
A callable type for policy functions.
A policy function takes a set of parameters of type T, a timestep, and a pseudo-random key as input and returns a selected variant.
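The callable shape described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the boax implementation: `PolicySketch` and `make_epsilon_greedy` are hypothetical names, the parameters are assumed to be a sequence of per-variant value estimates, and an integer seed stands in for the pseudo-random key.

```python
import random
from typing import Protocol, Sequence


class PolicySketch(Protocol):
    """Hypothetical stand-in for the callable shape described above."""

    def __call__(self, params: Sequence[float], timestep: int, key: int) -> int:
        ...


def make_epsilon_greedy(epsilon: float) -> PolicySketch:
    """Build an epsilon-greedy policy over per-variant value estimates."""

    def policy(params: Sequence[float], timestep: int, key: int) -> int:
        # Assumption: an integer seed stands in for the pseudo-random key, so
        # the same inputs always select the same variant.
        # The timestep is unused here; a policy such as upper confidence
        # bound would use it.
        rng = random.Random(key)
        if rng.random() < epsilon:
            # Explore: pick a variant uniformly at random.
            return rng.randrange(len(params))
        # Exploit: pick the variant with the highest estimated value.
        return max(range(len(params)), key=lambda i: params[i])

    return policy


greedy = make_epsilon_greedy(epsilon=0.0)
choice = greedy([0.1, 0.7, 0.3], timestep=1, key=0)
```

Passing the randomness in explicitly, rather than drawing from global state, keeps the policy a pure function of its three inputs, which is the property the signature above suggests.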
Policies#
Action Value Policies#
- The epsilon-greedy policy function.
- The Boltzmann policy function.
- The upper confidence bound policy function.
Beta Policies#
- The Thompson sampling policy function.
boax.policies.believes#
Belief Types#
- class boax.policies.believes.Belief(init: InitFn[T], update: UpdateFn[T, R], best: BestFn[T])#
A policy belief.
A belief is defined by three functions: init, update, and best.
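To make the roles of the init, update, and best functions concrete, here is a minimal plain-Python sketch of a binary Beta belief. It is not the boax implementation: `BeliefSketch`, `beta_belief`, the `(alpha, beta)` parameter layout, and the argument lists of all three functions are assumptions chosen for illustration.

```python
from typing import Callable, List, NamedTuple, Tuple

# Hypothetical parameter type: one (alpha, beta) pair per variant.
BetaParams = List[Tuple[float, float]]


class BeliefSketch(NamedTuple):
    """Hypothetical bundle of the three functions that define a belief."""

    init: Callable[[int], BetaParams]
    update: Callable[[BetaParams, int, int], BetaParams]
    best: Callable[[BetaParams], int]


def beta_belief() -> BeliefSketch:
    def init(num_variants: int) -> BetaParams:
        # Start every variant from a uniform Beta(1, 1) prior.
        return [(1.0, 1.0)] * num_variants

    def update(params: BetaParams, variant: int, reward: int) -> BetaParams:
        # A binary reward of 1 increments alpha; a reward of 0 increments beta.
        alpha, beta = params[variant]
        updated = list(params)
        updated[variant] = (alpha + reward, beta + (1 - reward))
        return updated

    def best(params: BetaParams) -> int:
        # Choose the variant with the highest posterior mean alpha / (alpha + beta).
        return max(
            range(len(params)),
            key=lambda i: params[i][0] / (params[i][0] + params[i][1]),
        )

    return BeliefSketch(init, update, best)


belief = beta_belief()
params = belief.init(3)
params = belief.update(params, variant=2, reward=1)  # variant 2 succeeded
params = belief.update(params, variant=0, reward=0)  # variant 0 failed
winner = belief.best(params)
```

In this sketch `update` returns a new parameter list instead of mutating its input, so earlier belief states remain valid after each observation.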
Beliefs#
- The binary Beta belief.
- The continuous belief.