boax.policies.thompson_sampling

boax.policies.thompson_sampling()

The Thompson sampling policy function.

Randomly samples an action value for each variant and selects the variant with the highest sampled value.

Example

>>> policy = thompson_sampling()
>>> variant = policy(params, timestep, key)
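
The sample-then-select rule described above can be illustrated independently of boax. The following is a minimal sketch using only the Python standard library, not boax's implementation: the function name `thompson_sampling_select`, the choice of a Beta posterior per variant, and the example parameters are all assumptions made for illustration.

```python
import random


def thompson_sampling_select(alphas, betas, rng):
    """Return the index of the variant with the highest sampled value.

    Assumes each variant's action value follows a Beta(alpha, beta)
    posterior (a hypothetical modelling choice for this sketch).
    """
    # Draw one sampled action value per variant from its posterior.
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    # Select the variant whose sampled value is largest.
    return max(range(len(samples)), key=samples.__getitem__)


# Hypothetical posterior parameters for three variants.
alphas = [1.0, 50.0, 2.0]
betas = [50.0, 1.0, 2.0]

rng = random.Random(0)  # seeded generator for reproducibility
variant = thompson_sampling_select(alphas, betas, rng)
```

Because the choice depends on random draws rather than on posterior means alone, variants with uncertain estimates are still selected occasionally, which is how the policy balances exploration and exploitation.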
