boax.policies.thompson_sampling
- boax.policies.thompson_sampling()
The Thompson sampling policy function.
Randomly samples an action value for each variant and selects the variant with the highest sampled value.
Example
>>> from boax.policies import thompson_sampling
>>> policy = thompson_sampling()
>>> variant = policy(params, timestep, key)
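The sample-then-select step described above can be illustrated with a minimal, standalone sketch. It is not boax's implementation: the helper name `thompson_sampling_sketch`, the success/failure counts, and the choice of a Beta posterior with a uniform prior are all assumptions made for illustration, and it uses Python's standard `random` module rather than the `params`, `timestep`, and `key` arguments shown in the example.

```python
import random


def thompson_sampling_sketch(successes, failures, rng):
    """Pick a variant index by Thompson sampling over Beta posteriors.

    Hypothetical helper for illustration only; not part of boax.
    """
    # Draw one plausible action value per variant from Beta(1 + s, 1 + f),
    # i.e. a uniform prior updated with the observed counts (an assumption).
    samples = [
        rng.betavariate(1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    # Select the variant whose sampled value is highest.
    return max(range(len(samples)), key=samples.__getitem__)


# Two variants: the first has mostly successes, the second mostly failures.
rng = random.Random(0)
picks = [
    thompson_sampling_sketch([90, 10], [10, 90], rng)
    for _ in range(200)
]
```

With counts this lopsided, the first variant is selected almost every time. A variant with few observations has a wider posterior, so its sampled value is occasionally the highest, which is how the policy keeps exploring.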