boax.policies.thompson_sampling

boax.policies.thompson_sampling()

The Thompson sampling policy function.

Randomly samples an action value for each variant and selects the variant with the highest sampled value.

Example

>>> policy = thompson_sampling()
>>> variant = policy(params, timestep, key)

Return type:

Policy[Beta]

Returns:

The corresponding Policy.
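
The selection rule described above can be illustrated with a minimal sketch in plain Python. This is not boax's implementation: the helper name `thompson_sampling_select` and the `alphas`/`betas` lists are hypothetical, and the sketch assumes each variant's action value follows a Beta posterior, as the `Policy[Beta]` return type suggests.

```python
import random


def thompson_sampling_select(alphas, betas, rng):
    """Hypothetical helper: draw one sample per variant from Beta(alpha, beta)
    and return the index of the variant with the highest sampled value."""
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    return max(range(len(samples)), key=samples.__getitem__)


# Assumed posterior parameters for three variants (illustrative values only).
alphas = [2.0, 50.0, 5.0]
betas = [50.0, 2.0, 5.0]

rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[thompson_sampling_select(alphas, betas, rng)] += 1
```

Because each call draws fresh samples, the variant with the highest posterior mean is chosen most often, while variants with more uncertain posteriors are still selected occasionally. That occasional selection is how the policy explores.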