boax.policies.boltzmann#

boax.policies.boltzmann(tau)#

The boltzmann policy function.

Randomly selects a variant proportional to the current action-values.

Example

>>> policy = boltzmann(tau)
>>> variant = policy(params, timestep, key)

Parameters:: tau (Union[Array, ndarray, bool, number, float, int]) – The temperature parameter guiding exploration vs exploitation.
Return type:: Policy[ActionValues]
Returns:: The corresponding Policy.

boax.policies.boltzmann