boax.policies.boltzmann

Contents

boax.policies.boltzmann#

boax.policies.boltzmann(tau)#

The boltzmann policy function.

Randomly selects a variant proportional to the current action-values.

Example

>>> policy = boltzmann(tau)
>>> variant = policy(params, timestep, key)
Parameters:

tau (Union[Array, ndarray, bool, number, float, int]) – The temperature parameter guiding exploration vs exploitation.

Return type:

Policy[ActionValues]

Returns:

The corresponding Policy.