boax.policies.boltzmann#
- boax.policies.boltzmann(tau)#
The boltzmann policy function.
Randomly selects a variant proportional to the current action-values.
Example
>>> policy = boltzmann(tau) >>> variant = policy(params, timestep, key)
- Parameters:
tau (
Union[Array,ndarray,bool,number,float,int]) – The temperature parameter guiding exploration vs exploitation.- Return type:
Policy[ActionValues]- Returns:
The corresponding Policy.