boax.policies.epsilon_greedy
- boax.policies.epsilon_greedy(epsilon)
The epsilon-greedy policy function.
Selects the variant with the highest action value with probability 1 - epsilon; otherwise, with probability epsilon, selects a variant uniformly at random.
Example
>>> policy = epsilon_greedy(epsilon)
>>> variant = policy(params, timestep, key)
- Parameters:
epsilon (Union[Array, ndarray, bool, number, float, int]) – The parameter guiding exploration vs exploitation.
- Return type:
Policy[ActionValues]
- Returns:
The corresponding Policy.
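The selection rule described above can be illustrated with a minimal JAX sketch. This is not boax's implementation: `epsilon_greedy_sketch` is a hypothetical name, a plain array of action values stands in for `params`, and the `timestep` argument is omitted for brevity.

```python
import jax
import jax.numpy as jnp


def epsilon_greedy_sketch(epsilon):
    """Hypothetical stand-in for an epsilon-greedy policy (not the boax API)."""

    def policy(action_values, key):
        # Use independent keys for the explore/exploit decision and the random pick.
        explore_key, choice_key = jax.random.split(key)
        # Uniformly random variant index, used when exploring.
        random_variant = jax.random.randint(
            choice_key, (), 0, action_values.shape[0]
        )
        # Index of the highest action value, used when exploiting.
        greedy_variant = jnp.argmax(action_values)
        # Explore with probability epsilon, otherwise exploit.
        explore = jax.random.uniform(explore_key) < epsilon
        return jnp.where(explore, random_variant, greedy_variant)

    return policy


action_values = jnp.array([0.1, 0.9, 0.3])  # assumed per-variant estimates
key = jax.random.PRNGKey(0)

# With epsilon = 0 the policy never explores, so it picks index 1.
greedy_choice = epsilon_greedy_sketch(0.0)(action_values, key)

# With epsilon = 1 the policy always explores and returns some valid index.
random_choice = epsilon_greedy_sketch(1.0)(action_values, key)
```

Larger values of epsilon spend more selections on exploration; smaller values favor the variant currently believed to be best.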