boax.policies.epsilon_greedy


boax.policies.epsilon_greedy(epsilon)

The epsilon-greedy policy function.

With probability 1 - epsilon, greedily selects the variant with the highest action value; with probability epsilon, selects a variant uniformly at random.
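
The selection rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not boax's implementation: `epsilon_greedy_sketch` is a hypothetical name, the action values are a plain list, and a `random.Random` instance stands in for the `key` argument of the real policy.

```python
import random


def epsilon_greedy_sketch(epsilon):
    """Hypothetical stand-in for illustration; not the boax function."""

    def policy(action_values, rng):
        # Explore with probability epsilon: pick any variant uniformly at random.
        if rng.random() < epsilon:
            return rng.randrange(len(action_values))
        # Otherwise exploit: pick the variant with the highest action value.
        return max(range(len(action_values)), key=action_values.__getitem__)

    return policy


rng = random.Random(0)  # assumed stand-in for a PRNG key
policy = epsilon_greedy_sketch(0.1)
action_values = [0.2, 0.5, 0.3]  # illustrative values; variant 1 is greedy

picks = [policy(action_values, rng) for _ in range(10_000)]
greedy_share = picks.count(1) / len(picks)
```

In this sketch the exploration step can also land on the greedy variant, so with K variants the greedy one is chosen with probability 1 - epsilon + epsilon / K (about 0.93 for the values above).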

Example

>>> policy = epsilon_greedy(epsilon)
>>> variant = policy(params, timestep, key)
Parameters:

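epsilon (Union[Array, ndarray, bool, number, float, int]) – The exploration probability: the chance of selecting a variant uniformly at random instead of the greedy one. Larger values favour exploration, smaller values favour exploitation.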

Return type:

Policy[ActionValues]

Returns:

The corresponding Policy.