boax.policies.epsilon_greedy

boax.policies.epsilon_greedy(epsilon)

The epsilon-greedy policy function.

With probability 1 - epsilon, greedily selects the variant with the highest action value; otherwise, with probability epsilon, selects a variant uniformly at random.
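
The selection rule described above can be sketched in plain Python. This is a minimal illustration of the epsilon-greedy technique only, not boax's implementation: the helper name `epsilon_greedy_select`, the list of action values, and the use of the standard-library `random.Random` in place of a JAX PRNG key are all assumptions made for the example.

```python
import random


def epsilon_greedy_select(action_values, epsilon, rng):
    """Hypothetical helper: pick an index by the epsilon-greedy rule."""
    if rng.random() < epsilon:
        # Explore: choose any variant uniformly at random.
        return rng.randrange(len(action_values))
    # Exploit: choose the variant with the highest action value.
    return max(range(len(action_values)), key=action_values.__getitem__)


rng = random.Random(0)  # seeded generator, standing in for a PRNG key
values = [0.1, 0.7, 0.3]  # made-up action values for three variants

# With epsilon = 0 the rule is purely greedy.
greedy_choice = epsilon_greedy_select(values, 0.0, rng)

# With epsilon = 0.2 most selections still go to the best variant.
counts = [0, 0, 0]
for _ in range(10_000):
    counts[epsilon_greedy_select(values, 0.2, rng)] += 1
```

Because the random branch can also land on the best variant, in this sketch the greedy variant is chosen with probability 1 - epsilon + epsilon / n for n variants, and each other variant with probability epsilon / n.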

Example

>>> from boax.policies import epsilon_greedy
>>> policy = epsilon_greedy(epsilon)
>>> variant = policy(params, timestep, key)

Parameters: epsilon (Union[Array, ndarray, bool, number, float, int]) – The exploration rate: the probability of selecting a variant uniformly at random instead of greedily.
Return type: Policy[ActionValues]
Returns: The corresponding Policy.
