boax.policies.upper_confidence_bound
- boax.policies.upper_confidence_bound(confidence)
The upper confidence bound policy function.
Selects the variant with the highest sum of action value and upper confidence bound.
Example
>>> policy = upper_confidence_bound(confidence)
>>> variant = policy(params, timestep, key)
- Parameters:
confidence (Union[Array, ndarray, bool, number, float, int]) – The confidence parameter guiding exploration vs exploitation.
- Return type:
Policy[ActionValues]
- Returns:
The corresponding Policy.
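This page does not spell out how the bound is computed. As a rough illustration only, the sketch below assumes the textbook UCB1 rule, in which each variant's score is its action value plus `confidence * sqrt(ln(timestep) / count)`. The names `ucb_select`, `q_values`, and `counts` are hypothetical and not part of boax, whose actual computation may differ.

```python
import jax.numpy as jnp


def ucb_select(q_values, counts, timestep, confidence):
    """Hypothetical helper: pick the variant maximising q + bonus (UCB1)."""
    # Exploration bonus grows for variants that have been tried less often.
    # Assumes every count is positive; a zero count would give an infinite bonus.
    bonus = confidence * jnp.sqrt(jnp.log(timestep) / counts)
    return int(jnp.argmax(q_values + bonus))


# Illustrative values: variant 1 has a slightly lower estimate but few trials.
q_values = jnp.array([0.5, 0.4, 0.1])
counts = jnp.array([10.0, 2.0, 10.0])

# With confidence 0 the rule is purely greedy on the action values.
greedy_choice = ucb_select(q_values, counts, timestep=22, confidence=0.0)

# With confidence 1 the rarely tried variant receives a larger bonus.
exploratory_choice = ucb_select(q_values, counts, timestep=22, confidence=1.0)
```

Under this assumed rule, a larger `confidence` shifts selection toward variants with few observations, while a value of zero reduces the policy to greedy selection.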