Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

06/20/2017
by Philip S. Thomas, et al.

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, a result originally presented with action-independent baselines by Sutton et al. (2000).
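To make the distinction concrete, the following is a minimal sketch and not the construction from the paper. It assumes a single-state problem with a softmax policy and uses made-up action values; every name in it (`theta`, `q`, `b_const`, `b_action`) is hypothetical. It checks numerically that subtracting an action-independent baseline leaves the exact policy gradient unchanged, whereas naively subtracting an action-dependent baseline shifts it by the term sum_a grad pi(a) b(a), which has to be accounted for.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_pi(theta):
    """Return jac[a][k] = d pi(a) / d theta_k for a softmax policy."""
    pi = softmax(theta)
    n = len(theta)
    return [
        [pi[a] * ((1.0 if a == k else 0.0) - pi[k]) for k in range(n)]
        for a in range(n)
    ]


def weighted_gradient(theta, weights):
    """Compute sum_a grad pi(a) * weights[a], one entry per parameter."""
    jac = grad_pi(theta)
    n = len(theta)
    return [sum(jac[a][k] * weights[a] for a in range(n)) for k in range(n)]


# Hypothetical numbers chosen only for illustration.
theta = [0.2, -0.5, 1.0]      # policy parameters (one preference per action)
q = [1.0, 3.0, -2.0]          # assumed action values Q(a)
b_const = 0.7                 # action-independent baseline
b_action = [0.5, 2.5, -1.0]   # action-dependent baseline b(a)

# Exact gradient of expected return: sum_a grad pi(a) Q(a).
exact = weighted_gradient(theta, q)

# Action-independent baseline: sum_a grad pi(a) * b = b * grad(1) = 0,
# so the gradient is unchanged.
with_const = weighted_gradient(theta, [qa - b_const for qa in q])

# Action-dependent baseline subtracted naively: the gradient changes.
with_action = weighted_gradient(theta, [qa - ba for qa, ba in zip(q, b_action)])

# Adding back sum_a grad pi(a) b(a) recovers the exact gradient.
correction = weighted_gradient(theta, b_action)
corrected = [g + c for g, c in zip(with_action, correction)]

print("exact      ", exact)
print("const base ", with_const)
print("action base", with_action)
print("corrected  ", corrected)
```

The comparison uses exact sums over actions rather than sampled estimates, so any difference between the printed vectors reflects bias and not sampling noise.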
