Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

05/26/2020
by Gen Li, et al.

We investigate the sample efficiency of reinforcement learning in a γ-discounted infinite-horizon Markov decision process (MDP) with state space S and action space A, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-off between sample complexity and statistical accuracy has yet to be established. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1-γ)^2 (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of |S||A|/(1-γ) (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an ε-optimal policy with an order of (|S||A|/((1-γ)^3 ε^2)) · log(|S||A|/((1-γ)ε)) samples for any ε ∈ (0, 1/(1-γ)]. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee in the generative-model setting that accommodates the entire range of sample sizes (below which finding a meaningful policy is information-theoretically impossible).
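To make the general recipe concrete, the sketch below illustrates perturbed model-based planning with a generative model: draw a fixed number of samples per state-action pair to form an empirical MDP, add a tiny random perturbation to the rewards (to break ties among near-optimal policies), and plan in the resulting model. This is a minimal illustration, not the paper's exact algorithm; the oracle name `sample_next_state`, the perturbation scale, and the use of value iteration as the planner are assumptions made here for the example.

```python
import numpy as np

def perturbed_model_based_planning(sample_next_state, rewards, num_states, num_actions,
                                   gamma=0.99, samples_per_pair=1000,
                                   perturb_scale=1e-6, vi_iters=2000, seed=0):
    """Illustrative sketch of model-based planning with a generative model.

    sample_next_state(s, a) -> index of a next state drawn from P(.|s, a)  (assumed oracle)
    rewards: array of shape (num_states, num_actions), assumed known and deterministic.
    """
    rng = np.random.default_rng(seed)

    # 1. Build the empirical transition model from i.i.d. generative-model samples.
    P_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(samples_per_pair):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= samples_per_pair

    # 2. Slightly perturb the rewards at random (illustrative perturbation scale).
    r_perturbed = rewards + perturb_scale * rng.random((num_states, num_actions))

    # 3. Plan in the empirical, perturbed MDP via value iteration.
    V = np.zeros(num_states)
    for _ in range(vi_iters):
        Q = r_perturbed + gamma * P_hat @ V   # Q has shape (num_states, num_actions)
        V = Q.max(axis=1)

    # 4. Return the greedy (deterministic) policy of the empirical MDP.
    return Q.argmax(axis=1)
```

The theory in the paper concerns how small `samples_per_pair` can be (as a function of |S|, |A|, 1-γ, and ε) while the returned policy remains ε-optimal in the true MDP; the code above only fixes the computational pipeline, not those sample-size choices.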
