An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

08/06/2017
by Felix Leibfried, et al.

In this paper, we methodologically address the problem of cumulative reward overestimation in deep reinforcement learning. We generalise notions from information-theoretic bounded rationality to handle high-dimensional state spaces efficiently. The resulting algorithm encompasses a wide range of learning outcomes, obtained by tuning a Lagrange multiplier that intrinsically penalises rewards. We show that deep Q-networks arise as a special case of our proposed approach. We further introduce a novel scheduling scheme for bounded-rational behaviour that ensures sample efficiency and robustness. In experiments on Atari games, our algorithm outperforms other deep reinforcement learning algorithms (e.g., deep and double deep Q-networks) in terms of both game-play performance and sample complexity.
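
Bounded-rational objectives of this kind are typically operationalised through a free-energy (log-sum-exp) Bellman backup, in which the Lagrange multiplier acts as an inverse temperature: as it grows, the backup approaches the hard max of standard deep Q-networks, and as it shrinks, it approaches the expected value under a prior policy. The softened maximum is less susceptible to the positive bias a hard max picks up from noisy Q-value estimates, which is the overestimation problem the abstract targets. The sketch below illustrates this interpolation only; the function names, the uniform prior, and the linear annealing schedule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def free_energy_backup(q_values, beta, prior=None):
    """Soft Bellman backup V(s) = (1/beta) * log sum_a prior(a) * exp(beta * Q(s,a)).

    Interpolates between the prior-expected value (beta -> 0) and the
    hard max of standard deep Q-networks (beta -> infinity).
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    if prior is None:
        # Assumed uniform prior policy over actions (illustrative choice).
        prior = np.full_like(q_values, 1.0 / q_values.size)
    # Log-sum-exp with a max shift for numerical stability.
    shifted = beta * (q_values - q_values.max())
    return q_values.max() + np.log(np.sum(prior * np.exp(shifted))) / beta

def beta_schedule(step, beta_start=0.1, beta_end=10.0, anneal_steps=1_000_000):
    """Hypothetical linear annealing of the inverse temperature.

    The paper introduces its own scheduling scheme for bounded-rational
    behaviour; this linear ramp is a stand-in showing where such a
    schedule plugs into training.
    """
    frac = min(step / anneal_steps, 1.0)
    return beta_start + frac * (beta_end - beta_start)

q = np.array([1.0, 1.2, 0.9])
print(free_energy_backup(q, beta=0.01))   # ~ mean under the uniform prior
print(free_energy_backup(q, beta=100.0))  # ~ max(q) = 1.2 (the DQN limit)
```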
