

Demystifying the Many Deep Reinforcement Learning Algorithms

Glen Berseth

In recent years, there has been an explosion in deep reinforcement learning (DeepRL) research, resulting in many new RL algorithms that work with deep networks. In DeepRL, and RL in general, the goal is to optimize a policy \(\pi(a|s,\theta)\) with parameters \(\theta\) with respect to the expected future discounted reward.

$$J(\pi) = \mathbb{E} \left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right]$$
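To make the objective concrete, here is a minimal sketch of how \(J(\pi)\) is typically estimated in practice: compute the discounted return of each sampled trajectory and average over trajectories. The function names and the list-of-rewards representation are illustrative assumptions, not part of any particular library.

```python
def discounted_return(rewards, gamma):
    # Computes sum_{t=0}^{T} gamma^t * r_t for one trajectory,
    # where rewards[t] is the reward r_t received at step t.
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

def estimate_objective(trajectories, gamma):
    # Monte Carlo estimate of J(pi): the expectation is approximated
    # by averaging discounted returns over trajectories sampled by
    # running the policy pi in the environment.
    returns = [discounted_return(tr, gamma) for tr in trajectories]
    return sum(returns) / len(returns)
```

The discount factor \(\gamma \in [0, 1)\) down-weights rewards received further in the future, which keeps the sum bounded and expresses a preference for near-term reward.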

It can be difficult to keep track of the many algorithms, let alone their properties and when it is best to use which one. In this post, I make an effort to organize several RL methods into a few groups. This organization helps clear up some misconceptions about different algorithms and demystifies what their properties mean, for example, on-policy vs. off-policy.



Info

Glen Berseth

I am a postdoc at the Berkeley Artificial Intelligence Research (BAIR) group, working in the Robotic AI & Learning (RAIL) lab with Sergey Levine. I received my PhD from the Department of Computer Science at the University of British Columbia in 2019, where I worked on reinforcement learning, machine learning, and motion planning with Michiel van de Panne. I received my BSc degree in Computer Science from York University in 2012 and my MSc from York University in 2014 under the supervision of Petros Faloutsos, for work on optimization and authoring of crowd simulations. I have published in a wide range of areas, including computer animation, machine learning, and robotics, and was an NSERC scholarship award winner. You can find a list of projects and publications here.