Publications


Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control,
Pieter Abbeel
Ph.D. Dissertation, Stanford University, Computer Science, August 2008
pdf





Pre-prints

Learning to Adapt: Meta-Learning for Model-Based Control,
Ignasi Clavera, Anusha Nagabandi, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn.
arXiv 1803.11347, videos

Accelerated Methods for Deep Reinforcement Learning,
Adam Stooke and Pieter Abbeel.
arXiv 1802.02811

Meta-Reinforcement Learning of Structured Exploration Strategies,
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine.
arXiv 1802.07245

Evolved Policy Gradients,
Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel.
arXiv 1802.04821

Domain Randomization and Generative Models for Robotic Grasping,
Joshua Tobin, Wojciech Zaremba, Pieter Abbeel.
arXiv 1710.06425

UCB Exploration via Q-Ensembles,
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman.
arXiv 1706.01502

Equivalence Between Policy Gradients and Soft Q-Learning,
John Schulman, Xi (Peter) Chen, Pieter Abbeel.
arXiv 1704.06440

Adversarial Attacks on Neural Network Policies,
Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel.
arXiv 1702.02284, videos

Uncertainty-Aware Reinforcement Learning for Collision Avoidance,
Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine.
arXiv 1702.01182, videos

RL2: Fast Reinforcement Learning via Slow Reinforcement Learning,
Yan (Rocky) Duan, John Schulman, Xi (Peter) Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel.
arXiv 1611.02779, videos

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model,
Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba.
arXiv 1610.03518


Publications

bibtex

[191] DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills,
Xue Bin (Jason) Peng, Pieter Abbeel, Sergey Levine, Michiel van de Panne.
In the proceedings of SIGGRAPH, Vancouver, Canada, August 2018.
arXiv 1804.02717

[190] Self-Consistent Trajectory Autoencoder: Learning Trajectory Embeddings for Model Based Hierarchical Reinforcement Learning,
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
(arXiv forthcoming)

[189] Latent Space Policies for Hierarchical Reinforcement Learning,
Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1804.02809

[188] Universal Planning Networks,
Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1804.00645, videos

[187] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1801.01290, github

[185] Automatic Goal Generation for Reinforcement Learning Agents,
David Held, Xinyang Geng, Carlos Florensa, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1705.06366

[183] Asymmetric Actor Critic for Image-Based Robot Learning,
Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of Robotics: Science and Systems (RSS), Pittsburgh, PA, USA, June 2018.
arXiv 1710.06542, videos

[182] Learning with Opponent-Learning Awareness,
Jakob N. Foerster*, Richard Y. Chen*, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch.
In the proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Stockholm, Sweden, July 2018 (arXiv 1709.04326)

[181] Learning Generalized Reactive Policies using Deep Neural Networks,
Edward Groshev, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel.
In the proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS), Delft, The Netherlands, June 2018 (arXiv 1708.07280)

[180] Model-Ensemble Trust-Region Policy Optimization,
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1802.10592)

[179] A Simple Neural Attentive Meta-Learner,
Nikhil Mishra*, Mostafa Rohaninejad*, Xi (Peter) Chen, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1707.03141)

[178] Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines,
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1803.07246)

[177] Meta Learning Shared Hierarchies,
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1710.09767)

[176] Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments,
Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1710.03641, videos)

[175] Parameter Space Noise for Exploration,
Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1706.01905)

[174] Composable Deep Reinforcement Learning for Robotic Manipulation,
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1803.06773, videos, code)

[173] Learning Robotic Assembly from CAD, Best Paper Finalist,
Garrett Thomas*, Melissa Chien*, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1803.07635, video)

[172] Sim-to-Real Transfer of Robotic Control with Dynamics Randomization,
Xue Bin (Jason) Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1710.06537, video)

[170] Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation,
Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1709.10489)

[169] Overcoming Exploration in Reinforcement Learning with Demonstrations,
Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1709.10089)

[168] Deep Object-Centric Representations for Generalizable Robot Learning,
Coline Devin, Pieter Abbeel, Trevor Darrell, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1708.04225)

[167] Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation,
YuXuan (Andrew) Liu*, Abhishek Gupta*, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1707.03374)

[166] Emergence of Grounded Compositional Language in Multi-Agent Populations,
Igor Mordatch, Pieter Abbeel.
In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, February 2018. (arXiv 1703.04908)

[165] Inverse Reward Design,
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (pdf forthcoming)

[164] Hindsight Experience Replay,
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1707.01495, videos)

[163] Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments,
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1706.02275)

[161] #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning,
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1611.04717)

[159] Mutual Alignment Transfer Learning,
Markus Wulfmeier, Ingmar Posner, Pieter Abbeel.
In the proceedings of the 1st Annual Conference on Robot Learning (CoRL), Mountain View, CA, November 2017. (arXiv 1707.07907)

[158] Reverse Curriculum Generation for Reinforcement Learning,
Carlos Florensa, David Held, Markus Wulfmeier, Pieter Abbeel.
In the proceedings of the 1st Annual Conference on Robot Learning (CoRL), Mountain View, CA, November 2017. (arXiv 1707.05300)

[155] The Off-Switch Game,
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell.
In the proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, August 2017. (arXiv 1611.08219)

[154] Constrained Policy Optimization,
Josh Achiam, David Held, Aviv Tamar, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1705.10528)

[153] Prediction and Control with Temporal Segment Models,
Nikhil Mishra, Pieter Abbeel, Igor Mordatch.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1703.04070)

[152] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,
Chelsea Finn, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1703.03400)

[151] Reinforcement Learning with Deep Energy-Based Policies,
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1702.08165)

[150] Enabling Robots to Communicate their Objectives,
Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan.
In the proceedings of Robotics: Science and Systems (RSS), Cambridge, MA, July 2017. (arXiv 1702.03465)

[147] Learning Visual Servoing with Deep Features and Trust Region Fitted Q-Iteration,
Alex Lee, Sergey Levine, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (arXiv 1703.11000, videos, code, benchmark)

[146] Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning,
Abhishek Gupta*, Coline Devin*, YuXuan (Andrew) Liu, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (pdf forthcoming)

[145] Stochastic Neural Networks for Hierarchical Reinforcement Learning,
Carlos Florensa Campo, Yan (Rocky) Duan, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (arXiv 1704.03012, videos, code)

[144] Generalizing Skills with Semi-Supervised Reinforcement Learning,
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (arXiv 1612.00429)

[142] Probabilistically Safe Policy Transfer,
David Held, Zoe McCarthy, Michael Zhang, Yide (Fred) Shentu, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1705.05394)

[141] Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation,
Ashvin Nair, Pulkit Agrawal, Dian Chen, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1703.02018)

[140] Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States,
William Montgomery*, Anurag Ajay*, Chelsea Finn, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1610.01112)

[139] Deep Reinforcement Learning for Tensegrity Robot Locomotion,
Xinyang Geng*, Marvin Zhang*, Jonathan Bruce*, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.09049)

[138] Learning from the Hindsight Plan -- Episodic MPC Improvement,
Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.09001)

[137] Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer,
Coline Devin*, Abhishek Gupta*, Trevor Darrell, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.07088)

[136] PLATO: Policy Learning using Adaptive Trajectory Optimization,
Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1603.00622)

[135] Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments,
Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell.
In the proceedings of the Workshop on Algorithmic Foundations of Robotics (WAFR), San Francisco, CA, USA, December 2016. (arXiv 1511.07111)

[133] Cooperative Inverse Reinforcement Learning,
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1606.03137)

[132] Value Iteration Networks, Best Paper Award,
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1602.02867)

[131] Learning to Poke by Poking: Experiential Learning of Intuitive Physics,
Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1606.07419)

[130] VIME: Variational Information Maximizing Exploration,
Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1605.09674)

[126] One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors,
Justin Fu, Sergey Levine, Pieter Abbeel.
In the proceedings of the 29th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, October 2016. (pdf, arXiv 1509.06841)

[122] Benchmarking Deep Reinforcement Learning for Continuous Control,
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), 2016. (arXiv 1604.06778, rllab:code, rllab:docs)

[121] Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization,
Chelsea Finn, Sergey Levine, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), 2016. (arXiv 1603.00448)

[119] End-to-End Training of Deep Visuomotor Policies,
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel.
To appear in the Journal of Machine Learning Research (JMLR), 2016. (arXiv 1504.00702, video)

[118] High-Dimensional Continuous Control Using Generalized Advantage Estimation,
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), 2016 (arXiv 1506.02438, video)

[117] Combining Model-Based Policy Search with Online Model Learning for Control of Physical Humanoids,
Igor Mordatch, Nikhil Mishra, Clemens Eppner, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (pdf)

[115] Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search,
Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06791)

[114] Deep Spatial Autoencoders for Visuomotor Learning,
Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06113)

[113] Learning Deep Neural Network Policies with Continuous Memory States,
Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1507.01273)

[112] Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration,
Christopher Xie, Sachin Patil, Teodor Moldovan, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06824)

[110] Gradient Estimation Using Stochastic Computation Graphs,
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Montreal, Canada, December 2015.
arXiv 1506.05254

[W] Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models,
Bradly C. Stadie, Sergey Levine, Pieter Abbeel.
Presented at NIPS 2015 Workshop on Deep Reinforcement Learning
arXiv 1507.00814

[105] Learning Compound Multi-Step Controllers under Unknown Dynamics,
Weiqiao Han, Sergey Levine, Pieter Abbeel.
In the proceedings of the 28th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, September 2015. (pdf)

[97] Trust Region Policy Optimization,
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel.
In the proceedings of the 32nd International Conference on Machine Learning (ICML), 2015. (pdf, arXiv preprint)

[95] Deep Learning Helicopter Dynamics Models,
Ali Punjani, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[94] Learning Contact-Rich Manipulation Skills with Guided Policy Search, Best Robotic Manipulation Paper Award,
Sergey Levine, Nolan Wagener, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[86] Optimism-Driven Exploration for Nonlinear Systems,
Teodor Mihai Moldovan, Sergey Levine, Michael I. Jordan, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[80] Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics,
Sergey Levine, Pieter Abbeel.
In Neural Information Processing Systems (NIPS) 27, 2014. (pdf)

[51] Safe Exploration in Markov Decision Processes,
Teodor Moldovan and Pieter Abbeel.
In the proceedings of the 29th International Conference on Machine Learning (ICML), 2012. (pdf)

[35] On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient,
Jie Tang and Pieter Abbeel.
In Neural Information Processing Systems (NIPS) 23, 2010. (pdf)

[26] Autonomous Helicopter Aerobatics through Apprenticeship Learning,
Pieter Abbeel, Adam Coates and Andrew Y. Ng.
In the International Journal of Robotics Research (IJRR), Volume 29, Issue 13, November 2010. (pdf, videos)

[3] Apprenticeship Learning via Inverse Reinforcement Learning,
Pieter Abbeel and Andrew Y. Ng.
In the proceedings of the International Conference on Machine Learning (ICML), 2004. (ps, pdf, supplement: ps, pdf, supplementary webpage)