Publications


Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control,
Pieter Abbeel
Ph.D. Dissertation, Stanford University, Computer Science, August 2008
pdf





Pre-prints

Visual Hindsight Experience Replay,
Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin.
arXiv 1901.11529

Soft Actor-Critic Algorithms and Applications,
Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine.
arXiv 1812.05905

SOLAR: Deep Structured Latent Representations for Model-Based Reinforcement Learning,
Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine.
arXiv 1808.09105

Variational Option Discovery Algorithms,
Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel.
arXiv 1807.10299

The Limits and Potentials of Deep Learning for Robotics,
Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, Peter Corke.
arXiv 1804.06557

Accelerated Methods for Deep Reinforcement Learning,
Adam Stooke and Pieter Abbeel.
arXiv 1802.02811

A Berkeley View of Systems Challenges for AI,
Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel.
arXiv 1712.05855

Interpretable and Pedagogical Examples,
Smitha Milli, Pieter Abbeel, Igor Mordatch.
arXiv 1711.00694

Synkhronos: a Multi-GPU Theano Extension for Data Parallelism,
Adam Stooke and Pieter Abbeel.
arXiv 1710.04162

UCB Exploration via Q-Ensembles,
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman.
arXiv 1706.01502

Equivalence Between Policy Gradients and Soft Q-Learning,
John Schulman, Xi (Peter) Chen, Pieter Abbeel.
arXiv 1704.06440

Adversarial Attacks on Neural Network Policies,
Sandy H. Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel.
arXiv 1702.02284, videos

Uncertainty-Aware Reinforcement Learning for Collision Avoidance,
Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine.
arXiv 1702.01182, videos

A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models,
Chelsea Finn, Paul Christiano, Pieter Abbeel, Sergey Levine.
arXiv 1611.03852

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning,
Yan (Rocky) Duan, John Schulman, Xi (Peter) Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel.
arXiv 1611.02779, videos

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model,
Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba.
arXiv 1610.03518


Publications

bibtex

[207] Preferences Implicit in the State of the World,
Rohin Shah*, Dmitrii Krasheninnikov*, Jordan Alexander*, Pieter Abbeel, Anca Dragan.
In the proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
arXiv 1902.04198 (code)

[206] Guiding Policies with Language via Meta-Learning,
John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, John DeNero, Pieter Abbeel, Sergey Levine.
In the proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
arXiv 1811.07882

[205] ProMP: Proximal Meta-Policy Search,
Jonas Rothfuss*, Dennis Lee*, Ignasi Clavera*, Tamim Asfour, Pieter Abbeel.
In the proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
arXiv 1810.06784

[204] Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow,
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine.
In the proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
arXiv 1810.00821

[203] Learning to Adapt: Meta-Learning for Model-Based Control,
Ignasi Clavera, Anusha Nagabandi, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn.
In the proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
arXiv 1803.11347, videos

[202] SFV: Reinforcement Learning of Physical Skills from Videos,
Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine.
In the proceedings of SIGGRAPH ASIA, Tokyo, Japan, December 2018.
arXiv 1810.03599

[201] Learning Plannable Representations with Causal InfoGAN,
Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel.
In Neural Information Processing Systems (NeurIPS), Montreal, Canada, December 2018.
arXiv 1807.09341

[200] Some Considerations on Learning to Explore via Meta-Reinforcement Learning,
Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever.
In Neural Information Processing Systems (NeurIPS), Montreal, Canada, December 2018.
arXiv 1803.01118

[199] Meta-Reinforcement Learning of Structured Exploration Strategies,
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine.
In Neural Information Processing Systems (NeurIPS), Montreal, Canada, December 2018.
arXiv 1802.07245

[198] Evolved Policy Gradients,
Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel.
In Neural Information Processing Systems (NeurIPS), Montreal, Canada, December 2018.
arXiv 1802.04821

[196] Modular Architecture for StarCraft II with Deep Reinforcement Learning,
Dennis Lee, Haoran Tang, Jeffrey O Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel.
In the proceedings of the 14th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'18), Edmonton, Canada, November 2018.
arXiv 1811.03555

[195] Model-Based Reinforcement Learning via Meta-Policy Optimization,
Ignasi Clavera*, Jonas Rothfuss*, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel.
In the proceedings of the Conference on Robot Learning (CoRL), Zurich, Switzerland, October 2018.
arXiv 1809.05214

[194] Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation,
Gregory Kahn, Adam Villaflor, Pieter Abbeel, Sergey Levine.
In the proceedings of the Conference on Robot Learning (CoRL), Zurich, Switzerland, October 2018.
arXiv 1810.07167, video

[193] Establishing Appropriate Trust via Critical States,
Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan.
In the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, October 2018.
arXiv 1810.08174

[192] Domain Randomization and Generative Models for Robotic Grasping,
Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, October 2018.
arXiv 1710.06425

[191] DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills,
Xue Bin (Jason) Peng, Pieter Abbeel, Sergey Levine, Michiel van de Panne.
In the proceedings of SIGGRAPH, Vancouver, Canada, August 2018.
arXiv 1804.02717

[190] Self-Consistent Trajectory Autoencoder: Learning Trajectory Embeddings for Model Based Hierarchical Reinforcement Learning,
John D. Co-Reyes, YuXuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
(arxiv forthcoming)

[189] Latent Space Policies for Hierarchical Reinforcement Learning,
Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1804.02809

[188] Universal Planning Networks,
Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1804.00645, videos

[187] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1801.01290, github

[185] Automatic Goal Generation for Reinforcement Learning Agents,
David Held, Xinyang Geng, Carlos Florensa, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
arXiv 1705.06366

[183] Asymmetric Actor Critic for Image-Based Robot Learning,
Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of Robotics: Science and Systems (RSS), Pittsburgh, PA, USA, June 2018.
arXiv 1710.06542, videos

[182] Learning with Opponent-Learning Awareness,
Jakob N. Foerster*, Richard Y. Chen*, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch.
In the proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Stockholm, Sweden, July 2018 (arXiv 1709.04326)

[181] Learning Generalized Reactive Policies using Deep Neural Networks,
Edward Groshev, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel.
In the proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS), Delft, The Netherlands, June 2018 (arXiv 1708.07280)

[180] Model-Ensemble Trust-Region Policy Optimization,
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1802.10592)

[179] A Simple Neural Attentive Meta-Learner,
Nikhil Mishra*, Mostafa Rohaninejad*, Xi (Peter) Chen, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1707.03141)

[178] Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines,
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1803.07246)

[177] Meta Learning Shared Hierarchies,
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1710.09767)

[176] Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments,
Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1710.03641, videos)

[175] Parameter Space Noise for Exploration,
Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz.
In the proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, April 2018 (arXiv 1706.01905)

[174] Composable Deep Reinforcement Learning for Robotic Manipulation,
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1803.06773, videos, code)

[173] Learning Robotic Assembly from CAD, Best Paper Finalist,
Garrett Thomas*, Melissa Chien*, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1803.07635, video)

[172] Sim-to-Real Transfer of Robotic Control with Dynamics Randomization,
Xue Bin (Jason) Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1710.06537, video)

[170] Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation,
Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1709.10489)

[169] Overcoming Exploration in Reinforcement Learning with Demonstrations,
Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1709.10089)

[168] Deep Object-Centric Representations for Generalizable Robot Learning,
Coline Devin, Pieter Abbeel, Trevor Darrell, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1708.04225)

[167] Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation,
YuXuan (Andrew) Liu*, Abhishek Gupta*, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018. (arXiv 1707.03374)

[166] Emergence of Grounded Compositional Language in Multi-Agent Populations,
Igor Mordatch, Pieter Abbeel.
In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, February 2018. arXiv 1703.04908

[165] Inverse Reward Design,
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (pdf forthcoming)

[164] Hindsight Experience Replay,
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1707.01495, videos)

[163] Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments,
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1706.02275)

[161] #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning,
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Long Beach, CA, December 2017. (arXiv 1611.04717)

[159] Mutual Alignment Transfer Learning,
Markus Wulfmeier, Ingmar Posner, Pieter Abbeel.
In the proceedings of the 1st Annual Conference on Robot Learning (CoRL), Mountain View, CA, November 2017. (arXiv 1707.07907)

[158] Reverse Curriculum Generation for Reinforcement Learning,
Carlos Florensa, David Held, Markus Wulfmeier, Pieter Abbeel.
In the proceedings of the 1st Annual Conference on Robot Learning (CoRL), Mountain View, CA, November 2017. (arXiv 1707.05300)

[155] The Off-Switch Game,
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell.
In the proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, August 2017. (arXiv 1611.08219)

[154] Constrained Policy Optimization,
Josh Achiam, David Held, Aviv Tamar, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1705.10528)

[153] Prediction and Control with Temporal Segment Models,
Nikhil Mishra, Pieter Abbeel, Igor Mordatch.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1703.04070)

[152] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,
Chelsea Finn, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1703.03400)

[151] Reinforcement Learning with Deep Energy-Based Policies,
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Machine Learning, Sydney, Australia, August 2017. (arXiv 1702.08165)

[150] Enabling Robots to Communicate their Objectives,
Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan.
In the proceedings of Robotics: Science and Systems (RSS), Cambridge, MA, July 2017. (arXiv 1702.03465)

[147] Learning Visual Servoing with Deep Features and Trust Region Fitted Q-Iteration,
Alex Lee, Sergey Levine, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (arXiv 1703.11000, videos, code, benchmark)

[146] Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning,
Abhishek Gupta*, Coline Devin*, YuXuan (Andrew) Liu, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (pdf forthcoming)

[145] Stochastic Neural Networks for Hierarchical Reinforcement Learning,
Carlos Florensa Campo, Yan (Rocky) Duan, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. (arXiv 1704.03012, videos, code)

[144] Generalizing Skills with Semi-Supervised Reinforcement Learning,
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine.
In the proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, April 2017. arXiv 1612.00429

[142] Probabilistically Safe Policy Transfer,
David Held, Zoe McCarthy, Michael Zhang, Yide (Fred) Shentu, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1705.05394)

[141] Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation,
Ashvin Nair, Pulkit Agrawal, Dian Chen, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1703.02018)

[140] Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States,
William Montgomery*, Anurag Ajay*, Chelsea Finn, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1610.01112)

[139] Deep Reinforcement Learning for Tensegrity Robot Locomotion,
Xinyang Geng*, Marvin Zhang*, Jonathan Bruce*, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.09049)

[138] Learning from the Hindsight Plan -- Episodic MPC Improvement,
Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.09001)

[137] Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer,
Coline Devin*, Abhishek Gupta*, Trevor Darrell, Pieter Abbeel, Sergey Levine.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1609.07088)

[136] PLATO: Policy Learning using Adaptive Trajectory Optimization,
Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 2017. (arXiv 1603.00622)

[135] Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments,
Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell.
In the proceedings of the Workshop on Algorithmic Foundations of Robotics (WAFR), San Francisco, CA, USA, December 2016. (arXiv 1511.07111)

[133] Cooperative Inverse Reinforcement Learning,
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1606.03137)

[132] Value Iteration Networks, Best Paper Award,
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1602.02867)

[131] Learning to Poke by Poking: Experiential Learning of Intuitive Physics,
Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1606.07419)

[130] VIME: Variational Information Maximizing Exploration,
Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (arXiv 1605.09674)

[126] One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors,
Justin Fu, Sergey Levine, Pieter Abbeel.
In the proceedings of the 29th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, October 2016. (pdf, arXiv 1509.06841)

[122] Benchmarking Deep Reinforcement Learning for Continuous Control,
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), 2016. (arXiv 1604.06778, rllab:code, rllab:docs)

[121] Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization,
Chelsea Finn, Sergey Levine, Pieter Abbeel.
In the proceedings of the International Conference on Machine Learning (ICML), 2016. (arXiv 1603.00448)

[119] End-to-End Training of Deep Visuomotor Policies,
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel.
To appear in the Journal of Machine Learning Research (JMLR), 2016. (arXiv 1504.00702, video)

[118] High-Dimensional Continuous Control Using Generalized Advantage Estimation,
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel.
In the proceedings of the International Conference on Learning Representations (ICLR), 2016 (arXiv 1506.02438, video)

[117] Combining Model-Based Policy Search with Online Model Learning for Control of Physical Humanoids,
Igor Mordatch, Nikhil Mishra, Clemens Eppner, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (pdf)

[115] Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search,
Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06791)

[114] Deep Spatial Autoencoders for Visuomotor Learning,
Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06113)

[113] Learning Deep Neural Network Policies with Continuous Memory States,
Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1507.01273)

[112] Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration,
Christopher Xie, Sachin Patil, Teodor Moldovan, Sergey Levine, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. (arXiv 1509.06824)

[110] Gradient Estimation Using Stochastic Computation Graphs,
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel.
In Neural Information Processing Systems (NIPS), Montreal, Canada, December 2015.
arXiv 1506.05254

[W] Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models,
Bradly C. Stadie, Sergey Levine, Pieter Abbeel.
Presented at NIPS 2015 Workshop on Deep Reinforcement Learning
arXiv 1507.00814

[105] Learning Compound Multi-Step Controllers under Unknown Dynamics,
Weiqiao Han, Sergey Levine, Pieter Abbeel.
In the proceedings of the 28th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, September 2015. (pdf)

[97] Trust Region Policy Optimization,
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel.
In the proceedings of the 32nd International Conference on Machine Learning (ICML), 2015. (pdf, arXiv preprint)

[95] Deep Learning Helicopter Dynamics Models,
Ali Punjani, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[94] Learning Contact-Rich Manipulation Skills with Guided Policy Search, Best Robotic Manipulation Paper Award,
Sergey Levine, Nolan Wagener, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[86] Optimism-Driven Exploration for Nonlinear Systems,
Teodor Mihai Moldovan, Sergey Levine, Michael I. Jordan, Pieter Abbeel.
In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2015. (pdf)

[80] Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics,
Sergey Levine, Pieter Abbeel.
In Neural Information Processing Systems (NIPS) 27, 2014. (pdf)

[51] Safe Exploration in Markov Decision Processes,
Teodor Moldovan and Pieter Abbeel.
In the proceedings of the 29th International Conference on Machine Learning (ICML), 2012. (pdf)

[35] On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient,
Jie Tang and Pieter Abbeel.
In Neural Information Processing Systems (NIPS) 23, 2010. (pdf)

[26] Autonomous Helicopter Aerobatics through Apprenticeship Learning,
Pieter Abbeel, Adam Coates and Andrew Y. Ng.
In the International Journal of Robotics Research (IJRR), Volume 29, Issue 13, November 2010. (pdf, videos)

[3] Apprenticeship Learning via Inverse Reinforcement Learning,
Pieter Abbeel and Andrew Y. Ng.
In Proceedings of ICML, 2004. (ps, pdf, supplement: ps, pdf, supplementary webpage)