Vitchyr H. Pong

I am a Ph.D. student at UC Berkeley advised by Sergey Levine. I am interested in using deep reinforcement learning for robotics. I did my undergrad at Cornell University, where I worked with Ross Knepper and Hadas Kress-Gazit.

CV  /  LinkedIn  /  GitHub
vitchyr at berkeley dot edu

RIG visualization

Visual Reinforcement Learning with Imagined Goals
Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. Neural Information Processing Systems. 2018. Spotlight. [arXiv] [videos] [code] [blog]

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
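As a rough illustration of the reward and relabeling ideas above (a minimal sketch, not the paper's implementation), the snippet below works on latent vectors that are assumed to come from some already-trained encoder. The negative Euclidean distance as the reward, the choice of relabeling only with later states from the same trajectory, and the names `latent_distance_reward` and `relabel_with_future_goals` are all assumptions made for this example.

```python
import math
import random


def latent_distance_reward(z, z_goal):
    """Reward for goal reaching: negative Euclidean distance in latent space.

    ASSUMPTION: the abstract only says the representation is used to compute
    a reward; the specific distance used here is illustrative.
    """
    return -math.dist(z, z_goal)


def relabel_with_future_goals(trajectory, rng):
    """Retroactively replace each transition's goal and recompute its reward.

    `trajectory` is a list of (z, action, z_next, goal) tuples, where z and
    z_next are latent encodings (hypothetical format for this sketch).
    Each goal is replaced by a latent state reached at the same step or
    later in the same trajectory.
    """
    relabeled = []
    for t, (z, action, z_next, _old_goal) in enumerate(trajectory):
        future_index = rng.randrange(t, len(trajectory))
        new_goal = trajectory[future_index][2]
        reward = latent_distance_reward(z_next, new_goal)
        relabeled.append((z, action, z_next, new_goal, reward))
    return relabeled


# Toy 2-D latents; the original goal (5, 5) is never reached.
rng = random.Random(0)
trajectory = [
    ((0.0, 0.0), "a0", (1.0, 0.0), (5.0, 5.0)),
    ((1.0, 0.0), "a1", (1.0, 1.0), (5.0, 5.0)),
    ((1.0, 1.0), "a2", (2.0, 1.0), (5.0, 5.0)),
]
relabeled = relabel_with_future_goals(trajectory, rng)
for transition in relabeled:
    print(transition)
```

Because the relabeled goals are states the agent actually reached, every trajectory yields useful goal-reaching signal even when the originally sampled goal was missed, which is the intuition behind the sample-efficiency gain.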

Composable visualization

Composable Deep Reinforcement Learning for Robotic Manipulation
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine. International Conference on Robotics and Automation (ICRA), 2018. [arXiv] [video] [code]

Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
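To make the composition idea concrete, here is a minimal sketch for a single state with a small discrete action set. It assumes that composition means averaging the constituent Q-values and that the policy is the Boltzmann (softmax) distribution over Q-values; the abstract does not spell out either detail, and the function names and toy Q-values are hypothetical.

```python
import math


def boltzmann_policy(q_values, temperature=1.0):
    """Maximum entropy policy: action probabilities proportional to exp(Q / temperature)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def compose_by_averaging(q_a, q_b):
    """ASSUMED composition rule for this sketch: average the two Q-functions."""
    return [(a + b) / 2 for a, b in zip(q_a, q_b)]


# Toy Q-values over three actions in one state.
q_task_a = [2.0, 0.0, 1.5]  # task A prefers action 0, tolerates action 2
q_task_b = [0.0, 2.0, 1.5]  # task B prefers action 1, tolerates action 2

q_composed = compose_by_averaging(q_task_a, q_task_b)
policy = boltzmann_policy(q_composed)
print(q_composed)  # [1.0, 1.0, 1.5]
print(policy)
```

In this toy case the composed policy puts the most probability on action 2, the one action that is acceptable for both tasks, even though neither constituent policy ranks it first.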

TDM visualization

Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. International Conference on Learning Representations. 2018. [arXiv] [code] [blog]

Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods.

Bebop in front of box

Uncertainty-Aware Reinforcement Learning for Collision Avoidance
Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine. arXiv:1702.01182 [Video] [arXiv]

Practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot. In this paper, we consider the specific case of a mobile robot learning to navigate an a priori unknown environment while avoiding collisions. We present an uncertainty-aware model-based learning algorithm that estimates the probability of collision together with a statistical estimate of uncertainty. We evaluate our method on a simulated and real-world quadrotor, and a real-world RC car.

Using Deep Memory States for Reinforcement Learning

Learning Long-term Dependencies with Deep Memory States
Vitchyr Pong, Shixiang Gu, Sergey Levine. Lifelong Learning: A Reinforcement Learning Approach Workshop, International Conference on Machine Learning. 2017.

Training an agent to use past memories to adapt to new tasks and environments is important for lifelong learning algorithms. We propose a reinforcement learning method that addresses the limitations of methods like backpropagation through time (BPTT) and truncated BPTT by training a critic to estimate truncated gradients and by saving and loading hidden states output by recurrent neural networks. We present results showing that our algorithm can learn long-term dependencies while avoiding the computational constraints of BPTT.

DARPA Robotics Challenge picture

Reactive high-level behavior synthesis for an Atlas humanoid robot
Spyros Maniatopoulos, Philipp Schillinger, Vitchyr Pong, David D. Connor, Hadas Kress-Gazit. IEEE International Conference on Robotics and Automation, 2016.

We present an end-to-end approach for the automatic generation of code that implements high-level robot behaviors in a verifiably correct manner. We start with Linear Temporal Logic (LTL) formulas and use them to synthesize a reactive mission plan that is guaranteed to satisfy the formal specifications.

social network example

Two evolving social network models
Sam Magura, Vitchyr Pong, Rick Durrett, David Sivakoff. ALEA, Lat. Am. J. Probab. Math. Stat., 2015.

We study two different social network models. We prove that their stationary distributions satisfy the detailed balance condition and give explicit formulas for the stationary distributions. From this distribution, we also obtain results about the degree distribution, connectivity, and diameter for each model.


Chomp the Graph
Sam Magura, Vitchyr Pong, Elliot Cartee, Kevin Valakuzhy. Broad Street Scientific, 2012

Chomp the Graph is a terminating impartial game that adheres to the normal play convention. By the Sprague-Grundy Theorem, every position in Chomp has a nimber, which determines whether the position leads to a win under optimal play. We determine the nimbers of certain types of graphs.
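For small graphs, nimbers can be computed by brute force with the standard minimum-excludant (mex) recursion from Sprague-Grundy theory. The sketch below assumes the following rules, which are not stated above: a move removes either a single edge or a single vertex together with its incident edges, and the player who cannot move loses. The helper names `mex`, `make_graph`, and `nimber` are made up for this example.

```python
from functools import lru_cache


def mex(values):
    """Minimum excludant: the smallest non-negative integer not in `values`."""
    n = 0
    while n in values:
        n += 1
    return n


def make_graph(num_vertices, edge_list):
    """Build a hashable (vertices, edges) pair so positions can be memoized."""
    vertices = frozenset(range(num_vertices))
    edges = frozenset(frozenset(edge) for edge in edge_list)
    return vertices, edges


@lru_cache(maxsize=None)
def nimber(vertices, edges):
    """Nimber of a position under the ASSUMED rules described above."""
    options = set()
    # Option 1: remove a single edge.
    for edge in edges:
        options.add(nimber(vertices, edges - {edge}))
    # Option 2: remove a vertex together with all edges touching it.
    for v in vertices:
        kept = frozenset(e for e in edges if v not in e)
        options.add(nimber(vertices - {v}, kept))
    # The empty graph has no options, so its nimber is mex(set()) == 0.
    return mex(options)


single_vertex = nimber(*make_graph(1, []))
single_edge = nimber(*make_graph(2, [(0, 1)]))
path_3 = nimber(*make_graph(3, [(0, 1), (1, 2)]))
print(single_vertex, single_edge, path_3)  # prints: 1 2 1
```

A position is a win for the player to move exactly when its nimber is nonzero. The search visits every subgraph reachable by moves, so it is only practical for graphs with a handful of vertices and edges.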

Course Projects
keyboard gloves

Keyboard Gloves
Vitchyr Pong, Gulnar Mirza, 2015
Demo / Video Explanation

Designed and built gloves that allow users to type on any hard surface as if they were using a QWERTY keyboard. The gloves map each keystroke to the standard QWERTY layout by detecting which finger is pressed via push buttons and how bent that finger is via flex sensors. We combined knowledge of analog circuit design, serial communication protocols, and embedded programming to implement this project.


CS188: Artificial Intelligence
Graduate Student Instructor.
University of California, Berkeley. Spring 2017.


CS4780 / CS5780: Machine Learning
Teaching Assistant.
Cornell University. Fall 2015.
