Vitchyr H. Pong
I am a Ph.D. student at UC Berkeley advised by Sergey
Levine. I am interested in using deep reinforcement learning for
robotics. I did my undergrad at Cornell University, where I worked with
Ross Knepper and Hadas Kress-Gazit.
CV /
LinkedIn /
GitHub
vitchyr at berkeley dot edu



Visual Reinforcement Learning with Imagined Goals
Ashvin Nair*,
Vitchyr Pong*,
Murtaza Dalal,
Shikhar Bahl,
Steven Lin,
Sergey Levine.
Neural Information Processing Systems. 2018.
Spotlight.
[arXiv]
[videos]
[code]
[blog]
For an autonomous agent to fulfill a wide range of user-specified goals
at test time, it must be able to learn broadly applicable and
general-purpose skill repertoires. Furthermore, to provide the requisite
level of generality, these skills must handle raw sensory input such as
images. In this paper, we propose an algorithm that acquires such
general-purpose skills by combining unsupervised representation learning
and reinforcement learning of goal-conditioned policies. Since the
particular goals that might be required at test time are not known in
advance, the agent performs a self-supervised "practice" phase where it
imagines goals and attempts to achieve them. We learn a visual
representation with three distinct purposes: sampling goals for
self-supervised practice, providing a structured transformation of raw
sensory inputs, and computing a reward signal for goal reaching. We also
propose a retroactive goal relabeling scheme to further improve the
sample efficiency of our method. Our off-policy algorithm is efficient
enough to learn policies that operate on raw image observations and
goals for a real-world robotic system, and substantially outperforms
prior techniques.
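The retroactive goal relabeling idea can be illustrated with a minimal sketch: transitions are relabeled with goals drawn from states actually reached later in the same trajectory, so every attempt yields useful training signal. This is an illustrative toy, not the paper's code; `relabel_batch`, `num_relabeled`, and the tuple layout are assumptions for the example.

```python
import random

def relabel_batch(trajectory, num_relabeled=4):
    """Retroactively relabel transitions with goals drawn from states
    actually reached later in the same trajectory (hindsight goals)."""
    relabeled = []
    for t, (state, action, next_state, goal) in enumerate(trajectory):
        # Keep the original transition with its original goal.
        relabeled.append((state, action, next_state, goal))
        # Sample states visited from time t onward as imagined goals.
        future = [s_next for (_, _, s_next, _) in trajectory[t:]]
        for new_goal in random.sample(future, min(num_relabeled, len(future))):
            relabeled.append((state, action, next_state, new_goal))
    return relabeled
```

Because the relabeled transitions are valid experience for a goal-conditioned, off-policy learner, they can be added to the replay buffer alongside the originals.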


Composable Deep Reinforcement Learning for Robotic Manipulation
Tuomas Haarnoja,
Vitchyr Pong,
Aurick Zhou,
Murtaza Dalal,
Pieter Abbeel,
Sergey Levine.
International Conference on Robotics and Automation (ICRA), 2018.
[arXiv]
[video]
[code]
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to
perform poorly when the interaction time with the environment is
limited, as is the case for most real-world robotic tasks. In this
paper, we study how maximum entropy policies trained using soft
Q-learning can be applied to real-world robotic manipulation. The
application of this method to real-world manipulation is facilitated by
two important features of soft Q-learning. First, soft Q-learning can
learn multimodal exploration strategies by learning policies represented
by expressive energy-based models. Second, we show that policies learned
with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies.
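The composition idea can be sketched in a few lines: averaging the soft Q-values of the constituent policies yields a new energy-based policy proportional to the exponentiated average. This is a minimal discrete-action illustration under assumed names (`compose_soft_q`), not the paper's continuous-action implementation.

```python
import numpy as np

def compose_soft_q(q_values_list):
    """Compose maximum-entropy policies by averaging their soft Q-values;
    the composed policy is a Boltzmann distribution over exp(mean Q)."""
    q_avg = np.mean(q_values_list, axis=0)
    # Subtract the max for numerical stability before exponentiating.
    probs = np.exp(q_avg - q_avg.max())
    return probs / probs.sum()
```

For example, composing a "reach the target" policy with an "avoid the obstacle" policy amounts to averaging their Q-value arrays before sampling actions.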


Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Vitchyr Pong*,
Shixiang Gu*,
Murtaza Dalal,
Sergey Levine.
International Conference on Learning Representations. 2018.
[arXiv]
[code]
[blog]
Model-free reinforcement learning (RL) is a powerful, general tool for
learning complex behaviors. However, its sample complexity is often
impractically large for solving challenging real-world problems, even
with off-policy algorithms such as Q-learning.
We introduce temporal difference models (TDMs), a family of
goal-conditioned value functions that can be trained
with model-free learning and used for model-based control.
TDMs combine the benefits of model-free and model-based RL: they
leverage the rich information in state transitions to learn very
efficiently, while still attaining asymptotic performance that exceeds
that of direct model-based RL methods.
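A TDM conditions the value function on both a goal and a horizon: at horizon zero the target is simply the negative distance to the goal, and otherwise it bootstraps with the horizon decremented. The sketch below shows this Bellman target for a discrete candidate-action set; the function names and the Euclidean distance are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tdm_target(q_fn, next_state, goal, horizon, actions):
    """Bellman target for a temporal difference model (TDM):
    at horizon 0 the 'reward' is the negative distance to the goal;
    otherwise bootstrap from q_fn with the horizon decremented."""
    if horizon == 0:
        return -np.linalg.norm(next_state - goal)
    # Maximize over candidate actions at the shorter horizon.
    return max(q_fn(next_state, a, goal, horizon - 1) for a in actions)
```

Because the horizon is an explicit input, a single learned Q-function answers "how close can I get to this goal in k steps?" for every k, which is what lets it be used for model-based planning.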


Uncertainty-Aware Reinforcement Learning for Collision Avoidance
Gregory Kahn,
Adam Villaflor,
Vitchyr Pong,
Pieter Abbeel,
Sergey Levine.
arXiv:1702.01182
[video]
[arXiv]
Practical deployment of reinforcement learning methods must contend with
the fact that the training process itself can be unsafe for the robot.
In this paper, we consider the specific case of a mobile robot learning
to navigate an a priori unknown environment while avoiding collisions.
We present an uncertainty-aware model-based learning algorithm that
estimates the probability of collision together with a statistical
estimate of uncertainty. We evaluate our method on a simulated and
real-world quadrotor, and a real-world RC car.
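One standard way to obtain such a statistical uncertainty estimate is an ensemble: the mean prediction serves as the collision-probability estimate and the spread across ensemble members as its uncertainty. This is a generic sketch of that technique under assumed names (`collision_estimate`), not the paper's specific estimator.

```python
import numpy as np

def collision_estimate(models, observation):
    """Ensemble estimate of collision probability: the mean across
    models is the estimate, their standard deviation the uncertainty."""
    preds = np.array([m(observation) for m in models])
    return preds.mean(), preds.std()
```

A risk-averse controller can then penalize actions whose predicted collision probability plus a multiple of the uncertainty is high, staying cautious where the model has seen little data.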


Learning Long-term Dependencies with Deep Memory States
Vitchyr Pong,
Shixiang Gu,
Sergey Levine.
Lifelong Learning: A Reinforcement Learning Approach Workshop,
International Conference on Machine Learning.
2017.
Training an agent to use past memories to adapt to new tasks and
environments is important for lifelong learning algorithms.
We propose a reinforcement learning method that addresses the
limitations of methods like BPTT and truncated BPTT by training
a critic to estimate truncated gradients and by saving and loading
hidden states output by recurrent neural networks.
We present results showing that our algorithm can learn long-term
dependencies while avoiding the computational constraints of BPTT.


Reactive high-level behavior synthesis for an Atlas humanoid robot
Spyros Maniatopoulos,
Philipp Schillinger,
Vitchyr Pong,
David C. Conner,
Hadas Kress-Gazit.
IEEE International Conference on Robotics and Automation,
2016.
We present an end-to-end approach for the automatic generation of code
that implements high-level robot behaviors in a verifiably correct
manner. We start with Linear Temporal Logic (LTL) formulas and use them
to synthesize a reactive mission plan that is guaranteed to satisfy the
formal specifications.


Two evolving social network models
Sam Magura,
Vitchyr Pong,
Rick Durrett,
David Sivakoff.
ALEA, Lat. Am. J. Probab. Math. Stat., 2015.
We study two different social network models. We prove that their
stationary distributions satisfy the detailed balance condition and give
explicit formulas for the stationary distributions. From this
distribution, we also obtain results about the degree distribution,
connectivity, and diameter for each model.


Chomp the Graph
Sam Magura,
Vitchyr Pong,
Elliot Cartee,
Kevin Valakuzhy.
Broad Street Scientific, 2012.
Chomp the Graph is a terminating impartial game that adheres to the
normal play convention. By the Sprague-Grundy Theorem, every position in
Chomp has a nimber, which determines whether the position leads to a win
under optimal play. We determine the nimbers of certain types of graphs.


Keyboard Gloves
Vitchyr Pong, Gulnar Mirza, 2015
Demo / Video
Explanation
Designed and created gloves that allow users to type on any hard
surface as if they were using a QWERTY keyboard. The gloves implement
the standard QWERTY layout by detecting which finger is pressed via
push buttons and how bent each finger is via flex sensors.
We combined knowledge of analog circuit design, serial communication
protocols, and embedded programming to implement this project.

