I am a Ph.D. student at UC Berkeley advised by Sergey
Levine. I am interested in using deep reinforcement learning for
robotics. I did my undergrad at Cornell University, where I worked with
Ross Knepper and Hadas Kress-Gazit.
vitchyr at berkeley dot edu
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
International Conference on Learning Representations. 2018.
Model-free reinforcement learning (RL) is a powerful, general tool for
learning complex behaviors. However, its sample complexity is often
impractically large for solving challenging real-world problems, even with
off-policy algorithms such as Q-learning.
We introduce temporal difference models (TDMs), a family of
goal-conditioned value functions that can be trained
with model-free learning and used for model-based control.
TDMs combine the benefits of model-free and model-based RL: they
leverage the rich information in state transitions to learn very
efficiently, while still attaining asymptotic performance that exceeds
that of direct model-based RL methods.
Uncertainty-Aware Reinforcement Learning for Collision Avoidance
Practical deployment of reinforcement learning methods must contend with
the fact that the training process itself can be unsafe for the robot.
In this paper, we consider the specific case of a mobile robot learning
to navigate an a priori unknown environment while avoiding collisions.
We present an uncertainty-aware model-based learning algorithm that
estimates the probability of collision together with a statistical
estimate of uncertainty. We evaluate our method on a simulated and
real-world quadrotor, and a real-world RC car.
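As a rough illustration of pairing a collision probability with an uncertainty estimate, here is a minimal sketch. It assumes the uncertainty comes from disagreement among several independently trained predictors. The abstract above does not say how the uncertainty is obtained, so the ensemble, the function names, and the `risk_weight` parameter are all hypothetical and are not taken from the paper.

```python
import statistics


def collision_estimate(ensemble_probs):
    """Return (mean, std) of collision probabilities predicted by an ensemble.

    ensemble_probs: one predicted collision probability per ensemble member
    (hypothetical input; the models that produce it are not shown here).
    """
    mean = statistics.fmean(ensemble_probs)
    std = statistics.pstdev(ensemble_probs)  # spread across members = uncertainty
    return mean, std


def risk_averse_cost(ensemble_probs, risk_weight=1.0):
    """Penalize actions whose collision estimate is high OR uncertain."""
    mean, std = collision_estimate(ensemble_probs)
    return mean + risk_weight * std


if __name__ == "__main__":
    confident = [0.5, 0.5, 0.5]   # members agree
    uncertain = [0.1, 0.5, 0.9]   # same mean, members disagree
    print(risk_averse_cost(confident), risk_averse_cost(uncertain))
```

Under this assumed cost, two actions with the same mean collision probability are ranked differently when the predictors disagree, so the robot is pushed toward cautious behavior in unfamiliar parts of the environment.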
Learning Long-term Dependencies with Deep Memory States
Lifelong Learning: A Reinforcement Learning Approach Workshop,
International Conference on Machine Learning.
Training an agent to use past memories to adapt to new tasks and
environments is important for lifelong learning algorithms.
We propose a reinforcement learning method that addresses the
limitations of methods like BPTT and truncated BPTT by training
a critic to estimate truncated gradients and by saving and loading
hidden states output by recurrent neural networks.
We present results showing that our algorithm can learn long-term
dependencies while avoiding the computational constraints of BPTT.
Reactive high-level behavior synthesis for an Atlas
David D. Connor,
IEEE International Conference on Robotics and Automation,
We present an end-to-end approach for the automatic generation of code
that implements high-level robot behaviors in a verifiably correct
manner. We start with Linear Temporal Logic (LTL) equations and use them
to synthesize a reactive mission plan that is guaranteed to satisfy the
Two evolving social network models
ALEA, Lat. Am. J. Probab. Math. Stat., 2015.
We study two different social network models. We prove that their
stationary distributions satisfy the detailed balance condition and give
explicit formulas for the stationary distributions. From these
distributions, we also obtain results about the degree distribution,
connectivity, and diameter for each model.
Chomp the Graph
Broad Street Scientific, 2012
Chomp the Graph is a terminating impartial game that adheres to the
normal play convention. By the Sprague-Grundy Theorem, each position has a
nimber, which determines whether that position is a win under optimal
play. We determine the nimbers of certain types of graphs.
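For small graphs, the nimber described above can be computed by brute force with the usual Sprague-Grundy recursion (the nimber of a position is the minimum excludant of the nimbers of its options). The sketch below assumes the rules of the game are: on each turn a player removes either one edge, or one vertex together with all edges incident to it, and the player who cannot move loses. These rules are not spelled out in the abstract, and the function names `mex`, `nimber`, and `path` are illustrative only.

```python
from functools import lru_cache


def mex(values):
    """Minimum excludant: smallest non-negative integer not in values."""
    n = 0
    while n in values:
        n += 1
    return n


@lru_cache(maxsize=None)
def nimber(vertices, edges):
    """Nimber of a graph position under the assumed rules.

    vertices: frozenset of vertex labels
    edges: frozenset of frozenset({u, v}) pairs
    """
    options = set()
    # Move type 1 (assumed rule): remove a single edge.
    for e in edges:
        options.add(nimber(vertices, edges - {e}))
    # Move type 2 (assumed rule): remove a vertex and its incident edges.
    for v in vertices:
        remaining = frozenset(e for e in edges if v not in e)
        options.add(nimber(vertices - {v}, remaining))
    # The empty graph has no options, so its nimber is mex(set()) == 0.
    return mex(options)


def path(n):
    """Path graph on n vertices, as a (vertices, edges) pair."""
    vertices = frozenset(range(n))
    edges = frozenset(frozenset((i, i + 1)) for i in range(n - 1))
    return vertices, edges


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, nimber(*path(n)))
```

A position is a win for the player to move exactly when its nimber is nonzero. The search is exponential in the size of the graph, so it is only useful for checking small cases.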
Vitchyr Pong, Gulnar Mirza, 2015
Demo / Video
Designed and created gloves that allow users to type on any hard
surface as if they were using a QWERTY keyboard. The gloves follow
the standard QWERTY keyboard layout by detecting which finger is
pressed via push buttons and how bent that finger is via flex sensors.
We combined knowledge of analog circuit design, serial communication
protocols, and embedded programming to implement this project.
CS188: Artificial Intelligence
Graduate Student Instructor.
University of California, Berkeley. Spring 2017.
CS4780 / CS5780: Machine Learning
Cornell University. Fall 2015.