Welcome to my academic website! I'm a Postdoctoral Scholar working with Sergey Levine and others in the UC Berkeley Artificial Intelligence Research lab. I received my Ph.D. in Robotics from Carnegie Mellon University, where I worked with Kris Kitani. I've also worked with Paul Vernaza at NEC Labs America, and with Drew Bagnell at Uber ATG and Carnegie Mellon. I studied CS and Engineering at Swarthmore College. See this page for a more formal bio.
My research: One of my main goals is to create useful and general learning agents that make complex decisions by forecasting the long-term consequences of their actions. Towards this goal, I often work on reinforcement learning, imitation learning, and probabilistic modeling methods at the interface of machine learning and computer vision.
- Oct 2021: I am honored to have received a Top Reviewer Award at NeurIPS 2021
- Sep 2021: Paper at NeurIPS 2021: IC2 (link coming soon)
- Sep 2021: I am honored that our paper, RECON, received an Oral Presentation at CoRL 2021
- Sep 2021: Paper at CoRL: RECON
- Jul 2021: Preprint available: Explore and Control with Adversarial Surprise
- Jul 2021: I am co-organizing the NeurIPS 2021 Workshop on Machine Learning for Autonomous Driving
- Jul 2021: I am honored to have received a Top Reviewer Award at ICML 2021
- May 2021: 2 Papers at ICRA: Contingencies From Observations and ViNG
- May 2021: 3 Papers at ICLR: PARROT, SMiRL, and Conservative Safety Critics
- May 2021: Invited Talk, ICRA 2021 Workshop on Long-Term Human Motion Prediction
- May 2021: Preprint available: RECON
- Jan 2021: I am honored that two of our papers, PARROT and SMiRL, received Oral Presentations at ICLR 2021
We propose an unsupervised RL technique based on an adversarial game that pits two policies against each other to compete over the amount of surprise an RL agent experiences. The method leads to the emergence of complex skills, exhibiting clear phase transitions, and we show theoretically and empirically that it can be applied to explore stochastic, partially observed environments.
We developed a learning-based robotic system that efficiently explores large open-world environments without constructing geometric maps. The key is a latent goal model that forecasts actions and transit times to goals, is robust to variations in the input images, and enables 'imagining' relative goals. The latent goal model is used to continually construct topological maps that the robot can use to quickly travel to specified goals.
We developed an approach for deep contingency planning by learning from observations. Given a context, the approach plans a policy that achieves high expected return under the uncertainty of the forecasted behavior of other agents. We evaluate our method's closed-loop performance in common driving scenarios constructed in the CARLA simulator, show that our contingency planner solves these scenarios, and show that noncontingent planning approaches cannot.
We developed a graph-based RL approach to enable a robot to navigate real-world environments given diverse, visually-indicated goals. We instantiate our method on a real outdoor ground robot and show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
Whereas RL agents usually explore randomly when faced with a new task, humans tend to explore with structured behavior. We demonstrate a method for learning a behavioral prior that can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
ICLR 2021 | pdf | project page
We propose that a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents, and formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). The resulting agents acquire several proactive behaviors to seek and maintain stable states, which include successfully playing Tetris, Doom, and controlling a humanoid to avoid falls, without any task-specific reward supervision.
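For intuition, the surprise-minimizing reward can be sketched as the log-density of the current state under a model fit to the states visited so far: familiar states score high, surprising states score low. Below is a minimal illustration of that signal (not the paper's implementation; the diagonal-Gaussian density model and all names are placeholders):

```python
import numpy as np

class SurpriseMinimizer:
    """Sketch of a SMiRL-style reward: reward(s) = log p_theta(s) under a
    running diagonal-Gaussian estimate of the visited-state distribution."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.ones(dim)  # running sum of squared deviations (weak prior)

    def update(self, s):
        # Welford's online update of the mean and squared deviations.
        self.n += 1
        delta = s - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (s - self.mean)

    def reward(self, s):
        var = self.m2 / max(self.n, 1) + 1e-6
        # Log-density of a diagonal Gaussian at s — the surprise-minimizing reward.
        return -0.5 * np.sum((s - self.mean) ** 2 / var + np.log(2 * np.pi * var))

sm = SurpriseMinimizer(dim=2)
for _ in range(100):
    sm.update(np.zeros(2))              # the agent keeps visiting a stable state
r_stable = sm.reward(np.zeros(2))       # familiar state: high log-density
r_novel = sm.reward(np.full(2, 5.0))    # surprising state: low log-density
assert r_stable > r_novel
```

An agent maximizing this reward is driven to seek out and maintain the stable, predictable states described above.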
The key idea of our algorithm is to train a conservative safety critic that overestimates how unsafe a particular state is, and to modify the exploration strategy to account for this conservative estimate (i.e., an overestimate of the probability of failure). Empirically, we show that the proposed approach can achieve competitive performance on challenging navigation, manipulation, and locomotion tasks while incurring significantly lower catastrophic failure rates during training than prior methods.
CoRL 2020 | pdf | project page
Instead of the standard trajectory forecasting pipeline that (1) detects objects with LiDAR and then (2) forecasts object pose trajectories, we "inverted" it to create a new pipeline that (1) forecasts LiDAR trajectories and then (2) detects object pose trajectories from them. We found that our proposed pipeline is competitive with the standard pipeline in the domains of vehicle forecasting and robotic manipulation forecasting, and that its performance scales with the addition of unlabeled LiDAR data.
ICML 2020 | pdf | code | blog post | project page
We used recent techniques to estimate the epistemic uncertainty of a Deep Imitative Model used for planning vehicle trajectories and found that we could use this epistemic uncertainty to reliably detect out-of-distribution situations, plan more effectively in them, and adapt the model online with expert feedback.
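As a toy illustration of using epistemic uncertainty to flag out-of-distribution inputs, one common recipe is ensemble disagreement: fit several models on bootstrap resamples and score a query by the variance of their predictions. A minimal sketch, where simple linear regressors stand in for the deep model (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(x, y):
    # Least-squares fit of y ≈ a*x + b on a bootstrap resample of the data.
    idx = rng.integers(0, len(x), len(x))
    A = np.stack([x[idx], np.ones(len(idx))], axis=1)
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return coef

# In-distribution training data: x in [-1, 1].
x = rng.uniform(-1, 1, 200)
y = 2 * x + 0.1 * rng.normal(size=200)
ensemble = [fit_linear(x, y) for _ in range(10)]

def epistemic_uncertainty(x_query):
    # Disagreement (variance) across ensemble members as an OOD score.
    preds = np.array([a * x_query + b for a, b in ensemble])
    return preds.var()

u_in = epistemic_uncertainty(0.5)    # inside the training range: members agree
u_out = epistemic_uncertainty(10.0)  # far outside: members disagree
assert u_out > u_in
```

Thresholding such a score is one simple way to decide when to fall back to safer behavior or request expert feedback.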
CVPR 2020 | pdf | supp
Some activities are best represented discretely, others continuously. We learn a deep likelihood-based generative model to jointly forecast discrete and continuous activities, and show how to adapt the model to learn efficiently online.
ICLR 2020 | pdf | code (tf, official) | code (pytorch, reimplementation) | project page | talk video
We learn a deep conditional distribution of human driving behavior to guide planning and control of an autonomous car in simulation, without any trial-and-error data. We show that the approach can be adapted to execute tasks that were never demonstrated, including safely avoiding potholes; that it is robust to misspecified goals that would cause it to violate its model of the rules of the road; and that it achieves state-of-the-art performance on the CARLA benchmark.
ICCV 2019 | pdf | project page | code | visualization code | iccv pdf | iccv talk slides (pdf) | Baylearn talk (youtube)
We perform deep conditional forecasting with multiple interacting agents: when you control one of them, you can use its goals to better predict what nearby agents will do. The model also outperforms state-of-the-art methods on the more standard task of unconditional forecasting.
Many behaviors are naturally composed of sub-tasks. Our approach learns to imitate behaviors with subtasks by discovering topics of latent behavior to influence its imitation.
We continuously model and forecast the long-term goals of a first-person camera wearer through our Online Inverse RL algorithm. We show, both theoretically and empirically, that our approach learns efficiently in a continuous, online setting.
We designed an objective to jointly maximize diversity and precision for generative models, and designed a deep autoregressive flow to efficiently optimize this objective for the task of motion forecasting. Unlike many popular generative models, ours can exactly evaluate its probability density function for arbitrary points.
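The exact-density property comes from the change-of-variables formula, log p(x) = log p_base(f(x)) + log |det df/dx|. A toy one-dimensional illustration, with a single affine transform standing in for the deep autoregressive flow (the scale and shift values are arbitrary):

```python
import numpy as np

# Exact density evaluation via change of variables:
# log p(x) = log p_base(f(x)) + log |det df/dx|,
# where f maps data space to the base space.
scale, shift = 2.0, -1.0

def f(x):
    # Flow: data space -> base space (invertible affine map).
    return (x - shift) / scale

def log_prob(x):
    z = f(x)
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard-normal base density
    log_det = -np.log(scale)                        # |df/dx| = 1/scale
    return log_base + log_det

# Because the density is exact, it integrates to ~1 (checked numerically).
xs = np.linspace(-20.0, 20.0, 100001)
total = float(np.sum(np.exp(log_prob(xs))) * (xs[1] - xs[0]))
assert abs(total - 1.0) < 1e-3
```

GANs and VAEs lack this property: they can only sample from, or lower-bound, their density, whereas a flow can evaluate it exactly at any point.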
We analyze the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework, and find that our approach requires less demonstration data than a baseline Inverse RL approach.
We designed a principled method to perform neural model compression: we trained a compression agent via RL on the sequential task of compressing large networks while maintaining high performance. The compressing agent was able to generalize to compress previously-unseen networks.
We use the idea of Predictive State Representations to guide learning of RNNs: by encouraging the hidden-state of the RNN to be predictive of future observations, we found it to improve RNN performance on various tasks in probabilistic filtering, imitation learning, and reinforcement learning.
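The idea can be illustrated with an auxiliary loss that pushes the RNN hidden state, via a linear readout, to predict the next k observations; this loss is added to the usual task loss during training. A minimal sketch (the shapes, names, and untrained weights are all illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny RNN with a "predictive-state" readout: the hidden state h_t is
# encouraged to be predictive of the next k observations.
obs_dim, hid_dim, k = 3, 8, 2
W_h = rng.normal(size=(hid_dim, hid_dim)) * 0.1
W_x = rng.normal(size=(hid_dim, obs_dim)) * 0.1
W_pred = rng.normal(size=(k * obs_dim, hid_dim)) * 0.1  # predictive readout

def psr_loss(observations):
    """Mean squared error between readout(h_t) and the next k observations."""
    h = np.zeros(hid_dim)
    losses = []
    for t in range(len(observations) - k):
        h = np.tanh(W_h @ h + W_x @ observations[t])
        target = np.concatenate(observations[t + 1 : t + 1 + k])
        losses.append(np.mean((W_pred @ h - target) ** 2))
    return float(np.mean(losses))

obs = [rng.normal(size=obs_dim) for _ in range(10)]
loss = psr_loss(obs)  # added to the task loss when training the RNN
assert loss >= 0.0
```

Minimizing this auxiliary term regularizes the hidden state toward a predictive-state representation without changing the downstream task objective.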
ICCV 2017 | pdf | code
We continuously model and forecast long-term goals of a first-person camera wearer through our Online Inverse RL algorithm. In contrast to motion forecasting, our approach reasons about semantic states and future goals that are potentially far away in space and time.
ICRA 2015 | pdf
We developed a principled imitation learning approach for the task of object detection, which is best described as a sequence prediction problem. Our approach reasons sequentially about objects and requires none of the heuristics, such as Non-Maximum Suppression, that object detection frameworks commonly use to filter their predictions.