Nick Rhinehart

Postdoctoral Researcher

Berkeley AI Research (BAIR)

CV • Bio • Scholar • Twitter

About me

Welcome to my academic website! I'm a Postdoctoral Scholar working with Sergey Levine and others within the UC Berkeley Artificial Intelligence Research lab. I received a Ph.D. in Robotics working with Kris Kitani at Carnegie Mellon University. I've also worked with Paul Vernaza at NEC Labs America, and Drew Bagnell at Uber ATG and Carnegie Mellon. I studied CS and Engineering at Swarthmore College. See this page for a more formal bio.

Research goal: My research aims to build highly capable autonomous systems that learn to accurately forecast sequences of outcomes and reliably use this ability to make good decisions in new situations. I have developed algorithms that (1) learn to forecast behaviors and observations given visual data [1, 2, 3, 4, 5, 6], (2) use their models to plan and control complex behaviors [7, 8, 9, 10, 11], and (3) are effective for control on real systems [12, 13]. You can learn more from my complete list of publications, as well as my brief thoughts on why learning to forecast is a promising direction for building highly capable autonomous systems.

Research fields:
  • Machine learning
  • Robotics
  • Computer vision
  • Artificial intelligence
Research topics:
  • Deep conditional generative models
  • Motion and video forecasting
  • Deep imitation learning
  • Deep reinforcement learning
  • Intrinsic motivation
Research applications:
  • Autonomous navigation
  • Autonomous exploration
  • First-person video understanding
News (Last modified: 2022-01.)
Conference, Journal, and arXiv Publications (Last modified: 2021-12.)
Information is Power: Intrinsic Control via Information Capture

N. Rhinehart, J. Wang, G. Berseth, JD Co-Reyes, D. Hafner, C. Finn, S. Levine

NeurIPS 2021 • pdf • project page

We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.

Hybrid Imitative Planning with Geometric and Predictive Costs in Off-road Environments

N. Dashora, D. Shin, D. Shah, H. Leopold, D. Fan, A. Agha-Mohammadi, N. Rhinehart, S. Levine

arXiv 2021 • pdf • project page

Geometric methods for solving open-world off-road navigation tasks provide generalization but can be brittle. Learning-based methods can directly learn collision-free behavior from raw observations, but are difficult to integrate with standard geometry-based pipelines. We design a method composed of learning-based and non-learning-based components. The learned component contributes predicted traversability as rewards, while the geometric component contributes obstacle cost information. We show this approach inherits complementary gains from the learned and geometric components and significantly outperforms either of them alone.

Explore and Control with Adversarial Surprise

A. Fickinger*, N. Jaques*, S. Parajuli, M. Chang, N. Rhinehart, G. Berseth, S. Russell, S. Levine

arXiv 2021 • pdf • project page

We propose an unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. The method leads to the emergence of complex skills by exhibiting clear phase transitions, and we show theoretically and empirically that our method has the potential to be applied to the exploration of stochastic, partially-observed environments.

RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

D. Shah, B. Eysenbach, N. Rhinehart, S. Levine

Oral Presentation (6.5% of submissions)
CoRL 2021 • pdf • project page

We developed a learning-based robotic system that efficiently explores large open-world environments without constructing geometric maps. The key is a latent goal model that forecasts actions and transit times to goals, is robust to variations in the input images, and enables 'imagining' relative goals. The latent goal model is used to continually construct topological maps that the robot can use to quickly travel to specified goals.

Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models

N. Rhinehart*, J. He*, C. Packer, M. A. Wright, R. McAllister, J. E. Gonzalez, S. Levine

ICRA 2021 • pdf • project page

We developed an approach for deep contingency planning by learning from observations. Given a context, the approach plans a policy that achieves high expected return under the uncertainty of the forecasted behavior of other agents. We evaluate our method's closed-loop performance in common driving scenarios constructed in the CARLA simulator, show that our contingency planner solves these scenarios, and show that noncontingent planning approaches cannot.

ViNG: Learning Open-World Navigation with Visual Goals

D. Shah, B. Eysenbach, G. Kahn, N. Rhinehart, S. Levine

ICRA 2021 • pdf • project page

We developed a graph-based RL approach to enable a robot to navigate real-world environments given diverse, visually-indicated goals. We instantiate our method on a real outdoor ground robot and show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

A. Singh*, H. Liu*, G. Zhou, A. Yu, N. Rhinehart, S. Levine

Oral Presentation (1.8% of submissions)
ICLR 2021 • pdf • project page

Whereas RL agents usually explore randomly when faced with a new task, humans tend to explore with structured behavior. We demonstrate a method for learning a behavioral prior that can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.

SMiRL: Surprise Minimizing RL in Dynamic Environments

G. Berseth, D. Geng, C. Devin, N. Rhinehart, C. Finn, D. Jayaraman, S. Levine

Oral Presentation (1.8% of submissions)
ICLR 2021 • pdf • project page

We propose that a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents, and formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). The resulting agents acquire several proactive behaviors to seek and maintain stable states, which include successfully playing Tetris and Doom and controlling a humanoid to avoid falls, without any task-specific reward supervision.

Conservative Safety Critics for Safe Exploration

H. Bharadhwaj, A. Kumar, N. Rhinehart, S. Levine, F. Shkurti, A. Garg

ICLR 2021 • pdf • project page

The key idea of our algorithm is to train a conservative safety critic that overestimates how unsafe a particular state is (that is, it overestimates the probability of failure), and to modify the exploration strategy to account for this conservative estimate. Empirically, we show that the proposed approach can achieve competitive performance on challenging navigation, manipulation, and locomotion tasks while incurring significantly lower catastrophic failure rates during training than prior methods.

Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

X. Weng, J. Wang, S. Levine, K. Kitani, N. Rhinehart

CoRL 2020 • pdf • project page

Instead of a standard pipeline for trajectory forecasting that (1) detects objects with LiDAR and then (2) forecasts object pose trajectories, we "inverted" it to create a new pipeline that (1) forecasts LiDAR trajectories and then (2) detects object pose trajectories. We found that our proposed pipeline is competitive with the standard pipeline in the domains of vehicle forecasting and robotic manipulation forecasting, and has the ability to scale its performance with the addition of unlabelled LiDAR data.

Can Autonomous Vehicles Identify, Recover from, and Adapt to Distribution Shifts?

A. Filos*, P. Tigas*, R. McAllister, N. Rhinehart, S. Levine, Y. Gal

ICML 2020 • pdf • code • blog post • project page

We used recent techniques to estimate the epistemic uncertainty of a Deep Imitative Model used for planning vehicle trajectories and found that we could use this epistemic uncertainty to reliably detect out-of-distribution situations, plan more effectively in them, and adapt the model online with expert feedback.

Generative Hybrid Representations for Activity Forecasting with No-Regret Learning

J. Guan, Y. Yuan, K. M. Kitani, N. Rhinehart

Oral Presentation (4.6% of submissions)
CVPR 2020 • pdf • supp

Some activities are best represented discretely, others continuously. We learn a deep likelihood-based generative model to jointly forecast discrete and continuous activities, and show how to tweak the model to learn efficiently online.

Deep Imitative Models for Flexible Inference, Planning, and Control

N. Rhinehart, R. McAllister, S. Levine

ICLR 2020 • pdf • code (tf, official) • code (pytorch, reimplementation) • project page • talk video

We learn a deep conditional distribution of human driving behavior to guide planning and control of an autonomous car in simulation, without any trial-and-error data. We show that the approach can be adapted to execute tasks that were never demonstrated, including safely avoiding potholes, that it is robust to misspecified goals that would cause it to violate its model of the rules of the road, and that it achieves S.O.T.A. performance on the CARLA benchmark.

PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings

N. Rhinehart, R. McAllister, K. M. Kitani, S. Levine

Best Paper, ICML 2019 Workshop on AI for Autonomous Driving
ICCV 2019 • pdf • project page • code • visualization code • iccv pdf • iccv talk slides (pdf) • Baylearn talk (youtube)

We perform deep conditional forecasting with multiple interacting agents: when you control one of them, you can use its goals to better predict what nearby agents will do. The model also outperforms S.O.T.A. methods on the more standard task of unconditional forecasting.

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

M. Sharma*, A. Sharma*, N. Rhinehart, K. M. Kitani

ICLR 2019 • pdf • project page

Many behaviors are naturally composed of sub-tasks. Our approach learns to imitate behaviors with subtasks by discovering topics of latent behavior to influence its imitation.

First-Person Activity Forecasting from Video with Online Inverse Reinforcement Learning

N. Rhinehart, K. Kitani

TPAMI 2018 • pdf

We continuously model and forecast long-term goals of a first-person camera wearer through our Online Inverse RL algorithm. We show that our approach learns efficiently and continuously, both in theory and in practice.

R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting

N. Rhinehart, K. M. Kitani, P. Vernaza

ECCV 2018 • pdf • supplement • blog post (third-party)

We designed an objective to jointly maximize diversity and precision for generative models, and designed a deep autoregressive flow to efficiently optimize this objective for the task of motion forecasting. Unlike many popular generative models, ours can exactly evaluate its probability density function for arbitrary points.

Learning Neural Parsers with Deterministic Differentiable Imitation Learning

T. Shankar, N. Rhinehart, K. Muelling, K. M. Kitani

CoRL 2018 • pdf • code

We developed and applied a new imitation learning approach for the task of sequential visual parsing. The approach learns to imitate an expert parsing oracle.

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

X. Pan, E. Ohn-Bar, N. Rhinehart, Y. Xu, Y. Shen, K. M. Kitani

AAMAS 2018 • pdf

We analyze the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework and find that our approach requires less demonstration data than a baseline Inverse RL approach.

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

A. Ashok, N. Rhinehart, F. Beainy, K. Kitani

ICLR 2018 • pdf • code

We designed a principled method to perform neural model compression: we trained a compression agent via RL on the sequential task of compressing large networks while maintaining high performance. The compressing agent was able to generalize to compress previously-unseen networks.

Predictive-State Decoders: Encoding the Future Into Recurrent Neural Networks

A. Venkataraman*, N. Rhinehart*, W. Sun, L. Pinto, M. Hebert, B. Boots, K. Kitani, J. A. Bagnell

NIPS 2017 • pdf

We use the idea of Predictive State Representations to guide learning of RNNs: we found that encouraging the hidden state of the RNN to be predictive of future observations improves RNN performance on various tasks in probabilistic filtering, imitation learning, and reinforcement learning.

First-Person Activity Forecasting with Online Inverse Reinforcement Learning

N. Rhinehart, K. Kitani

Best Paper Honorable Mention, ICCV 2017 (3 of 2,143 submissions)
ICCV 2017 • pdf • code

We continuously model and forecast long-term goals of a first-person camera wearer through our Online Inverse RL algorithm. In contrast to motion forecasting, our approach reasons about semantic states and future goals that are potentially far away in space and time.

Learning Action Maps of Large Environments Via First-Person Vision

N. Rhinehart, K. Kitani

CVPR 2016 • pdf

We developed an approach that learns to associate visual cues with sparse behaviors in order to make dense predictions of functionality in seen and unseen environments.

Visual Chunking: A List Prediction Framework for Region-Based Object Detection

N. Rhinehart, J. Zhou, M. Hebert, J. A. Bagnell

ICRA 2015 • pdf

We developed a principled imitation learning approach for the task of object detection, which is best described as a sequence prediction problem. Our approach reasons sequentially about objects and requires none of the heuristics common in object detection frameworks, such as Non-Maxima Suppression, to filter its predictions.

© 2015-2021 Nick Rhinehart