
Nick Rhinehart

Postdoctoral Researcher

Berkeley Artificial Intelligence Research Laboratory

Email: nrhinehart@berkeley.edu
CV | Bio | Google Scholar | Twitter | Github

About me

Welcome to my academic website! I'm a Postdoctoral Scholar working with Sergey Levine and others within the UC Berkeley Artificial Intelligence Research lab. I received a Ph.D. in Robotics working with Kris Kitani at Carnegie Mellon University. I've also worked with Paul Vernaza at NEC Labs America, and Drew Bagnell at Uber ATG and Carnegie Mellon. I studied CS and Engineering at Swarthmore College. See this page for a more formal bio.


My research: One of my main goals is to create useful and general learning agents that make complex decisions by forecasting the long-term consequences of their actions. Towards this goal, I often work on reinforcement learning, imitation learning, and probabilistic modeling methods at the interface of machine learning and computer vision.

News (Last modified: 2021-07-23.)
Conference, Journal, and arXiv Publications (Last modified: 2021-07-23.)
Explore and Control with Adversarial Surprise

A. Fickinger*, N. Jaques*, S. Parajuli, M. Chang, N. Rhinehart, G. Berseth, S. Russell, S. Levine

arXiv 2021 | pdf | show abs | show bib | project page

We propose an unsupervised RL technique based on an adversarial game that pits two policies against each other to compete over the amount of surprise an RL agent experiences. The method leads to the emergence of complex skills, as evidenced by clear phase transitions, and we show theoretically and empirically that our method has the potential to be applied to the exploration of stochastic, partially-observed environments.


Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. However, since designing rewards often requires substantial engineering effort, we are interested in the problem of learning without rewards, where agents must discover useful behaviors in the absence of task-specific incentives. Intrinsic motivation is a family of unsupervised RL techniques which develop general objectives for an RL agent to optimize that lead to better exploration or the discovery of skills. In this paper, we propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. The policies each take turns controlling the agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. Then, the Control policy takes over and seeks to recover from those situations by minimizing entropy. The game harnesses the power of multi-agent competition to drive the agent to seek out increasingly surprising parts of the environment while learning to gain mastery over them. We show empirically that our method leads to the emergence of complex skills by exhibiting clear phase transitions. Furthermore, we show both theoretically (via a latent state space coverage argument) and empirically that our method has the potential to be applied to the exploration of stochastic, partially-observed environments. We show that Adversarial Surprise learns more complex behaviors, and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty-seeking (Random Network Distillation (RND)), and multi-agent unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari and VizDoom environments.
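
For intuition, here is a rough sketch of the opposing objectives in my own notation (not the paper's; the precise phase structure and density-model details are described in the paper). If p_\theta is a density model over the agent's observations o_t, the two players receive opposite surprise-based rewards during their respective turns:

r^{\mathrm{Explore}}_t = -\log p_\theta(o_t), \qquad r^{\mathrm{Control}}_t = +\log p_\theta(o_t),

so the Explore policy is rewarded for driving the agent into unfamiliar observations, and the Control policy is rewarded for returning it to familiar, predictable ones.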


@article{fickinger2021explore,
  title={Explore and Control with Adversarial Surprise},
  author={Fickinger, Arnaud and Jaques, Natasha and Parajuli, Samyak and Chang, Michael and Rhinehart, Nicholas and Berseth, Glen and Russell, Stuart and Levine, Sergey},
  journal={arXiv preprint arXiv:2107.07394},
  year={2021}
}  		
RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

D. Shah, B. Eysenbach, N. Rhinehart, S. Levine

arXiv 2021 | pdf | show abs | show bib | project page

We developed a learning-based robotic system that efficiently explores large open-world environments without constructing geometric maps. The key is a latent goal model that forecasts actions and transit times to goals, is robust to variations in the input images, and enables 'imagining' relative goals. The latent goal model is used to continually construct topological maps that the robot can use to quickly travel to specified goals.


We describe a robotic learning system for autonomous navigation in diverse environments. At the core of our method are two components: (i) a non-parametric map that reflects the connectivity of the environment but does not require geometric reconstruction or localization, and (ii) a latent variable model of distances and actions that enables efficiently constructing and traversing this map. The model is trained on a large dataset of prior experience to predict the expected amount of time and next action needed to transit between the current image and a goal image. Training the model in this way enables it to develop a representation of goals robust to distracting information in the input images, which aids in deploying the system to quickly explore new environments. We demonstrate our method on a mobile ground robot in a range of outdoor navigation scenarios. Our method can learn to reach new goals, specified as images, in a radius of up to 80 meters in just 20 minutes, and reliably revisit these goals in changing environments. We also demonstrate our method's robustness to previously-unseen obstacles and variable weather conditions.
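
As a rough illustration of the topological-map idea (a toy sketch, not the RECON implementation; predicted_transit_time and the one-dimensional "observations" below are hypothetical stand-ins for the learned latent goal model), previously collected observations can be organized into a graph whose edge weights are the model's predicted transit times, so reaching a previously seen goal reduces to a shortest-path query:

import heapq

def predicted_transit_time(obs_a, obs_b):
    # Stand-in for the learned latent goal model, which predicts the expected
    # time needed to travel between the places where obs_a and obs_b were taken.
    return abs(obs_a - obs_b)  # toy 1-D "observations", for illustration only

def build_graph(nodes, reachable_horizon=5.0):
    # Connect pairs of nodes the model believes are reachable within a horizon.
    graph = {i: [] for i in nodes}
    for i in nodes:
        for j in nodes:
            if i != j:
                t = predicted_transit_time(nodes[i], nodes[j])
                if t <= reachable_horizon:
                    graph[i].append((j, t))
    return graph

def shortest_route(graph, start, goal):
    # Dijkstra's algorithm over predicted transit times.
    queue, visited = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, t in graph[node]:
            if nxt not in visited:
                heapq.heappush(queue, (cost + t, nxt, path + [nxt]))
    return float("inf"), []

nodes = {0: 0.0, 1: 3.0, 2: 6.0, 3: 10.0}  # toy observation "embeddings"
print(shortest_route(build_graph(nodes), start=0, goal=3))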


@misc{shah2021recon,
 title={RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models}, 
 author={Dhruv Shah and Benjamin Eysenbach and Gregory Kahn and Nicholas Rhinehart and Sergey Levine},
 year={2021},
 eprint={2104.05859},
 archivePrefix={arXiv},
 primaryClass={cs.RO}
}  		
(Figure: decision tree representing the CfO method)
Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models

N. Rhinehart*, J. He*, C. Packer, M. A. Wright, R. McAllister, J. E. Gonzalez, S. Levine

ICRA 2021 | pdf | show abs | show bib | project page

We developed an approach for deep contingency planning by learning from observations. Given a context, the approach plans a policy that achieves high expected return under the uncertainty of the forecasted behavior of other agents. We evaluate our method's closed-loop performance in common driving scenarios constructed in the CARLA simulator, and show that our contingency planner solves these scenarios while noncontingent planning approaches cannot.


Humans have a remarkable ability to make decisions by accurately reasoning about future events, including the future behaviors and states of mind of other agents. Consider driving a car through a busy intersection: it is necessary to reason about the physics of the vehicle, the intentions of other drivers, and their beliefs about your own intentions. If you signal a turn, another driver might yield to you, or if you enter the passing lane, another driver might decelerate to give you room to merge in front. Competent drivers must plan how they can safely react to a variety of potential future behaviors of other agents before they make their next move. This requires contingency planning: explicitly planning a set of conditional actions that depend on the stochastic outcome of future events. Contingency planning outputs a policy that is a function of future timesteps and observations, whereas standard model predictive control-based planning outputs a sequence of future actions, which is equivalent to a policy that is only a function of future timesteps. In this work, we develop a general-purpose contingency planner that is learned end-to-end using high-dimensional scene observations and low-dimensional behavioral observations. We use a conditional autoregressive flow model to create a compact contingency planning space, and show how this model can tractably learn contingencies from behavioral observations. We developed a closed-loop control benchmark of realistic multi-agent scenarios in a driving simulator (CARLA), on which we compare our method to various noncontingent methods that reason about multi-agent future behavior, including several state-of-the-art deep learning-based planning approaches. We illustrate that these noncontingent planning methods fundamentally fail on this benchmark, and find that our deep contingency planning method achieves significantly superior performance.
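
In loose notation of my own (not the paper's), the contrast drawn above can be written as follows: standard model predictive control commits to an open-loop action sequence, whereas a contingency planner optimizes a policy that is allowed to react to future observations of the other agents,

\max_{a_{1:T}} \; \mathbb{E}\big[ R(s_{1:T}) \mid a_{1:T} \big]
\qquad \text{vs.} \qquad
\max_{\pi} \; \mathbb{E}\big[ R(s_{1:T}) \mid a_t = \pi(o_{1:t}) \big].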


@inproceedings{rhinehart2021contingencies,
 title={Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models},
 author={Nicholas Rhinehart and Jeff He and Charles Packer and Matthew A. Wright and Rowan McAllister and Joseph E. Gonzalez and Sergey Levine},
 booktitle={International Conference on Robotics and Automation (ICRA)},
 organization={IEEE},
 year={2021},
}  	        
ViNG: Learning Open-World Navigation with Visual Goals

D. Shah, B. Eysenbach, G. Kahn, N. Rhinehart, S. Levine

ICRA 2021 | pdf | show abs | show bib | project page

We developed a graph-based RL approach to enable a robot to navigate real-world environments given diverse, visually-indicated goals. We instantiate our method on a real outdoor ground robot and show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.


We propose a learning-based navigation system for reaching visually indicated goals and demonstrate this system on a real mobile robot platform. Learning provides an appealing alternative to conventional methods for robotic navigation: instead of reasoning about environments in terms of geometry and maps, learning can enable a robot to learn about navigational affordances, understand what types of obstacles are traversable (e.g., tall grass) or not (e.g., walls), and generalize over patterns in the environment. However, unlike conventional planning algorithms, it is harder to change the goal for a learned policy during deployment. We propose a method for learning to navigate towards a goal image of the desired destination. By combining a learned policy with a topological graph constructed out of previously observed data, our system can determine how to reach this visually indicated goal even in the presence of variable appearance and lighting. Three key insights, waypoint proposal, graph pruning and negative mining, enable our method to learn to navigate in real-world environments using only offline data, a setting where prior methods struggle. We instantiate our method on a real outdoor ground robot and show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning, including other methods that incorporate reinforcement learning and search. We also study how ViNG generalizes to unseen environments and evaluate its ability to adapt to such an environment with growing experience. Finally, we demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection. We encourage the reader to check out the videos of our experiments and demonstrations at our project website.


@misc{shah2020ving,
 title={ViNG: Learning Open-World Navigation with Visual Goals}, 
 author={Dhruv Shah and Benjamin Eysenbach and Gregory Kahn and Nicholas Rhinehart and Sergey Levine},
 year={2020},
 eprint={2012.09812},
 archivePrefix={arXiv},
 primaryClass={cs.RO}
}
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

A. Singh*, H. Liu*, G. Zhou, A. Yu, N. Rhinehart, S. Levine

Oral Presentation (1.8% of submissions)
ICLR 2021 | pdf | show abs | show bib | project page

Whereas RL agents usually explore randomly when faced with a new task, humans tend to explore with structured behavior. We demonstrate a method for learning a behavioral prior that can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.


Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In other machine learning fields, such as natural language processing or computer vision, pre-training on large, previously collected datasets to bootstrap learning for new tasks has emerged as a powerful paradigm to reduce data requirements when learning a new task. In this paper, we ask the following question: how can we enable similarly useful pre-training for RL agents? We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and we show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors. We demonstrate the effectiveness of our approach in challenging robotic manipulation domains involving image observations and sparse reward functions, where our method outperforms prior works by a substantial margin.
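
Below is a minimal sketch of how a behavioral prior of this kind can be slotted in front of an RL agent (the environment and toy_prior are hand-coded stand-ins for illustration, not the paper's learned flow model): the agent outputs a latent action z, and an observation-conditioned mapping turns z into the executed action, so even random z's already produce structured behavior.

import numpy as np

class PriorWrappedEnv:
    """Wraps an environment so the RL agent acts in the prior's latent space."""
    def __init__(self, env, prior):
        self.env, self.prior = env, prior
    def reset(self):
        self.obs = self.env.reset()
        return self.obs
    def step(self, z):
        action = self.prior(z, self.obs)  # mapping from latent to raw action
        self.obs, reward, done = self.env.step(action)
        return self.obs, reward, done

class ToyReachEnv:
    # Toy 2-D point-reaching task, used only to make the sketch runnable.
    def reset(self):
        self.pos, self.goal = np.zeros(2), np.array([1.0, 1.0])
        return self.pos.copy()
    def step(self, action):
        self.pos += np.clip(action, -0.1, 0.1)
        dist = np.linalg.norm(self.goal - self.pos)
        return self.pos.copy(), -dist, dist < 0.05

def toy_prior(z, obs):
    # Hand-coded stand-in for a learned prior: bias latent noise toward useful
    # motion so that even random z explores in a structured way.
    return 0.05 * z + 0.05 * np.sign(np.array([1.0, 1.0]) - obs)

env = PriorWrappedEnv(ToyReachEnv(), toy_prior)
obs, reward = env.reset(), 0.0
for _ in range(50):
    obs, reward, done = env.step(np.random.randn(2))  # random latent actions
    if done:
        break
print("final distance to goal:", -reward)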


@misc{singh2020parrot,
 title={Parrot: Data-Driven Behavioral Priors for Reinforcement Learning}, 
 author={Avi Singh and Huihan Liu and Gaoyue Zhou and Albert Yu and Nicholas Rhinehart and Sergey Levine},
 year={2020},
 eprint={2011.10024},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}  		
SMiRL: Surprise Minimizing RL in Dynamic Environments

G. Berseth, D. Geng, C. Devin, N. Rhinehart, C. Finn, D. Jayaraman, S. Levine

Oral Presentation (1.8% of submissions)
ICLR 2021 | pdf | show abs | show bib | project page

We propose that a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents, and formalize this idea into an unsupervised reinforcement learning method called Surprise Minimizing RL (SMiRL). The resulting agents acquire proactive behaviors that seek and maintain stable states, such as successfully playing Tetris and Doom and controlling a humanoid to avoid falls, without any task-specific reward supervision.


All living organisms struggle against the forces of nature to carve out a maintainable niche. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called Surprise Minimizing RL (SMiRL). SMiRL alternates between learning a density model to evaluate the surprise of a stimulus, and improving the policy to seek more predictable stimuli. This process maximizes a lower-bound on the negative entropy of the states, which can be seen as maximizing the agent's ability to maintain order in the environment. The policy seeks out stable and repeatable situations that counteract the environment's prevailing sources of entropy. This might include avoiding other hostile agents, or finding a stable, balanced pose for a bipedal robot in the face of disturbance forces. We demonstrate that our surprise minimizing agents can successfully play Tetris, Doom, control a humanoid to avoid falls, and navigate to escape enemies in a maze without any task-specific reward supervision. We further show that SMiRL can be used together with standard task rewards to accelerate reward-driven learning.
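
Here is a minimal sketch of the surprise-minimizing reward under a toy diagonal-Gaussian state model (my own stand-in; the paper uses richer density models and learns a policy, whereas the "dynamics" below are just a random walk): each new state is scored by its log-likelihood under a model fit to previously visited states, and the model is then updated.

import numpy as np

class RunningGaussian:
    """Diagonal Gaussian over states, updated from streaming observations."""
    def __init__(self, dim):
        self.n, self.mean, self.m2 = 0, np.zeros(dim), np.ones(dim)  # unit-variance prior
    def update(self, s):
        self.n += 1
        delta = s - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (s - self.mean)
    def log_prob(self, s):
        var = self.m2 / max(self.n, 1) + 1e-6
        return float(-0.5 * np.sum((s - self.mean) ** 2 / var + np.log(2 * np.pi * var)))

def smirl_reward(model, state):
    # SMiRL-style intrinsic reward: log-likelihood of the new state under the
    # density model fit to the states visited so far, followed by a model update.
    r = model.log_prob(state)
    model.update(state)
    return r

model, state = RunningGaussian(dim=2), np.zeros(2)
for t in range(100):
    state = state + 0.1 * np.random.randn(2)  # stand-in for environment dynamics
    print(t, round(smirl_reward(model, state), 3))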


@misc{berseth2019smirl,
 title={SMiRL: Surprise Minimizing RL in Dynamic Environments},
 author={Glen Berseth and Daniel Geng and Coline Devin and Nicholas Rhinehart and Chelsea Finn and Dinesh Jayaraman and Sergey Levine},
 year={2019},
 eprint={1912.05510},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}
Conservative Safety Critics for Safe Exploration

H. Bharadhwaj, A. Kumar, N. Rhinehart, S. Levine, F. Shkurti, A. Garg

ICLR 2021 | pdf | show abs | show bib | project page

The key idea of our algorithm is to train a conservative safety critic that overestimates how unsafe a particular state is (i.e., overestimates the probability of failure), and to modify the exploration strategy to account for this conservative estimate. Empirically, we show that the proposed approach can achieve competitive performance on challenging navigation, manipulation, and locomotion tasks while incurring significantly lower catastrophic failure rates during training than prior methods.


Safe exploration presents a major challenge in reinforcement learning (RL): when active data collection requires deploying partially trained policies, we must ensure that these policies avoid catastrophically unsafe regions, while still enabling trial and error learning. In this paper, we target the problem of safe exploration in RL by learning a conservative safety estimate of environment states through a critic, and provably upper bound the likelihood of catastrophic failures at every training iteration. We theoretically characterize the tradeoff between safety and policy improvement, show that the safety constraints are likely to be satisfied with high probability during training, derive provable convergence guarantees for our approach, which is no worse asymptotically than standard RL, and demonstrate the efficacy of the proposed approach on a suite of challenging navigation, manipulation, and locomotion tasks. Empirically, we show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates during training than prior methods. Videos are at https://sites.google.com/view/conservative-safety-critics/home.
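
Schematically, and in my own notation rather than the paper's, the setup is a constrained policy search in which a deliberately pessimistic estimate of failure probability gates exploration:

\max_{\pi}\; \mathbb{E}_{\pi}\Big[\textstyle\sum_t \gamma^t r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\hat{Q}_C(s_t, a_t) \le \epsilon \;\; \forall t,
\qquad \hat{Q}_C \ge Q_C^{\text{true}},

where Q_C^{\text{true}} is the true probability of eventual catastrophic failure from (s_t, a_t) and \hat{Q}_C is the conservatively trained critic that overestimates it.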


@article{bharadhwaj2020conservative,
  title={Conservative Safety Critics for Safe Exploration},
  author={Bharadhwaj, Homanga and Kumar, Aviral and Rhinehart, Nicholas and Levine, Sergey and Shkurti, Florian and Garg, Animesh},
  journal={arXiv preprint arXiv:2010.14497},
  year={2020}
}
  		
Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

X. Weng, J. Wang, S. Levine, K. Kitani, N. Rhinehart

CoRL 2020 | pdf | show abs | show bib | project page

Instead of the standard trajectory-forecasting pipeline that first (1) detects objects with LiDAR and then (2) forecasts object pose trajectories, we "inverted" it to create a new pipeline that (1) forecasts LiDAR trajectories and then (2) detects object pose trajectories. We found that our proposed pipeline is competitive with the standard pipeline in the domains of vehicle forecasting and robotic manipulation forecasting, and that its performance scales with the addition of unlabeled LiDAR data.


Many autonomous systems forecast aspects of the future in order to aid decision-making. For example, self-driving vehicles and robotic manipulation systems often forecast future object poses by first detecting and tracking objects. However, this detect-then-forecast pipeline is expensive to scale, as pose forecasting algorithms typically require labeled sequences of object poses, which are costly to obtain in 3D space. Can we scale performance without requiring additional labels? We hypothesize yes, and propose inverting the detect-then-forecast pipeline. Instead of detecting, tracking and then forecasting the objects, we propose to first forecast 3D sensor data (e.g., point clouds with $100$k points) and then detect/track objects on the predicted point cloud sequences to obtain future poses, i.e., a forecast-then-detect pipeline. This inversion makes it less expensive to scale pose forecasting, as the sensor data forecasting task requires no labels. Part of this work's focus is on the challenging first step -- Sequential Pointcloud Forecasting (SPF), for which we also propose an effective approach, SPFNet. To compare our forecast-then-detect pipeline relative to the detect-then-forecast pipeline, we propose an evaluation procedure and two metrics. Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, our forecast-then-detect pipeline outperforms the detect-then-forecast approaches to which we compared, and that pose forecasting performance improves with the addition of unlabeled data. Our project website is http://www.xinshuoweng.com/projects/SPF2.


@misc{weng2020inverting,
 title={Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting}, 
 author={Xinshuo Weng and Jianren Wang and Sergey Levine and Kris Kitani and Nicholas Rhinehart},
 year={2020},
 eprint={2003.08376},
 archivePrefix={arXiv},
 primaryClass={cs.CV}
}
Can Autonomous Vehicles Identify, Recover from, and Adapt to Distribution Shifts?

A. Filos*, P. Tigas*, R. McAllister, N. Rhinehart, S. Levine, Y. Gal

ICML 2020 | pdf | show abs | show bib | code | blog post | project page

We used recent techniques to estimate the epistemic uncertainty of a Deep Imitative Model used for planning vehicle trajectories and found that we could use this epistemic uncertainty to reliably detect out-of-distribution situations, plan more effectively in them, and adapt the model online with expert feedback.


Out-of-distribution (OOD) driving scenarios are a common failure of learning agents at deployment, typically leading to arbitrary deductions and poorly-informed decisions. In principle, detection of and adaptation to OOD scenes can mitigate their adverse effects. However, no benchmark evaluating OOD detection and adaptation currently exists to compare methods. In this paper, we introduce an autonomous car novel-scene benchmark, CARNOVEL, to evaluate the robustness of driving agents to a suite of tasks involving distribution shift. We also highlight the limitations of current approaches to novel driving scenes and propose an epistemic uncertainty-aware planning method, called robust imitative planning (RIP). Our method can detect and recover from some distribution shifts, reducing the overconfident but catastrophic extrapolations in out-of-training-distribution scenes. When the model's uncertainty quantification is insufficient to suggest a safe course of action by itself, it is used to query the driver for feedback, enabling sample-efficient online adaptation, a variant of our method we term adaptive robust imitative planning (AdaRIP).
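
The planning rule can be summarized (in my notation, and simplifying the paper's aggregation choices, e.g., worst-case versus lower confidence bound) as choosing the plan that remains likely under the most pessimistic member of an ensemble of imitative models:

s^{\star} \;=\; \arg\max_{s_{1:T}} \; \min_{k \in \{1, \dots, K\}} \; \log q_{\theta_k}\!\big(s_{1:T} \mid \phi, \mathcal{G}\big),

so plans on which the ensemble disagrees (high epistemic uncertainty) are avoided, and when no plan looks safe the same uncertainty signal is used to query the driver.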


@article{filos2020can,
  title={Can autonomous vehicles identify, recover from, and adapt to distribution shifts?},
  author={Filos, Angelos and Tigas, Panagiotis and McAllister, Rowan and Rhinehart, Nicholas and Levine, Sergey and Gal, Yarin},
  journal={arXiv preprint arXiv:2006.14911},
  year={2020}
}		
Generative Hybrid Representations for Activity Forecasting with No-Regret Learning

J. Guan, Y. Yuan, K. M. Kitani, N. Rhinehart

Oral Presentation (4.6% of submissions)
CVPR 2020 | pdf | show abs | show bib | supp

Some activities are best represented discretely, others continuously. We learn a deep likelihood-based generative model to jointly forecast discrete and continuous activities, and show how to adapt the model to learn efficiently online.


Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems. Part of this difficulty stems from learning systems' inability to represent all kinds of behaviors. Some behaviors, such as motion, are best described with continuous representations, whereas others, such as picking up a cup, are best described with discrete representations. Furthermore, human behavior is generally not fixed: people can change their habits and routines. This suggests these systems must be able to learn and adapt continuously. In this work, we develop an efficient deep generative model to jointly forecast a person's future discrete actions and continuous motions. On a large-scale egocentric dataset, EPIC-KITCHENS, we observe our method generates high-quality and diverse samples while exhibiting better generalization than related generative models. Finally, we propose a variant to continually learn our model from streaming data, observe its practical effectiveness, and theoretically justify its learning efficiency.


@InProceedings{Guan_2020_CVPR,
 author = {Guan, Jiaqi and Yuan, Ye and Kitani, Kris M. and Rhinehart, Nicholas},
 title = {Generative Hybrid Representations for Activity Forecasting With No-Regret Learning},
 booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
 month = {June},
 year = {2020}
}
Deep Imitative Models for Flexible Inference, Planning, and Control

N. Rhinehart, R. McAllister, S. Levine

ICLR 2020 | pdf | show abs | show bib | code (tf, official) | code (pytorch, reimplementation) | project page | talk video

We learn a deep conditional distribution of human driving behavior to guide planning and control of an autonomous car in simulation, without any trial-and-error data. We show that the approach can be adapted to execute tasks that were never demonstrated, such as safely avoiding potholes, is robust to misspecified goals that would cause it to violate its model of the rules of the road, and achieves state-of-the-art performance on the CARLA benchmark.


Imitation Learning (IL) is an appealing approach to learn desirable autonomous behavior. However, directing IL to achieve arbitrary goals is difficult. In contrast, planning-based algorithms use dynamics models and reward functions to achieve goals. Yet, reward functions that evoke desirable behavior are often difficult to specify. In this paper, we propose "Imitative Models" to combine the benefits of IL and goal-directed planning. Imitative Models are probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. We derive families of flexible goal objectives, including constrained goal regions, unconstrained goal sets, and energy-based goals. We show that our method can use these objectives to successfully direct behavior. Our method substantially outperforms six IL approaches and a planning-based approach in a dynamic simulated autonomous driving task, and is efficiently learned from expert demonstrations without online data collection. We also show our approach is robust to poorly-specified goals, such as goals on the wrong side of the road.
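
Roughly, the planning objective combines an imitative prior with a goal likelihood (notation simplified from the paper):

s^{\star} \;=\; \arg\max_{s_{1:T}} \; \log q(s_{1:T} \mid \phi) \;+\; \log p(\mathcal{G} \mid s_{1:T}),

where q is the learned model of expert-like trajectories given scene context \phi, and p(\mathcal{G} \mid s_{1:T}) encodes any of the goal objectives above (constrained goal regions, goal sets, or energy-based goals).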


@inproceedings{Rhinehart2020Deep,
 title={Deep Imitative Models for Flexible Inference, Planning, and Control},
 author={Nicholas Rhinehart and Rowan McAllister and Sergey Levine},
 booktitle={International Conference on Learning Representations},
 year={2020},
 url={https://openreview.net/forum?id=Skl4mRNYDr}
}
PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings

N. Rhinehart, R. McAllister, K. M. Kitani, S. Levine

Best Paper, ICML 2019 Workshop on AI for Autonomous Driving
ICCV 2019 | pdf | show abs | show bib | project page | code | visualization code | iccv pdf | iccv talk slides (pdf) | Baylearn talk (youtube)

We perform deep conditional forecasting with multiple interacting agents: when you control one of them, you can use its goals to better predict what nearby agents will do. The model also outperforms state-of-the-art methods on the more standard task of unconditional forecasting.


For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions of multiple agents. We perform both standard forecasting and conditional forecasting with respect to the AV's goals. Conditional forecasting reasons about how all agents will likely respond to specific decisions of a controlled agent. We train our model on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios compared to existing state-of-the-art. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's intentions, further illustrating its capability to model agent interactions.


@InProceedings{Rhinehart_2019_ICCV,
author = {Rhinehart, Nicholas and McAllister, Rowan and Kitani, Kris and Levine, Sergey},
title = {PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}		

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

M. Sharma*, A. Sharma*, N. Rhinehart, K. M. Kitani

ICLR 2019 | pdf | show abs | show bib | project page

Many behaviors are naturally composed of sub-tasks. Our approach learns to imitate such behaviors by discovering latent sub-task variables and using them to guide its imitation.


The use of imitation learning to learn a single policy for a complex task that has multiple modes or hierarchical structure can be challenging. In fact, previous work has shown that when the modes are known, learning separate policies for each mode or sub-task can greatly improve the performance of imitation learning. In this work, we discover the interaction between sub-tasks from their resulting state-action trajectory sequences using a directed graphical model. We propose a new algorithm based on the generative adversarial imitation learning framework which automatically learns sub-task policies from unsegmented demonstrations. Our approach maximizes the directed information flow in the graphical model between sub-task latent variables and their generated trajectories. We also show how our approach connects with the existing Options framework, which is commonly used to learn hierarchical policies.


@inproceedings{
sharma2018directedinfo,
title={Directed-Info {GAIL}: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information},
author={Mohit Sharma and Arjun Sharma and Nicholas Rhinehart and Kris M. Kitani},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=BJeWUs05KQ},
}
		
First-Person Activity Forecasting from Video with Online Inverse Reinforcement Learning

N. Rhinehart, K. Kitani

TPAMI 2018 | pdf | show abs | show bib | project page

We continuously model and forecast the long-term goals of a first-person camera wearer through our Online Inverse RL algorithm, and show that it learns continually and efficiently in both theory and practice.


We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they seek. In contrast to prior work in trajectory forecasting, our algorithm, DARKO, goes further to reason about semantic states (will I pick up an object?), and future goal states that are far in terms of both space and time. DARKO learns and forecasts from first-person visual observations of the user's daily behaviors via an Online Inverse Reinforcement Learning (IRL) approach. Classical IRL discovers only the rewards in a batch setting, whereas DARKO discovers the transitions, rewards, and goals of a user from streaming data. Among other results, we show DARKO forecasts goals better than competing methods in both noisy and ideal settings, and our approach is theoretically and empirically no-regret.
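
The online flavor of the algorithm can be sketched (in my notation; DARKO's goal and state discovery are beyond this sketch) as a streaming version of the MaxEnt IRL gradient step on a linear reward r(s) = \theta^{\top} f(s):

\theta_{t+1} \;=\; \theta_t + \eta_t \big( \bar{f}(\xi_t) - \mathbb{E}_{\pi_{\theta_t}}[\bar{f}\,] \big),

where \bar{f}(\xi_t) are the feature counts of the newly observed trajectory segment and the expectation is taken under the current soft-optimal policy; the paper's no-regret analysis concerns updates of this online-gradient form.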


@article{8481580,
 author={Rhinehart, Nicholas and Kitani, Kris M.},
 journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
 title={First-Person Activity Forecasting from Video with Online Inverse Reinforcement Learning},
 year={2018},
 doi={10.1109/TPAMI.2018.2873794},
 ISSN={0162-8828}
}
		
R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting

N. Rhinehart, K. M. Kitani, P. Vernaza

ECCV 2018 | pdf | show abs | show bib | project page | supplement | blog post (third-party)

We designed an objective to jointly maximize diversity and precision for generative models, and designed a deep autoregressive flow to efficiently optimize this objective for the task of motion forecasting. Unlike many popular generative models, ours can exactly evaluate its probability density function for arbitrary points.


We propose a method to forecast a vehicle's ego-motion as a distribution over spatiotemporal paths, conditioned on features (e.g., from LIDAR and images) embedded in an overhead map. The method learns a policy inducing a distribution over simulated trajectories that is both diverse (produces most paths likely under the data) and precise (mostly produces paths likely under the data). This balance is achieved through minimization of a symmetrized cross-entropy between the distribution and demonstration data. By viewing the simulated-outcome distribution as the pushforward of a simple distribution under a simulation operator, we obtain expressions for the cross-entropy metrics that can be efficiently evaluated and differentiated, enabling stochastic-gradient optimization. We propose concrete policy architectures for this model, discuss our evaluation metrics relative to previously-used metrics, and demonstrate the superiority of our method relative to state-of-the-art methods in both the KITTI dataset and a similar but novel and larger real-world dataset explicitly designed for the vehicle forecasting domain.
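
The diversity/precision balance described above corresponds to a symmetrized cross-entropy objective; roughly (my notation), with p the data distribution, q_\theta the pushforward distribution induced by the learned policy, and \tilde{p} a simple approximation of the data density used to score samples from q_\theta:

\min_{\theta} \; \underbrace{H(p, q_\theta)}_{\text{cover the data (diversity)}} \;+\; \beta\, \underbrace{H(q_\theta, \tilde{p})}_{\text{avoid implausible paths (precision)}}.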


@InProceedings{Rhinehart_2018_ECCV,
author = {Rhinehart, Nicholas and Kitani, Kris M. and Vernaza, Paul},
title = {R2P2: A ReparameteRized Pushforward Policy for Diverse, Precise Generative Path Forecasting},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}		
Learning Neural Parsers with Deterministic Differentiable Imitation Learning

T. Shankar, N. Rhinehart, K. Muelling, K. M. Kitani

CoRL 2018 | pdf | show abs | show bib | code

We developed and applied a new imitation learning approach for the task of sequential visual parsing. The approach learns to imitate an expert parsing oracle.


We address the problem of spatial segmentation of a 2D object in the context of a robotic system for painting, where an optimal segmentation depends on both the appearance of the object and the size of each segment. Since each segment must take into account appearance features at several scales, we take a hierarchical grammar-based parsing approach to decompose the object into 2D segments for painting. Since there are many ways to segment an object, the solution space is extremely large, and it is very challenging to utilize an exploration-based optimization approach like reinforcement learning. Instead, we pose the segmentation problem as an imitation learning problem by using a segmentation algorithm in place of an expert that has access to a small dataset with known foreground-background segmentations. During the imitation learning process, we learn to imitate the oracle (segmentation algorithm) using only the image of the object, without the use of the known foreground-background segmentations. We introduce a novel deterministic policy gradient update, DRAG, in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural network based object parser. We also show that our approach can be seen as extending DDPG to the Imitation Learning scenario. Training our neural parser to imitate the oracle via DRAG allows our neural parser to outperform several existing imitation learning approaches.


@InProceedings{pmlr-v87-shankar18a,
  title = 	 {Learning Neural Parsers with Deterministic Differentiable Imitation Learning},
  author = 	 {Shankar, Tanmay and Rhinehart, Nicholas and Muelling, Katharina and Kitani, Kris M.},
  booktitle = 	 {Proceedings of The 2nd Conference on Robot Learning},
  pages = 	 {592--604},
  year = 	 {2018},
  editor = 	 {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume = 	 {87},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {},
  month = 	 {29--31 Oct},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v87/shankar18a/shankar18a.pdf},
  url = 	 {http://proceedings.mlr.press/v87/shankar18a.html},
}		
(Figure: depiction of the HIRL method)
Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

X. Pan, E. Ohn-Bar, N. Rhinehart, Y. Xu, Y. Shen, K. M. Kitani

AAMAS 2018 | pdf | show abs | show bib

We analyze the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework, and find that our approach requires less demonstration data than a baseline IRL approach.


Humans are able to understand and perform complex tasks by strategically structuring tasks into incremental steps or sub-goals. For a robot attempting to learn to perform a sequential task with critical subgoal states, these subgoal states can provide a natural opportunity for interaction with a human expert. This paper analyzes the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework. The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states. These subgoal states define a set of sub-tasks for the learning agent to complete in order to achieve the final goal. The learning agent queries for partial demonstrations corresponding to each sub-task as needed when it struggles with an individual sub-task. The proposed Human Interactive IRL (HI-IRL) framework is evaluated on several discrete path-planning tasks. We demonstrate that subgoal-based interactive structuring of the learning task results in significantly more efficient learning, requiring only a fraction of the demonstration data needed for learning the underlying reward function with a baseline IRL model.


@inproceedings{pan2018hi,
 title={Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning},
 author={Xinlei Pan and Eshed Ohn-Bar and Nicholas Rhinehart and Yan Xu and Yilin Shen and Kris M. Kitani},
 booktitle={International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
 year={2018}
}
		

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

A. Ashok, N. Rhinehart, F. Beainy, K. Kitani

ICLR 2018 | pdf | show abs | show bib | code

We designed a principled method for neural model compression: we trained a compression agent via RL on the sequential task of compressing large networks while maintaining high performance. The compression agent was able to generalize to compress previously-unseen networks.


While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger `teacher' network as input and outputs a compressed `student' network derived from the `teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large `teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input `teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller `teacher' networks can be used to rapidly speed up training on larger `teacher' networks.


@inproceedings{
        ashok2018nn,
        title={N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning},
        author={Anubhav Ashok and Nicholas Rhinehart and Fares Beainy and Kris M. Kitani},
        booktitle={International Conference on Learning Representations},
        year={2018},
        url={https://openreview.net/forum?id=B1hcZZ-AW},
        }		  
		
Predictive-State Decoders: Encoding the Future Into Recurrent Neural Networks

A. Venkatraman*, N. Rhinehart*, W. Sun, L. Pinto, M. Hebert, B. Boots, K. Kitani, J. A. Bagnell

NIPS 2017 | pdf | show abs | show bib

We use the idea of Predictive State Representations to guide the learning of RNNs: by encouraging the hidden state of the RNN to be predictive of future observations, we improve RNN performance on various tasks in probabilistic filtering, imitation learning, and reinforcement learning.


Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding its analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. In each, our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
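
The mechanism is an auxiliary decoding loss on the recurrent internal state; schematically (my notation, with h_t the hidden state, g a small decoder, and k a prediction horizon):

\mathcal{L} \;=\; \mathcal{L}_{\text{task}} \;+\; \lambda \sum_{t} \big\| g(h_t) - [\, o_{t+1}, \dots, o_{t+k} \,] \big\|^2,

which nudges h_t toward a predictive-state representation while leaving the rest of the training pipeline unchanged.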


@inproceedings{venkatraman2017predictive,
  title={Predictive-state decoders: Encoding the future into recurrent networks},
  author={Venkatraman, Arun and Rhinehart, Nicholas and Sun, Wen and Pinto, Lerrel and Hebert, Martial and Boots, Byron and Kitani, Kris and Bagnell, J},
  booktitle={Advances in Neural Information Processing Systems},
  pages={1172--1183},
  year={2017}
}
		
First-Person Activity Forecasting with Online Inverse Reinforcement Learning

N. Rhinehart, K. Kitani

Best Paper Honorable Mention, ICCV 2017 (3 of 2,143 submissions)
ICCV 2017 | pdf | show abs | show bib | project page | code

We continuously model and forecast long-term goals of a first-person camera wearer through our Online Inverse RL algorithm. In contrast to motion forecasting, our approach reasons about semantic states and future goals that are potentially far away in space and time.


We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they are attempting to reach. In contrast to prior work in trajectory forecasting, our algorithm, DARKO, goes further to reason about semantic states (will I pick up an object?), and future goal states that are far both in terms of space and time. DARKO learns and forecasts from first-person visual observations of the user's daily behaviors via an Online Inverse Reinforcement Learning (IRL) approach. Classical IRL discovers only the rewards in a batch setting, whereas DARKO discovers the states, transitions, rewards, and goals of a user from streaming data. Among other results, we show DARKO forecasts goals better than competing methods in both noisy and ideal settings, and our approach is theoretically and empirically no-regret.


@InProceedings{Rhinehart_2017_ICCV,
author = {Rhinehart, Nicholas and Kitani, Kris M.},
title = {First-Person Activity Forecasting With Online Inverse Reinforcement Learning},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}
		
Learning Action Maps of Large Environments Via First-Person Vision

N. Rhinehart, K. Kitani

CVPR 2016 | pdf | show abs | show bib

We developed an approach that learns to associate visual cues with sparse behavior demonstrations in order to make dense predictions of functionality in both seen and unseen environments.


When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an ego-centric viewpoint. The method we describe enables functionality estimation in large scenes where people have behaved, as well as novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability for a user to perform activities at various locations. With the usage of an egocentric camera to observe human activities, our method scales with the size of the scene without the need for mounting multiple static surveillance cameras and is well-suited to the task of observing activities up-close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our proposed mathematical framework allows for the prediction of Action Maps in new environments. Additionally, we offer a preliminary glance of the applicability of Action Maps by demonstrating a proof-of-concept application in which they are used in concert with activity detections to perform localization.


@InProceedings{Rhinehart2016CVPR,
 author = {Rhinehart, Nicholas and Kitani, Kris M.},
 title = {Learning Action Maps of Large Environments via First-Person Vision},
 booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 month = {June},
 year = {2016}
} 
Visual Chunking: A List Prediction Framework for Region-Based Object Detection

N. Rhinehart, J. Zhou, M. Hebert, J. A. Bagnell

ICRA 2015 | pdf | show abs | show bib

We developed a principled imitation learning approach for the task of object detection, which is best described as a sequence prediction problem. Our approach reasons sequentially about objects and requires none of the prediction-filtering heuristics common in object detection frameworks, such as non-maximum suppression.


We consider detecting objects in an image by iteratively selecting from a set of arbitrarily shaped candidate regions. Our generic approach, which we term visual chunking, reasons about the locations of multiple object instances in an image while expressively describing object boundaries. We design an optimization criterion for measuring the performance of a list of such detections as a natural extension to a common per-instance metric. We present an efficient algorithm with provable performance for building a high-quality list of detections from any candidate set of region-based proposals. We also develop a simple class-specific algorithm to generate a candidate region instance in near-linear time in the number of low-level superpixels that outperforms other region generating methods. In order to make predictions on novel images at testing time without access to ground truth, we develop learning approaches to emulate these algorithms' behaviors. We demonstrate that our new approach outperforms sophisticated baselines on benchmark datasets.


@inproceedings{rhinehart2015visual,
 title={Visual chunking: A list prediction framework for region-based object detection},
 author={Rhinehart, Nicholas and Zhou, Jiaji and Hebert, Martial and Bagnell, J Andrew},
 booktitle={Robotics and Automation (ICRA), 2015 IEEE International Conference on},
 pages={5448--5454},
 year={2015},
 organization={IEEE}
}
© 2015-2021 Nick Rhinehart