Read an overview of my lab's research and check out the lab's website here.

Highlights: Learning Reward Functions / Value Alignment from Diverse Human Feedback and Leaked Information


  • [we often misspecify reward functions, but the reward we do specify is still a useful observation about the underlying true reward the agent should optimize; what makes the reward misspecified is that we cannot anticipate all possible environments/states the agent will operate in, and instead design only for a development set -- only there can the specified reward function be trusted] D. Hadfield-Menell, S. Milli, P. Abbeel, S. Russell, and A.D. Dragan. Inverse Reward Design. Neural Information Processing Systems (NIPS), 2017. (oral, acceptance rate 1.2%)
  • [even before observing any human behavior, the current state of the environment leaks information about what people want, because people have already been acting in that environment] R. Shah, D. Krasheninnikov, J. Alexander, P. Abbeel, and A.D. Dragan. Preferences Implicit in the State of the World. International Conference on Learning Representations (ICLR), 2019.
  • [responding to corrections by updating the reward function: there are many heuristics for responding to physical interaction from a human, but here we argue that pHRI is intentional and thus informative of the human's preferences for the task, thereby defining the notion of an optimal response; we also derive a real-time approximation] A. Bajcsy, D. Losey, M. O'Malley, and A.D. Dragan. Learning Robot Objectives from Physical Human Interaction. Conference on Robot Learning (CoRL), 2017. (oral, acceptance rate 10%)
  • [we observe that these many forms of explicit feedback and leaked information can all be formalized as a reward-rational implicit choice the human is making -- a unifying lens on reward learning; a minimal sketch of that observation model follows this list] S. Milli, H.J. Jeon, and A.D. Dragan. Reward-rational (implicit) choice: A unifying formalism for reward learning. (in review)
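To make the unifying lens concrete, here is a minimal sketch of the kind of Boltzmann-rational observation model the papers above build on. The discrete grid of reward hypotheses, the grounding of each feedback option into a return, and the function names are all illustrative assumptions, not the papers' actual code.

    import numpy as np

    def choice_likelihood(grounded_returns, beta=1.0):
        # grounded_returns[k, c]: return, under reward hypothesis theta_k, of the
        # behavior that feedback option c "grounds to" (a demonstrated trajectory,
        # a physical correction, the observed state of the world, a specified
        # proxy reward, ...). Boltzmann-rational human:
        # P(c | theta_k) proportional to exp(beta * return).
        logits = beta * grounded_returns
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)

    def posterior_over_rewards(observed_choice, grounded_returns, prior, beta=1.0):
        # Bayes rule over a discrete set of reward hypotheses theta_1..theta_K.
        likelihood = choice_likelihood(grounded_returns, beta)[:, observed_choice]
        posterior = prior * likelihood
        return posterior / posterior.sum()

Under this lens, the feedback types above differ mainly in what the choice set is and in how a choice grounds into behavior the candidate reward can evaluate.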

Highlights: Modeling Human Behavior Despite Irrationality


  • [human behavior appears irrational, but can be explained as rational under beliefs about the dynamics that differ from the robot's/agent's] S. Reddy, A.D. Dragan, and S. Levine. Where do you think you're going? Inferring beliefs about dynamics from behavior. Neural Information Processing Systems (NeurIPS), 2018.
  • [much work focuses on better predictive models of people, but almost any model is bound to be wrong at times; here we enable the robot to detect this online by estimating the person's apparent irrationality -- when the human appears too irrational under its model, the robot not only detects the misspecification but also automatically becomes more conservative, because its predictions of future human motion become higher variance; a sketch of this confidence estimate follows this list] J. Fisac, A. Bajcsy, D. Fridovich, S. Herbert, S. Wang, S. Milli, C. Tomlin, and A.D. Dragan. Probabilistically Safe Robot Planning with Confidence-Based Human Predictions. Robotics: Science and Systems (RSS), 2018. (invited to special issue)
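A minimal sketch of the confidence idea, assuming a noisily-rational (Boltzmann) human model with rationality coefficient beta, a discrete action set, and known Q-values for the human's task; the grid over beta and the function names are illustrative, not the paper's implementation.

    import numpy as np

    def human_action_probs(q_values, beta):
        # Boltzmann model: P(a) proportional to exp(beta * Q(s, a)). As beta -> 0
        # the prediction tends toward uniform, i.e. higher variance and hence
        # more conservative.
        logits = beta * q_values - np.max(beta * q_values)
        probs = np.exp(logits)
        return probs / probs.sum()

    def update_confidence(beta_grid, beta_posterior, q_values, observed_action):
        # Bayesian update over beta: when the model's Q-values explain the observed
        # action poorly, mass shifts toward low beta (the human "appears
        # irrational"), which signals model misspecification.
        likelihood = np.array(
            [human_action_probs(q_values, b)[observed_action] for b in beta_grid])
        beta_posterior = beta_posterior * likelihood
        return beta_posterior / beta_posterior.sum()

    def predict_human(beta_grid, beta_posterior, q_values):
        # Confidence-weighted prediction that a safety-aware planner can consume.
        return sum(w * human_action_probs(q_values, b)
                   for w, b in zip(beta_posterior, beta_grid))

When confidence drops, the predicted distribution over human actions flattens, and the planner has to keep clear of a larger set of possible human motions.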

Highlights: Capturing the Game-Theoretic Nature of Alignment/Interaction


  • [a formalism of value alignment as a collaboration with a human who knows the true objective: here we advocate that Inverse RL should be formulated as a collaboration in which the agent is no longer a passive observer, and the human is no longer an uninterested expert acting as if in isolation] D. Hadfield-Menell, A.D. Dragan, P. Abbeel, and S. Russell. "Cooperative Inverse Reinforcement Learning". Neural Information Processing Systems (NIPS), 2016.
  • [there are many handcrafted strategies for enhancing coordination with people (e.g., cars inch forward at intersections); here we show that cars can invent such strategies autonomously if they model their influence on human actions -- see the planning sketch after this list] D. Sadigh, S.S. Sastry, S.A. Seshia, and A.D. Dragan. "Information Gathering Actions over Human Internal State". International Conference on Intelligent Robots and Systems (IROS), 2016 (best cognitive robotics paper award finalist), and "Planning for Autonomous Cars that Leverage Effects on Human Actions". Robotics: Science and Systems (RSS), 2016. (invited to special issue)
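A minimal sketch of the nested, best-response structure behind the last bullet, for a single decision step over discrete action sets; the names and the one-step simplification are illustrative assumptions, not the papers' trajectory-optimization formulation.

    def plan_with_influence(robot_actions, human_actions, robot_reward, human_reward):
        # Evaluate each robot action *under the human's predicted best response
        # to it*, rather than treating the human as a fixed, independent obstacle.
        best_action, best_value = None, float("-inf")
        for u_r in robot_actions:
            # Predicted human response: the human optimizes her own reward given u_r.
            u_h = max(human_actions, key=lambda a: human_reward(u_r, a))
            value = robot_reward(u_r, u_h)
            if value > best_value:
                best_action, best_value = u_r, value
        return best_action

Coordination behaviors such as inching forward then emerge because some robot actions are valuable mostly through the human response they elicit.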

All Conference Papers & Journal Articles