CS 294-125, Spring 2016: Human-compatible AI
Reading list

This list is still under construction. An empty bullet item indicates more readings to come for that week.


Week 1 (1/26): Markov decision processes


Week 2 (2/2): Reinforcement learning, multi-attribute utility theory, preference elicitation

Week 3 (2/9): Goal inference

Week 4 (2/16): Human preferences

Week 5 (2/23): Collaborative systems

Week 6 (3/1): Psychology of moral decisions

Week 7 (3/8): Inverse reinforcement learning

Week 8 (3/15): Inverse reinforcement learning (cont'd)

Week 9 (3/22): Spring Break

Week 10 (3/29): Multiagent sequential decision making

Week 11 (4/5): Game theory

Week 12 (4/12): Inverse games

Week 13 (4/19): Embedded reinforcement learning, Baldwinian evolution

  • Mark Ring and Laurent Orseau, "Delusion, Survival, and Intelligent Agents." In Proc. AGI, 2011.
    Describes a possible difficulty with reward-based agents, wherein the agent builds a delusion box that produces fake rewards that make it happy.
    • (optional) Daniel Dewey, "Learning What to Value." In Proc. AGI, 2011.
      Argues that wireheading arises from RL formulations and proposes instead an approach based on learning an initially unknown utility function.
    • (optional) Bill Hibbard, "Model-based Utility Functions." JAGI, 3(1), 1-24, 2012.
      Proposes and analyzes a solution to the wireheading problem based on utility functions that depend on unobserved state variables whose values the agent must infer.
    • (optional) Laurent Orseau and Mark Ring, "Space-Time Embedded Intelligence." In Proc. AGI, 2012.
      Defines a very general notion of rationality for agents whose computational substrate is part of the environment they inhabit.
  • David Ackley and Michael Littman, "Interactions between learning and evolution." In Proc. Artificial Life II, 1991.
    Discusses the origin of reward functions and how learning speeds up evolution, clarifying the Baldwin effect first proposed in 1896.

Week 14 (4/26): Corrigibility

  • Nate Soares et al., "Corrigibility." In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

Week 15: