Foundations for Beneficial AI

Instructors: Stuart Russell (CS); Lara Buchak and Wesley Holliday (Philosophy); Shachar Kariv (Economics)

Computer Science listing: COMPSCI 294-166 (class #32762)

Economics listing: sign up under the Computer Science course listing

Philosophy listing: PHILOS 290-008 (class #33029)

3 units; Monday 2-4pm in 320 Soda

This interdisciplinary course examines the application of ideas from philosophy and economics to decision making by AI systems on behalf of humans, and in particular to the problem of ensuring that increasingly intelligent AI systems remain beneficial to humans. Solving this problem requires designing AI systems whose objective is to satisfy human preferences while remaining necessarily uncertain as to what those preferences are. The course will study issues arising when applying these principles to make decisions on behalf of multiple humans and real (rather than idealized) humans. Topics include utility theory, bounded rationality, utilitarianism, altruism, interpersonal comparisons of utility, preference learning, plasticity of human preferences, epistemic uncertainty about preferences, decision making under risk, social choice theory, and inequality. Students will read papers from the literature in AI, philosophy, and economics and will work in interdisciplinary teams to develop substantial analyses in one or more of these areas. No advanced mathematical background is assumed, but students should be comfortable with formal arguments involving axioms and proofs.

All students will be waitlisted initially. In order to ensure a balance of disciplines in the course, final enrollment decisions will be made by the instructors by the end of the first week of class. Preference will be given to PhD students in CS, Philosophy, and Economics, but other well-prepared students with a particular interest in the course will be considered.


Course structure

The syllabus has three parts: (1) introduction and relevant background on AI, economics, and philosophy; (2) issues related to AI systems that act on behalf of an individual human; and (3) issues related to AI systems that act on behalf of multiple individuals.


Part 1 (3 lectures): Background in AI, Economics, and Philosophy

Basic elements of AI for non-specialists, including various designs for intelligent agents, focusing on the different forms of objectives (goals, utilities, rewards, etc.). The King Midas problem of misspecified objectives and the approach of “beneficial AI” (see below). Rational decisions in economics, including utility theory, multiattribute utility, risk, game theory, and social choice theory. Elements of moral philosophy, including consequentialism and other approaches to making “good decisions,” and their potential application to AI systems.


Part 2 (4 lectures): AI collaboration with a single human

The basic “assistance game” for beneficial AI involves a human with preferences over possible futures and a machine that aims to satisfy those preferences but is uncertain as to what they are. Exploration of questions such as the nature of well-being; the general structure of human preferences; uncertainty about preferences; plasticity of preferences; preferences of non-rational agents and radical interpretation; the psychological reality and measurement of preferences; and methods for learning preferences, such as inverse reinforcement learning.


Part 3 (5 lectures): AI collaboration with multiple humans

Basic notions of utilitarian preference aggregation and challenges thereto: interpersonal comparisons of utility, variable numbers of people, fairness, and inequality. Social choice theory and impossibility results. Mechanism design and truthful preference revelation. Altruism, envy, pride, and positional goods.