Stuart Russell -- Research on decision making over long time scales
Like many combinatorial algorithms for sequential decision making, AlphaGo is able to make decisions with a horizon of 15 or 20 primitive steps.
A human life consists of about 20 trillion primitive steps (potential changes in muscle activation).
Even something as simple as going to a conference is about a billion steps.
Humans appear to manage by imposing a hierarchical structure on behavior, so that each primitive action is part of some
higher-level activity, and so on up to very high-level activities such
as "get a PhD" and "earn enough money to retire to Bali".
We study both hierarchical reinforcement learning and hierarchical planning and lookahead,
establishing fundamental results in each area and showing how to enable much faster learning and planning.
Our original result (Parr and Russell, 1998) showed that reinforcement learning could occur within a hierarchy of nondeterministic finite-state machines
that provide some guidance on how to behave in a specific task environment. Nondeterminism corresponds to choice states where the machine must learn how to make each choice,
dependent on the environment state and any internal state. The key observation is that the interaction of the hierarchy with the environment
can be modelled in a joint [environment state, machine state] state space, yielding a semi-Markov decision process whose states are those joint states where the machine
is in a choice state. An appropriately designed reinforcement learning algorithm converges to the hierarchically optimal policy, i.e., the best policy consistent with the constraints imposed by the machine hierarchy.
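The joint-state construction can be illustrated with a minimal sketch: Q-values are indexed by [environment state, machine state] pairs, and semi-Markov updates occur only at choice points, with reward accumulated (and discounted) over the primitive steps executed in between. This is a hypothetical toy rendering of the idea on a five-cell corridor, not the original Parr and Russell formulation.

```python
# Toy sketch of SMDP Q-learning over joint [env state, machine state]
# choice points (hypothetical domain and machine, for illustration only).
import random
from collections import defaultdict

GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

# Corridor environment: cells 0..4, reward on reaching cell 4.
def env_step(s, a):                     # a in {-1, +1}
    s2 = max(0, min(4, s + a))
    return s2, (1.0 if s2 == 4 else -0.01)

# Q-table keyed by the joint SMDP state: ((env_state, machine_state), choice).
Q = defaultdict(float)

def run_episode():
    s = 0
    while s != 4:
        joint = (s, 'choose')           # the machine is at a choice state
        if random.random() < EPS:       # epsilon-greedy over machine choices
            c = random.choice([-1, +1])
        else:
            c = max([-1, +1], key=lambda a: Q[(joint, a)])
        # Execute the chosen behavior for 2 primitive steps (until the
        # machine returns to a choice state), accumulating discounted reward.
        r_acc, disc, s2 = 0.0, 1.0, s
        for _ in range(2):
            s2, r = env_step(s2, c)
            r_acc += disc * r
            disc *= GAMMA
            if s2 == 4:
                break
        # Semi-Markov update: bootstrap from the next joint choice point.
        target = r_acc + (0.0 if s2 == 4 else
                          disc * max(Q[((s2, 'choose'), a)] for a in [-1, +1]))
        Q[(joint, c)] += ALPHA * (target - Q[(joint, c)])
        s = s2

random.seed(0)
for _ in range(500):
    run_episode()
# After training, the greedy choice at the start state is +1 (move right).
```

The point of the construction is visible in the Q-table's keys: learning happens over joint choice-point states, not over raw environment states.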
Subsequent papers introduced the idea of partial programs (Andre and Russell, 2000, 2002)
and concurrent behaviors (Russell and Zimdars, 2003; Marthi et al, 2005), leading to the ability to provide partial specifications
of arbitrarily complex behaviors. For example, in the Stratagus domain (similar to WarCraft), learning was successful on a task requiring over 100,000 primitive actions.
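A partial program can be sketched as ordinary code in which the programmer fixes the overall structure but leaves named choice points for the learner to resolve. The `choose` callback and the task below are hypothetical stand-ins, not the actual ALisp syntax of Andre and Russell.

```python
# Illustrative sketch of a partial program: fixed structure, free choices.
# (Hypothetical API; the real work used a Lisp-based agent language.)

def navigate_and_collect(trace, choose):
    """Partially specified behavior: the programmer commits to
    'first navigate, then collect', but leaves the route and the
    collection order as learnable choice points."""
    route = choose('route', ['north_corridor', 'south_corridor'])
    trace.append(('go', route))
    order = choose('order', ['nearest_first', 'valuable_first'])
    trace.append(('collect', order))
    return trace

# A completion maps choice-point names to decisions; learning would
# search over completions, but here we simply plug one in.
policy = {'route': 'south_corridor', 'order': 'nearest_first'}
trace = navigate_and_collect([], lambda name, options: policy[name])
```

Because the program structure is fixed, the learner only searches over the choice points, which is what makes learning tractable on tasks with very long primitive-action sequences.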
Subsequently, we studied the problem of lookahead decision making with
hierarchical partial programs. Our initial assumption that this
problem was already solved for deterministic domains in the HTN
(hierarchical task network) planning literature proved false: there was
no proper semantics (transition model) known for high-level actions,
and therefore no notion of a provably correct high-level plan. We
developed the angelic semantics for high-level actions (Marthi,
Russell, and Wolfe, 2007) which provides a way to describe high-level
actions by the set of reachable states given the initial state
in which the action is executed. With this concept, it is possible to derive provably correct high-level plans without examining their
concrete instantiations, yielding potentially exponential reductions in planning complexity.
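The angelic idea can be sketched with explicit state sets: each high-level action is described by an optimistic set (states some refinement might reach) and a pessimistic set (states some refinement is guaranteed to reach). If the pessimistic reachable set of a plan intersects the goal, the plan is provably correct; if even the optimistic set misses the goal, the plan can be pruned. This finite-state rendering is hypothetical; the 2007 paper uses compact symbolic descriptions rather than explicit sets.

```python
# Sketch of angelic-style plan evaluation over explicit state sets
# (hypothetical toy domain: rooms A..D, goal is room D).

def reach(plan, start, bound):
    """Propagate a set of reachable states through a high-level plan,
    under either the 'opt' (optimistic) or 'pess' (pessimistic) bound."""
    states = {start}
    for act in plan:
        states = set().union(*(act[bound].get(s, set()) for s in states))
    return states

# go_far might reach D directly, but is only guaranteed to reach C;
# go_door is guaranteed to reach D from C.
go_far  = {'opt': {'A': {'C', 'D'}}, 'pess': {'A': {'C'}}}
go_door = {'opt': {'C': {'D'}},      'pess': {'C': {'D'}}}

goal = {'D'}
good_plan = [go_far, go_door]   # pessimistically reaches D: provably correct
short_plan = [go_far]           # only optimistically reaches D: not yet provable
```

A planner can therefore commit to `good_plan` without ever examining its concrete refinements, and discard plans whose optimistic sets miss the goal, which is the source of the exponential savings.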
Further exploration of these ideas in the context of task-and-motion planning for robotics
led us to expand the semantics of abstract actions to include optimistic descriptions for
symbolic actions in physical domains (e.g., omitting geometric preconditions for grasping),
combined with planning algorithms that allow for failure and correction of abstract plans
when simulating their possible concrete implementations. This led to the first task-and-motion planners that combined off-the-shelf task planners and motion planners while producing provably correct combined plans.
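The plan-fail-correct loop can be sketched as follows: plan optimistically at the symbolic level, simulate each step with a motion-level feasibility check, and on failure report the discovered geometric obstruction back as a symbolic fact and replan. The two planners below are hypothetical stand-ins; the real systems call off-the-shelf task and motion planners.

```python
# Sketch of the optimistic plan / simulate / correct loop in
# task-and-motion planning (hypothetical stand-in planners).

def task_plan(facts):
    # Optimistic symbolic planner: grasping ignores geometry unless a
    # blocking fact has been discovered and added to `facts`.
    if ('blocks', 'can', 'cup') in facts:
        return [('pick', 'can'), ('place', 'can'), ('pick', 'cup')]
    return [('pick', 'cup')]

def motion_feasible(action, world):
    # Stand-in for a motion planner: picking the cup fails while the
    # can occludes it.
    return not (action == ('pick', 'cup') and world['can_blocks_cup'])

def plan_with_correction(world):
    facts = set()
    while True:
        plan, ok = task_plan(facts), True
        for act in plan:
            if not motion_feasible(act, world):
                # Report the geometric failure back as a symbolic fact.
                facts.add(('blocks', 'can', 'cup'))
                ok = False
                break
            if act == ('place', 'can'):
                world = dict(world, can_blocks_cup=False)
        if ok:
            return plan

plan = plan_with_correction({'can_blocks_cup': True})
```

The first, optimistic plan fails in simulation; the correction step adds the blocking fact, and the second symbolic plan clears the obstruction before grasping.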
Publications on hierarchical reinforcement learning
- Ron Parr and Stuart Russell,
``Reinforcement Learning with
Hierarchies of Machines.''
In Advances in Neural Information Processing Systems 10,
MIT Press, 1998.
- David Andre and Stuart Russell,
``Programmable Reinforcement Learning Agents.''
In Advances in Neural Information Processing Systems 13,
MIT Press, 2001.
- David Andre and Stuart Russell,
``State Abstraction for Programmable Reinforcement Learning Agents.''
In Proc. AAAI-02, Edmonton, Alberta: AAAI Press, 2002.
- Stuart Russell and Andrew Zimdars,
``Q-Decomposition for Reinforcement Learning Agents.''
In Proc. ICML-03, Washington, DC, 2003.
- Bhaskara Marthi, Stuart Russell, David Latham, and Carlos Guestrin,
``Concurrent hierarchical reinforcement learning.''
In Proc. IJCAI-05, Edinburgh, Scotland, 2005.
- Bhaskara Marthi, Stuart Russell, and David Andre,
``A Compact, Hierarchical Q-Function Decomposition.''
In Proc. UAI-06, Cambridge, MA, 2006.
- Aijun Bai and Stuart Russell,
``Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions.''
In Proc. IJCAI-17, Melbourne, 2017.
Publications on hierarchical planning and lookahead decision making
- Bhaskara Marthi, Stuart Russell, and Jason Wolfe,
``Angelic Semantics for High-Level Actions.''
In Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS 2007),
Providence, Rhode Island, 2007.
- Bhaskara Marthi, Stuart Russell, and Jason Wolfe,
``Angelic Hierarchical Planning: Optimal and Online Algorithms.''
In Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008), Sydney, 2008.
- Jason Wolfe, Bhaskara Marthi, and Stuart Russell,
``Combined Task and Motion Planning for Mobile Manipulation.''
In Proceedings of the Twentieth International Conference on Automated Planning and Scheduling (ICAPS 2010), Toronto, 2010.
- Jason Wolfe and Stuart Russell,
``Bounded Intention Planning.''
In Proc. IJCAI-11, Barcelona, 2011.
- Siddharth Srivastava, Eugene Fang, Lorenzo Riano, Rohan Chitnis, Stuart Russell, and Pieter Abbeel,
``Combined Task and Motion Planning
Through an Extensible Planner-Independent Interface Layer.''
In Proc. ICRA-14,
Hong Kong, 2014.
- Siddharth Srivastava, Shlomo Zilberstein, Abhishek Gupta, Pieter Abbeel, and Stuart Russell,
``Tractability of Planning with Loops.''
In Proc. AAAI-15,
Austin, Texas, 2015.
- Siddharth Srivastava, Stuart Russell, and Alessandro Pinto,
``Metaphysics of Planning Domain Descriptions.'' In Proc. AAAI-16, Phoenix, 2016.