I am an Assistant Professor in the Department of Electrical Engineering and Computer Sciences
at UC Berkeley.
My research focuses on the intersection between control and machine learning, with the aim of developing algorithms
and techniques that allow machines to autonomously acquire the skills needed to execute complex tasks. In particular,
I am interested in how learning can be used to acquire complex behavioral skills, and thereby give machines greater autonomy
and intelligence. To see a more formal biography, click here.
Biography
Sergey Levine received a BS and MS in Computer Science in 2009 and a Ph.D. in Computer Science in 2014, all from Stanford University. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as computer vision and graphics. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more. His work has been featured in many popular press outlets, including the New York Times, the BBC, MIT Technology Review, and Bloomberg Business.
News and Announcements
August 24, 2016
My colleagues at Google have released the grasping and pushing data used for Levine et al. '16 (ISER) and Finn et al. '16 (NIPS): Google Brain Robotics Data.
August 12, 2016
Two papers accepted for oral presentation at NIPS 2016, and three accepted for poster presentation! Congratulations to my coauthors.
July 1, 2016
Two papers accepted to IROS 2016, and one to ISER 2016! Congratulations to my coauthors.
June 15, 2016
Three new preprints on deep robotic learning posted!
May 22, 2016
The paper "Optimal Control with Learned Local Models: Application to Dexterous Manipulation" wins the ICRA 2016 Best Manipulation Paper Award! Congratulations to my coauthors, Vikash and Emo!
April 24, 2016
Two papers accepted to ICML 2016 and one to JMLR! Congratulations to my coauthors.
Two new preprints on deep robotic learning posted!
March 7, 2016
New preprint on large-scale robotic deep learning posted!
March 3, 2016
Four new preprints on deep reinforcement learning and control posted!
March 1, 2016
Public implementation of guided policy search now available on GitHub!
February 2, 2016
Three papers accepted at ICLR 2016!
January 15, 2016
Five papers accepted at ICRA 2016!
September 25, 2015
One updated preprint (on continuous memory states) and five new preprints on deep reinforcement learning and control posted!
September 18, 2015
Two new IROS papers and a new ICCV paper posted!
September 2, 2015
New article and video segment by Jack Clark of Bloomberg Business covering our work on deep sensorimotor learning.
July 7, 2015
Three new arXiv preprints on deep reinforcement learning posted!
May 28, 2015
The paper "Learning Contact-Rich Manipulation Skills with Guided Policy Search" wins the ICRA 2015 Best Manipulation Paper Award! Congratulations to my coauthors, Nolan and Pieter!
This is a recent talk summarizing some of my work on deep learning for robotic control.
Representative Publications
These recent papers provide an overview of my research, including large-scale robotic learning, deep reinforcement learning algorithms, inverse optimal control, and deep learning of robotic sensorimotor skills.
This paper presents an approach for learning grasping with continuous servoing by using large-scale data collection on a cluster of up to 14
individual robots. We collected about 800,000 grasp attempts, which we used to train a large convolutional neural network to predict grasp
success given an image and a candidate grasp vector. We then construct a continuous servoing mechanism that uses this network to continuously
make decisions about the optimal motor command to maximize the probability of grasp success. We evaluate our approach by grasping objects that
were not seen at training time, and compare to an open-loop variant that does not perform continuous feedback control.
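As a rough illustration of the servoing mechanism described above, the sketch below scores sampled motor commands with a grasp-success network and refits a Gaussian around the best candidates, i.e., a simple cross-entropy-method loop. The `grasp_success_net` callable, the command dimensionality, and the sample counts are hypothetical placeholders, not the actual system.

```python
import numpy as np

def servo_step(image, grasp_success_net, cmd_dim=5,
               n_samples=64, n_elite=6, n_iters=3):
    """One closed-loop servoing step: sample candidate motor commands,
    score them with a learned grasp-success network, and refit a Gaussian
    around the best candidates (cross-entropy method).

    grasp_success_net(image, commands) is a hypothetical callable that
    returns a success probability for each candidate command.
    """
    mean = np.zeros(cmd_dim)
    std = np.ones(cmd_dim)
    for _ in range(n_iters):
        candidates = mean + std * np.random.randn(n_samples, cmd_dim)
        scores = grasp_success_net(image, candidates)          # shape (n_samples,)
        elite = candidates[np.argsort(scores)[-n_elite:]]      # keep the best commands
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean  # motor command to execute before re-observing the scene
```

In a closed-loop setting, the returned command would be executed briefly and the procedure repeated with a fresh image.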
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.
Chelsea Finn, Sergey Levine, Pieter Abbeel. ICML 2016.
[PDF]
[Video]
[arXiv]
In this paper, we explore optimal control methods that can be used to train deep neural network cost functions.
We formulate an efficient sample-based approximation for MaxEnt IOC, and evaluate our method on a series
of simulated tasks and real-world robotic manipulation problems, including pouring and inserting dishes into a rack.
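The core of a sample-based MaxEnt IOC objective can be written compactly: the partition function is estimated with importance-sampled trajectories from the current policy. The sketch below is a simplified rendering of that idea under those assumptions, not the paper's implementation; `cost_net` and the feature tensors are assumed inputs.

```python
import torch

def ioc_loss(cost_net, demo_feats, sample_feats, sample_log_q):
    """Sample-based MaxEnt IOC objective (a sketch).

    cost_net      : maps trajectory features to a scalar cost
    demo_feats    : (N_demo, d) features of expert demonstrations
    sample_feats  : (N_samp, d) features of trajectories from the current policy
    sample_log_q  : (N_samp,) log probability of each sample under that policy,
                    used as an importance-sampling proposal for the partition function
    """
    demo_cost = cost_net(demo_feats).mean()
    sample_cost = cost_net(sample_feats).squeeze(-1)
    # log Z ~= log mean_j exp(-c(tau_j) - log q(tau_j))  (importance-sampled)
    log_Z = torch.logsumexp(-sample_cost - sample_log_q, dim=0) - \
            torch.log(torch.tensor(float(sample_feats.shape[0])))
    # Negative log-likelihood of the demonstrations under the MaxEnt model
    return demo_cost + log_Z
```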
Continuous Deep Q-Learning with Model-based Acceleration.
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. ICML 2016.
[PDF]
[arXiv]
In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks.
We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm,
which we call normalized advantage functions (NAF). To further improve the efficiency of our approach, we explore the use of learned models for accelerating
model-free reinforcement learning, and show that iteratively refitted local linear models are especially effective for this.
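The NAF idea is that the Q-function decomposes into a value term and a quadratic advantage term, so the greedy action is available in closed form. Below is a minimal PyTorch-style sketch of such a Q-network; the layer sizes and the treatment of the Cholesky diagonal are illustrative assumptions rather than the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class NAF(nn.Module):
    """Sketch of a normalized advantage function Q-network:
    Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)),
    so the greedy action is simply mu(s)."""

    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.V = nn.Linear(hidden, 1)                          # state value
        self.mu = nn.Linear(hidden, action_dim)                # greedy action
        self.L = nn.Linear(hidden, action_dim * action_dim)    # Cholesky factor of P(s)
        self.action_dim = action_dim

    def forward(self, state, action):
        h = self.trunk(state)
        V, mu = self.V(h), self.mu(h)
        L = torch.tril(self.L(h).view(-1, self.action_dim, self.action_dim))
        P = L @ L.transpose(1, 2)                              # positive semi-definite
        diff = (action - mu).unsqueeze(-1)
        advantage = -0.5 * (diff.transpose(1, 2) @ P @ diff).squeeze(-1)
        return V + advantage, mu
```

Because the advantage is a negative quadratic in the action, mu(s) always maximizes Q(s, a), which is what makes Q-learning tractable with continuous actions.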
End-to-End Training of Deep Visuomotor Policies.
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. JMLR 17, 2016.
[PDF]
[Video]
[arXiv]
This paper presents a method for training visuomotor policies that perform both vision and control for robotic manipulation tasks. The
policies are represented by deep convolutional neural networks with about 92,000 parameters. By
learning to perform vision and control together, the vision system can adapt to the goals of the task, essentially performing goal-driven
perception. Experimental results on a PR2 robot show that this method achieves substantial improvements in the accuracy of the final policy.
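One ingredient of this kind of architecture that is easy to show in isolation is a spatial softmax layer, which converts convolutional feature maps into expected image-space feature points before the fully connected control layers. The sketch below is a generic implementation of that layer, not code from the paper.

```python
import torch
import torch.nn.functional as F

def spatial_softmax(features):
    """Convert each convolutional feature map into the expected (x, y) image
    location of its activation, giving a compact feature-point representation.

    features: (batch, channels, height, width) activations.
    returns:  (batch, channels * 2) expected coordinates in [-1, 1].
    """
    b, c, h, w = features.shape
    probs = F.softmax(features.view(b, c, h * w), dim=-1).view(b, c, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=features.device)
    xs = torch.linspace(-1.0, 1.0, w, device=features.device)
    expected_y = (probs.sum(dim=3) * ys).sum(dim=2)   # (b, c)
    expected_x = (probs.sum(dim=2) * xs).sum(dim=2)   # (b, c)
    return torch.cat([expected_x, expected_y], dim=1)
```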
We propose PLATO, an algorithm that trains complex neural network policies using an adaptive variant of model-predictive control (MPC)
to generate the supervision. We prove that our adaptive MPC teacher produces supervision that leads to good long-horizon performance
of the resulting policy, and empirically demonstrate that MPC can avoid dangerous on-policy actions in unexpected situations during training.
Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments
Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Xingchao Peng, Sergey Levine, Kate Saenko, Trevor Darrell. arXiv 1511.07111.
[Overview]
[PDF]
[arXiv]
In this paper, we propose an approach for training the vision system of a visuomotor policy using a large number of simulated images
and a small number of real-world images, using a domain adaptation loss to compensate for the discrepancies between the real and
simulated scenes. We demonstrate that visual features can be trained using a low-fidelity rendering of the robot and manipulated objects,
and that these features can then be used to build a visuomotor policy that performs a simple robotic manipulation skill.
All Papers and Articles
2016
Guided Policy Search as Approximate Mirror Descent.
William Montgomery, Sergey Levine. NIPS 2016.
[Overview]
[PDF]
[arXiv]
In this paper, we describe a novel variant of guided policy search that can be interpreted as an approximation to mirror descent, consisting
of a policy update step that updates the policy in the space of trajectories, and a projection step that projects the improved trajectory
distribution onto the manifold of policies that are feasible under the current parameterization (e.g., neural networks). The resulting algorithm
corresponds precisely to mirror descent under time-varying linear dynamics and convex policy parameterizations, but even for complex non-linear
systems and high-dimensional neural network policies, it achieves good performance, while removing much of the complexity and most of the
hyperparameters of prior GPS methods.
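Schematically, each iteration alternates a trajectory-space improvement step with a supervised projection step. The sketch below only conveys that structure: `improve_trajectory_distribution`, `sample_trajectories`, and `fit_policy` are hypothetical callables standing in for the local controller update, rollout collection, and supervised policy training.

```python
def mirror_descent_gps_iteration(local_controllers, policy, envs,
                                 improve_trajectory_distribution,
                                 sample_trajectories, fit_policy):
    """One iteration of the mirror-descent view of guided policy search
    (structure only; the callables are placeholders, not the released code)."""
    new_controllers, samples = [], []
    for env, controller in zip(envs, local_controllers):
        # Step 1: improve each local controller against the cost, keeping it
        # close (in KL divergence) to the current global policy.
        controller = improve_trajectory_distribution(controller, env,
                                                     constrain_to=policy)
        new_controllers.append(controller)
        samples += sample_trajectories(controller, env, n=5)
    # Step 2: project back onto the policy class with supervised learning.
    policy = fit_policy(policy, samples)
    return new_controllers, policy
```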
Learning to Poke by Poking: Experiential Learning of Intuitive Physics.
Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine. NIPS 2016.
[Overview]
[PDF]
[Video]
[arXiv]
This paper proposes a method for acquiring internal models of intuitive physics from automatically collected robotic interaction experience.
In our experimental setup, a Baxter robot obtains a large collection of experience poking various objects. This experience is then used to train
an intuitive physics model to predict the outcome of a poke, and this model can then be used to move objects into desired poses. Our experiments
analyze various predictive models and provide real-world and simulated evaluations of our approach.
Unsupervised Learning for Physical Interaction through Video Prediction.
Chelsea Finn, Ian Goodfellow, Sergey Levine. NIPS 2016.
[Overview]
[PDF]
[Video]
[arXiv]
[Data]
In this work, we propose an unsupervised method for learning about physical object interactions through video prediction. Our proposed
video prediction method outperforms a number of previous approaches, achieving good qualitative and quantitative results both on a standard
human motion dataset and on a new proposed dataset of robotic object interaction videos.
Backprop KF: Learning Discriminative Deterministic State Estimators.
Tuomas Haarnoja, Anurag Ajay, Sergey Levine, Pieter Abbeel. NIPS 2016.
[Overview]
[PDF]
[arXiv]
In this paper, we discuss how generative state estimators, such as the Kalman filter, can be composed with arbitrary neural network
observation functions in a framework that allows simple, discriminative end-to-end training. The key idea in the approach is to treat
the inference procedure in the state estimator as a differentiable computation graph, optimizing the parameters directly with backpropagation.
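The "inference as a differentiable computation graph" idea can be illustrated with a single Kalman filter step written in tensor operations, so that gradients flow back into the network that produced the observation. This is a generic sketch under those assumptions, not the architecture from the paper; the model matrices here are fixed placeholders.

```python
import torch

def kalman_step(mean, cov, z, A, Q, C, R):
    """One differentiable Kalman filter predict/update step.

    mean (d,), cov (d, d): current state belief
    z (k,): observation produced by a learned observation network
    A, Q: linear dynamics and process noise; C, R: observation model and noise
    """
    # Predict
    mean_p = A @ mean
    cov_p = A @ cov @ A.T + Q
    # Update
    S = C @ cov_p @ C.T + R                      # innovation covariance
    K = cov_p @ C.T @ torch.inverse(S)           # Kalman gain
    mean_new = mean_p + K @ (z - C @ mean_p)
    cov_new = (torch.eye(mean.shape[0]) - K @ C) @ cov_p
    return mean_new, cov_new
```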
We introduce the value iteration network: a fully differentiable neural network with a 'planning module' embedded within. Value iteration networks are
suitable for making predictions about outcomes that involve planning-based reasoning, such as predicting a desired trajectory from an observation of a map.
Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network,
and trained end-to-end using standard backpropagation.
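The recurrence at the heart of the planning module is simple to state: convolve the reward and current value maps to produce per-action Q maps, then take a max over the action channel. The sketch below shows that recurrence with an ordinary convolution standing in for the learned transition model; the channel counts and iteration count are illustrative.

```python
import torch

def value_iteration_module(reward_map, q_conv, n_iters=20):
    """Differentiable value iteration: each step convolves [reward, value]
    into per-action Q maps and max-pools over the action channels.

    reward_map: (batch, 1, H, W) learned reward image.
    q_conv:     an nn.Conv2d with 2 input channels and n_actions output channels.
    """
    value_map = torch.zeros_like(reward_map)
    for _ in range(n_iters):
        q = q_conv(torch.cat([reward_map, value_map], dim=1))  # (b, n_actions, H, W)
        value_map, _ = q.max(dim=1, keepdim=True)               # Bellman max over actions
    return value_map
```

Here `q_conv` could be, for example, `torch.nn.Conv2d(2, n_actions, 3, padding=1)`, trained end-to-end with the rest of the network.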
Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration.
Abhishek Gupta, Clemens Eppner, Sergey Levine, Pieter Abbeel. IROS 2016.
[Overview]
[PDF]
[Video]
[arXiv]
In this work, we present a method for learning dexterous manipulation skills for a low-cost, soft robotic hand. We show how we
can learn a variety of motion skills using object-centric human demonstrations: demonstrations where the human manipulates an
object using their own hand, and the robot then learns to track the trajectory of the object. By tracking a variety of human demonstrations
with different initial conditions, the robot can acquire a generalizable neural network policy that can carry out the demonstrated
behavior under new conditions. Control is performed directly at the level of inflation and deflation commands to the soft hand,
and we demonstrate the method on a range of tasks, including turning a valve, moving the beads on an abacus, and grasping a bottle.
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors.
Justin Fu, Sergey Levine, Pieter Abbeel. IROS 2016.
[Overview]
[PDF]
[Video]
[arXiv]
In this paper, we address the problem of executing motion skills that have not been attempted before, by using model-predictive control
with online adaptation of the dynamics model. The dynamics model is estimated online using a simple linear representation, but the
estimation also incorporates a neural network prior that is trained on other, similar tasks. We show that this approach allows a PR2
robot to solve new tasks on the first attempt, using prior data from other, related tasks.
This paper presents an approach for learning grasping with continuous servoing by using large-scale data collection on a cluster of up to 14
individual robots. We collected about 800,000 grasp attempts, which we used to train a large convolutional neural network to predict grasp
success given an image and a candidate grasp vector. We then construct a continuous servoing mechanism that uses this network to continuously
make decisions about the optimal motor command to maximize the probability of grasp success. We evaluate our approach by grasping objects that
were not seen at training time, and compare to an open-loop variant that does not perform continuous feedback control.
End-to-End Training of Deep Visuomotor Policies.
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. JMLR 17, 2016.
[Overview]
[PDF]
[Video]
[arXiv]
This paper presents a method for training visuomotor policies that perform both vision and control for robotic manipulation tasks. The
policies are represented by deep convolutional neural networks with about 92,000 parameters. By
learning to perform vision and control together, the vision system can adapt to the goals of the task, essentially performing goal-driven
perception. Experimental results on a PR2 robot show that this method achieves substantial improvements in the accuracy of the final policy.
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.
Chelsea Finn, Sergey Levine, Pieter Abbeel. ICML 2016.
[Overview]
[PDF]
[Video]
[arXiv]
In this paper, we explore optimal control methods that can be used to train deep neural network cost functions.
We formulate an efficient sample-based approximation for MaxEnt IOC, and evaluate our method on a series
of simulated tasks and real-world robotic manipulation problems, including pouring and inserting dishes into a rack.
Continuous Deep Q-Learning with Model-based Acceleration.
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. ICML 2016.
[Overview]
[PDF]
[arXiv]
In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks.
We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm,
which we call normalized advantage functions (NAF). To further improve the efficiency of our approach, we explore the use of learned models for accelerating
model-free reinforcement learning, and show that iteratively refitted local linear models are especially effective for this.
This paper presents a simple and effective algorithm for training stochastic neural network models. Our method combines the effectiveness
of backpropagation, which is typically used for deterministic models, with likelihood ratio methods, producing an
optimization algorithm that benefits from gradient information while still yielding an unbiased estimate of the gradient of the learning objective.
We demonstrate that our method can effectively optimize neural networks with several kinds of discrete stochastic units.
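For context, the baseline that such estimators build on is the plain likelihood-ratio (score-function) gradient for discrete stochastic units, sketched below for a Bernoulli layer. The gradient-information refinements described above are not shown, and `downstream_loss_fn` is an assumed placeholder.

```python
import torch

def bernoulli_layer_loss(logits, downstream_loss_fn):
    """Plain likelihood-ratio estimator for a layer of Bernoulli stochastic
    units: backpropagating through loss.detach() * log p(h) gives unbiased
    gradients with respect to the logits."""
    probs = torch.sigmoid(logits)
    h = torch.bernoulli(probs)                       # sampled discrete activations
    log_p = (h * torch.log(probs + 1e-8) +
             (1 - h) * torch.log(1 - probs + 1e-8)).sum(dim=-1)
    loss = downstream_loss_fn(h)                     # per-example scalar loss
    # Surrogate: score-function term for the logits plus the ordinary loss
    # for any downstream deterministic parameters.
    return (loss.detach() * log_p).mean() + loss.mean()
```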
Learning Visual Predictive Models of Physics for Playing Billiards.
Katerina Fragkiadaki*, Pulkit Agrawal*, Sergey Levine, Jitendra Malik. ICLR 2016.
[Overview]
[PDF]
[arXiv]
This work proposes an object-centric model for predicting physics. The proposed model directly processes raw visual input, and uses a novel object-centric prediction
formulation based on visual glimpses centered on objects (fixations) to enforce translational invariance of the learned physical laws.
The method is evaluated by learning to predict the motion of simulated billiard balls
in response to external forces from a billiards player, which can be used not only to model the motion of the balls, but also to choose
actions that will send the balls into desired positions.
High-Dimensional Continuous Control Using Generalized Advantage Estimation.
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel. ICLR 2016.
[Overview]
[PDF]
[arXiv]
This paper proposes a method for augmenting policy gradient methods with fitted value functions. In contrast to prior techniques, such as actor-critic methods,
our approach allows for an explicit tradeoff between bias and variance by mixing Monte Carlo and function approximator based estimates of the advantage functions.
We show that this method can be interpreted as a particular type of cost shaping, and demonstrate that, when both the policy and value function are represented
by deep neural networks, our approach can be used to learn policies for complex 3D locomotion tasks.
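The bias-variance tradeoff described above is governed by a single lambda parameter in the advantage estimator. Below is a minimal sketch of the GAE computation for one trajectory; the hyperparameter values are illustrative defaults, not the paper's settings.

```python
import numpy as np

def generalized_advantage_estimates(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE(gamma, lambda) advantages for a single trajectory.

    rewards: length-T array of rewards.
    values:  length-(T+1) array of value estimates (including the final state).
    lam:     trades off bias (low lambda) against variance (high lambda).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```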
Deep Spatial Autoencoders for Visuomotor Learning.
Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. ICRA 2016.
[Overview]
[PDF]
[Video]
[arXiv]
This paper addresses the problem of learning robotic motion skills that require vision and object tracking, such as manipulation of
freely moving objects. Learning is performed in two stages: in the first stage, the robot explores the environment and uses unsupervised
learning to acquire a state representation that captures moving objects in the world. In the second stage, this representation is
used to learn a policy that performs the desired task. Real-world results are presented on the PR2 robot.
Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search.
Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel. ICRA 2016.
[Overview]
[PDF]
[Video]
[arXiv]
This paper presents a method for training neural network policies for autonomous aerial vehicles using model-predictive control (MPC) and
guided policy search. A major challenge in applying reinforcement learning to aerial vehicles is the possibility of critical failure
during training. To that end, MPC is used to guide off-policy learning with guided policy search. The final neural network policy provides
runtime efficiency and generalization, and removes the need for explicit state estimation at test time by using raw sensor inputs.
Optimal Control with Learned Local Models: Application to Dexterous Manipulation.
Vikash Kumar, Emanuel Todorov, Sergey Levine. ICRA 2016.
[Overview]
[PDF]
[Video]
This paper describes an application of reinforcement learning to control a five-finger pneumatically-actuated hand performing a variety
of object manipulation tasks. The learned controller directly controls the pneumatic valves, and is able to manipulate freely moving
objects. Results are presented in a high-fidelity simulation, as well as on a real physical system.
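A building block in this line of work is fitting time-varying local linear dynamics from rollouts. The sketch below fits such a model by per-time-step least squares; it is a bare-bones illustration, and practical versions (my assumption here) regularize the fit, for example with a prior shared across time steps.

```python
import numpy as np

def fit_local_linear_dynamics(states, actions, next_states):
    """Fit s_{t+1} ~= A_t s_t + B_t a_t + c_t at each time step by least squares.

    states, actions, next_states: arrays of shape (n_rollouts, T, dim).
    returns: list of (A_t, B_t, c_t) tuples, one per time step.
    """
    n, T, ds = states.shape
    da = actions.shape[2]
    dynamics = []
    for t in range(T):
        X = np.concatenate([states[:, t], actions[:, t], np.ones((n, 1))], axis=1)
        # Solve X @ W ~= next_states[:, t] in the least-squares sense.
        W, *_ = np.linalg.lstsq(X, next_states[:, t], rcond=None)
        A_t, B_t, c_t = W[:ds].T, W[ds:ds + da].T, W[-1]
        dynamics.append((A_t, B_t, c_t))
    return dynamics
```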
Model-Based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration.
Christopher Xie, Sachin Patil, Teodor Moldovan, Sergey Levine, Pieter Abbeel. ICRA 2016.
[Overview]
[PDF]
[arXiv]
In this paper, we describe a model-based reinforcement learning algorithm that combines ideas from model identification and reinforcement
learning. The model is represented using features automatically extracted from the equations of motion. This requires structural knowledge
about the system, such as its morphology, but does not require knowing its physical parameters. Actions are then chosen using model-predictive
control, with optimistic exploration incorporated into the control problem using virtual controls.
Learning Deep Neural Network Policies with Continuous Memory States.
Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel. ICRA 2016.
[Overview]
[PDF]
[arXiv]
In this paper, we describe how the guided policy search algorithm can be used to learn recurrent policies with continuous internal memory states.
We show that, when the memory states are added to the state of the system and optimized as part of the guided policy search trajectory optimization
phase, we can train policies that effectively use memory to complete a variety of simulated robotic manipulation tasks. This approach avoids the
need to perform backpropagation through time, and outperforms a baseline that naively replaces the policy with a recurrent neural network.
2015
Recurrent Network Models for Human Dynamics.
Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik. ICCV 2015.
[Overview]
[PDF]
[Video]
[arXiv]
This paper presents a method for predicting, synthesizing, and tracking human motion using videos or motion capture, by means of an encoder-recurrent-decoder model based on the long short-term memory (LSTM).
The results show synthesized motion capture sequences using models trained on large datasets, as well as body joint tracking and prediction results from RGB videos.
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models.
Bradly C. Stadie, Sergey Levine, Pieter Abbeel. arXiv 1507.00814. 2015.
[Overview]
[PDF]
[arXiv]
In this work, we propose a method based on deep neural networks for incentivizing exploration in reinforcement learning, with applications to
learning directly from raw sensory input in the domain of Atari games. Our method is based on exploration novelty bonuses, with the novelty
of a state determined by the degree to which a learned dynamics model is able to predict that state from the preceding state and action. Our
results show that this simple novelty measure dramatically improves the learning rate of a DQN learner.
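The novelty bonus itself is simple to write down: reward the agent in proportion to how poorly a learned dynamics model predicted the observed transition. The sketch below shows that basic quantity; the normalization and decay schedules used in practice (and `dynamics_model.predict`, a hypothetical learned forward model) are omitted assumptions.

```python
import numpy as np

def novelty_bonus(dynamics_model, state, action, next_state, scale=1.0):
    """Exploration bonus from model prediction error: a poorly predicted
    transition is treated as novel and earns a larger bonus, which is then
    added to the environment reward."""
    predicted = dynamics_model.predict(state, action)
    error = np.sum((predicted - next_state) ** 2)
    return scale * error
```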
In this paper, we address the learning of complex, multi-step behaviors through the framework
of compound controllers. We show how the individual skills in a compound controller can be trained progressively, and further show how reset controllers can be trained automatically to reset the
environment between episodes during learning. Finally, we demonstrate how a system of forward and reset controllers can be used to train a complex neural network policy for object grasping without any human intervention.
Learning from Multiple Demonstrations using Trajectory-Aware Non-Rigid Registration with Applications to Deformable Object Manipulation.
Alex X. Lee, Abhishek Gupta, Henry Lu, Sergey Levine, Pieter Abbeel. IROS 2015.
[Overview]
[PDF]
[Video]
Trajectory transfer using point cloud registration is a powerful tool for learning from demonstration, but is typically unaware of
which elements of the scene are relevant to the task. In this work, we determine relevance by considering the demonstrated trajectory,
and perform registration with a trajectory-aware method to improve generalization.
Trust Region Policy Optimization.
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. ICML 2015.
[Overview]
[PDF]
[Video]
[arXiv]
In this paper, we present a trust region policy optimization algorithm that can effectively handle complex, high-dimensional policies.
Our experimental results show that this approach can be used to directly optimize high-dimensional neural network policies for a variety
of complex tasks, including both simulated locomotion and playing Atari games directly from images. We also present a theoretical analysis
of trust region policy optimization that shows that guaranteed monotonic improvement is possible with an appropriately chosen step size.
Learning Contact-Rich Manipulation Skills with Guided Policy Search.
Sergey Levine, Nolan Wagener, Pieter Abbeel. ICRA 2015.
[Overview]
[PDF]
[Video]
In this work, we apply the guided policy search method to autonomously learn a range of manipulation skills on a PR2 robot.
The method is able to learn controllers for stacking large lego blocks, threading rings onto tight-fitting pegs, assembling a toy airplane,
inserting a shoe tree into a shoe, and screwing bottle caps onto bottles. Experiments evaluate the robustness of the controllers to perturbations
and generalization to new target locations.
Learning Force-Based Manipulation of Deformable Objects from Multiple Demonstrations.
Alex X. Lee, Henry Lu, Abhishek Gupta, Sergey Levine, Pieter Abbeel. ICRA 2015.
[Overview]
[PDF]
[Video]
This paper combines trajectory transfer via point cloud registration with variable impedance control, in order to improve the generalization of behaviors that
require a mix of precise, high-gain motion and force-driven behaviors like straightening a towel. Multiple example demonstrations are analyzed to determine
which part of the motion should emphasize precise positioning, and which part requires matching the demonstrated forces. The method is demonstrated on rope
tying, towel folding, and erasing a whiteboard.
Optimism-Driven Exploration for Nonlinear Systems.
Teodor Mihai Moldovan, Sergey Levine, Michael I. Jordan, Pieter Abbeel. ICRA 2015.
[Overview]
[PDF]
This work presents a data-efficient online reinforcement learning algorithm that uses the concept of optimism in the face of uncertainty
to efficiently learn to perform continuous control tasks. Trajectory optimization is used to plan paths under optimistic dynamics,
using the variance in the currently learned model to determine the degree of optimism. As the learned model converges, the amount
of optimism inserted into the trajectory optimization decreases to zero. This method achieves extremely fast learning on a set of benchmark problems.
2014
Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics.
Sergey Levine, Pieter Abbeel. NIPS 2014.
[Overview]
[PDF]
[Video]
This paper presents an algorithm for optimizing linear-Gaussian controllers under unknown dynamics by iteratively refitting
linear dynamics models. This algorithm can be used to optimize trajectory distributions, which can then be used in conjunction with the
guided policy search method to train policies with any parameterization. Demonstrations include walking, peg insertion in partially observed domains, and octopus arm control.
In this work, we present an algorithm for learning complex policies represented by neural networks, by means of a novel constrained
trajectory optimization method. The algorithm iteratively optimizes trajectory distributions that minimize a cost function and agree with the current
neural network policy, which is then trained to reproduce those trajectories. Demonstrations include bipedal walking on rough terrain and recovery from very strong pushes.
Motor Skill Learning with Local Trajectory Methods.
Sergey Levine. Ph.D. thesis, Stanford University, 2014.
[Overview]
[PDF]
In my PhD thesis, I discuss algorithms for learning cost functions and control policies for humanoid motor skills
using example demonstrations and local trajectory methods. Cost functions are learned with a continuous local
inverse optimal control algorithm, while control policies are represented with general-purpose neural networks.
Results include running and walking on uneven terrain and bipedal push recovery.
Offline Policy Evaluation Across Representations with Applications to Educational Games.
Travis Mandel, Yun-En Liu, Sergey Levine, Emma Brunskill, Zoran Popović. AAMAS 2014.
[Overview]
[PDF]
[Website]
This paper uses an importance sampling-based policy evaluation method to compare a number of different policy representations for a concept
ordering task in an educational game. Using data from real human players, a representation is designed and a corresponding policy
is learned that achieves a significant improvement in player retention.
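The basic estimator underlying this kind of offline evaluation weights each observed return by how much more likely the episode would have been under the target policy than under the behavior policy. The sketch below shows the ordinary per-episode importance-sampling estimator as a minimal illustration of the general technique; the paper studies several estimators and representations beyond this.

```python
import numpy as np

def importance_sampling_value(episodes, target_policy, behavior_policy, gamma=1.0):
    """Per-episode importance-sampling estimate of a target policy's value
    from data collected under a behavior policy.

    episodes: list of [(state, action, reward), ...] trajectories.
    target_policy(s, a), behavior_policy(s, a): action probabilities.
    """
    returns = []
    for episode in episodes:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in episode:
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += discount * reward
            discount *= gamma
        returns.append(weight * ret)
    return np.mean(returns)
```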
2013
Exploring Deep and Recurrent Architectures for Optimal Control.
Sergey Levine. NIPS Workshop on Deep Learning 2013.
[Overview]
[PDF]
This paper describes a set of experiments on using deep and recurrent neural networks to build controllers for walking
on rough terrain by using the Guided Policy Search algorithm. The results show that deep and recurrent networks have a
modest advantage over shallow architectures in terms of generalization, but also suggest that overfitting and local
optima are serious problems. Two different types of overfitting are analyzed, and directions for future work are discussed.
This paper combines policy search with trajectory optimization in a variational framework. The algorithm
alternates between optimizing a trajectory to match the current policy and minimize cost, and optimizing
a policy to match the current trajectory. Both optimizations are done using standard methods, resulting
in a simple algorithm that can solve difficult locomotion problems that are infeasible with only random exploration.
Inverse Optimal Control for Humanoid Locomotion.
Taesung Park, Sergey Levine. RSS Workshop on Inverse Optimal Control & Robotic Learning
from Demonstration, 2013.
[Overview]
[PDF]
In this paper, we learn a reward function for running from motion capture of a human run.
The reward is learned using a local inverse optimal control algorithm. We show that it can be used to synthesize realistic
running behaviors from scratch, and furthermore can be used to create new running behaviors under novel conditions, such
as sloped terrain and strong lateral perturbations.
This paper introduces a guided policy search algorithm that uses trajectory optimization to
direct policy learning and avoid poor local optima. Using differential dynamic programming
to guide the policy search, this method is able to train general-purpose neural network
controllers to execute complex, dynamic behaviors such as running on high-dimensional simulated
humanoids.
2012
Continuous Inverse Optimal Control with Locally Optimal Examples.
Sergey Levine, Vladlen Koltun. ICML 2012.
[Overview]
[PDF]
[Video/Code]
This paper introduces a new probabilistic inverse optimal control algorithm for learning reward functions
in Markov decision processes. The method is suitable for large, continuous domains
where even computing a full policy is impractical. By using a local approximation
of the reward function, this method can also drop the assumption that the demonstrations
are globally optimal, requiring only local optimality. This allows it to learn from
examples that are unsuitable for prior methods.
Continuous Character Control with Low-Dimensional Embeddings.
Sergey Levine, Jack M. Wang, Alexis Haraux, Zoran Popović, Vladlen Koltun. ACM SIGGRAPH 2012.
[Overview]
[PDF]
[Video/Code]
This work presents a method for animating characters performing user-specified tasks by using a probabilistic motion model,
which is trained on a small number of artist-provided animation
clips. The method uses a low-dimensional space learned from the
example motions to continuously control the character's pose to accomplish the desired task. By controlling the character through a
reduced space, our method can discover new transitions,
precompute a control policy, and avoid low quality poses.
Physically Plausible Simulation for Character Animation.
Sergey Levine, Jovan Popović. SCA 2012.
[Overview]
[PDF]
[Video]
This paper describes a method for generating physically plausible responses for animated characters without
requiring their motion to be strictly physical. Given a stream
of poses, the method simulates plausible responses to physical disturbances and environmental variations.
Since the quasi-physical simulation accounts for the dynamics of the character and surrounding objects
without requiring the motion to be physically valid, it is suitable for both realistic and stylized, cartoony motions.
This paper presents an inverse reinforcement learning algorithm for learning unknown nonlinear reward functions. The algorithm uses Gaussian processes
and a probabilistic model of the expert to capture complex behaviors from suboptimal stochastic demonstrations,
while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
In this article, we present a method for efficiently synthesizing animations for characters traversing complex dynamic environments by sequencing
parameterized locomotion controllers using space-time planning. The controllers are created from motion capture data, and the space-time
planner determines the optimal sequence of controllers to reach a goal in a dynamic, changing environment.
2010
Feature Construction for Inverse Reinforcement Learning.
Sergey Levine, Zoran Popović, Vladlen Koltun. NIPS 2010.
[Overview]
[PDF]
[Poster]
[Website]
This paper presents an algorithm for learning an unknown reward function for a Markov decision process when good basis features are not available, using example traces from the MDP's optimal policy.
The algorithm constructs reward features from a large collection of component features, by building logical conjunctions of
those component features that are relevant to the example policy.
Gesture Controllers.
Sergey Levine, Philipp Krähenbühl, Sebastian Thrun, Vladlen Koltun. ACM SIGGRAPH 2010.
[Overview]
[PDF]
[Video]
Gesture controllers learn optimal policies to generate smooth,
compelling gesture animations from speech and other optional inputs. The accompanying video presents examples of various controllers, including
controllers that recognize key words, admit manual manipulation of gesture style, and even animate a character
with a non-humanoid morphology.
2009
Real-Time Prosody-Driven Synthesis of Body Language.
Sergey Levine, Christian Theobalt, Vladlen Koltun. ACM SIGGRAPH Asia 2009.
[Overview]
[PDF]
[Video]
This paper presents the body language synthesis system described in my undergraduate thesis. The method automatically synthesizes body language animations
directly from the participants' speech signals, without the
need for additional input. The body
language animations are synthesized in real time by selecting segments from motion capture
data of real people in conversation.
Modeling Body Language from Speech in Natural Conversation.
Sergey Levine. Master's research report, Stanford University, 2009.
[Overview]
[PDF]
[Video]
In this report, I describe a new approach for synthesizing body language from prosody using a set of intermediate motion parameters
that can be used to describe stylistic qualities of gestures independently of their form.
The quality of synthesized motion parameters is compared to the parameters of the original motions accompanying an utterance to obtain a
quantitative measure of the performance of the method.
Body Language Animation Synthesis from Prosody.
Sergey Levine. Undergraduate thesis, Stanford University, 2009.
[Overview]
[PDF]
[Video]
In my undergraduate thesis, I describe the body language synthesis system.
This system generates believable body language animations from live speech input, using only the prosody of the speaker's voice.
Since the method is suitable for live speech, it can be used in interactive applications, such as networked virtual worlds.