Abhishek Gupta

Ph.D. student, UC Berkeley EECS
Office & Mailing Address:
750 Sutardja Dai Hall
Berkeley CA 94720
I am a Ph.D. student in EECS at UC Berkeley advised by Professor Pieter Abbeel and Professor Sergey Levine in the Berkeley Artificial Intelligence Research (BAIR) Lab. Previously I was an undergraduate EECS Major at UC Berkeley working with Professor Pieter Abbeel.

My main research goal is to develop algorithms which enable robotic systems to learn how to perform complex tasks quickly and efficiently in a variety of unstructured environments. I am currently working in the areas of transfer learning and quick adaptation for Deep Reinforcement Learning algorithms applied to robotic systems. I am also working on getting robotic hands to be able to learn a variety of complex dexterous skills using Deep Reinforcement Learning. In the past I have worked on video prediction, learning from demonstration and hierarchical planning.

Preprints and Tech Reports

Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
Unsupervised Meta-Learning for Reinforcement Learning

Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.

Henry Zhu*, Abhishek Gupta*, Aravind Rajeswaran, Sergey Levine, Vikash Kumar
Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

Dexterous multi-fingered robotic hands can perform a wide range of manipulation skills, making them an appealing component for general-purpose robotic manipulators. However, such hands pose a major challenge for autonomous control, due to the high dimensionality of their configuration space and complex intermittent contact interactions. In this work, we propose deep reinforcement learning (deep RL) as a scalable solution for learning complex, contact rich behaviors with multi-fingered hands. Deep RL provides an end-to-end approach to directly map sensor readings to actions, without the need for task specific models or policy classes. We show that contact-rich manipulation behavior with multi-fingered hands can be learned by directly training with model-free deep RL algorithms in the real world, with minimal additional assumption and without the aid of simulation. We learn a variety of complex behaviors on two different low-cost hardware platforms. We show that each task can be learned entirely from scratch, and further study how the learning process can be further accelerated by using a small number of human demonstrations to bootstrap learning. Our experiments demonstrate that complex multi-fingered manipulation skills can be learned in the real world in about 4-7 hours for most tasks, and that demonstrations can decrease this to 2-3 hours, indicating that direct deep RL training in the real world is a viable and practical alternative to simulation and model-based control. \url{this https URL}


Michael B. Chang, Abhishek Gupta, Sergey Levine, Thomas Griffiths
Automatically Composing Representation Transformations as a Means for Generalization
Accepted at International Conference on Learning Representations (ICLR 2019)

How can we build a learner that can capture the essence of what makes a hard problem more complex than a simple one, break the hard problem along characteristic lines into smaller problems it knows how to solve, and sequentially solve the smaller problems until the larger one is solved? To work towards this goal, we focus on learning to generalize in a particular family of problems that exhibit compositional and recursive structure: their solutions can be found by composing in sequence a set of reusable partial solutions. Our key idea is to recast the problem of generalization as a problem of learning algorithmic procedures: we can formulate a solution to this family as a sequential decision-making process over transformations between representations. Our formulation enables the learner to learn the structure and parameters of its own computation graph with sparse supervision, make analogies between problems by transforming one problem representation to another, and exploit modularity and reuse to scale to problems of varying complexity. Experiments on solving a variety of multilingual arithmetic problems demonstrate that our method discovers the hierarchical decomposition of a problem into its subproblems, generalizes out of distribution to unseen problem classes, and extrapolates to harder versions of the same problem, yielding a 10-fold reduction in sample complexity compared to a monolithic recurrent neural network.

Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
Diversity is All You Need: Learning Skills without a Reward Function
Accepted at International Conference on Learning Representations (ICLR 2019)

In this paper, we propose DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. In these environments, some of the learned skills correspond to solving the task, and each skill that solves the task does so in a distinct manner. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.

Dibya Ghosh, Abhishek Gupta, Sergey Levine
Learning Actionable Representations with Goal-Conditioned Policies
Accepted at International Conference on Learning Representations (ICLR 2019)

Representation learning is a central challenge across a range of machine learning areas. In reinforcement learning, effective and functional representations have the potential to tremendously accelerate learning progress and solve more challenging problems. Most prior work on representation learning has focused on generative approaches, learning representations that capture all underlying factors of variation in the observation space in a more disentangled or well-ordered manner. In this paper, we instead aim to learn functionally salient representations: representations that are not necessarily complete in terms of capturing all factors of variation in the observation space, but rather aim to capture those factors of variation that are important for decision making – that are “actionable.” These representations are aware of the dynamics of the environment, and capture only the elements of the observation that are necessary for decision making rather than all factors of variation, without explicit reconstruction of the observation. We show how these representations can be useful to improve exploration for sparse reward problems, to enable long horizon hierarchical reinforcement learning, and as a state representation for learning policies for downstream tasks. We evaluate our method on a number of simulated environments, and compare it to prior methods for representation learning, exploration, and hierarchical reinforcement learning.

John D Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, John DeNero, Pieter Abbeel, Sergey Levine
Guiding Policies with Language via Meta-Learning
Accepted at International Conference on Learning Representations (ICLR 2019)

Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Meta-Reinforcement Learning of Structured Exploration Strategies
Accepted at Neural Information Processing Systems (NeurIPS 2018)

Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.

John D Co-Reyes*, YuXuan Liu*, Abhishek Gupta*, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
Accepted at International Conference on Machine Learning (ICML 2018)

In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.

Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, Sergey Levine
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
Accepted at Robotic Science and Systems (RSS 2018)

Dexterous multi-fingered hands are extremely versatile and provide a generic way to perform multiple tasks in human-centric environments. However, effectively controlling them remains challenging due to their high dimensionality and large number of potential contacts. Deep reinforcement learning (DRL) provides a model-agnostic approach to control complex dynamical systems, but has not been shown to scale to high-dimensional dexterous manipulation. Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency. Thus, the success of DRL in robotics has thus far been limited to simpler manipulators and tasks. In this work, we show that model-free DRL with natural policy gradients can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments. Furthermore, with the use of a small number of human demonstrations, the sample complexity can be significantly reduced, and enable learning within the equivalent of a few hours of robot experience. We demonstrate successful policies for multiple complex tasks: object relocation, in-hand manipulation, tool use, and door opening.

YuXuan Liu*, Abhishek Gupta*, Pieter Abbeel, Sergey Levine
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Accepted at International Conference on Robotics and Automation (ICRA 2018)

This work addresses the problem of learning behaviors by observing raw video demonstrations. We aim to enable a robot to learn complex manipulation behaviors by observing demonstration videos of a task being performed by a human demonstrator in a different context (eg viewpoint, lighting conditions, distractors, etc) than the one it has the perform the task in. We learn a context aware translation model which is able to model these context changes, and use a simple feature tracking perceptual reward in order to enable imitation from arbitrary contexts. We provide a large variety of experiments in both simulation and using a real 7 DoF sawyer robotic arm to illustrate our method.

Vikash Kumar, Abhishek Gupta, Emanuel Todorov, Sergey Levine
Learning Dexterous Manipulation Policies from Experience and Imitation
Accepted in IJRR Special Issue on Deep Learning

This paper presents simulated and real-world experiments on learning dexterous manipulation policies for a 5-finger robotic hand. Complex skills such as in-hand manipulation and grasping are learned through a combination of autonomous experience and imitation of a human operator. The skills can be represented as time-varying linear-Gaussian controllers, ensembles of time-varying controllers indexed via a nearest neighbor method, and deep neural networks.

Abhishek Gupta*, Coline Devin*, Yuxuan Liu, Pieter Abbeel, Sergey Levine
Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning
Published as a conference paper at ICLR 2017.

In this paper, we examine how reinforcement learning algorithms can transfer knowledge between morphologically different agents. We introduce a problem formulation where two agents are tasked with learning multiple skills by sharing information. Our method uses the skills that were learned by both agents to train invariant feature spaces that can then be used to transfer other skills from one agent to another. We evaluate our transfer learning algorithm in two simulated robotic manipulation skills

Abhishek Gupta*, Coline Devin*, Yuxuan Liu, Pieter Abbeel, Sergey Levine
Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer
ICRA 2017.

We propose modular policy networks, a general approach for transferring components of neural network policies between robots, tasks, and other degrees of variation. Modular policy networks consist of modules that can be mixed and matches to perform new robot-task combinations (or, in general, other combinations of the degrees of variation). For example, a module for opening a drawer can be combined with a module for controlling a four-link robot arm to enable a four-link arm to open the drawer. We demonstrate that modular policy networks can transfer knowledge to new tasks and even perform zero-shot learning for new task-robot combinations.

   Abhishek Gupta, Clemens Eppner, Sergey Levine, Pieter Abbeel
Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration
IROS 2016.

In this work, we present a method for learning dexterous manipulation skills for a low-cost, soft robotic hand. We show how we can learn a variety of motion skills using object-centric human demonstrations: demonstrations where the human manipulates an object using his own hand, and the robot then learns to track the trajectory of the object. By tracking a variety of human demonstrations with different initial conditions, the robot can acquire a generalizable neural network policy that can carry out the demonstrated behavior under new conditions. Control is performed directly at the level of inflation and deflation commands to the soft hand, and we demonstrate the method on a range of tasks, including turning a valve, moving the beads on an abacus, and grasping a bottle.

Rohan Chitnis, Dylan Hadfield-Menell, Abhishek Gupta, Siddhart Srivastava, Edward Groshev, Christopher Lin, Pieter Abbeel
Guided Search for Task and Motion Plans Using Learned Heuristics
ICRA 2016.

Task and motion planning(TAMP) methods integrate logical search over high-level actions with geometric reasoning to address this challenge. We present an algorithm that searches the space of possible task and motion plans and uses statistical machine learning to guide the search process. Our contributions are as follows:1) We present a complete algorithm for TAMP 2) We present a randomized local search algorithm for plan refinement that is easily formulated as a Markov decision process(MDP) 3) We apply reinforcement learning(RL) to learn a policy for this MDP 4) We learn from expert demonstrations to efficiently search the space of high-level task plans, given options that address different infeasibilities and 5) We run experiments to evaluate our system in a variety of simulated domains.

Alex Lee, Abhishek Gupta, Henry Lu, Sergey Levine, Pieter Abbeel
Learning from Multiple Demonstrations using Trajectory-Aware Non-Rigid Registration with Applications to Deformable Object Manipulation
IROS 2015.

Trajectory transfer using point cloud registration is powerful tool for learning from demonstration, but is typically unaware of which elements of the scene are relevant to the task. In this work, we determine relevance by considering the demonstrated trajectory, and perform registration with a trajectory-aware method to improve generalization.

Alex X. Lee, Henry Lu, Abhishek Gupta, Sergey Levine, Pieter Abbeel
Learning Force-Based Manipulation of Deformable Objects from Multiple Demonstrations.
ICRA 2015.

This paper combines trajectory transfer via point cloud registration with variable impedance control, in order to improve the generalization of behaviors that require a mix of precise, high-gain motion and force-driven behaviors like straightening a towel. Multiple example demonstrations are analyzed to determine which part of the motion should emphasize precise positioning, and which part requires matching the demonstrated forces. The method is demonstrated on rope tying, towel folding, and erasing a whiteboard.

Siddharth Srivastava, Shlomo Zilberstein, Abhishek Gupta, Pieter Abbeel, Stuart Russell
Tractability of Planning with Loops
AAAI 2015.

In this work, we create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as ab-stract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot.

Research Support

National Science Foundation Graduate Research Fellowship, 2016-present