Abhishek Gupta

Ph.D. student, UC Berkeley EECS
Office & Mailing Address:
750 Sutardja Dai Hall
Berkeley, CA 94720
I am a Ph.D. student in EECS at UC Berkeley advised by Professor Pieter Abbeel and Professor Sergey Levine in the Berkeley Artificial Intelligence Research (BAIR) Lab. Previously, I was an undergraduate EECS major at UC Berkeley, working with Professor Pieter Abbeel.

My main research goal is to develop algorithms that enable robotic systems to learn to perform complex tasks quickly and efficiently in a variety of unstructured environments. I am currently working on transfer learning and fast adaptation for deep reinforcement learning algorithms applied to robotic systems, and on enabling robotic hands to learn a variety of complex dexterous skills using deep reinforcement learning. In the past I have worked on video prediction, learning from demonstration, and hierarchical planning.

Preprints and Tech Reports

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Meta-Reinforcement Learning of Structured Exploration Strategies

Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
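A minimal sketch of the mechanism in PyTorch, assuming illustrative dimensions and a stand-in task loss (not the paper's actual architecture or training loop): the policy is conditioned on a per-task latent variable sampled once per episode, and the latent's variational parameters are adapted to a new task with a few gradient steps.

```python
import torch

# Illustrative sizes, not the paper's.
OBS_DIM, ACT_DIM, LATENT_DIM = 10, 4, 2

# Policy conditioned on the observation and a per-task latent z.
policy = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM + LATENT_DIM, 64),
    torch.nn.Tanh(),
    torch.nn.Linear(64, ACT_DIM),
)

# Per-task variational parameters over the latent exploration space.
mu = torch.zeros(LATENT_DIM, requires_grad=True)
log_sigma = torch.zeros(LATENT_DIM, requires_grad=True)

def sample_latent():
    # Reparameterized sample; drawn once per episode, so the injected
    # noise is temporally coherent rather than i.i.d. action noise.
    return mu + log_sigma.exp() * torch.randn(LATENT_DIM)

def act(obs, z):
    return policy(torch.cat([obs, z]))

def adapt(task_loss, lr=0.1, steps=3):
    # Inner loop: adapt the latent parameters to a new task by
    # gradient descent on that task's loss.
    for _ in range(steps):
        loss = task_loss(act, sample_latent())
        g_mu, g_sigma = torch.autograd.grad(loss, [mu, log_sigma])
        with torch.no_grad():
            mu -= lr * g_mu
            log_sigma -= lr * g_sigma

# Stand-in differentiable loss, purely for illustration.
adapt(lambda act_fn, z: act_fn(torch.randn(OBS_DIM), z).pow(2).sum())
```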

Benjamin Eysenbach*, Abhishek Gupta*, Julian Ibarz, Sergey Levine
Diversity is All You Need: Learning Skills without a Reward Function

In this paper, we propose DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. In these environments, some of the learned skills correspond to solving the task, and each skill that solves the task does so in a distinct manner. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.
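The information-theoretic objective reduces to a simple pseudo-reward: a discriminator is trained to infer the active skill from visited states, and each skill is rewarded for reaching states that make it identifiable. A rough sketch, with illustrative sizes and network:

```python
import torch
import torch.nn.functional as F

NUM_SKILLS, OBS_DIM = 8, 10  # illustrative sizes

# Discriminator q(z|s): predicts which skill produced a state.
discriminator = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, NUM_SKILLS),
)
log_p_z = torch.log(torch.tensor(1.0 / NUM_SKILLS))  # uniform skill prior

def diayn_reward(state, skill_idx):
    # log q(z|s) - log p(z): high when the state reveals the active
    # skill, pushing different skills toward different states.
    log_q_z = F.log_softmax(discriminator(state), dim=-1)[skill_idx]
    return (log_q_z - log_p_z).item()
```

Training alternates between fitting the discriminator to classify skills from visited states and running a maximum entropy RL algorithm on this pseudo-reward.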

Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, Sergey Levine
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Dexterous multi-fingered hands are extremely versatile and provide a generic way to perform multiple tasks in human-centric environments. However, effectively controlling them remains challenging due to their high dimensionality and large number of potential contacts. Deep reinforcement learning (DRL) provides a model-agnostic approach to control complex dynamical systems, but has not been shown to scale to high-dimensional dexterous manipulation. Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency. As a result, the success of DRL in robotics has thus far been limited to simpler manipulators and tasks. In this work, we show that model-free DRL with natural policy gradients can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments. Furthermore, with the use of a small number of human demonstrations, the sample complexity can be significantly reduced, enabling learning within the equivalent of a few hours of robot experience. We demonstrate successful policies for multiple complex tasks: object relocation, in-hand manipulation, tool use, and door opening.
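A rough sketch of the general recipe of combining a policy-gradient surrogate with an annealed behavior-cloning term on demonstrations; the paper's method (demonstration-augmented policy gradients built on natural policy gradients) differs in detail, and the weights and schedule here are illustrative assumptions:

```python
import torch

class GaussianPolicy(torch.nn.Module):
    # Minimal Gaussian policy, just enough for the sketch.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = torch.nn.Linear(obs_dim, act_dim)
        self.log_std = torch.nn.Parameter(torch.zeros(act_dim))

    def log_prob(self, obs, acts):
        dist = torch.distributions.Normal(self.mean(obs), self.log_std.exp())
        return dist.log_prob(acts).sum(-1)

def augmented_loss(policy, obs, acts, advantages,
                   demo_obs, demo_acts, iteration,
                   lam0=0.1, lam1=0.95):
    # Policy-gradient surrogate on on-policy rollouts.
    pg = -(policy.log_prob(obs, acts) * advantages).mean()
    # Behavior cloning on demonstrations, annealed over training so
    # demos guide early learning without constraining the final policy.
    bc = -policy.log_prob(demo_obs, demo_acts).mean()
    return pg + lam0 * (lam1 ** iteration) * bc
```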


YuXuan Liu*, Abhishek Gupta*, Pieter Abbeel, Sergey Levine
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Accepted at the International Conference on Robotics and Automation (ICRA 2018)

This work addresses the problem of learning behaviors by observing raw video demonstrations. We aim to enable a robot to learn complex manipulation behaviors by observing demonstration videos of a task being performed by a human demonstrator in a different context (e.g., viewpoint, lighting conditions, distractors) than the one in which the robot has to perform the task. We learn a context-aware translation model that captures these context changes, and use a simple feature-tracking perceptual reward to enable imitation from arbitrary contexts. We provide a large variety of experiments, both in simulation and on a real 7-DoF Sawyer robotic arm, to illustrate our method.
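A very rough sketch of the reward structure: the translation model maps demonstration features from the source context into the robot's context, and the robot is rewarded for tracking the translated features. The encoder, translator, and shapes below are illustrative stubs, not the paper's architecture:

```python
import torch

FEAT = 128
encoder = torch.nn.Linear(64 * 64 * 3, FEAT)  # expects flattened frames
translator = torch.nn.Linear(FEAT, FEAT)      # trained on paired contexts

def perceptual_reward(demo_frame_flat, robot_frame_flat):
    target = translator(encoder(demo_frame_flat))  # demo features in robot context
    current = encoder(robot_frame_flat)
    return -((target - current) ** 2).sum(-1)      # reward for tracking them
```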

Vikash Kumar, Abhishek Gupta, Emanuel Todorov, Sergey Levine
Learning Dexterous Manipulation Policies from Experience and Imitation
Accepted to the IJRR Special Issue on Deep Learning

This paper presents simulated and real-world experiments on learning dexterous manipulation policies for a 5-finger robotic hand. Complex skills such as in-hand manipulation and grasping are learned through a combination of autonomous experience and imitation of a human operator. The skills can be represented as time-varying linear-Gaussian controllers, ensembles of time-varying controllers indexed via a nearest neighbor method, and deep neural networks.
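The first of these representations is compact enough to sketch directly: a time-varying linear-Gaussian controller stores one gain matrix, bias, and noise covariance per timestep, and an ensemble can be indexed by the nearest training initial state. A minimal sketch with illustrative shapes:

```python
import numpy as np

class TimeVaryingLinearGaussian:
    # u_t ~ N(K_t x_t + k_t, Sigma_t), one triple per timestep.
    def __init__(self, K, k, Sigma):
        self.K, self.k, self.Sigma = K, k, Sigma  # (T,u,x), (T,u), (T,u,u)

    def act(self, t, x):
        mean = self.K[t] @ x + self.k[t]
        return np.random.multivariate_normal(mean, self.Sigma[t])

def select_controller(controllers, train_init_states, x0):
    # Nearest-neighbor indexing: run the controller whose training
    # initial state is closest to the current one.
    dists = [np.linalg.norm(s - x0) for s in train_init_states]
    return controllers[int(np.argmin(dists))]
```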

Abhishek Gupta*, Coline Devin*, Yuxuan Liu, Pieter Abbeel, Sergey Levine
Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning
Published as a conference paper at ICLR 2017.

In this paper, we examine how reinforcement learning algorithms can transfer knowledge between morphologically different agents. We introduce a problem formulation where two agents are tasked with learning multiple skills by sharing information. Our method uses the skills that were learned by both agents to train invariant feature spaces that can then be used to transfer other skills from one agent to another. We evaluate our transfer learning algorithm on two simulated robotic manipulation tasks.
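A rough sketch of the idea, assuming time-aligned executions of a shared proxy skill and illustrative network sizes: each agent embeds its own state into a common feature space, the embeddings are pulled together on aligned pairs, and transfer rewards the target agent for tracking the source's feature-space trajectory.

```python
import torch

# One embedding network per morphology (state sizes are illustrative).
f_a = torch.nn.Sequential(torch.nn.Linear(12, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 8))
f_b = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 8))

def alignment_loss(states_a, states_b):
    # Pull time-aligned states from the two agents' executions of a
    # shared skill to the same point in feature space.
    return ((f_a(states_a) - f_b(states_b)) ** 2).sum(-1).mean()

def transfer_reward(state_b, source_feature):
    # Reward the target agent for matching the source agent's
    # feature-space trajectory on the new skill.
    return -((f_b(state_b) - source_feature) ** 2).sum(-1)
```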

Abhishek Gupta*, Coline Devin*, Yuxuan Liu, Pieter Abbeel, Sergey Levine
Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer
ICRA 2017.

We propose modular policy networks, a general approach for transferring components of neural network policies between robots, tasks, and other degrees of variation. Modular policy networks consist of modules that can be mixed and matched to perform new robot-task combinations (or, in general, other combinations of the degrees of variation). For example, a module for opening a drawer can be combined with a module for controlling a four-link robot arm to enable a four-link arm to open the drawer. We demonstrate that modular policy networks can transfer knowledge to new tasks and even perform zero-shot learning for new task-robot combinations.
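A minimal sketch of the composition, with hypothetical module names and dimensions: task modules map task observations to a shared interface representation, robot modules map that representation plus the robot's state to actions, and unseen task-robot pairs are handled by recombination.

```python
import torch

TASK_OBS, INTERFACE, ROBOT_STATE, ACT = 16, 8, 6, 3  # illustrative sizes

def make_task_module():
    return torch.nn.Sequential(torch.nn.Linear(TASK_OBS, 32),
                               torch.nn.ReLU(),
                               torch.nn.Linear(32, INTERFACE))

def make_robot_module():
    return torch.nn.Sequential(torch.nn.Linear(INTERFACE + ROBOT_STATE, 32),
                               torch.nn.ReLU(),
                               torch.nn.Linear(32, ACT))

task_modules = {"open_drawer": make_task_module(), "push_block": make_task_module()}
robot_modules = {"three_link": make_robot_module(), "four_link": make_robot_module()}

def policy(task, robot, task_obs, robot_state):
    # Mix and match: compose separately trained modules to control a
    # task-robot combination never seen during training.
    h = task_modules[task](task_obs)
    return robot_modules[robot](torch.cat([h, robot_state], dim=-1))

action = policy("open_drawer", "four_link",
                torch.randn(TASK_OBS), torch.randn(ROBOT_STATE))
```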

Abhishek Gupta, Clemens Eppner, Sergey Levine, Pieter Abbeel
Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration
IROS 2016.

In this work, we present a method for learning dexterous manipulation skills for a low-cost, soft robotic hand. We show how we can learn a variety of motion skills using object-centric human demonstrations: demonstrations where the human manipulates an object using his own hand, and the robot then learns to track the trajectory of the object. By tracking a variety of human demonstrations with different initial conditions, the robot can acquire a generalizable neural network policy that can carry out the demonstrated behavior under new conditions. Control is performed directly at the level of inflation and deflation commands to the soft hand, and we demonstrate the method on a range of tasks, including turning a valve, moving the beads on an abacus, and grasping a bottle.

Rohan Chitnis, Dylan Hadfield-Menell, Abhishek Gupta, Siddharth Srivastava, Edward Groshev, Christopher Lin, Pieter Abbeel
Guided Search for Task and Motion Plans Using Learned Heuristics
ICRA 2016.

Task and motion planning (TAMP) methods integrate logical search over high-level actions with geometric reasoning. We present an algorithm that searches the space of possible task and motion plans and uses statistical machine learning to guide the search process. Our contributions are as follows: 1) we present a complete algorithm for TAMP; 2) we present a randomized local search algorithm for plan refinement that is easily formulated as a Markov decision process (MDP); 3) we apply reinforcement learning (RL) to learn a policy for this MDP; 4) we learn from expert demonstrations to efficiently search the space of high-level task plans, given options that address different infeasibilities; and 5) we run experiments to evaluate our system in a variety of simulated domains.

Alex Lee, Abhishek Gupta, Henry Lu, Sergey Levine, Pieter Abbeel
Learning from Multiple Demonstrations using Trajectory-Aware Non-Rigid Registration with Applications to Deformable Object Manipulation
IROS 2015.

Trajectory transfer using point cloud registration is a powerful tool for learning from demonstration, but is typically unaware of which elements of the scene are relevant to the task. In this work, we determine relevance by considering the demonstrated trajectory, and perform registration with a trajectory-aware method to improve generalization.

Alex X. Lee, Henry Lu, Abhishek Gupta, Sergey Levine, Pieter Abbeel
Learning Force-Based Manipulation of Deformable Objects from Multiple Demonstrations.
ICRA 2015.

This paper combines trajectory transfer via point cloud registration with variable impedance control, in order to improve the generalization of behaviors that require a mix of precise, high-gain motion and force-driven behaviors like straightening a towel. Multiple example demonstrations are analyzed to determine which part of the motion should emphasize precise positioning, and which part requires matching the demonstrated forces. The method is demonstrated on rope tying, towel folding, and erasing a whiteboard.

Siddharth Srivastava, Shlomo Zilberstein, Abhishek Gupta, Pieter Abbeel, Stuart Russell
Tractability of Planning with Loops
AAAI 2015.

In this work, we create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as abstract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot.

Research Support

National Science Foundation Graduate Research Fellowship, 2016-present