I am a graduate student working with Prof. Jitendra Malik in the Vision group at the University of California, Berkeley. I completed my undergraduate studies at the Indian Institute of Technology, Kanpur, under the supervision of Prof. Surender Baswana. I also did an undergraduate internship at Microsoft Research, New England, with Dr. Adam Kalai.

email | cv | github | google scholar

shubhtuls AT cs DOT berkeley DOT edu

Publications

Learning Shape Abstractions by Assembling Volumetric Primitives
Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik
arXiv, Dec 2016
pdf   abstract   bibtex

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.

@article{abstractionTulsiani16,
author = {Shubham Tulsiani and
Hao Su and
Leonidas J. Guibas and
Alexei A. Efros and
Jitendra Malik},
title = {Learning Shape Abstractions by Assembling Volumetric Primitives},
journal = {arXiv preprint arXiv:1612.00404},
year = {2016}
}
            

Learning Category-Specific Deformable 3D Models for Object Reconstruction
Shubham Tulsiani*, Abhishek Kar*, João Carreira, Jitendra Malik
TPAMI, 2016 (to appear)
* equal contribution
pdf   abstract   bibtex   code

We address the problem of fully automatic object localization and reconstruction from a single image. This is both a very challenging and very important problem which has, until recently, received limited attention due to difficulties in segmenting objects and predicting their poses. Here we leverage recent advances in learning convolutional networks for object detection and segmentation and introduce a complementary network for the task of camera viewpoint prediction. These predictors are very powerful, but still not perfect given the stringent requirements of shape reconstruction. Our main contribution is a new class of deformable 3D models that can be robustly fitted to images based on noisy pose and silhouette estimates computed upstream and that can be learned directly from 2D annotations available in object detection datasets. Our models capture top-down information about the main global modes of shape variation within a class, providing a "low-frequency" shape. In order to capture fine instance-specific shape details, we fuse it with a high-frequency component recovered from shading cues. A comprehensive quantitative analysis and ablation study on the PASCAL 3D+ dataset validates the approach as we show fully automatic reconstructions on PASCAL VOC as well as large improvements on the task of viewpoint prediction.

@article{pamishapeTulsianiKCM15,
author = {Shubham Tulsiani and
Abhishek Kar and
Jo{\~{a}}o Carreira and
Jitendra Malik},
title = {Learning Category-Specific Deformable 3D
Models for Object Reconstruction},
journal = {TPAMI},
year = {2016},
}
            

View Synthesis by Appearance Flow
Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros
ECCV, 2016
pdf   abstract   bibtex

Given one or more images of an object (or a scene), is it possible to synthesize a new image of the same instance observed from an arbitrary viewpoint? In this paper, we attempt to tackle this problem, known as novel view synthesis, by re-formulating it as a pixel copying task that avoids the notorious difficulties of generating pixels from scratch. Our approach is built on the observation that the visual appearance of different views of the same instance is highly correlated. Such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows: 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. We show that for both objects and scenes, our approach is able to generate higher-quality synthesized views with crisp texture and boundaries than previous CNN-based techniques.

@inproceedings{appFlowZhou16,
author = {Tinghui Zhou and
Shubham Tulsiani and
Weilun Sun and
Jitendra Malik and
Alexei A. Efros},
title = {View Synthesis by Appearance Flow},
booktitle = {ECCV},
year = {2016}
}
            

The three R's of computer vision: Recognition, Reconstruction and Reorganization
Jitendra Malik, Pablo Arbelaez, João Carreira, Katerina Fragkiadaki, Ross Girshick, Georgia Gkioxari, Saurabh Gupta, Bharath Hariharan, Abhishek Kar, Shubham Tulsiani
Pattern Recognition Letters, 2016
pdf   abstract   bibtex

We argue for the importance of the interaction between recognition, reconstruction and re-organization, and propose that as a unifying framework for computer vision. In this view, recognition of objects is reciprocally linked to re-organization, with bottom-up grouping processes generating candidates, which can be classified using top down knowledge, following which the segmentations can be refined again. Recognition of 3D objects could benefit from a reconstruction of 3D structure, and 3D reconstruction can benefit from object category-specific priors. We also show that reconstruction of 3D structure from video data goes hand in hand with the reorganization of the scene. We demonstrate pipelined versions of two systems, one for RGB-D images, and another for RGB images, which produce rich 3D scene interpretations in this framework.

@article{malik2016three,
  title={The three R's of computer vision:
  Recognition, reconstruction and reorganization},
  author={Malik, Jitendra and
  Arbel{\'a}ez, Pablo and
  Carreira, Jo{\~a}o and
  Fragkiadaki, Katerina and
  Girshick, Ross and
  Gkioxari, Georgia and
  Gupta, Saurabh and
  Hariharan, Bharath and
  Kar, Abhishek and
  Tulsiani, Shubham},
  journal={Pattern Recognition Letters},
  volume={72},
  pages={4--14},
  year={2016},
  publisher={North-Holland}
}
            

Pose Induction for Novel Object Categories
Shubham Tulsiani, João Carreira, Jitendra Malik
ICCV, 2015
pdf   abstract   bibtex   code

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes. We present a generalized classifier that can reliably induce pose given a single instance of a novel category. In case of availability of a large collection of novel instances, our approach then jointly reasons over all instances to improve the initial estimates. We empirically validate the various components of our algorithm and quantitatively show that our method produces reliable pose estimates. We also show qualitative results on a diverse set of classes and further demonstrate the applicability of our system for learning shape models of novel object classes.

@inProceedings{poseInductionTCM15,
  author    = {Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Pose Induction for Novel Object Categories},
  year={2015},
  booktitle={International Conference on Computer Vision (ICCV)}
}
            

Amodal Completion and Size Constancy in Natural Scenes
Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
ICCV, 2015
pdf   abstract   bibtex

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image. There are several technical challenges to this, such as occlusions, lack of calibration data and the scale ambiguity between object size and distance. These have not been addressed in full generality in previous work. Here we propose to tackle these issues by building upon advances in object recognition and using recently created large-scale datasets. We first introduce the task of amodal bounding box completion, which aims to infer the full extent of the object instances in the image. We then propose a probabilistic framework for learning category-specific object size distributions from available annotations and leverage these in conjunction with amodal completions to infer veridical sizes of objects in novel images. Finally, we introduce a focal length prediction approach that exploits scene recognition to overcome inherent scale ambiguities and demonstrate qualitative results on challenging real-world scenes.

@inProceedings{amodalKTCM15,
  author    = {Abhishek Kar and
               Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Amodal Completion and Size Constancy in Natural Scenes},
  year={2015},
  booktitle={International Conference on Computer Vision (ICCV)}
}
            

Viewpoints and Keypoints
Shubham Tulsiani, Jitendra Malik
CVPR, 2015
pdf   abstract   bibtex   supplemental (40 MB)   code

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details. We address both these tasks in two different settings: the constrained setting with known bounding boxes and the more challenging detection setting, where the aim is to simultaneously detect objects and correctly estimate their pose. We present Convolutional Neural Network based architectures for these and demonstrate that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions. In addition to achieving significant improvements over the state of the art in the above tasks, we analyze the error modes and the effect of object characteristics on performance to guide future efforts towards this goal.

@inProceedings{vpsKpsTulsianiM15,
author    = {Shubham Tulsiani and Jitendra Malik},
title     = {Viewpoints and Keypoints},
year={2015},
booktitle={Computer Vision and Pattern Recognition (CVPR)}
}

Category-Specific Object Reconstruction from a Single Image
Abhishek Kar*, Shubham Tulsiani*, João Carreira, Jitendra Malik
CVPR, 2015 (Best Student Paper Award)
* equal contribution
pdf   project page   abstract   bibtex   code

Object reconstruction from a single image - in the wild - is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.

@inProceedings{shapesKarTCM15,
  author    = {Abhishek Kar and
               Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Category-Specific Object Reconstruction from a Single Image},
  year={2015},
  booktitle={Computer Vision and Pattern Recognition (CVPR)}
}
            

Virtual View Networks for Object Reconstruction
João Carreira, Abhishek Kar, Shubham Tulsiani, Jitendra Malik
CVPR, 2015
pdf   abstract   bibtex   video   code

All that structure from motion algorithms "see" are sets of 2D points. We show that these impoverished views of the world can be faked for the purpose of reconstructing objects in challenging settings, such as from a single image, or from a few ones far apart, by recognizing the object and getting help from a collection of images of other objects from the same class. We synthesize virtual views by computing geodesics on novel networks connecting objects with similar viewpoints, and introduce techniques to increase the specificity and robustness of factorization-based object reconstruction in this setting. We report accurate object shape reconstruction from a single image on challenging PASCAL VOC data, which suggests that the current domain of applications of rigid structure-from-motion techniques may be significantly extended.

@inProceedings{vvnCarreiraKTM15,
  author    = {Jo{\~{a}}o Carreira and
               Abhishek Kar and
               Shubham Tulsiani and
               Jitendra Malik},
  title     = {Virtual View Networks for Object Reconstruction},
  year={2015},
  booktitle={Computer Vision and Pattern Recognition (CVPR)}
}

A colorful approach to text processing by example
Kuat Yessenov, Shubham Tulsiani, Aditya Menon, Robert C Miller, Sumit Gulwani, Butler Lampson, Adam Kalai
UIST, 2013
pdf   abstract   bibtex

Text processing, tedious and error-prone even for programmers, remains one of the most alluring targets of Programming by Example. An examination of real-world text processing tasks found on help forums reveals that many such tasks, beyond simple string manipulation, involve latent hierarchical structures.
We present STEPS, a programming system for processing structured and semi-structured text by example. STEPS users create and manipulate hierarchical structure by example. In a between-subject user study on fourteen computer scientists, STEPS compares favorably to traditional programming.

@inproceedings{yessenov2013colorful,
  title={A colorful approach to text processing by example},
  author={Yessenov, Kuat and 
  Tulsiani, Shubham and 
  Menon, Aditya and 
  Miller, Robert C and 
  Gulwani, Sumit and
  Lampson, Butler and 
  Kalai, Adam},
  booktitle={UIST},
  pages={495--504},
  year={2013},
  organization={ACM}
}
            

Teaching

Image Manipulation and Computational Photography (CS194-26)
Fall 2014 (GSI)

Introduction to Artificial Intelligence (CS188)
Spring 2015 (GSI)


[Web-cite] : The webpage template is based on Jon and Bharath's websites