I am a graduate student with Prof. Jitendra Malik in the Vision group at University of California, Berkeley. I did my undergraduate at the Indian Institute of Technology, Kanpur under the supervision of Prof. Surender Baswana. I also did an undergraduate internship at Microsoft Research, New England with Dr. Adam Kalai.

email | cv | github | google scholar

shubhtuls AT cs DOT berkeley DOT edu
My picture

Publications

[New] Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency
Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik
CVPR, 2017 (oral)
pdf   project page   abstract   bibtex   code   blog post

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations e.g. foreground masks, depth, color images, semantics etc. as supervision for learning single-view 3D prediction. We present empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset.

@inProceedings{drcTulsiani17,
  title={Multi-view Supervision for Single-view Reconstruction
  via Differentiable Ray Consistency},
  author = {Shubham Tulsiani
  and Tinghui Zhou
  and Alexei A. Efros
  and Jitendra Malik},
  booktitle={Computer Vision and Pattern Regognition (CVPR)},
  year={2017}
}
            

[New] Learning Shape Abstractions by Assembling Volumetric Primitives
Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik
CVPR, 2017
pdf   project page   abstract   bibtex   code

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.

@inProceedings{abstractionTulsiani17,
  title={Learning Shape Abstractions by Assembling Volumetric Primitives},
  author = {Shubham Tulsiani
  and Hao Su
  and Leonidas J. Guibas
  and Alexei A. Efros
  and Jitendra Malik},
  booktitle={Computer Vision and Pattern Regognition (CVPR)},
  year={2017}
}
            

[New] Hierarchical Surface Prediction for 3D Object Reconstruction
Christian Häne, Shubham Tulsiani, Jitendra Malik
arXiv, April 2017
preprint   abstract   bibtex

Recently, Convolutional Neural Networks have shown promising results for 3D geometry prediction. They can make predictions from very little input data such as for example a single color image, depth map or a partial 3D volume. A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well. We propose a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids. The main insight is that it is sufficient to predict high resolution voxels around the predicted surfaces. The exterior and interior of the objects can be represented with coarse resolution voxels. This allows us to predict significantly higher resolution voxel grids around the surface, from which triangle meshes can be extracted. Our approach is general and not dependent on a specific input type. In our experiments we show results for geometry prediction from color images, depth images and shape completion from partial voxel grids. Our analysis shows that the network is able to predict the surface more accurately than a low resolution prediction.

@incollection{hspHane17,
author = {Christian H{\"a}ne and
Shubham Tulsiani and 
Jitendra Malik},
title = {Hierarchical Surface Prediction for 3D Object Reconstruction},
booktitle = {arXiv},
year = {2017}
}
            

Learning Category-Specific Deformable 3D Models for Object Reconstruction
Shubham Tulsiani*, Abhishek Kar*, João Carreira, Jitendra Malik
TPAMI, 2016
* equal contribution
pdf   abstract   bibtex   code

We address the problem of fully automatic object localization and reconstruction from a single image. This is both a very challenging and very important problem which has, until recently, received limited attention due to difficulties in segmenting objects and predicting their poses. Here we leverage recent advances in learning convolutional networks for object detection and segmentation and introduce a complementary network for the task of camera viewpoint prediction. These predictors are very powerful, but still not perfect given the stringent requirements of shape reconstruction. Our main contribution is a new class of deformable 3D models that can be robustly fitted to images based on noisy pose and silhouette estimates computed upstream and that can be learned directly from 2D annotations available in object detection datasets. Our models capture top-down information about the main global modes of shape variation within a class providing a ``low-frequency'' shape. In order to capture fine instance-specific shape details, we fuse it with a high-frequency component recovered from shading cues. A comprehensive quantitative analysis and ablation study on the PASCAL 3D+ dataset validates the approach as we show fully automatic reconstructions on PASCAL VOC as well as large improvements on the task of viewpoint prediction.

@article{pamishapeTulsianiKCM15,
author = {Shubham Tulsiani and
Abhishek Kar and
Jo{\~{a}}o Carreira and
Jitendra Malik},
title = {Learning Category-Specific Deformable 3D
Models for Object Reconstruction},
journal = {TPAMI},
year = {2016},
}
            

View Synthesis by Appearance Flow
Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros
ECCV, 2016
pdf   abstract   bibtex   code

Given one or more images of an object (or a scene), is it possible to synthesize a new image of the same instance observed from an arbitrary viewpoint? In this paper, we attempt to tackle this problem, known as novel view synthesis, by re-formulating it as a pixel copying task that avoids the notorious difficulties of generating pixels from scratch. Our approach is built on the observation that the visual appearance of different views of the same instance is highly correlated. Such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict \emph{appearance flows} -- 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. We show that for both objects and scenes, our approach is able to generate higher-quality synthesized views with crisp texture and boundaries than previous CNN-based techniques.

@incollection{appFlowZhou16,
author = {Tinghui Zhou and 
Shubham Tulsiani and 
Weilun Sun and
Jitendra Malik and
Alexei A. Efros},
title = {View Synthesis by Appearance Flow},
booktitle = {ECCV},
year = {2016}
}
            

The three R's of computer vision: Recognition, Reconstruction and Reorganization
Jitendra Malik, Pablo Arbelaez, João Carreira, Katerina Fragkiadaki, Ross Girshick, Georgia Gkioxari, Saurabh Gupta, Bharath Hariharan, Abhishek Kar, Shubham Tulsiani
Pattern Recognition Letters, 2016
pdf   abstract   bibtex

We argue for the importance of the interaction between recognition, reconstruction and re-organization, and propose that as a unifying framework for computer vision. In this view, recognition of objects is reciprocally linked to re-organization, with bottom-up grouping processes generating candidates, which can be classified using top down knowledge, following which the segmentations can be refined again. Recognition of 3D objects could benefit from a reconstruction of 3D structure, and 3D reconstruction can benefit from object category-specific priors. We also show that reconstruction of 3D structure from video data goes hand in hand with the reorganization of the scene. We demonstrate pipelined versions of two systems, one for RGB-D images, and another for RGB images, which produce rich 3D scene interpretations in this framework.

@article{malik2016three,
title={The three R's of computer vision:
  Recognition, reconstruction and reorganization},
author={Malik, Jitendra and
  Arbel{\'a}ez, Pablo and
  Carreira, Jo{\~a}o and
Fragkiadaki, Katerina and
Girshick, Ross and
Gkioxari, Georgia and
Gupta, Saurabh and
Hariharan, Bharath and
Kar, Abhishek and
Tulsiani, Shubham},
journal={Pattern Recognition Letters},
volume={72},
pages={4--14},
year={2016},
publisher={North-Holland}
}
            

Pose Induction for Novel Object Categories
Shubham Tulsiani, João Carreira, Jitendra Malik
ICCV, 2015
pdf   abstract   bibtex   code

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes. We present a generalized classifier that can reliably induce pose given a single instance of a novel category. In case of availability of a large collection of novel instances, our approach then jointly reasons over all instances to improve the initial estimates. We empirically validate the various components of our algorithm and quantitatively show that our method produces reliable pose estimates. We also show qualitative results on a diverse set of classes and further demonstrate the applicability of our system for learning shape models of novel object classes.

@inProceedings{poseInductionTCM15,
  author    = {Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Pose Induction for Novel Object Categories},
  year={2015},
  booktitle={International Conference on Computer Vision (ICCV)}
}
            

Amodal Completion and Size Constancy in Natural Scenes
Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
ICCV, 2015
pdf   abstract   bibtex

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image. There are several technical challenges to this, such as occlusions, lack of calibration data and the scale ambiguity between object size and distance. These have not been addressed in full generality in previous work. Here we propose to tackle these issues by building upon advances in object recognition and using recently created large-scale datasets. We first introduce the task of amodal bounding box completion, which aims to infer the the full extent of the object instances in the image. We then propose a probabilistic framework for learning category-specific object size distributions from available annotations and leverage these in conjunction with amodal completions to infer veridical sizes of objects in novel images. Finally, we introduce a focal length prediction approach that exploits scene recognition to overcome inherent scale ambiguities and demonstrate qualitative results on challenging real-world scenes.

 @inProceedings{amodalKTCM15,
  author    = {Abhishek Kar and
               Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Amodal Completion and Size Constancy in Natural Scenes},
  year={2015},
  booktitle={International Conference on Computer Vision (ICCV)}
 }
            

Viewpoints and Keypoints
Shubham Tulsiani, Jitendra Malik
CVPR, 2015
pdf   abstract   bibtex   supplemental (40 MB)   code

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details. We address both these tasks in two different settings - the constrained setting with known bounding boxes and the more challenging detection setting where the aim is to simultaneously detect and correctly estimate pose of objects. We present Convolutional Neural Network based architectures for these and demonstrate that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions. In addition to achieving significant improvements over state-of-the-art in the above tasks, we analyze the error modes and effect of object characteristics on performance to guide future efforts towards this goal.

@inProceedings{vpsKpsTulsianiM15,
author    = {Shubham Tulsiani and Jitendra Malik},
title     = {Viewpoints and Keypoints},
year={2015},
booktitle={Computer Vision and Pattern Regognition (CVPR)}
}

Category-Specific Object Reconstruction from a Single Image
Abhishek Kar*, Shubham Tulsiani*, João Carreira, Jitendra Malik
CVPR, 2015 (Best Student Paper Award)
* equal contribution
pdf   project page   abstract   bibtex   code

Object reconstruction from a single image - in the wild - is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.

@inProceedings{shapesKarTCM15,
  author    = {Abhishek Kar and
               Shubham Tulsiani and
               Jo{\~{a}}o Carreira and
               Jitendra Malik},
  title     = {Category-Specific Object Reconstruction from a Single Image},
  year={2015},
  booktitle={Computer Vision and Pattern Regognition (CVPR)}
}
            

Virtual View Networks for Object Reconstruction
João Carreira, Abhishek Kar, Shubham Tulsiani, Jitendra Malik
CVPR, 2015
pdf   abstract   bibtex   video   code

All that structure from motion algorithms "see" are sets of 2D points. We show that these impoverished views of the world can be faked for the purpose of reconstructing objects in challenging settings, such as from a single image, or from a few ones far apart, by recognizing the object and getting help from a collection of images of other objects from the same class. We synthesize virtual views by computing geodesics on novel networks connecting objects with similar viewpoints, and introduce techniques to increase the specificity and robustness of factorization-based object reconstruction in this setting. We report accurate object shape reconstruction from a single image on challenging PASCAL VOC data, which suggests that the current domain of applications of rigid structure-from-motion techniques may be significantly extended.

@inProceedings{vvnCarreiraKTM15,
  author    = {Jo{\~{a}}o Carreira and
               Abhishek Kar and
               Shubham Tulsiani and
               Jitendra Malik},
  title     = {Virtual View Networks for Object Reconstruction},
  year={2015},
  booktitle={Computer Vision and Pattern Regognition (CVPR)}
}

A colorful approach to text processing by example
Kuat Yessenov, Shubham Tulsiani, Aditya Menon, Robert C Miller, Sumit Gulwani, Butler Lampson, Adam Kalai
UIST, 2013
pdf   abstract   bibtex

Text processing, tedious and error-prone even for programmers, remains one of the most alluring targets of Programming by Example. An examination of real-world text processing tasks found on help forums reveals that many such tasks, beyond simple string manipulation, involve latent hierarchical structures.
We present STEPS, a programming system for processing structured and semi-structured text by example. STEPS users create and manipulate hierarchical structure by example. In a between-subject user study on fourteen computer scientists, STEPS compares favorably to traditional programming.

@inproceedings{yessenov2013colorful,
  title={A colorful approach to text processing by example},
  author={Yessenov, Kuat and 
  Tulsiani, Shubham and 
  Menon, Aditya and 
  Miller, Robert C and 
  Gulwani,Sumit and 
  Lampson, Butler and 
  Kalai, Adam},
  booktitle={UIST},
  pages={495--504},
  year={2013},
  organization={ACM}
}
            

Teaching

Image Manipulation and Computational Photography (CS194-26)
Fall 2014 (GSI)

Introduction to Artificial Intelligence (CS188)
Spring 2015 (GSI)


[Web-cite] : The webpage template is based on Jon and Bharath's websites