Deepak Pathak
Email:

I am a third-year graduate student in Computer Science at UC Berkeley, advised by Prof. Trevor Darrell. My research focuses on computer vision and machine learning.

Earlier, I graduated from IIT Kanpur in 2014 with a Bachelor's degree in Computer Science and Engineering. I did my undergraduate thesis with Prof. Amitabha Mukerjee on unsupervised anomaly detection in surveillance videos. I have also spent time at Microsoft Research's New York City lab, working on forecasting and prediction markets.

I spent Summer 2016 at Facebook AI Research (FAIR, Seattle) working with Ross Girshick, Bharath Hariharan and Piotr Dollár.

CV | Google Scholar | Github | LinkedIn

News
  • New paper on arXiv about unsupervised feature learning from unlabeled videos.
  • Paper accepted at CVPR 2016 on unsupervised learning and inpainting. Check out the project webpage!
  • Paper accepted to JMLR 2016; an extension of our CVPR'15 paper.
  • Paper about constrained structured regression (applied to intrinsics) on arXiv.
  • Paper accepted at ICCV 2015 on Constrained CNN for segmentation. Code released on GitHub!
  • Undergraduate paper on predicting the Oscars published in JPM. See live predictions.
Publications

[NEW] Learning Features by Watching Objects Move
Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell and Bharath Hariharan
arXiv:1612.06370, 2016

pdf | abstract | bibtex | arXiv

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as `pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed `pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
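The motion-to-pseudo-label idea can be sketched in a few lines. This toy version uses plain frame differencing in place of the far stronger motion segmentation the paper relies on; all function and variable names here are illustrative, not from the paper's code.

```python
import numpy as np

def motion_pseudo_mask(frame_a, frame_b, threshold=0.1):
    """Mark pixels whose intensity changes between two frames as 'moving
    object', yielding a binary pseudo ground-truth mask. A segmentation CNN
    would then be trained to predict such masks from a single frame."""
    diff = np.abs(frame_b.astype(np.float32) - frame_a.astype(np.float32))
    return (diff > threshold).astype(np.uint8)

# A static background with a small "object" that shifts one pixel right.
frame_a = np.zeros((8, 8), dtype=np.float32)
frame_a[3:5, 2:4] = 1.0
frame_b = np.zeros((8, 8), dtype=np.float32)
frame_b[3:5, 3:5] = 1.0

mask = motion_pseudo_mask(frame_a, frame_b)  # 1 where pixels changed
```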

@article{pathakARXIV16learning,
    Author = {Pathak, Deepak and
    Girshick, Ross and
    Doll{\'a}r, Piotr and
    Darrell, Trevor and
    Hariharan, Bharath},
    Title = {Learning Features
    by Watching Objects Move},
    Journal = {arXiv:1612.06370},
    Year = {2016}
}

[NEW] Context Encoders: Feature Learning by Inpainting
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell and Alexei Efros
Computer Vision and Pattern Recognition (CVPR), 2016

project webpage | pdf w/ supp | abstract | bibtex | arXiv | code

We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders -- a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. In order to succeed at this task, context encoders need both to understand the content of the entire image and to produce a plausible hypothesis for the missing part(s). When training context encoders, we have experimented with both a standard pixel-wise reconstruction loss and a reconstruction plus adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
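A minimal numpy sketch of the joint objective, assuming the discriminator is reduced to a single logit: L2 reconstruction on the missing region plus a non-saturating adversarial term for the generator. The heavy reconstruction weighting reflects the mix described above, but the exact weights here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_encoder_loss(pred, target, mask, disc_logit,
                         lam_rec=0.999, lam_adv=0.001):
    """Joint inpainting objective: masked L2 reconstruction plus an
    adversarial term that rewards fooling the discriminator. `disc_logit`
    stands in for a real discriminator network's output."""
    rec = np.mean(mask * (pred - target) ** 2)   # loss on the missing region only
    adv = -np.log(sigmoid(disc_logit) + 1e-8)    # generator's adversarial loss
    return lam_rec * rec + lam_adv * adv

# Toy 4x4 image with the central 2x2 region missing.
target = np.ones((4, 4))
pred = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
loss = context_encoder_loss(pred, target, mask, disc_logit=0.0)
```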

@inproceedings{pathakCVPR16context,
    Author = {Pathak, Deepak and
    Kr\"ahenb\"uhl, Philipp and
    Donahue, Jeff and
    Darrell, Trevor and
    Efros, Alexei},
    Title = {Context Encoders:
    Feature Learning by Inpainting},
    Booktitle = {CVPR},
    Year = {2016}
}

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Deepak Pathak, Philipp Krähenbühl and Trevor Darrell
International Conference on Computer Vision (ICCV), 2015

pdf | supp | abstract | bibtex | arXiv | code

We present an approach to learn a dense pixel-wise labeling from image-level tags. Each image-level tag imposes constraints on the output labeling of a Convolutional Neural Network (CNN) classifier. We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e. predicted label distribution) of a CNN. Our loss formulation is easy to optimize and can be incorporated directly into standard stochastic gradient descent optimization. The key idea is to phrase the training objective as a biconvex optimization for linear models, which we then relax to nonlinear deep networks. Extensive experiments demonstrate the generality of our new learning framework. The constrained loss yields state-of-the-art results on weakly supervised semantic image segmentation. We further demonstrate that adding slightly more supervision can greatly improve the performance of the learning algorithm.
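A toy version of the projection step at the heart of the constrained loss, assuming a single lower-bound constraint ("at least `min_fg` foreground pixels in expectation"). For this special case the KL projection simply rescales foreground odds by an exponentiated multiplier, which we find by bisection; names are mine, not the released code's.

```python
import numpy as np

def project_to_constraint(q, min_fg):
    """Given per-pixel (background, foreground) probabilities q of shape
    (n, 2), find the latent distribution p closest to q in KL divergence
    such that the expected foreground count is at least min_fg."""
    if q[:, 1].sum() >= min_fg:                # constraint already satisfied
        return q.copy()
    lo, hi = 0.0, 50.0
    for _ in range(100):                       # bisection on the dual variable
        lam = 0.5 * (lo + hi)
        w = q * np.array([1.0, np.exp(lam)])   # boost the foreground class
        p = w / w.sum(axis=1, keepdims=True)
        if p[:, 1].sum() < min_fg:
            lo = lam
        else:
            hi = lam
    return p

# Four pixels, each 10% foreground; require 2 expected foreground pixels.
q = np.array([[0.9, 0.1]] * 4)
p = project_to_constraint(q, min_fg=2.0)
```

The CNN is then trained toward this projected distribution `p`, which is what makes the objective compatible with ordinary stochastic gradient descent.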

@inproceedings{pathakICCV15ccnn,
    Author = {Pathak, Deepak and
    Kr\"ahenb\"uhl, Philipp and
    Darrell, Trevor},
    Title = {Constrained Convolutional
    Neural Networks for Weakly
    Supervised Segmentation},
    Booktitle = {ICCV},
    Year = {2015}
}

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning
Judy Hoffman, Deepak Pathak, Trevor Darrell and Kate Saenko
Computer Vision and Pattern Recognition (CVPR), 2015

pdf | abstract | bibtex | arXiv

Journal Version:
[NEW] Large Scale Visual Recognition through Adaptation using Joint Representation and Multiple Instance Learning
Judy Hoffman, Deepak Pathak, Eric Tzeng, Jonathan Long, Sergio Guadarrama, Trevor Darrell and Kate Saenko
Journal of Machine Learning Research (JMLR), 2016

We develop methods for detector learning which exploit joint training over both weak (image-level) and strong (bounding box) labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks. Previous methods for weak-label learning often learn detector models independently using latent variable optimization, but fail to share deep representation knowledge across classes and usually require strong initialization. Other previous methods transfer deep representations from domains with strong labels to those with only weak labels, but do not optimize over individual latent boxes, and thus may miss specific salient structures for a particular category. We propose a model that subsumes these previous approaches, and simultaneously trains a representation and detectors for categories with either weak or strong labels present. We provide a novel formulation of a joint multiple instance learning method that includes examples from classification-style data when available, and also performs domain transfer learning to improve the underlying detector representation. Our model outperforms known methods on ImageNet-200 detection with weak labels.

@inproceedings{pathakCVPR15,
    Author = {Hoffman, Judy and
    Pathak, Deepak and
    Darrell, Trevor and
    Saenko, Kate},
    Title = {Detector Discovery
    in the Wild: Joint Multiple
    Instance and Representation
    Learning},
    Booktitle = {CVPR},
    Year = {2015}
}

Fully Convolutional Multi-Class Multiple Instance Learning
Deepak Pathak, Evan Shelhamer, Jonathan Long and Trevor Darrell
International Conference on Learning Representations (ICLR), Workshop 2015

pdf | abstract | bibtex | arXiv

Multiple instance learning (MIL) can reduce the need for costly annotation in tasks such as semantic segmentation by weakening the required degree of supervision. We propose a novel MIL formulation of multi-class semantic segmentation learning by a fully convolutional network. In this setting, we seek to learn a semantic segmentation model from just weak image-level labels. The model is trained end-to-end to jointly optimize the representation while disambiguating the pixel-image label assignment. Fully convolutional training accepts inputs of any size, does not need object proposal pre-processing, and offers a pixelwise loss map for selecting latent instances. Our multi-class MIL loss exploits the further supervision given by images with multiple labels. We evaluate this approach through preliminary experiments on the PASCAL VOC segmentation challenge.
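The latent-instance selection can be sketched as follows, assuming a C x H x W score map from a fully convolutional net: for each class present at image level, the max-scoring pixel is treated as the latent instance and a softmax log-loss is applied there. This is a toy rendering of the idea; names are illustrative.

```python
import numpy as np

def mil_loss(score_map, image_labels):
    """Multi-class MIL loss over a pixelwise score map: pick the max-scoring
    pixel per present class as the latent instance, then apply a numerically
    stable softmax log-loss at that pixel."""
    C = score_map.shape[0]
    flat = score_map.reshape(C, -1)
    loss = 0.0
    for c in image_labels:
        idx = int(np.argmax(flat[c]))                  # latent pixel for class c
        logits = flat[:, idx]
        m = logits.max()
        log_z = m + np.log(np.exp(logits - m).sum())   # stable log-sum-exp
        loss += log_z - logits[c]                      # -log softmax_c
    return loss / len(image_labels)

score_map = np.zeros((3, 4, 4))
score_map[1, 2, 2] = 10.0            # class 1 fires strongly at one pixel
loss = mil_loss(score_map, image_labels=[1])
```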

@inproceedings{pathakICLR15,
    Author = {Pathak, Deepak and
    Shelhamer, Evan and
    Long, Jonathan and
    Darrell, Trevor},
    Title = {Fully Convolutional
    Multi-Class Multiple Instance
    Learning},
    Booktitle = {ICLR Workshop},
    Year = {2015}
}

Anomaly Localization in Topic-based Analysis of Surveillance Videos
Deepak Pathak, Abhijit Sharang and Amitabha Mukerjee
IEEE Winter Conference on Applications of Computer Vision (WACV), 2015

pdf | abstract | bibtex

Topic-models for video analysis have been used for unsupervised identification of normal activity in videos, thereby enabling the detection of anomalous actions. However, while intervals containing anomalies are detected, it has not been possible to localize the anomalous activities in such models. This is a challenging problem as the abnormal content is usually a small fraction of the entire video data and hence distinctions in terms of likelihood are unlikely. Here we propose a methodology to extend the topic based analysis with rich local descriptors incorporating quantized spatio-temporal gradient descriptors with image location and size information. The visual clips over this vocabulary are then represented in latent topic space using models like pLSA. Further, we introduce an algorithm to quantify the anomalous content in a video clip by projecting the learned topic space information. Using the algorithm, we detect whether the video clip is abnormal and if positive, localize the anomaly in spatio-temporal domain. We also contribute one real world surveillance video dataset for comprehensive evaluation of the proposed algorithm. Experiments are presented on the proposed and two other standard surveillance datasets.
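The projection idea can be illustrated with a toy scoring rule: a visual word that the learned topic mixture explains poorly gets a high anomaly score -log p(word | clip). This is only a sketch of the quantification step, not the paper's exact algorithm, and all names are mine.

```python
import numpy as np

def anomaly_scores(word_counts, topics, doc_topic):
    """Score each observed visual word by how poorly the topic model explains
    it: p(word | clip) = sum_z p(z | clip) p(word | z), and the score is its
    negative log. High scores localize candidate anomalies."""
    p_word = doc_topic @ topics              # (V,) word likelihoods under the model
    scores = -np.log(p_word + 1e-12)
    return scores * (word_counts > 0)        # only score words present in the clip

# Two visual words; the learned topic almost never emits word 1.
topics = np.array([[0.98, 0.02]])            # p(word | z), a single topic
doc_topic = np.array([1.0])                  # p(z | clip)
scores = anomaly_scores(np.array([1, 1]), topics, doc_topic)
```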

@inproceedings{pathakWACV15,
    Author = {Pathak, Deepak and
    Sharang, Abhijit and
    Mukerjee, Amitabha},
    Title = {Anomaly Localization
    in Topic-based Analysis of
    Surveillance Videos},
    Booktitle = {WACV},
    Year = {2015}
}

Where is my Friend? - Person identification in Social Networks
Deepak Pathak, Sai Nitish Satyavolu and Vinay P. Namboodiri
IEEE Conference on Automatic Face and Gesture Recognition (FG), 2015

pdf | abstract | bibtex

One of the interesting applications of computer vision is to be able to identify or detect persons in the real world. This problem has been posed in the context of identifying people in television series or in multi-camera networks. However, a common scenario for this problem is to be able to identify people among images prevalent on social networks. In this paper we present a method that aims to solve this problem in real-world conditions where the person can be in any pose, profile and orientation and the face itself is not always clearly visible. Moreover, we show that the problem can be solved with supervision as weak as a single label indicating whether the person is present or not, which is usually the case as people are tagged in social networks. This is challenging as there can be ambiguity in associating the right person. The problem is solved in this setting using a latent max-margin formulation where the identity of the person is the latent parameter that is classified. This framework builds on off-the-shelf computer vision techniques for person detection and face detection and is able to account for the inaccuracies of these components. The idea is to model the complete person in addition to the face, and to do so with only weak supervision. We also contribute three real-world datasets that we have created for extensive evaluation of the solution. We show using these datasets that the problem can be effectively solved using the proposed method.
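The latent max-margin formulation can be sketched with a simple hinge loss, assuming each image comes with a set of candidate person detections (rows of a feature matrix): the latent variable selects the best-scoring candidate, and that score is compared against the weak image-level tag. All names are illustrative.

```python
import numpy as np

def latent_hinge_loss(w, images, labels):
    """Latent max-margin sketch: per image, score = max over candidate
    detections of w . x (the latent choice), then apply a hinge loss against
    the weak label (+1 person present, -1 absent)."""
    total = 0.0
    for feats, y in zip(images, labels):
        score = np.max(feats @ w)            # best latent candidate's score
        total += max(0.0, 1.0 - y * score)
    return total / len(images)

w = np.array([1.0, 0.0])
images = [np.array([[2.0, 0.0], [0.0, 1.0]]),   # tagged image: a good candidate exists
          np.array([[-1.0, 0.0]])]              # untagged image: all candidates score low
loss = latent_hinge_loss(w, images, labels=[1, -1])
```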

@inproceedings{pathakFG15,
    Author = {Pathak, Deepak and
    Satyavolu, Sai Nitish and
    Namboodiri, Vinay P.},
    Title = {Where is my Friend? -
    Person identification in Social
    Networks},
    Booktitle = {Automatic Face and
    Gesture Recognition (FG)},
    Year = {2015}
}

A Comparison Of Forecasting Methods: Fundamentals, Polling, Prediction Markets, and Experts
Deepak Pathak, David Rothschild and Miro Dudík
Journal of Prediction Markets (JPM), 2015

pdf | abstract | bibtex | predictions2014 | predictions2016

We compare Oscar forecasts derived from four data types (fundamentals, polling, prediction markets, and domain experts) across three attributes (accuracy, timeliness and cost effectiveness). Fundamentals-based forecasts are relatively expensive to construct, an attribute the academic literature frequently ignores, and update slowly over time, constraining their accuracy. However, fundamentals provide valuable insights into the relationship between key indicators for nominated movies and their chances of victory. For instance, we find that the performance in other awards shows is highly predictive of the Oscar victory whereas box office results are not. Polling-based forecasts have the potential to be both accurate and timely. Timeliness requires incentives for frequent responses by high-information users. Accuracy is achieved by a proper transformation of raw polls. Prediction market prices are accurate forecasts, but can be further improved by simple transformations of raw prices, yielding the most accurate forecasts in our study. Expert forecasts exhibit some characteristics of fundamental models, but are generally not comparatively accurate or timely. This study is unique in both comparing and aggregating four traditional data sources, and considering critical attributes beyond accuracy. We believe that the results of this study generalize to many other domains.
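As an example of the kind of simple transformation of raw prices used in this literature, one can push a market price away from 0.5 to correct favorite-longshot bias. This sketch is illustrative only; the exponent below is not the value fit in the paper.

```python
def debias(p, theta=1.6):
    """Push a raw probability p toward the extremes: prices above 0.5 move
    up, prices below 0.5 move down, and 0.5 is a fixed point. A standard
    power-law debiasing form; theta here is an illustrative choice."""
    return p ** theta / (p ** theta + (1 - p) ** theta)
```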

@article{pathakJPM15,
    Author = {Pathak, Deepak and
    Rothschild, David and
    Dud{\'\i}k, Miro},
    Title = {A Comparison Of Forecasting
    Methods: Fundamentals, Polling,
    Prediction Markets, and Experts},
    Journal = {Journal of Prediction Markets (JPM)},
    Year = {2015}
}
Technical Reports

[NEW] Constrained Structured Regression with Convolutional Neural Networks
Deepak Pathak, Philipp Krähenbühl, Stella X. Yu and Trevor Darrell
arXiv:1511.07497, 2015

pdf | abstract | bibtex | arXiv

Convolutional Neural Networks (CNNs) have recently emerged as the dominant model in computer vision. If provided with enough training data, they predict almost any visual quantity. In a discrete setting, such as classification, CNNs are not only able to predict a label but often predict a confidence in the form of a probability distribution over the output space. In continuous regression tasks, such a probability estimate is often lacking. We present a regression framework which models the output distribution of neural networks. This output distribution allows us to infer the most likely labeling following a set of physical or modeling constraints. These constraints capture the intricate interplay between different input and output variables, and complement the output of a CNN. However, they may not hold everywhere. Our setup further allows us to learn the confidence with which a constraint holds, in the form of a distribution over constraint satisfaction. We evaluate our approach on the problem of intrinsic image decomposition, and show that constrained structured regression significantly improves on the state of the art.
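For intuition, consider the intrinsic-image constraint in the log domain, r + s = log(image). If the network predicts independent Gaussians over log-reflectance and log-shading, enforcing the constraint at inference has a closed form: each estimate shifts in proportion to its predicted variance. This is a toy rendering of the idea, with names of my own choosing.

```python
def constrained_map(mu_r, var_r, mu_s, var_s, log_image):
    """MAP inference under the linear constraint r + s = log_image, given
    Gaussian predictions (mu, var) for log-reflectance r and log-shading s.
    The more uncertain quantity (larger variance) absorbs more of the
    residual, so the constraint holds exactly after the update."""
    resid = log_image - (mu_r + mu_s)
    r = mu_r + var_r / (var_r + var_s) * resid
    s = mu_s + var_s / (var_r + var_s) * resid
    return r, s

# Shading is predicted with 3x the variance, so it absorbs 3/4 of the residual.
r, s = constrained_map(mu_r=1.0, var_r=1.0, mu_s=2.0, var_s=3.0, log_image=7.0)
```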

@article{pathakArxiv15,
    Author = {Pathak, Deepak and
    Kr\"ahenb\"uhl, Philipp and
    Yu, Stella X. and
    Darrell, Trevor},
    Title = {Constrained Structured
    Regression with Convolutional
    Neural Networks},
    Journal = {arXiv:1511.07497},
    Year = {2015}
}
Teaching

CS189/289: Introduction to Machine Learning - Fall '15 (GSI)
Instructors: Prof. Alexei Efros and Dr. Isabelle Guyon

CS280: Computer Vision - Spring '16 (GSI)
Instructors: Prof. Trevor Darrell and Prof. Alexei Efros


Template: this, this, this and this