Saurabh Gupta

I am a Computer Science graduate student at UC Berkeley. I am interested in computer vision and robotics, and am advised by Jitendra Malik. Earlier, I graduated from IIT Delhi with a Bachelor's in Computer Science and Engineering and was awarded the President of India Gold Medal.

I will be starting as an Assistant Professor at UIUC in Fall 2019.

Email / CV / Scholar / GitHub / Research Statement
Publications

Visual Navigation
Unifying Map and Landmark Based Representations for Visual Navigation
Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik
arXiv, 2017
abstract / bibtex / webpage / arXiv link

This work presents a formulation for visual navigation that unifies map based spatial reasoning and path planning, with landmark based robust plan execution in noisy environments. Our proposed formulation is learned from data and is thus able to leverage statistical regularities of the world. This allows it to efficiently navigate in novel environments given only a sparse set of registered images as input for building representations for space. Our formulation is based on three key ideas: a learned path planner that outputs path plans to reach the goal, a feature synthesis engine that predicts features for locations along the planned path, and a learned goal-driven closed loop controller that can follow plans given these synthesized features. We test our approach for goal-driven navigation in simulated real world environments and report performance gains over competitive baseline approaches.

@article{gupta2017unifying,
author = "Gupta, Saurabh and Fouhey, David and Levine, Sergey and Malik, Jitendra",
title = "Unifying Map and Landmark based Representations for Visual Navigation",
journal = "arXiv preprint arXiv:1712.08125",
year = "2017"
}
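If it helps to see the three components concretely, here is a minimal PyTorch sketch of how a learned path planner, a feature synthesis module, and a goal-driven closed-loop controller could be wired together. The module shapes and interfaces are assumptions for illustration, not the paper's code.

# A hypothetical sketch of the three-component design described in the abstract above.
# All module shapes and names are assumptions for illustration.
import torch
import torch.nn as nn

class PathPlanner(nn.Module):
    """Predicts a coarse plan (a grid of per-cell action scores) from a spatial map."""
    def __init__(self, map_channels=16, num_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(map_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_actions, 3, padding=1))
    def forward(self, map_feats):
        return self.net(map_feats)

class FeatureSynthesis(nn.Module):
    """Predicts image features expected at a location along the planned path."""
    def __init__(self, map_channels=16, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(map_channels + 2, feat_dim)  # map cell features + (x, y) offset
    def forward(self, cell_feats, xy):
        return self.fc(torch.cat([cell_feats, xy], dim=-1))

class ClosedLoopController(nn.Module):
    """Chooses an action by comparing synthesized (expected) and observed features."""
    def __init__(self, feat_dim=128, num_actions=4):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, num_actions)
    def forward(self, synthesized, observed):
        return self.fc(torch.cat([synthesized, observed], dim=-1))

# Wiring the pieces together on random tensors.
planner, synth, ctrl = PathPlanner(), FeatureSynthesis(), ClosedLoopController()
map_feats = torch.randn(1, 16, 32, 32)           # spatial map built from registered images
plan = planner(map_feats)                        # per-cell action scores
cell = map_feats[:, :, 10, 12]                   # features of a cell on the planned path
expected = synth(cell, torch.tensor([[0.1, -0.2]]))
observed = torch.randn(1, 128)                   # features of the current first-person view
action_logits = ctrl(expected, observed)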
Cognitive Mapping and Planning for Visual Navigation
Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
Computer Vision and Pattern Recognition (CVPR), 2017
abstract / bibtex / website / arXiv link / code+simulation environment

We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person views and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the planner, and b) a spatial memory with the ability to plan given an incomplete set of observations about the world. CMP constructs a top-down belief map of the world and applies a differentiable neural net planner to produce the next action at each time step. The accumulated belief of the world enables the agent to track visited regions of the environment. Our experiments demonstrate that CMP outperforms both reactive strategies and standard memory-based architectures and performs well in novel environments. Furthermore, we show that CMP can also achieve semantically specified goals, such as "go to a chair".

@inproceedings{gupta2017cognitive,
author = "Gupta, Saurabh and Davidson, James and Levine, Sergey and Sukthankar, Rahul and Malik, Jitendra",
title = "Cognitive mapping and planning for visual navigation",
booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
year = "2017"
}
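For readers curious how a differentiable mapper and planner can be composed, the sketch below accumulates egocentric observations into a belief map and runs a few value-iteration-style updates on it. The update rules and shapes are illustrative assumptions rather than the released CMP implementation.

# A minimal sketch of the two ideas in the CMP abstract above: a belief map that
# accumulates observations, and a differentiable planner that runs a few
# value-iteration-like updates on that map. Shapes and update rules are assumptions.
import torch
import torch.nn as nn

class Mapper(nn.Module):
    """Turns a first-person view into an egocentric top-down belief update."""
    def __init__(self, map_size=32):
        super().__init__()
        self.map_size = map_size
        self.encoder = nn.Sequential(nn.Linear(512, map_size * map_size), nn.Sigmoid())
    def forward(self, view_feats, belief):
        update = self.encoder(view_feats).view(-1, 1, self.map_size, self.map_size)
        return torch.max(belief, update)   # accumulate free-space evidence

class DifferentiablePlanner(nn.Module):
    """Value-iteration-style updates implemented with a conv and a max over actions."""
    def __init__(self, num_actions=4, iters=10):
        super().__init__()
        self.q = nn.Conv2d(2, num_actions, 3, padding=1)  # (belief, value) -> Q-values
        self.iters = iters
    def forward(self, belief, goal_map):
        value = goal_map
        for _ in range(self.iters):
            q = self.q(torch.cat([belief, value], dim=1))
            value = q.max(dim=1, keepdim=True).values
        return value

mapper, planner = Mapper(), DifferentiablePlanner()
belief = torch.zeros(1, 1, 32, 32)
goal = torch.zeros(1, 1, 32, 32)
goal[0, 0, 5, 5] = 1.0
view = torch.randn(1, 512)                  # CNN features of the current view
belief = mapper(view, belief)
value = planner(belief, goal)               # value map; the action is read off near the agent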

3D Scene Understanding
Factoring Shape, Pose, and Layout From the 2D Image of a 3D Scene
Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei Efros, Jitendra Malik
arXiv, 2017
abstract / bibtex / webpage / arXiv link / code

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments evaluate a number of practical design questions, demonstrate that we can infer this representation, and quantitatively and qualitatively demonstrate its merits compared to alternate representations.

@article{tulsiani2017factoring,
author = "Tulsiani, Shubham and Gupta, Saurabh and Fouhey, David and Efros, Alexei A and Malik, Jitendra",
title = "Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene",
journal = "arXiv preprint arXiv:1712.01812",
year = "2017"
}
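A rough sketch of what a factored prediction could look like in code: one head for the scene layout and per-object heads for shape and pose. The backbone, head sizes, and parameterizations below are placeholders, not the paper's architecture.

# An illustrative multi-head predictor for the factored representation described above.
import torch
import torch.nn as nn

class FactoredScenePredictor(nn.Module):
    def __init__(self, feat_dim=256, layout_dim=64 * 64, shape_dim=32, pose_dim=7):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 7, stride=4), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                                      nn.Linear(16 * 4 * 4, feat_dim), nn.ReLU())
        self.layout_head = nn.Linear(feat_dim, layout_dim)   # enclosing surfaces
        self.shape_head = nn.Linear(feat_dim, shape_dim)     # per-object shape code
        self.pose_head = nn.Linear(feat_dim, pose_dim)       # translation + rotation + scale

    def forward(self, image, object_feats):
        scene = self.backbone(image)
        layout = self.layout_head(scene)
        shapes = self.shape_head(object_feats)   # one row of features per detected object
        poses = self.pose_head(object_feats)
        return layout, shapes, poses

model = FactoredScenePredictor()
image = torch.randn(1, 3, 224, 224)
object_feats = torch.randn(5, 256)               # features for 5 candidate objects
layout, shapes, poses = model(image, object_feats)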
Aligning 3D Models to RGB-D Images of Cluttered Scenes
Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik
Computer Vision and Pattern Recognition (CVPR), 2015
abstract / bibtex / arXiv link / poster

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene and then using a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel surface normals in images containing renderings of synthetic objects. When tested on real data, our method outperforms alternative algorithms trained on real data. We then use this coarse pose estimate along with the inferred pixel support to align a small number of prototypical models to the data, and place into the scene the model that fits best. We observe a 48% relative improvement in performance at the task of 3D detection over the current state-of-the-art, while being an order of magnitude faster.

@inproceedings{gupta2015aligning,
author = "Gupta, Saurabh and Arbel{\'a}ez, Pablo and Girshick, Ross and Malik, Jitendra",
title = "Aligning 3D models to RGB-D images of cluttered scenes",
booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
pages = "4731--4740",
year = "2015"
}
Learning Rich Features From RGB-D Images for Object Detection and Segmentation
Saurabh Gupta, Ross Girshick, Pablo Arbelaez, Jitendra Malik
European Conference on Computer Vision (ECCV), 2014
abstract / bibtex / code / supplementary material / poster / slides / pretrained SUN RGB-D models / pretrained NYUD2 models

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.

@inproceedings{gupta2014learning,
author = "Gupta, Saurabh and Girshick, Ross and Arbel{\'a}ez, Pablo and Malik, Jitendra",
title = "Learning rich features from RGB-D images for object detection and segmentation",
booktitle = "European Conference on Computer Vision",
pages = "345--360",
year = "2014",
organization = "Springer, Cham"
}
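The geocentric embedding can be approximated in a few lines of numpy: back-project the depth map, estimate crude surface normals, and compute disparity, height along an assumed gravity direction, and angle with gravity. This is a simplification of the paper's released code, with hypothetical camera intrinsics and a fixed gravity estimate.

# A rough sketch of the geocentric encoding idea from the abstract above.
import numpy as np

def geocentric_encoding(depth, fx=570.0, fy=570.0, cx=320.0, cy=240.0):
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project to 3D camera coordinates (meters).
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    z = depth
    points = np.stack([x, y, z], axis=-1)

    # Surface normals from local depth gradients (a crude stand-in for plane fitting).
    dzdx = np.gradient(z, axis=1)
    dzdy = np.gradient(z, axis=0)
    normals = np.stack([-dzdx, -dzdy, np.ones_like(z)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8

    gravity = np.array([0.0, 1.0, 0.0])           # assume gravity along the image y-axis
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1.0, 1.0)))

    disparity = 1.0 / (depth + 1e-8)              # horizontal disparity ~ inverse depth
    height = points @ gravity                     # signed height along gravity
    height -= height.min()                        # height above the lowest observed point
    return np.stack([disparity, height, angle], axis=-1)

encoding = geocentric_encoding(np.random.uniform(0.5, 5.0, size=(480, 640)))
print(encoding.shape)  # (480, 640, 3)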
Indoor Scene Understanding With RGB-D Images: Bottom-Up Segmentation, Object Detection and Semantic Segmentation
Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik
International Journal of Computer Vision (IJCV), 2015
abstract / bibtex / code / dev code

In this paper, we address the problems of contour detection, bottom-up grouping, object detection and semantic segmentation on RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We train RGB-D object detectors by analyzing and computing Histogram of Oriented Gradients (HOG) on the depth image and using them with deformable part models (DPM). We observe that this simple strategy for training object detectors significantly outperforms more complicated models in the literature. We then turn to the problem of semantic segmentation for which we propose an approach that classifies superpixels into the dominant object categories in the NYUD2 dataset. We design generic and class-specific features to encode the appearance and geometry of objects. We also show that additional features computed from RGB-D object detectors and scene classifiers further improve semantic segmentation accuracy. In all of these tasks, we report significant improvements over the state-of-the-art.

@article{gupta2015indoor,
author = "Gupta, Saurabh and Arbel{\'a}ez, Pablo and Girshick, Ross and Malik, Jitendra",
title = "Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation",
journal = "International Journal of Computer Vision",
volume = "112",
number = "2",
pages = "133--149",
year = "2015",
publisher = "Springer US"
}
Perceptual Organization and Recognition of Indoor Scenes From RGB-D Images
Saurabh Gupta, Pablo Arbelaez, Jitendra Malik
Computer Vision and Pattern Recognition (CVPR), 2013
abstract / bibtex / code / dev code / supp / poster / slides / data

We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.

@inproceedings{gupta2013perceptual,
author = "Gupta, Saurabh and Arbelaez, Pablo and Malik, Jitendra",
title = "Perceptual organization and recognition of indoor scenes from RGB-D images",
booktitle = "Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on",
pages = "564--571",
year = "2013",
organization = "IEEE"
}
The Three R's of Computer Vision: Recognition, Reconstruction and Reorganization
Jitendra Malik, Pablo Arbelaez, Joao Carreira, Katerina Fragkiadaki, Ross Girshick, Georgia Gkioxari, Saurabh Gupta, Bharath Hariharan, Abhishek Kar, Shubham Tulsiani
Pattern Recognition Letters, 2016
abstract / bibtex

We argue for the importance of the interaction between recognition, reconstruction and re-organization, and propose that as a unifying framework for computer vision. In this view, recognition of objects is reciprocally linked to re-organization, with bottom-up grouping processes generating candidates, which can be classified using top down knowledge, following which the segmentations can be refined again. Recognition of 3D objects could benefit from a reconstruction of 3D structure, and 3D reconstruction can benefit from object category-specific priors. We also show that reconstruction of 3D structure from video data goes hand in hand with the reorganization of the scene. We demonstrate pipelined versions of two systems, one for RGB-D images, and another for RGB images, which produce rich 3D scene interpretations in this framework.

@article{malik2016three,
author = "Malik, Jitendra and Arbel{\'a}ez, Pablo and Carreira, Joao and Fragkiadaki, Katerina and Girshick, Ross and Gkioxari, Georgia and Gupta, Saurabh and Hariharan, Bharath and Kar, Abhishek and Tulsiani, Shubham",
title = "The three R's of computer vision: Recognition, reconstruction and reorganization",
journal = "Pattern Recognition Letters",
volume = "72",
pages = "4--14",
year = "2016",
publisher = "North-Holland"
}

Cross Modal Learning
Cross Modal Distillation for Supervision Transfer
Saurabh Gupta, Judy Hoffman, Jitendra Malik
Computer Vision and Pattern Recognition (CVPR), 2016
abstract / bibtex / arXiv link / data / NYUD2 Detectors + Supervision Transfer Models

In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as supervisory signal for training representations for a new unlabeled paired modality. Our method enables learning of rich representations for unlabeled modalities and can be used as a pre-training procedure for new modalities with limited labeled data. We transfer supervision from labeled RGB images to unlabeled depth and optical flow images and demonstrate large improvements for both these cross modal supervision transfers.

@inproceedings{gupta2016cross,
author = "Gupta, Saurabh and Hoffman, Judy and Malik, Jitendra",
title = "Cross modal distillation for supervision transfer",
booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
pages = "2827--2836",
year = "2016"
}
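The core training signal is easy to sketch: freeze a network trained on the labeled modality and regress a new network for the paired unlabeled modality onto its mid-level features. The toy architectures and layer choice below are assumptions for illustration.

# A minimal sketch of the supervision-transfer idea described above.
import torch
import torch.nn as nn

def small_cnn(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 32, 5, stride=2, padding=2), nn.ReLU(),
                         nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU())

rgb_net = small_cnn(3).eval()          # pretrained on the labeled modality; frozen
for p in rgb_net.parameters():
    p.requires_grad_(False)
depth_net = small_cnn(1)               # trained from scratch on the unlabeled modality

optimizer = torch.optim.SGD(depth_net.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.MSELoss()

rgb, depth = torch.randn(8, 3, 64, 64), torch.randn(8, 1, 64, 64)  # paired, unlabeled
target = rgb_net(rgb)                  # mid-level features act as the supervisory signal
pred = depth_net(depth)
loss = loss_fn(pred, target)
loss.backward()
optimizer.step()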
Learning With Side Information Through Modality Hallucination
Judy Hoffman, Saurabh Gupta, Trevor Darrell
Computer Vision and Pattern Recognition (CVPR), 2016
abstract / bibtex

We present a modality hallucination architecture for training an RGB object detection model which incorporates depth side information at training time. Our convolutional hallucination network learns a new and complementary RGB image representation which is taught to mimic convolutional mid-level features from a depth network. At test time images are processed jointly through the RGB and hallucination networks to produce improved detection performance. Thus, our method transfers information commonly extracted from depth training data to a network which can extract that information from the RGB counterpart. We present results on the standard NYUDv2 dataset and report improvement on the RGB detection task.

@inproceedings{hoffman2016learning,
author = "Hoffman, Judy and Gupta, Saurabh and Darrell, Trevor",
title = "Learning with side information through modality hallucination",
booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
pages = "826--834",
year = "2016"
}
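A toy version of the hallucination setup: an extra RGB branch is trained to mimic a depth network's mid-level features, and at test time the RGB and hallucination branches are used together with no depth input. The architectures and the fusion into the classifier are illustrative assumptions.

# A minimal sketch of the hallucination idea described above.
import torch
import torch.nn as nn

def branch(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 32, 5, stride=2, padding=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

rgb_branch = branch(3)
depth_branch = branch(1).eval()                 # trained with depth; frozen teacher
hallucination_branch = branch(3)                # takes RGB, learns to mimic depth features
classifier = nn.Linear(32 + 32, 20)             # consumes RGB + hallucinated features

rgb, depth = torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)
# Training-time hallucination loss: match the depth branch's mid-level features.
halluc_loss = nn.functional.mse_loss(hallucination_branch(rgb), depth_branch(depth))

# Test time: only RGB is available; both branches process it jointly.
features = torch.cat([rgb_branch(rgb), hallucination_branch(rgb)], dim=1)
scores = classifier(features)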
Cross-Modal Adaptation for RGB-D Detection
Judy Hoffman, Saurabh Gupta, Jian Leong, Sergio Guadarrama, Trevor Darrell
International Conference on Robotics and Automation (ICRA), 2016
abstract / bibtex

In this paper we propose a technique to adapt convolutional neural network (CNN) based object detectors trained on RGB images to effectively leverage depth images at test time to boost detection performance. Given labeled depth images for a handful of categories we adapt an RGB object detector for a new category such that it can now use depth images in addition to RGB images at test time to produce more accurate detections. Our approach is built upon the observation that lower layers of a CNN are largely task and category agnostic and domain specific while higher layers are largely task and category specific while being domain agnostic. We operationalize this observation by proposing a mid-level fusion of RGB and depth CNNs. Experimental evaluation on the challenging NYUD2 dataset shows that our proposed adaptation technique results in an average 21% relative improvement in detection performance over an RGB-only baseline even when no depth training data is available for the particular category evaluated. We believe our proposed technique will extend advances made in computer vision to RGB-D data leading to improvements in performance at little additional annotation effort.

@inproceedings{hoffman2016cross,
author = "Hoffman, Judy and Gupta, Saurabh and Leong, Jian and Guadarrama, Sergio and Darrell, Trevor",
title = "Cross-modal adaptation for RGB-D detection",
booktitle = "Robotics and Automation (ICRA), 2016 IEEE International Conference on",
pages = "5032--5039",
year = "2016",
organization = "IEEE"
}
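The mid-level fusion can be sketched as modality-specific lower layers whose features are concatenated and fed to shared upper layers; the split point and layer sizes below are assumptions for illustration.

# A small sketch of mid-level fusion of RGB and depth streams.
import torch
import torch.nn as nn

class MidLevelFusion(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.rgb_lower = nn.Sequential(nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU())
        self.depth_lower = nn.Sequential(nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU())
        self.shared_upper = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(64, num_classes))
    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_lower(rgb), self.depth_lower(depth)], dim=1)
        return self.shared_upper(fused)

model = MidLevelFusion()
scores = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))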

Human Object Interaction
Visual Semantic Role Labeling
Saurabh Gupta, Jitendra Malik
arXiv, 2015
abstract / bibtex / arXiv link / v-coco dataset

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of action classification at the image or video clip level or at best produce a bounding box around the person doing the action. We believe such an output is inadequate and a complete understanding can only come when we are able to associate objects in the scene to the different semantic roles of the action. To enable progress towards this goal, we annotate a dataset of 16K people instances in 10K images with actions they are doing and associate objects in the scene with different semantic roles for each action. Finally, we provide a set of baseline algorithms for this task and analyze error modes providing directions for future work.

@article{gupta2015visual,
author = "Gupta, Saurabh and Malik, Jitendra",
title = "Visual semantic role labeling",
journal = "arXiv preprint arXiv:1505.04474",
year = "2015"
}
Exploring Person Context and Local Scene Context for Object Detection
Saurabh Gupta*, Bharath Hariharan*, Jitendra Malik
arXiv, 2015
abstract / bibtex

In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).

@article{gupta2015exploring,
author = "Gupta*, Saurabh and Hariharan*, Bharath and Malik, Jitendra",
title = "Exploring person context and local scene context for object detection",
journal = "arXiv preprint arXiv:1511.08177",
year = "2015"
}
Semantic Segmentation Using Regions and Parts
Pablo Arbelaez, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir Bourdev, Jitendra Malik
Computer Vision and Pattern Recognition (CVPR), 2012
abstract / bibtex

We address the problem of segmenting and recognizing objects in real world images, focusing on challenging articulated categories such as humans and other animals. For this purpose, we propose a novel design for region-based object detectors that integrates efficiently top-down information from scanning-windows part models and global appearance cues. Our detectors produce class-specific scores for bottom-up regions, and then aggregate the votes of multiple overlapping candidates through pixel classification. We evaluate our approach on the PASCAL segmentation challenge, and report competitive performance with respect to current leading techniques. On VOC2010, our method obtains the best results in 6/20 categories and the highest performance on articulated objects.

@inproceedings{arbelaez2012semantic,
author = "Arbel{\'a}ez, Pablo and Hariharan, Bharath and Gu, Chunhui and Gupta, Saurabh and Bourdev, Lubomir and Malik, Jitendra",
title = "Semantic segmentation using regions and parts",
booktitle = "Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on",
pages = "3378--3385",
year = "2012",
organization = "IEEE"
}
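The vote-aggregation step can be illustrated in a few lines of numpy: accumulate each candidate region's class scores over its pixels and let every pixel take the highest-scoring class, with a background fallback. Thresholds and score semantics here are placeholders, not the paper's exact procedure.

# Accumulate class-specific region votes into a per-pixel labeling.
import numpy as np

def aggregate_region_votes(region_masks, region_scores, num_classes, bg_thresh=0.0):
    """region_masks: (R, H, W) boolean; region_scores: (R, num_classes)."""
    r, h, w = region_masks.shape
    votes = np.zeros((num_classes, h, w))
    for mask, scores in zip(region_masks, region_scores):
        votes[:, mask] += scores[:, None]          # each region votes on its pixels
    labels = votes.argmax(axis=0)
    labels[votes.max(axis=0) <= bg_thresh] = -1    # -1 marks background / no confident vote
    return labels

masks = np.random.rand(6, 48, 64) > 0.7
scores = np.random.randn(6, 5)
print(aggregate_region_votes(masks, scores, num_classes=5).shape)  # (48, 64)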

Vision and Language
From Captions to Visual Concepts and Back
Hao Fang*, Saurabh Gupta*, Forrest Iandola*, Rupesh Srivastava*, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, C. Lawrence Zitnick, Geoffrey Zweig
Computer Vision and Pattern Recognition (CVPR), 2015
abstract / bibtex / slides / extended abstract / COCO leader board / webpage / poster / blog / arXiv link / visual concept detection code

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.

@inproceedings{fang2015captions,
author = "Fang*, Hao and Gupta*, Saurabh and Iandola*, Forrest and Srivastava*, Rupesh K and Deng, Li and Doll{\'a}r, Piotr and Gao, Jianfeng and He, Xiaodong and Mitchell, Margaret and Platt, John C and Zitnick, C Lawrence and Zweig, Geoffrey",
title = "From captions to visual concepts and back",
booktitle = "Proceedings of the IEEE conference on computer vision and pattern recognition",
pages = "1473--1482",
year = "2015"
}
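The multiple-instance-learning step can be sketched as a noisy-OR over per-region word probabilities, trained with binary cross-entropy against the words that appear in the caption. The feature sizes and region featurizer below are assumptions for illustration.

# Noisy-OR multiple instance learning for image-level word detection.
import torch
import torch.nn as nn

vocab_size, feat_dim = 1000, 256
word_scorer = nn.Linear(feat_dim, vocab_size)    # per-region word logits

def noisy_or_word_probs(region_feats):
    """region_feats: (num_regions, feat_dim) -> (vocab_size,) image-level probabilities."""
    p_region = torch.sigmoid(word_scorer(region_feats))      # (R, V)
    return 1.0 - torch.prod(1.0 - p_region, dim=0)           # noisy-OR over regions

region_feats = torch.randn(12, feat_dim)                      # CNN features of 12 regions
word_probs = noisy_or_word_probs(region_feats)

targets = torch.zeros(vocab_size)                              # 1 for words in the caption
targets[torch.tensor([3, 57, 912])] = 1.0
loss = nn.functional.binary_cross_entropy(word_probs.clamp(1e-6, 1 - 1e-6), targets)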
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell
Association for Computational Linguistics (ACL), 2015
abstract / bibtex / arXiv link

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.

@article{devlin2015language,
author = "Devlin, Jacob and Cheng, Hao and Fang, Hao and Gupta, Saurabh and Deng, Li and He, Xiaodong and Zweig, Geoffrey and Mitchell, Margaret",
title = "Language models for image captioning: The quirks and what works",
journal = "arXiv preprint arXiv:1505.01809",
year = "2015"
}
Exploring Nearest Neighbor Approaches for Image Captioning
Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick
arXiv, 2015
abstract / bibtex / arXiv link

We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.

@article{devlin2015exploring,
author = "Devlin, Jacob and Gupta, Saurabh and Girshick, Ross and Mitchell, Margaret and Zitnick, C Lawrence",
title = "Exploring nearest neighbor approaches for image captioning",
journal = "arXiv preprint arXiv:1505.04467",
year = "2015"
}
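The consensus selection is simple to sketch: among the captions borrowed from the nearest-neighbor images, return the one with the highest average similarity to the others. A unigram-overlap score stands in below for the stronger caption metrics (BLEU, CIDEr) used in the paper.

# Pick the "consensus" caption from a set of nearest-neighbor candidates.
from collections import Counter

def overlap(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    inter = sum((ca & cb).values())
    return 2.0 * inter / max(len(a.split()) + len(b.split()), 1)

def consensus_caption(candidates):
    def avg_sim(c):
        others = [o for o in candidates if o is not c]
        return sum(overlap(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=avg_sim)

candidates = [
    "a man riding a wave on a surfboard",
    "a surfer riding a large wave in the ocean",
    "a man on a surfboard riding a wave",
    "a dog sitting on a couch",
]
print(consensus_caption(candidates))   # one of the surfing captions wins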
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick
arXiv, 2015
abstract / bibtex / arXiv link / code

In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.

@article{chen2015microsoft,
author = "Chen, Xinlei and Fang, Hao and Lin, Tsung-Yi and Vedantam, Ramakrishna and Gupta, Saurabh and Doll{\'a}r, Piotr and Zitnick, C Lawrence",
title = "Microsoft COCO captions: Data collection and evaluation server",
journal = "arXiv preprint arXiv:1504.00325",
year = "2015"
}

Machine Learning Applications
A Data Driven Approach for Algebraic Loop Invariants.
Rahul Sharma, Saurabh Gupta, Bharath Hariharan, Alex Aiken, Percy Liang, Aditya Nori
European Symposium on Programming (ESOP), 2013
abstract / bibtex

We describe a Guess-and-Check algorithm for computing algebraic equation invariants. The 'guess' phase is data driven and derives a candidate invariant from data generated from concrete executions of the program. This candidate invariant is subsequently validated in a 'check' phase by an off-the-shelf SMT solver. Iterating between the two phases leads to a sound algorithm. Moreover, we are able to prove a bound on the number of decision procedure queries which Guess-and-Check requires to obtain a sound invariant. We show how Guess-and-Check can be extended to generate arbitrary boolean combinations of linear equalities as invariants, which enables us to generate expressive invariants to be consumed by tools that cannot handle non-linear arithmetic. We have evaluated our technique on a number of benchmark programs from recent papers on invariant generation. Our results are encouraging - we are able to efficiently compute algebraic invariants in all cases, with only a few tests.

@inproceedings{sharma2013data,
author = "Sharma, Rahul and Gupta, Saurabh and Hariharan, Bharath and Aiken, Alex and Liang, Percy and Nori, Aditya V",
title = "A Data Driven Approach for Algebraic Loop Invariants.",
booktitle = "ESOP",
volume = "13",
pages = "574--592",
year = "2013"
}
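The 'guess' phase can be sketched with basic linear algebra: evaluate monomials up to a fixed degree on states collected from concrete executions and take the (near) null space of the resulting data matrix as candidate equalities. The 'check' phase, which validates a candidate with an SMT solver, is only indicated in a comment; the example program and tolerances are assumptions.

# Guess candidate algebraic equalities from concrete execution data.
import numpy as np
from itertools import combinations_with_replacement

def monomials(state, degree=2):
    """All monomials of the state variables up to the given degree (constant included)."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(state)), d):
            terms.append(float(np.prod([state[i] for i in combo])))
    return np.array(terms)

def guess_invariants(states, degree=2, tol=1e-8):
    data = np.array([monomials(s, degree) for s in states])
    _, sing, vt = np.linalg.svd(data)
    # Right singular vectors with (near-)zero singular value are coefficient vectors c
    # with data @ c ~ 0, i.e. candidate polynomial equalities over the observed states.
    return [vt[i] for i in range(vt.shape[0]) if i >= len(sing) or sing[i] < tol]

# Example: states (x, y) sampled from a loop that maintains y == x * x.
states = [(x, x * x) for x in range(1, 8)]
for coeffs in guess_invariants(states):
    # Each candidate would be handed to an SMT solver in the 'check' phase.
    print(np.round(coeffs, 3))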
Verification as Learning Geometric Concepts
Rahul Sharma, Saurabh Gupta, Bharath Hariharan, Alex Aiken, Aditya Nori
Static Analysis Symposium (SAS), 2013
abstract / bibtex

We formalize the problem of program verification as a learning problem, showing that invariants in program verification can be regarded as geometric concepts in machine learning. Safety properties define bad states: states a program should not reach. Program verification explains why a program's set of reachable states is disjoint from the set of bad states. In Hoare Logic, these explanations are predicates that form inductive assertions. Using samples for reachable and bad states and by applying well known machine learning algorithms for classification, we are able to generate inductive assertions. By relaxing the search for an exact proof to classifiers, we obtain complexity theoretic improvements. Further, we extend the learning algorithm to obtain a sound procedure that can generate proofs containing invariants that are arbitrary boolean combinations of polynomial inequalities. We have evaluated our approach on a number of challenging benchmarks and the results are promising.

@inproceedings{sharma2013verification,
author = "Sharma, Rahul and Gupta, Saurabh and Hariharan, Bharath and Aiken, Alex and Nori, Aditya V",
title = "Verification as learning geometric concepts",
booktitle = "International Static Analysis Symposium",
pages = "388--411",
year = "2013",
organization = "Springer, Berlin, Heidelberg"
}
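The reduction can be illustrated on a toy example: sample reachable and bad states, fit a classifier that separates them, and read the learned halfspace off as a candidate invariant to be verified separately (e.g. with an SMT solver). The tiny program and the use of a linear SVM below are assumptions for exposition.

# Learn a separating halfspace between reachable and bad states as a candidate invariant.
import numpy as np
from sklearn.svm import LinearSVC

# Reachable states of a loop where x starts at 0 and only increases.
reachable = np.array([[x] for x in range(0, 20)])
# Bad states from the safety property "x must never be negative".
bad = np.array([[x] for x in range(-20, 0)])

X = np.vstack([reachable, bad])
y = np.array([1] * len(reachable) + [0] * len(bad))

clf = LinearSVC(C=10.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
# The candidate invariant is the halfspace w . x + b >= 0 containing the reachable
# samples; here the boundary lands between x = -1 and x = 0, i.e. essentially x >= 0.
print(f"candidate invariant: {w[0]:.2f} * x + {b:.2f} >= 0")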

Teaching
CS280: Computer Vision
As GSI with Prof. Jitendra Malik and Prof. Alexei Efros
Fall 2013
CS188: Introduction to Artificial Intelligence
As GSI with Prof. Dan Klein and Prof. Pieter Abbeel
Fall 2012