Xin Wang

I am Xin Wang, a fourth-year Ph.D. candidate in Computer Science at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. I am part of the BAIR Lab, RISE Lab, and BDD Lab. Prior to UC Berkeley, I obtained my B.S. degree in Computer Science from Shanghai Jiao Tong University in 2015.

My research interests lie in computer vision and learning system design, including dynamic representation frameworks for low-shot learning, continual learning, efficient inference, and interactive data processing and model serving systems.

Google Scholar / GitHub / CV

   Office: 465 Soda Hall, Berkeley, CA 94720 | Email: xinw [at] eecs [dot] berkeley [dot] edu

     We are organizing the ICML 2019 Workshop on Human in the Loop Learning on June 14 in Long Beach, CA.

Publications

   Task-aware Deep Sampling for Feature Generation
   Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez
   Preprint

    paper

The human ability to imagine the variety of appearances of novel objects based on past experience is crucial for quickly learning novel visual concepts from few examples. Endowing machines with a similar ability to generate feature distributions for new visual concepts is key to sample-efficient model generalization. In this work, we propose a novel generator architecture suitable for feature generation in the zero-shot setting. We introduce task-aware deep sampling (TDS), which injects task-aware noise layer-by-layer in the generator, in contrast to existing shallow sampling (SS) schemes where random noise is sampled only once at the input layer of the generator. We propose a sample-efficient learning model composed of a TDS generator, a discriminator, and a classifier (e.g., a softmax classifier). Our model achieves state-of-the-art results on compositional zero-shot learning benchmarks and improves upon established benchmarks in conventional zero-shot learning, with a faster convergence rate.
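
A minimal PyTorch sketch of the layer-by-layer sampling idea may help: task-conditioned noise is drawn afresh at every generator layer rather than only once at the input. The module sizes, the sigmoid noise modulation, and the name TDSGenerator are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class TDSGenerator(nn.Module):
    """Sketch of task-aware deep sampling (TDS): fresh task-aware noise is
    injected at every layer, unlike shallow sampling (SS), which samples
    noise only once at the generator's input."""
    def __init__(self, task_dim=64, feat_dim=256, hidden=256, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = task_dim
        for _ in range(n_layers):
            # each layer consumes the running hidden state plus fresh noise
            self.layers.append(nn.Linear(in_dim + hidden, hidden))
            in_dim = hidden
        self.out = nn.Linear(hidden, feat_dim)
        # maps the task embedding to a per-layer noise modulation (assumed design)
        self.noise_proj = nn.Linear(task_dim, hidden)

    def forward(self, task_emb):
        h = task_emb
        for layer in self.layers:
            # task-aware noise: a standard Gaussian gated by the task embedding
            z = torch.randn(task_emb.size(0), self.noise_proj.out_features)
            z = z * torch.sigmoid(self.noise_proj(task_emb))
            h = torch.relu(layer(torch.cat([h, z], dim=-1)))
        return self.out(h)

gen = TDSGenerator()
fake_features = gen(torch.randn(8, 64))  # 8 synthetic features for one task
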
   ACE: Adapting to Changing Environments for Semantic Segmentation
   Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis
   International Conference on Computer Vision (ICCV) 2019

    paper

Deep neural networks exhibit exceptional accuracy when they are trained and tested on the same data distributions. However, neural classifiers are often extremely brittle when confronted with domain shift---changes in the input distribution that occur over time. We present ACE, a framework for semantic segmentation that dynamically adapts to changing environments over time. By aligning the distribution of labeled training data from the original source domain with the distribution of incoming data in a shifted domain, ACE synthesizes labeled training data for environments as it sees them. This stylized data is then used to update a segmentation model so that it performs well in new environments. To avoid forgetting knowledge from past environments, we introduce a memory that stores feature statistics from previously seen domains. These statistics can be used to replay images in any of the previously observed domains, thus preventing catastrophic forgetting. In addition to standard batch training using stochastic gradient descent (SGD), we also experiment with fast adaptation methods based on adaptive meta-learning. Extensive experiments are conducted on two datasets from SYNTHIA, and the results demonstrate the effectiveness of the proposed approach when adapting to a number of tasks.
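
The style memory described above admits a simple concrete reading: store each seen domain's channel-wise feature statistics, then re-stylize source features toward them with the standard AdaIN operation. The sketch below follows that reading; the memory structure and function names are assumptions for illustration, not the paper's code.

import torch

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Standard AdaIN: re-stylize content features to match stored
    channel-wise style statistics."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * style_std + style_mean

# Memory of per-domain feature statistics: enough to "replay" any previously
# seen domain's style without storing its images (assumed structure).
style_memory = {}

def remember_domain(name, feat):
    style_memory[name] = (feat.mean(dim=(2, 3), keepdim=True),
                          feat.std(dim=(2, 3), keepdim=True))

source_feats = torch.randn(4, 64, 32, 32)
remember_domain("night", torch.randn(4, 64, 32, 32))  # stats from a shifted domain
replayed = adain(source_feats, *style_memory["night"])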

@InProceedings{wu2019ace,
  title={ACE: Adapting to Changing Environments for Semantic Segmentation},
  author={Wu, Zuxuan and Wang, Xin and Gonzalez, Joseph E and Goldstein, Tom and Davis, Larry S},
  booktitle={International Conference on Computer Vision (ICCV)},
  month={October},
  year={2019}
}
   Few-shot Object Detection via Feature Reweighting
   Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
   International Conference on Computer Vision (ICCV) 2019

    paper

This work aims to solve the challenging few-shot object detection problem, where only a few annotated examples are available for each object category to train a detection model. The ability to learn to detect an object from just a few examples is common for human vision systems but remains absent from computer vision systems. Though few-shot meta-learning offers a promising solution, previous works mostly target image classification and are not directly applicable to the much more complicated object detection task. In this work, we propose a novel meta-learning-based model with a carefully designed architecture, consisting of a meta-model and a base detection model. The base detection model is trained on several base classes with sufficient samples to offer basis features. The meta-model is trained to reweight the importance of features from the base detection model over the input image and adapt these features to assist novel object detection from a few examples. The meta-model is lightweight, end-to-end trainable, and able to quickly endow the base model with the ability to detect novel objects. Through experiments we demonstrate that our model outperforms baselines by a large margin on few-shot object detection, across multiple datasets and settings. Our model also exhibits fast adaptation to novel few-shot classes.
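
A minimal sketch of the reweighting mechanism: a meta-model summarizes a class's few support examples into channel-wise coefficients that rescale the base detector's feature maps for that class. All shapes and module choices here are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class FeatureReweighter(nn.Module):
    """Sketch of feature reweighting: support examples of a novel class are
    pooled into one vector of per-channel coefficients that adapt the base
    detector's features toward that class."""
    def __init__(self, feat_channels=256):
        super().__init__()
        # meta-model: pools support features into one reweighting vector
        self.meta = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, query_feat, support_feat):
        # (N_support, C, H, W) -> (1, C, 1, 1) class-specific coefficients
        w = self.meta(support_feat).mean(dim=0, keepdim=True)
        # channel-wise reweighting of the query image's features
        return query_feat * torch.sigmoid(w)

rw = FeatureReweighter()
query = torch.randn(1, 256, 32, 32)
support = torch.randn(3, 256, 64, 64)   # 3-shot support set for a novel class
adapted = rw(query, support)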

@InProceedings{kang2018few,
  title={Few-shot Object Detection via Feature Reweighting},
  author={Kang, Bingyi and Liu, Zhuang and Wang, Xin and Yu, Fisher and Feng, Jiashi and Darrell, Trevor},
  booktitle={International Conference on Computer Vision (ICCV)},
  month={October},
  year={2019}
}
   TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
   Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez
   Conference on Computer Vision and Pattern Recognition (CVPR) 2019

    paper code

Learning good feature embeddings for images often requires substantial training data. As a consequence, in settings where training data is limited (e.g., few-shot and zero-shot learning), we are typically forced to use a general feature embedding across prediction tasks. Ideally, we would like to construct feature embeddings that are tuned for the given task and even input image. In this work, we propose Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta-learning fashion. Our network is composed of a meta learner and a prediction network, where the meta learner generates parameters for the feature layers in the prediction network based on a task input so that the feature embedding can be accurately adjusted for that task. We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning. Our model matches or exceeds the state-of-the-art on all tasks. In particular, our approach improves the prediction accuracy of unseen attribute-object pairs by 4 to 15 points on the challenging visual attribute-object composition task.
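
The core mechanism, a meta learner that generates the weights of a feature layer from the task input, can be sketched in a few lines of PyTorch. The dimensions and the binary matching head below are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TAFESketch(nn.Module):
    """Sketch of the TAFE-Net idea: a meta learner maps a task embedding to
    the weights of a feature layer, so the image embedding is adapted per
    task rather than shared across tasks."""
    def __init__(self, task_dim=300, img_dim=512, out_dim=128):
        super().__init__()
        self.img_dim, self.out_dim = img_dim, out_dim
        # meta learner: task embedding -> parameters of one linear feature layer
        self.weight_gen = nn.Linear(task_dim, img_dim * out_dim)
        self.classifier = nn.Linear(out_dim, 1)  # "does this image match the task?"

    def forward(self, img_feat, task_emb):
        W = self.weight_gen(task_emb).view(self.out_dim, self.img_dim)
        task_aware_feat = F.relu(F.linear(img_feat, W))
        return self.classifier(task_aware_feat)

net = TAFESketch()
score = net(torch.randn(4, 512), torch.randn(300))  # e.g. an attribute word vector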

@InProceedings{Wang_2019_CVPR,
author = {Wang, Xin and Yu, Fisher and Wang, Ruth and Darrell, Trevor and Gonzalez, Joseph E.},
title = {TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
   Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video
   Samvit Jain, Xin Wang, Joseph E. Gonzalez
   Conference on Computer Vision and Pattern Recognition (CVPR) 2019

    paper code

We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that extracts high-detail features on a reference keyframe, and warps these features forward using frame-to-frame optical flow estimates, and (2) an update branch that computes features of adjustable quality on the current frame, performing a temporal update at each video frame. The modularity of the update branch, where feature subnetworks of varying layer depth can be inserted (e.g. ResNet-18 to ResNet-101), enables operation over a new, state-of-the-art accuracy-throughput trade-off spectrum. Over this curve, Accel models achieve both higher accuracy and faster inference times than the closest comparable single-frame segmentation networks. In general, Accel significantly outperforms previous work on efficient semantic video segmentation, correcting warping-related error that compounds on datasets with complex dynamics. Accel is end-to-end trainable and highly modular: the reference network, the optical flow network, and the update network can each be selected independently, depending on application requirements, and then jointly fine-tuned. The result is a robust, general system for fast, high-accuracy semantic segmentation on video. 
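
A rough PyTorch sketch of the two-branch design: keyframe features are warped forward by optical flow, then fused with the cheap per-frame update features. The 1x1-convolution fusion and all shapes are illustrative assumptions consistent with the abstract, not the paper's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Warp reference features forward with an optical-flow field
    (bilinear sampling on a flow-displaced grid)."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().unsqueeze(0) + flow.permute(0, 2, 3, 1)
    # normalize coordinates to [-1, 1] as grid_sample expects
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
    return F.grid_sample(feat, grid, align_corners=True)

class AccelSketch(nn.Module):
    """Sketch of corrective fusion: warped keyframe (reference branch)
    features are fused with per-frame (update branch) features."""
    def __init__(self, channels=128, num_classes=19):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, num_classes, kernel_size=1)

    def forward(self, ref_feat, flow, update_feat):
        warped = warp(ref_feat, flow)
        return self.fuse(torch.cat([warped, update_feat], dim=1))

model = AccelSketch()
seg = model(torch.randn(1, 128, 64, 64), torch.zeros(1, 2, 64, 64),
            torch.randn(1, 128, 64, 64))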

@InProceedings{Jain_2019_CVPR,
author = {Jain, Samvit and Wang, Xin and Gonzalez, Joseph E.},
title = {Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
   Deep Mixture of Experts via Shallow Embedding
   Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez
   Conference on Uncertainty in Artificial Intelligence (UAI) 2019

    paper

Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has generally been limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture-of-experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis. Our novel DeepMoE architecture increases the representational power of standard convolutional networks by adaptively sparsifying and recalibrating channel-wise features in each convolutional layer. We employ a multi-headed sparse gating network to determine the selection and scaling of channels for each input, leveraging exponential combinations of experts within a single convolutional network. Our proposed architecture is evaluated on four benchmark datasets and tasks, and we show that DeepMoEs are able to achieve higher accuracy with lower computation than standard convolutional networks.
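
As a sketch, one gated convolutional layer might look as follows: a shallow embedding of the input feeds a gating head whose ReLU output sparsely selects and rescales the layer's channels, the "experts." Names and sizes are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv(nn.Module):
    """Sketch of one DeepMoE-style layer: a gating head emits sparse,
    non-negative per-channel scales that select and recalibrate the
    convolutional channels ("experts") for each input."""
    def __init__(self, in_ch, out_ch, embed_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Linear(embed_dim, out_ch)

    def forward(self, x, embedding):
        # ReLU induces exact zeros, i.e. inactive experts for this input
        g = F.relu(self.gate(embedding))
        return self.conv(x) * g.view(g.size(0), -1, 1, 1)

# The shallow embedding is computed once per input and feeds every layer's gate.
embed = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
layer = GatedConv(3, 32)
x = torch.randn(2, 3, 32, 32)
y = layer(x, embed(x))   # per-example channel selection and scaling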

@InProceedings{Wang_2019_UAI,
author={Wang, Xin and Yu, Fisher and Dunlap, Lisa and Ma, Yi-An and Wang, Ruth and Mirhoseini, Azalia and Darrell, Trevor and Gonzalez, Joseph E},
title={Deep mixture of experts via shallow embedding},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
month = {July},
year={2019}
}
   SkipNet: Learning Dynamic Routing in Convolutional Networks
   Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez
   European Conference on Computer Vision (ECCV) 2018

    paper code

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient. We exploit this observation by learning to skip convolutional layers on a per-input basis. We introduce SkipNet, a modified residual network that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer. We formulate the dynamic skipping problem in the context of sequential decision making and propose a hybrid learning algorithm that combines supervised learning and reinforcement learning to address the challenges of non-differentiable skipping decisions. We show SkipNet reduces computation by 30-90% while preserving the accuracy of the original model on four benchmark datasets, and outperforms state-of-the-art dynamic networks and static compression methods. We also qualitatively evaluate the gating policy to reveal a relationship between image scale and saliency and the number of layers skipped.
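
The skipping rule itself is simple to sketch: a tiny gate reads the incoming activations and emits a hard binary decision for the next residual block. The gate design below is an illustrative assumption; the hard, non-differentiable decision is exactly what motivates the hybrid supervised/reinforcement training described above.

import torch
import torch.nn as nn

class SkipGate(nn.Module):
    """Sketch of dynamic skipping: a small gate decides per input whether
    to execute the next block or pass the activations through unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x, block):
        g = (self.gate(x) > 0.5).float().view(-1, 1, 1, 1)  # hard 0/1 decision
        # g = 1: run the block; g = 0: skip it (identity). For brevity the
        # block is evaluated unconditionally here; a real implementation
        # would branch to actually save computation.
        return g * block(x) + (1 - g) * x

block = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
skip = SkipGate(64)
out = skip(torch.randn(2, 64, 16, 16), block)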

@InProceedings{Wang_2018_ECCV,
author = {Wang, Xin and Yu, Fisher and Dou, Zi-Yi and Darrell, Trevor and Gonzalez, Joseph E.},
title = {SkipNet: Learning Dynamic Routing in Convolutional Networks},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}
   IDK Cascades: Fast Deep Learning by Learning not to Overthink
   Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, Joseph E. Gonzalez
   Conference on Uncertainty in Artificial Intelligence (UAI) 2018

    paper

Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively "overthink" on simple inputs. In this paper, we revisit the classic question of building model cascades that primarily leverage class asymmetry to reduce cost. We introduce the "I Don't Know" (IDK) prediction cascades framework, a general framework to systematically compose a set of pre-trained models to accelerate inference without a loss in prediction accuracy. We propose two search-based methods for constructing cascades as well as a new cost-aware objective within this framework. The proposed IDK cascade framework can be easily adopted in existing model serving systems without additional model re-training. We evaluate the proposed techniques on a range of benchmarks to demonstrate the effectiveness of the proposed framework.
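
The cascade logic can be sketched directly: each pre-trained model answers only when its confidence clears a threshold, and otherwise says "I don't know," deferring to the next, more expensive model. The thresholds below are placeholders standing in for those the cost-aware search would choose.

import torch

def idk_cascade(x, models, thresholds):
    """Sketch of an IDK cascade over pre-trained models: stop at the first
    model whose confidence clears its threshold; the last model always answers."""
    for model, tau in zip(models, thresholds):
        probs = torch.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= tau:
            return pred.item()          # confident: stop early, save compute
    return pred.item()

# e.g. a fast model backed by an accurate one (both stand-ins, assumed pre-trained)
fast = torch.nn.Linear(10, 5)
slow = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                           torch.nn.Linear(64, 5))
label = idk_cascade(torch.randn(1, 10), [fast, slow], [0.9, 0.0])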

@InProceedings{Wang_2018_UAI,
author={Wang, Xin and Luo, Yujia and Crankshaw, Daniel and Tumanov, Alexey and Yu, Fisher and Gonzalez, Joseph E},
title={IDK Cascades: Fast Deep Learning by Learning not to Overthink},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
month = {July},
year={2018}
}
   Clipper: A Low-Latency Online Prediction Serving System
   Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
   USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2017

    Presentation paper documentation blog

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.
In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.
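
To illustrate one of the techniques Clipper combines, here is a generic sketch of an adaptive batching worker: queries are queued briefly and evaluated as a single batch, trading a little latency for much higher throughput. This illustrates the idea only; it is not Clipper's actual code or API.

import queue, threading, time
from concurrent.futures import Future

def batching_worker(requests, model, max_batch=32, max_wait_s=0.005):
    """Sketch of adaptive batching: block for one query, then collect more
    until the batch fills or the wait deadline passes, and run them together."""
    while True:
        batch = [requests.get()]                      # block for the first query
        deadline = time.time() + max_wait_s
        while len(batch) < max_batch:
            try:
                batch.append(requests.get(timeout=max(0, deadline - time.time())))
            except queue.Empty:
                break
        inputs, futures = zip(*batch)
        for fut, pred in zip(futures, model(list(inputs))):
            fut.set_result(pred)                      # deliver each prediction

requests = queue.Queue()
model = lambda xs: [x * 2 for x in xs]                # stand-in for a real model
threading.Thread(target=batching_worker, args=(requests, model), daemon=True).start()
f = Future(); requests.put((3, f))
print(f.result())                                     # -> 6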

@inproceedings {Crankshaw_2017_NSDI,
author = {Daniel Crankshaw and Xin Wang and Giulio Zhou and Michael J. Franklin and Joseph E. Gonzalez and Ion Stoica},
title = {Clipper: A Low-Latency Online Prediction Serving System},
booktitle = {14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17)},
year = {2017},
isbn = {978-1-931971-37-9},
address = {Boston, MA},
pages = {613--627},
url = {https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/crankshaw},
publisher = {{USENIX} Association},
}
   Scalable Training and Serving of Personalized Models
   Daniel Crankshaw, Xin Wang, Joseph E. Gonzalez, Michael J. Franklin
   LearningSys 2015

    paper

The past decade has seen substantial growth in Learning Systems research combining advances in system design with new efficient algorithms to enable the training of complex models on vast amounts of data. As a consequence we have seen widespread adoption of machine learning techniques to address important real-world problems. While this work has been wildly successful in shaping both the machine learning and systems fields, it also ignores a big part of real-world machine learning. In particular, much of the work in Learning Systems has operated under the fiction: the world hands me a static, potentially very large, dataset and I train an accurate, potentially complex, model. This fiction departs from reality in two key regards that we begin to address in this work.

@article{crankshaw2015scalable,
  title={Scalable training and serving of personalized models},
  author={Crankshaw, Daniel and Wang, Xin and Gonzalez, Joseph E and Franklin, Michael J},
  year={2015}
}

Services

Experience