Welcome to Allen Yang's Homepage

Research Overview:

Drawing on my experience in both academic and entrepreneurial careers, I am passionate about investigating disruptive new technologies in the emerging areas of AR, VR, and AI, with a focus on Computer Vision, Human-Centered User Experience Design, and Autonomy. Some core research topics include Localization and Mapping, Natural Human-Computer Interface, Pattern Recognition, and Embedded Computer Vision for Mobile Applications.

I am also excited about the revolution of new education models that will supercharge the emerging exponential-growth economies. In the EECS Department, I have co-founded graduate degree programs in the areas of AR/VR, Autonomous Driving, and Blockchain and DeFi. I work closely with the College of Engineering to promote novel STEM education programs to K-12 schools and students around the world. One such program is Robot Open Autonomous Racing (ROAR).

Current Projects:

Ursa -- LLM-based 3D Immersive User Interface and Robot Interaction

We explore LLM-based 3D immersive UI for challenging edge AI and autonomy applications, where human users may interact with cyberphysical systems and their virtual digital twins purely based on verbal conversation with an LLM AI agent. The underpinning technologies that are used to drive the new UI/UX are derived from ROAR Autonomous Driving, OpenARK Digital Twin Modeling, and open source LLM models.

The project is in collaboration with NASA and Qualcomm.

OpenARK -- An Open-Source AR Software Developer Kit

OpenARK is an open-source wearable augmented reality (AR) system founded at UC Berkeley in 2016. The C++-based software offers innovative core functionalities to power a wide range of off-the-shelf AR components, including see-through glasses, depth cameras, and IMUs.

OpenARK currently offers integration with pmd, Microsoft Kinect, and Intel RealSense cameras. Real-time functionalities include:

Gesture recognition
SLAM
3D Reconstruction

ROAR -- Robot Open Autonomous Racing

Led by faculty members with deep expertise in AI and autonomous driving, Berkeley has competed in new AI racecar competitions since 2021. The Robot Open Autonomous Racing (ROAR) competition pits multiple student racing teams against one another in tests of speed and vehicle skills at the heart of the iconic Berkeley campus. The team also participates in the highest level of AI racing, the Indy Autonomous Challenge, where it is currently ranked No. 1 in the US and No. 3 globally.

Past Projects:

Robust Face Recognition via Sparse Representation

This research is featured in the following reports:

Communications of the ACM
ABC7 Bay Area News
Wired.com
Nuit Blanche
Rice Compressive Sensing Online
IEEE Xplore Top 100 Download, June 2010.

Robust 3D Natural Gesture Recognition for Wearable Android Platforms

As the first employee and part of the founding team, I served various functions at Atheer. My primary responsibilities were developing 3D sensing and augmented reality algorithms for Atheer's wearable 3D platform. My team developed a real-time 3D natural gesture recognition algorithm on ARM-based Android platforms that was regarded as the best mobile gesture recognition solution. Our proprietary augmented reality algorithms provided industry-leading low latency and accurate 3D localization performance. In 2014, I also served as Acting COO overseeing the overall operation of the company.

Video Demo: https://www.youtube.com/watch?v=Rp6iawf0Dgk

Atheer Labs is featured in the following reports:

Large-scale 3-D Reconstruction of Urban Scenes via Low-Rank Textures

We introduce a new approach to reconstruct accurate camera geometry and 3-D models for urban structures in a holistic fashion without relying on extraction or matching of traditional local features such as points and edges. Instead, the new method relies on a new set of semi-global or global features called transform invariant low-rank textures (TILT), which are ubiquitous in urban scenes. Modern high-dimensional optimization techniques enable us to accurately and robustly recover precise and consistent camera calibration and scene geometry from a single or multiple images of the scene.

CPRL: An Extension of Compressive Sensing to the Phase Retrieval Problem

This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique -- CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation.
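
As a rough sketch of the lifting step, under notation introduced here for illustration (measurement vectors a_i, intensity measurements b_i, lifted matrices \Phi_i, and a sparsity weight \lambda are not named in the summary above), the relaxation can be written as:

```latex
% Intensity measurements of a sparse complex signal x (assumed notation):
%   b_i = |a_i^H x|^2,  i = 1, ..., N.
% Lifting replaces x by the rank-one matrix X = x x^H, so each measurement
% becomes linear in X:
\[
  b_i = a_i^{H} X a_i = \operatorname{Tr}(\Phi_i X),
  \qquad \Phi_i = a_i a_i^{H}.
\]
% Dropping the rank-one constraint and replacing the sparsity count by the
% entrywise l1 norm gives a nonsmooth semidefinite program:
\[
  \min_{X \succeq 0} \; \operatorname{Tr}(X) + \lambda \, \lVert X \rVert_1
  \quad \text{subject to} \quad
  b_i = \operatorname{Tr}(\Phi_i X), \quad i = 1, \dots, N.
\]
```

Here the trace term encourages a low-rank solution and the l1 term encourages a sparse one; when the recovered X is rank one, x can be read off from it up to a global phase.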

Matlab Code: http://users.isy.liu.se/rt/ohlsson/code.html
arXiv Tech Report: http://arxiv.org/pdf/1111.6323.pdf

L-1 Minimization via Augmented Lagrangian Methods and Benchmark

We provide a comprehensive review of five representative approaches to L-1 minimization, namely, Gradient Projection, Homotopy, Iterative Shrinkage-Thresholding, Proximal Gradient, and Augmented Lagrangian Methods. The work is intended to fill a gap in the existing literature by systematically benchmarking the performance of these algorithms in a consistent experimental setting. In particular, the paper focuses on a recently proposed face recognition algorithm, where a sparse representation framework has been used to recover human identities from facial images that may be affected by illumination, occlusion, and facial disguise.
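
As an illustration of one of the five families above, here is a minimal sketch of Iterative Shrinkage-Thresholding in Python with NumPy. It is not the benchmarked implementation. It assumes the unconstrained form min 0.5*||Ax - b||^2 + lam*||x||_1 rather than the equality-constrained problem, and the names `soft_threshold`, `ista`, `lam`, and `n_iter` are chosen here for illustration only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iter=500):
    """Approximately minimize 0.5*||Ax - b||^2 + lam*||x||_1 (assumed form)."""
    # A fixed step of 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of the smooth least-squares term.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # gradient step on the data term
        x = soft_threshold(x - grad / L, lam / L)  # shrinkage step on the l1 term
    return x
```

With an identity matrix for A, the iteration reduces to a single soft-thresholding of b, which makes a convenient sanity check. Accelerated variants and the other four families differ mainly in how they choose the step and handle the l1 term.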

SOLO: Sparse Online Low-Rank Projection and Outlier Rejection

Motivated by an emerging theory of robust low-rank matrix representation, we introduce a novel solution for online rigid-body motion registration. The goal is to develop algorithmic techniques that enable a robust, real-time motion registration solution suitable for low-cost, portable 3-D camera devices. The accuracy of the solution is validated through extensive simulation and a real-world experiment, while the system enjoys one to two orders of magnitude speed-up compared to well-established RANSAC solutions.

Sparse PCA via Augmented Lagrangian Methods and Application to Informative Feature Selection

We propose a novel method to select informative object features using a more efficient algorithm called Sparse PCA. First, we show that using a large-scale multiple-view object database, informative features can be reliably identified from a high-dimensional visual dictionary by applying Sparse PCA on the histograms of each object category. Our experiment shows that the new algorithm improves recognition accuracy compared to the traditional BoW methods and SfM methods. Second, we present a new solution to Sparse PCA as a semidefinite programming problem using the Augmented Lagrangian Method.

Source code in MATLAB: http://www.eecs.berkeley.edu/~yang/software/SPCA/SPCA_ALM.zip

d-Oracle: Distributed Object Recognition via a Camera Wireless Net

Harnessing the multiple-view information from a wireless camera sensor network to improve the recognition of objects or actions.

Berkeley Multiview Wireless (BMW) database now available!

d-WAR: Distributed Wearable Action Recognition

We propose a distributed recognition method to classify human actions using a low-bandwidth wearable motion sensor network. Given a set of pre-segmented motion sequences as training examples, the algorithm simultaneously segments and classifies human actions, and it also rejects outlying actions that are not in the training set. The classification is distributed across individual sensor nodes and a base station computer. Using up to eight body sensors, the algorithm achieves state-of-the-art 98.8% accuracy on a set of 12 action categories. We further demonstrate that the recognition precision degrades gracefully when smaller subsets of sensors are used, which validates the robustness of the distributed framework.

Wearable Action Recognition Database (WARD) ver 1.0 available for download.

Image Analysis and Segmentation via Lossy Data Compression

We cast natural-image segmentation as a problem of clustering texture features as multivariate mixed data. We model the distribution of the texture features using a mixture of Gaussian distributions. Unlike most existing clustering methods, we allow the mixture components to be degenerate or nearly-degenerate. We contend that this assumption is particularly important for mid-level image segmentation, where degeneracy is typically introduced by using a common feature representation for different textures in an image. We show that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach.
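
For readers who want the criterion in symbols, the following is a sketch of a lossy coding length in its zero-mean form. The notation is introduced here for illustration: W is an n-by-m matrix of m feature vectors, \varepsilon is the allowed distortion, and W_1, ..., W_k are the groups of a candidate segmentation with m_i vectors each. The version used for images may include additional terms, such as one for a nonzero mean.

```latex
% Approximate number of bits needed to code the columns of W up to
% distortion \varepsilon (zero-mean Gaussian model, assumed form):
\[
  L(W) = \frac{m + n}{2} \,
         \log_2 \det\!\Big( I + \frac{n}{\varepsilon^{2} m} \, W W^{T} \Big).
\]
% Coding length of a segmentation: each group is coded separately, plus
% the bits needed to record group membership:
\[
  L^{s}(W_1, \dots, W_k) =
  \sum_{i=1}^{k} \Big[ L(W_i) - m_i \log_2 \frac{m_i}{m} \Big].
\]
```

Under this sketch, an agglomerative step merges the pair of groups whose union gives the largest decrease in L^s and stops when no merge shortens the code. Because the log-determinant remains finite for rank-deficient W W^T, degenerate groups need no special handling.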

Feature Selection in Face Recognition: A Sparse Representation Perspective

Formulating the problem of face recognition under the emerging theory of compressed sensing, we examine the role of feature selection/dimensionality reduction from the perspective of sparse representation. Our experiments show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical is whether the number of features is sufficient and whether the sparse representation is correctly found.
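
To make the role of the feature map concrete, here is a minimal Python sketch of classification by class-wise sparse reconstruction residual. It is written under several assumptions and is not the experimental code: the feature map is a plain matrix `R`, the sparse code is computed with a simple iterative shrinkage loop on an unconstrained L-1 penalized least-squares problem, and all names (`l1_solve`, `src_classify`, `lam`) are hypothetical.

```python
import numpy as np

def l1_solve(A, b, lam=0.01, n_iter=500):
    """Approximate argmin_x 0.5*||Ax - b||^2 + lam*||x||_1 by iterative shrinkage."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return x

def src_classify(A, labels, y, R, lam=0.01):
    """Assign y to the class whose training columns best reconstruct it.

    A      : (m, n) matrix whose columns are training samples
    labels : (n,) class label of each column of A
    y      : (m,) test sample
    R      : (d, m) feature map (assumption: any linear projection)
    """
    FA, Fy = R @ A, R @ y            # project dictionary and test sample
    x = l1_solve(FA, Fy, lam)        # sparse code over all training samples
    classes = np.unique(labels)
    # Residual when only the coefficients of one class are kept.
    residuals = [np.linalg.norm(Fy - FA[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

In this sketch, swapping `R` between, say, a downsampling matrix and a random Gaussian matrix such as `np.random.default_rng(0).standard_normal((d, m))` changes only the feature space. The sparse code and the residual rule stay the same, which is the sense in which the choice of features becomes secondary once `d` is large enough. Column normalization of the dictionary is omitted for brevity.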

Robust Algebraic Segmentation of Mixed Rigid-Body and Planar Motions in Two Views

We study segmentation of multiple rigid-body motions in a 3-D dynamic scene under perspective camera projection. Based on the well-known epipolar and homography constraints between two views, we propose a hybrid perspective constraint (HPC) to unify the representation of rigid-body and planar motions. Given a mixture of K hybrid perspective constraints, we propose an algebraic process to partition image correspondences to the individual 3-D motions, called Robust Algebraic Segmentation (RAS). We conduct extensive simulations and real experiments to validate the performance of the new algorithm. The results demonstrate that RAS achieves notably higher accuracy than most existing robust motion segmentation methods, including random sample consensus (RANSAC) and its variations. The implementation of the algorithm is also two to three times faster than the existing methods. We will make the implementation of the algorithm and the benchmark scripts available on our website.

Generalized Principal Component Analysis (GPCA)

An algebraic framework for modeling and segmenting mixed data using a union of subspaces, a.k.a. subspace arrangements. Although the framework is algebraic, its statistical implementation is robust to data noise and outliers.

Symmetry-based 3-D Reconstruction from Perspective Images

We investigated a unified framework to extract poses and structures of 2-D symmetric patterns from perspective images. The framework uniformly encompasses all three fundamental types of symmetry: Reflection, Rotation, and Translation, based on a systematic study of the homography groups in the image induced by the symmetry groups in space.

We claim the following principle: If a planar object admits rich enough symmetry, no 3-D geometric information is lost through perspective imaging.

RoboTalk

A unified robot motion interface and tele-communication protocols for controlling arms, bases, and androids.
Copyright (c) Honda Research, Mountain View, CA.

Current Students

Aaron Xie, EECS Undergraduate

Caitlin Wolfe, EECS PhD

Jazzy Rao, Architecture Undergrad

Encheng Liu, ME MEng

Francisco Ramos, ME MEng

Jiahao Zhao, ME MEng

Qingyue Liu, ME MEng

Huynh Quang, ME MEng

Zixun Huang, EECS MDes

Adrian Botvinik, EECS MEng

Boyuan Ma, EECS MEng

Virgile Foussereau, EECS MEng

Yukun Song, EECS MEng

Kathy Zhuang, EECS MEng

Zoe Zhou, EECS MEng

Rui Li, EECS MEng

Jackson Gao, EECS MEng

Graduated Students

Siyu Catherine Gai, EECS Undergraduate (Harvard)

Jack Wang, EECS Undergraduate

Weiyu Feng, EECS MEng (SMT)

Adam Chang, EECS Undergraduate (Jane Street)

Zhihao (Seth) Zhao (UCLA PhD)

Yue Fan, EECS MDes (HTC Vive)

Tianlun Zhang, EECS MS (Berkeley)

Jingjing Wei, EECS MEng (Oracle)

Yi-Hsuan Sung, EECS MEng

Sunisha Fernandez, EECS MEng

Jad Yahya, EECS MEng (Tesla)

Aman Saraf, EECS MEng (Tesla)

Carl Gan, EECS MS (Cornell Tech PhD)

Adam Curtis, EECS MEng (Dexterity)

Siyi Ji, EECS MEng

Yi-Chen Chen, EECS MEng (Bloomberg)

Afroz Alam, EECS MEng (Snowflake)

Hung-Chun Chen, EECS MEng (Google)

Nianxu Wang, EECS MEng (Morgan Stanley)

Tiangcheng Sun, EECS MEng (Amazon)

Ang Wang, EECS MEng (TikTok)

Jianing Fu, EECS MEng (Meta)

Hsin-Li Chu, EECS MEng (Google)

Siming Liu, EECS MEng

James Cheney, EECS MEng (Ojjo)

Weiyan (Wayne) Zhu, EECS MEng (eBay)

Xuan Liu, EECS MEng

Bingxin Zhang, EECS MEng

Federico Palacios, EECS MEng

Franco Leonardo Huang, EECS MEng (Hitch Interactive)

Xiangyu Zhou, EECS MEng (Pinterest)

Yihan Li, EECS MEng (Oracle)

Yuzhou Ge, EECS MEng

Venapally Rao, Civil MEng

Peilin Zhou, Civil MEng

Eric Wang, EECS Undergrad (Benchling)

Woojin Ko, EECS Undergrad (Cornell Tech PhD)

Xiao Song, EECS MEng (Nvidia)

Ryan Ong, EECS Master

Michael Wu, EECS Master (TikTok)

Mohammad Keshavarzi, Architecture PhD (Meta Reality Lab)

David McPherson, EECS PhD

Kewei Sui, 2021 (Forma)

Shubha Jagannatha, 2021 (Pixar Animation)

Michael Khorram, 2021

Tiantian Wang, 2021

Jiya (Maggie) Zhang, 2021 (Google)

Alan Zhang, 2021 (Palo Alto Networks)

Yohan Mysore, 2021 (Nvidia)

Christian Reyes, 2021 (Bloomberg)

Peru Dayani, 2021 (Bloomberg)

Chufan Guo, 2021

Jiuqi Wang, 2021 (A&C Capital)

Michael Cui, 2021 (HRL Lab)

Yiliang Sun, 2021

Zizhao Gong, 2021

Joseph Menke, 2020 (Apple)

Oladapo Afolabi, 2020

Zhe Zhao, 2020 (Northeastern)

Qi Wu, 2020

Zixi Cai, 2020

Yitong Li, 2020

(Roy) Lingfeng Guo, 2020

Gang Yao, 2020

Xi Chen, 2020

Jiewen Lai, 2020

Hsin-Pei Lee, 2020

Yu Zhang, 2019 (ZF Group)

Zhijiong Huang, 2019

Jesse Paterson, 2019 (STRIVR)

Jiwoong Han, 2019 (Nvidia)

Yujie Wang, 2019 (TBSI PhD)

Yiwen Xie, 2019 (Google)

Zheqing Yu, 2019 (VMware)

Xinyue Zhu, 2019 (Facebook)

Yu-Chieh Lan, 2019 (Amazon)

Alice Shan, 2019 (Facebook)

Fan Zhang, 2019 (Oracle)

Xie Zhou, 2019

Jiasheng Zhu, 2019 (Oracle)

Bill Zhou, 2018 (Facebook)

Tom Cheng, 2018 (Harvard)

Yifei Liu, 2018 (Facebook)

Nancy Yang, 2018

Alyssa Li, 2018 (Berkeley PhD)

Katherine Bell, 2018

Tongyu Chen, 2018

Weijia Huang, 2018 (Pengram)

Rongbing Zhou, 2018 (Zoetic AI)

Diyu Luo, 2018 (LinkedIn)

Chu-Ming Chang, 2018 (Google)

Daniel Amar, 2018 (Samsung)

Hai Zhang, 2018 (Google)

Yidong Huang, 2018

Kuan Lu, 2018 (Microsoft)

Yang Ding, 2018 (VMware)

Wei-Luan Tseng, 2018 (Microsoft)

Rachel Lee, 2018
