Allen Y. Yang, PhD

Executive Director

FHL Vive Center for Enhanced Reality

Berkeley Defi Research Initiative

Department of EECS, UC Berkeley

Rm 333A, Cory Hall
Berkeley, CA 94720
Email: yang AT

Office Hours: Fridays 11am - 12pm (appointment only)

What is New!

  • CS 294-137: Introduction to Virtual Reality and Immersive Computing, Fall 2024
  • 2024 NASA SUITS: Immersive 3D UI Design for Space Missions [website]
  • 2023 IJAC: Synthesis and Generation for 3D Architecture Volume with Generative Modeling
  • 2023 OpenARK Release: Digital Twin Tracking Database v1.0, CVPRW 2023 [github]
  • 2023 Berkeley ROAR Racing: AI Racing Tech wins No.3 @ Indy Autonomous Challenge CES [Youtube]
  • 2023 EECS MEng Capstone: Autonomy and Autonomous Racing [website]
  • 2023 EECS MEng Capstone: Augmented Reality and Virtual Reality [website]
  • OpenARK: An Open-Source AR Developer Kit. [link] [Git]

Research Overview:

Drawing from my experience in both academic and entrepreneurial careers, I am passionate about investigating disruptive new technologies in emerging AR, VR, and AI areas, focusing on Computer Vision, Human-Centered User Experience Design, and Autonomy. Some core research topics include Localization and Mapping, Natural Human-Computer Interface, Pattern Recognition, and Embedded Computer Vision for Mobile Applications.

I am also excited about the revolution of new education models that will supercharge the emerging exponential-growth economies. In the EECS Department, I have co-founded graduate degree programs in the areas of AR/VR, Autonomous Driving, and Blockchain and Defi. I work closely with the College of Engineering to promote novel STEM education programs to K-12 schools and students around the world. One such program is Robot Open Autonomous Racing (ROAR).

Current Projects:

Ursa -- LLM-based 3D Immersive User Interface and Robot Interaction

We explore LLM-based 3D immersive UI for challenging edge AI and autonomy applications, where human users may interact with cyberphysical systems and their virtual digital twins purely based on verbal conversation with an LLM AI agent. The underpinning technologies that are used to drive the new UI/UX are derived from ROAR Autonomous Driving, OpenARK Digital Twin Modeling, and open source LLM models.

The project is in collaboration with NASA and Qualcomm.
OpenARK -- An Open-Source AR Software Developer Kit

OpenARK is an open-source wearable augmented reality (AR) system founded at UC Berkeley in 2016. The C++ based software offers innovative core functionalities to power a wide range of off-the-shelf AR components, including see-through glasses, depth cameras, and IMUs.

OpenARK currently offers integration with pmd, Microsoft Kinect, and Intel RealSense cameras. Real-time functionalities include:
  • Gesture recognition
  • SLAM
  • 3D Reconstruction
ROAR -- Robot Open Autonomous Racing

Led by faculty members with deep expertise in AI and autonomous driving, Berkeley has competed in new AI racecar competitions since 2021. The Robot Open Autonomous Racing (ROAR) competition pits multiple student racing teams against one another in tests of speed and vehicle skills at the heart of the iconic Berkeley campus. The team also participates in the highest level of AI racing, the Indy Autonomous Challenge, where it is currently ranked No.1 in the US and No.3 globally.

Past Projects:

Robust Face Recognition via Sparse Representation

This research is featured in the following reports:
Robust 3D natural gesture recognition for wearable Android platforms.

As the first employee and part of the founding team, I served various functions at Atheer. My primary responsibilities were developing 3D sensing and augmented reality algorithms for Atheer's wearable 3D platform. My team developed a real-time 3D natural gesture recognition algorithm on ARM-based Android platforms that was regarded as the best mobile gesture recognition solution. Our proprietary augmented reality algorithms provided industry-leading low latency and accurate 3D localization performance. In 2014, I also served as Acting COO overseeing the overall operation of the company.

Video Demo:

Atheer Labs is featured in the following reports:
Large-scale 3-D Reconstruction of Urban Scenes via Low-Rank Textures

We introduce a new approach to reconstruct accurate camera geometry and 3-D models for urban structures in a holistic fashion without relying on extraction or matching of traditional local features such as points and edges. Instead, the new method relies on a new set of semi-global or global features called transform invariant low-rank texture (TILT), which are ubiquitous in urban scenes. Modern high-dimensional optimization techniques enable us to accurately and robustly recover precise and consistent camera calibration and scene geometry from a single or multiple images of the scene.
CPRL: An Extension of Compressive Sensing to the Phase Retrieval Problem

This paper presents a novel extension of compressive sensing (CS) to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique -- CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation.
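
As a rough illustration of the lifting idea described above, the relaxation can be sketched as follows. The notation (measurement vectors a_i, intensities b_i, lifted variable X, and weight \lambda) is assumed here for illustration and is not taken from this page; the exact objective and weighting used in the paper may differ.

```latex
% Sparse phase retrieval: recover a sparse x from N intensity measurements.
\begin{align*}
  \min_{x \in \mathbb{C}^n} \; \|x\|_0
  \quad \text{s.t.} \quad b_i = |a_i^H x|^2, \quad i = 1, \dots, N.
\end{align*}
% Lifting: with X = x x^H, each measurement becomes linear in X, since
% |a_i^H x|^2 = a_i^H x x^H a_i = \mathrm{Tr}(\Phi_i X), where \Phi_i = a_i a_i^H.
% Dropping the rank-one constraint on X (trace as a convex surrogate) and
% replacing the \ell_0 term by an \ell_1 term gives a nonsmooth semidefinite program:
\begin{align*}
  \min_{X \succeq 0} \; \mathrm{Tr}(X) + \lambda \|X\|_1
  \quad \text{s.t.} \quad b_i = \mathrm{Tr}(\Phi_i X), \quad i = 1, \dots, N.
\end{align*}
```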

Matlab Code:
arXiv Tech Report:
L-1 Minimization via Augmented Lagrangian Methods and Benchmark

We provide a comprehensive review of five representative approaches, namely, Gradient Projection, Homotopy, Iterative Shrinkage-Thresholding, Proximal Gradient, and Augmented Lagrangian Methods. The work is intended to fill in a gap in the existing literature to systematically benchmark the performance of these algorithms using a consistent experimental setting. In particular, the paper will focus on a recently proposed face recognition algorithm, where a sparse representation framework has been used to recover human identities from facial images that may be affected by illumination, occlusion, and facial disguise.
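
To make one of the five families above concrete, here is a minimal Python sketch of Iterative Shrinkage-Thresholding applied to the unconstrained problem min 0.5*||Ax - b||^2 + lam*||x||_1. It is an illustrative toy, not the benchmarked implementation from the paper; the function names, the fixed step size 1/L, and the random test problem are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_objective(A, b, x, lam):
    """Value of 0.5 * ||A x - b||_2^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * float(r @ r) + lam * float(np.sum(np.abs(x)))

def ista(A, b, lam, num_iters=500):
    """Hypothetical helper: plain ISTA with constant step size 1/L."""
    # L is the Lipschitz constant of the gradient of the smooth term,
    # i.e. the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)           # gradient of 0.5 * ||A x - b||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy problem (assumed data): a sparse vector observed through a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
```

With step size 1/L the objective is non-increasing from one iteration to the next, which is why this variant is a common baseline; accelerated schemes trade that simplicity for faster convergence.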
SOLO: Sparse Online Low-Rank Projection and Outlier Rejection

Motivated by an emerging theory of robust low-rank matrix representation, we introduce a novel solution for online rigid-body motion registration. The goal is to develop algorithmic techniques that enable a robust, real-time motion registration solution suitable for low-cost, portable 3-D camera devices. The accuracy of the solution is validated through extensive simulation and a real-world experiment, while the system enjoys one to two orders of magnitude speed-up compared to well-established RANSAC solutions.
Sparse PCA via Augmented Lagrangian Methods and Application to Informative Feature Selection

We propose a novel method to select informative object features using a more efficient algorithm called Sparse PCA. First, we show that using a large-scale multiple-view object database, informative features can be reliably identified from a high-dimensional visual dictionary by applying Sparse PCA on the histograms of each object category. Our experiment shows that the new algorithm improves recognition accuracy compared to the traditional BoW methods and SfM methods. Second, we present a new solution to Sparse PCA as a semidefinite programming problem using the Augmented Lagrangian Method.

Source code in MATLAB:
d-Oracle: Distributed Object Recognition via a Camera Wireless Net

Harnessing the multiple-view information from a wireless camera sensor network to improve the recognition of objects or actions.

Berkeley Multiview Wireless (BMW) database now available!
d-WAR: Distributed Wearable Action Recognition

We propose a distributed recognition method to classify human actions using a low-bandwidth wearable motion sensor network. Given a set of pre-segmented motion sequences as training examples, the algorithm simultaneously segments and classifies human actions, and it also rejects outlying actions that are not in the training set. The classification runs in a distributed fashion across individual sensor nodes and a base station computer. Using up to eight body sensors, the algorithm achieves state-of-the-art 98.8% accuracy on a set of 12 action categories. We further demonstrate that the recognition precision degrades gracefully when smaller subsets of sensors are used, which validates the robustness of the distributed framework.

Wearable Action Recognition Database (WARD) ver 1.0 available for download.
Image Analysis and Segmentation via Lossy Data Compression

We cast natural-image segmentation as a problem of clustering texture features as multivariate mixed data. We model the distribution of the texture features using a mixture of Gaussian distributions. Unlike most existing clustering methods, we allow the mixture components to be degenerate or nearly-degenerate. We contend that this assumption is particularly important for mid-level image segmentation, where degeneracy is typically introduced by using a common feature representation for different textures in an image. We show that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach.
Feature Selection in Face Recognition: A Sparse Representation Perspective

Formulating the problem of face recognition under the emerging theory of compressed sensing, we examine the role of feature selection/dimensionality reduction from the perspective of sparse representation. Our experiments show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical is whether the number of features is sufficient and whether the sparse representation is correctly found.
Robust Algebraic Segmentation of Mixed Rigid-Body and Planar Motions in Two Views

We study segmentation of multiple rigid-body motions in a 3-D dynamic scene under perspective camera projection. Based on the well-known epipolar and homography constraints between two views, we propose a hybrid perspective constraint (HPC) to unify the representation of rigid-body and planar motions. Given a mixture of K hybrid perspective constraints, we propose an algebraic process to partition image correspondences to the individual 3-D motions, called Robust Algebraic Segmentation (RAS). We conduct extensive simulations and real experiments to validate the performance of the new algorithm. The results demonstrate that RAS achieves notably higher accuracy than most existing robust motion segmentation methods, including random sample consensus (RANSAC) and its variations. The implementation of the algorithm is also two to three times faster than the existing methods. We will make the implementation of the algorithm and the benchmark scripts available on our website.

Generalized Principal Component Analysis (GPCA)

An algebraic framework for modeling and segmenting mixed data using a union of subspaces, a.k.a. subspace arrangements. The statistical implementation of the framework is robust to data noise and outliers.

Symmetry-based 3-D Reconstruction from Perspective Images

We investigated a unified framework to extract poses and structures of 2-D symmetric patterns from perspective images. The framework uniformly encompasses all three fundamental types of symmetry: Reflection, Rotation, and Translation, based on a systematic study of the homography groups in the image induced by the symmetry groups in space.

We claim the following principle: If a planar object admits rich enough symmetry, no 3-D geometric information is lost through perspective imaging.


A unified robot motion interface and tele-communication protocols for controlling arms, bases, and androids.
Copyright (c) Honda Research, Mountain View, CA.

Current Students

Graduated Students

(Last Modified on June 8, 2020)