Banghua Zhu

PhD Candidate,
Department of Electrical Engineering and Computer Sciences,
University of California, Berkeley
Office: 264 Cory Hall, Berkeley, CA
Email: banghua [@] berkeley [DOT] edu
Google Scholar

About me

I'm a final-year Ph.D. student at the Department of EECS, University of California, Berkeley. I am very fortunate to be advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I'm a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.

I'm affiliated with Berkeley AI Research (BAIR), the Berkeley Laboratory for Information and System Sciences (BLISS), and the Center for the Theoretical Foundations of Learning, Inference, Information, Intelligence, Mathematics and Microeconomics at Berkeley (CLIMB). Prior to Berkeley, I received a B.S. in Electronic Engineering from Tsinghua University in 2018. In 2017, I spent a wonderful summer at Stanford, working with Prof. David Tse. From Jan 2022 to Nov 2022, I worked as a student researcher at Google Robotics. From Mar 2023 to July 2023, I worked as a research intern on the Knowledge and Language Team at Microsoft Research.

I'm on the 2023-2024 academic job market!


I work on statistics, information theory, and machine learning, with applications to foundation models, game theory, robust statistics, reinforcement learning, and human-AI interactions.

Check out our recent open 7B model, Starling-7B, which ranks first among all existing 7B models according to human evaluation in Chatbot Arena! Starling-7B is trained on our open-source, high-quality preference dataset, Nectar, using our new reward training and policy-finetuning algorithms.

We have also open-sourced NexusRaven-V2-13B, a function-calling model that surpasses GPT-4 in generic function-calling tasks, along with our Hugging Face function-calling leaderboard.

Main research interests:

  • Foundation models, especially large language models (LLMs).

    • Finetuning of LLMs: In my research on reinforcement learning with human feedback (RLHF), I identify the fundamental limits of reward training and develop near-optimal algorithms with improved sample complexity [ZJJ23]. I also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23]. These advancements enable more efficient and reliable model fine-tuning.

    • Inference in LLMs: I analyze and propose near-optimal algorithms for caching and model multiplexing for serving large models, significantly enhancing the efficiency of inference in LLMs [ZSZBJJ23].

  • Human-AI interactions.

    • ML w/ Helpful Human: I focus on designing advanced human-in-the-loop AI systems that learn from and collaborate with humans. Using a theoretical framework based on Stackelberg game theory, my research explores how AI benefits from interactions with human experts, contributing to a deeper understanding of these domains [ZZJJ23].

    • ML w/ Strategic Human: My research explores the interaction between machine learning systems and self-interested, strategic humans, a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, I provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].

    • ML w/ Malicious Human: I explore techniques to enhance the resilience of AI models against malicious attacks. I extend the theory of high-dimensional robust statistics [ZJS22] and propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, robust linear regression [ZJS21], and Byzantine-robust distributed learning [ZPWWJSJ23].

Besides the topics above, I am also actively working on the following research areas:

  • Bandit and Reinforcement Learning: I study online and offline learning, off-policy evaluation, and inverse RL [RZMJR21, MZJW22];

  • Information-theoretic Lower Bounds: I investigate achieving fundamental limits in noisy searching, sorting, and computing tasks using information-theoretic tools [WGZW22, ZWGJW23];

  • Semi-Supervised and Unsupervised Learning: I design doubly-robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23]. Furthermore, I conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].


1. 2023 David J. Sakrison Memorial Prize

2. 1st place, 2019 Stanford Citadel Datathon

3. Berkeley EECS Award for Undergraduate Researcher Mentoring

4. Berkeley EECS Department Award

5. Beijing Excellent Undergrad Award

6. Qualcomm Scholarship