Banghua Zhu


Incoming Assistant Professor of Electrical and Computer Engineering at the University of Washington,
Adjunct Professor of Computer Science and Engineering at the University of Washington,
Cofounder, Nexusflow AI
Email: banghua [@] uw [DOT] edu
Google Scholar
Twitter

About me

I'm an incoming assistant professor at UW ECE, with an adjunct appointment in CSE. I received my PhD from the Department of EECS at UC Berkeley, where I was fortunate to be advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I'm a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.

I co-founded Nexusflow AI in 2023. We are dedicated to providing reliable LLM solutions for enterprise use cases. Please feel free to contact me if you are interested in collaborations!

I'm recruiting PhD students in the 2024-2025 cycle! Please mention my name in your application to UW ECE / CSE if you'd like to work with me on LLMs, game theory, and/or reinforcement learning!

Research

I'm currently interested in the training, serving, evaluation, and applications of foundation models. I have also been working on statistics, information theory, and machine learning, with applications to game theory, robust statistics, reinforcement learning, and human-AI interactions.

Training Large Language Models

Check out our open 7B model, Starling-7B, which ranks first among all Mistral-based 7B models according to human evaluation on Chatbot Arena! Starling-7B is trained with our open-source, high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms.

We also released Athene-70B, currently ranked 11th among all models on Chatbot Arena. Fine-tuned from Llama-3-70B, the model climbed from 22nd to 11th on the Arena through our unique data curation and post-training pipeline.

We have open-sourced NexusRaven-V2-13B, a strong function-calling model that surpasses GPT-4 on complex function-calling tasks, especially parallel and nested calls.

Evaluation

We have released the Hugging Face function-calling leaderboard, which is used in the Llama-3.1 technical report to evaluate function-calling capabilities.

Chatbot Arena is one of the most reliable platforms for evaluating models with human preferences.

Arena-Hard-Auto is an automatic benchmark-creation pipeline that uses LLM-as-a-judge to quickly evaluate model performance.

Main research interests:

  • Foundation models, especially large language models (LLMs).

    • Fine-tuning and Evaluation of LLMs: In my research on reinforcement learning from human feedback (RLHF), I identify the fundamental limits of reward training and develop near-optimal algorithms with improved sample complexity [ZJJ23]. I also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23]. These advances enable more efficient and reliable model fine-tuning.

    • Inference in LLMs: I analyze caching and model multiplexing for serving large models and propose near-optimal algorithms, significantly enhancing the efficiency of LLM inference [ZSZBJJ23].

  • Human-AI interactions.

    • ML w/ Helpful Human: I focus on designing advanced human-in-the-loop AI systems that learn from and collaborate with humans. Using a theoretical framework based on Stackelberg game theory, my research explores how AI benefits from interactions with human experts, contributing to a deeper understanding of these domains [ZZJJ23].

    • ML w/ Strategic Human: My research explores the interaction between machine learning systems and self-interested, strategic humans—a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, I provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].

    • ML w/ Malicious Human: I explore techniques to enhance the resilience of AI models against malicious attacks. I extend the theory of high-dimensional robust statistics [ZJS22], and propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, robust linear regression [ZJS21], and Byzantine-robust distributed learning [ZPWWJSJ23].
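To make the reward-training step of RLHF concrete, here is a minimal pure-Python sketch of the standard Bradley-Terry preference loss, fit on pairwise comparisons with plain gradient descent. The linear features, dimensions, and learning rate are illustrative placeholders, not details from the cited papers.

```python
import math
import random

def bt_loss(margins):
    """Bradley-Terry negative log-likelihood: mean of log(1 + exp(-margin)),
    where margin = r(chosen) - r(rejected) for each preference pair."""
    return sum(math.log1p(math.exp(-m)) for m in margins) / len(margins)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy linear reward model r(x) = <w, x> over hypothetical 4-d features.
random.seed(0)
dim, n = 4, 64
chosen = [[random.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(n)]
rejected = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
w = [0.0] * dim

for _ in range(300):
    grad = [0.0] * dim
    for c, r in zip(chosen, rejected):
        m = dot(w, c) - dot(w, r)
        s = 1.0 / (1.0 + math.exp(m))  # sigmoid(-margin): weight of this pair
        for j in range(dim):
            grad[j] -= s * (c[j] - r[j])
    # Gradient descent on the averaged Bradley-Terry loss.
    w = [wj - 0.5 * gj / n for wj, gj in zip(w, grad)]

margins = [dot(w, c) - dot(w, r) for c, r in zip(chosen, rejected)]
final_loss = bt_loss(margins)  # below log(2), the loss of the zero model
```

After training, preferred responses score higher on average, which is exactly the property the reward model needs before the policy-optimization stage.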

Besides the topics above, I am also actively working on the following research areas:

  • Bandit and Reinforcement Learning: I study online and offline learning, off-policy evaluation, and inverse RL [RZMJR21, MZJW22];

  • Information-theoretic Lower Bounds: I investigate the fundamental limits of noisy searching, sorting, and computing tasks using information-theoretic tools [WGZW22, ZWGJW23];

  • Semi-Supervised and Unsupervised Learning: I design doubly-robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23]. Furthermore, I conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].
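As a sketch of the doubly-robust idea behind the self-training work above: average a (possibly biased) model's pseudo-labels over all examples, then add a bias-correction term computed on the labeled subset. The estimate stays accurate as long as the labeled subset is sampled at random, even when the model itself is poor. All numbers and names below are illustrative, not from the cited paper.

```python
import random

def doubly_robust_mean(pseudo_all, labeled_pairs):
    """Doubly-robust estimate of the mean label.

    pseudo_all: model predictions f(x) for every example, labeled or not.
    labeled_pairs: (f(x), y) pairs for a randomly sampled labeled subset.
    The correction term cancels the model's bias on average.
    """
    prior = sum(pseudo_all) / len(pseudo_all)
    correction = sum(y - f for f, y in labeled_pairs) / len(labeled_pairs)
    return prior + correction

random.seed(1)
true_mean = 2.0
ys = [random.gauss(true_mean, 1.0) for _ in range(20000)]
# A deliberately biased "model": systematically overshoots by about 0.7.
preds = [y + 0.7 + random.gauss(0.0, 0.3) for y in ys]
labeled_idx = random.sample(range(len(ys)), 2000)

naive = sum(preds) / len(preds)  # inherits the model's ~0.7 bias
dr = doubly_robust_mean(preds, [(preds[i], ys[i]) for i in labeled_idx])
```

The naive pseudo-label average stays off by roughly the model's bias, while the doubly-robust estimate lands near the true mean using only a small labeled subset.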