Banghua Zhu
About me

I'm an incoming assistant professor at UW ECE, with an adjunct appointment in CSE. I received my PhD from the Department of EECS, UC Berkeley, where I was very fortunate to be advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I'm a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research. In 2023 I co-founded Nexusflow AI, where we are dedicated to providing reliable LLM solutions for enterprise use cases.

Please feel free to contact me if you are interested in collaborating! I'm recruiting PhD students in the 2024-2025 cycle. Please mention my name in your application to UW ECE / CSE if you'd like to work with me on LLMs, game theory, and/or reinforcement learning!

Research

I'm currently interested in the training, serving, evaluation, and applications of foundation models. I have also been working on statistics, information theory, and machine learning, with applications to game theory, robust statistics, reinforcement learning, and human-AI interaction.

Training Large Language Models

Check out our open 7B model, Starling-7B, which ranks first among all existing Mistral-based 7B models according to human evaluation on Chatbot Arena! Starling-7B is trained on our open-source, high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms.

We also released Athene-70B, currently ranked 11th among all models on Chatbot Arena. Fine-tuned from Llama-3-70B, it rises from 22nd to 11th on the Arena through our unique data curation and post-training pipeline.

We have open-sourced NexusRaven-V2-13B, a strong function-calling model that surpasses GPT-4 on complex function-calling tasks, especially parallel and nested calls.

Evaluation

We have released the Hugging Face function-calling leaderboard, which is used in the Llama-3.1 technical report to evaluate function-calling capabilities. Chatbot Arena is one of the most reliable platforms for evaluating models with human preferences. Arena-Hard-Auto is an automatic benchmark-creation pipeline that uses LLM-as-a-judge to quickly evaluate model performance.

Main Research Interests:
Besides the topics above, I am also actively working on the following research areas: