
Zhuohan Li

PhD Student
UC Berkeley
zhuohan [at] cs.berkeley.edu


About Me

I am a PhD student in Computer Science at UC Berkeley advised by Ion Stoica. Before that, I received my B.S. in Computer Science from Peking University, advised by Liwei Wang and Di He.

My interests lie at the intersection of machine learning and distributed systems. I draw on insights from both domains to improve the accuracy, efficiency, and interpretability of current machine learning models.

Projects

vLLM: A high-throughput and memory-efficient serving engine for large language models, accelerated with PagedAttention.

Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality.

AlpaServe: Use model parallelism to accelerate deep learning serving, even when models fit on a single GPU.

Alpa: Automate model-parallel training with just a few lines of code.

Education

University of California, Berkeley
Ph.D. in Computer Science
2019 - Present

Peking University
B.S. in Computer Science (Summa Cum Laude)
2015 - 2019

Experience

Google Brain / Google DeepMind
Research Intern
Hosts: Yanping Huang, Yuanzhong Xu, and Zhifeng Chen
May 2021 - Present

Anyscale
Software Engineer Intern
May 2020 - August 2020

Microsoft Research Asia
Research Intern
Hosts: Di He and Tao Qin
June 2017 - March 2019

Publications

  1. Efficient Memory Management for Large Language Model Serving with PagedAttention
    Woosuk Kwon*, Zhuohan Li*, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
    SOSP 2023

  2. FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
    ICML 2023

  3. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
    Wei-Lin Chiang*, Zhuohan Li*, Zi Lin*, Ying Sheng*, Zhanghao Wu*, Hao Zhang*, Lianmin Zheng*, Siyuan Zhuang*, Yonghao Zhuang*, Joseph E. Gonzalez, Ion Stoica, Eric P. Xing

  4. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
    Zhuohan Li*, Lianmin Zheng*, Yinmin Zhong*, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
    OSDI 2023

  5. On Optimizing the Communication of Model Parallelism
    Yonghao Zhuang*, Hexu Zhao*, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
    MLSys 2022

  6. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
    Lianmin Zheng*, Zhuohan Li*, Hao Zhang*, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
    OSDI 2022

  7. Rearchitecting In-Memory Object Stores for Low Latency
    Danyang Zhuo, Kaiyuan Zhang, Zhuohan Li, Siyuan Zhuang, Stephanie Wang, Ang Chen, Ion Stoica
    VLDB 2022

  8. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
    Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
    ICML 2021

  9. Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems
    Siyuan Zhuang*, Zhuohan Li*, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica
    SIGCOMM 2020

  10. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
    Zhuohan Li*, Eric Wallace*, Sheng Shen*, Kevin Lin*, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
    ICML 2020

  11. Fast Structured Decoding for Sequence Models
    Zhiqing Sun*, Zhuohan Li*, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng
    NeurIPS 2019

  12. Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
    Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu
    NeurIPS 2019 Workshop on Machine Learning and the Physical Sciences
    ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations

  13. Hint-Based Training for Non-Autoregressive Machine Translation
    Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu
    EMNLP 2019

  14. Efficient Training of BERT by Progressively Stacking
    Linyuan Gong, Di He, Zhuohan Li, Tao Qin, Liwei Wang, Tie-Yan Liu
    ICML 2019

  15. Towards Binary-Valued Gates for Robust LSTM Training
    Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
    ICML 2018

  * denotes equal contribution.

Tutorials

  1. Welcome to the “Big Model” Era: Techniques and Systems to Train and Serve Bigger Models
    with Hao Zhang, Lianmin Zheng, and Ion Stoica
    ICML 2022 Tutorial

  2. Simple and Automatic Distributed Machine Learning on Ray
    with Hao Zhang, Lianmin Zheng, and Ion Stoica
    KDD 2021 Tutorial