Postdoctoral Scholar
EECS, UC Berkeley
Email: sjtu.haozhang AT gmail.com
I am currently a postdoctoral researcher at RISE Lab, UC Berkeley, working with Prof. Ion Stoica.
I have recently been working on building end-to-end composable and automated systems for large-scale distributed DL. My most recent project is Alpa, which automates model-parallel training on large-scale distributed GPU/TPU clusters.
I study large-scale distributed ML in the joint context of ML and systems, concerning both performance and usability. My work spans parallel ML programmability, representations of parallelism, performance optimization, system architectures, auto-parallelization techniques, and AutoML, with applications in computer vision, natural language processing, and healthcare.
I completed my Ph.D. at the School of Computer Science, Carnegie Mellon University. My advisor was Prof. Eric Xing. My Ph.D. thesis is "Machine Learning Parallelism Could Be Adaptive, Composable and Automated". Several of my works, including Poseidon, Cavs, GeePS, and Alpa, are part of the CASL project and the Ray project, and are now being commercialized at multiple start-ups including Petuum and AnyScale.
News: