I am a Professor of Computer Science at UC Berkeley, specializing in large-scale data management infrastructure and applications (these days called "Big Data"). I work primarily in the Database (DB) and Operating Systems and Networking Technology (OSNT) areas. I am Director of the Algorithms, Machines and People Lab (AMPLab), an industry- and government-supported collaboration of students, postdocs, and faculty who specialize in data management, cloud computing, statistical machine learning, and other topics needed to make sense of vast amounts of varied and unruly data. The AMPLab received a National Science Foundation CISE "Expeditions in Computing" award, which was announced as part of the White House Big Data Research initiative in March 2012.

A brief bio and photo for talk announcements and other PR can be found here.

Research Topics

  • Cloud-Computing/Distributed Systems
  • Mobile and Pervasive Computing
  • Data Streams/Continuous Analytics
  • Large-Scale Data Integration
  • Database System Architecture and Performance

Contact Information

Computer Science Division, EECS
465 Soda Hall #1776
Berkeley, CA 94720

Email: my address
Phone: (510) 642-1662 (voice mail only)
Fax: (510) 642-5615
Administrative Assistants:
  Kattt Atchley and Boban Zarkovich
  Phone: (510) 643-3499 and 643-0264
  Email: amp-admin@cs.berkeley.edu

Grant Assistant: Damon Hinson
  Phone: (510) 642-9982
  Email: AA address

Office Hours:
  By appointment.

Research Projects

  • AMPLab - Algorithms, Machines & People
  • BDAS - The Berkeley Data Analytics Stack
  • Spark - Data-Intensive Cluster Computing
  • MLbase and KeystoneML - Distributed Machine Learning for the Masses
  • SampleClean and AMPCrowd - Crowd-powered Data Cleaning and Human Computation
  • Velox - Low-latency Model and Data Serving

  • GraphX - Scalable Graph Processing
  • Shark - SQL + Machine Learning at Scale
  • CrowdDB - Crowdsourced Query Processing
  • PIQL - The Performance-Insightful Query Language
  • SCADS - Scalable Distributed Storage
  • Dataspaces - Pay-as-you-go Data Integration
  • BayesStore - Probabilistic Databases
  • RADLab - Cloud Computing
  • HiFi - Distributed Stream Processing
  • TelegraphCQ - Stream Processing
  • TinyDB - Sensor Networks
  • YFilter - High-volume Data Dissemination

Related Links