Aditya Parameswaran

I am an Associate Professor at the University of California, Berkeley, with a joint appointment at the I School and EECS. I co-founded and serve as the President of Ponder. I co-direct the EPIC Data Lab. I am part of the Data Systems & Foundations and Human-Computer Interaction groups, and I am affiliated with the Berkeley Institute of Data Science.

My research interests are broadly in building tools for simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students, as well as research/development staff, to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to the EECS or I School PhD programs. If you are an MS or UG student, feel free to fill out this form; note that we rarely work with UG/MS students from outside UC Berkeley except in cases of an unusually good fit.

Formal Biographical Sketch

Aditya Parameswaran is an Associate Professor in the School of Information (I School) and Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab targeted at low/no-code data tooling with a special emphasis on social justice applications. Aditya also serves as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya is affiliated with the RISELab, the Berkeley Institute of Design, and the Berkeley Institute of Data Science — and is part of the Data Systems & Foundations and Human-Computer Interaction groups at Berkeley. Aditya develops human-centered tools for scalable data science — making it easy for end-users and teams to leverage and make sense of their large and complex datasets — by synthesizing techniques from data systems and human-computer interaction. His visualization and data exploration tools have been downloaded and used by millions of users in a variety of domains.

Click here for a longer bio.

News

  • May 31, 2023: Dixin Tang, a postdoc working with me, will be an assistant professor at the University of Texas at Austin. Go Dixin!
  • May 31, 2023: Dixin and Fanchao's visualization tool for spreadsheet computation networks was accepted at VLDB'23.
  • May 24, 2023: I enjoyed appearing on The Data Stack Show, talking about why decoupling APIs from execution engines is important.
  • May 18, 2023: Our work with journalists and public defenders as part of the EPIC Data Lab was highlighted in an LA Times article about the new College of Computing, Data Science, and Society.
  • May 11, 2023: I enjoyed serving on a panel on AI-meets-databases at the NorCal DB day.
  • May 1, 2023: Our paper on GATE, an efficient, automated, and precise system for ML-centric data validation, is up on arXiv.
  • March 27, 2023: Our work with journalists as part of the EPIC Data Lab is having real impact: we helped journalists determine that the response to a FOIA request was inadequate, as detailed in this news story.
  • March 17, 2023: Modin now supports NumPy in addition to Pandas; we can push down linear algebra operations to distributed computing engines.
  • March 6, 2023: I wrote a rejoinder to the "Big Data is Dead" blogpost: read it here.
  • March 4, 2023: Delighted to be inducted as a Kavli Fellow by the National Academy of Sciences at the US Frontiers of Science Symposium in Irvine.
  • February 10, 2023: Dixin's paper on compaction of spreadsheet graphs was accepted at ICDE!
  • February 9, 2023: Enjoyed speaking at the Data Dividend, an event hosted by The Economist and IBM targeted at "harnessing the power of people, processes and technology to unlock value from data."
  • January 13, 2023: Dixin's paper on transactional panorama — bridging transactions and interactive dashboards — was accepted at VLDB!
  • January 1, 2023: What a year for open-source efforts! Modin reached 5M downloads, Lux reached 500K downloads, and Nbsafety reached 150K downloads.
  • December 1, 2022: Modin now ships as part of the AWS SDK for Pandas as well as AWS Glue — showcasing how useful Modin is for data wrangling and ETL with the pandas API at scale.
  • November 29, 2022: Congrats to my former student, Doris Lee, on being named to the Forbes 30 Under 30 list in Enterprise Technology!
  • September 16, 2022: Shreya and Rolando's paper on lessons from interviewing ML engineers about MLOps practices and challenges has been posted on arXiv.
  • September 13, 2022: We formally unveiled our new lab, the EPIC Data Lab. Here is an article from CDSS about the lab. We are thankful to our current sponsors, Microsoft, Google, and Sigma Computing, along with the National Science Foundation.
  • September 1, 2022: Our paper on NBSlicer (led by Shreya and Stephen) and a vision paper on MLOps (led by Shreya) were both accepted at VLDB'23!
  • August 18, 2022: Here's our fourth blog post comparing Pandas to SQL.
  • July 26, 2022: Here's our third blog post comparing Pandas to SQL.
  • July 14, 2022: Here's our second blog post comparing Pandas to SQL.
  • June 30, 2022: Our CACM paper on ShapeSearch, the invited extended version of our SIGMOD best-paper-award-winning paper, is out here.
  • June 28, 2022: Here's our first blog post comparing Pandas to SQL.
  • June 23, 2022: Congrats to Drs. Doris Lee, Devin Petersohn, and Doris Xin on their graduation!
  • June 15, 2022: New paper from Joe and me on our new data engineering class at Cal!
  • June 1, 2022: New paper with CMU folks on extending Lux to remember analysis history in a notebook and use it to bias recommendations toward recent operations or past interests, to appear at EuroVis.
  Click here for more news.

Synergistic Activities

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. There is lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI. Join us!

I also serve as the Faculty Equity Advisor at the School of Information.

In the recent past, I was a Workshops co-chair for SIGMOD 2020 and 2021. I served as the US Sponsor Chair for VLDB 2021; I also served as an Area/Associate Chair for HCOMP 2020, VLDB 2020, and SIGMOD 2020, and as a Program Committee member for VLDB Demo 2019 and HILDA 2019 (phew!). I've served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times. I am serving on the program committee for VLDB 2023-2024.

Selected Projects

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page here.

Modin: A Scalable Dataframe System

Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.

Project page here.

NBTools: Better Computational Notebooks

NBSafety and NBSlicer make it easy for data scientists to write correct, reproducible code in computational notebooks.

Project page here.

Helix: An Accelerated Human-in-the-loop Machine Learning System

Helix accelerates the iterative development of machine learning pipelines with a human developer "in the loop" via intelligent assistance and reuse.

Project page here.

DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page here.

Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page here.

Populace: A Suite of Crowd-Powered Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing, an understudied and challenging class of problems.

Project page here.