Aditya Parameswaran

I am an Associate Professor in EECS at the University of California, Berkeley. I co-direct the EPIC Data Lab and the California Police Records Access project. I co-founded Ponder, which was acquired by Snowflake in October 2023. I am part of the Data Systems & Foundations and Human-Computer Interaction groups, and I am affiliated with the Berkeley Institute for Data Science.

My research interests are broadly in building tools for simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students or research/development staff to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to our PhD program. If you are an MS or UG student, feel free to fill out this form: it is rare that we will work with UG/MS students outside UC Berkeley except in cases of unusually good fit.

Formal Biographical Sketch

Aditya Parameswaran is an Associate Professor in Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab targeted at low/no-code data tooling with a special emphasis on social justice applications. Aditya also co-directs the California Police Records Access project, an initiative to build a first-of-its-kind state-wide police use-of-force and misconduct database. Until its acquisition by Snowflake in October 2023, Aditya served as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya is affiliated with the Berkeley Institute of Design and the Berkeley Institute for Data Science, and is part of the Data Systems & Foundations and Human-Computer Interaction groups at Berkeley. Aditya develops human-centered tools for scalable data science, making it easy for end-users and teams to leverage and make sense of their large and complex datasets, by synthesizing techniques from data systems and human-computer interaction. His visualization and data exploration tools have been downloaded and used by millions of users in a variety of domains.

Click here for a longer bio.

News

  • June 26, 2024: Shreya's work on evaluation assistants was deployed by LangChain! Read more here. She also wrote a piece with other LLM practitioners on best practices with LLMs here.
  • June 1, 2024: Our papers on SMASH (a string alignment algorithm) and SPADE (an LLM-powered assertion generation system) were accepted at VLDB! Five-letter acronyms FTW!
  • May 5, 2024: Our paper on dataset search, led by Madelon, was accepted at HILDA! Preprint here.
  • May 1, 2024: New preprint up on document analytics with LLMs as part of our new ZenDB system, led by Yiming. Read about it here!
  • April 28, 2024: Shreya summarized her work on evaluating LLM pipelines, from SPADE to EvalGen, as part of an MLSys seminar. Listen here!
  • April 24, 2024: Devin gave a nice talk at CMU on our journey through dataframe land, from Berkeley to Ponder and now to Snowflake. Listen here!
  • April 22, 2024: Our work on building a UI for evaluating LLM pipelines was just released as a preprint.
  • April 1, 2024: Our demo on Motion was accepted to SIGMOD!
  • April 1, 2024: Welcome to Tarak Shah, who is joining us from the Human Rights Data Analysis Group to work on the police records project.
  • March 1, 2024: Welcome to our newest postdoc, Sep Zeighami, joining us from USC!
  • February 28, 2024: Our work on LLM-based extraction of information from police misconduct PDFs is having real impact; check out this news story from Stockton.
  • February 14, 2024: Two of three organizers for the DEEM workshop this year work in our group, so look out for an exciting event!
  • January 16, 2024: New preprint on automatic assertion generation for LLM pipelines, led by Shreya!
  • January 13, 2024: We presented our work on prompt engineering-meets-crowdsourcing at CIDR'24!
Click here for more news.

Synergistic Activities

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI - join us! I also served as the Faculty Equity Advisor at the School of Information for two terms in 2023 and 2021.

In the recent past, I was a co-chair of Workshops for SIGMOD 2020 and 2021. I served as the US Sponsor Chair for VLDB 2021, as an Area/Associate Chair for HCOMP 2020, VLDB 2020, and SIGMOD 2020, and as a Program Committee member for VLDB Demo 2019 and HILDA 2019 (phew!). I've served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times. I am serving on the program committee for VLDB 2023-2024.

Selected Projects

Motion

LLMs In Production

Supporting and sustaining LLMs in production, including building robust pipelines, identifying valuable constraints/assertions, evaluating performance, and maintaining state for long-running pipelines.


ZenDB

LLM-Powered Document Analytics

Supporting structured queries on unstructured data, including PDFs, with applications in social justice.


lux

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page here.


modin

Modin: A Scalable Dataframe System

Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.

Project page here.


nbsafety

NBTools: Better Computational Notebooks

NBsafety and NBslicer make it easy for data scientists to write correct, reproducible code in computational notebooks.

Project page here.


dataspread

DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page here.