Aditya Parameswaran

I am an Associate Professor in EECS at the University of California, Berkeley. I co-direct the EPIC Data Lab and the California Police Records Access project. I co-founded Ponder, which was acquired by Snowflake in October 2023. I am part of the Data Systems & Foundations and Human-Computer Interaction groups, and I am affiliated with the Berkeley Institute for Data Science and the Berkeley Institute of Design.

My research interests are broadly in building tools for simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students, as well as research/development staff, to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to our PhD program. If you are an MS or UG student, feel free to fill out this form; note that we rarely work with UG/MS students from outside UC Berkeley, except in cases of unusually good fit.

Formal Biographical Sketch

Aditya Parameswaran is an Associate Professor in Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab focused on low/no-code data tooling with a special emphasis on social justice applications. Aditya also co-directs the California Police Records Access project, an initiative to build a first-of-its-kind state-wide police use-of-force and misconduct database. Until its acquisition by Snowflake in October 2023, Aditya served as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya is affiliated with the Berkeley Institute of Design and the Berkeley Institute for Data Science, and is part of the Data Systems & Foundations and Human-Computer Interaction groups at Berkeley. Aditya develops human-centered tools for scalable data science, making it easy for end-users and teams to leverage and make sense of their large and complex datasets, by synthesizing techniques from data systems and human-computer interaction. His visualization and data exploration tools have been downloaded and used by millions of users in a variety of domains.

Click here for a longer bio.

News

  • November 1, 2024: We released TARGET, a table retrieval benchmark, led by Madelon. See here.
  • October 22, 2024: I appeared on a podcast called Disseminate and spoke about our CIDR'11 paper, one of the most influential database papers from that year according to a ranking from Ryan Marcus. Listen here.
  • October 16, 2024: Our DocETL preprint is out! Lots of people on GitHub and Discord are getting value from it already, spanning domains ranging from forensic analysis to climate science to medical data analysis. Over 1,000 GitHub stars already. Excited to have this out there.
  • September 20, 2024: Sep's NUDGE framework is now part of LlamaIndex.
  • September 20, 2024: Rachel's paper on RequestAtlas was accepted at CSCW'25!
  • September 16, 2024: Excited to see pandas-to-SQL translation that we pioneered at Ponder now generally available to all Snowflake customers. Congrats Ponder team!
  • September 5, 2024: Sep's paper on lightweight fine-tuning of retrieval for RAG is now on arXiv. Blog post here. This is a no-brainer for folks wanting to improve retrieval without paying the cost of expensive fine-tuning!
  • September 5, 2024: We restarted our blog! See here.
  • September 4, 2024: Congrats to incoming PhD student Bhavya Chopra for a VL/HCC best paper (her second in two years!)
  • September 1, 2024: Congrats to alum Madelon Hulsebos for starting as faculty at CWI, a leading research institution in Amsterdam!
  • August 29, 2024: Shreya presented her work on EvalGen and SPADE at Princeton.
  • August 15, 2024: Welcome to incoming PhD students Bhavya Chopra and HC Moore! And goodbye to Prof. Eugene Wu from Columbia, who was visiting us this past year - we'll miss you, Eugene!
  • July 23, 2024: Our work on translating dataframes to databases was awarded two patents, see here and here.
  • July 15, 2024: Our work on transactional panorama won a "best of VLDB 2023" award; congrats Dixin!
  • July 1, 2024: Shreya's EvalGen paper was accepted at UIST'24!
  • June 26, 2024: Shreya's work on evaluation assistants was deployed by LangChain! Read more here. She also wrote a piece with other LLM practitioners on best practices with LLMs here.
  • June 1, 2024: Our papers on SMASH (a string alignment algorithm) and SPADE (an LLM-powered assertion generation system) were accepted at VLDB! Five-letter acronyms FTW!
  • May 5, 2024: Our paper on dataset search, led by Madelon, was accepted at HILDA! Preprint here.
  • May 1, 2024: New preprint up on document analytics with LLMs as part of our new ZenDB system, led by Yiming. Read about it here!
  • April 28, 2024: Shreya summarized her work on evaluating LLM pipelines, from SPADE to EvalGen, as part of an MLSys seminar. Listen here!
  • April 24, 2024: Devin gave a nice talk at CMU on our journey through dataframe land, from Berkeley to Ponder and now to Snowflake. Listen here!
  • April 22, 2024: Our work on building a UI for evaluating LLM pipelines was just released as a preprint.
  • April 1, 2024: Our demo on Motion was accepted to SIGMOD!
  • April 1, 2024: Welcome to Tarak Shah, who is joining us from the Human Rights Data Advocacy Group for work on the police records project.
  • March 1, 2024: Welcome to our newest postdoc, Sep Zeighami, joining us from USC!
  • February 28, 2024: Our work on LLM-based extraction of information from police misconduct PDFs is having real impact; check out this news story from Stockton.
  • February 14, 2024: Two of three organizers for the DEEM workshop this year work in our group, so look out for an exciting event!
  • January 16, 2024: New preprint on automatic assertion generation for LLM pipelines, led by Shreya!
  • January 13, 2024: We presented our work on prompt engineering-meets-crowdsourcing at CIDR'24!
        Click here for more news.

Synergistic Activities

I serve on the steering committees of the Data AI Systems Workshop @ICDE, HILDA (Human-in-the-loop Data Analytics) at SIGMOD, and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of AI, databases, data mining, and visualization/HCI - join us! I currently serve as the Faculty Equity Advisor for the Computer Science Division in EECS; I also served as the Faculty Equity Advisor at the School of Information for two terms, in 2021 and 2023.

I am serving as an Area Chair for SIGMOD 2026 and VLDB 2025 Demo. I am serving on the program committee for VLDB Tutorials 2025, VLDB 2024-25, and CIDR 2025. I've served on the Program Committees and as Area Chair/Editor of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

Selected Projects

Motion

LLMs In Production

Supporting and sustaining LLMs in production, including building robust pipelines, identifying valuable constraints/assertions, evaluating performance, and maintaining state for long-running pipelines.


ZenDB

LLM-Powered Document Analytics

Supporting structured queries on unstructured data, including PDFs, with applications in social justice.


lux

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page here.


modin

Modin: A Scalable Dataframe System

Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.

Project page here.


nbsafety

NBTools: Better Computational Notebooks

NBsafety and NBslicer make it easy for data scientists to write correct, reproducible code in computational notebooks.

Project page here.


dataspread

DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page here.