Aditya Parameswaran

I am an Associate Professor in EECS at the University of California, Berkeley. I co-direct the EPIC Data Lab and the Police Records Access project. I co-founded Ponder, which was acquired by Snowflake in October 2023.

I am broadly interested in simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.

Of late, my team and I have been exploring LLM-powered data tooling: what new approaches to data extraction, data transformation, and insight discovery become possible with LLMs as a key ingredient?



We are always looking for postdocs, PhD, MS, and UG students, or research/development staff, to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to our PhD program. If you are an MS or UG student, feel free to fill out this form; note that we rarely work with UG/MS students outside UC Berkeley, except in cases of unusually good fit.

Formal Biographical Sketch

Aditya Parameswaran is an Associate Professor in Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab focused on low/no-code data tooling powered by LLMs. Aditya also co-directs the Police Records Access project, an initiative to build a first-of-its-kind state-wide police use-of-force and misconduct database. Until its acquisition by Snowflake in October 2023, Aditya served as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya develops next-generation data tooling (making it easy for end-users and teams to leverage and make sense of their large and complex datasets) by synthesizing techniques from data systems, human-computer interaction, and artificial intelligence. Multiple tools from his group (DocETL, IPyFlow, Modin, Lux) have been widely adopted by end-users, with tens of millions of downloads.

Click here for a longer bio.

News

  • April 18, 2025: We released a preprint on a new interface, RAGGY, for debugging RAG pipelines.
  • April 15, 2025: Grad school admission decisions: undergrad alums Ruiying and Reya were admitted to a number of top graduate programs! Reya will be heading to Columbia, while Ruiying will be returning to Berkeley as a grad student!
  • April 15, 2025: We contributed to the once-every-five-years report on the future of database research, take a look!
  • April 7, 2025: Chroma's Generative Benchmarking approach uses Shreya et al.'s EvalGen work.
  • March 1, 2025: Our paper, led by Reya and Shreya, on a benchmark of assertions for LLM pipelines was accepted to NAACL as an oral talk!
  • February 24, 2025: Turns out, with LLMs, your data systems can now become more proactive. Read about our new vision paper here.
  • January 16, 2025: I co-wrote a blog post on challenges in data exploration and visual analytics that appeared on the SIGMOD blog.
  • January 15, 2025: Our preprint on extracting information from templatized documents, led by Yiming, is up! The results are astounding: 20-30% higher precision and recall than state-of-the-art, at a fraction of the cost and time.
  • January 13, 2025: We've been building an IDE for authoring semantic data processing pipelines in DocETL, called DocWrangler. You can play with it in our playground. Led by Shreya and Bhavya. Blogpost here.
  • January 4, 2025: Call for papers at the first Data-AI Systems workshop is out.
  • November 12, 2024: Shreya presented our paper on the three V's of MLOps at CSCW. Paper here.
  • November 1, 2024: We released TARGET, a table retrieval benchmark, led by Madelon. See here.
  • October 22, 2024: I appeared on a podcast called Disseminate and spoke about our CIDR'11 paper, one of the most influential database papers from that year according to a ranking from Ryan Marcus. Listen here.
  • October 16, 2024: Our DocETL preprint is out! Lots of people on GitHub and Discord are getting value from it already, spanning domains ranging from forensic analysis to climate science to medical data analysis. Over 1,000 GitHub stars already. Excited to have this out there.
  • September 20, 2024: Sep's NUDGE framework is now part of LlamaIndex.
  • September 20, 2024: Rachel's paper on RequestAtlas was accepted at CSCW'2025!
  • September 16, 2024: Excited to see pandas-to-SQL translation that we pioneered at Ponder now generally available to all Snowflake customers. Congrats Ponder team!
  • September 5, 2024: Sep's paper on lightweight fine-tuning of retrieval for RAG is now on arXiv. Blog post here. This is a no-brainer for folks wanting to improve retrieval without paying the cost of expensive fine-tuning!
  • September 5, 2024: We restarted our blog! See here.
  • September 4, 2024: Congrats to incoming PhD student Bhavya Chopra for a VL/HCC best paper (her second in two years!)
  • September 1, 2024: Congrats to alum Madelon Hulsebos for starting as faculty at CWI, a leading research institution in Amsterdam!
  • August 29, 2024: Shreya presented her work on EvalGen and SPADE at Princeton.
  • August 15, 2024: Welcome to incoming PhD students Bhavya Chopra and HC Moore! And goodbye to Prof. Eugene Wu from Columbia, who was visiting us this past year - we'll miss you Eugene!
  • July 23, 2024: Our work on translating dataframes to databases was awarded two patents, see here and here.
  • July 15, 2024: Our work on transactional panorama won a "best of VLDB 2023" award; congrats Dixin!
  • July 1, 2024: Shreya's EvalGen paper was accepted at UIST'24!
  • June 26, 2024: Shreya's work on evaluation assistants was deployed by LangChain! Read more here. She also wrote a piece with other LLM practitioners on best practices with LLMs here.
  • June 1, 2024: Our papers on SMASH (a string alignment algorithm) and SPADE (an LLM-powered assertion generation system) were accepted at VLDB! Five-letter acronyms FTW!
  • May 5, 2024: Our paper on dataset search, led by Madelon, was accepted at HILDA! Preprint here.
  • May 1, 2024: New preprint up on document analytics with LLMs as part of our new ZenDB system, led by Yiming. Read about it here!
  • April 28, 2024: Shreya summarized her work on evaluating LLM pipelines, from SPADE to EvalGen, as part of an MLSys seminar. Listen here!
        Click here for more news.

Synergistic Activities

I serve on the steering committees of the Data AI Systems Workshop at ICDE, HILDA (Human-in-the-loop Data Analytics) at SIGMOD, and DSIA (Data Systems for Interactive Analysis) at VIS. There is a lot of excitement around this nascent area at the intersection of AI, databases, data mining, and visualization/HCI; join us! I currently serve as the Faculty Equity Advisor and the Space Lead for Systems for the Computer Science Division in EECS; I also served as the Faculty Equity Advisor at the School of Information for two terms, in 2021 and 2023.

I am serving as an Area Chair for SIGMOD 2026 and VLDB 2025 Demo. I also serve as an Associate Editor for the VLDB Journal. I am serving on the program committees for VLDB Tutorials 2025, VLDB 2024-25, and CIDR 2025. I have served on the program committees and as Area Chair/Editor of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

Selected Projects

Motion

LLMs In Production

Supporting and sustaining LLMs in production, including building robust pipelines, identifying valuable constraints/assertions, evaluating performance, and maintaining state for long-running pipelines.


data-agent

Question Answering with LLMs

Building general-purpose question-answering systems with LLM-powered agents operating on structured data.


ZenDB

LLM-Powered Document Analytics

Supporting structured queries on unstructured data, including PDFs, with applications in social justice.


lux

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page here.


modin

Modin: A Scalable Dataframe System

Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.

Project page here.


nbsafety

NBTools: Better Computational Notebooks

NBsafety and NBslicer make it easy for data scientists to write correct, reproducible code in computational notebooks.

Project page here.


dataspread

DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page here.