I am an Associate Professor in EECS at the University of California, Berkeley. I co-direct the EPIC Data Lab and the Police Records Access project. I co-founded Ponder, which was acquired by Snowflake in October 2023.
I am broadly interested in simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.
Of late, my team and I have been exploring LLM-powered data tooling: what new approaches to data extraction, data transformation, and insight discovery become possible with LLMs as a key ingredient?
We are always looking for postdocs, PhD, MS, and UG students or research/development staff to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to our PhD program. If you are an MS or UG student, feel free to fill out this form: it is rare that we will work with UG/MS students outside UC Berkeley except in cases of unusually good fit.
Aditya Parameswaran is an Associate Professor in Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab targeted at low/no-code data tooling powered by LLMs. Aditya also co-directs the Police Records Access project, an initiative to build a first-of-its-kind state-wide police use-of-force and misconduct database. Until its acquisition by Snowflake in October 2023, Aditya served as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya develops next-generation data tooling — making it easy for end-users and teams to leverage and make sense of their large and complex datasets — by synthesizing techniques from data systems, human-computer interaction, and artificial intelligence. Multiple tools from his group (DocETL, IPyFlow, Modin, Lux) have been widely adopted by end-users, with tens of millions of downloads.
Click here for a longer bio.
I serve on the steering committees of Data AI Systems Workshop @ICDE, HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of AI, databases, data mining, and visualization/HCI - join us! I currently serve as the Faculty Equity Advisor and the Space Lead for Systems for the Computer Science Division in EECS; I also served as the Faculty Equity Advisor at the School of Information for two terms in 2023 and 2021.
I am serving as an Area Chair for SIGMOD 2026 and VLDB 2025 Demo. I also serve as an Associate Editor for the VLDB Journal. I am serving on the program committee for VLDB Tutorials 2025, VLDB 2024-25, and CIDR 2025. I've served on the Program Committees and as Area Chair/Editor of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.
Supporting and sustaining LLMs in production, including building robust pipelines, identifying valuable constraints/assertions, evaluating performance, and maintaining state for long-running pipelines.
PAPER PromptEvals: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines.
Building general-purpose question-answering systems with LLM-powered agents operating on structured data.
PRE-PRINT Why Do Multi-Agent LLM Systems Fail?
Supporting structured queries on unstructured data, including PDFs, with applications in social justice.
PAPER RequestAtlas: Supporting the Slow and Iterative Process of Requesting Public Records.
Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.
Project page here.
PAPER Lux: Always-on Visualization Recommendations for Exploratory Data Science.
Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.
Project page here.
PAPER Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System.
NBsafety and NBslicer make it easy for data scientists to write correct, reproducible code in computational notebooks.
Project page here.
PAPER Bolt-on, Compact, and Rapid Program Slicing for Notebooks.
DataSpread is a tool that marries the best of databases and spreadsheets.
Project page here.
PAPER Visualizing Spreadsheet Formula Graphs Compactly.