Aditya Parameswaran

I am an Associate Professor at the University of California, Berkeley, with a joint appointment at the I School and EECS. I co-founded and serve as the President of Ponder. I co-direct the EPIC Data Lab. I am part of the Data Systems & Foundations and Human-Computer Interaction groups.

My research interests are broadly in building tools for simplifying data science at scale, i.e., empowering individuals and teams to leverage and make sense of their large datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students or research/development staff to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to the EECS or I School PhD programs. If you are an MS or UG student, feel free to fill out this form: it is rare that we will work with students outside UC Berkeley except in cases of unusually good fit.

Formal Biographical Sketch

Aditya Parameswaran is an Associate Professor in the School of Information (I School) and Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Aditya co-directs the EPIC Data Lab, a lab targeted at low/no-code data tooling with a special emphasis on social justice applications. Aditya also serves as the President of Ponder, a company he co-founded with his students based on popular data science tools developed at Berkeley. Aditya is affiliated with the RISELab, the Berkeley Institute of Design, and the Berkeley Institute of Data Science — and is part of the Data Systems & Foundations and Human-Computer Interaction groups at Berkeley. Aditya develops human-centered tools for scalable data science — making it easy for end-users and teams to leverage and make sense of their large and complex datasets — by synthesizing techniques from data systems and human-computer interaction. His visualization and data exploration tools have been downloaded and used by millions of users in a variety of domains.

Click here for a longer bio.

News

  • September 16, 2022: Shreya and Rolando's paper on lessons from interviewing ML engineers on MLOps practices and challenges has been posted on arXiv.
  • September 13, 2022: We formally unveiled our new lab, the EPIC Data Lab. Here is an article from CDSS about the lab. We are thankful to our current sponsors, Microsoft, Google, and Sigma Computing, along with the National Science Foundation.
  • September 1, 2022: Our paper on NBSlicer (led by Shreya and Stephen) and a vision paper on MLOps (led by Shreya) were both accepted at VLDB'23!
  • August 18, 2022: Here's our fourth blog post comparing Pandas to SQL.
  • July 26, 2022: Here's our third blog post comparing Pandas to SQL.
  • July 14, 2022: Here's our second blog post comparing Pandas to SQL.
  • June 30, 2022: Our CACM paper on ShapeSearch, the invited extended version of our SIGMOD best paper award winner, is out here.
  • June 28, 2022: Here's our first blog post comparing Pandas to SQL.
  • June 23, 2022: Congrats to Drs. Doris Lee, Devin Petersohn, and Doris Xin on their graduation!
  • June 15, 2022: New paper from Joe and me on our new data engineering class at Cal!
  • June 1, 2022: New paper with CMU folks on extending Lux to remember analysis history in a notebook and use it to bias recommendations towards recent operations or past interests, to appear at EuroVis.
  • May 20, 2022: We released a position paper on the vision behind the Sky lab.
  • May 19, 2022: Startups all around! My former student, Doris Xin, now serves as CEO of a startup, Linea.
  • April 30, 2022: Enjoyed giving a keynote at the SIAM Data Mining Conference.
  • March 9, 2022: Excited to announce the launch of Ponder, a company I co-founded with my students. Ponder raised $7M from top-tier VCs to develop scalable, enterprise-ready data science tools, leveraging our success with Modin and Lux. I am serving as the President of Ponder, while staying on as faculty at Berkeley.
  • February 15, 2022: Modin hit 1.5M downloads this week, with 200K downloads in the last month - grateful to see the groundswell of adoption.
  • February 2, 2022: Delighted to receive the Young Alumni Achiever's award from my alma mater, IIT Bombay.
  • January 6, 2022: I presented a talk on our mission to develop enterprise-ready pandas at CIDR, see recording here.
  • November 18, 2021: Doris presented Lux at the Data Umbrella; see recording here.
  • November 11, 2021: Shreya Shankar presented her work on MLTrace at the Toronto ML Society as well as at Facebook; recording here.
  • October 28, 2021: Doris Lee presented Lux at PyData, while Modin was featured on the Anaconda blog.
  • October 26, 2021: Stephen Macke was interviewed about NBSafety on the Software Engineering Daily podcast.
  • October 1, 2021: Our papers on Lux and Modin were accepted to VLDB 2022!
  • September 15, 2021: Tenure!
  • September 7, 2021: We received a $2M NSF grant to kickstart the EPIC Data Lab, short for Effective Programming, Interaction, and Computation with Data. Articles on EPIC here and here. We're working on the messy data challenges in criminal justice, along with public defenders via the NACDL (National Association of Criminal Defense Lawyers) and journalists via Big Local News/KQED. We're part of a consortium called CLEAN: Community Law Enforcement Accountability Network.
  • September 1, 2021: Delighted to welcome Nithin Chalapathi and Shreya Shankar as new EECS PhD students!
  • August 13, 2021: Doris Lee and Devin Petersohn wrap up their dissertations on visualization recommendation and dataframe systems respectively. Congratulations Dr. Lee and Dr. Petersohn! Exciting times ahead!!
  • August 9, 2021: Devin Petersohn gives his dissertation talk on dataframe systems. Devin's work has laid the groundwork for how to reason about and optimize dataframe computation. Congrats Devin!
  • August 3, 2021: Stephen Macke wraps up his dissertation! Stephen's work on minimizing error while ensuring interactivity in a range of settings has been a joy to be a part of.
  • July 26, 2021: Tana Wattanawaroon defends his dissertation on supporting efficient computation in spreadsheets, titled "Generalizing Spreadsheet Computation for Evolving Spreadsheets at Scale". Congratulations Tana!
  • July 12, 2021: Our new PhD student, Shreya Shankar, is off to the races with MLTrace, a lightweight approach for instrumenting ML code to allow for end-to-end debugging, reproducibility, and introspection. Super exciting stuff; watch this space!
  • July 1, 2021: Our scalable dataframe system, Modin, has over 6000 GitHub stars and over 1M downloads at this point. We're excited that there is so much organic traction and interest!
  • May 14, 2021: Our technical report on Lux is out! Lux has users across a range of industries at this point, including insurance, retail, and education, and thousands of GitHub stars. Our paper introduces our always-on visualization framework, our lightweight intent language, and how we made Lux interactive when operating on large dataframes.
  • April 19, 2021: New Lux release: v0.3.0 supports a relational database backend, geovis, JupyterLab, matplotlib instead of altair, and more! Dashboard export (to streamlit, data pane, HTML, etc.) is forthcoming. More here.
  • April 16, 2021: Doris Xin presents her dissertation talk on "Usable and Efficient Systems for Machine Learning"! Congratulations Dr. Xin! Doris is off to be a CEO at a new startup!
  • April 13, 2021: Doris Lee's paper in collaboration with folks at Tableau Research (Vidya Setlur, Melanie Tory) on developing a taxonomy for visualization recommendation was accepted at VIS 2021 via TVCG!
  • March 26, 2021: Doris Xin's paper in collaboration with Google folks on understanding production machine learning pipelines in TFX (along with my longtime collaborator Alkis Polyzotis and Hui Miao) was accepted as an industry paper at SIGMOD'21!
  • March 15, 2021: Our paper on leveraging think-time for opportunistic dataframe query evaluation was published in the IEEE Data Engineering Bulletin and showcased at ML and Data Projects to know.
  • February 15, 2021: Thrilled to see continued Lux adoption and interest. We had another industry blog post, yet another one, and hit over 700 stars on GitHub.
  • January 15, 2021: Stephen Macke won the CIDR gong show for presenting his take on the next generation of notebooks! Devin Petersohn also presented his work on scalable dataframes.
  • January 10, 2021: Dual VLDB'21 accepts! Our paper on a general-purpose spreadsheet exploration tool, enabling zoom in/out, called NOAH, led by Sajjadur Rahman, was one. Our paper on NBSafety, a Jupyter kernel for safe notebook interactions, led by Stephen Macke, was another.
  • January 6, 2021: A virtual welcome to our new postdoc, Dixin Tang, who is joining us from U Chicago having worked with Aaron Elmore and Mike Franklin.
  • January 1, 2021: I took on the role of the Faculty Equity Advisor at the School of Information. See more here.
        Click here for more news.

Synergistic Activities

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI - join us!

I served as the Faculty Equity Advisor at the School of Information in Spring 2021. I co-chaired Workshops for SIGMOD 2020 and 2021. I served as the US Sponsor Chair for VLDB 2021.

In the recent past, I served as an Area/Associate Chair for HCOMP 2020, VLDB 2020, and SIGMOD 2020, and as a Program Committee member for VLDB Demo 2019 and HILDA 2019 (phew!). I've served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

Past: I stepped down as Associate Editor for SIGMOD Record after nearly half a decade. I co-organized HILDA 2017. I was the SIGMOD 2016 Undergraduate Research Chair.

Selected Projects

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page here.


Modin: A Scalable Dataframe System

Modin applies database and distributed systems ideas to help run dataframe workloads faster, with over 2M open-source downloads.

Project page here.


Helix: An Accelerated Human-in-the-loop Machine Learning System

Helix accelerates the iterative development of machine learning pipelines with a human developer "in the loop" via intelligent assistance and reuse.

Project page here.


DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page: here


Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page: here


Populace: A Suite of Crowd-Powered Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing: an understudied and challenging class.

Project page: here