Aditya Parameswaran

I am an assistant professor at the University of California, Berkeley, with a joint appointment at the I School and EECS. I am part of the Data Systems & Foundations group and the Human-Computer Interaction group, and I am affiliated with the RISELab and the Berkeley Institute of Design.

My research interests are broadly in building tools for simplifying data analytics, i.e., empowering individuals and teams to leverage and make sense of their datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students, or research/development staff to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to the EECS or I School PhD programs. If you are an MS or UG student, feel free to fill out this form. Note that we rarely work with students outside UC Berkeley, except in cases of unusually good fit.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in the School of Information (I School) and Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley. Until June 2019, Aditya was an Assistant Professor in Computer Science at the University of Illinois, Urbana-Champaign. He spent a year as a postdoc at MIT CSAIL following his PhD at Stanford University. He develops systems and algorithms for "human-in-the-loop" data analytics, synthesizing techniques from database systems, data mining, and human-computer interaction.

Click here for a longer bio.

News

  • February 15, 2021: Thrilled to see continued Lux adoption and interest. We had another industry blog post, yet another one, and hit over 700 stars on GitHub.
  • January 15, 2021: Stephen Macke won the CIDR gong show for presenting his take on the next generation of notebooks! Devin Petersohn also presented his work on scalable dataframes.
  • January 10, 2021: Dual VLDB'21 accepts! One is our paper on NOAH, a general-purpose spreadsheet exploration tool that enables zooming in and out, led by Sajjadur Rahman. The other is our paper on NBSafety, a Jupyter kernel for safe notebook interactions, led by Stephen Macke.
  • January 6, 2021: A virtual welcome to our new postdoc, Dixin Tang, who is joining us from UChicago after working with Aaron Elmore and Mike Franklin.
  • January 1, 2021: I took on the role of the Faculty Equity Advisor at the School of Information. See more here.
  • December 13, 2020: Our paper studying industry users of AutoML and what next-gen AutoML tools should look like was accepted at CHI'21, led by Doris Xin, Eva Wu, and Doris Lee.
  • December 9, 2020: Doris Lee passed her qualifying exam!
  • November 20, 2020: Much Lux love coming from industry. This LinkedIn post has nearly 8000 shares. This Medium article was written by another industry person to introduce Lux to the general public. Lux was also listed as a "Project to Know" here.
  • November 10, 2020: Our quest for safer notebooks continues with NBSafety, led by Stephen Macke! NBSafety automatically tracks lineage of variables in the notebook, providing suggestions to avoid executions of stale cells that could lead to incorrect or non-reproducible behavior. Lots of heavyweight program analysis techniques leading to a delightful, unobtrusive user experience. NBSafety was first presented at PLATEAU 2020, a PL-HCI workshop.
  • November 4, 2020: Congratulations to Doris Xin for passing her qualifying exam!
  • November 3, 2020: Congratulations to Stephen Macke for defending his thesis!
  • October 23, 2020: Enjoyed writing this perspective piece on the challenges and opportunities of working with domain experts to build interactive visualization tools, appearing in Patterns.
  • October 10, 2020: We've poured all of our experience building visualization recommendation tools over the years into Lux, led by Doris Lee. Lux provides in-situ visualization recommendations within notebooks as an add-on to dataframes instead of the standard tabular dataframe view. Doris demonstrated Lux at JupyterCon 2020.
  • October 7, 2020: Devin Petersohn wrote about how Modin helps data scientists be more productive here.
  • October 9, 2020: Congratulations to Tana Wattanawaroon for passing his preliminary exam!
  • October 3, 2020: Yay! Stephen Macke's paper on tighter confidence intervals for approximate query processing was accepted to ICDE'21!
  • September 14, 2020: Excited to have our paper on visualization recommendation for genomics out in the Patterns journal from Cell Press, led by Silu Huang (now at MSR) and Charles Blatti. Paper here. Our work on this project started as part of the NIH BD2K center a while back.
  • September 1, 2020: I presented a two-part keynote at the VLDB PhD workshop. One of the parts was on my experiences dealing with rejection; check out the recording here.
  • July 15, 2020: Our vision paper on scalable dataframe systems has been accepted at VLDB'20.
  • June 16, 2020: Our paper on ShapeSearch won the SIGMOD 2020 Best Paper Award, one of two awards out of over 450 submissions. It's amazing to see SIGMOD appreciate non-traditional usability-oriented work. Congratulations, Tarique! Here's an (overly generous) article on the award.
  • June 10, 2020: We have open-sourced our spreadsheet benchmark, sheetperf.
  • June 9, 2020: I appeared on the Software Engineering Daily podcast to discuss Human-in-the-loop Data Analytics.
  • May 26, 2020: My former student Silu Huang, now a senior researcher at Microsoft Research, was awarded the Jim Gray dissertation award honorable mention, given to the second ranked dissertation in data management. Congratulations, Silu!
  • May 19, 2020: Slides and other materials from my Spring class on data engineering are available here. I squished the entirety of a traditional database class into the first half of the semester, focusing on user-facing aspects. I then covered non-traditional topics: json/doc stores, IR, spreadsheets, data frames, OLAP/vis, col. stores & compression, parallel proc., streaming/sketching, security & privacy, graph proc., contrasting with the relational approach. The key emphasis was on principles underlying data systems that data scientists may consider using, and how to pick between them for a given situation.
  • May 16, 2020: Congratulations to my Ph.D. student graduates! Tarique Siddiqui is off to be a Senior Researcher at Microsoft Research in the Databases group. Sajjadur Rahman is off to be a Researcher at Megagon Labs. Liqi Xu is off to be a Research Scientist at Facebook.
  • May 16, 2020: Congratulations to my M.S. student graduates! Angela Lee is off to Google, while Jaewoo Kim is off to AWS.
  • May 4, 2020: We released another version of our covidvis tool. We also released a manually gathered intervention dataset to go with the tool. Grateful to receive positive feedback for our efforts.
  • May 1, 2020: Two papers accepted to HILDA this year: Pingjing et al. led a paper on results from the first lab-based user evaluation across a range of analytical tasks on spreadsheets, identifying roadblocks and opportunities. Doris et al. led a paper on understanding ML development patterns from surveying a large collection of real-world ML traces.
  • April 15, 2020: Doris Lee participated in the CHI'20 doctoral consortium!
  • April 10, 2020: We released the first version of our covidvis tool to help make sense of the impact of interventions on the progress of COVID-19. Collaborators include epidemiologists, statisticians, and public health folks.
  • April 1, 2020: SIGMOD paper acceptances: our paper on benchmarking spreadsheets, led by Sajjadur Rahman, and our paper on ShapeSearch, a multi-modality interface for querying for visual patterns, led by Tarique Siddiqui.
  • February 12, 2020: Excited to be named one of the Alfred P. Sloan research fellows in Computer Science this year! Articles here, here, and here.
  • February 11, 2020: Preprint on Genvisage is live. Genvisage is a tool for explaining the results of genomics experiments; done in collaboration with Saurabh Sinha and Charles Blatti, led by Silu Huang.
  • January 28, 2020: Congrats to Doris Lee on winning a Facebook fellowship! She's one of 36 fellows from over 1800 applicants. Article here.
  • January 26, 2020: A photo tribute to my advisor, Hector Garcia-Molina, who passed away in late 2019, assembled by his Ph.D. students, is available here.
  • January 24, 2020: The SIGMOD 2020 workshops list is live. Great list of upcoming workshops!
  • January 7, 2020: The first of many papers on the Modin project is live! arXiv link here. Modin aims to make dataframes more scalable by applying database techniques, and is currently being used in a number of companies, with dozens of collaborators and lots of interest on GitHub. Led by Devin Petersohn.
        Click here for more news.

Synergistic Activities

I serve as the Faculty Equity Advisor at the School of Information. I am a Co-Chair of Workshops for SIGMOD 2020 and 2021. Please send us your exciting and engaging community-building ideas! Interdisciplinary/novel workshop ideas encouraged. Here is the call for proposals. I am the US Sponsor Chair for VLDB 2021. Please reach out if you'd like to sponsor VLDB 2021!

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI - join us!

In the recent past, I have served as an Area/Associate Chair for HCOMP 2020, VLDB 2020, and SIGMOD 2020, and as a Program Committee member for VLDB Demo 2019 and HILDA 2019 (phew!). I've also served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

Past: I stepped down as Associate Editor for SIGMOD Record after nearly half a decade. I co-organized HILDA 2017. I was the SIGMOD 2016 Undergraduate Research Chair.

Selected Projects

Zenvisage: A visualization recommendation system

Zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.

Project page here.


Helix: An Accelerated Human-in-the-loop Machine Learning System

Helix accelerates the iterative development of machine learning pipelines with a human developer "in the loop" via intelligent caching and reuse.

Project page here.


DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page here.


Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page here.


Populace: A Suite of Crowd-Powered Algorithms

We have developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing: an understudied and challenging class of problems.

Project page here.