Aditya Parameswaran@UIUC

Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC) . My research interests are broadly in simplifying and improving data analytics, i.e., helping users make better use of their data.

My work involves building real data analytics systems with principled foundations, designing algorithms (with formal guarantees) for the systems, as well as mining data obtained from such systems.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He spent the 2013-14 year visiting MIT CSAIL and Microsoft Research New England, after completing his Ph.D. from Stanford University, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems.

Aditya is a recipient of the Arthur Samuel award for the best dissertation in CS at Stanford (2014), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner up (2014), a Google Faculty Research Award (2015), the Key Scientific Challenges Award from Yahoo! Research (2010), three best-of-conference citations (VLDB 2010, KDD 2012 and ICDE 2014), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007). His research group is supported with funding from by the NIH, the NSF, and Google.

some small text READ MORE>> here come the full text to replace the small text Less>>

News

June 9, 2015: Release of a new preprint on calibrating the output of confidence estimates from classification algorithms, using classical learning theory tools. This is work driven by my awesome student Yihan Gao.
June 6, 2015: Our DataHub query language proposal was accepted at TaPP, a focused provenance workshop.
June 1, 2015: Final tally for VLDB 2015 -- three papers and three demos on a variety of topics:
- papers: crowds, visualizations, and versioning;
- demos: data exploration, Excel-meets-databases, and collaborative data analytics.
May 27, 2015: Our paper on versioning principles was accepted at VLDB'15 without any revisions!
May 15, 2015: Undergraduate research news: Andrew Kuznetsov, a freshman working in our group won the ISUR undergraduate research prize, and Andrew with two other freshmen -- Andrew Thieck and Radhir Kothuri won the third prize in the Illinois Engineering Open House competition for their crowdsourcing tool.
May 12, 2015: Our paper on debiasing was accepted at KDD 2015!
April 9, 2015: Our first release of a new project, titled Data-Spread, with my esteemed colleague Kevin Chang and student Mangesh Bendre. Data-Spread is a tool that unifies databases and spreadsheets. You have to see it to believe it!
- Here is a YouTube video showing Data-Spread in action.
- Here is our demo paper on Data-Spread.
March 10, 2015: Four more new preprints in the last month! These were:
- our paper on SeeDB for query driven automatic visualization generation;
- our jellybean paper on counting objects in images; turns out we can do way better than humans or computer vision algorithms!
- our paper on debiasing of batches; crowdsourcing practitioners often use batching to save costs, but this can lead to non-independence: we deal with this issue.
- our versioning theory paper; to build a solid foundation for our DataHub project, we explored how to trade off storage and retrieval costs.
February 9, 2015: Our paper on exploiting correlations to avoid expensive predicate evaluations was accepted at SIGMOD 2015!
February 12, 2015: Many thanks to Google for their support via a Google Faculty Research Award! Excited to be building the next generation visualization toolkit.
December 10, 2014: Three new preprints in the last month! These were:
- smart drill-down, our tool for zooming into portions of a dataset quickly;
- our paper on globally optimal crowdsourcing quality management; and
- our paper on gathering data using the crowd, exploiting a hierarchy and MABs.
November 10, 2014: Three new paper acceptances in the last month!
- Our Datahub paper was accepted at CIDR;
- our rapid approximate visualization generation paper was accepted at VLDB;
- and our paper on generalized confidence intervals for crowdsourced workers was accepted at ICDE!
October 10, 2014: Thrilled to be a part of the new NIH BD2K (Big Data 2 Knowledge) center for revolutionizing genomic data analysis. Thank you, NIH, for the support!
September 2, 2014: We can finally talk about our exciting new project, titled Datahub (i.e., GitHub for Data) on collaborative data science and version management. The ambitious goal is to eliminate the pain-points of data book-keeping while doing collaborative data science.
September 1, 2014: Our paper on pricing for crowdsourcing tasks has been accepted for presentation at VLDB 2015! The paper studies a simple, but important problem: if you have a batch of tasks and a deadline, how should you vary price to meet the deadline?
August 25, 2014: Pleasantly surprised to be selected as the KDD dissertation award runner-up, having already been given the SIGMOD dissertation award! Feel truly lucky to have two communities - SIGMOD and KDD - supporting my work!
August 24, 2014: Had a blast being a keynote speaker at KDD IDEA 2014 - a big thank you to the organizers for inviting me! If this year was any indication, IDEA is going to flourish as a workshop for many years!
August 20, 2014: Our paper on optimally learning maximum-likelihood worker accuracies has been accepted as a work-in-progress paper for HCOMP 2014! The paper tackles the problem of worker quality estimation in a way EM-based algorithms cannot - by providing optimality guarantees.
August 15, 2014: Started at Illinois; exciting times ahead!

Synergistic Activities

I am currently serving on or have served on the Program Committees of: VLDB 2013-14-15, KDD 2015, SIGMOD 2014-15, WSDM 2015, WWW 2014, SOCC 2014, HCOMP 2014, ICDE 2014, and EDBT 2014.

Visual Analytics

Automatically recommending visualizations or visual summaries on very large volumes of data

View details »

Interactive Analytics

Interactive querying of large datasets, keeping track of versions, while possibly sacrificing slightly on accuracy of query results

View details »

Crowd-Powered Analytics

Using crowdsourcing to process and make sense of large volumes of data

View details »

Information Extraction

Extracting information from the web, integrating it with existing information, and surfacing this information to users

View details »

Recommendation Systems

Building scalable recommendation systems that take into account contextual information

View details »

Recent Releases

PAPER RequestAtlas: Supporting the Slow and Iterative Process of Requesting Public Records.
Rachel Warren, Aditya G. Parameswaran, Lisa Pickoff-White, Niloufar Salehi. 28th Int'l Conference on Computer-Supported Cooperative Work (CSCW), Bergen, Norway. November 2025
(Used by journalists for managing 1000s of public record requests.)
PAPER Towards Accurate and Efficient Document Analytics with Large Language Models.
Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeighami, Aditya G. Parameswaran, Eugene Wu. 41st IEEE Int’l Conf on Data Engineering (ICDE), Hong Kong. May 2025
PAPER PromptEvals: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines.
Reya Vir*, Shreya Shankar*, Harrison Chase, William Hinthorn, Aditya G. Parameswaran. 20th Conf. of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), Albuquerque, USA. April 2025
(Done in collaboration with LangChain, a leading LLM workflow company.)
PRE-PRINT RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines.
Quentin Romero Lauro*, Shreya Shankar*, Sepanta Zeighami, Aditya G. Parameswaran. Technical Report. April 2025
PRE-PRINT TWIX: Automatically Reconstructing Structured Data from Templatized Documents.
Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran. Technical Report. April 2025
PRE-PRINT Steering Semantic Data Processing with DocWrangler.
Shreya Shankar*, Bhavya Chopra*, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu. Technical Report. April 2025
(The deployed version of DocWrangler has been used over 1500 times.)
PRE-PRINT The Cambridge Report on Database Research.
Anastasia Ailamaki, ..., Aditya Parameswaran, .... Technical Report. April 2025
(This is a once-every-five-years report on data management research, written by experts in the field.)
PRE-PRINT Why Do Multi-Agent LLM Systems Fail?.
Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E Gonzalez, Ion Stoica. Technical Report. March 2025
PAPER NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval.
Sepanta Zeighami, Zac Wellmer, Aditya G. Parameswaran. 13th Int’l Conf on Learning Representations (ICLR), Singapore. March 2025
(Deployed by LlamaIndex as part of their open-source.)
PAPER LLM-Powered Proactive Data Systems.
Sepanta Zeighami, Yiming Lin, Shreya Shankar, Aditya G. Parameswaran. IEEE Data Engineering Bulletin, Issue on LLMs-meets-data. March 2025

Selected Projects

zenvisage

zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.

PAPER SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics.
Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, and Neoklis Polyzotis. 42nd International Conference on Very Large Data Bases (VLDB), New Delhi, India. September 2016
PAPER Rapid Sampling for Visualizations with Ordering Guarantees.
Albert Kim, Eric Blais, Aditya Parameswaran, Piotr Indyk, Samuel Maddem, Ronitt Rubinfeld. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015
PAPER SeeDB: Automatically Generating Query Visualizations (Demo).
Manasi Vartak, Samuel Madden, Aditya Parameswaran, Neoklis Polyzotis. 40th International Conf. on Very Large Data Bases (VLDB), Hangzhou, China. September 2014
PAPER SeeDB: Visualizing Database Queries Efficiently (Vision Paper).
Aditya Parameswaran, Neoklis Polyzotis, and Hector Garcia-Molina. 40th International Conf. on Very Large Data Bases (VLDB), Hangzhou, China. September 2014

DataSpread

DataSpread is a tool that marries the best of databases and spreadsheets.

PAPER Data-Spread: Unifying Databases and Spreadsheets (Demo).
Mangesh Bendre, Bofan Sun, Xinyan Zhou, Ding Zhang, Shy-Yauer Lin, Kevin Chang, and Aditya Parameswaran. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015

DataHub: Collaborative Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets.

PAPER Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff.
Souvik Bhattacherjee, Amit Chavan, Silu Huang, Amol Deshpande, and Aditya Parameswaran. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015
PAPER Collaborative Data Analytics with Datahub (Demo).
Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, and Rebecca Zhang. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015
PAPER Towards a Unified Query Language for Provenance and Versioning.
Amit Chavan, Silu Huang, Amol Deshpande, Aaron Elmore, Sam Madden, and Aditya Parameswaran. 7th International Workshop on Theory and Practice of Provenance (TaPP), Edinburgh, Scotland. July 2015
PAPER DataHub: Collaborative Data Science & Dataset Version Management at Scale.
Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, Aditya Parameswaran. Conference on Innovative Database Research (CIDR), Asilomar, USA. January 2015

DataSift: A Crowd-Powered Search Engine

DataSift is a crowd-powered search engine that is useful for long or complex queries that traditional search engines have trouble with, or with queries that contain rich media, such as images or videos.

PAPER DataSift: A Crowd-Powered Search Toolkit (Demo).
Aditya Parameswaran, Ming Han Teh, Hector Garcia-Molina and Jennifer Widom. SIGMOD International Conf. on Management of Data, Snowbird, Utah, USA. June 2014
PAPER An Expressive and Accurate Crowd-Powered Search Toolkit.
Aditya Parameswaran, Ming Han Teh, Hector Garcia-Molina and Jennifer Widom. 1st Conf. on Human Computation and Crowdsourcing (HCOMP), Palm Springs, USA. November 2013

Crowd Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error.

PAPER Surpassing Humans and Computers with JellyBean: Crowd-Vision-Hybrid Image Counting Algorithms.
Akash Das Sarma, Ayush Jain, Arnab Nandi, Aditya Parameswaran and Jennifer Widom. 3rd International Conference on Human Computation and Crowdsourcing (HCOMP), San Diego, USA. November 2015
PAPER Finish Them!: Pricing Algorithms for Human Computation.
Yihan Gao, Aditya Parameswaran. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015
PAPER Comprehensive and Reliable Crowd Assessment Algorithms.
Manas Joglekar, Hector Garcia-Molina, and Aditya Parameswaran . 31st International Conf. on Data Engineering (ICDE), Seoul, Korea. April 2015
PAPER Optimal Crowd-Powered Rating and Filtering Algorithms.
Aditya Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, Jennifer Widom. 40th International Conf. on Very Large Data Bases (VLDB), Hangzhou, China. September 2014
PAPER Crowd-Powered Find Algorithms.
Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina and Alon Halevy. 30th International Conf. on Data Engineering (ICDE), Chicago, USA. April 2014
(Invited to: Special Issue of TKDE Journal for ICDE 2014 Best Papers)
PAPER Human-Powered Data Management.
Aditya Parameswaran. Doctoral Dissertation, Stanford University. September 2013
(Thesis awards: Stanford U., SIGMOD's Jim Gray award, and SIGKDD's thesis award Runner-up)
PAPER Evaluating the Crowd with Confidence.
Manas Joglekar, Hector Garcia-Molina and Aditya Parameswaran. 19th International Conf. on Knowledge Discovery and Data Mining (KDD), Chicago, USA. August 2013
PAPER So Who Won? Dynamic Max Discovery with the Crowd.
Stephen Guo, Aditya Parameswaran and Hector Garcia-Molina. SIGMOD International Conf. on Management of Data, Scottsdale, Arizona, USA. June 2012
PAPER CrowdScreen: Algorithms for Filtering Data with Humans.
Aditya Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh and Jennifer Widom. SIGMOD International Conf. on Management of Data, Scottsdale, Arizona, USA. June 2012

NeedleTail: A System for Browsing

NeedleTail is a system tuned towards instantly returning a small number (a "screenful") of query results very quickly on extremely large datasets.

PRE-PRINT NeedleTail: A System for Browsing Queries (Demo).
Albert Kim, Samuel Madden and Aditya Parameswaran . Technical Report. April 2014