Crowd-Powered Analytics

Update (December 10, 2016): This page is no longer actively maintained; please head over to the project website for the latest information.

There are many tasks that people perform more easily and accurately than current computer algorithms: tasks that involve understanding and analyzing images, video, text, and speech, as well as subjective opinions and abstract concepts. Thanks to the proliferation of cheap and reliable internet connectivity, a large number of people are now online and willing to answer questions for monetary gain, and a number of human computation (a.k.a. crowdsourcing) marketplaces, such as Mechanical Turk, oDesk, and LiveOps, make it easy for these workers to find tasks. sCOOP is a project whose broad theme is to leverage people as processing units, much like computer processes or subroutines, to achieve some global objective. A primary focus of sCOOP is to optimize this computation: while there may be many ways to orchestrate a particular task, our goal is to use as few resources (e.g., time, money) as possible while getting results that are as good as or better than those of unoptimized computation.

Optimizing Crowd Algorithms

Here, the goal is to optimize fundamental data processing algorithms whose unit operations are performed by people; examples include sorting, clustering, classification, and categorization. Over the last year, we worked on algorithms for gathering, max, filtering, graph search, and lineage debugging. The goal of gathering is to extract as many entities as possible from a specific domain, e.g., find all events happening today in New York. The goal of the max problem is to find the best item among a given set of items (e.g., photos, videos, or songs), given a budget on the number of pairwise comparisons that may be asked of humans. In the filtering problem, we want to determine which items in a given data set satisfy a given set of properties (each verifiable by humans), and the goal is to find the cost-optimal filtering algorithm given constraints on error and time. We also considered the problem of human-assisted graph search, which has applications in many domains that can utilize human intelligence, including curation of hierarchies, image segmentation and categorization, interactive search, and filter synthesis.
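
As a concrete illustration of the flavor of these problems, the following is a minimal sketch of a crowd-powered max: a single-elimination tournament of pairwise comparisons under a comparison budget. It is only an illustration, not the dynamic algorithms from the papers below; ask_crowd_comparison is a hypothetical stand-in for posting a comparison task to a marketplace, simulated here with a random choice.

    import random

    def ask_crowd_comparison(a, b):
        # Placeholder: return the item a (simulated) worker judges to be better.
        return random.choice([a, b])

    def crowd_max(items, budget):
        # Return a likely-best item using at most `budget` pairwise comparisons;
        # if the budget runs out early, one of the surviving items is returned.
        remaining = list(items)
        used = 0
        while len(remaining) > 1 and used < budget:
            random.shuffle(remaining)
            next_round = []
            i = 0
            while i + 1 < len(remaining) and used < budget:
                next_round.append(ask_crowd_comparison(remaining[i], remaining[i + 1]))
                used += 1
                i += 2
            next_round.extend(remaining[i:])  # carry over unpaired or uncompared items
            remaining = next_round
        return remaining[0]

    print(crowd_max(["photo_%d" % i for i in range(8)], budget=7))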

PAPER Surpassing Humans and Computers with JellyBean: Crowd-Vision-Hybrid Image Counting Algorithms.
Akash Das Sarma, Ayush Jain, Arnab Nandi, Aditya Parameswaran and Jennifer Widom. 3rd International Conference on Human Computation and Crowdsourcing (HCOMP), San Diego, USA. November 2015
PAPER Finish Them!: Pricing Algorithms for Human Computation.
Yihan Gao, Aditya Parameswaran. 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawaii, USA. September 2015
PAPER Optimal Crowd-Powered Rating and Filtering Algorithms.
Aditya Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, Jennifer Widom. 40th International Conf. on Very Large Data Bases (VLDB), Hangzhou, China. September 2014
PAPER Crowd-Powered Find Algorithms.
Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina and Alon Halevy. 30th International Conf. on Data Engineering (ICDE), Chicago, USA. April 2014
(Invited to: Special Issue of TKDE Journal for ICDE 2014 Best Papers)
PAPER Human-Powered Data Management.
Aditya Parameswaran. Doctoral Dissertation, Stanford University. September 2013
(Thesis awards: Stanford University, SIGMOD's Jim Gray Doctoral Dissertation Award, and SIGKDD's Doctoral Dissertation Award runner-up)
PAPER So Who Won? Dynamic Max Discovery with the Crowd.
Stephen Guo, Aditya Parameswaran and Hector Garcia-Molina. SIGMOD International Conf. on Management of Data, Scottsdale, Arizona, USA. June 2012
PAPER CrowdScreen: Algorithms for Filtering Data with Humans.
Aditya Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh and Jennifer Widom. SIGMOD International Conf. on Management of Data, Scottsdale, Arizona, USA. June 2012
PAPER Human-assisted Graph Search: It's Okay to Ask Questions.
Aditya Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis and Jennifer Widom. 37th International Conf. on Very Large Data Bases (VLDB), Seattle, USA. September 2011

Crowdsourcing Quality Management

Crowdsourcing quality management is concerned with assessing and controlling the quality of task responses as well as the quality of the workers who provide them. We have studied several aspects of this problem, including generating confidence intervals for worker quality, computing globally optimal worker quality estimates, and deciding whether to hire or fire workers.
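
As one illustration of the kind of estimate involved, the sketch below computes a worker's accuracy from answers to gold-standard questions, together with an approximate 95% confidence interval based on the normal approximation to the binomial. It is a simplified example, not the estimators from the papers below; worker_accuracy_interval is a hypothetical helper.

    import math

    def worker_accuracy_interval(num_correct, num_gold, z=1.96):
        # Point estimate and approximate 95% confidence interval for a worker's
        # accuracy, using the normal approximation to the binomial.
        p = num_correct / num_gold
        half_width = z * math.sqrt(p * (1 - p) / num_gold)
        return p, max(0.0, p - half_width), min(1.0, p + half_width)

    # Example: a worker who answered 18 of 20 gold questions correctly.
    estimate, low, high = worker_accuracy_interval(18, 20)
    print("accuracy ~ %.2f, 95%% CI ~ [%.2f, %.2f]" % (estimate, low, high))

A simple hire-or-fire rule might, for instance, stop routing tasks to a worker once the interval's upper bound falls below a required accuracy threshold.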

PAPER Towards Globally Optimal Crowdsourcing Quality Management.
Akash Das Sarma, Aditya Parameswaran, Jennifer Widom. SIGMOD International Conf. on Management of Data, San Francisco, USA. June 2016
PAPER Debiasing Crowdsourced Batches.
Honglei Zhuang, Aditya Parameswaran, Dan Roth, and Jiawei Han. 21st International Conf. on Knowledge Discovery and Data Mining (KDD), Sydney, Australia. August 2015
PAPER Comprehensive and Reliable Crowd Assessment Algorithms.
Manas Joglekar, Hector Garcia-Molina, and Aditya Parameswaran. 31st International Conf. on Data Engineering (ICDE), Seoul, Korea. April 2015
PAPER Optimal Worker Quality and Answer Estimates in Crowd-Powered Filtering and Rating.
Akash Das Sarma, Aditya Parameswaran, Jennifer Widom. 2nd International Conference on Human Computation and Crowdsourcing (HCOMP), Pittsburgh, USA. November 2014
PAPER Evaluating the Crowd with Confidence.
Manas Joglekar, Hector Garcia-Molina and Aditya Parameswaran. 19th International Conf. on Knowledge Discovery and Data Mining (KDD), Chicago, USA. August 2013
PRE-PRINT Identifying Reliable Workers Swiftly.
Aditya Ramesh, Aditya Parameswaran, Hector Garcia-Molina and Neoklis Polyzotis. Infolab Technical Report. June 2012

Declarative Crowdsourcing

Here, the goal is to combine human and algorithmic computation with traditional database operations in order to perform complex tasks. This combination involves several optimization objectives: minimizing total elapsed time, minimizing the monetary cost of human computation (minimizing the number of questions and pricing them accordingly), and maximizing confidence in the obtained answers. Our approach views the crowdsourcing service as another database whose facts are computed by human processors. By promoting the crowdsourcing service to a first-class citizen on the same level as extensional data, it is possible to write a declarative query that seamlessly combines information from both. The system becomes responsible for optimizing the order in which tuples are processed, the order in which tasks are scheduled, whether tasks are handled by algorithms or by the crowdsourcing service, the pricing of the latter tasks, and the seamless transfer of information between the database system and the external services. Moreover, it provides built-in mechanisms to handle uncertainty, so that the developer can explicitly control the quality of the query results. The declarative approach thus facilitates the development of complex applications that combine knowledge from human computation, algorithmic computation, and stored data. Our current design and the details of our initial prototype can be found in the Deco paper.
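
To illustrate the declarative idea in miniature, the sketch below answers a request from stored data when a value is available and otherwise falls back to asking the crowd, without the caller ever specifying which path is taken. This is a conceptual illustration under assumed names (fetch_from_crowd, resolve), not Deco's actual data model or query language.

    # Stored (extensional) data; the second record is missing an attribute.
    stored = {"Eiffel Tower": {"city": "Paris"}, "Space Needle": {}}

    def fetch_from_crowd(entity, attribute):
        # Placeholder for posting a question such as "Which city is the Space Needle in?"
        return "Seattle"  # simulated worker answer

    def resolve(entity, attribute):
        # Answer from stored data when possible; otherwise ask the crowd and
        # cache the newly obtained fact alongside the stored data.
        record = stored.setdefault(entity, {})
        if attribute not in record:
            record[attribute] = fetch_from_crowd(entity, attribute)
        return record[attribute]

    # A "declarative" request: the caller never says where the value comes from.
    for landmark in list(stored):
        print(landmark, "->", resolve(landmark, "city"))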

PAPER An Overview of the Deco System: Data Model and Query Language; Query Processing and Optimization.
Hyunjung Park, Richard Pang, Aditya Parameswaran, Hector Garcia-Molina, Neoklis Polyzotis, and Jennifer Widom. SIGMOD Record, Volume 41. December 2012
PAPER Deco: Declarative Crowdsourcing.
Aditya Parameswaran, Hyunjung Park, Hector Garcia-Molina, Neoklis Polyzotis and Jennifer Widom. 21st International Conf. on Information and Knowledge Management (CIKM), Maui, Hawaii, USA. November 2012
PAPER Deco: A System for Declarative Crowdsourcing (Demo).
Hyunjung Park, Richard Pang, Aditya Parameswaran, Hector Garcia-Molina, Neoklis Polyzotis and Jennifer Widom. 38th International Conf. on Very Large Data Bases (VLDB), Istanbul, Turkey. September 2012
PRE-PRINT Query Processing over Crowdsourced Data.
Hyunjung Park, Aditya Parameswaran and Jennifer Widom. Infolab Technical Report. August 2012
PAPER Answering Queries using Humans, Algorithms and Databases.
Aditya Parameswaran and Neoklis Polyzotis. Conference on Innovative Data Systems Research (CIDR), Asilomar, USA. January 2011

DataSift: Crowd-Powered Search

Traditional search engines cannot support many of the queries users may wish to issue: queries containing non-textual fragments such as images or videos; queries that are very long or ambiguous; queries that require subjective judgment; and semantically rich queries over non-textual corpora. We have developed DataSift, a crowd-powered search toolkit that can be instrumented over any corpus supporting a keyword search API, and that supports efficient and accurate answering of a rich, general class of queries, including those described above.
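
The sketch below illustrates this pattern: workers translate a rich query into keyword queries, the corpus's keyword search API returns candidate results, and workers rate the candidates against the original query. The function names (ask_crowd_for_keywords, keyword_search_api, ask_crowd_to_rate) are hypothetical placeholders for these stages, not DataSift's actual interfaces.

    def ask_crowd_for_keywords(rich_query):
        # Placeholder: workers translate a rich query (e.g., an image or a long,
        # ambiguous question) into keyword queries the underlying corpus understands.
        return ["art deco skyscraper new york", "chrysler building"]

    def keyword_search_api(keywords):
        # Placeholder for any corpus that exposes a keyword search API.
        return ["result for '%s' #%d" % (keywords, i) for i in range(3)]

    def ask_crowd_to_rate(rich_query, result):
        # Placeholder: workers score how well a candidate result answers the query.
        return len(result) % 5  # simulated rating

    def crowd_powered_search(rich_query, top_k=5):
        # Gather candidates via crowd-generated keyword queries, then rank them
        # by crowd-assigned ratings and return the top_k.
        candidates = []
        for keywords in ask_crowd_for_keywords(rich_query):
            candidates.extend(keyword_search_api(keywords))
        ranked = sorted(candidates,
                        key=lambda result: ask_crowd_to_rate(rich_query, result),
                        reverse=True)
        return ranked[:top_k]

    print(crowd_powered_search("photo of a building like this one"))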

PAPER DataSift: A Crowd-Powered Search Toolkit (Demo).
Aditya Parameswaran, Ming Han Teh, Hector Garcia-Molina and Jennifer Widom. SIGMOD International Conf. on Management of Data, Snowbird, Utah, USA. June 2014
PAPER An Expressive and Accurate Crowd-Powered Search Toolkit.
Aditya Parameswaran, Ming Han Teh, Hector Garcia-Molina and Jennifer Widom. 1st Conf. on Human Computation and Crowdsourcing (HCOMP), Palm Springs, USA. November 2013