This project analyzes a Question & Answer site for programmers, Stack Overflow, that dramatically improves
on the utility and performance of Q&A systems for technical domains. Over 92% of Stack Overflow questions
about expert topics are answered — in a median time of 11 minutes. Using a mixed methods approach that combines
statistical data analysis with user interviews, we seek to understand this success. We argue that it is due not only
to an a priori superior technical design, but also to the high visibility and daily involvement of the design team
within the community they serve. This model of continued community leadership presents challenges both to CSCW
systems research and to attempts to apply the Stack Overflow model to other specialized knowledge domains.
This project is complete and no longer under active development.
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann. 2011. Design lessons from the fastest q&a site in the west. In Proceedings of CHI 2011. ACM, New York, NY, USA, 2857-2866.
ACM Digital Library | Local PDF
CHI 2011 Talk Slides: PowerPoint file | PDF
Björn Hartmann
Manas Mittal
Our analysis is based on the August 2010 Stack Exchange Data Dump (Creative Commons licensed). We analyzed two years of user activity, from July 31, 2008 to July 31, 2010. As of early August 2010, Stack Overflow had a total of 300k registered users who asked 833k questions, provided 2.2M answers, and posted 2.9M comments.
Our analysis code is available for download under a BSD license:
stackoverflowanalysis-188.zip (40 MB)
We converted the XML data dump files into a SQLite3 database. Analysis code is written in Python 2.x and SQL. Graphs are generated using matplotlib. The download archive is large because it contains intermediate results of some large queries.
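The XML-to-SQLite conversion can be illustrated with a minimal sketch. It is not the import code shipped in the archive. It assumes that each dump file holds a flat sequence of `<row ... />` elements whose attributes are the column values; the function name `import_rows`, the table name `posts`, and the column list are hypothetical, and the sketch targets Python 3 rather than the Python 2.x used for the original scripts.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def import_rows(xml_source, conn, table, columns):
    """Stream <row .../> elements from xml_source into a SQLite table.

    xml_source is a path or a binary file-like object. Attributes that are
    missing on a row are stored as NULL. Table and column names are
    interpolated into the SQL, so they must come from trusted code.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute("CREATE TABLE IF NOT EXISTS %s (%s)" % (table, col_list))
    insert_sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, col_list, placeholders)

    count = 0
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            conn.execute(insert_sql, [elem.get(c) for c in columns])
            count += 1
            elem.clear()  # drop the attributes of the row just stored
    conn.commit()
    return count


# Tiny in-memory stand-in for a dump file (made-up values, assumed layout).
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Score="5" />'
    b'<row Id="2" PostTypeId="2" />'
    b'</posts>'
)
conn = sqlite3.connect(":memory:")
imported = import_rows(sample, conn, "posts", ["Id", "PostTypeId", "Score"])
rows = conn.execute("SELECT Id, PostTypeId, Score FROM posts ORDER BY Id").fetchall()
```

Because the columns are created without declared types, values are stored as text exactly as they appear in the XML; a real import would likely declare integer and date columns explicitly.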
Build Instructions
- Download the source file above and unzip into a directory of your choice.
- Download the data dump and place the XML files into the directory xml/ (see xml/README.txt).
- Run the import script import/import-all.sh (it calls Python scripts to import individual tables).
- Create indices to speed up queries using import/create-indices.sql.
- Individual analysis scripts (both in Python and SQL) are in the folder analysis/.
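Once the database is imported and indexed, an analysis in the spirit of the scripts in analysis/ could look like the following sketch, which computes the median time to a question's first answer. It is an illustration under stated assumptions, not one of the distributed scripts: the `posts` table, its columns (`Id`, `PostTypeId`, `ParentId`, `CreationDate`), the type codes 1 for questions and 2 for answers, the timestamp format, and the index name are all assumed here, and the sample rows are made up.

```python
import sqlite3
from datetime import datetime
from statistics import median

# Assumed timestamp layout, e.g. 2010-07-01T12:00:00.000
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def median_minutes_to_first_answer(conn):
    """Return the median delay, in minutes, between a question and its first answer."""
    # An index on the join column keeps the self-join fast on large tables.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_posts_parent ON posts (ParentId)")
    pairs = conn.execute(
        """
        SELECT q.CreationDate, MIN(a.CreationDate)
        FROM posts AS q
        JOIN posts AS a ON a.ParentId = q.Id
        WHERE q.PostTypeId = 1 AND a.PostTypeId = 2
        GROUP BY q.Id, q.CreationDate
        """
    ).fetchall()
    delays = [
        (datetime.strptime(first, DATE_FORMAT)
         - datetime.strptime(asked, DATE_FORMAT)).total_seconds() / 60.0
        for asked, first in pairs
    ]
    return median(delays)


# Made-up sample data: three answered questions and one unanswered question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (Id INTEGER, PostTypeId INTEGER, ParentId INTEGER, CreationDate TEXT)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "2010-07-01T12:00:00.000"),
        (2, 2, 1, "2010-07-01T12:05:00.000"),
        (3, 2, 1, "2010-07-01T12:20:00.000"),
        (4, 1, None, "2010-07-01T13:00:00.000"),
        (5, 2, 4, "2010-07-01T13:10:00.000"),
        (6, 1, None, "2010-07-01T14:00:00.000"),
        (7, 2, 6, "2010-07-01T14:30:00.000"),
        (8, 1, None, "2010-07-01T15:00:00.000"),
    ],
)
result = median_minutes_to_first_answer(conn)
```

Unanswered questions drop out of the inner join, so the median covers answered questions only. Taking `MIN` over ISO-style timestamps works because they sort lexicographically in chronological order.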