Emma Pierson

I am an assistant professor of computer science at Berkeley, affiliated with the Berkeley AI Research Lab, Computational Precision Health, and the Center for Human-Compatible AI. I develop data science and machine learning methods to study two broad areas: inequality and healthcare. For representative publications, please see my papers on fair clinical prediction (New England Journal of Medicine, 2024; PNAS, 2024); sparse autoencoders for hypothesis generation (ICML, 2025); using LLMs to support health equity (New England Journal of Medicine AI, 2025); inequality in pain (Nature Medicine, 2021); inequality in policing (Nature Human Behaviour, 2020; FAccT, 2023); inequality in COVID-19 (Nature, 2021); and segregation (Nature, 2023). My work has been recognized by best paper awards at KDD and AISTATS, an NSF CAREER award, a Rhodes Scholarship, Hertz Fellowship, Rising Star in EECS, MIT Technology Review 35 Innovators Under 35, Forbes 30 Under 30 in Science, AI2050 Early Career Fellowship, and Samsung AI Researcher of the Year. Here's my full CV and list of publications, and here's a professional bio and photo.

Previously, I was an assistant professor at Cornell Tech, a senior researcher at Microsoft Research New England, and a PhD student in Jure Leskovec's lab at Stanford. Before my PhD, I did a master's in statistics at Oxford, and before that I spent a year as a data scientist at 23andMe and Coursera. I write a statistics blog, Obsession with Regression, and have also written for The New York Times, FiveThirtyEight, The Atlantic, The Washington Post, Wired, Times Higher Education, and various other publications. I always like hearing from people with cool ideas for things to do with data: shoot me an email at emmapierson@berkeley.edu!

What's new?

June 2025. Our paper on capturing health disparities in disease progression models won a best paper award at CHIL!

May 2025. HypotheSAEs, a sparse autoencoder method for hypothesis generation, is accepted at ICML (Python package).

April 2025. Essay defending science against federal funding cuts is published in Nature. Paper proposing a disease progression model which interpretably captures and accounts for three types of health disparities is accepted at CHIL.

March 2025. Preprints: HypotheSAEs, a new sparse autoencoder method for hypothesis generation (with an easy-to-use Python package); and inferring fine-grained migration patterns across the United States (data, project website). Paper on generating social networks with LLMs is accepted at ICWSM.

January 2025. Our paper on using LLMs to promote health equity is published in New England Journal of Medicine AI; our piece on how philanthropic funders can help protect science from funding cuts is published in Nature.

December 2024. New papers: our paper on positively shaping AI impacts (with an accompanying Economist article), and a review paper on generative AI in medicine (to appear in Annual Review of Biomedical Data Science).

September 2024. Papers comparing LLM and human perceptions of safety and assessing question-asking in medical LLMs are accepted at EMNLP and NeurIPS.

August 2024. Our paper on race adjustments in cancer risk prediction is published in PNAS.

July 2024. Our paper on how the updated cardiovascular risk equations could change eligibility for cholesterol and blood pressure medications is published in JAMA.

June 2024. Honored to have been named one of City & State's Trailblazers in Higher Education!

June 2024. Our paper on trends in large language model research is published at NAACL (see also Data Skeptic podcast episode). Our paper on participation in the age of foundation models is published at FAccT.

May 2024. Honored to have been named the inaugural Andrew H. and Ann R. Tisch Assistant Professor at Cornell Tech!

May 2024. Our paper on using domain constraints to improve risk prediction in the presence of missing data is published at ICLR. Our paper on race adjustments in lung function equations is published in the New England Journal of Medicine. Our paper on quantifying disparities in underreported health conditions is published in npj Women's Health.

April 2024. Our review of AI in cardiovascular care [Part 1, Part 2] is published in the Journal of the American College of Cardiology. Our piece on using unlabeled data to improve generalization and fairness of medical AI is published in Nature Medicine.

March 2024. Honored to have been awarded an AI2050 Early Career Fellowship!

January 2024. New paper: Accuracy and Equity in Clinical Risk Prediction. Pierson. New England Journal of Medicine, 2024. [Paper]

January 2024. New paper: Reconciling the accuracy-diversity trade-off in recommendations. Peng, Raghavan, Pierson, Kleinberg, and Garg. To appear, TheWebConf, 2024. [Paper]

December 2023. New paper: A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing. Agostini, Pierson*, and Garg*. To appear, AAAI, 2024. [Paper]

November 2023. New paper: Human mobility networks reveal increased segregation in large cities. Nilforoshan*, Looi*, Pierson*, Villanueva, Fishman, Chen, Sholar, Redbird, Grusky, and Leskovec. Nature, 2023. [Paper] [Press Briefing]

November 2023. Our Coursera class on algorithmic fairness, "Practical Steps for Building Fair Algorithms", is now available! The class is freely available to audit and designed to be accessible to everyone.

November 2023. Honored to have been named a Samsung AI Researcher of the Year, awarded to five early-career AI researchers worldwide!

August 2023. New paper: Coarse race data conceals disparities in clinical risk score performance. Movva*, Shanmugam*, Hou, Pathak, Guttag, Garg, and Pierson. Machine Learning for Healthcare Conference, 2023; best findings paper honorable mention, ML4H Symposium. [Paper] [Press coverage in The New York Times, Cornell News and Upturn newsletter]

July 2023. New paper: Topics, Authors, and Networks in Large Language Model Research: Trends from a Survey of 17K arXiv Papers. Movva*, Balachandar*, Peng*, Agostini*, Garg**, and Pierson**. Under review; New Directions in Analyzing Text as Data Meeting (TADA), 2023. [Paper] [Data Skeptic podcast episode]

June 2023. New paper: Detecting disparities in police deployments using dashcam data. Franchi, Zamfirescu-Pereira, Ju, and Pierson. FAccT, 2023. [Paper] [Code] [Press coverage in Cornell News and WNYC/Gothamist]

