Erik Jones

Hello! I am a third-year Ph.D. student in computer science at Berkeley advised by Jacob Steinhardt and Anca Dragan. I am affiliated with the Berkeley AI Research Lab.

Before starting at Berkeley, I received my B.S. in math and M.S. in computer science from Stanford, where I was very fortunate to be advised by Percy Liang. I am interested in making generative machine learning systems more robust, reliable, and aligned, with a focus on large language models. My work is supported by a Vitalik Buterin Fellowship in AI Existential Safety.

Email: er + mylastname at berkeley.edu
[Google Scholar] [Twitter]

Preprints

Orca 2: Teaching Small Language Models How to Reason
Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah
arXiv 2023

Publications

Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt
ICML 2024

Teaching Language Models to Hallucinate Less with Synthetic Tasks
Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar
ICLR 2024

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
ICLR 2024

Mass-Producing Failures of Multimodal Systems with Language Models
Shengbang Tong*, Erik Jones*, Jacob Steinhardt
NeurIPS 2023
[Code]

Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
ICML 2023
[Code]

Capturing Failures of Large Language Models via Human Cognitive Biases
Erik Jones, Jacob Steinhardt
NeurIPS 2022
[Code]

Selective Classification Can Magnify Disparities Across Groups
Erik Jones*, Shiori Sagawa*, Pang Wei Koh*, Ananya Kumar, Percy Liang
ICLR 2021
[Code] [Codalab]

Robust Encodings: A Framework for Combating Adversarial Typos
Erik Jones, Robin Jia*, Aditi Raghunathan*, and Percy Liang
ACL 2020
[Code] [Codalab]

Impact of a Deep Learning Assistant on the Histopathologic Classification of Liver Cancer
Amirhossein Kiani*, Bora Uyumazturk*, Pranav Rajpurkar*, Alex Wang, Rebecca Gao, Erik Jones, Yifan Yu, Curtis P. Langlotz, Robyn L. Ball, Thomas J. Montine, Brock A. Martin, Gerald J. Berry, Michael G. Ozawa, Florette K. Hazard, Ryanne A. Brown, Simon B. Chen, Mona Wood, Libby S. Allard, Lourdes Ylagan, Andrew Y. Ng, Jeanne Shen
npj Digital Medicine, 2020

Deep Learning for the Digital Pathologic Diagnosis of Cholangiocarcinoma and Hepatocellular Carcinoma: Evaluating the Impact of a Web-based Diagnostic Assistant
Bora Uyumazturk*, Amirhossein Kiani*, Pranav Rajpurkar*, Alex Wang, Robyn L. Ball, Rebecca Gao, Yifan Yu, Erik Jones, Curtis P. Langlotz, Brock Martin, Gerald J. Berry, Michael G. Ozawa, Florette K. Hazard, Ryanne A. Brown, Simon B. Chen, Mona Wood, Libby S. Allard, Lourdes Ylagan, Andrew Y. Ng, and Jeanne Shen
NeurIPS Workshop on Machine Learning for Health, 2019

Deep-learning-assisted Diagnosis for Knee Magnetic Resonance Imaging: Development and Retrospective Validation of MRNet
Nicholas Bien*, Pranav Rajpurkar*, Robyn L Ball, Jeremy Irvin, Allison Park, Erik Jones, Michael Bereket, Bhavik N Patel, Kristen W Yeom, Katie Shpanskaya, Safwan Halabi, Evan Zucker, Gary Fanton, Derek F Amanatullah, Christopher F Beaulieu, Geoffrey M Riley, Russell J Stewart, Francis G Blankenberg, David B Larson, Ricky H Jones, Curtis P Langlotz, Andrew Y Ng, and Matthew P Lungren
PLoS Medicine, 2018


Teaching Experience