I study language and machine learning. I'm interested in natural language processing problems as a window into reasoning, planning and perception; these days I'm especially focused on using language as a scaffold for more efficient learning and as a probe for understanding model behavior. I'm also broadly interested in structured neural methods that combine the advantages of deep representations and discrete compositionality.
I'm a fourth-year Ph.D. student in the Berkeley NLP Group and the Berkeley AI Research Lab. Previously I worked with the Cambridge NLIP Group, and with the Center for Computational Learning Systems and the NLP Group at Columbia.
- Learning to reason: End-to-end module networks for visual question answering. Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Kate Saenko. paper [arxiv], code [git]
- Correlates of linguistic structure in learned representations. Jacob Andreas and Dan Klein. EMNLP 2017.
- Modular multitask reinforcement learning with policy sketches. Jacob Andreas, Dan Klein and Sergey Levine. ICML 2017 (talk). paper [arxiv], code [git]
- Translating neuralese. Jacob Andreas, Anca Dragan and Dan Klein. ACL 2017 (talk). paper [arxiv], code [git]
- A minimal span-based neural constituency parser. Mitchell Stern, Jacob Andreas and Dan Klein. ACL 2017 (talk). paper [arxiv]
- Modeling relationships in referential expressions with compositional modular networks. Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell and Kate Saenko. CVPR 2017 (spotlight). paper [arxiv], code [git]
- Reasoning about pragmatics with neural listeners and speakers. Jacob Andreas and Dan Klein. EMNLP 2016 (talk). paper [arxiv], code [git], slides [pdf]
- Learning to compose neural networks for question answering. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. NAACL 2016 (best paper). paper [arxiv], code [git], slides [pdf], video [youtube]
- Neural module networks. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. CVPR 2016 (talk). paper [arxiv], code [git], slides [pdf], video [youtube]
- On the accuracy of self-normalized log-linear models. Jacob Andreas*, Maxim Rabinovich*, Dan Klein and Michael I. Jordan. NIPS 2015 (poster). paper [arxiv]
- Alignment-based compositional semantics for instruction following. Jacob Andreas and Dan Klein. EMNLP 2015 (talk). paper [arxiv], code [git], slides [pdf], video [vimeo]
- When and why are log-linear models self-normalizing? Jacob Andreas and Dan Klein. NAACL 2015 (talk). paper [pdf], code [git], video [techtalks.tv]
- Unsupervised transcription of piano music. Taylor Berg-Kirkpatrick, Jacob Andreas and Dan Klein. NIPS 2014 (spotlight & demo). paper [pdf]
- Grounding language with points and paths in continuous spaces. Jacob Andreas and Dan Klein. CoNLL 2014 (talk). paper [pdf], code [tgz], slides [pptx]
- How much do word embeddings encode about syntax? Jacob Andreas and Dan Klein. ACL 2014 (talk). paper [pdf], code [tgz], slides [pptx]
- Semantic parsing as machine translation. Jacob Andreas, Andreas Vlachos and Stephen Clark. ACL 2013 (talk). paper [pdf], code [git]
- Parsing graphs with hyperedge replacement grammars. David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones and Kevin Knight. ACL 2013 (poster). paper [pdf]
- A generative model of vector space semantics. Jacob Andreas and Zoubin Ghahramani. ACL 2013 Workshop on continuous vector space models and their … paper [pdf], code [git]
- Semantics-based machine translation with hyperedge replacement grammars. Bevan Jones*, Jacob Andreas*, Daniel Bauer*, Karl Moritz Hermann* and Kevin Knight. COLING 2012. paper [pdf], code [git]
- Detecting influencers in written online conversations. Or Biran, Sara Rosenthal, Jacob Andreas, Kathleen McKeown and Owen Rambow. NAACL 2012 Workshop on language and social media.
- Fuzzy syntactic reordering for phrase-based statistical machine translation. Jacob Andreas, Nizar Habash and Owen Rambow. WMT 2011. paper [pdf]
Resources & annotation
- Annotating agreement and disagreement in threaded discussion. Jacob Andreas, Sara Rosenthal and Kathleen McKeown. LREC 2012. paper [pdf], slides [pdf], data
- Corpus creation for new genres: a crowdsourced approach to PP attachment. Mukund Jha, Jacob Andreas, Kapil Thadani, Sara Rosenthal and Kathleen McKeown. NAACL 2010 Workshop on creating speech and language data with Mechanical …
- Towards semi-automated annotation for prepositional phrase attachment. Sara Rosenthal, William J. Lipovsky, Kathleen McKeown, Kapil Thadani and Jacob Andreas. LREC 2010. paper [pdf]
[*Authors are listed in arbitrary order.]
I am currently supported by a Facebook fellowship and a Huawei / Berkeley AI fellowship. I was a Churchill scholar from 2012–2013 and a National Science Foundation fellow from 2013–2016.
Collaborators: Anca Dragan, Ronghang Hu, Kate Saenko, Sergey Levine, Trevor Darrell, Marcus Rohrbach, Mike Jordan, Max Rabinovich, Taylor Berg-Kirkpatrick, Dan Klein, Zoubin Ghahramani, Stephen Clark, Andreas Vlachos, Kevin Knight, Bevan Jones, Daniel Bauer, Karl Moritz Hermann, David Chiang, Michael Collins, Nizar Habash, Owen Rambow, Or Biran, Kathy McKeown, Kapil Thadani, Mukund Jha, Sara Rosenthal, William Lipovsky.
Collaboration graph trivia: My Erdős number is at most four (J Andreas to K McKeown to Z Galil to N Alon to P Erdős). My Kevin Bacon number (and consequently my Erdős-Bacon number) remains lamentably undefined, but my Kevin Knight number (since apparently that's a thing) is one. I have never starred in a film with Kevin Knight. Noam Chomsky is my great-great-grand-advisor (J Andreas to D Klein to C Manning to J Bresnan to N Chomsky).