Dan Klein
Computer Science Division
University of California at Berkeley
Mail: Dan Klein, Sutardja Dai Hall, Berkeley, CA 94720
My research focuses on the automatic organization of natural language
information. Some topics of interest to me are:
- Unsupervised language acquisition
- Machine translation
- Efficient algorithms for NLP
- Information extraction
- Linguistically rich models of language
- Integrating symbolic and statistical methods for NLP
- Historical
My group is the Berkeley Natural Language Processing Group. Here is a list of
my amazing students, past and present!
I'm also interested in AI more broadly; we've been increasingly
involved in search, planning, and agent design. Our StarCraft agent, the Overmind, won the AIIDE 2010 StarCraft
AI competition!
My education, in reverse order:
Some fellowships / awards:
- Diane S. McEntyre Award for Excellence in Teaching, 2011
- UC Berkeley Distinguished Teaching Award
- Jim and Donna Gray Award for Excellence in UG Teaching
- Okawa Research Award
- ACM Grace Murray Hopper Award
- Alfred P. Sloan Fellowship, 2007
- NSF CAREER Award, 2007
- Microsoft Faculty Fellowship, 2005
- Microsoft Graduate Fellowship, 2003
- British Marshall Fellowship, 1998
Some paper awards we've won:
- Best Paper Award, ACL 2003, for "Accurate Unlexicalized
Parsing" with Chris Manning
- Best Paper Award, EMNLP 2004, for "Max-Margin Parsing"
with Ben Taskar, Mike Collins, Chris Manning, and Daphne Koller
- Best Student Paper Award, NAACL 2006, for "Prototype-Driven
Learning for Sequence Models" with Aria Haghighi
- Best Paper Award, ACL 2009, for "K-Best A* Parsing" with Adam Pauls
- Best Paper Award, NAACL 2010, for "Coreference Resolution in a Modular, Entity-Centered Model" with Aria Haghighi
- Distinguished Paper, EMNLP 2012, for "Training
Factored PCFGs with Expectation Propagation" with David Hall
Introduction to AI: At the undergraduate level, I teach
cs188, the
undergraduate introduction to artificial intelligence here at Berkeley,
which I have been actively developing since 2006. We now offer
cs188x, a free online version of cs188 (joint with
Pieter Abbeel).
The cs188 projects we developed (with John DeNero) are available for use by
other instructors.
Statistical NLP: At the graduate level, I teach
cs288, the statistical NLP course
here at Berkeley.
My tutorials are below, in the publication list.
My newest publications are always available at my group's web page.
- Automated reconstruction of ancient languages using probabilistic models of sound change, Alexandre Bouchard-Côté, David Hall, Thomas L. Griffiths, and Dan Klein, Proceedings of the National Academy of Sciences 2013. [pdf]
- Unsupervised Transcription of Historical Documents, Taylor Berg-Kirkpatrick, Greg Durrett, and Dan Klein, Proceedings of ACL 2013. [pdf]
- Decentralized Entity-Level Modeling for Coreference Resolution, Greg Durrett, David Hall, and Dan Klein, Proceedings of ACL 2013. [pdf]
- An Empirical Examination of Challenges in Chinese Parsing, Jonathan K. Kummerfeld, Daniel Tse, James R. Curran, and Dan Klein, Proceedings of ACL (Short Papers) 2013. [pdf]
- Faster Optimal Planning with Partial-Order Pruning, David Hall, Aloni Cohen, David Burkett, and Dan Klein, Proceedings of ICAPS 2013. [pdf]
- Training Factored PCFGs with Expectation Propagation, David Hall and Dan Klein, Proceedings of EMNLP 2012. [pdf]
- An Empirical Investigation of Statistical Significance in NLP, Taylor Berg-Kirkpatrick, David Burkett, and Dan Klein, Proceedings of EMNLP 2012. [pdf]
- Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output, Jonathan K. Kummerfeld, David Hall, James R. Curran, and Dan Klein, Proceedings of EMNLP 2012. [pdf]
- Transforming Trees to Improve Syntactic Convergence, David Burkett and Dan Klein, Proceedings of EMNLP 2012. [pdf]
- Syntactic Transfer Using a Bilingual Lexicon, Greg Durrett, Adam Pauls, and Dan Klein, Proceedings of EMNLP 2012. [pdf]
- Coreference Semantics from Web Features, Mohit Bansal and Dan Klein, Proceedings of ACL 2012. [pdf]
- Robust Conversion of CCG Derivations to Phrase Structure Trees, Jonathan K. Kummerfeld, James R. Curran, and Dan Klein, Proceedings of ACL (Short Papers) 2012. [pdf]
- Large-Scale Syntactic Language Modeling with Treelets, Adam Pauls and Dan Klein, Proceedings of ACL 2012. [pdf]
- Fast Inference in Phrase Extraction Models with Belief Propagation, David Burkett and Dan Klein, Proceedings of NAACL 2012. [pdf]
- Web-Scale Features for Full-Scale Parsing, Mohit Bansal and Dan Klein, Proceedings of ACL 2011. [pdf]
- The Surprising Variance in Shortest-Derivation Parsing, Mohit Bansal and Dan Klein, Proceedings of ACL 2011. [pdf]
- Jointly Learning to Extract and Compress, Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein, Proceedings of ACL 2011. [pdf]
- An Empirical Investigation of Discounting in Cross-Domain Language Models, Greg Durrett and Dan Klein, Proceedings of ACL 2011. [pdf]
- Learning Dependency-Based Compositional Semantics, Percy Liang, Michael I. Jordan, and Dan Klein, Proceedings of ACL 2011. [pdf]
- Faster and Smaller N-Gram Language Models, Adam Pauls and Dan Klein, Proceedings of ACL 2011.
- Large-Scale Cognate Recovery, David Hall and Dan Klein, Proceedings of EMNLP 2011. [pdf]
- Simple Effective Decipherment via Combinatorial Optimization, Taylor Berg-Kirkpatrick and Dan Klein, Proceedings of EMNLP 2011. [pdf]
- Mention Detection: Heuristics for the OntoNotes annotations, Jonathan K. Kummerfeld, Mohit Bansal, David Burkett, and Dan Klein, Proceedings of CoNLL 2011. [pdf]
- Iterative Monotonically Bounded A*, David Burkett, David Hall, and Dan Klein, AAAI 2011. [pdf]
- A Game-Theoretic Approach to Generating Spatial Descriptions, Dave Golland, Percy Liang, and Dan Klein, In proceedings of EMNLP 2010. [pdf]
- A Simple Domain-Independent Probabilistic Approach to Generation, Gabor Angeli, Percy Liang, and Dan Klein, In proceedings of EMNLP 2010. [pdf]
- Learning Programs: A Hierarchical Bayesian Approach, Percy Liang, Michael Jordan, and Dan Klein, In proceedings of ICML 2010. [pdf]
- Learning Better Monolingual Models with Unannotated Bilingual Text, David Burkett, John Blitzer, and Dan Klein, In proceedings of CoNLL 2010. [pdf]
- An Entity-Level Approach to Information Extraction, Aria Haghighi and Dan Klein, In proceedings of ACL 2010. [pdf]
- Discriminative Modeling of Extraction Sets for Machine Translation, John DeNero and Dan Klein, In proceedings of ACL 2010. [pdf]
- Top-Down K-Best A* Parsing, Adam Pauls, Dan Klein, and Chris Quirk, In proceedings of ACL 2010. [pdf]
- Hierarchical A* Parsing with Bridge Outside Scores, Adam Pauls and Dan Klein, In proceedings of ACL 2010. [pdf]
- Simple, Accurate Parsing with an All-Fragments Grammar, Mohit Bansal and Dan Klein, In proceedings of ACL 2010. [pdf]
- Phylogenetic Grammar Induction, Taylor Berg-Kirkpatrick and Dan Klein, In proceedings of ACL 2010. [pdf]
- Finding Cognate Groups using Phylogenies, David LW Hall and Dan Klein, In proceedings of ACL 2010. [pdf]
- Coreference Resolution in a Modular, Entity-Centered Model, Aria Haghighi and Dan Klein, In proceedings of NAACL 2010. [pdf]
- Joint Parsing and Alignment with Weakly Synchronized Grammars, David Burkett, John Blitzer, and Dan Klein, In proceedings of NAACL 2010. [pdf]
- Type-Based MCMC, Percy Liang, Michael Jordan, and Dan Klein, In proceedings of NAACL 2010. [pdf]
- Painless Unsupervised Learning with Features, Taylor Berg-Kirkpatrick, John DeNero, and Dan Klein, In proceedings of NAACL 2010. [pdf]
- Unsupervised Syntactic Alignment with Inversion Transduction Grammars, Adam Pauls, Dan Klein, David Chiang, and Kevin Knight, In proceedings of NAACL 2010. [pdf]
- Probabilistic grammars and hierarchical Dirichlet processes, Percy Liang, Michael Jordan, and Dan Klein, Book chapter in The Oxford Handbook of Applied Bayesian Analysis 2009. [pdf]
- Consensus Training for Consensus Decoding in Machine Translation, Adam Pauls, John DeNero, and Dan Klein, In proceedings of EMNLP 2009. [pdf]
- Asynchronous Binarization for Synchronous Grammars, John DeNero, Adam Pauls, and Dan Klein, In proceedings of ACL-IJCNLP Short Paper Track 2009. [pdf]
- Better Word Alignments with Supervised ITG Models, Aria Haghighi, John Blitzer, John DeNero, and Dan Klein, In proceedings of ACL-IJCNLP 2009. [pdf]
- Simple Coreference Resolution with Rich Syntactic and Semantic Features, Aria Haghighi and Dan Klein, In proceedings of EMNLP 2009. [pdf]
- Efficient Parsing for Transducer Grammars, John DeNero, Mohit Bansal, Adam Pauls, and Dan Klein, In proceedings of NAACL 2009. [pdf]
- Convergence Bounds for Language Evolution by Iterated Learning, Anna N. Rafferty, Thomas L. Griffiths, and Dan Klein, In Proceedings of the 31st Annual Conference of the Cognitive Science Society 2009. [pdf]
- Learning Semantic Correspondences with Less Supervision, Percy Liang, Michael Jordan, and Dan Klein, In proceedings of ACL 2009. [pdf] [slides]
- Learning from Measurements in Exponential Families, Percy Liang, Michael Jordan, and Dan Klein, In proceedings of ICML 2009. [pdf] [slides]
- Online EM for Unsupervised Models, Percy Liang and Dan Klein, In proceedings of NAACL 2009. [pdf] [slides]
- K-Best A* Parsing, Adam Pauls and Dan Klein, In Proceedings of ACL 2009. [pdf]
- Hierarchical Search for Parsing, Adam Pauls and Dan Klein, In Proceedings of NAACL 2009. [pdf]
- Efficient Inference in Phylogenetic InDel Trees, Alexandre Bouchard-Côté, Michael I. Jordan, and Dan Klein, In proceedings of NIPS 2009. [pdf]
- Improved Reconstruction of Protolanguage Word Forms, Alexandre Bouchard-Côté, Thomas Griffiths, and Dan Klein, In proceedings of NAACL 2009. [pdf]
- Coarse-to-Fine Syntactic Machine Translation using Language Projections, Slav Petrov, Aria Haghighi and Dan Klein, In proceedings of EMNLP 2008. [pdf] [bib] [slides]
- Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing, Slav Petrov and Dan Klein, In proceedings of EMNLP 2008. [pdf] [bib] [slides]
- Two Languages are Better than One (for Syntactic Parsing), David Burkett and Dan Klein, In proceedings of EMNLP 2008. [pdf]
- Sampling Alignment Structure under a Bayesian Translation Model, John DeNero, Alex Bouchard-Côté, and Dan Klein, In proceedings of EMNLP 2008. [pdf]
- Fully Distributed EM for Very Large Datasets, Jason Wolfe, Aria Haghighi, and Dan Klein, In proceedings of ICML 2008. [pdf] [slides]
- Learning Bilingual Lexicons from Monolingual Corpora, Aria Haghighi, Taylor Berg-Kirkpatrick, and Dan Klein, In proceedings of ACL 2008. [pdf] [slides]
- Structured Compilation: Trading off Structure for Features, Percy Liang, Hal Daume, and Dan Klein, In proceedings of ICML 2008. [pdf] [slides]
- Analyzing the Errors of Unsupervised Induction, Percy Liang and Dan Klein, In proceedings of ACL 2008. [pdf] [slides]
- The Complexity of Phrase Alignment Models, John DeNero and Dan Klein, In proceedings of ACL Short Paper Track 2008. [pdf] [slides]
- Discriminative Log-Linear Grammars with Latent Variables, Slav Petrov and Dan Klein, In proceedings of NIPS 2008. [pdf] [bib] [slides]
- Efficient Sentence Segmentation using Syntactic Features, Benoit Favre, Dilek Hakkani-Tur, Slav Petrov and Dan Klein, In proceedings of SLT 2008. [pdf] [bib] [slides]
- A Probabilistic Approach to Language Change, Alexandre Bouchard-Côté, Thomas Griffiths, and Dan Klein, In proceedings of NIPS 2008. [pdf] [slides]
- Agreement-Based Learning, Percy Liang, Dan Klein, and Michael Jordan, In proceedings of NIPS 2008. [pdf] [slides]
- Mixture-of-Parents Maximum Entropy Markov Models, David Rosenberg, Dan Klein, and Ben Taskar, In proceedings of Uncertainty in Artificial Intelligence (UAI) 2007. [pdf]
- A Probabilistic Approach to Diachronic Phonology, Alexandre Bouchard-Côté, Percy Liang, Thomas Griffiths, and Dan Klein, In proceedings of EMNLP 2007. [pdf] [slides]
- The Infinite PCFG using Hierarchical Dirichlet Processes, Percy Liang, Slav Petrov, Michael Jordan, and Dan Klein, In proceedings of EMNLP 2007. [pdf] [slides]
- Learning Structured Models for Phone Recognition, Slav Petrov, Adam Pauls, and Dan Klein, In proceedings of EMNLP-CoNLL 2007. [pdf] [slides] [bib]
- A* Search via Approximate Factoring, Aria Haghighi, John DeNero, and Dan Klein, In proceedings of AAAI (Nectar Track) 2007. [pdf]
- Learning and Inference for Hierarchically Split PCFGs, Slav Petrov and Dan Klein, In proceedings of AAAI (Nectar Track) 2007. [pdf] [slides] [bib]
- Unsupervised Coreference Resolution in a Nonparametric Bayesian Model, Aria Haghighi and Dan Klein, In proceedings of ACL 2007. [pdf] [slides] [bib]
- Tailoring Word Alignments to Syntactic Machine Translation, John DeNero and Dan Klein, In proceedings of ACL 2007. [pdf] [slides]
- Improved Inference for Unlexicalized Parsing, Slav Petrov and Dan Klein, In proceedings of HLT-NAACL 2007. [pdf] [slides] [bib]
- Approximate Factoring for A* Search, Aria Haghighi, John DeNero, and Dan Klein, In proceedings of HLT-NAACL 2007. [pdf] [slides] [bib]
- Learning Accurate, Compact, and Interpretable Tree Annotation, Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein, In proceedings of COLING-ACL 2006. [pdf] [slides] [bib]
- Non-Local Modeling with a Mixture of PCFGs, Slav Petrov, Leon Barrett, and Dan Klein, In proceedings of CoNLL 2006. [pdf] [slides] [bib]
- An End-to-End Discriminative Approach to Machine Translation, Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar, In proceedings of COLING-ACL 2006. [pdf] [slides] [bib]
- Why Generative Phrase Models Underperform Surface Heuristics, John DeNero, Dan Gillick, James Zhang, and Dan Klein, Workshop on Statistical Machine Translation at HLT-NAACL 2006. [pdf] [slides] [bib]
- Alignment by Agreement, Percy Liang, Ben Taskar, and Dan Klein, In proceedings of NAACL 2006. [pdf] [slides] [bib]
- Prototype-Driven Learning for Sequence Models, Aria Haghighi and Dan Klein, In proceedings of HLT-NAACL 2006. [pdf] [slides] [bib]
- Prototype-Driven Grammar Induction, Aria Haghighi and Dan Klein, In proceedings of COLING-ACL 2006. [pdf] [slides] [bib]
- Word Alignment Via Quadratic Assignment, Simon Lacoste-Julien, Ben Taskar, Dan Klein, and Michael Jordan, In proceedings of NAACL 2006. [pdf] [bib]
- A Discriminative Matching Approach to Word Alignment, Ben Taskar, Simon Lacoste-Julien, and Dan Klein, In proceedings of EMNLP 2005. [pdf] [bib]
- The Unsupervised Learning of Natural Language Structure, Dan Klein, Ph.D. Thesis, Stanford University 2005. [pdf]
- Unsupervised Learning of Field Segmentation Models for Information Extraction, Trond Grenager, Dan Klein, and Chris Manning, In Proceedings of the Association for Computational Linguistics (ACL) 2005. [pdf]
- Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, Dan Klein and Chris Manning, In Proceedings of the Association for Computational Linguistics (ACL) 2004. [pdf]
- Max-Margin Parsing, Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Chris Manning, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2004. [pdf]
- Review of Data-Oriented Parsing, edited by Rens Bod, Remko Scha, and Khalil Sima'an, Dan Klein, Computational Linguistics 2004.
- Accurate Unlexicalized Parsing, Dan Klein and Chris Manning, In Proceedings of the Association for Computational Linguistics (ACL) 2003. [pdf]
- Factored A* Search for Models over Sequences and Trees, Dan Klein and Chris Manning, In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 2003. [pdf]
- A* Parsing: Fast Exact Viterbi Parse Selection, Dan Klein and Chris Manning, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2003. [pdf]
- Named Entity Recognition with Character-Level Models, Dan Klein, Joseph Smarr, Huy Nguyen, and Chris Manning, In Proceedings of the Conference on Natural Language Learning (CoNLL) 2003. [pdf]
- Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, Kristina Toutanova, Dan Klein, Chris Manning, and Yoram Singer, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2003. [pdf]
- Spectral Learning, Sepandar Kamvar, Dan Klein, and Chris Manning, In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 2003. [pdf]
- A Generative Constituent-Context Model for Improved Grammar Induction, Dan Klein and Chris Manning, In Proceedings of the Association for Computational Linguistics (ACL) 2002. [pdf]
- Parsing and Hypergraphs, Dan Klein and Chris Manning, Bunt, Carroll, and Satta, eds., New Developments in Parsing Technology, Kluwer Academic Publishers 2002.
- Fast Exact Inference with a Factored Model for Natural Language Processing, Dan Klein and Chris Manning, In Advances in Neural Information Processing Systems 15 (NIPS) 2002. [pdf]
- Conditional Structure versus Conditional Estimation in NLP Models, Dan Klein and Chris Manning, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2002. [pdf]
- Combining Heterogeneous Classifiers for Word-Sense Disambiguation, Dan Klein, Kristina Toutanova, Tolga Ilhan, Sepandar Kamvar, and Chris Manning, ACL Workshop on Word Sense Disambiguation 2002. [pdf]
- From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering, Dan Klein, Sepandar Kamvar, and Chris Manning, In Proceedings of the International Conference on Machine Learning (ICML) 2002. [pdf]
- Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based Approach, Sepandar Kamvar, Dan Klein, and Chris Manning, In Proceedings of the International Conference on Machine Learning (ICML) 2002. [pdf]
- Evaluating Strategies for Similarity Search on the Web, Taher Haveliwala, Aristides Gionis, Dan Klein, and Piotr Indyk, In Proceedings of the International World Wide Web Conference (WWW) 2002. [pdf]
- Natural Language Grammar Induction Using a Constituent-Context Model, Dan Klein and Chris Manning, In Advances in Neural Information Processing Systems (NIPS) 2001. [pdf]
- Distributional Phrase Structure Induction, Dan Klein and Chris Manning, In Proceedings of the Conference on Natural Language Learning (CoNLL) 2001. [pdf]
- Parsing and Hypergraphs, Dan Klein and Chris Manning, In Proceedings of the International Workshop on Parsing Technologies (IWPT) 2001. [pdf]
- Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank, Dan Klein and Chris Manning, In Proceedings of the Association for Computational Linguistics (ACL) 2001. [pdf]
- An O(n^3) Agenda-Based Chart Parser for Arbitrary Probabilistic Context-Free Grammars, Dan Klein and Chris Manning, Stanford Technical Report 2001. [pdf]
- Variational Inference in Structured NLP Models, Presented at NAACL 2012 with David Burkett. [pdf]
- Structured Bayesian Nonparametric Models with Variational Inference, Presented at ACL 2007 with Percy Liang. [pdf]
- Introduction to Classification: Likelihoods, Margins, Features, and Kernels, Presented at NAACL 2007. [pdf]
- Machine Learning for Natural Language Processing: New Developments and Challenges, Presented at NIPS 2006.
- Max-Margin Methods for NLP: Estimation, Structure, and Applications, Presented at ACL 2005 with Ben Taskar. [pdf]
- Maxent Models, Conditional Estimation, and Optimization, without the Magic, Presented at NAACL 2003 and ACL 2003 with Chris Manning. [pdf slides] [pdf handouts]
- Lagrange Multipliers without Permanent Scarring. Permanently in rough draft form, it seems! [pdf-draft]
I do actually exist outside of the CS/linguistics world. I took
karate for most of my life, and then spent many years with ballroom
dance. Competitive ballroom dance is just like karate, but with more music
and less scowling. I competed and taught for the Stanford
Ballroom Dance Team, and previously competed for the Cornell
Team and the Oxford Team.