CS 288: Statistical Natural Language Processing, Fall 2014

Instructor: Dan Klein
Lecture: Tuesday and Thursday 11:00am-12:30pm, 320 Soda Hall
Office Hours: Tuesday 12:30pm-2:00pm 730 SDH
 
GSI: Greg Durrett
Office Hours: Thursday 3:00pm-5:00pm 751 Soda (alcove)
 
Forum: Piazza

Announcements

11/6/14:  Project 5 has been released. It is due November 26 at 5pm.
10/18/14:  Project 4 has been released. It is due November 7 at 5pm.
9/30/14:  Project 3 has been released. It is due October 17 at 5pm.
9/16/14:  Project 2 is now due September 29 at 5pm.
9/15/14:  Project 2 has been released. It is due September 26 at 5pm.
8/30/14:  Project 1 has been released. It is due September 12 at 5pm.
8/28/14:  The previous website has been archived.

Description

This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning, with a split focus between supervised and unsupervised methods.  This term, we are introducing a few new projects to give increased hands-on experience with a greater variety of NLP tasks and commonly used techniques.

This course assumes a good background in basic probability and a strong ability to program in Java. Prior experience with linguistics or natural languages is helpful, but not required.  There will be a lot of statistics, algorithms, and coding in this class.  The recommended background is CS 188 (or CS 281A) and CS 170 (or CS 270); an A in CS 188 (or CS 281A) is required.  This course will be more work-intensive than most graduate or undergraduate courses.

Readings

The primary recommended texts for this course are:

Jurafsky and Martin (J+M), Speech and Language Processing, 2nd edition
Manning and Schütze (M&S), Foundations of Statistical Natural Language Processing

Note that M&S is free online.  Also, make sure you get the purple 2nd edition of J+M, not the white 1st edition.

Syllabus [subject to substantial change!]

Week | Date    | Topics                                   | Readings                                            | Assignments (Out)
1    | Aug 28  | Course Introduction                      | J+M 1, M+S 1-3                                      |
2    | Sept 2  | Language Modeling I                      | J+M 4, M+S 6, Chen & Goodman, Interpreting KN       | P1: Language Modeling
     | Sept 4  | Language Modeling II                     | Massive Data, Bloom, Perfect, Efficient LMs         |
3    | Sept 9  | Language Modeling III (1PP)              |                                                     |
     | Sept 11 | Speech Recognition I (1PP)               | J+M 7                                               |
4    | Sept 16 | Speech Recognition II (1PP)              | J+M 9, Decoding                                     | P2: ASR
     | Sept 18 | Speech Recognition III, HMMs (1PP)       |                                                     |
5    | Sept 23 | POS Tagging, NER, CRFs (1PP)             | J+M 5, Brants, Toutanova & Manning                  |
     | Sept 25 | Parsing I (1PP)                          | M+S 3.2, 12.1, J+M 13                               |
6    | Sept 30 | Parsing II (1PP)                         | M+S 11, J+M 14, Best-First, A*, Unlexicalized       | P3: PCFG Parser
     | Oct 2   | Parsing III (1PP)                        | Split, Lexicalized, K-Best A*, Coarse-to-fine       |
7    | Oct 7   | Parsing IV (1PP)                         |                                                     |
     | Oct 9   | No class                                 |                                                     |
8    | Oct 14  | Structured Classification I (1PP)        |                                                     |
     | Oct 16  | Structured Classification II (1PP)       | LOGO, Pegasos, Cutting Plane                        | P4: Discriminative Reranker
9    | Oct 21  | Structured Classification III (1PP)      |                                                     |
     | Oct 23  | Semantics I (1PP)                        | J+M 16, 18, 19, Parsing to LF                       |
10   | Oct 28  | Machine Translation: Alignment I (1PP)   | J+M 25, IBM Models, HMM, Agreement, Discriminative  |
     | Oct 30  | Machine Translation: Alignment II (1PP)  |                                                     |
11   | Nov 4   | Machine Translation: Phrase-Based (1PP)  | Decoding                                            |
     | Nov 6   | Machine Translation: Syntactic (1PP)     | Hiero, String-Tree, Tree-String, Tree-Tree          | P5: Machine Translation
12   | Nov 11  | Veterans Day (no class)                  |                                                     |
     | Nov 13  | Coreference Resolution (1PP)             | Rule-based, Learning-based                          |
13   | Nov 18  | Grounded Semantics (1PP)                 |                                                     |
     | Nov 20  | Summarization and Music Transcription (1PP) | Topic-based, N-gram-based                        |
14   | Nov 25  | Question Answering (1PP)                 | N-gram-based, Grammar-based                         |
     | Nov 27  | Thanksgiving (no class)                  |                                                     |
15   | Dec 2   | Diachronics (1PP)                        | Reconstruction                                      |
     | Dec 4   | Decipherment and OCR (1PP, no 6PP)       | OCR                                                 |