CS 288: Statistical Natural Language Processing, Spring 2011
Instructor: Dan Klein
Lecture: Tuesday and Thursday 12:30pm-2:00pm, 405 Soda Hall
Office Hours: Tuesday and Thursday 3:30pm-4:30pm in 724 (or 730) Sutardja Dai Hall
GSI: Adam Pauls
Office Hours: Wednesday 4-5pm, 751 Soda Hall
Announcements
1/16/11: The previous website has been archived.
1/20/11: Assignment 1 has been posted. It is due on February 3rd.
2/07/11: An online forum has been created for this class. The course staff (Adam) will check this forum regularly and answer questions as they arise. Important class announcements will also be posted on the forum (and on this web page).
2/07/11: Assignment 2 has been posted. It is due on February 24th.
2/10/11: A bug in Assignment 2 has been fixed. Please download the latest version of the code.
2/21/11: Writing comments are available, along with some sample write-ups.
2/21/11: Amazon has given us a grant that provides each student with $100 of EC2 credits! Email Adam for an access code. Some (brief) instructions are available.
2/28/11: Another bug in Assignment 2 has been fixed. This bug caused the distortion score to be ignored when scoring a hypothesis. Please download the latest version of the code.
3/01/11: Assignment 3 has been posted. It is due on March 14th.
3/01/11: Final project guidelines have been posted.
4/08/11: Assignment 4 has been posted. It is due on May 9th.
Description
This course will explore current statistical techniques for the automatic
analysis of natural (human) language data. The dominant modeling paradigm is
corpus-driven statistical learning, with a split focus between supervised and
unsupervised methods. This term has a new syllabus concentrating on
machine translation and, to a lesser extent, structured classification, so (1)
the syllabus is more tentative than usual and (2) the projects will be new this
term.
This course assumes a good background in basic probability and a strong ability
to program in Java. Prior experience with linguistics or natural languages is
helpful, but not required. There will be a lot of statistics, algorithms,
and coding in this class. The recommended background is cs188 (or cs281a)
and cs170 (or cs270). An A in cs188 (or cs281a) is required. This
course will be more work-intensive than most graduate or undergraduate courses.
Readings
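The primary recommended texts for this course are Jurafsky and Martin, Speech and Language Processing (J+M), and Manning and Schütze, Foundations of Statistical Natural Language Processing (M+S).
Note that M+S is free online. Also, make sure you get the purple 2nd edition of J+M, not the white 1st edition.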
Syllabus [subject to substantial change!]
Week | Date | Topics | Techniques | Readings | Assignments (Out) | Assignments (Due) |
1 | Jan 18 | Course Introduction [2PP] [6PP] | J+M 1, M+S 1-3 | |||
Jan 20 | Language Modeling I [2PP] [6PP] | KN / Smoothing | J+M 4, M+S 6, Chen & Goodman, Interpreting KN | HW1: Language Models | ||
2 | Jan 25 | Language Modeling II [2PP] [6PP] | Large Data | Massive Data, Bloom, Perfect, Efficient LMs | ||
Jan 27 | Speech Recognition I [2PP] [6PP] | Phonetics | J+M 7 | |||
3 | Feb 1 | Speech Recognition II [2PP] [6PP] | HMMs | J+M 9 | ||
Feb 3 | Part-of-Speech [2PP] [6PP] | Decoding | J+M 5, Brants, Toutanova & Manning | HW1 | ||
4 | Feb 8 | Phrase-Based MT [2PP] [6PP] | Decoding | HW2: Phrase MT |
Feb 10 | Alignment I [2PP] [6PP] | IBM Models | J+M 25, IBM Models, HMM Agreement Discriminative, Decoding | |||
5 | Feb 15 | Alignment II [2PP] [6PP] | EM | |||
Feb 17 | Phrase Alignment [2PP] [6PP] | Phrase Alignment | Learning Phrases, Generative | |||
6 | Feb 22 | Structured Classification I [2PP] [6PP] | Margin | |||
Feb 24 | Structured Classification II [2PP] [6PP] | Likelihood | HW2 | |||
7 | Mar 1 | Structured Classification III [2PP] [6PP] | Kernels | HW3: Alignment | ||
Mar 3 | Structured Classification IV [2PP] [6PP] | Structure | M3Ns, Cutting Plane | |||
8 | Mar 8 | Parsing I [2PP] [6PP] | PCFGs | M+S 3.2, 12.1, J+M 11 | ||
Mar 10 | Parsing II [2PP] [6PP] | PCFGs | M+S 11, J+M 12, Best-First, A*, Unlexicalized | |||
9 | Mar 15 | Parsing III [2PP] [6PP] | Other Models | Split, Lexicalized, K-Best | HW3 | |
Mar 17 | Parsing IV [2PP] [6PP] | Reranking | ||||
10 | Mar 22 | Spring Break | ||||
Mar 24 | Spring Break | |||||
11 | Mar 29 | Syntactic MT I [2PP] [6PP] | Hiero, String-Tree, Tree-String, Tree-Tree | FP Guidelines | ||
Mar 31 | Syntactic MT II [a-2PP] [a-6PP] [b] | |||||
12 | Apr 5 | Semantics I [2PP] [6PP] | SRL / Montague | J+M 16, 19, Manning, J+M 18 | ||
Apr 7 | Semantics II [2PP] [6PP] | LF Parsing | Parsing to LF | HW4: Parsing / Classification | ||
13 | Apr 12 | Semantics III [1PP] | Grounded | |||
Apr 14 | Coreference [2PP] [6PP] | Supervised, Unsupervised, J+M 21 | ||||
14 | Apr 19 | Summarization [2PP] [6PP] | Topic-based, N-gram based | |||
Apr 21 | Question Answering [2PP] [6PP] | N-gram-based, Grammar-based | ||||
15 | Apr 26 | Grammar Induction [2PP] [6PP] | ||||
Apr 28 | Diachronics [2PP] [6PP] | Reconstruction | FP Due May 17th |