CS 288: Statistical Natural Language Processing, Spring 2010
Instructor: Dan Klein
Lecture: Monday and Wednesday, 2:30pm-4:00pm, 405 Soda Hall
Office Hours: Monday 4pm-5pm and Thursday 2:30pm-3:30pm in 724 (or 730) Sutardja Dai Hall.
Announcements
1/19/10: The course newsgroup is ucb.class.cs288. If you use it, I'll use it!
1/19/10: The previous website has been archived.
1/19/10: Assignment 1 is posted.
2/2/10: Assignment 2 is posted.
2/18/10: Assignment 3 is posted.
3/5/10: Comments on writeups posted.
3/7/10: Assignment 4 is posted.
4/4/10: Final project guidelines are posted.
4/4/10: Assignment 5 is posted.
4/12/10: There are extra office hours on 4/12 (4-6pm) and 4/15 (2:30-4:00pm).
Description
This course will explore current statistical techniques for the automatic
analysis of natural (human) language data. The dominant modeling paradigm is
corpus-driven statistical learning, with a split focus between supervised and
unsupervised methods.
In the first part of the course, we will examine the core tasks in natural
language processing, including language modeling, syntactic analysis, semantic
interpretation, coreference resolution, and discourse analysis. In each case, we
will discuss the underlying linguistic phenomena, which features are relevant to the task, how to design
efficient models that can accommodate those features, and how to learn such models. In the second part of the course, we will
explore how these core techniques can be applied to user applications such as
information extraction, question answering, speech recognition, machine
translation, and interactive dialog systems.
Course assignments will highlight several core NLP tasks and methods. For each task, you
will construct a basic system, then improve it through a cycle of linguistic
error analysis and model redesign. There will also be a final project, which
will investigate a single topic or application in greater depth. This course
assumes a good background in basic probability and a strong ability to program
in Java. Prior experience with linguistics or natural languages is helpful, but
not required. There will be a lot of statistics, algorithms,
and coding in this class.
Readings
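The primary texts for this course are:
Jurafsky and Martin, Speech and Language Processing, 2nd edition (J+M)
Manning and Schütze, Foundations of Statistical Natural Language Processing (M+S)
Note that M+S is free online. Also, make sure you get the purple 2nd edition of J+M, not the white 1st edition.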
Syllabus [subject to substantial change!]
Week | Date | Topics | Techniques | Readings | Assignments (Out) | Assignments (Due) |
1 | Jan 20 | Course Introduction [6PP] [2PP] | J+M 1, M+S 1-3 | HW1: Language Models | ||
2 | Jan 25 | Words: Language Modeling [6PP] [2PP] | N-Grams, Smoothing | J+M 4, M+S 6, Chen & Goodman, Interpreting KN, Massive Data | ||
Jan 27 | Words: LMs II [6PP] [2PP] | Smoothing, Naive Bayes | M+S 7, Event Models | |||
3 | Feb 1 | Words: Text Cat [6PP] [2PP] | ||||
Feb 3 | Words: WSD [6PP] [2PP] | Maxent | Classification Tutorial, Maxent Tutorial 1, 2, J+M 6 | HW2: PNP Classification | HW1 | |
4 | Feb 8 | Parts-of-Speech: Tagging [6PP] [2PP] | HMMs/CRFs | J+M 5, Toutanova & Manning, Brants, Brill | | |
Feb 10 | Parts-of-Speech: Induction [6PP] [2PP] | EM | J+M 6, M+S 9-10, HMM Learning, Distributional Clustering, Johnson | |||
5 | Feb 15 | NO CLASS | ||||
Feb 17 | Speech Recognition [6PP] [2PP] | Speech Signal | J+M 7 | HW3: POS Tagging | HW2 | |
6 | Feb 22 | Speech Recognition II [6PP] [2PP] | Acoustic Modeling | J+M 9 | ||
Feb 24 | Interlude: Competitive Parsing | |||||
7 | Mar 1 | Interlude: Competitive Parsing | ||||
Mar 3 | Syntax: PCFGs [6PP] [2PP] | M+S 3.2, 12.1, J+M 11 | ||||
8 | Mar 8 | Syntax: Algorithms [6PP] [2PP] | M+S 11, J+M 12, Best-First, A*, K-best | HW4: Parsing | HW3 | |
Mar 10 | Syntax: Richer Models [6PP] [2PP] | Unlexicalized, Split, Lexicalized | ||||
9 | Mar 15 | NO CLASS | ||||
Mar 17 | Syntax: Grammar Induction [6PP] [2PP] | |||||
10 | Mar 22 | Spring Break | ||||
Mar 24 | Spring Break | |||||
11 | Mar 29 | Machine Translation I [6PP] [2PP] | Word-Based Models | J+M 25, IBM Models, HMM, Agreement, Discriminative, Decoding | ||
Mar 31 | Machine Translation II [6PP] [2PP] | Phrase-Based Systems | Decoding, Learning Phrases | FP Guidelines | HW4 | |
12 | Apr 5 | Machine Translation III [6PP] [2PP] | Syntactic Systems | GHKM, Vs Phrases, Decoding | HW5: Machine Translation | |
Apr 7 | Semantics: Roles [6PP] [2PP] | J+M 16, 19 | ||||
13 | Apr 12 | Semantics: Compositional [6PP] [2PP] | Manning, J+M 18 | |||
Apr 14 | Semantics: Interpretation [6PP] [2PP] | Parsing to LF | ||||
14 | Apr 19 | Discourse: Coreference [6PP] [2PP] | Supervised, Unsupervised, J+M 21 | HW5 | ||
Apr 21 | Discourse: Summarization [6PP] [2PP] | Topic-based, N-gram based | ||||
15 | Apr 26 | Question Answering [6PP] [2PP] | N-gram-based, Grammar-based | |||
Apr 28 | Diachronics [6PP] [2PP] | Reconstruction | FP Due May 21 |