CS 294-7: Statistical Natural Language Processing, Spring 2007
Instructor: Dan Klein
Lecture: Mondays and Wednesdays, 1:00-2:30pm, 405 Soda Hall
Office Hours: Mondays and Wednesdays 2:30-3:30pm in 775 Soda Hall
GSI: Aria Haghighi
Section: Friday 11am-12pm, 320 Soda Hall
Office Hours: Tuesdays and Thursdays 2:10-3:10pm in 525 Soda Hall
Announcements
4/1/07: Assignment 5 is posted, due April 16th.
3/20/07: Final project guidelines are up.
3/6/07: Assignment 4 is posted, due March 21st.
2/20/07: Assignment 3 is posted, due March 7th.
2/4/07: Assignment 2 is posted, due Feb 21st.
1/27/07: Assignment 1 is posted, due Feb 7th.
1/17/07: The course newsgroup is ucb.class.cs294-7. If you use it, I'll use it!
1/17/07: The previous website has been archived.
Description
This course will explore current statistical techniques
for the automatic analysis of natural (human) language data. The dominant
modeling paradigm is corpus-driven supervised learning, but unsupervised methods
and even hand-coded rule-based systems will be mentioned when appropriate.
In the first part of the course, we will examine the core tasks in natural
language processing, including language modeling, word-sense disambiguation,
morphological analysis, part-of-speech tagging, syntactic parsing, semantic
interpretation, coreference resolution, and discourse analysis. In each case, we
will discuss which linguistic features are relevant to the task, how to design
efficient models which can accommodate those features, and how to estimate
parameters for such models in data-sparse contexts. In the second part of the
course, we will explore how these core techniques can be applied to user
applications such as information extraction, question answering, speech
recognition, machine translation, and interactive dialog systems.
Course assignments will highlight several core NLP tasks. For each task, we will
construct a basic system, then improve it through a cycle of linguistic error
analysis and model redesign. There will also be a final project, which will
investigate a single topic or application in greater depth. This course assumes
familiarity with basic probability and the ability to program in Java. Prior
experience with linguistics or natural languages is helpful, but not
required. Disclaimer: there will be a lot of statistics and algorithms in
this class, as well as some serious coding.
Readings
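The texts for this course are:
Manning and Schütze, Foundations of Statistical Natural Language Processing (M+S)
Jurafsky and Martin, Speech and Language Processing (J+M)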
I recommend both, but most of what you need is online.
Syllabus [subject to change!]
Week | Date | Topics | Techniques | Readings | Assignments (Out) | Assignments (Due) |
1 | Jan 17 | Course Introduction | | M+S 1-3 | | |
2 | Jan 22 | Language Models (6pp) (2pp) | Multinomial Smoothing | M+S 6, J+M 2nd Ed Ch 4, Chen & Goodman | | |
2 | Jan 24 | Language Models (6pp) (2pp) | More Smoothing | Interpreting KN | HW1: Language Models | |
3 | Jan 29 | Text Categorization (6pp) (2pp) | Naive-Bayes | M+S 7, Event Models | | |
3 | Jan 31 | Word-Sense Disambiguation (6pp) (2pp) | Maximum Entropy | Tutorial 1, 2, J+M 6 | | |
4 | Feb 5 | Part-of-Speech Tagging (6pp) (2pp) | HMMs | J+M 5, Toutanova & Manning, Brants, Brill | HW2: PNP Classification | |
4 | Feb 7 | Tagging / Word Classes (6pp) (2pp) | | J+M 6, M+S 9-10, HMM Learning, Distributional Clustering | | HW1 |
5 | Feb 12 | Speech Recognition (6pp) (2pp) | | J+M 7 | | |
5 | Feb 14 | Speech Recognition (6pp) (2pp) | | J+M 9 | | |
6 | Feb 19 | NO CLASS | | | | |
6 | Feb 21 | Machine Translation (6pp) (2pp) | Word Alignment | J+M 24, IBM Models, HMM | HW3: POS Tagging | HW2 |
7 | Feb 26 | Machine Translation | Word Decoding | Decoding | | |
7 | Feb 28 | Machine Translation (6pp) (2pp) | Phrase Alignment | Phrases | | |
8 | Mar 5 | Machine Translation (6pp) (2pp) | Phrase Alignment | | | |
8 | Mar 7 | Syntax (6pp) (2pp) | | M+S 3.2, 12.1, J+M 11 | HW4: Machine Translation | HW3 |
9 | Mar 12 | Syntactic Ambiguity (6pp) (2pp) | | M+S 11, J+M 12 | | |
9 | Mar 14 | Chart Parsing (6pp) (2pp) | PCFGs | | | |
10 | Mar 19 | PCFGs (6pp) (2pp) | | Unlexicalized, Split | | |
10 | Mar 21 | Lexicalized Models (6pp) (2pp) | | M+S 12.2, J+M 12.3-4, Best-First, A*, Collins, Charniak and Johnson | FP Guidelines | HW4 |
11 | Mar 26 | NO CLASS | | | | |
11 | Mar 28 | NO CLASS | | | | |
12 | Apr 2 | Frame Semantics (6pp) (2pp) | | J+M 16, 19 | HW5: Parsing | |
12 | Apr 4 | Compositional Semantics (6pp) (2pp) | | Manning | | |
13 | Apr 9 | Semantics (6pp) (2pp) | | | | |
13 | Apr 11 | Semantics | | | | |
14 | Apr 16 | Coreference / QA (6pp) (2pp) | | | | HW5 |
14 | Apr 18 | Question Answering | | | | |
15 | Apr 23 | Syntactic Translation | | | | |
15 | Apr 25 | TBD | | | | |
16 | Apr 30 | Final Presentations | | | | |
16 | May 2 | Final Presentations | | | | |
17 | May 7 | Conclusion / Grammar Induction | | | | FPs by May 18 |