CS 294-19: Statistical Natural Language Processing, Spring 2008
Instructor: Dan Klein
Lecture: Tuesday and Thursday, 11:00am-12:30pm, 320 Soda Hall
Office Hours: Tuesday and Thursday, 12:30-1:30pm in 775 Soda Hall
GSI: Aria Haghighi
Section: Wed. 12-1pm in 320 Soda
Office Hours: Wed. 1-5pm in 525 Soda Hall (or by appointment)
Announcements
3/17/08: Assignment 5 is posted, due April 3.
2/17/08: Telebears mistakenly kicked out half the class. I'm working
on it -- but don't panic, you're all in the class.
2/16/08: Assignment 3 is posted, due Feb 28. An extension of a few days is
likely, but start early, as your experiments will be compute-intensive!
2/12/08: New section time! Friday 4-5pm in 320 Soda, starting this week.
2/4/08: Assignment 2 is posted, due Feb 14.
2/3/08: Section time survey for a possible new section time. Please vote even
if you like the current time. This week (2/6), section is still W 12-1pm.
2/2/08: Newsgroup exists!
2/2/08: Readers now available at Copy Central at Hearst and Euclid.
1/29/08: Section is now Wednesday 12-1pm in 320 Soda.
1/21/08: Assignment 1 is posted, due Feb 5.
1/21/08: The course newsgroup is ucb.class.cs294-19. If you use it, I'll use it!
1/21/08: The previous website has been archived.
Description
This course will explore current statistical techniques for the automatic
analysis of natural (human) language data. The dominant modeling paradigm is
corpus-driven statistical learning, with a split focus between supervised and
unsupervised methods.
In the first part of the course, we will examine the core tasks in natural
language processing, including language modeling, word-sense disambiguation,
morphological analysis, part-of-speech tagging, syntactic parsing, semantic
interpretation, coreference resolution, and discourse analysis. In each case, we
will discuss which linguistic features are relevant to the task, how to design
efficient models which can accommodate those features, and how to learn with
such models in data-sparse contexts. In the second part of the course, we will
explore how these core techniques can be applied to user applications such as
information extraction, question answering, speech recognition, machine
translation, and interactive dialog systems.
Course assignments will highlight several core NLP tasks. For each task, we will
construct a basic system, then improve it through a cycle of linguistic error
analysis and model redesign. There will also be a final project, which will
investigate a single topic or application in greater depth. This course assumes
a good familiarity with basic probability and a strong ability to program in
Java. Prior experience with linguistics or natural languages is helpful, but not
required. Disclaimer: there will be a lot of statistics and algorithms in
this class, as well as some serious coding.
Readings
The primary texts for this course are:
I recommend both, but everything that you need is online.
Syllabus [subject to substantial change!]