CS 294-5: Statistical Natural Language Processing, Fall 2005
Instructor: Dan Klein
Lecture: Mondays and Wednesdays, 1:00-2:30pm, 310 Soda Hall
Office Hours: Mondays and Wednesdays 2:30-3:30pm in 775 Soda Hall, or by appointment
Announcements
11/4/05: Homework 5
10/18/05: Homework 4
10/18/05: Section on 10/21 in Soda, 1-2pm, on word alignment
10/15/05: Extension: Homework 3 due on Wednesday 10/19
10/4/05: Homework 3
10/1/05: Update: No class on 10/3 or 10/5 (HW2 still late if timestamped after 10/3)
9/26/05: No class on 10/5
9/26/05: Invite: StatNLP lunch, Tuesdays 12:30 in Soda 373 [topic]
9/26/05: Final project guidelines
9/19/05: Homework 2
9/19/05: Reminder: my office hours are cancelled on Tuesday, but I'll be back on Wednesday.
9/14/05: Update: Aria's office hours will be extended to F 12-3 in Soda 493, at least this week.
9/12/05: Aria's office hours will be F 12-1 in Soda 493
9/11/05: My office hours have moved, by popular demand, to T 11-12, W 2:30-3:30
9/02/05: Want a Millennium account? Fill out the form by Tuesday morning if you want it soon.
8/31/05: Problems with the newsgroup? Check here.
8/31/05: Homework 1
8/31/05: Accounts and access
8/29/05: Class policies
8/29/05: Class questionnaire
8/16/05: The course newsgroup is ucb.class.cs294-5. If you use it, I'll use it!
8/16/05: The previous website has been archived.
Description
This course will explore current statistical techniques for the automatic
analysis of natural (human) language data. The dominant modeling paradigm is
corpus-driven supervised learning, but unsupervised methods and even hand-coded
rule-based systems will be mentioned when appropriate.
In the first part of the course, we will examine the core tasks in natural
language processing, including language modeling, word-sense disambiguation,
morphological analysis, part-of-speech tagging, syntactic parsing, semantic
interpretation, coreference resolution, and discourse analysis. In each case, we
will discuss which linguistic features are relevant to the task, how to design
efficient models which can accommodate those features, and how to estimate
parameters for such models in data-sparse contexts. In the second part of the
course, we will explore how these core techniques can be applied to user
applications such as information extraction, question answering, speech
recognition, machine translation, and interactive dialog systems.
Course assignments will highlight several core NLP tasks. For each task, we will
construct a basic system, then improve it through a cycle of linguistic error
analysis and model redesign. There will also be a final project, which will
investigate a single topic or application in greater depth. This course assumes
familiarity with basic probability and the ability to program in Java. Prior
experience with linguistics or natural languages is helpful, but not required.
Readings
The texts for this course are:
The former is loosely required (i.e. you'll want access to a copy) while the latter is recommended as supplementary reading. Both are on reserve in the Engineering library.
Syllabus