Computer Science 294
Practical Machine Learning
(Fall 2009)
Prof. Michael Jordan (jordan-AT-cs)
Lecture: Thursday 5-7pm, Soda 306
Office hours of the lecturer of the week: Mon, 3-4 (751 Soda); Weds, 2-3 (751 Soda)
Office hours of Prof. Jordan: Weds, 3-4 (429 Evans)
This course introduces core statistical machine learning algorithms in a
(relatively) non-mathematical way, emphasizing applied problem-solving. The
prerequisites are light; some prior exposure to basic probability and to linear
algebra will suffice. A list of topics can be found here.
[Announcements]
[Administrivia]
[Lectures]
[Homework]
[Project]
[Readings]
[Software]
- Dec 1: Prof. Jordan's office hours will be on Friday from 4:00 to 5:00 this week.
- Nov 19: Project writeups will be due on Tues, December 15th at 5:00 pm
(on bSpace). The poster session will be on Thurs, December 17th at 5:00 in
306 Soda.
- Course prerequisites: some prior exposure to probability and to linear algebra.
- Coursework and grading:
Students will be required to complete bi-weekly homework assignments.
These must be turned in on time to receive credit. There will also
be a final project. A project report will be required and projects
will also be presented in an end-of-term poster session. The homeworks
will count for 60% of the grade and the project will count for
40% of the grade.
- bSpace: use the forum group there to
discuss homeworks, project topics, ask questions about the class, etc.
To access bSpace, simply visit
https://bspace.berkeley.edu
and log in using your CalNet ID. If you don't have a CalNet ID,
send an email to jordan-AT-cs to request a guest account.
If you're not registered for the class or the tab for the course doesn't show up,
you can add it by going to My Workspace | Membership, clicking
on 'Joinable Sites', and searching for 'COMPSCI 294 LEC 034 Fa09'.
There will be bi-weekly homeworks, worth a total of 60% of your grade.
Each homework is due at the beginning of class.
Please keep your responses succinct and clear.
There is no need to attach code.
Turn in your homework on bSpace (click Assignments on the left menu).
Homework 1: [hw1.pdf]. Additional files for homework 1: [hw1-files.zip].
Direct questions about the classification problems to Mike Jordan
(jordan@eecs) and about the regression problems to Fabian Wauthier (flw@eecs).
Homework 2: [hw2.pdf]. Additional files for homework 2: [hw2-data.zip].
Direct questions about the clustering problems to Sriram Sankararaman
(sriram_s@eecs) and about the dimensionality reduction problems to Percy
Liang (pliang@eecs).
Homework 3: [hw3.pdf]. Additional files for homework 3: [hw3.zip].
Direct questions about the feature selection problems to Alex Bouchard
(bouchard@eecs) and about the HMM problems to Alex Simma (asimma@eecs).
Homework 4: [hw4.pdf]. Additional files for homework 4: [hw4.zip].
Direct questions about the collaborative filtering problems to Lester
Mackey (lmackey@eecs) and about the active learning problem to Daniel
Ting (dting@stat).
The project counts for roughly 40% of your grade.
The general idea for the project is to have you apply a
concept from the class in your own research, or explore it
further through experimentation. The evaluation of the
project will be based on the following three deliverables:
- Submit one paragraph describing your project plan or ideas on bSpace by Thursday, November 5.
The idea is to have you start working on the project before
December. Feel free to come to office hours to discuss project ideas,
to email the lecturers, or to use the wiki/discussion
group on bSpace to brainstorm.
- Present a poster about your project.
- Submit your project write-up on bSpace.
Readings for the specific sections will be provided later. The following
resources are good sources of general information.
- Hastie, Tibshirani and Friedman.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Book's web site
- Witten and Frank.
Data Mining: Practical Machine Learning Tools and Techniques.
Book's web site
- Andrew Moore's Tutorials are
a collection of PDF tutorials on many of the topics that will be covered in the class.
There is a wide variety of free data mining and machine learning software
available. You might find these packages useful for doing the homeworks or the final project.
- Weka is a large
Java package implementing many learning algorithms.
- RapidMiner (formerly known as YALE)
is an alternative (and complementary) Java package. It includes a GUI which
allows automation of the whole data path from feature normalization through
feature selection, learning and cross validation.
- SVM-Light and LibSVM are two popular implementations of
various SVM algorithms.
- R is an interactive programming
language designed for statistics. Many very useful libraries are available.
Last updated Aug. 22, 2009.