U.C. Berkeley CS267/EngC233 Home Page
Applications of Parallel Computers
Spring 2009
M W 9-10:30, 290 Hearst Mining Building
Instructors:
TA:
Vasily Volkov
Office: 447 Soda Hall, (510)642-3979
Office Hours: M 3-4pm
(send email)
Administrative Assistants:
Laura Rebusi / Tammy Johnson
Offices: 563 / 565 Soda Hall
Phones: (510)643-1455 / (510)643-4816
Email: (to Laura) / (to Tammy)
Link to webcasting of lectures
(Active during lectures only; see below under "Lecture Notes" for archived video).
Syllabus and Motivation
CS267 was originally designed to teach students how to
program parallel computers to efficiently solve
challenging problems in science and engineering,
where very fast computers are required
either to perform complex simulations or to analyze enormous datasets.
CS267 is intended to be useful for students from many departments
and with different backgrounds, although we will assume reasonable
programming skills in a conventional (non-parallel) language,
as well as enough mathematical skills to understand the
problems and algorithmic solutions presented.
CS267 satisfies part of the course requirements for a new
Designated Emphasis ("graduate minor") in
Computational Science and Engineering.
While this general outline remains, a large change
in the computing world has started in the last few years:
not only are the fastest computers
parallel, but nearly all computers will soon be parallel,
because the physics of semiconductor manufacturing
will no longer let conventional sequential processors
get faster year after year, as they have for so long
(roughly doubling in speed every 18 months for
many years). So all programs that need to
run faster will have to become parallel programs.
(It is considered very unlikely that compilers will be
able to automatically find enough parallelism in most
sequential programs to solve this problem.)
For background on this trend toward parallelism, click
here.
This will be a huge change not just for science
and engineering but for the entire computing industry,
which has depended on selling new computers by running
their users' programs faster without the users
having to reprogram them. Large research activities
to address this issue are underway at many computer
companies and universities, including
Berkeley's ParLab,
whose research agenda is outlined
here.
While the ultimate solutions to the parallel programming
problem are far from determined, students in CS267 will
get the skills to use some of the best existing parallel programming
tools, and be exposed to a number of open research questions.
Tentative Detailed Syllabus
Grading
There will be several programming assignments to acquaint students
with basic issues in memory locality and parallelism needed for
high performance. Most of the grade will be based on a final project
(in which students are encouraged to work in small interdisciplinary teams),
which could involve parallelizing an interesting application, or
developing or evaluating a novel parallel computing tool. Students
are expected to have identified a likely project by mid-semester,
so that they can begin working on it. We will provide many suggestions
of possible projects as the class proceeds.
Announcements
(May 2) Final projects will be due at noon on Monday, May 18.
(May 1; updated 5pm) The lecture on Monday, May 4, will be held in 290 Hearst as usual.
Please ignore previous announcements of a room change.
(Apr 28) At the last class of the semester, May 11, we will have a poster session for
class projects. Each student will also give a ~2 minute overview of their project and poster.
We ask students to send URLs for their posters for us to post on this page;
this will permit students at all the campuses to participate in the poster session.
(Apr 3) A webpage defining
commonly used acronyms
is now available.
(Apr 3) On Monday April 6 class will be held in 540 A/B Cory.
(Mar 29) There will be no class on Monday, Mar 30, since many
students will be attending
HotPar,
a conference on parallel computing being held in Berkeley.
(Jan 28) For students who want to try some on-line self-paced
courses to improve basic programming skills, click
here.
You can use this material without having to register.
In particular, courses like CS 9C (for programming in C) might
be useful.
(Jan 26) Homework Assignment 0 has been posted
here, due
Feb 2.
(Jan 22) Please fill out the following
class survey.
(Jan 18) This course satisfies part of the course requirements
for a new Designated Emphasis ("graduate minor") in
Computational Science and Engineering.
(Jan 18) This course will have students attending from
all four
CITRIS campuses: UC Berkeley,
UC Davis, UC Merced and UC Santa Cruz.
CITRIS is generously providing the webcasting facilities and other
resources to help run the course.
Lectures will be webcast
here
(active during lectures only).
The Class Resources page will include, among other things,
class handouts, homework assignments,
the class roster, information about class accounts, pointers
to documentation for machines and software tools we will use,
reports and books on supercomputing,
pointers to old CS267 class webpages (including old class projects),
and pointers to other useful websites.
Lecture Notes
Notes from previous offerings of CS267 are posted on old
class webpages available under
Class Resources.
In particular, the web page from the
1996 offering
has detailed, textbook-style notes available on-line that are still
largely up-to-date in their presentations of parallel algorithms
(the slides to be posted during this semester will contain some more
recently invented algorithms as well).
Lecture slides (PowerPoint) and archived video from Spring 2009
Lecture 1 - Introduction
(in powerpoint)
or
(archived video)
Description of CSE program
(in powerpoint),
discussed briefly at end of Lecture 1
Lecture 2 - Single Processor Machines:
Memory Hierarchies and Processor Features; Case Study:
Tuning Matrix Multiply
(in powerpoint) or
(archived video)
Lecture 3
(archived video)
Completion of last lecture
(updated powerpoint)
Introduction to Parallel Machines and Programming Models
(in powerpoint)
Top 500 list from Nov 2008
(in powerpoint)
Lecture 4 - Shared Memory Programming: OpenMP and Threads
(in powerpoint)
or
(archived video)
Lecture 5 - Distributed Memory Machines and Programming
(archived video)
Architectures and Performance Models,
(in powerpoint)
MPI Programming,
(in pdf)
or
(in powerpoint)
Lecture 6 - Sources of Parallelism and Locality in Simulation - Part 1
(in powerpoint)
or
(archived video)
Lecture 7
(archived video)
Sources of Parallelism and Locality in Simulation - Part 2
(in powerpoint)
Tricks with Trees
(in powerpoint)
Notes (and hints) on Homework 1
(in powerpoint)
Lecture 8 - Introduction to CUDA and GPUs
(in powerpoint)
or
(archived video)
(guest lecture by Bryan Catanzaro)
Lecture 9 - UPC (Unified Parallel C)
(in powerpoint)
or
(in pdf)
or
(archived video)
(guest lecture by Kathy Yelick)
Lecture 10 - Dense Linear Algebra - Part 1
(in powerpoint)
or
(archived video)
Lecture 11 - Dense Linear Algebra - Part 2
(in powerpoint)
or
(archived video)
Lecture 12
(archived video)
Part 1 - Floating Point Arithmetic - Impact on Algorithms and Parallelism
(in powerpoint)
Part 2 - Class Project Suggestions
(in powerpoint)
Lecture 13
(archived video)
Part 1 - complete Class Project Suggestions from last time
Part 2 - begin Graph Partitioning
(in powerpoint)
Lecture 14 - complete Graph Partitioning (continuing with the slides
from last lecture, updated slightly)
(archived video)
Lecture 15 - Automatic Performance Tuning and
Sparse-Matrix-Vector-Multiplication (SpMV)
(in powerpoint)
or
(archived video)
Lecture 16 - Performance Analysis Tools
(in powerpoint)
or
(archived video)
(guest lecture by Karl Fuerlinger)
There is no class on Monday, Mar 30, because many
students will be attending
HotPar,
a conference on parallel computing being held in Berkeley.
Lecture 17 - Autotuning Memory Intensive Kernels for Multicore
(in powerpoint)
or
(archived video)
(guest lecture by Sam Williams)
Lecture 18 - Sparse direct methods for solving Ax=b
on high performance computers
(in powerpoint)
or
(archived video)
(guest lecture by Xiaoye Sherry Li)
Lecture 19 - Architecting Parallel Software with Patterns
(in powerpoint)
or
(archived video)
(guest lecture by Kurt Keutzer)
Lecture 20 - Structured Grids
(in powerpoint)
or
(archived video)
(guest lecture by Horst Simon)
Lecture 21 - FFT (Fast Fourier Transform)
(in powerpoint)
Future Trends in High Performance Computing 2009-2018
(in powerpoint);
(archived video)
(continuation of guest lecture by Horst Simon)
Lecture 22 - Hierarchical Methods for the N-Body problem
(in powerpoint)
or
(archived video)
Lecture 23 - Introduction to MapReduce and Hadoop (Cloud Computing)
(in powerpoint)
or
(archived video)
(guest lecture by Matei Zaharia)
Lecture 24 - Dynamic Load Balancing
(in powerpoint), and
Parallel Sorting
(in powerpoint);
both in
(archived video)
Lecture 25 - Parallel Methods for Nano/Material Science
(in powerpoint)
or
(archived video)
(guest lecture by Andrew Canning)
Lecture 26 - Parallel Graph Algorithms
(in powerpoint)
or
(archived video)
(guest lecture by Kamesh Madduri)
Lecture 27 - Music and Audio Applications
(in pdf)
or
(archived video)
(guest lecture by David Wessel)
Lecture 28 -
Student Poster Session
(archived video of student presentations)
Sharks and Fish
"Sharks and Fish" is a collection of simplified simulation programs
that illustrate a number of common parallel programming techniques
in various programming languages (some current ones, and some
old ones no longer in use).
Basic problem description, and (partial) code from the 1999 class,
written in Matlab, CMMD, CMF, Split-C, Sun Threads, and pSather,
available
here.
Code (partial) from the 2004 class, written in MPI, pthreads, and OpenMP,
available
here.