U.C. Berkeley CS267/EngC233 Home Page

Applications of Parallel Computers

Spring 2009

M W 9-10:30, 290 Hearst Mining Building


  • Instructors:

  • Jim Demmel
  • Offices:
    564 Soda Hall ("Virginia"), (510)643-5386
    831 Evans Hall, same phone number
  • Office Hours: M 1-2, T 11-12, in 564 Soda Hall
  • (send email)
  • Horst Simon
  • Office Hours: by appointment
  • (send email)
  • TA:

  • Vasily Volkov
  • Office: 447 Soda Hall, (510)642-3979
  • Office Hours: M 3-4pm
  • (send email)
  • Administrative Assistants:

  • Laura Rebusi / Tammy Johnson
  • Offices: 563 / 565 Soda Hall
  • Phones: (510)643-1455 / (510)643-4816
  • Email: (to Laura) / (to Tammy)
  • Link to webcasting of lectures (Active during lectures only; see below under "Lecture Notes" for archived video).

    Syllabus and Motivation

    CS267 was originally designed to teach students how to program parallel computers to efficiently solve challenging problems in science and engineering, where very fast computers are required either to perform complex simulations or to analyze enormous datasets. CS267 is intended to be useful for students from many departments and with different backgrounds, although we will assume reasonable programming skills in a conventional (non-parallel) language, as well as enough mathematical skills to understand the problems and algorithmic solutions presented. CS267 satisfies part of the course requirements for a new Designated Emphasis ("graduate minor") in Computational Science and Engineering.

    While this general outline remains, a large change in the computing world has started in the last few years: not only are the fastest computers parallel, but nearly all computers will soon be parallel, because the physics of semiconductor manufacturing will no longer let conventional sequential processors get faster year after year, as they have for so long (roughly doubling in speed every 18 months for many years). So all programs that need to run faster will have to become parallel programs. (It is considered very unlikely that compilers will be able to automatically find enough parallelism in most sequential programs to solve this problem.) For background on this trend toward parallelism, click here.

    This will be a huge change not just for science and engineering but for the entire computing industry, which has depended on selling new computers that run users' programs faster without the users having to reprogram them. Large research activities to address this issue are underway at many computer companies and universities, including Berkeley's ParLab, whose research agenda is outlined here.

    While the ultimate solutions to the parallel programming problem are far from determined, students in CS267 will get the skills to use some of the best existing parallel programming tools, and be exposed to a number of open research questions.

  • Tentative Detailed Syllabus
  • Grading

    There will be several programming assignments to acquaint students with the basic issues in memory locality and parallelism that matter for high performance. Most of the grade will be based on a final project (in which students are encouraged to work in small interdisciplinary teams), which could involve parallelizing an interesting application, or developing or evaluating a novel parallel computing tool. Students are expected to have identified a likely project by mid-semester, so that they can begin working on it. We will provide many suggestions of possible projects as the class proceeds.

    Pointers to possible class projects

    Announcements


  • (May 2) Final projects will be due at noon on Monday, May 18.
  • (May 1; updated 5pm) The lecture on Monday, May 4, will be held in 290 Hearst as usual. Please ignore previous announcements of a room change.
  • (Apr 28) On the last class of the semester, May 11, we will have a poster session for class projects. Each student will also give a ~2 minute overview of their project and poster. We ask students to send us URLs for their posters so that we can post them on this page; this will permit students at all the campuses to participate in the poster session.
  • (Apr 3) A webpage defining commonly used acronyms is now available.
  • (Apr 3) On Monday April 6 class will be held in 540 A/B Cory.
  • (Mar 29) There will be no class on Monday, Mar 30, since many students will be attending HotPar, a conference on parallel computing being held in Berkeley.
  • (Jan 28) For students who want to try some on-line self-paced courses to improve basic programming skills, click here. You can use this material without having to register. In particular, courses like CS 9C (for programming in C) might be useful.
  • (Jan 26) Homework Assignment 0 has been posted here, due Feb 2.
  • (Jan 22) Please fill out the following class survey.
  • (Jan 18) This course satisfies part of the course requirements for a new Designated Emphasis ("graduate minor") in Computational Science and Engineering.
  • (Jan 18) This course will have students attending from all four CITRIS campuses: UC Berkeley, UC Davis, UC Merced and UC Santa Cruz. CITRIS is generously providing the webcasting facilities and other resources to help run the course. Lectures will be webcast here (active during lectures only).
  • Newsgroup

    Class Resources and Homework Assignments

  • This will include, among other things, class handouts, homework assignments, the class roster, information about class accounts, pointers to documentation for machines and software tools we will use, reports and books on supercomputing, pointers to old CS267 class webpages (including old class projects), and pointers to other useful websites.
  • Lecture Notes

  • Notes from previous offerings of CS267 are posted on old class webpages available under Class Resources.
  • In particular, the web page from the 1996 offering has detailed, textbook-style notes available on-line that are still largely up-to-date in their presentations of parallel algorithms (the slides to be posted during this semester will contain some more recently invented algorithms as well).

  • Lecture slides (PowerPoint) and archived video from Spring 2009
  • Lecture 1 - Introduction (in powerpoint) or (archived video)
  • Description of CSE program (in powerpoint), discussed briefly at end of Lecture 1
  • Lecture 2 - Single Processor Machines: Memory Hierarchies and Processor Features; Case Study: Tuning Matrix Multiply (in powerpoint) or (archived video)
  • Lecture 3 (archived video)
  • Completion of last lecture (updated powerpoint)
  • Introduction to Parallel Machines and Programming Models (in powerpoint)
  • Top 500 list from Nov 2008 (in powerpoint)
  • Lecture 4 - Shared Memory Programming: OpenMP and Threads (in powerpoint) or (archived video)
  • Lecture 5 - Distributed Memory Machines and Programming (archived video)
  • Architectures and Performance Models, (in powerpoint)
  • MPI Programming, (in pdf) or (in powerpoint)
  • Lecture 6 - Sources of Parallelism and Locality in Simulation - Part 1 (in powerpoint) or (archived video)
  • Lecture 7 (archived video)
  • Sources of Parallelism and Locality in Simulation - Part 2 (in powerpoint)
  • Tricks with Trees (in powerpoint)
  • Notes (and hints) on Homework 1 (in powerpoint)
  • Lecture 8 - Introduction to CUDA and GPUs (in powerpoint) or (archived video) (guest lecture by Bryan Catanzaro)
  • Lecture 9 - UPC (Unified Parallel C) (in powerpoint) or (in pdf) or (archived video) (guest lecture by Kathy Yelick)
  • Lecture 10 - Dense Linear Algebra - Part 1 (in powerpoint) or (archived video)
  • Lecture 11 - Dense Linear Algebra - Part 2 (in powerpoint) or (archived video)
  • Lecture 12 (archived video)
  • Part 1 - Floating Point Arithmetic - Impact on Algorithms and Parallelism (in powerpoint)
  • Part 2 - Class Project Suggestions (in powerpoint)
  • Lecture 13 (archived video)
  • Part 1 - complete Class Project Suggestions from last time
  • Part 2 - begin Graph Partitioning (in powerpoint)
  • Lecture 14 - complete Graph Partitioning (continuing using slides from last lecture, updated slightly) (archived video)
  • Lecture 15 - Automatic Performance Tuning and Sparse-Matrix-Vector-Multiplication (SpMV) (in powerpoint) or (archived video)
  • Lecture 16 - Performance Analysis Tools (in powerpoint) or (archived video) (guest lecture by Karl Fuerlinger)
  • There is no class on Monday, Mar 30, because many students will be attending HotPar, a conference on parallel computing being held in Berkeley.
  • Lecture 17 - Autotuning Memory Intensive Kernels for Multicore (in powerpoint) or (archived video) (guest lecture by Sam Williams)
  • Lecture 18 - Sparse direct methods for solving Ax=b on high performance computers (in powerpoint) or (archived video) (guest lecture by Xiaoye Sherry Li)
  • Lecture 19 - Architecting Parallel Software with Patterns (in powerpoint) or (archived video) (guest lecture by Kurt Keutzer)
  • Lecture 20 - Structured Grids (in powerpoint) or (archived video) (guest lecture by Horst Simon)
  • Lecture 21 - FFT (Fast Fourier Transform) (in powerpoint)
    Future Trends in High Performance Computing 2009-2018 (in powerpoint);
    (archived video) (continuation of guest lecture by Horst Simon)
  • Lecture 22 - Hierarchical Methods for the N-Body problem (in powerpoint) or (archived video)
  • Lecture 23 - Introduction to MapReduce and Hadoop (Cloud Computing) (in powerpoint) or (archived video) (guest lecture by Matei Zaharia)
  • Lecture 24 - Dynamic Load Balancing (in powerpoint), and Parallel Sorting (in powerpoint);
    both in (archived video)
  • Lecture 25 - Parallel Methods for Nano/Material Science (in powerpoint) or (archived video) (guest lecture by Andrew Canning)
  • Lecture 26 - Parallel Graph Algorithms (in powerpoint) or (archived video) (guest lecture by Kamesh Madduri)
  • Lecture 27 - Music and Audio Applications (in pdf) or (archived video) (guest lecture by David Wessel)
  • Lecture 28 - Student Poster Session (archived video of student presentations)
  • Sharks and Fish

  • "Sharks and Fish" is a collection of simplified simulation programs that illustrate a number of common parallel programming techniques in various programming languages (some current ones, and some old ones no longer in use).
  • Basic problem description, and (partial) code from 1999 class, written in Matlab, CMMD, CMF, Split-C, Sun Threads, and pSather, available here.
  • Code (partial) from 2004 class, written in MPI, pthreads, and OpenMP, available here.