# U.C. Berkeley CS267/EngC233

## Tentative Syllabus

This syllabus may be modified during the semester, depending on feedback from students and the availability of guest lecturers. Topics that we have covered before are shown in standard font, and possible new topics are in italics.

• Computer Architectures (at a high level, in order to understand what can and cannot be done in parallel, and the relative costs of operations like arithmetic, moving data, etc.).
• Sequential computers, including memory hierarchies
• Shared memory computers and Multicore
• Distributed memory computers
• GPUs (Graphics Processing Units, e.g., NVIDIA cards)
• Cloud and Grid Computing
• Programming Languages and Models for these architectures
• OpenMP
• Message Passing (MPI)
• UPC and/or Titanium
• Communication Collectives (reduce, broadcast, etc.)
• CUDA/OpenCL etc. (for GPUs)
• Programming "Patterns" from software engineering
• Cilk
• Sources of parallelism and locality in simulation
• The "7 dwarfs" of high performance computing: The following categories of computations have been identified as appearing frequently in many different CSE problems. For each category, we will discuss its structure and usage, algorithms, measuring and tuning its performance (automatically when possible), and available software tools and libraries.
• Dense linear algebra (matrix multiply, solving linear systems of equations, etc.)
• Sparse linear algebra (similar to the dense case, but where the matrices have mostly zero entries and the algorithms neither store nor operate on these entries).
• Structured Grids (where the data is organized to lie on a "grid", e.g., a 2-dimensional mesh, and the basic operations are the same at each mesh point, e.g., "average the value at each mesh point with its neighbors").
• Unstructured Grids (similar to the above, but where "neighbor" can be defined by an arbitrary graph)
• Spectral Methods (the FFT, or Fast Fourier Transform, is typical).
• Particle Methods (where many "particles", e.g., atoms, planets, people, ..., are updated (e.g., moved) depending on the values of some or all other particles, e.g., by electrostatic forces, gravity, etc.).
• "Embarrassing parallelism", where every task is completely independent, but may finish at a different time or require different resources.
• The additional "6 motifs" of parallel computing: By examining a broad array of nonscientific applications that require higher performance via parallelism, not only did the above "7 dwarfs" appear, but so did 6 other computational patterns that we may cover too (see here for details):
• Finite State Machines, where the "state" is updated using rules based on the current state and most recent input
• Combinational Logic, performing logical operations (Boolean Algebra) on large amounts of data
• Graph traversal, traversing a large graph and performing operations on the nodes
• "Graphical models", which involve special graphs representing random variables and probabilities, and are used in machine learning techniques
• Dynamic Programming, an algorithmic technique for combining solutions of small subproblems into solutions of larger problems
• Branch-and-Bound search, a divide-and-conquer technique for searching extremely large search spaces, like those arising in games like chess
• Measuring performance and finding bottlenecks
• Load balancing techniques, both dynamic and static
• Parallel Sorting
• Correctness
• Verification and Validation (V&V) of the results (how to convince yourself and others to believe the result of a large computation; important not only with parallelism)
• Automatic code derivation (sketching)
• Proofs and testing of code
• Assorted possible guest lectures (some repeats, some new; depend on availability of lecturers)
• Performance Measuring Tools
• Volunteer Computing (e.g., how SETI@home etc. work)
• Climate Modeling
• Computational Nanoscience
• Computational Astrophysics
• Computational Biology
• Musical performance and delivery (ParLab application)
• Image Processing (ParLab application)
• Speech Recognition (ParLab application)
• Modeling Circulatory System of Stroke Victims (ParLab application)
• Parallel Web Browsers (ParLab application)
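
To make the Structured Grids item above more concrete, here is a minimal Python sketch of the operation it quotes: "average the value at each mesh point with its neighbors". It is an illustration under stated assumptions, not course material. The function name `average_with_neighbors` is hypothetical, "neighbors" is assumed to mean the four nearest points on a 2-dimensional mesh, and boundary points are assumed to stay fixed.

```python
def average_with_neighbors(grid):
    """Return a new grid where each interior point is the mean of itself
    and its four nearest neighbors (up, down, left, right).

    Assumption for this sketch: boundary points are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    # Copy the grid so that every update reads only the old values.
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (grid[i][j]
                         + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new


if __name__ == "__main__":
    # A 4x4 mesh that is zero everywhere except one interior point.
    grid = [[0.0] * 4 for _ in range(4)]
    grid[1][1] = 5.0
    for row in average_with_neighbors(grid):
        print(row)
```

Because each new value depends only on old values, the updates at different mesh points are independent of one another. In principle the mesh can therefore be split among processors, with each processor needing only the old values along the edges of its neighbors' pieces.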