CS 258 Course Materials

Readings and Lecture Slides

Fundamentals and Introduction

Chapter 1 : Fundamentals. Reading for lectures 1,2,3.
Lecture 1 : Why Parallel Architecture. 1/18/95
Lecture 2 and 3 : Evolution of Parallel Machines. 1/23/95 and 1/25/95

Parallel Software Basics

Chapter 2A: Parallel Software Basics, part A. Reading for lectures 4,5.
Lecture 4 : Parallel Software Basics. 2/1/95
Lecture 5 : Programming for Performance. 2/3/95

Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Read for lecture 6.
12 Ways to Fool the Masses When Giving Performance Results on Parallel Computers, D. H. Bailey.
Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors
Lecture 6a : Towards Workload-Driven Architectural Evaluation: Scaling
Lecture 6b: Scaling Applications and Machines. (Deferred to 2/10/95).

NAS Parallel Benchmark Results
Methodological Considerations and Characterization of the SPLASH-2 Parallel Applications Suite
Architectural Requirements of Parallel Scientific Applications with Explicit Communication, Cypher, Ho, Konstantinidou, and Messina, ISCA 93. To be handed out in class.
ParkBench Public International Benchmarks for Parallel Computers, Sections 1-5. (Full 5 MB version).
Lecture 7a: Reflections on Published Results (2/10/95)
Lecture 7b : Picking Parameters and Analyzing Sensitivity
Lecture 8 : Choosing Metrics and Presenting Results (2/14/95).

Small-Scale Shared Memory

Chapter 3 : Small Scale Shared-Memory. Reading for Lectures 9, 10
Lecture 9 : Small-Scale Shared Memory (2/17/95).
Lecture 10 : Small-Scale Shared Memory Design Tradeoffs.
Lecture 11 : Small-Scale Shared Memory Implementation.
Lecture 12 : Small-Scale Shared Memory Implementation (cont).

Large-Scale Distributed-Memory Multiprocessors

Chapter 4A : Large Scale Distributed Memory Multiprocessors, Part A. Reading for Lecture 13
Lecture 13 : Realizing Programming Models on Large-Scale Distributed-Memory Multiprocessors.
Lecture 14 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 1.
Active Messages: a Mechanism for Integrated Communication and Computation, ISCA92
Lecture 15 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 2.
Intel Paragon
Experience with Active Messages on the Meiko CS-2, Schauser and Scheiman, IPPS 95
Lecture 16 : Design of Large-Scale Shared Physical Address Space

Large-Scale Shared Address Space Multiprocessors

Chapter 5A : Large Scale Shared Address Space Multiprocessors, Part A. Reading for Lecture 17
Lecture 17 : Memory Consistency Models
Lecture 18: TNET
Lecture 19 : Large-scale CC Designs
Lecture 20: Project Discussion
Lecture 21 : Case Studies: Large Scale CC-NUMA Machines

Latency Tolerance

Lecture 22: Latency Tolerance

Scalable Interconnection Networks

Lecture 23 : Design Space of Interconnection Networks
Lecture 24 : Routing


Lecture 25 : Synchronization
"Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," Mellor-Crummey and Scott, ACM TOCS, v. 9, no. 1, Feb. 1991, pp. 21-65.
"Synchronization Algorithms for Shared Memory Multiprocessors," Graunke and Thakkar, IEEE Computer, v. 23, no. 6, Jun. 1990.
Reactive Synchronization Algorithms for Multiprocessors, Beng-Hong Lim and Anant Agarwal, Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 25-35, October 1994.


Handout 1 : Course Information
Handout 2 : Assignment 1, due 2/3.


Final Project Schedule and Presentations

MINI PROJECTS (in order of submission, but not necessarily completion)

Other places to go look on the net.

NAS Applied Research
PARKBENCH (PARallel Kernels and BENCHmarks)
David Walker's Benchmarks hop-off.
Stanford FLASH Project, including the Wisconsin Wind Tunnel
MIT Computer Architecture Group Home Page
MIT Computation Structures Group
WWW Computer Architecture Home Page

Large Scale Parallel Computers