Protocol Design Tradeoffs in Snooping Cache Coherent Multiprocessors

2/10/99




Table of Contents

Protocol Design Tradeoffs in Snooping Cache Coherent Multiprocessors

Recap

Sequential Consistency

Outline for Today

Write-back Caches

MSI Invalidate Protocol

Example: Write-Back Protocol

Correctness

Write Serialization for Coherence

Sequential Consistency

Sufficient conditions

Lower-level Protocol Choices

MESI (4-state) Invalidation Protocol

Hardware Support for MESI

MESI State Transition Diagram

Lower-level Protocol Choices

Update Protocols

Dragon Write-back Update Protocol

Dragon State Transition Diagram

Lower-level Protocol Choices

Assessing Protocol Tradeoffs

Workload-Driven Evaluation

Evaluation in Uniprocessors

More Difficult for Multiprocessors

A Lot Depends on Sizes

Scaling: Why Worry?

Too Large a Problem

Demonstrating Scaling Problems

Communication and Replication

Working Set Perspective

Working Set Curves (for given P)

Working Sets Change with P

Where the Time Goes: NPB LU-a

False Sharing Misses: Artifactual Comm.

Questions in Scaling

Under What Constraints to Scale?

Problem Constrained Scaling

Time Constrained Scaling

Memory Constrained Scaling

Types of Workloads

Coverage: Stressing Features

Coverage: Levels of Optimization

Concurrency

Workload/Benchmark Suites

Evaluating a Fixed-size Machine

Steps in Choosing Problem Sizes

Steps in Choosing Problem Sizes

Choosing Problem Sizes (contd.)

Evaluating an Idea or Tradeoff

Multiprocessor Simulation

Execution-driven Simulation

Difficulties in Simulation-based Evaluation

Choosing Parameters

Returning to protocol tradeoffs

Bandwidth per transition

Bandwidth Trade-off

Smaller (64KB) Caches

Cache Block Size

Miss Classification

Breakdown of Miss Rates with Block Size

Breakdown (cont)

Breakdown with 64KB Caches

Traffic

Traffic with 64 KB caches

Traffic SimOS 1 MB

Making Large Blocks More Effective

Update versus Invalidate

Update vs Invalidate: Miss Rates

Upgrade and Update Rates (Traffic)

Summary

Author: David E. Culler

Home Page: http://www.cs.berkeley.edu/~culler/cs258-s99/

Other information:
David E. Culler, UC Berkeley, CS258 Parallel Computer Architecture, Lecture 5