CS267: Lecture 2, Jan 21 1999

Memory Hierarchies

Abstract

We study the structure and performance properties of memory hierarchies, in particular caches, on uniprocessors. We then show how to optimize matrix multiplication to take caches into account.
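As a rough illustration of what "taking caches into account" can mean for matrix multiplication, here is a minimal sketch of blocking (tiling), one common technique. It is not taken from the lecture notes. The function names `matmul_naive` and `matmul_blocked`, the row-major layout, and the block-size parameter `bs` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Naive triple loop: C += A * B for n-by-n row-major matrices.
 * For large n, each pass over a column of B touches memory with
 * stride n, so little of B stays in cache between uses. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked (tiled) version: the same arithmetic, organized so that each
 * innermost triple loop works on bs-by-bs tiles of A, B, and C.
 * The intent is that three tiles fit in cache together; a suitable bs
 * depends on the machine and is an assumption left to the caller. */
void matmul_blocked(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t jj = 0; jj < n; jj += bs)
            for (size_t kk = 0; kk < n; kk += bs) {
                /* Clamp tile edges so n need not be a multiple of bs. */
                size_t i_end = min_size(ii + bs, n);
                size_t j_end = min_size(jj + bs, n);
                size_t k_end = min_size(kk + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t j = jj; j < j_end; j++) {
                        double sum = C[i * n + j];
                        for (size_t k = kk; k < k_end; k++)
                            sum += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = sum;
                    }
            }
}
```

Both functions accumulate into C, so the caller should zero C first to obtain a plain product. Tuned libraries such as those listed under the readings go considerably further (register blocking, machine-specific parameter search), which this sketch does not attempt.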

Lecture Notes

PowerPoint version
PDF version (2 slides/page)

Primary Readings

  • Notes from Lecture 2 of CS267 (part 1), Spring 1996

Secondary Readings and References

  • "Empirical Evaluation of the Cray T3D - A Compiler Perspective" (explains memory performance plot on page 9 of Lecture Notes)
  • PHIPAC
  • PHIPAC overview
  • New PHIPAC Homepage
  • Old PHIPAC Homepage
  • ATLAS Homepage
  • Notes from Lecture 2 of CS267 (part 2), Spring 1996. (Part 2 discusses the details of the IBM RS/6000 architecture. We will not be using the RS/6000 this semester.)
  • BLAS (Basic Linear Algebra Subprograms): reference (unoptimized) implementations of the BLAS, with documentation and historical papers
  • LAPACK (Linear Algebra PACKage), a standard linear algebra library optimized to use the BLAS effectively on uniprocessors and shared memory machines (software, documentation and reports)
  • ScaLAPACK (Scalable LAPACK), a parallel version of LAPACK for distributed memory machines (software, documentation and reports)

Assignments

(to be posted)