Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson,
David Kranz, John Kubiatowicz, Beng-Hong Lim, Ken Mackenzie, and
Donald Yeung. The MIT Alewife Machine. Proceedings of the IEEE,
vol. 87, no. 3, IEEE, March 1999, pp. 430-444.
Abstract:
A variety of models for parallel architectures, such as shared memory,
message passing, and data flow, have converged in the recent past to a
hybrid architecture form called distributed shared memory (DSM). Alewife,
an early prototype of such DSM architectures, uses hybrid software and
hardware mechanisms to support coherent shared memory, efficient user-level
messaging, fine-grain synchronization, and latency tolerance. Alewife supports
up to 512 processing nodes connected over a scalable and cost-effective
mesh network at a constant cost per node. Four mechanisms combine to achieve
Alewife's goals of scalability and programmability: software-extended coherent
shared memory provides a global, linear address space; integrated message
passing allows compiler and operating system designers to provide efficient
communication and synchronization; support for fine-grain computation allows
many processors to cooperate on small problem sizes; and latency tolerance
mechanisms, including block multithreading and prefetching, mask unavoidable
delays due to communication. Extensive results from microbenchmarks, together
with over a dozen complete applications running on a 32-node prototype,
demonstrate that integrating message passing with shared memory enables
a cost-efficient solution to the cache coherence problem and provides a
rich set of programming primitives. Our results further show that messaging
and shared memory operations are both important because each helps the
programmer to achieve the best performance for various machine configurations.