Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Ken Mackenzie, and Donald Yeung. The MIT Alewife Machine, Proceedings of the IEEE, vol.87, (no.3), IEEE, March 1999. p.430-44.


A variety of models for parallel architectures, such as shared memory, message passing, and data flow, have converged in the recent past to a hybrid architecture form called distributed shared memory (DSM). Alewife, an early prototype of such DSM architectures, uses hybrid software and hardware mechanisms to support coherent shared memory, efficient user level messaging, fine grain synchronization, and latency tolerance. Alewife supports up to 512 processing nodes connected over a scalable and cost effective mesh network at a constant cost per node. Four mechanisms combine to achieve Alewife's goals of scalability and programmability: software extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms-including block multithreading and prefetching-mask unavoidable delays due to communication. Extensive results from microbenchmarks, together with over a dozen complete applications running on a 32-node prototype, demonstrate that integrating message passing with shared memory enables a cost efficient solution to the cache coherence problem and provides a rich set of programming primitives. Our results further show that messaging and shared memory operations are both important because each helps the programmer to achieve the best performance for various machine configurations.