Reference:

Donald Yeung, John Kubiatowicz, and Anant Agarwal. Multigrain Shared Memory. ACM Transactions on Computer Systems. Vol. 18, No.
2, pages 154-196. May 2000. (c) 2000 ACM
(pdf, compressed postscript)

Abstract:

Parallel workstations, each comprising tens of processors based on shared memory, promise cost-effective scalable multiprocessing. This article explores the coupling of such small- to medium-scale shared-memory multiprocessors through software over a local area network to synthesize larger shared-memory systems. We call these systems Distributed Shared-memory MultiProcessors (DSMPs).

This article introduces the design of a shared-memory system that uses multiple granularities of sharing, called MGS, and presents a prototype implementation of MGS on the MIT Alewife multiprocessor. Multigrain shared memory enables the collaboration of hardware and software shared memory, thus synthesizing a single transparent shared-memory address space across a cluster of multiprocessors. The system leverages the efficient support for fine-grain cache-line sharing within multiprocessor nodes as often as possible, and resorts to coarse-grain page-level sharing across nodes only when absolutely necessary.

Using our prototype implementation of MGS, an in-depth study of several shared-memory application is conducted to understand the behavior of DSMPs. Our study is the first to comprehensively explore the DSMP design space, and teh compare the performance of DSMPs against all-software and all-hardware DSMs on a signle experimental platform. Keeping the total number of processors fixed, we show that applications execute up to 85% faster on a DSMP as compared to an all-software DSM. We also show that all-hardware DSMs hold a significant performance advantage over DSMPs on challenging applications, between 159% and 1014%. However, program transformations to improve data locality for these applications allow DSMPs to almost match the performance of an all-hardware multiprocessor of the same size.