Transportable Parallel Algorithms: from Theory to Practice David E. Culler CS Division UC Berkeley Workshop on Portability and Performance for Parallel Processors July 13-15, 1993 Parallel architectures have long seemed on the verge of breakthrough in any one of many diverse directions: SIMD, Message Passing, Shared Memory, Dataflow, etc. This made the future very difficult to predict, divided the research community along lines of belief, frustrated attempts to develop a common theoretical model, and essentially paralyzed the development parallel software. However, technological and economic forces have quelled the debate; for the foreseeable future all large parallel machines will look essentially like a network of workstations, with a fast microprocessor, large DRAM memory, and sophisticated cache structure on each node. The key difference is that the network is considerably more power and robust. Furthermore, Active Messages has demonstrated that all programming models are built on the same primitive; the key difference is how it is packaged: in hardware, system, or user level. Thus, parallel machines differ primarily in the costs of various communication operations, not in basic functionality. Given this convergence it is reasonable to formulate an abstract parallel machine characterized by its performance characteristics along a small number of dimensions to serve as a basis for transportable algorithm design. I will present one such model, called LogP, and a programming extension to C, called Split-C, which exposes the full capabilities of the model. We will examine the translation from theory to practice on the CM-5.