Transportable Parallel Algorithms: from Theory to Practice
				   
			   David E. Culler
			     CS Division
			     UC Berkeley
				   
   Workshop on Portability and Performance for Parallel Processors
			   July 13-15, 1993

Parallel architectures have long seemed on the verge of breakthrough
in any one of many diverse directions: SIMD, Message Passing, Shared
Memory, Dataflow, etc.  This made the future very difficult to
predict, divided the research community along lines of belief,
frustrated attempts to develop a common theoretical model, and
essentially paralyzed the development parallel software.  However,
technological and economic forces have quelled the debate; for the
foreseeable future all large parallel machines will look essentially
like a network of workstations, with a fast microprocessor, large DRAM
memory, and sophisticated cache structure on each node.  The key
difference is that the network is considerably more power and robust.
Furthermore, Active Messages has demonstrated that all programming
models are built on the same primitive; the key difference is how it
is packaged: in hardware, system, or user level.  Thus, parallel
machines differ primarily in the costs of various communication
operations, not in basic functionality.  Given this convergence it is
reasonable to formulate an abstract parallel machine characterized by
its performance characteristics along a small number of dimensions to
serve as a basis for transportable algorithm design.  I will present
one such model, called LogP, and a programming extension to C, called
Split-C, which exposes the full capabilities of the model.  We will
examine the translation from theory to practice on the CM-5.