Active Messages: Foundation for Parallel Software and Framework for Hardware Development

Parallel systems research is entering an exciting phase because the traditional boundaries between architecture, compiler, and operating system are shifting to meet the new demands of parallel computing. At the same time, the underlying technology is forcing a convergence in machine structure toward systems based on state-of-the-art microprocessors with sizable local memory and a fast network interface. Active Messages is an extremely simple communication primitive that maps efficiently onto existing and emerging hardware and is the basis from which Shared Memory, Message Passing, and Data Parallel models are built. Previous realizations of these models have used restricted forms of Active Messages in hardware or in the operating system. Exposing the primitive at the user level allows compilers to synthesize a variety of specific communication and global access operations. This opens the door to a wider inquiry into the new boundaries and provides a framework for analyzing various levels of hardware support. At Berkeley, the CASTLE project is developing ``industrial-strength'' parallel software in a layered fashion, starting with Active Messages as a foundation.

Tuesday: Split-C: a Practical Parallel Language for Distributed Memory Machines

The traditional distinctions between Shared Memory, Message Passing, and SIMD are disappearing as technology forces a convergence in the underlying machine structure and universal communication primitives are developed. However, current parallel languages still force a single model on the programmer and frustrate efforts to tune a program toward a high-performance implementation. Split-C seeks to abstract the emerging common hardware structure while retaining its fundamental performance characteristics, much as C does for modern uniprocessors. Split-C provides the programmer with an explicit global address space, while retaining the ability to tune code operating on the region local to a processor. It provides support for linked and regular global data structures and a rich set of global access operations that allow communication to be optimized. Within this framework, shared memory, message passing, and data parallel programming can be freely mixed as alternative programming styles. We will look at several examples that illustrate aspects of the language and the performance tuning process.

Wednesday: What Have We Learned from Dataflow: The TAM Perspective

This talk reflects on the lessons that we have learned from a decade of research in dataflow. Starting with Iannucci's ``Two Fundamental Issues in Multiprocessing'' argument, we observe two key problems with dataflow. First, the justification of extensive multithreading is based on an overly simplistic view of the storage hierarchy. Second, the local greedy scheduling policy embodied in dataflow is inadequate in many circumstances. A more realistic model of the storage hierarchy imposes significant constraints on the scheduling of computation and requires a degree of parsimony in the scheduling policy. These issues are addressed in TAM by establishing a scheduling hierarchy that reflects the underlying storage hierarchy. Recent data collected with the CM-5 implementation of TAM demonstrates that the compiler can utilize the storage hierarchy effectively and reduce the cost of scheduling. However, we see that even with these techniques, simple local scheduling policies are unlikely to be adequate.
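To make the scheduling-hierarchy idea concrete, the following C fragment is a minimal sketch (with illustrative names, not TAM's actual interface) of grouping enabled threads by activation frame, so that all threads enabled within one frame run consecutively as a quantum while the frame's locals are hot in the storage hierarchy:

    /* Sketch of a two-level scheduling hierarchy in the spirit of TAM:
     * the frame, not the thread, is the unit queued for the processor.
     * All names here are illustrative assumptions, not TAM's API. */
    #include <stddef.h>

    #define MAX_THREADS 64

    typedef void (*thread_fn)(void *frame_locals);

    typedef struct frame {
        thread_fn enabled[MAX_THREADS]; /* continuation vector for this frame */
        int n_enabled;
        void *locals;                   /* frame-local storage */
        struct frame *next_ready;       /* link in the ready-frame queue */
    } frame_t;

    static frame_t *ready_head = NULL;

    /* Enabling a thread posts it to its frame; the frame is queued only
     * when its first thread becomes enabled. */
    void enable_thread(frame_t *f, thread_fn t) {
        if (f->n_enabled == 0) {
            f->next_ready = ready_head;
            ready_head = f;
        }
        f->enabled[f->n_enabled++] = t;
    }

    /* Scheduler loop: drain one frame's continuation vector (a quantum)
     * before switching frames, so threads sharing a frame share locality. */
    void scheduler(void) {
        while (ready_head != NULL) {
            frame_t *f = ready_head;
            ready_head = f->next_ready;
            while (f->n_enabled > 0) {
                thread_fn t = f->enabled[--f->n_enabled];
                t(f->locals);           /* may enable further threads */
            }
        }
    }

In this sketch the parsimony argument shows up directly: switching frames is the expensive transition, so the scheduler amortizes it over every thread currently enabled in the frame.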
These lessons apply broadly to parallel execution models with dynamic scheduling of work.
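As a companion sketch of the primitive underlying all three talks, the following C fragment illustrates the Active Messages idea: each message carries a pointer to a user-level handler that executes on arrival, integrating the data directly into the ongoing computation. The am_send/am_poll names, the fixed two-word payload, and the one-slot mailbox standing in for the network interface are assumptions for illustration, not a real library API:

    /* Minimal sketch of the Active Messages primitive. */
    #include <stdint.h>

    typedef void (*am_handler_t)(int src_node, uint64_t arg0, uint64_t arg1);

    typedef struct {
        am_handler_t handler;   /* code to run at the destination */
        int          src_node;
        uint64_t     arg0, arg1;
    } am_packet_t;

    /* One-slot mailbox stands in for the network, keeping the sketch
     * self-contained on a single node. */
    static am_packet_t mailbox;
    static int mailbox_full = 0;

    /* Send: format the packet and hand it to the "network interface". */
    void am_send(int dest, am_handler_t h, uint64_t a0, uint64_t a1) {
        (void)dest;
        mailbox = (am_packet_t){ h, 0, a0, a1 };
        mailbox_full = 1;
    }

    /* Poll: on arrival, run the named handler immediately; handlers are
     * short and non-blocking, moving data rather than computing. */
    void am_poll(void) {
        if (mailbox_full) {
            mailbox_full = 0;
            mailbox.handler(mailbox.src_node, mailbox.arg0, mailbox.arg1);
        }
    }

    /* Example handler for a remote write: store the value at the address. */
    void write_handler(int src, uint64_t addr, uint64_t value) {
        (void)src;
        *(uint64_t *)(uintptr_t)addr = value;
    }

With this primitive, a remote store reduces to am_send(node, write_handler, (uint64_t)(uintptr_t)&x, 42), with the receiver invoking handlers from am_poll; the richer global access operations that a compiler synthesizes, such as split-phase reads, are built from the same send/handler pair.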