Active Messages:
                       Foundation for Parallel Software
                                    and
                      Framework for Hardware Development

Parallel systems research is entering an exciting phase because the
traditional boundaries between architecture, compiler, and operating
system are shifting to meet the new demands of parallel computing.  At
the same time, the underlying technology is forcing a convergence in
machine structure toward systems based on state-of-the-art
microprocessors with sizable local memory and a fast network
interface.  Active Messages is an extremely simple communication
primitive that maps efficiently to existing and emerging hardware and
is the basis from which Shared Memory, Message Passing, and Data
Parallel models are built.  Previous realizations of these models have
utilized restricted forms of Active Messages in hardware or in the
operating system.  Exposing the primitive at the user level allows
compilers to synthesize a variety of specific communication and global
access operations.  This opens the door to a wider inquiry into the new
boundaries and provides a framework for analyzing various levels of hardware
support.  At Berkeley, the CASTLE project is developing
``industrial-strength'' parallel software in a layered fashion, starting
with Active Messages as a foundation.
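
To make the primitive concrete, the sketch below illustrates the essential
idea in plain C (the function and type names are illustrative, not any
particular library interface): each message carries the address of a
user-level handler together with a few words of arguments, and the handler
executes on arrival to integrate the data into the ongoing computation.
Delivery is simulated here by calling the handler directly.

    #include <stdio.h>

    /* A minimal sketch of the Active Messages idea.  All names are
     * hypothetical; a real system would inject packets into a network. */

    typedef void (*am_handler_t)(int src_node, int arg0, int arg1);

    typedef struct {
        int          src_node;   /* sending node */
        am_handler_t handler;    /* user-level handler run on arrival */
        int          arg[2];     /* small fixed payload */
    } am_packet_t;

    /* Hypothetical send: build a packet naming the handler, then "deliver"
     * it by invoking the handler directly to simulate arrival. */
    static void am_request(int dest_node, am_handler_t handler, int a0, int a1)
    {
        am_packet_t pkt = { 0, handler, { a0, a1 } };
        (void)dest_node;
        pkt.handler(pkt.src_node, pkt.arg[0], pkt.arg[1]);
    }

    /* Example handler: deposit a remotely written value and count its
     * arrival -- the kind of primitive from which put/get or message
     * passing layers can be built. */
    static int remote_cell, acks;
    static void put_handler(int src_node, int value, int unused)
    {
        (void)src_node; (void)unused;
        remote_cell = value;
        acks++;
    }

    int main(void)
    {
        am_request(1, put_handler, 42, 0);
        printf("remote_cell = %d after %d message(s)\n", remote_cell, acks);
        return 0;
    }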


Tuesday:

		Split-C: a Practical Parallel Language
		   for Distributed Memory Machines

The traditional distinctions between Shared Memory, Message Passing,
and SIMD are disappearing as technology forces a convergence in the
underlying machine structure and universal communication primitives
are developed.  However, current parallel languages still force a
single model on the programmer and frustrate efforts to tune a program
toward a high performance implementation.  Split-C seeks to abstract
the emerging common hardware structure while retaining its fundamental
performance characteristics, much as C does for modern uniprocessors.
Split-C provides the programmer with an explicit global address space,
while retaining the ability to tune code that operates on the region of the
address space local to each processor.  It provides support for linked and
regular global data
structures and a rich set of global access operations to allow
communication to be optimized.  Within this framework, shared memory,
message passing, and data parallel programming can be freely mixed as
alternative programming styles.  We will look at several examples that
illustrate aspects of the language and the performance tuning process.
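
As a rough illustration of the split-phase style of global access described
above (plain C with hypothetical names, not Split-C syntax), the sketch below
represents a global address as a (processor, local address) pair, issues a
get without waiting for it, and synchronizes later, before the result is
used, so that communication can be overlapped with computation.

    #include <stdio.h>
    #include <string.h>

    /* A "global pointer" names a (processor, local address) pair. */
    typedef struct {
        int   proc;     /* owning processor */
        void *addr;     /* address within that processor's local region */
    } global_ptr;

    static int pending;   /* outstanding split-phase operations */

    /* Initiate a get of one double from a global address into a local slot.
     * A real implementation would post a network request (for instance via
     * an Active Message); here the copy completes immediately. */
    static void get_double(double *local_dst, global_ptr src)
    {
        pending++;
        memcpy(local_dst, src.addr, sizeof(double));
        pending--;                 /* "completion" arrives */
    }

    /* Wait until all previously issued split-phase operations complete. */
    static void sync_all(void)
    {
        while (pending > 0)
            ;                      /* real code would poll the network */
    }

    int main(void)
    {
        double remote_value = 3.14;            /* stand-in for remote data */
        global_ptr gp = { 1, &remote_value };
        double x;

        get_double(&x, gp);   /* issue the access ...                  */
        /* ... unrelated local work could proceed here ...             */
        sync_all();           /* ... then wait before using the result */

        printf("x = %g\n", x);
        return 0;
    }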

Wednesday:

                What Have We Learned from Dataflow:
			The TAM Perspective

This talk reflects on the lessons that we have learned from a decade
of research in dataflow.  Starting with Iannucci's ``Two Fundamental
Issues in Multiprocessing'' argument, we observe two key problems with
dataflow.  First, the justification of extensive multithreading is
based on an overly simplistic view of the storage hierarchy.  Second,
the local greedy scheduling policy embodied in dataflow is inadequate
in many circumstances.  A more realistic model of the storage
hierarchy imposes significant constraints on the scheduling of
computation and requires a degree of parsimony in the scheduling
policy.  These issues are addressed in TAM by establishing a
scheduling hierarchy that reflects the underlying storage hierarchy.
Recent data collected from the CM-5 implementation of TAM
demonstrates that the compiler can utilize the storage hierarchy
effectively and reduce the cost of scheduling.  However, we see that
even with these techniques, simple local scheduling policies are
unlikely to be adequate.  These lessons apply broadly to parallel
execution models with dynamic scheduling of work.
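
The sketch below (ordinary C, meant only to illustrate the idea rather than
the TAM implementation itself) shows the flavor of such a scheduling
hierarchy: enabled threads are grouped under their activation frame, and a
scheduled frame runs all of its enabled threads as a single quantum, so that
frame state can remain in the fastest level of the storage hierarchy for the
duration of the quantum.

    #include <stdio.h>

    #define MAX_THREADS 8
    #define MAX_FRAMES  4

    typedef void (*thread_fn)(int frame_id);

    /* First level: an activation frame holds its own enabled threads. */
    typedef struct {
        int       id;
        thread_fn enabled[MAX_THREADS];
        int       n_enabled;
    } frame_t;

    /* Second level: frames that currently have enabled work. */
    static frame_t *ready_frames[MAX_FRAMES];
    static int      n_ready;

    static void enable_thread(frame_t *f, thread_fn t)
    {
        if (f->n_enabled == 0)               /* first enabled thread makes */
            ready_frames[n_ready++] = f;     /* the frame itself ready     */
        f->enabled[f->n_enabled++] = t;
    }

    /* Run one quantum: drain all enabled threads of a frame back to back,
     * keeping that frame's state hot while it runs. */
    static void run_quantum(frame_t *f)
    {
        while (f->n_enabled > 0)
            f->enabled[--f->n_enabled](f->id);
    }

    static void scheduler(void)
    {
        while (n_ready > 0)
            run_quantum(ready_frames[--n_ready]);
    }

    /* Two toy threads belonging to an activation. */
    static void thread_a(int id) { printf("frame %d: thread a\n", id); }
    static void thread_b(int id) { printf("frame %d: thread b\n", id); }

    int main(void)
    {
        frame_t f0 = { 0, {0}, 0 }, f1 = { 1, {0}, 0 };
        enable_thread(&f0, thread_a);
        enable_thread(&f0, thread_b);
        enable_thread(&f1, thread_a);
        scheduler();
        return 0;
    }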