Optimizing Parallel Programs with Explicit Synchronization

Arvind Krishnamurthy and Katherine Yelick

Abstract. We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, based on work by Shasha and Snir, checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from post-wait synchronization, barriers, and locks. We demonstrate the use of this analysis by optimizing remote access on distributed memory machines. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data re-use. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer.
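As a minimal illustrative sketch (not taken from the paper, and written in plain C with pthreads rather than the paper's Split-C), the two-variable handshake below shows the kind of interfering-access cycle the analysis must detect before code motion is allowed. The names data, flag, producer, and consumer are hypothetical.

/* Shared-memory handshake: a producer publishes a value, then raises a
 * flag; a consumer spins on the flag, then reads the value.  volatile
 * stands in for the ordering guarantees the compiler analysis would
 * have to preserve. */
#include <pthread.h>
#include <stdio.h>

volatile int data = 0;   /* shared payload, initially 0 */
volatile int flag = 0;   /* shared ready flag, initially 0 */

void *producer(void *arg) {
    data = 1;            /* (a) write the payload ...          */
    flag = 1;            /* (b) ... then signal that it exists */
    return NULL;
}

void *consumer(void *arg) {
    while (flag == 0)    /* (c) wait for the signal            */
        ;
    printf("data = %d\n", data);  /* (d) must observe data == 1 */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Reordering (a) and (b) on one processor, or (c) and (d) on the other, would let the consumer observe flag == 1 while data is still 0. In Shasha-and-Snir terms, the program edges (a)->(b) and (c)->(d) together with the conflict edges between (b),(c) and (a),(d) form a cycle, so the analysis must keep both pairs in order; accesses proven cycle-free, by contrast, are candidates for the paper's optimizations, such as being issued as multiple outstanding (pipelined) remote operations.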