Optimizing Parallel SPMD Programs

Arvind Krishnamurthy and Katherine Yelick

We present compiler optimization techniques for explicitly parallel programs that communicate through a shared address space. The source programs are written in a single program multiple data (SPMD) style, and the target machine is a multiprocessor with physically distributed memory and hardware or software support for a single address space. Unlike sequential or data-parallel programs, SPMD programs require {\it cycle detection}, as defined by Shasha and Snir, before any code motion can be performed on shared variable accesses. Cycle detection finds those accesses that, if reordered by either the hardware or the software, could violate sequential consistency. We improve on Shasha and Snir's algorithm for cycle detection by giving a polynomial-time algorithm for SPMD programs, whereas their formulation leads to an algorithm that is exponential in the number of processors. Once cycles and local dependencies have been computed, we perform optimizations that overlap communication and computation, change two-way communication into one-way communication, and apply scalar code optimizations. Using these optimizations, we improve the execution times of certain application kernels by about 20-50%.
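To see why reordering shared accesses can violate sequential consistency, consider the classic producer/consumer handshake. The sketch below is not from the paper; it is a minimal C/pthreads analogue in which the four shared accesses form exactly the kind of cycle, in Shasha and Snir's sense, that cycle detection must find: P1:W(data) -> P1:W(flag) -> P2:R(flag) -> P2:R(data) -> P1:W(data). The {\it volatile} qualifiers stand in for the compiler-level ordering constraint the analysis would impose; without some such constraint, the compiler or hardware is free to reorder the two writes (or the two reads), allowing the consumer to observe flag set but read the stale value of data.

\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

/* Hypothetical example (not from the paper): the four shared accesses
 * form a Shasha-Snir cycle, so neither pair may be reordered.
 * volatile stands in for the ordering cycle detection would enforce;
 * a production implementation would use fences or atomics. */
volatile int data = 0;
volatile int flag = 0;

void *producer(void *arg) {
    data = 42;          /* W(data): must stay before W(flag)        */
    flag = 1;           /* W(flag): signals that data is ready      */
    return NULL;
}

void *consumer(void *arg) {
    while (flag == 0)   /* R(flag): spin until the producer signals */
        ;
    printf("data = %d\n", data);  /* R(data): if either pair were
                                     reordered, this could print 0  */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
\end{verbatim}

Under sequential consistency, the consumer must print 42; the point of cycle detection is to identify precisely these accesses, so that all others remain eligible for reordering and optimization.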
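The overlap optimization converts a blocking remote access into a split-phase one: the access is initiated as early as the detected cycles and local dependencies allow, independent computation proceeds while the data is in flight, and the program blocks only at the first use of the value. The paper's setting is a shared address space, but the same transformation can be sketched with MPI's standard nonblocking primitives; the ranks, tag, and workload below are illustrative only.

\begin{verbatim}
#include <mpi.h>
#include <stdio.h>

/* Illustrative sketch (MPI is an assumption, not the paper's target):
 * rank 0 initiates a remote read early, overlaps it with independent
 * computation, and waits only at the first use of the value. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double remote = 0.0;
    MPI_Request req;

    if (rank == 0) {
        /* Phase 1: initiate the access as early as the dependence
         * and cycle information allows.                            */
        MPI_Irecv(&remote, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);

        /* Independent computation overlaps the communication.      */
        double acc = 0.0;
        for (int i = 0; i < 1000000; i++)
            acc += i * 1e-9;

        /* Phase 2: block only at the first use of the value.       */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("remote = %f, acc = %f\n", remote, acc);
    } else if (rank == 1) {
        double local = 3.14;
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
\end{verbatim}

Run with at least two processes (e.g., mpirun -np 2). The related one-way optimization replaces a request/reply round trip with a single write that needs no reply, which this sketch does not attempt to show.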