Performance Portable Language and Algorithm Design meets the T3D Computer Science Division University of California, Berkeley In 1992 a wave of new MPP systems were arose that followed the ``shell'' approach, including the Thinking Machines CM-5, Intel Paragon, Meiko CS-2, and Cray T3D. In this approach the core of each node is realized by a state-of-the-art commercial microprocessor and its memory system, surrounded by a shell of additional logic to support communication and synchronization. Based on the announced designs, we developed a simple parallel extension to the C language, called Split-C, with the goal of extracting the full performance capability out of this wave of machines. This provides a full C on each node operating out of the local memory, augmented with a rich set of assignment operations on the collective global address space. As the announcements were followed by delivery of the machines, we have conducted the experiment of implementing the language on the machine and assessing its performance. The T3D provides a very interesting case study because the shell is so elaborate, including support global-memory access, prefetch, atomic operations, barriers, and block transfers. The semantics of hardware primitives for global operation are at essentially the same level as the language primitives. Many distinct mechanisms exist to perform the same function, and the performance characteristics of the various mechanisms are not obvious. This talk has two parts. The first part presents our language implementation approach, which begins by establishing the performance characteristics of the machine empirically via microbenchmarks and then tries to minimize the additional cost in mapping the language to the hardware. The second part is application case study using a "hybrid algorithm" motivated by the local/global cost model of Split-C, which yields the current worlds fastest solution connected components to the problem arising in cluster methods.