CS267: Feb 13, 1996
Programming with pSather

Sather is an object-oriented language designed to be simple, efficient, safe, and non-proprietary. Sather was developed at the International Computer Science Institute, a research institute affiliated with the computer science department of the University of California at Berkeley. It was first introduced in 1991. Since then, considerable practical experience has been obtained with the language by the hundreds of users making up the Sather community. Sather offers many safety and convenience features to help programmers avoid common errors and reuse code.

Sather has garbage collection, statically-checked strong (contravariant) typing, multiple inheritance, separate implementation and type inheritance, parameterized classes, dynamic dispatch, iteration abstraction, higher-order routines and iters, exception handling, assertions, preconditions, postconditions, and class invariants.

pSather, the parallel and distributed extension of Sather, presents a shared memory abstraction to the programmer while allowing explicit placement of data and threads. pSather adds threads and synchronization mechanisms to the language. Even though pSather programs can run on distributed computer systems, they offer a shared memory abstraction across all threads.

Serial Sather has been ported to almost every Unix platform, as well as the Macintosh and PCs. The Sather and pSather compilers are now integrated and distributed jointly. The current version of the pSather compiler is known to run on

  • Sun SMPs running SunOS and Solaris.
  • Clusters of Sun SMPs and single processing machines connected by Myrinet
  • Clusters of SMPs and single processing machines connected by Ethernet
  • Meiko CS-2
  • There have been ports of older versions of pSather to the CM-5.

    We begin by explaining the serial Sather features needed for the following discussion of pSather and of the Sharks & Fish problem 1.


    Important Concepts

    This section briefly introduces some concepts important to Sather that the reader may not have been exposed to in C++ or other popular OO languages. It isn't meant as a complete language tutorial. More information of a tutorial nature, manuals, class browsers, and numerous examples are available from the WWW page:
    http://www.icsi.berkeley.edu/~sather

    Safety

    Sather is designed to shield programmers from common sources of bugs. Two important language features are strong typing and garbage collection.

    Sather programs are strongly typed, so variables can't point to memory of an incorrect type. For example, in C it is common practice to freely mix different data types, such as signed and unsigned integers, characters, and even pointers. This can lead to subtle bugs and prohibits many compiler optimizations, because very little is known at compile time about the behavior of pointers (recall assignment 1 and how much trouble you had to go through to make sure you knew what code the compiler generated).

    Like many object-oriented languages, serial Sather is garbage collected, so programmers never have to free memory explicitly. The runtime system does so automatically when it can be proven safe. With explicit deallocation this work is done by the programmer, and it can lead to "dangling pointers" and "memory leaks". These problems are severe enough that a small industry has arisen to provide tools that help C and C++ programmers find such bugs (e.g., Purify).

    Work is under way to complete a garbage collector for pSather. However, for Assignment 2 you will need to free memory explicitly, or structure the code to avoid generating garbage.

    When checking options have been turned on in a Sather program by compiler flags (-check all turns on all checking options), the resulting program cannot crash disastrously or mysteriously. All sources of errors that cause crashes are either eliminated at compile time or funneled into narrow circumstances (such as accessing beyond array bounds) that are found at run time precisely at the source of the error.


    Separation of Subtyping and Code Inclusion

    In many object-oriented languages, the term `inheritance' is used to mean two things simultaneously:
  • subtyping - a guarantee that a class provides implementations for the abstract methods of its supertype, so that its objects may be used wherever the supertype is expected.
  • code inheritance - a mechanism that allows a class to reuse a portion of the implementation of another class.

    Sather provides separate mechanisms for these two concepts. Abstract classes represent interfaces: sets of signatures that subtypes of the abstract class must provide. The name of an abstract class must begin with a ``$'', as in $FISH. Other kinds of classes provide implementation. Classes may include implementations from other classes using a special `include' clause; this does not affect the subtyping relationship between the classes. Separating these two concepts simplifies the language considerably and makes code easier to understand.

    Subtyping

    Abstract class:

    type $FISH is
       pos: VEC;    -- fish position
       vel: VEC;    -- fish velocity
       mass: FLT;   -- fish mass
    end;

    Concrete class:

    class SHARK < $FISH is
       attr pos: VEC;    -- fish position
       attr vel: VEC;    -- fish velocity
       attr mass: FLT;   -- fish mass
       attr number_of_teeth: INT;
    end;

    Code Inclusion

    Concrete class:

    class PREDATOR < $FISH is
       attr pos: VEC;    -- fish position
       attr vel: VEC;    -- fish velocity
       attr mass: FLT;   -- fish mass
       attr number_of_teeth: INT;
    end;

    Concrete class:

    class SHARK is
       include PREDATOR;  -- include code
       attr size: FLT;
    end;

    Issues surrounding the decision to explicitly separate subtyping and code inclusion in Sather are discussed in the ICSI technical report TR 93-064, ``Engineering a Programming Language: The Type and Class System of Sather.''


    No Implicit Calls

    Sather does as little as possible behind the user's back at runtime. There are no implicitly constructed temporary objects, and therefore no rules to learn or circumvent. This extends to class constructors: all calls that can construct an object are explicitly written by the programmer.

    In Sather, constructors are ordinary routines distinguished only by a convenient but optional calling syntax. With garbage collection there is no need for destructors; however, explicit finalization is available when desired (SYS::destroy).

    Sather never converts types implicitly, such as from integer to character, integer to floating point, single to double precision, or subclass to superclass. With neither implicit construction nor conversion, Sather resolves routine overloading (choosing one of several similarly named operations based on argument types) much more clearly than C++. The programmer can easily deduce which routine will be called.

    "#" is syntactic sugar for function "create". ``::'' is used as a shorthand, when the type of the lefthand side could be infered from that of the righthand side: i::=2 is equivalent to i:INT:=2;

    Object Creation

    Create feature (constructor):

    class FISH is
       attr pos: VEC;   -- fish position
       attr vel: VEC;   -- fish velocity
       attr mass: FLT;  -- fish mass

       create: SAME is
          res ::= new;   -- ``new'' allocates a new object
          res.pos := #(2);
          res.vel := #(2);
          res.pos[0] := 0.0;
          res.pos[1] := 0.0;
          res.vel[0] := 0.0;
          res.vel[1] := 0.0;
          res.mass := 1.0;
          return res;
       end;
    end;

    Object creation:

    class MAIN is
       main is
          fish: FISH;     -- variable of type FISH
          fish := #FISH;  -- create a new object
       end;
    end;


    Iteration Abstraction

    Earlier versions of Sather used a conventional until...end statement much like other languages. This made Sather susceptible to the bugs that afflict looping constructs in most languages. Code that controls loop iteration is a notorious source of tricky ``fencepost errors'' (incorrect initialization or termination). Traditional iteration constructs also require the internal implementation details of data structures to be exposed when iterating over their elements.

    An important language improvement in Sather 1.0 over earlier versions was the addition of iterators (or just iters); please check out our Iterator Page. Iterators encapsulate user-defined looping control structures just as routines do for algorithms. Code using iterators is safe because the creation, increment, and termination check are bound together at one point. Each class may define many sorts of iters, whereas a traditional approach requires a different yet intimately coupled class for each kind of iteration over the major class.

    Iterators are part of the class interface just like routines. Instead of a return statement, they use yield and quit, and they may only be called in loops. When an iter yields, it returns control to the calling loop. When it is called in the next iteration of the loop, execution resumes in the iterator at the statement following the yield. When an iter quits, it terminates the loop in which it appears.

    Iterator names must end with a `!', which textually points out all places where a loop may exit. The Sather loop construct is simply loop...end. Built-in iters until!, while!, and break! offer traditional control constructs. The standard libraries define many other useful iters in many classes, such as upto! (generate successive numbers), elt! (yield the elements of a container), and set! (store elements into a container). Such iterators make it convenient and safe to traverse complicated data structures by isolating the details of the iteration from the client of the data structure abstraction.

    Iterators are critical for operating on collections of items. Matrices define iters to yield rows and columns; tree classes have recursive iters to traverse the nodes in pre-order, in-order and post-order; graph classes have iters to traverse vertices or edges breadth-first and depth-first. Other container classes such as hash tables, queues, etc. all provide iters to yield and insert elements. Arbitrary iterators may be used together in loops with other code.
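
    As an illustration of defining and using an iter, here is a minimal sketch (the class INTERVAL and its iter elt! are made up for this example; they are not taken from the standard library):

    class INTERVAL is
       attr low: INT;
       attr high: INT;

       create(l, h: INT): SAME is
          res ::= new;
          res.low := l;
          res.high := h;
          return res;
       end;

       elt!: INT is
          -- yield each integer in [low, high], then quit
          i ::= low;
          loop
             if i > high then quit; end;
             yield i;
             i := i + 1;
          end;
       end;
    end;

    -- usage: the loop terminates when elt! quits
    interval ::= #INTERVAL(1, 10);
    sum ::= 0;
    loop
       sum := sum + interval.elt!;
    end;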

    Most likely, you will be able to do Assignment 2 using only predefined iterators. If you are interested, examine our iterator tutorial for more information and exercises. Also see the TOPLAS paper "Iteration Abstraction in Sather" by Stephan Murer, Stephen Omohundro, David Stoutamire, and Clemens Szyperski.

    Iterator usage example.

    Iter example

    move_my_fish(my_fish: FLIST{FISH}, current: CURRENT, dt: FLT) is
       loop
          my_fish.elt!.compute_pos(current, dt);
       end;
    end;

    my_fish.elt! produces the elements of my_fish (which is a list of fish) one by one; a new position is computed for each produced fish.


    pSather

    pSather is the parallel extension to Sather. A major goal has been the easy reuse of serial code in parallel applications. pSather adds support for threads, synchronization, communication and placement of objects and threads.

    Because of volume production, commercial workstations today offer better potential price/performance for general code than massively parallel processors. They also fit comfortably in the capital budget of most research grants. For these reasons, networks of workstations considered as a single parallel computing facility will become an economically important platform.

    Networks of workstations have longer latencies than centralized machines. In order to achieve high performance, it is important to organize data so that it does not need to be moved between workstations in a way that stalls waiting computations. Some compiler optimizations can alleviate this, but generally the programmer must design a layout intimately integrated with the specific algorithm.

    Machines do not have to have large latencies for data placement to be important. Because processor speeds are outpacing memory speeds, attention to locality can have a profound effect on the performance of even ordinary serial programs (recall assignment 1). Existing serial languages can make life difficult for the performance-minded programmer because they do not allow much leeway in expressing placement. For example, extensions allowing the programmer to describe array layout as block-cyclic are helpful for matrix-oriented code but of no use for general data structures.

    Some environments expose latencies to the programmer with a distributed memory model and explicit communication (split-phase). However, it is easier to program with a shared memory space, which uses one name to refer to each datum no matter where the reference occurs. High performance still requires explicit human-directed placement. pSather tries to provide the best of both worlds: the compiler implements the shared memory abstraction using the most efficient facilities available on the target platform, while allowing the programmer to provide placement directives for control and data (without requiring them).

    The memory performance model of pSather has two levels. The basic unit of location in pSather is the cluster . It is assumed that reading or writing memory on the same cluster is significantly faster than on a remote cluster. A cluster corresponds to an efficient group in the memory hierarchy, and may have more than one processor. For example, on a network of workstations a cluster would correspond to one workstation, although that workstation may have multiple processors sharing a common bus. This model is appropriate for any machine for which local cached access is significantly faster than general access.

    In most languages there is no way to distinguish the locality of data that is referenced. This is convenient for the programmer but ignores the realities of modern machines, which penalize poorly placed data in the form of cache misses, TLB misses, or even paging to disk. This is especially important for distributed machines, where data may reside on other nodes. pSather allows the programmer to help the compiler and runtime by providing explicit placement. If threads or objects are left unfixed, the compiler and runtime can attempt to place them somewhere suitable.

    Threads

    In serial Sather there is only one thread of execution; in pSather there may be many. Multiple threads are similar to multiple serial Sather programs executing concurrently, but threads share variables of a single namespace.

    A new thread is created by executing a fork, which may be a par or fork statement, a parloop statement, or an attach. The new thread is a child of the forking thread. pSather provides operations that can block a thread, making it unable to execute statements until some condition occurs. pSather threads that are not blocked will eventually run, but there is no other constraint on the order of execution of statements between threads that are not blocked. Threads no longer exist once they terminate. When a pSather program begins execution it has a single thread, corresponding to the main routine.

    Fork Statement Example

    dt: FLT := 0.01;
    par
       loop i ::= 0.upto!(nthreads-1);
          fork
             loop t ::= 0.upto!(tsteps);
                move_my_fish(my_fish, dt);
             end;
          end;
       end;
    end;

    A fork statement must be syntactically enclosed in a par statement. Statements in the fork body are executed in a separate thread. Variables declared outside the par (such as dt) are shared among all threads. Each thread has its own copy of the locals declared in the par body (such as t).

    Parloop Statement Example

    parloop S1 do S2 end

    is syntactic sugar for

    par loop S1 fork S2 end end end

    dt: FLT := 0.01;
    parloop
       i ::= 0.upto!(nthreads-1);
    do
       loop t ::= 0.upto!(tsteps);
          move_my_fish(my_fish, dt);
       end;
    end;

    i::=0.upto!(nthreads-1) is evaluated serially. The code bracketed by do and end is executed in a separate thread. Variables declared outside the parloop (such as dt) are shared among all threads. Each thread has its own copy of the locals declared in the parloop body (such as t).

    Attach Statement Example

    g: GATE;
    loop i ::= 0.upto!(nthreads-1);
       g :- working_thread(tsteps, dt);
    end;

    The left-hand side of an attach statement must be of type $ATTACH (gates, for example). If the lhs is of type GATE{T}, the return type of the rhs must be T. If the gate is locked by another thread, the executing thread is suspended until the gate becomes unlocked. The new thread is attached to the lhs and receives its own copy of every local variable; changes to the locals by the originating thread are not observed by the new thread. When the rhs terminates, it detaches itself from the lhs and enqueues the return value, if any.

    Synchronization between tasks often coincides with communication. Many parallel languages and libraries provide distinct primitives for communicating and synchronizing. Because these so frequently go hand in hand, pSather provides a powerful construct that does both at once. A pSather GATE is a queue with implicit synchronization. It generalizes many constructs found in other languages, such as fork/join, barriers, semaphores, futures, condition variables, and mailboxes.

    Gate Features

    Signature                     Description                                                     Exclusive

    create:SAME                   Make a new unlocked GATE{T} object with an empty queue          N/A
                                  and no attached threads.

    size:INT                      Return the number of elements in the queue.                     No
                                  [GATE: return the counter.]

    has_thread:BOOL               Return true if there is a thread attached to the gate.          No

    set(T) [GATE::set]            Replace the head of the queue with the argument, or insert      Yes
                                  it into the queue if the queue is empty. [GATE: if the
                                  counter is zero, set it to one.]

    get:T [GATE::get]             Return the head of the queue without removing it; block         Yes
                                  until the queue is not empty. [GATE: block until the
                                  counter is nonzero.]

    enqueue(T) [GATE::enqueue]    Insert the argument at the tail of the queue.                   Yes
                                  [GATE: increment the counter.]

    dequeue:T [GATE::dequeue]     Block until the queue is not empty, then remove and return      Yes
                                  the head of the queue. [GATE: block until the counter is
                                  nonzero, then decrement it.]
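
    As a small illustration of using a gate as a future (a sketch only; the routine compute_force and its argument are hypothetical, not part of the actual solution):

    future: GATE{VEC};               -- typed gate: the queue holds VEC values
    future :- compute_force(pos);    -- attach: the call runs in a new thread
    -- ... the parent thread can do other work here ...
    f ::= future.get;                -- blocks until the new thread terminates and enqueues its result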

    Locks

    Locks control the blocking and unblocking of threads. `GATE', `GATE{T}', the various `MUTEX'es, and read/write locks are special synchronization objects that provide mutual exclusion. A thread acquires a lock and then holds it until it releases it. A single thread may acquire a lock multiple times recursively; the lock is held until a corresponding number of releases occur. Exclusive locks such as `MUTEX' may be held by only one thread at a time. In addition to these simple exclusive locks, it is possible to lock on other, more complex conditions.

    Lock Statement Example

    mutex1: MUTEX;  -- global to threads
    parloop
       i ::= 0.upto!(nthreads-1);
    do
       -- code executed by each thread
       lock mutex1 then
          global_max_acc := global_max_acc.max(max_acc);
          global_max_speed := global_max_speed.max(max_speed);
       end;
    end;

    Locks are acquired with the lock statement. The types of all expressions following `when' must be subtypes of $LOCK (MUTEX is a subtype of $LOCK). The statement list following the `then' is called the lock branch. A lock statement guarantees that all listed locks are atomically acquired before a lock branch executes. If a lock cannot be acquired (for example, because some other thread holds it), the thread is suspended.


    pSather Solution for Sharks & Fish 1.

    A complete solution to problem 1 is available here. A solution to problem 2 can be obtained here.

    There are three main classes: MAIN, FISH, and CURRENT. Class MAIN contains the top-level parallel loop for the simulation. Class FISH encapsulates the state of a single fish. It contains various fish attributes, such as position, mass, and velocity. It also defines a function that updates these attributes given a force acting on the fish.

    The important feature of class CURRENT is that it returns the force of the current at a given position in space.

    After the definitions of various class attributes and local variables in main, we see the calculation of the total number of threads and of the number of fish managed by each thread. cluster_size is a built-in expression that returns the number of processors on the calling cluster. For example, on the ICSI machines that you will use for assignment 2, it should return 4, since each SMP has 4 processors. clusters is another built-in expression that returns the number of clusters in the network. Since you will be using a single stand-alone multiprocessor, it should return 1. Fish are distributed equally among all threads.
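
    A minimal sketch of this calculation (the variable names nthreads, nfish, and fish_per_thread are illustrative, not necessarily those used in the actual solution):

    nthreads ::= clusters * cluster_size;   -- one thread per processor
    fish_per_thread ::= nfish / nthreads;   -- fish are distributed equally among the threads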

    A single parloop is sufficient to specify the parallelism of the simulation. Each thread maintains its own CURRENT object (although this is not necessary); an alternative would be to share a single global CURRENT object. Since the state of CURRENT does not change over the course of the simulation and the object itself occupies very little space, it was decided to replicate it across the threads to save some communication cost.

    Each thread manages a list of its local fish, declared as
    my_fish:FLIST{FISH}
    This means that my_fish is a variable of type FLIST{FISH} (a list whose elements are objects of type FISH).

    Each thread iterates over a specified number of time steps. For each time step, the threads first compute new positions for their respective lists of fish (by calling move_my_fish(my_fish, current, dt)). This happens without any communication or synchronization, since the lists are completely disjoint and there is no interaction among the fish. Then each thread computes various quantities needed to determine the size of the next time step: maximum velocity, maximum acceleration, etc. For simplicity (and also because the number of threads is very limited), the following strategy was used to find the global maxima of fish velocity and acceleration.

    Each thread compares its maximum values with the currently known global maxima and replaces them if necessary. To avoid race conditions when multiple threads try to update the same variables, such as global_max_acc and global_max_speed, a mutual exclusion lock is used to protect the critical section.
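
    Putting these pieces together, the body of each thread looks roughly like the following sketch (assembled from the fragments above; the routines compute_max_acc and compute_max_speed are illustrative placeholders, not necessarily the names used in the actual solution):

    parloop
       i ::= 0.upto!(nthreads-1);
    do
       -- my_fish (this thread's local list of fish) is set up here; details omitted
       current ::= #CURRENT;                       -- per-thread replica of the current
       loop t ::= 0.upto!(tsteps);
          move_my_fish(my_fish, current, dt);      -- no communication needed
          max_acc ::= compute_max_acc(my_fish);    -- local maxima (placeholder routines)
          max_speed ::= compute_max_speed(my_fish);
          lock mutex1 then                         -- protect the shared globals
             global_max_acc := global_max_acc.max(max_acc);
             global_max_speed := global_max_speed.max(max_speed);
          end;
       end;
    end;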

    Another mutual exclusion lock is used to serialize the drawing of fish by different threads. This is necessary primarily because the Xlib functions are not thread-safe.