Termination of a Single Task Queue





Here we consider the termination of one task queue.

Termination detection is invoked when all processors are likely to be idle. An inexpensive way to compute this hint is to count the processors that have become idle at least once since the last invocation of the termination detection protocol. Each processor owns a termination request token, which it passes to the manager processor of the task queue when it becomes idle. The manager invokes the termination detection protocol once it has received all tokens. If termination detection fails, the tokens are sent back to the processors and the process repeats.
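The token-counting hint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name TokenCollector, its methods, and the use of integer processor ids are hypothetical and do not come from the original system.

```python
class TokenCollector:
    """Manager-side bookkeeping for termination request tokens (hypothetical names)."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.received = set()  # ids of processors whose token the manager holds

    def receive_token(self, processor_id):
        """Record a token from an idle processor; return True when all tokens are in."""
        self.received.add(processor_id)
        return len(self.received) == self.num_processors

    def return_tokens(self):
        """After a failed detection, hand every token back and start over."""
        returned = sorted(self.received)
        self.received.clear()
        return returned


collector = TokenCollector(num_processors=3)
# Processors 0, 2 and 1 become idle in that order; only the last token
# tells the manager that it is worth invoking the detection protocol.
ready_after = [collector.receive_token(p) for p in (0, 2, 1)]
```

Holding all tokens is only a hint: a processor may have received new work after giving up its token, which is why the protocol itself must still be run.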

As stated in Section 2, termination detection requires checking the task queue, the network, and the processors' work space. In general, it is difficult to obtain a consistent view of a distributed system without taking a snapshot. Instead of blocking all processors when performing termination detection, we collect global information incrementally and then check for consistency.

A sufficient condition for termination, referred to below as the termination condition, is that all processors are clear of work and the total number of dequeues equals the total number of enqueues. The former checks the processors' work space, and the latter ensures that the local partitions and the network are empty.
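As a rough illustration, the condition can be written as a predicate over one report per processor. The Report type and the function name are assumptions made for this sketch, not names from the original system.

```python
from typing import NamedTuple


class Report(NamedTuple):
    """Per-processor state (hypothetical record type)."""
    clear: bool     # no task in the local queue or the work space
    enqueues: int   # tasks this processor has enqueued so far
    dequeues: int   # tasks this processor has dequeued so far


def termination_condition(reports):
    """All processors are clear and total dequeues equal total enqueues."""
    all_clear = all(r.clear for r in reports)
    total_enqueues = sum(r.enqueues for r in reports)
    total_dequeues = sum(r.dequeues for r in reports)
    return all_clear and total_enqueues == total_dequeues


# Every processor is clear, but one enqueued task was never dequeued
# (for example, it is still in the network), so the counts disagree.
in_flight = [Report(True, 3, 2), Report(True, 1, 1)]
# Same system once that task has been dequeued and completed.
finished = [Report(True, 3, 3), Report(True, 1, 1)]
```

Here termination_condition(in_flight) is false even though every processor is clear, which shows why the count comparison is needed in addition to the clear flags.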

Instead of computing the termination condition directly, we compute an approximate condition asynchronously, based on possibly stale processor information. However, we make sure that the approximation is safe: whenever the approximate condition is true, the termination condition is also true.

The procedure for computing the approximate condition is as follows. First, the manager sends a message to each processor requesting information. Upon seeing the message, each processor returns a triple (clear, enqueues, dequeues), where clear is true if and only if the processor is out of tasks (in both the local queue and the work space). The processor must also disable task migration. The manager combines all triples received from the processors and evaluates the approximate condition on them. If the condition is satisfied, the manager sends a message to each processor to commit termination; otherwise, it sends messages to abort termination detection and resume task migration.
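The probe, combine, and commit-or-abort steps can be sketched as a single-process simulation. Everything named here (Processor, probe, abort, detect_termination) is hypothetical, and the sketch assumes that the manager's formula has the same shape as the sufficient condition stated earlier, applied to the collected triples. Messages are modeled as direct method calls, so the asynchrony of the real protocol is not captured.

```python
from collections import deque


class Processor:
    """A processor with a local queue partition (hypothetical model)."""

    def __init__(self):
        self.local_queue = deque()
        self.busy = False              # True while a task occupies the work space
        self.enqueues = 0
        self.dequeues = 0
        self.migration_enabled = True

    def enqueue(self, task):
        self.local_queue.append(task)
        self.enqueues += 1

    def dequeue(self):
        self.dequeues += 1
        return self.local_queue.popleft()

    def probe(self):
        """Answer the manager's request and disable task migration."""
        self.migration_enabled = False
        clear = not self.local_queue and not self.busy
        return clear, self.enqueues, self.dequeues

    def abort(self):
        """Detection failed: resume task migration."""
        self.migration_enabled = True


def detect_termination(processors):
    """One round of the manager's procedure; returns True on commit."""
    triples = [p.probe() for p in processors]
    all_clear = all(clear for clear, _, _ in triples)
    total_enqueues = sum(enq for _, enq, _ in triples)
    total_dequeues = sum(deq for _, _, deq in triples)
    if all_clear and total_enqueues == total_dequeues:
        return True                    # commit termination
    for p in processors:
        p.abort()                      # abort and resume task migration
    return False


procs = [Processor(), Processor()]
procs[0].enqueue("task-a")
first_round = detect_termination(procs)   # task-a is still queued, so abort
procs[0].dequeue()                        # task-a is taken and (here) finishes at once
second_round = detect_termination(procs)  # counts match and all are clear
```

In this run the first round aborts and re-enables migration, while the second round commits and leaves migration disabled on every processor.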

It remains to prove that the procedure yields a safe approximation to the termination condition. Suppose that the termination condition is false when the approximate condition is true. Then there must be an idle processor that receives a task after it responds to the manager's request. The task must have been sent by a processor before that processor was probed, while task migration was still allowed. As a result, the task is missed by the dequeue count. Since the dequeue count must equal the enqueue count for the approximate condition to be true, there must also be a task that is missed by the enqueue count. This, however, implies that the task was enqueued at a processor after it was probed and subsequently migrated to another processor not yet probed, which contradicts the protocol, because a processor disables task migration when it is probed. The safety of the approximate condition is thus established.

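To complete the correctness proof of the approximate condition, we must show that it eventually becomes true after the termination condition is true, that is, that it detects the termination condition. Note that once the termination condition becomes true, it stays true. Clearly, the approximate condition must be true if the termination detection protocol is invoked after the termination condition becomes true. Since the termination condition being true sets up the condition for invoking the termination detection protocol, the approximate condition will eventually become true.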






Boris Vaysman
Fri Mar 22 13:38:23 PST 1996