Here we consider the termination of one task queue.
Termination detection is invoked when all processors are likely to be idle. An inexpensive way to compute this hint is to count the number of processors that have become idle at least once since the last invocation of the termination detection protocol. Each processor owns a termination request token, which it passes to the manager processor of the task queue when it becomes idle. The manager invokes the termination detection protocol once all tokens have been received. If termination detection fails, the tokens are sent back to the processors and the process is repeated.
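The token bookkeeping on the manager side can be sketched as follows. This is only an illustration under our own assumptions: the class name `TokenManager`, its methods, and the use of integer processor ids are hypothetical and do not come from the text, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class TokenManager:
    """Manager-side token bookkeeping (hypothetical names, illustrative only)."""
    num_processors: int
    tokens: set = field(default_factory=set)

    def receive_token(self, pid: int) -> bool:
        """Record the token that processor `pid` sends when it becomes idle.

        Returns True once every processor's token has arrived, which is the
        hint that the termination detection protocol should be invoked.
        """
        self.tokens.add(pid)
        return len(self.tokens) == self.num_processors

    def return_tokens(self) -> list:
        """After a failed detection, give every held token back to its owner."""
        owners = sorted(self.tokens)
        self.tokens.clear()
        return owners
```

Because each processor owns exactly one token, the manager only needs to know which tokens it currently holds; a set keeps the check independent of the order in which processors become idle.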
As stated in Section 2, termination detection requires checking the task queue, the network, and the processors' work space. In general, it is difficult to obtain a consistent view of a distributed system without taking a snapshot. Instead of blocking all processors when performing termination detection, we collect global information incrementally and then check for consistency.
A sufficient condition for termination, referred to below as the termination condition, is that all processors are clear of work and the total number of dequeues equals the total number of enqueues. The former checks the processors' work space, and the latter ensures that the local partitions and the network are empty.
Instead of computing the termination condition directly, we compute an approximate condition asynchronously, based on possibly stale processor information. However, we make sure that the approximation is safe: whenever the approximate condition is true, the termination condition is also true.
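The two conditions and the safety requirement can be written compactly as below. The notation is ours and is not taken from the original text: $T$ stands for the termination condition, $\tilde{T}$ for the approximate condition, $P$ for the number of processors, and $c_i$, $e_i$, $d_i$ for the clear flag, enqueue count, and dequeue count of processor $i$. Tildes mark the possibly stale values seen by the manager. We assume that the approximate condition has the same form as the termination condition, evaluated on those stale values.

```latex
\begin{align*}
T &\;\equiv\; \Bigl(\bigwedge_{i=1}^{P} c_i\Bigr) \,\wedge\, \Bigl(\sum_{i=1}^{P} d_i = \sum_{i=1}^{P} e_i\Bigr), \\
\tilde{T} &\;\equiv\; \Bigl(\bigwedge_{i=1}^{P} \tilde{c}_i\Bigr) \,\wedge\, \Bigl(\sum_{i=1}^{P} \tilde{d}_i = \sum_{i=1}^{P} \tilde{e}_i\Bigr), \\
\text{safety:}\quad & \tilde{T} \;\Rightarrow\; T .
\end{align*}
```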
The procedure for computing the approximate condition is as follows. First, the manager sends a message to each processor to request information. Upon seeing the message, each processor returns a triple (clear, enqueues, dequeues), where clear is true if and only if the processor is out of tasks (both in the local queue and in the work space). The processor must also disable task migration. The manager combines all triples received from the processors and evaluates the approximate condition. If the condition is satisfied, the manager sends a message to each processor to commit termination; otherwise, it sends messages to abort termination detection and resume task migration.
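The probing procedure above can be sketched as a small sequential simulation. All names here (`Report`, `Processor`, `approximate_condition`, `detect_termination`) are hypothetical, and the sketch makes simplifying assumptions: messages are modeled as direct method calls, and the real protocol would collect the triples asynchronously over the network.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Report(NamedTuple):
    """Triple returned by a probed processor."""
    clear: bool      # no tasks in the local queue or the work space
    enqueues: int    # enqueues counted by this processor
    dequeues: int    # dequeues counted by this processor


@dataclass
class Processor:
    """Toy processor state (illustrative, not the authors' data structure)."""
    local_queue_len: int = 0
    busy: bool = False            # True while the work space holds a task
    enqueues: int = 0
    dequeues: int = 0
    migration_enabled: bool = True

    def probe(self) -> Report:
        """Answer the manager's request and disable task migration."""
        self.migration_enabled = False
        clear = self.local_queue_len == 0 and not self.busy
        return Report(clear, self.enqueues, self.dequeues)

    def resume(self) -> None:
        """Handle an abort message: allow task migration again."""
        self.migration_enabled = True


def approximate_condition(reports) -> bool:
    """All processors report clear, and total dequeues equal total enqueues."""
    reports = list(reports)
    all_clear = all(r.clear for r in reports)
    total_enq = sum(r.enqueues for r in reports)
    total_deq = sum(r.dequeues for r in reports)
    return all_clear and total_enq == total_deq


def detect_termination(processors) -> bool:
    """Probe every processor; commit on success, otherwise abort and resume."""
    reports = [p.probe() for p in processors]
    if approximate_condition(reports):
        return True               # the manager would now send commit messages
    for p in processors:
        p.resume()                # abort: migration is allowed again
    return False
```

Note that the counts are compared only as global totals: a processor may dequeue tasks that were enqueued elsewhere, so the per-processor counts need not match.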
It remains to prove that the procedure yields a safe approximation to the termination condition. Suppose that the termination condition is false when the approximate condition is true. Then there must be an idle processor that receives a task after it has responded to the manager's request. The task must have been sent by a processor before that processor was probed, when task migration was still allowed. As a result, the task is missed by the dequeue count. Since the dequeue count must equal the enqueue count for the approximate condition to be true, there must also be a task that is missed by the enqueue count. This, however, implies that the task was enqueued to a processor after that processor was probed, and was subsequently migrated to another processor not yet probed. This contradicts the protocol, since a probed processor has task migration disabled, and the safety of the approximate condition is thus established.
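To complete the correctness proof of the approximation, we must show that the approximate condition eventually becomes true after the termination condition is true, that is, the approximate condition detects the termination condition. Note that once the termination condition becomes true, it stays true. It is clear that the approximate condition must be true if the termination detection protocol is invoked after the termination condition becomes true. Since the fact that the termination condition is true sets up the condition for invoking the termination detection protocol, the approximate condition will eventually become true.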