Introduction

For many applications, load balancing cannot be performed a priori. A typical example is the search problem, where tasks are created at run time based on the current state of the computation. In other applications, the number of tasks and their dependencies can be determined in advance, but the execution time of each task is unpredictable. Dynamic load balancing is needed to handle such irregularity.

A task queue is commonly used to implement dynamic load balancing. A task queue serves as a common repository for tasks from multiple processors. It is also responsible for scheduling tasks for execution when requested by the processors. The program starts with some initial tasks, which may generate more tasks when processed; it ends when the task queue is empty and all scheduled tasks have finished execution.
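The termination condition above (the queue is empty and no scheduled task is still running) can be illustrated with a minimal sketch. It uses a single shared queue, that is, the centralized variant, built on Python's standard `queue.Queue`. That class's `task_done()`/`join()` pair counts every task that has been enqueued but not yet finished, which matches this condition as long as a task enqueues its children before it is marked finished. The names `run_task_queue` and `expand`, the thread count, and the use of `None` as a shutdown sentinel are illustrative assumptions, not part of the system described in this paper.

```python
import queue
import threading


def run_task_queue(initial_tasks, process, num_workers=4):
    """Run tasks until the queue is empty and no scheduled task is running.

    process(task) returns an iterable of new tasks spawned by task.
    None is reserved as a shutdown sentinel and must not be used as a task.
    Returns every processed task, in completion order.
    """
    q = queue.Queue()
    finished = []
    finished_lock = threading.Lock()

    for task in initial_tasks:
        q.put(task)

    def worker():
        while True:
            task = q.get()
            if task is None:              # sentinel: this worker shuts down
                q.task_done()
                return
            try:
                for child in process(task):
                    q.put(child)          # enqueue children BEFORE task_done()
                with finished_lock:
                    finished.append(task)
            finally:
                q.task_done()             # this task has finished execution

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # join() returns once every put() has been matched by a task_done():
    # the queue is empty AND no scheduled task is still executing.
    q.join()
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()
    return finished


# Usage: a toy search that expands a complete binary tree of depth 3.
def expand(depth):
    return [depth + 1, depth + 1] if depth < 3 else []


processed = run_task_queue([0], expand)
print(len(processed))  # 15
```

Enqueuing the children before calling `task_done()` is the essential step: if the parent were marked finished first, the outstanding-task count could briefly drop to zero and `join()` would return while work remained.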

A naive implementation makes one processor responsible for all accesses to the task queue. Such a centralized task queue inevitably becomes the performance bottleneck when the number of processors is large. A better implementation distributes the accesses among different processors. A distributed task queue has two advantages. First, access latency is reduced when accesses can be served locally. Second, access bandwidth increases because multiple accesses can proceed simultaneously on different portions of the task queue.
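To make the idea of distributed accesses concrete, here is a minimal sketch in which the task queue is split into one portion per processor, each guarded by its own lock, with threads standing in for processors. The class `DistributedTaskQueue` and its methods are hypothetical; moving tasks between portions is deliberately left out, since that is the job of load balancing. (In CPython the threads do not execute truly in parallel; the point is only that no lock is shared between portions.)

```python
import collections
import threading


class DistributedTaskQueue:
    """Task queue split into one portion per processor (hypothetical sketch).

    Each portion has its own lock, so processors working on their own
    portions never wait for one another.
    """

    def __init__(self, num_procs):
        self.portions = [collections.deque() for _ in range(num_procs)]
        self.locks = [threading.Lock() for _ in range(num_procs)]

    def put(self, pid, task):
        # Local access: only processor pid's portion and lock are touched.
        with self.locks[pid]:
            self.portions[pid].append(task)

    def get(self, pid):
        # Local access; returns None when the local portion is empty.
        with self.locks[pid]:
            portion = self.portions[pid]
            return portion.popleft() if portion else None

    def local_size(self, pid):
        with self.locks[pid]:
            return len(self.portions[pid])


# Usage: four "processors" (threads) fill their own portions concurrently.
tq = DistributedTaskQueue(num_procs=4)


def producer(pid):
    for k in range(1000):
        tq.put(pid, (pid, k))


threads = [threading.Thread(target=producer, args=(p,)) for p in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([tq.local_size(p) for p in range(4)])  # [1000, 1000, 1000, 1000]
print(tq.get(2))                             # (2, 0)
```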

The interface of a distributed task queue resembles that of a sequential priority queue. However, some of its features require new interpretations in the context of parallel programming, notably the meaning of priority and termination detection. We relax the strict priority semantics of a sequential queue to increase parallelism and to reduce the cost of implementation. We also augment the interface so that termination is detected correctly in the presence of multi-threading, a concern unique to parallel programming environments. Further discussion is given in Section 2.
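One way to picture relaxed priority semantics is with per-processor heaps: a request is served with the best task in the requesting processor's own portion, which need not be the best task in the whole queue. This is a single-threaded sketch under that assumption only; `RelaxedPriorityQueue` is a hypothetical name, smaller values mean higher priority, and the semantics actually adopted in this paper are those discussed in Section 2.

```python
import heapq
import itertools


class RelaxedPriorityQueue:
    """Per-processor heaps with relaxed, local-only priority (hypothetical).

    Smaller priority values are served first, but only within one portion:
    get(pid) returns the best task held by processor pid, which need not be
    the best task in the whole queue. Single-threaded sketch, no locking.
    """

    def __init__(self, num_procs):
        self.heaps = [[] for _ in range(num_procs)]
        self._seq = itertools.count()  # tie-breaker: tasks are never compared

    def put(self, pid, priority, task):
        heapq.heappush(self.heaps[pid], (priority, next(self._seq), task))

    def get(self, pid):
        if not self.heaps[pid]:
            return None
        priority, _, task = heapq.heappop(self.heaps[pid])
        return priority, task

    def global_best(self):
        # What a strict sequential priority queue would hand out next.
        tops = [heap[0] for heap in self.heaps if heap]
        if not tops:
            return None
        priority, _, task = min(tops)
        return priority, task


rq = RelaxedPriorityQueue(num_procs=2)
rq.put(0, 1, "urgent")
rq.put(1, 5, "routine")
print(rq.global_best())  # (1, 'urgent')
print(rq.get(1))         # (5, 'routine'), served while 'urgent' is pending
```

The benefit is that `get` inspects only local state, so no global agreement on the best task is required. The cost is that lower-priority work on one processor may run before higher-priority work held elsewhere.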

The most important function of a distributed task queue is load balancing. We use a distributed protocol that balances load between nearest neighbors; it is extremely simple and works well in practice. The protocol also attempts to reduce the task-migration penalty caused by loss of locality. Section 3 describes the load balancing protocol in detail.
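The protocol used here is given in Section 3 and is not reproduced in this sketch. As a generic illustration of nearest-neighbor balancing only, the code below runs synchronous diffusion rounds on a ring of processors, where each processor exchanges work only with its two neighbors. The ring topology, the one-third transfer fraction, and the function `balance_round` are assumptions; the sketch counts tasks only and ignores the locality concerns that the paper's protocol also addresses.

```python
def balance_round(loads):
    """One synchronous round of nearest-neighbor balancing on a ring.

    loads[i] is the number of tasks on processor i; the ring needs at least
    three processors so that each edge (i, i+1) is distinct. Across each
    edge, one third of the load difference (rounded down) moves from the
    heavier to the lighter processor. Transfers are computed from a snapshot
    of the old loads, so all edges act at the same time.
    """
    n = len(loads)
    if n < 3:
        raise ValueError("ring must have at least 3 processors")
    new = list(loads)
    for i in range(n):
        j = (i + 1) % n                 # right neighbor; each edge seen once
        diff = loads[i] - loads[j]
        moved = abs(diff) // 3          # at most 1/3 per edge: never negative
        if diff > 0:
            new[i] -= moved
            new[j] += moved
        else:
            new[i] += moved
            new[j] -= moved
    return new


loads = [12, 0, 0, 0]
for _ in range(3):
    loads = balance_round(loads)
print(loads)  # [4, 3, 2, 3]
```

Because transfers are rounded down, the rounds stop changing the loads once every pair of neighbors differs by at most two tasks, so the result is near-balanced rather than exactly balanced.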

The rest of the paper is organized as follows. Section 2 discusses the functionality and the interface of the distributed task queue. Section 3 describes the load balancing protocol. Section 4 describes the termination detection protocol. Section 5 sketches the implementation on the CM5 multiprocessor.


Boris Vaysman
Fri Mar 22 13:38:23 PST 1996