The Task Queue Load Balancing Protocol

We use an idle-initiated load balancing protocol for the distributed task queue. The processors are organized in a fixed topology, configurable by the user, and tasks migrate only between nearest neighbors. The task queue uses the computation cost estimates to balance work between processors, and the migration penalty estimates to limit the loss of locality caused by migration.
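
Because tasks move only between nearest neighbors, the topology fully determines where a task may go. A minimal sketch of two user-selectable topologies, a ring and a 2-D mesh without wraparound, is shown below. The function names `neighbors` and `can_migrate` and the choice of these two topologies are illustrative assumptions, not the runtime's actual configuration interface.

```python
def neighbors(rank: int, topology: str, shape: tuple) -> list:
    """Nearest neighbors of `rank` (hypothetical helper; ring or 2-D mesh only)."""
    if topology == "ring":
        (p,) = shape
        if p == 1:
            return []
        return sorted({(rank - 1) % p, (rank + 1) % p})
    if topology == "mesh":
        rows, cols = shape
        r, c = divmod(rank, cols)
        result = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:  # no wraparound at the edges
                result.append(rr * cols + cc)
        return sorted(result)
    raise ValueError(f"unknown topology: {topology}")


def can_migrate(src: int, dst: int, topology: str, shape: tuple) -> bool:
    # Tasks may only move to a nearest neighbor in the fixed topology.
    return dst in neighbors(src, topology, shape)
```

For example, on a 3x3 mesh the center processor 4 can exchange tasks with 1, 3, 5 and 7, while corner processor 0 can reach only 1 and 3.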

The load balancing protocol proceeds as follows. Each processor communicates with its neighbors through a set of unidirectional channels, each with a binary state (ON or OFF). A processor may migrate tasks to a neighbor if and only if the channel to that neighbor is ON. An idle processor initiates load balancing by turning ON every OFF channel leading to it from its neighbors. Upon seeing an ON channel, a busy processor balances its load with the neighbor at the other end and then turns the channel OFF. A processor's state (idle or busy) is determined by the amount of work currently assigned to it: it is idle if its assigned work falls below a given threshold. The protocol is fully distributed and requires no global information.
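
The channel rules can be illustrated with a sequential simulation of one round. This is a sketch under stated assumptions: the real protocol is asynchronous, `IDLE_THRESHOLD` is a made-up value, and the balancing rule (move half the load difference, never dropping the giver below the threshold) is a hypothetical stand-in for whatever rule the runtime actually uses.

```python
from dataclasses import dataclass

IDLE_THRESHOLD = 2.0  # hypothetical: a processor is idle below this much work


@dataclass
class Processor:
    rank: int
    work: float       # total estimated cost of the tasks assigned here
    neighbors: list

    def is_idle(self) -> bool:
        return self.work < IDLE_THRESHOLD


def make_channels(procs):
    # One unidirectional channel per ordered neighbor pair. (src, dst) ON
    # means src may migrate work to dst. All channels start OFF.
    return {(p.rank, n): False for p in procs for n in p.neighbors}


def request_work(idle, channels):
    # An idle processor turns ON every OFF channel from a neighbor into itself.
    for n in idle.neighbors:
        if not channels[(n, idle.rank)]:
            channels[(n, idle.rank)] = True


def serve_requests(busy, procs, channels):
    # A busy processor balances with each neighbor whose channel
    # busy -> neighbor is ON, then turns that channel OFF.
    for n in busy.neighbors:
        if channels[(busy.rank, n)] and not busy.is_idle():
            peer = procs[n]
            amount = max(0.0, min((busy.work - peer.work) / 2,
                                  busy.work - IDLE_THRESHOLD))
            busy.work -= amount
            peer.work += amount
            channels[(busy.rank, n)] = False


def step(procs, channels):
    # One simulated round: idle processors request, then busy ones respond.
    for p in procs:
        if p.is_idle():
            request_work(p, channels)
    for p in procs:
        if not p.is_idle():
            serve_requests(p, procs, channels)
```

On a line of three processors where only processor 0 holds work, one round spreads the load outward and leaves every channel OFF.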

Flow control is enforced by having the idle processor set the capacity of each channel, that is, the maximum number of tasks allowed through it, when turning the channel ON. The capacity corresponds to the space reserved for tasks arriving through that channel. Any unused reserved space is freed by the busy processor when it turns the channel OFF.
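
The reservation and release steps can be sketched as follows, assuming capacity is counted in tasks and space is a simple slot counter. For brevity the sketch runs in a single address space, so the busy side updates the receiver's counter directly; a distributed runtime would do this with a message. The names `Channel`, `Receiver` and `send_and_close` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    on: bool = False
    capacity: int = 0   # maximum number of tasks allowed through while ON


@dataclass
class Receiver:
    free_slots: int     # space still available for incoming tasks
    inbox: list = field(default_factory=list)

    def turn_on(self, ch: Channel, capacity: int) -> None:
        # Reserve space for incoming tasks, then open the channel with
        # a capacity equal to the reservation.
        reserved = min(capacity, self.free_slots)
        self.free_slots -= reserved
        ch.on, ch.capacity = True, reserved


def send_and_close(ch: Channel, receiver: Receiver, tasks: list) -> list:
    """Busy side: send at most ch.capacity tasks, turn the channel OFF,
    free the unused part of the reservation, and return the kept tasks."""
    if not ch.on:
        return tasks
    sent = tasks[: ch.capacity]
    receiver.inbox.extend(sent)
    receiver.free_slots += ch.capacity - len(sent)  # release unused space
    ch.on, ch.capacity = False, 0
    return tasks[len(sent):]
```

Because the receiver reserves space before the channel opens, a busy neighbor can never overrun it, and a sender with fewer tasks than the capacity hands the leftover reservation back.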

A busy processor attempts to migrate a fixed portion of its work to the requesting processor. The size of this portion is a runtime variable, which the runtime system raises or lowers to trade off the evenness of processor load against the overhead of task migration. Furthermore, a processor never becomes idle as a result of giving away tasks.
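
One way to honor both the portion and the never-become-idle rule is a single greedy pass over the migratable tasks, as sketched below. The greedy selection order and the name `select_tasks_to_give` are assumptions; the policy the runtime uses to raise or lower `portion` is not described here and is not shown.

```python
def select_tasks_to_give(costs: list, portion: float,
                         idle_threshold: float) -> list:
    """Indices of migratable tasks to hand to the requester (hypothetical).

    Gives away at most `portion` of the local load, and never lets the
    remaining load drop below `idle_threshold`, so the giver stays busy.
    """
    total = sum(costs)
    target = portion * total
    given_load, kept_load = 0.0, total
    chosen = []
    for i, c in enumerate(costs):
        if given_load + c <= target and kept_load - c >= idle_threshold:
            chosen.append(i)
            given_load += c
            kept_load -= c
    return chosen
```

With task costs `[4, 3, 2, 1]`, a portion of 0.5 and a threshold of 2, the pass gives away tasks 0 and 3 (total cost 5). Raising the threshold to 9 leaves only task 3 eligible, since giving any larger task would push the giver below the threshold.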

In the current implementation, the local partition of a task queue contains two queues. One holds tasks that can be migrated. The other holds tasks whose computation cost is no larger than their migration penalty; these tasks are not eligible for migration. Tasks are scheduled according to user-supplied priorities.
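
A minimal sketch of such a two-queue local partition follows, using binary heaps. Several details are assumptions: a larger priority value runs first, ties are broken FIFO, and a migration request takes the highest-priority migratable task. The class and method names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cost: float      # computation cost estimate
    penalty: float   # migration penalty estimate
    priority: int    # user-supplied; assumed: larger value runs first


class LocalPartition:
    def __init__(self):
        self._migratable = []          # heap of (-priority, seq, task)
        self._pinned = []              # cost <= penalty: never migrated
        self._seq = itertools.count()  # FIFO tie-break, keeps tuples comparable

    def push(self, task: Task) -> None:
        heap = self._migratable if task.cost > task.penalty else self._pinned
        heapq.heappush(heap, (-task.priority, next(self._seq), task))

    def pop_next(self):
        """Local scheduling: highest-priority task from either queue."""
        heaps = [h for h in (self._migratable, self._pinned) if h]
        if not heaps:
            return None
        best = min(heaps, key=lambda h: h[0][:2])
        return heapq.heappop(best)[2]

    def pop_for_migration(self):
        """Only tasks whose cost exceeds their migration penalty may leave."""
        if not self._migratable:
            return None
        return heapq.heappop(self._migratable)[2]
```

Keeping ineligible tasks in a separate queue means a migration request never has to scan past them, while local scheduling still sees both queues in priority order.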

The load balancing protocol has two important properties. First, the cost of obtaining a task is bounded, since each batch of requests (i.e., simultaneous channel-ON actions) acquires at least one task before the next batch can be sent. Second, there is a path of ON channels from any idle processor to some busy processor. This guarantees that an idle processor will eventually obtain a task when the system is sufficiently loaded.
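
The second property can be checked on a snapshot of channel states with a breadth-first search. An ON channel (src, dst) means dst asked src for work, so the search walks ON channels backwards from the idle processor. This sketch only tests the invariant on one snapshot; it is not a proof of the property, and `reaches_busy` is a hypothetical name.

```python
from collections import deque


def reaches_busy(start: int, on_channels: set, busy: set) -> bool:
    """True if some busy processor is reachable from `start` by following
    ON channels (src, dst) from dst back to src."""
    seen, frontier = {start}, deque([start])
    while frontier:
        p = frontier.popleft()
        if p in busy:
            return True
        for src, dst in on_channels:
            if dst == p and src not in seen:
                seen.add(src)
                frontier.append(src)
    return False
```

On a line of three processors where only processor 0 is busy and processors 1 and 2 have turned on their incoming channels, idle processor 2 reaches processor 0 through processor 1.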

Boris Vaysman
Fri Mar 22 13:38:23 PST 1996