Reactive Synchronization Algorithms for Multiprocessors

Beng-Hong Lim and Anant Agarwal, MIT LCS

One-line summary: Allow dynamic selection of which "flavor" of synchronization to use (test-and-set, queue lock, etc.), by tracking the level of contention experienced by running program. A consensus object synchronizes (!) the change among algorithms.

Overview/Main Points

Key: allow multiple processes to execute different synchronization protocols, but ensure that only processes executing the currently selected protocol succeed (others must retry).
A mode variable is used as a hint for which protocol is the current one.
To change from protocol A to protocol B: acquire L(A) (lock for A's synchronization object); change mode variable to B; signal L(B); leave L(A) locked. Invariant: L(A) and L(B) are never both free simultaneously.
A process that had begun executing protocol A will find L(A) locked, and will either recheck the mode variable, which has been changed to B, or will be signaled for retry later, so they check the mode variable at that time.
The formal requirements for using this cool hack:
- My paraphrase: Critical sections are mutually exclusive, and completion of the critical sections is a necessary and sufficient condition for successful protocol completion.
  The gory details:
- To each protocol corresponds a consensus object which must be accessed atomically in order to complete the protocol (test-and-test-and-set lock, queue lock, etc.)
- Once the C.O. has been successfully accessed, the protocol must be able to run to successful completion regardless of whether other processes subsequently access the C.O.
- Internal protocol state (of a synchronization operation) can only be modified by the process that currently has access to the C.O.
Policy (when to change protocols): e.g. change from T&T&S to queue lock when the number of TTS trials seen by a process before succeeding exceeds some threshold; change from queue lock to TTS when queue is found to empty upon acquisition some threshold number of consecutive times. No formalisms offered for policy.
Microbenchmark Performance: Reactive algorithm follows the "optimum curve" for both fetch-and-op and spin locks (i.e. it performs as well as TS or TTS with backoff for low contention, and as well as queue locks for high contention).
Macrobenchmarks: highly concurrent programs run with varying numbers of processors (hence varying contention levels). For each run, they tried queue locks and software combining trees. Reactive algorithm matched performance of the better of the two in every case. (The graphs in this paper are great; check them out.)
Overhead of switching protocols: Plot of wall clock time vs. increasing relative frequency of contention-level changes. When contention is slowly changing, protocol switching overhead plays a larger role; when contention level is rapidly changing, protocl switching overhead goes away (and lock performance dominates). Again, best to see graphs in paper.
Reduces motivation to provide hardware support for queue locks (as some archs. do, such as Wisconsin Multicube and Stanford DASH), since reactive alg. can be used to select T&S at low contention anyway, which makes software queue lock overhead go away.

Relevance

A good use of specialization: expose synchronization protocol selection and allow it to track contention level. Allows better performance on hardware not optimized for a particular kind of synchronization protocol, and decreases motivation for providing such optimizations in hardware.

Flaws

Behavior needs to be encapsulated to make it easy to use; applications don't want to have to deal with protocol switching directly.
Work still needed on policy decisions; how to encapsulate that?

Back to index