Back to index
Reactive Synchronization Algorithms for Multiprocessors
Beng-Hong Lim and Anant Agarwal, MIT LCS
One-line summary:
Allow dynamic selection of which "flavor" of synchronization to use
(test-and-set, queue lock, etc.), by tracking the level of contention
experienced by running program. A
consensus object synchronizes (!) the change among algorithms.
Overview/Main Points
- Key: allow multiple processes to execute different
synchronization protocols, but ensure that only processes
executing the currently selected protocol succeed (others must
retry).
- A mode variable is used as a hint for which protocol is
the current one.
- To change from protocol A to protocol B: acquire L(A) (lock for A's
synchronization object); change mode variable to B; signal L(B); leave
L(A) locked. Invariant: L(A) and L(B) are never both free
simultaneously.
- A process that had begun executing protocol A will find L(A)
locked, and will either recheck the mode
variable, which has been changed to B, or will be signaled for
retry later, so they check the mode variable at that time.
- The formal requirements for using this cool hack:
- My paraphrase: Critical sections are mutually
exclusive, and completion of the critical sections is a
necessary and sufficient condition for successful protocol
completion.
The gory details:
- To each protocol corresponds a consensus
object which must be accessed atomically in order to complete
the protocol (test-and-test-and-set lock, queue lock,
etc.)
- Once the C.O. has been successfully accessed, the protocol
must be able to run to successful completion regardless of
whether other processes subsequently access the C.O.
- Internal protocol state (of a synchronization operation)
can only be modified by the process that currently has
access to the C.O.
- Policy (when to change protocols): e.g. change from T&T&S
to queue lock when the number of TTS trials seen by a process
before succeeding exceeds some threshold; change from queue lock
to TTS when queue is found to empty upon acquisition some
threshold number of consecutive times. No formalisms offered for
policy.
- Microbenchmark Performance: Reactive algorithm follows the
"optimum curve" for
both fetch-and-op and spin locks (i.e. it performs as well as TS
or TTS with backoff for low contention, and as well as queue
locks for high contention).
- Macrobenchmarks: highly concurrent programs run with varying
numbers of processors (hence varying contention levels). For
each run, they tried queue locks and software combining trees.
Reactive algorithm matched performance of the better of the two
in every case. (The graphs in this paper are great; check them
out.)
- Overhead of switching protocols: Plot of wall clock time
vs. increasing relative frequency of contention-level changes.
When contention is slowly changing, protocol switching overhead
plays a larger role; when contention level is rapidly changing,
protocl switching overhead goes away (and lock performance
dominates). Again, best to see graphs in paper.
- Reduces motivation to provide hardware support for queue locks
(as some archs. do, such as Wisconsin Multicube and Stanford
DASH), since reactive alg. can be used to select T&S at low
contention anyway, which makes software queue lock overhead go away.
Relevance
A good use of specialization: expose synchronization protocol selection
and allow it to track contention level. Allows better performance on
hardware not optimized for a particular kind of synchronization
protocol, and decreases motivation for providing such optimizations in
hardware.
Flaws
- Behavior needs to be encapsulated to make it easy to use;
applications don't want to have to deal with protocol switching
directly.
- Work still needed on policy decisions; how to encapsulate that?
Back to index