Advanced Topics in Computer Systems, CS262a
Eric Brewer (based on notes by Joe Hellerstein)
Concurrency Control: Alternate Realities
Optimistic Concurrency Control (Kung & Robinson)
Attractive, simple idea: optimize case where conflict is rare.
Basic idea: all transactions consist of three phases:
- Read. Here, all writes are to private storage (shadow copies).
- Validation. Make sure no conflicts have occurred.
- Write. If Validation was successful, make writes public. (If not,
abort!)
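The three-phase structure can be sketched as follows; this is a minimal illustration assuming a shared dict as the database, with all names (OptimisticTxn, commit, etc.) invented here rather than taken from the paper:

```python
class OptimisticTxn:
    def __init__(self, db):
        self.db = db
        self.read_set = set()
        self.write_set = {}      # private shadow copies

    def read(self, key):
        self.read_set.add(key)
        # reads see this txn's own uncommitted writes first
        return self.write_set.get(key, self.db.get(key))

    def write(self, key, value):
        self.write_set[key] = value   # write to private storage only

    def commit(self, validate):
        # validate() stands in for the checks of the validation phase
        if not validate(self):
            return False              # abort: shadow copies are discarded
        for key, value in self.write_set.items():
            self.db[key] = value      # write phase: make writes public
        return True
```

Until commit, other transactions never see this transaction's writes, which is what makes the read phase conflict-free.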
When might this make sense? Three examples:
- All transactions are readers.
- Lots of transactions, each accessing/modifying only a small amount
of data, large total amount of data.
- Fraction of transaction execution in which conflicts "really take
place" is small compared to total pathlength.
The Validation Phase
- Goal: to guarantee that only serializable schedules result.
- Technique: actually find an equivalent serializable schedule. In
particular,
- Assign each transaction a TN during execution.
- Ensure that if you run transactions in order induced by "<" on
TNs, you get an equivalent serial schedule.
Suppose TN(Ti) < TN(Tj). Then if one of the following three conditions
holds, it’s serializable:
- Ti completes its write phase before Tj starts its read phase.
- WS(Ti) ∩ RS(Tj) = ∅ and Ti completes its write phase before Tj starts its write phase.
- WS(Ti) ∩ RS(Tj) = WS(Ti) ∩ WS(Tj) = ∅ and Ti completes its read phase before Tj completes its read phase.
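The three conditions translate directly into checks over read/write sets and phase boundaries. A sketch, assuming each transaction records its sets and (logical) phase start/end times; the field names are illustrative:

```python
def serializable(ti, tj):
    # Assumes TN(ti) < TN(tj).
    # Condition 1: Ti finishes its write phase before Tj starts reading.
    if ti.write_end < tj.read_start:
        return True
    # Condition 2: no W-R set overlap, and Ti finishes writing
    # before Tj starts writing.
    if not (ti.write_set & tj.read_set) and ti.write_end < tj.write_start:
        return True
    # Condition 3: no W-R and no W-W overlap, and Ti finishes its
    # read phase before Tj finishes its read phase.
    if (not (ti.write_set & tj.read_set)
            and not (ti.write_set & tj.write_set)
            and ti.read_end < tj.read_end):
        return True
    return False
```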
Is this correct? Each condition guarantees that the 3 possible classes
of conflicts (W-R, R-W, W-W) on the 2 orderings (i before j, j before i)
go in one order only: i before j. There are 3x2=6 possible conflict
orderings to consider.
- For condition 1 all conflicts are ordered i before j (true serial
execution!)
- For condition 2,
- No WiRj or RjWi conflicts since WS(Ti) ∩ RS(Tj) = ∅.
- No WjRi or WjWi conflicts since the write phase (and hence the read
phase) of Ti precedes the write phase of Tj.
- This leaves the possibility of RiWj and WiWj, both of which are ordered
i before j.
- For condition 3,
- No WiRj or RjWi conflicts since WS(Ti) ∩ RS(Tj) = ∅.
- No WiWj or WjWi conflicts since WS(Ti) ∩ WS(Tj) = ∅.
- WjRi not possible since the read phase of Ti precedes the write phase
of Tj.
- This leaves only the possibility of RiWj.
Assigning TNs: assigning at the beginning of a transaction is not optimistic, since a transaction might not be able to validate immediately if its predecessor transactions were still running; this smells like locking. Instead, assign TNs at the end of the read phase. Note: this satisfies the second half of condition (3).
Note: a transaction T with a very long read phase must check write
sets of all transactions begun and finished while T was active. This
could require unbounded buffer space.
Solution: bound buffer space, toss out when full, abort transactions
that could be affected.
- Gives rise to starvation. Solve by having starving transaction write-lock
the whole DB!
Serial Validation
Only checks conditions (1) and (2), since writes are not interleaved.
Simple technique: make a critical section around <get xactno; validate
(1) or (2) for everybody from your start to finish; write>. Not great
if:
- write takes a long time
- SMP – might want to validate 2 things at once if there’s not enough
reading to do
Improvement to speed up validation:
repeat as often as you want {
    validate against xacts committed since your last check (no critical section)
}
<get xactno; validate with new xacts; write>.
Note: read-only xacts don’t need to get xactnos! Just need to validate
up to highest xactno at end of read phase (without critical section!)
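Serial validation can be sketched as one critical section covering validation and the write phase; a minimal illustration assuming a global list of committed write sets indexed by transaction number (all names invented here). A transaction records the length of that list when its read phase begins, so condition (1) holds implicitly for every earlier transaction, and condition (2) is checked against everything that committed since:

```python
import threading

commit_lock = threading.Lock()
committed_write_sets = []     # committed_write_sets[tn] = write set of xact tn

def validate_and_commit(txn, db):
    # txn.start_tn is len(committed_write_sets) at read-phase start
    with commit_lock:
        # condition (2): check against every xact that committed
        # during our read phase
        for tn in range(txn.start_tn, len(committed_write_sets)):
            if committed_write_sets[tn] & txn.read_set:
                return False                      # conflict: abort
        # write phase, still inside the critical section
        for key, value in txn.write_set.items():
            db[key] = value
        committed_write_sets.append(set(txn.write_set))
        return True
```

Since the write happens inside the lock, no two write phases interleave, which is why condition (3) never needs checking here.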
Parallel Validation
Want to allow interleaved writes.
Need to be able to check condition (3).
- Save active xacts (those which have finished reading but not writing).
- Active xacts can’t intersect your read or write set.
- Validation:
<get xactno; copy active; add yourself to active>
check (1) or (2) against everything from start to finish;
check (3) against all xacts in active copy
If all’s clear, go ahead and write.
<bump xact counter, remove yourself from active>.
Small critical section.
Problems:
- a member of active that causes you to abort may itself have already aborted
- can add even more bookkeeping to handle this
- can make active short with improvement analogous to that of serial
validation
One More Concurrency Control Technique
Time Stamping: Bernstein TODS ’79
- Every xact gets a unique timestamp at startup
- on Read: OK if Xact TS > WTS. Install new RTS if xact TS >
RTS.
- on Write: OK if xact TS > MAX(RTS, WTS). Install new WTS.
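The two rules can be written down directly; a sketch assuming each object carries a read timestamp (RTS) and write timestamp (WTS), with a failed check meaning the transaction must abort and restart with a new timestamp:

```python
def ts_read(obj, ts):
    # OK only if this xact's TS is at least the object's WTS
    if ts < obj["wts"]:
        return False                  # value is too new for us: abort
    obj["rts"] = max(obj["rts"], ts)  # install new RTS if later
    return True

def ts_write(obj, ts):
    # OK only if no later xact has read or written the object
    if ts < max(obj["rts"], obj["wts"]):
        return False                  # abort
    obj["wts"] = ts                   # install new WTS
    return True
```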
Problems:
- forces time-stamp order (tighter restriction than other schemes)
- cascaded aborts (no isolation)
- I/O cost even on "clean" pages
Multi-version timestamping techniques:
- Reed’s PhD, MIT ‘78
- reads get the appropriate version
- writes are a bit trickier – can be added if nobody read object between
the new write and any "later" writes
Timestamping is not dead, but it is not popular, either. Note that
it wasn't used in Postgres (which did keep versions).
Performance Study: Locking vs. Optimistic
Agrawal/Carey/Livny
Previous work had conflicting results:
- Carey & Stonebraker (VLDB84), Agrawal & DeWitt (TODS85): blocking
beats restarts
- Tay (Harvard PhD) & Balter (PODC82): restarts beat blocking
- Franaszek & Robinson (TODS85): optimistic beats locking
Goal of this paper:
- Do a good job modeling the problem and its variants
- Capture causes of previous conflicting results
- Make recommendations based on variables of problem
Methodology:
- simulation study, compare Blocking (i.e. 2PL), Immediate Restart (restart
after E(xact length) when denied a lock), and Optimistic (a la Kung &
Robinson)
- pay attention to model of system:
- database system model: hardware and software model (CPUs, disks, size & granule of DB, load control mechanism, CC algorithm)
- user model: arrival of user tasks, nature of tasks (e.g. batch vs.
interactive)
- transaction model: logical reference string (i.e. CC schedule), physical
reference string (i.e. disk block requests, CPU processing bursts).
Probabilistic modeling of each. They argue this is key to a performance
study of a DBMS.
- logical queuing model
- physical queuing model
Measurements
- measure throughput, mostly
- pay attention to variance of response time, too
- pick a DB size so that there are noticeable conflicts (else you get
comparable performance)
Experiment 1: Infinite Resources
- as many disks and CPUs as you want
- blocking thrashes due to transactions blocking numerous times &
deadlock
- restart plateaus: adaptive wait period (avg response time) before
restart
- serves as a primitive load control!
- optimistic scales logarithmically: restarts go up, but new xacts
replace old
- standard deviation of response time under locking much lower
Experiment 2: Limited Resources (1 CPU, 2 disks)
- Everybody thrashes
- blocking throughput peaks at mpl 25
- optimistic peaks at 10
- restart peaks at 10, plateaus at 50 – as good or better than optimistic
- at super-high mpl (200), restart beats both blocking and optimistic
- but total throughput worse than blocking @ mpl 25
- effectively, restart is achieving mpl 60
- load control is the answer here – adding it to blocking & optimistic
makes them handle higher mpls better
Experiment 3: Multiple Resources (5, 10, 25, 50 CPUs, 2 disks each)
- optimistic starts to win at 25 CPUs
- when useful disk utilization is only about 30%, system begins to behave
like infinite resources
- even better at 50
Experiment 4: Interactive Workloads
Add user think time.
- makes the system appear to have more resources
- so optimistic wins with think times 5 & 10 secs. Blocking still
wins for 1 second think time.
Questioning 2 assumptions:
- fake restart – biases for optimistic
- fake restarts result in less conflict.
- cost of conflict in optimistic is higher
- issue of k > 2 transactions contending for one item
- will have to punish k-1 of them with real restart
- write-lock acquisition
- recall our discussion of lock upgrades and deadlock
- blind write biases for restart (optimistic not an issue here), particularly
with infinite resources (blocking holds write locks for a long time; waste
of deadlock restart not an issue here).
- with finite resources, blind write restarts transactions earlier
(making restart look better)
Conclusions
- blocking beats restarting, unless resource utilization is low
- possible in situations of high think time
- mpl control important. Restart’s adaptive load control is too clumsy,
though.
- false assumptions made blocking look relatively worse