CS61B: Lecture 38
Monday, April 29, 2013

RANDOMIZED ANALYSIS
===================
_Randomized_algorithms_ are algorithms that make decisions based on rolls
of the dice. The random numbers actually help to keep the running time low.
Examples are quicksort, quickselect, and hash tables with random hash codes.

Randomized analysis, like amortized analysis, is a mathematically rigorous
way of saying, "The average running time of this operation is fast, even
though the worst-case running time is slow." Unlike amortized analysis, the
"average" is taken over an infinite number of runs of the program. A
randomized algorithm will sometimes run more slowly than the average, but
the probability that it will run _asymptotically_ slower is extremely low.
Randomized analysis requires a little bit of probability theory.

Expectation
-----------
Suppose a method x() flips a coin. If the coin comes up heads, x() takes one
second to execute. If it comes up tails, x() takes three seconds.

Let X be the exact running time of one call to x(). With probability 0.5,
X is 1, and with probability 0.5, X is 3. For obvious reasons, X is called
a _random_variable_. The _expected_ value of X is the average value X
assumes in an infinite sequence of coin flips,

  E[X] = 0.5 * 1 + 0.5 * 3 = 2 seconds expected time.

Suppose we run the code sequence

  x();          // takes time X
  x();          // takes time Y

and let Y be the running time of the _second_ call. The total running time
is T = X + Y. (Y and T are also random variables.) What is the expected
total running time E[T]?

The main idea from probability we need is called _linearity_of_expectation_,
which says that expected running times sum linearly.

  E[X + Y] = E[X] + E[Y] = 2 + 2 = 4 seconds expected time.

The interesting thing is that linearity of expectation holds true whether or
not X and Y are _independent_. Independence means that the first coin flip
has no effect on the outcome of the second. If X and Y are independent, the
code will take four seconds on average. But what if they're not?

Suppose the second coin flip always matches the first--we always get two
heads, or two tails. Then the code still takes four seconds on average. If
the second coin flip is always the opposite of the first--we always get one
head and one tail--the code still takes four seconds on average.

So if we determine the expected running time of each individual operation,
we can determine the expected running time of a whole program by adding up
the expected costs of all the operations.
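To see linearity of expectation in action, here is a short Java sketch that
simulates the coin flips and estimates E[T] in all three scenarios:
independent, matching, and opposite second flips. The class and method names
are invented for this illustration.

  import java.util.Random;

  public class ExpectationDemo {
    private static final Random rand = new Random();

    // Running time (in seconds) of one call to x(): 1 if the coin comes up
    // heads, 3 if it comes up tails.
    static int cost(boolean heads) {
      return heads ? 1 : 3;
    }

    public static void main(String[] args) {
      int trials = 1000000;
      long independent = 0, matching = 0, opposite = 0;
      for (int i = 0; i < trials; i++) {
        boolean flip1 = rand.nextBoolean();
        boolean flip2 = rand.nextBoolean();
        independent += cost(flip1) + cost(flip2);  // second flip independent
        matching += cost(flip1) + cost(flip1);     // second flip matches first
        opposite += cost(flip1) + cost(!flip1);    // second flip is opposite
      }
      // All three averages come out near E[T] = 4 seconds.
      System.out.println("independent: " + (double) independent / trials);
      System.out.println("matching:    " + (double) matching / trials);
      System.out.println("opposite:    " + (double) opposite / trials);
    }
  }

Changing how the second flip is generated changes the variance of T, but not
its expectation; that's linearity of expectation at work.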
Hash Tables
-----------
The implementations of hash tables we have studied don't use random numbers,
but we can model the effects of collisions on running time by pretending we
have a random hash code. A _random_hash_code_ maps each possible key to a
number that's chosen randomly. This does _not_ mean we roll dice every time
we hash a key. A hash table can only work if a key maps to the same bucket
every time. Each key hashes to a randomly chosen bucket in the table, but a
key's random hash code never changes.

Unfortunately, it's hard to choose a hash code randomly from all possible
hash codes, because you need to remember a random number for each key, and
that would seem to require another hash table. However, random hash codes
are a good _model_ for how a good hash code will perform. The model isn't
perfect, and it doesn't apply to bad hash codes, but for a hash code that
proves effective in experiments, it's a good rough guess. Moreover, there is
a sneaky number-theoretical trick called _universal_hashing_ that generates
random hash codes. These random hash codes are chosen from a relatively
small set of possibilities, yet they perform just as well as if they were
chosen from the set of all possible hash codes. (If you're interested, you
can read about it in the textbook "Introduction to Algorithms" by Cormen,
Leiserson, Rivest, and Stein.)

Assume our hash table uses chaining and does not allow duplicate keys. If an
entry is inserted whose key matches an existing entry, the old entry is
replaced. Suppose we perform the operation find(k), and the key k hashes to
a bucket b. Bucket b contains at most one entry with key k, so the cost of
the search is one dollar, plus an additional dollar for every entry stored
in bucket b whose key is not k. (Recall from last lecture that a _dollar_ is
a unit of time chosen large enough to make this statement true.)

Suppose there are n keys in the table besides k. Let V1, V2, ..., Vn be
random variables such that for each key ki, the variable Vi = 1 if key ki
hashes to bucket b, and Vi is zero otherwise. Then the cost of find(k) is

  T = 1 + V1 + V2 + ... + Vn.

The expected cost of find(k) is (by linearity of expectation)

  E[T] = 1 + E[V1] + E[V2] + ... + E[Vn].

What is E[Vi]? Since there are N buckets, and the hash code is random, each
key has a 1/N probability of hashing to bucket b. So E[Vi] = 1/N, and

  E[T] = 1 + n/N,

which is one plus the load factor! If we keep the load factor n/N below some
constant c as n grows, find operations cost expected O(1) time.

The same analysis applies to insert and remove operations. All three hash
table operations take O(1) expected amortized time. (The word "amortized"
accounts for table resizing, as discussed last lecture.)

Observe that the running times of hash table operations are _not_
independent. If key k1 and key k2 both hash to the same bucket, it increases
the running time of both find(k1) and find(k2). Linearity of expectation is
important because it implies that we can add the expected costs of
individual operations, and obtain the expected total cost of all the
operations an algorithm performs.
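To make the dollar counting concrete, here is a bare-bones sketch of a
chaining hash table's find and insert. The names (SimpleChainingTable,
Entry, bucketOf) are invented for this sketch, and resizing and remove are
omitted; the point is that the loop in find walks bucket b, which is where
the V1 + V2 + ... + Vn dollars in the analysis above are spent.

  import java.util.LinkedList;

  class SimpleChainingTable<K, V> {
    private static class Entry<K, V> {
      K key;
      V value;
      Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets;   // N = buckets.length
    private int n = 0;                           // number of entries

    @SuppressWarnings("unchecked")
    SimpleChainingTable(int numBuckets) {
      buckets = new LinkedList[numBuckets];
      for (int i = 0; i < numBuckets; i++) {
        buckets[i] = new LinkedList<Entry<K, V>>();
      }
    }

    private int bucketOf(K key) {
      return Math.abs(key.hashCode() % buckets.length);
    }

    // Walking the chain is where the V1 + ... + Vn dollars go; with a
    // random hash code, the expected number of entries in this bucket is
    // at most 1 + n/N.
    V find(K key) {
      for (Entry<K, V> e : buckets[bucketOf(key)]) {
        if (e.key.equals(key)) {
          return e.value;
        }
      }
      return null;
    }

    // Replaces the old entry if an entry with an equal key is present.
    void insert(K key, V value) {
      for (Entry<K, V> e : buckets[bucketOf(key)]) {
        if (e.key.equals(key)) {
          e.value = value;
          return;
        }
      }
      buckets[bucketOf(key)].add(new Entry<K, V>(key, value));
      n++;   // a real table would resize when n/N exceeds some constant c
    }
  }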
Quicksort
---------
Recall that mergesort sorts n items in O(n log n) time because the recursion
tree has 1 + ceiling(log_2 n) levels, and each level involves O(n) time
spent merging lists. Quicksort also spends linear time at each level
(partitioning the lists), but it is trickier to analyze because the
recursion tree is not perfectly balanced, and some keys survive to deeper
levels than others.

To analyze quicksort, let's analyze the expected depth one input key k will
reach in the tree. (In effect, we're measuring a vertical slice of the
recursion tree instead of a horizontal slice.) Assume no two keys are equal,
since that is the slowest case.

Quicksort chooses a random pivot. The pivot is equally likely to be the
smallest key, the second smallest, the third smallest, ..., or the largest.
For each case, the probability is 1/n. Since we want a roughly balanced
partition, let's say that the least floor(n/4) keys and the greatest
floor(n/4) keys are "bad" pivots, and the other keys are "good" pivots.
Since there are at most n/2 bad pivots, the probability of choosing a bad
pivot is <= 0.5.

If we choose a good pivot, we'll have a 1/4-3/4 split or better, and our
chosen key k will go into a subset containing at most three quarters of the
keys, which is sorted recursively. If we choose a bad pivot, k might go into
a subset with nearly all the other keys.

Let D(n) be a random variable equal to the deepest depth at which key k
appears when we sort n keys. D(n) varies from run to run, but we can reason
about its expected value. Since we choose a bad pivot no more than half the
time,

  E[D(n)] <= 1 + 0.5 E[D(n)] + 0.5 E[D(3n / 4)].

Multiplying by two and subtracting E[D(n)] from both sides gives

  E[D(n)] <= 2 + E[D(3n / 4)].

This inequality is called a _recurrence_, and you'll learn how to solve them
in CS 170. (No, recurrences won't be on the CS 61B final exam.) The base
cases for this recurrence are D(0) = 0 and D(1) = 0. It's easy to check by
substitution that a solution is

  E[D(n)] <= 2 log_{4/3} n.

So any arbitrary key k appears in expected O(log n) levels of the recursion
tree, and causes O(log n) partitioning work. By linearity of expectation, we
can sum the expected O(log n) work for each of the n keys, and we find that
quicksort runs in expected O(n log n) time.

Quickselect
-----------
For quickselect, we can analyze the expected running time more directly.
Suppose we run quickselect on n keys. Let P(n) be a random variable equal to
the total number of keys partitioned, summed over all the partitioning
steps. Then the running time is in Theta(P(n)).

Quickselect is like quicksort, but when we choose a good pivot, at least one
quarter of the keys are discarded. We choose a good pivot at least half the
time, so

  E[P(n)] <= n + 0.5 E[P(n)] + 0.5 E[P(3n / 4)],

which is solved by E[P(n)] <= 8n. Therefore, the expected running time of
quickselect on n keys is in O(n).
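The analysis above doesn't depend on the details of the code, but here is
one array-based sketch of quickselect with a random pivot, so you can see
where the randomness enters. The three-way partition and the names are just
one reasonable way to code it; a list-based version works equally well.

  import java.util.Random;

  public class QuickselectSketch {
    private static final Random rand = new Random();

    // Returns the kth smallest key in a[] (k = 0 means the smallest).
    // Assumes 0 <= k < a.length. Rearranges a[] as a side effect.
    public static int quickselect(int[] a, int k) {
      int low = 0;
      int high = a.length - 1;
      while (low < high) {
        // Choose a random pivot from a[low...high].
        int pivot = a[low + rand.nextInt(high - low + 1)];
        // Three-way partition a[low...high] into keys less than, equal to,
        // and greater than the pivot.
        int i = low, m = low, j = high;
        while (m <= j) {
          if (a[m] < pivot) {
            swap(a, i++, m++);
          } else if (a[m] > pivot) {
            swap(a, m, j--);
          } else {
            m++;
          }
        }
        // Now a[low...i-1] < pivot, a[i...j] == pivot, a[j+1...high] > pivot.
        if (k < i) {
          high = i - 1;      // the kth smallest is among the keys < pivot
        } else if (k > j) {
          low = j + 1;       // the kth smallest is among the keys > pivot
        } else {
          return pivot;      // the kth smallest key equals the pivot
        }
      }
      return a[low];
    }

    private static void swap(int[] a, int i, int j) {
      int tmp = a[i];
      a[i] = a[j];
      a[j] = tmp;
    }
  }

Each pass through the outer loop partitions high - low + 1 keys, which is
exactly the quantity P(n) adds up, and a good pivot discards at least a
quarter of them.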
Amortized Time vs. Expected Time
--------------------------------
There's a subtle but important difference between amortized running time and
expected running time.

Quicksort with random pivots takes O(n log n) expected running time, but its
worst-case running time is in Theta(n^2). This means that there is a small
possibility that quicksort will cost Omega(n^2) dollars, but the probability
of that happening approaches zero as n approaches infinity.

A splay tree operation takes O(log n) amortized time, but the worst-case
running time for a splay tree operation is in Theta(n). Splay trees are not
randomized, and the "probability" of an Omega(n)-time splay tree operation
is not a meaningful concept. If you take an empty splay tree, insert the
items 1...n in order, then run find(1), the find operation _will_ cost n
dollars. But a sequence of n splay tree operations, starting from an empty
tree, _never_ costs more than O(n log n) actual running time. Ever.

Hash tables are an interesting case, because they use both amortization and
randomization. Resizing takes Theta(n) time. With a random hash code, there
is a tiny probability that every item will hash to the same bucket, so the
worst-case running time of an operation is Theta(n)--even without resizing.
To account for resizing, we use amortized analysis. To account for
collisions, we use randomized analysis. So when we say that hash table
operations run in O(1) time, we mean they run in O(1) _expected_,
_amortized_ time.

  Splay trees                 O(log n) amortized time / operation          *
  Disjoint sets (tree-based)  O(alpha(f + u, u)) amortized time / find op  **
  Quicksort                   O(n log n) expected time                     ***
  Quickselect                 Theta(n) expected time                       ****
  Hash tables                 Theta(1) expected amortized time / op        *****

If you take CS 170, you will learn an amortized analysis of disjoint sets
there. Unfortunately, the analyses of both disjoint sets and splay trees are
complicated. Goodrich & Tamassia give the amortized analysis of splay trees,
but you're not required to read or understand it for this class.

*     Worst-case time is in Theta(n), worst-case amortized time is in
      Theta(log n), best-case time is in Theta(1).

**    For find operations, worst-case time is in Theta(log u), worst-case
      amortized time is in Theta(alpha(f + u, u)), best-case time is in
      Theta(1). All union operations take Theta(1) time.

***   Worst-case time is in Theta(n^2)--if we get worst-case input AND
      worst-case random numbers. "Worst-case expected" time is in
      Theta(n log n)--meaning when the _input_ is worst-case, but we take
      the average over all possible sequences of random numbers. Recall that
      quicksort can be implemented so that keys equal to the pivot go into a
      separate list, in which case the best-case time is in Theta(n),
      because the best-case input is one where all the keys are equal. If
      quicksort is implemented so that keys equal to the pivot are divided
      between lists I1 and I2, as is the norm for array-based quicksort,
      then the best-case time is in Theta(n log n).

****  Worst-case time is in Theta(n^2)--if we get worst-case input AND
      worst-case random numbers. Worst-case expected time, best-case time,
      and best-case expected time are in Theta(n).

***** Worst-case time is in Theta(n), expected worst-case time is in
      Theta(n) (worst case is when table is resized), amortized worst-case
      time is in Theta(n) (worst case is when every item is in one bucket),
      worst-case expected amortized time is in Theta(1), best-case time is
      in Theta(1).

Confused yet?