CS61B: Lecture 31
Wednesday, April 9, 2014

QUICKSORT
=========
Quicksort is a recursive divide-and-conquer algorithm, like mergesort.
Quicksort is in practice the fastest known comparison-based sort for arrays,
even though it has a Theta(n^2) worst-case running time.  If properly
designed, however, it virtually always runs in O(n log n) time.  On arrays,
this asymptotic bound hides a constant smaller than mergesort's, but mergesort
is often slightly faster for sorting linked lists.

Given an unsorted list I of items, quicksort chooses a "pivot" item v from I,
then puts each item of I into one of two unsorted lists, depending on whether
its key is less or greater than v's key.  (Items whose keys are equal to v's
key can go into either list; we'll discuss this issue later.)

  Start with the unsorted list I of n input items.
  Choose a pivot item v from I.
  Partition I into two unsorted lists I1 and I2.
    - I1 contains all items whose keys are smaller than v's key.
    - I2 contains all items whose keys are larger than v's.
    - Items with the same key as v can go into either list.
    - The pivot v, however, does not go into either list.
  Sort I1 recursively, yielding the sorted list S1.
  Sort I2 recursively, yielding the sorted list S2.
  Concatenate S1, v, and S2 together, yielding a sorted list S.

(A rough Java sketch of this procedure appears at the end of this section.)

The recursion bottoms out at one-item and zero-item lists.  (Zero-item lists
can arise when the pivot is the smallest or largest item in its list.)

How long does quicksort take?  The answer is made apparent by examining
several possible recursion trees.  In the illustrations below, the pivot v is
always chosen to be the first item in the list.

First example:

            ---------------------------
            |4 | 7 | 1 | 5 | 9 | 3 | 0|              v = pivot
            ---------------------------
              /          |          \
      -----------       ---       -----------
      |1 | 3 | 0|       |4|       |7 | 5 | 9|
    I1-----------       ---       -----------I2
       /   |   \         v         /   |   \
      --- --- ---                 --- --- ---
      |0| |1| |3|                 |5| |7| |9|
    I1--- --- ---I2             I1--- --- ---I2
           v                           v

      Sorted result:  0  1  3  4  5  7  9

In the first example, we get lucky, and the pivot always turns out to be the
item having the median key.  Hence, each unsorted list is partitioned into two
pieces of equal size, and we have a well-balanced recursion tree.  Just like
in mergesort, the tree has O(log n) levels.  Partitioning a list is a
linear-time operation, so the total running time is O(n log n).

Second example:

  ---------------------------
  |0 | 1 | 3 | 4 | 5 | 7 | 9|                        * = empty list
  ---------------------------
    /     |     \
   *     ---     -----------------------
         |0|     |1 | 3 | 4 | 5 | 7 | 9|
         ---     -----------------------
          v        /     |     \
                  *     ---     -------------------
                        |1|     |3 | 4 | 5 | 7 | 9|
                        ---     -------------------
                         v        /     |     \
                                 *     ---     . . .
                                       |3|
                                       ---
                                        v

  (The pattern continues all the way down:  every partition yields an empty
   list, the pivot, and a list only one item shorter, so the recursion tree
   has one level for each item of the original list.)

The second example shows the Theta(n^2) performance we suffer if the pivot
always proves to have the smallest or largest key in the list.  (You can see
it takes Omega(n^2) time because the first n/2 levels each process a list of
length n/2 or greater.)  The recursion tree is as unbalanced as it can be.
This example shows that when the input list I happens to be already sorted,
choosing the pivot to be the first item of the list is a disastrous policy.
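Here is one way the procedure above might look in Java.  This is only a rough
sketch, not code from the course:  the class and variable names are invented,
it assumes the items are their own keys (they implement Comparable), it uses
java.util.LinkedList, and it sends keys equal to the pivot's key into I1 (the
"Quicksort on Linked Lists" section below explains a better policy).  The
pivot is chosen at random, a policy discussed in the next section.

  import java.util.LinkedList;
  import java.util.List;
  import java.util.Random;

  public class ListQuicksort {
    private static final Random RANDOM = new Random();

    public static List<Comparable> quicksort(List<Comparable> items) {
      if (items.size() <= 1) {                     // Zero- and one-item lists
        return new LinkedList<Comparable>(items);  // are already sorted.
      }
      // Choose a pivot item v at random and set it aside.
      List<Comparable> rest = new LinkedList<Comparable>(items);
      Comparable pivot = rest.remove(RANDOM.nextInt(rest.size()));

      List<Comparable> i1 = new LinkedList<Comparable>();  // keys <= pivot's
      List<Comparable> i2 = new LinkedList<Comparable>();  // keys >  pivot's
      for (Comparable item : rest) {
        if (item.compareTo(pivot) <= 0) {
          i1.add(item);              // Keys equal to the pivot's go into I1
        } else {                     // here; see "Quicksort on Linked Lists"
          i2.add(item);              // below for a better way to handle them.
        }
      }

      List<Comparable> s = quicksort(i1);          // S1
      s.add(pivot);                                // the pivot v
      s.addAll(quicksort(i2));                     // S2
      return s;                                    // S = S1 + v + S2
    }
  }

Each call does Theta(n) work partitioning and concatenating, so the total
running time is governed by the depth of the recursion tree, as the examples
above illustrate.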
Choosing a Pivot
----------------
We need a better way to choose a pivot.  A respected, time-tested method is to
randomly select an item from I to serve as pivot.  With a random pivot, we can
expect "on average" to obtain a 1/4 - 3/4 split; half the time we'll obtain a
worse split, half the time better.  A little math (see Goodrich and Tamassia
Section 11.2.1) shows that the average running time of quicksort with random
pivots is in O(n log n).  (We'll do the analysis late this semester in a
lecture on "Randomized analysis.")

An even better way to choose a pivot (when n is larger than 50 or so) is
called the "median-of-three" strategy.  Select three random items from I, and
then choose the item having the middle key.  With a lot of math, this strategy
can be shown to have a smaller constant (hidden in the O(n log n) notation)
than the one-random-item strategy.

Quicksort on Linked Lists
-------------------------
I deliberately left unresolved the question of what to do with items that have
the same key as the pivot.  Suppose we put all the items having the same key
as v into the list I1.  If we try to sort a list in which every single item
has the same key, then _every_ item will go into list I1, and quicksort will
have quadratic running time!  (See the illustration below.)

      ---------------------------
      |5 | 5 | 5 | 5 | 5 | 5 | 5|
      ---------------------------
            /               |
  -----------------------   ---
  |5 | 5 | 5 | 5 | 5 | 5|   |5|
I1-----------------------   ---
                              v

When sorting a linked list, a far better solution is to partition I into
_three_ unsorted lists I1, I2, and Iv.  Iv contains the pivot v and all the
other items with the same key.  We sort I1 and I2 recursively, yielding S1 and
S2.  Iv, of course, does not need to be sorted.  Finally, we concatenate S1,
Iv, and S2 to yield S.  (A Java sketch of this partition appears at the end of
this section.)

      ---------------------------
      |5 | 7 | 5 | 0 | 6 | 5 | 5|
      ---------------------------
        /          |          \
      ---   ---------------   -------
      |0|   |5 | 5 | 5 | 5|   |7 | 6|
    I1---   ---------------   -------I2
             v     Iv

This strategy is quite fast if there are a large number of duplicate keys,
because the lists called "Iv" (at each level of the recursion tree) require no
further sorting or manipulation.

Unfortunately, with linked lists, selecting a pivot is annoying.  With an
array, we can read a randomly chosen pivot in constant time; with a linked
list we must walk half-way through the list on average, increasing the
constant in our running time.  However, if we restrict ourselves to pivots
near the beginning of the linked list, we risk quadratic running time (for
instance, if I is already in sorted order, or nearly so), so we have to pay
the price.  (If you are clever, you can speed up your implementation by
choosing random pivots during the partitioning step for the _next_ round of
partitioning.)
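In Java, the three-list strategy is a small change to the earlier sketch.
This is again only an invented illustration (it assumes the same class,
imports, and RANDOM field as the sketch in the first section):  keys equal to
the pivot's are routed into Iv, which is concatenated into the result without
any recursive call.

  public static List<Comparable> quicksortThreeWay(List<Comparable> items) {
    if (items.size() <= 1) {
      return new LinkedList<Comparable>(items);
    }
    Comparable pivot = items.get(RANDOM.nextInt(items.size()));

    List<Comparable> i1 = new LinkedList<Comparable>();  // keys <  pivot's
    List<Comparable> iv = new LinkedList<Comparable>();  // keys == pivot's
    List<Comparable> i2 = new LinkedList<Comparable>();  // keys >  pivot's
    for (Comparable item : items) {
      int c = item.compareTo(pivot);
      if (c < 0) {
        i1.add(item);
      } else if (c > 0) {
        i2.add(item);
      } else {
        iv.add(item);                // The pivot and its duplicates need no
      }                              // further sorting.
    }

    List<Comparable> s = quicksortThreeWay(i1);   // S1
    s.addAll(iv);                                 // Iv, already in order
    s.addAll(quicksortThreeWay(i2));              // S2
    return s;                                     // S = S1 + Iv + S2
  }

On a list in which every item has the same key, i1 and i2 are always empty, so
this version finishes after a single partitioning pass instead of recursing n
times.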
Quicksort on Arrays
-------------------
Quicksort shines for sorting arrays.  In-place quicksort is very fast.  But a
fast in-place quicksort is tricky to code.  It's easy to write a buggy or
quadratic version by mistake.  Goodrich and Tamassia did.

Suppose we have an array a in which we want to sort the items starting at
a[low] and ending at a[high].  We choose a pivot v and move it out of the way
by swapping it with the last item, a[high].

We employ two array indices, i and j.  i is initially "low - 1", and j is
initially "high", so that i and j sandwich the items to be sorted (not
including the pivot).  We will enforce the following invariants.

  - All items at or left of index i have a key <= the pivot's key.
  - All items at or right of index j have a key >= the pivot's key.

To partition the array, we advance the index i until it encounters an item
whose key is greater than or equal to the pivot's key; then we decrement the
index j until it encounters an item whose key is less than or equal to the
pivot's key.  Then, we swap the items at i and j.  We repeat this sequence
until the indices i and j meet in the middle.  Then, we move the pivot back
into the middle (by swapping it with the item at index i).

An example is given below.  The randomly selected pivot, whose key is 5, is
moved to the end of the array by swapping it with the last item.  The indices
i and j are created.  i advances until it reaches an item whose key is >= 5,
and j retreats until it reaches an item whose key is <= 5.  The two items are
swapped, and i advances and j retreats again.  After the second
advance/retreat, i and j have crossed paths, so we do not swap their items.
Instead, we swap the pivot with the item at index i, putting it between the
lists I1 and I2 where it belongs.

  ---------------------------
  |3 | 8 | 0 | 9 | 5 | 7 | 4|       The subarray a[low..high]; the randomly
  ---------------------------       chosen pivot's key is 5.
  low              v    high

  ---------------------------
  |3 | 8 | 0 | 9 | 4 | 7 | 5|       The pivot is swapped with the last item;
  ---------------------------       i starts at low - 1, j starts at high.
 i                          j

  ---------------------------
  |3 | 8 | 0 | 9 | 4 | 7 | 5|       advance:  i stops at the 8 (key >= 5);
  ---------------------------       j retreats and stops at the 4 (key <= 5).
       i           j

  ---------------------------
  |3 | 4 | 0 | 9 | 8 | 7 | 5|       swap:  the items at i and j trade places.
  ---------------------------
       i           j

  ---------------------------
  |3 | 4 | 0 | 9 | 8 | 7 | 5|       advance:  i stops at the 9 and j stops at
  ---------------------------       the 0; i and j have crossed paths.
           j   i

  -----------  ---  -----------
  |3 | 4 | 0|  |5|  |8 | 7 | 9|     The pivot is swapped with the item at
  -----------  ---  -----------     index i, landing between I1 and I2.
      I1        i       I2

What about items having the same key as the pivot?  Handling these is
particularly tricky.  We'd like to put them on a separate list (as we did for
linked lists), but doing that in place is too complicated.  As I noted
previously, if we put all these items into the list I1, we'll have quadratic
running time when all the keys in the array are equal, so we don't want to do
that either.

The solution is to make sure each index, i and j, stops whenever it reaches a
key equal to the pivot's.  Every key equal to the pivot's (except perhaps one,
if we end with i = j) takes part in one swap.  Swapping an item equal to the
pivot may seem unnecessary, but it has an excellent side effect:  if all the
items in the array have the same key, half these items will go into I1, and
half into I2, giving us a well-balanced recursion tree.  (To see why, try
running the code below on paper with an array of equal keys.)

WARNING:  The code on page 530 of Goodrich and Tamassia gets this WRONG.
Their implementation has quadratic running time when all the keys are equal.

  public static void quicksort(Comparable[] a, int low, int high) {
    // If there are fewer than two items, do nothing.
    if (low < high) {
      // Choose a random pivot index from low to high.
      int pivotIndex = low + (int) (Math.random() * (high - low + 1));
      Comparable pivot = a[pivotIndex];
      a[pivotIndex] = a[high];                     // Swap pivot with last item
      a[high] = pivot;

      int i = low - 1;
      int j = high;
      do {
        do { i++; } while (a[i].compareTo(pivot) < 0);
        do { j--; } while ((a[j].compareTo(pivot) > 0) && (j > low));
        if (i < j) {
          Comparable temp = a[i];                  // Swap the items at i and j
          a[i] = a[j];
          a[j] = temp;
        }
      } while (i < j);

      a[high] = a[i];
      a[i] = pivot;                  // Put pivot in the middle where it belongs
      quicksort(a, low, i - 1);                    // Recursively sort left list
      quicksort(a, i + 1, high);                  // Recursively sort right list
    }
  }

Can the "do { i++ }" loop walk off the end of the array and generate an
out-of-bounds exception?  No, because a[high] contains the pivot, so i will
stop advancing when i == high (if not sooner).  There is no such assurance for
j, though, so the "do { j-- }" loop explicitly tests whether "j > low" before
retreating.
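As a quick check, here is a small driver for the method above (an invented
example, assuming it lives in the same class as quicksort).  It sorts one
ordinary array and one array in which every key is equal, the case where
Goodrich and Tamassia's code degrades to quadratic time.

  public static void main(String[] args) {
    Integer[] a = {3, 8, 0, 9, 5, 7, 4};
    Integer[] b = {5, 5, 5, 5, 5, 5, 5};           // all keys equal

    quicksort(a, 0, a.length - 1);
    quicksort(b, 0, b.length - 1);

    System.out.println(java.util.Arrays.toString(a));  // [0, 3, 4, 5, 7, 8, 9]
    System.out.println(java.util.Arrays.toString(b));  // [5, 5, 5, 5, 5, 5, 5]
  }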
Postscript
----------
The journal "Computing in Science & Engineering" did a poll of experts to make
a list of the ten most important and influential algorithms of the twentieth
century, and it published a separate article on each of the ten algorithms.
Quicksort is one of the ten, and it is surely the simplest algorithm on the
list.  Quicksort's inventor, Sir C. A. R. "Tony" Hoare, received the ACM
Turing Award in 1980 for his work on programming languages, and was conferred
the title of Knight Bachelor in March 2000 by Queen Elizabeth II for his
contributions to "Computing Science."