CS 61B: Selected Solutions (Practice for the Final Exam)

[1]  [a]  insertion sort
          71808294
          17808294
          01788294
          01278894
          01247889

     [b]  selection sort
          71808294
          01878294
          01278894
          01248897
          01247898
          01247889

     [c]  mergesort
          71808294
          17,80,8294
          17,08,8294
          17,08,28,94
          17,08,28,49
          0178,28,49
          0178,2489
          01247889

     [d]  quicksort
          71808294
          21808794
          21088794
          210|4|8798
          0|12|4|8798
          012|4|7898
          012|4|7|8|98
          01247889

     [e]  heapsort
          71808294  \
          81807294  |
          82807194  |  bottomUpHeap()
          82897104  |
          82897140  /
          08897241
          01897842
          01298874
          01249887
          01247898
          01247898
          01247889

[2]  Extend each item so that it has a "secondary key," which is the index of
     the item in the initial array/list.  If two items have the same primary
     key, the tie is broken by the secondary key, so no two items are ever
     considered equal.

          5 8 7 5 8 8 3 7   =>   5/0 8/1 7/2 5/3 8/4 8/5 3/6 7/7

[3]  [a]  DFS: abdcegfhi
          BFS: abcdegifh

     [b]  DFS: efihgcdba

     [c]  gh, ac, hi, ab, cd, cg, fg, ce   (cg may come before cd instead)

[4]  [a]  [i]  O(|V|^2)     [ii]  O(|V|)

     [b]  Visited in preorder.  Finished in postorder.

     [c]  With BFS, it's done exactly the same as with DFS.  (See Homework 9
          for a description of how it's done with the latter.)

     [d]  for (each vertex v in the graph) {
            if (v has not been visited) {
              increment the count of connected components
              perform DFS on v, thereby marking all vertices in its
                connected component as having been visited
            }
          }

          This algorithm requires that you don't "unmark" the marked
          vertices between calls to DFS.  (A Java sketch of this loop
          appears after problem [6].)

[5]  [a]  =>      3
                 / \
                1   12
               / \  /
              0   2 11
                    /
                   5
                  / \
                 4   7
                    / \
                   6   9
                      / \
                     8  10

          =>      12
                  /
                 3
                / \
               1   11
              / \  /
             0   2 5
                  / \
                 4   7
                    / \
                   6   9
                      / \
                     8  10

     [b]       4.5
              /   \
             3     5
            / \     \
           1   4     11
          / \       /  \
         0   2     7    12
                  / \
                 6   9
                    / \
                   8  10

     [c]  =>      3
                 / \
                1   5
               / \ / \
              0  2 4  11
                      / \
                    10   12
                    /
                   9
                  /
                 7
                / \
               6   8

          =>      3
                 / \
                1   10
               / \  / \
              0  2 5   11
                  / \    \
                 4   9    12
                    /
                   7
                  / \
                 6   8

          =>        10
                   /  \
                  3    11
                 / \     \
                1   5     12
               / \ / \
              0  2 4  9
                     /
                    7
                   / \
                  6   8

     [d]  =>      3
                 / \
                1   5
               / \ / \
              0  2 4  11
                      / \
                     7   12
                    / \
                   6   10
                       /
                      8

          =>      3
                 / \
                1   5
               / \ / \
              0  2 4  10
                      / \
                     7   11
                    / \    \
                   6   8    12

          =>        10
                   /  \
                  5    11
                 / \     \
                3   7     12
               / \ / \
              1  4 6  8
             / \
            0   2

[6]  [a]  If the find operations are done first, then each find operation
     takes O(1) time, because every item is the root of its own tree.  No
     item has a parent, so finding the set an item is in takes a fixed
     number of operations.  Union operations always take O(1) time.  Hence,
     a sequence of n operations with all the finds before the unions takes
     O(n) time.

     [b]  This question requires amortized analysis.  Find operations can
     be expensive, but an expensive find operation is balanced out by lots
     of cheap union operations.

     The accounting is as follows.  Union operations always take O(1) time,
     so let's say they have an actual cost of $1.  Assign each union
     operation an amortized cost of $2, so every union operation puts $1 in
     the bank.  Each union operation creates a new child.  (Some node that
     was not a child of any other node before is a child now.)  When all
     the union operations are done, there is $1 in the bank for every
     child, or in other words, for every node with a depth of one or
     greater.

     Let's say that a find(u) operation costs $1 if u is a root.  For any
     other node, the find operation costs an additional $1 for each parent
     pointer the find operation traverses.  So the actual cost is $(1 + d),
     where d is the depth of u.  Assign each find operation an amortized
     cost of $2.  This covers the case where u is a root or a child of a
     root.  For each additional parent pointer traversed, $1 is withdrawn
     from the bank to pay for it.

     Fortunately, path compression changes the parent pointers of all the
     nodes we pay $1 to traverse, so these nodes become children of the
     root.  All of the traversed nodes whose depths are 2 or greater move
     up, so their depths are now 1.  We will never have to pay to traverse
     these nodes again.

     Say that a node is a grandchild if its depth is 2 or greater.  Every
     time find(u) visits a grandchild, $1 is withdrawn from the bank, but
     the grandchild is no longer a grandchild.  So the maximum number of
     dollars that can ever be withdrawn from the bank is the number of
     grandchildren.  But we initially put $1 in the bank for every child,
     and every grandchild is a child, so the bank balance will never drop
     below zero.  Therefore, the amortization works out.  Union and find
     operations both have amortized costs of $2, so any sequence of n
     operations where all the unions are done first takes O(n) time.
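     For concreteness, here is one way the structure analyzed in [6] might
     look in Java: a minimal array-based disjoint-sets sketch, assuming (as
     the analysis does) that union is always called with two roots.  The
     class and method names are illustrative, not taken from any assignment.

          public class DisjointSets {
            private int[] parent;            // parent[i] == i means i is a root

            public DisjointSets(int n) {
              parent = new int[n];
              for (int i = 0; i < n; i++) {
                parent[i] = i;               // every item starts as its own root
              }
            }

            // Make root2's tree a subtree of root1.  Always O(1): one
            // assignment, which creates exactly one new child.
            public void union(int root1, int root2) {
              parent[root2] = root1;
            }

            // Walk up to the root, then point every node on the path
            // directly at the root (path compression).  Actual cost: $1,
            // plus $1 per parent pointer traversed, as in the accounting
            // above.
            public int find(int u) {
              if (parent[u] == u) {
                return u;                    // u is a root
              }
              int root = find(parent[u]);    // traverse and compress the path
              parent[u] = root;              // u is now a child of the root
              return root;
            }
          }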
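     Similarly, the component-counting loop of [4][d] might be rendered in
     Java as below.  The Graph class here is a hypothetical minimal
     adjacency-list representation, included only so the sketch is
     self-contained.

          import java.util.ArrayList;
          import java.util.List;

          // A tiny adjacency-list graph, just enough to run the loop in [4][d].
          class Graph {
            private List<List<Integer>> adj = new ArrayList<>();

            Graph(int n) {
              for (int i = 0; i < n; i++) {
                adj.add(new ArrayList<Integer>());
              }
            }

            void addEdge(int u, int v) {     // undirected edge
              adj.get(u).add(v);
              adj.get(v).add(u);
            }

            int numVertices() {
              return adj.size();
            }

            List<Integer> neighbors(int v) {
              return adj.get(v);
            }
          }

          class ComponentCounter {
            // The loop from [4][d].  The visited marks are never reset
            // between calls to dfs.
            static int countComponents(Graph g) {
              boolean[] visited = new boolean[g.numVertices()];
              int components = 0;
              for (int v = 0; v < g.numVertices(); v++) {
                if (!visited[v]) {
                  components++;              // v starts a new component
                  dfs(g, v, visited);        // mark v's entire component
                }
              }
              return components;
            }

            private static void dfs(Graph g, int v, boolean[] visited) {
              visited[v] = true;
              for (int w : g.neighbors(v)) {
                if (!visited[w]) {
                  dfs(g, w, visited);
                }
              }
            }
          }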
[7]  [a]  Insert 4, 2, 1, 3, and 6 in that order.

     [b]  Insert 4, 5, 6, 1, 3, and 2 in that order.

[8]  [a]  If you need inexact matches.  For example, if you want to find
          the item less than or equal to 5 that's closest to 5.  Hash
          tables can only do exact matches.  (If exact matches are all you
          need, however, hash tables are faster.)

     [b]  If each single operation absolutely must run in O(log n) time.
          OR
          If most operations are find()s, and the data access patterns are
          uniformly random.  (2-3-4 trees are faster for these operations
          because they don't restructure the tree.  But splay trees do
          better if a small proportion of the items are the targets of most
          of the finds.)

     [c]  If memory use is the primary consideration (especially if a 2-3-4
          tree holding all the items won't fit in memory).

     [d]  None.  find() and remove() on a heap take worst-case Theta(n)
          time, and they're more complicated than in an unordered array.
          insert() on a heap takes worst-case Theta(log n) time, versus
          Theta(1) for an unordered array.

     [e]  When you don't need to find the minimum key.

[10] [a]            A
                 /     \
                A       E
              /   \   /   \
             C     S R     T
            / \   / \     / \
           U   D T   U   T   R

     [b]            A
                 /     \
                A       E
              /   \   /   \
             C     S R     R
            / \   / \     / \
           U   D T   U   T   T

          (Note that two nodes are different from those in [a].)

[12]      DATASTRUCTURE
          DACASTRUTTURE
          DACA|E|TRUTTURS
          DACA|E|RRUTTUTS
          DACA|E|RR|S|TTUTU
          DACA|E|R|R|S|TTUTU
                   ^

[13] Radix sort takes b/(log2 r) passes, and each pass takes O(n + r) time,
     so the overall running time of radix sort is

                          n + r
         t = b (ln 2)  ----------- .
                           ln r

     To find the value of r that minimizes t, set dt/dr to zero.

         dt                ln r - (n + r)/r
         --  =  b (ln 2)  ------------------  =  0
         dr                    (ln r)^2

     Therefore, ln r = (n + r)/r.  Given that n = 493, with a calculator
     and some trial and error you can determine that r = 128 is the optimal
     radix.  (Check:  ln 128 and (493 + 128)/128 both come to about 4.85.)
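     The trial and error can also be scripted.  Here is a short Java sketch
     that minimizes t numerically, using the cost model derived above; b is
     just a constant factor, so it drops out of the comparison, and the
     search bound of 4096 is an arbitrary choice.

          public class OptimalRadix {
            public static void main(String[] args) {
              int n = 493;
              int bestR = 2;
              double bestT = Double.MAX_VALUE;
              for (int r = 2; r <= 4096; r++) {
                // t(r) is proportional to (n + r) / log2(r).
                double t = (n + r) / (Math.log(r) / Math.log(2.0));
                if (t < bestT) {
                  bestT = t;
                  bestR = r;
                }
              }
              System.out.println("optimal radix = " + bestR);  // prints 128
            }
          }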
[14] [a]  z: Consider the case where, halfway through the sort, the last
          key in the "sorted" list is changed to a very low number.

              1 3 5 7 9|8 4 2 6 10
                  zap!
              1 3 5 7 0|8 4 2 6 10

          Using an in-place insertion sort implementation that searches
          from the end of the sorted array, the remaining keys will never
          get past the zero.

              1 3 5 7 0 2 4 6 8 10

          Note that if a key in the "unsorted" list is zapped, no harm is
          done at all.

     [b]  y: If an item in the "sorted" list is zapped, only that one item
          is affected.  If an item in the "unsorted" list is zapped to a
          value lower than the last item in the sorted list, that item will
          be out of place, but the other items are still sorted.

     [c]  z: Consider merging two lists, where the first item in one of the
          lists gets zapped to a very high value.  You'll wind up with two
          consecutive sorted portions.  (After further merge operations,
          there will still be two consecutive sorted portions.)

                /== 100 3 5 7 9 11
          merge
                \== 2 4 6 8 10 12

     [d]  y: Radix sort uses no comparisons at all, so the zapped item
          doesn't affect how the others are ordered.
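     To make [14][d] concrete, here is one pass of a least-significant-digit
     radix sort in Java: a stable counting sort on a single digit.  This is
     a generic sketch (assuming nonnegative keys and a power-of-two radix
     r), not code from the course.  Keys are used only as array indices and
     are never compared with one another, so changing one key cannot affect
     where the other keys land.

          public class RadixPass {
            // One stable counting-sort pass on the digit (key >> shift) & (r-1).
            // Assumes the keys are nonnegative and r is a power of two.
            public static int[] countingSortPass(int[] keys, int shift, int r) {
              int[] counts = new int[r];
              for (int key : keys) {
                counts[(key >> shift) & (r - 1)]++;  // histogram of digit values
              }
              int[] starts = new int[r];             // first output slot per digit
              for (int d = 1; d < r; d++) {
                starts[d] = starts[d - 1] + counts[d - 1];
              }
              int[] output = new int[keys.length];
              for (int key : keys) {                 // stable redistribution pass
                int digit = (key >> shift) & (r - 1);
                output[starts[digit]++] = key;
              }
              return output;
            }
          }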