CS 61B: Data Structures (Spring 2005)
Final Exam

### Solutions

Problem 1. (8 points) A Miscellany.

a. No. Simply setting the head field to null makes the listnodes unreachable, so a mark & sweep garbage collector can reclaim all of them.

b. This hash code does not differentiate between two different cells for which i + j is the same, so moving a chip along a diagonal does not change a board's hash code. Thus, collisions are likely.

c. Use eight hash tables--one for each category. You can tell what category a word is in (if any) simply by checking for its presence in each of the eight hash tables. (Each hash table's size needs to be chosen proportionately to the number of words in each category, so that each table will have the same load factor as one big hash table would. That way, the eight hash tables offer O(1)-time operations and take up the same amount of memory as one big table would.)

d.
Second
Fourth

(Optionally, you can also include an exception message at the end. Since this is just a code fragment, we don't know whether or not the exception is caught somewhere.)

e. Insertion sort and quicksort. (We gave a full point to people who also circled bucket sort, because a bucket sort implementation might inspect the range of the input keys before deciding how many buckets to use.)

f. 8.

Problem 2. (6 points) Operations on Data Structures.

a. b. c. Problem 3. (12 points) Sorting.

a.
5 9 7 4 0 2 8 8
5 8 7 4 0 2 9 8
5 8 7 4 0 2 8 9
0 8 7 4 5 2 8 9
0 2 7 4 5 8 8 9
0 2 4 7 5 8 8 9
0 2 4 5 7 8 8 9

b. All of the keys might go into the first list (e.g. if all the keys are equal). Then this quicksort algorithm will call itself recursively with the same list it started with, and it will repeatedly call itself forever, never terminating.

c. The worst-case running time is in Θ(n2), and the best-case running time is in Θ(n log n).

d. Replace the binary search tree with a splay tree. If the input array is already sorted, then after each insertion, the root has no right child (because the root is the largest key inserted so far). Each time a key is inserted, it becomes the new right child of the root, then it is splayed to the root with a single zig rotation. Thus, each insertion takes O(1) time.

e. If radix sort uses 512 buckets, it inspects 9 bits of each key per pass (as 512 = 29). Java ints are 32 bits long, so radix sort will need to do four passes of bucket sort or counting sort. If radix sort uses 256 buckets, it inspects 8 bits per pass, and still only needs to do four passes. However, sorting with 256 buckets is a bit faster per pass, and takes less memory, whereas there is no advantage to using 512 buckets at all. Therefore, you would never choose 512 buckets over 256 buckets for 32-bit keys.

(It's worth mentioning one misconception we saw on the exams. Some people thought that it's an inherently bad thing if the final pass of radix sort doesn't have 9 bits left to inspect. It's not. Depending on the number of items n you are sorting, it might make a lot of sense to use 2048 = 211 buckets, even though the final pass will only have 10 bits left to inspect. This is the smallest number of buckets that allows you to do just three passes, so it's the right choice for some values of n.)

f. Here are two different correct answers.

• A key observation is that bottomUpHeap is a comparison-based algorithm that transforms any array into a binary heap in O(n) time. Therefore, if the Professor is correct, then we can create an O(n)-time comparison-based sorting algorithm (for any list of keys, not just a binary heap) by putting bottomUpHeap and the Professor's algorithm together. But an O(n)-time comparison-based sorting algorithm cannot exist. (Note that this argument would not work if bottomUpHeap had Θ(n log n) worst-case running time; its linear running time is central to the argument.)
• Although the items in a heap are partially sorted, the leaves of a heap are not. For example, if all the internal nodes of a heap are zeros, and all the leaves are unique positive numbers, we can permute the leaves any way we like, and it's still a valid binary heap. At least half the nodes in a binary tree are leaves, so there are at least (n/2)! such permutations. So sorting the keys in an n-key binary heap is at least as hard as sorting n/2 keys from scratch, which takes Ω((n/2) log (n/2)) = Ω(n log n) time.
(A lot of exam-takers had the misconception that it doesn't mean anything for a list of keys to be ``partly sorted,'' and that the Ω(n log n)-time bound for comparison-based sorting applies even to partly sorted keys. But remember, we can merge two sorted lists of keys into a single sorted list in O(n) time; that's an example of beating the Ω(n log n) bound with keys that are ``partly sorted.'' So the Professor's claim wasn't completely far-fetched, and it requires a thoughtful argument to debunk it.)

g. This problem is known as topological sorting of a directed acyclic graph. The key observation is that we can only output a vertex after all the vertices that point to it have already been written to the output. So assign each vertex two fields: indegree, which records the indegree of the vertex (ignoring self-edges), and prerequisites, which records the number of vertices that point to this vertex and have been written to the output. We can initialize every prerequisites field to zero, and initialize every indegree field by looping through all the edges once. Then we create a list L, which will contain all the vertices for which indegree = prerequisites. Initialize L with all the vertices that have an indegree of zero. Repeat the following step until L is empty: remove a vertex from L, write it to the output, and increment the prerequisites field of every vertex in its adjacency list.

Problem 4. (5 points) Augmented Binary Search Trees.

There are several ways to answer this problem, and the three parts of the solution need to match each other, so here's just one possibility.

a. At each node, store the size of the subtree rooted at that node.

b. Search for the key k (in the usual manner for the find method in a binary search tree). As you walk down the tree, at every node n where you go to n's right child, add the size of the subtree rooted at n's left child to a running total, plus one for n.

When you reach the node that contains k, treat it the same way: add the size of the subtree rooted at that node's left child, plus one for k. Return the sum.

c. First, insert the new key k as usual. If k is already in the tree, do nothing. Otherwise, create a new leaf for k and set its size field to one. Walk back up the tree (or down the tree a second time) and increment the size field of all k's ancestors.

Problem 6. (6 points) Asymptotic Analysis.

a. O(x log y + 4y). (Important note: you cannot replace 22y with 2y. Constants in exponents cannot be tossed out.)

b. We divide the proof into two separate cases: either xy, or x < y.

In the first case, we have

x2 + 10xy + y2
x2 + 10x2 + y2
≤ 11 (x2 + y2)
for all values y ≥ 0 and xy, so x2 + 10xy + y2 ∈ O(x2 + y2) in this case.

The second case, where x < y, is symmetric.

x2 + 10xy + y2
x2 + 10y2 + y2
≤ 11 (x2 + y2)
for all values x ≥ 0 and yx, so x2 + 10xy + y2 ∈ O(x2 + y2) in this case too.

Problem 6. (6 points) Amortized Analysis.

a. The smallest possible amortized cost of enqueue is \$2. For dequeue and multiDequeue, we accepted answers of \$0 or \$1.

b. This accounting ensures that the following invariant always holds: the number of dollars in the bank is (at least) the number of items in the queue. We can verify that each operation maintains this invariant as follows.

An enqueue operation increases the size of the queue by one item. It costs one actual dollar and puts one dollar in the bank, which is enough to maintain the invariant.

A dequeue operation decreases the size of the queue by one item. Thus, the operation can withdraw one dollar from the bank, which is enough to pay for the operation's actual cost, giving an amortized cost of zero. (Some people charged an amortized cost of \$1 to cover the cost of throwing an exception if the queue is empty. We weren't picky about this when grading.)

A multiDequeue operation decreases the size of the queue by either its parameter i or the number of items in the queue, whichever is less. The actual cost of the operation is also the lesser of these two, so the operation withdraws from the bank exactly the number of dollars needed to cover its actual cost. Thus, the amortized cost is zero. (Again, one might charge \$1 amortized to cover exceptions on an empty queue.)

By induction on the sequence of operations, all the operations maintain the invariant that the number of dollars in the bank is (at least) as great as the number of items in the queue. Therefore, the bank balance never becomes negative.

c. No, we cannot. If you start with an empty queue and call multiEnqueue(item, i) once, it spends Θ(i) time inserting i copies of item. Since only one operation has taken place since the empty queue was created, its amortized (average) running time is in Θ(i). Since i can be arbitrarily large, the amortized running time is not in O(1).

Problem 7. (7 points) Index Sort.

```
public static Thing[] indexSort(Thing[] x) {     // Returns sorted copy of x.
Thing[] y;
y = new Thing[x.length];

for (int i = 0; i < x.length; i++) {
int index = 0;
for (int j = 0; j < x.length; j++) {
if ((x[j].key < x[i].key) || ((x[j].key == x[i].key) && (j < i))) {
index++;
}
}
y[index] = x[i];
}

return y;
}
```

Mail inquiries to cs61b@cory.eecs