261 Scribe notes.
Date: March 3, 2021.
Scribe: Rikhav Shah
2nd half of class presenter: Mayunk

Main topic: dealing with untrusted databases.

The primary data structure in this area is the Merkle tree. In this data structure, each node contains a (collision-resistant) hash of its children, and the root of the tree is cryptographically signed. The co-path of a node X consists of all the nodes on the root path of X together with their immediate children.

There are two ways to categorize the requirements of the structure:
* readonly vs mutable
* single vs multiple clients

Consider readonly data structures (with single or multiple clients; DNS is an example of the latter). The Merkle tree works here. We can support a number of different queries:
* Find item: return the co-path of the node.
* Show no item exists: use a Merkle tree with sorted keys, and return the co-paths of the predecessor and successor values.
* Find all items in a range: again return the co-paths of the predecessor and successor.
* Store objects or key/value pairs (rather than just ints): serialize the keys and sort the Merkle tree nodes by the serialized keys.
* Store a table where each item has multiple attributes you want to query by: if there are k columns, keep k Merkle trees. In the ith tree, store key/value pairs where the key is the value in column i and the value is an id of the object (or the row number).

We can generalize the Merkle tree to support any (acyclic) *pointer*-based data structure, like a linked list, with just O(1) factor overhead: each node contains a hash of all the nodes it points to. We can also accommodate index-based data structures (like arrays) by converting them to binary trees (with log n overhead). It's unclear whether we can accommodate all pointer-based data structures with cycles (though we can handle simple structures, like a linked list where the tail points back to the head/entry).
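As a concrete illustration of the "find item" query above, here is a minimal Python sketch of a binary Merkle tree over sorted items. The helper names (`build_levels`, `prove`, `verify`), the use of SHA-256, the `0x00`/`0x01` prefix bytes that separate leaf hashes from internal hashes, and the fixed `EMPTY` padding hash are all assumptions made for this sketch and were not part of the lecture. Signing the root is omitted. The proof returned is only the sibling hash at each level, which is the part of the co-path the verifier cannot recompute on its own.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(item: bytes) -> bytes:
    # Prefix byte keeps leaf hashes distinct from internal-node hashes (assumption).
    return h(b"\x00" + item)

def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)

EMPTY = h(b"")  # hypothetical padding value for levels with an odd number of nodes

def build_levels(items):
    """Return every level of hashes, leaves first, root level last."""
    level = [leaf_hash(x) for x in items]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [EMPTY]
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]

def prove(levels, index):
    """Collect (sibling_hash, sibling_is_left) from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root, item, proof):
    """Recompute the root from the item and its proof; compare to the trusted root."""
    acc = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root

items = [b"alice", b"bob", b"carol", b"dave", b"erin"]  # sorted keys
levels = build_levels(items)
root = levels[-1][0]          # in the real scheme this value would be signed
proof = prove(levels, 2)      # proof for b"carol"
print(verify(root, b"carol", proof), verify(root, b"mallory", proof))
```

The client only needs the signed root and a logarithmic number of hashes to check an answer. A non-membership answer in the sorted tree would consist of two such proofs, one for the predecessor and one for the successor.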
There's a sub-field called "accountable data structures." Say the government is running a lottery among its citizens and wants to prove to the public that the lottery was fair. How can it do so?

Example: how Prof. Wagner's wife was able to get out of Romania. Exit visas were issued by a lottery, but the lottery turned out to be rigged. After years of never getting an exit visa, her family was finally given one. It appears this is because one of her father's patients was a party official.

The government constructs a Merkle tree where all the nodes are people who entered the lottery. It can prove that a given citizen is indeed included in the tree by providing their co-path. It can try to prove that the winning citizen was not entered more than once by providing the co-paths of their predecessor and successor and somehow guaranteeing that the citizens are sorted (so that duplicate entries would have to be consecutive). If the government samples a random number to be the winning index, it needs to prove that the number was indeed random. It could attempt to do this by using a hash of the root hash as the random value. However, it could then append dummy entries to the end of the tree to obtain new "random" values until it arrives at one that is favorable. This is a hard problem!

Now consider mutable data structures. To mutate a node, think of it as creating a copy of the node with the new value and recomputing all the hashes along the root path, which requires only the values of the hashes on the co-path.

MAYUNK'S PRESENTATION (CONIKS paper):

Say Alice wants to send a message to Bob using a particular server.

Protocol 1: Alice encrypts her message with the server's public key and sends it to the server; the server decrypts it, encrypts it with Bob's public key, and sends it to him. (This is similar to TLS.)
Downside: the server learns the message.

Protocol 2: Alice requests Bob's public key from the server.
Alice then encrypts her message with it and sends it to the server to forward to Bob.
Downside: the server can lie about Bob's key and send its own key instead, i.e., the server can mount a man-in-the-middle attack.

Protocol 3: Alice first verifies that the server gave her the right key by checking directly with Bob.
Downside: if she could do that, she might as well communicate with Bob directly!

Protocol 4: Bob requests his own key to see whether the server is being honest. If it isn't, Bob announces to the world that the key provided by the server is wrong.
Downside: the server can send Alice and Bob different answers. It sends Bob his correct key, but sends Alice a different key. This problem is called "equivocation."

Protocol 5: the "CONIKS" solution. Compute a Merkle tree over the database of public keys, and have Alice and Bob (and other users) verify just the root hashes with each other (rather than the full database). Every day, the server publishes the current version of the database, including the previous database's root hash to ensure continuity.

Suppose the server decides to equivocate: it sends everyone in group A version A of the database and everyone in group B version B. If anyone in group B then checks with anyone in group A, the users will discover the equivocation. Therefore, each user should randomly pick a handful of other users to compare databases with. Because each publication contains a hash of the previous publication, once the server decides to equivocate, it cannot merge the fork. Therefore, with high probability, the users will discover a fork after just a few days.
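The "cannot merge the fork" argument can be illustrated with a small Python sketch of a hash-chained publication log. The names `Epoch`, `publish`, `chain_is_consistent`, and `detect_fork`, the all-zero `GENESIS` value, and the stand-in root hashes are all hypothetical and chosen only for this sketch; this is not the CONIKS wire format, and signatures are omitted.

```python
import hashlib
from dataclasses import dataclass

GENESIS = b"\x00" * 32  # hypothetical "previous digest" for the first publication

@dataclass(frozen=True)
class Epoch:
    prev_digest: bytes  # digest of the previous day's publication
    root_hash: bytes    # Merkle root of that day's key database

    def digest(self) -> bytes:
        return hashlib.sha256(self.prev_digest + self.root_hash).digest()

def publish(chain, root_hash):
    """Return a new chain with one more publication linked to the last one."""
    prev = chain[-1].digest() if chain else GENESIS
    return chain + [Epoch(prev, root_hash)]

def chain_is_consistent(chain):
    """Each user checks that every publication links to the one before it."""
    prev = GENESIS
    for epoch in chain:
        if epoch.prev_digest != prev:
            return False
        prev = epoch.digest()
    return True

def detect_fork(view_a, view_b):
    """Two users compare digests for the latest day they have both seen."""
    n = min(len(view_a), len(view_b))
    return n > 0 and view_a[n - 1].digest() != view_b[n - 1].digest()

def fake_root(label: bytes) -> bytes:
    return hashlib.sha256(label).digest()  # stand-in for a real Merkle root

common = publish(publish([], fake_root(b"day 1")), fake_root(b"day 2"))
# Day 3: the server equivocates and shows the two groups different databases.
view_a = publish(common, fake_root(b"day 3, version A"))
view_b = publish(common, fake_root(b"day 3, version B"))
# Day 4: the server goes back to showing everyone the same database.
view_a = publish(view_a, fake_root(b"day 4"))
view_b = publish(view_b, fake_root(b"day 4"))

print(chain_is_consistent(view_a), chain_is_consistent(view_b))
print(detect_fork(view_a, view_b))
```

Each view is internally consistent, so neither group notices anything on its own. Even though the day 4 databases are identical, the day 4 digests differ because they chain back to different day 3 publications, so any cross-group comparison after the equivocation exposes it.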