CS 261 Homework 2

Instructions

This problem set is due Friday, October 7, at 5:59pm.

Work on your own for this homework. You may use any source you like (including other papers or textbooks), but if you use any source not discussed in class, you must cite it.

You have three options on this homework. Select one option, and solve that one. Submit your homework solution by emailing it to cs261hw2@taverner.cs.berkeley.edu.

Option 1: Analyze three HW1 submissions

You've been assigned three htmlfilter implementations that were submitted to me in HW1. Your goal: assess whether these implementations meet the security goals set out in HW1. (You do not need to review how well they meet the functionality requirements.)

To begin, I will email you your assigned implementations. Implementations are identified by a two-digit code (e.g., 17.tar); I will assign you three of those implementations. Download those three implementations from this directory. Critique the design and implementation of all three.

1.1: What is the two-digit ID number of your first assigned implementation? What are its main security weaknesses? Or, if you found none, what are the best features of its design/implementation?
1.2: What is the two-digit ID number of your second assigned implementation? What are its main security weaknesses? Or, if you found none, what are the best features of its design/implementation?
1.3: What is the two-digit ID number of your third assigned implementation? What are its main security weaknesses? Or, if you found none, what are the best features of its design/implementation?
1.4: If you were forced to choose between these three implementations, which one would you judge to be most likely to meet its security goals? Why?

Promise: Your answers on this homework will not affect the grades of anyone else. You can feel free to critique an implementation honestly and frankly without fearing that your comments will have any negative effect on that person's grade. Grades for HW1 will have already been assigned by the time I see your solution. I will not show your evaluation to the authors of your assigned implementations.

Update (10/7): Please do not use the solutions to HW1 or its testbed as a substitute for your own analysis.

Option 2: Solve some thought problems

Question 2.1

Suppose I come up with a super-sekrit ultra-c00l new attack on browsers. In particular, I have a special URL, and if I can get your browser to visit that URL, then you are totally owned: I can take control of your browser. List three (or more) ways I could cost-effectively get a large number of users to follow a link to my special URL.

Question 2.2

One of your officemates, Bob, keeps playing pranks on you. You decide to return the favor and play a prank on him. You've hatched a plan to spread a rumor that Bob's advisor has decided to quit academia and go to work in the Peace Corps. Knowing Bob, you're pretty sure that when Bob hears this rumor, at first he is going to dismiss it as not believable -- but if, the next time Bob visits his advisor's web page, the page includes a message announcing his advisor's impending retirement, Bob is gonna freak out.

Describe a method you could use which would probably be successful in causing Bob to freak out. (Do make sure to apologize to Bob afterwards and buy him a good dinner in recompense!)

Question 2.3

This question asks you to explore some of the consequences of active networks, where packets can contain mobile code that is executed by the routers along the path.

For concreteness, we can think of "adaptive routing" as a sample application. If your TCP connection to France is too slow because of poor bandwidth on the transatlantic link and for some reason you happen to know that there is a much faster route to France via China, you might wish to adaptively update the route your TCP packets take. In this case, you would "push" some mobile code into each router along the way. The mobile code would run at each router before the packet is forwarded and select which interface to send it out over.

We describe below a series of extensions to the IP protocol suite which allows for progressively more sophisticated active networks applications. For each of the four parts below, list the security threats that might arise for that extension. The purpose of this question is to study issues that are inherent in the functionality; you may ignore the risk of implementation bugs such as buffer overruns.

In the simplest variant, we'd extend the IP packet format to allow an optional extra header which contains some mobile code to run at each router. The mobile code is specified using Native Client (NaCl) object code: i.e., in compiled code that has been SFI'ed according to the NaCl scheme. Each router which receives such a packet first verifies that the mobile code has been correctly sandboxed using the NaCl SFI rules, but with one additional restriction: the code must contain no backwards jumps or indirect jumps and no function calls or returns (the only branches allowed are a forward branch to a fixed address). The router sets up the NaCl code in memory and copies into its data region (1) a copy of the entire packet, and (2) a global list of interfaces available at the router. (Each interface in the list is annotated with a little bit of relevant information that can be read by the NaCl code, such as the IP address of the next hop along that interface.) The router then executes the NaCl code. Just before exiting, the NaCl code should store the name of the desired outbound interface at a fixed location in its data region. The router will forward the packet out via that interface on towards its destination.
One obvious performance issue with the previous scheme is that it requires an overhead of potentially hundreds of bytes of code in every packet. So we introduce the notion of "flows" to amortize the cost of specifying the mobile code. Each packet is associated with a flow. In TCP, the flow ID might be the (src host, dst host, src port, dst port) tuple. For other protocols, we might simply extend the packet format to allow for a 32-bit flow ID. We add a "set handler" IP option which allows endpoints to specify a single chunk of mobile code which will be run at the router every time a packet is received on the same flow. Thus one endpoint can send a packet with the "set handler" IP option and containing a lengthy chunk of mobile code; that mobile code will then be applied to all subsequent packets on that flow, and does not need to be sent again. This allows us to specify a chunk of mobile code once; then all subsequent packets in the flow will inherit the same code without incurring any bandwidth overhead.
It occurs to us that we might like to allow the mobile code to make routing policy decisions based on the payload of the packets, or even to compress packets for us on the fly when bandwidth is scarce. Since this might require scanning the entire packet and possibly interpreting higher-level protocols, we will need to be able to write loops in the mobile code. Therefore, we eliminate the restriction on backwards jumps, and allow arbitrary control flow in the mobile code (subject to the NaCl SFI rules). To implement compression, the mobile code can modify its copy of the packet. When the mobile code exits, the router will read the (possibly modified) packet from the mobile code's data region, and then use that in place of the packet it originally received: i.e., if the mobile code modifies its copy of the packet, then the router will forward the modified packet, not the original packet. Also, we allow mobile code to maintain state across packet reception events. Thus, when mobile code exits after processing one packet, the router makes a copy of the entire data region, and restores this copy before executing the mobile code for the next packet in the flow.
An astute reader points out that decompression may increase the size of a packet. If the decompressed packet exceeds the network's MTU, mobile code that receives a compressed packet may need to send multiple packets containing decompressed data. Therefore, we extend the scheme so that mobile code can construct whole IP packets in its own memory space and invoke a special operation to send those packets over the wire.
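
To make the control-flow restriction in the first variant concrete, here is a minimal sketch of a check that accepts only forward branches to fixed addresses. It uses a made-up toy instruction set (the opcodes "nop", "jmp", and "halt", and the names Instr, forward_only, and steps_to_exit are all assumptions for illustration); it is not NaCl and does not implement the NaCl SFI rules. The point it illustrates is that code passing such a check executes each instruction at most once, which the third variant gives up when it allows backwards jumps.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str          # hypothetical opcodes: "nop", "jmp", "halt"
    target: int = 0  # absolute instruction index, used only by "jmp"


def forward_only(program: list[Instr]) -> bool:
    """Return True if every branch targets a strictly later instruction."""
    for index, instr in enumerate(program):
        if instr.op == "jmp":
            # A target equal to len(program) means "jump past the end", i.e. exit.
            if not (index < instr.target <= len(program)):
                return False
        elif instr.op not in ("nop", "halt"):
            return False  # reject anything outside the toy instruction set
    return True


def steps_to_exit(program: list[Instr]) -> int:
    """Run a verified toy program and count the instructions executed."""
    assert forward_only(program)
    pc = steps = 0
    while pc < len(program):
        instr = program[pc]
        steps += 1
        if instr.op == "halt":
            break
        pc = instr.target if instr.op == "jmp" else pc + 1
    return steps
```

Because the program counter only ever increases, steps_to_exit can never return more than len(program) for a program that passes forward_only.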

Option 3: Do a security review of some software

For this question, you will practice doing a security evaluation of some interesting program. Pick any open-source application where security is relevant. The only requirement is that it must consist of at least 2000 lines of code. If you're at a loss for a program to audit, good choices might include network daemons, standard Unix utilities, setuid programs, clients that process data that came from the network (e.g., MP3 players, image viewers, etc.), or any web application. Or, you could look on Google Code project hosting for a software program to audit. Try not to pick an application that is too large or complex; a smaller, well-defined piece of software will make your life easier for this problem.

Then, do all of the following.

3.1: What application did you pick? Which version? Give the URL where I can download the source. How many lines of code is it? (One simple way to count the number of lines is using the sloccount program. For C programs, another way is to run "find . -name '*.[ch]' -print | xargs wc -l" from within the source directory.) Include all of this in your write-up.
3.2: Spend 1-2 hours familiarizing yourself with it: e.g., how to run it, what it does, reading documentation. Then, based upon your understanding of the program, describe the threat model and security goals for your program in your write-up.
3.3: Spend 1-2 hours understanding the architecture and organization of your program. (You may have to browse source code for this, because it is often not documented.) Draw a diagram depicting the program architecture at a high level. Your diagram might show one or more important components of the program and how they interact. It should also show any untrusted external entities or data that the program interacts with, any network communication channels it uses, and any potentially sensitive or untrusted files (or other data containers) that it stores or reads on persistent storage. Your diagram does not need to be complete; do your best to identify the most interesting or security-relevant parts, stopping after 1-2 hours. Include your diagram with the write-up you turn in for this homework.
3.4: Based on the diagram you drew and your examination so far, what portions of the program seem most likely to have the highest risk of security holes?

(If you want some tips on how to build a diagram and identify security risks from it, you could check out Microsoft's Introduction to threat modelling or the STRIDE methodology.)
3.5: Select one high-risk portion of the code (from your analysis in 3.4). Ideally, you'd choose a subset of at most a few hundred lines of code of the program. Spend 1-2 hours reading the code from this portion of the program, looking to see whether the program meets its security goals. This should include an intensive read of the source code, looking for common implementation errors. You don't have to audit all the high-risk code; instead, stop after 1-2 hours, and describe what you did review.
3.6: Write a summary of the results of your audit. Did you find any security holes or fishy-looking code? Could the code have been structured better for security? Describe your findings. Attach a copy of your summary to your write-up for the homework.

(To get the idea of what a summary for 3.6 might look like, you can refer to a report I wrote a while back, when reviewing a piece of code for fun.)
3.7: If you found any security holes, notify the code maintainer as well. Attach to your write-up a copy of your bug report you sent to the code maintainer and (if available) a URL in their bug tracker where I can view the report.

Turn in the entire write-up.
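
For the line count asked for in 3.1, if neither sloccount nor the find/wc pipeline is handy, a rough equivalent can be sketched in Python. This is only an illustration: the function name count_c_lines is made up, and like "wc -l" it counts newline characters in .c and .h files rather than logical lines of code, so its totals will differ from sloccount's.

```python
from pathlib import Path


def count_c_lines(root: str) -> int:
    """Count newline characters in all .c and .h files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in (".c", ".h"):
            with path.open("rb") as handle:
                # Read in chunks so large files are not loaded all at once.
                for chunk in iter(lambda: handle.read(65536), b""):
                    total += chunk.count(b"\n")
    return total
```

Calling count_c_lines(".") from within the source directory gives a figure comparable to the total reported by the wc pipeline.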