CS 261 Homework 1
Instructions
This problem set is due Friday, 14 September.
Work on your own for this homework.
You may use any source you like (including other papers or textbooks),
but if you use any source not discussed in class,
you must cite it.
Question 1
This homework asks you to design and implement an HTML filter so
that I can safely view untrusted HTML content.
This method must not harm my machine even though these pages
come from an untrusted source, and even though my web browser is too
complex for me to have full faith in its ability to safely handle
totally untrusted HTML pages.
The filter might be used, for instance, to sanitize
HTML email before displaying it.
You're going to write me a sanitizing filter that I can use
something like this:
./htmlfilter < scarystuff.html > safe.html
firefox safe.html
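As a hedged illustration of this stdin-to-stdout interface, the Python sketch below is about the most conservative filter one could plug into the pipeline: it escapes every character, so the browser renders the input as plain text and never interprets a tag, attribute, or script. The function names `escape_all` and `main`, the page wrapper, and the assumption that input is UTF-8 are illustrative choices, not requirements of the assignment.

```python
#!/usr/bin/env python3
import html
import sys


def escape_all(untrusted):
    """Wrap the untrusted text in a page that displays it verbatim.

    html.escape(..., quote=True) rewrites &, <, >, " and ' as character
    references, so no tag, attribute or script from the input survives.
    """
    return ("<!DOCTYPE html>\n"
            '<html><head><meta charset="utf-8"><title>filtered</title></head>\n'
            "<body><pre>" + html.escape(untrusted, quote=True)
            + "</pre></body></html>\n")


def main():
    # Assumption: input is UTF-8. Malformed bytes become U+FFFD, not a crash.
    untrusted = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    # Write UTF-8 bytes to match the <meta charset> declared in the wrapper.
    sys.stdout.buffer.write(escape_all(untrusted).encode("utf-8"))


if __name__ == "__main__":
    main()
```

Saved as an executable file named htmlfilter, this would work with the two commands above. It sits at the secure end of the spectrum: you see the raw markup as text, but nothing in it is ever interpreted.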
I have two goals:
- Security: This procedure must not, under any circumstances, cause any harm
to my system.
Ideally, using this procedure to view HTML files should be as harmless
as viewing an ASCII text file with, say, /bin/more;
note that even if an attacker supplies the entire contents of an ASCII
email, viewing it with /bin/more cannot harm my machine,
so /bin/more is in some sense the gold standard.
In particular, viewing untrusted content using your HTML filter and my
favorite web browser should not cause any lasting side effects to my machine;
it should not leak any confidential information (e.g., the contents of
files on my hard disk; or, information about what I'm viewing in another
window with the same browser); and it should not endanger the integrity
of my machine (e.g., tampering with a different web document that I'm
viewing in another window using the same browser).
Your scheme must not only be secure; it must also be verifiably secure.
You will have to provide an assurance argument explaining why it is
reasonable to believe that your filter achieves this goal.
- Functional: In an ideal world, your filter would allow me to view as much
of the HTML content as possible, except where this would conflict with the
previous requirement, in which case security is more important than functionality.
For instance, a filter that ignores its input and always outputs the
empty HTML page is not very useful.
Thus, your solution should be at least minimally useful for viewing
the textual content of HTML emails.
However, I don't really care whether I get to see pretty pictures,
dancing pigs and other fancy decorative stuff or not.
Also, feel free to keep your implementation simple and to omit support
for complex functionality.
This is intended only as a proof of concept exercise.
To keep this homework problem tractable,
you can err on the side of omitting functionality in your
implementation (though it would be nice if your approach could
be generalized to support as much functionality as possible).
Security matters more than functionality; my threshold for security will
be pretty high, while my threshold for functionality will be very low.
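To make this trade-off concrete, here is a minimal Python sketch of a whitelist-based filter built on the standard library's `html.parser.HTMLParser`. It keeps text and a small, hypothetical set of formatting tags, emits those tags with no attributes at all, discards the contents of `<script>` and `<style>`, and silently drops every other tag, comment, and declaration. The tag list and the names `WhitelistFilter` and `sanitize` are assumptions for illustration; any real submission would still need its own assurance argument.

```python
import html
from html.parser import HTMLParser

# Hypothetical policy: a small whitelist of formatting tags, always emitted
# with no attributes. Tags not listed are dropped, but their text is kept.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
                "h1", "h2", "h3", "pre", "blockquote"}
VOID_TAGS = {"br"}                   # never given a closing tag
DROP_CONTENT = {"script", "style"}   # content is code or styling, not text


class WhitelistFilter(HTMLParser):
    def __init__(self):
        # convert_charrefs=True: entities in text arrive already decoded;
        # handle_data re-escapes everything on output.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attrs are discarded on purpose

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=True))

    # Comments, declarations and processing instructions fall through to
    # HTMLParser's default no-op handlers, so they never reach the output.


def sanitize(untrusted):
    """Return an HTML fragment containing only whitelisted, attribute-free tags."""
    f = WhitelistFilter()
    f.feed(untrusted)
    f.close()
    return "".join(f.out)
```

For example, `sanitize('<p onclick="x()">Hi <script>evil()</script><b>there</b></p>')` returns `'<p>Hi <b>there</b></p>'`. Listing what is allowed, rather than trying to enumerate known-bad constructs, means a tag or attribute the filter has never heard of is dropped by default, which keeps the assurance argument short.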
I want you to come up with a design, implement it, document your basic
architecture and assurance argument, and submit both the document and the
code.
Your submission should contain at least three files:
- README: Document the basic architecture you've used and the theory of operation
for your scheme.
Sketch the assurance argument explaining why one should expect your scheme
to be secure.
This should be an ASCII text file, and it doesn't have to be too
lengthy; a page or so should be enough.
You might want to describe both the policy you are enforcing
(e.g., the restrictions you're trying to place on the HTML content)
as well as the method you're using for enforcing that policy
(e.g., the implementation strategy for ensuring that the restrictions are
fully and accurately enforced).
- Makefile: A Makefile with everything needed to compile your program.
If I run make, it should compile your program and generate an executable
file called htmlfilter in the current directory.
This program should read an untrusted HTML file from stdin and write a
sanitized HTML file to stdout.
- Source files: Include any source files needed to build the executable.
Don't include the executable itself; I will run make myself.
You can use pretty much any well-supported language you like
(e.g., C, C++, Java, Perl, Python, Ruby, ML, OCaml, bash script)
as long as it will work on my Linux system.
However, to avoid any difficulties,
please take care to make your program as portable as possible.
I encourage you to test your code on the EECS instructional Linux
servers (ilinux1.eecs.berkeley.edu, ilinux2.eecs.berkeley.edu,
etc.).
From within the directory where the above files are found, run
tar cf your-lastname.tar .
Then, email this file as an attachment to cs261hw1
at taverner.cs.berkeley.edu by the due date.
Because I will be using automated scripts to run your programs,
I ask you to follow the above instructions carefully.
For an example of the required format, see this reference code: ref.tar.
Feel free to keep your implementation simple.
If you are writing more than a few hundred
lines of code, you're probably working too hard.
Some hints: You may want to review an HTML primer or reference document
to refresh your memory about the format of HTML and the semantics of various
aspects of HTML. You'll probably need to do something about JavaScript and
other executable content, as by default it can cause side effects and
violate the security policy outlined above.
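One place this hint matters, if a filter chooses to keep hyperlinks, is attribute values: an `href` such as `javascript:...` runs script when clicked. A minimal hedged sketch in Python: accept an `href` only if the whole value matches a conservative `http`/`https` pattern, re-escape it on output, and discard every other attribute. The names `safe_href` and `render_anchor` and the exact character set are assumptions. The sketch also assumes attribute values arrive already entity-decoded, as `html.parser.HTMLParser` delivers them, so an encoded trick like `jav&#x61;script:` is checked in its decoded form.

```python
import html
import re

# Hypothetical policy: an href is kept only if the whole value is a plain
# http(s) URL built from RFC 3986 characters. Whitespace, control characters,
# double quotes and angle brackets are not in the set, so values containing
# them are rejected outright rather than cleaned up.
_SAFE_URL = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+",
                       re.IGNORECASE)


def safe_href(value):
    """Return the URL escaped for a double-quoted attribute, or None."""
    if value is None or not _SAFE_URL.fullmatch(value):
        return None
    return html.escape(value, quote=True)


def render_anchor(attrs):
    """Build an <a> start tag from HTMLParser-style (name, value) pairs.

    Only a vetted href survives; onclick, style, target, etc. are dropped.
    """
    for name, value in attrs:
        if name == "href":
            href = safe_href(value)
            if href is not None:
                return f'<a href="{href}">'
    return "<a>"
```

Whitelisting the scheme, instead of searching for the string `javascript:`, also rejects `vbscript:`, `data:`, and variants padded with whitespace or mixed case, without the filter having to know about them.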