CS 294 Projects

General

Your term project should address some issue relevant to this class. There are two categories of projects: study projects and research projects. You will also be expected to give a short presentation on your project in class.

Collaboration

Study projects will be done individually. If you want to do a research project, you can work in teams of two if you like (this is intended to make up for the fact that research is likely to be harder than independent study).

Topics

Project topics should be relevant to this class. This will be broadly construed, and you're more than welcome to pick a topic relevant to your current research, so long as it has some connection to the class and is approved by me. The idea is to give you a lot of freedom to study any topic of interest to you that wasn't presented in class.

If you're at a loss for a project topic, I've prepared a list of possible project topics that you can peruse as examples of how to pick a suitable project. See below. But don't feel limited to these suggestions! They are intended only as examples.

If you just want to peruse the literature, a good place to start is the conference proceedings of the Fast Software Encryption workshop, and (to a lesser degree) CRYPTO and EUROCRYPT. These are all available in one convenient location in the Engineering library. Some papers can be found online at Springer's LINK service, through the library's INSPEC database (use Melvyl online), or at Citeseer.

Proposals

When you have chosen a project topic, please send me email at daw@cs.berkeley.edu describing your proposal. The email should contain the title of the project and a short (one-paragraph) description of the topic. For study projects, please also include the list of papers you are planning to read.

Your project proposal is due April 12th.

Presentations

You will give a brief presentation during the last two weeks of class. Think of this as a chance to teach the rest of the class about what you learned. The time slots will be necessarily brief (probably about 10 minutes each). I will assign the time slots (possibly giving some small reward to people who pick the most challenging project by allowing them to present during the second week rather than the first).

Final reports

The final report is due May 20th. This is a strict deadline. You may submit your project report electronically or on paper; I prefer electronic submission. In either case, the deadline is the same: Monday, May 20, before 9:00am.

If you submit the final report electronically, it must be in a format which is easily readable on Unix platforms: that means HTML, Postscript, or PDF is fine (but not Microsoft Word). If you submit on paper, place it in David Wagner's mailbox in Soda Hall (in the mailroom, or outside his office: 765 Soda).

Form of the final report

There is no page limit (either minimum or maximum), and reports will be evaluated on technical content (not on length), but I expect that a typical report would be about 5 to 10 pages long.

You may find it convenient to write your report in LaTeX, which includes support for a good deal of mathematics. LaTeX is present on most Unix systems and is part of standard Linux distributions. There are several free ports of TeX and LaTeX to Windows, including MiKTeX. You can also write up the project report with some other word processor, provided mathematical notation comes out reasonably readable.

Advice on writing

If you are not familiar with writing papers in computer science (or even if you are), the following resources may help:

Example study projects

Higher-order differential cryptanalysis
Learn about this algebraic attack, and discuss its relevance to cryptanalysis of block ciphers (or stream ciphers, or public-key ciphers).
Berlekamp-Massey and continued fractions
The Berlekamp-Massey algorithm for cryptanalysis of LFSR's has a close connection to continued fractions in the ring of polynomials GF(2)[t]. Study and explain. Discuss the security of the following simple stream cipher: the key is two n-bit relatively-prime integers p,q, and the keystream output is given by the binary expansion of p/q.
Hash functions
Learn about the design of cryptographic hash functions. For instance, you might read about the NSA-designed hash function SHA, and learn about why they published a minor tweak a few years after its design. (The original hash is now commonly known as SHA0, and the new one as SHA1. See Chabaud and Joux's paper in CRYPTO'98 for more.) Or, you might learn about the history of MD2, MD4, and MD5, and cryptanalysis of these schemes (particularly Hans Dobbertin's work).
Non-traditional public-key encryption
There have been lots of public-key cryptosystems proposed whose security is based on number theoretic problems like factoring or the discrete log. But what if these number theoretic problems turn out to be easy? We saw a few algebraic schemes in class, and you might survey all such schemes that have been proposed, which have been broken (by what types of attacks), and which remain unbroken. Or, if you're more ambitious, you could survey all non-number-theoretic schemes (e.g., braid groups, NTRU, error-correcting codes, lattices, etc.).
Historical ciphers
Examine work on the security of various historical ciphers (such as the German Enigma). What lessons do they have for modern cryptography?
Hidden Markov models
Learn about HMM's. What applications to cryptography or cryptanalysis might they have?
Statistics
Learn about some interesting subject in statistics of relevance to cryptography. One example might be hypothesis testing, an area that has been well studied and that has obvious relevance to cryptography (whenever we ask that a cryptosystem be indistinguishable from an idealized version given at most q observations, we are asking that hypothesis testing be difficult for the hypotheses H0 = real cipher, H1 = ideal cipher).
Hands-on cryptanalysis
Pick some attack (e.g., differential cryptanalysis), pick a cipher that is vulnerable to it (e.g., DES reduced to 8 rounds), and implement the best key-recovery attack you can on that cipher. Be sure your implementation is able to recover keys, not just distinguish the cipher from random. Test your implementation, and compare its observed performance to the theoretical predictions.
Independent verification of security claims
Pick some cryptographic primitive proposed in the literature in the past ten years. (I suggest looking at unbroken submissions to the AES or NESSIE competitions, but you could look elsewhere if you like.) The designers have probably made lots of verifiable claims about its security, such as that there is no differential characteristic with probability larger than 2^-64 for more than 8 rounds. Pick one, and independently verify it. (If necessary, you can verify some weakened, but still interesting, version of their claim.) What algorithms did you need to use?
Implementation attacks on cryptosystems
Learn how implementations (rather than algorithms) can fail. For instance, you might learn about timing attacks, power analysis, TEMPEST/HIJACK/NONSTOP, or other side-channel attacks. Or, you might read up on fault attacks and other active modification attacks.
Starting points for further reading: Ross Anderson's Security Engineering, Paul Kocher's work, Boneh, DeMillo, and Lipton's work.
Solving systems of probabilistic linear equations
Consider a system of m linear equations in n unknowns, with the twist that each equation only holds with some probability p. You may assume that the probability p is known. If you like, you can consider the general case where p may be different for each equation, but the special case where p is the same for all equations may be of particular interest. Survey known algorithms for this problem and its relevance to cryptography. You might look at algorithms for finding low-weight codewords in random linear codes (with relevance to some public-key systems, like McEliece's scheme). You might look at algorithms for learning parity with noise (possible applications to fast correlation attacks on filtered LFSR's?). You might look at ad-hoc algorithms used in fast correlation attacks on filtered LFSR's with low-weight feedback polynomials, and the special case of systems of sparse linear equations. You might even compare to theoretical algorithms for reconstructing a multivariate polynomial q given a noisy oracle for q (e.g., Goldreich, Rubinfeld, Sudan, FOCS'95); in the degree 1 case, this is a related problem. You could also look at the information-theoretic lower bound on when a unique solution is likely to exist, ignoring the problem of how to find it efficiently.
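To make the problem statement above concrete, here is a minimal Python sketch, assuming the equations are over GF(2) and that p > 1/2 is the same for every equation. The function names (make_noisy_system, count_satisfied, brute_force_solve) are invented for this illustration. The solver is plain exhaustive search over all 2^n candidates, which is only a baseline for small n and not one of the algorithms you would survey.

```python
import itertools
import random


def make_noisy_system(secret, m, p, rng):
    """Return m equations (a, b) over GF(2); each satisfies <a, secret> = b
    with probability p (assumption: the same p for every equation)."""
    n = len(secret)
    system = []
    for _ in range(m):
        a = [rng.randrange(2) for _ in range(n)]
        b = sum(ai & si for ai, si in zip(a, secret)) % 2
        if rng.random() >= p:  # with probability 1 - p the equation is wrong
            b ^= 1
        system.append((a, b))
    return system


def count_satisfied(system, x):
    """Number of equations (a, b) in the system with <a, x> = b over GF(2)."""
    return sum(
        1 for a, b in system
        if sum(ai & xi for ai, xi in zip(a, x)) % 2 == b
    )


def brute_force_solve(system, n):
    """Try all 2^n candidates and return the one satisfying the most equations.
    For p > 1/2 this is the maximum-likelihood guess; cost is exponential in n."""
    candidates = itertools.product((0, 1), repeat=n)
    return max(candidates, key=lambda x: count_satisfied(system, x))
```

With m much larger than n and p well above 1/2, the true solution typically satisfies about pm equations while every other candidate satisfies about m/2, so the exhaustive search usually recovers it. The interesting question for a project is how to do better than 2^n work.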

Example research projects

Berlekamp-Massey for non-consecutive bits of known text
Given 2n consecutive bits of output from an n-bit LFSR, the Berlekamp-Massey algorithm efficiently infers the LFSR parameters. Since the initial state and feedback polynomial are both secret, there are 2n bits of key, and Berlekamp-Massey is effectively optimal. The research problem: can this be extended to the case of output bits at arbitrary locations (not necessarily consecutive)?
Implementation failures in cryptography
Investigate implementation flaws in cryptosystems (see above for examples). Two ideas:
Time-space tradeoffs for non-uniform distributions
If we have a function f:X->Y and we're given y=f(x) for some unknown x distributed uniformly on X, Hellman's time-space tradeoff gives a way to do a lengthy precomputation (based only on f) that later allows us to quickly invert f (to compute x from y). Extend his result to be more efficient in the case where x has some non-uniform (but known) distribution on X. Is there a single parameter of the distribution that characterizes how well the generalized time-space tradeoff performs?
Lower bounds on time-space tradeoffs
Can you prove any lower bounds on precomputation algorithms, like Hellman's? You might consider algorithms that are given f, run in time T, output a table of space S, then later are given f(x) and find x in time T' and with probability p. For what parameters T,T',S,p are solutions possible? One way to formalize this might be to consider algorithms that are given only oracle access to f. The natural distribution on f is probably that of a function chosen uniformly at random from X to X, but you could potentially consider other distributions, such as a random permutation on X.
Cryptanalyze a chaining mode
Analyze the security of EPBC, a chaining mode from Crypto & Coding '97 (LNCS 1355, Springer-Verlag).
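For the Berlekamp-Massey research project listed above, it may help to have the standard consecutive-bit algorithm in hand before trying to generalize it. The following is a minimal Python sketch of the textbook Berlekamp-Massey algorithm over GF(2). The helper lfsr_output and the tap convention (each new bit is the XOR of earlier bits at the given offsets) are assumptions made for this illustration. It handles only consecutive output bits; the non-consecutive case is the open problem.

```python
def lfsr_output(taps, state, count):
    """Generate count bits; each new bit is the XOR of the bits t positions
    back, for every t in taps (illustrative convention, not from the text)."""
    out = list(state)
    while len(out) < count:
        bit = 0
        for t in taps:
            bit ^= out[-t]
        out.append(bit)
    return out[:count]


def berlekamp_massey(bits):
    """Return (L, c): the length L of the shortest LFSR generating bits, and
    its connection polynomial c[0..L] with c[0] = 1, meaning
    bits[i] = c[1]&bits[i-1] ^ ... ^ c[L]&bits[i-L] for all i >= L."""
    n = len(bits)
    c = [1] + [0] * n  # current connection polynomial
    b = [1] + [0] * n  # connection polynomial before the last length change
    L, m = 0, -1       # current LFSR length; index of the last length change
    for i in range(n):
        # Discrepancy between bits[i] and the current LFSR's prediction.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[:L + 1]
```

For example, the recurrence s[i] = s[i-1] XOR s[i-4] started from state 1,0,0,0 is recovered from its first 2n = 8 output bits: berlekamp_massey(lfsr_output((1, 4), [1, 0, 0, 0], 8)) returns (4, [1, 1, 0, 0, 1]).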

David Wagner, daw@cs.berkeley.edu, http://www.cs.berkeley.edu/~daw/.