From dawagner@flagstaff.princeton.edu Mon Sep 25 22:20:55 EDT 1995
Article: 40389 of sci.crypt
Path: cnn.Princeton.EDU!flagstaff.princeton.edu!dawagner
From: dawagner@flagstaff.princeton.edu (David A. Wagner)
Newsgroups: sci.crypt
Subject: Re: Weak Keys in RC4
Date: 26 Sep 1995 02:20:37 GMT
Organization: Princeton University
Lines: 83
Message-ID: <447o1l$cbj@cnn.Princeton.EDU>
References: <43u1eh$1j3@hermes.is.co.za>
NNTP-Posting-Host: flagstaff.princeton.edu

I had also been thinking about a (totally different) class of RC4 weak
keys, and since you posted your analysis, I thought I'd pass on my
method too -- maybe it'll be interesting to compare the different places
we ended up at, even though we started with the same basic observation.

I was considering a simplified-exportable-RC4-SSL-variant like this:

A -> B: K[0..9], RSA-Encrypt(K[10..15])
A -> B: RC4(K[0..15]) xor plaintext

Here K[a..b] means bytes a to b of the 16 byte key K[0..15].  So in
my model, I was wondering what happens if an adversary can modify K[0..9].

It turns out that by carefully choosing K[0..9], one can set up the
RC4 key scheduling so that 2 or 3 known plaintext bytes reveal most of
the rest of the RC4 key.  (I had been thinking of this as a related-key
attack, but now that I read your post, I think the weak-keys-formulation
is conceptually nicer!)  Unfortunately, my weak keys are a bit more
ugly than yours -- do yours ever reveal information about K[10..15]?

Of course, my weak keys don't break the real SSL, since SSL sends
K[0..9] in the clear, and then uses MD5(K[0..15]) as a RC4 key.  So
the only moral of this story is ``thank god SSL hashes before using
RC4''.

In article <43u1eh$1j3@hermes.is.co.za>,
Andrew Roos <andrewr@vironix.co.za> wrote:
> A.   Given a key length of K bytes, and E < K, there is a 37% probability that
>      element E of the state table depends only on elements 0..E (inclusive) of
>      the key. 

I too independently noticed this, and it's the basis of my attack
as well.

I'll try to control the value of state[0], ..., state[9] after the
key scheduling algorithm: call it S[0..9].  In particular, we want to
set things up so that a ciphertext byte depends upon only S[0..9] and
one key byte.

Note that the first output ciphertext byte from RC4 is
	S[S[1] + S[S[1]]]
(and it swaps state[1] and state[state[1]]).  I want to set S[] up so that
	S[1] <= 9, i.e. S[1] + S[S[1]] is known.
Now note that knowledge of S[10] and K[0..9] can be used to derive K[10]
with high probability (by observation <A> above).  Therefore let's set up
S[] so that
	(S[1] + S[S[1]]) % 256 = 10,
which means that the first byte of ciphertext tells us S[10].

For example, suppose K[0] = 10, K[1] = 255.  Then S[0] = 10 & S[1] = 0
(so S[1] + S[S[1]] = 10) with probability 1/e^2 = 14%.  Also note that
	S[10] = known-state[K[10] + known-value]
with probability about 1/e = 37%, so knowledge of S[0..10] often gives
away the value of K[10].  Thus we can easily pick K[0..9] so that S[]
has the desired form -- then K[10] can be found once out of e^3 = 20
trials, and K[11..15] can be found with a brute force search over the
remaining 2^32 possibilities, for a total complexity of 2^36 << 2^40.


This can be improved a bit to reveal more K[] values.  Let
	T[n] = S[1] + S[2] + ... + S[n].
Suppose we ensure that
	T[1] < 10,	S[1] + S[T[1]] = 10
	T[2] < 10,	S[2] + S[T[2]] = 11
	T[3] < 10,	S[3] + S[T[3]] = 12;
then in this case, the first 3 ciphertext bytes will reveal K[10..12]
with high probability.

We can get the first 3 bytes to satisfy the criterion.  Suppose
	K[0..6] = {10,-1,-7,2,0,-13,3};
then we will have
	S[0..6] = {10,0,5,1,x,6,11}
with probability 1/e^6 = 2.5%.  If that holds, then the first 3 ciphertext
bytes will reveal K[10..12] with probability ~ 1/e^3.  So 1/e^9 = .01% of
the time we can find the rest of the key with a brute force search over
2^16 possibilities, for a total complexity of 2^16 e^9 = 2^29 << 2^36.
(This doesn't take into account the fact that often the other 99.99% of
the time may reveal partial information about K[10..12].)


I hope I didn't make any arithmetic mistakes in there.  Anyhow, this is
of very limited practical interest, but maybe someone'll be able to expand
it to be more interesting.