From dawagner Thu Feb 16 15:57:04 1995
Subject: Re: paper on collision search
To: wiener@bnr.ca (michael)
Date: Thu, 16 Feb 1995 15:57:04 -0500 (EST)
In-Reply-To:  <"14528 Fri Feb 10 09:32:00 1995"@bnr.ca>  from "michael" at Feb 10, 95 09:31:00 am
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3510      
Status: RO

I thought you might enjoy hearing that my implementation of
your algorithm have found this non-trivial collision in Unix's
crypt(3) password hash function:

crypt("2NGGMda3", "Hx") = "yX8CL2luKyI"
crypt("gnB9Gw1j", "s8") = "yX8CL2luKyI"
[after removing the salt which crypt(3) prepends to its output]

This was found after a total of 6.1 billion trial crypt()s;
since crypt() outputs a 64 bit hash, this is in close agreement
to the 5.4 billion predicted by the birthday paradox.

Since the distributed clients (the programs that were doing
the actual work, as opposed to the server which recorded dist.
points) were stateless, it was very easy to adapt the algorithm
to use only spare cycles (at night, for instance).  It came in
handy :-)

Your algorithm performed nicely, and the predictions were
matched the observed results closely.  Congrats!

By the way, I found the collision by iterating this f:

f("xxyyyyyyyyz") = crypt("yyyyyyyy", "xx")+2
[the +2 removes the salt "xx" which crypt(3) prepends to its output]

using the spare cycles of 27 Sun workstations (LXs and Sparc 2s).
Since f drops the last character of crypt()'s output (which
has 4 bits of entropy), I expected to see 16 pseudo-collisions
before finding a useful one; in fact, I saw a total of 19.
I got about 1310 crypts per second, so I figure it took about
1290 CPU hours, total.

I know that finding a collision in crypt(3) has no practical
significance, but it proved to me that the algorithm works as
advertised. :-)




Thanks for your previous email, by the way:
> 
> When you say that you are finding n collisions, I assume that you
> mean continuing the same collision search with the same function
> f and the retaining the database of distinguished points rather
> than starting over for each collision.  In this case, the 
> expected run-time is sqrt(pi*n/2) 2^(s/2).  This means that
> the number of collisions found will grow as the square of the
> time spent.
> 

Oh, ok, of course -- that makes sense now that I hear it and
can think about with the benefit of hindsight :-)

> >I have some theory which *may* somewhat explain the low
> >figures, though it doesn't explain all the surprises, and
> >I'm not convinced it's correct.
> 
> I'm afraid that I wasn't able to pick out where the reasoning broke
> down, but I am quite certain that the expected time to the first 
> collision is O(|X|^(1/2)) and not O(|X|^(1/3)).  My explanation above 
> accounts for why the time to subsequent collisions is lower.

Actually, you're right, my conclusion was totally wrong.
I did some experiments, and found that for random f:X->X,
it's experimently true that |f^n(X)| ~= 2|X|/n; in fact,
one rjenkins@us.oracle.com (Robert Jenkins) has proved that
this is in fact the correct answer (and derived an error
term) by looking at the power series...

Turns out, though, that f restricted to f^n(X) does not
behave like a random mapping for n large. [i.e. for y in
f^n(X), the sequence y, f(y), f(f(y)), .. is not a random
walk on f^n(X)]  That's why my O(|X|^(1/3)) figure was
way off.  Oops :-)

> 
> I hope that this helps.  I'm glad to see that someone else
> finds this stuff as interesting as Paul and I do.
> 

Sure does!  Yeah, this is fun stuff -- and I'm really lucky
to be able to hear authorative answers right from the horse's
mouth, so to speak :-)  Thanks...

-------------------------------------------------------------------------------
David Wagner                                             dawagner@princeton.edu