From dawagner Thu Feb 16 15:57:04 1995 Subject: Re: paper on collision search To: wiener@bnr.ca (michael) Date: Thu, 16 Feb 1995 15:57:04 -0500 (EST) In-Reply-To: <"14528 Fri Feb 10 09:32:00 1995"@bnr.ca> from "michael" at Feb 10, 95 09:31:00 am X-Mailer: ELM [version 2.4 PL23] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Length: 3510 Status: RO I thought you might enjoy hearing that my implementation of your algorithm have found this non-trivial collision in Unix's crypt(3) password hash function: crypt("2NGGMda3", "Hx") = "yX8CL2luKyI" crypt("gnB9Gw1j", "s8") = "yX8CL2luKyI" [after removing the salt which crypt(3) prepends to its output] This was found after a total of 6.1 billion trial crypt()s; since crypt() outputs a 64 bit hash, this is in close agreement to the 5.4 billion predicted by the birthday paradox. Since the distributed clients (the programs that were doing the actual work, as opposed to the server which recorded dist. points) were stateless, it was very easy to adapt the algorithm to use only spare cycles (at night, for instance). It came in handy :-) Your algorithm performed nicely, and the predictions were matched the observed results closely. Congrats! By the way, I found the collision by iterating this f: f("xxyyyyyyyyz") = crypt("yyyyyyyy", "xx")+2 [the +2 removes the salt "xx" which crypt(3) prepends to its output] using the spare cycles of 27 Sun workstations (LXs and Sparc 2s). Since f drops the last character of crypt()'s output (which has 4 bits of entropy), I expected to see 16 pseudo-collisions before finding a useful one; in fact, I saw a total of 19. I got about 1310 crypts per second, so I figure it took about 1290 CPU hours, total. I know that finding a collision in crypt(3) has no practical significance, but it proved to me that the algorithm works as advertised. :-) Thanks for your previous email, by the way: > > When you say that you are finding n collisions, I assume that you > mean continuing the same collision search with the same function > f and the retaining the database of distinguished points rather > than starting over for each collision. In this case, the > expected run-time is sqrt(pi*n/2) 2^(s/2). This means that > the number of collisions found will grow as the square of the > time spent. > Oh, ok, of course -- that makes sense now that I hear it and can think about with the benefit of hindsight :-) > >I have some theory which *may* somewhat explain the low > >figures, though it doesn't explain all the surprises, and > >I'm not convinced it's correct. > > I'm afraid that I wasn't able to pick out where the reasoning broke > down, but I am quite certain that the expected time to the first > collision is O(|X|^(1/2)) and not O(|X|^(1/3)). My explanation above > accounts for why the time to subsequent collisions is lower. Actually, you're right, my conclusion was totally wrong. I did some experiments, and found that for random f:X->X, it's experimently true that |f^n(X)| ~= 2|X|/n; in fact, one rjenkins@us.oracle.com (Robert Jenkins) has proved that this is in fact the correct answer (and derived an error term) by looking at the power series... Turns out, though, that f restricted to f^n(X) does not behave like a random mapping for n large. [i.e. for y in f^n(X), the sequence y, f(y), f(f(y)), .. is not a random walk on f^n(X)] That's why my O(|X|^(1/3)) figure was way off. Oops :-) > > I hope that this helps. I'm glad to see that someone else > finds this stuff as interesting as Paul and I do. > Sure does! Yeah, this is fun stuff -- and I'm really lucky to be able to hear authorative answers right from the horse's mouth, so to speak :-) Thanks... ------------------------------------------------------------------------------- David Wagner dawagner@princeton.edu