From dawagner@flagstaff.princeton.edu Sun Feb 5 00:32:36 EST 1995
Article: 32233 of sci.crypt
Newsgroups: sci.crypt
Path: princeton!flagstaff.princeton.edu!dawagner
From: dawagner@flagstaff.princeton.edu (David A. Wagner)
Subject: Re: behavior of random mappings
Message-ID: <1995Feb5.053011.22441@Princeton.EDU>
Originator: news@hedgehog.Princeton.EDU
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: flagstaff.princeton.edu
Organization: Princeton University
References: <3h174n$hj7$1@mhadf.production.compuserve.com>
Date: Sun, 5 Feb 1995 05:30:11 GMT
Lines: 107

In article <3h174n$hj7$1@mhadf.production.compuserve.com>,
Bob Jenkins <74512.261@CompuServe.COM> wrote:
> I keep running into variations of this problem: Let a mapping f:X->X
> be given.  Apply the mapping n times to all the elements of X.  How
> many elements am I left with?

I'll assume f is a random map.  Let a_n = E |f^n(X)| / |X| be the
fraction of elements left after n iterations, on average.

Claim: a_1 ~= 1 - 1/e.

Heuristic argument: Fix any y in X.  Calculate

  P(y not in f(X)) = \prod_{x in X} P(f(x) != y)
                   = (1 - 1/|X|)^|X| ~= 1/e.

This is true for all y in X, so

  E |f(X)| = \sum_{y in X} P(y in f(X)) ~= |X| (1 - 1/e).

Claim: a_n ~= 1 - e^(-a_{n-1}).

Heuristic argument: For any y in X, we have

  P(y not in f^n(X)) = E \prod_{x in f^{n-1}(X)} P(f(x) != y)
                     = (1 - 1/|X|)^(|X| a_{n-1}) ~= e^(-a_{n-1}),

so

  E |f^n(X)| ~= \sum_{y in X} P(y in f^n(X)) ~= |X| (1 - e^(-a_{n-1})).

Claim: a_n ~= 2/n.

Heuristic: Experimentally iterating the above recurrence, I get
a_10 = .158, a_100 = .0194, a_500 = .00397, a_1000 = .00199.
[Probably one could actually verify this by using the Taylor expansion
of e^x and carrying the appropriate error terms, but I'm too lazy to
do so. :-]

This seems to indicate that one should find a collision in the sequence
x, f(x), f^2(x), f^3(x), ... after approximately O(|X|^(1/3)) iterations,
which is surprising, because the standard reasoning says that you need
to generate O(sqrt(|X|)) random values before having a good chance of
finding a collision!

[Why O(|X|^(1/3))?  Because after n = O(|X|^(1/3)) iterations, we have
E |f^n(X)| = O(|X|^(2/3)), and then another O(|X|^(1/3)) iterations of f
will give you O(sqrt(|f^n(X)|)) values in f^n(X), so by the birthday
paradox, you will have a good chance of finding a collision.]

Maybe I'll do some tests to try to verify this -- it's nifty!

Can anyone see any flaws in this reasoning?  Comments?  I seem to
remember someone describing all this reasoning on cypherpunks a *long*
time ago -- I wish I could remember who...

> Apply f n times, how many cycles are there, how big are they.
> (how big are cycles in an RNG that isn't reversible.)

Are you asking for the average number of cycles you find in the sequence
x, f(x), f^2(x), ..., f^n(x)?  If so, the standard thinking would say
you'd have a good chance of finding a cycle after n > O(sqrt(|X|)), and
you'd expect the cycle to be of length O(sqrt(|X|)).  On the other hand,
the reasoning above seems to indicate O(|X|^(1/3)) instead (as before).
Hrmm.

Or maybe you're asking for the average number of cycles you find in the
sequence x, f^n(x), f^{2n}(x), ..., f^{kn}(x)?  I'm not sure how to
approach this question.  The standard thinking would say you'll find a
cycle after k > O(sqrt(|X|)) iterations; but the reasoning above seems
to indicate that for large n, this overestimates k (and I have *no*
idea by how much).

In fact, I think if you use the reasoning above you'll find that you
need a larger value of k when n is prime than when n is smooth (it
seems like for n a large prime, the above estimate is about right, but
for n even, the above estimate oughta be too large by at least a factor
of two).  I could be confused, though.
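Here's the sort of quick test I have in mind -- a rough sketch (in
Python; the sizes and trial counts are arbitrary, so treat it as
illustrative only) that iterates the recurrence to check the 2/n
guess, and measures how far the sequence x, f(x), f^2(x), ... runs
before it repeats, for small random maps:

import math
import random

def iterate_recurrence(n_max):
    """Return [a_1, ..., a_{n_max}] from a_n = 1 - exp(-a_{n-1}), a_0 = 1."""
    a, out = 1.0, []
    for _ in range(n_max):
        a = 1.0 - math.exp(-a)
        out.append(a)
    return out

def rho_length(f, x):
    """Distinct values seen in x, f(x), f^2(x), ... before the first repeat."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)

if __name__ == "__main__":
    # Check the a_n ~= 2/n guess by iterating the recurrence.
    a = iterate_recurrence(1000)
    for n in (10, 100, 500, 1000):
        print("a_%d = %.5f    2/%d = %.5f" % (n, a[n - 1], n, 2.0 / n))

    # Average tail-plus-cycle length of x, f(x), f^2(x), ... over random
    # maps f on |X| = N points, compared against sqrt(N) and N^(1/3).
    for N in (1 << 10, 1 << 14, 1 << 18):
        trials = 50
        total = 0
        for _ in range(trials):
            f = [random.randrange(N) for _ in range(N)]   # a random map on X
            total += rho_length(f, random.randrange(N))
        print("N = %7d: avg run = %8.1f   sqrt(N) = %7.1f   N^(1/3) = %5.1f"
              % (N, total / float(trials), math.sqrt(N), N ** (1.0 / 3.0)))

If the heuristics above are right, the recurrence values should track
2/n, and the measured run lengths should make it clear whether the
tail-plus-cycle of a random map grows like sqrt(|X|) or like |X|^(1/3).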
> Apply n different mappings, same question.
> Apply n different mappings, how long before two arbitrary elements collide.

If the mappings are random and independent, I'd expect the answers to
be the same as above.

> f causes groups of 256 elements to collide, apply f n times, ...

Here's a theoretical approach that *oughta* work: For any x in X, let
[x] = {y in X : f(y) = f(x)} be the equivalence class of x.  You say
|[x]| = 256 always.  Let Y = {[x] : x in X}, so that |Y| = |X|/256.
Define g : Y -> Y in the obvious way, i.e. g([x]) = [f(x)].  Now apply
the above reasoning to g, and the answers you get for g will be within
1 of the answers for f.  (I think.)  This should yield (for example)
that x, f(x), f^2(x), ... will hit a cycle after O(sqrt(|X|)/16)
iterations, on average.  (There's a small sketch of this construction
at the end of the post.)

> This has to be a well-researched set of questions.  So what's the
> standard name for it?  What are the answers?  References?

No clue...  I know there's an extensive survey paper on random mappings
in one of the CRYPTO 'xx or EUROCRYPT 'xx conference proceedings, but
I'm not near the library right now, so I don't know which off-hand...

-------------------------------------------------------------------------------
David Wagner                                           dawagner@princeton.edu
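P.S.  For concreteness, here's a rough sketch of the quotient
construction above (in Python; the sizes are arbitrary, and I haven't
thought hard about edge cases): build a random map f on X in which
exactly 256 points share each image, form the induced map g on the
classes Y, and compare how long the sequences x, f(x), f^2(x), ... and
[x], g([x]), g^2([x]), ... run before repeating -- they oughta agree to
within 1.

import random

def rho_length(step, x):
    """Distinct values seen in x, step(x), step(step(x)), ... before a repeat."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = step(x)
    return len(seen)

def random_256_to_1_map(N):
    """A map f on {0, ..., N-1} in which exactly 256 points share each image.
    Also returns cls, where cls[x] is the index of the class [x] in Y."""
    assert N % 256 == 0
    points = list(range(N))
    random.shuffle(points)
    targets = random.sample(range(N), N // 256)   # distinct image points
    f = [0] * N
    cls = [0] * N
    for i, t in enumerate(targets):
        for x in points[256 * i : 256 * (i + 1)]:
            f[x] = t
            cls[x] = i
    return f, cls

if __name__ == "__main__":
    N = 1 << 16
    f, cls = random_256_to_1_map(N)

    # g acts on class indices: g([x]) = [f(x)].  f is constant on each
    # class, so any representative of the class gives the same answer.
    rep = {}
    for x in range(N):
        rep.setdefault(cls[x], x)
    def g(c):
        return cls[f[rep[c]]]

    x0 = random.randrange(N)
    print("run length under f:", rho_length(lambda x: f[x], x0))
    print("run length under g:", rho_length(g, cls[x0]))

Either way, the two numbers should come out nearly equal, which is the
point of the quotient construction.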