From dawagner@flagstaff.princeton.edu Sun Feb 5 00:32:36 EST 1995
Article: 32233 of sci.crypt
Newsgroups: sci.crypt
Path: princeton!flagstaff.princeton.edu!dawagner
From: dawagner@flagstaff.princeton.edu (David A. Wagner)
Subject: Re: behavior of random mappings
Message-ID: <1995Feb5.053011.22441@Princeton.EDU>
Originator: news@hedgehog.Princeton.EDU
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: flagstaff.princeton.edu
Organization: Princeton University
References: <3h174n$hj7$1@mhadf.production.compuserve.com>
Date: Sun, 5 Feb 1995 05:30:11 GMT
Lines: 107
In article <3h174n$hj7$1@mhadf.production.compuserve.com>,
Bob Jenkins <74512.261@CompuServe.COM> wrote:
> I keep running into variations of this problem: Let a mapping f:X->X
> be given. Apply the mapping n times to all the elements of X. How
> many elements am I left with?
I'll assume f is a random map. Let a_n = E |f^n(X)| / |X| be
the fraction of elements left after n iterations, on average.
Claim: a_1 ~= 1 - 1/e.
Heuristic argument:
Fix any y in X. Calculate
P(y not in f(X)) = \prod_{x in X} P(f(x) != y)
= (1 - 1/|X|)^|X| ~= 1/e.
This is true for all y in X, so
E |f(X)| = \sum_{y in X} P(y in f(X)) ~= |X| (1 - 1/e).
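Here's a quick simulation one could run to sanity-check this (Python; the size N = 100,000 is an arbitrary choice, just large enough that the fraction concentrates):

```python
import math
import random

random.seed(1)
N = 100_000                                   # an arbitrary small |X|
f = [random.randrange(N) for _ in range(N)]   # a uniformly random map f : X -> X
frac = len(set(f)) / N                        # |f(X)| / |X|
print(frac, 1 - 1 / math.e)                   # the two should be close
```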
Claim: a_n ~= 1 - e^(-a_{n-1}).
Heuristic argument:
For any y in X, we have
P(y not in f^n(X)) = E \prod_{x in f^{n-1}(X)} P(f(x) != y)
= (1 - 1/|X|)^(|X| a_{n-1}) ~= e^(-a_{n-1}).
	E |f^n(X)| ~= \sum_{y in X} P(y in f^n(X)) ~= |X| (1 - e^(-a_{n-1})).
Claim: a_n ~= 2/n.
Heuristic:
Experimentally iterating the above recurrence, I get
a_10 = .158, a_100 = .0194, a_500 = .00397, a_1000 = .00199.
[Probably one could actually verify this by use of the
Taylor expansion of e^x and carrying the appropriate
error terms, but I'm too lazy to do so. :-]
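The recurrence is easy to iterate numerically; a few lines of Python reproduce the numbers above and line them up against the 2/n guess:

```python
import math

a = 1.0                        # a_0 = 1: all of X present before iterating
vals = {}
for n in range(1, 1001):
    a = 1 - math.exp(-a)       # a_n = 1 - e^(-a_{n-1})
    if n in (10, 100, 500, 1000):
        vals[n] = a
for n in sorted(vals):
    print(n, round(vals[n], 5), 2 / n)   # a_n vs. the 2/n guess
```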
This seems to indicate that one should find a collision in
the sequence x, f(x), f^2(x), f^3(x), ... after approximately
O(|X|^(1/3)) iterations, which is surprising, because the
standard reasoning says that you need to generate O(sqrt(|X|))
random values before having a good chance of finding a
collision!
[Why O(|X|^(1/3))? Because after n = O(|X|^(1/3)) iterations,
E |f^n(X)| = O(|X|^(2/3)), and then another O(|X|^(1/3))
iterations of f will give you O(sqrt(|f^n(X)|)) values in f^n(X),
so by the birthday paradox, you will have a good chance of
finding a collision.]
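One direct test would be to measure the average "rho" length (tail plus cycle) of the sequence x, f(x), f^2(x), ... over many random maps, and compare it against both |X|^(1/2) and |X|^(1/3); a sketch (the size N = 4096 and the trial count are arbitrary):

```python
import random

def rho_length(f, x):
    """Distinct values in x, f(x), f(f(x)), ... before the first repeat."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)

random.seed(2)
N = 4096                 # an arbitrary small |X|
trials = 200
total = 0
for _ in range(trials):
    f = [random.randrange(N) for _ in range(N)]
    total += rho_length(f, random.randrange(N))
avg = total / trials
print(avg, N ** 0.5, N ** (1 / 3))   # average rho length vs. sqrt(N) vs. N^(1/3)
```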
Maybe I'll do some tests to try to verify this -- it's nifty!
Can anyone see any flaws in this reasoning? Comments?
I seem to remember someone describing all this reasoning on
cypherpunks a *long* time ago -- I wish I could remember who...
> Apply f n times, how many cycles are there, how big are they.
> (how big are cycles in an RNG that isn't reversible.)
Are you asking for the average number of cycles you find in
the sequence x, f(x), f^2(x), ..., f^n(x)? If so, the standard
thinking would say you'd have a good chance of finding a cycle
after n > O(sqrt(|X|)), and you'd expect the cycle to be of
length O(sqrt(|X|)). On the other hand, this reasoning seems
to indicate O(|X|^(1/3)) instead (as before). Hrmm.
Or maybe you're asking for the average number of cycles you
find in the sequence x, f^n(x), f^{2n}(x), ..., f^{kn}(x)?
I'm not sure how to approach this question. The standard
thinking would say you'll find a cycle after k > O(sqrt(|X|))
iterations; but the reasoning above seems to indicate that
for large n, this overestimates k (and I have *no* idea by
how much). In fact, I think if you use the reasoning above
you'll get that you need a larger value of k for n prime than
for when n is smooth (it seems like for n a large prime,
the above estimate is about right, but for n even, the above
estimate oughta be too large by at least a factor of two).
I could be confused though.
> Apply n different mappings, same question.
> Apply n different mappings, how long before two arbitrary elements collide.
If the mappings are random and independent, I'd expect the
answer to be the same as above.
> f causes groups of 256 elements to collide, apply f n times, ...
Here's a theoretical approach that *oughta* work:
For any x in X, let [x] = {y in X : f(y) = f(x)} be the equivalence
class of x. You say |[x]| = 256 always. Let Y = {[x] : x in X},
so that |Y| = |X|/256.  Define g : Y -> Y in the obvious way, i.e.
g([x]) = [f(x)].  Now apply the above reasoning to g, and the answers
you get for g will be within 1 of the answers for f. (I think)
This should yield (for example) that x, f(x), f^2(x), ... will hit
a cycle after O(sqrt(|X|)/16) iterations, on average.
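Here's a sketch of the quotient construction (the particular 256-to-1 map f is made up for illustration -- any exactly-256-to-1 map would do):

```python
import random

random.seed(3)
K = 256                  # every image point has exactly 256 preimages
M = 4096                 # |Y| = number of classes; |X| = M * K
N = M * K

# A made-up map f that is exactly 256-to-1: block c = {c*K, ..., c*K+K-1}
# is the equivalence class [x] of its members, and all of block c maps to
# target[c].  Distinct targets make f exactly 256-to-1.
target = random.sample(range(N), M)

def f(x):
    return target[x // K]

def g(c):                # the induced map g : Y -> Y, g([x]) = [f(x)]
    return target[c] // K

# Fraction of classes surviving 10 iterations of g; by the heuristics
# above this should land near a_10 ~= .158, if g behaves like a random map.
classes = set(range(M))
for _ in range(10):
    classes = {g(c) for c in classes}
frac = len(classes) / M
print(frac)
```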
> This has to be a well-researched set of questions. So what's the
> standard name for it? What are the answers? References?
No clue... I know there's an extensive survey paper on random
mappings in one of the CRYPTO 'xx or EUROCRYPT 'xx conference
proceedings, but I'm not near the library right now, so I don't
know which off-hand...
-------------------------------------------------------------------------------
David Wagner dawagner@princeton.edu