From dawagner@flagstaff.princeton.edu Sun Feb 5 00:32:36 EST 1995
Article: 32233 of sci.crypt
Newsgroups: sci.crypt
Path: princeton!flagstaff.princeton.edu!dawagner
From: dawagner@flagstaff.princeton.edu (David A. Wagner)
Subject: Re: behavior of random mappings
Message-ID: <1995Feb5.053011.22441@Princeton.EDU>
Originator: news@hedgehog.Princeton.EDU
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: flagstaff.princeton.edu
Organization: Princeton University
References: <3h174n$hj7$1@mhadf.production.compuserve.com>
Date: Sun, 5 Feb 1995 05:30:11 GMT
Lines: 107

In article <3h174n$hj7$1@mhadf.production.compuserve.com>,
Bob Jenkins <74512.261@CompuServe.COM> wrote:
> I keep running into variations of this problem: Let a mapping f:X->X
> be given.  Apply the mapping n times to all the elements of X.  How
> many elements am I left with?

I'll assume f is a random map.  Let a_n = E |f^n(X)| / |X| be the
fraction of elements left after n iterations, on average.

Claim: a_1 ~= 1 - 1/e.

Heuristic argument: Fix any y in X.  Calculate

  P(y not in f(X)) = \prod_{x in X} P(f(x) != y)
                   = (1 - 1/|X|)^|X| ~= 1/e.

This is true for all y in X, so

  E |f(X)| = \sum_{y in X} P(y in f(X)) ~= |X| (1 - 1/e).

Claim: a_n ~= 1 - e^(-a_{n-1}).

Heuristic argument: For any y in X, we have

  P(y not in f^n(X)) = E \prod_{x in f^{n-1}(X)} P(f(x) != y)
                     = (1 - 1/|X|)^(|X| a_{n-1}) ~= e^(-a_{n-1}),

so

  E |f^n(X)| ~= \sum_{y in X} P(y in f^n(X)) ~= |X| (1 - e^(-a_{n-1})).

Claim: a_n ~= 2/n.

Heuristic: Experimentally iterating the above recurrence, I get
a_10 = .158, a_100 = .0194, a_500 = .00397, a_1000 = .00199.
[Probably one could actually verify this by using the Taylor expansion
of e^x and carrying the appropriate error terms, but I'm too lazy to
do so. :-]

This seems to indicate that one should find a collision in the sequence
x, f(x), f^2(x), f^3(x), ... after approximately O(|X|^(1/3)) iterations,
which is surprising, because the standard reasoning says that you need
to generate O(sqrt(|X|)) random values before having a good chance of
finding a collision!

[Why O(|X|^(1/3))?  Because after n = O(|X|^(1/3)) iterations, we have
E |f^n(X)| = O(|X|^(2/3)), and then another O(|X|^(1/3)) iterations of f
will give you O(sqrt(|f^n(X)|)) values in f^n(X), so by the birthday
paradox, you will have a good chance of finding a collision.]

Maybe I'll do some tests to try to verify this -- it's nifty!

Can anyone see any flaws in this reasoning?  Comments?  I seem to
remember someone describing all this reasoning on cypherpunks a *long*
time ago -- I wish I could remember who...

> Apply f n times, how many cycles are there, how big are they.
> (how big are cycles in an RNG that isn't reversible.)

Are you asking for the average number of cycles you find in the sequence
x, f(x), f^2(x), ..., f^n(x)?  If so, the standard thinking would say
you'd have a good chance of finding a cycle after n > O(sqrt(|X|)), and
you'd expect the cycle to be of length O(sqrt(|X|)).  On the other hand,
the reasoning above seems to indicate O(|X|^(1/3)) instead (as before).
Hrmm.

Or maybe you're asking for the average number of cycles you find in the
sequence x, f^n(x), f^{2n}(x), ..., f^{kn}(x)?  I'm not sure how to
approach this question.  The standard thinking would say you'll find a
cycle after k > O(sqrt(|X|)) iterations; but the reasoning above seems
to indicate that for large n, this overestimates k (and I have *no*
idea by how much).

In fact, I think if you use the reasoning above you'll find that you
need a larger value of k when n is prime than when n is smooth (it
seems like for n a large prime, the above estimate is about right, but
for n even, the above estimate oughta be too large by at least a factor
of two).  I could be confused, though.
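Here's the sort of quick test I have in mind -- a rough sketch (in
Python; the sizes and trial counts are arbitrary, so treat it as
illustrative only) that iterates the recurrence to check the 2/n
guess, and measures how far the sequence x, f(x), f^2(x), ... runs
before it repeats, for small random maps:

import math
import random

def iterate_recurrence(n_max):
    """Return [a_1, ..., a_{n_max}] from a_n = 1 - exp(-a_{n-1}), a_0 = 1."""
    a, out = 1.0, []
    for _ in range(n_max):
        a = 1.0 - math.exp(-a)
        out.append(a)
    return out

def rho_length(f, x):
    """Distinct values seen in x, f(x), f^2(x), ... before the first repeat."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)

if __name__ == "__main__":
    # Check the a_n ~= 2/n guess by iterating the recurrence.
    a = iterate_recurrence(1000)
    for n in (10, 100, 500, 1000):
        print("a_%d = %.5f    2/%d = %.5f" % (n, a[n - 1], n, 2.0 / n))

    # Average tail-plus-cycle length of x, f(x), f^2(x), ... over random
    # maps f on |X| = N points, compared against sqrt(N) and N^(1/3).
    for N in (1 << 10, 1 << 14, 1 << 18):
        trials = 50
        total = 0
        for _ in range(trials):
            f = [random.randrange(N) for _ in range(N)]   # a random map on X
            total += rho_length(f, random.randrange(N))
        print("N = %7d: avg run = %8.1f   sqrt(N) = %7.1f   N^(1/3) = %5.1f"
              % (N, total / float(trials), math.sqrt(N), N ** (1.0 / 3.0)))

If the heuristics above are right, the recurrence values should track
2/n, and the measured run lengths should make it clear whether the
tail-plus-cycle of a random map grows like sqrt(|X|) or like |X|^(1/3).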
> Apply n different mappings, same question.
> Apply n different mappings, how long before two arbitrary elements collide.

If the mappings are random and independent, I'd expect the answers to
be the same as above.

> f causes groups of 256 elements to collide, apply f n times, ...

Here's a theoretical approach that *oughta* work: For any x in X, let
[x] = {y in X : f(y) = f(x)} be the equivalence class of x.  You say
|[x]| = 256 always.  Let Y = {[x] : x in X}, so that |Y| = |X|/256.
Define g : Y -> Y in the obvious way, i.e. g([x]) = [f(x)].  Now apply
the above reasoning to g, and the answers you get for g will be within
1 of the answers for f.  (I think.)  This should yield (for example)
that x, f(x), f^2(x), ... will hit a cycle after O(sqrt(|X|)/16)
iterations, on average.  (There's a small sketch of this construction
at the end of the post.)

> This has to be a well-researched set of questions.  So what's the
> standard name for it?  What are the answers?  References?

No clue...  I know there's an extensive survey paper on random mappings
in one of the CRYPTO 'xx or EUROCRYPT 'xx conference proceedings, but
I'm not near the library right now, so I don't know which off-hand...

-------------------------------------------------------------------------------
David Wagner                                           dawagner@princeton.edu
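P.S.  For concreteness, here's a rough sketch of the quotient
construction above (in Python; the sizes are arbitrary, and I haven't
thought hard about edge cases): build a random map f on X in which
exactly 256 points share each image, form the induced map g on the
classes Y, and compare how long the sequences x, f(x), f^2(x), ... and
[x], g([x]), g^2([x]), ... run before repeating -- they oughta agree to
within 1.

import random

def rho_length(step, x):
    """Distinct values seen in x, step(x), step(step(x)), ... before a repeat."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = step(x)
    return len(seen)

def random_256_to_1_map(N):
    """A map f on {0, ..., N-1} in which exactly 256 points share each image.
    Also returns cls, where cls[x] is the index of the class [x] in Y."""
    assert N % 256 == 0
    points = list(range(N))
    random.shuffle(points)
    targets = random.sample(range(N), N // 256)   # distinct image points
    f = [0] * N
    cls = [0] * N
    for i, t in enumerate(targets):
        for x in points[256 * i : 256 * (i + 1)]:
            f[x] = t
            cls[x] = i
    return f, cls

if __name__ == "__main__":
    N = 1 << 16
    f, cls = random_256_to_1_map(N)

    # g acts on class indices: g([x]) = [f(x)].  f is constant on each
    # class, so any representative of the class gives the same answer.
    rep = {}
    for x in range(N):
        rep.setdefault(cls[x], x)
    def g(c):
        return cls[f[rep[c]]]

    x0 = random.randrange(N)
    print("run length under f:", rho_length(lambda x: f[x], x0))
    print("run length under g:", rho_length(g, cls[x0]))

Either way, the two numbers should come out nearly equal, which is the
point of the quotient construction.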