From rsalz@osf.org Tue Jan 23 19:50:58 PST 1996
Article: 28940 of hks.lists.cypherpunks
Path: hks.net!news-mail-gateway!owner-cypherpunks
From: rsalz@osf.org (Rich Salz)
Newsgroups: hks.lists.cypherpunks
Subject: Random noise from disk drives
Date: 22 Jan 1996 23:56:38 -0500
Organization: HKS, Inc.
Lines: 97
Sender: root@hks.net
Message-ID: <9601230431.AA06742@sulphur.osf.org>
NNTP-Posting-Host: bb.hks.net

Don Davis  has done some interesting/important/widely-referenced work on
using disk latencies as a random number source.  We were talking about
some of his techniques, and he kindly gave me permission to forward along
his email.  He's not on the list, so I cc'd him.
	/r$

--------------------forwarded message-----------------
well, if you're willing to accept a temporary peak
load on your machine, then a paging rng is easy to
write.

the basic idea that worked is:
    * allocate a little more memory than is available in ram;
    * walk through it in steps of a prime # near 4kb or 5kb;
    * time each access with whatever system clock is easiest;
    * throw away accesses that take less than 5 millisec or so;
    * keep only the parity of each access that remains.

naturally, on the first passes through the array, you'll
get mostly fast accesses; thereafter, most accesses
will cause page-faults. mostly a page-fault's access-time
reflects where the page was on the disk, where the head
and spindle were before the page-fault, etc. that's why
you keep only the parity. however, you're not going to
get a bit of entropy from every access, by any means.

the reason for a prime-valued step is that you want to be
sure of visiting every page, but you don't want to have
to know the page-size. (i haven't checked this trick yet;
i used 2kb or 4kb steps, if i remember correctly).

the reason for the 5 msec cutoff is that it's rare for
a disk access to happen so fast, and you want to screen
out other causes of non-immediate memory accesses. you'll
only get a real access that fast, if at the moment of the
page-fault, the head's already in the right track, and if
the block you want is less than a third of a rev away
>from  the head.

i used a byte-indexed table to get the parity quickly.
you don't really need to bother packing the parity bits
into bytes; store the 1's and 0's as chars into the md5
buffer, & hash it; then, put the hash itself into the
buffer, xor more 1's & 0's back in on top, rehash, etc.
running md5 8 times as often as necessary doesn't matter.

note that this algorithm should defeat caching disk
controllers, but i haven't checked for sure. if it
doesn't, the symptom will be long stretches of fast
access-times.

any fancy stuff you put in, like feeding noise-bits
back into your path through the array, is a bad idea;
first, it really only adds pseudo-randomness; second,
it's liable to reduce the frequency of page-faults.
though i knew it was a bad idea, i tried it anyway,
lots of ways, and that's what i found. simple is best.

really, the only complicated code should be outside
the rng: reallocating when memory availability changes,
minimizing the ui, stuff like that.

it is worthwhile to run find, or some other filesys-
traversing program, while you're running the paging-rng.
the benefit is not from the changed paging-times, but
>from  the extra head-motion, which perturbs the spindle
speed.

if you choose to measure the rng's quality, study the
raw parity-bits, not the hashed output. the hashes would
look perfectly random, even if the parity bits were periodic.
look for periodicity (fft), long-term autocorrelation,
and measure the entropy with an entropy estimator. don't
bother with runs tests and the others; the hashing will
produce nice output bits, even if the parity inputs
are periodic or are highly correlated in the long-run.
but if they're not periodic and uncorrelated, the
output bits will be random, instead of pseudo-random.

on the only machine i studied closely, i got 100 bits/min, 
with a 1 khz interrupt-clock; if your interrupt schedule
is more frequent, then you can expect proportionally more
entropy.  remember that even for long rsa keys, you only
need enough entropy to prevent exhaustive key-search.
use 100 bits or so to seed the prng, then use the prng
to generate a prime.  reseed for each prime, and as
often as you can for symmetric session-keys. i don't
trust pseudo-random key-generation (it's no surprise
that i don't, i suppose).

please let me know if anything comes of this, like if
someone builds it properly before i get around to it.
i suppose i ought to build it, test it, and write a paper
about how much fun it was, but i have too much work, and
too many papers to write.
					-don davis, boston
-----------------egassem dedrawrof--------------------


From don@cam.ov.com Wed Jan 24 14:59:49 PST 1996
Article: 29100 of hks.lists.cypherpunks
Path: hks.net!news-mail-gateway!owner-cypherpunks
From: don@cam.ov.com ("Donald T. Davis")
Newsgroups: hks.lists.cypherpunks
Subject: disk randomness
Date: 24 Jan 1996 14:49:42 -0500
Organization: HKS, Inc.
Lines: 38
Sender: root@hks.net
Message-ID: <199601241850.NAA21039@gza-client1.cam.ov.com>
NNTP-Posting-Host: bb.hks.net


rich salz posted to this list a message i sent him
about a portable way to gather disk-noise for a true rng.

he also was kind enough to forward a reply to me from
the list, because i wasn't subscribed at the time. the
reply's author pointed out that my approach is not a 
practical one, and that NOISE.SYS gathers disk timings
and other noise more efficiently, anyway. now that i'm
subscribed, i'll answer on my own behalf:

i agree that my algorithm isn't practical. in fact,
that's why i agreed to rich's request that i let him
post my message here. i don't recommend paging-timings
to my clients, because it's not a workable approach
for production-quality code.

memory-paging's only virtue as a noise-source is that
it's uniquely portable. i failed to emphasize this,
in the message rich forwarded for me. the code needs
no device-specific calls, and the only OS-specific call
is the gettime() call. even with this virtue, i don't
recommend it as a production-quality algorithm, unless
the process that needs the rng is already memory-bound.
i'm sorry that my original msg was unclear on this point;
that's my fault, not rich's.

by the way, i think the "interesting work" of mine to
which rich referred, is my paper on disk randomness,
which appeared in the crypto '94 proceedings. it presents
work i did at mit from '88-9, and shows mathematically
why disk-timings can contain true entropy: a disk's speed
variations come from air turbulence, which now is known
mathematically to be unpredictable in the long run.
my coauthors were p.r. fenstermacher, a chaos-theory
physicist, and r. ihaka, a statistician.

				-don davis, boston