From rsalz@osf.org Tue Jan 23 19:50:58 PST 1996 Article: 28940 of hks.lists.cypherpunks Path: hks.net!news-mail-gateway!owner-cypherpunks From: rsalz@osf.org (Rich Salz) Newsgroups: hks.lists.cypherpunks Subject: Random noise from disk drives Date: 22 Jan 1996 23:56:38 -0500 Organization: HKS, Inc. Lines: 97 Sender: root@hks.net Message-ID: <9601230431.AA06742@sulphur.osf.org> NNTP-Posting-Host: bb.hks.net Don Davis has done some interesting/important/widely-referenced work on using disk latencies as a random number source. We were talking about some of his techniques, and he kindly gave me permission to forward along his email. He's not on the list, so I cc'd him. /r$ --------------------forwarded message----------------- well, if you're willing to accept a temporary peak load on your machine, then a paging rng is easy to write. the basic idea that worked is: * allocate a little more memory than is available in ram; * walk through it in steps of a prime # near 4kb or 5kb; * time each access with whatever system clock is easiest; * throw away accesses that take less than 5 millisec or so; * keep only the parity of each access that remains. naturally, on the first passes through the array, you'll get mostly fast accesses; thereafter, most accesses will cause page-faults. mostly a page-fault's access-time reflects where the page was on the disk, where the head and spindle were before the page-fault, etc. that's why you keep only the parity. however, you're not going to get a bit of entropy from every access, by any means. the reason for a prime-valued step is that you want to be sure of visiting every page, but you don't want to have to know the page-size. (i haven't checked this trick yet; i used 2kb or 4kb steps, if i remember correctly). the reason for the 5 msec cutoff is that it's rare for a disk access to happen so fast, and you want to screen out other causes of non-immediate memory accesses. you'll only get a real access that fast, if at the moment of the page-fault, the head's already in the right track, and if the block you want is less than a third of a rev away >from the head. i used a byte-indexed table to get the parity quickly. you don't really need to bother packing the parity bits into bytes; store the 1's and 0's as chars into the md5 buffer, & hash it; then, put the hash itself into the buffer, xor more 1's & 0's back in on top, rehash, etc. running md5 8 times as often as necessary doesn't matter. note that this algorithm should defeat caching disk controllers, but i haven't checked for sure. if it doesn't, the symptom will be long stretches of fast access-times. any fancy stuff you put in, like feeding noise-bits back into your path through the array, is a bad idea; first, it really only adds pseudo-randomness; second, it's liable to reduce the frequency of page-faults. though i knew it was a bad idea, i tried it anyway, lots of ways, and that's what i found. simple is best. really, the only complicated code should be outside the rng: reallocating when memory availability changes, minimizing the ui, stuff like that. it is worthwhile to run find, or some other filesys- traversing program, while you're running the paging-rng. the benefit is not from the changed paging-times, but >from the extra head-motion, which perturbs the spindle speed. if you choose to measure the rng's quality, study the raw parity-bits, not the hashed output. the hashes would look perfectly random, even if the parity bits were periodic. look for periodicity (fft), long-term autocorrelation, and measure the entropy with an entropy estimator. don't bother with runs tests and the others; the hashing will produce nice output bits, even if the parity inputs are periodic or are highly correlated in the long-run. but if they're not periodic and uncorrelated, the output bits will be random, instead of pseudo-random. on the only machine i studied closely, i got 100 bits/min, with a 1 khz interrupt-clock; if your interrupt schedule is more frequent, then you can expect proportionally more entropy. remember that even for long rsa keys, you only need enough entropy to prevent exhaustive key-search. use 100 bits or so to seed the prng, then use the prng to generate a prime. reseed for each prime, and as often as you can for symmetric session-keys. i don't trust pseudo-random key-generation (it's no surprise that i don't, i suppose). please let me know if anything comes of this, like if someone builds it properly before i get around to it. i suppose i ought to build it, test it, and write a paper about how much fun it was, but i have too much work, and too many papers to write. -don davis, boston -----------------egassem dedrawrof-------------------- From don@cam.ov.com Wed Jan 24 14:59:49 PST 1996 Article: 29100 of hks.lists.cypherpunks Path: hks.net!news-mail-gateway!owner-cypherpunks From: don@cam.ov.com ("Donald T. Davis") Newsgroups: hks.lists.cypherpunks Subject: disk randomness Date: 24 Jan 1996 14:49:42 -0500 Organization: HKS, Inc. Lines: 38 Sender: root@hks.net Message-ID: <199601241850.NAA21039@gza-client1.cam.ov.com> NNTP-Posting-Host: bb.hks.net rich salz posted to this list a message i sent him about a portable way to gather disk-noise for a true rng. he also was kind enough to forward a reply to me from the list, because i wasn't subscribed at the time. the reply's author pointed out that my approach is not a practical one, and that NOISE.SYS gathers disk timings and other noise more efficiently, anyway. now that i'm subscribed, i'll answer on my own behalf: i agree that my algorithm isn't practical. in fact, that's why i agreed to rich's request that i let him post my message here. i don't recommend paging-timings to my clients, because it's not a workable approach for production-quality code. memory-paging's only virtue as a noise-source is that it's uniquely portable. i failed to emphasize this, in the message rich forwarded for me. the code needs no device-specific calls, and the only OS-specific call is the gettime() call. even with this virtue, i don't recommend it as a production-quality algorithm, unless the process that needs the rng is already memory-bound. i'm sorry that my original msg was unclear on this point; that's my fault, not rich's. by the way, i think the "interesting work" of mine to which rich referred, is my paper on disk randomness, which appeared in the crypto '94 proceedings. it presents work i did at mit from '88-9, and shows mathematically why disk-timings can contain true entropy: a disk's speed variations come from air turbulence, which now is known mathematically to be unpredictable in the long run. my coauthors were p.r. fenstermacher, a chaos-theory physicist, and r. ihaka, a statistician. -don davis, boston