CS70 - Lecture 11 - Feb 11, 2011 - 10 Evans
Goal for Note 6:
Cryptography: when you type your password into a website,
why can't anyone else who sees it (as it goes by on the
network) decode it?
Any message (character string) is converted to a number M
(a long message is converted to a sequence of numbers).
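A minimal sketch of one way to turn a character string into a number and back. The notes do not say which encoding is used, so the choice here (UTF-8 bytes read as a big-endian integer) and the names message_to_int / int_to_message are assumptions for illustration only.

```python
def message_to_int(text: str) -> int:
    """Read the UTF-8 bytes of text as one big-endian integer (assumed encoding)."""
    return int.from_bytes(text.encode("utf-8"), "big")


def int_to_message(m: int) -> str:
    """Invert message_to_int for messages that do not start with a zero byte."""
    length = (m.bit_length() + 7) // 8  # number of bytes needed to hold m
    return m.to_bytes(length, "big").decode("utf-8")


M = message_to_int("STOP")
print(M, int_to_message(M))
```

A long message would be cut into pieces first, each piece converted separately.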
What happens when a Sender wants to send a secret message to a Receiver:
The Sender takes message M and encrypts it to get the
encrypted message C = f_enc(M)
The Sender sends C to the Receiver. Anyone may "intercept" C on its way.
The Receiver decrypts C to get the original message M = f_dec(C).
For this to work as the Sender and Receiver desire:
f_enc and f_dec have to be inverses of one another, i.e.
M = f_dec(f_enc(M)) for all M
It is easy for the Sender to evaluate f_enc
It is easy for the Receiver to evaluate f_dec
It is very hard for anyone other than the Receiver to evaluate f_dec.
The harder it is, the better the secrecy.
Two kinds of cryptography:
Private key (traditional): need one "Key" for both f_enc and f_dec
where K=Key is a shared secret between Sender, Receiver
EX: xor: C = f_enc(M) = M xor K
think of M, C, K as bit strings of the same length
M = f_dec(C) = C xor K
ASK&WAIT: Why are f_enc and f_dec inverses?
impossible to break if K is chosen at random and used only once (a "one-time pad")
EX: Original Washington/Moscow hotline worked this way
EX: crypt command in UNIX, uses algorithm from German Enigma machine
used in World War II, which was broken by Turing
Secrecy depends on keeping K a secret known only to Sender, Receiver
so only they can evaluate f_enc and f_dec
Disadvantage: if 1000 people want to talk to one another in secret,
need 999*1000/2 = 499500 secret keys (one per pair), so all pairs can talk; too many keys!
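The xor example from the private key discussion can be sketched as follows. The function name xor_bytes and the sample message are made up for illustration; the key comes from Python's secrets module, and the scheme is only as good as the rule that K is used once.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Bitwise xor of two byte strings of the same length."""
    if len(data) != len(key):
        raise ValueError("message and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


M = b"attack at dawn"                # illustrative message
K = secrets.token_bytes(len(M))      # shared secret key, same length as M
C = xor_bytes(M, K)                  # f_enc(M) = M xor K
recovered = xor_bytes(C, K)          # f_dec(C) = C xor K
print(recovered == M)
```

The same function serves as both f_enc and f_dec, because (M xor K) xor K = M xor (K xor K) = M xor 0 = M.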
Public key: any Sender can do f_enc, but only one Receiver can do f_dec
Advantage: for 1000 people to talk in secret, each person has his/her
own secret f_dec, but can just publish the corresponding f_enc
EX: RSA (Rivest/Shamir/Adleman)
Need: 1) large number n that is product of two different large primes p*q=n
large means at least 200 to 400 decimal digits
We will assume our message satisfies 0 <= M < n;
longer messages can be broken into smaller parts and sent separately.
2) integer e that is relatively prime to (p-1)*(q-1)
3) integer d = multiplicative inverse of e mod (p-1)*(q-1)
Everyone knows n and e, but only Receiver knows d
Then for message M, C = f_enc(M) = M^e mod n is the encrypted message
For encrypted message C, M = f_dec(C) = C^d mod n is the decrypted message
EX: Try 2537=n=p*q=43*59, e=13, message = STOP = (ST,OP)=(1819,1415)
using the position of each letter in the alphabet, counting from A=00
(so S=18, T=19, O=14, P=15). Then the encrypted message
= ( 1819^13 mod 2537 , 1415^13 mod 2537 ) = ( 2081, 2182 ).
To decrypt we use d = 937 and compute
( 2081^937 mod 2537 , 2182^937 mod 2537 ) = (1819,1415)
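The numbers in this example can be reproduced with a short script. This is a sketch: the helper encode_pairs is a hypothetical name, and pow(e, -1, phi) for the modular inverse assumes Python 3.8 or later.

```python
p, q, e = 43, 59, 13
n = p * q                      # 2537
phi = (p - 1) * (q - 1)        # (p-1)*(q-1) = 2436
d = pow(e, -1, phi)            # multiplicative inverse of e mod (p-1)*(q-1)


def encode_pairs(word: str) -> list[int]:
    """Two letters per block, each letter as two digits with A=00 ... Z=25."""
    codes = [ord(ch) - ord("A") for ch in word]
    return [100 * codes[i] + codes[i + 1] for i in range(0, len(codes), 2)]


blocks = encode_pairs("STOP")                 # (ST, OP)
cipher = [pow(M, e, n) for M in blocks]       # C = M^e mod n
plain = [pow(C, d, n) for C in cipher]        # M = C^d mod n
print(d, blocks, cipher, plain)
```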
We will show that f_enc and f_dec are inverses of one another shortly.
But first, why is f_enc() easy and f_dec() hard to evaluate?
f_enc() requires repeatedly multiplying by M and taking the remainder mod n,
both of which are easy, even if M and n are large.
f_dec() is equally easy if we know d, which only the Receiver knows.
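One standard way to keep the number of multiplications small is repeated squaring. The notes do not name a method, so this is an assumed technique, and power_mod is an illustrative name (Python's built-in pow(M, e, n) does the same job).

```python
def power_mod(base: int, exponent: int, n: int) -> int:
    """Compute base^exponent mod n with about log2(exponent) squarings."""
    result = 1 % n
    base %= n
    while exponent > 0:
        if exponent & 1:                 # current low bit of the exponent is 1
            result = (result * base) % n
        base = (base * base) % n         # square for the next bit
        exponent >>= 1
    return result


print(power_mod(1819, 13, 2537))
```

Every intermediate value stays below n^2, so the cost grows with the number of digits of the exponent, not with the exponent itself.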
Why is it hard to figure out d? All you have to do is
1) factor n=p*q
2) use Euclidean algorithm to compute d so d*e ==1 mod (p-1)*(q-1)
But 1) is very hard: Best algorithms would take billions of years
if n has 400 digits. And any other known algorithm to compute d
leads to computing p and q too. So quality of encryption depends on
large integers being very hard to factor. If you figure out an
algorithm to factor quickly, you can become rich or famous.
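Step 2) is cheap once p and q are known. Below is a sketch of the extended Euclidean algorithm applied to the small example n = 43*59, e = 13; the function names are illustrative, not from the notes.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def private_exponent(p: int, q: int, e: int) -> int:
    """Compute d with d*e == 1 mod (p-1)*(q-1), given the factors p and q."""
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e must be relatively prime to (p-1)*(q-1)")
    return x % phi


print(private_exponent(43, 59, 13))
```

The hard part for an eavesdropper is step 1), getting p and q from n; this code assumes they are already known.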
Proof that f_dec() is inverse of f_enc requires
Fermat's Little Theorem (proof later)
If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p
Corollary: If p is prime then for any nonnegative integers a and b, a^(1+b*(p-1)) == a mod p
Proof:
Case 1: If p divides a, then a^(1+b*(p-1)) == 0 == a mod p
Case 2: if p does not divide a, then by Fermat's Little Theorem
a^(1+b*(p-1)) == a*(a^(p-1))^b == a*1^b == a mod p
Proof that f_dec(f_enc(M)) = M, where 0 <= M < p*q:
f_dec(f_enc(M)) = f_dec(M^e mod n) = (M^e)^d mod n = M^(e*d) mod n.
Since e*d == 1 mod (p-1)*(q-1), we can write e*d = 1 + m*(p-1)*(q-1) for some m.
Then by using the Corollary twice we get
M^(e*d) = M^(1 + m*(p-1)*(q-1)) == M mod p
M^(e*d) = M^(1 + m*(p-1)*(q-1)) == M mod q
Thus both p and q divide (M - M^(e*d)), and since they are
different primes, their product n = p*q also divides (M - M^(e*d)), i.e.
M^(e*d) == M mod n
or, since 0 <= M < n, M = M^(e*d) mod n as desired.
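For a small modulus the claim can also be checked by brute force. A sketch using the example values n = 2537, e = 13, d = 937; round_trips is an illustrative name.

```python
def round_trips(n: int, e: int, d: int) -> bool:
    """True if f_dec(f_enc(M)) == M for every message 0 <= M < n."""
    return all(pow(pow(M, e, n), d, n) == M for M in range(n))


print(round_trips(2537, 13, 937))
```

The loop includes messages M that are multiples of 43 or 59, which is Case 1 of the Corollary.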
To finish cryptography, we need a proof of Fermat's Little Theorem:
Thm: If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p
Here are some "numerical experiments" to devise a proof conjecture:
consider integers 1 <= i < p, for some prime p, say p=7.
Try multiplying them by any integer mod p, see what you get:
              1 2 3 4 5 6
*2 mod 7 =>   2 4 6 1 3 5
*3 mod 7 =>   3 6 2 5 1 4
*4 mod 7 =>   4 1 5 2 6 3
*5 mod 7 =>   5 3 1 6 4 2
*6 mod 7 =>   6 5 4 3 2 1
ASK&WAIT: What is the pattern?
Can see same pattern for any prime p
Conjecture (proven shortly): given any prime p and any 1 <= a < p,
the numbers a*1 mod p, a*2 mod p , ... a*(p-1) mod p are
all different, i.e. just a permutation of 1,...,p-1
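The same experiment is easy to rerun for other primes. A small sketch; multiples_mod_p is an illustrative name.

```python
def multiples_mod_p(a: int, p: int) -> list[int]:
    """The list a*1 mod p, a*2 mod p, ..., a*(p-1) mod p."""
    return [(a * i) % p for i in range(1, p)]


p = 7
for a in range(2, p):
    row = multiples_mod_p(a, p)
    # Each row should be a rearrangement of 1, ..., p-1.
    print(f"*{a} mod {p} =>", *row, "| permutation:", sorted(row) == list(range(1, p)))
```

Trying a non-prime such as p = 6 shows repeated entries and zeros, so primality matters here.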
Now take their product:
(p-1)! = (a*1) mod p * (a*2) mod p *...*(a*(p-1)) mod p
or
(p-1)! == (a*1*a*2*...a*(p-1)) mod p
== a^(p-1) (p-1)! mod p
Suppose we could "divide by" (p-1)!;
would get 1 == a^(p-1) mod p as desired
Now let's do proof carefully:
Proof of Conjecture: suppose 1 <= x,y < p , x neq y
so -(p-1) <= x-y <= p-1, x neq y, i.e. 0 < |x-y| < p
so p does not divide x-y
so p does not divide a*(x-y) (p is prime and divides neither a nor x-y)
so a*x mod p neq a*y mod p
In other words, a*1 mod p, a*2 mod p , ... , a*(p-1) mod p
are all different as conjectured.
So now we have (p-1)! == a^(p-1)*(p-1)! mod p, and want
to conclude 1 == a^(p-1) mod p
ASK&WAIT: What did we prove last time that lets us do this?
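Answer: p is prime, so it divides none of 1,...,p-1; hence gcd((p-1)!, p) = 1
and (p-1)! has a multiplicative inverse mod p.
Thus (p-1)!*x == 1 mod p has a unique solution x; multiply through by x to get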
(p-1)!*x == a^(p-1)*(p-1)!*x mod p
or
1 == a^(p-1)*1 mod p
as desired. This completes the proof of Fermat's Little Theorem.
You can show even more, that (p-1)! == -1 mod p (Wilson's Theorem)
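Both statements can be spot-checked numerically for small primes. A sketch with illustrative function names; the list of primes is arbitrary.

```python
from math import factorial


def fermat_holds(p: int) -> bool:
    """True if a^(p-1) == 1 mod p for every 1 <= a < p."""
    return all(pow(a, p - 1, p) == 1 for a in range(1, p))


def wilson_holds(p: int) -> bool:
    """True if (p-1)! == -1 mod p, i.e. the remainder is p-1."""
    return factorial(p - 1) % p == p - 1


for p in (2, 3, 5, 7, 11, 13, 101):
    print(p, fermat_holds(p), wilson_holds(p))
```

A check like this is evidence, not a proof; the proof is the argument above.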
For RSA to be useful, we need to find a lot of large primes.
It turns out that there are so many primes, you can just
pick numbers randomly and test if they are prime;
there are enough primes that chances are you won't have
to test too many random numbers before finding one.
Def: pi(n) = the number of primes <= n
Ex: pi(20) = |{2,3,5,7,11,13,17,19}| = 8
Theorem (Prime Number Theorem): The limit as n -> infinity of
pi(n) / (n/ log_e n) = 1
EX:     n      pi(n)    n/log_e(n)   pi(n)/(n/log_e n)
     10^1          4          4.3          .92
     10^2         25         21.7         1.15
     10^3        168        144.8         1.16
     10^4       1229       1085.7         1.13
     10^5       9592       8685.9         1.10
     10^6      78498      72382.4         1.08
     10^7     664579     620420.7         1.07
     10^8    5761455    5428681.0         1.06
The point is that the ratio in the last column is slowly approaching 1
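The first rows of the table can be regenerated with a sieve of Eratosthenes. This is a sketch that stops at 10^6 to keep the run time short; the sieve is a choice made here, and the notes do not say how the table was computed.

```python
from math import isqrt, log


def pi(n: int) -> int:
    """Number of primes <= n, counted with a sieve of Eratosthenes."""
    if n < 2:
        return 0
    sieve = bytearray([1]) * (n + 1)     # sieve[k] == 1 means "k may be prime"
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... (smaller multiples are already gone).
            sieve[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return sum(sieve)


for k in range(1, 7):
    n = 10 ** k
    estimate = n / log(n)
    print(f"10^{k} {pi(n):>8} {estimate:>10.1f} {pi(n) / estimate:.2f}")
```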
So about what fraction of 200 decimal digit numbers are prime?
# 200 digit primes / # 200 digit numbers
= ( pi(10^200) - pi(10^199) ) / (10^200 - 10^199 )
~ ( 10^200/log_e(10^200) - 10^199/log_e(10^199) ) / (10^200 - 10^199)
~ .002 or about 1 out of 500
So if you pick 500 random 200 digit numbers,
there is a reasonable chance that one is prime.
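The estimate above can be evaluated directly. In this sketch the numerator and denominator are both divided by 10^(digits-1) so the floating-point numbers stay small; estimated_prime_fraction is an illustrative name.

```python
from math import log


def estimated_prime_fraction(digits: int) -> float:
    """Prime Number Theorem estimate of the fraction of `digits`-digit numbers that are prime."""
    ln10 = log(10)
    # pi(10^k) - pi(10^(k-1)) ~ 10^(k-1) * ( 10/(k ln 10) - 1/((k-1) ln 10) )
    # 10^k - 10^(k-1)         =  9 * 10^(k-1)
    return (10 / (digits * ln10) - 1 / ((digits - 1) * ln10)) / 9


fraction = estimated_prime_fraction(200)
print(fraction, 1 / fraction)
```

The result is close to 1/log_e(10^200), which is where the rough figure of 1 in 500 comes from.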
But we still need a quick test that a particular number is
prime. We already said that trying to factor big numbers
is too expensive (which is why we can use RSA safely in
the first place!), so we need something cheaper.
It turns out that Fermat's Little Theorem tells us (almost)
all we need: any prime p satisfies a^(p-1) == 1 mod p, which
is cheap to test for some randomly chosen a not divisible by p;
if a^(p-1) is not == 1 mod p, we are sure p is not prime.
But if a^(p-1) == 1 mod p for enough randomly chosen a, we have
strong evidence that p is prime. This is not quite enough
(there is a set of nonprimes, called "Carmichael numbers", that
pass this test for every a relatively prime to them), but the
test can be improved to identify primes reliably.
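A minimal sketch of the test just described, with illustrative names and an arbitrary default of 20 rounds. It is the plain Fermat test, not the improved test the notes allude to, so Carmichael numbers such as 561 = 3*11*17 can still fool it.

```python
import random
from math import gcd


def fermat_test(n: int, bases) -> bool:
    """False if some base a shows a^(n-1) != 1 mod n (n is surely composite); True otherwise."""
    return all(pow(a, n - 1, n) == 1 for a in bases)


def probably_prime(n: int, rounds: int = 20) -> bool:
    """Fermat test with randomly chosen bases 2 <= a <= n-2."""
    if n < 4:
        return n in (2, 3)
    bases = [random.randrange(2, n - 1) for _ in range(rounds)]
    return fermat_test(n, bases)


print(probably_prime(97))
# 561 is composite, yet it passes for every base relatively prime to it.
coprime_bases = [a for a in range(2, 561) if gcd(a, 561) == 1]
print(fermat_test(561, coprime_bases))
```

A composite n is caught as soon as one base fails, which is why a single failure is conclusive while repeated passes are only evidence.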