Math 55 - Fall 2007 - Lecture notes # 15 - Oct 3 (Wednesday) Finish through end of Section 3.7 (not 3.8) Goals for today: Recall Euclidean algorithm for the gcd Use it to solve a "congruence equation" a*x=b mod m for x how to do division in modular arithmetic Use it to solve a system of congruence equations: Chinese Remainder Theorem Apply it to cyptography Recall property of Euclidean algorithm for gcd: given a and m, it computes 1) d = gcd(a,m) 2) integers s and t such that a*s+m*t=d How to solve a*x == 1 mod m for x: (analogy of reciprocal of a in modular arithmetic) Theorem: a*x ==1 mod m can be solved for x if and only if gcd(a,m)=1. When it can be solved, x is unique mod m, i.e. the only one in the range 0 to m-1, and is called "the inverse of a modulo m". EX: Solve 2*x==1 mod 5: try x=0,1,2,3,4, getting 2*x=0,2,4,1,3, so x=3 is the unique answer (gcd(2,5)=1) EX: Solve 2*x==1 mod 4: try x=0,1,2,3, getting 2*x=0,2,0,2, so there is no solution (gcd(2,4)=2) Proof: If gcd(a,m)=1, we have to show that we can solve for x: Use the Euclidean algorithm to find s and t such that a*s+m*t=1. Thus a*s = 1-m*t == 1 mod m, so x=s is a solution. If gcd(a,m) /= 1, we have to show that no x satisfies a*x==1 mod m: Recall that a*x == 1 mod m is equivalent to a*x mod m = 1, and that a*x mod m = a*x+m*t for some t. But if gcd(a,m)=d>1, then d|a and d|m, so d|(a*x+m*t) for any integer t, and in particular d|(a*x mod m). Since d does not divide 1, a*x mod m /= 1. To show that the solution x is unique mod m when it exists, suppose both that a*x1 == 1 mod m and a*x2 == 1 mod m, and that 1 <= x1 < m and 1 <= x2 < m; we have to show that x1=x2. Now a*x1-a*x2 == 0 mod m, so m|(a*(x1-x2)). Since gcd(a,m)=1, a and m have no common factors, and thus m|(x1-x2). Now x1-x2 satisfies two properties: 1) m|(x1-x2), so x1-x2 is in the set {..., -2*m,-m,0,m,2*m,...} 2) -m < x1-x2 < m, since 1 <= x1 < m and 1 <= x2 < m; The only value of x1-x2 satisfying these properties is x1-x2=0, or x1=x2 as desired. Corollary: a*y == b mod m has a solution y for any b if and only if gcd(a,m) == 1 (analogy of dividing b/a in modular arithmetic) proof: if gcd(a,m)=1, then the Theorem says we can solve a*x == 1 mod m. Multiply through by b to get a*(x*b) == b mod m, so we can take y = x*b (b times "inverse of a") If gcd(a,m)>1, then the Theorem tells us we cannot solve when b=1. ASK&WAIT: under what conditions on b can we solve a*y == b mod m for y? Chinese Remainder Theorem: Let m1, m2,..., mn be pairwise relatively prime numbers, ie gcd(mi,mj)=1 for all i and j. Let m = m1*m2*...*mn. Then the n equations x == a1 mod m1 , x == a2 mod m2, ... , x == an mod mn have a unique solution mod m for any a1, a2,...,an, i.e. there is only one solution in the range from 0 to m-1. EX: x == 2 mod 3, x == 3 mod 5 x x==2 mod 3 ? x==3 mod 5? 0 1 2 Yes 3 Yes 4 5 Yes 6 7 8 Yes Yes x=8 is unique solution mod 3*5=15 9 10 11 Yes 12 13 Yes 14 Yes Proof: We give an algorithm for computing x, and leave uniqueness to homework. Let Mi = m/mi, for i=1,...,n. Thus Mi = product of all mj except for mi, so gcd(Mi,mi)=1, since mj and mi have no common factors. By the last theorem, each Mi has an inverse yi mod mi, i.e. Mi*yi == 1 mod mi. We claim a solution is x = a1*M1*y1 + a2*M2*y2 + ... + an*Mn*yn. To confirm this we have to verify that x == ai mod mi for all i: x == (a1*M1*y1 + a2*M2*y2 + ... + an*Mn*yn) mod mi == ( a1*M1*y1 mod mi + ... + an*Mn*yn mod mi ) mod mi == ( ai*Mi*yi mod mi + sum_{j /= i} aj*Mj*yj ) mod mi == ai mod mi since Mi*yi == 1 mod mi + 0 since mi | Mj when j /= i == ai mod mi as desired EX: x == 2 mod 3, x == 3 mod 5 again: a1=2, m1=3, a2=3, m2=5, M1=5, M2=3, y1=2 since 2*5==1 mod 3 and y2=2 since 2*3==1 mod 5 x = 2*5*2 + 3*3*2 = 38 == 8 mod 15 Cryptography Recall that a message (character string) is converted to a number M What happens when a Sender wants to send a secret message to a Receiver: The Sender takes message M and encrypts it to get the encrypted message C = f_enc(M) The Sender sends C to the Receiver. Anyone may "intercept" C on its way. The Receiver decrypts C to get the original message M = f_dec(C). For this to work as the Sender and Receiver desire: f_enc and f_dec have to be one-to-one, onto functions and be inverses of one another, i.e. M = f_dec(f_enc(M)) for all M It is easy for the Sender to evaluate f_enc It is easy for the Receiver to evaluate f_dec It is very hard for anyone other than the Receiver to evaluate f_dec. The harder it is, the better the secrecy. Two kinds of cryptography: Private key (traditional): need one "Key" for both f_enc and f_dec where K=Key is a shared secret between Sender, Receiver EX: shift: C = f_enc(M) = M-K mod n, M = f_dec(C) = C+K mod n, easy to break ASK&WAIT: How? EX: xor: C = f_enc(M) = M xor K (thinking of M, C, K as bit strings of the same length) M = f_dec(C) = C xor K ASK&WAIT: Why are f_enc and f_dec inverses? hard to break if K used once EX: Original Washington/Moscow hotline worked this way EX: crypt command in UNIX, uses algorithm from German Enigma machine used in World War II, which was broken by Turing Secrecy depends on keeping K a secret known only to Sender, Receiver so only they can evaluate f_enc and f_dec Disadvantage: if 1000 people want to talk to one another in secret, need 999*1000 secret keys, so all pairs can talk; too many keys! Public key: any Sender can do f_enc, but only one Receiver can do f_dec Advantage: for 1000 people to talk in secret, each person has his/her own secret f_dec, but can just publish the corresponding f_enc EX: RSA (Rivest/Shamir/Adleman) Need: 1) large number n that is product of two large primes p*q=n large means 200 to 400 decimal digits 2) integer e that is relatively prime to (p-1)*(q-1) 3) integer d = inverse of e mod (p-1)*(q-1) Everyone knows n and e, but only Receiver knows d Then for message M, C = f_enc(M) = M^e mod n is the encryted message For encrypted message C, M = f_dec(C) = C^d mod n is the decrypted message EX: Try 2537=n=p*q=43*59, e=13, message = STOP = (ST,OP)=(1819,1415) using position of letters in alphabet. Then encrypted message = ( 1819^13 mod 2537 , 1415^13 mod 2537 ) = ( 2081, 2182 ). To decrypt we use d = 937 and compute ( 2081^937 mod 2537 , 2182^937 mod 2537 ) = (1819,1415) We will show that f_enc and f_dec are inverses of one another shortly. But first, why is f_enc() easy and f_dec() hard to evaluate? f_enc() requires multiplying by M and taking the remainder mod n, both of which are easy, even if M and n are large. f_dec() equally easy if we know d, which only the Receiver knows. Why is it hard to figure out d? All you have to do is 1) factor n=p*q 2) use Euclidean algorithm to compute d so d*e ==1 mod (p-1)*(q-1) But 1) is very hard: Best algorithms would take billions of years if n has 400 digits. And any other known algorithm to compute d leads to computing p and q too. So quality of encryption depends on large integers being very hard to factor. If you figure out an algorithm to factor quickly, you can become rich or famous. Proof that f_dec() is inverse of f_enc requires Fermat's Little Theorem (proof later) If p is prime and p /| a, then a^(p-1) == 1 mod p Proof that f_dec(f_enc(M)) = M, where M < p,q f_dec(f_enc(M)) = f_dec(M^e mod n) = (M^e)^d mod n = M^(e*d) mod n. We need to show that M^(e*d) mod n = M mod n = M, since M < p*q = n. Now e*d == 1 mod (p-1)*(q-1) so e*d = 1+m*(p-1)*(q-1) for some m. Then M^(e*d) mod n = M^(1 + m*(p-1)*(q-1)) mod n = M * M^(m*(p-1)*(q-1)) mod n Now since M < p and M < q, and p and q are prime, we must have gcd(M,p) = gcd(M,q) = 1. Then Fermat's Little Theorem implies that M^(p-1) == 1 mod p and M^(q-1) == 1 mod q. Thus M^(e*d) = M * (M^(p-1))^(m*(q-1)) == M * (1)^(m*(q-1)) mod p == M mod p and M^(e*d) = M * (M^(q-1))^(m*(p-1)) == M * (1)^(m*(p-1)) mod q == M mod q. Finally, by the Chinese Remainder Theorem, M^(e*d) is the unique solution mod p*q to x == M mod p x == M mod q so M^(e*d) mod n = M as desired. For RSA to be useful, we need to find a lot of large primes. It turns out that there are so many primes, you can just pick numbers randomly and test if they are prime; there are enough primes that chances are you won't have to test too many random numbers before finding one. Def: pi(n) = the number of primes <= n Ex: pi(20) = |{2,3,5,7,11,13,17,19}| = 8 Theorem (Prime Number Theorem): The limit as n -> infinity of pi(n) / (n/ log_e n) = 1 EX: n pi(n) n/log_e(n) pi(n)/ (n/log_e n) 10^1 4 4.3 .92 10^2 25 21.7 1.15 10^3 168 144.8 1.16 10^4 1229 1085.7 1.13 10^5 9592 8685.9 1.10 10^6 78498 72382.4 1.08 10^7 664579 620420.7 1.07 10^8 5761455 5428681.0 1.06 The point is that the ratio in the last column is slowly approaching 1 So about what fraction of 200 decimal digit numbers are prime? # 200 digit primes / # 200 digit numbers = ( pi(10^200) - pi(10^199) ) / (10^200 - 10^199 ) ~ ( 10^200/log_e(10^200) - 10^199/log_e(10^199) ) / (10^200 - 10^199) ~ .002 or about 1 out of 500 So if you pick 500 random 200 digit numbers, there is a reasonable chance that one is prime. But we still need a quick test that a particular number is prime. We already said that trying to factor big numbers is too expensive (which is why we can use RSA safely in the first place!), so we need something cheaper. It turns out that Fermat's Little Theorem tells us (almost) all we need: a prime p satisfies a^(p-1) == 1 mod p, which is cheap to test for some randomly chosen a not divisible by p; if a^(p-1) is not == 1 mod p, we are sure p is not prime. But if a^(p-1) == 1 mod p for enough randomly chosen a, we have strong evidence that p is prime. This is not quite enough (there is a set of nonprimes, called "Carmichael numbers", that pass this test), but the test can be improved to identify primes reliably. To finish cryptography, we need a proof of Fermat's Little Theorem: Thm: IF p is prime and p \| a, then a^(p-1) == 1 mod p Here are some "numerical experiments" to devise proof conjecture: consider integers 1 <= i < p, for some prime p, say p=7. Try multiplying them by any integer mod p, see what you get: 1 2 3 4 5 6 *2 mod 7 => 2 4 6 3 5 7 *3 mod 7 => 3 6 2 5 1 4 *4 mod 7 => 4 1 5 2 6 3 *5 mod 7 => 5 3 1 6 4 2 *6 mod 7 => 6 5 4 3 2 1 ASK&WAIT: What is the pattern? Can see same pattern for any prime p Conjecture (proven shortly): given any prime p and any 1 <= a < p, the numbers a*1 mod p, a*2 mod p , ... a*(p-1) mod p are all different, i.e. just a permutatation of 1,...,p-1 Now take their product: (p-1)! = (a*1) mod p * (a*2) mod p *...*(a*(p-1)) mod p or (p-1)! == (a*1*a*2*...a*(p-1)) mod p == a^(p-1) (p-1)! mod p Suppose we could "divide by" (p-1)!; would get 1 == a^(p-1) mod p as desired Now let's do proof carefully: Proof of Conjecture: suppose 1 <= x,y < p , x \= y so -(p-1) <= x-y <= p-1, x \= y so p \| x-y so p \| a*(x-y) so a*x mod p \= a*y mod p In other words, a*1 mod p, a_2 mod p , ... , a*(p-1) mod p all different as conjectured. So now we have (p-1)! == a^(p-1)*(p-1)! mod p, and want to conclude 1 == a^(p-1) mod p ASK&WAIT: What did we prove last time that lets us do this? Thus (p-1)!*x == 1 mod p has unique solution, multiply through to get (p-1)!*x == a^(p-1)*(p-1)!*x mod p or 1 == a^(p-1)*1 mod p as desired For homework, you will show more, that (p-1)! == -1 mod p (Wilson's Theorem)