CS70 - Lecture 11 - Feb 11,  2011 - 10 Evans


Goal for Note 6:

   Cryptography: when you type your password into a website,

     why can't anyone else who might see it (as it goes by on the

     network) decode it?


     Any message (character string) is converted to a number M

     (a long message is converted to a sequence of numbers).

     What happens when a Sender wants to send a secret message to a Receiver:

       The Sender takes message M and encrypts it to get the

            encrypted message C = f_enc(M)

       The Sender sends C to the Receiver. Anyone may "intercept"  C on its way.

       The Receiver decrypts C to get the original message M = f_dec(C).


     For this to work as the Sender and Receiver desire:


       f_enc and f_dec have to be inverses of one another, i.e. 

            M = f_dec(f_enc(M)) for all M

       It is easy for the Sender to evaluate f_enc

       It is easy for the Receiver to evaluate f_dec

       It is very hard for anyone other than the Receiver to evaluate  f_dec.  

            The harder it is, the better the secrecy.

         

     Two kinds of cryptography:

       Private key (traditional): need one "Key" for both f_enc and f_dec

         where K=Key is a shared secret between Sender, Receiver

  EX: xor:  C = f_enc(M) = M xor K 

                                think of M, C, K as bit strings of the same length

                  M = f_dec(C) = C xor K 

ASK&WAIT: Why are f_enc and f_dec inverses?

      impossible to break if K is random and used only once (a "one-time pad")
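
      The xor example can be sketched in a few lines of Python. This is
      only an illustration: the helper name xor_bytes and the use of the
      secrets module to pick K are assumptions, not part of the notes.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(data) != len(key):
        raise ValueError("message and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"STOP"                          # M
key = secrets.token_bytes(len(message))    # K: random, same length as M, used once
ciphertext = xor_bytes(message, key)       # C = f_enc(M) = M xor K
recovered = xor_bytes(ciphertext, key)     # f_dec(C) = C xor K = M xor K xor K = M
```

      Decryption undoes encryption because K xor K is all zeros, and
      xor with zeros leaves M unchanged.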

  EX: Original Washington/Moscow hotline worked this way

  EX: crypt command in UNIX, uses algorithm from German Enigma machine

      used in World War II, which was broken by Turing


  Secrecy depends on keeping K a secret known only to Sender, Receiver

  so only they can evaluate f_enc and f_dec

  Disadvantage: if 1000 people want to talk to one another in secret,

  need 1000*999/2 = 499,500 secret keys (one per pair), so all pairs can talk; too many keys!


  Public key: any Sender can do f_enc, but only one Receiver can do f_dec

  Advantage: for 1000 people to talk in secret, each person has his/her

       own secret f_dec, but can just publish the corresponding f_enc

  EX: RSA (Rivest/Shamir/Adleman)

    Need: 1) large number n that is product of two different large primes p*q=n

            large means at least 200 to 400 decimal digits

            We will assume our message satisfies 0 <= M < n; 

            longer messages can be broken into smaller parts and sent separately.

          2) integer e that is relatively prime to (p-1)*(q-1)

          3) integer d = multiplicative inverse of e mod (p-1)*(q-1)

    Everyone knows n and e, but only Receiver  knows d

   Then for message M, C = f_enc(M) = M^e mod n is the encrypted message

   For encrypted message C, M = f_dec(C) = C^d mod n is the decrypted message

  EX: Try 2537=n=p*q=43*59, e=13, message = STOP = (ST,OP)=(1819,1415)

      using position of letters in alphabet (A=00, B=01, ..., Z=25).  Then encrypted message 

      = ( 1819^13 mod 2537 , 1415^13 mod 2537 ) = ( 2081, 2182 ).

      To decrypt we use d = 937 and compute

      ( 2081^937 mod 2537 , 2182^937 mod 2537 ) = (1819,1415)
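
      The STOP example can be replayed with Python's built-in
      three-argument pow. The helper text_to_blocks is a hypothetical
      name introduced here; it assumes the A=00, ..., Z=25 encoding with
      two letters per block.

```python
p, q = 43, 59
n = p * q                      # 2537
e = 13                         # public exponent
d = 937                        # private exponent from the notes
phi = (p - 1) * (q - 1)        # 2436

def text_to_blocks(text):
    """Encode capital letters as A=00..Z=25 and pack two letters per block."""
    nums = [ord(c) - ord("A") for c in text]
    return [100 * a + b for a, b in zip(nums[0::2], nums[1::2])]

blocks = text_to_blocks("STOP")              # (ST, OP) as numbers
cipher = [pow(M, e, n) for M in blocks]      # C = M^e mod n
plain = [pow(C, d, n) for C in cipher]       # M = C^d mod n
print(blocks, cipher, plain)
```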


  We will show that f_enc and f_dec are inverses of one another shortly.

  But first, why is f_enc() easy and f_dec() hard to evaluate?

    f_enc() requires repeatedly multiplying by M and taking the remainder mod n,

      both of which are easy, even if M and n are large.

    f_dec() equally easy if we know d, which only the Receiver knows.
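
      One common way to keep the number of multiplications small is
      repeated squaring. The notes do not say which method is intended,
      so the sketch below is an assumption; power_mod is a name made up
      for this illustration (Python's built-in pow(M, e, n) does the
      same job).

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                       # lowest bit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus         # square for the next bit
        exponent >>= 1
    return result

print(power_mod(1819, 13, 2537))
```

      The loop runs once per bit of the exponent, so even a 400-digit
      exponent needs only a few thousand multiplications mod n.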

      Why is it hard to figure out d? All you have to do is

       1) factor n=p*q

       2) use Euclidean algorithm to compute d so d*e ==1 mod (p-1)*(q-1)

     But 1) is very hard: the best known algorithms would take billions of years

     if n has 400 digits. And any other known algorithm to compute d 

     leads to computing p and q too. So quality of encryption depends on

     large integers being very hard to factor. If you figure out an

     algorithm to factor quickly, you can become rich or famous.
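
     Step 2) is cheap once p and q are known. A minimal sketch of the
     extended Euclidean algorithm follows; the function names
     extended_gcd and private_exponent are assumptions made for this
     example.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

def private_exponent(e, p, q):
    """Compute d with d*e == 1 mod (p-1)*(q-1), given the factors p and q."""
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e must be relatively prime to (p-1)*(q-1)")
    return x % phi                 # reduce to the range 0..phi-1

print(private_exponent(13, 43, 59))
```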

  

   Proof that f_dec() is inverse of f_enc requires

   Fermat's Little Theorem (proof later)

       If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p


   Corollary: If p is prime then for any positive integers a and b,  a^(1+b*(p-1)) == a mod p

   Proof: 

      Case 1: If p divides a, then a^(1+b*(p-1)) == 0 == a mod p

      Case 2: if p does not divide a, then by Fermat's Little Theorem

          a^(1+b*(p-1)) == a*(a^(p-1))^b == a*1^b == a mod p


   Proof that f_dec(f_enc(M)) = M, where 0 <= M < p*q:

   f_dec(f_enc(M)) = f_dec(M^e mod n) = (M^e)^d mod n = M^(e*d) mod n.

   Since e*d == 1 mod (p-1)*(q-1), we can write e*d = 1 + m*(p-1)*(q-1) for some m. 

   Then by using the Corollary twice (with b = m*(q-1) for p, and b = m*(p-1) for q) we get

       M^(e*d) = M^(1 + m*(p-1)*(q-1))  ==  M mod p

       M^(e*d) = M^(1 + m*(p-1)*(q-1))  ==  M mod q

   Thus both p and q divide (M - M^(e*d)), and since they are

   different primes, their product n = p*q also divides (M-M^(e*d)), i.e.

       M^(e*d) == M mod n

   or M = M^(e*d) mod n as desired. 
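
   For the small example n = 2537 the claim can also be checked by
   brute force over every possible message. This is a sanity check
   under the parameters from the STOP example, not a substitute for the
   proof.

```python
p, q, e, d = 43, 59, 13, 937
n = p * q

# Round trip f_dec(f_enc(M)) for every M with 0 <= M < n,
# including the M that are divisible by p or by q.
round_trip_ok = all(pow(pow(M, e, n), d, n) == M for M in range(n))

# The two congruences used in the proof, checked separately.
mod_p_ok = all(pow(M, e * d, p) == M % p for M in range(n))
mod_q_ok = all(pow(M, e * d, q) == M % q for M in range(n))
print(round_trip_ok, mod_p_ok, mod_q_ok)
```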

                    

   To finish cryptography, we need a proof of Fermat's Little Theorem:

   Thm: If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p

   

   Here are some "numerical experiments" to suggest a conjecture that leads to a proof:

    

      consider integers 1 <= i < p, for some prime p, say p=7.

      Try multiplying them by any integer mod p, see what you get:

                   1 2 3 4 5 6

       *2 mod 7 => 2 4 6 1 3 5

       *3 mod 7 => 3 6 2 5 1 4

       *4 mod 7 => 4 1 5 2 6 3

       *5 mod 7 => 5 3 1 6 4 2

       *6 mod 7 => 6 5 4 3 2 1

ASK&WAIT:  What is the pattern?

       Can see same pattern for any prime p

    Conjecture (proven shortly): given any prime p and any 1 <= a < p,

      the numbers a*1 mod p, a*2 mod p , ... a*(p-1) mod p are 

      all different, i.e. just a permutation of 1,...,p-1
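
      The table for p = 7 can be regenerated, and the conjecture tested
      for that p, with a short script. The helper name multiples_mod is
      an assumption made for this sketch.

```python
def multiples_mod(a, p):
    """Return [a*1 mod p, a*2 mod p, ..., a*(p-1) mod p]."""
    return [(a * i) % p for i in range(1, p)]

p = 7
for a in range(1, p):
    print(f"*{a} mod {p} =>", *multiples_mod(a, p))

# Each row should be a rearrangement of 1, ..., p-1.
is_permutation = all(
    sorted(multiples_mod(a, p)) == list(range(1, p)) for a in range(1, p)
)
print(is_permutation)
```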


    Now take their product:

      (p-1)! = ((a*1) mod p) * ((a*2) mod p) * ... * ((a*(p-1)) mod p)

    or

      (p-1)! == (a*1)*(a*2)*...*(a*(p-1)) mod p

             == a^(p-1) * (p-1)! mod p


    Suppose we could "divide by" (p-1)!; 

    would get 1 == a^(p-1) mod p as desired


    Now let's do proof carefully:

    Proof of Conjecture: suppose 1 <= x,y < p , x neq y

             so 0 < |x-y| < p

             so p does not divide x-y

             so p does not divide a*(x-y)  (p is prime and divides neither a nor x-y)

             so a*x mod p neq a*y mod p

       In other words, a*1 mod p, a*2 mod p , ... , a*(p-1) mod p 

       all different as conjectured.


   So now we have (p-1)! == a^(p-1)*(p-1)! mod p, and want

   to conclude 1 == a^(p-1) mod p

ASK&WAIT: What did we prove last time that lets us do this?

   Since p is prime, it divides none of 1,...,p-1 and hence not (p-1)!, so gcd((p-1)!,p) = 1.

   Thus (p-1)!*x == 1 mod p has unique solution, multiply through to get

   (p-1)!*x == a^(p-1)*(p-1)!*x mod p 

     or

          1 == a^(p-1)*1 mod p

     as desired. This completes the proof of Fermat's Little Theorem.
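
     The steps of the proof can be traced numerically for small primes.
     The function name check_flt_steps is hypothetical and the script
     only spot-checks a few values; it is not a proof.

```python
from math import factorial

def check_flt_steps(a, p):
    """Check the three steps of the proof for one a with 1 <= a < p, p prime."""
    fact = factorial(p - 1) % p
    product = 1
    for i in range(1, p):
        product = (product * ((a * i) % p)) % p
    # Step 1: the permuted residues multiply to (p-1)! mod p.
    step1 = product == fact
    # Step 2: the same product equals a^(p-1) * (p-1)! mod p.
    step2 = product == (pow(a, p - 1, p) * fact) % p
    # Step 3: after cancelling (p-1)!, a^(p-1) == 1 mod p.
    step3 = pow(a, p - 1, p) == 1
    return step1 and step2 and step3

all_ok = all(check_flt_steps(a, p) for p in (5, 7, 11, 13) for a in range(1, p))
print(all_ok)
```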


   You can show even more, that  (p-1)! == -1 mod p (Wilson's Theorem)
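
   A quick numerical look at Wilson's Theorem for a few small numbers;
   wilson_holds is a name chosen for this sketch, and -1 mod p is
   represented as the remainder p-1.

```python
from math import factorial

def wilson_holds(n):
    """Check whether (n-1)! == -1 mod n, i.e. the remainder equals n-1."""
    return factorial(n - 1) % n == n - 1

primes_passing = [p for p in (2, 3, 5, 7, 11, 13, 17, 19) if wilson_holds(p)]
composites_passing = [n for n in (4, 6, 8, 9, 10, 12) if wilson_holds(n)]
print(primes_passing, composites_passing)
```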


   For RSA to be useful, we need to find a lot of large primes.

   It turns out that there are so many primes, you can just

   pick numbers randomly and test if they are prime; 

   there are enough primes that chances are you won't have

   to test too many random numbers before finding one.


   Def: pi(n) = the number of primes <= n

   Ex:  pi(20) = |{2,3,5,7,11,13,17,19}| = 8

   Theorem (Prime Number Theorem): The limit as n -> infinity of

      pi(n) / (n/ log_e n) = 1


   EX:   n     pi(n)   n/log_e(n)    pi(n)/ (n/log_e n)

        10^1       4       4.3        .92

        10^2      25      21.7       1.15

        10^3     168     144.8       1.16

        10^4    1229    1085.7       1.13

        10^5    9592    8685.9       1.10

        10^6   78498   72382.4       1.08

        10^7  664579  620420.7       1.07

        10^8 5761455 5428681.0       1.06

   The point is that the ratio in the last column is slowly approaching 1
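
   The first rows of the table can be reproduced with a sieve of
   Eratosthenes. The sieve is an assumption about how one might count
   primes here (the notes do not say how the table was computed), and
   prime_count is a hypothetical helper name.

```python
from math import log

def prime_count(limit):
    """Return pi(limit), the number of primes <= limit, using a sieve."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    i = 2
    while i * i <= limit:
        if is_prime[i]:
            # Cross out i*i, i*i + i, ... in one slice assignment.
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
        i += 1
    return sum(is_prime)

for k in range(1, 7):
    n = 10 ** k
    count = prime_count(n)
    estimate = n / log(n)          # natural logarithm, log_e
    print(f"10^{k} {count:8d} {estimate:12.1f} {count / estimate:6.2f}")
```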


   So about what fraction of 200 decimal digit numbers are prime?

     # 200 digit primes / # 200 digit numbers

  =  ( pi(10^200) - pi(10^199) ) / (10^200 - 10^199 )

  ~  ( 10^200/log_e(10^200) - 10^199/log_e(10^199) ) / (10^200 - 10^199)

  ~  .002 or about 1 out of 500

    So if you pick 500 random 200 digit numbers, 

    there is a reasonable chance that one is prime.
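
    The arithmetic behind the estimate can be spelled out as follows.
    The function name prime_fraction is an assumption, and the last
    line assumes the 500 picks are independent, which the notes do not
    state explicitly.

```python
from math import log

def prime_fraction(digits):
    """Estimate the fraction of `digits`-digit numbers that are prime via n/log_e(n)."""
    k = digits
    # Work in units of 10^(k-1) so no huge numbers are needed.
    upper = 10 / (k * log(10))           # 10^k / log_e(10^k)
    lower = 1 / ((k - 1) * log(10))      # 10^(k-1) / log_e(10^(k-1))
    return (upper - lower) / 9           # 10^k - 10^(k-1) is 9 units

fraction = prime_fraction(200)                 # roughly .002
chance_in_500 = 1 - (1 - fraction) ** 500      # chance at least one of 500 picks is prime
print(fraction, 1 / fraction, chance_in_500)
```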


    But we still need a quick test that a particular number is

    prime. We already said that trying to factor big numbers

    is too expensive (which is why we can use RSA safely in

    the first place!), so we need something cheaper. 


    It turns out that Fermat's Little Theorem tells us (almost)

    all we need: any prime p satisfies  a^(p-1) == 1 mod p, which

    is cheap to test for some randomly chosen a not divisible by p;

    if a^(p-1) is not == 1 mod p, we are sure p is not prime.

    But if a^(p-1) == 1 mod p for enough randomly chosen a, we have

    strong evidence that p is prime. This is not quite enough

    (there is a set of nonprimes, called "Carmichael numbers", that

    pass this test for every a relatively prime to them), but the test

    can be improved to identify primes reliably.
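
    A minimal sketch of the test based on Fermat's Little Theorem is
    below. The name fermat_test and the default of 20 trials are
    assumptions chosen for illustration; 561 = 3*11*17 is the smallest
    Carmichael number and shows where the plain test falls short.

```python
import random
from math import gcd

def fermat_test(n, trials=20):
    """Return False if n is certainly not prime, True if n passes every trial."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)     # random a with 2 <= a <= n-2
        if pow(a, n - 1, n) != 1:
            return False                   # a proves that n is not prime
    return True                            # strong evidence, not a proof

# Every a relatively prime to 561 satisfies a^560 == 1 mod 561,
# even though 561 is not prime.
carmichael_passes = all(
    pow(a, 560, 561) == 1 for a in range(1, 561) if gcd(a, 561) == 1
)
print(fermat_test(97), fermat_test(2537), carmichael_passes)
```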