Math 55 - Spring 2004 - Lecture notes # 9 - Feb 19 (Thursday)

   Finish through end of Section 2.6 (not 2.7) if not yet done
   Start reading Sections 3.1 - 3.4

   Homework, due Feb 25
        (1) Let a = a4a3a2a1a0 be a 5-bit 2s complement integer; 
            each ai is 0 or 1. Similarly let b = b4b3b2b1b0, and
            let s = a+b = s4s3s2s1s0 be their sum in 
            2s complement arithmetic.
            Interpreting a 0 bit as False and a 1 bit as True, write down
            logical formulas for s4,s3,s2,s1,s0 using and, or, not, xor, 
            with the inputs a4,a3,a2,a1,a0, b4,b3,b2,b1,b0.
            Hint: introduce new logical variables (bits) c4,c3,c2,c1,c0
            where ci is the carry into the i-th bit from the i-1-st bit.
            Your logical formulas for si and c(i+1) in terms of ai, bi 
            and ci should look the same for all i 
            (you can let c0 = 0 = False so
            that all the formulas look the same). 
            In particular, your formulas should express the facts that 
            c(i+1)=1 if the sum ai + bi + ci is at least 2, and 
            si = 1 if ai + bi + ci is 1 or 3 (i.e. odd).
            Computers implement these formulas in hardware to perform 
            2s complement addition.
        (2) Explain how to use nearly the same logical formulas as above 
            to compute the difference d = a-b
        (3) The purpose of this question is to illustrate that 
            there are a lot of primes.
            (a) Let n and d be integers, and x = n*10^d
                Use the prime number theorem to evaluate the 
                limit as d -> infinity of pi(x + 10^d) / pi(x) 
                where pi(x) is the number of primes less than or equal to x.
                (in other words, n is fixed and d is growing)
            (b) Use part(a) to show that given any arbitrary string of
                decimal digits (representing the integer n), 
                then for all sufficiently large M, there is always a prime 
                p such that
                   1. p has an M-digit decimal expansion, and
                   2. p's decimal expansion starts with the given string
                      (representing n).
        2.4-46
        2.5-18,36,38,40
        2.6-20,24

   Goals for today: Recall Euclidean algorithm for the gcd
                    Use it to solve a "congruence equation" 
                         a*x=b mod m for x
                         how to do division in modular arithmetic
                    Use it to solve a system of congruence equations:
                         Chinese Remainder Theorem
                    Apply it to cyptography
                    
   Recall property of Euclidean algorithm for gcd: given a and m, 
   it computes
       1) d = gcd(a,m)
       2) integers s and t such that a*s+m*t=d 

   How to solve a*x == 1 mod m for x: 
   (analogy of reciprocal of a in modular arithmetic)
   Theorem: a*x ==1 mod m can be solved for x if and only if gcd(a,m)=1.
          When it can be solved, x is unique mod m, i.e. the only one in
          the range 0 to m-1, and is called "the inverse of a modulo m".
     EX: Solve 2*x==1 mod 5: try x=0,1,2,3,4, 
                       getting 2*x=0,2,4,1,3, 
         so x=3 is the unique answer (gcd(2,5)=1)
     EX: Solve 2*x==1 mod 4: try x=0,1,2,3, 
                       getting 2*x=0,2,0,2, 
         so there is no solution (gcd(2,4)=2)
     Proof: If gcd(a,m)=1, we have to show that we can solve for x:
            Use the Euclidean algorithm to find s and t such that 
            a*s+m*t=1. Thus a*s = 1-m*t == 1 mod m, so x=s is a solution.

            If gcd(a,m) /= 1, we have to show that no x satisfies 
            a*x==1 mod m: Recall that a*x == 1 mod m is equivalent to 
            a*x mod m = 1 mod m = 1, and that a*x mod m = a*x+m*t 
            for some t.  But if gcd(a,m)=d>1, then d|a and d|m, 
            so d|(a*x+m*t) for any integer t, and in particular 
            d|(a*x mod m).  Since d does not divide 1, a*x mod m /= 1.

            To show that the solution x is unique mod m when it exists,
            suppose both that a*x1 == 1 mod m and a*x2 == 1 mod m, and
            that 1 <= x1 < m and 1 <= x2 < m; we have to show that x1=x2. 
            Now a*x1-a*x2 == 0 mod m, so m|(a*(x1-x2)). 
            Since gcd(a,m)=1, a and m have no common factors,
            and thus m|(x1-x2). Now x1-x2 satisfies two properties:
              1) m|(x1-x2), so x1-x2 is in the set 
                  {..., -2*m,-m,0,m,2*m,...}
              2) -m < x1-x2 < m, since 1 <= x1 < m and 1 <= x2 < m;
            The only value of x1-x2 satisfying these properties is 
            x1-x2=0, or x1=x2 as desired.

   Corollary: a*y == b mod m has a solution y for any b if and only if
       gcd(a,m) == 1  (analogy of dividing b/a in modular arithmetic)
       proof: if gcd(a,m)=1, then the Theorem says we can solve 
              a*x == 1 mod m. Multiply through by b to get  
              a*(x*b) == b mod m, so we can take y = x*b 
              (b times "inverse of a") If gcd(a,m)>1, 
              then the Theorem tells us we cannot solve when b=1.

ASK&WAIT: under what conditions on b can we solve a*y == b mod m for y?

   Chinese Remainder Theorem: Let m1, m2,..., mn be pairwise
       relatively prime numbers, ie gcd(mi,mj)=1 for all i and j.
       Let m = m1*m2*...*mn.  Then the n equations  
         x == a1 mod m1 , x == a2 mod m2, ... , x == an mod mn
       have a unique solution mod m for any a1, a2,...,an, i.e. there is
       only one solution in the range from 0 to m-1.
   EX: x == 2 mod 3, x == 3 mod 5 
           x    x==2 mod 3 ?   x==3 mod 5?
           0                                
           1                                
           2        Yes
           3                      Yes       
           4                                
           5        Yes                     
           6                                
           7                                
           8        Yes           Yes  x=8 is unique solution mod 3*5=15
           9                                
          10                                
          11        Yes                     
          12                                
          13                      Yes       
          14        Yes                     

    Proof: We give an algorithm for computing x, and leave uniqueness to
        homework. Let Mi = m/mi, for i=1,...,n. Thus
        Mi = product of all mj except for mi, so gcd(Mi,mi)=1, since
        mj and mi have no common factors. By the last theorem, each
        Mi has an inverse yi mod mi, i.e. Mi*yi == 1 mod mi.
        We claim a solution is x = a1*M1*y1 + a2*M2*y2 + ... + an*Mn*yn.
        To confirm this we have to verify that x == ai mod mi for all i:
        x == (a1*M1*y1 + a2*M2*y2 + ... + an*Mn*yn) mod mi
          == ( a1*M1*y1 mod mi + ... + an*Mn*yn mod mi ) mod mi
          == ( ai*Mi*yi mod mi 
             + sum_{j /= i} aj*Mj*yj ) mod mi
          == ai mod mi       since Mi*yi == 1 mod mi
             + 0             since mi | Mj when j /= i
          == ai mod mi       as desired

   EX: x == 2 mod 3, x == 3 mod 5 again:
       a1=2, m1=3, a2=3, m2=5, M1=5, M2=3, 
       y1=2 since 2*5==1 mod 3   and y2=2 since 2*3==1 mod 5
       x = 2*5*2 + 3*3*2 = 38 == 8 mod 15
                    
  Cryptography
     Recall that a message (character string) is converted to a number M
     What happens when a Sender wants to send a secret message to 
     a Receiver:
       The Sender takes message M and encrypts it to get the
            encrypted message C = f_enc(M)
       The Sender sends C to the Receiver. Anyone may "intercept" 
            C on its way.
       The Receiver decrypts C to get the original message M = f_dec(C).

     For this to work as the Sender and Receiver desire:

       f_enc and f_dec have to be one-to-one, onto functions and be
         inverses of one another, i.e. M = f_dec(f_enc(M)) for all M
       It is easy for the Sender to evaluate f_enc
       It is easy for the Receiver to evaluate f_dec
       It is very hard for anyone other than the Receiver to evaluate 
         f_dec.  The harder it is, the better the secrecy.
         
     Two kinds of cryptography:
       Private key (traditional): need one "Key" for both f_enc and f_dec
         where K=Key is a shared secret between Sender, Receiver
  EX: shift:   C = f_enc(M) = M-K mod n, M = f_dec(C) = C+K mod n,
      easy to break
ASK&WAIT: How?
  EX: xor:  C = f_enc(M) = M xor K (thinking of M, C, K as bit strings
            of the same length)
            M = f_dec(C) = C xor K 
ASK&WAIT: Why are f_enc and f_dec inverses?
      hard to break if K used once
  EX: Original Washington/Moscow hotline worked this way
  EX: crypt command in UNIX, uses algorithm from German Enigma machine
      used in World War II, which was broken by Turing

  Secrecy depends on keeping K a secret known only to Sender, Receiver
  so only they can evaluate f_enc and f_dec
  Disadvantage: if 1000 people want to talk to one another in secret,
  need 999*1000 secret keys, so all pairs can talk; too many keys!

  Public key: any Sender can do f_enc, but only one Receiver can do f_dec
  Advantage: for 1000 people to talk in secret, each person has his/her
       own secret f_dec, but can just publish the corresponding f_enc
  EX: RSA (Rivest/Shamir/Adleman)
    Need: 1) large number n that is product of two large primes p*q=n
            large means 200 to 400 decimal digits
          2) integer e that is relatively prime to (p-1)*(q-1)
          3) integer d = inverse of e mod (p-1)*(q-1)
    Everyone knows n and e, but only Receiver  knows d
   Then for message M, C = f_enc(M) = M^e mod n is the encryted message
      For encrypted message C, M = f_dec(C) = C^d mod n is the decrypted 
          message
  EX: Try 2537=n=p*q=43*59, e=13, message = STOP = (ST,OP)=(1819,1415)
      using position of letters in alphabet.  Then encrypted message 
      = ( 1819^13 mod 2537 , 1415^13 mod 2537 ) = ( 2081, 2182 ).
      To decrypt we use d = 937 and compute
      ( 2081^937 mod 2537 , 2182^937 mod 2537 ) = (1819,1415)

  We will show that f_enc and f_dec are inverses of one another shortly.
  But first, why is f_enc() easy and f_dec() hard to evaluate?
    f_enc() requires multiplying by M and taking the remainder mod n,
      both of which are easy, even if M and n are large.
    f_dec() equally easy if we know d, which only the Receiver knows.
      Why is it hard to figure out d? All you have to do is
       1) factor n=p*q
       2) use Euclidean algorithm to compute d so d*e ==1 mod (p-1)*(q-1)
     But 1) is very hard: Best algorithms would take billions of years
     if n has 400 digits. And any other known algorithm to compute d 
     leads to computing p and q too. So quality of encryption depends on
     large integers being very hard to factor. If you figure out an
     algorithm to factor quickly, you can become rich or famous.
  
   Proof that f_dec() is inverse of f_enc requires
   Fermat's Little Theorem (proof is questions 15-17 in section 2.6):
       If p is prime and p /| a, then a^(p-1) == 1 mod p

   Proof that f_dec(f_enc(M)) = M, where M < p,q
   f_dec(f_enc(M)) = f_dec(M^e mod n) = (M^e)^d mod n = M^(e*d) mod n.
   We need to show that M^(e*d) mod n = M mod n = M, since M < p*q = n.
   Now e*d == 1 mod (p-1)*(q-1) so e*d = 1+m*(p-1)*(q-1) for some m. Then
   M^(e*d) mod n = M^(1 + m*(p-1)*(q-1)) mod n
                 = M * M^(m*(p-1)*(q-1)) mod n
   Now since M < p and M < q, and p and q are prime, we must have
   gcd(M,p) = gcd(M,q) = 1. Then Fermat's Little Theorem implies that
   M^(p-1) == 1 mod p and M^(q-1) == 1 mod q.
   Thus  M^(e*d) = M * (M^(p-1))^(m*(q-1)) 
                == M * (1)^(m*(q-1)) mod p 
                == M mod p
   and  M^(e*d) = M * (M^(q-1))^(m*(p-1)) 
               == M * (1)^(m*(p-1)) mod q 
               == M mod q.
   Finally, by the Chinese Remainder Theorem, M^(e*d) is the unique
   solution mod p*q to
     x == M mod p
     x == M mod q
   so M^(e*d) mod n = M as desired.

   For RSA to be useful, we need to find a lot of large primes.
   We will not discuss the algorithm for finding them, but just   
   discuss the theorem that says there are a lot to be found:

   Def: pi(n) = the number of primes <= n
   Ex:  pi(20) = |{2,3,5,7,11,13,17,19}| = 8
   Theorem (Prime Number Theorem): The limit as n -> infinity of
      pi(n) / (n/ log_e n) = 1

   EX:   n     pi(n)   n/log_e(n)    pi(n)/ (n/log_e n)
        10^1       4       4.3        .92
        10^2      25      21.7       1.15
        10^3     168     144.8       1.16
        10^4    1229    1085.7       1.13
        10^5    9592    8685.9       1.10
        10^6   78498   72382.4       1.08
        10^7  664579  620420.7       1.07
        10^8 5761455 5428681.0       1.06
   The point is that the ratio in the last column is slowly approaching 1

   So about what fraction of 200 decimal digit numbers are prime?
     # 200 digit primes / # 200 digit numbers
  =  ( pi(10^200) - pi(10^199) ) / (10^200 - 10^199 )
  ~  ( 10^200/log_e(10^200) - 10^199/log_e(10^199) ) / (10^200 - 10^199)
  ~  .002 or about 1 out of 500
    So if you pick 500 random 200 digit numbers, 
    there is a reasonable chance that one is prime