CS70 - Lecture 5 - Jan 28, 2011 - 10 Evans

Today, homework was due at 8am. Henceforth, due 6:30pm Thursdays, when Soda closes.

We are increasing section sizes, waitlist down to 3.

Section 105 (T 3-4) still has 4 slots. 

Recall induction for proving "forall n in N (natural numbers), P(n)":

   Base case: prove P(0) by itself

   Induction step: prove P(n) -> P(n+1)

Simple variation: to prove "forall positive n in N, P(n)",

just change the base case to P(1). 

Warmup: prove forall n>1,   (1 - 1/2)*(1 - 1/3)*(1 - 1/4)*…*(1 - 1/n) = 1/n

     i.e. P(n) = "(1 - 1/2)*…*(1 - 1/n) = 1/n"

Base case, for n=2:  P(2) = "(1 - 1/2) = 1/2"  - true!

Induction step:

    P(n) -> (1 - 1/2)*…*(1 - 1/n) = 1/n 

            -> (1 - 1/2)*…*(1 - 1/n)*(1 - 1/(n+1)) = 1/n * ( 1 - 1/(n+1)) = (1/n)*(n+1-1)/(n+1) = 1/(n+1)

            -> P(n+1) as desired


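Alternatively, there is a direct proof without induction, since the product "telescopes":

   (1 - 1/2)*(1 - 1/3)*(1 - 1/4)*…*(1 - 1/n)

= (1/2)*(2/3)*(3/4)*(4/5)*…*((n-1)/n)          

       each numerator cancels the previous denominator, leaving only the first numerator, 1, and the last denominator, n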

= 1/n
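To sanity-check the identity for many n (a check, not a proof), here is a minimal Python sketch using exact rational arithmetic from the standard fractions module; the function name product is a hypothetical choice.

```python
from fractions import Fraction

def product(n):
    """Exact value of (1 - 1/2)*(1 - 1/3)*...*(1 - 1/n), for n >= 2."""
    result = Fraction(1)
    for k in range(2, n + 1):
        result *= 1 - Fraction(1, k)  # the factor (1 - 1/k) = (k-1)/k
    return result

# Each value should equal 1/n, as P(n) claims.
for n in range(2, 8):
    print(n, product(n))
```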

EG: Let's prove something harder, where induction really pays off: the "2 Color Theorem"

 P(n) = "if you draw n lines in the plane, dividing the plane into regions with the lines

              as borders, then you can color each region either Red or Blue so that

              no two regions sharing a border have the same color. (Sharing a border means

              they share a line or line segment as a border, not just a point.)"

Proof by induction on n = the number of lines

  Base case: P(1): clearly true (one side of the line is Red, the other Blue)

  Induction step: Suppose P(n) is true; we need to show P(n+1) is true:

       Given n+1 lines, take just n of them. Since P(n) is true, we know that

       just these n lines divide the plane into regions that we can color Red and Blue.

       Now add line n+1 and think about what happens: it will pass through some of the

       regions and not others. Pick one side of line n+1, and swap the colors (R <-> B) of

       all the regions on that side. Two regions sharing a border on the same side of line n+1

       were either both swapped or both left alone, so they still have different colors.

       Two regions sharing a border along line n+1 were halves of a single region before the line

       was added, so they had the same color; after the swap they have different colors, so we are done.
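Unrolling this induction gives a direct coloring rule. A minimal Python sketch, assuming each line is given as a hypothetical triple (a, b, c) meaning a*x + b*y + c = 0, that we start from an all-Red plane (no lines), and that each new line always swaps colors on its side where a*x + b*y + c > 0: a region's final color is then just the parity of how many lines have it on their positive side.

```python
def color(point, lines):
    """Color of the region containing point.

    lines: list of (a, b, c) triples, each meaning the line a*x + b*y + c = 0.
    Assumes point lies on none of the lines.
    """
    x, y = point
    # Each line on whose positive side the point lies swapped this region's
    # color once when that line was added; only the parity matters.
    flips = sum(1 for a, b, c in lines if a * x + b * y + c > 0)
    return "Red" if flips % 2 == 0 else "Blue"

lines = [(1, 0, 0), (0, 1, 0), (1, 1, -1)]  # x = 0, y = 0, x + y = 1
print(color((0.1, 0.1), lines), color((-0.1, 0.1), lines))
```

Two points just across one line differ in exactly one term of the count, so they always get different colors, which is the property P(n) asks for.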

Note: The 2-color theorem is an easy special case of a much harder result, called

the 4-color theorem: This says that if you draw any curves in the plane (not just lines)

to divide the plane into regions, you only need at most 4 colors to color the regions

so that no two regions sharing a border (not just a point) have the same color. This

was conjectured in 1852, and "proven" several times (with errors in the proofs

discovered years later), until a real "proof" was published in 1976. But it used a computer

to check an enormous number of cases, so no one could read and check it by hand;

one had to believe that the computer program was correct and had been executed correctly.

Eventually a simpler (but still very long, and still computer-assisted) proof was published in 1996.

EG: Consider a checkerboard of size 2^n by 2^n squares. Suppose we have a lot

of L-shaped tiles that cover exactly 3 squares each. The problem is to prove

   P(n) = "It is possible to cover the 2^n by 2^n checkerboard with nonoverlapping

                L-shaped tiles so that all squares but one are completely covered."

This is one of those cases where it is easier to prove something that looks harder;

we will add the condition to P(n):

    "Furthermore, it is possible for the uncovered square to be any of the (2^n)^2 squares

     in the checkerboard."

The base case, P(1), is a 2x2 checkerboard, which is easy to confirm: one L-shaped tile covers any 3 of its 4 squares.

The induction step goes as follows: Assume P(n) is true. To prove P(n+1),

consider the 2^(n+1) x 2^(n+1) checkerboard as 4 smaller checkerboards

of size 2^n x 2^n, all touching at one corner, in the center of the larger checkerboard.

The square to be left uncovered is in one of these 2^n x 2^n checkerboards,

and since P(n) applies to it, we can cover all of it except that square with L-shaped tiles.

Now consider the other three 2^n x 2^n checkerboards, which together form

a (large) L-shaped region. Mark the 3 1x1 squares at the inner angle of this

large L; each one lies in a different 2^n x 2^n checkerboard. Applying P(n)

to each of these three 2^n x 2^n checkerboards, we can cover all but the

marked square it contains using small L-shaped tiles. One more small

L-shaped tile covers the 3 marked 1x1 squares, completing the covering

and the proof. 
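The induction is really a recursive algorithm. Here is a minimal Python sketch; the function name tile_board and the grid representation (-1 for the uncovered square, the same positive id on the 3 squares of each L-shaped tile) are hypothetical choices, not from these notes. It recurses down to 1x1 boards (P(0)) instead of stopping at the 2x2 base case P(1); a 2x2 board then comes out as a single tile around its hole, exactly as in P(1).

```python
def tile_board(n, hole_r, hole_c):
    """Tile a 2^n x 2^n board with L-shaped tiles, leaving (hole_r, hole_c) uncovered.

    Returns a grid: -1 marks the uncovered square; each tile's 3 squares
    share the same positive integer id.
    """
    size = 2 ** n
    board = [[0] * size for _ in range(size)]
    board[hole_r][hole_c] = -1
    next_id = 1

    def solve(top, left, size, hr, hc):
        # (hr, hc) is the one square of this sub-board that must be skipped:
        # either the real hole or a square already covered by a larger tile.
        nonlocal next_id
        if size == 1:
            return  # the lone square is the skipped one
        half = size // 2
        tile_id = next_id  # the one tile placed at the center of this board
        next_id += 1
        for dr in (0, 1):
            for dc in (0, 1):
                q_top, q_left = top + dr * half, left + dc * half
                if q_top <= hr < q_top + half and q_left <= hc < q_left + half:
                    # This quadrant already contains the square to skip.
                    solve(q_top, q_left, half, hr, hc)
                else:
                    # Mark this quadrant's square at the inner angle; the three
                    # marked squares together form one L-shaped tile.
                    cr, cc = top + half - 1 + dr, left + half - 1 + dc
                    board[cr][cc] = tile_id
                    solve(q_top, q_left, half, cr, cc)

    solve(0, 0, size, hole_r, hole_c)
    return board

for row in tile_board(2, 1, 2):
    print(row)
```

For example, tile_board(3, 5, 2) tiles an 8x8 board with (64 - 1)/3 = 21 tiles, leaving row 5, column 2 (counting from 0) uncovered.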

EG Here is a bogus proof: What is wrong with it?

Theorem: All horses have the same color, i.e. the proposition

  P(n) = "in any set of n horses, they all have the same color" is true

Base case: P(1) is obviously true, since there is just one horse

Induction step: To prove P(n+1) given P(n) being true,

take the n+1 horses and call them h(1),…,h(n+1). Now apply

P(n) to the sets {h(1),…,h(n)} and {h(2),…,h(n+1)} to conclude

that h(1),…,h(n) all have the same color, and

h(2),…,h(n+1) all have the same color. Since these two subsets

overlap in h(2), we conclude that all of h(1),…,h(n+1) have the same color.

EG:  Theorem. There are infinitely many primes. We tried this before, but

it didn't quite work; let's try again.

Lemma: Every integer n>1 is either prime or can be written as the product of primes.

Assuming this for a moment, let's prove the Theorem. We assume it is false,

i.e. that there are only finitely many primes, call them p(1),…,p(k), and get a contradiction: 

Let N = p(1)*…*p(k)+1, as we did before. Then N is not divisible by any of p(1),…,p(k),

since you get a remainder of 1. But by the Lemma, N is either prime itself, or a

product of primes, which must be primes other than p(1),…,p(k). Either way there is a prime

not in our list p(1),…,p(k), a contradiction.
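A small Python sketch of this construction (the function names are hypothetical): build N from a finite list of primes and find a prime factor of N by trial division. For the list 2, 3, 5, 7, 11, 13 it shows that N = 30031 = 59*509 need not be prime itself, but its prime factors are new.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime dividing n, for n > 1, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # the smallest divisor > 1 is always prime
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def new_prime(primes):
    """Given a finite list of primes, return (N, a prime dividing N not in the list)."""
    n = prod(primes) + 1  # N = p(1)*...*p(k) + 1 has remainder 1 mod each p(i)
    p = smallest_prime_factor(n)
    assert p not in primes  # guaranteed by the remainder-1 argument
    return n, p

print(new_prime([2, 3, 5, 7, 11, 13]))
```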

To prove the Lemma, we will use a different form of induction than before called

strong induction: to prove "forall n >= 2, P(n)", we will

   prove a base case, P(2) in our case

   show that ( P(2) and P(3) and … and P(n) ) -> P(n+1), for every n >= 2

This differs from the induction we did before, because we may use

all of P(2), P(3),…,P(n) to prove P(n+1), not just P(n)

Returning to our Lemma, suppose we have proven

    P(k) = "either k is a prime or a product of primes"

for k = 2, 3, … n, and now we want to prove P(n+1):

   Case 1:  n+1 is prime:  then P(n+1) is true

   Case 2:  n+1 is composite: then we can write n+1 = r*s, where r and s

                   are integers with 1 < r, s < n+1, i.e. 2 <= r, s <= n. So we know P(r) and P(s) are true.

                   So either r is prime or a product of primes, and the same holds for s.

                   So the product r*s is a product of primes, and P(n+1) is also true, as desired.
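The strong-induction proof translates directly into a recursive factoring function. A minimal Python sketch (the name factor is hypothetical): the recursive calls on r and s are exactly the appeals to P(r) and P(s), which is fine because both are smaller than n.

```python
from math import isqrt

def factor(n):
    """Return a list of primes whose product is n, for an integer n >= 2."""
    for r in range(2, isqrt(n) + 1):
        if n % r == 0:
            # Case 2: n = r*s is composite with 2 <= r, s < n.
            return factor(r) + factor(n // r)
    # Case 1: no divisor up to sqrt(n), so n is prime.
    return [n]

print(factor(360))
```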

EG: Fibonacci numbers: Let F(0)=0, F(1)=1 and F(n)=F(n-1)+F(n-2) for n >= 2.

So F(0,1,2,…) = 0,1,1,2,3,5,8,13,21,...

Theorem: Let x_+ = (1+sqrt(5))/2 ~ 1.618 and x_- = (1-sqrt(5))/2 ~ -0.618

 Then F(n) = ( x_+^n - x_-^n )/sqrt(5) ~ 1.6^n / sqrt(5), since |x_-| < 1 means x_-^n shrinks to 0,

  so F(n) grows exponentially fast

Proof by induction:

  P(n) = " F(n) = ( x_+^n - x_-^n )/sqrt(5)"

  Base case(s): Check P(0) and P(1) are true,

   i.e. that the formula yields F(0) = 0 and F(1) = 1 as desired

  Induction step: show that P(n-2) and P(n-1) -> P(n): 

    P(n-2) and P(n-1) ->

      F(n-2)+F(n-1) = ( x_+^(n-2) - x_-^(n-2) )/sqrt(5) + (x_+^(n-1) - x_-^(n-1))/sqrt(5)

                               = … = ( x_+^n - x_-^n)/sqrt(5)

                               = F(n)   ->   P(n)
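The "…" step is routine algebra once we use the fact that x_+ and x_- are the two roots of x^2 = x + 1 (derived further down in these notes), so that 1 + x = x^2 for x = x_+ and for x = x_-. A worked version, together with the base cases:

```latex
% Base cases: (x_+^0 - x_-^0)/\sqrt{5} = 0 = F(0), and (x_+ - x_-)/\sqrt{5} = \sqrt{5}/\sqrt{5} = 1 = F(1).
% Induction step, using 1 + x_+ = x_+^2 and 1 + x_- = x_-^2:
\begin{aligned}
F(n-2) + F(n-1)
  &= \frac{x_+^{n-2} - x_-^{n-2}}{\sqrt{5}} + \frac{x_+^{n-1} - x_-^{n-1}}{\sqrt{5}} \\
  &= \frac{x_+^{n-2}\,(1 + x_+) - x_-^{n-2}\,(1 + x_-)}{\sqrt{5}} \\
  &= \frac{x_+^{n-2}\,x_+^{2} - x_-^{n-2}\,x_-^{2}}{\sqrt{5}}
   = \frac{x_+^{n} - x_-^{n}}{\sqrt{5}}
\end{aligned}
```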

Consider the following 2 algorithms for computing F(n):

   func F1(n)

       if n=0 return 0

       elseif n = 1 return 1

       else

            x = 0, y = 1

            for i = 2 to n

                tmp = y,  y = x+y, x = tmp

            return y

(Note: bug in class notes; what does the function there compute instead?)


  func F2(n)

      if n=0 return 0

      else if n= 1 return 1

      else return F2(n-1) + F2(n-2)

Which algorithm is faster? More simply: how many additions does each one perform?

   Let A1(n) = #additions_in_F1(n) = ?

   Let A2(n) = #additions_in_F2(n) = 1 + A2(n-1) + A2(n-2) for n >= 2, with A2(0) = A2(1) = 0

   So A2(0,1,2,…) = 0, 0, 1, 2, 4, 7, 12, 20,...

   What is the relationship between A2(n) and F(n)?

   Looks like A2(n) = F(n+1)-1.

   Proof by (strong) induction: the base cases hold, since A2(0) = 0 = F(1)-1 and A2(1) = 0 = F(2)-1.

   For n >= 2:  A2(n) = 1 + A2(n-1) + A2(n-2)

                      = 1 + (F(n)-1) + (F(n-1)-1)     … by induction

                      = F(n) + F(n-1) - 1

                      = F(n+1) - 1   … as desired!

Since A2(n) = F(n+1)-1 grows exponentially in n (like 1.6^(n+1)/sqrt(5)), F1(n) is *much* faster than F2(n)
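To experiment with the addition counts, here is a minimal Python sketch of F1 and F2 (renamed f1 and f2), each instrumented to return a pair (F(n), number of additions). Only additions of Fibonacci values are counted, matching the definitions of A1 and A2; loop-counter increments are not. Since f2 does exponentially many additions, keep n small (say n <= 25).

```python
def f1(n):
    """Iterative F1: returns (F(n), number of additions performed)."""
    if n == 0:
        return 0, 0
    if n == 1:
        return 1, 0
    x, y = 0, 1
    adds = 0
    for _ in range(2, n + 1):
        x, y = y, x + y  # same as: tmp = y, y = x+y, x = tmp
        adds += 1
    return y, adds

def f2(n):
    """Recursive F2: returns (F(n), number of additions performed)."""
    if n == 0:
        return 0, 0
    if n == 1:
        return 1, 0
    a, adds_a = f2(n - 1)
    b, adds_b = f2(n - 2)
    return a + b, 1 + adds_a + adds_b  # A2(n) = 1 + A2(n-1) + A2(n-2)

for n in range(8):
    print(n, f1(n), f2(n))
```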

How would you ever guess the formula for F(n)? Exactly: guess!

Try F(n) = x^n and see what x has to be for this to be true:

    F(n) = x^n = F(n-1) + F(n-2) = x^(n-1) + x^(n-2)

or            x^n = x^(n-1) + x^(n-2)

or            x^2 = x + 1     (dividing by x^(n-2), assuming x != 0)

or            x = (1+sqrt(5))/2 = x_+   or   x = (1-sqrt(5))/2 = x_-     (by the quadratic formula)

So both F(n) = x_+^n and F(n) = x_-^n  satisfy F(n) = F(n-1)+F(n-2).

But neither satisfies F(0)=0 and F(1) = 1: what to do?

Note that for any constants r and s, F(n) = r*x_+^n + s*x_-^n

also satisfies F(n) = F(n-1) + F(n-2), since the recurrence is linear, so we can pick the 2 constants r and s

to satisfy the 2 constraints F(0)=0 and F(1) = 1 (or F(0)=7 and F(1) = -pi, whatever we like).
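For the Fibonacci constraints, the two equations in r and s are easy to solve, using x_+ - x_- = sqrt(5); this recovers exactly the formula in the Theorem, F(n) = (x_+^n - x_-^n)/sqrt(5):

```latex
% Constraints from F(0) = 0 and F(1) = 1 on F(n) = r x_+^n + s x_-^n:
\begin{aligned}
F(0) &= r + s = 0 &&\Rightarrow\; s = -r \\
F(1) &= r\,x_+ + s\,x_- = r\,(x_+ - x_-) = r\sqrt{5} = 1
     &&\Rightarrow\; r = \tfrac{1}{\sqrt{5}},\quad s = -\tfrac{1}{\sqrt{5}}
\end{aligned}
```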

The same "guessing" procedure works for similar recurrences, like

    G(n) = 2*G(n-1) - 7*G(n-2), G(0) = 2, G(1) = -3

Try plugging in G(n) = x^n, solve a quadratic for two values of x, etc.
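For G the quadratic is x^2 = 2x - 7, whose roots 1 + i*sqrt(6) and 1 - i*sqrt(6) are complex, but the same recipe still works with complex arithmetic. A minimal Python sketch (the names closed_form and by_recurrence are hypothetical), assuming the two roots of x^2 = a*x + b are distinct and the sequence takes integer values, so that rounding the real part is safe for moderate n:

```python
import cmath

def closed_form(a, b, g0, g1):
    """Closed form for G(n) = a*G(n-1) + b*G(n-2), found by guessing G(n) = x^n.

    Assumes the roots of x^2 = a*x + b are distinct. Returns a function
    n -> G(n), rounded to the nearest integer.
    """
    disc = cmath.sqrt(a * a + 4 * b)  # may be imaginary, as for G
    x1, x2 = (a + disc) / 2, (a - disc) / 2
    # Solve r + s = g0 and r*x1 + s*x2 = g1 for the constants r and s.
    s = (g1 - g0 * x1) / (x2 - x1)
    r = g0 - s
    return lambda n: round((r * x1 ** n + s * x2 ** n).real)

def by_recurrence(a, b, g0, g1, n):
    """G(n) computed directly from the recurrence, for comparison."""
    if n == 0:
        return g0
    prev, cur = g0, g1
    for _ in range(n - 1):
        prev, cur = cur, a * cur + b * prev
    return cur

G = closed_form(2, -7, 2, -3)
print([G(n) for n in range(5)], [by_recurrence(2, -7, 2, -3, n) for n in range(5)])
```

With a = b = 1, g0 = 0, g1 = 1 the same function reproduces the Fibonacci numbers.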

What do you think happens with

    H(n) = 3*H(n-1) + 2*H(n-2) - H(n-3)?

Such "linear recurrences" occur commonly, e.g. in signal processing.

Next is an example of using induction to analyze an algorithm, this time for sorting.

One of the fastest sorting algorithms is called quicksort, and it works like this.

 function quicksort(n, A)

    … input is array A of n numbers

    … output is array of these n numbers sorted in increasing order

    if (n=0 or n=1)

       return A

    else

       pick a random number 1 <= i <= n

       reorder the entries of A so that

           the initial entries are all <= A(i), not counting A(i) itself (say there are m of them)

           the next entry = A(i) (so A(i) is now at position m+1)

           the remaining entries are > A(i)

       return S = [quicksort(m,A(1:m)), A(m+1), quicksort(n-m-1,A(m+2:n))]


The function quicksort is recursive, and we will prove that it sorts correctly

by using (strong) induction on the length of the array being sorted

P(n) = "quicksort correctly sorts an input array of length n"

Base cases: P(0) and P(1) work because the algorithm doesn't have to do anything

Induction step: We assume P(0) and P(1) and … and P(n) and prove P(n+1):

After reordering, the array A of length n+1 is partitioned into 3 subsets:

(1)    entries <= A(i), except A(i) itself

(2)    A(i)

(3)     entries > A(i)

Every entry of subset (1) is <= A(i) and every entry of subset (3) is > A(i), so if we correctly

sort subsets (1) and (3), the whole array will be sorted.

Quicksort correctly sorts these subsets by the induction hypothesis, because their lengths

are at most n, since they don't contain A(i).
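A minimal Python sketch of the quicksort pseudocode. As a simplifying assumption, it builds new lists for subsets (1) and (3) instead of reordering A in place, and it uses 0-based indices where the notes use 1-based ones.

```python
import random

def quicksort(a):
    """Return a new list with the entries of a in increasing order."""
    n = len(a)
    if n <= 1:                     # base cases P(0) and P(1): nothing to do
        return list(a)
    i = random.randrange(n)        # random pivot position, 0 <= i < n
    pivot = a[i]
    rest = a[:i] + a[i + 1:]       # every entry except A(i) itself
    low = [x for x in rest if x <= pivot]   # subset (1): the m entries <= A(i)
    high = [x for x in rest if x > pivot]   # subset (3): the n-m-1 entries > A(i)
    # Both subsets have length <= n-1, so the induction hypothesis applies.
    return quicksort(low) + [pivot] + quicksort(high)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
```

Whatever pivots the random choices produce, the output is the same sorted list; the randomness only affects the running time.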