Math 55 - Fall 2007 - Lecture notes # 8 - Sep 14 (Friday)

Goals: Sequences and Summations
       Big-Oh notation

Reading: Secs 2.4, 3.1, 3.2, 3.3

Note: We will not cover binary search, sorting, greedy algorithms, which are
covered elsewhere. But we will write simple programs ("pseudocode") to
illustrate various ideas, so this material should be familiar.

Def: A sequence is an ordered list (a(1), a(2), a(3), ...) of elements of some
     set S. A sequence may be finite or infinite.

Ex: (1,2,3), (1,2,3,4,...), (1,2,4,8,...), (1/1, 1/2, 1/3, 1/4, ...)

Def: The sum of a sequence is the sum of all members of the sequence.
     If the sequence is finite (a(1),a(2),...,a(n)) this is denoted
     SUM_{i=1}^n a(i). If the sequence is infinite (a(1),a(2),...) this is
     denoted SUM_{i=1}^infinity a(i).
     We will also sometimes use the notation
        SUM_{i=m}^n a(i) = a(m) + a(m+1) + ... + a(n)
     and
        SUM_{i IN K} a(i) = sum of all a(i) such that i IN K,
     where K is a set of indices, not necessarily all in N.

There are some common and important summations whose values should be easy to
recognize:

Ex: SUM_{i=1}^n i = n*(n+1)/2
    Proof: Add
        s = 1 +   2   +   3   + ... + n
        s = n + (n-1) + (n-2) + ... + 1
    to get
        2*s = (n+1) + (n+1) + (n+1) + ... + (n+1) = n*(n+1)

Given one summation, one can convert others to look like it:

Ex: SUM_{i <= n, i odd} i , where n is odd
      = SUM_{j=1}^{(n+1)/2} [2*j-1]
      = 2*[SUM_{j=1}^{(n+1)/2} j] - [SUM_{j=1}^{(n+1)/2} 1]
      = 2*((n+1)/2)*((n+1)/2 + 1)/2 - (n+1)/2
      = (n+1)^2 / 4

Ex: SUM_{i=1}^n (a(i+1) - a(i))
      = (-a(1) + a(2)) + (-a(2) + a(3)) + (-a(3) + a(4)) + ... + (-a(n) + a(n+1))
      = a(n+1) - a(1)
    This is called a "telescoping series".

Ex: If a(i) = i^2, so that a(i+1) - a(i) = 2*i+1, then
      (n+1)^2 - 1^2 = SUM_{i=1}^n ((i+1)^2 - i^2) = SUM_{i=1}^n (2*i+1)
                    = 2*SUM_{i=1}^n i + n
    so SUM_{i=1}^n i = (n+1)^2/2 - n/2 - 1/2 = n*(n+1)/2 as before.

Ex: If a(i) = i^3, so that a(i+1) - a(i) = 3*i^2 + 3*i + 1, then
      (n+1)^3 - 1^3 = SUM_{i=1}^n ((i+1)^3 - i^3) = SUM_{i=1}^n (3*i^2 + 3*i + 1)
                    = 3*[SUM_{i=1}^n i^2] + 3*[SUM_{i=1}^n i] + [SUM_{i=1}^n 1]
                    = 3*[SUM_{i=1}^n i^2] + 3*n*(n+1)/2 + n
    so SUM_{i=1}^n i^2 = [(n+1)^3 - 3*n*(n+1)/2 - n - 1]/3
                       = n^3/3 + n^2/2 + n/6

Ex: We can keep going to get formulas for SUM_{i=1}^n i^k for any k, as long as
    you know how to multiply out (i+1)^k, but we'll have an easy approximation
    soon...

Ex: Geometric sum: SUM_{i=0}^n r^i = ( 1 - r^{n+1} ) / ( 1-r ) if r neq 1,
    else n+1
    Proof: When r neq 1, subtract
        s   = 1 + r + r^2 + ... + r^n
        r*s =     r + r^2 + ... + r^n + r^{n+1}
    to get (1-r)*s = 1 - r^{n+1}. Since r neq 1, divide by 1-r to get the
    answer.

Ex: Assume |r|<1. Then SUM_{i=0}^infinity r^i = 1/(1-r)
    Proof: Let n -> infinity in SUM_{i=0}^n r^i = ( 1 - r^{n+1} ) / ( 1-r ),
    and note r^{n+1} -> 0.

Ex: .9999... = 9/10 + 9/100 + ...
             = 9/10 * (1 + 1/10 + 1/100 + ...)
             = 9/10 * (1/( 1 - 1/10 ))
             = 1

Now we start using ideas from calculus to evaluate or approximate sums:

Ex: Differentiating one sum to get another:
      SUM_{i=0}^n i*x^{i-1} = SUM_{i=0}^n d/dx (x^i)
                            = d/dx SUM_{i=0}^n x^i
                            = d/dx (1-x^{n+1})/(1-x)
                            = (-(1-x)*(n+1)*x^n - (1-x^{n+1})*(-1))/(1-x)^2
                            = (n*x^{n+1} - (n+1)*x^n + 1)/(1-x)^2

Ex: Approximating sums by integrals: (PICTURE)
    Notation: "INTEGRAL_0^n f(x) dx" means "the integral from 0 to n of f(x)"
              "f(x) |_0^n" means "f(n) - f(0)"
      INTEGRAL_0^n x^k dx < SUM_{i=1}^n i^k < INTEGRAL_0^n x^k dx + n^k
      x^(k+1)/(k+1) |_0^n < SUM_{i=1}^n i^k < x^(k+1)/(k+1) |_0^n + n^k
      n^(k+1)/(k+1)       < SUM_{i=1}^n i^k < n^(k+1)/(k+1) + n^k
    The ratio of the upper bound to the lower bound is 1 + (k+1)/n, which gets
    closer to 1 as n grows, i.e. the upper and lower bounds get relatively
    closer together.
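Aside (not part of the original notes): a short Python sketch that spot-checks
the closed forms and the integral sandwich above; the function name check and
the sample values n=50, k=3, r=0.5 are arbitrary choices for illustration.

def check(n=50, k=3, r=0.5):
    # SUM_{i=1}^n i = n*(n+1)/2
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    # SUM_{i=1}^n i^2 = n^3/3 + n^2/2 + n/6 = (2*n^3 + 3*n^2 + n)/6
    assert sum(i * i for i in range(1, n + 1)) == (2 * n**3 + 3 * n**2 + n) // 6
    # geometric sum: SUM_{i=0}^n r^i = (1 - r^{n+1})/(1 - r) when r != 1
    s = sum(r**i for i in range(n + 1))
    assert abs(s - (1 - r**(n + 1)) / (1 - r)) < 1e-12
    # integral sandwich: n^(k+1)/(k+1) < SUM_{i=1}^n i^k < n^(k+1)/(k+1) + n^k
    sk = sum(i**k for i in range(1, n + 1))
    lower = n**(k + 1) / (k + 1)
    assert lower < sk < lower + n**k
    print("all checks pass; upper/lower bound ratio =", 1 + (k + 1) / n)

check()

Running it for larger n shows the bound ratio 1 + (k+1)/n approaching 1, as
claimed above.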
Ex: The same idea works for any SUM_{i=1}^n f(i) where f is an increasing
    function:
      INTEGRAL_0^n f(x) dx < SUM_{i=1}^n f(i) < INTEGRAL_0^n f(x) dx + f(n)
    ASK&WAIT: SUM_{i=1}^n log(i) (natural logarithm)?

Ex: A similar idea works for SUM_{i=1}^n f(i) where f is decreasing: (PICTURE)
      INTEGRAL_1^{n+1} f(x) dx < SUM_{i=1}^n f(i) < INTEGRAL_1^{n+1} f(x) dx + f(1)
    ASK&WAIT: SUM_{i=1}^n 1/i?

It is often enough, and a lot easier, to say that a sum grows proportionally to
log(n) or n*log(n) or n^(k+1) without getting exact upper and lower bounds, or
even the constant of proportionality. For example, suppose Program 1 takes time
proportional to n to solve a problem of size n, and Program 2 takes time
proportional to n^2. Since n^2 grows a lot faster than n, eventually Program 1
will be faster than Program 2, even if we don't know exactly how big n has to
be. This is what "Big-O" analysis is about: given some complicated function of
n, we want a simple function of n that grows about as fast (up to a constant
multiplier).

Ex: Consider f(n) = (pi+7)/3*n^2 + n + 1 + sin(n). When n is large, n^2 is so
    much larger than n + 1 + sin(n) that we would like to ignore these terms.
    Furthermore, (pi+7)/3 is just a constant, and for many purposes we don't
    care exactly what the constant is. So we'd like some notation to say that
    f(n) grows like some constant times n^2: we will write "f(n) is O(n^2)".

We introduce notation to mean "proportional to n", or to any other g(n):

DEF: Let f and g map integers (or reals) to reals. We say that f(x) is O(g(x))
     (read "f is Big-O of g") if there are constants C>0 and k>=0 such that
        |f(x)| <= C*|g(x)| whenever x>k
     Intuitively, if f(x) grows as x grows, then g(x) grows at least as fast.

EG: f(x) = 100*x and g(x) = x^2; then for x>100, g(x) > f(x), and f(x) = O(g(x)).

EG: If n>=1 then
      f(n) = (pi+7)/3*n^2 + n + 1 + sin(n) <= (pi+7)/3*(n^2 + n^2 + n^2 + n^2)
           = 4*(pi+7)/3*n^2
    so f(n) = O(n^2), with k=1 and C = 4*(pi+7)/3 in the definition.
    (Draw pictures to illustrate)

Remark: Sometimes we write f(x) = O(g(x)), but this is misleading notation,
because f1(x) = O(g(x)) and f2(x) = O(g(x)) do not mean f1(x) = f2(x); for
example x = O(x) and 2*x = O(x).

In some of the most important applications of O(), we never have an exact
formula for f(n), such as when f(n) is the exact running time of a program. In
such cases all we can hope for is a simpler function g(n) such that f(n) is
O(g(n)). But to teach you how to simplify f(n) to get g(n), we will use exact
expressions for f(n) as examples.

The goals of O() are
  1) simplicity: O(n^2) is simpler than (pi+7)/3*n^2 + n + 1 + sin(n), and
     O(n) is simpler than the actual run time of prog1
  2) reasonable "accuracy":
     EG: Consider f(n) = (pi+7)/3*n^2 + n + 1 + sin(n).
         f(n) is both O(n^2) and O(n^3).
         ASK&WAIT: Why?
         ASK&WAIT: Which is a "better" answer, to say f(n) is O(n^2) or O(n^3),
                   since both are true?
     EX: Suppose we have two programs for the same problem and want to pick the
         fastest. Suppose prog1 runs in exactly time 10*n and prog2 runs in
         time n^2, so prog1 is clearly faster when n>10. But if we are "sloppy"
         and say that both run in time O(n^2), then we can't compare them.

So we would like rules that make it easy to find simple and accurate g(x) such
that f(x) = O(g(x)) for complicated f(x), without ever needing explicit values
of C and k in the definition of Big-O.

Rule 1: If c is a constant, c*f(x) is O(f(x)).
ASK&WAIT: What are C and k in the definition of O() that prove this?
EX: Given any a,b > 0, we have log_a x = O(log_b x).
ASK&WAIT: Why?
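Aside (not part of the original notes): a short Python sketch that spot-checks
the definition of O() for the example f(n) above, using the witnesses
C = 4*(pi+7)/3 and k = 1 found there; a finite check like this is only a sanity
test of the algebra, not a substitute for the proof.

from math import pi, sin

def f(n):
    # the example from the notes: f(n) = (pi+7)/3*n^2 + n + 1 + sin(n)
    return (pi + 7) / 3 * n**2 + n + 1 + sin(n)

C = 4 * (pi + 7) / 3   # the witness constants found above
k = 1

# |f(n)| <= C*n^2 must hold for every n > k; here we spot-check n = 2..10000
for n in range(k + 1, 10001):
    assert abs(f(n)) <= C * n**2
print("|f(n)| <= C*n^2 held for all tested n > k, with C =", round(C, 3))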
Rule 2: x^a = O(x^b) if 0 < a < b.
ASK&WAIT: What are C and k in the definition of O() that prove this?

Rule 3: If f1(x) = O(g1(x)) and f2(x) = O(g2(x)), then f1(x)+f2(x) = O(h(x))
        where h(x) = max(|g1(x)|,|g2(x)|).
Proof: |f1(x)| <= C1*|g1(x)| for x>k1 and |f2(x)| <= C2*|g2(x)| for x>k2
       means
         |f1(x)|+|f2(x)| <= C1*|g1(x)| + C2*|g2(x)|   for x > max(k1,k2)
                         <= C1*h(x) + C2*h(x)         for x > max(k1,k2)
                          = (C1+C2)*h(x)              for x > max(k1,k2)
EX: Let f(x) = a_k*x^k + a_{k-1}*x^{k-1} + ... + a_1*x + a_0 be a polynomial of
    degree k, i.e. a_k is nonzero. By Rule 1 each term a_j*x^j is O(x^j), so by
    Rule 2 each term is O(x^k), so by Rule 3 f(x) is O(x^k). In other words,
    for polynomials only the term with the largest exponent matters.

Rule 4: If f1(x) = O(g1(x)) and f2(x) = O(g2(x)), then f1(x)*f2(x) = O(h(x))
        where h(x) = g1(x)*g2(x).
ASK&WAIT: What are C and k in the definition of O() that prove this?
EG: f(x) = (x+1)*log(x-1) = f1(x)*f2(x) = O(x)*O(log(x)) = O(x*log x)

Rule 5: If f(x) = O(g(x)) and a>0, then (f(x))^a = O((g(x))^a).
ASK&WAIT: What are C and k in the definition of O() that prove this?

Rule 6: (log_c x)^a = O(x^b) for any a>0, b>0, c>0; in fact, the limit as x
        increases of (log_c x)^a / x^b is zero.
Proof: By Rules 1 and 5 we can assume c is any convenient constant, say
    e = 2.71828... If we can show log x = O(x^(b/a)) for any a,b>0, then
    taking the a-th power yields (log x)^a = O(x^b) (Rule 5). So we try to
    show log x = O(x^d) for any d>0.
    First we show that as x increases, f(x) = log x / x^d decreases, once x
    is large enough. Differentiate f(x), and show that for x large enough,
    f'(x) < 0:
       f'(x) = (x^d * (1/x) - d*x^(d-1)*log x)/x^(2*d)
             = x^(-d-1)*(1 - log x^d)
             < 0 if x > e^(1/d)
    Now we show that f(x) actually goes to zero as x increases. Since f(x) is
    decreasing, it is enough to show that there is an increasing sequence of
    values x(1),x(2),x(3),... such that f(x(i)) -> 0.
    Let x(i) = e^(i/d). Then
       f(x(i+1)) = log e^((i+1)/d) / e^(i+1)
                 = log e^((i/d)*(i+1)/i) / ( e * e^i )
                 = (((i+1)/i)/e) * log e^(i/d) / e^i
                 = (((i+1)/i)/e) * f(x(i))
                <= (2/e) * f(x(i))   because (i+1)/i takes values 2 > 3/2 > 4/3 > ...
                 < .74 * f(x(i))
                 < .74^2 * f(x(i-1))
                 ...
                 < .74^i * f(x(1)) = .74^i * (1/(d*e)),
    which goes to zero as i increases.

ASK&WAIT: Simplify O(log x + sqrt(x))
ASK&WAIT: Simplify O((log n)^1000 + n^.001)

Rule 7: x^a = O(b^x) for any a>0, b>1; in fact, the limit as x increases of
        x^a / b^x is zero.
Proof: Just take logarithms (base 2, say) and apply Rule 6:
    log(x^a) = a*log x and log(b^x) = x*log b; we know that for large enough x,
    x*log b is arbitrarily larger than a*log x, so 2^(x*log b) = b^x is
    arbitrarily larger than 2^(a*log x) = x^a.

ASK&WAIT: Simplify O(n^(10^100) + (1.0000001)^n)
ASK&WAIT: Simplify O((n+1)*log(sqrt(4n-2)) + log((n!)^2))
ASK&WAIT: Simplify O(n^(log n) + 1.1^(sqrt n))
ASK&WAIT: Simplify O((log n)^(log n) - n^(log log n))
ASK&WAIT: Simplify O((2^n + n^3*log n)*(n^4 + (log n)^2) + 1.5^n*n^5)

Some more definitions related to Big-O:

DEF: We say f(x) is Big-Omega(g(x)) if g(x) is O(f(x)).
Motivation: use g(x) = O(f(x)) to say a constant*f(x) is an upper bound on g(x);
            use f(x) = Big-Omega(g(x)) to say a constant*g(x) is a lower bound
            on f(x).

DEF: We say f(x) and g(x) are of the "same order" (or f(x) = BIG_THETA(g(x)))
     if f(x) = O(g(x)) and f(x) = Big-Omega(g(x)), i.e. for x > k, there are
     constants C1>0 and C2>0 such that C1*g(x) <= f(x) <= C2*g(x).

EX: 2*n^2 + n + 1 = O(n^3) but BIG_THETA(n^2), because
      n^2 <= 2*n^2 + n + 1 <= 4*n^2 for n>=1
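Aside (not part of the original notes): a short Python sketch illustrating
Rules 6 and 7 numerically and spot-checking the BIG_THETA example; the
particular exponents and base ((log x)^2 vs x^(1/2), x^3 vs 1.5^x) are
arbitrary choices for illustration, and the ratios only start shrinking once
x is large enough, as the proofs say.

from math import log

# Rule 6: (log x)^2 / x^(1/2) heads toward 0 once x is large enough
for x in [100, 10**4, 10**6, 10**8, 10**10]:
    print("x =", x, " (log x)^2 / x^0.5 =", round(log(x)**2 / x**0.5, 5))

# Rule 7: x^3 / 1.5^x heads toward 0 once x is large enough
for x in [10, 50, 100, 200, 400]:
    print("x =", x, " x^3 / 1.5^x =", x**3 / 1.5**x)

# BIG_THETA example from the last line of the notes
for n in range(1, 1001):
    assert n**2 <= 2 * n**2 + n + 1 <= 4 * n**2
print("n^2 <= 2*n^2 + n + 1 <= 4*n^2 holds for n = 1..1000")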