Math 55 - Fall 2007 - Lecture notes # 8 - Sep 14 (Friday)

Goals: Sequences and Summations
       Big-Oh notation

Reading: Secs 2.4, 3.1, 3.2, 3.3

Note: We will not cover binary search, sorting, greedy algorithms, which are
covered elsewhere. But we will write simple programs ("pseudocode") to
illustrate various ideas, so this material should be familiar.

Def: A sequence is an ordered list (a(1), a(2), a(3), ...) of elements of some
     set S. A sequence may be finite or infinite.

Ex: (1,2,3), (1,2,3,4,...), (1,2,4,8,...), (1/1, 1/2, 1/3, 1/4, ...)

Def: The sum of a sequence is the sum of all members of the sequence.
     If the sequence is finite (a(1),a(2),...,a(n)) this is denoted
     SUM_{i=1}^n a(i). If the sequence is infinite (a(1),a(2),...) this is
     denoted SUM_{i=1}^infinity a(i).
     We will also sometimes use the notation
        SUM_{i=m}^n a(i) = a(m) + a(m+1) + ... + a(n)
     and
        SUM_{i IN K} a(i) = sum of all a(i) such that i IN K,
     where K is a set of indices, not necessarily all in N.

There are some common and important summations whose values should be easy to
recognize:

Ex: SUM_{i=1}^n i = n*(n+1)/2
    Proof: Add
        s = 1 +   2   +   3   + ... + n
        s = n + (n-1) + (n-2) + ... + 1
    to get
        2*s = (n+1) + (n+1) + (n+1) + ... + (n+1) = n*(n+1)

Given one summation, one can convert others to look like it:

Ex: SUM_{i <= n, i odd} i , where n is odd
      = SUM_{j=1}^{(n+1)/2} [2*j-1]
      = 2*[SUM_{j=1}^{(n+1)/2} j] - [SUM_{j=1}^{(n+1)/2} 1]
      = 2*((n+1)/2)*((n+1)/2 + 1)/2 - (n+1)/2
      = (n+1)^2 / 4

Ex: SUM_{i=1}^n (a(i+1) - a(i))
      = (-a(1) + a(2)) + (-a(2) + a(3)) + (-a(3) + a(4)) + ... + (-a(n) + a(n+1))
      = a(n+1) - a(1)
    This is called a "telescoping series".

Ex: If a(i) = i^2, so that a(i+1) - a(i) = 2*i+1, then
      (n+1)^2 - 1^2 = SUM_{i=1}^n ((i+1)^2 - i^2) = SUM_{i=1}^n (2*i+1)
                    = 2*SUM_{i=1}^n i + n
    so SUM_{i=1}^n i = (n+1)^2/2 - n/2 - 1/2 = n*(n+1)/2 as before.

Ex: If a(i) = i^3, so that a(i+1) - a(i) = 3*i^2 + 3*i + 1, then
      (n+1)^3 - 1^3 = SUM_{i=1}^n ((i+1)^3 - i^3) = SUM_{i=1}^n (3*i^2 + 3*i + 1)
                    = 3*[SUM_{i=1}^n i^2] + 3*[SUM_{i=1}^n i] + [SUM_{i=1}^n 1]
                    = 3*[SUM_{i=1}^n i^2] + 3*n*(n+1)/2 + n
    so SUM_{i=1}^n i^2 = [(n+1)^3 - 3*n*(n+1)/2 - n - 1]/3
                       = n^3/3 + n^2/2 + n/6

Ex: We can keep going to get formulas for SUM_{i=1}^n i^k for any k, as long as
    you know how to multiply out (i+1)^k, but we'll have an easy approximation
    soon...

Ex: Geometric sum: SUM_{i=0}^n r^i = ( 1 - r^{n+1} ) / ( 1-r ) if r neq 1,
    else n+1
    Proof: When r neq 1, subtract
        s   = 1 + r + r^2 + ... + r^n
        r*s =     r + r^2 + ... + r^n + r^{n+1}
    to get (1-r)*s = 1 - r^{n+1}. Since r neq 1, divide by 1-r to get the
    answer.

Ex: Assume |r|<1. Then SUM_{i=0}^infinity r^i = 1/(1-r)
    Proof: Let n -> infinity in SUM_{i=0}^n r^i = ( 1 - r^{n+1} ) / ( 1-r ),
    and note r^{n+1} -> 0.

Ex: .9999... = 9/10 + 9/100 + ...
             = 9/10 * (1 + 1/10 + 1/100 + ...)
             = 9/10 * (1/( 1 - 1/10 ))
             = 1

Now we start using ideas from calculus to evaluate or approximate sums:

Ex: Differentiating one sum to get another:
      SUM_{i=0}^n i*x^{i-1} = SUM_{i=0}^n d/dx (x^i)
                            = d/dx SUM_{i=0}^n x^i
                            = d/dx (1-x^{n+1})/(1-x)
                            = (-(1-x)*(n+1)*x^n - (1-x^{n+1})*(-1))/(1-x)^2
                            = (n*x^{n+1} - (n+1)*x^n + 1)/(1-x)^2

Ex: Approximating sums by integrals: (PICTURE)
    Notation: "INTEGRAL_0^n f(x) dx" means "the integral from 0 to n of f(x)"
              "f(x) |_0^n" means "f(n) - f(0)"
      INTEGRAL_0^n x^k dx < SUM_{i=1}^n i^k < INTEGRAL_0^n x^k dx + n^k
      x^(k+1)/(k+1) |_0^n < SUM_{i=1}^n i^k < x^(k+1)/(k+1) |_0^n + n^k
      n^(k+1)/(k+1)       < SUM_{i=1}^n i^k < n^(k+1)/(k+1) + n^k
    The ratio of the upper bound to the lower bound is 1 + (k+1)/n, which gets
    closer to 1 as n grows, i.e. the upper and lower bounds get relatively
    closer together.
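Aside (not part of the original notes): a short Python sketch that spot-checks
the closed forms and the integral sandwich above; the function name check and
the sample values n=50, k=3, r=0.5 are arbitrary choices for illustration.

def check(n=50, k=3, r=0.5):
    # SUM_{i=1}^n i = n*(n+1)/2
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    # SUM_{i=1}^n i^2 = n^3/3 + n^2/2 + n/6 = (2*n^3 + 3*n^2 + n)/6
    assert sum(i * i for i in range(1, n + 1)) == (2 * n**3 + 3 * n**2 + n) // 6
    # geometric sum: SUM_{i=0}^n r^i = (1 - r^{n+1})/(1 - r) when r != 1
    s = sum(r**i for i in range(n + 1))
    assert abs(s - (1 - r**(n + 1)) / (1 - r)) < 1e-12
    # integral sandwich: n^(k+1)/(k+1) < SUM_{i=1}^n i^k < n^(k+1)/(k+1) + n^k
    sk = sum(i**k for i in range(1, n + 1))
    lower = n**(k + 1) / (k + 1)
    assert lower < sk < lower + n**k
    print("all checks pass; upper/lower bound ratio =", 1 + (k + 1) / n)

check()

Running it for larger n shows the bound ratio 1 + (k+1)/n approaching 1, as
claimed above.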
Ex: The same idea works for any SUM_{i=1}^n f(i) where f is an increasing
    function:
      INTEGRAL_0^n f(x) dx < SUM_{i=1}^n f(i) < INTEGRAL_0^n f(x) dx + f(n)
    ASK&WAIT: SUM_{i=1}^n log(i) (natural logarithm)?

Ex: A similar idea works for SUM_{i=1}^n f(i) where f is decreasing: (PICTURE)
      INTEGRAL_1^{n+1} f(x) dx < SUM_{i=1}^n f(i) < INTEGRAL_1^{n+1} f(x) dx + f(1)
    ASK&WAIT: SUM_{i=1}^n 1/i?

It is often enough, and a lot easier, to say that a sum grows proportionally to
log(n) or n*log(n) or n^(k+1) without getting exact upper and lower bounds, or
even the constant of proportionality. For example, suppose Program 1 takes time
proportional to n to solve a problem of size n, and Program 2 takes time
proportional to n^2. Since n^2 grows a lot faster than n, eventually Program 1
will be faster than Program 2, even if we don't know exactly how big n has to
be. This is what "Big-O" analysis is about: given some complicated function of
n, we want a simple function of n that grows about as fast (up to a constant
multiplier).

Ex: Consider f(n) = (pi+7)/3*n^2 + n + 1 + sin(n). When n is large, n^2 is so
    much larger than n + 1 + sin(n) that we would like to ignore these terms.
    Furthermore, (pi+7)/3 is just a constant, and for many purposes we don't
    care exactly what the constant is. So we'd like some notation to say that
    f(n) grows like some constant times n^2: we will write "f(n) is O(n^2)".

We introduce notation to mean "proportional to n", or to any other g(n):

DEF: Let f and g map integers (or reals) to reals. We say that f(x) is O(g(x))
     (read "f is Big-O of g") if there are constants C>0 and k>=0 such that
        |f(x)| <= C*|g(x)| whenever x>k
     Intuitively, if f(x) grows as x grows, then g(x) grows at least as fast.

EG: f(x) = 100*x and g(x) = x^2; then for x>100, g(x) > f(x), and f(x) = O(g(x)).

EG: If n>=1 then
      f(n) = (pi+7)/3*n^2 + n + 1 + sin(n) <= (pi+7)/3*(n^2 + n^2 + n^2 + n^2)
           = 4*(pi+7)/3*n^2
    so f(n) = O(n^2), with k=1 and C = 4*(pi+7)/3 in the definition.
    (Draw pictures to illustrate)

Remark: Sometimes we write f(x) = O(g(x)), but this is misleading notation,
because f1(x) = O(g(x)) and f2(x) = O(g(x)) do not mean f1(x) = f2(x); for
example x = O(x) and 2*x = O(x).

In some of the most important applications of O(), we never have an exact
formula for f(n), such as when f(n) is the exact running time of a program. In
such cases all we can hope for is a simpler function g(n) such that f(n) is
O(g(n)). But to teach you how to simplify f(n) to get g(n), we will use exact
expressions for f(n) as examples.

The goals of O() are
  1) simplicity: O(n^2) is simpler than (pi+7)/3*n^2 + n + 1 + sin(n), and
     O(n) is simpler than the actual run time of prog1
  2) reasonable "accuracy":
     EG: Consider f(n) = (pi+7)/3*n^2 + n + 1 + sin(n).
         f(n) is both O(n^2) and O(n^3).
         ASK&WAIT: Why?
         ASK&WAIT: Which is a "better" answer, to say f(n) is O(n^2) or O(n^3),
                   since both are true?
     EX: Suppose we have two programs for the same problem and want to pick the
         fastest. Suppose prog1 runs in exactly time 10*n and prog2 runs in
         time n^2, so prog1 is clearly faster when n>10. But if we are "sloppy"
         and say that both run in time O(n^2), then we can't compare them.

So we would like rules that make it easy to find simple and accurate g(x) such
that f(x) = O(g(x)) for complicated f(x), without ever needing explicit values
of C and k in the definition of Big-O.

Rule 1: If c is a constant, c*f(x) is O(f(x)).
ASK&WAIT: What are C and k in the definition of O() that prove this?
EX: Given any a,b > 0, we have log_a x = O(log_b x).
ASK&WAIT: Why?
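Aside (not part of the original notes): a short Python sketch that spot-checks
the definition of O() for the example f(n) above, using the witnesses
C = 4*(pi+7)/3 and k = 1 found there; a finite check like this is only a sanity
test of the algebra, not a substitute for the proof.

from math import pi, sin

def f(n):
    # the example from the notes: f(n) = (pi+7)/3*n^2 + n + 1 + sin(n)
    return (pi + 7) / 3 * n**2 + n + 1 + sin(n)

C = 4 * (pi + 7) / 3   # the witness constants found above
k = 1

# |f(n)| <= C*n^2 must hold for every n > k; here we spot-check n = 2..10000
for n in range(k + 1, 10001):
    assert abs(f(n)) <= C * n**2
print("|f(n)| <= C*n^2 held for all tested n > k, with C =", round(C, 3))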
Rule 2: x^a = O(x^b) if 0 < a < b.
ASK&WAIT: What are C and k in the definition of O() that prove this?

Rule 3: If f1(x) = O(g1(x)) and f2(x) = O(g2(x)), then f1(x)+f2(x) = O(h(x))
        where h(x) = max(|g1(x)|,|g2(x)|).
Proof: |f1(x)| <= C1*|g1(x)| for x>k1 and |f2(x)| <= C2*|g2(x)| for x>k2
       means
         |f1(x)|+|f2(x)| <= C1*|g1(x)| + C2*|g2(x)|   for x > max(k1,k2)
                         <= C1*h(x) + C2*h(x)         for x > max(k1,k2)
                          = (C1+C2)*h(x)              for x > max(k1,k2)
EX: Let f(x) = a_k*x^k + a_{k-1}*x^{k-1} + ... + a_1*x + a_0 be a polynomial of
    degree k, i.e. a_k is nonzero. By Rule 1 each term a_j*x^j is O(x^j), so by
    Rule 2 each term is O(x^k), so by Rule 3 f(x) is O(x^k). In other words,
    for polynomials only the term with the largest exponent matters.

Rule 4: If f1(x) = O(g1(x)) and f2(x) = O(g2(x)), then f1(x)*f2(x) = O(h(x))
        where h(x) = g1(x)*g2(x).
ASK&WAIT: What are C and k in the definition of O() that prove this?
EG: f(x) = (x+1)*log(x-1) = f1(x)*f2(x) = O(x)*O(log(x)) = O(x*log x)

Rule 5: If f(x) = O(g(x)) and a>0, then (f(x))^a = O((g(x))^a).
ASK&WAIT: What are C and k in the definition of O() that prove this?

Rule 6: (log_c x)^a = O(x^b) for any a>0, b>0, c>0; in fact, the limit as x
        increases of (log_c x)^a / x^b is zero.
Proof: By Rules 1 and 5 we can assume c is any convenient constant, say
    e = 2.71828... If we can show log x = O(x^(b/a)) for any a,b>0, then
    taking the a-th power yields (log x)^a = O(x^b) (Rule 5). So we try to
    show log x = O(x^d) for any d>0.
    First we show that as x increases, f(x) = log x / x^d decreases, once x
    is large enough. Differentiate f(x), and show that for x large enough,
    f'(x) < 0:
       f'(x) = (x^d * (1/x) - d*x^(d-1)*log x)/x^(2*d)
             = x^(-d-1)*(1 - log x^d)
             < 0 if x > e^(1/d)
    Now we show that f(x) actually goes to zero as x increases. Since f(x) is
    decreasing, it is enough to show that there is an increasing sequence of
    values x(1),x(2),x(3),... such that f(x(i)) -> 0.
    Let x(i) = e^(i/d). Then
       f(x(i+1)) = log e^((i+1)/d) / e^(i+1)
                 = log e^((i/d)*(i+1)/i) / ( e * e^i )
                 = (((i+1)/i)/e) * log e^(i/d) / e^i
                 = (((i+1)/i)/e) * f(x(i))
                <= (2/e) * f(x(i))   because (i+1)/i takes values 2 > 3/2 > 4/3 > ...
                 < .74 * f(x(i))
                 < .74^2 * f(x(i-1))
                 ...
                 < .74^i * f(x(1)) = .74^i * (1/(d*e)),
    which goes to zero as i increases.

ASK&WAIT: Simplify O(log x + sqrt(x))
ASK&WAIT: Simplify O((log n)^1000 + n^.001)

Rule 7: x^a = O(b^x) for any a>0, b>1; in fact, the limit as x increases of
        x^a / b^x is zero.
Proof: Just take logarithms (base 2, say) and apply Rule 6:
    log(x^a) = a*log x and log(b^x) = x*log b; we know that for large enough x,
    x*log b is arbitrarily larger than a*log x, so 2^(x*log b) = b^x is
    arbitrarily larger than 2^(a*log x) = x^a.

ASK&WAIT: Simplify O(n^(10^100) + (1.0000001)^n)
ASK&WAIT: Simplify O((n+1)*log(sqrt(4n-2)) + log((n!)^2))
ASK&WAIT: Simplify O(n^(log n) + 1.1^(sqrt n))
ASK&WAIT: Simplify O((log n)^(log n) - n^(log log n))
ASK&WAIT: Simplify O((2^n + n^3*log n)*(n^4 + (log n)^2) + 1.5^n*n^5)

Some more definitions related to Big-O:

DEF: We say f(x) is Big-Omega(g(x)) if g(x) is O(f(x)).
Motivation: use g(x) = O(f(x)) to say a constant*f(x) is an upper bound on g(x);
            use f(x) = Big-Omega(g(x)) to say a constant*g(x) is a lower bound
            on f(x).

DEF: We say f(x) and g(x) are of the "same order" (or f(x) = BIG_THETA(g(x)))
     if f(x) = O(g(x)) and f(x) = Big-Omega(g(x)), i.e. for x > k, there are
     constants C1>0 and C2>0 such that C1*g(x) <= f(x) <= C2*g(x).

EX: 2*n^2 + n + 1 = O(n^3) but BIG_THETA(n^2), because
      n^2 <= 2*n^2 + n + 1 <= 4*n^2 for n>=1
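Aside (not part of the original notes): a short Python sketch illustrating
Rules 6 and 7 numerically and spot-checking the BIG_THETA example; the
particular exponents and base ((log x)^2 vs x^(1/2), x^3 vs 1.5^x) are
arbitrary choices for illustration, and the ratios only start shrinking once
x is large enough, as the proofs say.

from math import log

# Rule 6: (log x)^2 / x^(1/2) heads toward 0 once x is large enough
for x in [100, 10**4, 10**6, 10**8, 10**10]:
    print("x =", x, " (log x)^2 / x^0.5 =", round(log(x)**2 / x**0.5, 5))

# Rule 7: x^3 / 1.5^x heads toward 0 once x is large enough
for x in [10, 50, 100, 200, 400]:
    print("x =", x, " x^3 / 1.5^x =", x**3 / 1.5**x)

# BIG_THETA example from the last line of the notes
for n in range(1, 1001):
    assert n**2 <= 2 * n**2 + n + 1 <= 4 * n**2
print("n^2 <= 2*n^2 + n + 1 <= 4*n^2 holds for n = 1..1000")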