Math 128a - Spring 2002 - Lecture 6 - Feb 7 (Thursday)

Goals: Continue studying methods for solving equations f(x) = 0

There are some simple cases where we can establish that Newton converges in a large interval:

Definition: A function f is convex on an interval I if f''(x) > 0 on I.

Theorem: Suppose
   1) f has 2 continuous derivatives on the real axis
   2) f'(x) > 0 on the real axis (f strictly increasing)
   3) f''(x) > 0 on the real axis (f convex)
   4) f(z) = 0 for some real value z
(draw picture of typical function f)
Then Newton converges to z from any real starting point x(1).
(picture)

Example: Suppose f(x) is a polynomial with all coefficients >= 0 except the constant term, which is < 0. Then the above theorem applies to show that Newton converges from any positive starting point. This is the case when computing rates of return on investments (and is done in financial calculators, e.g. the HP12C) (see homework!)

Examples:

Newton to compute the square root (used in some computers to implement sqrt given only addition, multiplication, division):
   solve f(x) = x^2 - a = 0;  f'(x) = 2*x, so Newton is
      x - f(x)/f'(x) = x - (x^2 - a)/(2*x) = .5*(x + a/x)

Newton to compute the reciprocal (used in some computers to implement division given only addition, subtraction and multiplication):
   Try solving f(x) = x*a - 1 = 0; get x(n+1) = 1/a
      -- one-step convergence, but we need to be able to divide!
   Try solving f(x) = a - 1/x; get x(n+1) = x(n)*(2 - x(n)*a)
      -- only requires addition, subtraction and multiplication, so ok

Note: neither method alone is good enough to get the result correctly rounded (rounded to the nearest floating point number) as IEEE requires, but they can be fixed up.

Next Algorithm: Secant method

Advantage over Newton: we don't need to evaluate f'(x), which may be expensive or impossible (we may only be given a "black box" subroutine for f(x) that we can't differentiate). If the most expensive operations are evaluating f and f', then secant is faster.

Idea: if we can't evaluate f'(x(n)), what is a good approximation? Use the slope of the line between the last two points on the curve:
   f'(x(n)) ~ [ f(x(n)) - f(x(n-1)) ] / [ x(n) - x(n-1) ]
so
   x(n+1) = x(n) - f(x(n))*[x(n) - x(n-1)]/[f(x(n)) - f(x(n-1))]

   pick two starting values x(1) and x(2)
   compute f(1) = f(x(1)) and f(2) = f(x(2))
   n = 2
   while (|f(n)| > tol or ... )
      x(n+1) = x(n) - f(n)*[x(n) - x(n-1)]/[f(n) - f(n-1)]
      f(n+1) = f(x(n+1))
      n = n + 1
   end
(picture)
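Here is a small Matlab sketch of the secant pseudocode above; the test function f(x) = x^2 - 2, the starting guesses 1 and 2, the tolerance, and the iteration cap are arbitrary choices for illustration, not part of the method:

   f = @(x) x^2 - 2;                 % root at sqrt(2) ~ 1.4142
   x = [1 2];                        % two starting values x(1), x(2)
   fx = [f(x(1)), f(x(2))];          % stored function values
   tol = 1e-12;
   n = 2;
   while abs(fx(n)) > tol && n < 50  % cap iterations in case of trouble
      x(n+1)  = x(n) - fx(n)*(x(n) - x(n-1))/(fx(n) - fx(n-1));  % secant step
      fx(n+1) = f(x(n+1));           % one new function evaluation per step
      n = n + 1;
   end
   disp(x(n))                        % prints approximately 1.41421356...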
Convergence proof: we will show that the order of this method is p = (1+sqrt(5))/2 ~ 1.62, i.e.
   |x(n) - z| = O(|x(n-1) - z|^p)

Let e(n) = x(n) - z. Then
   e(n+1) = e(n) - f(x(n))*[x(n) - x(n-1)]/[f(x(n)) - f(x(n-1))]
          = e(n) - f(x(n))*[e(n) - e(n-1)]/[f(x(n)) - f(x(n-1))]
Taylor expanding f around z (where f(z) = 0), and keeping only the leading term in the numerator,
          ~ e(n) - f'(z)*e(n)*(e(n) - e(n-1)) / [f'(z)*e(n) + f''(z)*e(n)^2/2 - f'(z)*e(n-1) - f''(z)*e(n-1)^2/2]
          = e(n)*[ 1 - f'(z)*(e(n) - e(n-1)) / [f'(z)*(e(n) - e(n-1)) + (f''(z)/2)*(e(n)^2 - e(n-1)^2)] ]
          = e(n)*[ 1 - f'(z)/[f'(z) + (f''(z)/2)*(e(n) + e(n-1))] ]
          = e(n)*[ 1 - 1/(1 + (f''(z)/(2*f'(z)))*(e(n) + e(n-1))) ]
          ~ e(n)*[ 1 - 1/(1 + (f''(z)/(2*f'(z)))*e(n-1)) ]     since we assume |e(n)| << |e(n-1)| near convergence
          ~ e(n)*[ 1 - ( 1 - (f''(z)/(2*f'(z)))*e(n-1) ) ]
          = (f''(z)/(2*f'(z))) * e(n)*e(n-1)

Now let us analyze the convergence of a sequence of the form
   e(n+1) = C*e(n)*e(n-1)
We can "guess" that e(n+1) = c*e(n)^a, plug in, and get
   e(n+1) = c*e(n)^a = c*(c*e(n-1)^a)^a = c^(1+a)*e(n-1)^(a^2)
so
   c^(1+a)*e(n-1)^(a^2) = C*c*e(n-1)^a*e(n-1) = C*c*e(n-1)^(a+1)
or
   c^a/C = e(n-1)^(1+a-a^2)
Since the LHS is independent of n, the RHS must be too, so 1 + a - a^2 = 0, or a = (1+sqrt(5))/2 ~ 1.62.
(The other root of the quadratic is ~ -.62, corresponding to e(n) -> infinity, which is not relevant.)

A related approach, with a connection to Fibonacci numbers: take logarithms (base 2, say) to get
   le(n+1) = log(C) + le(n) + le(n-1)     where le(n) = log(e(n))
Write le'(n) = le(n) + log(C) to get
   le'(n+1) = le'(n) + le'(n-1)
the Fibonacci recurrence. How do we solve the Fibonacci recurrence? Write le'(n) = p^n, plug in to get
   p^(n+1) = p^n + p^(n-1)
and divide by p^(n-1) to get
   p^2 = p + 1   or   p+- = (1 +- sqrt(5))/2 ~ 1.62, -.62
so the general solution is
   le'(n) = c1*(p+)^n + c2*(p-)^n
where we choose c1 and c2 to agree with le'(0) and le'(1). Since |p-| < 1, the second term goes to zero, and we get
   le'(n) ~ c1 * 1.62^n
and
   e(n) ~ 2^(c1 * 1.62^n)/C ~ c^(1.62^n)/C     where c = 2^c1

This seems to converge more slowly than Newton, so why might secant be faster? Suppose each evaluation of f and f' is equally (and very) expensive. Then in the time it takes to do one Newton iteration (one evaluation each of f and f') we can do two secant steps (two evaluations of f). This means that in the same time the error from secant decreases from e(n) to
   e(n+2) = O(e(n)^(p^2)) = O(e(n)^((3+sqrt(5))/2)) = O(e(n)^2.62)
while Newton's error decreases to
   e(n+1) = O(e(n)^2)
Both are very fast in any event:
   Order p = 2    => error decreases like 1e-1, 1e-2, 1e-4, 1e-8, 1e-16
   Order p = 1.62 => error decreases like 1e-1, 2e-2, 2e-3, 6e-5, 1e-7, 8e-12, 1e-18

What is best to use to find a zero of a function of a single variable? The best available methods are hybrids that use sign changes if they are found ('help fzero' in Matlab), or are specialized to certain functional forms, like polynomials (e.g. Laguerre's method, which is cubically convergent, provides guaranteed error bounds (modulo roundoff), and converges monotonically if all the roots are real). The Matlab roots function finds roots of polynomials by constructing a matrix whose eigenvalues are the roots of the polynomial, and calling their eigenvalue routine. This may sound like reducing to a more difficult problem (and a more expensive one, since finding the eigenvalues costs O(n^3), in contrast to probably O(n^2) for the methods we have been talking about). But the eigenvalue routine is so reliable, and their concern about performance low enough, that they choose to use it. For further information see
   www.cs.berkeley.edu/~demmel/Math128/RealRoots.pdf
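As an aside, the remark about roots can be illustrated directly: the matrix in question is the "companion matrix" of the polynomial, whose eigenvalues are exactly its roots. A small Matlab sketch, using a made-up cubic as the example:

   c = [1 -6 11 -6];      % coefficients of x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)
   A = compan(c);         % companion matrix; its eigenvalues are the roots
   disp(sort(eig(A)))     % prints 1, 2, 3
   disp(sort(roots(c)))   % same roots from the built-in routine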
Last topic on finding zeros of a function of a single variable: several of these methods, notably Newton, are of the form
   x(n+1) = F(x(n))
where in the case of Newton F(x) = x - f(x)/f'(x). Let us ask under what general conditions an iteration x(n+1) = F(x(n)) converges.

If F() is continuous and x(n) -> x, then x = F(x), and x is called a "fixed point" of F. The iteration x(n+1) = F(x(n)) is called "fixed point iteration". There are many theorems in mathematics about when a function F() has a fixed point, and whether fixed point iteration will find it. Here is a basic definition and theorem:

Definition: F is called a contraction if |F(x) - F(y)| <= c*|x - y| where 0 <= c < 1.
In words, applying F makes the points x and y get closer together, by at least a factor c < 1.

Ex: F(x) = x/2 + 10. Then |F(x) - F(y)| = |x - y|/2.

Theorem (Contraction Mapping Theorem): Let C be a closed interval [a,b], and suppose F maps C to C. Then if F is a contraction on C, it has a unique fixed point x in C, and fixed point iteration x(n+1) = F(x(n)) converges to it from any starting point x(0) in C. Furthermore, if |F(z) - F(y)| <= c*|z - y|, then
   |x - x(n)| <= |x(n+1) - x(n)| / (1 - c)
is an error bound.

Proof (sketch): |x(n+1) - x(n)| = |F(x(n)) - F(x(n-1))| <= c*|x(n) - x(n-1)|, so by induction
   |x(n+1) - x(n)| <= c^n * |x(1) - x(0)|
which goes to 0, since 0 <= c < 1. Now
   x(n) - x(0) = sum from i=0 to n-1 of [ x(i+1) - x(i) ]
and we can ask whether this telescoping sum converges as n -> infinity; it will if it converges "absolutely", i.e. if the following converges:
   sum from i=0 to n-1 of |x(i+1) - x(i)|
But this is dominated by the convergent geometric series
   sum from i=0 to n-1 of c^i * |x(1) - x(0)| = |x(1) - x(0)| * (1 - c^n)/(1 - c)
so it converges. Thus
   |x - x(0)| <= |x(1) - x(0)| / (1 - c)
and the same argument starting from x(n) instead of x(0) gives the error bound in the theorem.

Ex: F(x) = x/2 + 10 maps [0,40] to [10,30], and so satisfies the conditions of the theorem. The iteration converges to the solution of x = F(x) = x/2 + 10, i.e. x = 20.
(picture)

Ex: On your homework, you will show that if |F'| < 1, then F is a contraction.

Let us look at Newton's method: F(x) = x - f(x)/f'(x), so
   F'(x) = f(x)*f''(x) / (f'(x))^2
If z is a simple zero (f'(z) is nonzero), then F'(x) is small when x is near z (since f(x) is near 0), so F is a contraction near z, and we expect Newton to converge.
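A tiny Matlab sketch of fixed point iteration for the contraction F(x) = x/2 + 10 from the example above; the starting point 0 and the iteration count 60 are arbitrary:

   F = @(x) x/2 + 10;     % contraction with c = 1/2, fixed point x = 20
   x = 0;                 % any starting point in [0,40] will do
   for n = 1:60
      x = F(x);           % fixed point iteration x(n+1) = F(x(n))
   end
   disp(x)                % prints 20 (to machine precision)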