Math 128a - Spring 2002 - Lecture 5 - Feb 5 (Tuesday)
Read section 3.4
Goals: Study new methods for solving equations f(x)=0
Advantages of bisection:
guaranteed to converge, assuming only that f(x) has different
signs at xlower and xupper (f() need not be smooth)
all you need to know about f is a function that evaluates it,
(not what it does inside, i.e. just a "black box")
Drawbacks of bisection:
need to start with an interval where f(x) changes sign
hard to see how to generalize to case of more than 1 variable
Slow convergence: shrinking the interval to width tol takes
    log_2 (start_width/tol) = O(log (1/tol))
                            = O( # correct digits )
    steps to converge.
Def: A sequence x(n) -> x converges linearly if
     |x(n+1) - x| <= c * |x(n) - x| for some 0 <= c < 1
     (so bisection converges linearly, with c = 1/2)
New Algorithm: Newton's Method
   Idea: approximate f near x(n) by its tangent line at x(n), and let
         x(n+1) be the zero of the tangent line:
            0 = f(x(n)) + f'(x(n))*(x(n+1) - x(n))
         i.e.
            x(n+1) = x(n) - f(x(n))/f'(x(n))
   Algorithm: repeat this update
      until (error bound for f(x(n)) > |f(x(n))| or
             error bound for f'(x(n)) > |f'(x(n))| or
             n > Max_n or ...)
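A rough Matlab sketch of the iteration (not the newton.m used in class;
the example function, starting guess, and tolerance are arbitrary choices):
    f  = @(x) x.^2 - 2;          % example function, with zero sqrt(2)
    fp = @(x) 2*x;               % its derivative
    x = 3; tol = 1e-14; n = 1; Max_n = 50;
    while abs(f(x)) > tol && n <= Max_n
        x = x - f(x)/fp(x);      % step to the zero of the tangent line at x
        n = n + 1;
    end
    x                            % ~1.41421356237310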
Graphical illustration of Newton's Method.
Run matlab code, illustrate convergence.
[xvals,fvals,dfvals]=newton('funcN',3,1e-14)
[xvals,fvals,dfvals]
[xvals,fvals,dfvals]=newton('funcN',4,1e-14)
[xvals,fvals,dfvals]
Notice that the error approximately squares at each step.
This is called quadratic convergence:
|x(n+1) - x| <= C * |x(n) - x|^2 for some C
Ex: if C=1, |x(n)-x|=.1, then get 1e-1 => 1e-2 => 1e-4 => 1e-8 => 1e-16 (done)
Rough convergence analysis (see what assumptions we need for a formal proof):
Suppose z is exact zero, and use next term of Taylor expansion:
0 = f(x(n)) + f'(x(n))*(x(n+1) - x(n))
f(z) = 0 = f(x(n)) + f'(x(n))*(z - x(n)) + f''(x(n))/2*(z-x(n))^2 + O((z-x(n))^3)
Subtract to get
x(n+1) - z = f''(x(n))/(2*f'(x(n))) * (z - x(n))^2 + O((z-x(n))^3)
So it seems that if
we are close enough to z that the Taylor expansions are accurate
f''(x(n))/f'(x(n)) is not too large
then we should get quadratic convergence with C ~ f''(z)/(2*f'(z))
Theorem (Kantorovich): Let x(1) be the starting guess for a zero of f().
Suppose there are three constants r, m and M such that
1) |x(1) - z| <= r where z is a zero of f()
2) |f'(x)| >= m > 0 when |z - x| <= r
3) |f''(x)| exists and is <= M when |z - x| <= r
4) r <= m/M (relates size of interval, magnitudes of f', f'')
then the sequence computed by Newton's method
1) stays in the interval [z-r,z+r],
2) converges to z at least as fast as bisection, and
3) the iterates satisfy
|x(n+1) - z| <= (M/(2*m)) * |x(n) - z|^2
i.e. x(n) converges quadratically with C = M/(2*m)
Proof: Taylor's Theorem says
f(z) = 0 = f(x(n)) + f'(x(n))*(z - x(n)) + f''(eta)/2*(z-x(n))^2
where eta lies between z and x(n)
Subtract from
0 = f(x(n)) + f'(x(n))*(x(n+1) - x(n))
to get x(n+1) - z = f''(eta)/(2*f'(x(n))) * (x(n) - z)^2
so if |x(n) - z| <= r then
|x(n+1) - z| <= M/(2*m) * |x(n) - z|^2
<= M/(2*m)*r*|x(n)-z|
<= |x(n)-z|/2,
so by induction the entire sequence of x(n)'s stays
in the interval [z-r,z+r], and this analysis applies
to each step. In particular
convergence is always at least as fast as bisection, and
convergence is quadratic with C = M/(2*m)
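A quick numerical check of quadratic convergence (in Matlab; the example
f(x) = x^2 - 2 and the starting guess 3 are arbitrary choices): the ratios
|x(n+1) - z| / |x(n) - z|^2 approach f''(z)/(2*f'(z)) = 1/(2*sqrt(2)) ~ .35:
    f  = @(x) x.^2 - 2;  fp = @(x) 2*x;  z = sqrt(2);
    x = 3;  e = abs(x - z);
    for n = 1:4
        x = x - f(x)/fp(x);      % Newton step
        e(end+1) = abs(x - z);   % record the error
    end
    e(2:end) ./ e(1:end-1).^2    % ratios approach f''(z)/(2*f'(z)) ~ .35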
This is called a "local convergence" theorem, because it says that
you converge once you are close enough to the zero.
In contrast, our analysis of bisection is a "global convergence" theorem,
because it doesn't matter how wide the initial interval is, as long
as the function changes sign on it.
Global convergence theorems, i.e. ones that apply to wide intervals, are
rare and hard to prove, because most methods can have very complicated
behavior (dynamics) on large intervals (example below), unless the
function is very "well behaved"
What can happen when conditions in the theorem are violated?
1) If f' gets too small: In the extreme case, if some f'(x(n))=0, the
whole sequence blows up. (picture)
2) If f'' gets too large: this means f' can change a lot in a small interval
so that a straightline approximation to f can change drastically.
For example, f'(x(n)) could become 0
3) If we start far away from a root, anything can happen
Theorem. Let p(x) = product from i=1 to n (x-x(i)) be a polynomial with
real distinct roots x(1) < x(2) < ... < x(n)
Define the intervals I(0) = (-infinity, x(1))
I(1) = (x(1), x(2))
... I(j) = (x(j), x(j+1))
I(n) = (x(n), infinity)
Pick any arbitrary infinite sequence of integers
(n(1), n(2), n(3), ... ) in the range 1 to n-1, such that n(i) != n(i+1)
such as (1, 2, 1, 5, 3, 4, 6, 2, n-1, ... )
Then there is a starting point y(1) for Newton in interval I(n(1))
such that
1) Newton never converges, and
2) the i-th Newton iterate y(i) is in interval I(n(i))
Proof: uses idea of Cantor set from Ma 104: try it!
See Fig 3.8, p 127, 3rd Edition of text (or cover of 2nd edition)
This represents Newton's method applied to x^5+1=0 (or x^4+4 = 0)
using complex arithmetic. Each point in the complex plane (or "pixel")
is coded to indicate to which of the 5 (or 4) roots Newton converges
when started there. The picture is "fractal", and very complicated,
no matter how close you look.
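A rough Matlab sketch of how such a picture can be generated for x^5 + 1 = 0
(the grid bounds, resolution, and iteration count are arbitrary choices, and
the array subtraction uses implicit expansion, so a reasonably recent Matlab):
    n = 400;  [X, Y] = meshgrid(linspace(-2, 2, n));
    Z = X + 1i*Y;                             % one Newton start per pixel
    for k = 1:40
        Z = Z - (Z.^5 + 1) ./ (5*Z.^4);       % Newton step for f(z) = z^5 + 1
    end
    r = exp(1i*pi*(1:2:9)/5);                 % the 5 roots of z^5 = -1
    [~, idx] = min(abs(Z(:) - r), [], 2);     % nearest root for each pixel
    image(reshape(idx, n, n)); axis image     % color-code pixels by root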
There are some simple cases where we can establish that Newton converges
in a large interval:
Definition: A function is convex on an interval I if f''(x)>0 on I
Lemma: If f is convex on I, and if x1 < x2 are both in I, then the line
segment connecting (x1,f(x1)) and (x2,f(x2)) lies on or above the
graph of f
Proof: try it.
Theorem: Suppose
1) f has 2 continuous derivatives on the real axis
2) f'(x) > 0 on the real axis (f strictly increasing)
3) f''(x) > 0 on the real axis (f convex)
4) f(z) = 0 for some real value z
(draw picture of typical function f)
Then Newton converges to z from any real starting point x(1)
Proof: From the proof of Kantorovich's Thm we still have
x(n+1) - z = f''(eta)/(2*f'(x(n))) * (x(n) - z)^2
The RHS is always positive, so x(i)>z for i>1.
Since f(x(i))/f'(x(i))>0 for any x(i)>z,
x(i+1) = x(i) - f(x(i))/f'(x(i)) < x(i)
so the x(i) form a decreasing sequence bounded below by z.
So they have a limit, call it z*. But z* satisfies
z* = z* - f(z*)/f'(z*)
so f(z*)=0 as desired.
Since f is strictly increasing, the zero is unique and z = z*.
(picture)
Example:
Suppose f(x) is a polynomial with all coefficients >= 0 except the
constant term, which is < 0. Then the above theorem applies (for x >= 0)
to show that Newton converges
from any positive starting point. This is the case in computing
rates of return on investments (and is done in financial calculators,
e.g. the HP 12C) (see homework!)
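For instance (a made-up example, not the homework problem):
p(x) = x^3 + x^2 + x - 1 has positive coefficients and a negative constant
term, so Newton converges to its unique positive root from any positive start:
    p  = @(x) x.^3 + x.^2 + x - 1;
    pp = @(x) 3*x.^2 + 2*x + 1;
    x = 100;                       % deliberately far from the root
    for n = 1:60
        x = x - p(x)/pp(x);        % Newton step
    end
    x, p(x)                        % the unique positive root, ~0.5437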
Examples:
Newton to compute the square root (used in some computers to
implement sqrt, given only addition, multiplication, and division):
solve f(x) = x^2 - a = 0;
f'(x) = 2*x, so Newton is x - f(x)/f'(x) = x - (x^2-a)/(2*x) = .5*(x + a/x)
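In Matlab (a = 2 and the starting guess are arbitrary choices):
    a = 2;  x = 1;
    for n = 1:6
        x = 0.5*(x + a/x);         % Newton step for x^2 - a = 0
    end
    x - sqrt(a)                    % ~0 after a handful of steps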
Newton to compute the reciprocal (used in some computers to implement
division, given only addition, subtraction, and multiplication):
Try solving f(x) = x*a - 1 = 0; get x(n+1) = 1/a - one step convergence
but need to be able to divide!
Try solving f(x) = a - 1/x; get x(n+1) = x(n)*(2-x(n)*a)
only requires addition, multiplication, so ok
Note: neither method alone is good enough to get result correctly rounded
(rounded to the nearest floating point number)
as IEEE requires, but they can be fixed up
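A quick Matlab check of the division-free iteration above (a = 3 and the
starting guess .3 are arbitrary; the start must lie in (0, 2/a) to converge):
    a = 3;  x = 0.3;
    for n = 1:6
        x = x*(2 - x*a);           % uses only +, -, *
    end
    x - 1/a                        % ~0 (the division here is only for checking)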
Next Algorithm: Secant method
Advantage over Newton: don't need to evaluate f'(x), which
may be expensive or impossible (may only be given a
"black box" subroutine for f(x), that we can't differentiate)
If the most expensive operation is evaluating f and f', then
secant is faster
Idea: if we can't evaluate f'(x(n)), what is a good approximation?
Use the slope of the line between the last two points on
the curve:
f'(x(n)) ~ [ f(x(n)) - f(x(n-1)) ] / [ x(n) - x(n-1) ]
so
x(n+1) = x(n) - f(x(n))*[x(n)-x(n-1)]/[f(x(n))-f(x(n-1))]
pick two starting values x(1) and x(2)
compute f(1) = f(x(1)) and f(2) = f(x(2))
n = 2
while (|f(n)| > tol or ... )
x(n+1) = x(n) - f(n)*[x(n)-x(n-1)]/[f(n)-f(n-1)]
f(n+1) = f(x(n+1))
n=n+1
end
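A Matlab sketch of this pseudocode (the example function, starting values,
and tolerance are arbitrary choices):
    fh = @(x) x.^2 - 2;
    x0 = 1;  x1 = 3;  f0 = fh(x0);  f1 = fh(x1);
    tol = 1e-14;  Max_n = 50;  n = 2;
    while abs(f1) > tol && n <= Max_n
        x2 = x1 - f1*(x1 - x0)/(f1 - f0);   % secant slope replaces f'
        x0 = x1;  f0 = f1;
        x1 = x2;  f1 = fh(x1);
        n = n + 1;
    end
    x1                                      % ~sqrt(2)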
Convergence proof: we will show that the order of this
method is p = (1+sqrt(5))/2 ~ 1.62, i.e.
|x(n) - z| = O(|x(n-1) - z|^p)
Let e(n+1) = x(n+1) - z. Then
e(n+1) = e(n) - f(x(n))*[x(n)-x(n-1)]/[f(x(n))-f(x(n-1))]
       = e(n) - f(x(n))*[e(n)-e(n-1)]/[f(x(n))-f(x(n-1))]
       ~ e(n) - f'(z)*e(n) * ( e(n)-e(n-1) ) /
                [f'(z)*e(n) + f''(z)*e(n)^2/2
                 - f'(z)*e(n-1) - f''(z)*e(n-1)^2/2 ]
       = e(n)*[ 1 - f'(z) * (e(n)-e(n-1)) /
                [f'(z)*(e(n)-e(n-1)) + f''(z)/2*(e(n)^2-e(n-1)^2)] ]
       = e(n)*[ 1 - f'(z)/[f'(z) + f''(z)/2*(e(n)+e(n-1))] ]
       = e(n)*[ 1 - 1/(1 + (f''(z)/2/f'(z))*(e(n)+e(n-1))) ]
       ~ e(n)*[ 1 - 1/(1 + (f''(z)/2/f'(z))*e(n-1)) ]
             since we assume |e(n)| << |e(n-1)| near convergence
       ~ e(n)*[ 1 - ( 1 - (f''(z)/2/f'(z))*e(n-1) ) ]
       = (f''(z)/2/f'(z)) * e(n)*e(n-1)
Now let us analyze the convergence of a sequence of the form
e(n+1) = C*e(n)*e(n-1)
We can "guess" that e(n+1) = c*e(n)^a, plug in and get
e(n+1) = c*e(n)^a = c*(c*e(n-1)^a)^a = c^(1+a)*e(n-1)^(a^2)
so
c^(1+a)*e(n-1)^(a^2) = C*c*e(n-1)^(a+1)
or
c^a/C = e(n-1)^(1+a-a^2)
Since the LHS is independent of n, the RHS must also be so
1+a-a^2 = 0 or a=(1+sqrt(5))/2 ~ 1.62
(The other root of the quadratic is -.62, corresponding
to e(n) -> infinity, which is not relevant)
A related approach, with a connection to Fibonacci numbers:
Take logarithms (base 2, say) and write le(n) = log(e(n)) to get
le(n+1) = log(C) + le(n) + le(n-1)
Write le'(n) = le(n) + log(C) to get
le'(n+1) = le'(n) + le'(n-1)
the Fibonacci recurrence
How do we solve the Fibonacci recurrence? Guess le'(n) = p^n,
plug in to get
p^(n+1) = p^n + p^(n-1)
divide by p^(n-1) to get
p^2 = p + 1
or p+- = (1 +- sqrt(5))/2 ~ 1.62, -.62
so the general solution is
le'(n) = c1 * (p+)^n + c2 * (p-)^n
where we choose c1 and c2 to agree with le'(0) and le'(1).
Since |p-| < 1, the second term goes to zero, and we get
le'(n) ~ c1 * 1.62^n
and
e(n) ~ 2^(c1 * 1.62^n)/C ~ c^(1.62^n) / C
where c = 2^c1
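A rough numerical check (same example as the secant sketch above): the ratios
log|e(n+1)| / log|e(n)| should drift toward (1+sqrt(5))/2 ~ 1.62 as the
errors shrink, before roundoff takes over:
    fh = @(x) x.^2 - 2;  z = sqrt(2);
    x0 = 1;  x1 = 3;  e = abs([x0 x1] - z);
    for n = 1:6
        x2 = x1 - fh(x1)*(x1 - x0)/(fh(x1) - fh(x0));   % secant step
        x0 = x1;  x1 = x2;
        e(end+1) = abs(x1 - z);
    end
    log(e(4:end)) ./ log(e(3:end-1))    % the later ratios settle near 1.6-1.7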
This seems to converge more slowly than Newton, so why might it
be faster? Suppose each evaluation of f and f' is equally (and very)
expensive. Then in the same time we can do one Newton iteration we
can do two secant steps. This means that the error e(n) from secant
decreases to e(n+2) = O(e(n)^(p^2)) = O(e(n)^((3+sqrt(5))/2))
= O(e(n)^2.62)
in the same time Newton's error decreases to
e(n+1) = O(e(n)^2)
What is best to use to find a zero of a function of a single
variable?
The best available methods are hybrids, which use
sign changes if they are found (see 'help fzero' in Matlab),
or are specialized to certain functional forms, like polynomials
(Laguerre's method, which is cubically convergent, provides
guaranteed error bounds (modulo roundoff), and converges
monotonically if all the roots are real).
The Matlab roots function finds the roots of a polynomial by
constructing a matrix whose eigenvalues are the roots of the
polynomial, and calling the eigenvalue routine. This may
sound like reducing to a more difficult problem (and a
more expensive one, since finding the eigenvalues costs
O(n^3), in contrast to probably O(n^2) for the methods we
have been talking about). But the eigenvalue routine is
so reliable, and the concern about performance is low enough,
that they chose to use it.
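A small illustration of the idea (Matlab's compan builds the companion
matrix of a coefficient vector; x^2 - 2 is just an example):
    p = [1 0 -2];                  % coefficients of x^2 - 2
    eig(compan(p))                 % +-sqrt(2), essentially what roots(p) computes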
For further information see,
www.cs.berkeley.edu/~demmel/Math128/RealRoots.pdf
Last topic on finding zeros of a function of a single variable:
Several of these methods, notably Newton, are of the form
x(n+1) = F(x(n))
where in the case of Newton F(x) = x - f(x)/f'(x).
Let us ask under what general conditions an iteration x(n+1)=F(x(n))
converges. If F() is continuous, and x(n) -> x, then
x = F(x)
and x is called a "fixed point" of F. x(n+1)=F(x(n)) is called
"fixed point iteration". There are many theorems in mathematics
about when a function F() has a fixed point, and whether fixed-point
iteration will find it. Here is a basic definition and theorem:
Definition. F is called a contraction if |F(x)-F(y)| <= c*|x-y|
for all x and y, where 0 <= c < 1
In words, applying F makes points x and y get closer together, by
at least a factor c<1.
Ex: F(x) = x/2 + 10. Then |F(x)-F(y)| = |x-y|/2
Theorem (Contraction Mapping Theorem):
Let C be a closed interval [a,b], and suppose F maps C to C.
Then if F is a contraction, it has a unique fixed point x in C,
and fixed point iteration x(n+1) = F(x(n)) converges to it.
Furthermore, with c the contraction constant,
|x-x(n)| <= |x(n+1)-x(n)| / (1-c)
is an error bound.
Proof: We will use the fact that a sum converges if it converges
absolutely, i.e. that sum_i y(i) converges if sum_i |y(i)| does.
In our case we write the "telescoping sum"
x(n) - x(0) = x(n) - x(n-1)
+x(n-1) - x(n-2)
+x(n-2) - x(n-3)
...
+x(1) - x(0)
We want to show this sum (with y(i) = x(i+1)-x(i)) converges
absolutely. To do this note that
|y(i)| = |x(i+1)-x(i)|
= |F(x(i)) - F(x(i-1))|
<= c * |x(i) - x(i-1)|
so by induction
|y(i)| = |x(i+1)-x(i)| <= c^i * |x(1) - x(0)|
which goes to 0, since 0 <= c < 1.
Now x(n) - x(0) = sum from i=0 to n-1 of x(i+1)-x(i), and this sum
converges absolutely as n -> infinity, because
sum from i=0 to n-1 |x(i+1)-x(i)|
is dominated by the convergent geometric series
sum from i=0 to n-1 c^i * |x(1)-x(0)|
= |x(1)-x(0)| * (1-c^n)/(1-c) <= |x(1)-x(0)|/(1-c)
So the x(n) converge to a limit x, and |x - x(0)| <= |x(1) - x(0)|/(1-c).
A similar proof shows that |x-x(n)| <= |x(n+1)-x(n)| / (1-c).
Ex: F(x) = x/2 + 10 maps [0,40] to [10,30] and so satisfies the
conditions of the theorem. Iteration converges to
x = F(x) = x/2+10 or x=20. (picture)
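In Matlab (the starting point 0 is arbitrary; any point in [0,40] works):
    F = @(x) x/2 + 10;
    x = 0;
    for n = 1:60
        x = F(x);                  % fixed point iteration
    end
    x                              % converges to the fixed point x = 20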
Ex: On your homework, you will show that if |F'| < 1, then
F is a contraction. Let us look at Newton's method:
F(x) = x - f(x)/f'(x) => F'(x) = f(x)*f''(x) / (f'(x)*f'(x))
If z is a simple zero (f'(z) is nonzero), then F'(z) = 0, so F'(x) is
small when x is near z; thus F is a contraction near z, and we expect
Newton to converge.