Math 128a - Spring 2002 - Lecture 3 - Jan 29 (Tuesday)

The first homework assignment has been posted on the web page. It is due Feb 7.

Reading in text: Sections 2.3, 3.2, 3.3

Goals: Want to understand how to bound the error in evaluating a polynomial
       Want to modify bisection (or any zero finder) to use this error bound
       to give the user a bound on the location of the zero

Example 11:
    x = (1:.001:3); y = func(x); plot(x,y), axis([1 3 -1e-8 1e-8])
        round off makes the answer "junk", but with structure
    x = 2+(0:2000)*2*eps; y = func(x); plot(y,'.')

Use the Fact from last time to bound the error:

  UN = smallest normalized floating point number = "Underflow threshold"
     = 2^(-1022) ~ 10^(-308)
  OV = largest normalized floating point number = "Overflow threshold"
     ~ 2^(1024) ~ 10^(308)

  Let op be one of +, -, *, /.  Suppose UN <= |fl(a op b)| <= OV
  (i.e. fl(a op b) does not overflow or underflow).  Then

  (*)  fl(a op b) = (a op b)*(1 + delta)   where |delta| <= epsilon

  We note that IEEE also guarantees that

  (**) fl(sqrt(a)) = sqrt(a)*(1 + delta),  |delta| <= epsilon

Example 12: The program

    p = 1, for i=1:n, p = p*x(i), end

  computes the product x(1)*x(2)*...*x(n).
  What is the difference between the true p and the computed p?

  Approach: write down a "program" that includes all the rounding error
  explicitly, and evaluate it exactly:

    p* = 1, for i=1:n, p* = (p* * x(i))*(1 + delta(i))

  where delta(i) is the roundoff error in the i-th multiplication.
  Evaluating, we get

    p* = x(1)*...*x(n) * (1+delta(1))*...*(1+delta(n))
       = p * (1+delta(1))*...*(1+delta(n))
       = p * (1+r)   where 1+r = (1+delta(1))*...*(1+delta(n))

  Ideally, 1+r is as close to 1 as possible.  How close is it?

  Simple bounds:
    (1-epsilon)^n <= 1+r = (1+delta(1))*...*(1+delta(n)) <= (1+epsilon)^n
  so
    (1-epsilon)^n - 1 <= r <= (1+epsilon)^n - 1

  How to approximate (1 +- epsilon)^n?  Simplest way: Binomial theorem or
  Taylor expansion:
    (1 +- epsilon)^n = 1 +- n*epsilon + O((n*epsilon)^2)
  so if n*epsilon << 1, we get
    -n*epsilon + O((n*epsilon)^2) <= r <= n*epsilon + O((n*epsilon)^2)

  For example, n <= 100 and epsilon ~ 1e-16 means p* approximates p to at
  least 14 decimal places.  The approximation is ok if n*epsilon << 1 (see below).

  To be rigorous, and not ignore the O(epsilon^2) term, one can prove

  Lemma: |r| <= n*epsilon / (1 - n*epsilon)   if n*epsilon < 1
  Proof: homework

  This is "worst case analysis"; it only happens if all delta(i) ~ epsilon,
  or all delta(i) ~ -epsilon.  This is unlikely, but possible, so often the
  answer is more accurate than the worst case prediction (examples later).
  From now on we will write |r| <= n*epsilon, ignoring second order terms.

  In the case of p*, if for example epsilon = 1e-16 and n = 100, then
  n*epsilon <= 1e-14  =>  p* is good to 14 decimal places (details on homework).
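  As a sanity check on this bound (this experiment is not from the lecture),
  one can compute the product in single precision and treat the double
  precision product as a stand-in for the exact value; the script below is a
  minimal sketch under that assumption.

    % Sketch (not from the lecture): test the bound |r| <~ n*epsilon for the
    % product, using single precision as "computed" and double as "exact".
    n = 100;
    x = 1 + rand(n,1);                 % positive factors, no over/underflow
    p_true = prod(x);                  % double precision, stands in for exact p
    ps = single(1);
    for i = 1:n
        ps = ps * single(x(i));        % one rounding error per multiplication
    end                                % (converting x(i) to single rounds too,
                                       %  so the bound is really about 2*n*eps)
    relerr = abs(double(ps) - p_true) / abs(p_true)
    bound  = 2*n*eps('single')         % worst case; observed error is usually far smaller

  Typically relerr comes out far below bound, consistent with the remark above
  that the worst case is unlikely.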
Example 13: The program

    s = 0; for i=1:n, s = s + x(i), end

  computes the sum x(1)+x(2)+...+x(n).
  What is the difference between the true s and the computed s*?

  Approach: as before, write down a program that includes all the rounding
  error explicitly, and evaluate it exactly:

    s* = 0, for i=1:n, s* = (s* + x(i))*(1 + delta(i))

  where delta(i) is the roundoff error in the i-th addition.
  This is trickier algebra, so the idea is to modify the program to have a
  unique identifier for each intermediate value:

    s*(0) = 0, for i=1:n, s*(i) = (s*(i-1) + x(i))*(1 + delta(i))

  Multiply out a few terms to see the pattern (skip the induction proof):

    s*(1) = x(1)              ... no roundoff when adding 0, so delta(1) = 0
    s*(2) = (x(1)+x(2))*(1+delta(2))
    s*(3) = ((x(1)+x(2))*(1+delta(2)) + x(3))*(1+delta(3))
          = x(1)*(1+delta(2))*(1+delta(3))
          + x(2)*(1+delta(2))*(1+delta(3))
          + x(3)*(1+delta(3))
    s*(4) = x(1)*(1+delta(2))*(1+delta(3))*(1+delta(4))
          + x(2)*(1+delta(2))*(1+delta(3))*(1+delta(4))
          + x(3)*(1+delta(3))*(1+delta(4))
          + x(4)*(1+delta(4))
    ...
    s*(k) = x(1)*(1+delta(2))*...*(1+delta(k))
          + x(2)*(1+delta(2))*...*(1+delta(k))
          + x(3)*(1+delta(3))*...*(1+delta(k))
          + x(4)*(1+delta(4))*...*(1+delta(k))
          + ...
          + x(k)*(1+delta(k))

  so

    s*(n) = sum from i=1 to n of x*(i)
    where x*(i) = x(i)*(1+r(i)),  |r(i)| <= (n-1)*epsilon

  In words, s*(n) is the exact sum of a perturbed set of summands x*(i),
  each of which is a small relative perturbation of x(i).

  What does this mean?  First, suppose all the x(i) are positive; then we
  claim that s*(n) = s(n)*(1+delta), where |delta| <= (n-1)*epsilon.
  Proof:
    s*(n) - s(n) = sum x(i)*(1+r(i)) - sum x(i) = sum x(i)*r(i)
                <= (n-1)*epsilon * sum x(i) = (n-1)*epsilon * s(n)
  so s*(n) <= s(n)*(1 + (n-1)*epsilon), and similarly
     s*(n) >= s(n)*(1 - (n-1)*epsilon).

  What if not all x(i) > 0?  Then we can only show that

    | s*(n) - s(n) | = | sum_i x(i)*r(i) | <= (n-1)*epsilon * sum_i |x(i)|

  so the absolute error bound is the same as if all x(i) > 0, and the
  relative error is

    | s*(n) - s(n) | / | s(n) | <= (n-1)*epsilon * [ sum |x(i)| / | sum x(i) | ]

  Thus, if |s(n)| = | sum x(i) | << sum |x(i)|, i.e. if there is a lot of
  "cancellation" in the sum, then there may be a large relative error:

  Ex: fl((x + 1e16) - 1e16) = 0, not x, for any 0 < |x| <= 1
      (roughly epsilon*1e16), so the relative error is 100%.
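  A quick matlab check of this cancellation example (this little loop is not
  from the lecture, just an illustration):

    % Near 1e16, doubles are spaced 2 apart, so adding a small x to 1e16 and
    % then subtracting 1e16 back can lose x entirely.
    for x = [0.25 0.5 1 2]
        y = (x + 1e16) - 1e16;
        fprintf('x = %4.2f   computed (x+1e16)-1e16 = %g\n', x, y)
    end
    % For x = 0.25, 0.5, 1 the result is 0 (100% relative error);
    % for x = 2 the sum 1e16+2 is exactly representable and the result is correct.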
What about polynomial root finding, which involves summing and multiplying?
As we approach the root, we are in precisely the situation where the value
of the polynomial (a sum) is supposed to be near zero, i.e. we expect large
relative errors.  The analysis is similar to summing:

Example 14: Horner's rule for polynomial evaluation (used in polyval)
  ... compute p = sum from i=0 to n of c(i)*x^i

    p = c(n); for i=n-1:-1:0, p = x*p + c(i), end

  Using the same approach as for the rounding error analysis in Example 13,
  we get

    p*(n) = c(n);
    for i=n-1:-1:0, p*(i) = (x*p*(i+1)*(1+eps(i)) + c(i))*(1+delta(i)), end

  with |eps(i)| <= epsilon, |delta(i)| <= epsilon.

    p*(n)   = c(n)   ... no roundoff
    p*(n-1) = (x * p*(n) * (1+eps(n-1)) + c(n-1)) * (1+delta(n-1))
            = [x * c(n)]  * (1+eps(n-1))*(1+delta(n-1))
            +  c(n-1)     * (1+delta(n-1))
    p*(n-2) = (x * p*(n-1) * (1+eps(n-2)) + c(n-2)) * (1+delta(n-2))
            = [x^2 * c(n)]  * (1+eps(n-1))*(1+eps(n-2))*(1+delta(n-1))*(1+delta(n-2))
            + [x * c(n-1)]  * (1+eps(n-2))*(1+delta(n-1))*(1+delta(n-2))
            +  c(n-2)       * (1+delta(n-2))
    ...
    p*(n-k) = [x^k * c(n)]       * (1+eps(n-1))*...*(1+eps(n-k)) * (1+delta(n-1))*...*(1+delta(n-k))
            + [x^(k-1) * c(n-1)] * (1+eps(n-2))*...*(1+eps(n-k)) * (1+delta(n-1))*...*(1+delta(n-k))
            + [x^(k-2) * c(n-2)] * (1+eps(n-3))*...*(1+eps(n-k)) * (1+delta(n-2))*...*(1+delta(n-k))
            + ...
            +  c(n-k)            * (1+delta(n-k))

  Note that the maximum number of factors of the form (1+tiny) multiplied
  together is 2*n.  Thus

    p*(0) = sum from i=0 to n of c*(i)*x^i
    where c*(i) = c(i)*(1+r(i)),  |r(i)| <= 2*n*epsilon

  That is, the computed value p* of the polynomial is the exact value of
  another polynomial evaluated at the same point x, but with different
  coefficients c*(i), each of which differs from the original c(i) by a
  small relative change r(i).

  Furthermore, we can write the error bound as

    | p* - p | = | sum from i=0 to n of c*(i)*x^i - c(i)*x^i |
               = | sum from i=0 to n of [c*(i) - c(i)]*x^i |
               = | sum from i=0 to n of r(i)*c(i)*x^i |
              <=   sum from i=0 to n of |r(i)|*|c(i)|*|x|^i
                   (because r(i) can be positive or negative)
              <= 2*n*epsilon * sum from i=0 to n of |c(i)|*|x|^i
               = 2*n*epsilon * the value of the polynomial with coefficients
                 |c(i)| evaluated at |x|, i.e. all positive numbers, without
                 cancellation

  In matlab, if p = polyval(coef,x), then

    | p* - p | <= 2*n*epsilon * polyval(abs(coef),abs(x))

  In practice, the factor 2*n is pessimistic, and often omitted.

  Example: [val,errorbnd] = polyvalbnd(coef,x)
           [x,val,errorbnd,errorbnd./val] for x=(1:.2:3)'
           when the relative error bound errorbnd./val > 1, we can't trust
           the sign in a zero finder

  Example: [xout,HornerVal,TrueVal,AbsErrorBound,TrueAbsError] = polyplot(roots,x)
           [...] = polyplot(2*ones(13,1),1:.001:3);
           [...] = polyplot(1:20,-5:.001:25);

Using this error bound in a zero finder, like bisection (see the sketch below):
  1) compute both f(xmid) and an error bound on f(xmid)
  2) stop when the error bound > |f(xmid)|, because then the sign of f(xmid)
     is in doubt, and it might as well be zero
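  Here is a minimal sketch of such a bisection routine; this is not the
  course's posted code, and the function name bisectbnd and its interface are
  made up for illustration.  It uses the polyval error bound above as the
  stopping test.

    function [xmid, fmid, errbnd] = bisectbnd(coef, a, b)
    % Sketch (hypothetical, not the course's code) of bisection with the
    % error-bound stopping rule.  coef = polynomial coefficients as used by
    % polyval; [a,b] is assumed to bracket a sign change.
    n  = length(coef) - 1;                  % degree of the polynomial
    fa = polyval(coef, a);
    for k = 1:200                           % plenty of steps to shrink [a,b]
        xmid   = (a + b) / 2;
        fmid   = polyval(coef, xmid);
        errbnd = 2*n*eps*polyval(abs(coef), abs(xmid));  % bound on |computed - true|
        if abs(fmid) <= errbnd
            return                          % sign of fmid in doubt: xmid is as good as a zero
        end
        if sign(fmid) == sign(fa)
            a = xmid;  fa = fmid;
        else
            b = xmid;
        end
    end

    % e.g.  [x,f,e] = bisectbnd(poly(2*ones(13,1)), 1, 3)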
End of floating point arithmetic details and error analysis, for now.

Big picture to remember: the value of a polynomial computed with roundoff is
the exact value of a slightly perturbed polynomial; this is an example of a
general phenomenon called "backward stability" that is widely used to analyze
the impact of roundoff on an algorithm.
(Warning: this is the first of several uses of the word "stability" in this
course.  It has different meanings in different contexts, so be aware!)

Here is generally how we will analyze algorithms:

  Let y = f(x) be the true mathematical function we'd like to evaluate.
  Ex: y = sin(x), or
      y = value of a polynomial evaluated at a point, in which case x is a
          vector of values (the polynomial coefficients and the argument of
          the polynomial)
      y = root of a polynomial, in which case x is as above
      y = solution of a system of equations, in which case y is a vector,
          and x includes all the coefficients of the system of equations

  Let alg(x) be the result of the algorithm in the machine, including all
  roundoff errors and any other approximations we make.

  Then we want to bound the error = f(x) - alg(x).

  We do this in two steps, by analyzing
    1) backward stability
    2) conditioning
  Both terms have different meanings in different mathematical contexts, so beware.

  For simplicity at first, suppose x and y are scalars.

1) "Backward Stability": Suppose we can show that alg(x) = f(x + dx) where
   |dx| << |x|; then we say that the algorithm is "stable", because we get
   the exact answer for a slightly wrong problem x + dx.
   (Terminology: dx is also called the "backward error".)

   Examples:
     Evaluating sin(x) and other built-in functions is stable.
     Evaluating a sum is stable, because we get the exact sum of a slightly
       different set of summands (Example 13).
     Evaluating a polynomial is stable, because we get the exact value of a
       slightly different polynomial.
     f(x) = x, alg(x) = (x + 1e100) - 1e100 is unstable, because we get
       alg(x) = 0 for |x| < ~1e84, and 0 != f(x+dx) = x + dx for any
       |dx| << |x| when x != 0.

2) "Conditioning": By Taylor's theorem,

     f(x + dx) = f(x) + dx*f'(x) + O(dx^2)

   Thus if f'(x) is large, we expect f(x+dx) - f(x) to be large, else small.
   Since we care about relative errors, we often write this (to first order)
   in the equivalent way

     [ f(x+dx) - f(x) ] / f(x) = ( dx / x ) * ( x * f'(x) / f(x) )

   or

     relative change in f = relative change in x * "condition number"

   So putting it all together, we get

     [ alg(x) - f(x) ] / f(x) ~ ( dx / x ) * ( x * f'(x) / f(x) )

   or, taking absolute values,

     | [ alg(x) - f(x) ] / f(x) | ~ | dx / x | * | x * f'(x) / f(x) |
       relative error in f       =  relative change in x * "condition number"

   Thus we expect a small error if the algorithm is stable (dx/x is small)
   and the condition number is not too large.
   Having a small backward error obviously depends on having a good algorithm;
   the condition number depends only on f(x), not on the algorithm.

   Ex: consider f(x) = sin(x); then the condition number is
       k(x) = x*cos(x)/sin(x) = x*cot(x)
       Consequence 1: k(x) is large when sin(x) is near 0 (but x is not)
       Consequence 2: k(x) is typically large when x is large

What about f(x,y)?  Or more variables?  Recall Taylor's Theorem for f: R^2 -> R:

  f(x + deltax, y + deltay) = f(x,y) + df/dx(x,y)*deltax + df/dy(x,y)*deltay
                              + second order terms

  Proof sketch: use the chain rule (ignoring second order terms):
    d f(x(t),y(t)) / dt = df/dx * dx/dt + df/dy * dy/dt
  so
    f(x(t+deltat), y(t+deltat)) = f(x,y) + df/dx * dx/dt * deltat + df/dy * dy/dt * deltat
  i.e.
    f(x(t) + dx/dt*deltat, y(t) + dy/dt*deltat)
      = f(x,y) + df/dx * dx/dt * deltat + df/dy * dy/dt * deltat
  Write dx/dt*deltat = deltax and dy/dt*deltat = deltay to get the result.

  More generally,

    f(x1 + deltax1, ..., xn + deltaxn)
      = f(x1,...,xn) + sum from i=1 to n of df(x1,...,xn)/dxi * deltaxi

  Def: the vector delf = (df/dx1, ..., df/dxn) is called the gradient of f.

  So we can rewrite Taylor's theorem as

    f(x1 + deltax1, ..., xn + deltaxn) = f(x1,...,xn) + delf * deltax

  where deltax = (deltax1, ..., deltaxn) and the second term is the dot
  product of two vectors.  Thus

    [ f(x1+deltax1,...,xn+deltaxn) - f(x1,...,xn) ] / f(x1,...,xn)
      = delf * deltax / f(x1,...,xn)

Example: Evaluating a polynomial p(x) = sum from i=0 to n of c(i)*x^i
  What are the parameters?  If only the point x changes, then there is just
  one variable, and delf = p'(x).
  But from the rounding error analysis, only the coefficients change, so
  there are n+1 variables c(0),...,c(n), and

    f(c(0),c(1),...,c(n)) = sum from i=0 to n of c(i)*x^i
    delf   = [ 1, x, x^2, ..., x^n ]
    deltax = [ deltac(0), deltac(1), ..., deltac(n) ]

  Thus

    [ f(c + deltac) - f(c) ] / f(c)
      = sum from i=0 to n of deltac(i)*x^i / sum from i=0 to n of c(i)*x^i

Example: Finding zeros of a polynomial
  In other words, f(c(0),...,c(n)) = root of the polynomial with coefficients
  c(i).  How to differentiate it?  Implicitly: write p(x) = 0 and
  (p + deltap)(x + dx) = 0, and solve for dx, where p(x) = sum c(i)*x^i:

    0 = (p + deltap)(x + dx)
      = sum (c(i) + deltac(i))*(x + dx)^i
      = sum c(i)*(x + dx)^i + sum deltac(i)*(x + dx)^i
      ~ sum c(i)*(x^i + i*x^(i-1)*dx) + sum deltac(i)*(x^i + i*x^(i-1)*dx)
      = sum c(i)*x^i + sum i*c(i)*x^(i-1) * dx
        + sum deltac(i)*x^i + sum i*deltac(i)*x^(i-1) * dx
      = 0 + p'(x)*dx + deltap(x) + O(delta^2)   ... we ignore the last term

  Thus, dx = -deltap(x)/p'(x) to first order.
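  A small matlab check of this first-order prediction (not from the lecture;
  the polynomial and the perturbation size below are arbitrary choices):

    % Compare the predicted root shift dx = -deltap(x)/p'(x) with the shift
    % actually computed by roots() for a small coefficient perturbation.
    r0 = [1 2 3];                    % roots of the unperturbed polynomial
    c  = poly(r0);                   % its coefficients, highest power first
    dc = zeros(size(c));
    dc(end) = 1e-8 * c(end);         % perturb the constant coefficient, relative size 1e-8
    x  = 3;                          % study the root at x = 3
    predicted = -polyval(dc, x) / polyval(polyder(c), x);   % -deltap(x)/p'(x)
    rnew = roots(c + dc);
    [mindist, j] = min(abs(rnew - x));   % perturbed root closest to 3
    actual = rnew(j) - x;
    fprintf('predicted dx = %g,  actual dx = %g\n', predicted, actual)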
Matlab example to illustrate this: ebnd = polyperturb(r,e,m) takes the roots
r of a polynomial and forms the coefficients; then, for i=1 to m, it perturbs
the coefficients by relative amount e, computes the roots, and plots them.
This illustrates pictorially how sensitive the roots are to perturbations.

  ebnd = polyperturb([1 2 3],1e-4, 300)
  ebnd = polyperturb([1 2 3],1e-3, 300)
  ebnd = polyperturb([1 2 3],2e-3, 300)
  ebnd = polyperturbANS([1 2 3],2e-3, 300)
  ebnd = polyperturb([1 2 3],2e-2, 300)
  ebnd = polyperturb(1:10,1e-4, 300)
  ebnd = polyperturbANS([1 2 2 3 3 3],1e-6, 300)
  ebnd = polyperturbANS([1 2 2 3 3 3],1e-4, 300)
  ebnd = polyperturbANS([1 2 2 3 3 3],1e-3, 300)
  ebnd = polyperturb(2*ones(5,1),1e-12, 300)

  Why does it look like this?
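The polyperturb script itself is not reproduced in these notes; below is a
hypothetical sketch of what such an experiment might look like (the name
polyperturb_sketch, the random perturbation model, and the plotting details
are assumptions, not the course's code).

    function polyperturb_sketch(r, e, m)
    % Hypothetical sketch of a polyperturb-style experiment (not the course
    % script): perturb each coefficient by a random relative amount of size
    % at most e, m times, and plot the resulting roots in the complex plane.
    c = poly(r);                               % coefficients from the given roots
    clf; hold on
    for i = 1:m
        dc = e * c .* (2*rand(size(c)) - 1);   % relative perturbation in [-e, e]
        rp = roots(c + dc);
        plot(real(rp), imag(rp), 'b.')
    end
    plot(real(r), imag(r), 'ro')               % the original roots, for reference
    hold off; axis equal

One hint for the question above: for a k-fold root, a relative coefficient
perturbation of size e typically moves the root by roughly e^(1/k) rather
than e, which is why the clusters around repeated roots are so much larger.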