Notes for Ma221, Lecture 2, Aug 31 2010

Goals: Floating point arithmetic
       Roundoff error analysis with
          problem of evaluating a polynomial
       Start matrix and vector norms

Example: Recall plot of  (x-2)^13, evaluated 2 ways:
       as (x-2)^13 - smooth, monotonic curve, as expected
       as x^13 - 26*x^12 + ... - 8192, with Horner's rule:
           for x in the range [1.8,2.3], basically get random numbers 
           in the range [-1e-8,1e-8]
       why the big difference? need to understand floating point arithmetic

Floating Point - how real numbers represented in computer

   Long ago, computers did floating point in many different ways,
     making it hard to understand bugs and write portable code.
     Fortunately Prof. Kahan led an IEEE standards committee that
     convinced all the computer manufacturers to agree on one way
     to do it, called the IEEE Floating Point Standard, for which
      he won the Turing Award. This was in 1985; the standard was
      recently updated in 2008. See his web page for much more information:
       www.cs.berkeley.edu/~wkahan
   Scientific Notation:
      +- d.ddd x radix^e
   Floating point usually uses radix=2 (10 sometimes, 16 historically)
      so you need to store sign bit (+-), exponent (e), and mantissa (d.ddd).
      The #digits, exponent range are limited, to fit into 32 bits, 64 bits, etc.
    Normalized:  use 3.1000e0 not  0.0031e3 - (leading digit nonzero)
      normalization gives uniqueness of representations 
   Def: rnd(x) = nearest floating point number to x 
   Relative Representation Error (RRE):  | x - rnd(x) | / | rnd(x) |
   Maximum Relative Representation Error  = max_x  RRE(x)
    aka machine epsilon, macheps, 
       = half distance from 1 to next larger number 1+radix^(1-p)
       = .5 * radix^(1-p) = | (1+.5*radix^(1-p)) - 1 | / 1
       = 2^(-p) in binary
     Note: eps in Matlab = 2^(-52) =  2*macheps
           (a quick Matlab check of these values appears after the format list below)
   Roundoff error model, provided no over/underflow
      fl(a op b) = (a op b)(1 + delta),   |delta| <= macheps
         where op may be add, subtract, multiply or divide
      Use throughout course, all you need for most algorithms
      true for complex arithmetic (see Q 1.12 for details, use bigger macheps)
   IEEE single(S)/double(D)/quad (Q) formats : radix = 2
      S: 1+8+23 bits, -126 <= e <= 127, p = 24, MRRE = 2^(-24) ~ 6e-8
           overflow threshold (OV) ~  2^128 ~ 1e38, 
           underflow threshold (UN) = 2^(-126) ~ 1e-38
       D: 1+11+52 bits, -1022 <= e <= 1023, p=53, MRRE = 2^(-53) ~ 1e-16
           OV ~  2^1024 ~ 1e308, 
           UN = 2^(-1022) ~ 1e-308
       Q: 1+15+112 bits, -16382 <= e <= 16383, p = 113, MRRE = 2^(-113) ~ 1e-34
            OV ~  2^16384 ~ 1e4932,
            UN = 2^(-16382) ~ 1e-4932
    Note: S and D available in hardware on most platforms,
          Q usually in software (added to 2008 standard)
          E(xtended), an 80-bit format on Intel x86 architectures, was
             in the old IEEE standard from 1985, now deprecated
          higher precision available via software simulation
           (see ARPREC, GMP on web page)
         2008 standard also added decimal arithmetic (for Excel...)
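
    A quick Matlab sanity check of the D (double) values above; Matlab works
      in double by default, and its eps is 2*macheps:
        eps                  % 2^(-52) ~ 2.2e-16
        1 + eps/2 == 1       % true: adding macheps to 1 rounds back to 1 (ties to even)
        realmax              % overflow threshold OV ~ 1.8e308
        realmin              % underflow threshold UN = 2^(-1022) ~ 2.2e-308
        single(realmax)      % Inf: the double OV overflows in single (S)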

That's enough for now to understand the plot of (x-2)^13, but more 
about floating point later.

  Analyze Horner for p(x)
     full expression
            p = sum_{i=0 to d} a_i x^i
            p   = a_d, for i=d-1:-1:0, p   =  x*p       + a_i
            p_d = a_d, for i=d-1:-1:0, p_i =  x*p_{i+1} + a_i
            p_d = a_d, for i=d-1:-1:0, p_i = [x*p_{i+1}*(1+d_i) + a_i]*(1+d'_i)
                     where |d_i| <= macheps and |d'_i| <= macheps
             p_0 = sum_{i=0:d-1} [(1+d'_i) * prod_{j=0:i-1} (1+d_j)*(1+d'_j)] * a_i*x^i
                   + [prod_{j=0:d-1} (1+d_j)*(1+d'_j)] * a_d*x^d
                 = sum_{i=0:d-1} [product of 2i+1 terms like (1+d)] * a_i*x^i
                   + [product of 2d terms like (1+d)] * a_d*x^d

      In words: Horner is backward stable: the computed value is the exact
         value at x of a polynomial q(x) with slightly changed coefficients

     how to simplify to get error bound
           prod_{i=1:n} (1 + delta_i)
              <= prod_{i=1:n} (1 + eps)
               = 1 + n*eps + O(eps^2)  ...  usually ignore (eps^2)
              <= 1 + n*eps/(1-n*eps)  if n*eps < 1 ... (lemma left to students)
       similarly
           prod_{i=1:n} (1 + delta_i)
              >= prod_{i=1:n} (1 - eps)
               = 1 - n*eps + O(eps^2)  ...  usually ignore (eps^2)
              >= 1 - n*eps/(1-n*eps)  if n*eps < 1 ... (lemma left to students)

       so  |prod_{1 to n} (1 + delta) - 1| <= n*eps ... ignore eps^2
       and thus
            |computed p_0 - p(x)| <= sum_{i=0:d-1} (2i+1)*eps*|a_i*x^i|
                                    + 2*d*eps*|a_d*x^d|
                                 <= sum_{i=0:d} 2*d*eps*|a_i*x^i|

            relerr = |computed p_0 - p(x)|/|p(x)|
                 <=  sum_{i=0:d} |a_i x^i| / |p(x)|   *   2*d*eps
                  =  condition number  *  relative backward error

            How many decimal digits can we trust?
                k correct digits <=> relative error ~ 10^(-k)
                                 <=> -log_10 (relative error) ~ k

     how to modify Horner to compute error bound
             p = a_d, ebnd = |a_d|
             for i=d-1:-1:0, p = x*p + a_i, ebnd = |x|*ebnd + |a_i|
             ebnd = 2*d*macheps*ebnd
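
      A minimal Matlab sketch of this recurrence; the function name hornerbnd
        and the coefficient layout a(i) = a_{i-1} are just illustrative choices:
        % evaluate p(x) = a(1) + a(2)*x + ... + a(d+1)*x^d by Horner's rule,
        % together with the running error bound derived above
        function [p, ebnd] = hornerbnd(a, x)
          d = length(a) - 1;
          p = a(d+1);  ebnd = abs(a(d+1));
          for i = d:-1:1                     % a(i) holds the coefficient a_{i-1}
            p    = x.*p + a(i);              % elementwise, so x may be a vector of points
            ebnd = abs(x).*ebnd + abs(a(i));
          end
          ebnd = 2*d*(eps/2)*ebnd;           % Matlab's eps = 2*macheps, so macheps = eps/2
        end
      Since poly and polyval store coefficients with the highest power first,
        calling hornerbnd(fliplr(coeff), x) should reproduce y and ebnd in the
        demo below.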

     Matlab demo:
       coeff = poly(2*ones(13,1));
       x = [1.6:1e-4:2.4];
       y = polyval(coeff,x);
       yacc = (x-2).^13;
       ebnd = 13*eps*polyval(abs(coeff),abs(x));
       %   note eps in matlab = 2*macheps
       plot(x,y,'k.',x,yacc,'c',x,y-ebnd,'r',x,y+ebnd,'r')
       axis([1.65 2.35 -1e-8 1e-8]), grid
       %   need to make vertical axis wider to see bounds
       axis([1.65 2.35 -1e-6 1e-6]), grid
        %   conclusion: can only trust the sign outside roughly [1.72, 2.33]

     Consider Question 1.21: how could we use this error bound to 
       stop iterating in root finding using bisection?
       ... now try wider range, look at actual and estimated # correct digits

       x = -1:.0001:5;
       y = polyval(coeff,x);
       yacc=(x-2).^13;
       ebnd=13*eps*polyval(abs(coeff),abs(x));
       plot(x,-log10(abs((y-yacc)./yacc)),'k.',x,-log10(ebnd./abs(yacc)),'r')
       axis([-1 5 0 16]), grid
       title('Number of correct digits in y')

   Notice the analogy between Horner's rule and computing dot-products:

      p   = a_d, for i=d-1:-1:0, p   = x*p     + a_i
     s   =  0 , for i=1:d,      s   = x_i*y_i + s

   analysis of dot products very similar (homework 1.10)

    A natural question is: which algebraic formulas have the property that
    there is some way to evaluate them so that, despite roundoff, the
    final answer is always correct in most of its leading digits?
   Ex: general polynomial: no, not without higher precision
   Ex: x^2+y^2:  yes, no cancellation, 
   Ex: determinant of a Vandermonde matrix: V(i,j) = x_i^(j-1)
       det = prod_{i < j} (x_j - x_i) so yes
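
    A small Matlab sketch contrasting the two approaches, using the convention
    V(i,j) = x_i^(j-1) from above (the nodes chosen here are arbitrary):
        x = 1 + (0:4)*1e-3;              % five closely spaced nodes
        n = length(x);  V = zeros(n);
        for i = 1:n, for j = 1:n, V(i,j) = x(i)^(j-1); end, end
        d1 = det(V);                     % via LU: subtractions can lose relative accuracy
        d2 = 1;                          % product formula prod_{i<j} (x_j - x_i)
        for j = 2:n, for i = 1:j-1, d2 = d2*(x(j)-x(i)); end, end
        [d1 d2]                          % d2 is accurate to a few macheps; d1 need not be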

   A complete answer is an open question, but there are necessary and
   sufficient conditions based on algebraic and geometric properties
   of the formula, see article by Demmel/Dumitriu/Holtz/Koev in
    Acta Numerica v 17, 2008; a class of linear algebra problems
    is identified that, like det(Vandermonde), permits accurate
    solution despite roundoff. We will not discuss this further,
   but someone could pursue it as a class project. 
   (This work was cited in coauthor Prof. Olga Holtz's award
    of the 2008 European Math Society Prize)
      
   More about floating point:
    what happens when result too big, too small to fit, or 
    isn't mathematically defined, like 0/0? 
    get exceptions
 
     What if answer < UN? underflow
       What to do: if you return zero, then what happens with this code?
           ... suppose a, b nonnegative
           if (a .ne. b) then x = max(a,b)/(a-b)
           ... can divide by zero, if a-b underflows to 0 even though a .ne. b
       How big can the correct value of x be? abs(x) < 1/macheps = 1/MRRE
      Instead, IEEE standard says you get "subnormal numbers"
         x = +- 0.dddd  * 2^exp_min instead of +- 1.dddd * 2^exp
      Draw IEEE number line
      Without subnormals, do error analysis with
          fl(a op b) = (a op b)(1 + delta) + eta,   |delta| <= macheps, 
                    |eta| <= UN
      With subnormals, do error analysis with
          fl(a op b) = (a op b)(1 + delta) + eta,   |delta| <= macheps, 
                |eta| <= UN*macheps for *,/  and eta=0 for +-
      subnormals (underflow), extend error model:
        S: + eta, |eta| <= 2^(-150) ~ 1e-45
        D: + eta, |eta| <= 2^(-1075) ~ 1e-324
         E: + eta, |eta| <= 2^(-16446) ~ 1e-4951
        Thm: With subnormals, for all floats a,b fl(a-b) = 0 iff a=b
        Purpose: simplify reasoning about floating point algorithms
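
       A quick Matlab look at subnormals (Matlab doubles always have them enabled):
          realmin                  % UN = 2^(-1022), smallest normalized double
          realmin*eps              % 2^(-1074), the smallest subnormal
          realmin/2 > 0            % true: the result is subnormal, not flushed to zero
          a = 3*realmin;  b = 2*realmin;
          a - b == 0               % false, as the theorem above promises (a ~= b)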

      What if answer > OV? get infinity (overflow)
        fl(1e20*1e20) = fl(1/0) = inf    ... in single
        fl(1/inf) = 0, fl(1/-0) = -inf

     What if answer mathematically undefined? get NaN = Not a Number 
        fl(sqrt(-1)) = fl(0/0) = fl(inf-inf) = NaN
        3 + NaN = NaN ... so you see NaN on output if one occurred
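
       These are easy to check in Matlab (which works in double, so a bigger
       product than 1e20*1e20 is needed to overflow):
          1e200*1e200              % Inf: overflow in double
          1/Inf                    % 0
          1/(-0)                   % -Inf
          0/0                      % NaN
          3 + NaN                  % NaN, so a NaN that occurred shows up in the output
          sqrt(-1)                 % note: Matlab returns the complex number i here,
                                   %   not NaN; a real-only sqrt would give NaN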

      Flags to check if an exception occurred

    Why bother with exceptions? Why not always just stop when one occurs?
     (1) reliability: too hard to put a test before each
          floating point operation to avoid exceptions
          Ex: control systems (see Ariane 5 crash on web page)
          Ex: Matlab: don't want to go into an infinite loop because of an input NaN
            (caused several fixes to LAPACK, and also helped motivate
             an on-going CS research project to build tools to prove
             that NaNs, infinities, etc cannot cause infinite loops or
             analogous bugs)
    (2) speed: ditto (too slow to test before each operation)
          (1) run "reckless" code that is fast but assumes no exceptions
          (2) check exception flags
          (3) in case of exception, rerun with slower algorithm
        Ex: s = root-sum-squares = sqrt(sum_{i=1}^n x_i^2)
         What could go wrong? (see Q 1.19; a sketch of this
            reckless/check/rerun idea appears after this list)
        Ex: current issue is to implement a new parallel algorithm
            for finding eigenvectors of triangular matrices using this idea.
            (possible class project)
    (3) sometimes one can prove code correct with exceptions:
         Ex: the current fastest routine to find (some) eigenvalues of
             symmetric matrices depends on these, including 1/-0 = -inf
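
     A minimal Matlab sketch of the reckless/check/rerun idea from (2), applied
     to root-sum-squares; Matlab gives no portable access to the exception flags,
     so this sketch checks the computed result instead (the name rss is just an
     illustrative choice):
        function s = rss(x)
          % (1) fast "reckless" formula: x.^2 may overflow or underflow
          s = sqrt(sum(x.^2));
          % (2) check the result for signs of an exception
          if ~isfinite(s) || (s == 0 && any(x ~= 0))
            % (3) rerun more carefully, scaling by the largest entry
            m = max(abs(x));
            s = m * sqrt(sum((x/m).^2));
          end
        end
     For example, rss([1e200 1e200]) returns about 1.4e200, while the reckless
     formula alone overflows to Inf.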

    Impact on LAPACK: One of the fastest algorithms in LAPACK for finding
      eigenvalues of symmetric matrices assumes that 1/+0 = +infinity,
      and 1/-0 = -infinity, as specified by the IEEE standard.
      We were porting this code to an ATI GPU a couple of years ago,
      and discovered that it did not handle exceptions correctly:
      1/(-0) was +infinity instead of -infinity.
      The code depended on getting -infinity to get the right answer,
      and (until they fixed their hardware) we had to take the quantity d
      in the denominator of 1/d and instead compute 1/(d+0), which turned
      the -0 into a +0, whose reciprocal was correctly computed.
     See www.cs.berkeley.edu/~volkov for details
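
      The workaround is easy to see with a toy example in Matlab (this is not
      the LAPACK code itself, just the -0 behavior it relies on):
        d = -0;                  % a negative zero
        1/d                      % -Inf, as the IEEE standard requires
        1/(d + 0)                % +Inf: adding +0 turns -0 into +0, as in the fix above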

    Impact on Matlab: Last year we released a new version of
      an LAPACK routine used in least squares solvers and eigenvalue
      routines, which could be a lot faster on band matrices, and could be
      used to more easily generate random orthogonal matrices with a certain
      probability distribution requested by some users.  Mathworks
      noticed that it was more susceptible to over/underflow, and after
      careful analysis we decided this was an intrinsic property, and we
      had to remove it from most parts of later releases.
    
   To generalize error analysis to linear algebra, need to
   generalize the idea of absolute value to vectors and matrices, so we need norms.