SCALING.TXT                                                18 July 2011

        Prescaling  for a  Sum of Squares  in  IEEE 754 Floating-Point

                                   W. Kahan

    OUR TASK:
    Choose a scale factor  å  so that  á = û( ä x[j]ý )  can be computed
    as  á := û( ä (åúx[j])ý )/å  with no undeserved damage from over/-
    underflow  (though possibly the final division  (...)/å  may overflow
    if  á  deserves it,  or underflow if every real  x[j]  is subnormal),
    no division in the inner loop,  and rarely a second pass through  x .
    Compensated Summation  can be used for better accuracy if  n  is huge.

    We assume  n := ä 1  is given,  ƒ := max ³x[j]³ > 0  will be computed
    with  á ,  and that we already know these  Environmental Constants:
       î = NextAfter(1, ì) - 1 = 2ú(Roundoff threshold)  (Matlab's eps)
       ê = NextAfter(ì, 1) = Overflow threshold          (Matlab's realmax)
       æ = NextAfter(0, 1) = Smallest (subnormal) positive number
       ç = æ/î = Normal Underflow threshold              (Matlab's realmin)

    For  IEEE 754's "Single"  (4 bytes wide)  binary floating-point format,
           î = 1/2^23 ,   ê ÷ 2^128 ,   æ = 1/2^149 ,   ç = 1/2^126 .
    For  IEEE 754's "Double"  (8 bytes wide)  binary floating-point format,
          î = 1/2^52 ,   ê ÷ 2^1024 ,   æ = 1/2^1074 ,   ç = 1/2^1022 .
    For  IEEE 754's "Quad"  (16 bytes wide)  binary floating-point format,
        î = 1/2^112 ,   ê ÷ 2^16384 ,   æ = 1/2^16494 ,   ç = 1/2^16382 .

    Evidently  æ ó ƒ ó ê .  Lest roundoff spoil  á ,  we pressume also that
                            n << min{ 1/î,  êúîý } .

    Usually the data  {x[j]}  require no scaling  (å := 1).  Therefore  ƒ
    and  ä x[j]ý  can be computed simultaneously until a  ³x[j]³ ò û(ê/n)
    is encountered,  whereupon summation can be suspended while the rest of
    {³x[j]³}  is scanned to determine  ƒ ,  thus preventing  ä x[j]ý  from
    overflowing to  ì  and then slowing down some computers drastically.

    From the given  n  and  ƒ  we infer  ƒý ó ä x[j]ý ó núƒý .  To evaluate
    ä x[j]ý  without risking severe harm from over/underflow,  we require,
    say,  ûç /î < ƒ < û(ê/n) ,  in which case leave  å := 1  (no scaling)
    and then,  satisfactorily,  find  ç/îý < ä x[j]ý < ê .  The usual case.

    If instead  ƒ ò û(ê/n) ,  premature overflow can be avoided by scaling
    every  x[j]  down to  åúx[j]  with  å := ûî /ûê ,  say.  (Not  û(î/ê)
    lest  î/ê  underflow to zero first.)  Then,  satisfactorily,  ...
      îý << î/n ó (åúƒ)ý ó ä (åúx[j])ý ó nú(åúƒ)ý ó nú(åúê)ý = núîúê << ê .

    If instead  ƒ ó ûç /î ,  scaling every  x[j]  up by  å := 1/(îúûç)
    will allow evaluation of  ä (åúx[j])ý  without any underflows because,
    satisfactorily,  ...
     ç ó (åúæ)ý ó (åúƒ)ý ó ä (åúx[j])ý ó nú(åúƒ)ý ó núåýúç/îý = n/îý << ê .

    To avoid unnecessary rounding errors,  ensure that  å  is a power of  2
    by multiplying the formulas for  å  by  1/û2  or  û2  respectively only
    if necessary.  e.g.,  tiny  å := û(î/2)/ûê  for  IEEE 754's "Single".

    To apply the foregoing to complex data  {z[j]}  when  á :=  ä ³z[j]³ý ,
    treat complex  {z[j]}  as real data of twice  {z[j]}'s  dimension  n .