SCALING.TXT 18 July 2011 Prescaling for a Sum of Squares in IEEE 754 Floating-Point W. Kahan OUR TASK: Choose a scale factor å so that á = û( ä x[j]ý ) can be computed as á := û( ä (åúx[j])ý )/å with no undeserved damage from over/- underflow (though possibly the final division (...)/å may overflow if á deserves it, or underflow if every real x[j] is subnormal), no division in the inner loop, and rarely a second pass through x . Compensated Summation can be used for better accuracy if n is huge. We assume n := ä 1 is given, ƒ := max ³x[j]³ > 0 will be computed with á , and that we already know these Environmental Constants: î = NextAfter(1, ì) - 1 = 2ú(Roundoff threshold) (Matlab's eps) ê = NextAfter(ì, 1) = Overflow threshold (Matlab's realmax) æ = NextAfter(0, 1) = Smallest (subnormal) positive number ç = æ/î = Normal Underflow threshold (Matlab's realmin) For IEEE 754's "Single" (4 bytes wide) binary floating-point format, î = 1/2^23 , ê ÷ 2^128 , æ = 1/2^149 , ç = 1/2^126 . For IEEE 754's "Double" (8 bytes wide) binary floating-point format, î = 1/2^52 , ê ÷ 2^1024 , æ = 1/2^1074 , ç = 1/2^1022 . For IEEE 754's "Quad" (16 bytes wide) binary floating-point format, î = 1/2^112 , ê ÷ 2^16384 , æ = 1/2^16494 , ç = 1/2^16382 . Evidently æ ó ƒ ó ê . Lest roundoff spoil á , we pressume also that n << min{ 1/î, êúîý } . Usually the data {x[j]} require no scaling (å := 1). Therefore ƒ and ä x[j]ý can be computed simultaneously until a ³x[j]³ ò û(ê/n) is encountered, whereupon summation can be suspended while the rest of {³x[j]³} is scanned to determine ƒ , thus preventing ä x[j]ý from overflowing to ì and then slowing down some computers drastically. From the given n and ƒ we infer ƒý ó ä x[j]ý ó núƒý . To evaluate ä x[j]ý without risking severe harm from over/underflow, we require, say, ûç /î < ƒ < û(ê/n) , in which case leave å := 1 (no scaling) and then, satisfactorily, find ç/îý < ä x[j]ý < ê . The usual case. If instead ƒ ò û(ê/n) , premature overflow can be avoided by scaling every x[j] down to åúx[j] with å := ûî /ûê , say. (Not û(î/ê) lest î/ê underflow to zero first.) Then, satisfactorily, ... îý << î/n ó (åúƒ)ý ó ä (åúx[j])ý ó nú(åúƒ)ý ó nú(åúê)ý = núîúê << ê . If instead ƒ ó ûç /î , scaling every x[j] up by å := 1/(îúûç) will allow evaluation of ä (åúx[j])ý without any underflows because, satisfactorily, ... ç ó (åúæ)ý ó (åúƒ)ý ó ä (åúx[j])ý ó nú(åúƒ)ý ó núåýúç/îý = n/îý << ê . To avoid unnecessary rounding errors, ensure that å is a power of 2 by multiplying the formulas for å by 1/û2 or û2 respectively only if necessary. e.g., tiny å := û(î/2)/ûê for IEEE 754's "Single". To apply the foregoing to complex data {z[j]} when á := ä ³z[j]³ý , treat complex {z[j]} as real data of twice {z[j]}'s dimension n .