LOG10HAF.txt 9 Aug. 2004
A Logarithm Too Clever by Half
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
W. Kahan
MATLAB's log10 function computes the base-ten logarithm of its binary
floating-point argument inaccurately despite first appearances. For
instance, smooth graphs of log(10)*log10(x)/(x-1) and log(x)/(x-1)
plotted at values x scattered densely near but not at x = 1 ought
to be almost indistinguishable, but they aren't. They reveal log10's
error at x = 1 + eps to be near 4% in MATLAB 6.5 on PCs. Values
x far from 1 incur subtler errors measured in ULPs (see FOOTNOTE 1
below). These subtle errors revive a long-standing question worrisome
to conscientious implementors of run-time mathematical libraries that
compute log, exp, cos, y^x, ...:
How accurate is accurate enough?
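The smooth-graph comparison is easy to replay in any language. Here is a Python sketch; CPython's math module wraps the C library's log10, which on the platforms I have tried keeps its relative accuracy near 1 and so passes the test MATLAB 6.5 failed:

```python
import math, sys

eps = sys.float_info.epsilon                     # 2^-52
x = 1.0 + eps                                    # just above 1
r1 = math.log(10.0) * math.log10(x) / (x - 1.0)  # note x - 1.0 == eps exactly
r2 = math.log(x) / (x - 1.0)
print(r1, r2)   # both very nearly 1; MATLAB 6.5's r1 was off by about 4%
```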
We must distinguish the implemented function LOG10(x) from the ideal
function log10(x) . They differ by at least one rounding error since,
with rare exceptions, log10(x) is irrational (really transcendental)
for arguments x rational like all floating-point numbers. Those rare
exceptions provide the occasion for this case-study in error analysis.
For which binary floating-point arguments w can LOG10(10.0^w) == w ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If LOG10 differed from log10 by always less than half an ulp, and
if the computed value of 10.0^w were also rounded correctly, also
within half an ulp, we would find LOG10(10.0^w) == w for ...
-----------------------------------------------------------
| all |w| > 1/2 unless 10.0^w over/underflows, |
| 1/4 < w < log10(2) = 0.30103... , |
| -0.3622... = -log10(log(10)) < w < -log10(2) , |
-----------------------------------------------------------
and for many other values of w including w = 0 . But not for _all_.
Why not for all |w| < 1/2 too? Because then many a computed value of
can repeat among computed values of 10.0^w for several consecutive
floating-point numbers w . For instance, 10.0^w rounds to the same
value for four arguments in the interval 1/8 - 4*ulp(1/8) < w <= 1/8 ;
and then this value's log10 rounds correctly to 1/8 - ulp(1/8) .
Such repetitions become vastly more abundant as |w| gets tinier.
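The repetition near w = 1/8 can be exhibited in a few lines of Python (math.nextafter needs Python 3.9 or later); the same collisions occur with any nearly correctly rounded power:

```python
import math

# Step w down from 1/8 one ulp at a time.  Several consecutive arguments
# w share one rounded value of 10.0**w , so no LOG10, however accurate,
# can return all of those w from that one shared value.
ws = [0.125]
for _ in range(4):
    ws.append(math.nextafter(ws[-1], 0.0))
vals = [10.0 ** w for w in ws]
assert len(set(vals)) < len(vals)   # fewer distinct outputs than inputs
```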
No such repetitions should occur if w lies in the intervals exhibited
in the box above. Therein the two rounding errors in LOG10(10.0^w)
would cancel each other out _provided_ LOG10 and 10.0^w were
_always_ rounded correctly -- each within half an ulp.
Could LOG10 and 10.0^w be implemented so well that they were always
rounded correctly? Yes and no. Yes if you are willing to suffer speed
penalties of at least about 20% on average and perhaps 800% or more
occasionally. No if 10.0^w is computed by invoking the math. library
program Y^W ( Y**W in Fortran, POW(Y,W) in C). Nobody knows how
much it would cost to compute y^w correctly rounded for _every_ two
floating-point arguments at which it does not over/underflow. Instead,
reputable math. libraries compute elementary transcendental functions
mostly within slightly more than half an ulp and almost always well
within one ulp.
Why can't Y^W be rounded within half an ulp like SQRT ? Because
nobody knows how much computation it would cost to resolve what I long
ago christened "The Table-Maker's Dilemma". Here is an instance of it
derived from phenomena descried by D.H. Lehmer in the early 1940s:
Let's try to compute C := cosh(pi*sqrt(163))/8 - (2^53 - 1) correctly
rounded to Matlab's working precision of 53 sig. bits using binary
floating-point variables 8 bytes wide. Matlab's versions 3.5, 4.2,
5.3 and 6.5 on Intel-based PCs, and versions 3.5, 4.2 and 5.2 on
both Power-Macs and old 68040-based Macs, all produce the integer
result C = 7401389035307025 though they display it differently. Its
last two digits "25" are wrong. The error comes mostly from roundoff
in the argument pi*sqrt(163) passed to cosh(...) . I know Matlab's
C is wrong because it differs from the result computed by my Fortran
program that used the PC's extra-precise arithmetic registers (10
bytes wide, 64 sig. bits): C = 7401389035307056 fit into 53 sig.
bits, but its "6" was wrong. C = 7401389035307055.5000 to 64 sig.
bits before it was rounded to 53. Its last ".5000" misleads. With
Digits := 25 MAPLE Vr5 gets C = 7401389035307055.49999978 . With
Digits := 30 MAPLE Vr5 gets C = 7401389035307055.5000000000015 .
Is it time to quit yet? That's the Table-Maker's Dilemma. No general
way exists to predict how many extra digits will have to be carried to
compute a transcendental expression and round it _correctly_ to some
preassigned number of digits. Even the fact (if true) that a finite
number of extra digits will ultimately suffice may be a deep theorem.
MAPLE Vr5 can compute C = 7401389035307055.49999999999995313... well
enough to round it correctly to 53 sig. bits, to 16 sig. dec., only
by carrying at least twice that number of decimal Digits during the
computation. Any other way to round C correctly is hard to imagine.
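Python's decimal module, whose sqrt, exp and ln are correctly rounded in the working precision, can replay this experiment; the 60-digit value of pi below is typed in from published tables:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60          # carry 60 significant decimal digits
pi = Decimal("3.14159265358979323846264338327950288419716939937510582097494")
x = pi * Decimal(163).sqrt()    # pi*sqrt(163), sqrt correctly rounded
C = (x.exp() + (-x).exp()) / 16 - (2**53 - 1)   # cosh(x)/8 - (2^53 - 1)
print(C)    # 7401389035307055.49999999999995313... to roughly 40 decimals
```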
For every transcendental function like log(x) and exp(x) arguments
x may exist for which the function cannot be rounded correctly to 53
sig. bits without first knowing rather more than the first hundred sig.
bits of its value. Many such arguments have been tabulated in France
by Jean-Michel Muller's students, who claim to have computed _all_
such arguments x for each of several of the elementary transcendental
functions. Y^W is not yet among them and probably never will be, nor
will most of the named non-elementary functions of Statistics and of
Mathematical Physics, for example Bessel functions J(n, x) .
Besides, suppose every transcendental function with a conventionally
recognized name were implemented perfectly, correctly rounded within
half an ulp, instead of within 0.51 ulp, say. We should not expect
any program composed from them to behave significantly better thanks to
their perfection. Some examples, notably x - sin(x) , were treated
in my paper "Mathematics Written in Sand ...", pp. 12-26 of the 1983
Statistical Computing Section of the Proceedings of the American
Statistical Association. It has been reproduced and is now posted on
my web page http://www.cs.berkeley.edu/~wkahan/MathSand.pdf ; there
see pp. 28 - 30. In that paper I wrote rashly ...
"So, uncompromising adherence to the most rigorous
rules for approximate arithmetic will not protect
a computer from unpleasant surprises. Apparently
the approximation of the continuum by a discrete
set must introduce some irreducible quantum of
noise into mathematical thought, as well as into
computed results, and we don't know how big that
quantum is. If we have to tolerate this unknown
noise, we might as well tolerate a little more."
Out of context that reads now like a license to toss rocks through the
windows of derelict buildings. I had intended to draw attention to a
persistent worry: If accuracy within half an ulp is more accuracy
than is reasonable to expect of _every_ function in our library, ...
How much accuracy is enough?
A desideratum generally acknowledged but not always achieved is ...
"In error by rather less than an ulp if by more than half."
If that is achieved, it guarantees that every _Cardinal Value_ is
honored: Whenever a function's exact value is a floating-point number,
this is the computed value of the function too. That desideratum makes
LOG10(100) == 2 , COS(0) == 1 , 9^3.5 == 2187 , ... .
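In Python, for instance, CPython's math module (backed by the C library) honors these particular cardinal values on common platforms, though, as the text says, honoring all of them is a desideratum, not a guarantee:

```python
import math

# Cardinal values: the exact value of the function is itself a
# floating-point number, so the computed value should be exactly it.
assert math.log10(100.0) == 2.0   # not 1.9999... nor 2.0000...01
assert math.cos(0.0) == 1.0
print(9.0 ** 3.5)                 # 2187.0 exactly, if pow rounds this case well
```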
By itself that desideratum is inadequate because it does not guarantee
sign symmetry like tan(-x) = -tan(x) , nor weakened monotonicity like
log10(x) >= log10(y) whenever x > y > 0 , nor weak inequalities like
arctan(x) <= pi/2 with non-floating-point bounds. (As luck would have
it this last inequality is easy to satisfy in the 8-byte wide IEEE 754
arithmetic used by MATLAB because its value of pi rounded to 53
sig. bits falls below the true pi by about 0.27 ulp(pi) .) Users'
reasonable expectations oblige implementors of mathematical libraries
to do better than merely keep their handiwork in error by less than one
ulp. The indeterminate extent of this obligation worries many of us.
MATLAB's LOG10 can err by too much more than an ulp. Its earliest
implementation came naively from a formula log10(x) = log(x)/log(10)
whose rounding errors spoiled the identity log10(10^m) == m at every
m in the set { 3, 6, 9, 12, 13, 15, 18, 21 } despite that 10^m is
computable perfectly for integers m = 0 to 22 . Since LOG10(10.0^m)
fell short by an ulp in those eight cases, they generated numbers
2.999999..., 5.999999..., 8.999999..., 11.999999..., etc.
instead of the expected integers. Expressions like "floor(log10(x))"
misbehaved in ways that had to be explained to angry programmers who
must have felt betrayed when MATLAB displayed 16 instead of the 17
sig. dec. needed to expose _all_ such numbers as non-integers. They
display on PCs thus:
3.0000 6.0000 9.0000 12.0000 etc.
All MATLAB versions display arrays entirely of integers thus:
3 6 9 12 etc.
The difference ought to be obvious; perhaps it's not obvious enough to
preclude confusion and demands for tedious explanations and excuses.
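The naive formula's failure is easy to reproduce, here in Python, where math.log10 is a dedicated implementation but the quotient of logs is not (the exact set of spoiled exponents depends on the log supplied; m = 3 fails classically):

```python
import math

naive = lambda x: math.log(x) / math.log(10.0)   # the naive formula
spoiled = [m for m in range(23) if naive(10.0 ** m) != m]
print(spoiled)        # some subset of 0..22 comes out as 2.9999..., etc.
assert spoiled != []  # the naive quotient misses some integer powers of ten
assert all(math.log10(10.0 ** m) == m for m in range(23))  # dedicated log10
```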
Confusion caused by roundoff is exacerbated by _Cosmetic Roundings_,
designed to conceal approximation performed by the underlying floating-
point arithmetic, when results are displayed. Another case of mass
confusion consequent upon poor policies for numerical displays appears
on pp. 12-17 of "Marketing vs. Mathematics" posted on my web page at
http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf. Its moral is ...
Decimal displays of Binary nonintegers cannot always be WYSIWYG.
-----------------------------------------------------------------
| Trying to pretend otherwise afflicts both customers and |
| implementors with bugs that go mostly misdiagnosed, so |
| "fixing" one bug merely spawns others. |
-----------------------------------------------------------------
(MATLAB's users deserve to be able easily to display as few or as many
(up to 17) sig. dec. as they desire. Nonintegers should _always_ be
displayed with a decimal point, even if it is followed only by zeros
or blanks, in contrast to integers that never display with a decimal
point unless they are so big that their rightmost digits round away.
Why this decimal point display convention is followed by MATLAB 5.2
on Macintoshes but by no other MATLAB versions mystifies me.)
Tiresomely repeated explanations undermine resistance to temptation.
MATLAB's LOG10 had to be changed. Its successor in MATLAB 6.5 on
PCs employs tricky _Cosmetic Rounding_ to force LOG10(10.0^m) == m
for _every_ integer m for which 10.0^m does not over/underflow.
These are m = -307 to 308 . Moreover, if an array W is constructed
by removing from [-307: 1/16 :308] just those floating-point numbers
strictly outside the intervals shown boxed above, thus leaving 9828
numbers in W , then LOG10(10.0.^W) == W . This coincidence might be
construed as testimony that now LOG10 is nearly correctly rounded.
It isn't. LOG10 can err by almost 3 ulps.
On an IBM PC, MATLAB 6.5 gets LOG10(54) too small by 1.513 ulps;
this is the case that came to my notice by spoiling one of my programs.
And now the identity LOG10(10.0^w) == w is violated by more than one
ulp for some non-integer arguments w drawn from the intervals in the
box above. For instance LOG10(10.0^w) is now 2 ulps too small when
w = 6411/4096 = 1.565185546875 . MATLAB rounds x = 10.0^w = 36.743...
correctly, falling below the true 10^w by less than 0.48 ulp. But
LOG10(x) is computed almost 1.82 ulps too low. These two shortfalls
combine to produce a total shortfall of 2 ulps. Worse again is an
example x = 0.60411447293839671 = hex 1.354e7e009f12e / 2 for which
LOG10(x) errs by 2.904 ulps. Evidently Cosmetic Rounding intended
to cover up some rounding errors can worsen others substantially.
What is "Cosmetic Rounding" and what does it do for LOG10(10.0^m) ?
LOG10 plays a clever trick based upon how IEEE Standard 754 rounds
values midway between two adjacent floating-point numbers. The rounded
value is the one of those numbers whose last bit is zero. This "round
midway cases to nearest even" policy has advantageous properties. One
is that, if roundoff were random (and it often seems that way), IEEE
754's rounding would be Statistically Unbiased, putting the Law of
Averages to work for us when vastly many independent rounding errors
accumulate. Another advantage is enhanced conservation of mathematical
relations. Here is an esoteric illustration of conservation in action:
Suppose m ranges over integers from 1 to 2^52 = 4503599627370496 ,
and n ranges over integers in this set of frequently used divisors:
{ 2, 3, 4, 5, 6, 8, 9, 10, 12, 16, 17, 18, 20, 24, 32, 33, 36, ... } .
Display them in binary to see what these divisors have in common. Next
compute q = m/n and p = q*n in MATLAB, which rounds both quotient
and product to IEEE 754's specifications. Then p == m _always_.
If arithmetic were not binary, or if rounding did not conform to IEEE
754, the predicate "p == m" would fail for some pairs {m, n} . For
instance, decimal arithmetic produces (10/3.0)*3 = 9.999...999 < 10 ,
unless your calculator rounds cosmetically and must spawn consequent
anomalies elsewhere.
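The conservation law is easy to test in any language with IEEE 754 doubles. Here is a Python check; note that each divisor has at most two nonzero bits in binary, which is what they have in common (3 = 11, 17 = 10001, 33 = 100001, ...):

```python
divisors = [2, 3, 4, 5, 6, 8, 9, 10, 12, 16, 17, 18, 20, 24, 32, 33, 36]
assert all(bin(n).count("1") <= 2 for n in divisors)   # at most two set bits

# With binary arithmetic and IEEE 754's round-to-nearest-even,
# q = m/n and then p = q*n recover p == m exactly.
for m in list(range(1, 5000)) + [2**52 - 1, 2**52]:
    for n in divisors:
        q = m / n            # correctly rounded quotient
        assert q * n == m    # correctly rounded product restores m
```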
Every binary floating-point operation specified by IEEE 754 rounds by
default correctly, within half an ulp. Still, there seems to be a
bias towards small integers whenever they would be true results absent
roundoff, and also whenever they wouldn't. This bias tempts tricky
programmers irresistibly to try to exploit it, as in MATLAB's LOG10.
LOG10's trick exploits a bias towards small integers thus: Constants
tl02 = 6.64385618977472436 and TL02 = 6.64385618977472525 are
adjacent 8-byte floating-point numbers that straddle 2/log10(2) .
The true 2/log10(2) exceeds tl02 by 0.374 ulp and falls 0.626 ulp below TL02.
Then LOG10(x) = LOG2(x)/tl02 + LOG2(x)/TL02 . Here LOG2 is the base
2 logarithm built into MATLAB since version 4. (It has been buggy
since then too; see FOOTNOTE 2.) And here is how the trick works:
In the computation of LOG10(10.0^m) each quotient LOG2(10.0^m)/tl02
and LOG2(10.0^m)/TL02 entails four roundings: in the constant, in
10.0^m , in LOG2 and in the division. These very rarely accumulate
to as much as 2 ulps in each quotient, each of which would be m/2 ,
an integer or half-integer, if computed without rounding. Because of
the way the constants were chosen, LOG2(10.0^m)/tl02 is usually high
by an ulp if not just right, and LOG2(10.0^m)/TL02 is usually low
by an ulp if not just right. If their sum is not just m it is half
an ulp away most likely; then IEEE 754 rounds it to m exactly.
This happens for every relevant m from -307 to 308 as luck would
have it, so MATLAB's current LOG10 appears perfect. But it isn't.
We saw that LOG10(10.0^w) differs from w = 6411/4096 by 2 ulps
despite that w has so many trailing zeros that the last paragraph's
reasoning must work. Yet the trick doesn't work. Bad luck. Really!
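The trick transcribes directly into Python. Only the constants' relationship is guaranteed; the score over m = -307:308 depends on the accuracy of the log2 supplied, which is where the luck comes in:

```python
import math

tl02 = 6.64385618977472436
TL02 = 6.64385618977472525
assert math.nextafter(tl02, math.inf) == TL02    # adjacent doubles ...
assert tl02 <= 2.0 / math.log10(2.0) <= TL02     # ... straddling 2/log10(2)

def log10_trick(x):      # MATLAB 6.5's averaging scheme, transcribed
    return math.log2(x) / tl02 + math.log2(x) / TL02

misses = sum(log10_trick(10.0 ** m) != m for m in range(-307, 309))
print(misses)            # how lucky this log2 is; MATLAB 6.5's scored 0
```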
To see how big a role luck plays in such tricks, construct a constant
L2 = 0.30102999566398120 = ( log10(2) rounded to 53 sig. bits). In
fact L2 matches log10(2) to 56 sig. bits, as luck would have it.
Consequently the simple expression L2*LOG2(x) runs noticeably faster
than the current LOG10(x) and is more accurate by at least about an
ulp in their worst cases. However, that simple expression violates
the identity log10(10^m) == m by an ulp at some integers |m| > 22
at all of which the computed value of 10.0^m is already inexact.
Next, a trick worsens the simple expression's worst errors practically
imperceptibly at the cost of two more multiplies and an add thus:
Compute y = 0.25*LOG2(x) and return y + 0.20411998265592478*y .
The long constant converts exactly to 4*L2 - 1 . The returned value
matches log10(x) within about 1.4 ulps and conserves the identity
log10(10^m) == m for all integers |m| < 59 . Is this good enough?
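A Python transcription shows both the luck and the trick; the name log10_tricky is mine, and Decimal is used only to verify how closely L2 matches log10(2):

```python
import math
from decimal import Decimal, getcontext

L2 = 0.30102999566398120               # log10(2) rounded to 53 sig. bits
getcontext().prec = 40
err = abs(Decimal(L2) - Decimal(2).ln() / Decimal(10).ln())
assert err < Decimal("2.8e-17")        # well under half an ulp of 0.301...

c = 0.20411998265592478
assert 4.0 * L2 - 1.0 == c             # the long constant is exactly 4*L2 - 1

def log10_tricky(x):
    y = 0.25 * math.log2(x)            # exact scaling by 1/4
    return y + c * y                   # = (4*L2)*y , within about 1.4 ulps

ok = all(log10_tricky(10.0 ** m) == m for m in range(0, 23))
print(ok)                              # True when the log2 supplied cooperates
```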
Whoever has to explain that identity's violations will deem LOG10's
accuracy inadequate, as if in obeisance to a higher principle:
"If its accuracy has to be explained it's not accurate enough."
This slogan, tantamount to despair of explanation's power to overcome
oversimplification, oversimplifies the trade-off of accuracy versus
its costs, mostly performance degradation and protracted development.
Costs rise steeply with attempts to improve accuracy beyond some point
depending upon the available computing resources and ingenuity. Rather
than resort to cosmetic expedients which earn contempt for misplaced
ingenuity after the truth they attempted to conceal inevitably becomes
exposed, a better practice is to compare the cost of explanations with
the costs of accuracy increased sufficiently to obviate explanation,
and then choose. MATLAB could have chosen to compute LOG10 from one
of the more accurate expressions above using L2 , and then add to the
documentation displayed in response to "help log10" something like
% Note that MATLAB's binary floating-point arithmetic has to round
% 10^m to something slightly else at integers m < 0 or m > 22 .
% And then log10(10^m) may differ slightly from m if |m| > 58 .
This would have educational value too. Instead, buried in the text of
MATLAB's current tricky LOG10.M is an ambiguous comment ...
% Compute y = log2(x)/log2(10) with an averaging process so that
% roundoff errors cancel and log10(10^k) == k for all integers
% k = -307:308.
It depends upon what "and" means. Do roundoff errors cancel for x
in general, or just for x = 10^k ? A better comment would say ...
% Compute y = log2(x)/log2(10) from a tricky average that
% combines rounding errors to force log10(10^k) == k at
% every integer k = -307:308. Elsewhere log10 errs by
% less than about three units in its last (53rd) sig. bit.
But we don't have to choose one of the foregoing tricky schemes. Their
ingenuity is misplaced; "Too Clever by Half" the British would say.
MATLAB's resources afford affordable ways to compute log10 so well
that no apology nor explanation is necessary. Here is one way:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
function y = log10(x)
% LOG10 Common (base ten) logarithm
% log10(X) is the base-ten logarithm of the elements of abs(X)
% computed rather more accurately than MATLAB's own log10(X) .
%
% Note that MATLAB's binary floating-point arithmetic has to round
% 10^m to something slightly else at integers m < 0 or m > 22 .
% Despite that, all( log10(10.0.^M) == M ) for M = [-307:308] .
%
% See also LOG, LOG2, POW2, LG2, EXP, LOGM.
% For IEEE 754 floating-point and MATLAB versions 4.x - 6.x.
% Constants:
% R10 := 1/2 - 1/log(10) rounded to 53 sig. bits, accurate to
% = 0.0657055180967481695 = hex 1.0D213AB646BC7 / 10 hex
% L2h := log10(2) rounded to 42 sig. bits
% = 0.301029995663952832 = hex 1.34413509F7800 / 4
% L2t := ( log10(2) - L2h ) rounded to 53 sig. bits
% = 2.83633945510449644 / 10^14
% R2 := sqrt(1/2) rounded to 53 sig. bits, high by 0.44 ulp
% = 0.707106781186547573 = hex 1.6A09E667F3BCD / 2
x = abs(x) ; %... Has complex log10 an application?
[s, k] = log2(x) ; %... x = s*2^k , 1/2 <= s < 1, like frexp.
%... k*L2h will be exact because -1074 <= integer k <= 1024 .
% A bug in MATLAB 4.2 on 68040-based Macintoshes requires ...**
j = (k == -65536) ; if any(j(:)), k = k + 65536*j ; end % ...**
% All other versions of MATLAB can omit the previous line. ...**
% The segregation of 1/4 <= x < 4 is needed to run this program on
% computers that cannot accumulate the scalar product y below to
% 64 sig. bits as IBM PCs and 68040-based Macs can. On just
% these computers a simpler program produces better results sooner.
j = (abs(k-0.5) < 2) ; %... j = (-2 < k < 3)
if any(j(:)), s(j) = x(j) ; k(j) = 0*j(j) ; end
% Now 1/4 <= s(j) = x(j) < 4 and k(j) = 0 . Otherwise ...
j = (~j)&(s < 0.707106781186547573) ;
if any(j(:)), s(j) = 2*s(j) ; k = k-j ; end
s = log(s) ; %... presumed accurate well within an ulp .
y = (( k*2.83633945510449644e-14 - s*0.0657055180967481695 ) ...
+ s*0.5 ) + k*0.301029995663952832 ;
% End of LOG10.M (C) 1995 - 2004 W. Kahan
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
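For readers without MATLAB, here is a scalar Python rendering of the same scheme; the name log10k and the use of math.frexp for the [s, k] split are mine, and the accuracy again depends on the quality of the log supplied:

```python
import math

R10 = 0.0657055180967481695    # 1/2 - 1/log(10)
L2h = 0.301029995663952832     # log10(2) rounded to 42 sig. bits; k*L2h is exact
L2t = 2.83633945510449644e-14  # log10(2) - L2h , rounded
R2  = 0.707106781186547573     # sqrt(1/2)

def log10k(x):                 # for x > 0
    ax = abs(x)
    if 0.25 <= ax < 4.0:       # segregate 1/4 <= x < 4
        s, k = ax, 0
    else:
        s, k = math.frexp(ax)  # ax = s*2^k , 1/2 <= s < 1
        if s < R2:
            s, k = 2.0 * s, k - 1
    s = math.log(s)            # presumed accurate well within an ulp
    return ((k * L2t - s * R10) + s * 0.5) + k * L2h
```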
How accurate is my LOG10 ? It's hard to say without knowing how LOG
is implemented since its errors are inherited by LOG10 . According to
my tests, MATLAB's LOG(x) errs by at worst about 0.8 ulp over the
interval 1/4 <= x <= 4 , wherein my LOG10(x) errs by at worst about
1.7 ulps. Elsewhere my LOG10 errs by at most about 0.75 ulps. As
we have seen, MATLAB's LOG10(x) can err by almost 3 ulps anywhere,
and when 1 < x < 1.1 its errors can amount to many GULPs. ...
-----------------------------------------------------------------------
FOOTNOTE 1: What's an _ULP_ ? It's a Unit in the Last Place. Here
is my original (in 1960, now obsolete) definition:
Ulp(x) is the gap between the two floating-point
numbers nearest x , even if x is one of them.
"Ulp" was coined to characterize the accuracies of implementations of
run-time math. library functions on the IBM 7090 and 7094. They were
not so accurate nor fast as they should have been in 1960. By 1964
most of their deficiencies had been repaired by contributions to SHARE
(the organized users of IBM mainframes), and especially by the work
of Hirondo Kuki at the University of Chicago.
From time to time well-intentioned attempts to speed up a math. library
introduced inaccuracies so gross that Fred Gustavson at IBM Yorktown
Heights coined the word "GULP" (Giga-Ulp) to describe them. Today
there is no excuse for a math. library less accurate than the Freely
Distributed Math. Library fdlibm promulgated from Sun Microsystems
by mostly graduates from the Univ. of Calif. at Berkeley who created
the libraries distributed with 4.3 BSD Berkeley UNIX in the 1980s.
By then the adoption of IEEE Standard 754 had made infinities and
NaNs so ubiquitous that the definition of an ulp had to be changed:
Ulp(x) is the gap between the two _finite_ floating-point numbers
nearest x , even if x is one of them. (But ulp(NaN) is NaN .)
E.g., for IEEE 754 Double Precision (8 bytes wide) as in MATLAB,
ulp(1) = eps/2, ulp(1.5) = eps, ulp(inf) = 2^971, ulp(0) = 2^-1074 .
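This definition, including its treatment of Infinity and NaN, can be rendered in a few lines of Python (math.nextafter needs Python 3.9 or later):

```python
import math, sys

def kulp(x):
    # The gap between the two finite floating-point numbers nearest x ,
    # even if x is one of them; kulp(NaN) is NaN.
    if math.isnan(x):
        return x
    x = abs(x)
    big = sys.float_info.max
    if math.isinf(x):
        return big - math.nextafter(big, 0.0)   # = 2^971
    lo = math.nextafter(x, -math.inf)
    hi = math.nextafter(x, math.inf)
    if math.isinf(hi):                          # x is the largest finite double
        return x - lo
    return min(x - lo, hi - x)
```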
Why "_finite_" ?
Without this word we would have ulp(Infinity) = Infinity , which is
not too bad but not so good in situations like the following:
For every finite floating-point number y we expect computed values
x := ( y - ulp(y) rounded ) and z := ( y + ulp(y) rounded )
to satisfy " x <= y <= z ". We cannot expect " x < y < z " in every
case; consider the case y := 1 and remember that by default IEEE
754 rounds midway cases to nearest even so z = 1 . These inferences
hold for decimal floating-point too though for very different reasons.
By defining ulp(Infinity) to be finite we sustain these inferences at
infinite y too; they would be invalidated there otherwise. Tighter
inferences are available just when y is finite; then
" x < y <= z , or x <= y < z , or both."
Both strict inequalities are false if y is infinite; even so, no
INVALID OPERATION occurs. Were ulp(Infinity) = Infinity , an INVALID
OPERATION would occur at x := y - ulp(y) producing NaN, and again
at each comparison of x and y . Gratuitous INVALID OPERATIONs
like these seem to punish a program cruelly and unusually: they would
wrest control from it in many environments, though not MATLAB's.
Here is a vectorized MATLAB program to compute ulp(x) on machines
whose arithmetics conform to IEEE Standard 754, as almost all do now.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
function u = ulp(x)
% ulp(x) is the gap between the two finite floating-point numbers
% nearest x , even if x is one of them. ulp(NaN) is NaN . E.g.,
% ulp(1) = eps/2, ulp(1.5) = eps, ulp(inf) = 2^971, ulp(0) = 2^-1074
u = abs(real(x)) ; %... to propagate NaN
k = ( u < 5.0e-308 ) ; %... copes with x too near 0
if any(k(:)), v = k*5.0e-324 ; u(k) = v(k) ; end
j = ( 9.0e307 < u ) ; %... copes with x near Infinity
if any(j(:)), v = j*(2^(1024-53)) ; u(j) = v(j) ; end
j = ~(j|k) ; %... copes with the rest
if any(j(:))
v = u(j) ; w = (0.7*eps)*v ; %... "0.7" works for IEEE 754
u(j) = min( (v+w)-v, (w-v)+v ) ; end
if any(imag(x(:))), u = u + i*ulp(imag(x)) ; end % recursive call !
% END of ULP.M (C) 1989 by W. Kahan
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-----------------------------------------------------------------------
FOOTNOTE 2: In some ways log2 is more natural for binary arithmetic
than is the "natural" log based upon e . Now compatibility with
Intel's 8087 designed in 1977 compels hardware/firmware support for
log2 in IBM PCs and clones. Why doesn't MATLAB's LOG2 use it?
Instead its LOG2 has suffered from two bugs. A bizarre bug only in
MATLAB 4.2 on 68040-based Macintoshes subtracts 2^16 = 65536 from
log2(x) if 1/2 <= |x| < 1 . The second bug persists in MATLABs 4, 5
and 6 on PCs and Macs and probably others. One manifestation of
this bug is a difference of GULPs, not a few ulps, between 1 and
the computed value of z = log(2)*log2(1 + eps)/eps tabulated here:
Computers MATLABs z
~~~~~~~~~~ ~~~~~~~~ ~~~~~~
IBM PCs 4.2 0.9995
IBM PCs 5.3, 6.5 1.04
68K Macs 4.2, 5.2 1.105
Power Macs 4.2, 5.2 1.15
LOG10 in MATLAB 6.5 inherits this bug from LOG2 . This drastic
bug's longevity ought to give somebody pause for serious thought.
Here is a MATLAB program to compute y = lg2(x) that should be used
instead of y = log2(x) . Distinguish this usage from [y,k] = log2(x)
which resembles C's function frexp and is free from the second bug.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
function y = lg2(x)
% LG2 Base-two logarithm
% lg2(X) = the base-two logarithm of the elements of X , assuming
% them all to be nonnegative, without the bug in MATLAB's log2(x)
% that loses its relative accuracy as x decreases towards 1 .
%
% See also LOG, LOG2, EXP, LOGM.
% For IEEE 754 floating-point and MATLAB versions 4.x - 6.x.
% Constants:
% L2 := 1/log(2) rounded to 53 sig. bits, accurate to 55
% = 1.44269504088896339 = 1.71547652B82FE hex
% R2 := sqrt(1/2) rounded to 53 sig. bits, high by 0.44 ulp
% = 0.707106781186547573 = hex 1.6A09E667F3BCD / 2
if any(imag(x))|any(x < 0) %... Has complex lg2 any application?
error(' lg2(x) is provided only for x >= 0 .'), end
[s, k] = log2(x) ; %... x = s*2^k , 1/2 <= s < 1, like frexp.
% A bug in MATLAB 4.2 on 68040-based Macintoshes requires ...**
j = (k == -65536) ; if any(j(:)), k = k + 65536*j ; end % ...**
% All other versions of MATLAB can omit the previous line. ...**
% The segregation of 1/4 <= x < 4 is needed to run this program on
% computers that cannot accumulate the scalar product y below to
% 64 sig. bits as IBM PCs and 68040-based Macs can. On just
% these computers a simpler program produces better results sooner.
j = (abs(k-0.5) < 2) ; %... j = (-2 < k < 3)
if any(j(:)), s(j) = x(j) ; k(j) = 0*j(j) ; end
% Now 1/4 <= s(j) = x(j) < 4 and k(j) = 0 . Otherwise ...
j = (~j)&(s < 0.707106781186547573) ;
if any(j(:)), s(j) = 2*s(j) ; k = k-j ; end
y = log(s)*1.44269504088896339 + k ;
% End of LG2.M (C) 1995 - 2004 W. Kahan
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
How accurate is my LG2 ? It's hard to say without knowing how LOG
is implemented since its errors are inherited by LG2 . According to
my tests, MATLAB's LOG(x) errs by at worst about 0.8 ulp over the
interval 1/4 <= x <= 4 , wherein my LG2(x) errs by at worst about
1.5 ulps. Elsewhere LG2 errs by at most about 0.7 ulps. As we have
seen, MATLAB's LOG2(x) can err by GULPs as x descends to 1 .
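The probe z is easily rerun. In Python, CPython's math.log2 keeps its relative accuracy as its argument descends to 1 and so passes the test comfortably:

```python
import math, sys

eps = sys.float_info.epsilon            # 2^-52
z = math.log(2.0) * math.log2(1.0 + eps) / eps
print(z)                                # very nearly 1 with an accurate log2
assert abs(z - 1.0) < 1e-10             # the tabulated MATLAB z values fail this
```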
-----------------------------------------------------------------------
Ruminations on the (Re)Development Costs of Numerical Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Corrections entail changes intended for somebody's benefit; but every
change, no matter how small, imposes a cost somewhere upon somebody
else who is entitled at least to grumble ...
"Why couldn't it have been gotten right in the first place?"
I would prefer to believe that Eternal Truth is worth its price no
matter how small its benefit seems at first, though the poet who wrote
" 'Beauty is truth, truth beauty,' -- that is all
Ye know on earth, and all ye need to know. "
("Ode on a Grecian Urn", John Keats, 1795-1821)
had no exposure to cost/benefit analyses. Certainly he glossed over
the costs of first attending to grubby details to ascertain the Truth
or fashion Beauty, and then delivering it to deserving destinations.
Attending to those last few grubby details inflates the development
costs of reliable numerical software so badly as to recall the
costs of maintaining an old house. Replacing a worn out wall switch is
a task that should entail unfastening six screws and refastening them.
But behind that switch is ancient wiring whose insulation has dried out
and cracked, so it must be replaced lest it start a fire. Access to
the wiring inside the wall is blocked by a pipe. Thus the replacement
of a switch costing a dollar turns into jobs for a plumber, carpenter,
electrician, plasterer and painter, or else for an exceptionally
versatile handyman. A dripping faucet threatens the budget similarly.
We all wish someone else had attended to details well enough that they
would no longer burden us. In what working environments can we expect
some "someone else" to succeed? Not in environments that oblige her
to cope laboriously and repeatedly with exceptions, boundary cases and
gratuitous limits and details that could have been reduced to something
"... as simple as possible, but no simpler"
(Albert Einstein, 1879-1955)
but weren't by whoever built her environment. Consequently we endure
swollen capture-cross-sections for error when trying to write numerical
software using today's programming languages and development systems.
The best numerical programming environment I have known was SANE, the
Standard Apple Numerical Environment on old 680x0-based Macintoshes;
look at "Apple Numerics Manual, Second Edition" (1988) Addison-Wesley,
Reading Mass. Unfortunately, in 1992, just as programmers began to
express appreciation for SANE, it suffered Collateral Damage from the
CEO's "Business Decision" to switch Macintoshes onto an IBM Power
PC processor lacking the 68040's hardware support for SANE. This
CEO's "Big Picture" experience selling carbonated beverages had made
him disdainful of minutiae that matter too much in computing. Now at
Hewlett-Packard a promising numerical programming environment based on
Intel's Itanium processor and the C99 language (it's not for a mass
market) is struggling into existence. Its future is uncertain too.
I wish the makers of MATLAB, recalling their roots, would rededicate
attention to their numerical underpinnings. Among improvements needed:
Declarable 4-byte, 8-byte and (10 or 16)-byte variables *
but use old Kernighan-Ritchie C rather than Java/Fortran
rules for expression-evaluation.
Access to IEEE 754's flags for trapless exception-handling *
Access to IEEE 754's directed roundings as diagnostic aids *
Support for fast (8 or 10)-byte 64-bit integer arithmetic
because 32-bit is too narrow for constructing test-data
when the accuracy of a floating-point program is under test.
Faster and more intelligible interface to VPA via Maple
Later ... Support for nascent Interval Arithmetic too
Operator Overloading to handle linear space objects
along lines explored by Dave Meredith's LINALG
But all that is a story for another day.
-----------------------------------------------------------------------
* To read more about the first three improvements requested look at
these items posted on my web page http://www.cs.berkeley.edu/~wkahan/ :
"Marketing vs. Mathematics"
"How Java's Floating-Point Hurts Everyone Everywhere"
(coauthored with Joe Darcy)
"Matlab's Loss is Nobody's Gain"
Questions about the scope and persistence of MATLAB 6.5's directive
" system_dependent('setprecision', {24|53|64}) " remain unanswered.
It alters the values computed for 10.0.^M , but its effects upon
log10.m and mxmuleps.m have not yet been figured out.
-----------------------------------------------------------------------