NaN.txt                    What's in a NaN ?                     31 Aug. 2007
                           ~~~~~~~~~~~~~~~~~~                    by  W. Kahan

The symbol "NaN", inspired by "Not a Number", is necessary to provide
Algebraic Completion to a number system that extends the field of real or
complex numbers to include Infinity.  NaNs do annoy us at times.

"Algebraic Completion" ensures that no arithmetic operation need be
"Undefined" no matter what its operands may be.  This is necessary for
computers, which are not usually programmed to stop and reconsider what they
have been commanded to do.  Humans could stop and reconsider, though too
often they don't.  Neither Algebraic Completion nor NaNs are needed by human
mathematicians not engaged in computer programming.

"Algebraic Completion" is dangerously misleading unless accompanied by
"Algebraic Integrity", which means that ...

   If exceptions produce different evaluations of expressions that are
   algebraically equivalent over almost all the finite Real Field, then
   (absent roundoff) those evaluations can produce at most two different
   values; and if these are not +Infinity and -Infinity then at least one
   is NaN.

For example, here are evaluations of three expressions at three places:

                     2/(1 + 1/x)      2x/(1 + x)       2 + (2/x)/(-1 - 1/x)
                     ~~~~~~~~~~~      ~~~~~~~~~~       ~~~~~~~~~~~~~~~~~~~~
 @ x := -1       :   +Infinity (!)    -Infinity (!)    -Infinity (!)
 @ x :=  0       :   0 (!)            0                NaN (!)
 @ x := Infinity :   2                NaN (!)          2

"(!)" above means that a flag, INVALID or DIVIDE-BY-ZERO, gets raised too.

Of course, roundoff can undermine Algebraic Integrity; one expression may
resist roundoff better than another in a part of the expressions' domain.
That can motivate a programmer's choice.

There have been attempts to achieve Algebraic Completion without maintaining
Algebraic Integrity.  APL defined 0/0 := 1 ;  MathCad defined 0/0 := 0 ;
these have turned out to be badly misguided.

In 1962 Seymour Cray included "Indefinites" in the CDC 6600, the
super-computer of the 1960s.
These NaN-like floating-point objects were intended to allow the completion
of vectorized arithmetic despite occasional exceptions like Overflow and
Division-by-Zero.  However, the 6600's Fortran compiler left Indefinites
unsupported.  Their unpredictable misbehavior rendered them useless, so
their creation had to abort program execution, usually with an inscrutable
error-message.

"NaN" is NOT "Undefined".  Quite the contrary.  NaNs' persistence in
arithmetic operations is defined fully.  However, an interpretation of NaN
as the output of a program is defined by the programmer if only by default.

A NaN may signify that the program encountered a situation not anticipated
by the programmer, so a NaN was generated instead of a numerical result,
finite or infinite, that could only cause worse confusion.  This is what
IEEE Standard 754 provides by default.  The generation of a NaN from non-NaN
operand(s) must also raise the INVALID OPERATION Flag.  Ideally this flag,
when raised (not Null), should serve as a pointer to a record of at least
the location of the invalid operation and preferably a description too of
the unanticipated situation, so that it may be diagnosed retrospectively
first by the user of the program and subsequently by its programmer.  The
provision of information useful for retrospective diagnosis poses challenges
we wish designers of computer systems would meet, but they haven't yet.

Non-default interpretations of NaNs serve other programmers' needs.

A NaN may signify that the result of a program's search was not what the
user desired because it does not exist or else because the program failed to
find it for lack of some resource like time or a better place to start the
search.  Or what was sought was found in too many places.  The NaN's
significance could be conveyed by an error-code or better by an informative
message, but too often neither is forthcoming.
And if a program almost always succeeds, its users are likely to overlook an
error-code put out by the program.  A NaN is harder to overlook.

A NaN may signify that the argument offered to a function lies beyond its
domain.  Whether such a situation is benign or fatal is up to the program
that invoked the function.  It need not terminate a search; the search
program may treat the NaN as advice to search elsewhere for a zero or
extremum of the function, especially when the boundary of the function's
domain is difficult for the search program's user to specify.

A NaN may denote a missing datum.  This is what a Signalling NaN is good for
if it prompts a statistical program's user to supply a missing datum or else
choose how to interpolate it from neighboring data.  But this response to
Signalling NaNs may require a trap-handler that many computing platforms
cannot support; instead they turn off the Signal and then must raise the
INVALID flag.  Signalling NaNs go unused.

A NaN may signify an uninitialized variable, perhaps an oversight, or
perhaps inconsequential because it is destined to be ignored later.

Some NaNs can be ignored.  Some NaNs must be ignored.  In general, if for
some value X the value v := f(X, y) is the same for every finite and
infinite value of y, then necessarily f(X, NaN) = v too.  For instance, a
complex number is infinite if either its imaginary or real part is infinite
even if its other part is NaN.  By definition NaN^0 := 1 because y^0 := 1
for every y by an almost universal convention.  Max{|x|, |y|}, |x + iy| and
norm([5, x, y]) must be infinite if either x or y is infinite regardless of
whether the other is NaN.  Likewise |x| + |y|, but that may be too much to
ask.

Sometimes the appropriate disposal of a NaN will require adscititious
information from far beyond the NaN's immediate locale.  For instance
max{5, NaN} should be 5 in the context of Windowing during graphic
rendering; otherwise NaN might reasonably be expected.
In any event max{5, NaN} should be the same as max{NaN, 5} but often isn't.
NaNs must unavoidably violate some familiar relationships; one example is
" if max{x, y} is x then min{x, y} is y ".

Some questions can be resolved only by agreements upon conventions that will
appear arbitrary.  For example, where does a NaN in an array go when the
array is sorted?  Why not the far end of the sorted array?

A NaN that turns up in a program unexpectedly can cause a malfunction like
an interminable loop or inappropriate jump.  These can arise from
unanticipated Disorder among NaNs and numbers.  All the predicates
" x < y " , " x <= y " , " x >= y " and " x > y " are FALSE and should raise
the INVALID flag if either or both of x and y is/are NaN ;  but then
" x != y " is TRUE and " x == y " is FALSE without raising that flag.
Ideally, programming languages should provide additional "silent" comparison
predicates like, respectively, " x !>= y " , " x !> y " , " x !< y " and
" x !<= y " that can be unexceptionally TRUE when x or/and y is/are NaN.
Unfortunately, the programming language community has resisted the
introduction of new "silent" predicates as of this writing.

Worse, many optimizing compilers turn " x == x " into TRUE and " x != x "
into FALSE at compile-time, so these two predicates cannot be used reliably
to segregate a NaN from every other value of x.  Instead, the Math library
must be augmented by "Generic" logical functions isNaN(x), isInf(x) and
isFinite(x) that apply apt bit-twiddling to x to reach their decisions.

Legacy software written before NaNs existed, or by programmers not aware of
them, may need revision to cope with NaNs appropriately.  A first revision
is to documentation; how easily can the program's user predict what it will
do about NaNs ?  If not easily enough, revisions to the program become
necessary.  NaNs detected in its input will need treatment different from
NaNs created during its execution.
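
The behavior of max, of the comparison predicates, and of bit-twiddling
classification functions described above can be tried out in a few lines.
What follows is a minimal Python sketch, not part of the original note.  It
assumes IEEE 754 binary64 floats (what CPython uses on common platforms);
the names is_nan_bits, is_inf_bits and is_finite_bits are hypothetical
stand-ins for isNaN, isInf and isFinite.  Python does not expose the INVALID
flag, so only the truth values of the predicates are shown.

```python
import struct

EXP_MASK = 0x7FF0000000000000    # the 11 exponent bits of a binary64
FRAC_MASK = 0x000FFFFFFFFFFFFF   # the 52 fraction bits of a binary64


def _bits(x: float) -> int:
    """Return the 64-bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_nan_bits(x: float) -> bool:
    """NaN: exponent field all ones and a nonzero fraction field."""
    b = _bits(x)
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0


def is_inf_bits(x: float) -> bool:
    """Infinity: exponent field all ones and a zero fraction field."""
    b = _bits(x)
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) == 0


def is_finite_bits(x: float) -> bool:
    """Finite: exponent field not all ones."""
    return (_bits(x) & EXP_MASK) != EXP_MASK


nan = float("nan")

# All four ordered predicates are False when an operand is NaN ...
ordered = [nan < 1.0, nan <= 1.0, nan >= 1.0, nan > 1.0]
# ... while != is True and == is False.
unequal, equal = (nan != nan), (nan == nan)

# Python's built-in max keeps its first argument unless a later one
# compares greater, so its result depends on argument order.
max_5_nan = max(5.0, nan)   # stays 5.0
max_nan_5 = max(nan, 5.0)   # stays NaN
```

The built-in max is used here only to illustrate the order dependence the
text complains about; a windowing routine that wants max{5, NaN} = 5 in
either order would have to test for NaN explicitly.
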

Probably the easiest way to cope with a NaN in input is to pass it on to the
output undigested, if not ignorable.  For instance, a program to compute the
cube root of a real scalar x should begin with

     if ( (0.5*x == x) or isNaN(x) ) then return(x) endif

to cope with 0.0, Infinity and NaN.  A program that acts upon an array must
be scrutinized to determine how much output is contaminated by a NaN in the
input.  Often such a program needs no revision except for computers whose
arithmetics are slowed too severely by NaNs.

A NaN created after an exception like Overflow or Division-by-Zero in legacy
software requires analysis of the programmer's intent to cope with such
exceptions.  We should resist the temptation to enable a trap that will
abort the program upon an INVALID operation.  Abortion can be more dangerous
than continued execution.  Instead the program should be scrutinized for a
loop that may never quit if a NaN or Infinity turns up in the test(s) for
the loop's termination.  Even if the loop will surely terminate, it may run
intolerably long on computers whose arithmetics are slowed too severely by
NaNs and Infinities.

"NaN" is no synonym for "Error".  Ignoring a NaN may be a mistake; ignoring
a raised INVALID OPERATION flag too is very likely mistaken.

NaNs and Infinities are nuisances whose existence is justified only by the
experience of an era when they did not exist.  Then exceptions produced
unpredictable consequences whose only portable treatment was abortion, and
that is undebuggable because traps are imprecise at the source-code level
after optimizations that exploit concurrency.  Now the worst impediment to
portable handling of floating-point exceptions is inadequate support for
NaNs, Infinities and Flags in compilers.
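
The cube-root guard given earlier can be written out as runnable code.  This
is a minimal Python sketch, not part of the original note; the function name
cube_root is hypothetical, and the power-based computation after the guard
is merely an assumed stand-in for whatever a real cube-root program would
do.

```python
import math


def cube_root(x: float) -> float:
    """Cube root of a real scalar, passing 0.0, Infinity and NaN through."""
    # 0.5*x == x holds for +0.0, -0.0, +Infinity and -Infinity.
    # NaN fails every == test, so it needs the explicit isnan check.
    if 0.5 * x == x or math.isnan(x):
        return x
    # Assumed stand-in computation: take the root of |x|, restore the sign.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)
```

With the guard in place the special inputs reach the output undigested:
cube_root(float("inf")) returns Infinity and cube_root(float("nan")) returns
NaN, while an ordinary argument such as 27.0 goes on to the computation.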