Date: Thu, 31 May 2001 21:23:07 -0700 (PDT)
From: "W. Kahan" <wkahan@EECS.Berkeley.EDU>
To: gls@labean.East.Sun.COM
Subject: Your 72 slides
Cc: wkahan@EECS.Berkeley.EDU, joe.darcy@eng.sun.com

Guy:

Thanks for sending your slides. Do you have enough time allotted to your presentation to cover them all? Even if you rushed through them as fast as they could be read, not allowing any time for passage through the viscosity of the human mind, they would take at least about an hour.

I agree with some of what you propose and disagree with other parts, as might be expected. I agree with pages 1 and 2, and with the intent behind page 3 even if it is slightly naive, and with the first bullet on page 5. Then we diverge at the word "portable", to which you assign a meaning that goes well beyond what the word meant for at least 30 years, from roughly 1960 to 1990.

We agree that Java should not support traps, as they are construed by hardware implementors, though we have different reasons. Certainly, applications programmers should not be expected to write their own trap handlers to cope with Over/Underflows, Divisions-by-Zero or Invalid Operations. But a cure you advocate turns out to be worse than the disease; the OV and UN that Fraley and Walther proposed turned out to burden, not ease, the lives of applications programmers. Wm. J. Cody was the point man on that issue twenty years ago, and came down strongly against their inclusion in the proposed IEEE standard. Only because we needed F & W's votes did we "Compromise" by introducing the Signalling NaNs to accommodate them; you can see how little good that did.

Your example starting on p. 31 is a Red Herring. It can be handled easily on machines that tell for each floating-point instruction which status bits to update in the event of a floating-point exception; this is done on Itanium to facilitate speculative execution down both paths of branches.
But that capability is irrelevant to your example for this reason: Knowing WHICH of x and y overflowed is almost never nearly as important as knowing that at least one of them did, after which the cost of recomputing both, if need be, will usually be inconsequential compared with the cost of coping with the overflow. Flag tests are sparse in most applications programs, hardly ever to be found in tight loops.

Exceptional treatment of exceptions is appropriate for the Math library of elementary transcendental functions ONLY if these run very fast, which is another interesting question if Java continues to try to legislate exact reproducibility of all floating-point computations.

Please understand that I do not scoff at the need for SOME kinds of floating-point computations to be reproducible exactly. We do differ on whether that much reproducibility can be enforced by the language alone, and therefore whether it should be invoked as an excuse for canonizing a floating-point architecture (SPARC's) disadvantageous for the overwhelming majority of programmers.

In the absence of a Pope of Computation, there is no way to ensure exact reproducibility of approximated results no matter how hard mere language designers and implementors try to enforce it. Instead I believe that exact reproducibility should be the just reward for a programmer who demands it explicitly and pays the price in both a disciplined restriction to (sometimes cumbersome) programming locutions that guarantee reproducibility, and a consequent loss of execution-time speed. And that demand makes sense for Binary far less than for Decimal floating-point, which is what Java should have specified at the outset if exact reproducibility were really intended by its designers.

You ask on p. 58 why "IEEE 754 supports wrapped exponents _only_ through traps". The "_only_" is gratuitous, as becomes clear on p. 59 where you mention scalb and logb.
The traps specified by IEEE 754 serve _only_ as clues for hardware designers that their traps need not be precise (provided they are restartable), and need supply only certain minimum information (wrapped result in a known destination for Over/Underflow, operand values and intended destination for Invalid Operation, etc.) to what would (we vainly hoped) turn out to be a limited menu of useful options along the lines of my notes you have cited.

Incidentally, your comment on page 62 that my "original example does not address the question of overflow in the sums" is mistaken. The counting mode as described in my notes and actually implemented on the IBM 7090/7094, and used very successfully to compute long products/quotients for physicists and chemists in the mid 1960s, counts up or down correctly regardless of whether over/underflow occurs in a sum or a subsequent product or even quotient. The counting mode was used also to speed up the comparison of complex absolute values; sqrt(x**2 + y**2) vs. sqrt(a**2 + b**2) was turned into (x-a)*(x+a) vs. (b-y)*(b+y) after sorting to make x = |x| >= y = |y| and a = |a| >= b = |b|, and also counting any over/underflows. I know that the counting mode has only very few uses, but can you think of a better way to do what it does?

I think the suggestion that the significand's last bit be used as an Inexact flag betrays a profound misunderstanding of its uses: "An inexact value represents a result that fell somewhere in the interval between adjacent exact values" is mathematically true and at the same time false for human purposes. Likewise, you have not thought through the reasons for identifying NaNs with their origins; the _place_ in the program where the NaN was generated is what we need much more than the op-code.

I agree with your suggestions about max and min on p. 52, and with the usefulness of fast tests of flags and of operands' categories. For the latter, perhaps reprogrammable PLAs would be helpful.
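The counting mode described above can be sketched in present-day Java (an illustration only, using java.lang.Math.scalb and a software counter in place of the 7094's hardware one): the running product is renormalized whenever it nears the overflow or underflow thresholds, and the count records how much scaling was done.

```java
class KountedProduct {
    // Software stand-in for the 7090/7094 counting mode: compute a long
    // product whose naive value would over/underflow, keeping the running
    // product in range and counting the powers of 2^512 scaled out of it.
    static int count;

    static double product(double[] factors) {
        count = 0;
        double p = 1.0;
        for (double f : factors) {
            p *= f;
            // Renormalize: the true product so far is p * 2^(512*count).
            while (!Double.isInfinite(p) && Math.abs(p) >= 0x1p512) {
                p = Math.scalb(p, -512);
                count++;
            }
            while (p != 0.0 && Math.abs(p) < 0x1p-512) {
                p = Math.scalb(p, 512);
                count--;
            }
        }
        return p; // true product = p * 2^(512*count)
    }
}
```

For 800 factors of 2, for example, this returns 2^288 with count = 1, representing 2^800 without ever overflowing.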
Rounding modes are NOT just special trap handlers. DEC's Alpha almost got them right when some were allowed into the OP-code, but put the wrong ones there. (p. 65)

Trap handlers should certainly be subroutines supplied along with the run-time library of Math functions and Decimal-Binary conversions. There are so few things that can be done usefully after floating-point exceptions, they might as well be put into a standardized menu. (p. 68)

The _Modes_ requested by a program to determine how exceptions shall be handled, and how rounding will be directed, are values of variables whose scopes are the proper responsibility of language designers. We agree that they are not best regarded as "dedicated registers", and that "Side effects on global state are a bad idea ...". (pp. 69 - 70)

"Borneo may be on the right track for Java" warms my heart.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In any event, I hope you will take pity on your audience by showing them only a selection from the pages you have sent me. A memorable treatment of a few issues is more humane than an attempt to cover all of them in one hour.

With warmest regards,

W. Kahan <wkahan@cs.berkeley.edu> (510) 642-5638

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Date: Wed, 6 Jun 2001 18:06:21 -0400 (EDT)
From: Guy Steele - Sun Microsystems Labs <gls@labean.East.Sun.COM>
Reply-To: Guy Steele - Sun Microsystems Labs <gls@labean.East.Sun.COM>
Subject: Thanks for the comments
To: wkahan@EECS.Berkeley.EDU
Cc: Guy.Steele@east.sun.com, joe.darcy@eng.sun.com

Thank you so much for your prompt and extensive feedback on my slides---much more than I had expected. I was able to read them late Saturday evening and thereby improve my talk to some extent.

> Guy: Thanks for sending your slides.
> Do you have enough time allotted to your presentation to cover them all? Even if you rushed through them as fast as they could be read, not allowing any time for passage through the viscosity of the human mind, they would take at least about an hour.

Yes, I had an hour, and I realized that I was pushing it on the material. But I made printed handouts---with thumbnails at four per page, 72 slides fit on nine sheets of paper, double-sided, which is not too bad. That allowed me to comment on the slides rather than read them, knowing that listeners could return to puzzling points later if necessary. On a slide such as #39 or #42, I merely noted that there were lots of special cases and read one or two as examples, then went on. I ended up skipping slides #59 through #63, stopping only to note that they were slightly esoteric tools that allow one to manipulate exponent and significand separately, and to request that everyone scribble out the footnote on slide #62 because it was based on my misreading of your notes (as you pointed out).

> I agree with some of what you propose and disagree with other parts, as might be expected. I agree with pages 1 and 2, and with the intent behind page 3 even if it is slightly naive,

Naive in the sense that it slavishly mirrors the existing Java "import" mechanism? It is thereby understood that one can import a single explicitly named static member, such as by

    import static java.lang.Math.sqrt;

just as one can import a single explicitly named class; but there is no facility for renaming an entity as it is imported, for example. Or do you view it as naive in some additional sense?

> and with the first bullet on page 5. Then we diverge at the word "portable", to which you assign a meaning that goes well beyond what the word meant for at least 30 years, from roughly 1960 to 1990.

Yes, so I have in the past; but for this talk, at this slide, I discussed this precise variance in opinion and meaning concerning this term.
Briefly put, I noted that portability down to the last bit was explicitly envisioned by Coonen, at least (slide #6), as one alternative; noted that Java was the first widely-used language to bring that goal to fruition by design (and by fiat); then pointed out that a capable programmer could easily write code that would behave in an acceptably portable and desirably speedy fashion whether the precision was double or extended; then asked the question of what fraction of the original target audience for Java would likely be capable in this area, and noted that the target audience has shifted in the last five years.

I still defend the original decision to require bit-for-bit portability in an attempt to protect the originally envisioned target audience from falling into certain traps, particularly in the case of code debugged on an Intel machine and later downloaded to less capable machines. It makes no more sense to criticize the original design of Java for choosing not to support numerical programming than to criticize Fortran 2000 for failing to support network sockets and HTML. (Which reminds me: they both do a poor job of supporting linked lists and functional arguments---why can't they both be more like Lisp??)

Nevertheless, those interested in numerical applications seem to like what they see in Java. You need to understand that many in the original Java community regard the "Fortran programmers" as party crashers. I am not one; I think there is value in extending Java to support the needs of many more communities. But it must be done in a thoughtful way, balancing many interests. Moreover, it is much easier to marshal support and resources for a language facility if it has broad applicability. Thus lightweight classes and operator overloading, considered as general facilities, are much easier to sell the Java community than something as specific as complex and imaginary numbers.
> We agree that Java should not support traps, as they are construed by hardware implementors, though we have different reasons. Certainly, applications programmers should not be expected to write their own trap handlers to cope with Over/Underflows, Divisions-by-Zero or Invalid Operations. But a cure you advocate turns out to be worse than the disease; the OV and UN that Fraley and Walther proposed turned out to burden, not ease, the lives of applications programmers. Wm. J. Cody was the point man on that issue twenty years ago, and came down strongly against their inclusion in the proposed IEEE standard. Only because we needed F & W's votes did we "Compromise" by introducing the Signalling NaNs to accommodate them; you can see how little good that did.

Yes. I did not have space on the overheads or time in the talk to do a detailed comparison to contrast my proposal with that of Fraley and Walther. I had to acknowledge their priority in suggesting OV and UN symbols, but there are notable differences. F & W seemed to want a mode that eliminates all "risk", especially by requiring that adding UN to an ordinary value must produce ERR rather than letting the UN quietly disappear. Such a definition could indeed cause ERR symbols to proliferate unnecessarily---though F & W are indeed correct that letting UN disappear when added to a normal is risky if the programmer has not thought out the consequences.

I am not proposing such a stringent treatment, however. In my proposal, OV behaves very much as infinity does now, and OV would tend to be propagated in exactly those situations where infinity is propagated now. Its only purpose is to distinguish, so to speak, between an "exact" infinity arising from division by zero and an "inexact" infinity arising from overflow or the reciprocation of an underflow. Similarly, UN distinguishes between an "exact" zero, produced by subtraction of equals or multiplication by an exact zero, and a very tiny but nonzero quantity.
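The proposed behavior of OV and UN can be made concrete with a toy model (my construction, not part of the slides or any standard; signs and rounding are ignored): values are tagged ORDINARY, OV, or UN; OV propagates the way infinity does; UN times any finite value is UN; and UN added to an ordinary value quietly disappears.

```java
enum Tag { ORDINARY, OV, UN }

// Toy model of the proposed OV/UN symbols (illustration only).
class Tagged {
    final Tag tag;
    final double value; // meaningful only when tag == ORDINARY

    private Tagged(Tag t, double v) { tag = t; value = v; }

    static final Tagged OV = new Tagged(Tag.OV, Double.POSITIVE_INFINITY);
    static final Tagged UN = new Tagged(Tag.UN, 0.0);
    static Tagged of(double v) { return new Tagged(Tag.ORDINARY, v); }

    Tagged add(Tagged that) {
        if (tag == Tag.OV || that.tag == Tag.OV) return OV; // OV propagates like infinity
        if (tag == Tag.UN) return that;  // UN + x: the UN quietly disappears
        if (that.tag == Tag.UN) return this;
        return of(value + that.value);
    }

    Tagged mul(Tagged that) {
        // (Simplified: a fuller model would make OV * UN an invalid result.)
        if (tag == Tag.OV || that.tag == Tag.OV) return OV;
        if (tag == Tag.UN || that.tag == Tag.UN) return UN; // UN times any finite is UN
        return of(value * that.value);
    }
}
```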
I address the value of this distinction below in response to your example about the comparison of the norms of complex numbers.

I considered an even more elaborate alternative, in which one has two UN values: one ("UN!") that guarantees that the result is tiny and another ("UN?") that merely indicates that some contributing computation was tiny but the result may not be. Thus the single precision product (2^-65)(2^-65) would produce "UN!", but multiplying that result by (2^60) would produce "UN?". On the other hand, multiplying "UN!" by a value not greater than 1 can correctly produce "UN!". Then adding "UN!" to an ordinary number quietly disappears (when rounding to nearest) but adding "UN?" produces "UN?", thus behaving rather like the ERR symbol of F & W. However, I am not certain that this additional complication is useful in practice; it may be subject to the same objections that Cody had years ago.

In the actual proposal I put forward, UN times any finite value produces UN. This is exactly analogous to the behavior under IEEE 754 where drastic underflow produces a zero, and multiplying that zero by any finite value produces zero. The only difference is that the fact of underflow has been encoded in the value. And adding this value to any nonzero number causes the underflow indication to quietly disappear; so in this respect it is a little different from the IEEE 754 underflow flag.

> Your example starting on p. 31 is a Red Herring. It can be handled easily on machines that tell for each floating-point instruction which status bits to update in the event of a floating-point exception; this is done on Itanium to facilitate speculative execution down both paths of branches.

I allude to this in the rotated comment about the IA-64 at the right-hand edge of slide #35.
But, far from being a Red Herring, it precisely supports my point: there is value in not limiting the recording of flag status information to a single global status register, but instead providing a way to associate flag status information with results of computations. The IA-64/Itanium approach is a middle ground that requires a compiler (or assembly language programmer) to dynamically associate multiple flag status registers with computations and their results, rather than (a) having a single status register, or (b) having a separate status register associated with each and every floating-point register. This falls under the category of "additional explicit destination register" for state in the list on slide #30. It is a legitimate and very clever approach to getting some of the advantages of associating status flag information with computations rather than with intervals of time; it is completely compatible with IEEE 754, but it makes things rather more complicated for the compiler. The approach I want to explore is somewhat incompatible with IEEE 754, but makes things much easier on the compiler, among other advantages.

> But that capability is irrelevant to your example for this reason: Knowing WHICH of x and y overflowed is almost never nearly as important as knowing that at least one of them did,

Which is why I labeled the example "artificial" on slide #31.

> after which the cost of recomputing both, if need be, will usually be inconsequential compared with the cost of coping with the overflow. Flag tests are sparse in most applications programs, hardly ever to be found in tight loops.

I conjecture that one reason for this, at least, is that flag testing is too expensive on today's hardware. I believe it would be more useful, and more used, if flag testing were cheap and easy to express.
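Java exposes no IEEE status flags at all, so as a stand-in for a cheap flag test, the sketch below (my construction) tests the accumulating value itself with Double.isInfinite inside a tight loop: if overflow is caught, the computation is rescaled and redone rather than trapped.

```java
class SumOfSquares {
    // Sum of squares with an explicit per-iteration overflow test, in the
    // spirit of testing a flag in a tight loop: on overflow, shrink the
    // scale factor and restart the loop.
    static double sumsq(double[] x) {
        double scale = 1.0;
        retry:
        while (true) {
            double s = 0.0;
            for (double v : x) {
                double t = v * scale;
                s += t * t;
                if (Double.isInfinite(s)) {          // the "flag test"
                    scale = Math.scalb(scale, -600); // rescale and redo
                    continue retry;
                }
            }
            return s; // the true sum of squares is s / scale^2
        }
    }
}
```

With inputs near 1e300 the first pass overflows immediately and the rescaled pass returns a finite surrogate for the (unrepresentable) true sum.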
> Exceptional treatment of exceptions is appropriate for the Math library of elementary transcendental functions ONLY if these run very fast, which is another interesting question if Java continues to try to legislate exact reproducibility of all floating-point computations.

One of my goals is to make the use of flags much faster by encouraging hardware designers to implement fast and convenient test instructions. Another goal is to make coping with overflow and underflow significantly faster by providing better facilities for scaling computations. (See below.)

As for this example, I suggested that the tight loop shown on slide #37 is more plausible. This does indeed have flag testing in a tight loop. If the statistics of the program are that N is 10,000 and overflow occurs about once in a million iterations, then this approach is wrong and it would be better to test flags after the entire loop is done, or to use a trap. But if N is 1,000,000 and overflow occurs about once in a million iterations, then it is more likely than not that the exception will be encountered, and it may be better to catch it early than to have to redo the entire loop; therefore either a trap or an explicit test on each iteration is called for. In some architectures the trap is better; I am out to show that under circumstances that are plausible, even likely, on today's architectures, the use of an explicit test may be more efficient. (Recall that reacting to a trap may require tens, hundreds, or even thousands of clock cycles.)

> Please understand that I do not scoff at the need for SOME kinds of floating-point computations to be reproducible exactly. We do differ on whether that much reproducibility can be enforced by the language alone, and therefore whether it should be invoked as an excuse for canonizing a floating-point architecture (SPARC's) disadvantageous for the overwhelming majority of programmers.
The overwhelming majority of programmers are busy programming point-and-click user interfaces, e-commerce, and video games. Most Java programmers don't need floating-point computations; they're trying to capture credit-card numbers and email addresses.

Of course such reproducibility can be enforced by a language. Java (before the Great "strictfp" Compromise) did a pretty good job of that, except to the extent that implementors violated the implementation specification---and I suspect that they did this not because they were expert in floating-point arithmetic and had the same principled reasons that you do for disobeying or arguing against the specification, but because they were actually quite naive about the issues and simply assumed that using the native instructions of their chosen target machine, as they have in the past for other, less rigorously specified languages, would "do the right thing" (or close enough). Then again, reproducibility can't be enforced by a language, because people can always choose to use another language.

The question is, should Java be extended in such a way to be more useful to other user communities, in particular those interested in numerical programming? I have consistently argued in favor of this general principle, though I have often argued against specific proposals for doing so because part of my job is to balance the needs of multiple user communities. I agree, for example, that Java might be somewhat easier to use and less error-prone for numerical purposes if it were to use the "widest-precision-anywhere-in-an-expression" rule. But that rule violates what I regard as a much more fundamental principle of language design, which is that you can always break off a subcomputation and give it a name, then refer to it by that name, without in any way disturbing the behavior of the program.
The ability to name things is one of our most important tools in taming complexity; I must balk at any attempt to undermine naming mechanisms and their expected properties. Now, these two goals are not necessarily mutually incompatible under all circumstances, but preserving them both requires great care and finesse, and may place a burden on the application programmer---and, what is more important, the maintenance programmer---in some languages.

Which brings me to an important meta-principle. One of the goals of a good language design is not only to make it easy to express a program in the first place, but to make it easy to express that program in such a way that perturbations to the program can be made easily and reliably. The careful programmer often must think about not just a single point in the space of programs, but a cluster of design points that might be of interest in the future, and express his program in such a way that these alternate design points may be reached through small changes in the program text. Naming mechanisms are one tool for achieving this goal (subroutines being a special case of such naming mechanisms).

> In the absence of a Pope of Computation, there is no way to ensure exact reproducibility of approximated results no matter how hard mere language designers and implementors try to enforce it. Instead I believe that exact reproducibility should be the just reward for a programmer who demands it explicitly and pays the price in both a disciplined restriction to (sometimes cumbersome) programming locutions that guarantee reproducibility, and a consequent loss of execution-time speed. And that demand makes sense for Binary far less than for Decimal floating-point, which is what Java should have specified at the outset if exact reproducibility were really intended by its designers.

You seem to have paraphrased Tom Lehrer: "Speed is the important thing, rather than getting the right answer."
I believe that exact reproducibility (perhaps at the price of speed) should be the reward for a programmer who has said nothing special and may be unaware of the persnickety ways in which computers may vary from model to model. You yourself have exhibited many examples of computations that work as desired at one precision and fail at another precision---sometimes smaller and sometimes larger. It is very important to me that once a numerically naive programmer has verified correct operation of his program on one implementation---whether the program manages to exhibit that desired behavior through use, abuse, or nonuse of floating-point arithmetic---he has an extremely good chance of that program operating as desired and tested on millions of other machines over which he has no control other than the expectation that it conforms to the Java Language Specification.

Speed should be the just reward for a programmer who demands it explicitly (thereby indicating that he claims the competence and takes the responsibility for deciding whether such variations are tolerable for the purpose at hand) and pays the price in a disciplined restriction to programming locutions that produce behavior that always lies within an acceptable range of behaviors despite the variations among the supporting implementations, or in a disciplined restriction of the use of the program to specific implementations that are adequate to support it.

Therefore the introduction of a keyword ("strictfp") to allow the programmer to indicate whether certain variations in floating-point behavior are tolerable for his purposes is a perfectly good idea, and additional such indications may be desirable in the future. But making the default behavior, that which occurs if the programmer says nothing, to allow variation from machine to machine, was an absolutely terrible decision from a language designer's point of view.
This decision was dictated by even larger interests, which were political and economic rather than technical. Would it be fair to say that you are trying to promote the interests of numerically competent programmers, for whom "speed over reproducibility" is a reasonable default because of their assumed competence, whereas I am trying to promote the interests of a much larger and more diverse pool of programmers, whose fields of competence vary widely?

> You ask on p. 58 why "IEEE 754 supports wrapped exponents _only_ through traps". The "_only_" is gratuitous, as becomes clear on p. 59 where you mention scalb and logb. The traps specified by IEEE 754 serve _only_ as clues for hardware designers that their traps need not be precise (provided they are restartable), and need supply only certain minimum information (wrapped result in a known destination for Over/Underflow, operand values and intended destination for Invalid Operation, etc.) to what would (we vainly hoped) turn out to be a limited menu of useful options along the lines of my notes you have cited.

I have news for you: an awful lot of hardware designers take IEEE 754 quite literally and do not regard its specifications as clues, but as requirements (the notable exception to this rule being the willingness to trap and let software handle denorms). The only part of 754 proper, as opposed to the appendix (which is largely ignored by hardware designers precisely because it is an appendix), that addresses the computation of wrapped exponents is the discussion in sections 7.3 and 7.4.
Some hardware actually delivers a result with wrapped exponent to the hardware-level trap handler, and some uses software after the hardware-level trap has been sprung, but my point was that a third, and potentially useful, approach would be for the trap handler to receive the original operands and for the hardware to provide an instruction that would produce results with wrapped exponents; such an instruction could be used on entry to the hardware trap handler but could also be a generally useful facility in ordinary code as well. There is no need to tie the concept of producing a wrapped exponent to the concept of taking a trap. While IEEE 754 does not require the tying of these two concepts, the way it is written certainly encourages implementations to tie them together. It is unfortunate that IEEE 754 was not accompanied by a rationale document explaining the envisioned range of implementation options (as advice to the hardware guys) and the envisioned uses for the prescribed facilities (as advice to the language guys).

> Incidentally, your comment on page 62 that my "original example does not address the question of overflow in the sums" is mistaken.

That is entirely correct; it was my error in misreading the intent of your notes, and I told my audience so. Thanks for noting it.

> The counting mode as described in my notes and actually implemented on the IBM 7090/7094, and used very successfully to compute long products/quotients for physicists and chemists in the mid 1960s, counts up or down correctly regardless of whether over/underflow occurs in a sum or a subsequent product or even quotient.

I now see how such code would operate.

> The counting mode was used also to speed up the comparison of complex absolute values; sqrt(x**2 + y**2) vs. sqrt(a**2 + b**2) was turned into (x-a)*(x+a) vs. (b-y)*(b+y) after sorting to make x = |x| >= y = |y| and a = |a| >= b = |b|, and also counting any over/underflows.
> I know that the counting mode has only very few uses, but can you think of a better way to do what it does?

Sure. Suppose that we have the OV and UN symbols, and suppose also that compound tests such as

    if (isNaN(x) | isInfinity(x) | isOV(x) | isUN(x)) { ... }

are cheap (a test instruction followed by a conditional branch, for example). Then we can use code such as the following rather than KOUNT mode:

    p = (x-a)*(x+a)
    q = (b-y)*(b+y)
    if (isOV(p) | isUN(p) | isSubnormal(p)) {
      if (isOV(p) ? isOV(q) : (isUN(q) | isSubnormal(q))) {
        k = 61 - SCALEOF(max(x, a))
        x = scalb(x, k)
        y = scalb(y, k)
        a = scalb(a, k)
        b = scalb(b, k)
        p = (x-a)*(x+a)
        q = (b-y)*(b+y)
      }
    }
    return p < q;

where SCALEOF is much like LOGB except that it returns an integer rather than a floating-point number and it returns zero if the argument is NIOUZ (NaN, Infinity, OV, UN, or Zero).

In this code as well as yours, we rely on the fact that any NaN in the input will cause the comparison to fail, and a NaN resulting from a difference of infinities will also cause the comparison to fail, thus causing all points at infinity to be regarded as equal. One of the reasons this works, and would not work under IEEE 754, is that if x and a are equal, y and b are not equal, and the computation of q underflows, the underflow does not produce a zero; where IEEE 754 would produce a zero (and only the multiplication could do this, not the addition or the subtraction), this proposal produces UN of the appropriate sign, and the comparison of this to the zero resulting from (x-a)*(x+a) produces the correct result.

If one of p and q is OV and the other is not, then the simple comparison produces the correct result. Likewise, if one of p and q is UN or subnormal and the other is not, the simple comparison produces the correct result. (One must worry about subnormal values as well as UN because of the loss of precision.)
Only if p and q have both overflowed or both become small is more care required; in this code I simply beat it with a sledgehammer, scaling all operands. (It is assumed that scaling a nonzero number to be very tiny never produces a zero, but rather UN.) If the compound test (isOV(p) | isUN(p) | isSubnormal(p)) can be performed in one or two instructions, this technique compares favorably with the cost of establishing KOUNT mode in the first place (not to mention saving and restoring the previous mode), even if we assume a single machine instruction that saves the old mode in register R1 while loading the new mode from register R2.

But we can do even better. I believe it would be useful to have a few very general tools for manipulating exponents and significands separately. Consider these proposed primitives, which could easily be single hardware instructions (as I describe further below):

    SCALEOF(x) = if isNIOUZ(x) then 0 else (int)logb(x)

This has the property that if x is not NIOUZ, then scalb(x, -SCALEOF(x)) is less than 2 but not less than 1.

    COSCALE(x, s) = scalb(x, SCALEOF(s))

    COUNTERSCALE(x, s) = scalb(x, -SCALEOF(s))

This has the property that if x is not NIOUZ, then COUNTERSCALE(x, x) is less than 2 but not less than 1.

    LOWCOUNTERSCALE(x, s) = scalb(x, -SCALEOF(s)-1)

This has the property that if x is not NIOUZ, then LOWCOUNTERSCALE(x, x) is less than 1 but not less than 1/2.

All four of these can, I imagine, be implemented in a single clock cycle. The advantage of COSCALE and COUNTERSCALE is that they are useful idioms that can be implemented entirely on the floating-point side of the processor, without using integer registers, and twice or three times as fast as implementing them in terms of SCALEOF and scalb.
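These four primitives can be emulated in current Java (my sketch; Math.getExponent and Math.scalb are real library calls, but this is of course far slower than the single-cycle instructions imagined above, and it glosses over subnormals and the nonexistent OV and UN):

```java
class ScalePrimitives {
    // "NIOUZ" shrinks to NaN, Infinity, or Zero, since Java has no OV or UN.
    static boolean isNIOUZ(double x) {
        return Double.isNaN(x) || Double.isInfinite(x) || x == 0.0;
    }

    // Like logb, but an int, and 0 for NIOUZ arguments.
    // (Subnormals would need extra care: Math.getExponent reports
    // Double.MIN_EXPONENT - 1 for them rather than the true logb.)
    static int SCALEOF(double x) {
        return isNIOUZ(x) ? 0 : Math.getExponent(x);
    }

    static double COSCALE(double x, double s)         { return Math.scalb(x, SCALEOF(s)); }
    static double COUNTERSCALE(double x, double s)    { return Math.scalb(x, -SCALEOF(s)); }
    static double LOWCOUNTERSCALE(double x, double s) { return Math.scalb(x, -SCALEOF(s) - 1); }
}
```

For instance, COUNTERSCALE(12.0, 12.0) is 1.5, which lies in [1, 2), and LOWCOUNTERSCALE(12.0, 12.0) is 0.75, which lies in [1/2, 1).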
    MULSIG(x, y) = x * COUNTERSCALE(y, y)

This amounts to ignoring the exponent field of y if y is not NIOUZ, so I expect it could easily be supported by existing floating-point multiply circuits as a single instruction by adding a little bit of control gating to the exponent computation. Thus it would have exactly the same cost as an ordinary multiplication.

    MULSIGS(x, y) = COUNTERSCALE(x, x) * COUNTERSCALE(y, y)

Similar. If either operand is NIOUZ, the result will be NIOUZ. If neither operand is NIOUZ, the result will be less than 4 but not less than 1.

    LOWMULSIG(x, y) = x * LOWCOUNTERSCALE(y, y)

Similar. Note that it multiplies x by a value that is less than 1, so it cannot overflow.

    AVERAGE(x, y) = (x + y)/2

computed by adjusting the exponent during the addition rather than in two separate steps. The advantage of this operation over addition is that it never overflows. The downside is that if the result is subnormal, you might have lost a bit of information (the lsb). This should require only a small amount of extra control circuitry in the exponent calculations of existing adders.

These primitives are adequate to implement, as a user library, an encapsulated data type consisting of a double-precision significand plus a 32-bit (or 64-bit) exponent. Here is a sketch:

    class wideExponentDouble {
      int expt;
      double signif;  // always less than 2 but not less than 1, if not NIOUZ

      wideExponentDouble(double value) {
        this.expt = SCALEOF(value);
        this.signif = COUNTERSCALE(value, value);
      }

      wideExponentDouble(int scale, double value) {
        this.expt = scale + SCALEOF(value);
        this.signif = COUNTERSCALE(value, value);
      }

      wideExponentDouble add(wideExponentDouble that) {
        if (this.expt < that.expt) return that.add(this);
        double thatsig = isNIOUZ(that.signif) ? that.signif
                                              : scalb(that.signif, that.expt - this.expt);
        double newsig = this.signif + thatsig;
        int newexpt = isNIOUZ(newsig) ? Integer.MIN_VALUE
                                      : this.expt + SCALEOF(newsig);
        return new wideExponentDouble(newexpt, COUNTERSCALE(newsig, newsig));
      }

      wideExponentDouble multiply(wideExponentDouble that) {
        double newsig = MULSIGS(this.signif, that.signif);
        return new wideExponentDouble(this.expt + that.expt + SCALEOF(newsig),
                                      COUNTERSCALE(newsig, newsig));
      }

      wideExponentDouble sqrt() {
        double adjustedSignif = ((this.expt & 1) == 0) ? this.signif
                                                       : scalb(this.signif, 1);
        return new wideExponentDouble(this.expt >> 1,  // NOT "this.expt / 2" !!!!
                                      sqrt(adjustedSignif));
      }
      ...
    }

But for particular algorithms we can do much better with hand-coding. Returning to the question of comparing complex magnitudes: instead of comparing (x-a)*(x+a) and (b-y)*(b+y), let us compare (x-a)*(x+a)/2 and (b-y)*(b+y)/2. The point is that we can compute (x+a)/2 and (b+y)/2 using the AVERAGE primitive, and therefore the multiplications are the only possible source of overflow.

    c = x - a;
    d = AVERAGE(x, a);
    f = b - y;
    g = AVERAGE(b, y);
    p = LOWMULSIG(c, d);
    q = f * LOWCOUNTERSCALE(g, d);
    if (isUN(q) | isSubnormal(q)) {  // testing q, not p
      p = c * scalb(x + a, 192);
      q = f * scalb(b + y, 192);
    }
    return p < q;

We perform one multiplication using LOWMULSIG, so it can't overflow. And b+y is not greater than x+a, so the result of LOWCOUNTERSCALE(g, d) is less than 1, so the other multiplication never overflows, either. So overflow simply is not an issue in this code.

On the other hand, if we are very finicky, we must worry about underflow and about the possibility that the AVERAGE operation may have lost a least significant bit. The AVERAGE operation can have lost a bit only if its result is subnormal. So if computing d lost a bit, then p must be subnormal; if computing g lost a bit, then q must be subnormal. But x >= y and a >= b, so if d is subnormal then g is subnormal. It follows that if either AVERAGE operation lost a bit, then q is subnormal.
So the test will catch those cases, as well as cases where both multiplications produce subnormal or UN results. In such cases we perform additions rather than averaging operations, scale the results up to avoid underflow, multiply, and compare. These cases occur only when at least one of the four numbers is very tiny. In the "normal" case, which covers nearly ALL numbers (not just half of them), the execution cost is four adds or subtracts (counting AVERAGE as an add), two multiplies (counting LOWMULSIG as a multiply), one extra scaling operation, and a test.

Tricks such as AVERAGE are well known to the DSP community for use with fixed-point arithmetic, and are built in as hardware instructions in many vector processors. Tools such as I propose would allow the careful floating-point programmer to mix the benefits of floating-point arithmetic with those of explicit scaling.

Slides #59 through #63 illustrate how long products of sums may be handled in a similar manner, by controlling scaling ahead of time so as to prevent overflow and underflow from occurring in the first place. Use of the primitive AVERAGE in slide #62 would make the code even more concise, assuming that loss of the lsb when averaging very tiny numbers is not a problem for this application. (One would have to initialize "expt" to N rather than 0 to compensate for the extra divisions by 2.)

I think the suggestion that the significand's last bit be used as an Inexact flag betrays a profound misunderstanding of its uses: "An inexact value represents a result that fell somewhere in the interval between adjacent exact values" is mathematically true and at the same time false for human purposes. It's true enough when the inexact value comes from exact operands. If inexact operands go in, the interval interpretation is misleading.
However, it seems to aid people's intuition when they first see the idea, and it is interesting (and surprising) that if you "steal" the low bit of the word, as opposed to any other bit, to serve as an inexact flag, then the resulting behavior looks like a rounding mode.

However, this raises an interesting question: In practice, when the inexact flag is used, is the computed value usually discarded when it is found to be inexact after all, or is it not infrequently put to use despite the fact that it is inexact? If the value is usually discarded, then one might simply propose a mode of computation in which any inexact result is replaced by a NaN that is guaranteed to propagate the information about inexactness; then one need not lose a bit of precision.

Likewise, you have not thought through the reasons for identifying NaNs with their origins; the _place_ in the program where the NaN was generated is what we need much more than the op-code.

I understand that. In 1980, addresses on a typical computer were well under 23 bits wide, and you could hope to encode a program location in the significand of a float. It's too bad that IEEE 754 didn't specify, or at least suggest, that a generated NaN should include the program counter of the instruction that generated the NaN! But that possibility has been blown away by Moore's Law. There may well be, in a fairly large code, more than 2^23 distinct program locations that could generate NaNs. So we must be more clever and retain partial information. Remembering the kind of operation seems helpful; there may be other information we can retain. As for doubles, we have a second word to play with, so maybe we can retain a 32-bit program counter, which may hold us for five years at most (my home desktop computer has 1 GB of memory on it). Experience with serious debugging systems over the last 20 years has shown us that even a program counter is really not enough information.
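A 32-bit tag does fit comfortably in a double NaN's payload today; a Java sketch (the payload layout and names here are my own illustration; neither IEEE 754 nor Java assigns payload bits any such meaning, and the JVM specification permits some NaN bit patterns to be collapsed in transit):

```java
// Illustrative only: stash a 32-bit "program location" tag in the low
// bits of a quiet double NaN and recover it later via the raw-bits view.
public class NanTag {
    static final long QNAN = 0x7FF8_0000_0000_0000L;  // quiet NaN, empty payload

    static double tagNaN(int location) {
        return Double.longBitsToDouble(QNAN | (location & 0xFFFF_FFFFL));
    }
    static int tagOf(double x) {
        // doubleToRawLongBits preserves the payload; the cast keeps
        // the low 32 payload bits.
        return (int) Double.doubleToRawLongBits(x);
    }
}
```

So the raw capacity is there; what is missing, as argued above, is any standard meaning for it, and any guarantee that the payload survives a chain of operations.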
It's not helpful, for example, to know that "hypot" caused an overflow without knowing who called "hypot". A thrown Java exception object is rather heavyweight for this reason; it contains a summary of many stack frames to aid understanding of where and how the exceptional situation occurred. So perhaps a double could contain an integer that identifies a region of memory into which debugging information, including a program counter or many program counters, has been dumped. But allocating such a memory region is perhaps best done by software, and therefore is perhaps best handled by a low-level trap handler. Therefore the hardware specification need not concern itself with such things, as long as sufficiently many NaN values have been reserved for such use by software.

I agree with your suggestions about max and min on p. 52, and with the usefulness of fast tests of flags and of operands' categories. For the latter, perhaps reprogrammable PLAs would be helpful.

An interesting idea for some contexts, thanks.

Rounding modes are NOT just special trap handlers.

Sure they are. If we define the inability to represent the exact mathematical value in the destination format, by reason of insufficient precision, to be an exceptional situation (which we do: we call it the inexact trap), then that situation must be handled. It may be handled by hardware or by software, but it must be handled. One way to handle it is to choose a bit pattern that represents some value other than the exact mathematical value. When the chosen value is close to the exact mathematical value, we call this rounding.

IEEE 754 specifies a trap enable flag for overflow. This allows us to choose easily between two trap handlers. One is a trap handler in the conventional sense: a user-supplied software subroutine. The other is a trap handler that, for efficiency, is normally implemented in hardware: it computes an infinity of the appropriate sign and returns that as the result of the operation that signaled the exception.
IEEE 754 likewise specifies a trap enable flag for underflow. This allows us to choose easily between two trap handlers. One is a trap handler in the conventional sense: a user-supplied software subroutine. The other is a trap handler that, for efficiency, might be implemented in hardware: it computes an appropriate subnormal number (possibly zero) and returns that as the result of the operation that signaled the exception. But, for some processors, that trap handler is often also implemented in software.

IEEE 754 likewise specifies a trap enable flag for inexact. It also provides for a two-bit rounding-mode specifier. Together, at a sufficiently abstract level of interpretation, these allow us to choose easily among five trap handlers. One is a trap handler in the conventional sense: a user-supplied software subroutine. The other four are trap handlers that, for efficiency, are normally implemented in hardware: we call these trap handlers round-to-nearest, round-toward-zero, round-toward-plus-infinity, and round-toward-minus-infinity. If only the specification of the interface to the user-supplied trap handler had required that the trap handler have easy access to the raw result plus the guard and sticky bits, it would be easy to implement other rounding modes in software.

DEC's Alpha almost got them right when some were allowed into the OP-code, but put the wrong ones there. (p. 65)

Yep, alas.

Trap handlers should certainly be subroutines supplied along with the run-time library of Math functions and Decimal-Binary conversions. There are so few things that can be done usefully after floating-point exceptions, they might as well be put into a standardized menu. (p. 68)

The _Modes_ requested by a program to determine how exceptions shall be handled, and how rounding will be directed, are values of variables whose scopes are the proper responsibility of language designers.
We agree that they are not best regarded as "dedicated registers", and that "Side effects on global state are a bad idea ...". (pp. 69 - 70)

I think here we are in agreement.

"Borneo may be on the right track for Java" warms my heart.

Glad to hear it, and I meant it sincerely. My two reservations are, first, whether trap handlers should be subroutines (which they are not in Borneo, clearly for reasons of fitting in with the rest of Java), and, second, whether it has sufficient abstraction to permit some wiggle room for implementors under the hood. What looks like a trap handler in Borneo might be reached at the low level either by a hardware trap mechanism or an explicit branch instruction; I want to make sure that hardware designers and compiler writers have some choices.

::::::  ::::::  ::::::  ::::::  ::::::

I am very grateful that you took the time to look over my slides and give me such detailed feedback. I'd love to continue the conversation, too, and work toward a more detailed and sensible proposal, and perhaps an appropriate refinement of Borneo with Joe Darcy.

Yours,
Guy Steele
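P.S. To make "other rounding modes in software" concrete: even without access to the guard and sticky bits, round-to-nearest arithmetic alone suffices, because Knuth's two-sum trick recovers the exact rounding error of an addition. A Java sketch of round-toward-plus-infinity addition built that way (the name addRoundUp is merely illustrative):

```java
// Illustrative round-toward-plus-infinity addition built from
// round-to-nearest: the branch-free two-sum recovers the exact error
// of a + b, and we bump the sum up one ulp whenever that error is
// positive (i.e., the true sum lies above the rounded sum).
public class DirectedAdd {
    static double addRoundUp(double a, double b) {
        double s  = a + b;                 // round-to-nearest sum
        double bv = s - a;
        double av = s - bv;
        double err = (a - av) + (b - bv);  // exact: a + b == s + err
        return (err > 0.0) ? Math.nextUp(s) : s;
    }
}
```

For example, addRoundUp(1.0, 1e-20) yields the next double above 1.0, where round-to-nearest would return 1.0 exactly. Had the trap-handler interface exposed the guard and sticky bits, the same effect would cost one test instead of five extra floating-point operations.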