SRTEST                                                      15 March 1995

                        A Test for SRT Division
                        =========================

  **************************************************************
  *  CAUTION:  This is work in progress.  It may be WRONG.    *
  *  It is being distributed now only so that my colleagues   *
  *  in academe and in industry, who need it immediately,     *
  *  can try it out and help me get it right.                 *
  **************************************************************

                        Prof. W. Kahan
                        Elect. Eng. & Computer Science
                        University of California
                        Berkeley CA 94720-1776
                        (510) 642-5638

Abstract:
~~~~~~~~~
SRTEST is an astonishingly simple computer program devised to test the correctness of quotient-prediction hardware for binary floating-point division that uses any Non-Restoring (SRT) algorithm.  Nowadays that hardware often takes the form of a Programmed Logic Array (PLA).  How long SRTEST should run to "prove" a PLA correct remains an open question; however, SRTEST is intended to uncover a flaw in a PLA far sooner than random testing can.  For example, when it ran on an Intel Pentium with the notorious binary floating-point divide bug of 1994, SRTEST uncovered the flaw in a tiny fraction of a second after performing its 356 th trial division and a few thousand arithmetic operations in all, although the flaw had earlier been found to afflict one division in about 8,800,000,000 with independent random operands.  However, to use either that kind of finding or SRTEST's performance to predict failure rates among diverse users of a defective divider is an abuse of statistics.  A study of SRTEST's workings sheds light on that abuse and helps us avoid repeating it.

Table of Contents:
~~~~~~~~~~~~~~~~~~
    Introduction
    SRTEST's Needs and Limitations
    Outline of SRT Division
    Carry-Save Remainders
    What SRTEST Doesn't Need
    Why SRT Division is Hard to Test
    Proofs vs. Tests
    Where to Test
    Strips of Layers of Tiles
    Out of Desperation
    Scattering Samples
    Checking a Quotient
    The Proof of a Pudding is in the Eating
    A Short History
    Bruised Integers
    Illusory Randomness
    Unanswered Questions
    Temptations to be Resisted
    Annotated Bibliography
    Acknowledgements
    Appendix:  Fragility of Low Probabilities
    Appendix:  Vagaries of Carry-Save
    Appendix:  Program Listing

Introduction:
~~~~~~~~~~~~~
A millennium ago, the Moorish craftsmen who built some of Spain's architectural jewels manifested humility by incorporating small flaws into their handiwork lest the deity punish any mortal's presumption of artistic perfection.  To-day's computer architects accomplish the same effect, but not out of humility.  Now complexity punishes the naive presumption that several million transistors can be commanded each to do the right deed at the right moment, and now knowledgeable people expect no modern microprocessor to be perfectly correct.

How much imperfection warrants replacement?  Nobody disputes that a task that can be performed correctly, and whose correctness is provable mathematically, should be performed correctly lest our ability to cope with unavoidable imperfections be overwhelmed.  If there is a dispute, it concerns the extent to which avoidable but humanly inevitable errors should occasion expensive remedies.

Intel faced that question again in the summer of 1994 after a flaw had come to light in their Pentium (TM) processor's floating-point division.  According to Jim Jarrett (1995), Intel's Head of Investor Relations, ...  " A task force was formed immediately to characterize the problem and determine its impact.
The group looked at everything from spreadsheets and computer-aided design programs to fluid dynamics and financial applications.  After a lot of work, a relieved task force reported last summer that the floating-point flaw was truly insignificant.  Specifically, on a statistical basis, a user would encounter reduced precision in floating-point divide operations once every 9 billion random number pairs.  For the average spreadsheet user performing 1,000 floating-point divide operations daily, this meant reduced precision once every 27,000 years.  The user was much more likely to encounter failures in other PC components; for example, a soft error in a DRAM chip occurs statistically once every seven years in an 8-MB array.  ...  It was insignificant enough that it wasn't even listed on an errata sheet.  The masks were changed to fix the flaw."

Many a statistician and almost all engineers could find no fault with the task force's analysis nor with the conclusions drawn above nor with Intel's consequent decision to replace defective Pentiums only for customers who could demonstrate that its flaw put their work at risk.  That seemed to be a sound decision considering that Pentiums had been running for over two years before its divider's flaw had been noticed.

I disagree.  I shall try to explain how flaws like Pentium's can stay hidden for astoundingly long times almost regardless of failure rates.  I shall try to explain why, regardless of the correctness of failure rate statistics for random divisions, subsequent probabilistic conclusions were illogical and could not justify the decision to replace defective chips only grudgingly.  That decision was soon rescinded, but for reasons having more to do with lawsuits, public relations and public confidence than with doubts about its probabilistic justification.

What are the right probabilistic conclusions?  Too little is known now to support scientifically any better assessment of risk than has been tendered by The MathWorks' Dr. Cleve Moler (1995):  " With the Pentium, there is a very small chance of making a very large error."  To his assessment this paper will add a strong suggestion, but short of proof, that practically all users of defective Pentiums must fall into two classes.  By far the larger class consists of users who are so unlikely to be harmed by the defect that no underwriter could do better than guess at the premium they should pay for insurance against harm.  The smaller class consists of users almost certain to be harmed by the defect almost regardless of the frequency of divisions in their work.  Unfortunately, only for users who perform no floating-point divisions ( and hence face no hazard ) do we know a better way than trial and error to determine the class into which any particular user falls.

Our conclusions leave unanswered the question Intel faced as Jarrett (1995) rephrased it:  " What happens in the future if we discover a flaw that produces a problem once every 90 billion random number pairs; do we still need to replace processors?  What about once every 900 billion?  No processor is ever perfect, and if the answer to every bug -- no matter how minor -- is to replace the processor, what does this do to the economics of computing?  What happens to the pace of innovation in an industry paralyzed by caution?"  Questions like these deserve answers, but these do not because they beg the question.
They take for granted a kind of randomness that makes sense for transient disruptions to DRAMs from alpha particles but makes no sense for a reproducible and systematic arithmetic bug like Pentium's. The lower the failure rate for random number pairs, the less relevant it is to a realistic appraisal of the bug's impact, since hardly anyone computes quotients of uncorrelated random numbers. The impact of a systematic bug is best revealed through preponderantly non-probabilistic considerations. Complexity is one of those considerations. If the bug can be awakened only by extremely complicated ( and therefore presumably unlikely ) combinations of circumstances, that bug may be eligible for promotion to a "feature" in the documentation provided also that a relatively simple software patch will let the sleeping bug lie. Conversely, if someone unaware of the bug's existence may write a very simple program that awakens that bug to destructive action, that bug is dangerous. A simple program that will expose a bug without prior knowledge of that bug's location makes a good test program. Conversely, if this test program is general enough and simple enough, that bug is dangerous. SRTEST is such a test program. Its design predates Pentium's by a decade. It hits Pentium's bug first on the 356 th division tested and again 2,294 times in the first million tested. And SRTEST is very simple; about half a dozen arithmetic instructions in its inner loop generate test divisors and dividends, and another half dozen test whether quotients are correctly rounded. That no such test was known to Pentium's designers is regrettable; it is also hard to believe that so simple a test has remained entirely unknown until now. Two integer parameters sum up what SRTEST has to know about the SRT divider under test: one estimates how many leading bits of divisor go into the quotient digit selection circuitry; the other estimates each digit's width. ( These are 5 and 2 for Pentium.) These estimates need not be perfect. SRTEST need not know how the divider manipulates remainders nor whether they are held in a carry-save form. Though the strategy behind SRTEST is simple, its derivation is not, nor is it yet logically complete; in short, SRTEST works better than I can explain. None the less, its existence, its simplicity, and its high yield of division failures cast doubt upon appraisals of risk inferred from randomly generated failures. How SRTEST was derived and what it does will be explained. I cannot say how far to trust SRTEST, nor how far to trust alleged proofs of correctness since they can lie too. Still, Intel's experience should teach us what not to do again; that will be summarized. Technical details about the fragility of extremely low probabilities and the vagaries of carry-save adders have been banished to appendices along with a listing of the program SRTEST. SRTEST's Needs and Limitations: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SRTEST needs values for two of the integer parameters that distinguish different variants of SRT division; they will be specified later. SRTEST presumes that floating-point arithmetic operations other than division, namely addition, subtraction and multiplication, are all performed correctly as prescribed by IEEE Standard 754 for Binary Floating-Point Arithmetic. 
Some versions of SRTEST perform also integer operations upon the exponent and logical manipulations upon the significand of a floating-point number but, if portability is assigned precedence over speed, these can all be replaced by slightly slower standard floating-point arithmetic operations including IEEE 754's LogB and ScalB functions. Every version of SRTEST is intended to run on floating-point hardware whose division, whether built-in or simulated in software, is the sole rational arithmetic operation in doubt. Except in that operation, roundoff never figures in versions of SRTEST that perform logical manipulations upon significands. SRTEST is not intended to test efficiently whether division is rounded correctly; that task is performed by my program DIVCK (1986), which widespread use has shown to be effective. By the way, Dr. D. Hough of Sun Microsystems has reported that DIVCK, though not designed to do what SRTEST does, found an error in Pentium's single-precision division after two minutes' running. That is dumb luck, and bad luck for Intel where DIVCK had been run on a simulation of Pentium too slow to run long enough. Thus do ever more ambitious designs overwhelm our simulators and other design aids; to redress the imbalance we must invest more design effort into tests better focused upon our designs' vulnerabilities, if we can figure out what they are. SRTEST is focused; it is intended to test efficiently only the logic that selects quotient digits in SRT division algorithms. Any other error SRTEST may uncover will have been uncovered accidentally. Outline of SRT Division: ~~~~~~~~~~~~~~~~~~~~~~~~~~ Almost nothing need be known about Non-Restoring (SRT) division to appreciate what SRTEST does. That division process will be outlined briefly here in terms chosen to resemble as much as possible the way Long Division has long been taught in schools. During the process, each quotient digit ( it can consist of more than one bit ) is selected from a table of eligible digits; that table is often implemented as a PLA. It is best viewed as if organized in rows and columns. A few leading bits of the divisor determine which column of the PLA will supply all the quotient digits for any given instance of division. A few leading bits of the current remainder ( initially it is the dividend ) determine the row. The quotient digit extracted from the selected row and column is then multiplied by the divisor and their product subtracted from the current remainder; the difference is the new remainder that will determine the next quotient digit. After sufficiently many quotient digits have been extracted by the foregoing process, they are assimilated into a conventional quotient and rounded to produce the final result of the division operation. Every digit-by-digit divider has a Radix, which indicates roughly the width of each quotient digit. For long division as taught in schools, where Radix = 10 (decimal arithmetic), the quotient digits come from the set { 0, 1, ..., Radix-1 }. Most computers use a power of 2 as radix; for quotient digits L bits wide, Radix = 2^L . SRT division derives its speed advantage by selecting each quotient digit from a redundant set of digits. For instance, when each digit is one bit wide it will be selected from the set { -1, 0, 1 } ; when two bits wide it may well be selected from { -2, -1, 0, 1, 2 }, as happens on Pentium. 
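To make the outline above concrete, here is a minimal sketch in C ( not a description of Pentium's circuitry, and no part of SRTEST proper ) of a radix-4 SRT loop using that digit set.  A real divider selects each digit from a PLA indexed by a few leading bits of the divisor and remainder; the sketch, purely for clarity, selects digits from the full remainder, which is exactly why toy code like this cannot exhibit a PLA bug.

    /* srt_sketch.c -- an illustrative radix-4 SRT division loop whose
       quotient digits come from the redundant set { -2, -1, 0, 1, 2 }.   */
    #include <stdio.h>
    #include <math.h>

    /* Develop about 2*ndigits quotient bits of x/y for x, y in [1, 2). */
    static double srt_divide(double x, double y, int ndigits)
    {
        double r = 0.25 * x;   /* pre-scaled remainder, so |r| <= (2/3)*y */
        double q = 0.0;        /* assimilated quotient                    */
        double w = 1.0;        /* weight of the next quotient digit       */
        for (int k = 0; k < ndigits; k++) {
            /* Stand-in for the PLA lookup: the digit nearest 4r/y, clamped
               to the digit set.  Redundancy ( |4r - d*y| <= (2/3)*y ) is
               what leaves room for the sloppier estimates hardware uses. */
            int d = (int)floor(4.0 * r / y + 0.5);
            if (d >  2) d =  2;
            if (d < -2) d = -2;
            r = 4.0 * r - d * y;   /* new remainder; d is never revised   */
            q += w * d;            /* negative digits undo earlier excess */
            w *= 0.25;
        }
        return q;   /* the 1/4 pre-scaling and the digit weights cancel   */
    }

    int main(void)
    {
        double x = 1.75, y = 1.25;
        printf("SRT sketch : %.17g\n", srt_divide(x, y, 30));
        printf("built-in  /: %.17g\n", x / y);
        return 0;
    }

With 30 radix-4 digits ( 60 bits ) the assimilated quotient ought to agree with the built-in double-precision quotient to within a unit or so in its last place.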
Redundancy renders unnecessary the revisions of ill-guessed quotient digits that may be remembered as part of long division as it is taught in schools. Instead, subsequent digits get chosen in a way that compensates for any slightly ill-guessed quotient digit. The speed advantage comes about because, once chosen, no quotient digit need be revised. The disadvantage is that a quotient can have many redundant representations from any one of which its conventional binary representation must ultimately be derived by an assimilation process akin to subtraction though potentially faster. Carry-Save Remainders: ~~~~~~~~~~~~~~~~~~~~~~ There is another complication: SRT remainders may be represented in a " Carry-Save " way. That representation of a binary number consists of two numbers whose sum is the number represented, but which are not added explicitly to produce that sum lest doing so impose a delay while carries propagate a long way from right to left. Circuitry to add or subtract a third number to or from the two can be designed to produce a result in carry-save form ( as a sum of two new numbers ) extremely quickly and compactly too because it need not provide for propagations of carries or borrows. The wider the numbers, the greater is carry- save's advantage in speed and compactness. For high-precision SRT division like Pentium's (64 sig. bits), carry-save's advantage is too great to forego. The disadvantage is that each numerical value has vastly many different carry-save representations, which complicates the designer's task a little and the tester's task a lot. Those complications are explained in detail in the Appendix: Vagaries of Carry-Save. What SRTEST Doesn't Need: ~~~~~~~~~~~~~~~~~~~~~~~~~~~ SRTEST is oblivious to all the details of the division process except the divider's Radix and the number N of leading bits of divisor used to select a column from the PLA. The PLA must have 2^(N-1) columns. ( Why not 2^N ? Because a normalized divisor's first leading bit is always 1 .) The time taken by SRTEST is proportional to the number of columns, other things being the same. Therefore, if N is not known, supplying an over-estimate instead to SRTEST will merely slow its work. ( N = 5 for Intel's Pentium, whose PLA has 16 columns.) Similarly, an over-estimate of the Radix is expected only to retard SRTEST. ( Pentium's Radix = 4 .) However under-estimating Radix or N may invalidate the test by preventing or excessively delaying SRTEST's inspection of parts of the PLA. SRTEST is oblivious to the number of rows in the PLA. That number is determined by the number of leading bits of the current remainder used to select a row. ( Intel's Pentium uses 7 bits, so its PLA has 128 rows.) Increasing the number of rows is expected to slow SRTEST proportionately, but that expectation has not yet been tested. SRTEST is oblivious to the entries in the PLA whose correctness is being tested. That is not so paradoxical as it may seem at first; we care only that the PLA as a whole be correct regardless of how it was constructed. Besides, we must be grateful that SRTEST does not ask us for a copy of the PLA; Pentium's table has 2048 entries. A characteristic of SRT division is that many of the entries in its PLA could be changed without invalidating the division process. There are two reasons for this. One is that over a quarter of the PLA's entries, particularly the topmost and bottommost few in each column, are " Don't Cares ": they never get selected by the division process. 
Redundancy in the set of quotient digits is the second reason; this implies that many an error in an early quotient digit can be put right by changing all subsequent digits in ways that will cancel the error at the end when all digits are assimilated into a correct final quotient. SRT division's limited capacity for self-correction is advantageous in some respects; it lets circuit designers vary quotient selection logic within limits in order to optimize some aspect ( cost or speed ) of a design. Neither we nor SRTEST wish to be distracted by variations, in quotient digit sequences, that yield the same quotient at the end. Why SRT Division is Hard to Test: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SRT division's capacity for self-correction is limited in a way that is vulnerable to errors of a kind that we may expect to uncover only by extravagantly prolonged testing. Some entries in the PLA, especially those adjacent to entries that can never be selected, are selected in so relatively few instances of division that errors in those entries big enough to defeat self-correction relatively often could still stay hidden until one of a few critical examples were tested. Other entries in the PLA, though selected relatively frequently, could be altered in a way that would fail to be self-corrected only in relatively rare instances; such errors too could remain unexposed until one of those rare critical instances were tested. And representing remainders in a Carry-Save fashion further rarifies critical instances, leaving them excruciatingly difficult to uncover; see the Carry-Save Appendix. The statistics of those critical instances are appallingly worse than anyone had suspected before Pentium's FDIV bug came to light. Its PLA had errors, 0 instead of 2 , in five out of 2048 entries. These gross errors were not uncovered until too late despite billions of divisions tested. Alas, these five entries lay along the boundary between entries that do and entries that never get selected, so only a very tiny fraction like 1/8,000,000,000 of all possible divisions ever selected them and therefore failed to be self-corrected. This tiny fraction was estimated, after the errors had come to light, in an Intel "white paper" by Dr. M.L. Barton and H.P. Sharangpani (1994). They obtained two estimates, one from a model by Markoff chains, the other from a count of the number wrong in more than 10^12 double-precision divisions tested at random on Pentium. The estimates agreed. Barton and Sharangpani also observed that defects appeared uniformly distributed in erroneous quotients from their 14th to 52nd sig. bits; and in so far as divisions were infrequent in applications software they inferred that all but a tiny fraction of Pentium users were extremely unlikely ever to be affected adversely by these defects. If random testing were the only way to test SRT division, we could not afford it. Neither can we afford the damage done by errors that, as we shall see, can occur far more often than the tiny fraction cited above suggests. The tiny fraction merely tells us that random testing of SRT dividers is too wasteful unless it fails early, and that the first million or so customers who use such dividers can easily overlook something wrong for at least several months, rather than that errors unexposed by billions of random tests must occur too rarely to matter much to most of us. We shall revisit these issues. Proofs vs. 
Tests:
~~~~~~~~~~~~~~~~~~~
If an SRT divider's PLA is correct, it can be proved correct mathematically, so why not do just that and skip bothersome testing?  The designers of Pentium's PLA did prove it correct mathematically, or so they thought, but their "proof" of correctness was incorrect.  They had " simplified " it by presuming their PLA sign-symmetric.  A natural presumption, it can be found also in J. Fandrianto (1987), though he had to modify his PLA a little to make it work correctly.  That presumption is slightly incompatible with carry-save remainders, but by itself it did no harm to Pentium.  However, a characterization of Pentium's PLA, submitted to the CAD tool that laid it out, was inadvertently incomplete, rendering five entries in the PLA invalid.  Unluckily, those invalid entries were overlooked because they fell in that half of the PLA that was presumed identical but for sign to the half that was checked for correctness.  As if in collusion, the two mistakes concealed each other; still, they need not have been fatal.  The designers' fatal mistake, induced by faith in their "proof" of correctness, was not to contemplate whether their divider differed enough from Intel's previous practice to require new tests.  Instead, the corporate department responsible for testing ran its old tests -- random tests and DIVCK -- on a simulation of the new design.  We know now why those tests turned up nothing wrong.  Reviewing the sequence of events, many a practitioner must be thinking " There go I but for the grace of ... ."

Proofs are not infallible.  In the mid 1980s the implementation of floating-point on the Inmos Transputer T-800 was "proved" correct mechanically.  Its specifications had been spelled out in a language called " Z " and an implementation microcoded in " Occam," both languages so contrived that the implementation's compliance with its specifications could be confirmed by a computerized theorem prover.  However the specifications contained two mistakes.  One was unintended: the floating-point number  -0  had been confused with the most negative binary integer, whose bit-pattern is the same.  The other mistake was misguidedly intentional: the flags for Invalid Operation, Overflow and Divide-by-Zero were merged into one flag.  Consequently the T800 occasionally mishandles  -0 , and spuriously signals INVALID when a computation creates and then consumes infinity in a way that would be innocuous if it conformed to IEEE Standard 754.  Early advertisements that perfect conformity had been "proved" are still believed widely.

Come to think of it, why should specifications and proofs be any less vulnerable to error than designs and programs?  Specifications are mere translations of good intentions into formal language; something can get lost in translation, as poets and contract lawyers know.  Proofs run to considerably greater lengths than the designs and programs they are intended to validate, so proofs have a larger capture cross-section for error.  We yearn for the sense of security that proofs provide, but certainty eludes us and security is an illusion.  That is not to say that proofs are worthless.  Quite the contrary:

   +---------------------------------------------------------------+
    \   The best testimony to a design's correctness is a proof.  /
     \   But it is testimony of uncertain credibility without    /
      \   corroboration.  The best corroboration is another     /
       \   independent proof provided it is not too long.      /
        +-------------------------------------------------------+
Tests provide corroboration too, some more so than others.  Testimony from purely random tests is undermined by a suspicion that too many of them may have fallen into terrain already tested instead of into Terra Incognita.  That is how random tests of Pentium's SRT division were wasteful; they almost all exercised only the PLA's non-extreme rows, leaving extreme rows inadequately tested until after billions of tests.  Regardless of their inefficiencies, tests are indispensable; tests of devices "proved" correct actually test our own thinking.  And since test programs can deceive us so easily, they too must be assayed for efficiency, proved correct and tested.  Of doubts there can be no end but instead a Law of Diminishing Returns.  To enhance the returns from tests we must concentrate them where they will do more good.

Where to Test:
~~~~~~~~~~~~~~
" Cherchez la femme ! "  ...  attributed by Alexandre Dumas to his fictionalized detective Joseph Fouché.

Our motto will be " Seek singularities ! "  There lie all errors.  In the theory of complex variables, the singularities of an analytic function are the boundary points of its domain of analyticity.  Inside its domain an analytic function is smooth, infinitely differentiable.  An aphorism attributed to Riemann goes roughly like this:  You tell me how an analytic function behaves at and near every one of its singularities; then I shall tell you what it does everywhere else.  When that aphorism was first uttered, analytic functions being studied all had countably many singularities, so characterizing a function by its singularities was the ultimate kind of data compression.

Logical functions of numerical arguments are piecewise constant; their singularities are the arguments where the functions change value.  To know a logical function at and near its every singularity is to know it everywhere.  To prove the logical function of the PLA correct SRTEST need check it only at and near every ( whew! ) numerical input where either the state of the PLA or the desired result changes.  How near is "near" ?  Roundoff blurs that question, so SRTEST is designed to avoid interference from roundoff, as we shall see later.

Let us think of the singularities of the SRT PLA as points plotted in the ( Divisor, Dividend ) - plane.  We need consider only the square  1 <= Divisor < 2  and  1 <= Dividend < 2  since floating-point exponents do not interact with the PLA.  Singular divisors are those at which selection switches from one column of the PLA to the next, plus endpoints; there are  2^(N-1) + 1  at

    Divisor  =  1,  1 + h,  1 + 2h,  1 + 3h,  ...,  2 - 2h,  2 - h,  2     where  h = 1/2^(N-1) .

They cut the square into  2^(N-1)  vertical strips, one for each column of the PLA.  Each of those strips breaks into tiles across whose boundaries selection switches from one row of the PLA to another.  In the ( Divisor, Dividend ) - plane, all the tiles' boundaries constitute the singularities of the SRT PLA.  There are very many of them, far too many for anyone to test them all.  At best, we might hope to test some singularities in a reasonable order.  Some of those boundaries lie on lines that are easy to describe.  The easiest, as we have seen, are the vertical boundaries of the strips.  Next easiest are the horizontal lines across which selection for the first quotient digit switches.  The boundaries across which switching occurs for subsequent quotient digits are not so easy to describe.
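For concreteness, here is a tiny sketch in C ( assuming a Pentium-like N = 5 ; it is not part of SRTEST ) that enumerates the  2^(N-1) + 1 = 17  singular divisors just described.  Scaled up by  2^(N-1) = 16  they are simply the integers 16 through 32, five of which ( 18, 21, 24, 27 and 30 ) reappear later as the column boundaries adjoining Pentium's flawed PLA entries.

    /* singular_divisors.c -- enumerate the singular divisors for a divider
       whose PLA columns are selected by the leading N bits of the divisor. */
    #include <stdio.h>

    int main(void)
    {
        const int N = 5;                 /* leading divisor bits to the PLA */
        const int cols = 1 << (N - 1);   /* number of PLA columns: 16       */
        const double h = 1.0 / cols;     /* width of each vertical strip    */
        for (int j = 0; j <= cols; j++) {
            double d = 1.0 + j * h;      /* singular divisor in [1, 2]      */
            /* Scaled up by 2^(N-1) these are just the integers 16 ... 32. */
            printf("boundary %2d :  Divisor = %.6f  =  %2.0f/%d\n",
                   j, d, d * cols, cols);
        }
        return 0;
    }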
Strips of Layers of Tiles:
~~~~~~~~~~~~~~~~~~~~~~~~~~
In each vertical strip, each quotient digit position generates a layer of overlapping tiles across whose non-vertical boundaries that digit's selection changes from one row of the PLA to the next.  For each succeeding quotient digit position, the tiles in its layer become more numerous and more complicated to describe.  The tiles' shapes depend upon an arcane question about the division operation's implementation:  Are remainders represented in a Carry-Save fashion ?  If remainders were NOT represented in carry-save fashion, every tile in the K-th quotient digit position's layer would have a simple shape consisting of a parallelogram with two opposite sides embedded in the boundaries of a strip and two whose slope is a quotient with zeros in the K-th digit and beyond.  Carry-save complicates the tiles; they can have complicated interpenetrating shapes depending upon how the remainder's two carry-save components combine their leading several bits before they go to the PLA to select a row.  This is the kind of detail we wish not to have to explain to SRTEST, which does not know even how many rows the PLA has.  Were we to try to take such details into account we might have to write a test program so complicated as would blatantly invite blunders.  Let us put off thinking about details unless and until we are compelled to abandon hope for a simple test.

SRTEST generates and tests sample points ( Divisor, Dividend ) each of which touches one tile from each of as many layers as there are digits in a computed quotient.  Each tile is the image, in a strip of the ( Divisor, Dividend ) square, of an entry in the PLA selected for the quotient digit position corresponding to the tile's layer.  If that entry is erroneous then, amongst its images, there must be some tiles which, if touched in a suitable place by a test, will expose the error.  Other tiles, though images of that erroneous entry, will when tested conceal the error because SRT division is partially self-correcting, so some of its quotients may be correct despite the error.  None the less, in order to expose whatever error may exist, testing must ultimately touch every tile; moreover, testing must touch one of the places in an erroneous tile where self-correction fails though such places may occupy a relatively tiny fraction of the tile's area.  How can SRTEST touch those places if it shall not know where they are?

Out of Desperation:
~~~~~~~~~~~~~~~~~~~
SRTEST knows locations for no other singularities than the boundaries of the  2^(N-1)  vertical strips, so it distributes its tests with ever increasing density along and next to those boundaries.  If only to turn this necessity into virtue, we shall contemplate two hypotheses:

Hypothesis V :
~~~~~~~~~~~~~~
  Every tile has at least one edge that runs along the vertical boundary of a strip and, whenever self-correction fails, it fails at or next to such an edge ( and perhaps elsewhere too ).

Hypothesis W :
~~~~~~~~~~~~~~
  Self-correction fails in regions shaped roughly like triangles, sometimes very small, of which the bigger ones press against tiles' corners.

Hypothesis V is provably true without carry-save representation; my proof exploits the monotonicity of column selection vs. remainder.  And casual exploration of a few examples turned up no counter-example for the kind of carry-save representation I tried.
Hypothesis W comes from empirical observation supported by the following thoughts:  when an erroneous quotient digit, too big or too small, cannot be self-corrected then it is often followed in the correct quotient by at least one digit that is already extreme, minimal or maximal resp.  Quotients that contain a string of extremal digits occur preponderantly at tiles' corners.

If hypothesis V holds then running SRTEST long enough will, by planting a test sample on or next to a vertical edge of every tile, ensure ultimately that the PLA has been tested exhaustively.  But if hypothesis V fails, tiles that do not touch a strip's vertical edge will escape testing by SRTEST, which may then fail to test the PLA exhaustively no matter how long it runs.  Hypothesis W , if true, accounts for SRTEST's efficiency:  random tests farther away from the vertical strip's edges would have to hit smaller triangles of uncorrectable errors and would miss more often.  True or not, the two hypotheses motivated SRTEST.

Scattering Samples:
~~~~~~~~~~~~~~~~~~~
Random number generation comes naturally to mind as a way to scatter sample dividends, but random samples yield fewer hits than SRTEST's simple formula, so it contains no explicit random number generator.

  +-------------------------------------------------------------------+
  | Footnote:  This is not to disparage random testing altogether.    |
  | In general, additional random testing continues to be necessary   |
  | lest testing focused at known singularities miss others that were |
  | overlooked.                                                       |
  +-------------------------------------------------------------------+

Instead, inspired by Hypotheses V and W , SRTEST chooses divisors at and near the singular divisors that mark the vertical boundaries of strips, and chooses dividends that will ultimately, if SRTEST runs long enough, exhaust all integers of some preassigned width.  These choices are determined by three parameters:

    N  =  the number of leading bits of divisors that go to the PLA.
    L  =  Log2( Radix ) ;  i.e.,  Radix = 2^L .
    M  =  maximum bit-width preassigned for divisors.

SRTEST uses  (2*L + 1)*2^(N-1)  divisors, each an M-bit integer of the form

    2.0^M - j*2.0^(M-N) + S     for  j = 0, 1, 2, ..., 2^(N-1)

where S runs through the set  { -2^(L-1), ..., -4, -2, -1, 0, 1, 2, 4, ..., 2^(L-1) } .  These perturbations S provide divisors on and near the boundaries of vertical strips scaled up by  2^(M-1) .  Early versions of SRTEST ran S through the set  { -1, 0, 1 } , but patterns of period L in carry-save remainders were noticed by Tim Coe, who observed that these patterns could severely delay revelation of some kinds of flaws in a PLA.  He proposed that the bigger set could hasten the revelation of flaws even if it did not increase SRTEST's yield ( number of failures found per division tested ).  His proposal, augmented slightly as described below, has worked well so far.

SRTEST's dividends start with consecutive odd integers 1, 3, 5, ... .  These ensure that every row of the PLA will be touched as soon as possible during the earlier quotient digits' selection, and also tend to generate sequences of extremal ( hence uncorrectable ) quotient digits when divided by divisors that are adjacent to the edges of the vertical strips.
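Before going on to how the dividends are perturbed, the divisor recipe above is simple enough to restate in code.  A rough sketch in C, with Pentium-like  N = 5  and  L = 2 ;  the width  M = 27  is merely illustrative, and SRTEST's own enumeration order may differ:

    /* gen_divisors.c -- enumerate trial divisors of the form just given:
       M-bit integers  2^M - j*2^(M-N) + S  with S drawn from
       { -2^(L-1), ..., -2, -1, 0, 1, 2, ..., 2^(L-1) } .                 */
    #include <stdio.h>

    int main(void)
    {
        const int N = 5, L = 2, M = 27;
        const long long base = 1LL << M;        /* 2^M     */
        const long long step = 1LL << (M - N);  /* 2^(M-N) */
        for (int j = 0; j <= (1 << (N - 1)); j++) {
            for (int i = -L; i <= L; i++) {
                /* perturbation S: zero and the signed powers of 2 up to 2^(L-1) */
                long long S = (i == 0) ? 0
                            : (i > 0 ?  (1LL << (i - 1))
                                     : -(1LL << (-i - 1)));
                printf("j = %2d  S = %+3lld  divisor = %lld\n",
                       j, S, base - j * step + S);
            }
        }
        return 0;
    }

Before use each such integer divisor is scaled by a power of 2 , which alters no significant bits.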
That tendency is valuable enough to merit enhancement by the following measures, suggested by Tim Coe's recollections of conversations with Dan Zuras held at Hewlett-Packard Laboratories.  Each odd integer generates three neighboring dividends.  First that integer is normalized into the range  2^(M-1) <= Dividend < 2^M  by scaling by a power of 2 .  Then two more dividends are obtained from that one by perturbing it slightly:  choose a small integer  I >= 0  and add  +2^I  and  -2^I  to the normalized Dividend.  The choice  I = 0  has worked well enough;  I = L-1  has worked better.  That perturbation has enhanced the yield of SRTEST over fiftyfold on Pentium.

Since SRTEST may be run for a long time, some provision for stopping and restarting it is necessary.  To that end, the current odd integer from which three dividends were derived is displayed continually and is available for initialization when SRTEST is restarted.  Integer formulas prevent the troubles that could be caused by faults in a compiler's decimal-binary conversion of nonintegers, especially if that conversion uses the floating-point divide instruction that SRTEST is testing.  Analogous hazards inherent in binary-decimal conversion of displayed output have been addressed by displaying also in hexadecimal.

Checking a Quotient:
~~~~~~~~~~~~~~~~~~~~
After what ideally would be  Q = Dividend/Divisor  has been computed and rounded off, how can its correctness be ascertained without using possibly incorrect division operations?  SRTEST computes the remainder  R = Dividend - Q*Divisor  exactly and checks that it is no bigger than is compatible with a correctly rounded Q .  Two details deserve a few moments' attention:  How is R computed exactly?  How big is " compatible " ?

Provided floating-point addition, subtraction and multiplication work correctly, R can be computed exactly if each factor of Q*Divisor is first broken into pieces that can be multiplied without roundoff.  For instance, suppose Ql is obtained from Q by rounding it off to half as many significant bits as would normally be carried; how to do that will be discussed later.  Then  Qt = Q - Ql  is computed exactly with a floating-point subtraction, and consists of the trailing half of the significant bits of Q .  Similarly split up  Divisor = Dl + Dt .  Then no product  Ql*Dl, Ql*Dt, Qt*Dl, Qt*Dt  engenders any roundoff, so

    R  =  ((( Dividend - Ql*Dl ) - Ql*Dt ) - Qt*Dl ) - Qt*Dt

exactly provided the subtractions occur without roundoff.  For the sake of simplicity and speed, SRTEST chooses divisors with  Dt = 0 , so

    R  =  ( Dividend - Ql*Divisor ) - Qt*Divisor

suffers no rounding error and is therefore computed exactly.  ( Should such half-width Divisors be found inadequate for some reason, SRTEST could easily be modified to accommodate wider Divisors.)

The simplest way to break Q into a sum  Ql + Qt  is to copy Q to Ql and then write zeros over as many trailing significant bits of Ql as suffice to ensure that Ql*Divisor will suffer no roundoff.  This bit-twiddling can be replaced by a portable but slower alternative:

    T = H*Q     and     Ql = ( Q + T ) - T

using IEEE Standard rounded floating-point arithmetic and a constant  H = 2^M - 1  in which M is the number of trailing bits to be cleared from Q .  This alternative must be used when floating-point arithmetic carries an odd number  2M-1  of significant bits, in which case M must barely exceed half that number and every Divisor must be at most M bits wide; then  Qt = Q - Ql  will turn out at most M-1 bits wide but possibly negative.  In any event, IEEE Standard 754's rounded floating-point arithmetic is more than good enough to deliver Qt with no rounding error, and likewise R .
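Putting the pieces together, here is a minimal sketch in C of the whole check for one division, assuming IEEE 754 double precision ( 53 significant bits ), a divisor whose significand is at most 26 bits wide, and the error bound derived in the next paragraph.  The splitting below keeps the leading 26 bits of Q by scaling and truncation, one portable way of writing zeros over its trailing bits; the function and variable names are mine, not SRTEST's.

    /* check_quotient.c -- check one division  Q = X/D  without trusting
       any other division.                                                */
    #include <stdio.h>
    #include <math.h>

    /* Nonzero if |X - Q*D| is smaller than half an ulp of Q times D ,
       i.e. if Q could be the correctly rounded quotient.                 */
    static int quotient_ok(double X, double D, double Q)
    {
        int qe;
        frexp(Q, &qe);                  /* Q = f * 2^qe with 0.5 <= f < 1 */

        /* Split Q = Qh + Qt so that Qh*D and Qt*D incur no roundoff:
           Qh keeps the leading 26 significant bits of Q.                 */
        double Qh = ldexp(floor(ldexp(Q, 26 - qe)), qe - 26);
        double Qt = Q - Qh;             /* exact: the trailing bits of Q  */

        /* The remainder, with no roundoff in these subtractions.         */
        double R = (X - Qh * D) - Qt * D;

        /* Half an ulp of Q scaled by the divisor; this is the
           ScalB( Divisor, LogB(Q) - 53 )  of the next paragraph.         */
        double bound = ldexp(D, (qe - 1) - 53);

        return fabs(R) < bound;
    }

    int main(void)
    {
        double D = ldexp(41943039.0, -25);  /* 26-bit significand, in [1,2) */
        double X = 1.5;                     /* some dividend in [1,2)       */
        double Q = X / D;                   /* the operation under test     */
        printf("Q = %.17g : %s\n", Q, quotient_ok(X, D, Q) ? "passes" : "FAILS");
        return 0;
    }

SRTEST proper works from the integer divisors and dividends generated as in the previous section; the operands in main above are merely placeholders.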
How big can the remainder R not get if the quotient Q is correctly rounded?  Then the error in Q must be smaller than a floating-point quantity Qe obtained by clearing to zero all the significant bits of Q except its ( possibly implicit ) leading 1 , and decrementing its exponent by the number of significant bits carried.  Just as Qe should exceed the magnitude of  Q - Dividend/Divisor , so Qe*Divisor should exceed the magnitude of the remainder R ; otherwise floating-point division is flawed and fails the test.  Once again, bit-twiddling can be replaced by a portable alternative:

    Qe*Divisor  =  ScalB( Divisor, LogB(Q) - NumberofSigBits )

wherein ScalB and LogB are functions recommended by IEEE Standard 754 but " more honour'd in the breach than the observance."  The C functions frexp and ldexp can be used instead but less conveniently.  Similar steps assist normalization and perturbation of dividends.

The Proof of a Pudding is in the Eating:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The earliest version of SRTEST was programmed in Fortran and run on a Pentium in December of 1994; it found a flaw in the 13,800 th quotient tested.  When the current version was programmed and run in February of 1995, it took a tiny fraction of a second to uncover the first failure in the 17 th decimal of the 356 th quotient tested:

        3221225470     BFFFFFFEh
        ----------  =  ---------  =  0.9999999996895591417 ,
        3221225471     BFFFFFFFh

                       not Pentium's  0.9999999996895591047 .

If SRTEST were less sensitive to errors in quotients it might have had to wait for the 686 th quotient tested to uncover a failure in its 8 th decimal:

        2684354558     9FFFFFFEh
        ----------  =  ---------  =  0.6666666663 ,
        4026531839     EFFFFFFFh

                       not Pentium's  0.6666666347 .

The worst relative error found, about  1/2^14 = 6.1 E-5 , occurred in the 50,096 th quotient tested:

        2148268000     800BF7E0h
        ----------  =  ---------  =  0.666910 ,  not Pentium's  0.666869 .
        3221225471     BFFFFFFFh

Almost the same error had first occurred in the 18,932 nd quotient tested.  That Pentium's floating-point division cannot do much worse has been confirmed by A. Edelman as an extension of results obtained by Coe and Tang (1995) to be mentioned below.

Like most gleaners, SRTEST ran into the Law of Diminishing Returns after running for a while; its initial yield of one faulty quotient per 200 or 300 tested dwindled to one per 600,000 .  Here is a tabulation that shows how the yield dwindled:

        Total Number of       Total Number of
        Failures Found        Divisions Tested
        ~~~~~~~~~~~~~~~       ~~~~~~~~~~~~~~~~
                  1                      356 -             684
                  4                      836 -           1,119
                 10                    2,275
                 48                    9,476 -          10,194
                100                   20,756 -          20,950
                450                   99,956 -         100,284
              1,000                  285,715
              2,295                  999,625 -       1,000,285
              7,884                9,998,470 -      10,003,509
             10,000               19,424,950 -      19,425,865
             17,668               99,898,960 -     100,002,685
             30,506              994,345,751 -   1,001,442,070
             53,935            9,999,879,331 -  10,000,743,330
            100,000           47,689,282,440
            112,670           56,902,042,535 -  ...

Though this yield far surpasses the one per 8,000,000,000 that random testing would yield, its dwindling is significant for reasons to be discussed later.

A Short History:
~~~~~~~~~~~~~~~~
Early in the summer of 1994, when comparing another processor's test results with Pentium's, Intel discovered that the latter's floating-point division was defective, but kept silent in adherence to a long-standing policy ( reversed nine months later ).  In October Prof. Thomas M. Nicely, of Lynchburg College Math. Dept. in Virginia, traced a discrepancy between his computations on other computers and on Pentium to that defect and brought it to public attention.  He deserves a medal for checking his work.
Then Intel compounded its error by attempting to minimize the bug, thereby ensuring that it would hatch into a major embarrassment.  Soon afterward Tim Coe, at Vitesse Semiconductor Corp., Camarillo CA., reverse-engineered Pentium's divider well enough to predict, contrary to opinions expressed at that time ( cf. Anonymous (1994)), that Pentium's single-precision division could suffer relative errors as bad as  6.1 E-5 :

        4195835     4005FBh
        -------  =  -------  =  1.33382 ,  not Pentium's  1.33374 .
        3145727     2FFFFFh

Coe had deduced that five entries in Pentium's PLA that should be 2 were 0 , and that therefore divisors very slightly less than the five integers 18, 21, 24, 27 and 30, or numbers obtained by scaling them by powers of 2 , were the divisors most likely to cause errors.  These are five of the seventeen singular divisors near which SRTEST focuses its testing.  In December 1994, Coe's deductions were confirmed in the white paper by Barton and Sharangpani, which had been released by Intel to support its estimates of failure rates like one division in 8.8 billion, one failure in 27,000 years for most spreadsheet users, and so on.  Prof. Nicely had estimated the failure rate for random divisions at one in 31 billion.  Estimates like these were cited to justify Intel's public stance:  Pentium is safe for almost everybody.

Meanwhile all numbers very slightly less than a small positive integer had come to be called " Bruised Integers " on the InterNet, where a brisk correspondence flourished among participants ranging from idle curiosity seekers to technology managers fearful of worse revelations to come.  Since all failures found initially involved bruised integers, an analyst within IBM decided to investigate how often they arise out of decimal-binary conversion errors; for example, " 0.3/0.1 " incurs two rounding errors before the division is performed, and that is why double-precision calculation of that quotient produces the bruised integer 2.999...999 , in error by  1/2^51 .  Although its author did acknowledge that his was a " simplistic analysis," it inspired IBM to suspend sales of Pentium-based PCs for fear that a spreadsheet user might encounter a mishap every 24 days on average.  IBM's public stance was that Pentium is unsafe for almost everybody.

Nobody will ever know which estimate, Intel's or IBM's, is closer to the truth unless someone is employed for several months to compare results for identical wide-ranging real-life tasks run on Pentiums with and without the division bug, or else run on a flawed Pentium using a slow counting option in the software patch described below.  Since almost everybody can see how self-interest aligns with IBM's and Intel's public stances, the prevailing opinion ( equally unjustified ) is that the truth must lie somewhere between.  This numb skepticism is a predictable consequence, well known to litigators and plotters of political campaigns, of conflicting pronouncements, some reassuring and others frightening.  The public interest would be served better by the promulgation of do-it-yourself triage kits designed to help individuals assess each his own exposure to hazard.
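As an aside, the bruised integer blamed above on " 0.3/0.1 " is easy to reproduce on any machine with IEEE 754 double precision, flawed divider or not, since it comes from decimal-binary conversion rather than from division; a two-line demonstration in C:

    /* bruise.c -- neither 0.3 nor 0.1 is representable in binary, so each
       is rounded on conversion; their double-precision quotient is the
       bruised integer mentioned above, short of 3 by 1/2^51 even on a
       correct divider.                                                   */
    #include <stdio.h>

    int main(void)
    {
        double q = 0.3 / 0.1;
        printf("0.3 / 0.1 = %.17g\n", q);        /* 2.9999999999999996    */
        printf("3 - that  = %.17g\n", 3.0 - q);  /* 4.4408920985006262e-16,
                                                    i.e. exactly 1/2^51   */
        return 0;
    }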
Bruised Integers:
~~~~~~~~~~~~~~~~~
To help assess the hazards posed by Pentium's FDIV bug, a summary of what is known about it will be presented here.  Five entries in its PLA are wrong, 0 instead of 2 , in columns belonging to divisors barely less than the five integers  18 = 12h,  21 = 15h,  24 = 18h,  27 = 1Bh  and  30 = 1Eh  and multiples of them by powers of 2 ; these five, of the seventeen singular divisors near which all of SRTEST's tests occur, are all multiples of 3 .  Divisions tend roughly -- not infallibly -- to fail less often as divisors fall farther short of these singular divisors; that is why most failures have divisors that are bruised integers.  Division never fails at divisors too far from singular;  Tim Coe and Dr. P-T. Peter Tang (1995) have demonstrated that every divisor for which division fails on Pentium must, when scaled by a suitable power of 2 , fall into one of these five narrow intervals:

    11.FCh  =  18 - 1/64  <  Divisor  <  18  =  12.00h ,   or
    14.FCh  =  21 - 1/64  <  Divisor  <  21  =  15.00h ,   or
    17.FCh  =  24 - 1/64  <  Divisor  <  24  =  18.00h ,   or
    1A.FCh  =  27 - 1/64  <  Divisor  <  27  =  1B.00h ,   or
    1D.FCh  =  30 - 1/64  <  Divisor  <  30  =  1E.00h .

These intervals must be practically best-possible because Tim Coe has found failures as close to their lower boundaries as these divisors:  17.FC01FF02ECh ,  1A.FC01FF03Eh ,  1D.FC01FF006Ch .  ( He has also predicted that failures with random operands should occur in quotients' 14 th bit-position half as often as in the 16 th and subsequent even bit-positions, where he expects failures to occur uniformly distributed.)

Knowing those intervals, Dr. Cleve Moler, Terje Mathisen of Norsk Hydro in Oslo, and Tim Coe have been able to devise an efficient software patch to work around defective Pentiums.  Briefly, the patch tests every divisor and decides very quickly whether it lies in one of those intervals, in which case the potentially incorrect division is preceded by multiplications of divisor and dividend by 15/16 .  That moves the divisor out of danger and, thanks to Pentium's Double-Extended registers, produces exactly the same quotient as would have been obtained by an unflawed machine from Single- and Double-Precision floating-point operands.  As implemented in The MathWorks' MATLAB and MetaWare's C and C++ compilers, the patch is augmented by an option that, when turned on, counts the instances when a flawed Pentium's division fails; this is the slow counting option mentioned above.

In accord with hypothesis W above, the majority of failures have been found at divisors and dividends that are bruised small integers; however, the extent of that preponderance may be due in part to the way searches for failures have been conducted.  A substantial fraction of observed failures do not have bruised integer dividends and, more ominously, many failures have modest integer divisors.
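The divisor screening performed by that software patch is simple enough to sketch.  The C fragment below shows only the screening step against the five Coe-Tang intervals ( the function name and the use of double precision are mine ); the patch itself then multiplies both operands by 15/16 , exactly, in Pentium's double-extended registers before dividing.

    /* atrisk.c -- screen a divisor against the five narrow intervals
       outside of which Pentium's division cannot fail.                   */
    #include <stdio.h>
    #include <math.h>

    /* Nonzero if |d|'s significand, scaled into [16, 32), lies within
       1/64 below one of 18, 21, 24, 27 or 30.                            */
    static int divisor_at_risk(double d)
    {
        static const double top[5] = { 18.0, 21.0, 24.0, 27.0, 30.0 };
        int e;
        double s = ldexp(frexp(fabs(d), &e), 5);   /* s now in [16, 32)   */
        for (int i = 0; i < 5; i++)
            if (s > top[i] - 1.0/64.0 && s < top[i])
                return 1;
        return 0;
    }

    int main(void)
    {
        printf("%d\n", divisor_at_risk(3221225471.0)); /* BFFFFFFFh : at risk */
        printf("%d\n", divisor_at_risk(3.0));          /* safe                */
        return 0;
    }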
Each row of the tabulation below shows how many failures were counted as dividends ran through the first 2^31 odd integers ( exhausting all 31-bit wide dividends ) for every divisor, each a small integer, in that row.  A disturbing aspect of these failures is that none hit beyond the 26 th sig. bit of the quotient and three quarters of them hit in the 16 th to 24 th sig. bits, so they would degrade single-precision results.

 -----------------------------------------------------------------------
  Five divisors of equal binary widths               Count of division
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~               failures
                                                     ~~~~~~~~~~~~~~~~~
  in Decimal:       4607,   5375,   6143,   6911,   7679          38
  in Hexadecimal:  11FFh,  14FFh,  17FFh,  1AFFh,  1DFFh

  in Decimal:       9215,  10751,  12287,  13823,  15359          36
  in Hexadecimal:  23FFh,  29FFh,  2FFFh,  35FFh,  3BFFh

  in Decimal:      36863,  43007,  49151,  55295,  61439        6355
  in Hexadecimal:  8FFFh,  A7FFh,  BFFFh,  D7FFh,  EFFFh

  in Decimal:      73727,  86015,  98303, 110591, 122879        6642
  in Hexadecimal: 11FFFh, 14FFFh, 17FFFh, 1AFFFh, 1DFFFh

  in Decimal:     147455, 172031, 196607, 221183, 245759        2070
  in Hexadecimal: 23FFFh, 29FFFh, 2FFFFh, 35FFFh, 3BFFFh

  in Decimal:     294911, 344063, 393215, 442367, 491519       28242
  in Hexadecimal: 47FFFh, 53FFFh, 5FFFFh, 6BFFFh, 77FFFh

  in Decimal:     589823, 688127, 786431, 884735, 983039       17648
  in Hexadecimal: 8FFFFh, A7FFFh, BFFFFh, D7FFFh, EFFFFh
 -----------------------------------------------------------------------

Plots of all known failures have revealed little more than is obvious.  ( Some plots seen on the InterNet are misleading because they take no account of decimal-binary conversion errors incurred before plotting.)  Plotted in the ( Divisor, Dividend ) - plane, failures fell into near horizontal grass-like streaks corresponding to vulnerable quotients; in other words, all failures have divisors in the narrow intervals allowed by Coe and Tang (1995), and have peculiar quotients in so far as they contain strings of extremal quotient digits.  Perusal of hundreds of thousands of failures indicates that quotients always exhibit digit patterns peculiar to the interval into which the divisor falls.  For instance, for divisors strictly between 11.FCh and 12.00h , and for dividends between 10.00h and 20.00h , so that quotients must lie between 0.E38E38...h and 1.C781AB...h , all the quotients that Pentium got wrong looked like this in binary:

        right quotient  =  ...WwXx1y 101010Zz...
        wrong quotient  =  ...WwXx1y 011111.....

where  W, w, X, x, y, Z, z  are bits and  W+w = X+x = Z+z = 1 .  Blanks have been inserted just before the first bit that changed in the final quotient although the error in Pentium's PLA actually changed the second 1 after the blank into a 0 , after which a negative quotient digit got selected; its propagated borrows decremented the first 1 .  The numbers of bits after the binary point and ahead of the blank were always even and at least 12 .  Those same even numbers but slightly different bit patterns arose with divisors in the other four intervals.  A better understanding of these patterns awaits refinements of methods like those Coe and Tang used for their demonstration cited above.  The patterns are obscured by a coincidence; all the singular divisors adjacent to Pentium's flaws are multiples of 3 , and generally the divisors that produce long sequences of extremal quotient digits from small integer dividends lie near small integer multiples of a divider's ( Radix - 1 ) , here 3 also.

Illusory Randomness:
~~~~~~~~~~~~~~~~~~~~
Other than the foregoing, the failures appear random.  Appearances are deceptive; the failures are not random.  They look random just because we have yet to find a few simple formulas that predict them well.  Such formulas must exist because the failures are "algebraic"; they arise out of the omission of a term from an expression sent to a CAD tool that constructed Pentium's PLA.
Therefore the failures all lie in a smooth curve or surface that first gets chopped up and reflected, by the SRT division process, in a way that resembles the effect of a kaleidoscope with cracked mirrors. Second, carry-save representation of remainders further riddles the failures' locus with holes in a way, still ill understood, that contributes most to the failures' apparent randomness. But to presume randomness is to compound our ignorance. There are other reasons to suspect a pattern among the failures. It is foreshadowed by those failures with small integer divisors. A better harbinger is the initially high yield of failures exposed by SRTEST : among its 80 divisors only ten were capable of generating failures, and those ten did so at the rate of about one failure per 36 divisions among the first 35,715 in which they participated. So high a failure rate from so simple a program as SRTEST warns us that other software may generate high failure rates too, but inadvertently. What kind of applications software is most at risk? The risk has less to do with an application's data or with its frequency of divisions (unless that be zero) than with the formulas the software uses. Far from being independent random variables, dividends and divisors are generally correlated algebraically by the formulas that generate them, and therefore lie in curves and surfaces peculiar to the application. If these figures fall tangent to the curves or surfaces comprising all failures, then failures will swarm around that application. If, as is far more likely, intersections are instead transversal (not nearly tangent) then that application will experience almost no failures. It follows that almost all applications software can be divided into two classes. The larger by far consists of software whose failure rate due to Pentium's defective division is very tiny, much too tiny to estimate reliably within orders of magnitude. Software almost certain to fail constitutes the smaller class; containing SRTEST, it cannot be empty. Unfortunately, to classify any particular program that does perform floating-point divisions, we do not yet know a better way than to run it on a defective Pentium for a while. Why did SRTEST's yield, the number of failures found per division tested, slowly dwindle as testing continued? Apparently, the more productive M-bit dividends end in longer bit-strings like ...111111 or ...000000 or ...000001 . The dwindling yield is consistent with the geometrical picture portrayed above, and hence consistent with the existence of some simple but not yet recognized formula that would generate all failures. That formula and SRTEST's formula generate dividends and divisors falling along curves tangent to each other, but the neighborhood of tangency contains only earlier dividends SRTEST obtained from small integers. Later dividends, obtained from bigger integers, fall far enough from this neighborhood that the failure rate dwindles to something approaching the rate for random dividends. Unanswered Questions: ~~~~~~~~~~~~~~~~~~~~~ " Seek Singularities ! " makes a fine maxim but not a firm foundation for a test program like SRTEST until the singularities' locations are understood better. SRTEST lacks an adequate theory for its behavior with carry-save remainders; further investigation is needed badly. Among questions that come to mind are these: How best can knowledge of a divider's Radix be exploited? 
Without knowing the redundant quotient-digit set nor whether carry-save remainders are used, can we characterize all the most vulnerable quotient-digit sequences and then multiply them by near-singular divisors to get most vulnerable trial dividends?  What can be done to render the program SRTEST invulnerable to the vagaries of compilers, and hence more widely portable?  If run long enough can SRTEST prove a divider correct?  How long?

In our present state of (mis)understanding, we cannot substitute a test like SRTEST for a mathematical proof of correctness.  The best that might be said for SRTEST is that it beats the Test of Time:  millions of customers used Pentiums for months without noticing any error in division.  That proves customers rarely check their work, and that very few of them get an opportunity to do what Intel and Nicely did to discover Pentium's flaw:  compare results obtained for the same computation from different computers.  Besides, what could a customer who noticed a discrepancy do about it?  If caught during debugging, a numerical discrepancy would probably be attributed to roundoff, which is the paramount cause of discrepancies.  Since rounding errors are so difficult to understand, a programmer can easily misdiagnose the discrepancy and alter her program ineffectually and, having changed a divisor by another rounding error, no longer see the discrepancy.  ( This volatility is explained in the Appendix on the Vagaries of Carry-Save.)  Consequently, despite many observed discrepancies, flawed division would remain unreported.

Ultimate users of numerical software routinely misinterpret incorrect results.  If noticed at all, usually long after they were committed, errors are often blamed upon roundoff; to evade them, users will change to different software or change data slightly rather than waste time performing someone else's debugging duties.  Many a computation is hypothetical, performed to predict consequences of a proposed action; disagreeable consequences preclude that action.  When numerical errors predict disagreeable consequences mistakenly, no action follows; the mistake goes unnoticed.  Flawed division would remain unnoticed too.

Thus, reports from the field must badly under-report the incidence of error due to faulty arithmetic so long as a fault remains unsuspected; an absence of reports during that time cannot be construed afterwards as evidence that nobody but the first to report it had been affected by the fault.  As a fault becomes public knowledge it becomes a target for the idly curious and a scapegoat for ills that otherwise resist easy explanation, and the incidence of error tends to become over-reported.  Worse, conscientious professionals must then start chasing wild geese, checking old results for contamination and burdening new software with extra checks against errors that continue, as before, to be caused almost always by some mistake other than faulty arithmetic.  The computing industry is served badly by the Test of Time; if passed for a while it conveys a false sense of security; when failed it fails too late.  Tests like SRTEST are far preferable however imperfect.

Like any test program, SRTEST has to be tested too.  It should be run on numerous PLAs each with a different intentional defect to see how long SRTEST takes to uncover defects.  Such a test is made difficult by the self-correcting propensity of SRT division; many an intended "defect" could turn out to be not a defect after all, or to harm only wide quotients with more digits than the test examines.
Whatever the difficulties, we have to attempt such a test before we can say with confidence that SRTEST's prompt exposure of Pentium's floating-point divide bug was no accident.

Temptations to be Resisted:
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Neither SRTEST's nor any similar test's results can support estimates of the "probability" that an unspecified user of defective arithmetic will suffer from the defect.  The probability that a random test would uncover the defect is logically irrelevant.  To see why, suppose that a divider produced a grossly wrong quotient when and only when division by 3.0 occurred in double or double-extended precision.  Since 3.0 is one of at least 2^52 = 4.5 E 15 double-precision floating-point numbers, a random test could be expected to uncover this bug with probability less than about 2.3 E-16 , which is far smaller than any of the probabilities that have been bandied about concerning Pentium's bug.  However, division by 3.0 is so ubiquitous that its defect could not reasonably be expected to stay hidden for so long as one day.  Similarly, SRTEST's yield of about one defective division per 250 of them during its first few seconds' running is no probability;  if a substantial fraction of Pentium owners faced that kind of hazard they would have drowned Intel with complaints a few weeks after purchase.

We must not promulgate dubious statistics, especially if they tend to make light of an obvious bug's impact upon the public, lest opponents respond with their own self-serving statistics tending the opposite way.  Both sides will claim their predictions borne out by events.  The controversy will draw attention to ever more embarrassing mishaps that turn up inevitably, most of them misattributed to the bug;  and then prophets of doom, cartoonists and lawyers will close in to exact the public's revenge for a betrayal of their trust.  Here is a better policy:

 1:  Never try to hide a bug nor postpone acknowledging its existence; the attempt merely destroys the public's confidence in whatever you are obliged to say at last.  The rarer the bug, the more careful you must be to say only what can be said with certainty, since the statistics of extremely rare events, far out on the tails of probability distributions, are notoriously treacherous;  see the Appendix on the Fragility of Low Probabilities for an explanation.  Attempts to estimate extremely low rates of incidence are too vulnerable to unanticipated correlations to be worth publishing.

 2:  Undertake to replace defective parts as quickly as corrected parts can be made available.  Past experience and statistical methods can serve validly to estimate the rate at which demand for replacements may ramp up, but alluding to uncertifiable statistics in order to dampen public demand may merely exacerbate legal liabilities.

 3:  Provide software patches to work around the bug, if that is possible, as soon as possible and without waiting for mitigations of performance degradation.  Enhancements to performance can come later;  prevention of further errors must come first.  Besides, if the bug will depart soon then effort spent understanding it well enough to circumvent it cleverly is effort misplaced, and risky too;  excessive cleverness invites more bugs.

 4:  Supply utility software that scans executable files to detect any vulnerabilities ( or to certify that none exist ) and then to patch vulnerable programs as nearly automatically as possible.
Such a scanner would allow most customers confidently to delay replacing the defective part until a convenient occasion, perhaps forever.  To produce such a scanner is not trivial;  it must avoid false negatives arising from self-modifying code, cope with false positives arising from a variable-length instruction set, and deal with files that get decompressed only during execution.  Aided by electronic bulletin-boards, the scanner can exploit catalogs of programs known to be invulnerable, known to be unpatchable, or known to accept patches supplied with the help of providers of widely used software;  other software experts will help too if not asked to do too much.

The foregoing " Counsel of Perfection " -- more easily said than done -- will appear altogether too high-minded to many a practical man of affairs called in to cope with bad news after it has broken.  Perhaps a policy better than mine exists, yet it must have the same goal, which is to protect a corporation's most valuable asset -- the confidence of its customers.  When that confidence is lost, the consequences are cynicism and distrust that have always, in my experience going back over four decades, incurred costs and lost revenues substantially greater than whatever short-term gain tempted somebody to adopt a mere expedient.  The marketplace soon forgets the technical details of a bug but long remembers the sentiments aroused by the way it was handled.

Therein lies the lesson to be learned from Intel's tragedy:  Bruised by Intel's initial handling of Pentium's flaw, and confused by conflicting pronouncements and superficial journalism, the public has now lost some of its confidence in the reliability of Pentium's floating-point architecture.  That is exasperating because, with its bug corrected, Pentium gives consumers what I deem to be much the most reliable among all currently commercially significant microprocessor arithmetics.  ( Having helped design it that way in 1977, I may be biased. )

Annotated Bibliography:
~~~~~~~~~~~~~~~~~~~~~~~
Anonymous (1994) on p. 5 of Microprocessor Report vol. 8 no. 16 for Dec. 5, 1994:  " ... Intel has offered to replace the processor of any user affected by the problem.  This offer should not have many takers, as the few divisors that generate erroneous results cause errors of a few parts per billion.  Single-precision calculations are not affected at all.  Most PC users don't even use their FPU, and engineers typically use single-precision."  An early false report.

T. Coe and P-T.P. Tang (1995) "It Takes Six Ones to Reach a Flaw", submitted for the 12th IEEE Symp. on Computer Arithmetic.

W.J. Cody et al. (1984) "A Proposed Radix- and Word-length-independent Standard for Floating-point Arithmetic", p. 86-100 of IEEE Micro, Aug. 1984.  This explains IEEE Standards 754 and 854 for Floating-Point Arithmetic better than the texts of the standards.

G. Barrett (1987) "A Formal Approach to Rounding", p. 247-254 of Proc. 8th IEEE Symp. on Computer Arithmetic, ed. by Mary Irwin and R. Stefanelli.  This mentions David Shepherd's 1986 "Proof" that the Inmos T800 conforms to IEEE 754.  I do not know where Shepherd's work has been published, other than as a report from Oxford University's Programming Research Group, nor whether the errors in the T800 have been acknowledged in print or corrected.

M.L. Barton and H. Sharangpani (1994) "Statistical Analysis of Floating Point Flaw in the Pentium(TM) Processor (1994)", an Intel white paper dated November 30th 1994.
J. Fandrianto (1987) "Algorithm for High Speed Shared Radix 4 Division and Radix 4 Square Root", p. 73-79 of Proc. 8th IEEE Symposium on Computer Arithmetic, ed. by Mary Irwin and R. Stefanelli.  This converts 8 leading bits of a carry-save remainder to sign-magnitude and feeds 7 bits of that magnitude to a smaller PLA to select the magnitude of a quotient digit from { 0, 1, 2 }.

J. Jarrett (1995) "Postmortem on the Pentium (TM) Processor Crisis", p. 8-11 of the Jan. 1995 issue of Intel Leads, "published by and for Intel Corporation."  Explains Intel's position.

W. Kahan (1986) "DIVCK", a suite of BASIC programs and documentation, distributed electronically via NETLIB in 1987, written at the behest of Dr. Cleve Moler, then at Intel in Oregon, to test whether binary floating-point division is correctly rounded.  These programs test efficiently only the quotient's last digit.  Updated versions of these and several more programs to test floating-point arithmetic and its library of elementary mathematical functions will appear in a suite soon to be released to the public domain by my former students at Sun Microsystems led by Dr. David G. Hough.

I. Koren (1993) "Computer Arithmetic Algorithms", Prentice-Hall, N.J.  Ch. 7 of this text explains details of SRT division well.  A new edition is imminent.  However, neither its author nor any other I know discusses the challenge posed by black-box testing of SRT dividers.  If I have missed someone I shall welcome being put right;  SRTEST is too simple for me to believe that I am the first to think of it.

C. Moler (1995) "A Tale of Two Numbers", p. 10-12 of MATLAB News & Notes for Winter 1995;  also p. 1 & 16 of SIAM News vol. 28 no. 1 (Jan. 1995).  Recounts the Pentium FDIV bug's history and proposed patch.

G.S. Taylor (1981) "Compatible Hardware for Division and Square Root", p. 127-134 of ...

G.S. Taylor and D.A. Patterson (1981) "VAX Hardware for the Proposed IEEE Floating-Point Standard", p. 190-196 of Proc. 5th IEEE Symp. on Computer Arithmetic.

Acknowledgements:
~~~~~~~~~~~~~~~~~
SRTEST is based upon conversations I held with Dr. George S. Taylor in 1980 when, as a student, he designed an SRT divider very much like Pentium's;  see his 1981 papers.  His design was built as part of a larger project that was discontinued after another student, David Cary, died in a tragic swimming accident.  The divider never got to be tested by a program like SRTEST, which was therefore not distributed.  I am indebted to colleague Prof. L. Rowe for access to his Pentium to run my tests initially, to Intel's H.P. Sharangpani for an early copy of his white paper containing the probabilistic estimates that I disparage herein, to Intel's Jon Marshall for a Pentium to test more thoroughly, and to Tim Coe of Vitesse Semiconductors and Prof. A. Edelman of MIT for indispensable advice and assistance.

=======================================================================

Appendix: Fragility of Low Probabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whenever an engineer's mistake is discovered too late, engineering managers will require assessments of the mistake's impact before they can choose a course of remedial action.  Decisions are often based upon probabilistic estimates of failure rates;  lower estimates incur less desperate remedial activity.  Unfortunately, the lowest estimates tend also to be the most unreliable, and not merely because they may be self-serving.
Circumstances do arise wherein low probabilities lack predictive power, and then decisions based upon them will fare worse than decisions reached after a frank acknowledgement of ignorance.  The kind of probabilistic reasoning customarily employed to predict failure rates for telephone equipment and semiconductor fabrication cannot be employed with confidence to predict failure rates due to an error in the design of an arithmetic unit.  Confidence is undermined by several considerations, foremost among them being the utterly non-random nature of that error after it has been found.

To proceed with probabilistic reasoning we must assume that operands are random, that operations occur with frequencies revealed by random sampling, that inaccuracies in intermediate results propagate into final results at rates determinable by statistical methods, and so on.  Even were every such assumption a reasonably good approximation in its own right, their concatenation into a chain of inference would still be unreasonable, the more so if it predicted an extremely low failure rate.  Predictions like that are often falsified by rare events that have been overlooked completely, and by correlations that have been neglected.

  + Footnote: --------------------------------------------------------+
  |  Among recent completely unanticipated rare events have been ...  |
  |  -- Keeping a Patriot anti-missile battery in continuous action   |
  |     for more than a few hours during the Gulf War of 1991,        |
  |     fatally degrading the battery's interception capability.      |
  |  -- Unseasonably cold weather that embrittled a sealing ring on   |
  |     a solid-fuel booster of the ill-fated Challenger space craft. |
  +--------------------------------------------------------------------+

Probabilistic inferences about rare events are easily vitiated by small correlations that affect commonplace events negligibly.  For instance, suppose a calamity occurs only when two rare events A and B both occur, and let their probabilities be alpha and beta respectively, both very tiny numbers like 1.0 E-6 .  If A and B are independent, the calamity's probability alpha*beta is tinier by far.  But if A and B are slightly correlated, with small correlation coefficient rho , then the probability of calamity can turn out far bigger than alpha*beta , depending much more strongly upon rho than do the probabilities of other compound events like ( A and not B ).  Here are the relevant formulas, simplified by the substitutions a = 1-alpha and b = 1-beta :

    Compound Events        Probabilities
    ~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    A and B (calamity)     ( sqrt(alpha*beta) + rho*sqrt(a*b) )*sqrt(alpha*beta)
    A and not B            ( sqrt(alpha*b) - rho*sqrt(a*beta) )*sqrt(alpha*b)
    B and not A            ( sqrt(a*beta) - rho*sqrt(alpha*b) )*sqrt(a*beta)
    neither A nor B        ( sqrt(a*b) + rho*sqrt(alpha*beta) )*sqrt(a*b)

For example take alpha = 1.0 E-6 , beta = 4.0 E-6 , and try three values rho = 1.0 E-2 ( A slightly correlated with B ), or rho = 0 ( A independent of B ), or rho = -2.0 E-6 ( A very slightly anti-correlated with B ).

                            Probabilities with ...
    Compound Events        rho = 1.0 E-2    rho = 0      rho = -2.0 E-6
    ~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~    ~~~~~~~~~    ~~~~~~~~~~~~~~
    A and B (calamity)     2.00 E-8         4.00 E-12    1.00 E-17
    A and not B            9.80 E-7         1.00 E-6     1.00 E-6
    B and not A            3.98 E-6         4.00 E-6     4.00 E-6
    neither A nor B        0.999995         0.999995     0.999995

Thus can probabilistic assessments of rare events be destroyed utterly by almost imperceptible correlations that would customarily be ignored quite properly during assessments of populations' central tendencies.  To ignore such correlations just because we cannot ascertain them is methodologically unsound.
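The formulas above are easy to check numerically.  Here is a minimal sketch in C ( my own illustration, not part of SRTEST or its listing;  every identifier in it is hypothetical ) that evaluates the four factored expressions at the three trial values of rho and reproduces the table:

    /* Evaluate the compound-event probabilities for two correlated
       rare events A and B with P(A) = alpha, P(B) = beta and
       correlation coefficient rho, using the factored formulas above
       with a = 1-alpha and b = 1-beta. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double alpha = 1.0e-6, beta = 4.0e-6;
        const double a = 1.0 - alpha, b = 1.0 - beta;
        const double rhos[3] = { 1.0e-2, 0.0, -2.0e-6 };

        for (int i = 0; i < 3; i++) {
            double rho = rhos[i];
            double pAB   = ( sqrt(alpha*beta) + rho*sqrt(a*b) )*sqrt(alpha*beta);
            double pAnB  = ( sqrt(alpha*b) - rho*sqrt(a*beta) )*sqrt(alpha*b);
            double pBnA  = ( sqrt(a*beta) - rho*sqrt(alpha*b) )*sqrt(a*beta);
            double pNone = ( sqrt(a*b) + rho*sqrt(alpha*beta) )*sqrt(a*b);
            printf("rho = %9.2e :  A&B %9.2e   A&notB %9.2e   B&notA %9.2e"
                   "   neither %.6f\n", rho, pAB, pAnB, pBnA, pNone);
        }
        return 0;
    }

Changing rho from 1.0 E-2 to 0 moves the calamity's probability by four orders of magnitude while the other three entries barely move;  that is the point of the table.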
Appendix: Vagaries of Carry-Save
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The carry-save representations of a number X = C + S consist of all pairs C and S that sum to X ;  they are as numerous as are all the binary numbers of the wordsize chosen for C and S .  However, they occur with rather different frequencies, and that is why carry-save so severely complicates the task of testing SRT dividers.  The typical provenance of carry-save representations is illustrated by an example that shows what happens when an addend A is added to X to produce a new pair C' + S' = A + X = A + C + S :

              addend  A    =  000...11110000
            +         C    =  000...11001100
            +         S    =  000...10101010
                              ---------------
        new sum   word S'   = 000...10010110
        new carry word C'/2 = 000...11101000

The bits of C' have been displayed shifted right one bit-position to expose a phenomenon:  aside from common leading zeros, corresponding bits of S' and C'/2 tend to be opposites three times as often as they match.  The significance of this phenomenon will become apparent later.

Quotient digit selection is based upon only a few leading bits of the current remainder X ;  let Z(X) denote their value.  When this is obtained from the leading few bits of C and S in the carry-save representation of X = C+S , the numerical value of X need not determine Z(X) uniquely.  For example, suppose two five-bit carry-save words, with four bits after the point, are chopped to produce an integer Z(X) ;  here are all possibilities ( disregarding the words' order ) for X = 9/8 , each column showing one pair C , S that sums to 9/8 :

    Z(9/8) = 1 for 3 instances:   0.0000   0.0001   0.0010
                                  1.0010   1.0001   1.0000
        ( Carry-save adders cannot yield instances in which both
          words end with ...1 . )

    Z(9/8) = 0 for 7 instances:   0.0011  0.0100  0.0101  0.0110  0.0111  0.1000  0.1001
                                  0.1111  0.1110  0.1101  0.1100  0.1011  0.1010  0.1001

Thus, carry-save renders the " function " Z(X) ambiguous in so far as it may take either of two different adjacent values at different instances of the same numerical value X .  That ambiguity exacerbates the volatility of quotient digit selection, as we shall see now.

Quotient digit selection is intrinsically volatile whenever a final quotient ends in a long string of zeros because then changes in the least-sig. bit of divisor and/or dividend that alter at most the least-sig. bit of the final quotient can change so many earlier selections of quotient digits.  The changes in those selections get undone by carry-propagation after the final rounding.

Carry-save representations of remainders are volatile in a similar way, especially when divisors end in long strings of zeros or of ones, as do SRTEST's.  The ambiguity of Z(X) conducts this volatility to the selection of quotient digits.  Thus, slightly different dividend/divisor pairs that generate almost the same final quotient frequently generate very different sequences of (redundant) quotient digits.  In consequence, the leading digits of divisors and dividends, though they do determine leading digits of final quotients, frequently do not determine as many leading selections of (redundant) quotient digits.

We shall find two practical consequences of this volatility.  First, it will attenuate the capability of tests to uncover certain kinds of errors in whatever PLA or equivalent logic is used for quotient digit selection.  Second, that kind of error will almost certainly get misdiagnosed as a common flaw, namely hypersensitivity to roundoff, in an applications program because the error's effect will flicker out of sight after a change in end-figures of input data.  ( Programmers tend to cope with roundoff very poorly, if at all. )
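Both phenomena above are easy to reproduce in a few lines of code.  The following sketch in C ( mine, purely illustrative, not taken from SRTEST's listing ) performs one carry-save addition step -- sum word by bitwise exclusive-or, carry word by bitwise majority shifted left one place -- then counts, over random operands, how often corresponding bits of the new sum word and the new carry word ( shifted back right, as in the display above ) differ rather than match, and finally re-derives the 3-versus-7 tally for Z(9/8) by enumerating all unordered five-bit pairs summing to 9/8 :

    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned long long u64;

    /* One carry-save addition step: three addends in, two words out. */
    static void csa(u64 a, u64 c, u64 s, u64 *c_out, u64 *s_out)
    {
        *s_out = a ^ c ^ s;                           /* bitwise sum      */
        *c_out = ((a & c) | (a & s) | (c & s)) << 1;  /* carries, shifted */
    }

    static u64 rnd64(void)        /* crude random word; its quality is   */
    {                             /* unimportant for this demonstration  */
        u64 x = 0;
        for (int i = 0; i < 4; i++) x = (x << 16) ^ (u64)(rand() & 0xFFFF);
        return x;
    }

    int main(void)
    {
        /* Part 1:  corresponding bits of S' and C'/2 differ about
           three times as often as they match. */
        u64 differ = 0, match = 0;
        for (int t = 0; t < 100000; t++) {
            u64 c2, s2;
            csa(rnd64(), rnd64(), rnd64(), &c2, &s2);
            u64 x = s2 ^ (c2 >> 1);          /* compare S' with C'/2 */
            for (int k = 0; k < 63; k++)     /* skip the top bit     */
                if ((x >> k) & 1) differ++; else match++;
        }
        printf("differ : match = %llu : %llu\n", differ, match);

        /* Part 2:  Z(9/8) from chopped five-bit carry-save pairs;
           the words carry four bits after the point, so 9/8 = 18/16. */
        int z1 = 0, z0 = 0;
        for (int c = 0; c + c <= 18; c++) {  /* unordered pairs: c <= s */
            int s = 18 - c;
            int z = (c >> 4) + (s >> 4);     /* chop both words, add    */
            if (z == 1) z1++; else z0++;
        }
        printf("Z(9/8) = 1 for %d pairs,  Z(9/8) = 0 for %d pairs\n", z1, z0);
        return 0;
    }

The first count comes out near 3 : 1 because, for independent random bits a, c, s , the exclusive-or and the majority of the three agree in only two of the eight equally likely cases.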
What kind of error is so closely linked to carry-save's volatility?  Think of the PLA as a table of quotient digit selections whose every entry we wish to exercise by tests that generate dividends and divisors for that purpose.  The table's most extreme entries -- those on the boundary beyond which selection should never go -- are most likely to be reachable only by deftly orchestrated sequences of division steps for which divisors and dividends must be calculated quite precisely lest, because of volatility, roundoff defeat their purpose.  Lacking knowledge of those sequences, the architect of a test program like SRTEST must exercise those extreme entries some other way, and that turns out to be extremely difficult.

The difficulty arises out of a peculiarity of carry-save adders already mentioned.  For definiteness continue to suppose that Z(X) is an integer obtained by chopping fractional bits.  Without carry-save representation of a non-integer remainder X , its integer part Z(X) is determined uniquely:  Z(X) = N  where  N <= X = N+f < N+1 .  With carry-save there are two possibilities:  Z(X) = N or Z(X) = N-1 .  If one of the two carry-save words representing X = C+S were a uniformly distributed random variable, the former possibility would occur with probability very near the fractional part f of X .  ( A corroborating probabilistic computation has been omitted here. )  However, carry-save adders produce pairs of words of which each tends to have, after their common leading zeros, roughly as many zero bits as the other has ones.  This happens because those adders produce pairs of "sum" and "carry" bits that are three times more likely to be different than to be the same, given randomly chosen addends.  In consequence the error committed when both words' fractional bits are discarded tends to avoid the extremes, 0 and 2 , of its range;  when the fraction f is small the larger possibility Z = N occurs with probability far less than f , and when f is nearly 1 the lesser possibility Z = N-1 has probability far less than 1-f .  That is why some extremal values of Z(X) may occur very infrequently.

In experiments with random numbers, the probability that Z(N+f) = N was observed to be approximated surprisingly well by a smooth function

        arcsin( sin( pi*f )^4 )/pi              if  0 <= f <= 1/2 ,
    1 - arcsin( sin( pi*(1-f) )^4 )/pi          if  1/2 <= f < 1 .

It is a mystery.  If this approximation can be trusted, Z(9/8) = 1 can be expected to occur not 1/8 but somewhat less often than 1% of the time.  Perhaps bias of this kind could be alleviated by retaining one more bit of each carry-save word than will be delivered to Z(X) from their sum, as Fandrianto (1987) did, though doing so might accomplish no more for an SRT divider than render its defect, if there be one, easier to uncover by testing.

???????????????????????????????????????????????????????????????????????
 Awaiting Exposition:
    Tests of SRTEST on artificial flaws in simulated SRT dividers.
    Convert Fortran SRTEST to C .
    Give example of financial miscalculation due to Pentium's bug?
       ( Hearsay on p. 32 of PC Magazine, 7 Feb. 1995 ? )
    Are corrected Pentiums distinguishable visually from older faulty
       chips?  Vaughan Pratt claims that the older ones have a mark
       " SX 857 " on their bottom, newer have " SX 957 ".  But my old
       one is marked " SX 879 " .
???????????????????????????????????????????????????????????????????????
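For the curious, here is a tiny sketch ( my own, purely illustrative and no part of SRTEST ) that tabulates the approximation quoted in the appendix above against the naive prediction f that a uniformly distributed word would give;  at f = 1/8 it yields about 0.007 , consistent with the remark that Z(9/8) = 1 should turn up somewhat less often than 1% of the time:

    /* Tabulate the empirical approximation for Prob( Z(N+f) = N )
       against the naive prediction f. */
    #include <math.h>
    #include <stdio.h>

    static double approx(double f)
    {
        const double pi = 3.14159265358979323846;
        if (f <= 0.5) return asin(pow(sin(pi*f), 4.0))/pi;
        return 1.0 - asin(pow(sin(pi*(1.0 - f)), 4.0))/pi;
    }

    int main(void)
    {
        const double fs[5] = { 0.125, 0.25, 0.5, 0.75, 0.875 };
        for (int i = 0; i < 5; i++)
            printf("f = %5.3f   naive %5.3f   approximation %7.5f\n",
                   fs[i], fs[i], approx(fs[i]));
        return 0;
    }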