/*
* Nearpi, a C program to exhibit large floating-point numbers
*
* Z = m * 2 ^ L
*
* very close to integer multiples of pi / 2 ,
*
* where m = m0 / e , e <= m0 <= f , and
* e = 2 ^ (D - 1)
* f = 2 ^ D - 1
* L = the binade
* D = significant figures in floating-point numbers.
*
* Adapted by S. McDonald from a Basic program written by
* W. Kahan.
* -S.McD 6/21/83
* revised 6/ 7/84
*
* One should consult the handout "Minimizing q * m - n" below
* by W. Kahan, March 1983, for background material regarding
* this program.
*
* This program was written for VAX double precision ( D = 56 ),
* but may be modified for VAX single or quad precision by appropriate
* modification of D , MAX_EXP and the output section.
*
* With the exception of p , q , x and z , all floating-point
* variables in this program are used as D-bit integers.
*
* If you are implementing this algorithm on another machine,
* you won't need to use any floating-point numbers (except
* for p , q , x and z ) if your machine has an integer type
* with at least D bits. That is to say, this program deals almost
* entirely with integers (and will be slightly easier if you
* use them); I'm using floating-point numbers because my machine's
* integers aren't wide enough.
*
* In any case, it is remarkable that the same precision as the
* code you are trying to test, argument reduction for trigonometric
* functions, is sufficient to test the code!
*
* Simply stated, what happens is this: a continued fraction
* related to pi is repeatedly evaluated and doubled; each
* evaluation produces all the integer multiples of pi / 2
* between the current doubled value and the next.
*
* For those of you with base sixteen machines, you are in luck.
* Instead of doubling once after each evaluation, you double four
* times. Decimal machines, alas, are another matter. Indeed,
* it is possible to produce the continued fraction for five
* times a continued fraction, but it isn't as easy as doubling.
* Therefore, only the modifications necessary for base sixteen
* machines are noted throughout the program. Of course "binade"
* should be read "B_ade" when the base, B , isn't binary.
*
* In background article below, each doubled continued fraction
* was calculated afresh the hard way, i.e. for each binade
* a multi-precision number was doubled, then converted to a
* continued fraction.
*
* Prof. Kahan soon discovered, though, that the doubling needn't
* use any multi-precision floating-point arithmetic. A method
* for doubling due to A. Hurwitz (1891) can be used instead.
* -S.McD
*
*
* Minimizing q * m - n W. Kahan, March 1983
* -------------------------
*
* Notation: Letters i, j, k, l, m, n denote integer
* variables.
* B = radix of floating-point arithmetic with
* d = no. of sig. B-digits
*
*
* Problem: Given an irrational real number q, we seek the several
* closest rational approximations n / m to q with m confined
* to a prescribed interval. For example, to compute trigonometric
* functions in floating-point we must perform "Argument Reduction";
* given a floating-point number B ^ l * m with
*
* d - 1 d
* B =< m =< B - 1
*
* we seek an integer n such that
*
* l
* B * m = (n + x) pi / 2 with |x| =< 1 / 2 ,
*
* and for full accuracy we must compute the reduced argument
*
* l
* x = (2 B / pi) m - n
*
* to full precision (d sig. digits) or better despite cancellation.
* For what values m in the allowed range will x be unusually tiny
* and hence hard to compute accurately? In this example
*
* l
* q = 2 B / pi
*
* is closely approximated by
*
* n / m = q - x / m
*
* just when x is very tiny.
*
* To solve this problem we shall use the continued fraction
*
* 1
* q = i + -------------
* 0 1
* i + -------------
* 1 1
* i + -------------
* 2 1
* i + -------------
* 3 1
* i + -------------
* 4 . . .
*
*
* where either every i >= 1 except possibly i ,
* k 0
*
* or every |i | >= 2 except possibly i ,
* k 0
*
* though usually i will turn out to be enormous.
* 0
*
* Associated with q are certain sequences calculated by
* recurrences:
*
* Let q := q , p := l := j := 1 , j := l := 0 , ( l := i ,)
* 0 0 0 -1 0 1 -1 0
*
* and for k = 0, 1, 2, 3, ... in turn
*
* i := ( q chopped or rounded to an integer ), ( l := i ,)
* k k -1 0
*
* q := 1 / ( q - i ) , ... so q = i + 1 / q ,
* k+1 k k k k k+1
* k + 1
* p := -q p , ... = q q q ... q * (-1) ,
* k+1 k+1 k k+1 k k-1 1
*
* j := j - i j (so j = 1 )
* k+1 k-1 k k 1
*
* l := l - i l (so l = 0 ) [ not normally
* k+1 k-1 k k 1 calculated ]
*
*
* Note that, since all |q | > 1 for k > 0, so |p | --> oo as k --> oo
* k k
* k/2
* (In fact |p | > 2 .)
* k
*
*
* Lemma 0 : j - j / q = p
* --------- k+1 k k+1 k
*
* Proof: At k = 0 we find j - j / q = 1 - 0 / q = 1 = p
* 1 0 1 1 0
*
* as claimed, and by induction for k > 0 we find, assuming
*
* j - j / q = p , that also j - j / q = j - ( i + 1 / q ) j
* k k-1 k k-1 k+1 k k+1 k-1 k k+1 k
*
* = j - q j
* k-1 k k
*
* = -q ( j - j / q )
* k k k-1 k
*
* = -q p
* k k-1
*
* = p . []
* k
*
*
* Next we relate m, n and x to recurrences:
*
* m := n , m := m , x := x ,
* -1 0 0
*
* and for k = 0, 1, 2, 3, ... in turn
*
* m := m - i m m = m + i m
* k+1 k-1 k k invertible k-1 k+1 k k
* via
* x := m q - m x = m q - m
* k+1 k+1 k+1 k k k k k-1
*
*
* m j l m
* k+1 k+1 k+1 1
* Lemma 1 : ( ) = ( ) ( ) and
* --------- m j l m
* k k k 0
*
* m l -l m
* 1 k k k+1 k+1
* ( ) = (-1) ( ) ( )
* m -j j m
* 0 k k+1 k
*
* j l
* k+1 k+1 1 0
* Proof: When k = 0 we find ( ) = ( ) as expected.
* j l 0 1
* k k
*
* m -i m
* k+1 k 1 k
* For k > 0, ( ) = ( ) ( )
* m 1 0 m
* k k-1
*
* -i j l m
* k 1 k k 1
* = ( ) ( ) ( ) etc. by induction.
* 1 0 j l m
* k-1 k-1 0
*
* j l j l
* k+1 k+1 k k+1 k+1 -1
* Similarly, det ( ) = (-1) , so ( )
* j l j l
* k k k k
*
* l -l
* k k k+1
* = (-1) ( ) . []
* -j j
* k k+1
*
*
* Lemma 2 : x = -q x = p x
* --------- k+1 k+1 k k+1 0
*
* Proof: x = q ( m - m / q )
* k+1 k+1 k+1 k k+1
*
* = q ( m - m ( i + 1 / q ) )
* k+1 k-1 k k k+1
*
* = q ( m - m q )
* k+1 k-1 k k
*
* = -q x = (p / p ) x = p (x / p )
* k+1 k k+1 k k k+1 0 0
*
* = p x by induction. []
* k+1 0
*
*
* k+1
* Lemma 3 : m = -x + m / q = -j x + (-1) m / p
* --------- k+1 k k k+1 k+1 0 0 k+1
*
* Proof: m = ( m + x ) / q = -x + m / q by lemma 2, so
* k+1 k k+1 k+1 k k k+1
*
* m 1 / q m
* k+1 k+1 -1 k
* ( ) = ( ) ( )
* x 0 -q x
* k+1 k+1 k
* k
* -p / p (-1) / p -j m
* k k+1 -1 k k 0
* = ( ) ( ) ( )
* 0 p / p 0 p x
* k+1 k k 0
*
* k+1
* (-1) / p j p / p - p m
* k+1 k k k+1 k 0
* = ( ) ( )
* 0 p x
* k+1 0
*
* provided this lemma is true for some k >= 0 [ as it is for
* k = 0 since m = -j x - m / p = -x + m / q
* 1 1 0 0 1 0 0 1
*
* = ( x + m ) / q . ]
* 1 0 1
*
* And the proof will be complete by induction if
* j p / p - p = -j ; this follows from lemma 0 . []
* k k k+1 k k+1
*
*
* k
* Corollary : x = x = ((-1) m / p - m ) / j = x / p
* ---------- 0 0 k k k k k
*
* = ( m q - m ) / p
* k k k-1 k
*
* k
* and so m = (-1) ( ( p + q j ) m - j m )
* 0 k k k k k k-1
* k
* = (-1) ( j m - j m )
* k-1 k k k-1
*
* [ By lemma 0, p + q j = j + j ( q - 1 / q )
* k k k k+1 k k k+1
*
* = j + i j = j alright. ] []
* k+1 k k k-1
*
*
* Application to the Problem
* ---------------------------
*
* Since m = m is confined to some interval, namely
* 0
* d - 1 d
* B =< m =< B - 1 , yet |p | --> oo ,
* k
* so there must be some k = K for which the interval
* containing m / p contains just a few integers,
* k
* i.e. d d - 1
* ( B - B - 1 ) / |p | =< a few.
* k
* k
* For that k = K we find x = ((-1) m / p - m ) / j
* k k k
*
* is smallest when m is chosen to be one of the few integers
* k
* k
* over which (-1) m / p may range, i.e. roughly between
* k
* k d - 1 k d
* (-1) B / p and (-1) ( B - 1 ) / p .
* k k
* Having chosen m to be one of those few integers, we
* k
*
* observe that x = ( m q - m ) / p is tiniest when m
* k k k-1 k k-1
*
* is one of the integers nearest m q , and then
* k k
*
* d d - 1
* |x| < 1 / |p | =< (a few) / ( B - B - 1 ) .
* k
*
* Indeed, |x| may be much tinier than 1 / |p | if m q is
* k k k
*
* very nearly an integer. Finally, having chosen m and m
* k k-1
* k
* we obtain m = (-1) ( j m - j m ) ; provided this m
* 0 k-1 k k k-1 0
*
* lies in the range allowed for m we have a solution.
* Moreover, all solutions must be obtained in this way.
*
*
* Example
* --------
* B = 10, d = 4 so 1000 =< m =< 9999 .
*
* q = 200 / pi = 63.66197723 ...
*
*
* q i p j
* k | k | k | k | k |
* ----+---------------+-------+---------------+-------+
* 0 | 63.66197723 | 64 | 1 | 0 |
* | | | | |
* 1 | -2.958380585 | -3 | 2.958380585 | 1 |
* | | | | |
* 2 | 24.02724786 | 24 | -71.08174358 | 3 |
* | | | | |
* 3 | 36.70012985 | 37 | -2608.709219 | -71 |
* | | | | |
* 4 | -3.334776736 | -3 | 8699.462815 | 2630 |
* ----+---------------+-------+---------------+-------+
*
* K = 4 puts k
* 0.115 = 1000 / p =< (-1) m / p
* k k
* =< 9999 / p
* k
* = 1.149 .
*
* Thus
* m = 0 or 1 ,
* k
* and so
* m q = 0 or -3.3347 .
* k k
* Hence
* m = 1 or { -3 or -4 } .
* k-1
*
* Therefore
* m = j m - j m
* 0 k-1 k k k-1
*
* = -2630 or { 7819 or 10449 }
* -----
* And finally
* m q = -167431.0001 497772.9999 or 665204.001
* 0 ----------
* = n + x ,
*
* yielding two values since the third is slightly out of range.
*/
/*
* Global macro definitions.
*/
# define hex( double ) *(1 + ((long *) &double)), *((long *) &double)
# define sgn(a) (a >= 0 ? 1 : -1)
# define MAX_k 2500
# define D 56
# define MAX_EXP 127
# define THRESHOLD 2.22e-16
/*
* Global Variables
*/
int CFlength, /* length of CF including terminator */
binade;
double e,
f; /* [e,f] range of D-bit unsigned int of f;
form 1X...X */
/*
* This is the start of the main program.
*/
main ()
{
int k; /* subscript variable */
double i[MAX_k],
j[MAX_k]; /* i and j are continued fractions
(coeffs) */
/*
* Compute global variables e and f, where
*
* e = 2 ^ (D-1), i.e. the D bit number 10...0
* and
* f = 2 ^ D - 1, i.e. the D bit number 11...1 .
*/
e = 1;
for (k = 2; k <= D; k = k + 1)
e = 2 * e;
f = 2 * e - 1;
/*
* Compute the continued fraction for (2/e)/(pi/2) , i.e.
* q's starting value for the first binade, given the continued
* fraction for pi as input; set the global variable CFlength
* to the length of the resulting continued fraction (including
* its negative valued terminator). One should use as many
* partial coefficients of pi as necessary to resolve numbers
* of the width of the underflow plus the overflow threshold.
* A rule of thumb is 0.97 partial coefficients are generated
* for every decimal digit of pi .
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
*/
input (i);
/*
* Begin main loop over all binades:
* For each binade, find the nearest multiples of pi/2 in that binade.
*
* [ Note: for hexadecimal machines ( B = 16 ), the rest of the main
* program simplifies(!) to
*
* B_ade = 1;
* while (B_ade < MAX_EXP)
* {
* dbleCF (i, j);
* dbleCF (j, i);
* dbleCF (i, j);
* CFlength = dbleCF (j, i);
* B_ade = B_ade + 1;
* }
* }
*
* because the alternation of source & destination are no longer necessary. ]
*/
binade = 1;
while (binade < MAX_EXP)
{
/*
* For the current (odd) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (i);
/*
* Double the continued fraction to get to the next (even) binade.
* To save copying arrays, i and j will alternate as the source
* and destination for the continued fractions.
*/
CFlength = dbleCF (i, j);
binade = binade + 1;
/*
* Check for main loop termination again because of the
* alternation.
*/
if (binade >= MAX_EXP)
break;
/*
* For the current (even) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (j);
/*
* Double the continued fraction to get to the next (odd) binade.
*/
CFlength = dbleCF (j, i);
binade = binade + 1;
}
} /* end of Main Program */
/*
* Subroutine DbleCF doubles a continued fraction whose partial
* coefficients are i[] into a continued fraction j[], where both
* arrays are of a type sufficient to do D-bit integer arithmetic.
*
* In my case ( D = 56 ) , I am forced to treat integers as double
* precision reals because my machine does not have integers of
* sufficient width to handle D-bit integer arithmetic.
*
* Adapted from a Basic program written by W. Kahan.
*
* Algorithm based on Hurwitz's method of doubling continued
* fractions (see Knuth Vol. 3, p.360).
*
* A negative value terminates the last partial quotient.
*
* Note: for the non-C programmers, the statement break
* exits a loop and the statement continue skips to the next
* case in the same loop.
*
* The call modf ( l / 2, &l0 ) assigns the integer portion of
* half of L to L0.
*/
dbleCF (i, j)
double i[],
j[];
{
register double k,
l,
l0,
j0;
register int n,
m;
n = 1;
m = 0;
j0 = i[0] + i[0];
l = i[n];
while (1)
{
if (l < 0)
{
j[m] = j0;
break;
};
modf (l / 2, &l0);
l = l - l0 - l0;
k = i[n + 1];
if (l0 > 0)
{
j[m] = j0;
j[m + 1] = l0;
j0 = 0;
m = m + 2;
};
if (l == 0)
/*
* Even case.
*/
if (k < 0)
{
m = m - 1;
break;
}
else
{
j0 = j0 + k + k;
n = n + 2;
l = i[n];
continue;
};
/*
* Odd case.
*/
if (k < 0)
{
j[m] = j0 + 2;
break;
};
if (k == 0)
{
n = n + 2;
l = l + i[n];
continue;
};
j[m] = j0 + 1;
m = m + 1;
j0 = 1;
l = k - 1;
n = n + 1;
continue;
};
m = m + 1;
j[m] = -99999;
return (m);
}
/*
* Subroutine input computes the continued fraction for
* (2/e) / (pi/2) , where e = 2 ^ (D-1) , given pi 's
* continued fraction as input. That is, double the continued
* fraction of pi D-3 times and place a zero at the front.
*
* One should use as many partial coefficients of pi as
* necessary to resolve numbers of the width of the underflow
* plus the overflow threshold. A rule of thumb is 0.97
* partial coefficients are generated for every decimal digit
* of pi . The last coefficient of pi is terminated by a
* negative number.
*
* I'll be happy to supply anyone with the partial coefficients
* of pi . My ARPA address is mcdonald@ucbdali.BERKELEY.ARPA .
*
* I computed the partial coefficients of pi using a method of
* Bill Gosper's. I need only compute with integers, albeit
* large ones. After writing the program in bc and Vaxima ,
* Prof. Fateman suggested FranzLisp . To my surprise, FranzLisp
* ran the fastest! the reason? FranzLisp's Bignum package is
* hand coded in assembler. Also, FranzLisp can be compiled.
*
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
* In the case of hexadecimal ( B = 16 ), this is done by repeated
* doubling the appropriate number of times.
*/
input (i)
double i[];
{
int k;
double j[MAX_k];
/*
* Read in the partial coefficients of pi from a precalculated file
* until a negative value is encountered.
*/
k = -1;
do
{
k = k + 1;
scanf ("%E", &i[k]);
} while (i[k] >= 0);
/*
* Double the continued fraction for pi D-3 times using
* i and j alternately as source and destination. On my
* machine D = 56 so D-3 is odd; hence the following code:
*
* Double twice (D-3)/2 times,
*/
for (k = 1; k <= (D - 3) / 2; k = k + 1)
{
dbleCF (i, j);
dbleCF (j, i);
};
/*
* then double once more.
*/
dbleCF (i, j);
/*
* Now append a zero on the front (reciprocate the continued
* fraction) and the return the coefficients in i .
*/
i[0] = 0;
k = -1;
do
{
k = k + 1;
i[k + 1] = j[k];
} while (j[k] >= 0);
/*
* Return the length of the continued fraction, including its
* terminator and initial zero, in the global variable CFlength.
*/
CFlength = k;
}
/*
* Given a continued fraction's coefficients in an array i ,
* subroutine nearPiOver2 finds all machine representable
* values near a integer multiple of pi/2 in the current binade.
*/
nearPiOver2 (i)
double i[];
{
int k, /* subscript for recurrences (see
handout) */
K; /* like k , but used during cancel. elim.
*/
double p[MAX_k], /* product of the q's (see
handout) */
q[MAX_k], /* successive tail evals of CF (see
handout) */
j[MAX_k], /* like convergent numerators (see
handout) */
tmp, /* temporary used during cancellation
elim. */
mk0, /* m[k - 1] (see
handout) */
mk, /* m[k] is one of the few ints (see
handout) */
mkAbs, /* absolute value of m sub k
*/
mK0, /* like mk0 , but used during cancel.
elim. */
mK, /* like mk , but used during cancel.
elim. */
z, /* the object of our quest (the argument)
*/
m0, /* the mantissa of z as a D-bit integer
*/
x, /* the reduced argument (see
handout) */
ldexp (), /* sys routine to multiply by a power of
two */
fabs (), /* sys routine to compute FP absolute
value */
floor (), /* sys routine to compute greatest int <=
value */
ceil (); /* sys routine to compute least int >=
value */
/*
* Compute the q's by evaluating the continued fraction from
* bottom up.
*
* Start evaluation with a big number in the terminator position.
*/
q[CFlength] = 1.0E + 30;
for (k = CFlength - 1; k >= 0; k = k - 1)
q[k] = i[k] + 1 / q[k + 1];
/*
* Let THRESHOLD be the biggest | x | that we are interesed in
* seeing.
*
* Compute the p's and j's by the recurrences from the top down.
*
* Stop when
*
* 1 1
* ----- >= THRESHOLD > ------ .
* 2 |j | 2 |j |
* k k+1
*/
p[0] = 1;
j[0] = 0;
j[1] = 1;
k = 0;
do
{
p[k + 1] = -q[k + 1] * p[k];
if (k > 0)
j[1 + k] = j[k - 1] - i[k] * j[k];
k = k + 1;
} while (1 / (2 * fabs (j[k])) >= THRESHOLD);
/*
* Then mk runs through the integers between
*
* k + k +
* (-1) e / p - 1/2 & (-1) f / p - 1/2 .
* k k
*/
for (mkAbs = floor (e / fabs (p[k]));
mkAbs <= ceil (f / fabs (p[k])); mkAbs = mkAbs + 1)
{
mk = mkAbs * sgn (p[k]);
/*
* For each mk , mk0 runs through integers between
*
* +
* m q - p THRESHOLD .
* k k k
*/
for (mk0 = floor (mk * q[k] - fabs (p[k]) * THRESHOLD);
mk0 <= ceil (mk * q[k] + fabs (p[k]) * THRESHOLD);
mk0 = mk0 + 1)
{
/*
* For each pair { mk , mk0 } , check that
*
* k
* m = (-1) ( j m - j m )
* 0 k-1 k k k-1
*/
m0 = (k & 1 ? -1 : 1) * (j[k - 1] * mk - j[k] * mk0);
/*
* lies between e and f .
*/
if (e <= fabs (m0) && fabs (m0) <= f)
{
/*
* If so, then we have found an
*
* k
* x = ((-1) m / p - m ) / j
* 0 k k k
*
* = ( m q - m ) / p .
* k k k-1 k
*
* But this later formula can suffer cancellation. Therefore,
* run the recurrence for the mk 's to get mK with minimal
* | mK | + | mK0 | in the hope mK is 0 .
*/
K = k;
mK = mk;
mK0 = mk0;
while (fabs (mK) > 0)
{
p[K + 1] = -q[K + 1] * p[K];
tmp = mK0 - i[K] * mK;
if (fabs (tmp) > fabs (mK0))
break;
mK0 = mK;
mK = tmp;
K = K + 1;
};
/*
* Then
* x = ( m q - m ) / p
* K K K-1 K
*
* as accurately as one could hope.
*/
x = (mK * q[K] - mK0) / p[K];
/*
* To return z and m0 as positive numbers,
* x must take the sign of m0 .
*/
x = x * sgn (m0);
m0 = fabs (m0);
/*
* Set z = m0 * 2 ^ (binade+1-D) .
*/
z = ldexp (m0, binade + 1 - D);
/*
* Print z (hex), z (dec), m0 (dec), binade+1-D, x (hex), x (dec).
*/
printf ("\
%08x %08x Z=%22.16E M=%17.17G L+1-%d=%3d %08x %08x x=%23.16E\n",
hex (z), z, m0, D, binade + 1 - D, hex (x), x);
}
}
}
}