Some notes on interval arithmetic and its implementation,
accompanying a package written in Lisp.
1. There are several ways of thinking about an interval I = [low,high].
a. This represents a single (unknown) number in that interval.
b. This represents a collection of all possible numbers in that interval.
c. A distinct interpretation, as some kind of limit set.
As an example, lim(sin(x)) as x goes to
infinity is [-1,1]. This answer is used by Mathematica and Maple.
It is cute, but does not conform to any conventional
mathematical definition of limit. (e.g. by the usual
epsilon-delta definition a limit must be a single number, and
for any candidate number, sin(x) differs from it by at least 1
for arbitrarily large x; so there is no limit.)
d. Intervals can be ``extended'' in various ways, e.g. exterior
intervals [a,b] where b<a, denoting the set of x with x>=a or x<=b.
e. variations on open and closed endpoints.
f. encodings of empty or undefined intervals, e.g. the real solution to $x^2=-1$ is empty.
2. Interval arithmetic.
Operations tend to enlarge an interval's uncertainty. Because of the
growing uncertainty encoded in the interval value, many
potentially interesting calculations end up with [-oo,oo]
answers. (Well-known iterations, e.g. Newton, have special versions to
allow them to converge in spite of such widenings.)
We needed to make special efforts to try to keep intervals
no wider than necessary.
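To make the widening concrete, here is a small sketch (in Python, used here purely for illustration; the package itself is in Lisp, and these function names are ours). Naive interval multiplication treats its two factors as independent, so repeatedly "squaring" an interval this way inflates it doubly exponentially, heading toward [-oo,oo]:

```python
def mul(x, y):
    """Naive interval multiplication: take min/max over endpoint products."""
    ps = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(ps), max(ps))

y = (-1.1, 1.1)
for _ in range(5):
    y = mul(y, y)    # the two factors are treated as independent

print(y)   # about (-21.11, 21.11); a true square would keep the low end at 0
```

After only five steps the width has grown from 2.2 to about 42; a few more steps overflow floating point entirely.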
3. Dependency. In a computer implementation it is possible to
identify two different identifiers (or variables) x, y, as "pointers"
to the single occurrence of an interval, say [-1,1]. That is, the
data itself can be identified with a particular location in memory.
It may be appropriate, under interpretation 1a, to think of the value
of x-y as zero. While one does not know the value of x or the value of y,
one could argue that the values are identical and thus x-y is zero.
Alternatively, one could store the values for x,y, each [-1,1] in two different
locations in memory, in which case the values are presumably independent,
and x-y would be some number in [-2,2], rather than zero. Under interpretation
1b it is arguable that x-x is also [-2,2].
If one is unable to tell if two numbers are identical (same location)
or isomorphic (different location but components are numerically
equal), the second interpretation should prevail since it
encloses the former range and is thus "guaranteed" though possibly
pessimistic. The dependency issue is not resolved by ``location''
comparisons because $2x$ will be stored in a different location from
$x$, and yet $2x-x-x$ should be zero.
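The dependency problem is easy to exhibit (a Python sketch for illustration only; the operation names are ours). Naive arithmetic treats every occurrence of x as an independent quantity, so x-x and 2x-x-x both come out wider than the true value zero:

```python
def sub(x, y):
    """Naive interval subtraction: [a,b] - [c,d] = [a-d, b-c]."""
    return (x[0] - y[1], x[1] - y[0])

def scale(k, x):
    """Multiply an interval by a nonnegative scalar k."""
    return (k * x[0], k * x[1])

x = (-1.0, 1.0)
print(sub(x, x))                      # (-2.0, 2.0): dependency is lost, not (0, 0)
print(sub(sub(scale(2, x), x), x))    # 2x - x - x widens further, to (-4.0, 4.0)
```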
Interval computations benefit from simplifications in which
the occurrences of variables are minimized in number. That is,
$y \times y$ is inferior to $y^2$. In the case $y=[-1,1]$, the
first form might be computed in a plodding way as [-1,1], whereas
knowing that the same value is squared narrows the result to [0,1],
which is much preferable.
The same idea of reducing the occurrences of an unknown to take
advantage of correlation, can be followed in other computations.
x/(x-z) can be rewritten as 1/(1-z/x).
sin(x)^5 - 10*cos(x)^2*sin(x)^3 + 5*cos(x)^4*sin(x) looks like it
can be bounded by [-16,16], but when it is rewritten as
sin(5*x), we see that [-1,1] is good enough.
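Both rewriting claims above can be checked numerically; this sketch (Python, for illustration; `lhs` is our name) confirms that the five-term expression is exactly sin(5x), hence bounded by [-1,1] rather than the termwise [-16,16], and that x/(x-z) agrees with 1/(1-z/x) where both are defined:

```python
import math

def lhs(x):
    """The five-term expression: sin^5 x - 10 cos^2 x sin^3 x + 5 cos^4 x sin x."""
    s, c = math.sin(x), math.cos(x)
    return s**5 - 10 * c**2 * s**3 + 5 * c**4 * s

# The expression collapses to sin(5x), so it is bounded by [-1, 1].
for x in [0.0, 0.3, 1.7, -2.5, 10.0]:
    assert math.isclose(lhs(x), math.sin(5 * x), abs_tol=1e-12)

# Likewise x/(x-z) == 1/(1 - z/x) wherever x != 0 and x != z.
x, z = 0.7, 2.0
assert math.isclose(x / (x - z), 1 / (1 - z / x))
```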
4. What other techniques should be considered?
Reducing intervals. (Sharpening them.)
{First note that if Mathematica's Maximize and Minimize programs work
``exactly'', all we need to do is compute the max and min of a function
relative to the constraint that its argument(s) are within the
specified interval(s), to determine the lower and upper bounds of the
result. Thus to produce a sharp interval (i.e. both upper and lower
bounds are achieved), consider this:
inteval[r_,Interval[{lo_,hi_}]]:=
Interval[{Minimize[{r[x],lo<=x<=hi},x][[1]],
Maximize[{r[x],lo<=x<=hi},x][[1]]}]
While it is not stated how effective Mathematica's programs really are,
we tried
s[z_] := Sin[z] + Cos[z];
inteval[s, Interval[{-10, 10}]];
This gives a messy expression like
$$[ \cos (2\,
\arctan (1 + {\sqrt{2}})) -
\sin (2\,\arctan (1 + {\sqrt{2}}))~,~
\cos (2\,\arctan (1 - {\sqrt{2}})) -
\sin (2\,\arctan (1 - {\sqrt{2}}))]$$
which simplification reduces to
$$[-\sqrt{2},\sqrt{2}]$$
which is exactly right.
Given an interval with floating-point components, Mathematica will
generally give an answer that is also a float. That is, Mathematica
will abandon exactness given a hint, and in the cases we've tried, will
run faster.}
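As a quick independent check of that [-sqrt(2), sqrt(2)] result (a Python sketch; dense sampling is not rigorous the way interval methods are, but it corroborates the symbolic answer): since sin(z)+cos(z) = sqrt(2) sin(z + pi/4), and both extremes are attained within [-10,10],

```python
import math

s = lambda z: math.sin(z) + math.cos(z)

# Sample sin(z) + cos(z) densely on [-10, 10].
n = 20_000
vals = [s(-10 + 20 * i / n) for i in range(n + 1)]
lo, hi = min(vals), max(vals)

# Both extrema, -sqrt(2) and sqrt(2), are attained (at z = -3pi/4 and pi/4).
assert abs(lo + math.sqrt(2)) < 1e-5
assert abs(hi - math.sqrt(2)) < 1e-5
```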
5. Can we solve this problem
more effectively given the simple nature of the constraints?
Consider the evaluation of a polynomial $p(x)$ with real coefficients.
Here's one proposal: to evaluate $p(I)$, compute a list L
of (rigorous intervals around) the zeros of $p'$ in $I$. Find the relative extrema
of $p$ in those neighborhoods on the list L, as well as the values of
$p$ at the upper and lower bounds of I. Given this collection, find
the min and max for $p(I)$.
The details of this computation are not always pretty because it
depends on accurately finding intervals around each zero. This is a
well-studied problem for polynomials, but is not a computation undertaken
lightly. And for non-polynomials, the techniques are not sure-fire at
all. We revisit this later.
6. Maybe there should be a general "de-optimization" technique, roughly the
opposite of the compiler optimization of copy propagation: values
that are known should not be copied into later arithmetic instructions;
rather, the dependences should be kept as explicit as possible.
Example:
a:= [-1,1];
b:= a*a+a
Constant folding gives the wrong optimization for intervals:
b := [-1,1]*[-1,1] + [-1,1]. This gives [-2,2].
Better is algebraic simplification to make
b:= a^2+a; This gives [0,1]+[-1,1] = [-1,2]
or since p(a)=a*(a+1) has derivative 2*a+1 which has a zero at a=-1/2,
test p(-1/2)= 1/4-1/2 = -1/4,
p(-1) = 0
p(1) = 2
So the value of b must be between -1/4 and 2. [-1/4,2] is
tighter than the other bounds, and "sharp" as a matter of fact.
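The arithmetic above is easy to script; here is a sketch (Python, used purely for illustration) of the critical-point method for this particular p, taking as candidates the interval endpoints plus the zero of the derivative:

```python
def p(a):
    """The example polynomial p(a) = a^2 + a."""
    return a * a + a

lo, hi = -1.0, 1.0
# p'(a) = 2a + 1 has its only zero at a = -1/2, which lies inside [lo, hi].
candidates = [lo, hi] + [c for c in [-0.5] if lo <= c <= hi]
values = [p(c) for c in candidates]
print(min(values), max(values))   # -0.25 2.0, the sharp bounds
```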
As indicated earlier, any method requiring that the roots
of a polynomial's first derivative be computed can have problems
for ill-conditioned polynomials. Low-degree polynomials with
modest coefficients do not cause trouble, and if one needs to
handle more difficult cases, software has been developed for
the reliable location and refinement of zeros using high-precision
calculations (e.g. ALIAS, MPFI).
See
http://www1.elsevier.com/homepage/sac/cam/mcnamee/12.htm
for over 100 interval references from McNamee's bibliography
on roots of polynomials.
We don't really need to find the zeros of the first derivative to
do better than naive evaluation.
Here's one simple improvement: just divide the interval
[-1,1] into (say) 100 pieces and evaluate the expression on each of
these smaller intervals. On most of them the function p will not
wander very far. Taking the minimum and maximum over all these small
intervals we get a lower bound of -27/100. For 1000 pieces, the lower
bound is -0.252. The upper bound is 2, which again is tight. So
subdivision succeeds. Here's a Mathematica program
subdiv[Interval[{a_, b_}], n_] :=
 Module[{del = b - a},
  Table[Interval[{a + (i*del)/n, a + ((i + 1)*del)/n}], {i, 0, n - 1}]];
inteval[f_, q_, z_] := (* evaluate f on an interval q, z subdivisions *)
 Module[{sd = subdiv[q, z]},
  Interval[{Min[Min[f[#]] & /@ sd], Max[Max[f[#]] & /@ sd]}]]
r[a_]:=a^2+a;
inteval[r, Interval[{-1,1}], 100]
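For comparison, the same subdivision scheme sketched in Python (for illustration only; the helper names are ours). Each piece is evaluated by naive interval arithmetic, with the square handled sharply since a^2 is a single occurrence:

```python
def ieval_poly(lo, hi):
    """Interval evaluation of a^2 + a on [lo, hi], with a sharp square."""
    if lo >= 0:
        sq = (lo * lo, hi * hi)
    elif hi <= 0:
        sq = (hi * hi, lo * lo)
    else:
        sq = (0.0, max(lo * lo, hi * hi))
    return (sq[0] + lo, sq[1] + hi)

def subdiv_eval(lo, hi, n):
    """Split [lo, hi] into n pieces, evaluate each, take the hull of results."""
    pieces = [(lo + (hi - lo) * i / n, lo + (hi - lo) * (i + 1) / n)
              for i in range(n)]
    vals = [ieval_poly(a, b) for a, b in pieces]
    return (min(v[0] for v in vals), max(v[1] for v in vals))

print(subdiv_eval(-1.0, 1.0, 1000))   # close to the sharp [-1/4, 2]
```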
One advantage of this approach is that we can plot a function over 100 such
intervals and be sure not to miss a very narrow peak, something that
might have been overlooked by merely sampling. (We described this
approach to "honest plotting" in a paper in ISSAC 1992.) If we are
merely looking to contain the extrema, we are looking at many boring
intervals. We can write a better program to only subdivide the
sub-intervals within which we believe the function f attains its maximum or
minimum; with any luck far fewer samples should be necessary.
(* heuristic min of f on an interval, count = say 5 *)
mini[f_, s : Interval[{a_, b_}], count_] :=
 If[count == 0, Min[f[s]],
  Module[{m, loint, hiint, f1, f2, ans, other, otherint},
   m = (a + b)/2;
   loint = Interval[{a, m}];
   hiint = Interval[{m, b}];
   f1 = Min[f[loint]];
   f2 = Min[f[hiint]];
   If[f1 < f2,
    (ans = mini[f, loint, count - 1];
     other = f2;
     otherint = hiint),
    (ans = mini[f, hiint, count - 1];
     other = f1;
     otherint = loint)];
   (* Maybe we need to search other branch? *)
   If[ans <= other, ans, Min[ans, mini[f, otherint, count - 1]]]]]
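The same branch-and-bound idea can be transcribed into Python (a sketch under our naming, using naive interval evaluation of a^2 + a as the bounding function); on [-1,1] it homes in on the sharp minimum -1/4:

```python
def ieval(lo, hi):
    """Bounds for a^2 + a on [lo, hi] by interval arithmetic (sharp square)."""
    if lo >= 0:
        sq = (lo * lo, hi * hi)
    elif hi <= 0:
        sq = (hi * hi, lo * lo)
    else:
        sq = (0.0, max(lo * lo, hi * hi))
    return (sq[0] + lo, sq[1] + hi)

def mini(f, lo, hi, count):
    """Heuristic lower bound on min of f over [lo, hi]; f returns (low, high)."""
    if count == 0:
        return f(lo, hi)[0]
    m = (lo + hi) / 2
    f1, f2 = f(lo, m)[0], f(m, hi)[0]
    if f1 < f2:
        ans, other, oint = mini(f, lo, m, count - 1), f2, (m, hi)
    else:
        ans, other, oint = mini(f, m, hi, count - 1), f1, (lo, m)
    if ans <= other:   # the other half cannot contain anything smaller
        return ans
    return min(ans, mini(f, oint[0], oint[1], count - 1))

print(mini(ieval, -1.0, 1.0, 20))   # approaches the sharp minimum -1/4
```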
How do the timings compare with Mma's Minimize/Maximize?
This depends on the stopping criterion (e.g. setting of count),
the distribution of (bogus or actual) maxima/minima in different
regions, the cost of repeated evaluations.
Should this be done at all considering the extra effort
compared to the relatively low cost for the naive interval evaluation?
In some of our experiments if the function f is memoized, time can
be saved. When does f get called again on the same arguments? When
calls from maxi and mini are made with the same interval arguments.
This happens when the algorithm has no idea where the maxima and minima
are, and tries essentially every interval.
While it appears that the time for this bisection should be
proportional to the count n, it is probably not so favorable. Because
the {\em other} part of the bisection is examined at least some of the
time, the actual number of evaluations can be much higher; a worst
case is $2^n$; likely this is achieved for a sum of highly oscillatory
functions that cancel each other.
About interval comparisons. We can ask about relationships such as
"I < J". (Design choices include whether "<" means
certainly or possibly, whether empty intervals should cause
exceptions, etc.) Rounding under different thread settings
seems a trouble spot.
Other issues: add to interval.lisp programs for correct
intervals for sin, cos, tan, exp, log, etc.
Need to do stuff for comparison, <, >, union, intersection,
hi, low, intervals with infinity as bounds,
gently rounding up and down.
What about constraint satisfaction programming using intervals?
Does that have any relationship here?
..............
Hacking on the Mathematica function (less interesting since now written in Lisp ....)
mini[f_, s : Interval[{a_, b_}], count_] :=
 mini2[f, a, b, f[s], count]
mini2[f_, a_, b_, fval_, 0] := Min[fval]
mini2[f_, a_, b_, fval_, count_] :=
 Module[{m, f1, f2, mf1, mf2, ans},
  mf1 = Min[f1 = f[Interval[{a, m = (a + b)/2}]]];
  mf2 = Min[f2 = f[Interval[{m, b}]]];
  If[mf1 < mf2,
   (ans = mini2[f, a, m, f1, count - 1];
    If[ans <= mf2,
     ans,
     Min[ans, mini2[f, m, b, f2, count - 1]]]),
   (ans = mini2[f, m, b, f2, count - 1];
    If[ans <= mf1,
     ans,
     Min[ans, mini2[f, a, m, f1, count - 1]]])]]
..............
see h:/papers/interval.tex re Taylor and affine centered forms, completing the square for quadratics, and references.
June, 2006 RJF
see ninterval.lisp and minterval.lisp for implementations.