%latex
\documentclass{article}
\usepackage{fullpage}
\title {Building Algebra Systems by Overloading Lisp}
%% BASOL ??
\author {Richard Fateman\\
Computer Science\\
University of California\\
Berkeley, CA, USA}
\begin{document}
\maketitle
\begin{abstract}
Some of the earliest computer algebra systems (CAS) looked like
overloaded languages of the same era. FORMAC, PL/I FORMAC, Formula Algol,
and others, each took advantage of a pre-existing language base and
expanded the notion of a {\em numeric value} to include {\em
mathematical expressions}. Much more recently, perhaps encouraged by
the growth in popularity of C++, we
have seen a renewal of the use of overloading to implement
a CAS.
This paper makes three points. 1. It is easy to do overloading in
Common Lisp, and show how to do it in detail. 2. Overloading {\em per
se} provides an easy solution to some simple programming problems. We
show how it can be used for a ``demonstration'' CAS. Other simple and
plausible overloadings interact nicely with this basic system. 3. Not
all goes so smoothly: we can view overloading as a case study and
perhaps an object lesson since it fails to solve a number of
fairly-well articulated and difficult design issues in CAS for which
other approaches are preferable.
\end{abstract}
\section{Introduction}
Given a programming language intended to manipulate primarily numbers,
arrays of numbers, and other data objects such as strings, we can
consider extending the language to include a new set of objects---let
us initially consider them ``uninterpreted symbols''---e.g. $x$, $y$,
$z$. We then extend to language to allow using these symbols and
``algebraic expressions'' involving them in as many contexts as
possible. In particular these expressions can often be used in place of
numbers. Just as we can routinely add $2+2$ to get $4$, the extended
language should be able to add $x+x$ to get $2 x$. The nice analogy
may break down in any of several directions. For example, if you wish
to perform a loop exactly $s^2+3$ times, then $s^2$ must actually be an
integer.
Another hazard appears in comparisons when the program must
resolve whether $(x-y)(x+y)$ is
equal to $x^2-y^2$, when neither is, in fact, a number.
Does equality mean ``identical in form'' or
``mathematically equivalent''? A numerical program which, incidental to solving a
linear system looks for a ``non-zero'' pivot (or a ``large'' pivot)
may not be simply run, unmodified, on symbolic expressions.
{In addition to such serious issues, there are a plethora of minor
issues, which must be balanced against the considerable benefits that
accrue simply by allowing much of the design and implementation of a
mature language to be inherited. When the symbolic extension idea has
nothing special to contribute, the default traditional programming language semantics
take advantage
of compiler optimizations, etc.
Not only (simple fixed-length) arithmetic
but other aspects of the host language: procedures, scope,
linkage to subroutine libraries, array indexing, loops, error
handling are still present.
This unchanged base is a key advantage to designers facing the burden
of a CAS design. It relieves the CAS designer of many decisions by
providing an immediately available default behavior in the host
language. These behaviors may have benefited from an initial design
by experienced language experts, and in some cases may have evolved
substantially through informed criticisms by implementers and users,
as well as the occasional standardization committees.
The more elaborate CAS view of (say)
addition and multiplication must ``shadow'' the standard arithmetic,
{\em but only when it comes to manipulation of mathematical symbols and
expressions.}
The overloading mechanism does not determine the meaning of
operations solely in the symbolic expression domain, it only makes
it possible to append this to the previous domain.
Some of this is relatively arbitrary; this point is made nicely in the introductory
computing text by Abelson and Sussman \cite{SICP} that data-directed
abstraction solves interaction between relatively independent modules.
Other design decisions must be made when operators have ``one foot in
each camp.''
{Returning again to the text by Abelson and Sussman \cite{SICP}
we can find a system that is similar in spirit to what we will
discuss here:
it uses generic arithmetic with tagged objects to build an arithmetic system
in Scheme/Lisp, starting with integers and rationals, to include complex
numbers (in two forms), and eventually a polynomial algebra system. This
text also explains the failure of a pure ``tower'' data structure to accomodate
the abstractions needed, a point revisited below.}
There are several downsides to trying to graft a CAS to a primarily
numerical language. Most such programming languages are insufficiently
dynamic to, say, harbor the notion of a mutable run-time collection of
variables. Consider how you might approach, in C or Java the
construction at runtime of a {\em new symbol variable} that didn't
appear in the program text. Could you set it to a value, and then use
it as an array index? Or in one of these languages how
could you construct a subroutine whose body, say $\sin(1/(x+1))$ is itself
expression which is the result of a symbolic computation?
While some of these issues can be overcome to some extent by using a
more flexible base language (like Common Lisp), there are still
mismatches to a more advanced notion of what a CAS should handle. In
particular, a serious concern is that the rich relationships among
mathematical algebraic constructs (groups, rings, fields, polynomials,
etc) simply cannot be encoded in any plausible way in the usual base
language. These difficulties suggest that we should have a design for
a language explicitly representing such issues, a language such as Axiom
\cite{jenks}. This criticism may strike you as being unredeemably
vague, so we will propose a few illustrations---certainly on the
simple side, but perhaps lending some clarity to this issue, before
going further with our overloading explanations.
Let us make up a class we will call PCR[x], of polynomials in the
variable $x$ over the complex rationals, where each instance is
represented as a vectors of coefficients of pairs of rational numbers
(real and complex part) and where a rational number itself is a pair
of integers (numerator and denominator), and each integer is
potentially of arbitrary length. How does $z$, the zero polynomial in
PCR[x], which is arguably a vector of no elements, relate to the
complex number $0+0i$ the rational number $0/1$ or the integer 0? Are
they, for example, supposed to be the same {\em equal} data structure,
even though they are all different types and shapes? One is a 32-bit
integer, another is a pair of integers, one is an empty vector, etc.
Another example: When you add $a+b i$ to $1/2$ do you get $
(2\,a+1+2\,b\,i)/2$ or $(2a+1)/2+b i$ or something else?
Writing a program without clarifying these decisions is an invitation
to chaos. There is a choice in writing such programs of using any
programming language at hand, or using a system whose design
includes keeping track of such issues; this is the point of certain CAS,
most notably Axiom, a CAS implemented in Aldor (which was at least
initially implemented in Lisp).
% Others: MuPAD, Gauss subsystem in Maple, and the academic
%systems Newspeak (Foderaro), Weyl (Zippel).
}
\section{A bit of history}
\begin{quote}
We learn from history that we learn nothing from history.\\
\hskip 2.0 in --George Bernard Shaw
\end{quote}
%\begin{quote}
% Those who cannot learn from history are doomed to repeat it.
%\hskip{2in}--George Santayana
%\end{quote}
Authors of one system based on overloading C++ claim on their web
page, `` Its design is revolutionary in a sense that contrary to other
CAS it does not try to provide extensive algebraic capabilities and a
simple programming language but instead accepts a given language (C++)
and extends it by a set of algebraic capabilities.''
%[http://www.ginac.de/]
In fact, overloading, at least conceptually, is probably the oldest
approach for building a CAS. FORMAC was built as an overloaded
FORTRAN, and is probably the oldest computer algebra system (described
in a 1966 paper by Tobey \cite{tobey}). FORMAC was subsequently upgraded
to be an overloaded PL/I. Another historically interesting system was
Formula Algol (also 1966, see Perlis \cite{perlis}). For
a survey of these and other early systems see Hulzen\cite{hulzen}.
It is true that these early host languages did not necessarily
have extension
mechanisms for syntactically supporting overloading by writing
in the same language: new features were
added by writing subroutine libraries and changing or entirely
rewriting the compiler for the host language. Yet {\em to the user}, FORMAC
looked like FORTRAN (or PL/I) with some extra features. Systems of
macro-preprocessing programs for overloading languages (usually
FORTRAN) appeared only slightly later. A good example is the 1976
paper by Wyatt \cite{wyatt} which describes a use of the Augment
preprocessor for adding extended precision arithmetic.
The new aspect, if there is one, is that overloading has
been made easier. C++ (which emerged in period
1983-1985, and was standardized in 1988) has raised the level of
attention to overloading, including syntax extension,
by incorporating it into what became a popular host language. It is
not necessary to rewrite parts of the compiler.
\section{Why Lisp}
It turns out to be easy (subject to observing a few tricks) to do
overloading of arithmetic in ANSI Standard Common Lisp to accomodate
``new'' kinds of numbers. The usual argument for defining a new style
of generic arithmetic (for Lisp or other overloaded languages) is that
other programs, past, present and future in that language can directly
access these extensions with little fuss or bother. The analogy with
numbers does not always work, as we have already explained, but
opening up a different access route to programs already written in
Lisp is attractive. For example, one can easily write a numerical
quadrature program which can be handed a {\em symbolic} program and
can call a Lisp-language, free, open-source implementation of a
symbolic integration algorithm. Lisp has a long history of being able
to use dynamic loading of libraries even if they were originally
written in other languages; this advantage is no longer so striking as
more recently designed languages have adopted this technology.
%Thus overloading can include a scripting functionality for libraries
%including numerical subroutines, CAS routines written in C++, and
%other topics as well.
%graphics, web page management, and
This last argument does not really distinguish Lisp from Java or Python or C++,
but it lead us to a certain asymmetry.
Overloading ``+'' in C++ means that you can use {\tt a+b} for
{\tt StrangePlus(a,b)}. In Lisp the change is from {\tt (+ a b)} to
{\tt (StrangePlus a b)}, or even: {\tt (in-package StrangeArithmetic) (+ a b)}
and thus not so dramatic. But note that a C++ gcd algorithm would be invoked
as {\tt gcd(a,b)} or perhaps {\tt gcd(a,b,target)} but Lisp uses prefix for
{\em everything} and so uses {\tt (gcd a b)}.
There are more distinguishing characteristics. Some ideas that are
easy to implement in Lisp because it is far more dynamic in nature
these languages. For example, in Lisp a new program can be constructed
out of Lisp data, compiled into machine code on the spot and then
immediately called. We use this feature repeatedly and routinely in
setting up some of our generic arithmetic programs.
Each of the extension programs is small, but collectively they represent
a core which can be extended by a programmer who need not be a
specialist. A programmer should need only a short education, a brief
introduction, and a few examples.
It is also convenient in some cases that Lisp supports the mathematical
abstraction of integer (not integer modulo $2^{32}$), as well as
exact rational numbers, as well as floats, double-floats,
and complex numbers.
\subsection{Approaching Overloading Arithmetic in Lisp}
In Common Lisp one could (after ignoring the urgent pleas
of the evaluator that you are possibly doing grave damage!) just redefine
``+''. This is not a good idea: it is better to leave the standard
function there, but restrict its domain to ordinary numbers. To do this we
define a new {\tt package}, say
``GenericArithmetic'' or {\tt :ga} for short, in which the definition
of ``+'' in the {\tt common-lisp} or {\tt :cl} package is {\em
shadowed}. In the {\tt :ga} package we can then define methods for
``+'' of various objects (e.g. mathematical indeterminates or
expressions involving them.) After fiddling with various alternative
techniques, we found it is most convenient to create them as members of a class
of distinct objects, instances of a newly-defined type specific for
symbolic math. We call this type {\tt ma}. The operations such as
``+'' on such objects will default to the {\tt :cl} ``+'' when the
objects are simply Lisp numbers, but we are free to define them as
we wish for {\tt ma} or mixtures of {\tt ma} and numbers.
{\subsection{Lisp Arithmetic is N-ary} {A subtlety in the Lisp system
is that +, -, *, / all take a variable number of arguments. To build
these {\em n-}ary versions we decompose them into calls to
two-argument generic routines {\tt two-arg-+} etc., and provide default
cases if only one or zero arguments are supplied.
Since the set of methods for each of the arithmetic operators follows
a similar template except for the name of the operator and operations
concerned with the identity under the operation when one or zero
arguments are supplied, they can (and are) defined in one
macro-definition, used 4 times.}
A similar n-ary situation holds for comparison functions, where Lisp
requires that, for example, {\tt (< 1 2 3)} returns true, since the
operands are monotone increasing. We set out a situation in which {\tt
two-arg-<} is defined.
The code for all this is about 80 lines. Another 20 is used to declare
all the names of overloaded functions. But all this does is provide
an empty skeleton: all the arithmetic defaults to {\tt :cl}. Loading
a detailed package provides the ``worker'' functions for combining special
types and numbers.
Note that we have not said much about the arithmetic, except that
it inherits from Common Lisp. In building a CAS we might wish
to know that addition of elements of certain domains is commutative.
%As is common with most languages ....
%um, not for floats!
%[2] aml(61): (+ 1.0 -1.0 1.0e-20)
%1.0e-20
%[2] aml(62): (+ 1.0 1.0e-20 -1.0)
%0.0
{\subsection{How About Symbolic Math?}
The fundamental idea that we found vastly simplified our thinking for
overloading Lisp for ``symbolic mathematical'' computation is that we
made a distinction between these objects:
\begin{itemize}
\item The Lisp symbol
{\tt x}. This symbol can be quoted as {\tt 'x}, and has some set
of properties relating to its use in the Lisp programming language.
For example one can set it to a value by {\tt (setf x 3)}
or define it as a function by {\tt(defun x(u)(+ u 10))} or place pointers
to its symbol-table entry into a list of three symbols {\tt (list 'x 5 'x)}.
It also has a print-name and a package name.
\item
The mathematical symbol $x$.
We distinguish this mathematical object from anything else in
the programming system by wrapping it in a Lisp structure, {\tt ma}.
We must be explicit when we cross the boundary from ordinary Lisp to {\tt ma}:
Initially we make a math object $x$ by {\tt (ma 'x)}. We appear to cross
the boundary in the opposite direction,
from {\tt ma} to Lisp when we display an object as, say, a character string.
Actually, we just provide a printing method for {\tt ma} objects.
\end{itemize}
This realization was important in that the traditional temptation (rarely resisted) for
Lisp programmers is to make mathematical ``stuff'' out of ordinary Lisp symbols and lists.
It is still possible to do so,
but our view now is that memory is cheap and computing is fast; to be
careful, it is worthwhile to embed every result
in a {\tt ma} wrapper for ``public consumption.''
{Given this framework, we proceed to
implement the two-argument {\em symbolic} arithmetic operations piece by piece.
Adding one {\tt ma} object to another requires extracting their
contents, making a new list headed by + of the contents, and wrapping
it as a math object. {\tt (+ (ma 'x)(ma 'y))} is then equivalent to {\tt (ma '(+ x y))}.
Programming the addition of a number to a math object requires two methods,
depending on the order of the inputs. They work like this:
{\tt (+ (ma 'x) 3)} is changed to {\tt (ma '(+ x 3))}.
Given two numbers, naturally we just add them the
Common-Lisp way, by inheritance of the ordinary method. The {\tt :ga}
specifies that, and so that does not require any program.
Continuing the project to cover all arithmetic in Common Lisp, we
can implement all single-argument Lisp arithmetic functions by the same
macro-expansion operation if we agree that, for the moment, any
conventional operation (say, sin) on {\tt ma} objects will merely
result in an {\tt ma} object whose ``inside'' consists of {\tt (sin
x)}. }}
\subsection{Reading and Printing math}
{The programmable nature of the Lisp reading program
allows us to attach the {\tt ma} constructor to a special reserved single
character, and also make the quote mark unnecessary.
We choose \%, so
that {\tt \%(+ x y)} is analogous to the Lisp quote mark in {\tt '(+ x
y)} except instead of making a ``quoted object'' {\tt (+ x y)}, it
makes a {\tt ma} or ``mathematical object'' $x+y$.}
In our early version of the symbolic math package, a newly constructed
{\tt ma} object {\tt \%x} would display like this:
\verb|#S(ma :e x :simp nil)|. That is, its printed display would
reflect the fact that it was a Common Lisp {\tt ma} structure with two
components, the first of which is
the ``contents'' or expression or ``e'' part.
The second part we haven't mentioned before, but it is a ``simp'' component,
bound to a symbol or nil telling whether the ``e'' part
was simplified or not. For fun, we have provided a
print method for the type {\tt ma}--- a program that reaches into the form,
grabs the ``e'' component, and prints it as a string of characters
reflecting the math object in {\em infix} form. This is done by\\
{\tt(defmethod print-object ((a ma) st am)(format st "~a" (p2i (ma-e a))))\\ }
where {\tt p2i} is a program which tranverse an ordinary prefix lisp expression
in infix order, writing a string.
If the result for the above calculation is the math form
of {\tt (+ x y)},
then it is printed as {\tt x+y}. We can push this hack just a bit
further and output \LaTeX\ code instead, and insert it into this
paper. Thus we can print it as $x+y$.
Observe now that {\tt (+ 1 \%y)} produces $1+y$, but
{\tt (+ x \%y)} is an error: x is an unbound variable.\\
{\tt (+ 'x \%y)} is an error: we have no method ``+'' for objects of classes
({\tt symbol} and {\tt ma}).\\
This illustrates the important fact that the Lisp symbol {\tt x} is
not a stand-in for the mathematical object $x$. That is, they are
separate worlds unless we deliberately convert one to the other.
After shadowing all the (finite number of) arithmetic operations
defined in the Lisp language, we can declare success and go home.
We've used only about 100 additional lines of code (plus 20 lines
for declaring a package mostly importing from {\tt :ga}).
The prefix-to-infix translation is about 70 lines of code.
Actually, if our goal is to merely demonstrate the possibility of generic
arithmetic, we should quit at this point.
If we want it to be useful, we are not yet done.
\medskip
A potential user might try a few examples like\\
{\tt (+ 0 \%x)} and get $0+x$\\
{\tt (+ \%x \%x)} and get $x+x$.
These examples seem to us as not really satisfactory, especially if
the programs we intend to write have tests to see if a sequence of
arithmetic operations has reduced a math value to zero\footnote{Note,
reducing a number to zero with numeric operations works the same as
before.}. One direction is to apply a {\em simplification program for
mathematical objects}. We need to decide when to apply this operation:
One choice might be to simplify expressions only before printing. This
might be adequate for a demo, but for extensive calculation, each
construction of a {\tt ma} object from constituents would sit in
memory, and perhaps eventually just fill it up.
Also, if there were any tests, say subtracting two symbolic
expressions to test for equality, we might miss a calculation whose
complicated result is actually equivalent, after simplification, to
$0$. Recognizing this equivalence might be critical for terminating a loop.
For this reason we probably need for the simplification to be
done at least when expressions are compared. It might be prudent
to also simplify when the programmer requests it.
If we decide to simplify every time an operation is performed,
we could then replace methods of just constructing trees with
more sophisticated versions, say that {\tt(sin (* $n$ pi))} for
integer $n$ would be zero. Programming such methods takes us out of
the generic programming motif: how should should programs
be written? Some people advocate pattern matching, a
paradigm supported by Lisp, but not usually implemented by overloading.
We also need to know whether that
pattern would work, not only for $n=3$ but also for a situation in
which $n$ can be deduced to be ``an integer'' though {\em which one}
is unspecified. Simplification is discussed below.
\subsection{Simplification}
We provide a very brief survey of some of the possible strategies for
simplification.
\begin{itemize}
\item There are many variations on simplification. Some are essentially
``demoware'': easy to write, cheap, fast, and adequate for simple examples.
Some are more difficult to write, sometimes more expensive
to run, but generally far more effective. No method
is completely effective for ``all expressions''\footnote{There are ``undecidability''
theorems
to this effect.}.
\item We haven't defined ``effective''. One definition would require
that if $E(x)$ is 0 for all values of $x$, then an effective
simplifier must reduce $E(x)$ to zero. Except for expressions that are composed
of a very limited repertoire of operators, there are no effective zero-equivalence
algorithms.
\item Nevertheless, there are different methods that do more or less
depending on domains and the intentions for further processing.
Since one method does not dominate all others, it must be possible
to choose amongst them (or perhaps try them all!).
\item Even the cheap methods become slow if they are implemented naively.
\item To implement even the cheap methods as expressions scale up
in size, efforts must be made to mark ``already simplified'' subexpressions.
\end{itemize}
There is a substantial literature on the topic of simplification.
A {\em simple-minded simplifier} {\tt
simpsimp} we wrote initially for MockMMA and then modified for Tilu (1996),
has been further modified slightly to show how such a facility can be
integrated into this design. A second (polynomial and rational
function simplifier) from the same MockMMA code base provides {\tt ratexpand}
and {\tt ratsimp}. Each of these cancels common factors and
can determine algorithmically if a polynomial is zero or not.
The {\tt ratsimp} program generally keeps factors it encounters
explicitly when it can, which is a benefit to some manipulations. The {\tt ratexpand}
program provides a canonical form based on a ratio of ``expanded''
polynomials.
In this situation of multiple simplifiers, we use the ``simp'' part of the
{\tt ma} ``wrapper'' to mark that the object contained in
the {\tt e} field is simplified, and by which simplifer. We could
also record some settings
of parameters in the simplifier at the time this was
done, as well as other information that might be convenient at a later
time, including a list of the variables in {\tt e}.
\subsection{Evaluation}
The idea of evaluation is to associate some indeterminate,
say $x$ with a value, and use that value consistently
in an expression $E$ instead of $x$.
The value for $x$ could be, for example, the number 3 or the
math expression $y-45$. Evaluating $E= x-y+45$ with $x=y-45$ should provide
$(y-45)-y+45$, or after simplification, $0$.
One model for evaluation whose semantics is commendably clear, is that
of substitution for a specific variable. This model inserts a copy of
the new value in the expression $E$ each place there is an
occurrence\footnote {Technically, we should point out that
this is for free, not
bound occurrences.} of $x$. It might be a good idea to
re-simplify. We insist that the new and old ``values'' must be
mathematical objects, as distinct from Lisp symbols. Overloading the
conventional Lisp {\tt subst} program is a possibility, but it turns
out to have such a plethora of options so that it seemed prudent
to call the math
substitution program {\tt masubst}. Thus {\tt (masubst \%(- y 45) \%x
\%(+ x (- y) 45))} produces a value equivalent to $0$.
Another model, whose semantics tends to be murkier is to reconsider the
expression E, at all levels, identifying with each and every name in the
Lisp system, variables or functions, the corresponding math variables
or functions. Some of the murkiness is whether you really want to ``evaluate''
$\sin \pi$ using Lisp's value for $\pi$ in which case you get
1.2246063538223773d-16, but is only an approximation, or if you want
to {\em simplify} the same object to the exact value 0. In any case,
the function {\tt meval} implements this model.
\subsection{Input Parsing}
There are many parsers for ``infix math'' written in Lisp, and
almost any of them could be used in conjunction with this
system. As a sample, we show how to use a particular parser by
attaching a program to
a symbol (here we use \$) so that the rest of the line following
that symbol is parsed as infix mathematics into Lisp. By prefixing that
with a \%, we convert it to our ``math'' representation.
For example,
\begin{verbatim}
: (setf trythis %$sinx^2+45sin^2x+abcosc
)
sin(x^2)+a*b*cos(c)+45*sin(x)^2
: '$q(a,b,c):=-b+sqrt(b^2-4ac)/2a
(defun q (a b c)
(+ (- b) (/ (sqrt (- (expt b 2) (* (* 4 a) c))) (* 2 a))))
: $cos pi
-1.0d0
\end{verbatim}
The input for this particular parser is closer
to spoken or written mathematics than (say) FORTRAN. First developed as
part of Tilu, it includes as a subset a traditional infix language, but
has enough additional ambiguous grammar rules for dealing with
adjacency of symbols that people who leave them out may be
pleasantly surprised to see the right thing sent back out to them.
Ambiguous input like {\tt a(x+y)} provides two parses,
the function application and the product{\tt a*(x+y)}. The user has an
opportunity to choose. The user may also choose to permit multi-character
identifiers or add new names for ``built-in'' functions. It is based on
Norvig's \cite{paip} context-free parser.
\section{The Downside of Overloading in Lisp}
The nature of arithmetic computation in Lisp is that, for the most part,
operators like ``+'' and ``*'' need know nothing about their operands
until after the operands are evaluated. Only then are their types
available for consideration. Therefore if you use Lisp operators
always for simple arithmetic on double-precision floats, but keep
this knowledge a secret from the Lisp compiler, Lisp will
still make arithmetic-type discrimination decisions at runtime
repeatedly and wastefully. Lisp supports a method to {\tt declare}
types in Lisp, in which case the compiler will produce fairly
conventional and fast assembly code. The tradeoff is that such program
and will not work correctly when fed (say) arbitrary-precision
integers. In our programs, the distinction between (say) a ``symbolic
math object'' and an ordinary Lisp integer must be made at runtime
because, in fact, the same operator program must deal with either kind
of argument, and ordinarily will not be able to tell which will occur
in advance.
By a certain amount of cleverness, some specialization may be applied
to eliminate or reduce runtime checking by the object system
compiler. A far more thorough approach to such specialization,
requiring essentially that we generate code specific to the types of
arguments, can be based on our ``application based''
knowledge. Inserting such code is facilitated by Lisp's
macro-expansion capability. By additional analysis at compile time,
when the full power of the Lisp system is also available, we can write
fully ``type-declared'' and specialized code.
{\section{Composing Overloads}
We have actually built various other overloads on top of generic
arithmetic before building the CAS features; later we came to see the
{\tt :ma} package as a good vehicle for explaining what we have done.
This section interweaves some of those concrete examples
especially showing the prospect of using {\em several types simultaneously}.
This is most likely to provide the positive demonstrations
needed to convince the reader of the value of this approach.
In these examples we show that combining special
operations {\em can be} as simple as writing programs that
simple compose the ordinary Lisp functions, and execute what appears
to be normal code, though in some respects it differs, using novel
operations.
Here is a recursive definition of Legendre polynomials of the first kind.
\begin{verbatim}
(defun p(m x)(cond ((= m 0) 1)
((= m 1) x)
(t (* (/ 1 m) (+ (* (1- (* 2 m)) x (p (1- m) x))
(* (- 1 m) (p (- m 2) x)))))))
\end{verbatim}
Whether this function takes symbolic or solely numeric arguments depends
on whether {\tt p} is defined in a package which inherits the symbolic
arithmetic operations.
We introduce the definition of {\tt p} here so we can use it in the next
section.
\subsection*{Automatic Differentiation (AD)}
In the model of arithmetic introduced for Automatic Differentiation
\cite{Griewank89,Griewank91}, an ordered pair $$, represents
the value of a function of $x$, say $f(x)$ at a particular (but
unstated) point $p$, i.e. $a=f(p)$. The second value in the pair
represents the derivative of $f$, also evaluated at $p$,
i.e. $b=f'(p)$ in the usual, but uncomfortable, notation for this
concept. We have written a package to compute with these pairs. The
operations are keyed about data objects which we manufacture using a
constructor {\tt df}. A more extensive discussion of AD including
elaborations to multiple variables, higher-order derivatives,
applications and algorithms, is beyond the scope of this short
paper. Nevertheless, here's an example of the interaction.
\begin{verbatim}
: (p 3 1/2) ;exact value of legendre[3,1/2]
-7/16
: (p 3 0.5) ;single-float value
-0.4375
: (p 3 (df 0.5 1.0)) ;single-float value, as well as derivative
<-0.4375, 0.375>
: (p 3 (df 0.5d0 1.0)) ;double-float value and derivative
<-0.4375d0, 0.375d0>
: (p 3 (df 1/2 1)) ;exact value and derivative
<-7/16, 3/8>
: (p 3 \%x) ;exact value in terms of x
(1/3)*((5*x)*((1/2)*((-1)+(3*x)*x))+(-2)*x)
: (simp (p 3 \%x)) ;exact value, somewhat simplified
(1/3)*((5/2)*x*((-1)+3*x^2)+(-2)*x)
: (simp (p 3 (df \%x 1)));exact value and derivative, simplified
<(1/3)*((5/2)*x*((-1)+3*x^2)+(-2)*x),
(1/3)*((-2)+15*x^2+(5/2)*((-1)+3*x^2))>
: (ratexpand (p 3 (df \%x 1)))
<(1/2)*((-3)*x+5*x^3), (1/2)*((-3)+15*x^2)>
;same answer, using canonical rational expansion
\end{verbatim}
There are a few more kinds of arithmetic overloads that have some
popular support and could easily be implemented on top of the {\tt
:ga} package. Some are indicated in additional sections below.
\subsection*{Intervals}
Real intervals $[a,~ b]$, represent all numbers $x$ where $a \le x \le
b$ for real numbers $a$ and $b$. Should $a$ and $b$ be allowed to be
{\tt ma} objects? Probably not unless those {\tt ma} objects also
happen to be specific real numbers like $\pi$ or $\sqrt{2}$ which
can be compared. Perhaps unions of real intervals
should be allowed, e.g. this is the case in Maple and Mathematica.
The literature on ``reliable computing'' has grown to describe related
techniques.
This can be combined with AD in particular for
computing a Newton-type iteration modified for intervals.
\subsection*{Bigfloats}
Arbitrary-precision floats or bigfloats are software-based extensions
of the floating-point idea, with fraction and exponent fields represented by
variable length fields. Various implementations
have been available for 35 years or more, but it appears that
one of the most widely used is the Gnu GMP package. These probably
fit well within the generic framework, although coercions in combination
with exact numbers or floats of different precision must be developed.
Within the bignum design there are several possible variations that
could be encoded, including fixed-size fraction and extra-large
exponent, or a sum-of-floats representation.
We have interfaced Lisp to the GNU multiprecision package (GMP) for
several projects in the past, and the arithmetic presents few
difficulties. The memory management is more of a puzzle, since we need
to consider when the memory for number which has been allocated by GMP
should be returned. We have implemented for this to be done by a Lisp
garbage collector, although not within the confines of the ANSI
standard.
\subsection*{Extended Rationals}
In an unpublished 1994 paper \cite{extrat}, we proposed to use the rational
forms 1/0 and 0/0 to extend the rational arithmetic in Common
Lisp. We programmed a ``projective'' model including an un-signed infinity and
a ``not a number''. This is useful for example in
allowing the evaluation of $a+b/c$ to $a$ in the case that $c$ is $1/0$.
It is not necessary to signal an error. We have also programmed
an alternative ``affine''
model which has four extra elements, -1/0, +1/0, 0/0, and a negative zero, (0/-1).
(The ordinary 0 is ``positive'' zero.)
This is comparable to those features in the IEEE 754 floating-point
standard. We have written elsewhere \cite{extrat} about the implications of
this change, especially for use in interval arithmetic.
\subsection*{Matrices}
A version of matrix arithmetic is possible, simulating, in effect,
the arithmetic of Matlab or similar programs. The elements of the
matrices could be conventional complex-float numbers, or almost
anything else that supports (not necessarily commutative) multiplication
and addition.
\subsection*{Complex Numbers}
Common Lisp already supports complex arithmetic as represented by
pairs of rationals and floats. It does not allow extended rationals,
intervals, mathematical expressions, etc. Building a new version of
complex numbers with such extensions is functionally straightforward,
but defining some of the content is a challenge, since we would need
to carefully observe the elementary functions and their branch cuts'
behavior with respect to (say) infinite imaginary part. An alternative
representation in polar coordinates has some attraction as well. We
can even create a polar form whose modulus and arg are symbolic
expressions from the {\tt :ma} package. resolving combinations of
different forms may require reference to the expected type of the
eventual storage of the result, though Lisp does not require or even
expect that variables have types. } {\section {Comments on
Overloading and Symbolic Math} We have already mentioned the inability
of overloading {\em per se} to resolve questions of how to structure a
system for doing ``computer algebra''. Even in our small example, we
illustrated embedding symbolic math in the automatic differentiation
(AD) pairs of {\tt df} structures (which works), or the reverse. This
latter choice makes almost no sense mathematically, since AD pairs
must always be considered with respect to a particular variable.
Given this restriction, we didn't program it.
On the other hand, it would be trivial to write a
few more methods and {\em appear to do so}. Nothing in the generic
arithmetic framework would prevent it. Even among the programs we
wrote, we can ask if the existing methods in AD form, in some sense, a
complete coherent whole. Did we really do the right thing for computing
a derivative of $|x|$?
We cannot tell for sure: the tools of ``data structuring'', objects,
and an inheritance hierarchy are insufficient guidance for structuring
mathematical categories and their relationships, and thus overloading
a conventional language soon reaches limits.
Some of the more pressing advanced problems in applying CAS to
problems in applied mathematics require yet another, perhaps
orthogonal kind of reasoning, not about data types or mathematical
categories, but about relationships of values. For example dealing
with geometric or range assumptions in finding solutions to algebraic
or differential equations, domains of integration problems, or
parameter spaces in (for example) reasoning about matrices.
Some aspects of these issues may be solvable by solution space
decomposition based on systems of polynomial equations. Such
techniques fail when log, exp, or other functions are introduced to
the mix. These may require something akin to theorem proving for
resolution, and are at an even further remove from the overloading
technology discussed here.
While we are powerless to prevent programmers from constructing yet
another C++ overloaded library to provide CAS capabilities, we hope
that these comments will provide some insight in case it is
attempted. We are encouraged that some people have actually used
others' C++ library (Ginac in particular). Setting out to build a
``new'' library may take years, and history has shown than
much of that time and all of the funding can be occupied by
simply repackaging the ``easy'' parts. Here we have recycled code
that was originally written years ago.}
{\section {Conclusion}
Generic arithmetic can be easily supported in Common Lisp. We show
how it can support one application: symbolic mathematical expressions, and briefly
introduce automatic differentiation. These results are not
surprising, though we hope that our presentation has some novelties.
What is perhaps surprising is that no one has written this paper
earlier, since the Lisp language is such a fine host for this effort.
We also illustrate that, as neat as it can be for particular calculations,
the simple approach of build a CAS via ``libraries plus overloaded language''
does not provide a convenient framework for advancing important representation issues:
the typical object-oriented data structure design typically allows the construction
of nonsensical compositions. While it is possible to avoid the resulting difficulties,
the first step is to realize that these problems are ignored at one's peril.
}
\section* {Appendix G: Generic Arithmetic, ga, ma}
All the files are available in \verb|http://www.cs.berkeley.edu/~fateman/generic|.
The {\tt :ga code}
\begin{verbatim}
ga.lisp
;; Generic Arithmetic package
;; based on code posted on newsgroups by
;; Ingvar Mattsson 09 Oct 2003 12:16:09 +0100
;; code extended by Richard Fateman, November, 2005
(eval-when '(compile load) (load "packs"))
(provide "ga" "ga.lisp")
(in-package :ga)
(defun tocl(n) ; get corresponding name in cl-user package
(find-symbol (symbol-name n) :cl-user))
;; a tool to manufacture two-arg-X arithmetic for X=+,*, etc
(defmacro defarithmetic (op id0 id1) ;id0 = no args, id1 =1 arg
(let ((two-arg
(intern (concatenate 'string "two-arg-" (symbol-name op))
:ga ))
(cl-op (tocl op)))
`(progn
(defun ,op (&rest args)
(cond ((null args) ,id0 )
((null (cdr args)),id1)
(t (reduce (function ,two-arg)
(cdr args)
:initial-value (car args)))))
(defgeneric ,two-arg (arg1 arg2))
(defmethod ,two-arg ((arg1 number) (arg2 number))
(,cl-op arg1 arg2))
(compile ',two-arg)
(compile ',op)
',op)))
(defarithmetic + 0 (car args))
(defarithmetic - 0 (two-arg-* -1 (car args)))
(defarithmetic * 1 (car args))
(defarithmetic / (error "/ given no args") (car args))
(defarithmetic expt (error "expt given no args") (car args))
;; defcomparison is a tool to manufacture two-arg-X numeric comparisons.
;; CL requires comparison args to be monotonic. That is in lisp, (> 3 2 1) is true.
(defun monotone (op a rest)(or (null rest)
(and (funcall op a (car rest))
(monotone op (car rest)(cdr rest)))))
(defmacro defcomparison (op)
(let ((two-arg (intern (concatenate 'string "two-arg-"
(symbol-name op)) :ga ))
(cl-op (tocl op)))
`(progn
(defun ,op (&rest args)
(cond ((null args) (error "~s wanted at least 1 arg" ',op))
((null (cdr args)) t) ;; one arg e.g. (> x) is true
(t (monotone (function ,two-arg)
(car args)
(cdr args)))))
(defgeneric ,two-arg (arg1 arg2))
(defmethod ,two-arg ((arg1 number) (arg2 number)) (,cl-op arg1 arg2))
(compile ',two-arg)
(compile ',op)
',op)))
(defcomparison >)
(defcomparison =)
(defcomparison /=)
(defcomparison <)
(defcomparison <=)
(defcomparison >=)
(define-symbol-macro ga::* cl::*);we want to keep top-level symbols * + /
(define-symbol-macro ga::+ cl::+); for the R-E-P loop.
(define-symbol-macro ga::/ cl::/)
;; these two macros use ga::+, not cl:+, but should compute complex "places" more carefully
(defmacro incf (x &optional (delta 1)) `(setf ,x (+ ,x ,delta)))
(defmacro decf (x &optional (delta 1)) `(setf ,x (- ,x ,delta)))
(defun re-intern(s p) ;utility to "copy" expressions into package p
(cond ((consp s)(cons (re-intern (car s) p)
(re-intern (cdr s) p)))
((symbolp s)(intern (symbol-name s) p))
(t s))) ;nil, number, string, other
\end{verbatim}
{\tt ma.lisp}
\begin{verbatim}
rather than insert ma.lisp code here/ visit web site
\end{verbatim}
The programs for converting prefix to infix expressions (in ma.lisp).
\begin{verbatim}
rather than insert p2i.lisp code here/ visit web site
\end{verbatim}
Some interfaces and some tests. There is a simplifer (too long and
not interesting), in the file {\tt simpsimp.lisp} with a program in
it called {\tt simp.}
\begin{verbatim}
insert examples from ma.lisp code
\end{verbatim}
The file simpsimp is about 350 lines of code. The rational
expression (ratio of polynomials) simplifier
in several variables, is larger, but much ultimately both
faster and more effective. It is about 1000 lines of code.
%\section*{Appendix 3 Dcomp}
%\begin{verbatim}
%;; code for dcomp.lisp about 291 lines, available on request.
%;; a more serious version generating code for backwards diff
%;; version of AD is also available.
%\end{verbatim}
%Code for compiling expressions via AD .
%Code for backward differentiation.
\begin{thebibliography}{99}
\bibitem{SICP}
H. Abelson, G. Sussman, {\em Structure and Interpretation of
Computer Programs,} 2nd edition 1996. MIT Press. Full text
available on-line.
http://mitpress.mit.edu/sicp/full-text/sicp/book/book.html
\bibitem{fateman98}
``A Short Note on Short Differentiation Programs in Lisp, and
a Comment on Logarithmic Differentiation,''
{\em SIGSAM Bulletin Volume 32, Number 3}, Sept., 1998,pp.~2-7.
\verb|www.cs.berkeley.edu/~fateman/papers/deriv.pdf|.
\bibitem{extrat},
R.~Fateman and Tak Yan,
``Computation with the Extended Rational Numbers
and an Application to Interval Arithmetic'' 1994//
\verb|http://www.cs.berkeley.edu/~fateman/papers/extrat.pdf|.
\bibitem{overload-AD}
R.~Fateman, ``Building Algebra Systems by Overloading Lisp: Automatic
Differentiation'', (submitted for publication).
%\bibitem{bdiff}
%R. Fateman, ``Backward Automatic Differentiation in Lisp,''
%(in progress)
\bibitem{Griewank89}
A. Griewank, ``On Automatic
Differentiation,'' in M. Iri \& K. Tanabe (Eds.) {\em MATHEMATICAL PROGRAMMING,}
Kluwer Academic Publishers, 1989, pp.~83-107.
\bibitem{Griewank91} A. Griewank and G. Corliss (eds),
{\em Automatic Differentiation
of Algorithms}, SIAM, 1991. esp. paper by L. Rall
\bibitem{hulzen}
J.A. van Hulzen and J. Calmet, ``Computer Algebra Systems'' in
{\em Computer Algebra Symbolic and Algebraic Computation}, edited by
B. Buchberger, G.E. Collins et al, Srpinger- Verlag, 2nd ed., 1983.
\bibitem{jenks}
Richard D. Jenks and Robert S. Sutor,
{\em {\sc AXIOM:} The Scientific Computation System,}
Springer-Verlag, 1992.
\bibitem{paip}
P. Norvig, {\em Paradigms of Artificial Intelligence Programming},
Morgan Kaufmann, 1992.
\bibitem{perlis}
A. Perlis, Itturiaga, R., Standish, T. ``A Definition of Formula Algol,''
in {\em Proc. Symposium on Symbolic and Algebraic Manipulation of the ACM},
Washington, D.C., March, 1966.
\bibitem{tobey}
R.~G.~Tobey,
``Experience with FORMAC algorithm design'',
Comm.~ACM 9 no.~8 (1966) p.~589--597.
\bibitem{wyatt}
W. T. Wyatt, Jr. D.W. Lozier, and D.J. Orser,
``A Portable Extended Precision Package and Library with Fortran
Precompiler''
ACM Trans on Math Software ISSN:0098-3500
1976, vol. 2 no. 3 p. 209--231.
\end{thebibliography}
\end{document}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;leftover stuff
\begin{verbatim}
(defun run-newt1(f guess &key (count 18)) ;; Solve f=0
;; This program looks only at the sequence of guesses.
;; though if the derivative goes to 0, we are stuck
;; in the newton step.
(let ((guesses (list guess))
(reltol #.(* 100 double-float-epsilon))
(abstol #.(* 100 least-positive-double-float)))
(dotimes (i 6)(push (ni f (car guesses))
guesses)) ;; make 6 iterations.
(incf count -6)
(cond ((< (abs (car guesses)) abstol)
(car guesses))
((< (abs(/ (-(car guesses)(cadr guesses))(car guesses))) reltol)
(car guesses))
((<= count 0)
(format t "~%newt1 failed to converge; guess =~s" (car guesses))
(car guesses))
(t (run-newt1 f (car guesses) count)))))
;; to run this, write
;; (run-newt1 'sin 3.0d0) ;; converges to pi.
;; (run-newt1 'test 3.0d0);; fails to converge
\end{verbatim}
;; Automatic Differentiation code for Common Lisp
;; Richard Fateman, November, 2005
;; This is all provided in the context of a Generic Arithmetic Package.
;; Package based in part on code posted on comp.lang.functional newsgroup by
;; Ingvar Mattsson 09 Oct 2003
(defpackage :ga ;generic arithmetic
(:shadow "+" "-" "/" "*" "expt" ;binary arith
"=" "/=" ">" "<" "<=" ">=" ;binary comparisons
"sin" "cos" "tan" ;... more trig
"atan" "asin" "acos" ;... more inverse trig
"sinh" "cosh" "atanh" ;... more hyperbolic
"expt" "log" "exp" "sqrt" ;... more exponential, powers
"1-" "1+" "abs"
)
(:use :common-lisp))
(in-package :ga)
;; df structure for f,d: f is function value, and d derivative, default 0
(defstruct (df (:constructor df (f &optional (d 0)))) f d )
;; print df structures with < , >
(defmethod print-object ((a df) stream)(format stream "<~a, ~a>" (df-f a)(df-d a)))
;;function ARITHMETIC-IDENTITY: When fed an operator and a non-nil
;;argument, it returns a value for unary application. What does (+ a) mean?
;;A nil arg means there were NO operands. What does (+ ) mean.
;;It is used only by defarithmetic, which in turn helps
;; us to write out + * - / of arbitrary number of args.
(defmacro arithmetic-identity (op arg)
`(case ,op
(+ (or ,arg 0))
(- (if ,arg (two-arg-* -1 ,arg) 0))
(* (or ,arg 1))
(/ (or ,arg (error "/ given no arguments")))
(expt (or ,arg (error "expt given no arguments")))
(otherwise nil))) ;binary comparisons?
(defun tocl(n) ; get corresponding name in cl-user package
(find-symbol (symbol-name n) :cl-user))
(defmacro defarithmetic (op)
(let ((two-arg
(intern (concatenate 'string "two-arg-" (symbol-name op))
:ga ))
(cl-op (tocl op)))
`(progn
(defun ,op (&rest args)
(cond ((null args) (arithmetic-identity ',op nil))
((null (cdr args))(arithmetic-identity ',op (car args)))
(t (reduce (function ,two-arg)
(cdr args)
:initial-value (car args)))))
(defgeneric ,two-arg (arg1 arg2))
(defmethod ,two-arg ((arg1 number) (arg2 number))
(,cl-op arg1 arg2))
(compile ',two-arg)
(compile ',op)
',op)))
(defarithmetic +) ;; defines some of + programs. See below for more
(defarithmetic -)
(defarithmetic *)
(defarithmetic /)
(defarithmetic expt)
;; defcomparison helps us generate numeric comparisons: 2 args
;; and n-arg. CL requires they be monotonic.
;; That is in Lisp, (> 3 2 1) is true.
(defun monotone (op a rest)(or (null rest)
(and (funcall op a (car rest))
(monotone op (car rest)(cdr rest)))))
(defmacro defcomparison (op)
(let ((two-arg (intern (concatenate 'string "two-arg-"
(symbol-name op)) :ga ))
(cl-op (tocl op)))
`(progn
(defun ,op (&rest args)
(cond ((null args) (error "~s wanted at least 1 arg" ',op))
((null (cdr args)) t) ;; one arg e.g. (> x) is true
(t (monotone (function ,two-arg)
(car args)
(cdr args)))))
(defgeneric ,two-arg (arg1 arg2))
(defmethod ,two-arg ((arg1 number) (arg2 number)) (,cl-op arg1 arg2))
(defmethod ,two-arg ((arg1 df) (arg2 df)) (,cl-op (df-f arg1)(df-f arg2)))
(defmethod ,two-arg ((arg1 number) (arg2 df))(,cl-op arg1(df-f arg2)))
(defmethod ,two-arg ((arg1 df) (arg2 number))(,cl-op (df-f arg1) arg2 ))
(compile ',two-arg)
(compile ',op)
',op)))
(defcomparison >) ;;provides ALL the comparison methods
(defcomparison =)
(defcomparison /=)
(defcomparison <)
(defcomparison <=)
(defcomparison >=) ;; that's all
;; extra + methods specific to df
(defmethod ga::two-arg-+ ((a df) (b df))
(df (cl:+ (df-f a)(df-f b))
(cl:+ (df-d a)(df-d b))))
(defmethod ga::two-arg-+ ((b df)(a number))
(df (cl:+ a (df-f b)) (df-d b)))
(defmethod ga::two-arg-+ ((a number)(b df))
(df (cl:+ a (df-f b)) (df-d b)))
;;extra - methods
(defmethod ga::two-arg-- ((a df) (b df))
(df (cl:- (df-f a)(df-f b))
(cl:- (df-d a)(df-d b))))
(defmethod ga::two-arg-- ((b df)(a number))
(df (cl:- (df-f b) a) (df-d b)))
(defmethod ga::two-arg-- ((a number)(b df))
(df (cl:- a (df-f b)) (df-d (cl:- b))))
;;extra * methods
(defmethod ga::two-arg-* ((a df) (b df))
(df (cl:* (df-f a)(df-f b))
(cl:+ (cl:* (df-d a) (df-f b)) (cl:* (df-d b) (df-f a)))))
(defmethod ga::two-arg-*
((b df)(a number)) (df (cl:* a (df-f b)) (cl:* a (df-d b))))
(defmethod ga::two-arg-*
((a number) (b df)) (df (cl:* a (df-f b)) (cl:* a (df-d b))))
;; extra divide methods
(defmethod ga::two-arg-/ ((u df) (v df))
(df (cl:/ (df-f u)(df-f v))
(cl:/ (cl:+ (cl:* -1 (df-f u)(df-d v))
(cl:* (df-f v)(df-d u)))
(cl:* (df-f v)(df-f v)))))
(defmethod ga::two-arg-/ ((u number) (v df))
(df (cl:/ u (df-f v))
(cl:/ (cl:* -1 (df-f u)(df-d v))
(cl:* (df-f v)(df-f v)))))
(defmethod ga::two-arg-/ ((u df) (v number))
(df (cl:/ (df-f u) v)
(cl:/ (df-d u) v)))
;; extra expt methods
(defmethod ga::two-arg-expt ((u df) (v number))
(df (cl:expt (df-f u) v)
(cl:* v (cl:expt (df-f u) (cl:1- v)) (df-d u))))
(defmethod ga::two-arg-expt ((u df) (v df))
(let* ((z (cl:expt (df-f u) (df-f v))) ;;z=u^v
(w ;; u(x)^v(x)*(dv*log(u(x))+du*v(x)/u(x))
;; = z*(dv*log(u(x))+du*v(x)/u(x))
(cl:* z (cl:+
(cl:* (cl:log (df-f u)) ;log(u)
(df-d v)) ;dv
(cl:/ (cl:* (df-f v)(df-d u)) ;v*du/ u
(df-f u))))))
(df z w)))
(defmethod ga::two-arg-expt ((u number) (v df))
(let* ((z (cl:expt u (df-f v))) ;;z=u^v
(w ;; z*(dv*LOG(u(x))
(cl:* z (cl:* (cl:log u) ;log(u)
(df-d v)))))
(df z w)))
;; A rule to define rules, a new method for df, the old method for numbers
(defmacro r (op s)
`(progn
(defmethod ,op ((a df))
(df (,(tocl op) (df-f a))
(,(tocl '*) (df-d a) ,(subst '(df-f a) 'x s))))
(defmethod ,op ((a number)) (,(tocl op) a))))
;; Add rules for every built-in numeric program.
;; Must insert the name in the shadow list too.
;; This is just a sampler.
(r sin (cos x)) ;; provides EVERYTHING ADIL needs about sin
(r cos (* -1 (sin x)))
(r asin (expt (+ 1 (* -1 (expt x 2))) -1/2))
(r acos (* -1 (expt (+ 1 (* -1 (expt x 2))) -1/2)))
(r atan (expt (+ 1 (expt x 2)) -1))
(r sinh (cosh x))
(r cosh (sinh x))
(r atanh (expt (1+ (* -1 (expt x 2))) -1))
(r log (expt x -1))
(r exp (exp x))
(r sqrt (* 1/2 (expt x -1/2)))
(r 1- 1)
(r 1+ 1)
(r abs x);; hm does this matter?
(defun re-intern(s p) ;; move expression to :ga package
(cond ((or (null s)(numberp s)) s)
((symbolp s)(intern (symbol-name s) p))
(t(cl-user::cons (re-intern (car s) p)
(re-intern (cdr s) p)))))
(defun cl-user::deval(r x p &optional (dx 1))
(ga::deval (ga::re-intern r :ga)
(ga::re-intern x :ga) p dx))
;; factorial with "right" derivative at x=1 to match gamma function
(defun fact(x) (if (= x 1) (df 1 0.422784335098d0) (* x (fact (1- x)))))
;; an integer (arbitrary! ) factorial
(defun ifact(x) (if (= x 1) x (* x (ifact (1- x)))))
;; compare fact, ifact to Stirling's approximation to factorial
(defun stir(n) (* (expt n n) (exp (- n))
(sqrt (*(+ (* 2 n) 1/3) 3.141592653589793d0))))
(defun ex(x)(ex1 x 1 15))
(defun ex1(x i lim) ;; generate Taylor summation for exp()
(if (= i lim) 0 (+ 1.0d0 (* x (ex1 x (+ i 1) lim)(expt i -1)))))
'(defun ex1(x i lim) ;; generate Taylor summation exactly for exp()
(if (= i lim) 0 (+ 1 (* x (ex1 x (+ i 1) lim)(expt i -1)))))
;; what you say depends on which package is current in your
;; top-level read-eval-print loop
;; in :user (setf one (ga::df 1 1))
;; in :user (deval '(dotimes (i 10)(print (fact (+ i one)))) 'x 'irrelevant)
;; in :ga (setf one (df 1 1))
;; in :ga (dotimes (i 10)(print (fact (+ i one))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#|Some timing results suggest that we lose about a factor of 10 in
speed just using generic arithmetic, even if we are using (say)
floating point numbers, if we are doing mostly calls and returns, but
only trivial arithmetic. Under these conditions, if we actually use df
numbers, it is perhaps a factor of two further.
The big slowdown here is in generic over the built-in (but still generic)
Lisp arithmetic that allows floats, doubles, rationals, bignums.
If we compile with declarations for fixnum types among the lisp built-ins,
we improved by another 30 percent.
(time (dotimes (i 10)(slowmul 1000000 2 0))) ;compiled 19.38s in :ga
(time (dotimes (i 10)(slowmul 1000000 2 (df 0)))) ; 24.45s in :ga
(time (dotimes (i 10)(slowmul 1000000 2 0))) ;compiled .37s in :user
(time (dotimes (i 10)(slowmulx 10000.0d0 2.0d0 0.0d0))); .38 in :user declared doubles
(time (dotimes (i 10)(slowmulx 1000000 2 0))) ; .29 in :user declared fixnums
(defun slowmul(x y ans)(if (= x 0) ans (slowmul (1- x) y (+ y ans))))
(defun slowmulx(x y ans)
(declare (fixnum x y ans) ;; or (double-float x y ans)
(optimize (speed 3)(safety 0)(debug 0)))
(if (= x 0) ans (slowmulx (1- x) y (+ y ans))))
|#
;; this is an interesting function that numerically computes (sin x)
(defun s(x) (if (< (abs x) 1.0d-5) x
(let ((z (s (* -1/3 x))))
(-(* 4 (expt z 3))
(* 3 z)))))
#| (s (df 1.23d0 1.0)) computes both sin and cos of 1.23.
(time (dotimes (i 100) (s (df 100000.0d0 1))) ) :ga 40ms
(time (dotimes (i 100) (s 100000.0d0 ))) :ga 20 ms
(time (dotimes (i 100) (s 100000.0d0 ))) :user 10ms av over more runs
;; we can compile this to run faster..
(defun ss(x) (declare (double-float x)
(optimize (speed 3)(safety 0) (debug 0)))
(if (< (abs x) 1.0d-5) x (let ((z (ss (* #.(/ -1 3.0d0) x))))
(-(* 4.0d0 (expt z 3))
(* 3.0d0 z)))))
(time (dotimes (i 100) (ss 100000.0d0 ))) :user 8ms av over more runs
|#
;; newton iteration (ni fun guess)
;; usage: fun is a function of one arg.
;; guess is an estimate of solution of fun(x)=0
;; output: a new guess. (Not a df structure, just a number
(defun ni (f z) ;one newton step
(let* ((pt (if (df-p z) z (df z 1))) ; make sure init point is a df
(v (funcall f pt))) ;compute f, f'
(df-f (- pt (/ (df-f v)(df-d v))))))
(defun ni2 (f z) ;one newton step, give more info
(let* ((pt (if (df-p z) z (df z 1)))
(v (funcall f pt))) ;compute f, f' at pt
(format t "~%v = ~s" v)
(values
(df-f (- pt (/ (df-f v)(df-d v)))) ;the next guess
v)))
(defun run-newt1(f guess &optional (count 18)) ;; Solve f=0
;; Look only at the guesses.
;; though if the derivative goes to 0, we are stuck
;; in the newton step.
(let ((guesses (list guess))
(reltol #.(* 100 double-float-epsilon))
(abstol #.(* 100 least-positive-double-float)))
(dotimes (i 6)(push (ni f (car guesses))
guesses)) ;; make 6 iterations.
(incf count -6)
(cond ((< (abs (car guesses)) abstol)
(car guesses))
((< (abs(/ (-(car guesses)(cadr guesses))(car guesses))) reltol)
(car guesses))
((<= count 0)
(format t "~%newt1 failed to converge; guess =~s" (car guesses))
(car guesses))
(t (run-newt1 f (car guesses) count)))))
(defun run-newt2(f guess &key (abstol 1.0d-8) (count 18)) ;; Solve f=0
;; Looks only at the residual.
(dotimes (i count
(format t "~%Newton iteration not convergent after ~s iterations: ~s" count guess))
(multiple-value-bind
(newguess v)
(ni2 f guess)
(if (< (abs (df-f v)) abstol) (return newguess)
(setf guess newguess)))))
;; dcomp
\begin{verbatim}
;;; -*- Mode: Lisp; Syntax: Common-Lisp; -*-
;; dcomp, use instead of /or with/ generic3.lisp for ADIL, automatic differentiation
;; Richard Fateman 11/2005
;; structure for f,d: f is function value, and d derivative, default 0
(defstruct (df (:constructor df (f &optional (d 0)))) f d )
(defmethod print-object ((a df) stream)(format stream "<~a, ~a>" (df-f a)(df-d a)))
(defvar *difprog* nil);; the text of the program
(defvar *bindings* nil) ;; bindings used for program
(defun dcomp (f x p)
;; the main program
(let ((*difprog* nil)
(*bindings* nil))
(multiple-value-bind
(val dif)
(dcomp1 f x p)
(emit `(values ,val ,dif))
`(let ,*bindings*
,(format nil "~s wrt ~s" f x)
,@(nreverse *difprog*)))))
;; Assist in compilation of a function f and its derivative at a point x=p
;; dcomp1 changes *bindings* and *difprog* as it munches on the expression f.
(defun dcomp1 (f x p)
(declare (special *v*))
(cond((atom f)
(cond ((eq f x) (values p 1.0d0))
((numberp f)(values f 0.0d0))
(t (let ((r (make-dif-name f x)))
(push r *bindings*)
(emit `(setf ,r 0.0d0)); unnecessary? already bound to 0.0d0
(push (cons f r) *lvkd*)
(values f r)))))
(t (let* ((op (first f))
(program (get op 'dopcomp)));otherwise, look up the operator's d/dx
(if program
(funcall program (cdr f) x p)
(error "~% dcomp cannot do ~s" f);; for now
)))))
(defun gentempb(r)
(push (gentemp r) *bindings*) (car *bindings*))
(defun emit(item)(unless (and (eq (car item) 'setf)
(eq (cadr item)(caddr item)))
(push item *difprog*)))
;; simple unary f(x) programs.
;;We define them all with the same template.
(defmacro dr(op s)
(setf(get op 'dopcomp) ;;define a rule for returning v, d for dcomp
(dr2 op (treefloat s))))
(defun dr2 (op s) ;; a rule to make rules for dcomp
`(lambda(l x p)
(cond ((cdr l)
(error "~&too many arguments to ~s: ~s" ',op l))
(t (let ((v (gentempb 'f))
(d (gentempb 't)))
(multiple-value-bind
(theval thedif)
(dcomp1 (car l) x p)
(emit (list 'setf v (list ',op theval)))
(emit (list 'setf d (*chk thedif (subst theval 'x ',s)))))
(values v d))))))
(defun treefloat(x) ;; convert all numbers to double-floats
(cond ((null x) nil)
((numberp x)(coerce x 'double-float))
((atom x) x)
(t (mapcar #'treefloat x))))
;; Here's how the rule definition program works.
;; Set up rules.
(dr tan (power (cos x) -2))
(dr sin (cos x))
(dr cos (* -1 (sin x)))
(dr asin (power (+ 1 (* -1 (power x 2))) -1/2))
(dr asin (power (+ 1 (* -1 (power x 2))) -1/2))
(dr acos (* -1 (power (+ 1 (* -1 (power x 2))) -1/2)))
(dr atan (power (+ 1 (power x 2)) -1))
(dr sinh (cosh x))
(dr cosh (sinh x))
(dr log (power x -1))
(dr exp (exp x))
(dr sqrt (* 1/2 (power x -1/2)))
;; etc etc
;; Next, functions of several arguments depending on x
;; These include + - * / expt, power
(setf (get '* 'dopcomp) '*rule)
(setf (get '+ 'dopcomp) '+rule)
(setf (get '/ 'dopcomp) '/rule)
(setf (get '- 'dopcomp) '-rule)
(setf (get 'power 'dopcomp) 'power-rule)
(setf (get 'expt 'dopcomp) 'expt-rule)
(defun +rule (l x p)
(let ((valname (gentempb 'f))
(difname (gentempb t)))
(multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,difname ,d ,valname ,v)))
(dolist (i (cdr l))(multiple-value-bind (v d) (dcomp1 i x p)
(emit `(setf ,difname ,(+chk d difname)))
(emit `(setf ,valname ,(+chk v valname)))))
(values valname difname)))
(defun -rule (l x p)
(let ((valname (gentempb 'f))
(difname (gentempb t)))
(cond ((cdr l)
(multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,difname ,d ,valname ,v)))
(dolist (i (cdr l))(multiple-value-bind (v d) (dcomp1 i x p)
(emit `(setf ,difname (- ,difname ,d)))
(emit `(setf ,valname (- ,valname ,v ))))) )
;; just one arg
(t (multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,difname (- ,d) ,valname (- ,v))))))
(values valname difname)))
;; (- x y z) means (+ x (* -1 y) (* -1 z)). (- x) means (* -1 x)
(defun *rule (l x p)
(let ((valname (gentempb 'f))
(difname (gentempb t)))
(multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,difname ,d ,valname ,v)))
(dolist (i (cdr l))(multiple-value-bind (v d) (dcomp1 i x p)
(emit `(setf ,difname ,(+chk (*chk v difname)
(*chk d valname))))
(emit `(setf
,valname ,(*chk v valname)))))
(values valname difname)))
(defun /rule (l x p)
(let ((vname (gentempb 'f))
(dname (gentempb t)))
(multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,dname ,d ,vname ,v)))
(multiple-value-bind (v d) (dcomp1 (cadr l) x p)
(emit `(setf ,dname (/ (- (* ,v ,dname)(* ,vname ,d))
(* ,v ,v))
,vname (/ ,vname ,v))))
(values vname dname)))
(defun power-rule (l x p)
(cond ((cddr l) (error "~&too many arguments to power: ~s" l))
;; now we assume that everything is a-ok
;; i.e. power has only two arguments, the second argument
;; which is the power to be raised to is independent of x
;; so we can use for sqrt, cube-root etc.
(t (let ((vname (gentempb 'f))
(dname (gentempb t)))
(multiple-value-bind (v d) (dcomp1 (car l) x p)
;;; We could use +chk and *chk in the following emissions
;;; but they are really inessential.
(emit `(setf ,vname (power ,v ,(cadr l))))
(emit `(setf ,dname (* ,d (power ,v (1- ,(cadr l))) ,(cadr l)))))
(values vname dname)))))
(defun expt-rule (l x p)
"this is the general power rule where the exponent could be an
arbitrary function of x"
(cond ((cddr l) (error "~&too many arguments to expt : ~s" l))
((numberp (cadr l))(power-rule l x p))
;; we had z = (expt f g)
;; z' = z*(g*log f)' = z*(g*f'/f + g'*log f)
(t (let ((vname (gentempb 'f))
(dname (gentempb t)))
(multiple-value-bind (v d) (dcomp1 (cadr l) x p)
(emit `(setf ,vname ,v ,dname ,d)))
(multiple-value-bind (v d) (dcomp1 (car l) x p)
(emit `(setf ,dname ,(+chk `(/ ,(*chk vname d) ,v)
(*chk dname `(log ,v)))))
(emit `(setf ,vname (expt ,v ,vname)))
(emit `(setf ,dname ,(*chk vname dname))))
(values vname dname)))))
;; the next two programs, "optimize":
;; generated code so as to not add 0 or mult by 0 or 1
(defun +chk (a b)(cond ((and (numberp a)(= a 0)) b)
((and (numberp b)(= b 0)) a)
(t `(+ ,a ,b))))
(defun *chk (a b)(cond ((and (numberp a)(= a 1)) b)
((and (numberp b)(= b 1)) a)
((and (numberp a)(= a 0)) a)
((and (numberp b)(= b 0)) b)
(t `(* ,a ,b))))
;;;; this is the main program for compiling
(defun dc (f x &optional (otherargs nil))
;; produce a program, p(v) ready to go into the compiler to
;; compute f(v), f'(v), returning result as a structure, a df
(let ((*difprog* nil)
(*bindings* nil)
(*v* (gentemp "g")))
(declare (special *v*))
(multiple-value-bind
(val dif)
(dcomp1 f x *v*)
(emit `(df ,val ,dif))
`(lambda (,*v* ,@otherargs)
,(format nil "~s wrt ~s" f x)
;;; comment out these declares if you prefer v's type to be unknown
(declare (double-float ,*v*))
(declare (optimize (speed 3)(debug 0)(safety 0)))
; (assert (typep ,*v* 'double-float))
(let ,(mapcar #'(lambda (r)(list r 0d0)) *bindings*)
(declare (double-float ,@*bindings*))
,@(nreverse *difprog*))))))
;; A plausible way to use dc is as follows:
(defmacro defdiff (name arglist body) ;; put the pieces together
(progn
(setf (get name 'defdiff) name)
(let ((r (dc body (car arglist) (cdr arglist)))) ;; returns a df.
`(defun ,name , (cadr r) ,@(cddr r)))))
;; usage (defdiff f (x) (* x (sin x)))
;; Then you can call (f 3.0d0)
;; a few more pieces to allow inside defdiff: if, progn, setf.
;; we could try for a few more. Oh, > < = etc. work "automagically."
(defun if-rule (l x p)
(let ((valname (gentempb 'f))
(difname (gentempb t)))
(emit `(multiple-value-setq
(,valname ,difname)
;; assume only variables, not derivatives in condition
(if ,(subst p x (car l))
(funcall (function ,(dc (cadr l) x)) ,p)
(funcall (function ,(dc (caddr l) x)) ,p))))
(values valname difname)))
(defun progn-rule(l x p)
(mapc #'(lambda (k)(dcomp1 k x p)) (butlast l))
(dcomp1 (car (last l)) x p))
(defun setf-rule(l x p)
(if (not(symbolp (car l)))(error "dcomp cannot do setf ~s" (car l)))
(multiple-value-bind (v d)
(dcomp1 (cadr l) x p)
(emit `(setf ,(car l) ,v))
(let ((dname (make-dif-name (car l) x)))
(push dname *bindings*)
(emit `(setf ,dname ,d)))
(values v d)))
(defun make-dif-name(s x) ;s is a symbol. make a new one like s_dif_x
(intern (concatenate 'string (symbol-name s) "_DIF_" (symbol-name x))))
(setf (get 'if 'dopcomp) 'if-rule)
(setf (get 'progn 'dopcomp) 'progn-rule)
(setf (get 'setf 'dopcomp) 'setf-rule)
\end{verbatim}
{\subsection*{A Tangent}
Shouldn't AD (whether of Lisp or not) be widely used since it appears
to be so useful? Our understanding is that getting computational
scientists to adopt the relatively unfamiliar technique of AD at all,
rather than using finite differences or programming a separate
derivative algorithm, already presents a barrier. There are simpler
(but generally far less accurate and perhaps slower) ways of
approximating derivatives of ``programs'' with finite differences.
And further complicating {\em our} particular ``sales pitch'' is the
implicit assumption we make that the computation to be differentiated
was originally written in Lisp, or that people are willing to see
their programs translated (perhaps automatically) into Lisp. For a
variety of reasons people are far more likely to wish to differentiate
FORTRAN.
The computational scientists may be attracted to AD only after the
having written large programs (essentially treated as black
boxes). AD comes into the picture in using these programs as modules
in an optimization tool, or for sensitivity analysis. The optimization
tools then may require separate modules to compute values for
functions and derivatives. So a typical example might be a
``function'' {\tt F} from computational fluid dynamics that is defined
as a FORTRAN program (or a C program). The function might, for
example, represent a solution method for a differential equation. The
goal is to produce the moral equivalent of {\tt FPRIME}, or a function
that produces the pairs referred to earlier.
That's what the AD people have been trying to support,
ADIFOR, for FORTRAN, ADOL-C for C or C++, etc.
It would be wrong to think that the AD programs for FORTRAN, C, C++
generally work on ``black box'' programs in those languages,
automatically. They may reduce the programming effort to take a
program and produce a ``derivative'' program considerably, say
reducing the time from a year to a week. A visit to the previously
mentioned http://www.autodiff.org website has links to some
applications illustrating the level of human effort to build
automation tools, document them for others to use, and then the
continued effort and partnerships needed to use them. While the goal
has been to differentiate nearly any program written in FORTRAN or C,
it seems that some attention is required for success.
As for our own tools, we could, oddly enough, convert FORTRAN to
Common Lisp using a program {\tt f2cl} and try to differentiate that, and
we could easily print out equivalent FORTRAN or C (etc.).
However, to keep this paper brief, we start and end with the Lisp; the
integration of tools in Lisp is quite good, even if you are addicted
to graphical interactive development environments. We ordinarily
convert the AD code into assembly language without ever leaving the
Lisp system.
There may be additional issues arising in ADIL if it enjoys wider use,
but we are confident that the structure we have set out is consistent
with solving the forward-differentiation AD task for Lisp. The major
barrier may be convincing a computational scientist to write serious
numerical code in Lisp. (or to continue to work on the code when the
FORTRAN is translated to Lisp).
End of tangent.
}
;;; sections removed because they are about AD
{\section{Symbolic Differentiation and AD}
In this section we motivate the ``automatic differentiation'' (AD)
idea to follow. We contrast it with the notion of symbolic differentiation.
{The Lisp programming language has a number of strengths, prominent
among them the ease with which it can be used for prototyping other
programming languages. However, in brief surveys of languages
it is often characterized (or even dismissed) as
a language which is suitable for computing ``symbolic differentiation''.}
In fact, the compact representation for a symbolic differentiation
program is cited as one driving application for the original Lisp
language design. In a conventional setup for Lisp, $x \sin x \log x
+3$ would be written as {\tt (+ (* x (sin x) (log x)) 3)}\footnote{As
mentioned earlier, there are parsers from infix available.}
%you are uncomfortable with this notation, many parsers from more
%``conventional'' infix notation are available. The interface may be
%as simple as using a normal Lisp program text, but enclosing such
%infix expressions in markers, for example {\tt
%[x*sin(x)*log(x)+3]}. The output to this form is simple too.}.
A brief program\footnote{see Appendix 1} can differentiate this with
respect to {\tt x} to get
\begin{verbatim}
(+ (* (* x (sin x) (log x))
(+ (* 1 (expt x -1)) (* (* (cos x) 1) (expt (sin x) -1))
(* (* (expt x -1) 1) (expt (log x) -1))))
0)
\end{verbatim}
an answer which is correct but clumsy in appearance, even if it is
converted to more conventional infix:\\
{\verb|(x*sin(x)*log(x))*(1*x^(-1)+(cos(x)*1)*sin(x)^(-1)+(x^(-1)*1)*log(x)^(-1))+0|}.
A proper CAS would contain a (probably much
longer) simplification program to take the symbolic result and reduce
it to simpler terms, removing multiplications by 1 and additions with
0, perhaps cancelling factors, etc. resulting in something
like
$ \log x\,\sin x+\sin x+x\,\cos x\,\log x $
This differentiation program {\tt d} takes an expression and a symbol, and
computes a derivative. To be more precise and pedantic. the program
identifies a Lisp symbol say
{\tt x}
with a corresponding mathematical indeterminate, say $x$, and the expression
is based on a recipe which relates {\tt (sin x)} with $\sin x$, etc.
In the short course of this program, it does no harm to conflate
these concepts. In fact, the Lisp program is pretty good this way:
even setting {\tt x} to 3 does not ``confuse'' it into thinking it should
take the derivative with respect to 3, if that had any meaning. Lisp does
not mistake a symbol for its value.
The {\tt :ga} package can also implement a differentiation program by
(essentially) observing that the derivative operation {\tt d} commutes
with the {\tt ma} constructor. This extension of the method for {\tt
d} is applied only when it is given an {\tt ma} object. It unwraps the
(ordinary Lisp) expression in its inside by the function {\tt ma-e},
computes the derivative, and re-wraps it.
\begin{verbatim}
(defmethod d ((u ma)(var t)) ;define an additional d method for (ma, t)
(ma (d (ma-e u) var)))
\end{verbatim}
If the {\tt ma} constructor does not always call a simplifier, it
might be a good idea to call one explicitly here. This symbolic
differentiation is cute, but hardly earthshaking. A better one would
at least take some clue from the fact that a subexpression that does
not depend on the independent variable has a zero derivative, and
would not pursue such subparts. What is cute in this context is that
we have taken a Lisp program from an earlier world of Lisp modules and
re-used it with almost no work. That module was written without any
knowledge of this generic arithmetic package.
\section{Introduction to Automatic Algorithm Differentiation (AD)}
Automatic Differentiation or AD has a literature of its own,
mostly quite separate from Lisp, and it provides
features that are quite different from symbolic
differentiation\footnote{At least initially}. The mostly-true one-sentence summary:
AD will read the text of a FORTRAN program $f(x)$ that computes some result $k$ and
will write the text of a new program $g(x)$ that computes a pair: $k$ and $dk/dx$.
But, you may say, that's not really possible, is it? Well, to the
extent possible, AD tries to do it. Philosophically, all non-constant functions
on floating-point objects represent discrete (not continuous) mappings,
and so they have no derivatives. This detail is one of those ignored in practice.
For a complete introduction to the topic of Automatic Differentiation
as used in this paper we recommend a visit to the website {\tt
www.autodiff.org}. Here you can find a collection of links to the
standard literature, including AD programs and applications. There
are also links to recent conference publications and articles.
There are two major variants of AD techniques, forward and ``reverse''
differentiation. Although we have programmed both, in this paper
we will discuss only forward, because it fits into this overloading
concept. For brevity we have name our system ADIL for AD in Lisp.
{\subsection{A brief defense of Forward Differentiation for ADIL}
Consider a space of ``differentiable functions evaluated at a point
c.'' In this space we can represent a ``function $f$ at a point $c$''
by a pair $\langle f(c),~ f'(c)\rangle$. That is, in this system
every object is a pair: a value $f(c)$ and the derivative with respect
to its argument $D_x f(x)$, evaluated at $c$. (written as $f'(c)$).
For a start, note that every number $n$ is really a special case of
its own Constant function, $C_n(x)$ such that $C_n(x) = n$ for all
$x$. $C_3(x)$ is thus $\langle3,~ 0\rangle$. The constant $\pi$ is
$t_1 = \langle 3.14159265\cdots, 0.0\rangle$, which represents a
function that is always $\pi$ and has zero slope. The object
$t_2=\langle c,1\rangle $ represents the function $f(x)=x$ evaluated
at $c$. At this point we must be clear that all our functions are
functions of the same variable, and that furthermore we will be fixing
a point $x=c$ of interest. It does not make sense to operate
collectively on $\langle f(y),D_y f(y)\rangle$ at $y=a$ and $\langle
g(x),D_x g(x)\rangle$ at $x=b$.
For example, sin operating on $t_2$ is the pair $\langle \sin(c),
\cos(c)\rangle$. In general, $\sin(\langle a, a'\rangle)$ is $\langle
\sin(a)~,~\cos(a) \times a'\rangle$.
We can compute other operations unsurprisingly, as, for example the
sum of two pairs: $\langle a ,a'\rangle + \langle b , b'\rangle =
\langle a+b ~,~a'+b'\rangle. $ {\em Note, we have abused notation
somewhat: The ``+'' on the left is adding in our pair-space, the ``+''
on the right is adding real numbers. Such distinctions are important
when you write programs!} Similarly, the product of two pairs in this
space is $\langle a ,a'\rangle \times \langle b , b'\rangle = \langle
a \times b ,~ a\times b' + a' \times b \rangle. $ This can be
extended to many standard arithmetic operations in programming
languages, at least the differentiable ones\cite{Griewank91}. AD
implementors seek to find some useful analogy for other operations
which do not have obvious derivatives. Falling in this category are
most data structure manipulations, calls to external routines, loops,
creating arrays, etc.\footnote{It is a mistake to {\tt declare}
variables to be (say) double-floats, when in the ADIL framework they
will be {\tt df} structures. We are not aware of any other systematic
problems.} In ADIL, these mostly come free.}
\subsection{AD as Taylor series}
AD technology is not ``finding symbolic derivatives'' but from a CAS
perspective is closer to arithmetic with truncated Taylor series. AD converts
a {\em conventional program} computing with conventional scalar values
like $u$ to a {\em new AD program} operating pairs such as $p=\langle
u,~v\rangle$ representing that scalar function $u(t)$ and its
derivative $v(t)$ with respect to an implicit parameter, say $t$ at
some (presumably numeric) point $t=c$ In particular, a constant $k$
looks like $\langle k,~0\rangle$, and the parameter $t$ at the point
$c$ looks like $\langle c ,~1\rangle$. That is, $dt/dt=1$ As a Taylor
series, the pair $p$ represents $ s=u+v\,t+\cdots $. The equivalence
to computation with Taylor series is perhaps the clearest guidance as
to how to compute with these pairs. (Generally a CAS Taylor series
program can provide an appropriate result, at least for functions with
continuous derivatives.) The Taylor expansion of $\cos s$ is $ \cos
u-\sin u\,v\,t+\cdots $ and so $\cos p=\langle \cos u,~-v \times \sin
u\rangle$. Some programming operations have discontinuities; if their
derivatives are needed, some escape must be arranged: perhaps an error
is signalled, or more often a derivative is defined by some pragmatic
justification.
The extension of this AD idea to produce programs for higher
derivatives is straightforward although tedious. Here is how it could
work: we use triples, e.g. if the second derivative is called $w$: If
$p=\langle u,~v,~w \rangle$, compute the Taylor expansion of each
operator. See, for example, that $\cos p$ yields:
$$ \cos p = \cos u-\sin u\,v\,t-{{\left(\cos u\,v^{2}+2\,\sin
u\,w\right)\,t^{ 2}}\over{2}}+\cdots $$ The result triple consists of
the coefficients of different powers of $t$. For a given set of
values for $u$, the program computing those coefficients naturally
will need to compute $\sin u$ and $\cos u$ only once.
\subsection{ADIL limitations}
Let us be up-front about several limitations.
\begin{enumerate}
\item If you have a scientific program written in C or C++ or most
likely, FORTRAN, you will not be initially attracted to a tool that
works on Lisp programs. (Though there are translators from FORTRAN to
Lisp (f2cl).)
\item The AD computer program's running time should be no more
than a small multiple of the ordinary program. There is a way
of trading time for space by using ``reverse'' AD, but we will not
discuss that here. The overloading scheme is slower.
\item Neat things like arithmetic on exact integers or rationals
or symbols is not possible in FORTRAN. Neither is recursion (in
FORTRAN 77)
Supplying those features is only going to slow us down compared to
FORTRAN. Making a choice to use a program that is slower
is unpopular in the scientific computing community
\footnote{Benchmarking for speed, even acknowledging
that it is subject to abuse, is still more easily quantifiable than
say, correctness, generality, or modularity. Thus speed is the substitute for all good
qualities. And slow is therefore bad.
Thus programmers might write in ``C'' because it is faster than Java.
This would, logically, drive all programmers to write in assembler, but they don't.}.
\end{enumerate}
On the other hand, just because we don't expect anyone to
shift from FORTRAN to Lisp to use this package
does not prevent us from discussing the design, especially
since it is so much nicer than corresponding programs in
other languages.
%(Some of which
%AD in Maple has all the limitations (and then some), but appears to
%have one package, GRADIENT \cite{maple} and another ADrien
%\cite{adrien} described in publications.
ADIL has, among several major advantages, the obvious one of not
having to parse its own programs to represent them as data.
\subsection{ADIL implementation}
There are two fundamentally different
approaches to building a forward AD system, and a third that looks plausible,
at least briefly, for Lisp.
\begin{enumerate}
\item We can take a text for a program $P$ in (more-or-less) unchanged form and by
{\em overloading} each of its operations, make $P$ executable in another
domain in which scalars are replaced by pairs.
\item We can transform the source code of $P$ into another program in which
computations are ``doubled'' in a particular way to compute the original
function as well as the requested derivative.
\item A third technique, usually ignored, could be used in Lisp: write
a new version of Lisp's {\tt eval} to do AD. This is not a good idea,
even in Lisp, because most Lisp programs don't use {\tt eval}: they are
compiled into assembler. We don't want to slow down normal
programs doing normal things by running them through a version of {\tt eval}.
The technique of overloading operators
is conceptually similar to patching the {\tt eval} program anyway, and
so we discard this option.
\end{enumerate}
We provide code for each of the first two approaches.
Because it inherits all the Lisp
facilities not specifically shadowed by AD variations, the overloading
method has the advantage of covering just about everything one can do
in Lisp. The code is structured in a way that is fairly easy to
understand.
Overloading is unfortunately slower in execution compared to
the second method, which tends to run at a (small) multiple of the
original code's speed. But source code transformation is ticklish,
requiring attention to almost every aspect of the language. Thus our
version {\tt dcomp} does not cover as much of Lisp as overloading.
The two techniques can, fortunately, be used together, so that
if expression evaluation of the composition of
built-in functions (or in a specified way, ``friendly'' functions)
is needed, the code can in effect be compiled in-line.
For the purpose of overloading,
we must decide how to wrap up the pair of function value and derivative.
We choose a {\tt defstruct} named {\tt df} with two parts, {\tt f} and {\tt d},
though if we wanted higher derivatives, we might use a vector.
A {\tt df} is
easily distinguished from a conventional scalar, and so methods
can be overloaded.
(The source-code transformation technique we describe later does
not require these structures; the generated code carries along
near-duplicate names, e.g. if the original program has
variable {\tt v}, we generate {\tt v\_diff\_x} (etc.))
\subsection{Overloading for AD}
In this section we describe in more detail
how we overload the arithmetic operators and let the
rest of the language be inherited from Lisp's standard
implementation. (or other variations on generic arithmetic!)
Here's the beginning
of that package declaration needed for AD. It could be extended easily. (For
example, to add the hyperbolic tangent requires only one line).
\begin{verbatim}
(defpackage :ga ;generic arithmetic for AD
(:shadow "+" "-" "/" "*" "expt" ;binary arith
"=" "/=" ">" "<" "<=" ">=" ;binary comparisons
"sin" "cos" "tan" ;... more trig
"atan" "asin" "acos" ;... more inverse trig
"sinh" "cosh" "atanh" ;... more hyperbolic
"expt" "log" "exp" "sqrt" ;... more exponential, powers
"1-" "1+" "abs" ;... odds and ends
)
(:use :common-lisp))
\end{verbatim}
{The implementation of AD looks like the previous symbolic system, but
with new methods that do the arithmetic on pairs. Comparisons,
however, need some work. We define {\tt two-arg- } variants
for {\tt df} pairs by ignoring the derivative and looking only at the function value.
To accomplish this we wrote macro programs
so that for example, the macro-expansion of
{\tt (defcomparison /= )}, says all we need to say about {\tt /=}, the
not-equal function,
We also use macro-expansions to set up programs based
on the derivative property for each operator that looks like a
single-argument differentiable function. To define all that our {\tt
df} handling needs to know about the $\sin$ function, we declare {\tt
(r sin (cos x))}. The {\tt r} macro expansion defines the necessary
piece of code for the chain rule.
Other than these one-argument functions we need to write some specific programs for
{\tt two-arg-+} etc.
This system constitutes
about 130 lines of Lisp code other than the package declarations and some
examples.
\subsection{Using ADIL}
Consider the function $f(x)=x \times \sin x \times \log x + 3$
written in Lisp as\\
{\tt (defun f(x)(+ (* x (sin x) (log x)) 3))}.\\
We can evaluate $f$ at a point $x$ where $x$ is 1.23 by
{\tt (f (df 1.23 1.0))} which returns
\begin{verbatim}
<3.2399835288524628d0, 1.2227035>
\end{verbatim}
In this system we define {\tt df} is a constructor for a pair of numbers.
This pair is displayed in angle-brackets.
$p=\langle 1.23,~1.0 \rangle$.
Note that the text of the program {\tt f} has not changed {\tt at all};
applying the function to a different type of argument, and running the
program within the {\tt :ga} package is all that is needed.
Of course the meaning of the functions like ``+'' within the {\tt :ga}
package is different; one should not be entirely fooled by the
argument that the text is the same. (The function {\tt f} can be
invoked from Lisp programs in other packages by calling it by
a fully-qualified name like {\tt ga::f}.)
\medskip
A more interesting program is this one.
\begin{verbatim}
(defun s(x) (if (< (abs x) 1.0d-5) x
(let ((z (s (* -1/3 x))))
(-(* 4 (expt z 3))
(* 3 z)))))
\end{verbatim}
Not at all obviously, this computes an approximation to $\sin x$ by
applying an identity recursively. That is, $\sin x$ is a polynomial
$4z^3-3z$ in $z=\sin (-x/3)$, and that for small enough $x$, $\sin x =
x$. Let us try it out by typing {\tt (s (df 1.23 1))}. We get
\begin{verbatim}
<0.942489, 0.33423734>
\end{verbatim}
This not only provides a value for $\sin(1.23)$
but also computes 0.33423734, the second part of the pair, which happens to be
$\cos(1.23)$.
It appears that ADIL somehow knew that {\tt s} computed $\sin()$ and
therefore it also computed $\cos()$.
ADIL ``knew'' nothing of the kind. It computed something that looked like
the derivative of sin only because it ran the {\em derivative of the program
that computed something that looked like sin.}
\medskip
The classic recursive program in Lisp is factorial:\\
{\tt (defun fact(x) (if (= x 1) 1 (* x (fact (1- x)))))}\\
which we modify to\\
{\tt (defun fact(x) (if (= x 1) (df 1 0.422784335098d0) (* x (fact (1- x)))))}\\
whose peculiar base case is now the value one, with a peculiar
derivative.
We do this so as to make it correspond to the
derivative of the Gamma function\footnote{a continuous
version of factorial well known among the ``special functions''.}.
With this form we can provide not only
the factorial but its ``derivative'' (at least at integer points).
\medskip
In these examples we have not modified Lisp syntax at all. Anything
{\em else} from Lisp, namely parts of the language not overloaded,
is imported without attention or comment. Thus to
compute and print ten $\sin x$ values, we can use {\tt dotimes} as in
{\tt (dotimes (i 10) (print (s (df i 1))))}.
\subsection{Newton Iteration, or, Why is AD useful?}
{The point is that if we are given a complicated function
$F:~R\rightarrow R$ arranged as an expression, and all the
sub-functions ``cooperate'' properly, we can feed in a pair $\langle
c,1\rangle$, representing the expression $x$ and its derivative with
respect to $x$, namely 1, each evaluated at $x=c$. We get out pairs
$\langle f,f'\rangle = F(\langle c,1\rangle )$ where the latter is the
pair $f=F(c)$, and $f'= D_x(F(x))|_{x=c}$. and this may be exactly
what we want in an application. These are discussed in papers available
via www.autodiff.org. Here we take only the simplest and easiest
to motivate example, Newton iteration.
For example, to converge to a root in a Newton iteration for $f(z)=0$
given an initial guess $c_0$ or $t_0= \langle c_0,0\rangle$ , we
compute $ F(t_{i}) = \langle f(t_i), f'(t_i)\rangle .$ Then the next
iteration $t_{i+1} = t_i - f(t_i)/f'(t_i)$. If this is not
sufficiently accurate we consider repeating with $\langle f,f'\rangle
= F(t_{i+1}),$ etc.
The program is shorter than the explanation.
{
\begin{verbatim} Newton iteration: (ni fun guess) usage: fun is a
;; function of one argument guess is an estimate of solution of
;; fun(x)=0 output: a new guess. (Not a df structure, just a number)
(defun ni (f z) ;one Newton step
(let* ((pt (if (df-p z) z (df z 1))) ; make sure init point is a df
(v (funcall f pt))) ;compute f, f' at pt
(df-f (- pt (/ (df-f v)(df-d v))))))
\end{verbatim}
As a simple example, consider $f(x)=sin(1+2x)$ or\\
{\tt (defun f(x)(sin(+ 1 (* 2 x))))}.
then
\begin{verbatim}
(setf h 2.0d0);; just a guess
(setf h (ni 'f h))
(setf h (ni 'f h))
;; h converges to 1.0707963267948966d0
\end{verbatim}
We can write a version of Newton iteration to return the value too.
Then both the residual and the derivative can be taken into account in
testing whether the Newton iteration has converged sufficiently. That
program would look like:
\begin{verbatim}
(defun ni2 (f z)
(let* ((pt (if (df-p z) z (df z 1))) ;if z is not a df, make it one
(v (funcall f pt))) ;compute f, f' at pt
(values (df-f (- pt (/ (df-f v)(df-d v)))) ;the next guess
v))) ; the residual and derivative
\end{verbatim}
%;; A harder test for Newton iteration is this function,
%\begin{verbatim}
%(defun test(x)(+ (* 1/3 x) 1 (sin x))) ;; f(x) = x/3+1+sin x.
%;; which is good if you start close enough to -0.8 or -3.2 or -5.4
%;; but ni fails for most other initial guesses.
%
%\end{verbatim}
Now that we have a program for only one step, we can show a program
that can use it to find the zero. This is a ticklish proposition
because sometimes this iteration does not converge. We have to use
some stopping heuristic. We could stop when two successive iterations
are close in terms of relative or absolute error, or use some other
measure.
A particularly simple iteration driver program uses {\tt ni2} which
returns the value of the residual which we test. We also
quit with an error message if some count is exceeded.
\begin{verbatim}
(defun run-newt2(f guess &key (abstol 1.0d-8) (count 18)) ;; Solve f=0
;; It looks only at the residual.
(dotimes (i count ;; do at most count times. failure prints msg
(error "~%Newton quits after ~s iterations: ~s" count guess))
(multiple-value-bind
(newguess v)
(ni2 f guess)
(if (< (abs (df-f v)) abstol)
(return newguess)
(setf guess newguess)))))
\end{verbatim}
In this Newton iteration example, our program must, by its nature,
separately get a value and derivative. We have used three
functions particular to ADIL, namely {\tt df-p} a predicate which will
return true if applied to a {\tt df} object, as well as the two
selection functions, {\tt df-f} and {\tt df-d} for extracting the
value and derivative respectively from a {\tt df} object.
\subsection{What about speed?}
The generic arithmetic (ga) system for ADIL
produces code that is slower than ordinary Lisp code,
especially if we make efforts to optimize that Lisp\footnote{ Note that without
type declarations and compilation, ordinary Lisp already has its own
burden of generic arithmetic. Short and long (arbitrary length)
integers, single or double floats, rationals, and complex numbers, as
well as all plausible combinations are handled by standard ANSI Common
Lisp. With appropriate instructions to the compiler
Common Lisp arithmetic can be compiled to ``non-generic'' straight-line
floating-point code, comparable to that of other high-level languages.
}.
How much of a speed difference is there?
On a variety of benchmarks involving mostly computations of how to
dispatch-to-the-right-method, using the generic arithmetic for ADIL
seems to be about a factor of 10 over ``normal Lisp'' and perhaps a
factor of 50 over ``optimized'' Lisp (constrained say,
to double-floats.)
On tests where most of the work is done in subroutines such as
$\sin$ or $\log$, the difference is much less: the $\log$ routine
takes the same time whether it is called from the {\tt ga} package
or from the {\tt user} package. On parts of tests where the operations
are {\em other than generic arithmetic} such as looping over indexes,
the {\tt ga} programs run at full speed because they are in fact
identical.
\subsection{Source code transformation for AD}
As mentioned earlier, there is another part of ADIL, {\tt dcomp} that
can compile programs, in-line, in a restricted language subset of
Lisp, essentially that of functional-style arithmetic programs. In
this situation, benchmarking a simple example suggests that the
comparison between the ordinary Lisp for computing a function $f$ and
the ADIL Lisp for computing a function $f$ and its derivative, is
a small factor of two; we expect perhaps a factor of up to 5, judging
from the literature.
This experiment computes $3+z*(4+z)$ where $z=\sin x$.
The {\tt dcomp}
version is shown with the {\tt defdiff} defining form.
All code is run through a Lisp compiler before timing, and the
reported times are for a run of 10,000 computations.
The first set shows that the code transformation provides function
and derivative values only 2.3 times slower than
the fastest program we could reasonably expect to provide just the
function value.
\begin{verbatim}
(defun kk(x &aux z) ;;optimized version
(declare (double-float x z)(optimize (speed 3)(safety 0)))
(setf z (sin x))
(+ 3.0d0 (* z (+ 4.0d0 z))))
(defdiff kz(x)
(progn (setf z (sin x))
(+ 3.0d0 (* z (+ 4.0d0 z)))))
(kk 1.2d0) --> 30 ms /no derivs!
(kz 1.2d0) --> 70 ms /with derivs!!
\end{verbatim}
In the next example we use only one function, but define
it in the generic arithmetic package. By calling it on
different types we get different behavior.
\begin{verbatim}
(defun kk(x &aux z) (setf z (sin x))(+ 3.0d0 (* z (+ 4.0d0 z))))
(kk ( 1.2d0)) --> 1523 ms /no derivs! ; 51 times slower
(kk (df 1.2d0)) --> 40100 ms /with derivs ; 573 times slower
\end{verbatim}
The time spent in the pure overloaded case is substantially higher.
Fortunately the program {\tt kz} could be used instead
of {\tt kk} in writing a presumably more elaborate
computation using the overloading methods.
The body of {\tt kz} can be examined before it is compiled
by tracing the program {\tt dc}. This shows:
\begin{verbatim}
(lambda (g94)
"(progn (setf z (sin x)) (+ 3.0d0 (* z (+ 4.0d0 z)))) wrt x"
(declare (double-float g94))
(declare (optimize (speed 3) (debug 0) (safety 0)))
(let ((t102 0.0d0) (f101 0.0d0)
(t100 0.0d0) (f99 0.0d0)
(t98 0.0d0) (f97 0.0d0)
(z_DIF_x 0.0d0) (t96 0.0d0)
(f95 0.0d0))
(declare (double-float t102 f101 t100 f99 t98 f97 z_DIF_x t96 f95))
(setf f95 (sin g94))
(setf t96 (cos g94))
(setf z f95)
(setf z_DIF_x t96)
(setf t98 0.0d0 f97 3.0d0)
(setf t100 z_DIF_x f99 z)
(setf t102 0.0d0 f101 4.0d0)
(setf t102 (+ z_DIF_x t102))
(setf f101 (+ z f101))
(setf t100 (+ (* f101 t100) (* t102 f99)))
(setf f99 (* f101 f99))
(setf t98 (+ t100 t98))
(setf f97 (+ f99 f97))
(df f97 t98)))
\end{verbatim}
What else can {\tt dcomp} do? In addition to the usual arithmetic operations
and built-in functions (sin, cos, log, etc.) as declared in the generic arithmetic
package,
{\tt dcomp} can handle {\tt if, progn, setf}. However,
{\tt dcomp} is not as general as overloading, at least not
as we have implemented it. Functions defined with {\tt defdiff}
take only scalar arguments, not {\tt df} structures. They return only
{\tt df} structures. The derivative is always with respect to the first formal
argument. Thus {\tt (defdiff f(x y z)(cos (+ x y z)))} is legal, but only
one derivative is computed.
Within the {\tt dcomp} framework,
a restricted version of recursion is possible. Explaining the restriction
(or removing it!) led to too much complexity and was taken out.
How does this fit into the generic framework, then? As shown in the timings,
a function like {\tt f} above, running at full speed.
The {\tt dcomp file} is about 290 lines of code
\section{Does it really make sense to use ADIL?}
For fans of Lisp, there is no question that one motivation is to show
off the natural advantage of the language: Lisp provides a natural
representation for programs as data and a natural form for writing
programs that write programs, which is what we do in ADIL. It also has
an unobtrusive object-oriented programming system, only part of which
is used here. We used subsetting of types and dispatched on methods
based on the types of {\em all} the parameters, not just the methods
associated with the type of the ``first argument''.
The code is short, and is in ANSI standard Common Lisp. It should
run without any change in any conforming system. While it
is not using the most obvious idioms of introductory Lisp, that
is should not be a barrier. There are now several excellent
books on Common Lisp; some people have argued that reading
them will make you a better program in any language.
For persons only slightly familiar with Lisp, or whose acquaintance
comes from one of the too-common texts that use the Lisp 1.5 of
1959 as the definition, a glance at the Lisp shows how short
such code can be. If you are familiar with the complexities of
automatic differentiation when presented in another language,
the relatively brevity should also be observable.
If you care not a whit about Lisp or implementation strategies, you
may prefer to refer to the recently-revised online documentation for ADIFOR
2.0 version D, some 99 pages, and read in detail how programs to be
differentiated must be distinguished from ordinary FORTRAN (77)
programs. Using ADIFOR requires declaring and marking program
variables, adjusting numerous parameters, perhaps revising the FORTRAN
in order to obey various restrictions, and following detailed
guidelines on the use of these programs.
The flexibility one gets is
apparent by looking at the code in which Lisp macro-definitions have
made the addition of new derivative information simple. For example,
to insert a new rule for differentiation of $\tanh x$ one adds {\tt
"tanh"} to the {\tt shadow} list, and execute\\
{\tt (r tan (expt (cosh x) -2))}.\\
By comparison, large and monolithic systems are
relatively rigid, and require some effort to port, extend or optimize:
the original designers must anticipate the spectrum of possible
choices, perhaps freezing some choices, and then describe all the
variants in detail. The user must then read the details, and hope
there are no bugs; the user is unlikely to be able, in any case, to
repair them. Some of the restrictions of the large systems seem to be
arbitrary--perhaps a small point, but it did not occur to us to have
to exclude recursive functions, although recursion is a feature
lacking in ADIFOR.
Finally, we wish to point out that Lisp is perfectly adequate for
expressing numeric computation; sophisticated compilers are
available for generating efficient code.