Math 128a - Spring 2002 - Lecture Notes on Approximation and Interpolation
Read sections 6.1 through 6.4 in text.
Idea: given a complicated function f(x), replace it by
a simpler one that approximates it
Motivation:
1) Solving f(x) = 0. We have seen that approximating f(x)
by a straight line (or linear function in more variables)
is the basis of Newton and other good methods.
2) Evaluating f(x). This refers to the library routines all
computers have for sin(x), exp(x), log(x), etc. We considered
exp(x) in a homework assignment, and saw that it reduced to
evaluating exp(y) for 0 <= y <= 1, for which we used a
Taylor expansion, i.e. polynomial.
It also refers to graphics, where we want to plot f(x);
in this case we may want an algorithm that is both
cheap (since we have to evaluate at many points, perhaps
in "real time") and smooth (so it looks good).
It is also essential in CAD = Computer Aided Design, where
you design physical objects, and determine their shapes by
picking a few values of f(x) and filling f(x) in "smoothly"
in between.
3) Integrating f(x) from a to b. If f(x) were a polynomial, or
piecewise polynomial, this would be easy, so in general
we approximate f(x) by a (piecewise) polynomial and integrate
this approximation.
4) Solving ordinary differential equations (ODEs) dx/dt = f(x).
If f(x) is a constant f(x) = a, then the ODE is
easy to solve explicitly: x(t) = x(t0) + a*(t-t0), where
x(t0) is the value of the solution at t0.
We again want to exploit this to solve more complicated ODEs.
5) Solving partial differential equations (PDEs). In addition to the
same idea as for ODEs, Finite Element Methods (popular
in civil and mechanical engineering), use polynomials on
little patches called "finite elements" to approximate the PDE.
What functions are we going to use for our approximations?
This depends both on what computers can effectively implement,
and the properties of the functions we want to approximate.
First consider what computers can effectively implement.
Since computers have +,- and * built in, polynomials are obvious
candidates. Division makes rational functions possible, but
for some operations (like integration) they usually offer no
advantage over polynomials (since they are harder to integrate
explicitly), and division is usually several times more expensive
than addition or multiplication.
Next, the ability to have branches (if-then-else) means that
we can have "piecewise" polynomials (or rational functions), i.e.
functions that use different polynomials for different intervals of x
if (a <= x < b) then y = f1(x)
if (b <= x < c) then y = f2(x)
...
We will concentrate on polynomials and piecewise polynomials in this course.
Second, let us briefly consider the properties of the functions we
want to evaluate. There is a Theorem of Fourier which approximately
states that every periodic function can be written as an infinite
sum of sines and cosines, called a Fourier series. Finite sums of
sines and cosines are frequently used to approximate periodic functions,
which typically arise in signal processing. The algorithm used for
this is called the Fast Fourier Transform, or FFT.
Besides sines and cosines,
another very recent class of functions used for approximation are
wavelets, see Math 118. We will not have time for this in this class.
To make the notation more precise, we want to compute a polynomial (or
piecewise polynomial) p(x) that somehow approximates another function f(x)
on an interval a <= x <= b. We need to define more carefully how we
measure how well p(x) approximates f(x), because there are several possible
definitions, all of which are reasonable and useful in different situations.
First, we present a limitation on how well polynomials can approximate
anything. Suppose we were interested in approximating f(x) by a polynomial
p(x) for all real x. Write p(x) = sum from i=0 to n c(i)*x^i.
When |x| is very large, we can write
p(x) = c(n)*x^n * [ 1 + sum from i=0 to n-1 (c(i)/c(n)) * x^(i-n) ]
     = c(n)*x^n * [ 1 + E(x) ]
When |x| is large, E(x) -> 0 like 1/x, so p(x) ~ c(n) x^n.
Thus, all polynomials go to infinity like a power of x as |x| -> infinity.
Thus we cannot hope to approximate any function f(x) for all real x that
does not go to infinity this way. For example sin(x) remains bounded
for all x, so we cannot expect to approximate it by a polynomial for all x.
Instead, in all our approximations we can only ask that p(x) approximate f(x)
on a finite interval I = [a,b], or perhaps even only a finite number of
points x{0},x{1},...,x{n-1}.
Now we talk about our notions of error and approximation. Here is a summary:
1) We start with the most natural definition of approximation,
finding a polynomial p*(x) of a given degree that most closely approximates
a continuous function f(x) everywhere in an interval [a,b].
Definition: || g(x) ||_inf = max (a <= x <= b) |g(x)| is called the
"infinity norm" or "maximum norm" of g on [a,b]
Definition: The error of approximation of polynomial p(x) for a
function f(x) on [a,b] is || p(x) - f(x) ||_inf
Definition: The best approximating polynomial p*(x) of degree at most n
satisfies
|| p*(x) - f(x) ||_inf =
min (all polynomials q of degree at most n) || q(x) - f(x) ||_inf
Note that p*(x) does not have to have degree n, just at most n.
For example, if f(x) = x, then p*(x)=x is the best approximation for n >= 1
We will eventually prove the
Weierstrass Approximation Theorem: If f(x) is continuous on [a,b], then
|| p*(x) - f(x) ||_inf goes to 0 as the degree of p*(x) goes to infinity.
Why is this not just a restatement of Taylor's Theorem?
What about f(x) = abs(x) on [-1,1]?
It turns out that computing the best approximation p*(x) of a given degree n
is very expensive, and so we don't do it very often (It is used for functions
like sin(x) or exp(x) that are in the computer's math library, and only need
to be determined once.
The name of the algorithm for this is the Remez algorithm,
which we will not cover.)
2) The second notion of approximation is very widely used, and
easy to compute. We are given n points in the plane
(x{0},f(x{0})), (x{1},f(x{1})), ... , (x{n-1},f(x{n-1}))
where all the x{i} are distinct, and ask for a polynomial p(x)
that "interpolates these points" i.e. satisfies
p(x{i}) = f(x{i}), for i=0,...,n-1
How can we compute this polynomial? We will see that the degree
of p will be at most n-1, and there is a simple O(n^2) algorithm to
compute it. The algorithm will be called the "Newton divided-difference
formula", or simply "Newton interpolation."
So if we only measure the error |p(x) - f(x)| at the points x{i},
it is zero by construction. But we expect to evaluate p(x) at
other points, and want a bound on |p(x)-f(x)| there too. We will
derive one, and it will look similar to the error bound for
a Taylor expansion:
f(x) - p(x) = [ f^{(n)}(eta) / n! ] * product (i=0 to n-1) (x-x{i})
where a <= eta <= b
But there is one important difference with respect to the Taylor expansion:
depending on the choice of x{i} and f(x), the error does
NOT necessarily get smaller as the degree increases, even if
f(x) has a convergent Taylor expansion around every point in [a,b].
In fact, the error may increase once n gets large enough,
sometimes with wild oscillations in the error f(x)-p(x).
We will look at
several examples, and derive a general bound on the error
|| p(x) - f(x) ||_inf, and analyze it.
3) The third notion of approximation is a variation on the last one:
We are given n triples:
(x{0},f(x{0}),f'(x{0})), ... , (x{n-1},f(x{n-1}),f'(x{n-1}))
and ask for a polynomial p(x) that matches f(x) and its first
derivative f'(x) at x{0},..,x{n-1}:
p(x{i}) = f(x{i}), p'(x{i}) = f'(x{i}) for i=0,...,n-1
This is called "Hermite interpolation", and results in simultaneously
making p(x) approximate f(x) and p'(x) approximate f'(x).
We will again show that there is a simple algorithm for it. Since we
are asking p(x) to "better match" f(x), we will need to use a higher
degree polynomial than n-1. Indeed, one can ask more generally that
p(x{i}) = f(x{i}), p'(x{i}) = f'(x{i}), ... ,
p^(k{i})(x{i}) = f^(k{i})(x{i})
i.e. that p(x) matches the first k{i} derivatives of f(x) at x{i}
for i=0 to n-1.
One special case of this is n=1, in which case we are asking for
the Taylor expansion. Thus we see that the Taylor expansion is
one special case of a large variety of interpolating polynomials,
that match the value of a function and some of its derivatives
at one or more points. All the algorithms and error bounds for
approximations of type 2) generalize to this case, though we will
not have time to cover this.
4) The fourth notion uses piecewise polynomials. The data is again
(x{0},f(x{0})), (x{1},f(x{1})) ,..., (x{n-1},f(x{n-1}))
where n is fairly large. We want to use a (different) low degree
polynomial on each interval (x{i},x{i+1}).
The low degree will keep the evaluation
cheap, and prevent the wild oscillations interpolation
polynomials might exhibit (examples later).
The simplest low degree polynomial is a constant, i.e.
p(x) = f(x{i}) for x{i} <= x < x{i+1}
(picture)
This is not even continuous. The next best polynomial is a linear
one, gotten by "connecting the dots" (x{i},f(x{i})) by straight lines.
(picture)
This is continuous, but its first derivative is clearly not at the x{i}.
If we use higher degree polynomials in each interval (x{i},x{i+1}),
then we can make the function smoother, by making its derivatives
continuous at the x{i}. Such piecewise polynomials are called splines.
The most popular spline is the cubic spline, which has continuous
second derivatives, but there are many others.
They are widely used in graphics and Computer Aided Design.
Matlab has a "spline toolbox" of many types of splines.
Now we go back and do the details.
1) We start with perhaps the most natural definition of approximation.
We recall that the error of approximation of any polynomial p is
|| p(x) - f(x) ||_inf = max (a <= x <= b) |p(x) - f(x)|
This is perhaps the most natural definition of error: the largest
error at any point in the interval [a,b].
The goal is to compute a polynomial p*(x) of degree at most n, such that
|| p*(x) - f(x) ||_inf =
min (polynomials q of degree <= n) || q(x)-f(x) ||_inf
Here is the reason for limiting the degree to be at most n:
Weierstrass Approximation Theorem: If f(x) is any continuous function,
then the error || p*(x) - f(x) ||_inf decreases to 0 as we increase n.
In other words, any continuous function on [a,b] can be approximated
arbitrarily closely by a polynomial.
There is a proof of this result in the book; we will do a different
one here. This means that if we want to decrease the error, we can just
use a higher degree polynomial. But the higher the degree, the more
expensive it is to find and evaluate p*.
It turns out to be complicated and expensive to compute p*, requiring
running a Newton-like iterative algorithm (called the Remez algorithm),
so this kind of approximation is usually only used for computing the
approximations used for sin(x), exp(x), etc. in the computer's
math library, since they need only be computed once and for all.
Proof of the Weierstrass Approximation Theorem:
We claim that
Lemma 1: We may assume without loss of generality that [a,b] = [0,1],
f(x) = 0 for x outside [0,1], and f(0) = f(1) = 0.
Proof: Homework!
Define the polynomial
Q_n(x) = c_n (1-x^2)^n
where c_n > 0 is a positive constant chosen so that
integral from -1 to 1 of Q_n(x) dx = 1 for all n
Show plot of Q_n(x) for large n
Lemma 2: c_n < sqrt(n)
Proof: Homework!
Now set p_n(x) = integral from -1 to 1 f(x+t)Q_n(t) dt
We claim
1) p_n(x) is a polynomial
2) || p_n(x) - f(x) ||_inf goes to 0 as n -> infinity
To prove 1), change the integral to
integral from -x to 1-x f(x+t)Q_n(t) dt
since f(z) is zero outside [0,1]
= integral from 0 to 1 f(z)Q_n(z-x) dz
by changing the variable of integration to z = x+t
Now Q_n(z-x) = c_n (1-(z-x)^2)^n
is a polynomial in z and x; write it as
Q_n(z-x) = sum (i=0 to 2n) q_i(z)*x^i,
where q_i(z) is a polynomial in z.
Substitute into the integral to get
integral from 0 to 1 f(z) sum q_i(z)*x^i dz
= sum [ integral from 0 to 1 f(z) q_i(z) dz ] x^i
which is obviously a polynomial in x, where the integrals
are the coefficients
To prove 2) we need
Lemma 3: There is a constant M such that |f(x)| <= M for all 0 <= x <= 1
For every epsilon > 0, there is a 0 < delta < 1 such that
|x-y| < delta implies |f(x)-f(y)| < epsilon
Proof of Lemma 3: This follows from continuity of f(x) on the
closed bounded interval [0,1]: f(x) is bounded and uniformly continuous.
Now choose any eta > 0. We want to show that when n is large enough,
the error | p_n(x) - f(x) | is at most eta. Let epsilon = eta / 2,
choose delta according to Lemma 3, and write
| p_n(x) - f(x) | = | integral from -1 to 1 f(x+t)Q_n(t) dt - f(x) |
= | integral from -1 to 1 [f(x+t)-f(x)]Q_n(t) dt |
since integral from -1 to 1 Q_n(t) dt = 1
<= integral from -1 to 1 |[f(x+t)-f(x)]Q_n(t)| dt
since taking absolute values makes the area
under the curve bigger
= integral from -1 to 1 |f(x+t)-f(x)|Q_n(t) dt
since Q_n(t) >= 0
= integral from -1 to -delta (same integrand)
+ integral from -delta to +delta (same integrand)
+ integral from +delta to 1 (same integrand)
= I1 + I2 + I3 (the three integrals in the last line)
We will show that if n is large enough then
I2 <= eta / 2, and I1 and I3 <= eta/4,
so their sum is <= eta as desired.
First consider I2. The integrand is bounded by eta/2 Q_n(t) by
Lemma 3, so
I2 <= integral from -delta to delta (eta/2) Q_n(t) dt
<= integral from -1 to 1 (eta/2) Q_n(t) dt
since delta < 1 and Q_n(t) is nonnegative
= (eta/2)
since integral from -1 to 1 Q_n(t) dt = 1
Now consider I1. The integrand is bounded by
|f(x+t)-f(x)|Q_n(t) <= 2M Q_n(t)
since |f(x+t)-f(x)| <= |f(x+t)|+|f(x)| <= 2M
<= 2M c_n (1-delta^2)^n
since Q_n(t) is largest when t is closest to 0
<= 2M sqrt(n) (1-delta^2)^n
by Lemma 2
Thus I1 <= integral from -1 to -delta 2M sqrt(n) (1-delta^2)^n dt
= (1-delta) 2M sqrt(n) (1-delta^2)^n
As n -> infinity, this upper bound approaches 0. To see why, take
its logarithm, which is
const + .5*log n + n*log(1-delta^2)
Since n is much larger than log n when n is large, this expression
is dominated by n*log(1-delta^2) for large n,
and since log(1-delta^2) < log(1) < 0, n*log(1-delta^2)
goes to -infinity.
So, we can take n large enough so that I1 <= eta/4.
The same upper bound as for I1 works for I3, and shows that I3 <= eta/4
for n large enough.
This completes the proof of the Weierstrass approximation theorem,
except for Lemmas 1 and 2, which you'll prove on homework.
There is a more powerful version of this theorem, called the Stone-Weierstrass
Theorem, which we will talk about but not prove. In addition to polynomials
on intervals like [a,b], we often need to approximate functions of more
variables, or functions with properties like being periodic. Generally, one is
given a closed bounded subset X of R^n (like X=[a,b] when n=1), and a set of
"basis functions" g0(x)=1,g1(x),g2(x),..,gm(x), and the question is whether one
can approximate any continuous function f(x) as closely as desired by using
"combinations" of these basis functions. By combination we mean any polynomial
like 3*g0(x) + 2*g1(x) - 5*g2(x)*g3(x), etc. For example, Weierstrass
deals with the case m=1 and g1(x)=x, i.e. standard polynomials.
But you might also let X = unit circle (where the angle theta is the coordinate),
where continuous functions f(theta) mean periodic functions with period 2*pi,
and let m=2 with g1(x) = cos(x) and g2(x) = sin(x) as the bases.
There turns out to be a very simple criterion for the set {g1(x),...,gm(x)}
of basis functions to satisfy that guarantees that any continuous function
can be approximated by a polynomial in them. Suppose that there were two distinct
points y and z in X where gi(y) = gi(z) for all i. Then any function
of the gi(x) will also obviously have the same value at y and z. This means
that we clearly could not approximate any continuous function that had different
values at y and z. So a trivial necessary condition to be able to approximate
arbitrary continuous functions is that for any distinct pair y and z, some gi(x)
satisfies gi(y) != gi (z).
Amazingly, this necessary condition is also sufficient:
Thm (Stone-Weierstrass). Let X be a closed bounded subset of R^n, and
{g0(x)=1,g1(x),...,gm(x)} a set of continuous functions with the property
that for any distinct y and z in X, there is a gi(x) such that gi(y) != gi(z).
Then any continuous function f(x) on X can be approximated arbitrarily
closely by a polynomial in the gi(x).
Example: Let X = [a,b], m=1 and g1(x)=x. Since y != z means g1(y) != g1(z),
we get that any continuous function on [a,b] can be approximated arbitrarily
closely by a polynomial in x. This is the version of Weierstrass's Thm that
we proved.
Example: Let X = any closed bounded subset of R^n. Let m=n and gi(x) = x{i},
the i-th coordinate of x. Since y != z means they differ in at least
one coordinate, Stone-Weierstrass applies, and we get that we can approximate
any continuous function on X with polynomials in the coordinates x{1},...,x{n}.
Example: Let X = unit circle in R^2, with coordinate theta, and
g1(x) = cos(x) and g2(x) = sin(x). Clearly y != z means that
their coordinates on the circle (g1(x),g2(x)) differ. This means that all continuous
2*pi-periodic functions can be approximated arbitrarily closely by polynomials
in sin(x) and cos(x). Since any product of powers of sin(x) and cos(x) can be
written (using trig identities) as a linear combination of
1, sin(x), sin(2x), ..., sin(k*x), cos(x), cos(2x), ..., cos(k*x) for some k,
this means that any continuous 2*pi-periodic function can be arbitrarily
well approximated by a linear combination of such terms.
This is called a Fourier expansion.
Exercise: Modify our proof of the Weierstrass Approximation Thm for polynomials
to work for Fourier expansions. Hint: replace 1-x^2 by cos(pi*x/2)
This completes our discussion of approximation in the sense of minimizing
|| p(x) - f(x) ||_inf.
2) Recall that for the second notion of approximation
we are given n points in the plane
(x{0},f(x{0})), (x{1},f(x{1})), ... , (x{n-1},f(x{n-1}))
where all the x{i} are distinct, and ask for a polynomial p(x)
that "interpolates these points" i.e. satisfies
p(x{i}) = f(x{i}), for i=0,...,n-1
We will see how to compute this polynomial by an algorithm called
the "Newton divided-difference formula", and analyze the error
|| p(x) - f(x) ||_inf .
So if we only measure the error |p(x) - f(x)| at the points x{i},
it is zero by construction. But we expect to evaluate p(x) at
other points, and want a bound on |p(x)-f(x)| there too. We will
derive one, and it will look similar to the error bound for
a Taylor expansion. But there is one important difference:
depending on the choice of x{i} and f(x), the error does
NOT necessarily get smaller as the degree increases. In fact,
it often increases once n gets large enough. Let's look at some
examples.
Example: f(x) = x^3 - 3*x^2 + x - 2, x = -1 : 4/(n-1) : 3, n = 2,3,4,5,10
[c,err,NewtonTable]=NewtonInterp('ft',-1:4/(n-1):3,1); err
Note: after n = 4, result same up to roundoff, because 4 points
determine a cubic uniquely
Example: f(x) = sin (x), x = 0 : 2*pi/(n-1) : 2*pi,
n = 2,3,4,5,6,8,10,15,20,25,50,60,70
[c,err,NewtonTable]=NewtonInterp('sin',0:2*pi/(n-1):2*pi,1); err
Note how error decreases as n increases
We can also plot error versus the number of interpolation points:
for n=2:70, ...
[c,errN(n),NewtonTable] = ...
NewtonInterp('sin',0:2*pi/(n-1):2*pi,0); ...
end, semilogy(errN)
Note that the error decreases for a while (up to n~20)
and then increases. The red curve shows the roundoff error
in evaluating the polynomial, so we see that the error
is due in some places to roundoff, but not in all (see curve for n=70).
Example: Let us try plotting the degree-20 polynomial on a wider
interval:
n=20;
x = 0:2*pi/(n-1):2*pi;
[c,err,NewtonTable]=NewtonInterp('sin',x,0); err
[y,ebnd]=NewtonEval(c,x,-2*pi:.001:4*pi,1);
Note that outside the interval of approximation [0,2*pi],
it does not approximate sin(x) very well.
Example: Let us try different sample points: n = 30, 40
nn=0:n;x=sort(cos((2*nn+1)*pi/(2*n+2)));x=(x+1)*pi; plot(x)
[c,err,NewtonTable]=NewtonInterp('sin',x,1)
So we see that choosing the interpolation points can be important,
because it can make the error much smaller (when n=30, 1e-14 vs
1e-11). We will justify this choice of points later.
Example: f(x) = 1/(1+x^2), x = -2:4/(n-1):2
(n equally spaced points between -2 and 2), n=5,10,15,20,30
[c,err,Newtontable]=NewtonInterp('ft1',-2:4/(n-1):2,1);err
The error shrinks in the middle of the interval [-2,2] but
grows near the borders. This lack of convergence is called the
"Runge effect" after its discoverer.
Example: Contrast with Hermite, Splines:
n=10, [c,err,N]=NewtonInterp('sin',0:2*pi/(n-1):2*pi,1); err
n=10, [c,err,N]=HermiteInterp('sinH',0:2*pi/(n-1):2*pi,1); err
n=10, [xout,yout,err]=SplineFit('sin',0:2*pi/(n-1):2*pi,1); err
N = (2:50)';[errN,errH,errS]=CompareInterp('sin','sinH',0,2*pi,N,1);
Conclusions: Hermite most accurate, but then error starts increasing,
Newton next, then error starts increasing
Spline least, but error always goes down, smooth
n=10, [c,err,N]=NewtonInterp('ft1',-2:4/(n-1):2,1); err
n=10, [c,err,N]=HermiteInterp('ft1H',-2:4/(n-1):2,1); err
n=10, [xout,yout,err]=SplineFit('ft1',-2:4/(n-1):2,1); err
N = (5:5:75)';[errN,errH,errS]=CompareInterp('ft1','ft1H',-2,2,N,1);
Now the Spline is most accurate for n >= 10
Now we talk about algorithms for computing an interpolation polynomial,
and compute bounds on the error || p(x) - f(x) ||_inf
Simple Algorithm 1 (not recommended for use in practice!)
Suppose n=4, so we seek c0,c1,c2,c3 such that
c0 + c1*x{i} + c2*x{i}^2 + c3*x{i}^3 = f(x{i}) for i=0,1,2,3
This is a system of 4 linear equations in c0,c1,c2,c3:
[ 1 x0 x0^2 x0^3 ] [ c0 ] = [ f(x0) ]
[ 1 x1 x1^2 x1^3 ] * [ c1 ] = [ f(x1) ]
[ 1 x2 x2^2 x2^3 ] [ c2 ] = [ f(x2) ]
[ 1 x3 x3^2 x3^3 ] [ c3 ] = [ f(x3) ]
The coefficient matrix has a very special structure, and is
called a Vandermonde matrix. We could imagine solving this
system for c0,c1,c2,c3 using Gaussian elimination.
For general n, the matrix has (j,k)-th entry x{j}^(k-1), and the
right-hand-side vector has j-th entry f(x{j}).
Solving using Gaussian elimination is straightforward, but turns out
to be much more expensive (O(n^3) vs O(n^2)) and less accurate
than our ultimate algorithm below.
It is also not immediately clear why this matrix is nonsingular,
which is a necessary and sufficient condition to get a unique solution
for all f(x). This will follow from other expressions for ci below.
Simple Algorithm 2 (not recommended for use in practice!)
Lagrange Interpolation. Again consider n=4. Consider
the 4 special polynomials of degree 3
L0(x) = (x-x1)*(x-x2)*(x-x3) / [ (x0-x1)*(x0-x2)*(x0-x3) ]
L1(x) = (x-x0)*(x-x2)*(x-x3) / [ (x1-x0)*(x1-x2)*(x1-x3) ]
L2(x) = (x-x0)*(x-x1)*(x-x3) / [ (x2-x0)*(x2-x1)*(x2-x3) ]
L3(x) = (x-x0)*(x-x1)*(x-x2) / [ (x3-x0)*(x3-x1)*(x3-x2) ]
Note that L0(x) = 0 when x=x1,x2,x3, and L0(x0)=1
Similarly Li(x) = 0 when x = xj, j != i, and Li(xi)=1
Example: Lagrange([1 2 3 4],1)
Lagrange([1 1.5 2 4],1)
Lagrange([1 1.1 1.2 4],1) - note the huge polynomial values near x=3,
which mean that we might get a large error there
Thus p(x) = sum (j=0 to 3) f(x{j})*L{j}(x) has two properties
1) It is a polynomial of degree 3
2) p(x{i}) = sum (j=0 to 3) f(x{j})*L{j}(x{i}) = f(x{i})*L{i}(x{i}) = f(x{i})
since all other terms vanish
so it is the interpolating polynomial we want.
(we will prove its uniqueness later).
Written in matrix form as in Algorithm 1,
we are again asking for coefficients c0,c1,c2,c3 such that
[ L0(x0) L1(x0) L2(x0) L3(x0) ]   [ c0 ] = [ f(x0) ]
[ L0(x1) L1(x1) L2(x1) L3(x1) ] * [ c1 ] = [ f(x1) ]
[ L0(x2) L1(x2) L2(x2) L3(x2) ]   [ c2 ] = [ f(x2) ]
[ L0(x3) L1(x3) L2(x3) L3(x3) ]   [ c3 ] = [ f(x3) ]
But now we have defined the polynomial Li(x) so that the coefficient
matrix is the "identity matrix" (ones on the diagonal, zeros off the diagonal)
so that ci = f(x{i}).
In the general case,
L{i}(x) = prod (j = 0 to n-1 except i) [(x-x{j})/(x{i}-x{j})]
which is 0 at x = x{j} (j != i) and is 1 at x=x{i}
Algorithm 3: Newton Divided Difference Formula
We pick a "basis set" of polynomials p1(x),p2(x),...
1, x, x^2,... in Algorithm 1
L0(x),L1(x),L2(x),... in Algorithm 2
p0(x),p1(x),p2(x),... here
and solve a linear system of equations for coefficients c0,c1,...
so that p(x) = sum (i=0 to n-1) ci*pi(x) satisfies p(x{i}) = f(x{i})
Our preferred algorithm uses the basis set
p{0}(x) = 1
p{1}(x) = (x-x{0})
p{2}(x) = (x-x{0})*(x-x{1})
p{3}(x) = (x-x{0})*(x-x{1})*(x-x{2})
...
p{i}(x) = prod (j=0 to i-1) (x-x{j})
and expresses p(x) = sum i=0 to n-1 c{i}*p{i}(x).
Note that p{i}(x{k}) = 0 if i > k, since p{i}(x) has a factor (x-x{k})
Thus the linear system
[ p0(x0) p1(x0) p2(x0) p3(x0) ] [ c0 ] = [ f(x0) ]
[ p0(x1) p1(x1) p2(x1) p3(x1) ] * [ c1 ] = [ f(x1) ]
[ p0(x2) p1(x2) p2(x2) p3(x2) ] [ c2 ] = [ f(x2) ]
[ p0(x3) p1(x3) p2(x3) p3(x3) ] [ c3 ] = [ f(x3) ]
has zeros above the main diagonal, i.e. it is "lower triangular"
[ p0(x0) 0 0 0 ] * [ c0 ] = [ f(x0) ]
[ p0(x1) p1(x1) 0 0 ] [ c1 ] = [ f(x1) ]
[ p0(x2) p1(x2) p2(x2) 0 ] [ c2 ] = [ f(x2) ]
[ p0(x3) p1(x3) p2(x3) p3(x3) ] [ c3 ] = [ f(x3) ]
This lets us solve for
c0 = f(x0)/p0(x0)
c1 = [ f(x1) - p0(x1)*c0 ]/p1(x1)
c2 = ...
by "substitution"
Here is a simpler and more flexible approach.
First note that c{i} only depends on x{0},...,x{i} and f(x{0}),...,f(x{i}),
because of the form of the triangular linear system. Here are the
first two values of c:
c{0} = f(x{0})
c{1} = [ f(x{1}) - f(x{0}) ] / [ x{1} - x{0} ]
This justifies the following
Definition: c{i} = f[x{0},x{1},...,x{i}] is called the i-th divided difference
of f at x{0},x{1},...,x{i}
A look at c{1} explains the term "divided difference".
This means that we can write, for example, that the polynomial that
interpolates f(x) at x=x0,x1,x2 can be written
p(x) = f[x0] + f[x0,x1]*(x-x0) + f[x0,x1,x2]*(x-x0)*(x-x1)
To make the notation clear, we can also express
the polynomial that interpolates f(x) at x=x2,x3,x4 as
p(x) = f[x2] + f[x2,x3]*(x-x2) + f[x2,x3,x4]*(x-x2)*(x-x3)
The next theorem about divided-differences justifies our simple algorithm
for computing them:
Theorem: f[z{0},z{1},...,z{n}] =
[ f[z{1},...,z{n}] - f[z{0},...,z{n-1}] ] / [ z{n} - z{0} ]
Lemma: The interpolating polynomial at z{0},...,z{n} is unique,
assuming the z{i} are distinct.
Proof of Lemma: Assume there are two polynomials p(x) and q(x) of degree at most n
that interpolate f(x) at the distinct points z{0},..,z{n}.
We must show that r(x) = p(x) - q(x) = 0. First, note
that r(z{0}) = r(z{1}) = ... = r(z{n}) = 0, so r(x) has
at least n+1 distinct zeros. But since it is a polynomial of
degree at most n, it can have at most n distinct zeros, unless it
is identically zero. Thus r(x) = 0, i.e. p(x) = q(x).
Proof of Theorem:
Let p_n(x) be the polynomial that interpolates f at z{0},...,z{n}
p_{n-1}(x) be the polynomial that interpolates f at z{0},...,z{n-1}
q(x) be the polynomial that interpolates f at z{1},...,z{n}
Thus p_n(x) has degree at most n, and p_{n-1}(x) and q(x) have degree at most n-1.
We claim that
p_n(x) = (x-z{0})/(z{n}-z{0})*q(x) + (z{n}-x)/(z{n}-z{0})*p_{n-1}(x)
To see why, note that
1) The RHS (right hand side) is a polynomial of degree n
2) At x=z{i}, i = 1,...,n-1,
RHS = (z{i}-z{0})/(z{n}-z{0})*f(z{i}) + (z{n}-z{i})/(z{n}-z{0})*f(z{i})
since q(z{i}) = p_{n-1}(z{i}) = f(z{i})
= (z{n}-z{0})/(z{n}-z{0})*f(z{i}) = f(z{i})
3) At x=z{0}, RHS = 0 + (z{n}-z{0})/(z{n}-z{0})*f(z{0}) = f(z{0})
4) At x=z{n}, RHS = (z{n}-z{0})/(z{n}-z{0})*f(z{n}) + 0 = f(z{n})
Thus the RHS is an interpolating polynomial, and by the Lemma,
must equal p_n(x). In particular, the coefficient of x^n is the
same in both polynomials.
The coefficient of x^n in p_n(x) is defined to be f[z{0},...,z{n}].
The coefficient of x^n in RHS is
1/(z{n}-z{0})*(coeff of x^{n-1} of q(x)) -
1/(z{n}-z{0})*(coeff of x^{n-1} of p_{n-1}(x))
= [ f[z{1},...,z{n}] - f[z{0},...,z{n-1}] ]/(z{n}-z{0})
as desired.
This theorem immediately justifies the following algorithm.
We use an array c(j,k) to store c(j,k) = f[x{j},x{j+1},...,x{j+k}].
Divided-Difference Algorithm:
for j = 0 to n, c(j,0) = f(x{j}), end
for k = 1 to n,
for j = 0 to n-k
c(j,k) = [ c(j+1,k-1) - c(j,k-1) ] / [ x{j+k} - x{j} ]
end
end
Then f[x{0},...,x{i}] = c(0,i) is the first row of the table.
The algorithm has the attractive feature of giving a table of c(j,k)'s that
defines many interpolating polynomials, namely those interpolating
f() at x{j} through x{j+k}, for any 0 <= j <= j+k <= n
(use c(j,0) through c(j,k) instead of c(0,0) through c(0,n)),
as well as error bounds as described below.
Lemma 1: Let p(t) be the interpolation polynomial for f(t) at x{0},..,x{n}. Then
we can write the error as
f(t) - p(t) = f[x{0},...,x{n},t] prod {j=0 to n} (t-x{j})
Proof: If t = x{j}, both sides are zero, so assume t does not equal any x{j}.
Let q(x) be the polynomial that interpolates f(x) at x{0},...,x{n},t,
where t is a new point.
Then by the properties of Newton Interpolation and divided differences
q(x) = p(x) + f[x{0},...,x{n},t] prod {j=0 to n} (x-x{j})
Substitute x=t and q(t)=f(t) to get
f(t) = p(t) + f[x{0},...,x{n},t] prod {j=0 to n} (t-x{j})
Lemma 2: f[x{0},...,x{n}] = f^{(n)}(z) / n! for some z in [min(x{i}),max(x{i})]
Proof: Let p(x) be the polynomial interpolating f(x) at x{0},...,x{n}.
We know that p(x) has degree n with highest term
f[x{0},...,x{n}]*x^n
Now consider g(x) = f(x)-p(x).
This function has zeros at n+1 points, x{0},...,x{n} (which we assume
for simplicity are sorted x{0} < x{1} < ... < x{n})
By Rolle's Theorem, g'(x) has n distinct zeros in the intervals
(x{0},x{1}), (x{1},x{2}), ... , (x{n-1},x{n})
Applying Rolle's Theorem again, g''(x) has n-1 distinct zeros,
g'''(x) has n-2 distinct zeros, and finally g^{(n)}(x) has
one distinct zero, call it z. Thus
f^{(n)}(z) = p^{(n)}(z) = n! * f[x{0},...,x{n}],
since all the lower order terms of p(x) disappear when differentiated n times.
Dividing by n! gives the result.
Polynomial Approximation Error Theorem:
Let p(x) interpolate f(x) at x{0},...,x{n}. Then the error is given by
f(x) - p(x) = f^{(n+1)}(z)/(n+1)! prod {j=0 to n} (x-x{j})
where z is in the range [min(x{i},x),max(x{i},x)]
Proof: By Lemma 1,
f(t) - p(t) = f[x{0},...,x{n},t] prod {j=0 to n} (t-x{j})
By Lemma 2,
f[x{0},...,x{n},t] = f^{(n+1)}(z) / (n+1)!
for some z in the range from the smallest value among x{j},t to the largest
Finally, substitute x for t.
Let us use this theorem to understand the error in approximating
f(x) by a polynomial p(x). Let us carefully list the stages of the
overall procedure:
1) Given x{0},...,x{n}, compute the f(x{i})
2) Compute the divided differences using the Divided Difference Algorithm
3) Evaluate the polynomial p(x)
This careful listing shows that there are actually 4 sources of error in
approximating f(x) by p(x):
1) roundoff in evaluating f(x{i}) - this is O(1e-16) for our examples
and so we will ignore it
2) roundoff in running the divided difference algorithm - we will see
that this can be important for large n, and is shown in green in
the plots below
3) roundoff in evaluating the polynomial p(x) - this can also be
important for large n, and is shown in red in the plots below.
4) approximation error from the Polynomial Approximation Error Theorem -
this is usually the dominant error for small n, and sometimes for all n
Let us consider the 4th source of error; how do we estimate it?
Note that from our Theorem we can write this source of error as
4a) f^(n+1)(z)/(n+1)! prod (j=0 to n) (x-x{j})
so if we happen to know a bound on f^(n+1)(z), we can multiply this by
the rest ( 1/(n+1)! prod (j=0 to n) (x-x{j}) ), which is shown in magenta
in the plots below
Also note from Lemma 1 that we can write this source of error as
4b) f[x{0},...,x{n},x] prod {j=0 to n} (x-x{j})
and we can simply compute f[x{0},...,x{n},x] for several randomly
chosen x and take the largest (10 in this example);
this means computing just one more
column of the divided difference table;
This error estimate is shown in black in the plots
Also, the plots below show the true error in blue.
Example: Let's consider the error in interpolating f(x)=sin(x) by a
polynomial on n equally spaced points in [0,2*pi]; Note that all
derivatives of f(x) are either +-sin(x) or +-cos(x), and so all
bounded by 1. This means that we expect our error factor 4a) (magenta)
to closely match the error
n=5, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
This routine does the interpolation and plots several error measures:
blue: actual error |f(x)-p(x)|
Error factor 4a) (magenta) and estimate 4b) (black)
closely match the true error, with
the roundoff many orders of magnitude smaller
n=6; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
This is similar, but shows that the true error is "accidentally
zero" at x=0, so we overestimate the true error
n=11; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
Again, error factor 4a and estimate 4b agree closely with the true error
n=17; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
Now error estimate 4b) is 1 to 2 orders of magnitude too large,
whereas error factor 4a is accurate. This is because of roundoff in
computing the divided difference table. Note that
in exact arithmetic the (n+2)-nd column should be f^(n)/n! <= 1/n!
But ef(1) = 1.87e-14 is the largest number in the (n+2)-nd column,
whereas 1/n! is ef(3)=2.8e-15. Our estimate of the roundoff error
in ef(1) is ef(2)=1.5e-13, 55x larger. (Note: numbers depend
on computer and version of Matlab).
n=19; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
Now the roundoff in evaluating p(x) dominates for x near pi.
Otherwise error factor 4a is accurate
n=25; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
Roundoff in p(x) (red) dominates except for x near 0 or 2*pi.
Near 0 and 2*pi, roundoff in computing the divided differences (green)
dominates. Without this roundoff, the magenta line shows that
we'd expect a much smaller error
n=51; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
axis([0 2*pi 1e-70 1])
Roundoff (red,green) dominates again. Without it, we'd expect
50 decimal places of accuracy (magenta)
Example: Let's consider the error in interpolating f(x)=1/(1+x^2) by a
polynomial on n equally spaced points in [-2,2]; In contrast
to sin(x), the n-th derivative of f(x) can get quite large, around n!.
n=6; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef
Now factor 4a (in magenta) is too small,
and estimate 4b (in black) is accurate.
Roundoff is unimportant
n=10; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef
same as with n=6
n=20; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef
more of the same
How to choose the x{i}: notice in these plots how the magenta curve,
which is proportional to prod (i=0 to n) (x-x{i})
and really gives the basic shape of the error, is larger at the
end points than in the middle. This leads us to ask if there is a
better choice of x{i} that makes the error smaller overall.
Example: Let's try the following set of n+1 points:
xmin = 0; xmax = 2*pi;
nn=0:n;c=sort(cos((2*nn+1)*pi/(2*n+2)));x=(c+1)*(xmax-xmin)/2 + xmin;
Note that nn consists of integers from 0 to n;
(2*nn+1)*pi/(2*n+2) consists of equally spaced numbers from
pi/(2*n+2) to pi*(1 - 1/(2*n+2)),
c consists of the cosines of these numbers, and
x consists of c shifted and scaled to go from xmin to xmax
Let's compare interpolating sin(x) at equally spaced points to these points
n=5,10,15,20, cheb - just plot prod {i=0 to n} (x-x{i}) for
the x{i} uniformly spaced, and for the x{i} equal to the above set of points.
n=17, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
n=17, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',x,1);ef
Two observations:
1) The actual error is about 1e-13 or less, vs 1e-9
2) The error is more nearly constant for all x, rather than
small in the middle. The error factor prod {i=0 to n} (x-x{i})
has the same peaks around 1e-12, all across the x axis.
In fact, we will show that this choice of x{i} is "best possible",
in the sense of making the value of
max (xmin <= x <= xmax) | prod {i=0 to n} (x-x{i}) |
as small as possible
Example: n=20; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);
nn=0:n;c=sort(cos((2*nn+1)*pi/(2*n+2)));x=2*c;
[c,err,N,NE,ef]=NewtonInterpEbnd('ft1',x,1);
Chebyshev Polynomials:
Define the sequence of polynomials
T{0}(x) = 1
T{1}(x) = x
T{j}(x) = 2*x*T{j-1}(x) - T{j-2}(x)
(e.g. T{2}(x) = 2*x*x - 1 = 2*x^2 - 1,
T{3}(x) = 2*x*(2*x^2-1) - x = 4*x^3 - 3*x, etc.)
Theorem: Chebyshev polynomials have the following properties:
1) T{j}(x) = cos(j*arccos x) when |x| <= 1
2) The roots of T{j+1}(x) are
z{i} = cos((2*i+1)*pi/(2*j+2)) for i=0,..,j
3) |T{j}(x)| <= 1 if |x| <= 1
4) T{j+1}(x) = 2^j*x^(j+1) + lower order terms
5) T*{j+1}(x) = 2^(-j)*T{j+1}(x) is a monic polynomial and satisfies
|T*{j+1}(x)| <= 2^(-j) if |x| <= 1
6) If p(x) is any monic polynomial of degree j+1, then
max {|x| <= 1} |p(x)| >= 2^(-j).
7) Let [xmin,xmax] be any interval, and let
x{i} = (z{i}+1)*(xmax-xmin)/2 + xmin for i=0,...,j
Then of all possible choices of interpolation points in
[xmin,xmax], these minimize the largest value of the
factor of the interpolation error, || prod {i=0 to j} (x-x{i}) ||_inf
(when [xmin,xmax] = [-1,1], the minimum value is 2^(-j)).
Proof:
1) If |x| <= 1, write x = cos t, so we are trying to prove
T{j}(cos t) = cos (j*t)
We will use induction and trig identities.
It is clearly true for j=0 and j=1. Assume it is true
for j-1, j-2,..., 0. Then
T{j}(cos t) = 2*cos(t)*cos((j-1)*t) - cos((j-2)*t)
= 2*cos(t)*[cos(j*t)*cos(t)+sin(j*t)*sin(t)]
-[cos(j*t)*cos(2*t)+sin(j*t)*sin(2*t)]
= 2*cos(t)^2*cos(j*t) + 2*sin(j*t)*cos(t)*sin(t)
-[cos(j*t)*cos(t)^2 - cos(j*t)*sin(t)^2
+sin(j*t)*2*sin(t)*cos(t)]
= cos(j*t)*(cos(t)^2+sin(t)^2)
= cos(j*t) as desired.
2) cos((j+1)*t)=0 when (j+1)*t = pi*(i+ 1/2) or
t = (2*i+1)*pi/(2*j+2) or
x = cos((2*i+1)*pi/(2*j+2))
3) |cos(j*t)| <= 1
4) By the recurrence, the highest order term of T{j}(x) is
2*x*highest order term of T{j-1}(x)
5) Follows from 3) and 4)
6) We do proof by contradiction: Assume there is a monic degree j+1
polynomial p(x) such that |p(x)| < 2^{-j} for |x|<=1.
Let v{i} = cos(i*pi/(j+1)), i=0,...,j+1
T*{j+1}(v{i}) = cos(i*pi)/2^j = (-1)^i/2^j
Then r(x)= T*{j+1}(x)-p(x) has the following properties
1) r(v{i}) is nonzero with sign (-1)^i, same as T*, since
|T*{j+1}(v{i})| = 1/2^j > |p(v{i})|, for i=0,...,j+1
2) Since r(x) changes sign j+1 times (in each interval [v{i},v{i+1}],
i = 0,...,j), it has at least j+1 zeros
3) r(x) has degree at most j, since the leading terms of T* and p cancel
4) By 2) and 3), r(x) must be identically zero, a contradiction
7) Homework!
3) The third notion of approximation by a single polynomial
is a variation on the previous one:
We are given n triples:
(x1,f(x1),f'(x1)), (x2,f(x2),f'(x2)), ... , (xn,f(xn),f'(xn))
and ask for a polynomial p(x) that matches f(x) and its first
derivative f'(x) at x1,..,xn:
p(xi) = f(xi), p'(xi) = f'(xi) for i=1,...,n
This is called "Hermite interpolation", and results in simultaneously
making p(x) approximate f(x) and p'(x) approximate f'(x).
We will again show that there is a simple algorithm for it. Since we
are asking p(x) to "better match" f(x), we will need to use a higher
degree polynomial than n-1. Indeed, one can ask more generally that
p(xi) = f(xi), p'(xi) = f'(xi), ... , p^(ki)(xi) = f^(ki)(xi)
i.e. that p(x) matches the first ki derivatives of f(x) at xi.
One special case of this is n=1, in which case we are asking for
the Taylor expansion. Thus we see that the Taylor expansion is
one special case of a large variety of interpolating polynomials,
that match the value of a function and some of its derivatives
at one or more points. All of the algorithms and error bounds generalize
to this case, though we will not cover it here.
This completes our discussion of interpolating function values by a
single polynomial.
Next topic:
Interpolating function values using "piecewise polynomials", i.e.
a different polynomial on each interval (x{i},x{i+1}) - called "splines".
Each such polynomial will be low degree, which keeps it cheap to evaluate
and prevents the wild oscillation we saw high degree Newton polynomials exhibit.
The simplest low degree polynomial is a constant, i.e.
p(x) = f(x{i}) for x{i} <= x < x{i+1}
This is not even continuous. The next best polynomial is a linear
one, gotten by "connecting the dots" (xi,f(xi)) by straight lines:
p(x) = f(x{i}) + (x-x{i})/(x{i+1}-x{i}) * (f(x{i+1})-f(x{i}))
This is continuous, but its first derivative is clearly not continuous
at the xi. We can continue with piecewise quadratics, and it turns out that
the extra degree of freedom we get is enough to make the first
derivative continuous at the xi, but not the second derivative.
Generally, suppose we use polynomials of degree k on each interval.
Then it turns out we can pick the polynomials to
1) interpolate the data (xi,f(xi))
2) have k-1 continuous derivatives on the whole interval
These polynomial approximations are called splines.
The most popular spline is the cubic spline, but there are
many others. Matlab has a "spline toolbox" of many types of splines.
We stated above that splines have the advantage of always being smooth,
as compared to using a single polynomial. In fact, we will show that
one kind of spline is actually the "smoothest" possible function that
interpolates the points (xi,f(xi)). Here smooth means having a certain
derivative as small as possible (details later).
For example, consider the example, 1/(1+x^2), for n=20, and the Runge Effect:
[c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);
Splines do not have this problem:
[xout,yout,err]=SplineFit('ft1',-2:4/(n-1):2,1);
shows that the Runge effect is not present. On the other hand, the error
decreases more slowly with n than for polynomial interpolation:
[errN,errH,errS]=CompareInterp('sin','sinH',0,2*pi,(2:2:50),1);
[errN,errH,errS]=CompareInterp('ft1','ft1H',-2,2,(2:4:70),1);
We derive the equations for a cubic spline on the points
(x{i},y{i}), i=0,...,n. Let h{i} = x{i+1}-x{i} be the distance between
adjacent points x{i}. Let S{i}(x) be the unknown cubic polynomial
on the interval (x{i},x{i+1}), i=0,...,n-1. A cubic polynomial has
4 coefficients, so we have 4*n unknown coefficients to compute.
This means we need 4*n conditions to determine them uniquely. Insisting
that each S{i}(x) interpolate its endpoints (x{i},y{i}) and
(x{i+1},y{i+1}) gives 2*n conditions, 2 per polynomial. Insisting
that the first derivatives are continuous at internal points x{1},...,x{n-1}
(*) S{i}'(x{i}) = S{i-1}'(x{i})
gives n-1 more conditions. We get another n-1 conditions from
having continuous second derivatives at the same points:
(**) S{i}''(x{i}) = S{i-1}''(x{i})
This adds up
to 4*n-2 conditions, giving us 2 more to determine. We have some freedom
of choice here. We'll do something simple here, called a "natural spline".
To derive the coefficients of the S{i}(x), we start with the fact
that the second derivative of the overall spline is
continuous, i.e.
S{i}''(x{i}) = S{i-1}''(x{i}) = z{i}, i=1,...,n-1
where we have introduced the unknown quantities z{i}.
Our plan is to express each S{i}(x) explicitly in terms of the
z{i} and z{i+1}, and then show that the z{j}, j=1,...,n-1 satisfy
a system of linear equations that we can solve.
Since S{i} is cubic, its second derivative S{i}''(x) is linear, so the
two conditions S{i}''(x{i}) = z{i} and S{i}''(x{i+1}) = z{i+1}
determine S''{i}(x) uniquely as
S{i}''(x) = z{i}/h{i}*(x{i+1} - x) + z{i+1}/h{i}*(x - x{i})
(this is obviously a straight line interpolating (x{i},z{i}) and
(x{i+1},z{i+1})).
Integrating twice to get S{i}(x) yields
S{i}(x) = z{i}/(6*h{i}) *(x{i+1} - x)^3 +
z{i+1}/(6*h{i})*(x - x{i})^3 +
C(x-x{i}) + D(x{i+1}-x)
where C and D are 2 constants of integration, to be determined.
We use the two interpolation conditions
S{i}(x{i}) = y{i} and S{i}(x{i+1}) = y{i+1}
to determine the two unknowns C and D. We claim that the solution
of these two linear equations in 2 unknowns yields
S{i}(x) = z{i}/(6*h{i}) *(x{i+1} - x)^3 +
z{i+1}/(6*h{i})*(x - x{i})^3 +
(y{i+1}/h{i} - z{i+1}*h{i}/6) * (x-x{i}) +
(y{i}/h{i} - z{i}*h{i}/6 ) * (x{i+1}-x)
Plugging in x=x{i} and x=x{i+1} shows that S{i}(x) is equal
to y{i} and y{i+1} as desired.
It remains to figure out the z{i}s. We have one last condition to use,
continuity of the first derivative (*). Substituting the above expression
into (*) yields
S{i}'(x{i}) = -h{i}/3 * z{i} - h{i}/6 * z{i+1} - y{i}/h{i} + y{i+1}/h{i}
= h{i-1}/6 * z{i-1} + h{i-1}/3 * z{i} - y{i-1}/h{i-1}
+ y{i}/h{i-1}
= S{i-1}'(x{i})
or
(***) h{i-1}*z{i-1} + 2*(h{i} + h{i-1})*z{i} + h{i}*z{i+1}
= 6/h{i} * (y{i+1} - y{i}) - 6/h{i-1} * (y{i} - y{i-1})
for i=1,...,n-1 (the internal points). This gives us n-1 equations in
n+1 unknowns z{0},z{1},...,z{n}. At this point we could pick z{0} and
z{n} to be anything we like, leaving us with n-1 equations and n-1 unknowns:
[ u{1} h{1} ] [ z{1} ] [ v{1} - h{0}*z{0} ]
[ h{1} u{2} h{2} ] [ z{2} ] [ v{2} ]
[ h{2} u{3} h{3} ] [ z{3} ] [ v{3} ]
[ h{3} u{4} h{4} ] * [ z{4} ] = [ v{4} ]
[ .... ] [ ... ] [ ... ]
[ h{n-3} u{n-2} h{n-2} ] [ z{n-2} ] [ v{n-2} ]
[ h{n-2} u{n-1} ] [ z{n-1} ] [ v{n-1} - h{n-1}*z{n}]
where u{i} = 2*(h{i}+h{i-1}) and v{i} = RHS of (***).
We will pick z{0}=z{n}=0, a so-called "natural spline" (good reason given
below). This leaves us with a simple "tridiagonal" system of linear
equations to solve, for which we use Gaussian elimination. If we ignore
all the zeros in the matrix, Gaussian elimination would normally cost
(2/3)n^3 operations, but the special structure of this matrix lets us
solve in O(n) time instead. We will defer discussion of such linear
algebra details until later in the course.
Now we prove the "optimal smoothness" of this spline. We measure smoothness
with the second derivative. Here are two motivations: First, S''(x) measures how fast
the first derivative S'(x) can change, which intuitively measures how fast
the curve can "change directions", or how rough it is. Second, recall that
the "curvature" of a curve y=S(x) is defined by
S''(x)/sqrt(1+(S'(x))^2)^3,
which is bounded above in absolute value by |S''(x)|.
Theorem: Suppose x{0} < x{1} < ... < x{n}, and let S(x) be the natural cubic
spline interpolating f(x) at these points. Then
integral (from x{0} to x{n}) (S''(x))^2 dx <=
integral (from x{0} to x{n}) (f''(x))^2 dx
In other words, the "average" second derivative of S(x) is as small
as possible, among all (sufficiently smooth) functions interpolating the points (x{i},y{i})
Proof: Let g(x) = f(x) - S(x). Note that g(x{i}) = 0. Then
integral (f'')^2 = integral (g''+S'')^2
= integral (g'')^2 + integral (S'')^2 + 2*integral g''*S''
= I1 + I2 + 2*I3
To prove the Theorem, we need to show
integral (f'')^2 >= I2
This will be true if I1 >= 0 and I3 >= 0.
Clearly I1 >= 0 because the integrand is >= 0, so it remains to show
I3 >= 0. In fact, we will show I3 = 0:
I3 = integral (from x{0} to x{n}) g''(x)*S''(x) dx
= sum (from i=0 to n-1) integral (from x{i} to x{i+1}) g''(x)*S''(x)dx
= sum (from i=0 to n-1) integral (from x{i} to x{i+1}) S''(x) dg'(x)
preparation for integration by parts
= sum (from i=0 to n-1) [ g'(x{i+1})*S''(x{i+1}) - g'(x{i})*S''(x{i}) ]
  - sum (from i=0 to n-1) integral (from x{i} to x{i+1}) g'(x) dS''(x)
      integration by parts on each interval
= g'(x{n})*S''(x{n}) - g'(x{0})*S''(x{0})
  - sum (from i=0 to n-1) integral (from x{i} to x{i+1}) c{i}*g'(x) dx
      First sum: telescopes
      Second sum: dS''(x) = S'''(x)dx = c{i}dx, since S'''(x) is
      a constant c{i} on each interval
= 0 - sum (from i=0 to n-1) c{i} * ( g(x{i+1}) - g(x{i}) )
      First two terms are zero because S''(x{n}) = z{n} = 0 and
      S''(x{0}) = z{0} = 0, by our choice of "natural spline"
      Last term: by doing the integration of c{i}*g'(x)
= 0   since g(x{i}) = 0 by construction.
This completes our discussion of polynomial approximation.
(Important topics omitted:
  functions of 2 or more variables, important in graphics and CAD;
  algorithms for "best approximation", i.e. minimizing || f(x)-p(x) ||_inf,
    used in the math library routines for sin(x), exp(x), etc. (the Remez algorithm);
  least squares: wait for linear algebra;
  Fourier series: approximating periodic functions.)