Math 128a - Spring 2002 - Lecture Notes on Approximation and Interpolation

Read sections 6.1 through 6.4 in the text.

Idea: given a complicated function f(x), replace it by a simpler one that approximates it.

Motivation:

1) Solving f(x) = 0. We have seen that approximating f(x) by a straight line (or linear function in more variables) is the basis of Newton and other good methods.

2) Evaluating f(x). This refers to the library routines all computers have for sin(x), exp(x), log(x), etc. We considered exp(x) in a homework assignment, and saw that it reduced to evaluating exp(y) for 0 <= y <= 1, for which we used a Taylor expansion, i.e. a polynomial. It also refers to graphics, where we want to plot f(x); in this case we may want an algorithm that is both cheap (since we have to evaluate at many points, perhaps in "real time") and smooth (so it looks good). It is also essential in CAD = Computer Aided Design, where you design physical objects and determine their shapes by picking a few values of f(x) and filling f(x) in "smoothly" in between.

3) Integrating f(x) from a to b. If f(x) were a polynomial, or piecewise polynomial, this would be easy, so in general we approximate f(x) by a (piecewise) polynomial and integrate this approximation.

4) Solving ordinary differential equations (ODEs) dx/dt = f(x). If f(x) is a constant f(x) = a, then the ODE is easy to solve explicitly: x(t) = x(t0) + a*(t-t0), where x(t0) is the value of the solution at t0. We again want to exploit this to solve more complicated ODEs.

5) Solving partial differential equations (PDEs). In addition to the same idea as for ODEs, Finite Element Methods (popular in civil and mechanical engineering) use polynomials on little patches called "finite elements" to approximate the PDE.

What functions are we going to use for our approximations? This depends both on what computers can effectively implement, and on the properties of the functions we want to approximate.

First consider what computers can effectively implement. Since computers have +, - and * built in, polynomials are obvious candidates. Division makes rational functions possible, but for some operations (like integration) they usually offer no advantage over polynomials (since they are harder to integrate explicitly), and division is usually several times more expensive than addition or multiplication. Next, the ability to have branches (if-then-else) means that we can have "piecewise" polynomials (or rational functions), i.e. functions that use different polynomials for different intervals of x (see the Matlab sketch below):

   if (a <= x < b) then y = f1(x)
   if (b <= x < c) then y = f2(x)
   ...

We will concentrate on polynomials and piecewise polynomials in this course.

Second, let us briefly consider the properties of the functions we want to evaluate. There is a theorem of Fourier which states, roughly, that every periodic function can be written as an infinite sum of sines and cosines, called a Fourier series. Finite sums of sines and cosines are frequently used to approximate periodic functions, which typically arise in signal processing. The algorithm used for this is called the Fast Fourier Transform, or FFT. Besides sines and cosines, another very recent class of functions used for approximation is wavelets; see Math 118. We will not have time for this in this class.

To make the notation more precise, we want to compute a polynomial (or piecewise polynomial) p(x) that somehow approximates another function f(x) on an interval a <= x <= b.
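To make the piecewise idea above concrete, here is a minimal Matlab sketch of evaluating a piecewise polynomial with if-then-else; the break points and the two polynomials are made up for illustration and are not from the course codes.

   % evaluate a piecewise polynomial at a scalar point x
   % (break points a,b,c and the polynomials f1,f2 are made-up examples)
   a = 0; b = 1; c = 2; x = 1.3;
   if (a <= x) & (x < b)
      y = 1 + 2*x;              % f1(x), used on [a,b)
   elseif (b <= x) & (x < c)
      y = 3 - x + 0.5*x^2;      % f2(x), used on [b,c)
   else
      y = 0;                    % outside [a,c); arbitrary choice
   end
   y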
We need to define more carefully how we measure how well p(x) approximates f(x), because there are several possible definitions, all of which are reasonable and useful in different situations.

First, we present a limitation on how well polynomials can approximate anything. Suppose we were interested in approximating f(x) by a polynomial p(x) for all real x. Write

   p(x) = sum from i=0 to n of c(i)*x^i

When |x| is very large, we can write

   p(x) = c(n)*x^n * [ 1 + sum from i=0 to n-1 of (c(i)/c(n)) * x^(i-n) ]
        = c(n)*x^n * [ 1 + E(x) ]

When |x| is large, E(x) -> 0 like 1/x, so p(x) ~ c(n)*x^n. Thus, all polynomials go to infinity like a power of x as |x| -> infinity. Thus we cannot hope to approximate any function f(x) for all real x that does not go to infinity this way. For example sin(x) remains bounded for all x, so we cannot expect to approximate it by a polynomial for all x. Instead, in all our approximations we can only ask that p(x) approximate f(x) on a finite interval I = [a,b], or perhaps even only at a finite number of points x{0},x{1},...,x{n-1}.

Now we talk about our notions of error and approximation. Here is a summary:

1) We start with the most natural definition of approximation, finding a polynomial p*(x) of a given degree that most closely approximates a continuous function f(x) everywhere in an interval [a,b].

Definition: || g(x) ||_inf = max (a <= x <= b) |g(x)| is called the "infinity norm" or "maximum norm" of g on [a,b].

Definition: The error of approximation of a polynomial p(x) for a function f(x) on [a,b] is || p(x) - f(x) ||_inf.

Definition: The best approximating polynomial p*(x) of degree at most n satisfies

   || p*(x) - f(x) ||_inf = min (all polynomials q of degree at most n) || q(x) - f(x) ||_inf

Note that p*(x) does not have to have degree n, just at most n. For example, if f(x) = x, then p*(x) = x is the best approximation for n >= 1.

We will eventually prove the Weierstrass Approximation Theorem: If f(x) is continuous on [a,b], then || p*(x) - f(x) ||_inf goes to 0 as the degree of p*(x) goes to infinity. Why is this not just a restatement of Taylor's Theorem? What about f(x) = abs(x) on [-1,1]?

It turns out that computing the best approximation p*(x) of a given degree n is very expensive, and so we don't do it very often. (It is used for functions like sin(x) or exp(x) that are in the computer's math library, and only need to be determined once. The name of the algorithm for this is the Remez algorithm, which we will not cover.)

2) The second notion of approximation is very widely used, and easy to compute. We are given n points in the plane

   (x{0},f(x{0})), (x{1},f(x{1})), ... , (x{n-1},f(x{n-1}))

where all the x{i} are distinct, and ask for a polynomial p(x) that "interpolates these points", i.e. satisfies

   p(x{i}) = f(x{i}), for i=0,...,n-1

How can we compute this polynomial? We will see that the degree of p will be at most n-1, and there is a simple O(n^2) algorithm to compute it. The algorithm will be called the "Newton divided-difference formula", or simply "Newton interpolation."

So if we only measure the error |p(x) - f(x)| at the points x{i}, it is zero by construction. But we expect to evaluate p(x) at other points, and want a bound on |p(x)-f(x)| there too. We will derive one, and it will look similar to the error bound for a Taylor expansion:

   f(x) - p(x) = [ f^{(n)}(eta) / n! ] * prod (i=0 to n-1) (x-x{i}),   where a <= eta <= b
But there is one important difference with respect to the Taylor expansion: depending on the choice of the x{i} and of f(x), the error does NOT necessarily get smaller as the degree increases, even if f(x) has a convergent Taylor expansion around every point in [a,b]. In fact, the error may increase once n gets large enough, sometimes with wild oscillations in the error f(x)-p(x). We will look at several examples, derive a general bound on the error || p(x) - f(x) ||_inf, and analyze it.

3) The third notion of approximation is a variation on the last one: We are given n triples

   (x{0},f(x{0}),f'(x{0})), ... , (x{n-1},f(x{n-1}),f'(x{n-1}))

and ask for a polynomial p(x) that matches f(x) and its first derivative f'(x) at x{0},...,x{n-1}:

   p(x{i}) = f(x{i}),  p'(x{i}) = f'(x{i})   for i=0,...,n-1

This is called "Hermite interpolation", and results in simultaneously making p(x) approximate f(x) and p'(x) approximate f'(x). We will again show that there is a simple algorithm for it. Since we are asking p(x) to "better match" f(x), we will need to use a higher degree polynomial than n-1. Indeed, one can ask more generally that

   p(x{i}) = f(x{i}), p'(x{i}) = f'(x{i}), ... , p^(k{i})(x{i}) = f^(k{i})(x{i})

i.e. that p(x) matches the first k{i} derivatives of f(x) at x{i}, for i=0 to n-1. One special case of this is n=1, in which case we are asking for the Taylor expansion. Thus we see that the Taylor expansion is one special case of a large variety of interpolating polynomials that match the value of a function and some of its derivatives at one or more points. All the algorithms and error bounds for approximations of type 2) generalize to this case, though we will not have time to cover this.

4) The fourth notion uses piecewise polynomials. The data is again

   (x{0},f(x{0})), (x{1},f(x{1})), ... , (x{n-1},f(x{n-1}))

where n is fairly large. We want to use a (different) low degree polynomial on each interval (x{i},x{i+1}). The low degree will keep the evaluation cheap, and prevent the wild oscillations interpolation polynomials might exhibit (examples later). The simplest low degree polynomial is a constant, i.e.

   p(x) = f(x{i}) for x{i} <= x < x{i+1}   (picture)

This is not even continuous. The next best polynomial is a linear one, gotten by "connecting the dots" (x{i},f(x{i})) by straight lines. (picture) This is continuous, but its first derivative is clearly not continuous at the x{i}. If we use higher degree polynomials in each interval (x{i},x{i+1}), then we can make the function smoother, by making its derivatives continuous at the x{i}. Such piecewise polynomials are called splines. The most popular spline is the cubic spline, which has continuous second derivatives, but there are many others. They are widely used in graphics and Computer Aided Design. Matlab has a "spline toolbox" of many types of splines.

Now we go back and do the details.

1) We start with perhaps the most natural definition of approximation. We recall that the error of approximation of any polynomial p is

   || p(x) - f(x) ||_inf = max (a <= x <= b) |p(x) - f(x)|

This is perhaps the most natural definition of error: the largest error at any point in the interval [a,b].
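In practice this infinity norm is estimated by sampling |p(x) - f(x)| on a fine grid of points in [a,b], which is essentially how the error plots later in these notes are produced. Here is a minimal Matlab sketch; the particular f and p below are placeholders chosen just so the snippet runs.

   % estimate || p - f ||_inf on [a,b] by sampling on a fine grid
   % (f = sin and the degree-4 polynomial p below are just placeholders)
   a = 0; b = 2*pi;
   xs = linspace(a, b, 5);
   c  = polyfit(xs, sin(xs), 4);            % coefficients of some polynomial p
   xx = linspace(a, b, 10000);              % fine grid of sample points
   err_inf = max(abs(polyval(c,xx) - sin(xx)))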
The goal is to compute a polynomial p*(x) of degree at most n, such that

   || p*(x) - f(x) ||_inf = min (polynomials q of degree <= n) || q(x) - f(x) ||_inf

Here is the justification for limiting the degree to be at most n:

Weierstrass Approximation Theorem: If f(x) is any continuous function on [a,b], then the error || p*(x) - f(x) ||_inf decreases to 0 as we increase n.

In other words, any continuous function on [a,b] can be approximated arbitrarily closely by a polynomial. There is a proof of this result in the book; we will do a different one here. This means that if we want to decrease the error, we can just use a higher degree polynomial. But the higher the degree, the more expensive it is to find and evaluate p*. It turns out to be complicated and expensive to compute p*, requiring a Newton-like iterative algorithm (called the Remez algorithm), so this kind of approximation is usually only used for computing the approximations used for sin(x), exp(x), etc. in the computer's math library, since they need only be computed once and for all.

Proof of the Weierstrass Approximation Theorem: We claim that

Lemma 1: We may assume without loss of generality that [a,b] = [0,1], f(x) = 0 for x outside [0,1], and f(0) = f(1) = 0.
Proof: Homework!

Define the polynomial Q_n(x) = c_n*(1-x^2)^n, where c_n > 0 is a positive constant chosen so that

   integral from -1 to 1 of Q_n(x) dx = 1   for all n

(Show plot of Q_n(x) for large n.)

Lemma 2: c_n < sqrt(n)
Proof: Homework!

Now set

   p_n(x) = integral from -1 to 1 of f(x+t)*Q_n(t) dt

We claim
1) p_n(x) is a polynomial
2) || p_n(x) - f(x) ||_inf goes to 0 as n -> infinity

To prove 1), change the integral to

   integral from -x to 1-x of f(x+t)*Q_n(t) dt     since f(z) is zero outside [0,1]
   = integral from 0 to 1 of f(z)*Q_n(z-x) dz      by changing the variable of integration to z = x+t

Now Q_n(z-x) = c_n*(1-(z-x)^2)^n is a polynomial in z and x; write it as Q_n(z-x) = sum (i=0 to 2n) q_i(z)*x^i, where each q_i(z) is a polynomial in z. Substitute into the integral to get

   integral from 0 to 1 of f(z) * sum q_i(z)*x^i dz = sum [ integral from 0 to 1 of f(z)*q_i(z) dz ] * x^i

which is obviously a polynomial in x, where the integrals are the coefficients.

To prove 2) we need

Lemma 3: There is a constant M such that |f(x)| <= M for all 0 <= x <= 1. For every epsilon > 0, there is a 0 < delta < 1 such that |x-y| < delta implies |f(x)-f(y)| < epsilon.
Proof of Lemma 3: This follows from continuity of f(x) on the closed bounded interval [0,1]: f(x) is bounded and uniformly continuous.

Now choose any eta > 0. We want to show that when n is large enough, the error | p_n(x) - f(x) | is at most eta. Let epsilon = eta/2, choose delta according to Lemma 3, and write

   | p_n(x) - f(x) |
   =  | integral from -1 to 1 of f(x+t)*Q_n(t) dt - f(x) |
   =  | integral from -1 to 1 of [f(x+t)-f(x)]*Q_n(t) dt |     since integral from -1 to 1 of Q_n(t) dt = 1
   <= integral from -1 to 1 of |[f(x+t)-f(x)]*Q_n(t)| dt       since taking absolute values can only make the integral bigger
   =  integral from -1 to 1 of |f(x+t)-f(x)|*Q_n(t) dt         since Q_n(t) >= 0
   =  integral from -1 to -delta (same integrand)
      + integral from -delta to +delta (same integrand)
      + integral from +delta to 1 (same integrand)
   =  I1 + I2 + I3   (the three integrals in the last line)

We will show that if n is large enough then I2 <= eta/2, and I1 and I3 <= eta/4, so their sum is <= eta as desired.

First consider I2.
The integrand is bounded by (eta/2)*Q_n(t) by Lemma 3, so

   I2 <= integral from -delta to delta of (eta/2)*Q_n(t) dt
      <= integral from -1 to 1 of (eta/2)*Q_n(t) dt        since delta < 1 and Q_n(t) is nonnegative
      =  eta/2                                             since integral from -1 to 1 of Q_n(t) dt = 1

Now consider I1. The integrand is bounded by

   |f(x+t)-f(x)|*Q_n(t) <= 2*M*Q_n(t)                  since |f(x+t)-f(x)| <= |f(x+t)|+|f(x)| <= 2*M
                        <= 2*M*c_n*(1-delta^2)^n       since Q_n(t) is largest when t is closest to 0
                        <= 2*M*sqrt(n)*(1-delta^2)^n   by Lemma 2

Thus

   I1 <= integral from -1 to -delta of 2*M*sqrt(n)*(1-delta^2)^n dt = (1-delta)*2*M*sqrt(n)*(1-delta^2)^n

As n -> infinity, this upper bound approaches 0. To see why, take its logarithm, which is

   const + .5*log n + n*log(1-delta^2)

Since n is much larger than log n when n is large, this expression is dominated by n*log(1-delta^2) for large n, and since log(1-delta^2) < log(1) = 0, n*log(1-delta^2) goes to -infinity. So we can take n large enough so that I1 <= eta/4. The same upper bound as for I1 works for I3, and shows that I3 <= eta/4 for n large enough.

This completes the proof of the Weierstrass approximation theorem, except for Lemmas 1 and 2, which you'll prove on homework.

There is a more powerful version of this theorem, called the Stone-Weierstrass Theorem, which we will talk about but not prove. In addition to polynomials on intervals like [a,b], we often need to approximate functions of more variables, or functions with properties like being periodic. Generally, one is given a closed bounded subset X of R^n (like X=[a,b] when n=1), and a set of "basis functions" g0(x)=1, g1(x), g2(x), ..., gm(x), and the question is whether one can approximate any continuous function f(x) as closely as desired by using "combinations" of these basis functions. By combination we mean any polynomial in the gi(x), like 3*g0(x) + 2*g1(x) - 5*g2(x)*g3(x), etc. For example, Weierstrass deals with the case m=1 and g1(x)=x, i.e. standard polynomials. But you might also let X = unit circle (where the angle theta is the coordinate), where continuous functions f(theta) mean periodic functions with period 2*pi, and let m=2 with g1(theta) = cos(theta) and g2(theta) = sin(theta) as the basis functions.

There turns out to be a very simple criterion for the set {g1(x),...,gm(x)} of basis functions to satisfy that guarantees that any continuous function can be approximated by a polynomial in them. Suppose that there were two distinct points y and z in X where gi(y) = gi(z) for all i. Then any polynomial in the gi(x) will also obviously have the same value at y and z. This means that we clearly could not approximate any continuous function that had different values at y and z. So a trivial necessary condition to be able to approximate arbitrary continuous functions is that for any distinct pair y and z, some gi satisfies gi(y) != gi(z). Amazingly, this necessary condition is also sufficient:

Thm (Stone-Weierstrass). Let X be a closed bounded subset of R^n, and {g0(x)=1, g1(x), ..., gm(x)} a set of continuous functions with the property that for any distinct y and z in X, there is a gi such that gi(y) != gi(z). Then any continuous function f(x) on X can be approximated arbitrarily closely by a polynomial in the gi(x).

Example: Let X = [a,b], m=1 and g1(x)=x. Since y != z means g1(y) != g1(z), we get that any continuous function on [a,b] can be approximated arbitrarily closely by a polynomial in x. This is the version of Weierstrass's Thm that we proved.

Example: Let X = any closed bounded subset of R^n. Let m=n and gi(x) = x{i}, the i-th coordinate of x.
Since y != z means they differ in at least one coordinate, Stone-Weierstrass applies, and we get that we can approximate any continuous function on X with polynomials in the coordinates x{1},...,x{n}.

Example: Let X = unit circle in R^2, with coordinate theta, and g1(theta) = cos(theta) and g2(theta) = sin(theta). Clearly y != z means that their coordinates on the circle (g1,g2) differ. This means that all continuous 2*pi-periodic functions can be approximated arbitrarily closely by polynomials in sin(theta) and cos(theta). Since any power or product of sin(theta) and cos(theta) of total degree m can be written (using trig identities) as a linear combination of 1, sin(theta), sin(2*theta), ..., sin(m*theta), cos(theta), cos(2*theta), ..., cos(m*theta), this means that any continuous 2*pi-periodic function can be arbitrarily well approximated by a linear combination of these functions. This is called a Fourier expansion.

Exercise: Modify our proof of the Weierstrass Approximation Thm for polynomials to work for Fourier expansions. Hint: replace 1-x^2 by cos(pi*x/2).

This completes our discussion of approximation in the sense of minimizing || p(x) - f(x) ||_inf.

2) Recall that for the second notion of approximation we are given n points in the plane

   (x{0},f(x{0})), (x{1},f(x{1})), ... , (x{n-1},f(x{n-1}))

where all the x{i} are distinct, and ask for a polynomial p(x) that "interpolates these points", i.e. satisfies

   p(x{i}) = f(x{i}), for i=0,...,n-1

We will see how to compute this polynomial by an algorithm called the "Newton divided-difference formula", and analyze the error || p(x) - f(x) ||_inf. So if we only measure the error |p(x) - f(x)| at the points x{i}, it is zero by construction. But we expect to evaluate p(x) at other points, and want a bound on |p(x)-f(x)| there too. We will derive one, and it will look similar to the error bound for a Taylor expansion. But there is one important difference: depending on the choice of x{i} and f(x), the error does NOT necessarily get smaller as the degree increases. In fact, it often increases once n gets large enough. Let's look at some examples. (NewtonInterp and the other routines used below are course codes; a rough stand-in using Matlab's built-in polyfit and polyval is sketched below.)

Example: f(x) = x^3 - 3*x^2 + x - 2, x = -1 : 4/(n-1) : 3, n = 2,3,4,5,10

   [c,err,NewtonTable]=NewtonInterp('ft',-1:4/(n-1):3,1); err

Note: after n = 4 the result is the same up to roundoff, because 4 points determine a cubic uniquely.

Example: f(x) = sin(x), x = 0 : 2*pi/(n-1) : 2*pi, n = 2,3,4,5,6,8,10,15,20,25,50,60,70

   [c,err,NewtonTable]=NewtonInterp('sin',0:2*pi/(n-1):2*pi,1); err

Note how the error decreases as n increases. We can also plot the error versus the number of interpolation points:

   for n=2:70, ...
      [c,errN(n),NewtonTable] = NewtonInterp('sin',0:2*pi/(n-1):2*pi,0); ...
   end, semilogy(errN)

Note that the error decreases for a while (up to n ~ 20) and then increases. The red curve shows the roundoff error in evaluating the polynomial, so we see that the error is due in some places to roundoff, but not in all (see the curve for n=70).

Example: Let us try plotting the n=20 interpolating polynomial on a wider interval:

   n=20; x = 0:2*pi/(n-1):2*pi;
   [c,err,NewtonTable]=NewtonInterp('sin',x,0); err
   [y,ebnd]=NewtonEval(c,x,-2*pi:.001:4*pi,1);

Note that outside the interval of approximation [0,2*pi], it does not approximate sin(x) very well.

Example: Let us try different sample points: n = 30, 40

   nn=0:n;x=sort(cos((2*nn+1)*pi/(2*n+2)));x=(x+1)*pi; plot(x)
   [c,err,NewtonTable]=NewtonInterp('sin',x,1)

So we see that choosing the interpolation points can be important, because it can make the error much smaller (when n=30, 1e-14 vs 1e-11). We will justify this choice of points later.
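The experiments above rely on the course routine NewtonInterp, which is not reproduced in these notes. As a rough stand-in, the same qualitative behavior (error decreasing with n and then growing again) can be seen with Matlab's built-in polyfit and polyval; note that polyfit works with the Vandermonde matrix (as in Simple Algorithm 1 below) rather than the Newton form, so its roundoff behavior differs in detail, and it may warn that the problem is badly conditioned for large n.

   % rough stand-in for the sin(x) experiment above, using polyfit/polyval
   % instead of the course routine NewtonInterp
   xx = linspace(0, 2*pi, 2000);              % fine grid for measuring the error
   for n = 2:70
      x = 0 : 2*pi/(n-1) : 2*pi;              % n equally spaced interpolation points
      c = polyfit(x, sin(x), n-1);            % interpolating polynomial of degree n-1
      errN(n) = max(abs(polyval(c,xx) - sin(xx)));
   end
   semilogy(errN)                             % error versus number of points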
Example: f(x) = 1/(1+x^2), x = -2:4/(n-1):2 (n equally spaced points between -2 and 2), n = 5,10,15,20,30

   [c,err,NewtonTable]=NewtonInterp('ft1',-2:4/(n-1):2,1);err

The error shrinks in the middle of the interval [-2,2] but grows near the borders. This lack of convergence is called the "Runge effect" after its discoverer.

Example: Contrast with Hermite interpolation and splines:

   n=10, [c,err,N]=NewtonInterp('sin',0:2*pi/(n-1):2*pi,1); err
   n=10, [c,err,N]=HermiteInterp('sinH',0:2*pi/(n-1):2*pi,1); err
   n=10, [xout,yout,err]=SplineFit('sin',0:2*pi/(n-1):2*pi,1); err
   N = (2:50)';[errN,errH,errS]=CompareInterp('sin','sinH',0,2*pi,N,1);

Conclusions: Hermite is most accurate, but its error eventually starts increasing; Newton is next, and its error also eventually starts increasing; the spline is least accurate, but its error always goes down, and the result is smooth.

   n=10, [c,err,N]=NewtonInterp('ft1',-2:4/(n-1):2,1); err
   n=10, [c,err,N]=HermiteInterp('ft1H',-2:4/(n-1):2,1); err
   n=10, [xout,yout,err]=SplineFit('ft1',-2:4/(n-1):2,1); err
   N = (5:5:75)';[errN,errH,errS]=CompareInterp('ft1','ft1H',-2,2,N,1);

Now the spline is most accurate for n >= 10.

Now we talk about algorithms for computing an interpolation polynomial, and compute bounds on the error || p(x) - f(x) ||_inf.

Simple Algorithm 1 (not recommended for use in practice!)

Suppose n=4, so we seek c0,c1,c2,c3 such that

   c0 + c1*x{i} + c2*x{i}^2 + c3*x{i}^3 = f(x{i})   for i=0,1,2,3

This is a system of 4 linear equations in c0,c1,c2,c3:

   [ 1 x0 x0^2 x0^3 ]   [ c0 ]   [ f(x0) ]
   [ 1 x1 x1^2 x1^3 ] * [ c1 ] = [ f(x1) ]
   [ 1 x2 x2^2 x2^3 ]   [ c2 ]   [ f(x2) ]
   [ 1 x3 x3^2 x3^3 ]   [ c3 ]   [ f(x3) ]

The coefficient matrix has a very special structure, and is called a Vandermonde matrix. We could imagine solving this system for c0,c1,c2,c3 using Gaussian elimination. For general n, the matrix has (j,k)-th entry x{j}^(k-1), and the right-hand-side vector has j-th entry f(x{j}). Solving using Gaussian elimination is straightforward, but turns out to be much more expensive (O(n^3) vs O(n^2)) and less accurate than our ultimate algorithm below. It is also not immediately clear why this matrix is nonsingular, which is a necessary and sufficient condition to get a unique solution for all f(x). This will follow from other expressions for the ci below.

Simple Algorithm 2 (not recommended for use in practice!): Lagrange Interpolation

Again consider n=4. Consider the 4 special polynomials of degree 3

   L0(x) =        (x-x1)*(x-x2)*(x-x3) / [        (x0-x1)*(x0-x2)*(x0-x3) ]
   L1(x) = (x-x0)*       (x-x2)*(x-x3) / [ (x1-x0)*       (x1-x2)*(x1-x3) ]
   L2(x) = (x-x0)*(x-x1)*       (x-x3) / [ (x2-x0)*(x2-x1)*       (x2-x3) ]
   L3(x) = (x-x0)*(x-x1)*(x-x2)        / [ (x3-x0)*(x3-x1)*(x3-x2)        ]

Note that L0(x) = 0 when x=x1,x2,x3, and L0(x0)=1. Similarly Li(x) = 0 when x = xj, j != i, and Li(xi)=1.

Example:
   Lagrange([1 2 3 4],1)
   Lagrange([1 1.5 2 4],1)
   Lagrange([1 1.1 1.2 4],1) - note the huge polynomial values near x=3, which mean that we might get a large error there

Thus

   p(x) = sum (j=0 to 3) f(x{j})*L{j}(x)

has two properties:
1) It is a polynomial of degree at most 3
2) p(x{i}) = sum (j=0 to 3) f(x{j})*L{j}(x{i}) = f(x{i})*L{i}(x{i}) = f(x{i}), since all other terms vanish

so it is the interpolating polynomial we want. (We will prove its uniqueness later.)
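As an illustration (not the course routine Lagrange used above), here is a minimal Matlab sketch that evaluates the Lagrange form p(t) = sum_j f(x{j})*L{j}(t) directly from the definition, for any number of distinct points.

   % evaluate the Lagrange form p(t) = sum_j f(x_j)*L_j(t) at one point t
   x = [1 2 3 4];  fx = sin(x);        % sample data; any distinct points will do
   t = 2.5;                            % evaluation point
   p = 0;
   for j = 1:length(x)
      Lj = 1;
      for k = 1:length(x)
         if k ~= j
            Lj = Lj * (t - x(k)) / (x(j) - x(k));   % build up L_j(t)
         end
      end
      p = p + fx(j)*Lj;
   end
   p                                   % value of the interpolating polynomial at t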
Written in matrix form as in Algorithm 1, we are again asking for coefficients c0,c1,c2,c3 such that

   [ L0(x0) L1(x0) L2(x0) L3(x0) ]   [ c0 ]   [ f(x0) ]
   [ L0(x1) L1(x1) L2(x1) L3(x1) ] * [ c1 ] = [ f(x1) ]
   [ L0(x2) L1(x2) L2(x2) L3(x2) ]   [ c2 ]   [ f(x2) ]
   [ L0(x3) L1(x3) L2(x3) L3(x3) ]   [ c3 ]   [ f(x3) ]

But now we have defined the polynomials Li(x) so that the coefficient matrix is the "identity matrix" (ones on the diagonal, zeros off the diagonal), so that ci = f(x{i}). In the general case,

   L{i}(x) = prod (j = 0 to n-1, except i) [ (x-x{j}) / (x{i}-x{j}) ]

which is 0 at x = x{j} (j != i) and is 1 at x = x{i}.

Algorithm 3: Newton Divided Difference Formula

We pick a "basis set" of polynomials:
   1, x, x^2, ...                 in Algorithm 1
   L0(x), L1(x), L2(x), ...       in Algorithm 2
   p0(x), p1(x), p2(x), ...       here
and solve a linear system of equations for coefficients c0,c1,... so that p(x) = sum (i=0 to n-1) ci*pi(x) satisfies p(x{i}) = f(x{i}).

Our preferred algorithm uses the basis set

   p{0}(x) = 1
   p{1}(x) = (x-x{0})
   p{2}(x) = (x-x{0})*(x-x{1})
   p{3}(x) = (x-x{0})*(x-x{1})*(x-x{2})
   ...
   p{i}(x) = prod (j=0 to i-1) (x-x{j})

and expresses p(x) = sum (i=0 to n-1) c{i}*p{i}(x). Note that p{i}(x{k}) = 0 if i > k, since p{i}(x) then has a factor (x-x{k}). Thus the linear system

   [ p0(x0) p1(x0) p2(x0) p3(x0) ]   [ c0 ]   [ f(x0) ]
   [ p0(x1) p1(x1) p2(x1) p3(x1) ] * [ c1 ] = [ f(x1) ]
   [ p0(x2) p1(x2) p2(x2) p3(x2) ]   [ c2 ]   [ f(x2) ]
   [ p0(x3) p1(x3) p2(x3) p3(x3) ]   [ c3 ]   [ f(x3) ]

has zeros above the main diagonal, i.e. it is "lower triangular":

   [ p0(x0)   0      0      0    ]   [ c0 ]   [ f(x0) ]
   [ p0(x1) p1(x1)   0      0    ] * [ c1 ] = [ f(x1) ]
   [ p0(x2) p1(x2) p2(x2)   0    ]   [ c2 ]   [ f(x2) ]
   [ p0(x3) p1(x3) p2(x3) p3(x3) ]   [ c3 ]   [ f(x3) ]

This lets us solve for

   c0 = f(x0)/p0(x0)
   c1 = [ f(x1) - p0(x1)*c0 ] / p1(x1)
   c2 = ...

by "substitution".

Here is a simpler and more flexible approach. First note that c{i} only depends on x{0},...,x{i} and f(x{0}),...,f(x{i}), because of the form of the triangular linear system. Here are the first two values of c:

   c{0} = f(x{0})
   c{1} = [ f(x{1}) - f(x{0}) ] / [ x{1} - x{0} ]

This justifies the following

Definition: c{i} = f[x{0},x{1},...,x{i}] is called the i-th divided difference of f at x{0},x{1},...,x{i}.

A look at c{1} explains the term "divided difference". This means that we can write, for example, that the polynomial that interpolates f(x) at x=x0,x1,x2 can be written

   p(x) = f[x0] + f[x0,x1]*(x-x0) + f[x0,x1,x2]*(x-x0)*(x-x1)

To make the notation clear, we can also express the polynomial that interpolates f(x) at x=x2,x3,x4 as

   p(x) = f[x2] + f[x2,x3]*(x-x2) + f[x2,x3,x4]*(x-x2)*(x-x3)

The next theorem about divided differences justifies our simple algorithm for computing them:

Theorem: f[z{0},z{1},...,z{n}] = [ f[z{1},...,z{n}] - f[z{0},...,z{n-1}] ] / [ z{n} - z{0} ]

Lemma: The interpolating polynomial of degree at most n at z{0},...,z{n} is unique, assuming the z{i} are distinct.

Proof of Lemma: Assume there are two polynomials p(x) and q(x) of degree at most n that interpolate f(x) at the distinct points z{0},...,z{n}. We must show that r(x) = p(x) - q(x) = 0. First, note that r(z{0}) = r(z{1}) = ... = r(z{n}) = 0, so r(x) has at least n+1 distinct zeros. But since it is a polynomial of degree at most n, it can have at most n distinct zeros unless it is identically zero. Thus r(x) = 0, i.e. p(x) = q(x).
Proof of Theorem: Let

   p_n(x)     be the polynomial that interpolates f at z{0},...,z{n}
   p_{n-1}(x) be the polynomial that interpolates f at z{0},...,z{n-1}
   q(x)       be the polynomial that interpolates f at z{1},...,z{n}

Thus p_n(x) has degree at most n, and p_{n-1}(x) and q(x) have degree at most n-1. We claim that

   p_n(x) = (x-z{0})/(z{n}-z{0})*q(x) + (z{n}-x)/(z{n}-z{0})*p_{n-1}(x)

To see why, note that

1) The RHS (right hand side) is a polynomial of degree at most n
2) At x=z{i}, i = 1,...,n-1,
      RHS = (z{i}-z{0})/(z{n}-z{0})*f(z{i}) + (z{n}-z{i})/(z{n}-z{0})*f(z{i})   since q(z{i}) = p_{n-1}(z{i}) = f(z{i})
          = (z{n}-z{0})/(z{n}-z{0})*f(z{i}) = f(z{i})
3) At x=z{0}, RHS = 0 + (z{n}-z{0})/(z{n}-z{0})*f(z{0}) = f(z{0})
4) At x=z{n}, RHS = (z{n}-z{0})/(z{n}-z{0})*f(z{n}) + 0 = f(z{n})

Thus the RHS is an interpolating polynomial, and by the Lemma, must equal p_n(x). In particular, the coefficient of x^n is the same in both polynomials. The coefficient of x^n in p_n(x) is by definition f[z{0},...,z{n}]. The coefficient of x^n in the RHS is

   1/(z{n}-z{0})*(coeff of x^{n-1} of q(x)) - 1/(z{n}-z{0})*(coeff of x^{n-1} of p_{n-1}(x))
   = [ f[z{1},...,z{n}] - f[z{0},...,z{n-1}] ] / (z{n}-z{0})

as desired.

This theorem immediately justifies the following algorithm. We use an array c(j,k) to store c(j,k) = f[x{j},x{j+1},x{j+2},...,x{j+k}].

Divided-Difference Algorithm:

   for j = 0 to n, c(j,0) = f(x{j}), end
   for k = 1 to n
      for j = 0 to n-k
         c(j,k) = [ c(j+1,k-1) - c(j,k-1) ] / [ x{j+k} - x{j} ]
      end
   end

Then f[x{0},...,x{i}] = c(0,i); these values form the first row of the table. The algorithm has the attractive feature of giving a table of c(j,k)'s that defines many interpolating polynomials, namely those interpolating f() at x{j} through x{j+k}, for any 0 <= j <= j+k <= n (use c(j,0) through c(j,k) instead of c(0,0) through c(0,n)), as well as error bounds as described below.

Lemma 1: Let p(t) be the interpolation polynomial for f(t) at x{0},...,x{n}. Then we can write the error as

   f(t) - p(t) = f[x{0},...,x{n},t] * prod {j=0 to n} (t-x{j})

Proof: If t = x{j}, both sides are zero, so assume t does not equal any x{j}. Let q(x) be the polynomial that interpolates f(x) at x{0},...,x{n},t, where t is a new point. Then by the properties of Newton interpolation and divided differences

   q(x) = p(x) + f[x{0},...,x{n},t] * prod {j=0 to n} (x-x{j})

Substitute x=t and q(t)=f(t) to get

   f(t) = p(t) + f[x{0},...,x{n},t] * prod {j=0 to n} (t-x{j})

Lemma 2: f[x{0},...,x{n}] = f^{(n)}(z) / n! for some z in [min(x{i}),max(x{i})].

Proof: Let p(x) be the polynomial interpolating f(x) at x{0},...,x{n}. We know that p(x) has degree n with highest term f[x{0},...,x{n}]*x^n. Now consider g(x) = f(x) - p(x). This function has zeros at n+1 points, x{0},...,x{n} (which we assume for simplicity are sorted x{0} < x{1} < ... < x{n}). By Rolle's Theorem, g'(x) has n distinct zeros, one in each of the intervals (x{0},x{1}), (x{1},x{2}), ... , (x{n-1},x{n}). Applying Rolle's Theorem again, g''(x) has at least n-1 distinct zeros, g'''(x) has at least n-2 distinct zeros, and finally g^{(n)}(x) has at least one zero, call it z. Thus f^{(n)}(z) = p^{(n)}(z) = n! * f[x{0},...,x{n}], since all the lower powers of p(x) disappear when differentiated n times. Dividing by n! gives the result.

Polynomial Approximation Error Theorem: Let p(x) interpolate f(x) at x{0},...,x{n}. Then the error is given by

   f(x) - p(x) = f^{(n+1)}(z)/(n+1)! * prod {j=0 to n} (x-x{j})

where z is in the range [min(x{i},x),max(x{i},x)].

Proof: By Lemma 1,

   f(t) - p(t) = f[x{0},...,x{n},t] * prod {j=0 to n} (t-x{j})

By Lemma 2,

   f[x{0},...,x{n},t] = f^{(n+1)}(z) / (n+1)!
for some z in the range from the smallest value among the x{j} and t to the largest. Finally, substitute x for t.

Let us use this theorem to understand the error in approximating f(x) by a polynomial p(x). Let us carefully list the stages of the overall procedure:

1) Given x{0},...,x{n}, compute the f(x{i})
2) Compute the divided differences using the Divided-Difference Algorithm
3) Evaluate the polynomial p(x)

This careful listing shows that there are actually 4 sources of error in approximating f(x) by p(x):

1) roundoff in evaluating f(x{i}) - this is O(1e-16) for our examples and so we will ignore it
2) roundoff in running the divided difference algorithm - we will see that this can be important for large n, and is shown in green in the plots below
3) roundoff in evaluating the polynomial p(x) - this can also be important for large n, and is shown in red in the plots below
4) approximation error from the Polynomial Approximation Error Theorem - this is usually the dominant error for small n, and sometimes for all n

Let us consider the 4th source of error; how do we estimate it? Note that from our Theorem we can write this source of error as

4a)   f^(n+1)(z)/(n+1)! * prod (j=0 to n) (x-x{j})

so if we happen to know a bound on f^(n+1)(z), we can multiply it by the rest of the expression ( 1/(n+1)! * prod (j=0 to n) (x-x{j}) ), which is shown in magenta in the plots below. Also note from Lemma 1 that we can write this source of error as

4b)   f[x{0},...,x{n},x] * prod {j=0 to n} (x-x{j})

and we can simply compute f[x{0},...,x{n},x] for several randomly chosen x and take the largest (10 in this example); this means computing just one more column of the divided difference table. This error estimate is shown in black in the plots. Also, the plots below show the true error in blue.

Example: Let's consider the error in interpolating f(x)=sin(x) by a polynomial on n equally spaced points in [0,2*pi]. Note that all derivatives of f(x) are either +-sin(x) or +-cos(x), and so all bounded by 1. This means that we expect our error factor 4a) (magenta) to closely match the error.

   n=5, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

This routine does the interpolation and plots several error measures (blue: actual error |f(x)-p(x)|). Error factor 4a) (magenta) and estimate 4b) (black) closely match the true error, with the roundoff many orders of magnitude smaller.

   n=6; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

This is similar, but shows that the true error is "accidentally zero" at x=0, so we overestimate the true error.

   n=11; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

Again, error factor 4a) and estimate 4b) agree closely with the true error.

   n=17; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

Now error estimate 4b) is 1 to 2 orders of magnitude too large, whereas error factor 4a) is accurate. This is because of roundoff in computing the divided difference table. Note that in exact arithmetic the (n+2)-nd column should be f^(n)/n! <= 1/n!. But ef(1) = 1.87e-14 is the largest number in the (n+2)-nd column, whereas 1/n! is ef(3)=2.8e-15. Our estimate of the roundoff error in ef(1) is ef(2)=1.5e-13, 55x larger. (Note: numbers depend on computer and version of Matlab.)

   n=19; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

Now the roundoff in evaluating p(x) dominates for x near pi. Otherwise error factor 4a) is accurate.

   n=25; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef

Roundoff in p(x) (red) dominates except for x near 0 or 2*pi.
Near 0 and 2*pi, roundoff in computing the divided differences (green) dominates. Without this roundoff, the magenta line shows that we'd expect a much smaller error.

   n=51; [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
   axis([0 2*pi 1e-70 1])

Roundoff (red, green) dominates again. Without it, we'd expect 50 decimal places of accuracy (magenta).

Example: Let's consider the error in interpolating f(x)=1/(1+x^2) by a polynomial on n equally spaced points in [-2,2]. In contrast to sin(x), the n-th derivative of f(x) can get quite large, around n!.

   n=6; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef

Now factor 4a) (in magenta) is too small, and estimate 4b) (in black) is accurate. Roundoff is unimportant.

   n=10; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef

Same as with n=6.

   n=20; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);ef

More of the same.

How to choose the x{i}: notice in these plots how the magenta curve, which is proportional to prod (i=0 to n) (x-x{i}) and really gives the basic shape of the error, is larger at the end points than in the middle. This leads us to ask if there is a better choice of x{i} that makes the error smaller overall.

Example: Let's try the following set of n+1 points:

   xmin = 0; xmax = 2*pi;
   nn=0:n;c=sort(cos((2*nn+1)*pi/(2*n+2)));x=(c+1)*(xmax-xmin)/2 + xmin;

Note that nn consists of the integers from 0 to n; (2*nn+1)*pi/(2*n+2) consists of equally spaced numbers from pi/(2*n+2) to pi*(1 - 1/(2*n+2)); c consists of the cosines of these numbers; and x consists of c shifted and scaled to lie between xmin and xmax.

Let's compare interpolating sin(x) at equally spaced points to these points: n=5,10,15,20, cheb - just plot prod {i=0 to n} (x-x{i}) for the x{i} equally spaced, and for the x{i} the above set of points.

   n=17, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',0:2*pi/(n-1):2*pi,1);ef
   n=17, [c,err,N,NE,ef]=NewtonInterpEbnd('sin',x,1);ef

Two observations:
1) The actual error is about 1e-13 or less, vs 1e-9.
2) The error is more nearly constant for all x, rather than small in the middle. The error factor prod {i=0 to n} (x-x{i}) has the same peaks, around 1e-12, all across the x axis.

In fact, we will show that this choice of x{i} is "best possible", in the sense of making the value of

   max (xmin <= x <= xmax) | prod {i=0 to n} (x-x{i}) |

as small as possible.

Example:

   n=20; [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);
   nn=0:n;c=sort(cos((2*nn+1)*pi/(2*n+2)));x=2*c;
   [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',x,1);

Chebyshev Polynomials: Define the sequence of polynomials

   T{0}(x) = 1
   T{1}(x) = x
   T{j}(x) = 2*x*T{j-1}(x) - T{j-2}(x)

(e.g. T{2}(x) = 2*x*x - 1 = 2*x^2 - 1, T{3}(x) = 2*x*(2*x^2-1) - x = 4*x^3 - 3*x, etc.)

Theorem: Chebyshev polynomials have the following properties:
1) T{j}(x) = cos(j*arccos x) when |x| <= 1
2) The roots of T{j+1}(x) are z{i} = cos((2*i+1)*pi/(2*j+2)) for i=0,...,j
3) |T{j}(x)| <= 1 if |x| <= 1
4) T{j+1}(x) = 2^j*x^(j+1) + lower order terms
5) T*{j+1}(x) = 2^(-j)*T{j+1}(x) is a monic polynomial and satisfies |T*{j+1}(x)| <= 2^(-j) if |x| <= 1
6) If p(x) is any monic polynomial of degree j+1, then max {|x| <= 1} |p(x)| >= 2^(-j).
7) Let [xmin,xmax] be any interval, and let x{i} = (z{i}+1)*(xmax-xmin)/2 + xmin for i=0,...,j. Then of all possible choices of interpolation points in [xmin,xmax], these minimize the largest value of the factor || prod {i=0 to j} (x-x{i}) ||_inf in the interpolation error. (When [xmin,xmax] = [-1,1], so that x{i} = z{i}, this minimum value is 2^(-j).)

Proof:

1) If |x| <= 1, write x = cos t, so we are trying to prove T{j}(cos t) = cos(j*t). We will use induction and trig identities. It is clearly true for j=0 and j=1. Assume it is true for j-1, j-2, ..., 0. Then

   T{j}(cos t) = 2*cos(t)*cos((j-1)*t) - cos((j-2)*t)
               = 2*cos(t)*[cos(j*t)*cos(t)+sin(j*t)*sin(t)] - [cos(j*t)*cos(2*t)+sin(j*t)*sin(2*t)]
               = 2*cos(t)^2*cos(j*t) + 2*sin(j*t)*cos(t)*sin(t)
                 - [cos(j*t)*cos(t)^2 - cos(j*t)*sin(t)^2 + sin(j*t)*2*sin(t)*cos(t)]
               = cos(j*t)*(cos(t)^2+sin(t)^2)
               = cos(j*t)

as desired.

2) cos((j+1)*t) = 0 when (j+1)*t = pi*(i + 1/2), or t = (2*i+1)*pi/(2*j+2), or x = cos((2*i+1)*pi/(2*j+2)).

3) |cos(j*t)| <= 1.

4) By the recurrence, the highest order term of T{j}(x) is 2*x times the highest order term of T{j-1}(x).

5) Follows from 3) and 4).

6) We do a proof by contradiction: Assume there is a monic degree j+1 polynomial p(x) such that |p(x)| < 2^(-j) for |x| <= 1. Let v{i} = cos(i*pi/(j+1)), i=0,...,j+1, so that T*{j+1}(v{i}) = cos(i*pi)/2^j = (-1)^i/2^j. Then r(x) = T*{j+1}(x) - p(x) has the following properties:
  1) r(v{i}) is nonzero with sign (-1)^i, the same as T*, since |T*{j+1}(v{i})| = 1/2^j > |p(v{i})|, for i=0,...,j+1
  2) Since r(x) changes sign j+1 times (once in each interval [v{i},v{i+1}], i = 0,...,j), it has at least j+1 zeros
  3) r(x) has degree at most j, since the leading terms of T* and p cancel
  4) By 2) and 3), r(x) must be identically zero, contradicting 1)

7) Homework!

3) The third notion of approximation by a polynomial is a variation on the previous one: We are given n triples

   (x{1},f(x{1}),f'(x{1})), (x{2},f(x{2}),f'(x{2})), ... , (x{n},f(x{n}),f'(x{n}))

and ask for a polynomial p(x) that matches f(x) and its first derivative f'(x) at x{1},...,x{n}:

   p(x{i}) = f(x{i}),  p'(x{i}) = f'(x{i})   for i=1,...,n

This is called "Hermite interpolation", and results in simultaneously making p(x) approximate f(x) and p'(x) approximate f'(x). We will again show that there is a simple algorithm for it. Since we are asking p(x) to "better match" f(x), we will need to use a higher degree polynomial than n-1. Indeed, one can ask more generally that

   p(x{i}) = f(x{i}), p'(x{i}) = f'(x{i}), ... , p^(k{i})(x{i}) = f^(k{i})(x{i})

i.e. that p(x) matches the first k{i} derivatives of f(x) at x{i}. One special case of this is n=1, in which case we are asking for the Taylor expansion. Thus we see that the Taylor expansion is one special case of a large variety of interpolating polynomials that match the value of a function and some of its derivatives at one or more points. All of the algorithms and error bounds generalize to this case, though we will not cover it.

This completes our discussion of interpolating function values by a single polynomial.

Next topic: Interpolating function values using "piecewise polynomials", i.e. a different polynomial on each interval (x{i},x{i+1}) - called "splines". Each such polynomial will be low degree, which keeps it cheap to evaluate and prevents the wild oscillations we saw high degree Newton polynomials exhibit. The simplest low degree polynomial is a constant, i.e.

   p(x) = f(x{i}) for x{i} <= x < x{i+1}

This is not even continuous.
The next best polynomial is a linear one, gotten by "connecting the dots" (x{i},f(x{i})) by straight lines:

   p(x) = f(x{i}) + (x-x{i})/(x{i+1}-x{i}) * (f(x{i+1})-f(x{i}))

This is continuous, but its first derivative is clearly not continuous at the x{i}. We can continue with piecewise quadratics, and it turns out that the extra degree of freedom we get is enough to make the first derivative continuous at the x{i}, but not the second derivative. Generally, suppose we use polynomials of degree k on each interval. Then it turns out we can pick the polynomials to

1) interpolate the data (x{i},f(x{i}))
2) have k-1 continuous derivatives on the whole interval

These piecewise polynomial approximations are called splines. The most popular spline is the cubic spline, but there are many others. Matlab has a "spline toolbox" of many types of splines.

We stated above that splines have the advantage of always being smooth, as compared to using a single polynomial. In fact, we will show that one kind of spline is actually the "smoothest" possible function that interpolates the points (x{i},f(x{i})). Here smooth means having a certain derivative as small as possible (details later). For example, consider f(x) = 1/(1+x^2) for n=20, and the Runge effect:

   [c,err,N,NE,ef]=NewtonInterpEbnd('ft1',-2:4/(n-1):2,1);

Splines do not have this problem:

   [xout,yout,err]=SplineFit('ft1',-2:4/(n-1):2,1);

shows that the Runge effect is not present. On the other hand, the error decreases more slowly with n than for polynomial interpolation:

   [errN,errH,errS]=CompareInterp('sin','sinH',0,2*pi,(2:2:50),1);
   [errN,errH,errS]=CompareInterp('ft1','ft1H',-2,2,(2:4:70),1);

We derive the equations for a cubic spline on the points (x{i},y{i}), i=0,...,n. Let h{i} = x{i+1}-x{i} be the distance between adjacent points x{i}. Let S{i}(x) be the unknown cubic polynomial on the interval (x{i},x{i+1}), i=0,...,n-1. A cubic polynomial has 4 coefficients, so we have 4*n unknown coefficients to compute. This means we need 4*n conditions to determine them uniquely.

Insisting that each S{i}(x) interpolate its endpoints (x{i},y{i}) and (x{i+1},y{i+1}) gives 2*n conditions, 2 per polynomial. Insisting that the first derivatives are continuous at the internal points x{1},...,x{n-1},

   (*)   S{i}'(x{i}) = S{i-1}'(x{i})

gives n-1 more conditions. We get another n-1 conditions from having continuous second derivatives at the same points:

   (**)  S{i}''(x{i}) = S{i-1}''(x{i})

This adds up to 4*n-2 conditions, leaving 2 more conditions to choose. We have some freedom of choice here; we'll do something simple, called a "natural spline".

To derive the coefficients of the S{i}(x), we start with the fact that the second derivative of the overall spline is continuous, i.e.

   S{i}''(x{i}) = S{i-1}''(x{i}) = z{i},   i=1,...,n-1

where we have introduced the unknown quantities z{i}. Our plan is to express each S{i}(x) explicitly in terms of z{i} and z{i+1}, and then show that the z{j}, j=1,...,n-1, satisfy a system of linear equations that we can solve. Since S{i} is cubic, its second derivative S{i}''(x) is linear, so the two conditions S{i}''(x{i}) = z{i} and S{i}''(x{i+1}) = z{i+1} determine S{i}''(x) uniquely as

   S{i}''(x) = z{i}/h{i}*(x{i+1} - x) + z{i+1}/h{i}*(x - x{i})

(this is obviously the straight line interpolating (x{i},z{i}) and (x{i+1},z{i+1})). Integrating twice to get S{i}(x) yields

   S{i}(x) = z{i}/(6*h{i})*(x{i+1} - x)^3 + z{i+1}/(6*h{i})*(x - x{i})^3 + C*(x-x{i}) + D*(x{i+1}-x)

where C and D are 2 constants of integration, to be determined.
We use the two interpolation conditions S{i}(x{i}) = y{i} and S{i}(x{i+1}) = y{i+1} to determine the two unknowns C and D. We claim that the solution of these two linear equations in 2 unknowns yields

   S{i}(x) = z{i}/(6*h{i})*(x{i+1} - x)^3 + z{i+1}/(6*h{i})*(x - x{i})^3
             + (y{i+1}/h{i} - z{i+1}*h{i}/6) * (x-x{i})
             + (y{i}/h{i}   - z{i}*h{i}/6  ) * (x{i+1}-x)

Plugging in x=x{i} and x=x{i+1} shows that S{i}(x) is equal to y{i} and y{i+1} as desired.

It remains to figure out the z{i}. We have one last condition to use, continuity of the first derivative (*). Substituting the above expression into (*) yields

   S{i}'(x{i})   = -h{i}/3 * z{i} - h{i}/6 * z{i+1} - y{i}/h{i} + y{i+1}/h{i}
                 =  h{i-1}/6 * z{i-1} + h{i-1}/3 * z{i} - y{i-1}/h{i-1} + y{i}/h{i-1} = S{i-1}'(x{i})

or

   (***)  h{i-1}*z{i-1} + 2*(h{i} + h{i-1})*z{i} + h{i}*z{i+1} = 6/h{i} * (y{i+1} - y{i}) - 6/h{i-1} * (y{i} - y{i-1})

for i=1,...,n-1 (the internal points). This gives us n-1 equations in the n+1 unknowns z{0},z{1},...,z{n}. At this point we could pick z{0} and z{n} to be anything we like, leaving us with n-1 equations and n-1 unknowns:

   [ u{1}  h{1}                        ]   [ z{1}   ]   [ v{1} - h{0}*z{0}     ]
   [ h{1}  u{2}  h{2}                  ]   [ z{2}   ]   [ v{2}                 ]
   [       h{2}  u{3}  h{3}            ]   [ z{3}   ]   [ v{3}                 ]
   [             h{3}  u{4}  h{4}      ] * [ z{4}   ] = [ v{4}                 ]
   [                   ....            ]   [ ...    ]   [ ...                  ]
   [        h{n-3}  u{n-2}  h{n-2}     ]   [ z{n-2} ]   [ v{n-2}               ]
   [                h{n-2}  u{n-1}     ]   [ z{n-1} ]   [ v{n-1} - h{n-1}*z{n} ]

where u{i} = 2*(h{i}+h{i-1}) and v{i} = RHS of (***). We will pick z{0} = z{n} = 0, a so-called "natural spline" (a good reason is given below). This leaves us with a simple "tridiagonal" system of linear equations to solve, for which we use Gaussian elimination. If we ignored all the zeros in the matrix, Gaussian elimination would normally cost about (2/3)*n^3 operations, but the special structure of this matrix lets us solve the system in O(n) time instead. We will defer discussion of such linear algebra details until later in the course.

Now we prove the "optimal smoothness" of this spline. We measure smoothness with the second derivative. Here are two motivations: First, S''(x) measures how fast the first derivative S'(x) can change, which intuitively measures how fast the curve can "change directions", or how rough it is. Second, recall that the "curvature" of a curve y=S(x) is defined by S''(x)/sqrt(1+(S'(x))^2)^3, which is bounded above in magnitude by |S''(x)|.

Theorem: Suppose x{0} < x{1} < ... < x{n}, and let S(x) be the natural cubic spline interpolating f(x) at these points. Then

   integral (from x{0} to x{n}) (S''(x))^2 dx <= integral (from x{0} to x{n}) (f''(x))^2 dx

In other words, the "average" second derivative of S(x) is as small as possible among all functions interpolating the points (x{i},y{i}).

Proof: Let g(x) = f(x) - S(x). Note that g(x{i}) = 0. Then

   integral (f'')^2 = integral (g''+S'')^2 = integral (g'')^2 + integral (S'')^2 + 2*integral g''*S'' = I1 + I2 + 2*I3

To prove the Theorem, we need to show integral (f'')^2 >= I2. This will be true if I1 >= 0 and I3 >= 0. Clearly I1 >= 0 because the integrand is >= 0, so it remains to show I3 >= 0.
In fact, we will show I3 = 0:

   I3 = integral (from x{0} to x{n}) g''(x)*S''(x) dx
      = sum (from i=0 to n-1) integral (from x{i} to x{i+1}) g''(x)*S''(x) dx
      = sum (from i=0 to n-1) integral (from x{i} to x{i+1}) S''(x) dg'(x)
           ... preparation for integration by parts
      = sum (from i=0 to n-1) [ g'(x{i+1})*S''(x{i+1}) - g'(x{i})*S''(x{i}) ]
        - sum (from i=0 to n-1) integral (from x{i} to x{i+1}) g'(x) dS''(x)
           ... integration by parts
      = g'(x{n})*S''(x{n}) - g'(x{0})*S''(x{0})
        - sum (from i=0 to n-1) integral (from x{i} to x{i+1}) c{i}*g'(x) dx
           ... first sum: telescopes; second sum: dS''(x) = S'''(x)dx = c{i}dx, since S'''(x) is a constant c{i} on each interval
      = 0 - sum (from i=0 to n-1) c{i} * ( g(x{i+1}) - g(x{i}) )
           ... first two terms: because S''(x{n}) = z{n} = 0 and S''(x{0}) = z{0} = 0, by our choice of "natural spline"; last term: by doing the integration
      = 0
           ... since g(x{i}) = 0 by construction

This completes our discussion of polynomial approximation. Important topics omitted:
 - functions of 2 or more variables, important in graphics and CAD
 - algorithms for "best approximation", i.e. minimizing || f(x)-p(x) ||_infty, used in algorithms for sin(x), exp(x), etc. (the Remez algorithm)
 - least squares: wait for linear algebra
 - Fourier series: approximating periodic functions
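Finally, since the course routine SplineFit used above is not reproduced in these notes, here is a rough Matlab sketch of the natural cubic spline construction derived above: it builds and solves the tridiagonal system (***) for the interior z{i} (with z{0}=z{n}=0) and then evaluates each cubic piece S{i}(x). It is written for clarity, not efficiency (it uses backslash on the full matrix rather than the O(n) tridiagonal solve mentioned above), and the sample data is just an illustration.

   % natural cubic spline: solve (***) for the interior z's, then evaluate S{i}(x)
   x = linspace(-2, 2, 21);  y = 1./(1+x.^2);     % example data (the Runge function)
   n = length(x) - 1;                             % n intervals, points x(1..n+1)
   h = diff(x);                                   % h(i) = x(i+1) - x(i)
   A = zeros(n-1);  v = zeros(n-1,1);             % system for the unknowns z(2..n)
   for j = 2:n
      i = j-1;                                    % row index in A and v
      A(i,i) = 2*(h(j-1)+h(j));
      if i > 1,   A(i,i-1) = h(j-1); end
      if i < n-1, A(i,i+1) = h(j);   end
      v(i) = 6/h(j)*(y(j+1)-y(j)) - 6/h(j-1)*(y(j)-y(j-1));
   end
   z = [0; A\v; 0];                               % natural spline: z{0} = z{n} = 0
   % evaluate the spline on a fine grid using the formula for S{i}(x)
   xx = linspace(x(1), x(end), 1000);  ss = zeros(size(xx));
   for k = 1:length(xx)
      i = max(find(x <= xx(k)));  i = min(i, n);  % interval containing xx(k)
      t = xx(k);
      ss(k) = z(i)/(6*h(i))*(x(i+1)-t)^3 + z(i+1)/(6*h(i))*(t-x(i))^3 ...
            + (y(i+1)/h(i) - z(i+1)*h(i)/6)*(t-x(i)) ...
            + (y(i)/h(i)   - z(i)*h(i)/6)*(x(i+1)-t);
   end
   plot(x, y, 'o', xx, ss, '-')                   % data points and the spline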