Notes for Math 221, Lecture 3, Sep 3 2009

Goals:
   Finish matrix and vector norms
   Geometric interpretation of condition numbers
   Sensitivity of solving A*x=b

Matrix and vector norms

Recall:
Def (norm): Let B be a linear space R^n (or C^n). It is normed if there is a
function ||.|| : B -> R s.t.
   (1) ||x|| >= 0, and ||x|| = 0 iff x = 0     (positive definiteness)
   (2) ||c*x|| = |c|*||x||                     (homogeneity)
   (3) ||x+y|| <= ||x|| + ||y||                (triangle inequality)
Ex: p-norms, inf-norm, C-norm ||C*x|| where C is nonsingular

Lemma (1.4): all norms are equivalent, i.e. given any norms norm_a(x) and
norm_b(x), there are positive constants alpha and beta such that
   alpha*norm_a(x) <= norm_b(x) <= beta*norm_a(x)
(proof: compactness)
This lemma is an excuse to use the easiest norm to prove later results.

Def (matrix norm): a vector norm on mxn matrices (viewed as vectors), i.e.
   (1) ||A|| >= 0, and ||A|| = 0 iff A = 0     (positive definiteness)
   (2) ||c*A|| = |c|*||A||                     (homogeneity)
   (3) ||A+B|| <= ||A|| + ||B||                (triangle inequality)
Ex: max, Frobenius norms

Def (operator norm): norm(A) = max_{x nonzero} norm(A*x)/norm(x)

Lemma (1.6): the operator norm is a matrix norm
proof: homework Q 1.15

Lemma (1.7): if norm(A) is an operator norm, there exists x such that
norm(x) = 1 and norm(A*x) = norm(A).
proof: norm(A) = max_{x nonzero} norm(A*x)/norm(x)
               = max_{x nonzero} norm(A * x/norm(x))
               = max_{y unit} norm(A*y)
   The y attaining this maximum exists since norm(A*y) is a continuous
   function of y on a compact (closed and bounded) set, the unit ball.

Notation: Q^* = conj(Q^T), sometimes written Q^H

Def: orthogonal matrix: Q square, real, and inv(Q) = Q^T
     unitary matrix:    Q square, complex, and inv(Q) = Q^*
Fact: Q orthogonal <=> Q^T*Q = I <=> all columns are mutually orthogonal
      unit vectors; Q*Q^T = I implies the same about the rows
Fact: norm(Q*x,2) = norm(x,2)  (Pythagorean theorem):
      norm(Q*x,2)^2 = (Q*x)^T*(Q*x) = x^T*Q^T*Q*x = x^T*x = norm(x,2)^2

Lemma (most proofs in homework, Q1.16):
   norm(A*x) <= norm(A)*norm(x)   for a vector norm and its operator norm
   norm(A*B) <= norm(A)*norm(B)   for an operator norm
   norm(Q*A*Z,2) = norm(A,2)      if Q, Z orthogonal (Pythagorean theorem)
   norm(Q,2) = 1
   norm(A,2) = sqrt(lambda_max(A^T*A))
proof of last part only:
   norm(A,2) = max_{x nonzero} norm(A*x,2)/norm(x,2)
             = max_{x nonzero} sqrt( (A*x)^T*(A*x) ) / sqrt( x^T*x )
             = sqrt( max_{x nonzero} (A*x)^T*(A*x) / (x^T*x) )
             = sqrt( max_{x nonzero} x^T*A^T*A*x / (x^T*x) )
   Use the fact that A^T*A is symmetric and so has an eigenvalue decomposition
      A^T*A*q_i = l_i * q_i   where the l_i are real and the q_i are real,
      unit, mutually orthogonal vectors.
   Thus l_i = l_i * q_i^T*q_i = q_i^T*A^T*A*q_i = (A*q_i)^T*(A*q_i) >= 0,
   and A^T*A*Q = Q*Lambda where Q = [q_1,...,q_n], Lambda = diag(l_1,...,l_n),
   so A^T*A = Q*Lambda*Q^T. Continuing from above,
   norm(A,2) = sqrt( max_{x nonzero} x^T*Q*Lambda*Q^T*x / (x^T*x) )
             = sqrt( max_{x nonzero} x^T*Q*Lambda*Q^T*x / (x^T*Q*Q^T*x) )
             = sqrt( max_{y nonzero} y^T*Lambda*y / (y^T*y) )   where y = Q^T*x
             = sqrt( max_{y nonzero} sum_i y_i^2*l_i / sum_i y_i^2 )
            <= sqrt( max_{y nonzero} l_max * sum_i y_i^2 / sum_i y_i^2 )
             = sqrt(l_max),
   attained by choosing y with the component for l_max equal to 1, rest 0.
   Also norm(A^T,2) = norm(A,2).
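A quick numerical check of these norm identities (a minimal numpy sketch added
here, not part of the original notes; the random test matrices are only for
illustration):

# Minimal numpy sketch (not from the notes): check norm(A,2) = sqrt(lambda_max(A^T*A))
# and that orthogonal Q, Z leave the 2-norm unchanged.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# operator 2-norm vs. sqrt of the largest eigenvalue of A^T*A
two_norm = np.linalg.norm(A, 2)
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))
print(two_norm, np.sqrt(lam_max))                  # should agree to roundoff

# random orthogonal Q, Z from QR factorizations
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
x = rng.standard_normal(n)
print(np.linalg.norm(Q @ x), np.linalg.norm(x))            # norm(Q*x,2) = norm(x,2)
print(np.linalg.norm(Q @ A @ Z, 2), np.linalg.norm(A, 2))  # norm(Q*A*Z,2) = norm(A,2)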
Geometric interpretation of condition numbers

Now for a common property of condition numbers, one that uses norms. This is
a geometric property that holds for many (and, in an approximate sense, all)
of the condition numbers we will encounter.

Simple case: evaluating a scalar function f(x):
   |f(x+dx) - f(x)| / |f(x)| ~ |f'(x)*x|/|f(x)| * |dx|/|x|
   rel change of output f(x)  ~ cond(x)         * rel change of input x

Def: A problem x is "ill-posed" if cond(x) = inf.

ASK: for evaluating a scalar function f(x), what is {ill-posed problems}?
ANS: f(x) = 0, simple zero:    try f(x) = x-1     at x=1
     f(x) = 0, multiple zero:  try f(x) = (x-1)^2 at x=1
     f(x) = inf, simple pole:  try f(x) = 1/(x-1) at x=1

Suppose cond(x) is very large; the intuition is that x must then be close to
{ill-posed problems}. How close? Newton's method for a simple root says that
if f(x) is tiny enough, then
   root ~ x - f(x)/f'(x),   so   root - x ~ -f(x)/f'(x)
where f(root) = 0, i.e. root is in {ill-posed}. So
   rel dist to {ill-posed} ~ |(root - x)/x| ~ |-f(x)/(f'(x)*x)| = 1/cond(x)
or
   cond(x) ~ 1 / relative distance to nearest ill-posed problem.

Ex: if f(x) = prod_i (x - r(i)) then
   cond(x) = |f'(x)*x/f(x)| = | sum_i x/(x - r(i)) |,
so if x is very close to one root r(j), then
   cond(x) ~ |x/(x - r(j))| = 1 / relative distance from x to r(j).

This is also true for polynomial evaluation at a fixed x:
   What is the function?  f(coefficients) = value of the polynomial at x
   What is {ill-posed}?   polynomials whose value at x is zero
   From the earlier analysis in class, cond = sum_i |a(i)*x^i| / |sum_i a(i)*x^i|.
   See Thm 1.2 in the text for a proof.

It is also true for matrix inversion:
   What is {ill-posed}? {singular matrices} -- we'll prove this more carefully.

Now start using this material to analyze matrix inversion.

Lemma: If the operator norm norm(X) < 1, then I-X is nonsingular,
   inv(I-X) = sum_{i>=0} X^i,   and   norm(inv(I-X)) <= 1/(1 - norm(X)).
proof:
   Claim: I + X + X^2 + ... converges.
      norm(X^i) <= norm(X)^i -> 0 as i increases, so by equivalence of norms
      max_{kj} |(X^i)_kj| <= C*norm(X)^i, and so the kj-th entry of
      I + X + X^2 + ... is dominated by a convergent geometric series,
      and so converges.
   Claim: (I - X)*(I + X + X^2 + ... + X^i) = I - X^(i+1) -> I as i increases.
   Next,
      norm(I + X + X^2 + ...) <= norm(I) + norm(X) + norm(X^2) + ...  by triangle inequality
                              <= norm(I) + norm(X) + norm(X)^2 + ...  by the Lemma above
                               = 1 + norm(X) + norm(X)^2 + ...        by def of operator norm
                               = 1/(1 - norm(X))                      geometric sum
What does this say for 1x1 matrices?

Lemma: Suppose A is invertible, and write Z = inv(A). Then A-E is invertible
if norm(E) < 1/norm(Z), in which case
   inv(A-E) = Z + Z*(EZ) + Z*(EZ)^2 + Z*(EZ)^3 + ...
and norm(inv(A-E)) <= norm(Z)/(1 - norm(E)*norm(Z)).
proof: inv(A-E) = inv((I - E*Z)*A) = Z*inv(I - E*Z), which exists if
   I - E*Z is invertible, i.e. if norm(E*Z) < 1, which holds if
   norm(E)*norm(Z) < 1; in which case
   inv(A-E) = Z*(I + EZ + (EZ)^2 + ...); take norms.
What does this say for 1x1 matrices?

Finally, we can ask how much inv(A) and inv(A-E) differ.

Lemma: Suppose A is invertible. If norm(E) < 1/norm(Z), then
   norm(inv(A-E) - Z) <= norm(Z)^2*norm(E)/(1 - norm(E)*norm(Z)).
proof: inv(A-E) - Z = Z*(EZ) + Z*(EZ)^2 + ... = Z*E*Z*(I + EZ + (EZ)^2 + ...);
   take norms.

So the relative change in inv(A) is
   norm(inv(A-E) - Z)/norm(Z)
      <= [ norm(A)*norm(Z) * 1/(1 - norm(E)*norm(Z)) ] * [ norm(E)/norm(A) ]
       = [ norm(A)*norm(Z) ] * [ norm(E)/norm(A) ] + O(norm(E)^2)
What does this say for 1x1 matrices?

This justifies the following definition:

Def: the condition number kappa(A) = norm(A)*norm(inv(A)) = norm(A)*norm(Z).

Fact: kappa(A) >= 1.
proof: 1 = norm(I) = norm(A*Z) <= norm(A)*norm(Z)
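A minimal numpy sketch (an illustration added here, not from the notes)
checking the inverse-perturbation lemma above, i.e. that the relative change
in inv(A) is bounded by kappa(A) times the relative change in A, up to the
factor 1/(1 - norm(E)*norm(Z)):

# Minimal numpy sketch (not from the notes): check the perturbation bound for inv(A)
# and the definition kappa(A) = norm(A)*norm(inv(A)).
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
Z = np.linalg.inv(A)
kappa = np.linalg.norm(A, 2) * np.linalg.norm(Z, 2)   # same as np.linalg.cond(A, 2)

E = 1e-6 * rng.standard_normal((n, n))                # small perturbation, norm(E) < 1/norm(Z)
nE, nZ = np.linalg.norm(E, 2), np.linalg.norm(Z, 2)
lhs = np.linalg.norm(np.linalg.inv(A - E) - Z, 2) / nZ
rhs = kappa * (nE / np.linalg.norm(A, 2)) / (1 - nE * nZ)
print(lhs, "<=", rhs, lhs <= rhs)                     # the bound should hold, up to roundoff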
Theorem: min{ norm(E)/norm(A) : A-E singular }
            = relative distance from A to {singular matrices} = 1/kappa(A).
   (Since norm(E) = norm(-E), it does not matter whether we write A-E or A+E.)
proof: The lemma above shows inv(A-E) exists if norm(E) < 1/norm(Z), i.e. if
   norm(E)/norm(A) < 1/kappa(A), so the distance is >= 1/kappa(A).
   We still need to show there exists an E with norm(E) = 1/norm(Z), i.e.
   norm(E)/norm(A) = 1/kappa(A), such that A-E is singular. We'll just do
   the 2-norm. Since A-E = A*(I - Z*E), we need to make I - Z*E singular,
   i.e. choose E so that norm(E) = 1/norm(Z) and norm(Z*E) is as large as
   possible. Try a unit vector x with norm(Z*x) = norm(Z), let
   y = Z*x/norm(Z*x), and set E = x*y^T/norm(Z). Then
      (A-E)*y = A*(I - Z*E)*y = A*(y - Z*x*y^T*y/norm(Z)) = A*(y - y) = 0
   and
      norm(E) = norm(x*y^T)/norm(Z) = norm(x)*norm(y)/norm(Z) = 1/norm(Z).

We've looked at the sensitivity of inv(A); now look at solving A*x = b.

Consider A*x = b vs. (A-E)*x' = b+f, where x' = x+dx. Subtracting,
   A*dx - E*x - E*dx = f
   (A-E)*dx = f + E*x
   dx = inv(A-E)*(f + E*x)
so
   norm(dx) = norm(inv(A-E)*(f + E*x))
           <= norm(inv(A-E)) * (norm(f) + norm(E)*norm(x))
           <= norm(Z)/(1 - norm(E)*norm(Z)) * (norm(f) + norm(E)*norm(x))
and, dividing by norm(x) and using norm(b) <= norm(A)*norm(x),
   norm(dx)/norm(x)
           <= norm(Z)*norm(A) * 1/(1 - norm(E)*norm(Z))
                              * ( norm(f)/(norm(A)*norm(x)) + norm(E)/norm(A) )
           <= norm(Z)*norm(A) * 1/(1 - norm(E)*norm(Z))
                              * ( norm(f)/norm(b) + norm(E)/norm(A) )
i.e.
   relerr in x <= kappa(A) * (something close to 1 unless A is nearly singular)
                           * (rel change in b + rel change in A).

Our algorithms will attempt to guarantee that
   (*) the computed solution of A*x = b satisfies (A-E)*x' = b+f
       where norm(f)/norm(b) = O(macheps) and norm(E)/norm(A) = O(macheps),
so that
   relerr in x ~ O( kappa(A)*macheps ).
Recall that property (*) is called "backward stability".

Another practical approach: given x', how accurate a solution is it?
Compute the residual
   r = A*x' - b = A*x' - A*x = A*(x' - x) = A*error,
so error = inv(A)*r and norm(error) <= norm(inv(A))*norm(r).

norm(r) also determines the backward error in A:
Theorem: The smallest E such that (A+E)*x' = b has norm(E) = norm(r)/norm(x').
proof: r = A*x' - b = -E*x', so norm(r) <= norm(E)*norm(x').
   To attain this lower bound on norm(E), choose E = -r*x'^T/norm(x')^2
   (in the 2-norm).

All our practical error bounds depend on norm(inv(A)). Actually computing
inv(A) costs several times as much as just solving A*x = b, so we will use
cheap estimates of norm(inv(A)) that avoid computing inv(A) explicitly and
that fail (give a large error) only with small probability. In fact there is
a theorem (Demmel, Diament, Malajovich, 2000) that says that estimating
kappa(A) even roughly (but still with some guarantee, say "to within a factor
of 10", or within a factor of any constant) is about as expensive as
multiplying matrices, which in turn is about as expensive as inverting them.
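A minimal numpy sketch (added illustration, not from the notes) of the
backward-error theorem above: the rank-one matrix E = -r*x'^T/norm(x')^2
makes (A+E)*x' = b exactly, and its 2-norm equals norm(r)/norm(x'):

# Minimal numpy sketch (not from the notes): given a computed solution x' of A*x = b,
# form the residual r = A*x' - b and the rank-one backward error E = -r*x'^T/norm(x')^2,
# then check that (A+E)*x' = b and that norm(E,2) = norm(r)/norm(x').
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

xp = np.linalg.solve(A, b)          # computed solution x' (carries some rounding error)
r = A @ xp - b                      # residual
E = -np.outer(r, xp) / np.dot(xp, xp)

print(np.linalg.norm((A + E) @ xp - b))                              # ~ 0: (A+E)*x' = b
print(np.linalg.norm(E, 2), np.linalg.norm(r) / np.linalg.norm(xp))  # the two norms agree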