Notes for Ma221 Lecture 18, Oct 29 2009

Continue Chapter 5 on symmetric eigenvalue problems and the SVD

Goals: 
   Perturbation Theory (review from Prof Kahan's guest lecture)
   Algorithms for the symmetric eigenproblem
   Algorithms for the SVD
   Connection to differential equations

Review
   We consider only real, symmetric A = A^T. The case of complex Hermitian A = A^* is similar.
   Then we know that we can write A = Q*Lambda*Q^T where Lambda = diag(lambda_1, ..., lambda_n)
   with real eigenvalues, and Q = [q_1,...,q_n] consists of real orthonormal eigenvectors.
   We'll assume lambda_1 >= ... >= lambda_n.

   Note that the complex symmetric case is totally different: A = [[1 , i];[i , -1]] has
   2 eigenvalues at 0, but is not diagonalizable.

   Most of what we say will also be true for the SVD of an arbitrary matrix: Recall that
   the SVD of A is simply related to the eigendecomposition of the symmetric matrix
   B = [ 0   A ], with eigenvalues of B = +- singular values of A (plus some zeros
       [ A^T 0 ]
   if A is rectangular). So a small change from A to A+E is also a small symmetric change to B.
   So if we show this doesn't change the eigenvalues of B very much (in fact by no
   more than norm(E)), then this means the singular values of A can change just as little.
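
   This relation is easy to check numerically. Here is a minimal sketch in
   Python/numpy (the random test matrix and seed are just for illustration):

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((4, 3))              # arbitrary rectangular A
        m, n = A.shape
        B = np.block([[np.zeros((m, m)), A],         # B = [ 0   A ]
                      [A.T, np.zeros((n, n))]])      #     [ A^T 0 ]
        evals = np.linalg.eigvalsh(B)                # ascending order
        sigma = np.linalg.svd(A, compute_uv=False)
        expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(abs(m - n))]))
        assert np.allclose(evals, expected)          # evals of B = +-sigma_i, plus zeros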

   Def: Rayleigh quotient rho(u,A) = u^T*A*u/u^T*u.
   Properties: if A*q = lambda*q, then rho(q,A) = lambda
                if u = sum_i beta_i*q_i = Q*b where b = [beta_1,...,beta_n]^T, then 
                  rho(u,A) = (Q*b)^T*A*(Q*b)/((Q*b)^T*(Q*b))
                           = b^T * Q^T*A*Q * b / (b^T * Q^T*Q * b)
                           = b^T * Lambda * b / (b^T * b)
                           = sum_i lambda_i*beta_i^2 / sum_i beta_i^2
                           = convex combination of {lambda_1,...,lambda_n}
                so lambda_1 >= rho(u,A) >= lambda_n for all u neq 0
               and lambda_1 = max_u rho(u,A)
                   lambda_n = min_u rho(u,A)
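
    A quick numerical sanity check of these properties (a sketch, assuming numpy;
    note numpy's eigh returns eigenvalues in ascending order, the opposite of our
    convention):

        import numpy as np

        def rho(u, A):
            return u @ A @ u / (u @ u)               # Rayleigh quotient

        rng = np.random.default_rng(0)
        X = rng.standard_normal((5, 5))
        A = (X + X.T) / 2                            # random symmetric A
        lam, Q = np.linalg.eigh(A)                   # lam[0] <= ... <= lam[-1]
        assert np.isclose(rho(Q[:, -1], A), lam[-1]) # rho(q,A) = lambda for eigenvectors
        for _ in range(1000):                        # lambda_n <= rho(u,A) <= lambda_1
            u = rng.standard_normal(5)
            assert lam[0] - 1e-12 <= rho(u, A) <= lam[-1] + 1e-12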

    In fact all the eigenvalues can be written using the Rayleigh quotient:
    Thm (Courant-Fischer minimax thm): Let R^j denote a j-dimensional subspace of 
        n-dimensional space, and S^(n-j+1) denote an n-j+1 dimensional subspace. Then
           max_{R^j}  min_{0 neq r in R^j} rho(r,A) 
         = lambda_j
          = min_{S^(n-j+1)}  max_{0 neq s in S^(n-j+1)} rho(s,A) 
    The max over R^j is attained when R^j = span(q_1,...,q_j), 
        and the min over r is attained for r = q_j.
     The min over S^(n-j+1) is attained for S^(n-j+1) = span(q_j,q_(j+1),...,q_n)
         and the max over s is attained for s = q_j too.

    Proof:
       For any pair R^j and S^(n-j+1), since their dimensions add up to n+1, they must intersect 
       in some nonzero vector x_RS. So
             min_(0 neq r in R^j) rho(r,A) <= rho(x_RS,A) <= max_(0 neq s in S^(n-j+1)) rho(s,A)
       Let R' maximize min_(0 neq r in R^j) rho(r,A) and
       Let S' minimize max_(0 neq s in S^(n-j+1)) rho(s,A) so

            max_{R^j}  min_{0 neq r in R^j} rho(r,A) 
          =            min_{0 neq r in R'^j} rho(r,A) 
          <= rho(x_R'S',A) 
         <=                  max_{0 neq s in S'^(n-j+1)} rho(s,A) 
          = min_{S^(n-j+1)}  max_{0 neq s in S^(n-j+1)} rho(s,A) 

        But if we choose R^j = span(q_1,...,q_j), then any 0 neq r in R^j has
        rho(r,A) = a convex combination of lambda_1,...,lambda_j >= lambda_j, so
               min_{0 neq r in R^j} rho(r,A) = rho(q_j,A) = lambda_j 
        and if we choose S^(n-j+1) = span(q_j,...,q_n), then similarly
        rho(s,A) <= lambda_j for any 0 neq s in S^(n-j+1), so
               max_{0 neq s in S^(n-j+1)} rho(s,A) = rho(q_j,A) = lambda_j
       so the above inequalities are bounded below and above by lambda_j,
       and so must all equal lambda_j.
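
     The attaining subspaces can also be checked numerically, at least by random
     sampling over span(q_1,...,q_j); a sketch (numpy, test matrix for illustration):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((6, 6))
        A = (X + X.T) / 2
        lam, Q = np.linalg.eigh(A)
        lam, Q = lam[::-1], Q[:, ::-1]               # reorder so lambda_1 >= ... >= lambda_n
        rho = lambda u: u @ A @ u / (u @ u)
        j = 3
        Rj = Q[:, :j]                                # R^j = span(q_1,...,q_j)
        vals = [rho(Rj @ rng.standard_normal(j)) for _ in range(2000)]
        assert min(vals) >= lam[j - 1] - 1e-12       # rho >= lambda_j on all of R^j ...
        assert np.isclose(rho(Q[:, j - 1]), lam[j - 1])   # ... with equality at r = q_j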

   Weyl's Theorem:  if A is symmetric, with eigenvalues lambda_1 >= ... >= lambda_n
              and if A+E is symmetric, with eigenvalues mu_1     >= ... >= mu_n
        then | lambda_j - mu_j | <= ||E||_2 for all j

   Corollary: If A general, with singular values sigma_1 >= ... >= sigma_n, and
                         A+E has singular values tau_1   >= ... >= tau_n, then
                |sigma_j - tau_j| <= ||E||_2 for all j

   Proof of Weyl: mu_j = min_{S^(n-j+1)} max_{0 neq s in S^(n-j+1)} rho(s,A+E)
                       = min_{S^(n-j+1)} max_{0 neq s in S^(n-j+1)} s^T*(A+E)*s/(s^T*s)
                       = min_{S^(n-j+1)} max_{0 neq s in S^(n-j+1)} [ s^T*A*s/(s^T*s) + s^T*E*s/(s^T*s) ]
                      <= min_{S^(n-j+1)} max_{0 neq s in S^(n-j+1)} [ s^T*A*s/(s^T*s) + ||E||_2 ]
                       = lambda_j + ||E||_2
         Swapping the roles of A and A+E (i.e. perturbing A+E by -E), we get
         lambda_j <= mu_j + ||E||_2; together these give |lambda_j - mu_j| <= ||E||_2.
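
   Weyl's theorem is easy to test numerically; a minimal sketch (numpy; since both
   eigenvalue lists come back sorted, the j-th entries pair up correctly):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((6, 6)); A = (X + X.T) / 2
        F = rng.standard_normal((6, 6)); E = 1e-3 * (F + F.T)   # small symmetric E
        lam = np.linalg.eigvalsh(A)
        mu = np.linalg.eigvalsh(A + E)
        assert np.all(np.abs(lam - mu) <= np.linalg.norm(E, 2) + 1e-12)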

   Def: Inertia(A) = (#negative eigenvalues(A), #zero eigenvalues(A), #positive eigenvalues(A))

   Sylvester's Theorem: If A is symmetric and X nonsingular, then Inertia(A) = Inertia(X^T*A*X)

   Fact: Suppose we do A = L*D*L^T (Gaussian elimination without pivoting). Then A and D have
     the same Inertia, namely (#D(i,i)<0, #D(i,i)=0, #D(i,i)>0), which is easy to compute.
      Now suppose we do A - x*I = L*D*L^T; then 
          #D(i,i)<0 = #eigenvalues of A-x*I < 0 = #eigenvalues of A < x
      Similarly suppose we do A - y*I = L'*D'*L'^T for some y > x; counting
          #D'(i,i)<=0 similarly gives #eigenvalues of A <= y. Then
           #eigenvalues in [x,y] = (#evals <= y) - (#evals < x)
      so we can count the # eigenvalues in any interval [x,y]. 
      Now suppose we count # evals < (x+y)/2; we get the number of evals 
        in each half of the interval. By repeatedly bisecting the interval, we 
        can compute any subset of the eigenvalues as accurately as we like.
        We'll say more on how to do this cheaply later; a sketch follows below.
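
      Here is a sketch of the resulting bisection algorithm in Python/numpy.
      It is illustrative only: unpivoted symmetric elimination can break down
      when a leading minor of A - x*I is exactly singular, which production
      codes avoid by perturbing the shift x slightly.

        import numpy as np

        def neg_count(A, x):
            # #eigenvalues of symmetric A that are < x, by factoring
            # A - x*I = L*D*L^T (no pivoting) and counting D(i,i) < 0
            T = A - x * np.eye(len(A))
            count = 0
            for i in range(len(T)):
                d = T[i, i]
                if d < 0:
                    count += 1
                if d != 0.0:                  # one step of symmetric elimination
                    T[i+1:, i+1:] -= np.outer(T[i+1:, i], T[i+1:, i]) / d
            return count

        def kth_smallest_eig(A, k, tol=1e-10):
            # k-th smallest eigenvalue (k = 1,...,n; note the notes number
            # eigenvalues in descending order) by bisection on neg_count
            r = np.linalg.norm(A, np.inf)     # all eigenvalues lie in [-r, r]
            lo, hi = -r, r
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if neg_count(A, mid) >= k:    # at least k evals below mid
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)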

     Proof of Sylvester's Theorem: 
        It suffices to match the counts of negative eigenvalues: applying the same
        argument to -A matches the positive counts, and then the zero counts match too.
        Suppose #evals of A < 0 is m, and #evals of X^T*A*X < 0 is m', with m' < m,
        to get a contradiction (if m < m', swap the roles of A and X^T*A*X, using
        A = inv(X)^T * (X^T*A*X) * inv(X)). Let N = the space spanned by the
        eigenvectors of A with negative eigenvalues, so dimension(N) = m, and
        0 neq x in N means x^T*A*x < 0. Let P = the space spanned by the eigenvectors
        of X^T*A*X with nonnegative eigenvalues, so dimension(P) = n-m', and x in P
        means x^T*(X^T*A*X)*x >= 0, or y in X*P means y^T*A*y >= 0.
        But dimension(X*P) = dimension(P) = n-m' since X is nonsingular, and
        dimension(N)+dimension(X*P) = m+n-m' > n,
        so they intersect, i.e. there is some nonzero x in both spaces N and X*P, i.e.
        x^T*A*x < 0 and x^T*A*x >= 0, a contradiction.
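
     A quick numerical check of Sylvester's theorem (a sketch; a random X is
     nonsingular with probability 1):

        import numpy as np

        def inertia(S, tol=1e-10):
            lam = np.linalg.eigvalsh(S)       # (#neg, #zero, #pos) eigenvalues
            return (int((lam < -tol).sum()),
                    int((np.abs(lam) <= tol).sum()),
                    int((lam > tol).sum()))

        rng = np.random.default_rng(0)
        Y = rng.standard_normal((5, 5)); A = (Y + Y.T) / 2
        X = rng.standard_normal((5, 5))       # nonsingular with probability 1
        assert inertia(A) == inertia(X.T @ A @ X)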

    Now a little about eigenvectors: 
    Thm: Let A = Q*Lambda*Q^T be the eigendecomposition of A, with Lambda = diag(lambda_i),
         and Q = [q_1,...,q_n], and A+E = Q' * Lambda' * Q'^T be the eigendecomposition of A+E,
         with Lambda' = diag(lambda'_i), and Q' = [q'_1,...,q'_n].
         Let theta_i = angle between q_i and q'_i. We want to bound theta_i.
          Let gap(i,A) = min_{j neq i} |lambda_j - lambda_i|. Then
             |.5*sin(2*theta_i)| <= ||E||_2 / gap(i,A)
         Note that when theta_i is small, .5*sin(2*theta_i) ~ theta_i
         In other words, if the perturbation ||E||_2 is small compared to the distance gap(i,A)
         to the nearest other eigenvalue, the change in direction of the eigenvector will be small.

    Proof of a weaker result: write eigenvector i of A+E as q_i + d where d is perpendicular to q_i
        (so q'_i = (q_i+d)/|| q_i+d ||_2 ). Then (A+E)*(q_i+d) = (lambda'_i)*(q_i+d)
         Ignoring the second-order term E*d we get A*q_i + E*q_i + A*d = lambda'_i*q_i + lambda'_i*d
         or (A + E - lambda'_i*I)*q_i = (lambda'_i*I - A)*d
         or ((lambda_i - lambda'_i)*I + E)*q_i = (lambda'_i*I - A)*d  ... using A*q_i = lambda_i*q_i
         write d = sum_{j neq i} d_j*q_j yielding
            ((lambda_i - lambda'_i)*I + E)*q_i = sum_{j neq i} d_j*(lambda'_i - lambda_j)*q_j
         Taking norms, the LHS is <= |lambda_i - lambda'_i| + ||E||_2 <= 2*||E||_2 by Weyl's Theorem.
         Taking norms, the RHS is >= (gap(i,A) - ||E||_2)*||d||_2, since by Weyl
            |lambda'_i - lambda_j| >= |lambda_i - lambda_j| - ||E||_2 >= gap(i,A) - ||E||_2.
         So ||d||_2 = tan(theta_i) <= 2*||E||_2/(gap(i,A) - ||E||_2).
        See the text for the full proof.
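
     A numerical check of the bound |.5*sin(2*theta_i)| <= ||E||_2/gap(i,A)
     (a sketch; the sign of each computed eigenvector is arbitrary, so we take
     absolute values before computing the angle):

        import numpy as np

        rng = np.random.default_rng(0)
        Y = rng.standard_normal((6, 6)); A = (Y + Y.T) / 2
        F = rng.standard_normal((6, 6)); E = 1e-4 * (F + F.T)
        lam, Q = np.linalg.eigh(A)
        lam2, Q2 = np.linalg.eigh(A + E)      # small E keeps the sorted pairing
        normE = np.linalg.norm(E, 2)
        for i in range(len(lam)):
            gap = np.abs(np.delete(lam, i) - lam[i]).min()
            cos_t = min(abs(Q[:, i] @ Q2[:, i]), 1.0)
            theta = np.arccos(cos_t)          # angle between q_i and q'_i
            assert 0.5 * abs(np.sin(2 * theta)) <= normE / gap + 1e-12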

    Important fact about the Rayleigh Quotients: The Rayleigh Quotient is a best 
    approximation to an eigenvalue in a certain sense:
    Thm: Suppose x is a unit vector and beta a scalar. Then A has an eigenvalue alpha
         such that | alpha - beta | <= || A*x - beta*x ||_2. Given just x, the choice
         beta = rho(x,A) minimizes || A*x - beta*x ||_2. So given any unit vector x,
         there is always an eigenvalue within distance || A*x - rho(x,A)*x ||_2 of rho(x,A),
         and rho(x,A) minimizes this bound.
         Furthermore, let lambda_i be the eigenvalue of A closest to rho(x,A), and
           gap = min_{j neq i} | lambda_j - rho(x,A) |
         be the distance to the next closest eigenvalue. Then
            | lambda_i - rho(x,A) | <= || A*x - rho(x,A)*x ||_2^2 / gap
         i.e. the error in rho(x,A) as an approximate eigenvalue
         is proportional to the norm of the residual A*x - rho(x,A)*x squared.
              
     Proof of part 1: If beta is an eigenvalue of A we are done; otherwise A - beta*I
         is nonsingular, and if x is a unit vector
         1 = ||x||_2 = || inv(A - beta*I)*(A - beta*I)*x ||_2
         so 1 <= || inv(A - beta*I) ||_2 * || A*x - beta*x ||_2
         or  1 <= [ 1/min_i |lambda_i - beta| ] * || A*x - beta*x ||_2
         i.e. min_i |lambda_i - beta| <= || A*x - beta*x ||_2, as claimed.
     To show that beta = rho(x,A) minimizes || A*x - beta*x ||_2,
     write A*x - beta*x = [ A*x - rho(x,A)*x ] + [ rho(x,A)*x - beta*x ]
                        =        y             +      z
      If we show z is orthogonal to y, we are done, since then by Pythagoras
      || A*x - beta*x ||_2^2 = ||y||_2^2 + ||z||_2^2 >= ||y||_2^2, with equality
      exactly when z = 0, i.e. beta = rho(x,A):
            z^T*y = (rho(x,A) - beta) * x^T * ( A*x - rho(x,A)*x)
                  = (rho(x,A) - beta) * ( x^T*A*x - rho(x,A)*x^T*x)
                  = 0   ... by the definition of rho(x,A)
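
     Both claims of part 1 can be checked numerically; a minimal sketch (numpy;
     the random A and x are just for illustration):

        import numpy as np

        rng = np.random.default_rng(0)
        Y = rng.standard_normal((5, 5)); A = (Y + Y.T) / 2
        x = rng.standard_normal(5); x /= np.linalg.norm(x)
        rho_x = x @ A @ x                     # rho(x,A) for a unit vector x
        res = np.linalg.norm(A @ x - rho_x * x)
        lam = np.linalg.eigvalsh(A)
        assert np.abs(lam - rho_x).min() <= res + 1e-12   # eigenvalue within res
        for beta in rho_x + np.linspace(-1, 1, 101):      # rho minimizes the residual
            assert res <= np.linalg.norm(A @ x - beta * x) + 1e-12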

    Partial Proof of Part 2: We do just the special case of a 2x2 diagonal matrix,
     which seems very special, but has all the ingredients of the general case.
     So we assume A = diag(lambda_1,lambda_2), and x = [c;s] where c^2+s^2=1,
      so rho = c^2*lambda_1 + s^2*lambda_2. We'll assume c^2 > s^2, so rho is closer
      to lambda_1 than to lambda_2, i.e. i = 1, so
          |lambda_1 - rho| = |lambda_1 - c^2*lambda_1 - s^2*lambda_2|
                           = s^2*|lambda_1 - lambda_2|
         gap = |lambda_2 - rho| = |lambda_2 - c^2*lambda_1 - s^2*lambda_2|
             = c^2*|lambda_1 - lambda_2|
          r = A*x-rho*x = [ lambda_1*c ] - ( c^2*lambda_1 + s^2*lambda_2 ) * [ c ]
                          [ lambda_2*s ]                                     [ s ]
            = [ c*s^2*(lambda_1 - lambda_2) ]
              [ s*c^2*(lambda_2 - lambda_1) ]
         and so ||r||_2^2 = s^2*c^2*(lambda_1-lambda_2)^2*(s^2+c^2)
                          = s^2*c^2*(lambda_1-lambda_2)^2
         and ||r||_2^2/gap = s^2*|lambda_1 - lambda_2|
                           = |lambda_1 - rho|
         exactly (whereas the theorem only claims an upper bound, in the general case).
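
     And a numerical check of the gap bound from part 2 in the general case
     (not just the 2x2 diagonal one); a sketch:

        import numpy as np

        rng = np.random.default_rng(0)
        Y = rng.standard_normal((5, 5)); A = (Y + Y.T) / 2
        lam = np.linalg.eigvalsh(A)
        x = rng.standard_normal(5); x /= np.linalg.norm(x)
        rho_x = x @ A @ x
        r2 = np.linalg.norm(A @ x - rho_x * x) ** 2   # squared residual norm
        i = np.argmin(np.abs(lam - rho_x))            # eigenvalue closest to rho(x,A)
        gap = np.abs(np.delete(lam, i) - rho_x).min()
        assert abs(lam[i] - rho_x) <= r2 / gap + 1e-12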