Notes for Ma221 Lecture 19, Nov 3 2009

Continue Chapter 5 on symmetric eigenvalue problems and the SVD

Goals: 
   Finish Perturbation Theory 
   Algorithms for the symmetric eigenproblem
   Algorithms for the SVD

Perturbation Theory:
   We will outline section 5.2.1. These ideas underlie two algorithms:
   (1) the fastest algorithm for finding eigenvalues and eigenvectors of symmetric
   matrices (in LAPACK routine ssyevr), which is based on work by Prof. Parlett
   and Inderjit Dhillon (now at UT Austin), and won the 
   SIAM Linear Algebra Best Paper Prize in 2006, and 
   (2) the most accurate algorithm for the SVD (in LAPACK routine sgesvj), which is
   based on work by Zlatko Drmac and Kresimir Veselic, and won the 
   SIAM Linear Algebra Best Paper Prize in 2009.

   Most algorithms so far have had a small backward error as a goal, for example
   computing the exact eigenvalues of A+E where ||E|| = O(macheps)*||A||. If the 
   algorithm is careful to keep E symmetric, then by Weyl's Thm this means
   the error in each computed eigenvalue is bounded by 
      | lambda_i,true - lambda_i,comp | <= ||E||_2 = O(macheps)*||A||_2
   So the error is always small compared to the largest eigenvalue | lambda_max | = ||A||_2.
   If each matrix entry A_ij is uncertain by this amount (O(macheps)*||A||) then 
   one cannot expect more. But is this always the case?

   Matlab demo 
   A = randn(3,3); A = eye(3) + (A+A')/10; eigA=eig(A), D=diag([1,1e-10,1e-20]); H = D*A*D, eigH=eig(H), cholH=chol(H)
   A is clearly positive-definite, i.e. has positive eigenvalues, so H should be too:
   chol says it is, eig says it isn't. But the backward error is tiny, though the 
   condition number norm(H)*norm(inv(H)) ~ 1e40 is huge, so should we care? 
   Yes, sometimes:

   It turns out that chol has a much tinier backward error than eig, and actually gets the
   right answer. For this matrix, knowing each matrix entry to k significant digits
   means that all the entries of the Cholesky factor and all the eigenvalues are known to about 
   k significant digits, so the condition number of the eigenvalues is about 1, not 1e40.
   But the eigenvalue routine in Matlab does not compute them this accurately.

   Another way to say it is this: A small *absolute* error in the matrix 
   (i.e. E is small in norm) causes only small *absolute* errors in the 
   eigenvalues. But a small *relative* error in the matrix (to be defined!) 
   causes only small *relative* errors in the eigenvalues, so tiny eigenvalues 
   have as many correct digits as the big ones.

   Thm ("Relative Weyl's Thm"): Consider the symmetric matrix H with eigenvalues 
   lambda_i, and the perturbed symmetric matrix  H' = X*H*X^T where 
   ||I-X*X^T||_2 < 1 (this is what we mean by a small "relative change" in H). 
   Let lambda'_i be the eigenvalues of H'. Then
       | lambda_i - lambda'_i | <= ||I-X*X^T||_2 * |lambda_i|
   i.e. all the eigenvalues have small relative errors, 
   independent of their magnitudes.

   Proof: Write H' - lambda_i*I = X*(H - lambda_i*I)*X^T + (X*X^T - I)*lambda_i
                                =   H1  +   H2
           Since H - lambda_i*I is singular, so is H1 = X*(H - lambda_i*I)*X^T, i.e. H1
           has a 0 eigenvalue. Applying Weyl's Theorem to H1 + H2, H' - lambda_i*I must
           have an eigenvalue within ||H2||_2 of 0, i.e. H' has an 
           eigenvalue within ||H2||_2 of lambda_i. 
           And ||H2||_2 <= ||I-X*X^T||_2 * |lambda_i| as desired.
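
   A quick numerical sanity check of this theorem (a rough Matlab/Octave sketch of my
   own, not from the text; it assumes H has no zero eigenvalues):
      n = 5;
      H = randn(n); H = H + H';                 % random symmetric H
      X = eye(n) + 1e-6*randn(n);               % X with ||I - X*X'|| small
      Hp = X*H*X';                              % the perturbed matrix H'
      bound = norm(eye(n) - X*X');
      lam = sort(eig(H)); lamp = sort(eig(Hp));
      relerr = max(abs(lam - lamp)./abs(lam));
      [relerr, bound]                           % expect relerr <= bound (roughly 1e-6),
                                                % up to roundoff in eig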

   Our intuition about the numerical example above is that the sensitivity of the  
   eigenvalues of H should only depend on the condition number of A, not of H = D*A*D.
   We prove this using Relative Weyl's Thm: Suppose we change H to H'=D*(A+E)*D.
   Then we claim that the difference between eigenvalues lambda_i of H and lambda'_i of H'
   is bounded by | lambda_i - lambda'_i | <= ||A^{-1}|| * ||E|| * | lambda_i |
   So if ||E|| <= eps * ||A||, this is bounded by
                 | lambda_i - lambda'_i | <= ||A^{-1}|| * ||A|| * eps * | lambda_i |
                                           = eps * cond(A) * | lambda_i |
    Proof:
    Let A = L*L^T be the Cholesky factorization of A. Then
    H' = D*(A+E)*D = D*(L*L^T + E)*D = D*L*(I + L^(-1)*E*L^(-T))*L^T*D = D*L*(I+Y)*L^T*D.
    Here Y = L^(-1)*E*L^(-T) is small, since ||Y|| <= ||A^(-1)||*||E|| and A is
    well-conditioned.  Write I+Y = L'*L'^T; since Y is small, we expect L' to be close 
    to the Cholesky factor of I, namely I.  Now H' = (D*L*L')*(L'^T*L^T*D). 
    From homework we know H = (D*L)*(L^T*D) and K = (L^T*D)*(D*L) have the same 
    eigenvalues, as do H' = (D*L*L')*(L'^T*L^T*D) and K' = (L'^T*L^T*D)*(D*L*L') = L'^T*K*L'.
    Now "Relative Weyl's Theorem" (applied with X = L'^T) says the eigenvalues of K and K',
    i.e. the eigenvalues of H and H', differ by a small relative amount,
    namely ||I - L'^T*L'||_2 = ||I - L'*L'^T||_2 = ||Y||_2 <= ||A^(-1)||_2*||E||_2 as claimed.
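
   A rough numerical check of this bound (my own sketch; D and E are chosen mildly
   enough that Matlab's eig is accurate at these scales, so the comparison is meaningful):
      n = 4;
      A = randn(n); A = eye(n) + (A + A')/10;   % well-conditioned s.p.d. A, as in the demo above
      D = diag(logspace(0, -4, n));             % strong grading
      E = 1e-6*randn(n); E = (E + E')/2;        % small symmetric perturbation of A
      H = D*A*D;  Hp = D*(A + E)*D;
      lam = sort(eig(H)); lamp = sort(eig(Hp));
      relchange = max(abs(lam - lamp)./abs(lam));
      bound = norm(inv(A))*norm(E);
      [relchange, bound]                        % expect relchange <= bound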

  Algorithms: There are several approaches, depending on what one wants to compute:
      (1) "Usual accuracy": backward stable in the sense that you get the exact 
          eigenvalues and eigenvectors for A+E where ||E|| = O(macheps)*||A||
            (1.1) Get all the eigenvalues (with or without the corresponding eigenvectors)
            (1.2) Just get all the eigenvalues in some interval [x,y]
                 (with or without the corresponding eigenvectors)
            (1.3) Just get lambda_i through lambda_j (with or without the corresponding vectors)
                  Ex: get the 10 smallest eigenvalues (lambda_1 through lambda_10)
           (1.2) and (1.3) can be considerably cheaper than (1.1) if only a few eigenvalues are wanted
       (2) "High accuracy": compute tiny eigenvalues more accurately than the "usual"
           accuracy guarantees. This costs more than (1.1) with all values and vectors
           (but not too much more, thanks to Drmac and Veselic).
      (3) Updating - given the eigenvalues and eigenvectors of A, find them for
          A with a "small change", small in rank, eg A +- x*x^T

   Algorithms and costs:
       (1) We begin by reducing A to Q^T*A*Q = T where T is tridiagonal, costing O(n^3) flops.
          This uses the same approach as in chapter 4, where Q^T*A*Q was upper Hessenberg,
          since a matrix that is both symmetric and upper Hessenberg must be tridiagonal.
          This is currently done by LAPACK routine ssytrd.
           It is an open question whether this can be done while moving only
          O(n^3/sqrt(fast_memory_size)) words between fast and slow memory.

          When A is banded, a different algorithm (in ssbtrd) takes advantage of the band
          structure to reduce to tridiagonal form in just O(n^2*bandwidth) operations.
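
          Here is a minimal (unblocked, slow) Matlab/Octave sketch of the dense Householder
          reduction described above, just to make the O(n^3) reduction concrete; the name
          house_tridiag and the details are mine, and LAPACK's ssytrd is organized very differently:
             function [T, Q] = house_tridiag(A)
               n = size(A,1);  Q = eye(n);
               for k = 1:n-2
                 x = A(k+1:n, k);
                 v = x;
                 v(1) = v(1) + sign(x(1) + (x(1)==0))*norm(x);  % choose the sign to avoid cancellation
                 if norm(v) > 0, v = v/norm(v); end
                 P = eye(n-k) - 2*(v*v');                       % Householder reflection
                 A(k+1:n, :) = P*A(k+1:n, :);                   % apply from the left ...
                 A(:, k+1:n) = A(:, k+1:n)*P;                   % ... and from the right
                 Q(:, k+1:n) = Q(:, k+1:n)*P;                   % accumulate Q
               end
               T = tril(triu((A + A')/2, -1), 1);               % keep the tridiagonal band; the rest is roundoff
             end
             % Example: A = randn(6); A = A+A'; [T,Q] = house_tridiag(A); norm(Q'*A*Q - T)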
         (1.1) Given T we need to find all its eigenvalues (and possibly vectors) 
                There are a variety of algorithms; all cost O(n^2) just for
               eigenvalues, but anywhere from O(n^2) to O(n^3) for eigenvectors
               We summarize their properties here, and describe a few in more
               detail below.
               (1) Oldest is QR iteration, as in chapter 4, but for tridiagonal T.
                    Thm (Wilkinson): With the right choice of shift, tridiagonal QR is globally
                     convergent, and usually cubically convergent (# correct digits
                     triples at each step!).
                   It is only useful for getting all the eigenvalues, and possibly
                   vectors, but costs O(n^3) to get the vectors, unlike later routines.
                   It is used by LAPACK routine ssyev (sgesvd for SVD).
               (2) Next is "divide-and-conquer", faster, also used for updating (3)
                     ssyevd (sgesdd for SVD). Its speed is harder to analyze, but
                     empirically it is like O(n^g) for some 2 < g < 3.
               (3) Most recent is MRRR = Multiple Relatively Robust Representations.
                   related to inverse iteration from Chap 4, so T = Z*Lambda*Z^T 
                    costs O(n^2): optimal, using algorithm developed by Prof. Parlett
                   and former student Inderjit Dhillon, since output Z of size n^2)
                     ssyevr (still under development for SVD, see Willems talk at SIAM ALA)
             
                In theory, there is an even faster algorithm than O(n^2), based on
                divide-and-conquer, by representing Z implicitly rather than as n^2 
                explicit matrix entries, but since most users want explicit eigenvectors, 
                and the cost of reduction to tridiagonal form T is already much larger, 
                we do not usually use it:
                  Thm (Ming Gu): can compute Z in O(n log^p n) time, provided
                  we represent it implicitly (so that we can multiply any vector
                  by it cheaply)

            Thus A = Q*T*Q^T = Q*Z*Lambda*Q^T*Z^T = (Q*Z)*Lambda*(Q*Z)^T,
               so we need to multiply Q*Z to get final eigenvectors (costs O(n^3))

      (1.2) or (1.3) Reduce A to Q^T*A*Q = T as above.
            Use bisection (based on Sylvester's Thm) on T to get a few eigenvalues,
             and then inverse iteration to get their eigenvectors if desired.
               ssyevx (not available for SVD - project!)
            MRRR (in ssyevr) could be used for this too, but this is not yet 
            implemented.

       (2) Use Jacobi's method (historically the oldest method; the modern, fast version
            is by Drmac and Veselic).
               It is used by LAPACK routine sgesvj for the SVD (not yet for the eigenproblem).

      (3) Use same idea as divide-and-conquer: assuming we have the eigenvalues and
          eigenvectors for A = Q*Lambda*Q^T, it is possible to compute the
           eigenvalues of A +- u*u^T in O(n^2), much faster than starting from scratch.

     We now illustrate important properties of some of these algorithms.
   Matlab demo of QR: illustrating cubic convergence
    n=6; A = randn(n,n), A = A+A', T = hess(A), 
    s = T(n,n); [Q,R]=qr(T-s*eye(n)); T = R*Q+s*eye(n); T = tril(T,1);T=(T+T')/2
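
    To watch the cubic convergence, one can repeat the shifted QR step in a loop and
    print the last off-diagonal entry (my own variant of the demo above; 5 steps is arbitrary):
      n=6; A = randn(n,n); A = A+A'; T = hess(A);
      for k = 1:5
        s = T(n,n);
        [Q,R] = qr(T - s*eye(n));
        T = R*Q + s*eye(n);
        T = tril(T,1); T = (T+T')/2;    % restore exact symmetric tridiagonal form
        disp(abs(T(n,n-1)))             % should roughly cube at each step, until it hits roundoff
      end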

   Cubic convergence will follow from analysis of simpler algorithm called
   Rayleigh Quotient iteration:
     choose unit vector x_0
     i = 0
     repeat
       s_i = rho(x_i,A) = x_i^T*A*x_i 
       y = (A - s_i*I)^(-1) * x_i
       x_(i+1) = y / ||y||_2
       i = i+1
     until convergence ( s_i and/or x_i stop changing )
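
    Here is a hedged Matlab/Octave sketch of this iteration (the function name rqi and the
    fixed iteration count are mine; a real code would test for convergence instead):
      function [s, x] = rqi(A, x0, maxit)
        x = x0 / norm(x0);
        for i = 1:maxit
          s = x'*A*x;                      % Rayleigh quotient rho(x_i,A)
          y = (A - s*eye(size(A,1))) \ x;  % shifted inverse iteration step; near-singularity is
          x = y / norm(y);                 %   harmless since only the direction of y matters
        end
      end
      % Example: A = randn(6); A = A+A'; [s,x] = rqi(A, randn(6,1), 5); norm(A*x - s*x)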

   Suppose that A*q = lambda*q, ||q||_2 = 1, and ||x_i-q|| = e << 1.
   We want to show that ||x_(i+1)-q|| = O(e^3).
   We need to bound |s_i - lambda|; a first bound is
      s_i = x_i^T*A*x_i = (x_i - q + q)^T*A*(x_i - q + q)
          = (x_i-q)^T*A*(x_i-q) + q^T*A*(x_i-q) + (x_i-q)^T*A*q + q^T*A*q
          = (x_i-q)^T*A*(x_i-q) + lambda*q^T*(x_i-q) + (x_i-q)^T*q*lambda + lambda
     so s_i - lambda = (x_i-q)^T*A*(x_i-q) + 2*lambda*(x_i-q)^T*q
     so |s_i - lambda| = O( || x_i-q ||^2 + || x_i-q|| ) = O( || x_i-q || )
   A tighter bound is
      |s_i - lambda| <= || A*x_i - s_i*x_i ||^2 / gap   
                         ... where gap = distance from s_i to next closest eigenvalue
                         ... we assume the gap is not too small
                      = || A*(x_i - q + q) - s_i*(x_i - q + q) ||^2 / gap
                      = || (A-s_i*I)*(x_i-q) + (lambda - s_i)*q ||^2 / gap
                     <= ( || (A-s_i*I)*(x_i-q) || + || (lambda - s_i)*q || )^2 / gap
                       = O(e^2)  ... using ||q||_2 = 1 and the first bound |s_i - lambda| = O(e)
      Now we take one step of inverse iteration to get x_(i+1); from 
      earlier analysis we know
        || x_(i+1)-q || <= || x_i-q || * | s_i - lambda |/gap
                         = e * O(e^2) / gap
                         = O(e^3) as desired

   To see that QR iteration is doing Rayleigh Quotient iteration in disguise,
   look at one step: 
      T - s(i)*I = Q*R so
      (T - s(i)*I)^(-1) = R^(-1)*Q^(-1)
                        = R^(-1)*Q^T  ... since Q is orthogonal
                        = (R^(-1)*Q^T)^T   ... since the matrix is symmetric
                        = Q*R^(-T)
    so (T - s(i)*I)^(-1) * R^T = Q
    so (T - s(i)*I)^(-1) * e_n * R(n,n) = q_n = last column of Q
    Since s(i) = T(n,n) = e_n^T * T * e_n, s(i) is just the Rayleigh Quotient
    for e_n, and q_n is the result of one step of Rayleigh quotient iteration.
    Then the updated T is R*Q + s(i)*I = Q^T*T*Q, so 
     (updated T)(n,n) = q_n^T * T * q_n is the next Rayleigh quotient as desired.

   In practice QR iteration is implemented analogously to Chap 4, by "bulge chasing",
   at a cost of O(n) per iteration, or O(n^2) to find all the eigenvalues. But
   multiplying together all the Givens rotations still costs O(n^3), so something
   faster is still desirable.

   Divide-and-conquer: The important part is how to cheaply compute the 
   eigendecomposition of a rank-one update to a symmetric matrix A + alpha*u*u^T, 
   where we already have the eigendecomposition A = Q*Lambda*Q^T. Then
        A + alpha*u*u^T = Q*Lambda*Q^T + alpha*u*u^T
                        = Q*(Lambda + alpha*(Q^T*u)*(Q^T*u)^T)*Q^T
                        = Q*(Lambda + alpha*v*v^T)*Q^T   ... where v = Q^T*u
    so we only need to think about computing the eigenvalues and eigenvectors of 
    Lambda + alpha*v*v^T. Let's compute its characteristic polynomial:
  
   Lemma (homework!) det (I + x*y^T) = 1 + y^T*x

    Then the characteristic polynomial is
       det(Lambda + alpha*v*v^T - lambda*I) 
          = det((Lambda - lambda*I)*(I + alpha*(Lambda-lambda*I)^(-1)*v*v^T))
          = prod_i (lambda_i - lambda) * (1 + alpha*sum_i v(i)^2/(lambda_i - lambda))
          = prod_i (lambda_i - lambda) * f(lambda)

  So our goal is to solve the so-called secular equation f(lambda) = 0.
  Consider figure 5.2 in the text (Secular1.ps): this plot of
      f(lambda) = 1 + .5/(1-lambda) + .5/(2-lambda) + .5/(3-lambda) + .5/(4-lambda)
   looks benign: we see there is one root of f(lambda) between each pair
    [lambda_i , lambda_(i+1)] of adjacent eigenvalues of Lambda (plus one more root to the
    right of lambda_n, since alpha > 0), and in each such
    interval f(lambda) is monotonic, so some Newton-like method should work well.

  But now consider figure 5.3 in the text (Secular2.ps):
      f(lambda) = 1 + .001/(1-lambda) + .001/(2-lambda) + .001/(3-lambda) + .001/(4-lambda)
   Here f(lambda) no longer looks so benign, and simple Newton would not work, 
   taking a big step out of the bounding interval. But we can still use a Newton-like
   iteration: instead of approximating f(lambda) by a straight line at each step,
   and getting the next guess by finding a zero of that straight line,
   we approximate f(lambda) by a simple but non-linear function that better matches 
   the graph, with poles at lambda_i and lambda_(i+1):
       g(lambda) = c1 + c2/(lambda_i - lambda) + c3/(lambda_(i+1) - lambda)
   where c1, c2 and c3 are chosen as in Newton to match f and f' at the last
   approximate root. Solving g(lambda)=0 is then solving a quadratic for lambda.
  (picture).
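
  As a brute-force sanity check (my own sketch, not the rational-interpolation scheme just
  described), one can bracket each root of f between adjacent poles and hand it to a
  standard root finder; the weights below are those of figure 5.2:
      d = [1 2 3 4]';  alpha = 1;  v2 = 0.5*ones(4,1);       % v2(i) = v(i)^2, poles d = diag(Lambda)
      f = @(lam) 1 + alpha*sum(v2 ./ (d - lam));
      n = length(d);  lam = zeros(n,1);  tiny = 1e-8;
      for i = 1:n-1
        lam(i) = fzero(f, [d(i)+tiny, d(i+1)-tiny]);         % f goes from -inf to +inf on this interval
      end
      lam(n) = fzero(f, [d(n)+tiny, d(n)+alpha*sum(v2)+1]);  % the last root lies to the right of d(n)
      [sort(lam), sort(eig(diag(d) + alpha*(sqrt(v2)*sqrt(v2)')))]   % the two columns should agree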
        
  It remains to compute the eigenvectors: 
   Lemma: If lambda is an eigenvalue of Lambda + alpha*v*v^T, then
     (Lambda - lambda*I)^(-1)*v is a corresponding eigenvector. Since Lambda is
     diagonal, this costs O(n) to compute.
  Proof:  (Lambda + alpha*v*v^T)*(Lambda - lambda*I)^(-1)*v
             = (Lambda - lambda*I + lambda*I + alpha*v*v^T)*(Lambda - lambda*I)^(-1)*v
             = v + lambda*(Lambda - lambda*I)^(-1)*v + v*(alpha*v^T*(Lambda - lambda*I)^(-1)*v)
             = v + lambda*(Lambda - lambda*I)^(-1)*v - v
                since alpha*v^T*(Lambda - lambda*I)^(-1)*v + 1 = f(lambda) = 0
             = lambda*(Lambda - lambda*I)^(-1)*v  as desired
  Unfortunately this simple formula is not numerically stable; the text describes
  the clever fix (due to Ming Gu and Stan Eisenstat, for which Prof. Gu won 
   the Householder Prize for best thesis in Linear Algebra in 1996).
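
  A quick check of the Lemma's formula on a small, well-separated example (my own sketch;
  here the instability just mentioned does not bite):
      d = [1 2 3 4]';  alpha = 1;  v = sqrt(0.5)*ones(4,1);
      M = diag(d) + alpha*(v*v');
      lam = eig(M);                              % eigenvalues of Lambda + alpha*v*v^T
      for k = 1:length(lam)
        x = (diag(d) - lam(k)*eye(4)) \ v;       % the Lemma's eigenvector formula
        disp(norm(M*x - lam(k)*x)/norm(x))       % residual; should be O(macheps)
      end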

  When two eigenvalues lambda_i and lambda_(i+1) are (nearly) identical, then 
  we know there is a root between them without doing any work. Similarly
  when v(i) is (nearly) zero, we know that lambda(i) is (nearly) an
  eigenvalue without doing any work. In practice a surprising number
  of eigenvalues can be found quickly ("deflated") this way, which
  speeds up the algorithm.

  Write [Q',Lambda']=Eig_update[Q,Lambda,alpha,u] as the function we just
  described that takes the eigendecomposition of A = Q*Lambda*Q^T
  and updates it, returning the eigendecomposition of 
       A + alpha*u*u^T = Q'*Lambda'*Q'^T
  We can also use Eig_update as a building block for an entire eigensolver,
  using Divide-and-Conquer. To do so we need to see how to divide
  a tridiagonal matrix into two smaller ones of half the size, just
  using a rank one change alpha*u*u^T. We just divide in the middle:
  If T = tridiag(a(1),...,a(n); b(1),...,b(n-1)) is a symmetric tridiagonal
  matrix with diagonals a(i) and off-diagonals b(i), then we can write
   T = [ a(1)  b(1)                                         ]
       [ b(1)  a(2)  b(2)                                   ]
       [                    ...                             ]
       [               b(i-1) a(i) b(i)                     ]
       [                      b(i) a(i+1) b(i+1)            ]
       [                           ...                      ]
       [                                       b(n-1)  a(n) ]

     = [ a(1)  b(1)                                         ]  + [                     ]
       [ b(1)  a(2)  b(2)                                   ]    [                     ]
       [                    ...                             ]    [                     ]
       [               b(i-1) a(i)-b(i)      0              ]    [        b(i) b(i)    ]
       [                          0    a(i+1)-b(i) b(i+1)   ]    [        b(i) b(i)    ]
       [                           ...                      ]    [                     ]
       [                                       b(n-1)  a(n) ]    [                     ]

    =  diag(T1,T2) + b(i)*u*u^T

  where T1 and T2 are two half-size tridiagonal matrices (if i is n/2) and
  u is a vector with u(i)=u(i+1)=1 and the other u(j)=0. 
  Using this notation, we can write our overall algorithm as follows:
     function [Q,Lambda] = DC_eig(T) ... return T = Q*Lambda*Q'
        n = dimension(T)
        if n small enough, 
          use QR iteration, 
        else
          i = floor(n/2)
          write T = diag(T1,T2) + b(i)*u*u^T    ... just notation, no work
          [Q1,Lambda1] = DC_eig(T1)
          [Q2,Lambda2] = DC_eig(T2)
          ... note that diag(Q1,Q2) and diag(Lambda1,Lambda2) are
          ... eigendecomposition of diag(T1,T2)
          [Q,Lambda] = Eig_update(diag(Q1,Q2),diag(Lambda1,Lambda2),b(i),u)
        endif
        return
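
  Here is a hedged Matlab/Octave sketch of this recursion (all names are mine). The call
  to eig on the rank-one-updated diagonal matrix is only a stand-in for a real Eig_update
  (i.e. the secular-equation solver), so this illustrates the splitting, not the speed:
      function [Q, Lambda] = dc_eig(a, b)
        % a = diagonal, b = off-diagonal of symmetric tridiagonal T; returns T = Q*Lambda*Q'
        n = length(a);
        if n <= 4                                  % small base case: just call eig
          T = diag(a);
          if n > 1, T = T + diag(b,1) + diag(b,-1); end
          [Q, Lambda] = eig(T);
          return
        end
        i = floor(n/2);  rho = b(i);
        a1 = a(1:i);    a1(i) = a1(i) - rho;  b1 = b(1:i-1);      % T1
        a2 = a(i+1:n);  a2(1) = a2(1) - rho;  b2 = b(i+1:n-1);    % T2
        [Q1, L1] = dc_eig(a1, b1);
        [Q2, L2] = dc_eig(a2, b2);
        Q0 = blkdiag(Q1, Q2);  L0 = blkdiag(L1, L2);   % eigendecomposition of diag(T1,T2)
        u = zeros(n,1);  u(i) = 1;  u(i+1) = 1;
        v = Q0'*u;
        [Qv, Lambda] = eig(L0 + rho*(v*v'));           % stand-in for Eig_update(Q0,L0,rho,u)
        Q = Q0*Qv;
      end
      % Example: a = randn(8,1); b = randn(7,1); [Q,L] = dc_eig(a,b);
      %          T = diag(a)+diag(b,1)+diag(b,-1); norm(Q*L*Q' - T)   % should be O(macheps)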

  Now we turn to algorithms that are fastest when only a few 
  eigenvalues and eigenvectors are desired. Afterward we return to
  briefly describe MRRR, the fastest algorithm when all eigenvalues
  and eigenvectors are desired.

  Recall Sylvester's Theorem:  Suppose A is symmetric and X nonsingular.
  Then A and X*A*X^T have the same Inertia = (#evals<0,#evals=0,#evals>0).
  So A - sigma*I and X*(A-sigma*I)*X^T have the same Inertia, namely
   (#evals of A < sigma, #evals of A = sigma, #evals of A > sigma) 
  So if X*(A-sigma*I)*X^T were diagonal, it would be easy to count
  the number of eigenvalues of A less than, equal to or greater than sigma,
  for any sigma. By doing this for sigma_1 and sigma_2, we can count
  the number of eigenvalues in any interval [sigma_1 , sigma_2],
  and by repeatedly cutting intervals in half, we can compute intervals
  containing the eigenvalues that are as narrow as we like, or that
  only contain eigenvalues in regions of interest (eg the smallest).

  But how do we cheaply choose X to make X*(A - sigma*I)*X^T diagonal?
  By starting with A = T tridiagonal, and doing Gaussian elimination
  without pivoting:  T - sigma*I = L*D*L^T, so
  inv(L)*(T - sigma*I)*inv(L)^T and D have the same inertia.

  However, LU without pivoting seems numerically dangerous. Fortunately,
  because of the tridiagonal structure, it is not, if done correctly:

  function c = Negcount(T,s) ... count the number c of eigenvalues of T < s
    ... assume T has a(1),...,a(n) on diagonal and b(1),...,b(n-1) off-diagonal
    ... only compute diagonal entries d(i) of D in T-s*I = L*D*L^T
    c = 0
    b(0) = 0
    d(0) = 1
    for i = 1 to n
       d(i) = (a(i) - s) - b(i-1)^2/d(i-1)  ... need to obey parentheses!
       if (d(i)<0), c=c+1
    end
  
    We don't need l(i) = i-th off diagonal of L, because 
       (T-s*I)(i,i+1) = b(i) = (L*D*L^T)(i,i+1) = d(i)*l(i)
    so we can replace the usual inner loop of LU
       d(i) = a(i)-s - l(i-1)^2*d(i-1)  
    by
       d(i) = a(i)-s - b(i-1)^2/d(i-1)
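
    Here is a runnable Matlab/Octave version of this sketch (the name negcount and the
    argument list are mine; the b(0)=0, d(0)=1 conventions are handled by peeling off i=1,
    and the -0 subtlety discussed below is ignored):
      function c = negcount(a, b, s)
        % count the eigenvalues of tridiag(a(1..n); b(1..n-1)) that are less than s
        n = length(a);
        c = 0;
        d = a(1) - s;                       % d(1), since b(0) = 0
        if d < 0, c = c + 1; end
        for i = 2:n
          d = (a(i) - s) - b(i-1)^2/d;      % obey the parentheses!
          if d < 0, c = c + 1; end
        end
      end
      % Example: a = randn(6,1); b = randn(5,1); T = diag(a)+diag(b,1)+diag(b,-1);
      %          [negcount(a,b,0), sum(eig(T) < 0)]   % the two counts should agree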

  Thm: Assuming we don't divide by zero, overflow or underflow,
       the computed c from Negcount(T,s) is exactly correct
       for a slightly perturbed T + E,
       where E is symmetric and tridiagonal, 
             E(i,i) = 0 (the diagonal is not perturbed at all) and 
             |E(i,i+1)| <= 2.5*macheps*|T(i,i+1)|

  Proof:  As before, we replace each operation like a+b in the algorithm
          with (a+b)*(1+delta) where |delta| <= macheps. If we obey the
          parentheses in the algorithm, so a(i) - s is subtracted first,
          then we can divide out by all the (1+delta) factors multiplying
          a(i)-s to get
            d(i)*F(i) = a(i)-s - b(i-1)^2*G(i)/d(i-1)
          where F(i) and G(i) are the product of some (1+delta) factors.
          Now write this as
            d(i)*F(i) = a(i)-s - b(i-1)^2*G(i)*F(i-1)/(d(i-1)*F(i-1))
          or
            d'(i)     = a(i) - s - b'(i-1)^2 / d'(i-1)
          the exact recurrence for T+E. Since d(i) and d'(i) = d(i)*F(i)
          have the same sign, we get the same exact Negcount for either.
 

  In fact more is true: This works even if some pivot d(i-1) is exactly zero,
  and we divide by zero, so the next d(i) is infinite, because we end up dividing 
  by d(i) in the next step, and the "infinity" disappears. But it does mean
  that to correctly compute Negcount, we need to count -0 as negative, 
  and +0 as positive (-0 is a feature of IEEE floating point arithmetic).
  Furthermore, the function g(sigma) = #evals < sigma is computed
  monotonically increasing, as long as the arithmetic is correctly rounded
  (otherwise the computed number of eigenvalues in a narrow enough interval 
   might be negative!).

  The cost of Negcount is clearly O(n). To find one eigenvalue with Negcount via
  bisection, Negcount needs to be called at most 64 (in double) or 32 (in single)
  times, since each time the interval gets half as big, and essentially
  determines one more bit in the floating point format. So the cost is still O(n).
  So to find k eigenvalues using bisection, the cost is O(k*n).
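
   A hedged sketch of bisection on top of the negcount sketch above (bisect_eig, the tol
   argument, and the starting interval are mine; lo and hi must bracket all the eigenvalues,
   e.g. via a Gershgorin-style bound):
      function lambda = bisect_eig(a, b, j, lo, hi, tol)
        % find the j-th smallest eigenvalue of tridiag(a;b) by bisection on negcount
        while hi - lo > tol
          mid = lo + (hi - lo)/2;
          if negcount(a, b, mid) >= j       % at least j eigenvalues lie below mid
            hi = mid;
          else
            lo = mid;
          end
        end
        lambda = lo + (hi - lo)/2;
      end
      % Example: a = randn(6,1); b = randn(5,1); T = diag(a)+diag(b,1)+diag(b,-1);
      %          r = norm(T,1);              % all eigenvalues lie in [-r, r]
      %          [bisect_eig(a,b,1,-r,r,1e-12), min(eig(T))]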

  Given an accurate eigenvalue lambda_j from bisection, we find its eigenvector by
  inverse iteration:
     choose x(0), i=0
     repeat
       i = i+1
       solve (T - lambda_j*I)*y = x(i-1) for y
       x(i) = y / ||y||_2
     until x(i) converges
  which should converge in a few steps, and so at cost O(n) per eigenvector.
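
   A hedged Matlab/Octave sketch of this loop for tridiagonal T (inv_iter and the fixed
   iteration count are mine; a real code would test for convergence):
      function x = inv_iter(T, lam, maxit)
        n = size(T,1);
        x = randn(n,1);  x = x/norm(x);
        Ts = sparse(T) - lam*speye(n);      % keep the tridiagonal structure: each solve is O(n)
        for i = 1:maxit
          y = Ts \ x;                       % near-singular solve just amplifies the wanted direction
          x = y/norm(y);
        end
      end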

  This seems like an optimal algorithm for all n eigenvalues and eigenvectors: 
  at a cost of O(n) per eigenpair, the total cost should be O(n^2). But it has
  a problem: the computed eigenvectors, while they individually very nearly 
  satisfy A*x = lambda(j)*x as desired, may not be orthogonal when lambda(j)
  is very close to another lambda(j+1); there is nothing in the algorithm to
  enforce this, and when lambda(j) is close to lambda(j+1), solving with
  T - lambda(j)*I is nearly the same as solving with T - lambda(j+1)*I.
  Imagine the extreme case when lambda(j) and lambda(j+1) are so close that
  they round to the same floating point number!
   One could start with a random starting vector x(0) and hope for the best,
   but there is no guarantee of orthogonality.

  At first people tried to guarantee orthogonality by taking all the 
  computed eigenvectors belonging to a cluster of nearby eigenvalues
  and running QR decomposition on them, replacing them by the columns of Q.
  This guarantees orthogonality, but has two problems:
    (1) if the computed vectors are not linearly independent, the
        columns of Q may not satisfy A*q = lambda*q very well.
    (2) if the size of the cluster of close eigenvalues is s, the
        cost of QR decomposition is O(s^2*n), so if s is large, the
        cost could go back up to O(n^3).

  The MRRR = Multiple Relatively Robust Representations algorithm was
  motivated by this problem: computing all the eigenvectors in O(n^2)
   time such that they are also guaranteed orthogonal.  It is the fastest 
   algorithm available (for large enough problems; it defaults to 
   other algorithms for small problems), but is rather complicated to 
   explain. Two lectures explaining it in detail are available from
   the 2004 Ma221 web page; see the class web site for details.

  The final algorithm of importance is Jacobi's method, whose classical 
  (and so slow) implementation is described in section 5.3.5. 
  Jacobi can be shown to be the potentially most accurate algorithm, 
  with an error determined by Relative-Weyl's Theorem discussed earlier, 
  and so getting the tiny eigenvalues (and their eigenvectors) much 
  more accurately. It was much slower than the other methods discussed 
  so far, until work by Drmac and Veselic showed how it could be made much 
  faster.

  All the algorithms discussed so far extend to the SVD, typically by
  using the relationship between the SVD of A and the eigenvalues and
  eigenvectors of the symmetric matrix [ 0  A ; A^T 0 ]. The one
  exception is MRRR, where some open problems remain. There is recent
  work on this (reported at the SIAM Linear Algebra meeting a few
  weeks ago by Paul Willems), so we can hope to have MRRR available for
  the SVD in the near future.