\begin{verbatim}
Math 110 - Fall 05 - Lecture notes # 41 - Dec 7 (Wednesday)

The next topic is how to show the Jordan Canonical Form exists.
We will take the following approach:
  (1) Compute the Schur Form: A = Q * T * Q^*, where all
      equal eigenvalues appear in adjacent diagonal entries of T.
      (Recall that we can put the eigenvalues of A onto the diagonal
      of T into any desired order).  Thus we can write 
                m_1  m_2  m_3  ... m_k
          T = [ T_11 T_12 T_13 ... T_1k ]  m_1
              [   0  T_22 T_23 ... T_2k ]  m_2
              [   0   0   T_33 ... T_3k ]  m_3
              [           ....          ]  ...
              [   0       ....  0  T_kk ]  m_k
      where T_ii is upper triangular and has all diagonal entries 
      equal to lambda_i, and each lambda_i is distinct.
  (2) Compute a similarity S so that
          T = S * B * S^{-1} where B = [ T_11  0    0 ...    0 ]
                                       [  0   T_22  0 ...    0 ]
                                       [           ...         ]
                                       [  0    0    0 ...  T_kk]
  (3) Show how to compute each Jordan form J_i of T_ii, 
      T_ii = W_i * J_i * W_i^{-1}

Putting these three steps together gives

      A = Q * T * Q^*    ... from step (1)
        = Q * (S * B * S^{-1}) * Q^*   ... from step (2)
        = Q * S * ( diag(W_1,...,W_k) * diag(J_1,...,J_k)
                    * diag(W_1^{-1},...,W_k^{-1}) ) * S^{-1} * Q^*
          ... from step (3)
        = ( Q * S * diag(W_1,...,W_k) )
             * diag(J_1,...,J_k) *
               ( diag(W_1^{-1},...,W_k^{-1}) * S^{-1} * Q^* )
          ... by associativity
        = ( Q * S * diag(W_1,...,W_k) )
             * diag(J_1,...,J_k) *
                ( Q * S * diag(W_1,...,W_k) )^{-1}
          ... inverse of product = product of inverses in reverse
        = Y * diag(J_1,...,J_k) * Y^{-1}
          ... where Y = Q * S * diag(W_1,...,W_k)
the desired Jordan Canonical Form.

We have seen how to compute the Schur form before, so now we go on to step (2).
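
(A quick software aside, not in the original notes: step (1) is
available off the shelf.  Below is a minimal sketch in Python/numpy,
standing in for the Matlab commands summarized at the end of these
notes.  Note that schur alone promises nothing about the order of the
eigenvalues on the diagonal of T; grouping equal eigenvalues into
adjacent positions takes an extra reordering step, e.g. Matlab's
ordschur or LAPACK's trsen, which we assume has been done.)

    import numpy as np
    from scipy.linalg import schur

    # Step (1): unitary Q and upper triangular T with A = Q*T*Q^*.
    A = np.array([[4., 1., 2.],
                  [0., 4., 3.],
                  [0., 0., 2.]])
    T, Q = schur(A, output='complex')       # complex Schur form
    assert np.allclose(Q @ T @ Q.conj().T, A)
    print(np.diag(T))                       # eigenvalues of A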

Def 1: Suppose T^hat: V -> V is linear. Then a subspace S of V is called
an invariant subspace of T^hat if T^hat(S) is a subset of S.

Ex: Suppose T^hat(x) = lambda*x, so (lambda,x) is an eigenpair.
Then S = span(x) is invariant.

Ex: Suppose T^hat(s_i) = lambda_i*s_i for i=1,..,m. Then
 S = span(s_1,...,s_m) is invariant.

Not all invariant subspaces are spanned by eigenvectors (unless
T^hat is diagonalizable, in which case we don't need the Jordan form!). 
What step (2) does is to compute a basis for an invariant subspace 
for each distinct eigenvalue lambda_i. To see this, let us interpret
Def 1 for matrices:

Lemma 1: Suppose S = [s_1,...,s_m] is an n by m matrix whose columns
are independent and span an m dimensional invariant subspace of the
n by n matrix T. Then there is an m by m matrix B such that
T*S = S*B. Furthermore, every eigenvalue of B is an eigenvalue of T.

Proof:  span(S) invariant means that for each column s_i of S:
           T*s_i = sum_{j=1 to m} B_ji * s_j   
        for some constants B_ji. Writing all these together gives

  T*S = [T*s_1,...,T*s_m]
      = [sum_{j=1 to m} B_j1 * s_j , ... , sum_{j=1 to m} B_jm * s_j]
      = S * B

        Furthermore, if B*x = lambda*x then
          T*S*x = S*B*x = lambda*S*x
        so lambda is an eigenvalue of T with eigenvector S*x.
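
(A minimal numeric illustration of Lemma 1, as a Python/numpy sketch
with made-up numbers: for an upper triangular T, the leading columns
of the identity span an invariant subspace, and B is the leading
block of T.)

    import numpy as np

    T = np.array([[1., 2., 3.],
                  [0., 4., 5.],
                  [0., 0., 6.]])
    S = np.eye(3)[:, :2]     # columns e_1, e_2: an invariant subspace
    B = T[:2, :2]            # the corresponding 2 by 2 matrix B
    assert np.allclose(T @ S, S @ B)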

Lemma 2: Suppose S_1,...,S_k are n by m_i matrices, each forming a basis 
for an invariant subspace of T, i.e. T*S_i = S_i*B_i for each i,
and all the columns of S_1,...,S_k taken together are independent.
Suppose also that altogether they span C^n: m_1 + ... + m_k = n.
Let S = [S_1,...,S_k] be an n by n matrix. Then S is
invertible and "block diagonalizes" T:
   S^{-1} * T * S = diag(B_1,...,B_k) = B

Proof: By assumption 
          T*S = T*[S_1,...,S_k]
              = [T*S_1,...,T*S_k]
              = [S_1*B_1,...,S_k*B_k]
              = [S_1,...,S_k] * diag(B_1,...,B_k)
              = S * diag(B_1,...,B_k)
              = S * B
       Multiply both sides on the left by S^{-1} to get the result.
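
(Again a small Python/numpy check of Lemma 2, with made-up numbers:
two eigenvector "blocks" of size 1 block-diagonalize a 2 by 2 T.)

    import numpy as np

    T = np.array([[1., 1.],
                  [0., 2.]])
    S1 = np.array([[1.], [0.]])    # eigenvector for lambda = 1
    S2 = np.array([[1.], [1.]])    # eigenvector for lambda = 2
    S = np.hstack([S1, S2])        # S = [S_1, S_2] is invertible
    B = np.linalg.inv(S) @ T @ S
    assert np.allclose(B, np.diag([1., 2.]))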

Step (2) is based on Lemma 2 where each diagonal block B_i = T_ii
from the Schur form in step (1). We show how to compute S_i explicitly:

Lemma 3: There is an invariant subspace S_i of T for each T_ii,
         i.e. T*S_i = S_i * T_ii, of the forms:

          m_1                          m_i
   S_1 = [ I ] m_1         or  S_i = [ X_i ]  mm_1 = m_1+...+m_{i-1} 
         [ 0 ] m_2+...+m_k           [  I  ]  m_i
                                     [  0  ]  mm_2 = m_{i+1}+...+m_k

   where each matrix X_i satisfies a linear system of equations that 
   we can solve "by substitution". (A main reason for starting with
   the Schur form T is to make finding S_i so easy!).
   All the columns of all the S_1,...,S_k are independent.

Proof: It is easy to see that T * S_1 = S_1 * T_11, so we
       concentrate on showing T * S_i = S_i * T_ii.
       We write T as a block matrix with the same sized blocks as S_i
       to make it easier:
              
                mm_1 m_i  mm_2
          T = [ R_11 R_12 R_13 ]  mm_1
              [  0   R_22 R_23 ]  m_i
              [  0    0   R_33 ]  mm_2

       where each diagonal block R_ii is square, and R_22 = T_ii. Then
          T*S_i = [ R_11 R_12 R_13 ] * [ X_i ]
                  [  0   R_22 R_23 ]   [  I  ]
                  [  0    0   R_33 ]   [  0  ]
                = [ R_11*X_i + R_12 ]
                  [      R_22*I     ]
                  [       0         ]
       Setting this equal to S_i*T_ii = S_i*R_22 yields
                  [ R_11*X_i + R_12 ] = [ X_i ] * R_22 = [ X_i*R_22 ]
                  [      R_22*I     ]   [  I  ]          [ I * R_22 ]
                   [       0         ]   [  0  ]          [ 0 * R_22 ]
       or   R_11*X_i + R_12 = X_i*R_22
                       R_22 = R_22
                          0 = 0
        Only the first of these equations depends on X. Writing X
        instead of X_i for simplicity, it is

        (*)  X*R_22 - R_11*X  = R_12

        This is actually a system of linear equations for the 
       entries of the rectangular matrix X, called the Sylvester Equation.
       To see this note that the left hand side is a linear function of X:
           (c*X+Y)*R_22 - R_11*(c*X+Y) 
                = c*(X*R_22 - R_11*X) + (Y*R_22 - R_11*Y)
       We describe how to solve it in two equivalent ways.

       First, we can change (*) into a standard linear system by rearranging all
       the unknown entries of the mm_1 by m_i matrix X into a big vector x of 
       dimension r = mm_1 * m_i, and rewriting X*R_22 - R_11*X as a big r by r 
       matrix L times x.  Suppose we arrange X into x in the right order, 
       namely as illustrated below for X being 3 by 4:
           X = [ x_3  x_6  x_9  x_12 ]
               [ x_2  x_5  x_8  x_11 ]
               [ x_1  x_4  x_7  x_10 ]
       (column by column, left to right, bottom to top within a column). Then 
       the corresponding L will be a lower triangular matrix, so we can solve 
        (*) for X "by substitution", one entry at a time in the order shown. 
       We explain why L is nonsingular below.
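
        (Here is the matrix L made explicit, as a hedged Python/numpy
        sketch using Kronecker products, not part of the notes; the
        matrices R_11, R_22, R_12 are the ones from the Ex further
        below.  With numpy's standard column-major vec, L would only
        be block triangular, so we first reverse the row order of X
        and R_12; that is exactly the "bottom to top within a column"
        ordering, and it makes L lower triangular with diagonal
        entries (R_22)_jj - (R_11)_ii.)

            import numpy as np

            R11 = np.array([[2., 3.], [0., 4.]])
            R22 = np.array([[1., 2.], [0., 1.]])
            R12 = np.array([[5., 6.], [7., 8.]])
            p, q = R12.shape

            F = np.eye(p)[::-1]     # permutation reversing row order
            # With Y = F*X the equation reads Y*R22 - (F*R11*F)*Y = F*R12,
            # and vec() of the left side is L @ vec(Y), with L as below.
            L = np.kron(R22.T, np.eye(p)) - np.kron(np.eye(q), F @ R11 @ F)
            assert np.allclose(L, np.tril(L))     # L is lower triangular
            y = np.linalg.solve(L, (F @ R12).flatten(order='F'))
            X = F @ y.reshape((p, q), order='F')  # undo the reordering
            assert np.allclose(X @ R22 - R11 @ X, R12)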

       Second, we do the same thing without explicitly rearranging X, just by 
       noting that looking at the (i,j) entry on each side of (*) yields
           (R_12)_ij = (X*R_22 - R_11*X)_ij
                     = sum_{k=1 to m_i} X_ik * (R_22)_kj
                         - sum_{k=1 to mm_1} (R_11)_ik * X_kj
                     = sum_{k=1 to j} X_ik * (R_22)_kj
                         - sum_{k=i to mm_1} (R_11)_ik * X_kj
                       ... since R_11 and R_22 are triangular
                     = sum_{k=1 to j-1} X_ik * (R_22)_kj
                       + X_ij * (R_22)_jj
                         - sum_{k=i+1 to mm_1} (R_11)_ik * X_kj
                         - (R_11)_ii * X_ij

       or, solving for the X_ij terms,

       (**)   X_ij * ( (R_22)_jj - (R_11)_ii )
                  = (R_12)_ij - sum_{k=1 to j-1} X_ik * (R_22)_kj
                              + sum_{k=i+1 to mm_1} (R_11)_ik * X_kj

       To see that we can solve this for X_ij, we note 2 things:
          (1) We can divide by (R_22)_jj - (R_11)_ii, which is
              nonzero because (R_22)_jj and (R_11)_ii are different 
              eigenvalues of T, since they come from different
              diagonal blocks of T.  This is why we grouped the 
              eigenvalues of T into different blocks. (The triangular
              matrix L described above has diagonal entries 
              (R_22)_jj - (R_11)_ii for all i and j).
          (2) If we solve for the X_ij in the order described above,
              column by column, left to right, bottom to top 
              within a column, then the right hand side of (**)
              will only depend on values of X_ik to the left of X_ij,
              and values of X_kj below X_ij, i.e. values we have
              already computed.

ASK&WAIT?  Why are all the columns of all the S_1,...,S_k independent?

Ex: Suppose we are trying to solve X*R_22 - R_11*X = R_12 of the form
      [ x2 x4 ] * [ 1 2 ]  -  [ 2 3 ] * [ x2 x4 ] = [ 5 6 ]
      [ x1 x3 ]   [ 0 1 ]     [ 0 4 ]   [ x1 x3 ]   [ 7 8 ]
    where we have numbered the unknown entries of X in the order in which
    we will solve for them:
      (1) compute (2,1) entry of both sides, yielding
           (x1*1 + x3*0) - (0*x2 + 4*x1) = 7  
          =>  (-3)*x1 = 7 => x1 = -7/3
      (2) compute (1,1) entry of both sides, yielding
           (x2*1 + x4*0) - (2*x2 + 3*x1) = 5  
          =>  (-1)*x2 = 5 + 3*x1 => x2 = 2
      (3) compute (2,2) entry of both sides, yielding
           (x1*2 + x3*1) - (0*x4 + 4*x3) = 8  
          =>  (-3)*x3 = 8 - 2*x1 => x3 = -38/9
      (4) compute (1,2) entry of both sides, yielding
           (x2*2 + x4*1) - (2*x4 + 3*x3) = 6  
          =>  (-1)*x4 = 6 - 2*x2 + 3*x3 => x4 = 32/3
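
The substitution recipe (**) is short to code. Here is a hedged
Python/numpy sketch (the helper name is ours, not the notes'); run on
the Ex above, it reproduces x1,...,x4.

    import numpy as np

    def solve_sylvester_by_substitution(R11, R22, R12):
        # Solve X*R22 - R11*X = R12, with R11 (p by p) and R22 (q by q)
        # upper triangular and sharing no diagonal entries, via formula
        # (**): columns left to right, bottom to top within each column.
        p, q = R12.shape
        X = np.zeros((p, q))
        for j in range(q):                    # columns left to right
            for i in range(p - 1, -1, -1):    # bottom to top
                rhs = (R12[i, j]
                       - X[i, :j] @ R22[:j, j]        # entries to the left
                       + R11[i, i+1:] @ X[i+1:, j])   # entries below
                X[i, j] = rhs / (R22[j, j] - R11[i, i])
        return X

    R11 = np.array([[2., 3.], [0., 4.]])
    R22 = np.array([[1., 2.], [0., 1.]])
    R12 = np.array([[5., 6.], [7., 8.]])
    X = solve_sylvester_by_substitution(R11, R22, R12)
    assert np.allclose(X @ R22 - R11 @ X, R12)
    # X = [[2, 32/3], [-7/3, -38/9]]: the x2, x4 / x1, x3 above.

(scipy also ships scipy.linalg.solve_sylvester(A, B, Q) for the form
A*X + X*B = Q; taking A = -R11, B = R22, Q = R12 solves the same
equation.)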

Now we go on to step (3), computing the Jordan Canonical Form of
an upper triangular matrix with a single eigenvalue. For simplicity 
of notation we write this matrix as T = lambda*I + N, where N is 
"stricly upper triangular" (nonzero only above the diagonal). Since 
   S*T*S^{-1} = S*(lambda*I + N)*S^{-1} = lambda*I + S*N*S^{-1}
is in JCF if and only if S*N*S^{-1} is in JCF, we assume without
loss of generality that lambda = 0. Suppose that T is m by m.

Here we illustrate our approach in a simple 4 by 4 case. Note that
for any 4 by 4 matrix X = [ x_1 x_2 x_3 x_4 ] we have

  X * J0 = [ x_1 x_2 x_3 x_4 ] * [ 0 1 0 0 ] = [ 0 x_1 x_2 x_3 ]
                                 [ 0 0 1 0 ]
                                 [ 0 0 0 1 ]
                                 [ 0 0 0 0 ]

Therefore if we can find an x_4 such that 
          x_3 = N*x_4 is nonzero, and
          x_2 = N*x_3 = N^2*x_4 is nonzero, and
          x_1 = N*x_2 = N^3*x_4 is nonzero, and
          0   = N*x_1 = N^4*x_4

then we will get
     N * X = N * [ x_1 x_2 x_3 x_4 ] 
           = [ N*x_1 N*x_2 N*x_3 N*x_4 ]
           = [     0   x_1   x_2   x_3 ]
           = X * J0

If it is also the case that X is invertible, then we can multiply on 
the right by X^{-1} to get N = X * J0 * X^{-1} which is the desired 
Jordan Canonical Form. Therefore, we need to find "chains" of 
independent vectors like x_1,...,x_4 above that are related by 
x_i = N * x_{i+1}, with x_1 = N^3*x_4 nonzero and N^4*x_4 = 0.
Each such chain will correspond to a Jordan block.

Now we go back to the general case.
Lemma 4: N is "nilpotent", i.e. N^m = 0.

Ex: N = [ 0 1 2 3 ], N^2 = [ 0 0 4 17 ], N^3 = [ 0 0 0 24 ], N^4 = 0
        [ 0 0 4 5 ]        [ 0 0 0 24 ]        [ 0 0 0  0 ]
        [ 0 0 0 6 ]        [ 0 0 0  0 ]        [ 0 0 0  0 ]
        [ 0 0 0 0 ]        [ 0 0 0  0 ]        [ 0 0 0  0 ]

It is easy to see how to construct the matrix X above needed to reduce to
Jordan Form: An "obvious" nonzero vector x_4 such that x_1 = N^3*x_4 is also
nonzero is e_4. Then 
    x_3 = N * e_4 = [ 3 ],  x_2 = N^2 * e_4 = [ 17 ], x_1 = N^3 * e_4 = [ 24 ]
                    [ 5 ]                     [ 24 ]                    [  0 ]
                    [ 6 ]                     [  0 ]                    [  0 ]
                    [ 0 ]                     [  0 ]                    [  0 ]
and so X = [ x_1 x_2 x_3 x_4] = [ 24 17 3 0 ] which is clearly invertible.
                                [  0 24 5 0 ]
                                [  0  0 6 0 ]
                                [  0  0 0 1 ]
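
(Checking this chain numerically, as a Python/numpy sketch of the
example above:)

    import numpy as np

    N = np.array([[0., 1., 2., 3.],
                  [0., 0., 4., 5.],
                  [0., 0., 0., 6.],
                  [0., 0., 0., 0.]])
    x4 = np.array([0., 0., 0., 1.])    # e_4
    x3 = N @ x4                        # [ 3,  5, 6, 0]
    x2 = N @ x3                        # [17, 24, 0, 0]
    x1 = N @ x2                        # [24,  0, 0, 0]
    X = np.column_stack([x1, x2, x3, x4])
    J0 = np.linalg.inv(X) @ N @ X      # the 4 by 4 Jordan block J0
    assert np.allclose(J0, np.diag(np.ones(3), k=1))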

Proof of Lemma 4: As in the example, we use the fact that N is strictly upper 
triangular.  We number the superdiagonals of N from 0 (the 
main diagonal) through m-1 (the upper right corner).  In other words, 
N_ij is on diagonal j-i.  We use induction on k to show that 
superdiagonals 0 to k of N^{k+1} are zero. The base case is k=0: the 
main diagonal (the 0th diagonal) of N^1 = N is zero, since N is 
strictly upper triangular.
Then a typical entry on diagonal d <= k of N^{k+1} is
   (N^{k+1})_{i,i+d} = (N * N^k)_{i,i+d}
        = sum_{j=1 to m} N_ij * (N^k)_{j,i+d}
        = sum_{j=1 to i} N_ij * (N^k)_{j,i+d}
          + sum_{j=i+1 to m} N_ij * (N^k)_{j,i+d}
        = sum_{j=1 to i} 0 * (N^k)_{j,i+d}
             ... because N is zero on and to the left of the diagonal
          + sum_{j=i+1 to m} N_ij * 0
             ... because N^k is zero on diagonal i+d-j <= d-1 by induction
        = 0

Corollary 1: The m by m matrix A is nilpotent (A^m=0)
if and only if all its eigenvalues are 0.
Proof: If A has all 0 eigenvalues, its Schur form is A = Q * N * Q^*
with N strictly upper triangular (its diagonal entries are the zero
eigenvalues of A); then A^m = Q * N^m * Q^* = 0 by Lemma 4. If A is
nilpotent and A*x = lambda*x with x nonzero, then
0 = A^m*x = lambda^m*x, so lambda = 0.

Def 2: Let lambda be an eigenvalue of any m by m matrix A, and let
       S_i = NullSpace((A - lambda*I)^i).
       Any nonzero x in some S_i is called a generalized eigenvector of A.

Ex: S_1 = NullSpace(A - lambda*I) = span(eigenvectors of A for lambda)
Ex: S_1 = NullSpace(N) = span(eigenvectors of N)

From now on we let S_i = Nullspace(N^i).

Lemma 5: Each S_{i-1} is a subset of S_i, and there is a smallest k 
with 1 <= k <= m such that S_k = C^m.
If x_k is in S_k = C^m but not in S_{k-1}, then the vectors x_1,...,x_{k-1}
defined by x_i = N * x_{i+1} satisfy x_i in S_i, and x_1,...,x_k are
all linearly independent.
    More generally, if x_k^(1) through x_k^(d) are d = m - dim(S_{k-1}) 
linearly independent vectors such that C^m = span(S_{k-1},x_k^(1),...,x_k^(d)),
then the d*k vectors defined by x_i^(j) = N * x_{i+1}^(j) for i=1 to k-1 and 
j=1 to d are all linearly independent.

Proof: If N^{i-1}*x = 0, then also N^i*x = 0, so
S_{i-1} is a subset of S_i. Since N^m = 0 by Lemma 4, there
is some smallest integer k <= m for which N^k = 0, i.e. S_k = C^m,
but N^{k-1} is nonzero. Thus S_{k-1} is a proper subset of S_k = C^m.

Note that x_i = N * x_{i+1} = N^(k-i) * x_k, and so N^i * x_i = N^k * x_k = 0.
Therefore x_i is in S_i.

We assume the x_i^(j) are dependent and seek a contradiction: 
For some a_ij not all zero
   0 = sum_{i=1 to k}  sum_{j=1 to d} a_ij * x_i^(j)     
      = sum_{i=1 to r}  sum_{j=1 to d} a_ij * x_i^(j)     
            ... where r <= k is the largest i with some a_ij nonzero,
                so some a_rj is nonzero
     = sum_{i=1 to r}  sum_{j=1 to d} a_ij * N^(k-i) * x_k^(j)     
           ... since x_i^(j) = N^(k-i) * x_{k}^(j)
     = N^(r-1) * sum_{i=1 to r}  sum_{j=1 to d} a_ij * N^(k-i) * x_k^(j)     
           ... multiply by N^(r-1)
     = sum_{i=1 to r}  sum_{j=1 to d} a_ij * N^(k-i+r-1) * x_k^(j)     
     = sum_{i=1 to r-1}  sum_{j=1 to d} a_ij * N^(k-i+r-1) * x_k^(j)     
         + sum_{j=1 to d} a_rj * N^(k-1) * x_k^(j)
           ... pull out the "i=r terms"
Thus, solving for the "i=r terms" we get that
(*)   N^(k-1) * [ sum_{j=1 to d} a_rj * x_k^(j) ]
is a linear combination of vectors N^(k-i+r-1) * x_k^(j) for i <= r-1,
i.e. for k-i+r-1 >= k. But N^k = 0, so N^(k-i+r-1) = 0, so the vector (*) is 0,
which means the nonzero vector sum_{j=1 to d} a_rj * x_k^(j) is in S_{k-1}. 
This contradicts C^m = span(S_{k-1},x_k^(1),...,x_k^(d)).


As in our 4 by 4 example, we therefore get that for each j
 
N * [x_1^(j) x_2^(j) ... x_k^(j)] = [N*x_1^(j) N*x_2^(j) ... N*x_k^(j)    ] 
                                  = [    0       x_1^(j) ...   x_{k-1}^(j)] 
                                  = [  x_1^(j)   x_2^(j) ...   x_k^(j)    ] * J0
where J0 is a k by k Jordan block with eigenvalue 0.

Thus each vector x_k^(j) in S_k = C^m starts one of the "chains" 
illustrated in our 4 by 4 example.  The chain will have length k, 
and give rise to a Jordan block of dimension k. In other words, we 
will have dim(S_k) - dim(S_{k-1}) Jordan blocks of dimension k.  
We "remove" all the vectors in these chains from the spaces S_i in 
which they live (this decreases the dimensions of all the remaining 
nullspaces S_i by dim(S_k) - dim(S_{k-1}) ).  Then we repeat the 
process on the remaining vectors, getting chains of length k-1 for 
each independent vector remaining in S_{k-1} but not in S_{k-2}, etc.

To summarize, let d_i = dim(S_i), and e_i = d_i - d_{i-1}. Then  
we can read off the Jordan form of N from these dimensions:

   There are e_k               Jordan blocks of size k
   There are e_{k-1} - e_k     Jordan blocks of size k-1
   There are e_{k-2} - e_{k-1} Jordan blocks of size k-2, etc.

Translating back to the general matrix A whose JCF we want, we get

Thm 1: Let lambda be an eigenvalue of the m by m matrix A, 
       let S_i = NullSpace((A - lambda*I)^i), 
       let k be the smallest number for which dim(S_k) reaches its maximum,
       let d_i = dim(S_i) (with d_0 = 0) and 
       let e_i = d_i - d_{i-1}   (with e_{k+1} = 0).
       Then in the JCF of A, for all i <= k there are e_i - e_{i+1}
         Jordan blocks of size i with eigenvalue lambda.

Ex: Suppose A is a single m by m Jordan block. Then for i=1 to m
    S_i = span(e_1,...,e_i), d_i = dim(S_i) = i, and e_i = d_i - d_{i-1} = 1.
    Thus Thm 1 tells us there is e_m - e_{m+1} = 1 Jordan block of size m,
    and e_i - e_{i+1} = 0 blocks of every other size i < m.




Ex: Suppose A has 
        4 Jordan blocks of size 1,
        7 Jordan blocks of size 2,
        0 Jordan blocks of size 3, and
        8 Jordan blocks of size 4.
    Then m = dimension of A = 4*1 + 7*2 + 0*3 + 8*4 = 50, and
      d_1 = dim(S_1) = 4*1 + 7*1 + 0*1 + 8*1 = 19
      d_2 = dim(S_2) = 4*1 + 7*2 + 0*2 + 8*2 = 34
      d_3 = dim(S_3) = 4*1 + 7*2 + 0*3 + 8*3 = 42
      d_4 = dim(S_4) = 4*1 + 7*2 + 0*3 + 8*4 = 50
    Thus e_4 = d_4 - d_3 = 8 so we have e_4       = 8 Jordan blocks of size 4
         e_3 = d_3 - d_2 = 8 so we have e_3 - e_4 = 0 Jordan blocks of size 3
         e_2 = d_2 - d_1 =15 so we have e_2 - e_3 = 7 Jordan blocks of size 2
         e_1 = d_1 - d_0 =19 so we have e_1 - e_2 = 4 Jordan blocks of size 1
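
Thm 1 translates directly into a computation. Here is a hedged
Python/numpy sketch (our own helper; as the closing comments of these
notes warn, numerical rank decisions in floating point are fragile,
so trust it only on small, exact examples):

    import numpy as np

    def jordan_block_counts(A, lam, tol=1e-8):
        # d_i = dim(S_i) = m - rank((A - lam*I)^i);
        # e_i = d_i - d_{i-1}; then there are e_i - e_{i+1}
        # Jordan blocks of size i for eigenvalue lam (Thm 1).
        m = A.shape[0]
        M = A - lam * np.eye(m)
        d = [0]
        P = np.eye(m)
        while True:
            P = P @ M
            d.append(m - np.linalg.matrix_rank(P, tol=tol))
            if d[-1] == d[-2]:      # dim(S_i) stopped growing
                break
        e = [d[i] - d[i-1] for i in range(1, len(d))] + [0]
        return {i: e[i-1] - e[i]
                for i in range(1, len(e)) if e[i-1] - e[i] > 0}

    # A single 3 by 3 Jordan block with eigenvalue 5:
    A = 5 * np.eye(3) + np.diag(np.ones(2), k=1)
    print(jordan_block_counts(A, 5.0))      # {3: 1}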

A complete proof of Thm 1 would use Lemma 5 inductively, extracting the
Jordan blocks of size k, then k-1, and so on.


We end the course with final comments on available software for 
linear algebra. In practice after taking this class you may want
to use the various decompositions we learned about to solve linear
systems of equations, solve least-squares problems, find 
eigenvalues and eigenvectors, etc. Good software for these problems is
widely available. Some of the most widely used such software is called
LAPACK: it is used inside the interactive system Matlab, and it is the
basis of the mathematical software libraries distributed by computer
vendors like IBM, HP and Intel. LAPACK has been developed over many
years in a joint project between UC Berkeley and U Tennessee, with
contributors at other universities and companies and in other
countries. LAPACK (and its parallel version ScaLAPACK) are
freely available at
   www.netlib.org/lapack
   www.netlib.org/scalapack
In the easy-to-use Matlab notation, we can summarize the results
of this course as follows:


  To compute the LU decomposition, type  [L,U,P] = lu(A),
       to get P*A = L*U, or A = P^t * L * U
  To compute the QR decomposition, type [Q,R] = qr(A), 
       to get A = Q*R
  To compute the Schur decomposition, type [U,T] = schur(A), 
       to get A = U * T * U^*
  To diagonalize a matrix (if possible) type [U,Lambda] = eig(A)
       to get A = U * Lambda * U^{-1}
  To compute the Jordan Form, type [V,J] = jordan(A)
       to get A = V * J * V^{-1}
  To compute a function F = f(A) of a matrix, type F = funm(A,f)
       where f is a handle to the scalar function to apply (there are 
       special versions called expm, sqrtm and logm for those functions)

For details on any of these, type "help lu", for example, in Matlab.

The Jordan Form is harder to compute, for reasons better explained in a 
course on numerical methods like Ma128 or Ma221: it works effectively 
only on relatively small matrices with relatively small integer entries, 
and is generally much slower than the other factorizations.  Many of the 
things you might use the Jordan Form for can also be done with the Schur 
form, even if the matrix is not diagonalizable. Again, see Ma128 or 
Ma221 for details.
\end{verbatim}