\begin{verbatim}
Math 110 - Fall 05 - Lectures notes # 39 - Dec 2 (Friday)

We begin Chapter 7, starting with a summary of the main points
of the chapter. We will only cover some of it.

In the last chapter we discussed using unitary similarity
transformations Q to either make a complex matrix A upper triangular:
   A = Q * T * Q^*   ... the Schur Factorization
(where Q^* * Q = I and T is upper triangular)
or to diagonalize A if possible, which is possible if and only
if A is "normal": A^* * A = A * A^*

Chapter 7 discusses the similarity transformation that is "closest"
to diagonal, for any A, diagonalizable or not.
This matrix factorization is called the Jordan Canonical Form:
   A = S * J * S^{-1}
Here J is upper triangular, with the eigenvalues on the diagonal,
and possibly with some 1s right above the diagonal, and the
other matrix entries 0:

Thm (Jordan Canonical Form, or JCF). Every square complex
matrix A is similar to a matrix in JCF: A = S * J * S^{-1},
where J has the following properties:
   (1) J = diag(J_1,J_2,...,J_k) is block diagonal. 
       Each J_i is called a Jordan block.
   (2) Each J_i is upper triangular with an eigenvalue lambda_i
       on the diagonal, 1s right above the diagonal, and 0s elsewhere:
         J_i = [ lambda_i    1      0 0  ...         0    ]
               [    0     lambda_i  1 0  ...         0    ]
               [                    ...                   ]
               [    0         ...       lambda_i     1    ]
               [    0         ...          0     lambda_i ]
   (3) The lambda_i may or may not be distinct
Up to the order of the blocks J_i on the diagonal, the JCF is unique. In other 
words, two matrices are similar if and only if they have the same Jordan blocks.

Ex: The following matrices have the same eigenvalues and multiplicities,
but different JCFs, so no two of them are similar:
    A1 = [ 1 0 0 0 ], A2 = [ 1 1 0 0 ], A3 = [ 1 0 0 0 ], A4 = [ 1 1 0 0 ]
         [ 0 1 0 0 ]       [ 0 1 0 0 ]       [ 0 1 0 0 ]       [ 0 1 0 0 ]
         [ 0 0 2 0 ]       [ 0 0 2 0 ]       [ 0 0 2 1 ]       [ 0 0 2 1 ]
         [ 0 0 0 2 ]       [ 0 0 0 2 ]       [ 0 0 0 2 ]       [ 0 0 0 2 ]
In fact, any 4 by 4 matrix with two eigenvalues equal to 1 and 
two eigenavalues equal to 2 must be similar to one of A1 through A4.

Before we prove this Theorem, we show what it is good for:

(1) Computing eigenvectors
(2) Computing A^m explicitly, for any A
(3) Deciding if the limit of A^m exists, as m -> infinity
(4) Computing any function f(A) explicitly, for any A
(5) Solving the differential equation dx(t)/dt = A*x(t) or
    dx(t)/dt = A*x(t) + f(t), explicitly, for any A

(1) Computing eigenvectors: A*x = lambda*x means S*J*S^{-1}*x = lambda*x
    or J*S^{-1}*x = lambda*S^{-1}*x or J*y = lambda*y where y = S^{-1}*x.
    So it suffices to compute eigevectors y of J to get eigenvectors
    x = S*y of A. This means we need to compute the null space of J - lambda*I.

    Let's start with J equal to a single Jordan block with eigenvalue lambda:
      J - lambda*I = [ 0 1 0 ... 0 ]
                     [ 0 0 1 ... 0 ]
                     [     ...     ]
                     [ 0 0 ... 0 1 ]
                     [ 0 0 ... 0 0 ]
    It is easy to see the e_1 is in the null space, since the first
    column of J - lambda*I is 0. Since the other columns of J - lambda*I
    are linearly independent (being e_1 through e_{n-1}), e_1 is
    the only eigenvector of a single Jordan block.

    Now consider the general case J = diag(J_1,...,J_k), where
    each J_i is a Jordan block with eigenvalue lambda_i and dimension n_i.
    Then J - lambda*I = diag(J_1 - lambda*I,...,J_k - lambda*I) 
    where each I has the same dimension n_i as the J_i it is being
    subtracted from. Then (J - lambda*I)*y = 0 means
      0 = diag(J_1 - lambda*I,...,J_k - lambda*I) * [ y_1 ]   n_1
                                                    [ y_2 ]   n_2
                                                    [ ... ]   
                                                    [ y_k ]   n_k
        = [ (J_1 - lambda_I)*y_1 ]   n_1
          [ (J_2 - lambda_I)*y_2 ]   n_2
          [        ...           ]
          [ (J_k - lambda_I)*y_k ]   n_k
    so that each y_i must be in the nullspace of J_i - lambda*I.
    When lambda neq lambda_i, then J_i - lambda*I has 
    lambda_i - lambda neq 0 on its diagonal, and
    so is nonsingular, and so y_i = 0. 
    When lambda = lambda_i, then we showed above that either y_i = 0 or
    y_i = [ 1, 0, ..., 0]^t.

    In other words, any eigenvector for lambda is a linear combination
    of the unique eigenvectors of the Jordan blocks J_i for which
    lambda_i = lambda. If there are m such Jordan blocks, the dimension
    of the eigenspace corresponding to lambda is therefore m.

(2) Computing A^m explicitly, for any A. Since
    A^m = (S * J * S^{-1})^m
        = S * J * S^{-1} * S * J * S^{-1} * ... * S * J * S^{-1}
                 ... m times
        = S * J * J * ... * J * S^{-1}  
                 ... cancelling the m-1 pairs S^{-1} * S
        = S * J^m * S^{-1}
    it suffices to compute J^m. And since
    J^m = [ J_1  0  0 ... 0 ]^m = [ J_1^m   0    0  ...    0   ]
          [  0  J_2 0 ... 0 ]     [   0   J_2^m  0  ...    0   ]
          [        ...      ]     [             ...            ]
          [  0   0 ... 0 J_k]     [   0     0   ...  0   J_k^m ]
    it suffices to compute the m-th power of a single Jordan block.

    Let's do some examples. For simpler notation, we let x be the 
    eigenvalue of the single Jordan block J.
    If J = [ x ] is 1 x 1, then of course J^m = [ x^m ].
    If J = [ x 1 ], then J^2 = [ x^2 2*x ], J^3 = [ x^3 3*x^2 ]
           [ 0 x ]             [  0  x^2 ]        [  0  x^3   ]
    J^4 = [ x^4 4*x^3 ], ...
          [  0  x^4   ]
    A natural conjecture is that J^m = [ x^m  m*x^(m-1) ]
                                       [  0   x^m       ]
    which we could prove by induction. Here is another example:


    Ex:  J = [ 0 1 0 0 0 ], J^2 = [ 0 0 1 0 0 ], J^3 = [ 0 0 0 1 0 ]
             [ 0 0 1 0 0 ]        [ 0 0 0 1 0 ]        [ 0 0 0 0 1 ]
             [ 0 0 0 1 0 ]        [ 0 0 0 0 1 ]        [ 0 0 0 0 0 ]
             [ 0 0 0 0 1 ]        [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
             [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
         
         J^4=[ 0 0 0 0 1 ], J^5 = [ 0 0 0 0 0 ]
             [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
             [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
             [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
             [ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]

    Lemma 1: Let N be an n by n Jordan block with eigenvalue = 0. Then 
             N^m is zeros except for the m-th diagonal above the main 
             diagonal (the m-th "superdiagonal"), which is all ones.
    Proof: Write N^(m+1) = N^m * N, and note that since
           N = [ 0, e_1, e_2,... e_{n-1} ], multiplying any 
           matrix A = [a_1, a_2,...,a_n] by N yields
           A*N = [A*0, A*e_1, A*e_2,..., A*e_{n-1}]
               = [ 0 ,  a_1 ,  a_2, ... , a_{n-1} ]
           i.e. it moves all the columns right by one, and makes
           the first column 0.

    Ex:  J = [ 1 1 0 0 0 ], J^2 = [ 1 2 1 0 0 ], J^3 = [ 1 3 3 1 0 ]
             [ 0 1 1 0 0 ]        [ 0 1 2 1 0 ]        [ 0 1 3 3 1 ]
             [ 0 0 1 1 0 ]        [ 0 0 1 2 1 ]        [ 0 0 1 3 3 ]
             [ 0 0 0 1 1 ]        [ 0 0 0 1 2 ]        [ 0 0 0 1 3 ]
             [ 0 0 0 0 1 ]        [ 0 0 0 0 1 ]        [ 0 0 0 0 1 ]
         
         J^4=[ 1 4 6 4 1 ], J^5 = [ 1 5 10 10  5]
             [ 0 1 4 6 4 ]        [ 0 1  5 10 10]
             [ 0 0 1 4 6 ]        [ 0 0  1  5 10]
             [ 0 0 0 1 4 ]        [ 0 0  0  1  5]
             [ 0 0 0 0 1 ]        [ 0 0  0  0  1]

    Note appearance of Pascal's triangle in the rows of these matrices:

                               1
                             1   1
                           1   2   1
                         1   3   3   1
                       1   4   6   4   1
                     1   5  10  10   5   1




    Thm 2: Let J be an n by n Jordan block with eigenvalue x.  Then
             (J^m)_ij = m!/((m-j+i)! *  (j-i)!) * x^(m-j+i) 
                      = (  m  ) * x^(m-j+i) 
                        ( j-i )
                      = "m choose j-i" * x^(m-j+i)
           where we interpret "m choose j-i" = 0 if j-i>m  In other words

                (m)*x^m = x^m  appears all along the main diagonal
                (0)

                (m)*x^(m-1) = m*x^{m-1} appears along superdiagonal 1
                (1)

                (m)*x^(m-2) = m*(m-1)/2*x^{m-2} appears along superdiagonal 2
                (2)
                
                (m)*x^(m-i) = (m!/(i! * (m-i)!)) * x^(m-i)   appears along
                (i)                                          superdiagonal i

    Proof 1: We use the binomial theorem: Write J = x*I + N,
             where N contains just the offdiagonal 1s of J. Then
             J^m = (x*I + N)^m 
                 = sum_{i=0 to m} (m choose i)* (x*I)^(m-i) * N^i
             (The proof that this formula is true is the same
              induction on m as in the scalar case, but depends on
              the fact the x*I and N commute.)
             From Lemma 1 we know that N^i has nonzeros only
             on superdiagonal i (or is zero if i >= n). Thus the sum is
             J^m = sum_{i=0 to min(n-1,m)} (m choose i) * x^(m-i) * N^i
             So since (J^m)_ij is on diagonal j-i, the only
             nonzero part of the sum is
             (J^m)_ij = (m choose j-i) * x^(m-j+i) * (N^(j-i))_ij
                      = (m choose j-i) * x^(m-j+i)  ... as desired.

    Here is a proof that does not assume the binomial theorem:

    Lemma 2: Let J' by an n by n Jordan block with eigenvalue 1. Then
             (J'^m)_ij = m!/((m-j+i)! *  (j-i)!) 
                      = (  m  ) 
                        ( j-i )
                      = "m choose j-i" 
             where we interpret "m choose j-i" = 0 if j-i>m.

    Proof: The proof is the same induction as for Pascal's Triangle
           or the binomial theorem.
           When j = i, (J'^m)_ij = 1 = m!/(m! * 0!) as desired.
           When j > i
            (J'^(m+1))_ij = (J' * J'^m)_ij
                 = sum_{k=1 to n} J'_ik * (J'^m)_kj   
                       ... definition of matrix multiply
                 = sum_{k=i to i+1} (J'^m)_kj
                       ... since only nonzeros in J' are J'_ii = J'_{i,i+1} = 1
                 = (J'^m)_ij + (J'^m)_{i+1,j}
                 = m!/((m-j+i)! * (j-i)!)  *  m!/((m-j+i+1)! * (j-i-1)!)
                 = m!/((m-j+i)! * (j-i-1)!) * ( 1/(j-i) + 1/(m-j+i+1) )
                 = m!/((m-j+i)! * (j-i-1)!) * ( (m+1)/((j-i)*(m-j+i+1)) )
                 = (m+1)!/((m-j+i+1)! * (j-i)!)
                 = "m+1 choose j-i"   ... as desired

    Different proof of Thm 2: We reduce to the case in the above example, 
    where Pascal's Triangle appeared. For simplicity we look at n=5; 
    all other dimensions are similar.  Assume the eigenvalue x neq 0.  
    Let D = diag(1,x,x^2,x^3,x^4).  Then we can easily confirm that

              J = [ x 1 0 0 0 ] = D * [ x x 0 0 0 ] * D^{-1} 
                  [ 0 x 1 0 0 ]       [ 0 x x 0 0 ]
                  [ 0 0 x 1 0 ]       [ 0 0 x x 0 ]
                  [ 0 0 0 x 1 ]       [ 0 0 0 x x ]
                  [ 0 0 0 0 x ]       [ 0 0 0 0 x ]
            
                = x * D * [ 1 1 0 0 0 ] * D^{-1} 
                          [ 0 1 1 0 0 ] 
                          [ 0 0 1 1 0 ] 
                          [ 0 0 0 1 1 ] 
                          [ 0 0 0 0 1 ] 

                = x * D * J' * D^{-1}
     Therefore

     J^m = x^m * D * J' * D^{-1} * D * J' * D^{-1} * ... * D * J' * D^{-1}
                       ... m "copies" of D * J' * D^{-1}
         = x^m * D * J'^m * D{-1}  ... cancelling all pairs D^{-1} * D

     where J'^m looks like the matrix in Lemma 2, with entries from
     Pascal's Triangle. Thus for j >= i
        (J^m)_ij = x^m * D_ii * (J'^m)_ij * (D_jj)^{-1}
                 = x^m * x^(i-1) * ( m ) * x^(1-j)
                                   (j-i)
                 = ( m ) * x^(m-j+i)
                   (j-i)

     Note that (m) * x^(m-k) = (d^k x^m / dx^k ) / k!
               (k)
     In other words, if we let f(x) = x^m, we can say that
        (J^m)_ij = 1/(j-i)!  *  d^(j-i) f(x) / dx^(j-i)
     We will see below that this formula is true for any function f(x), 
     not just x^m.

(3) Deciding if the limit of A^m exists, or stays bounded, as m -> infinity.
    By part 2, it suffices to look at what happens to J^m, for
    J a single Jordan block. We leave the proofs of the following results
    as an exercise:
     (1) lim_{k -> infinity} A^k exists if and only if all the Jordan
         blocks of A either
             have eigenvalue |lambda| < 1, or
             have eigenvalue lambda   = 1 and are 1 by 1
         Furthermore the limit is nonzero if and only if there is
         at least one Jordan block with eigenvalue 1
     (2) A^k remains bounded for all k if and only if all the Jordan
         blocks of A either
             have eigenvalue |lambda| < 1, or
             have eigenvalue |lambda| = 1 and are 1 by 1
     (3) In all other cases, A^k is unbounded as k -> infinity:
         at least one Jordan block of A
             has eigenvalue |lambda| > 1, or
             has eigenvalue |lambda| = 1, and is 2 by 2 or larger.


(4) Computing f(A) explicitly, for any A. Suppose 
        f(A) = sum_{i=0 to d} p_i*A^i.
    where d could be finite (so f(A) is a polynomial) or infinite
    (so the f(A) is given by a Taylor expansion).
    Since we know how to compute A^i, we see that all we need to do 
    is compute f(J) for a Jordan block, since
       A = S * diag(J_1,...,J_k) * S^{-1}
    implies
      f(A) = sum_{i=0 to d} p_i * (S * diag(J_1,...,J_k) * S^{-1})^i
           = sum_{i=0 to d} p_i * S * diag(J_1^i,...,J_k^i) * S^{-1}
           = S * [ sum_{i=0 to d} p_i * diag(J_1^i,...,J_k^i) ] * S^{-1}
           = S * [ diag(f(J_1),...,f(J_k)) ] * S^{-1}
    Continuing, for simplicity let J be a single Jordan block 
    with eigenvalue x. Then
       (f(J))_jj = (sum_{i=0 to d} p_i * J^i)_jj
                 =  sum_{i=0 to d} p_i * (J^i)_jj
                 =  sum_{i=0 to d} p_i * x^i   
                 = f(x)
    and for k>j
       (f(J))_jk = (sum_{i=0 to d} p_i * J^i)_jk
                 =  sum_{i=0 to d} p_i * (J^i)_jk
                 =  sum_{i=0 to d} p_i * (i choose k-j) * x^(i-k+j)
                 =  sum_{i=0 to d} p_i/(k-j)! * d^{k-j} x^i / dx^{k-j}
                 =  1/(k-j)! * d^{k-j} ( sum_{i=0 to d} p_i * x^i ) / dx^{k-j}
                 =  1/(k-j)! * d^{k-j} f(x) / dx^{k-j}
    For example, for n=5, we get
       f(J) = [ f(x) f'(x)/1!  f''(x)/2!  f'''(x)/3!  f''''(x)/4! ]
              [ 0    f(x)      f'(x)/1!   f''(x)/2!   f'''(x)/3!  ]
              [ 0    0         f(x)       f'(x)/1!    f''(x)/2!   ]
              [ 0    0         0          f(x)        f'(x)/1!    ]
              [ 0    0         0          0           f(x)        ]

(5) Solving dx(t)/dt = A*x(t) explicitly. Write A = S * J * S^{-1} so
      dx(t)/dt = S*J*S^{-1}*x(t) 
    or
      S^{-1}*dx(t)/dt = J*S^{-1}*x(t) 
    or
      d(S^{-1}*x(t))/ dt = J*(S^{-1}*x(t))
    or
      dy(t)/ dt = J*y(t)    ... where y(t) = S^{-1}*x(t)
    If J = diag(J_1,...,J_k) where J_i is n_i by n_i, and
      y = [y_1]  n_1
          [y_2]  n_2
          [...]  
          [y_k]  n_k
    then dy(t)/dt = J*y(t) implies that for each i we get
      dy_i(t)/dt = J_i * y_i(t)
    so it suffices to solve a differential equation with a single Jordan block.
    Earlier we showed the solution to the differential equation
      d y(t)/dt = A*y(t)
    for A diagonalizable was 
      y(t) = exp(t*A) * y(0)
           = [ sum_{i=0 to infinity} (t^i * A^i / i!) ] * y(0)
    We can differentiate this term by term to confirm it is true for any A:
      d y(t)/dt = d [ sum_{i=0 to infinity} (t^i * A^i / i!) ] * y(0) ] /dt
                = sum_{i=0 to infinity} d(t^i * A^i / i!)/dt ] * y(0) 
                = sum_{i=0 to infinity} (d(t^i)/dt * A^i / i!) ] * y(0) 
                = sum_{i=0 to infinity} (i*t^(i-1)) * A^i / i!) ] * y(0) 
                = sum_{i=1 to infinity} (t^(i-1)) * A^i / (i-1)!) ] * y(0) 
                = sum_{i=0 to infinity} (t^i * A^(i+1) / i!) ] * y(0) 
                = A * sum_{i=0 to infinity} (t^i * A^i / i!) ] * y(0) 
                = A * y(t)   ... as desired.
    When A = J is a single Jordan block with eigenvalue lambda, then we can write
       exp(t*A)_jk = 1/(k-j)! * t^(k-j) * exp(t*lambda)
\end{verbatim}