Math 110 - Fall 05 - Lecture notes # 13 - Sep 28 (Wednesday)

We continue studying the connection between linear transformations 
T:V -> W between finite dimensional vector spaces and matrices,
as well as the connection between the vector space of all 
linear transformations L(V,W) from V to W, and the corresponding 
vector space of all matrices M_{m x n}(F).
Note that for this correspondence to make sense, we need
m = dim(W), n = dim(V), and F to be the common field of the
vector spaces V and W.

Last time we showed that given an ordered basis 
   beta  = {v_1,...,v_n} for V 
we could write any v in V uniquely as v = sum_{i=1 to n} a_i*v_i, and
so represent v by its coordinate vector of coefficients a_i relative to beta:
   [v]_beta = [a_1; ... ; a_n]
The coordinate vectors are themselves members of the vector space F^n.
Since every v in V has such a unique representation, []_beta : V -> F^n 
is a one-to-one correspondence between V and F^n. 
Its inverse function is easily seen to be

Def: []^beta : F^n -> V is defined by 
         [ [x_1;...;x_n] ]^beta = sum_{i=1 to n} x_i*v_i

We showed []_beta is a linear transformation. It is easy to see that
its inverse is linear too:

Lemma 1: []^beta : F^n -> V is linear
  Proof: [c*x+y]^beta = [ [c*x_1+y_1;...;c*x_n+y_n] ]^beta
                      = sum_{i=1 to n} (c*x_i+y_i)*v_i
                      = sum_{i=1 to n} (c*x_i)*v_i + sum_{i=1 to n} y_i*v_i
                      = c* sum_{i=1 to n} x_i*v_i + sum_{i=1 to n} y_i*v_i
                      = c* [x]^beta + [y]^beta
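
As an aside, here is a minimal numerical sketch of []_beta and []^beta,
assuming V = R^n and that the basis vectors v_1,...,v_n are given as the
columns of an invertible matrix B (the function names coords_wrt and
from_coords are made up for this illustration):

    import numpy as np

    def from_coords(B, a):
        # []^beta: given coefficients a = [a_1,...,a_n], return sum_i a_i*v_i,
        # where the columns of B are the basis vectors v_1,...,v_n
        return B @ a

    def coords_wrt(B, v):
        # []_beta: solve B*a = v for the unique coordinate vector a = [v]_beta
        return np.linalg.solve(B, v)

    B = np.array([[1., 1.],
                  [0., 1.]])                 # basis beta = {[1;0], [1;1]} of R^2
    v = np.array([3., 2.])
    a = coords_wrt(B, v)                     # [v]_beta = [1; 2]
    assert np.allclose(from_coords(B, a), v) # []^beta undoes []_beta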

Similarly, given an ordered basis gamma = {w_1,...,w_m} for W, we
can write any w in W uniquely as w = sum_{j=1 to m} b_j*w_j so that
   [w]_gamma = [b_1 ; ... ; b_m] 
is w's coordinate vector relative to gamma. 
In summary, []_beta: V -> F^n is a linear transformation
and a 1-to-1 correspondence:


                []_beta
   x in V       ------>   [x]_beta in F^n 
               <------  
                []^beta

Similarly, []_gamma: W -> F^m is linear and a 1-to-1 correspondence

                []_gamma
   y in W       ------->   [y]_gamma in F^m 
                <-------
                []^gamma

Given beta and gamma, we also showed there was a linear transformation
   []_beta^gamma: L(V,W) -> M_{m x n}(F)
that took any T in L(V,W) and gave a matrix [T]_beta^gamma in M_{m x n}(F):

              []_beta^gamma
 T in L(V,W) --------------> [T]_beta^gamma in M_{m x n}(F) 

In a moment we will ask, and answer affirmatively, 
the natural question as to whether this operation is 
also a 1-to-1 correspondence between L(V,W) and M_{m x n}(F),
as well as what its inverse is.

Before we do this, we recall that we already showed that 
[T]_beta^gamma corresponds to T in a number of important ways:

   The 0 linear transformation ----> the 0 matrix

   Matrix-vector multiplication by A was "the same"
   as applying T to a vector, provided we use the right coordinate vectors:

      y = T(x)  ----> [y]_gamma = [T]_beta^gamma * [x]_beta


   If T is in L(V,W), and U is in L(W,Z), then we can compose them to get
   S = UT in L(V,Z). If delta = {z_1,...,z_p} is an ordered basis for Z,
     composing linear transformations U and T to get S is the same as 
     multiplying their matrices:

      S = UT   ---->  [S]_beta^delta = [U]_gamma^delta * [T]_beta^gamma
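
For a concrete illustration of the first correspondence (a minimal sketch):
take V = P_2(R), the polynomials of degree at most 2, W = P_1(R),
T(p) = p' (differentiation), beta = {1, x, x^2} and gamma = {1, x}.
Then [T]_beta^gamma = [0 1 0 ; 0 0 2], and the rule
[T(p)]_gamma = [T]_beta^gamma * [p]_beta can be checked numerically:

    import numpy as np

    # [T]_beta^gamma for T(p) = p', beta = {1, x, x^2}, gamma = {1, x}:
    # column j holds the gamma-coordinates of T applied to the j-th basis vector
    T_mat = np.array([[0., 1., 0.],
                      [0., 0., 2.]])

    p_beta = np.array([5., 3., 4.])        # p(x) = 5 + 3x + 4x^2, so [p]_beta = [5;3;4]
    dp_gamma = T_mat @ p_beta              # should equal [p']_gamma
    assert np.allclose(dp_gamma, [3., 8.]) # indeed p'(x) = 3 + 8x

A similar numerical check works for the composition rule
[UT]_beta^delta = [U]_gamma^delta * [T]_beta^gamma.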

Let us go back to the transformation

              []_beta^gamma
 T in L(V,W) --------------> [T]_beta^gamma in M_{m x n}(F)

and show it is a 1-to-1 correspondence. This is always true,
but for simplicity we will only prove this when
V = F^n with the standard ordered basis, and
W = F^m with the standard ordered basis.

Def: Let A be in M_{m x n}(F). Define L_A: F^n -> F^m by
     matrix-vector multiplication (by A on the left)
     L_A(x) = L_A([x_1;...;x_n])
            = A*[x_1;...;x_n]
            = [ sum_{i=1 to n} a_1i*x_i ; ... ; sum_{i=1 to n} a_mi*x_i ]
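
As a minimal sketch (assuming F = R and numpy), L_A is just left-multiplication
by A, and the entry-by-entry sum formula above agrees with the built-in
matrix-vector product:

    import numpy as np

    def L(A):
        # returns the linear transformation L_A: x |-> A*x, written out entrywise
        def L_A(x):
            m, n = A.shape
            return np.array([sum(A[i, k] * x[k] for k in range(n)) for i in range(m)])
        return L_A

    A = np.array([[1., 2., 0.],
                  [3., 0., 1.]])
    x = np.array([1., 1., 2.])
    assert np.allclose(L(A)(x), A @ x)     # same as the built-in matrix-vector multiply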

First, a lemma, hopefully familiar from Ma54:

Lemma 2: L_A: F^n -> F^m is a linear transformation:
       L_A(c*x+y) = c*L_A(x) + L_A(y),
     Proof:  L_A(c*x+y) = A*(c*x+y)  ... def of L_A
                   = A*[c*x_1 + y_1 ; ... ; c*x_n + y_n]
                   = [sum_{i=1 to n} a_1i * (c*x_i + y_i) ; 
                      sum_{i=1 to n} a_2i * (c*x_i + y_i) ;  
                        ...
                      sum_{i=1 to n} a_mi * (c*x_i + y_i)]
                        ... def of matrix-vector multiplication
                   =  
                     [c*sum_{i=1 to n} a_1i*x_i + sum_{i=1 to n} a_1i*y_i;
                      c*sum_{i=1 to n} a_2i*x_i + sum_{i=1 to n} a_2i*y_i;
                        ...
                      c*sum_{i=1 to n} a_mi*x_i + sum_{i=1 to n} a_mi*y_i]
                   = c*[sum_{i=1 to n} a_1i*x_i;...;sum_{i=1 to n} a_mi*x_i]
                     + [sum_{i=1 to n} a_1i*y_i;...;sum_{i=1 to n} a_mi*y_i]
                   = c*A*x + A*y
                   = c*L_A(x) + L_A(y)

The last lemma lets us think of L_ as a mapping that takes any
matrix A in M_{m x n}(F) and produces a linear transformation in L(F^n,F^m).
The next Lemma shows that this mapping is linear:


Lemma 3: Suppose A and B are in M_{m x n}(F) and c is in F. Then
         L_(c*A+B) = c*L_A + L_B
         (to interpret this, note that both sides are members of L(F^n,F^m))
     Proof: We show that for all x in F^n, L_(c*A+B)(x) = c*L_A(x) + L_B(x):
         L_(c*A+B)(x) = (c*A+B)*x   ... def of L_
                = [ sum_{i=1 to n} (c*a_1i + b_1i)*x_i ;
                      ...
                    sum_{i=1 to n} (c*a_mi + b_mi)*x_i ]
                = [ c*sum_{i=1 to n} a_1i*x_i + sum_{i=1 to n} b_1i*x_i;
                      ...
                    c*sum_{i=1 to n} a_mi*x_i + sum_{i=1 to n} b_mi*x_i]
                = c*[sum_{i=1 to n} a_1i*x_i; ... ; sum_{i=1 to n} a_mi*x_i]
                  + [sum_{i=1 to n} b_1i*x_i; ... ; sum_{i=1 to n} b_mi*x_i]
                = c*(A*x) + (B*x)
                = c*L_A(x) + L_B(x)   ... def of L_
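
A quick numerical spot-check of Lemmas 2 and 3 (a sketch with made-up random data):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((3, 4))
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    c = 2.5

    # Lemma 2: L_A(c*x + y) = c*L_A(x) + L_A(y)
    assert np.allclose(A @ (c*x + y), c*(A @ x) + A @ y)
    # Lemma 3: L_{c*A+B}(x) = c*L_A(x) + L_B(x)
    assert np.allclose((c*A + B) @ x, c*(A @ x) + B @ x)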

Thm:  Let V = F^n with the standard ordered basis beta, and
      let W = F^m with the standard ordered basis gamma. Then
      []_beta^gamma: L(F^n,F^m) -> M_{m x n}(F) is a 1-to-1 correspondence,
      with inverse L_ : M_{m x n}(F) -> L(F^n,F^m)
  Proof: We need to show that for any A in M_{m x n}(F), and any T in L(F^n,F^m),
         (1) [ L_A ]_beta^gamma = A and (2) L_([T]_beta^gamma) = T
      
         (1): Let e_ni be the i-th standard ordered basis vector of F^n,
              and e_mi be the i-th standard ordered basis vector of F^m.
              We compute [ L_A ]_beta^gamma as follows:
                 L_A(e_nj) = L_A([0;...0;1;0;...;0])  
                        ... with 1 in j-th location, rest zero
                    = A*[0;...0;1;0;...;0]   ... by def of L_A
                    = [ a_1j ; a_2j ; ... ; a_mj ]  
                    = sum_{i=1 to m} a_ij * e_mi    ... by def of e_mi
              so the (i,j)th entry of [ L_A ]_beta^gamma is a_ij, 
              by the definition of []_beta^gamma, as desired

         (2): T(x) = T(sum_{j=1 to n} x_j*e_nj)  ... by def of x
                   = sum_{j=1 to n} x_j*T(e_nj)  ... by linearity of T
                   = sum_{j=1 to n} x_j* sum_{i=1 to m} A_ij*e_mi  
                        ... by def of A = [T]_beta^gamma
                   = sum_{j=1 to n} sum_{i=1 to m} A_ij*x_j*e_mi  
                        ... move x_j into summation
                   = sum_{i=1 to m} sum_{j=1 to n} A_ij*x_j*e_mi  
                        ... reverse order of summation
                   = sum_{i=1 to m} e_mi * ( sum_{j=1 to n} A_ij*x_j )
                        ... move e_mi out of summation
                   = [sum_{j=1 to n} A_1j*x_j ;...; sum_{j=1 to n} A_mj*x_j]
                        ... by def of e_mi
                   = L_A(x)  ... by def of L_A = L_{[T]_beta^gamma}
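
Part (1) of the proof also says how to compute [T]_beta^gamma in practice:
column j is just T applied to the j-th standard basis vector. A minimal sketch,
assuming T is given as a Python function on R^n (the name matrix_of is made up):

    import numpy as np

    def matrix_of(T, n, m):
        # [T]_beta^gamma for the standard bases: column j is T(e_j)
        M = np.zeros((m, n))
        for j in range(n):
            e_j = np.zeros(n)
            e_j[j] = 1.0
            M[:, j] = T(e_j)
        return M

    A = np.array([[1., 2.],
                  [0., 1.],
                  [4., 4.]])
    L_A = lambda x: A @ x
    assert np.allclose(matrix_of(L_A, 2, 3), A)    # part (1): [L_A]_beta^gamma = A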

The connection between applying L_A to a vector and multiplying A times
a vector can be used to derive:

Corollary:
  (1) L_A = L_B iff A = B
  (2) L_{AB} = L_A L_B
  (3) A(BC) = (AB)C, i.e. matrix-matrix multiplication is associative
Proof:(1) A = [L_A]_beta^gamma  ... by part 1 of Thm
            = [L_B]_beta^gamma  ... since L_A = L_B
            = B                 ... by part 1 of Thm
          Conversely, A=B implies L_A=L_B by the definition of L_
      (2) It suffices to show that L_{AB}(e_j) = (L_A L_B)(e_j) for
           every standard ordered basis vector e_j:

           L_{AB}(e_j) = (AB)*[0;...;1;...;0]   
                               ... def of L_{AB} and e_j
                       = [(AB)_1j; (AB)_2j; ... ; (AB)_mj ]
                              ... def of matrix-vector multiply AB
                       = [sum_{i=1 to n} A_1i*B_ij; 
                          sum_{i=1 to n} A_2i*B_ij; ... ;
                          sum_{i=1 to n} A_mi*B_ij ]  
                           ... def of matrix-matrix multiply
                        = A * [B_1j ; B_2j ; ... ; B_nj ]
                           ... def of matrix-vector multiply by A
                       = A * (B * e_j)
                           ... def of matrix-vector multiply by B, e_j
                       = L_A ( L_B( e_j ) )

      (3) It suffices to show L_{A(BC)} = L_{(AB)C}, because of part (1):
            L_{A(BC)} = L_A L_{BC}        ... by part (2)
                     = L_A ( L_B L_C )   ... by part (2) again
                     = ( L_A L_B ) L_C   ... by associativity of
                                             function composition (see App B)
                      = ( L_{AB} ) L_C    ... by part (2)
                     = L_{(AB)C}         ... by part (2)
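
These identities are easy to spot-check numerically (a sketch with made-up
random matrices of compatible sizes):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 5))
    x = rng.standard_normal(4)

    assert np.allclose((A @ B) @ x, A @ (B @ x))   # part (2): L_{AB} = L_A L_B
    assert np.allclose(A @ (B @ C), (A @ B) @ C)   # part (3): associativity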

Example: A graph G is a set of vertices some of which are connected
by edges.  For example, the set of cities (vertices) along with highways 
connecting them is a graph. Another graph is the Web, viewed as 
a set of web pages (vertices) connected by links that you can click on
to get from one page to another (edges). An edge has a direction,
e.g. it may be possible to get from vertex i to vertex j but not
backwards. Thus a highway connecting city i and city j 
creates 2 edges, one from i to j and one from j to i
(unless one direction of the highway is closed by an accident, for example).

The simplest questions we can ask about a graph G are these:
Can you get from vertex i (a city or web page) to vertex j
directly, i.e. by using one edge? If not, what is the fewest
number of edges you have to use to get from i to j? Or is
there no way to get from i to j? If one can get from i to j,
how many ways are there to do it?

We will answer these questions by reducing them to matrix-matrix multiply.
You probably depend on this every day, as does anyone else who uses Google.

Def: Suppose a graph G has n vertices (numbered from 1 to n).
The incidence matrix A of G is an n x n matrix defined as follows:
   A_ij = { 1 if there is an edge connecting vertex i to vertex j
          { 0 if there is no edge connecting vertex i to vertex j
          { 0 if i = j
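
A small sketch of building the incidence matrix from a list of directed edges
(the 3-vertex graph below is made up; vertices are numbered from 1 as in the
definition, but indexed from 0 inside the code):

    import numpy as np

    def incidence_matrix(n, edges):
        # edges is a list of directed pairs (i, j), numbered 1..n as in the notes
        A = np.zeros((n, n), dtype=int)
        for i, j in edges:
            if i != j:                  # A_ii = 0 by definition
                A[i-1, j-1] = 1
        return A

    A = incidence_matrix(3, [(1, 2), (2, 3), (3, 1)])   # the cycle 1 -> 2 -> 3 -> 1
    print(A)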

The matrix A records the answer to the simplest question about G:
it is possible to go directly from i to j iff A_ij = 1.

Now consider taking exactly two edges to get from i to j, i.e.
taking a path from i to k to j for some vertex k. Such
a path exists iff there is an edge from i to k (i.e. A_ik=1)
and an edge from k to j (i.e. A_kj=1), i.e. iff A_ik * A_kj = 1.
If no such path exists then A_ik * A_kj = 0. 
Note that this includes the possibility of asking if there is
a path of length 2 from i to i (which may exist if there are
edges from i to j and j to i for some j neq i).
Thus we can count the number of such paths for each possible value of k:
   number of paths of length 2 from i to j 
      = sum_{k = 1 to n except i and j} A_ik * A_kj
      = sum_{k = 1 to n} A_ik * A_kj   ... since A_ii = A_jj = 0
      = (A*A)_ij
In other words the matrix A*A = A^2 has entries that count the 
number of length 2 paths between vertices: 
   (A^2)_ij = # paths from i to j of length 2

Applying induction, we can show

Thm: If A is the incidence matrix of a graph, then (A^m)_ij is the 
     number of paths of exactly length m from i to j
 Proof: We have already shown this for m=1 and 2. Now assume it is 
        true for m, and prove it for m+1:
        # paths of length m+1 from i to j
           = sum_{k=1 to n except j} 
              (# paths of length m from i to k, if there is also an edge from k to j)
                ... note that k=i is possible
           = sum_{k=1 to n except j} 
              (# paths of length m from i to k) * A_kj   ... by def of A
           = sum_{k=1 to n except j} 
              (A^m)_ik * A_kj     ... by def of A^m
           = sum_{k=1 to n} 
              (A^m)_ik * A_kj     ... since A_jj = 0
           = (A^m*A)_ij           ... by def of matrix-matrix multiply
           = (A^{m+1})_ij         ... by def of A^{m+1}
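
Here is a sketch that checks the theorem on a small made-up graph by comparing
(A^m)_ij against brute-force enumeration of all paths of length m:

    import numpy as np
    from itertools import product

    def count_paths_brute(A, i, j, m):
        # count sequences i = k_0 -> k_1 -> ... -> k_m = j that follow edges of A
        n = A.shape[0]
        total = 0
        for mid in product(range(n), repeat=m-1):
            path = (i,) + mid + (j,)
            total += all(A[path[t], path[t+1]] for t in range(m))
        return total

    A = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [0, 1, 0]])            # a made-up 3-vertex graph (0-based indices)
    m = 3
    Am = np.linalg.matrix_power(A, m)
    for i in range(3):
        for j in range(3):
            assert Am[i, j] == count_paths_brute(A, i, j, m)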

ASK & WAIT: suppose there are 4 vertices, with edges 1 -> 2 <-> 3 -> 4,
i.e. from 1 to 2, 2 to 3, 3 to 2, and 3 to 4.
How many paths of length 5 are there from 1 to 4?

Corollary: If A is the incidence matrix of a graph then
   (A^m + A^{m-1} + ... + A)_ij is the number of paths of length
    at most m from i to j.

All the answers to our questions so far have been expressed using
products of matrices.  Sometimes it is useful to express the answers
in terms of multiplying matrices times vectors, because these
take less time to compute. Let us briefly discuss these costs.

Multiplying two n-by-n matrices C = A*B means evaluating the formula
   C_ij = sum_{k=1 to n} A_ik*B_kj
for i and j varying between 1 and n. Doing this in the most obvious
way costs n multiplications and n-1 additions for each C_ij, i.e.
2n-1 arithmetic operations per entry. Doing this for all n^2 different C_ij
therefore costs n^2*(2n-1) ~ 2n^3 arithmetic operations. If n > 8 billion,
as it does for the Google matrix, then 2n^3 > 10^30. If we could
somehow use a million 10GHz computers (faster than current PCs),
i.e. computers that do 10^10 operations/second,
it would take 
   10^30 operations/(10^6 * 10^10 operations/second 
                          * 60 seconds/minute
                          * 60 minutes/hour
                          * 24 hours/day
                          * 365 days/year)
  > 3 million years
to multiply the Google matrix times itself. 
This is too long to be useful. One way to do less work
is to take advantage of the fact that most entries are
zero, and just skip them. Another way is to use
matrix-vector multiplication. This is because y = A*x, or
   y_i = sum_{j=1 to n} A_ij*x_j
still costs 2n-1 arithmetic operations for each y_i,
but there are only n y_i, for a total cost of n*(2n-1) ~ 2n^2,
a factor of n less. So instead of 3 million years, it would take
  3 million years / 8 billion ~ 3 hours
This is a lot better. But Google still needs to avoid multiplying
by all the zeros to make this practical.
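
For instance, if we only store and use the nonzero entries (a minimal sketch
using scipy.sparse, assuming it is available), a matrix-vector multiply costs
roughly 2*(number of nonzeros) operations instead of 2n^2:

    import numpy as np
    from scipy.sparse import csr_matrix

    A_dense = np.array([[0., 2., 0.],
                        [0., 0., 0.],
                        [1., 0., 3.]])
    A_sparse = csr_matrix(A_dense)       # stores only the 3 nonzero entries
    x = np.array([1., 1., 1.])
    assert np.allclose(A_sparse @ x, A_dense @ x)   # same result, far less work when A is mostly zero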

So let's compute the number of paths from i to j from the 
last Corollary, just using matrix-vector multiply.
According to the Corollary, the answer is 
  (A^m + ... + A)_ij = i-th entry of the j-th column of A^m+...+A
                     = i-th entry of (A^m+...+A)*e_j
                       where e_j is the j-th standard basis vector
    = i-th entry of A^m*e_j + A^(m-1)*e_j + ... + A*e_j
    = i-th entry of A*(A*(...(A*e_j)...)) + ... + A*(A*e_j) + A*e_j
         ... where each A^k*e_j is computed by k matrix-vector multiplications
This expression means that we only need to multiply A times a vector
m times to get what we want:
    v = e_j
    s = 0
    repeat m times:
       v = A*v
       s = s+v
Induction shows that at the end, s = A^m*e_j + A^{m-1}*e_j + ... + A*e_j
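
Here is the same loop as runnable code (a minimal sketch, assuming numpy and
an incidence matrix A as above; vertex j is a 0-based index here):

    import numpy as np

    def paths_up_to_length_m(A, j, m):
        # returns the vector (A^m + ... + A) * e_j; by the Corollary its i-th
        # entry is the number of paths of length at most m from i to j
        n = A.shape[0]
        v = np.zeros(n)
        v[j] = 1.0                       # v = e_j
        s = np.zeros(n)                  # s = 0
        for _ in range(m):               # repeat m times
            v = A @ v                    # v = A*v
            s = s + v                    # s = s + v
        return s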