EIGENVECTORS
============

[I don't know if you were properly taught about eigenvectors here at Berkeley,
 but I sure don't like the way they're taught in most linear algebra books.
 So I'll start with a review.  You all know the definition of an eigenvector:]

Given matrix A, if Av = lambda v for some vector v != 0, scalar lambda,
then v is an _eigenvector_ of A
and lambda is the associated _eigenvalue_ of A.

[But what does that mean?  It means that v is a magical vector that, after
 being multiplied by A, still points in the *same*direction*, or in exactly
 the *opposite*direction*.]

[Figure:  for an eigenvector v with eigenvalue 2, the vectors v, Av, A^2 v,
 and A^3 v all point in the same direction, each twice as long as the last.
 For an eigenvector w with eigenvalue -1/2, the vectors w, Aw, A^2 w, and
 A^3 w flip direction each time and shrink by half.]

[For most matrices, most vectors don't have this property.  So the ones that
 do are special, and we call them eigenvectors.]

[Clearly, when you scale an eigenvector, it's still an eigenvector.  Only the
 direction matters, not the length.  Let's look at a few consequences.]

Theorem:  if v is an eigenvector of A w/eigenvalue lambda,
          then v is an eigenvector of A^k w/eigenvalue lambda^k   [will use later]

Proof:    A^2 v = A (lambda v) = lambda^2 v, etc.

Theorem:  moreover, if A is invertible,
          then v is an eigenvector of A^-1 w/eigenvalue 1/lambda

Proof:    A^-1 v = (1/lambda) A^-1 A v = (1/lambda) v
          [Look at the figures above, but go from right to left.]

[Stated simply:  When you invert a matrix, the eigenvectors don't change, but
 the eigenvalues get inverted.  When you square a matrix, the eigenvectors
 don't change, but the eigenvalues get squared.]

[Those theorems are pretty obvious.  The next theorem is not obvious at all.
 But it's going to be very useful for understanding the effect of a symmetric
 matrix on a vector that is *not* an eigenvector.]

_Spectral_Theorem_:  every symmetric n-by-n matrix has n eigenvectors that are
                     mutually orthogonal, i.e., v_i^T v_j = 0 for all i != j

[This takes about a page of math to prove.  One minor detail is that a matrix
 can have more than n eigenvector directions.  If two eigenvectors happen to
 have the same eigenvalue, then every linear combination of those eigenvectors
 is also an eigenvector.  Then you have infinitely many eigenvector directions,
 but they all span the same plane.  So you just arbitrarily pick two vectors in
 that plane that are orthogonal to each other.  By contrast, the set of
 eigenvalues is always uniquely determined by a matrix, including the
 multiplicity of the eigenvalues.]

We can use them as a basis for R^n.

[Now we can ask, what happens to a vector that *isn't* an eigenvector when you
 apply a symmetric matrix to it?  Express that vector as a linear combination
 of eigenvectors, and look at each eigenvector separately.]

[Figure:  a vector x = v + w that is not an eigenvector.  Each successive
 product Ax, A^2 x, A^3 x points in a different direction, because the
 v component doubles while the w component flips and halves.]

[Every time we apply A to this vector, it changes direction.  We can understand
 it by writing it as a sum of components that don't change direction.]
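[Optional aside:  a quick numerical sanity check of the theorems above.
 This is just a sketch in NumPy; it assumes the 2-by-2 symmetric matrix
 A = [3/4 5/4; 5/4 3/4] used later in these notes, whose eigenvalues are
 2 and -1/2.]

    import numpy as np

    # The symmetric example matrix:  eigenvalues 2 and -1/2,
    # eigenvectors along (1, 1) and (1, -1).
    A = np.array([[0.75, 1.25],
                  [1.25, 0.75]])

    # eigh is for symmetric matrices; it returns orthonormal eigenvector columns.
    lam, V = np.linalg.eigh(A)
    v, l = V[:, 1], lam[1]          # the eigenvector with eigenvalue 2

    # A^k v = lambda^k v:  applying A repeatedly never changes v's direction.
    assert np.allclose(np.linalg.matrix_power(A, 3) @ v, l**3 * v)

    # A^-1 v = (1/lambda) v:  inverting A inverts the eigenvalues.
    assert np.allclose(np.linalg.inv(A) @ v, (1.0 / l) * v)

    # Spectral theorem:  the eigenvectors are mutually orthogonal, so V^T V = I.
    assert np.allclose(V.T @ V, np.eye(2))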
Write x as a linear combo of eigenvectors:  x = alpha v + beta w

    A^k x = alpha lambda_v^k v + beta lambda_w^k w
    [where lambda_v and lambda_w are the eigenvalues of v and w]


Ellipsoids
----------

[Now, let's look at what happens to a quadratic function when we apply a
 symmetric matrix to the space, with these two eigenvectors and eigenvalues.]

f(x) = x^T x      <== quadratic; isotropic; isosurfaces are spheres

g(x) = f(Ax)      <== A symmetric
     = x^T A^2 x  <== _quadratic_form_ of the matrix A^2
                      anisotropic; isosurfaces are ellipsoids

[Show isocontours for f(x) = |x|^2 and g(x) = |Ax|^2, where

     A = [ 3/4  5/4 ]
         [ 5/4  3/4 ]

 Draw the stretch direction (1, 1) & the shrink direction (1, -1).]

[Here's how to think of this:  we stretched the plane on the right along the
 direction with eigenvalue 2, and shrunk the plane along the direction with
 eigenvalue -1/2; then we drew the circular isocontours, like on the left;
 then we undid the stretching and let the plane spring back to its original
 shape.  So the circle turned into an ellipse when the plane sprang back.]

[Looking at the quadratic form is one of the best ways to visually understand
 symmetric matrices and their eigenvectors and eigenvalues.]

g(x) = 1 is an ellipsoid with axes  v_1, v_2, ..., v_n
                         and radii  1/lambda_1, 1/lambda_2, ..., 1/lambda_n
    because if v_i has length 1/lambda_i,
        g(v_i) = f(A v_i) = f(lambda_i v_i) = 1
        ==> v_i lies on the ellipsoid

[The reason the radii are the reciprocals of the eigenvalues is that we're
 stretching the plane by the eigenvalues, then drawing the spheres, then
 letting the plane spring back to its original shape.  When the plane springs
 back, each axis of the spheres gets scaled by 1/eigenvalue.]

bigger eigenvalue  <==>  steeper slope  <==>  shorter ellipsoid radius
    ["steeper slope" really means bigger curvature; the slope varies along an axis]

Alternate interpretation:  ellipsoids are spheres in the _distance_metric_ A^2

Call M = A^2 a _metric_tensor_, because the distance between samples x & z in
the stretched space is

    d(x, z) = |Ax - Az| = sqrt((x - z)^T M (x - z))

[This is the Euclidean distance in the stretched space, but let's think of it
 as an alternative metric for measuring distances in the original space.  It's
 a kind of distance from x to z that's different from the Euclidean distance.]

[I'm calling M a "tensor" because that's standard usage in Riemannian geometry,
 but don't worry about what "tensor" means.  For our purposes, it's a matrix.]

Ellipsoids are "spheres" in this metric:  {x : d(x, center) = isovalue}

A symmetric matrix B is
    _positive_definite_      if w^T B w > 0 for all w != 0   <==> all eigenvalues positive
    _positive_semidefinite_  if w^T B w >= 0 for all w       <==> all eigenvalues nonnegative
    _indefinite_             if it has both a +ve eigenvalue & a -ve eigenvalue
    invertible               if it has no zero eigenvalue

[Show figures of ellipses for +ve definite, +ve semidefinite, and indefinite
 matrices, and for the inverse of a +ve definite matrix; separate "whiteboard".
 Positive eigenvalues correspond to axes where the curvature goes up;
 negative eigenvalues correspond to axes where the curvature goes down.]

Metric tensors must be symmetric +ve definite (SPD).

[Remember that M = A^2, so M's eigenvalues are the squares of the eigenvalues
 of A, so M's eigenvalues must be nonnegative and M is positive semidefinite.
 But if M has a zero eigenvalue, its distance function is not a "metric".
 To have a metric, you must have a strictly positive definite M.  If you have
 eigenvalues of zero, the isosurfaces are cylinders instead of ellipsoids.]
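[Optional aside:  a short NumPy sketch, assuming the same example matrix A as
 above, that checks the radius claim and the metric-tensor distance formula
 numerically.]

    import numpy as np

    A = np.array([[0.75, 1.25],      # the symmetric example matrix from above
                  [1.25, 0.75]])
    M = A @ A                        # metric tensor M = A^2 (eigenvalues 4 and 1/4)

    g = lambda x: x @ M @ x          # quadratic form g(x) = x^T A^2 x = |Ax|^2

    # The direction (1, 1) has eigenvalue 2, so a vector of length 1/2 in that
    # direction lies on the ellipsoid g(x) = 1.
    v = np.array([1.0, 1.0]) / np.sqrt(2.0) / 2.0
    assert np.isclose(g(v), 1.0)

    # d(x, z) = |Ax - Az| = sqrt((x - z)^T M (x - z))
    rng = np.random.default_rng(0)
    x, z = rng.standard_normal(2), rng.standard_normal(2)
    assert np.isclose(np.linalg.norm(A @ x - A @ z),
                      np.sqrt((x - z) @ M @ (x - z)))

    # M is SPD:  all eigenvalues positive, so w^T M w > 0 for every w != 0.
    assert np.all(np.linalg.eigvalsh(M) > 0)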
Special case:  M & A are diagonal  <==>  eigenvectors are coordinate axes
                                   <==>  ellipsoids are _axis-aligned_

[Draw axis-aligned isocontours for a diagonal metric.]


Building a Quadratic w/Specified Eigenvectors/values
----------------------------------------------------

[I, personally, find the process of going from eigenvectors and eigenvalues to
 a matrix and some ellipsoids to be more intuitive than the reverse.  So let's
 do that.  Suppose you know which ellipsoid axes you want to use, and you know
 what ellipsoid radius or stretch factor you want to use along each axis.]

Choose n mutually orthogonal *unit* n-vectors v_1, ..., v_n
    [so they specify an orthonormal coordinate system]
Let V = [v_1 v_2 ... v_n]     <== n-by-n matrix

Observe:  V^T V = I     [off-diagonal 0's because the vectors are orthogonal]
                        [diagonal 1's because they're unit vectors]
     ==>  V^T = V^-1
     ==>  V V^T = I

V is an _orthonormal_matrix_:  acts like a rotation (or reflection)

Choose some inverse radii lambda_i:

    Lambda = [ lambda_1     0      ...     0      ]     [diagonal matrix
             [    0      lambda_2          0      ]      of eigenvalues]
             [                ...                 ]
             [    0         0      ...  lambda_n  ]

Theorem:  A = V Lambda V^T = sum_{i=1}^n lambda_i v_i v_i^T
          has the chosen eigenvectors/values
          [The two expressions are equivalent.  Each v_i v_i^T is an outer
           product:  an n-by-n matrix of rank 1.  Clearly, A is symmetric.]

Proof:    AV = V Lambda     <== definition of eigenvectors!  (in matrix form)

This is a _matrix_factorization_ called the _eigendecomposition_.
Lambda is the _diagonalized_ version of A.
V^T rotates the ellipsoid to be axis-aligned.

This is also a recipe for building quadratics with axes v_i and radii 1/lambda_i.

Observe:  M = A^2 = V Lambda V^T V Lambda V^T = V Lambda^2 V^T

Given an SPD metric tensor M, we can find a symmetric _square_root_ A = M^{1/2}:
    - compute the eigenvectors/values of M
    - take the square roots of M's eigenvalues
    - reassemble the matrix A

[The first step--breaking a matrix down into its eigenvectors and
 eigenvalues--is much harder than the last step--building up a new matrix from
 its eigenvectors and eigenvalues.  But I think that the latter process helps
 take a lot of the mystery out of eigenvectors.]


ANISOTROPIC GAUSSIANS
=====================

X ~ N(mu, Sigma)

    P(x) = 1 / (sqrt(2 pi)^d sqrt(|Sigma|))  *  exp( -1/2 (x - mu)^T Sigma^-1 (x - mu) )

           [|Sigma| is the determinant of Sigma]

Sigma is the SPD _covariance_matrix_.
Sigma^-1 is the SPD _precision_matrix_; it serves as the metric tensor.

[Show example of paraboloid q(x) and bivariate Gaussian n(q(x)), w/univariate
 Gaussian n(.) between them.]
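[Optional aside:  a minimal NumPy sketch of the recipe above -- building
 A = V Lambda V^T from chosen eigenvectors/values, recovering the symmetric
 square root of M = A^2 by eigendecomposition, and evaluating the anisotropic
 Gaussian density.  The rotation angle and eigenvalues below are arbitrary
 choices for illustration.]

    import numpy as np

    # Choose orthonormal eigenvectors (columns of V) and eigenvalues (inverse radii).
    theta = np.pi / 4.0
    V = np.array([[np.cos(theta), -np.sin(theta)],    # a rotation matrix is orthonormal
                  [np.sin(theta),  np.cos(theta)]])
    Lam = np.diag([2.0, 0.5])                         # chosen eigenvalues

    # Build A = V Lambda V^T; it has exactly the chosen eigenvectors/values.
    A = V @ Lam @ V.T
    assert np.allclose(A @ V[:, 0], 2.0 * V[:, 0])    # A v_1 = lambda_1 v_1
    assert np.allclose(A, A.T)                        # A is symmetric

    # Symmetric square root of an SPD metric tensor M:  eigendecompose M,
    # take square roots of the eigenvalues, reassemble.
    M = A @ A
    w, U = np.linalg.eigh(M)
    A_root = U @ np.diag(np.sqrt(w)) @ U.T
    assert np.allclose(A_root @ A_root, M)

    # Anisotropic Gaussian density:  Sigma is the covariance matrix and
    # Sigma^-1 (the precision matrix) plays the role of the metric tensor.
    def gaussian_pdf(x, mu, Sigma):
        d = len(mu)
        q = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # (x - mu)^T Sigma^-1 (x - mu)
        return np.exp(-0.5 * q) / (np.sqrt(2.0 * np.pi)**d * np.sqrt(np.linalg.det(Sigma)))

    # Example:  evaluate the density of N(0, M) at a point.
    p = gaussian_pdf(np.array([1.0, 0.0]), np.zeros(2), M)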