EIGENVECTORS
============
[I don't know if you were properly taught about eigenvectors here at Berkeley,
but I sure don't like the way they're taught in most linear algebra books.
So I'll start with a review. You all know the definition of an eigenvector:]
Given matrix A, if Av = lambda v for some vector v != 0, scalar lambda, then
v is an _eigenvector_ of A and lambda is the associated _eigenvalue_ of A.
[But what does that mean? It means that v is a magical vector that, after
being multiplied by A, still points in the *same*direction*, or in exactly
the *opposite*direction*.]
Eigenvalue 2: 3 /
A v /
^ ^ ^ ^ /
    /
   / 2  /
   / A v  /
 v  / Av  /  /
/ / / /
<+> <+> <+> <+>
   
v v v v
Eigenvalue 1/2:
^ ^ ^ ^
   
 \   
 Aw \   3 
 \   A w 
 \  \
<+> <+> <+> <+>
\  \ 
 \   \ 2 
 \   A w 
 \   
 \   
v \ w v v v
\
\
[For most matrices, most vectors don't have this property. So the ones that do
are special, and we call them eigenvectors.]
[Clearly, when you scale an eigenvector, it's still an eigenvector. Only the
direction matters, not the length. Let's look at a few consequences.]
Theorem: if v is eigenvector of A w/eigenvalue lambda,
then v is eigenvector of A^k w/eigenvalue lambda^k [will use later]
2 2
Proof: A v = A (lambda v) = lambda v, etc.
Theorem: moreover, if A is invertible,
then v is eigenvector of A^1 w/eigenvalue 1 / lambda^k
1 1 1 1 [Look at the figures above,
Proof: A v =  A Av =  v but go from right to left.]
lambda lambda
[Stated simply: When you invert a matrix, the eigenvectors don't change, but
the eigenvalues get inverted. When you square a matrix, the eigenvectors
don't change, but the eigenvalues get squared.]
[Those theorems are pretty obvious. The next theorem is not obvious at all.
But it's going to be very useful for understanding the effect of a symmetric
matrix on a vector that is *not* an eigenvector.]
_Spectral_Theorem_: every symmetric nbyn matrix has n eigenvectors that are
T
mutually orthogonal, i.e., v v = 0 for all i != j
i j
[This takes about a page of math to prove.
One minor detail is that a matrix can have more than n eigenvector directions.
If two eigenvectors happen to have the same eigenvalue, then every linear
combination of those eigenvectors is also an eigenvector. Then you have
infinitely many eigenvector directions, but they all span the same plane.
So you just arbitrarily pick two vectors in that plane that are orthogonal to
each other. By contrast, the set of eigenvalues is always uniquely determined
by a matrix, including the multiplicity of eigenvalues.]
We can use them as a basis for R^n.
[Now we can ask, what happens to a vector that *isn't* an eigenvector when
you apply a symmetric matrix to it? Express that vector as a linear
combination of eigenvectors, and look at each eigenvector separately.] #
3 # /
A x# /
^ Ax # ^ ^ ^ #/
 #    #/
 \ #  /  #/
 \ #  / 2  #/
 \ # /  / ## A x  #
/ \#/ / ## \#
<+> <+> <+##> <+>
#  \ 
 ##   \ 
 \# x = v + w   
 \#   
 \##   
v \ # v v v
\ #
\
[Every time we apply A to this vector, it changes direction. We can understand
it by writing it as a sum of components that don't change direction.]
Write x as linear combo of eigenvectors:
x = alpha v + beta w
k k k
A x = alpha lambda v + beta lambda w
v w
Ellipsoids

[Now, let's look what happens to a quadratic function when we apply a symmetric
matrix to the space, with these two eigenvectors and eigenvalues.]
T
f(x) = x x <== quadratic; isotropic; isosurfaces are spheres
g(x) = f(Ax) <== A symmetric
T 2 2
= x A x <== _quadratic_form_ of the matrix A
anisotropic; isosurfaces are ellipsoids
[Show isocontours for f(x) = x^2 and g(x) = Ax^2, where A = [ 3/4 5/4 ]
[ 5/4 3/4 ]
Draw the stretch direction (1, 1) & the shrink direction (1, 1).]
[Here's how to think of this: we stretched the plane on the right along the
direction with eigenvalue 2, and shrunk the plane along the direction with
eigenvalue 1/2; then we drew the circular isocontours, like on the left; then
we undid the stretching and let the plane spring back to its original shape.
So the circle turned into an ellipse when the plane sprang back.]
[Looking at the quadratic form is one of the best ways to visually understand
symmetric matrices and their eigenvectors and eigenvalues.]
g(x) = 1 is an ellipsoid with axes v_1, v_2, ..., v_n and
radii 1/lambda_1, 1/lambda_2, ... 1/lambda_n
because if v has length 1/lambda , g(v ) = f(Av ) = f(lambda v ) = 1
i i i i i i
==> v_i lies on the ellipsoid
[The reason the radii are the reciprocals of the eigenvalues is that we're
stretching the plane by the eigenvalues, then drawing the spheres, then
letting the plane spring back to its original shape. When the plane springs
back, each axis of the spheres gets scaled by 1/eigenvalue.]
bigger eigenvalue <==> steeper slope <==> shorter ellipsoid radius
[ ^ really bigger curvature; slope varies along axis]
Alternate interpretation: ellipsoids are spheres in _distance_metric_ A^2
Call M = A^2 a _metric_tensor_ because
the distance between samples x & z in stretched space is
T
d(x, z) = Ax  Az = sqrt{(x  z) M (x  z)}
[This is the Euclidean distance in the stretched space, but let's think of it
as an alternative metric for measuring distances in the original space. It's
a kind of distance from x to z that's different from the Euclidean distance.]
[I'm calling M a "tensor" because that's standard usage in Riemannian geometry,
but don't worry about what "tensor" means. For our purposes, it's a matrix.]
Ellipsoids are "spheres" in this metric: {x : d(x, center) = isovalue}
A square matrix B is _positive_definite_ if w^T B w > 0 for all w != 0.
<==> all eigenvalues positive
_positive_semidefinite_ if w^T B w >= 0 for all w.
<==> all eigenvalues nonnegative
_indefinite_ if +ve eigenvalue & ve eigenvalue
invertible if no zero eigenvalue
[Show figures of ellipses for +ve definite, +ve semidefinite, indefinite
matrices, and inverse of +ve definite matrix; separate "whiteboard".
Positive eigenvalues correspond to axes where the curvature goes up; negative
eigenvalues correspond to axes where the curvature goes down.]
Metric tensors must be symmetric +ve definite (SPD).
[Remember that M = A^2, so M's eigenvalues are the squares of the eigenvalues
of A, so the eigenvalues must be nonnegative and M is positive semidefinite.
But if M has a zero eigenvalue, its distance function is not a "metric".
To have a metric, you must have a strictly positive definite M. If you have
eigenvalues of zero, the isosurfaces are cylinders instead of ellipsoids.]
Special case: M & A are diagonal <==> eigenvectors are coordinate axes
<==> ellipsoids are _axisaligned_
[Draw axisaligned isocontours for a diagonal metric.]
Building a Quadratic w/Specified Eigenvectors/values

[I, personally, find the process of going from eigenvectors and eigenvalues to
a matrix and some ellipsoids to be more intuitive than the reverse. So let's
do that. Suppose you know which ellipsoid axes you want to use, and you know
what ellipsoid radius or stretch factor you want to use along each axis.]
Choose n mutually orthogonal *unit* nvectors v_1, ..., v_n
[so they specify an orthonormal coordinate system]
Let V = [v_1 v_2 ... v_n] <== nbyn matrix
Observe: V^T V = I [offdiagonal 0's because vectors are orthogonal]
[diagonal 1's because unit vectors]
==> V^T = V^1 ==> V V^T = I
V is _orthonormal_matrix_: acts like rotation (or reflection)
Choose some inverse radii lambda_i:
 
 lambda_1 0 ... 0 
Let Lambda =  0 lambda_2 0  [diagonal matrix
 ...  of eigenvalues]
 0 0 ... lambda_n 
 
T n T
Theorem: A = V Lambda V = sum lambda_i v v has chosen eigenvectors/values
i=1 i i [Clearly, A is symmetric]
^ ^ \___/
equivalent  outer product: nbyn matrix, rank 1
v 
Proof: AV = V Lambda  <== definition of eigenvectors! (in matrix form)

This is a _matrix_factorization_ called the _eigendecomposition_.
Lambda is the _diagonalized_ version of A.
V^T rotates ellipsoid to be axisaligned.
This is also a recipe for building quadratics with axes v_i, radii 1 / lambda_i
Observe: M = A^2 = V Lambda V^T V Lambda V^T = V Lambda^2 V^T
Given SPD metric tensor M, we can find a symmetric _square_root_ A = M^{1/2}:
 compute eigenvectors/values of M
 take square roots of M's eigenvalues
 reassemble matrix A
[The first stepbreaking a matrix down to its eigenvectors and eigenvaluesis
much harder than the last stepbuilding up a new matrix from its eigenvectors
and eigenvalues. But I think that the latter process helps take a lot of the
mystery out of eigenvectors.]
ANISOTROPIC GAUSSIANS
=====================
X ~ N(mu, Sigma)
1 1 T 1
P(x) =  exp (   (x  mu) Sigma (x  mu) )
sqrt(2 pi)^d sqrt(Sigma) 2
^ determinant of Sigma
Sigma is the SPD _covariance_matrix_.
Sigma^1 is the SPD _precision_matrix_; serves as metric tensor.
[Show example of paraboloid q(x) and bivariate Gaussian n(q(x)), w/univariate
Gaussian n(.) between them.]