EIGENVECTORS
============

[I don't know if you were properly taught about eigenvectors here at Berkeley,
 but I sure don't like the way they're taught in most linear algebra books.
 So I'll start with a review.  You all know the definition of an eigenvector:]

Given matrix A, if Av = lambda v for some vector v != 0, scalar lambda,
then v is an _eigenvector_ of A
and lambda is the associated _eigenvalue_ of A.

[But what does that mean?  It means that v is a magical vector that, after
 being multiplied by A, still points in the *same*direction*, or in exactly
 the *opposite*direction*.]

[Figure:  for an eigenvector v with eigenvalue 2, the vectors v, Av, A^2 v,
 and A^3 v all point in the same direction, each twice as long as the last.
 For an eigenvector w with eigenvalue -1/2, the vectors w, Aw, A^2 w, and
 A^3 w flip direction each time and shrink by half.]

[For most matrices, most vectors don't have this property.  So the ones that
 do are special, and we call them eigenvectors.]

[Clearly, when you scale an eigenvector, it's still an eigenvector.  Only the
 direction matters, not the length.  Let's look at a few consequences.]

Theorem:  if v is an eigenvector of A w/eigenvalue lambda,
          then v is an eigenvector of A^k w/eigenvalue lambda^k   [will use later]

Proof:    A^2 v = A (lambda v) = lambda^2 v, etc.

Theorem:  moreover, if A is invertible,
          then v is an eigenvector of A^-1 w/eigenvalue 1/lambda

Proof:    A^-1 v = (1/lambda) A^-1 A v = (1/lambda) v
          [Look at the figures above, but go from right to left.]

[Stated simply:  When you invert a matrix, the eigenvectors don't change, but
 the eigenvalues get inverted.  When you square a matrix, the eigenvectors
 don't change, but the eigenvalues get squared.]

[Those theorems are pretty obvious.  The next theorem is not obvious at all.
 But it's going to be very useful for understanding the effect of a symmetric
 matrix on a vector that is *not* an eigenvector.]

_Spectral_Theorem_:  every symmetric n-by-n matrix has n eigenvectors that are
                     mutually orthogonal, i.e., v_i^T v_j = 0 for all i != j

[This takes about a page of math to prove.  One minor detail is that a matrix
 can have more than n eigenvector directions.  If two eigenvectors happen to
 have the same eigenvalue, then every linear combination of those eigenvectors
 is also an eigenvector.  Then you have infinitely many eigenvector directions,
 but they all span the same plane.  So you just arbitrarily pick two vectors in
 that plane that are orthogonal to each other.  By contrast, the set of
 eigenvalues is always uniquely determined by a matrix, including the
 multiplicity of the eigenvalues.]

We can use them as a basis for R^n.

[Now we can ask, what happens to a vector that *isn't* an eigenvector when you
 apply a symmetric matrix to it?  Express that vector as a linear combination
 of eigenvectors, and look at each eigenvector separately.]

[Figure:  a vector x = v + w that is not an eigenvector.  Each successive
 product Ax, A^2 x, A^3 x points in a different direction, because the
 v component doubles while the w component flips and halves.]

[Every time we apply A to this vector, it changes direction.  We can understand
 it by writing it as a sum of components that don't change direction.]
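[Optional aside:  a quick numerical sanity check of the theorems above.
 This is just a sketch in NumPy; it assumes the 2-by-2 symmetric matrix
 A = [3/4 5/4; 5/4 3/4] used later in these notes, whose eigenvalues are
 2 and -1/2.]

    import numpy as np

    # The symmetric example matrix:  eigenvalues 2 and -1/2,
    # eigenvectors along (1, 1) and (1, -1).
    A = np.array([[0.75, 1.25],
                  [1.25, 0.75]])

    # eigh is for symmetric matrices; it returns orthonormal eigenvector columns.
    lam, V = np.linalg.eigh(A)
    v, l = V[:, 1], lam[1]          # the eigenvector with eigenvalue 2

    # A^k v = lambda^k v:  applying A repeatedly never changes v's direction.
    assert np.allclose(np.linalg.matrix_power(A, 3) @ v, l**3 * v)

    # A^-1 v = (1/lambda) v:  inverting A inverts the eigenvalues.
    assert np.allclose(np.linalg.inv(A) @ v, (1.0 / l) * v)

    # Spectral theorem:  the eigenvectors are mutually orthogonal, so V^T V = I.
    assert np.allclose(V.T @ V, np.eye(2))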
Write x as a linear combo of eigenvectors:  x = alpha v + beta w

    A^k x = alpha lambda_v^k v + beta lambda_w^k w
    [where lambda_v and lambda_w are the eigenvalues of v and w]


Ellipsoids
----------

[Now, let's look at what happens to a quadratic function when we apply a
 symmetric matrix to the space, with these two eigenvectors and eigenvalues.]

f(x) = x^T x      <== quadratic; isotropic; isosurfaces are spheres

g(x) = f(Ax)      <== A symmetric
     = x^T A^2 x  <== _quadratic_form_ of the matrix A^2
                      anisotropic; isosurfaces are ellipsoids

[Show isocontours for f(x) = |x|^2 and g(x) = |Ax|^2, where

     A = [ 3/4  5/4 ]
         [ 5/4  3/4 ]

 Draw the stretch direction (1, 1) & the shrink direction (1, -1).]

[Here's how to think of this:  we stretched the plane on the right along the
 direction with eigenvalue 2, and shrunk the plane along the direction with
 eigenvalue -1/2; then we drew the circular isocontours, like on the left;
 then we undid the stretching and let the plane spring back to its original
 shape.  So the circle turned into an ellipse when the plane sprang back.]

[Looking at the quadratic form is one of the best ways to visually understand
 symmetric matrices and their eigenvectors and eigenvalues.]

g(x) = 1 is an ellipsoid with axes  v_1, v_2, ..., v_n
                         and radii  1/lambda_1, 1/lambda_2, ..., 1/lambda_n
    because if v_i has length 1/lambda_i,
        g(v_i) = f(A v_i) = f(lambda_i v_i) = 1
        ==> v_i lies on the ellipsoid

[The reason the radii are the reciprocals of the eigenvalues is that we're
 stretching the plane by the eigenvalues, then drawing the spheres, then
 letting the plane spring back to its original shape.  When the plane springs
 back, each axis of the spheres gets scaled by 1/eigenvalue.]

bigger eigenvalue  <==>  steeper slope  <==>  shorter ellipsoid radius
    ["steeper slope" really means bigger curvature; the slope varies along an axis]

Alternate interpretation:  ellipsoids are spheres in the _distance_metric_ A^2

Call M = A^2 a _metric_tensor_, because the distance between samples x & z in
the stretched space is

    d(x, z) = |Ax - Az| = sqrt((x - z)^T M (x - z))

[This is the Euclidean distance in the stretched space, but let's think of it
 as an alternative metric for measuring distances in the original space.  It's
 a kind of distance from x to z that's different from the Euclidean distance.]

[I'm calling M a "tensor" because that's standard usage in Riemannian geometry,
 but don't worry about what "tensor" means.  For our purposes, it's a matrix.]

Ellipsoids are "spheres" in this metric:  {x : d(x, center) = isovalue}

A symmetric matrix B is
    _positive_definite_      if w^T B w > 0 for all w != 0   <==> all eigenvalues positive
    _positive_semidefinite_  if w^T B w >= 0 for all w       <==> all eigenvalues nonnegative
    _indefinite_             if it has both a +ve eigenvalue & a -ve eigenvalue
    invertible               if it has no zero eigenvalue

[Show figures of ellipses for +ve definite, +ve semidefinite, and indefinite
 matrices, and for the inverse of a +ve definite matrix; separate "whiteboard".
 Positive eigenvalues correspond to axes where the curvature goes up;
 negative eigenvalues correspond to axes where the curvature goes down.]

Metric tensors must be symmetric +ve definite (SPD).

[Remember that M = A^2, so M's eigenvalues are the squares of the eigenvalues
 of A, so M's eigenvalues must be nonnegative and M is positive semidefinite.
 But if M has a zero eigenvalue, its distance function is not a "metric".
 To have a metric, you must have a strictly positive definite M.  If you have
 eigenvalues of zero, the isosurfaces are cylinders instead of ellipsoids.]
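[Optional aside:  a short NumPy sketch, assuming the same example matrix A as
 above, that checks the radius claim and the metric-tensor distance formula
 numerically.]

    import numpy as np

    A = np.array([[0.75, 1.25],      # the symmetric example matrix from above
                  [1.25, 0.75]])
    M = A @ A                        # metric tensor M = A^2 (eigenvalues 4 and 1/4)

    g = lambda x: x @ M @ x          # quadratic form g(x) = x^T A^2 x = |Ax|^2

    # The direction (1, 1) has eigenvalue 2, so a vector of length 1/2 in that
    # direction lies on the ellipsoid g(x) = 1.
    v = np.array([1.0, 1.0]) / np.sqrt(2.0) / 2.0
    assert np.isclose(g(v), 1.0)

    # d(x, z) = |Ax - Az| = sqrt((x - z)^T M (x - z))
    rng = np.random.default_rng(0)
    x, z = rng.standard_normal(2), rng.standard_normal(2)
    assert np.isclose(np.linalg.norm(A @ x - A @ z),
                      np.sqrt((x - z) @ M @ (x - z)))

    # M is SPD:  all eigenvalues positive, so w^T M w > 0 for every w != 0.
    assert np.all(np.linalg.eigvalsh(M) > 0)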
Special case:  M & A are diagonal  <==>  eigenvectors are coordinate axes
                                   <==>  ellipsoids are _axis-aligned_

[Draw axis-aligned isocontours for a diagonal metric.]


Building a Quadratic w/Specified Eigenvectors/values
----------------------------------------------------

[I, personally, find the process of going from eigenvectors and eigenvalues to
 a matrix and some ellipsoids to be more intuitive than the reverse.  So let's
 do that.  Suppose you know which ellipsoid axes you want to use, and you know
 what ellipsoid radius or stretch factor you want to use along each axis.]

Choose n mutually orthogonal *unit* n-vectors v_1, ..., v_n
    [so they specify an orthonormal coordinate system]
Let V = [v_1 v_2 ... v_n]     <== n-by-n matrix

Observe:  V^T V = I     [off-diagonal 0's because the vectors are orthogonal]
                        [diagonal 1's because they're unit vectors]
     ==>  V^T = V^-1
     ==>  V V^T = I

V is an _orthonormal_matrix_:  acts like a rotation (or reflection)

Choose some inverse radii lambda_i:

    Lambda = [ lambda_1     0      ...     0      ]     [diagonal matrix
             [    0      lambda_2          0      ]      of eigenvalues]
             [                ...                 ]
             [    0         0      ...  lambda_n  ]

Theorem:  A = V Lambda V^T = sum_{i=1}^n lambda_i v_i v_i^T
          has the chosen eigenvectors/values
          [The two expressions are equivalent.  Each v_i v_i^T is an outer
           product:  an n-by-n matrix of rank 1.  Clearly, A is symmetric.]

Proof:    AV = V Lambda     <== definition of eigenvectors!  (in matrix form)

This is a _matrix_factorization_ called the _eigendecomposition_.
Lambda is the _diagonalized_ version of A.
V^T rotates the ellipsoid to be axis-aligned.

This is also a recipe for building quadratics with axes v_i and radii 1/lambda_i.

Observe:  M = A^2 = V Lambda V^T V Lambda V^T = V Lambda^2 V^T

Given an SPD metric tensor M, we can find a symmetric _square_root_ A = M^{1/2}:
    - compute the eigenvectors/values of M
    - take the square roots of M's eigenvalues
    - reassemble the matrix A

[The first step--breaking a matrix down into its eigenvectors and
 eigenvalues--is much harder than the last step--building up a new matrix from
 its eigenvectors and eigenvalues.  But I think that the latter process helps
 take a lot of the mystery out of eigenvectors.]


ANISOTROPIC GAUSSIANS
=====================

X ~ N(mu, Sigma)

    P(x) = 1 / (sqrt(2 pi)^d sqrt(|Sigma|))  *  exp( -1/2 (x - mu)^T Sigma^-1 (x - mu) )

           [|Sigma| is the determinant of Sigma]

Sigma is the SPD _covariance_matrix_.
Sigma^-1 is the SPD _precision_matrix_; it serves as the metric tensor.

[Show example of paraboloid q(x) and bivariate Gaussian n(q(x)), w/univariate
 Gaussian n(.) between them.]
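[Optional aside:  a minimal NumPy sketch of the recipe above -- building
 A = V Lambda V^T from chosen eigenvectors/values, recovering the symmetric
 square root of M = A^2 by eigendecomposition, and evaluating the anisotropic
 Gaussian density.  The rotation angle and eigenvalues below are arbitrary
 choices for illustration.]

    import numpy as np

    # Choose orthonormal eigenvectors (columns of V) and eigenvalues (inverse radii).
    theta = np.pi / 4.0
    V = np.array([[np.cos(theta), -np.sin(theta)],    # a rotation matrix is orthonormal
                  [np.sin(theta),  np.cos(theta)]])
    Lam = np.diag([2.0, 0.5])                         # chosen eigenvalues

    # Build A = V Lambda V^T; it has exactly the chosen eigenvectors/values.
    A = V @ Lam @ V.T
    assert np.allclose(A @ V[:, 0], 2.0 * V[:, 0])    # A v_1 = lambda_1 v_1
    assert np.allclose(A, A.T)                        # A is symmetric

    # Symmetric square root of an SPD metric tensor M:  eigendecompose M,
    # take square roots of the eigenvalues, reassemble.
    M = A @ A
    w, U = np.linalg.eigh(M)
    A_root = U @ np.diag(np.sqrt(w)) @ U.T
    assert np.allclose(A_root @ A_root, M)

    # Anisotropic Gaussian density:  Sigma is the covariance matrix and
    # Sigma^-1 (the precision matrix) plays the role of the metric tensor.
    def gaussian_pdf(x, mu, Sigma):
        d = len(mu)
        q = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # (x - mu)^T Sigma^-1 (x - mu)
        return np.exp(-0.5 * q) / (np.sqrt(2.0 * np.pi)**d * np.sqrt(np.linalg.det(Sigma)))

    # Example:  evaluate the density of N(0, M) at a point.
    p = gaussian_pdf(np.array([1.0, 0.0]), np.zeros(2), M)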