CS267: Notes for Lecture 23(b), Apr 9, 1996

Spectral partitioning - Statement and Proof of Theorem 2

Theorem 2. Given a connected graph G = (N,E), partition its nodes into N- and N+ using the spectral bisection algorithm. Then N- is connected. If no component of the second eigenvector v2 is zero, then N+ is also connected.
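
For concreteness, here is a small numpy sketch of the spectral bisection step the theorem refers to: form the Laplacian of G, compute an eigenvector v2 for its second smallest eigenvalue, and split the nodes by the sign of their v2 entries. The function name, the convention that zero entries go to N+, and the path graph at the end are illustrative choices, not part of the notes.

    import numpy as np

    def spectral_bisection(adj):
        # adj: symmetric 0/1 adjacency matrix of a connected graph G = (N,E)
        lap = np.diag(adj.sum(axis=1)) - adj    # Laplacian = degree matrix - adjacency
        vals, vecs = np.linalg.eigh(lap)        # eigenvalues in increasing order
        v2 = vecs[:, 1]                         # eigenvector for the second smallest eigenvalue
        n_minus = np.where(v2 < 0)[0]           # N-: nodes with negative v2 entries
        n_plus  = np.where(v2 >= 0)[0]          # N+: the remaining nodes
        return n_minus, n_plus

    # Example: a path graph on 6 nodes is split into its two halves.
    adj = np.zeros((6, 6))
    for i in range(5):
        adj[i, i+1] = adj[i+1, i] = 1
    print(spectral_bisection(adj))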

To prove this theorem, we need several other standard results from linear algebra, some of which we state without proof.

Definition. The spectral radius rho(A) of a matrix A is the largest absolute value of any eigenvalue:

      rho(A) = max_i | lambda_i(A) |

Definition. A nonnegative matrix A is a matrix all of whose entries are nonnegative. This is written A >= 0. A positive matrix A is a matrix all of whose entries are positive, written A>0. We also refer to nonnegative and positive vectors, with similar notation.

Definition. The graph G(A) of an n-by-n matrix A is a graph with n nodes, and an edge e=(i,j) if and only if A(i,j) != 0.
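
These definitions are easy to state in numpy terms; the 3-by-3 matrix below is an arbitrary example.

    import numpy as np

    A = np.array([[0., 2., 0.],
                  [2., 0., 1.],
                  [0., 1., 3.]])

    # Spectral radius: largest absolute value of an eigenvalue.
    rho = np.abs(np.linalg.eigvals(A)).max()

    # A is nonnegative (A >= 0) but not positive (A > 0), since some entries are zero.
    print((A >= 0).all(), (A > 0).all())

    # Edges of G(A): one edge (i,j) for each nonzero entry A(i,j).
    edges = [(i, j) for i in range(3) for j in range(3) if A[i, j] != 0]
    print(rho, edges)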

Lemma 1. Let A be an n-by-n nonnegative matrix, and suppose G(A) is connected. Then sum_{m=0,...,n-1} A^m is a positive matrix.

Proof of Lemma 1. The (i,j) entry of A^m is a sum of many terms of the form

    A(i,k_1) * A(k_1,k_2) * A(k_2,k_3) * ... * A(k_{m-2},k_{m-1}) * A(k_{m-1},j)
where the sum is over all n^{m-1} combinations 1 <= k_q <= n, 1 <= q <= m-1. Each such term is nonnegative since A is nonnegative. Consider m=2. Then A(i,k)*A(k,j) will be positive if there is a path from i to j in G(A) of length 2, namely a path through k. Similarly, (A^m)(i,j) will be positive if there is a path of length m connecting i and j. If G(A) is connected, then there is a path of length at most n-1 connecting every pair of distinct nodes, and the m=0 term A^0 = I makes the diagonal entries of the sum positive. The statement of the lemma follows. QED
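
Lemma 1 is easy to check numerically. Here is a small sketch using the adjacency matrix of a 5-cycle, an arbitrary choice of nonnegative matrix with connected G(A):

    import numpy as np

    # Adjacency matrix of a connected 5-cycle (nonnegative, G(A) connected).
    n = 5
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i+1) % n] = A[(i+1) % n, i] = 1

    # S = sum_{m=0,...,n-1} A^m, accumulated by repeated multiplication.
    S = np.zeros((n, n))
    P = np.eye(n)              # P = A^m, starting with A^0 = I
    for m in range(n):
        S += P
        P = P @ A
    print((S > 0).all())       # True: every entry of S is positive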

Definition. A symmetric matrix with all nonnegative eigenvalues is called positive semidefinite. If the eigenvalues are all positive, it is called positive definite.

Lemma 2. If A is n-by-n and symmetric with eigenvalues lambda_1 <= ... <= lambda_n, then

       lambda_1 = min_{v!=0}  v'*A*v / v'*v
       lambda_n = max_{v!=0}  v'*A*v / v'*v

Proof of Lemma 2. It follows simply from the eigendecomposition A = Q*Lambda*Q', where Q is an orthogonal matrix whose columns are eigenvectors, and Lambda = diag(lambda_1,...,lambda_n), using the substitution y = Q'*v:

   v'*A*v / v'*v 
         = v'*Q*Lambda*Q'*v / v'*Q*Q'*v 
         = y'*Lambda*y / y'*y
           sum_{i=1,...,n} lambda_i*y(i)^2 
         = -------------------------------
               sum_{i=1,...,n} y(i)^2
The last expression is a weighted average of the eigenvalues lambda_i, so it lies between lambda_1 and lambda_n, and choosing v to be the first or last eigenvector attains the bounds.
Details are left to the reader.
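
As a numerical illustration of Lemma 2 (the random symmetric matrix below is an arbitrary choice), the Rayleigh quotient of any nonzero v lies between lambda_1 and lambda_n, and the extreme eigenvectors attain the bounds:

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2                          # random symmetric matrix
    vals, vecs = np.linalg.eigh(A)             # lambda_1 <= ... <= lambda_n

    def rayleigh(A, v):
        return v @ A @ v / (v @ v)

    # Every nonzero v gives a quotient in [lambda_1, lambda_n] ...
    for _ in range(1000):
        v = rng.standard_normal(6)
        assert vals[0] - 1e-12 <= rayleigh(A, v) <= vals[-1] + 1e-12
    # ... and the extreme eigenvectors attain the bounds.
    print(np.isclose(rayleigh(A, vecs[:, 0]), vals[0]),
          np.isclose(rayleigh(A, vecs[:, -1]), vals[-1]))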

Cauchy Interlace Theorem (R. Horn and C. Johnson, "Matrix Analysis", 1988). Let A be an n-by-n symmetric matrix with eigenvalues lambda_1 <= ... <= lambda_n. Let B = A(1:n-1,1:n-1), the leading (n-1)-by-(n-1) submatrix of A. Let the eigenvalues of B be mu_1 <= ... <= mu_{n-1}. Then for all i, lambda_i <= mu_i <= lambda_{i+1}. Applying this result recursively, we can show that if C = A(i:j,i:j) for any i and j, and the eigenvalues of C are chi_1 <= ... <= chi_{j-i+1}, then A has at least k eigenvalues <= chi_k. In particular lambda_1 <= chi_1.
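
The interlacing inequalities can be checked numerically; here is a sketch with a random symmetric matrix (an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((7, 7))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)                # lambda_1 <= ... <= lambda_n
    mu  = np.linalg.eigvalsh(A[:-1, :-1])      # eigenvalues of B = A(1:n-1,1:n-1)

    # Cauchy interlacing: lambda_i <= mu_i <= lambda_{i+1} for i = 1,...,n-1.
    print(np.all(lam[:-1] <= mu + 1e-12) and np.all(mu <= lam[1:] + 1e-12))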

Corollary to the Cauchy Interlace Theorem. Let the symmetric matrix A be positive (semi)definite. Then any submatrix C=A(i:j,i:j) is also positive (semi)definite.

Lemma 3. If A is symmetric and positive (semi)definite, so is X'*A*X for any nonsingular matrix X.

Proof of Lemma 3. From Lemma 2, the smallest eigenvalue of X'*A*X is

       min_{v!=0} v'*X'*A*X*v / v'*v
     = min_{v!=0} ( v'*X'*A*X*v / v'*X'*X*v ) * 
       ( v'*X'*X*v / v'*v )
    >= min_{v!=0} ( v'*X'*A*X*v / v'*X'*X*v ) * 
       min_{v!=0} ( v'*X'*X*v / v'*v )
     = min_{X*v!=0} ( (X*v)'*A*(X*v) / (X*v)'*(X*v) ) * 
       min_{v!=0} ( v'*X'*X*v / v'*v )
     = lambda_1(A) * lambda_1(X'*X)
The inequality holds because both factors are nonnegative, and the next equality holds because X is nonsingular: v != 0 exactly when X*v != 0, so X*v ranges over all nonzero vectors and the first factor becomes the minimized Rayleigh quotient of A. Since v'*X'*X*v = (X*v)'*(X*v) is a sum of squares, it is nonnegative. Thus lambda_1(X'*X) >= 0. Since X is nonsingular, so is X'*X, so it can't have a zero eigenvalue. Thus lambda_1(X'*X) > 0. The result follows. QED
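
A quick numerical sanity check of Lemma 3 (the random A and X below are arbitrary choices): congruence by a nonsingular X changes the eigenvalues but preserves positive definiteness.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((5, 5))
    A = B.T @ B + np.eye(5)                    # symmetric positive definite
    X = rng.standard_normal((5, 5))            # generically nonsingular
    C = X.T @ A @ X                            # congruent to A

    print(np.linalg.eigvalsh(A).min() > 0,     # A is positive definite
          np.linalg.eigvalsh(C).min() > 0)     # so is X'*A*X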

Lemma 4. If A is a symmetric matrix with rho(A) < 1, then I-A is invertible and

     (I-A)^{-1} = sum_{i=0,...,infinity} A^i

Proof of Lemma 4. Since the eigenvalues of A are strictly between -1 and 1, the eigenvalues of I-A are strictly between 0 and 2, so I-A is positive definite and so nonsingular. Writing the eigendecomposition A = Q*Lambda*Q', we see that A^i = Q*Lambda^i*Q', so the entries of A^i go to zero geometrically, like rho(A)^i or faster. Thus sum_{i=0,...,infinity} A^i converges. Since

     (I-A) * sum_{i=0,...,m} A^i = I - A^{m+1}
it is easy to see that S(m) = sum_{i=0,...,m} A^i converges to (I-A)^{-1}, since (I-A)*S(m) converges to I. QED
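
A short numerical check of Lemma 4 (the random matrix, the scaling to rho(A) = 0.5, and the truncation length 50 are arbitrary choices): the partial sums of the series approach (I-A)^{-1}.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2
    A *= 0.5 / np.abs(np.linalg.eigvalsh(A)).max()   # scale so rho(A) = 0.5 < 1

    S = np.zeros((4, 4))
    P = np.eye(4)                                    # P = A^i
    for i in range(50):                              # truncated Neumann series
        S += P
        P = P @ A
    print(np.allclose(S, np.linalg.inv(np.eye(4) - A)))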

Partial proof of Theorem 2. (M. Fiedler, "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory", Czech. Math. J. 25:619--637, 1975.) We consider the special (but generic) case where v2 is unique (modulo multiplication by a scalar) and v2 has only nonzero entries. We will use proof by contradiction: Assume that N+ is not connected, and in fact consists of k connected components. Suppose for illustration that k=2 (the general case is no harder). Then we can renumber the rows and columns of A, the Laplacian matrix of G, so that

             n1   n2     n3
           [ A11   0    A13 ] n1              [ v1 ] n1
   A   =   [  0   A22   A23 ] n2  ,  v2   =   [ v2 ] n2
            [ A13' A23'  A33 ] n3              [ v3 ] n3
where v1 > 0, v2 > 0 and v3 < 0. The two zero blocks in A occur because there are no edges connecting the first n1 nodes (the first connected component of N+) and the following n2 nodes (the second connected component of N+). Then A*v2 = lambda_2*v2 implies
       A11*v1 + A13*v3 = lambda_2*v1              (1)

Note that A13 <= 0 (its entries are offdiagonal entries of the Laplacian A), and v3 < 0, so each term in the product A13*v3 is nonnegative and thus A13*v3 >= 0. In fact A13*v3 is nonzero, since otherwise A13 would have to be zero, and so the first n1 nodes alone would form a connected component of G, contradicting our assumption that G is connected.

By the Corollary to the Cauchy Interlace Theorem above, A11 is positive semidefinite since A is. Now let eps be any positive number. Then adding eps*v1 to both sides of (1) yields

   (eps*I + A11)*v1 + A13*v3 = (eps+lambda_2)*v1    (2)
The eigenvalues of eps*I + A11 are all at least eps, so eps*I + A11 is positive definite. Write eps*I + A11 = D - N, where D is diagonal and N is zero on the diagonal (-N holds all the offdiagonal entries of eps*I + A11); N >= 0 because the offdiagonal entries of A11, like those of A, are nonpositive. Then
    eps*I + A11 
       = D - N
       = Dh * ( I - Dh^{-1}*N*Dh^{-1} ) * Dh    
       = Dh * (I-M) * Dh
where 
    Dh = D^{1/2} = diag(sqrt(D(1,1)),...,sqrt(D(n1,n1)))
and
     M = Dh^{-1}*N*Dh^{-1}
By Lemma 3, I-M is positive definite since D-N is positive definite and Dh is nonsingular. Since the eigenvalues of I-M are 1 minus the eigenvalues of M, the eigenvalues of M must be less than 1. All the eigenvalues of M must also be greater than -1, because by Lemma 2,
        lambda_1(M) = min_{v!=0} v'*M*v / v'*v
                   >= min_{v!=0} -|v|'*M*|v| / v'*v      (since M >= 0)
                    = -max_{v!=0} |v|'*M*|v| / v'*v
                   >= -max_{v!=0} v'*M*v / v'*v
                    = -lambda_{n1}(M)
                    > -1
Thus | lambda_j(M) | < 1 for all j, i.e. rho(M) < 1. By Lemma 4,
     Y = (eps*I + A11)^{-1}
       = Dh^{-1} * (I-M)^{-1} * Dh^{-1}
       = Dh^{-1} * ( sum_{i=0,...,infinity} M^i ) * Dh^{-1}
is nonnegative, since M and all its powers M^i are nonnegative. In fact Y is positive: G(M) is the graph of the first connected component of N+, which is connected, so by Lemma 1 the partial sum sum_{i=0,...,n1-1} M^i is already positive, and multiplying by the positive diagonal matrix Dh^{-1} on both sides preserves positivity.
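
This construction can be checked on a small example. Here A11 is taken to be the Laplacian of a path on 4 nodes and eps = 0.1; both are arbitrary illustrative choices, but they have the properties the proof uses (A11 positive semidefinite with nonpositive offdiagonal entries and connected G(A11)).

    import numpy as np

    # Laplacian of a path on 4 nodes plays the role of A11.
    A11 = np.array([[ 1., -1.,  0.,  0.],
                    [-1.,  2., -1.,  0.],
                    [ 0., -1.,  2., -1.],
                    [ 0.,  0., -1.,  1.]])
    eps = 0.1

    epsA = eps * np.eye(4) + A11          # eps*I + A11 = D - N
    D  = np.diag(np.diag(epsA))           # diagonal part
    N  = D - epsA                         # N >= 0, zero on the diagonal
    Dh = np.sqrt(D)                       # Dh = D^{1/2}
    M  = np.linalg.inv(Dh) @ N @ np.linalg.inv(Dh)

    print(np.abs(np.linalg.eigvalsh(M)).max() < 1)   # rho(M) < 1
    Y = np.linalg.inv(epsA)                          # Y = (eps*I + A11)^{-1}
    print((Y > 0).all())                             # Y is entrywise positive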

Multiplying equation (2) by Y yields

     v1 + Y*A13*v3 = (eps+lambda_2)*Y*v1
Multiplying by v1' yields
 
     v1'*v1 + v1'*Y*A13*v3 = (eps+lambda_2) * v1'*Y*v1
so by Lemma 2
  (eps+lambda_2) * lambda_{n1}(Y) 
          = max_{v!=0} (eps+lambda_2) * v'*Y*v / v'*v
         >= (eps+lambda_2) * v1'*Y*v1 / v1'*v1
          = (v1'*v1 + v1'*Y*A13*v3) / v1'*v1
          = 1 + v1'*Y*A13*v3 / v1'*v1
As stated above, A13*v3 >= 0 and is nonzero. Since Y>0, Y*A13*v3 > 0, and so v1'*Y*A13*v3 > 0. Thus
   (eps+lambda_2) * lambda_{n1}(Y) > 1
Since the eigenvalues of Y are positive and are the reciprocals of the eigenvalues of eps*I + A11, we get
   (eps+lambda_2) / lambda_1(eps*I + A11) > 1
Since lambda_1(eps*I + A11) = eps + lambda_1(A11), we can rearrange to get
   lambda_1(A11) < lambda_2
The same logic applies to A22, so lambda_1(A22) < lambda_2. Thus, the leading (n1+n2)-by-(n1+n2) submatrix of A,
     [ A11   0  ]
     [  0   A22 ]
has two eigenvalues less than lambda_2. By the Cauchy Interlace Theorem, this means A has at least two eigenvalues less than lambda_2. But this contradicts the fact that lambda_2 is the second smallest eigenvalue of A. This contradiction proves the theorem. QED
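
Finally, the statement of Theorem 2 itself is easy to test numerically. The sketch below uses an arbitrary small graph (two triangles joined by an edge), computes the Fiedler vector, splits the nodes by sign, and checks that both halves induce connected subgraphs, using the criterion of Lemma 1 as the connectivity test.

    import numpy as np

    def is_connected(adj):
        # Lemma 1 criterion: sum_{m=0,...,n-1} adj^m is positive iff G(adj) is connected.
        n = len(adj)
        S, P = np.zeros((n, n)), np.eye(n)
        for _ in range(n):
            S += P
            P = P @ adj
        return (S > 0).all()

    # A small connected graph: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    n = 6
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1

    lap = np.diag(adj.sum(axis=1)) - adj
    v2 = np.linalg.eigh(lap)[1][:, 1]                  # Fiedler vector
    n_minus = np.where(v2 < 0)[0]
    n_plus  = np.where(v2 >= 0)[0]

    # Theorem 2: both halves induce connected subgraphs (no v2 entry is zero here).
    print(is_connected(adj[np.ix_(n_minus, n_minus)]),
          is_connected(adj[np.ix_(n_plus, n_plus)]))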