Spectral partitioning

Statement and Proof of Theorem 2

(CS 267, Mar 23 1995)

Theorem 2. Given a connected graph G = (N,E), partition its nodes into N- and N+ using the spectral bisection algorithm. Then N- is connected. If no entry v(2)(n) of the second eigenvector v(2) is zero, then N+ is also connected.
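As a concrete sketch of the algorithm the theorem refers to (assuming the usual graph-Laplacian formulation of spectral bisection, with numpy; the path graph is an illustrative choice): form the Laplacian, take the eigenvector v(2) for the second-smallest eigenvalue, and split the nodes by the signs of its entries.

```python
import numpy as np

# A small connected graph: the path 0-1-2-3-4-5.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

# Graph Laplacian; its eigenvector for the second-smallest eigenvalue is v(2).
lap = np.diag(adj.sum(axis=1)) - adj
vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
v2 = vecs[:, 1]

# Spectral bisection: split the nodes by the sign of the entries of v(2).
n_minus = [i for i in range(n) if v2[i] < 0]
n_plus = [i for i in range(n) if v2[i] >= 0]
print(n_minus, n_plus)
```

For the path graph the entries of v(2) are cosine-like, so the two halves are the two ends of the path, and both are connected, as the theorem promises.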

To prove this theorem, we need several other standard results from linear algebra, some of which we state without proof.

Definition. The spectral radius rho(A) of a matrix A is the largest absolute value of any eigenvalue:

      rho(A) = max_i | lambda_i (A) |

Definition. A nonnegative matrix A is a matrix all of whose entries are nonnegative. This is written A >= 0. A positive matrix A is a matrix all of whose entries are positive. We also refer to nonnegative and positive vectors, with similar notation.

Definition. The graph G(A) of an n-by-n matrix A is a graph with n nodes, and an edge e=(i,j) if and only if A(i,j) != 0.

Lemma 1. Let A be an n-by-n nonnegative matrix, and suppose G(A) is connected. Then sum_{m=0}^{n-1} A^m is a positive matrix.

Proof of Lemma 1. The (i,j) entry of A^m is a sum of many terms of the form

    A(i,k(1)) * A(k(1),k(2)) * A(k(2),k(3)) *...* A(k(m-2),k(m-1)) * A(k(m-1),j)
where the sum is over all n^(m-1) combinations 1 <= k(q) <= n, 1 <= q <= m-1. Each such term is nonnegative since A is nonnegative. Consider m=2. Then A(i,k)*A(k,j) will be positive if there is a path from i to j in G(A) of length 2, namely a path through k. Similarly, (A^m)(i,j) will be positive if there is a path of length m connecting i and j. If G(A) is connected, then there is a path of length at most n-1 connecting every pair of distinct nodes, and the m=0 term contributes the identity, so every entry of the sum is positive. QED
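Lemma 1 is easy to check numerically; a small sketch with numpy (the path graph here is just an illustrative choice of a connected G(A)):

```python
import numpy as np

# Adjacency matrix of the path graph 0-1-2-3: nonnegative, and G(A) is connected.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

# sum_{m=0}^{n-1} A^m; the m=0 term is the identity matrix.
S = sum(np.linalg.matrix_power(A, m) for m in range(n))
print((S > 0).all())
```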

Definition. A symmetric matrix with all nonnegative eigenvalues is called positive semidefinite. If the eigenvalues are all positive, it is called positive definite.

Lemma 2. If A is n-by-n and symmetric with eigenvalues lambda(1) <= ... <= lambda(n), then

       lambda(1) = min_{v != 0}  v'*A*v / v'*v
       lambda(n) = max_{v != 0}  v'*A*v / v'*v

Proof of Lemma 2. It follows simply from the eigendecomposition A = Q*Lambda*Q', where Q is an orthogonal matrix whose columns are eigenvectors, and Lambda = diag(lambda(1),...,lambda(n)), using the substitution

        v'*A*v / v'*v = v'*Q*Lambda*Q'*v / v'*Q*Q'*v = y'*Lambda*y / y'*y
                      = sum_{i=1}^n lambda(i)*y(i)^2 / sum_{i=1}^n y(i)^2
Details are left to the reader.
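The Rayleigh-quotient characterization in Lemma 2 can also be observed numerically; a sketch with numpy on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T  # a random symmetric matrix

vals, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order

# Rayleigh quotients v'*A*v / v'*v of random vectors lie in [lambda(1), lambda(n)].
V = rng.standard_normal((5, 1000))
rq = (V * (A @ V)).sum(axis=0) / (V * V).sum(axis=0)
print(vals[0] <= rq.min(), rq.max() <= vals[-1])

# The extremes are attained at the corresponding eigenvectors.
v = vecs[:, 0]
print(abs(v @ A @ v / (v @ v) - vals[0]) < 1e-9)
```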

Cauchy Interlace Theorem (R. Horn and C. Johnson, "Matrix Analysis", 1988). Let A be an n-by-n symmetric matrix with eigenvalues lambda(1) <= ... <= lambda(n). Let B = A(1:n-1,1:n-1), the leading (n-1)-by-(n-1) submatrix of A. Let the eigenvalues of B be mu(1) <= ... <= mu(n-1). Then for all i, lambda(i) <= mu(i) <= lambda(i+1). Applying this result recursively, we can show that if C = A(i:j, i:j) for any i and j, and the eigenvalues of C are xi(1) <= ... <= xi(j-i+1), then A has at least k eigenvalues <= xi(k). In particular lambda(1) <= xi(1).

Corollary to the Cauchy Interlace Theorem. Let the symmetric matrix A be positive (semi)definite. Then any submatrix C=A(i:j,i:j) is also positive (semi)definite.
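Interlacing is easy to observe numerically as well; a sketch with numpy on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B + B.T  # a random symmetric 6-by-6 matrix

lam = np.linalg.eigvalsh(A)         # lambda(1) <= ... <= lambda(6)
mu = np.linalg.eigvalsh(A[:5, :5])  # mu(1) <= ... <= mu(5) of the leading submatrix

# Interlacing: lambda(i) <= mu(i) <= lambda(i+1) for each i.
print(all(lam[i] <= mu[i] <= lam[i + 1] for i in range(5)))
```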

Lemma 3. If A is symmetric and positive (semi)definite, so is X'*A*X for any nonsingular matrix X.

Proof of Lemma 3. From Lemma 2, the smallest eigenvalue of X'*A*X is

       min_{v != 0} v'*X'*A*X*v / v'*v
     = min_{v != 0} ( v'*X'*A*X*v / v'*X'*X*v ) * ( v'*X'*X*v / v'*v )
    >= min_{v != 0} ( v'*X'*A*X*v / v'*X'*X*v ) * min_{v != 0} ( v'*X'*X*v / v'*v )
           (valid since both factors are nonnegative)
     = min_{w != 0} ( w'*A*w / w'*w ) * min_{v != 0} ( v'*X'*X*v / v'*v )
           (substituting w = X*v, which ranges over all nonzero vectors since X is nonsingular)
     = lambda(1)(A) * lambda(1)(X'*X)
Since v'*X'*X*v = (X*v)'*(X*v) is a sum of squares, it is nonnegative. Thus lambda(1)(X'*X) >= 0. Since X is nonsingular, so is X'*X, so it can't have a zero eigenvalue. Thus lambda(1)(X'*X) > 0. The result follows. QED
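A quick numerical check of Lemma 3 (a sketch; the random matrices are illustrative, and a random square matrix is nonsingular with probability 1):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T                      # symmetric positive semidefinite by construction
X = rng.standard_normal((4, 4))  # a random matrix is nonsingular with probability 1

C = X.T @ A @ X                  # the congruence X'*A*X from Lemma 3
print(np.linalg.eigvalsh(C).min() >= -1e-10)  # eigenvalues numerically nonnegative
```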

Lemma 4. If A is a symmetric matrix with rho(A) < 1, then I-A is invertible and

     inv(I-A) = sum_{i=0}^{infinity} A^i

Proof of Lemma 4. Since the eigenvalues of A are strictly between -1 and 1, the eigenvalues of I-A are strictly between 0 and 2, so I-A is positive definite and so nonsingular. Writing the eigendecomposition A = Q*Lambda*Q', we see that A^i = Q*Lambda^i*Q', so the entries of A^i go to zero geometrically, like rho(A)^i or faster. Thus sum_{i=0}^{infinity} A^i converges. Since

     (I-A) * sum_{i=0}^m A^i = I - A^{m+1}
it is easy to see that S(m) = sum_{i=0}^m A^i converges to inv(I-A), since (I-A)*S(m) converges to I. QED
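The Neumann series of Lemma 4 can be verified numerically; a sketch with numpy, rescaling a random symmetric matrix so that rho(A) < 1:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B + B.T
A *= 0.9 / np.abs(np.linalg.eigvalsh(A)).max()  # rescale so rho(A) = 0.9 < 1

# Partial sums S(m) = sum_{i=0}^m A^i should converge to inv(I - A).
target = np.linalg.inv(np.eye(4) - A)
S = np.zeros((4, 4))
P = np.eye(4)  # holds A^i
for _ in range(300):
    S += P
    P = P @ A
print(np.allclose(S, target))
```

The entries of A^i shrink like rho(A)^i = 0.9^i, so 300 terms are more than enough here.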

Partial proof of Theorem 2. (M. Fiedler, "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory", Czech. Math. J. 25:619--637, 1975.) We consider the special (but generic) case where v(2) is unique (modulo multiplication by a scalar) and v(2) has only nonzero entries. We will use proof by contradiction: Assume that N+ is not connected, and in fact consists of k connected components. Suppose for illustration that k=2 (the general case is no harder). Then we can renumber the rows and columns of A so that

             n1   n2     n3
           [ A11   0    A13 ] n1                  [ v1 ] n1
   A   =   [  0   A22   A23 ] n2  ,    v(2)   =   [ v2 ] n2
            [ A13' A23'  A33 ] n3                  [ v3 ] n3
where v1 > 0, v2 > 0 and v3 < 0. The two zero blocks in A occur because there are no edges connecting the first n1 nodes (the first connected component of N+) and the following n2 nodes (the second connected component of N+). Then A*v(2) = lambda(2)*v(2) implies
       A11*v1 + A13*v3 = lambda(2)*v1               (1)

Note that A13 <= 0, and v3 < 0, so each term in the product A13*v3 is nonnegative and thus A13*v3 >= 0. In fact A13*v3 is nonzero, since otherwise A13 would have to be zero, and so the first n1 nodes alone would form a connected component of G, contradicting our assumption that G is connected.

By the Corollary above, A11 is positive semidefinite since A is. Now let eps be any positive number. Then adding eps*v1 to both sides of (1) yields

     (eps*I + A11)*v1 + A13*v3 = (eps+lambda(2))*v1           (2)
The eigenvalues of eps*I + A11 are all at least eps, so eps*I + A11 is positive definite. Write eps*I + A11 = D - N, where D is diagonal, and N >= 0 is zero on the diagonal (-N holds all the offdiagonal entries of eps*I + A11). Now
    eps*I + A11 = D - N
                = Dh * ( I - inv(Dh)*N*inv(Dh) ) * Dh    
                      where Dh = D^(1/2) = diag(sqrt(D(1,1)),...,sqrt(D(n1,n1)))
                = Dh * (I-M) * Dh
                      where M = inv(Dh)*N*inv(Dh)
By Lemma 3, I-M is positive definite since D-N is positive definite and Dh is nonsingular. Since the eigenvalues of I-M are 1 minus the eigenvalues of M, the eigenvalues of M must be less than 1. All the eigenvalues of M must also be greater than -1, because by Lemma 2,
      lambda(1)(M) = min_{v != 0} v'*M*v / v'*v
                  >= min_{v != 0} -|v|'*M*|v| / v'*v
                            since M >= 0
                   = -max_{v != 0} |v|'*M*|v| / v'*v
                  >= -max_{v != 0} v'*M*v / v'*v
                   = -lambda(n1)(M)
                   > -1
Thus | lambda(j)(M) | <= rho(M) < 1 for all j. By Lemma 4,
     Y = inv(eps*I + A11) = inv(Dh) * inv(I-M) * inv(Dh)
                          = inv(Dh) * ( sum_{i=0}^infinity M^i ) * inv(Dh)
is nonnegative, since M and M^i are nonnegative. By Lemma 1, Y is positive.

Multiplying equation (2) by Y yields

     v1 + Y*A13*v3 = Y*(eps+lambda(2))*v1           
Multiplying by v1' yields
 
     v1'*v1 + v1'*Y*A13*v3 = (eps+lambda(2)) * v1'*Y*v1
so by Lemma 2
  (eps+lambda(2)) * lambda(n1)(Y) 
          = max_{v != 0} (eps+lambda(2)) * v'*Y*v / v'*v
         >= (eps+lambda(2))* v1'*Y*v1 / v1'*v1
          = (v1'*v1 + v1'*Y*A13*v3) / v1'*v1
          = 1 + v1'*Y*A13*v3 / v1'*v1
As stated above, A13*v3 >= 0 and is nonzero. Since Y>0, Y*A13*v3 > 0, and so v1'*Y*A13*v3 > 0. Thus
   (eps+lambda(2)) * lambda(n1)(Y) > 1
Since the eigenvalues of Y are positive and the reciprocals of the eigenvalues of eps*I + A11, we get
   (eps+lambda(2)) / lambda(1)(eps*I + A11) > 1
Since lambda(1)(eps*I + A11) = eps + lambda(1)(A11), we can rearrange to get
   lambda(1)(A11) < lambda(2)
The same logic applies to A22, so lambda(1)(A22) < lambda(2). Thus the leading (n1+n2)-by-(n1+n2) submatrix of A,
     [ A11   0  ]
     [  0   A22 ]
has two eigenvalues less than lambda(2). By the Cauchy Interlace Theorem, this means A has two eigenvalues less than lambda(2). But this contradicts the fact that lambda(2) is the second smallest eigenvalue of A. This contradiction proves the theorem. QED