CS267: Notes for Lecture 23(b), Apr 9, 1996

Spectral partitioning - Statement and Proof of Theorem 2

Theorem 2. Given a connected graph G = (N,E), partition its nodes into N- and N+ using the spectral bisection algorithm. Then N- is connected. If no component of the second eigenvector v2 is zero, then N+ is also connected.
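
For concreteness, here is a small numpy sketch of the spectral bisection step the theorem refers to: form the Laplacian of G, compute an eigenvector v2 for its second smallest eigenvalue, and split the nodes by the sign of their v2 entries. The function name, the convention that zero entries go to N+, and the path graph at the end are illustrative choices, not part of the notes.

    import numpy as np

    def spectral_bisection(adj):
        # adj: symmetric 0/1 adjacency matrix of a connected graph G = (N,E)
        lap = np.diag(adj.sum(axis=1)) - adj    # Laplacian = degree matrix - adjacency
        vals, vecs = np.linalg.eigh(lap)        # eigenvalues in increasing order
        v2 = vecs[:, 1]                         # eigenvector for the second smallest eigenvalue
        n_minus = np.where(v2 < 0)[0]           # N-: nodes with negative v2 entries
        n_plus  = np.where(v2 >= 0)[0]          # N+: the remaining nodes
        return n_minus, n_plus

    # Example: a path graph on 6 nodes is split into its two halves.
    adj = np.zeros((6, 6))
    for i in range(5):
        adj[i, i+1] = adj[i+1, i] = 1
    print(spectral_bisection(adj))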

To prove this theorem, we need several other standard results from linear algebra, some of which we state without proof.

Definition. The spectral radius rho(A) of a matrix A is the largest absolute value of any eigenvalue:

      rho(A) = max_i | lambda_i(A) |

Definition. A nonnegative matrix A is a matrix all of whose entries are nonnegative. This is written A >= 0. A positive matrix A is a matrix all of whose entries are positive, written A>0. We also refer to nonnegative and positive vectors, with similar notation.

Definition. The graph G(A) of an n-by-n matrix A is a graph with n nodes, and an edge e=(i,j) if and only if A(i,j) != 0.
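
These definitions are easy to state in numpy terms; the 3-by-3 matrix below is an arbitrary example.

    import numpy as np

    A = np.array([[0., 2., 0.],
                  [2., 0., 1.],
                  [0., 1., 3.]])

    # Spectral radius: largest absolute value of an eigenvalue.
    rho = np.abs(np.linalg.eigvals(A)).max()

    # A is nonnegative (A >= 0) but not positive (A > 0), since some entries are zero.
    print((A >= 0).all(), (A > 0).all())

    # Edges of G(A): one edge (i,j) for each nonzero entry A(i,j).
    edges = [(i, j) for i in range(3) for j in range(3) if A[i, j] != 0]
    print(rho, edges)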

Lemma 1. Let A be an n-by-n nonnegative matrix, and suppose G(A) is connected. Then sum_{m=0,...,n-1} A^m is a positive matrix.

Proof of Lemma 1. The (i,j) entry of A^m is a sum of many terms of the form

    A(i,k_1) * A(k_1,k_2) * A(k_2,k_3) * ... * A(k_{m-2},k_{m-1}) * A(k_{m-1},j)
where the sum is over all n^{m-1} combinations 1 <= k_q <= n, 1 <= q <= m-1. Each such term is nonnegative since A is nonnegative. Consider m=2. Then A(i,k)*A(k,j) will be positive if there is a path from i to j in G(A) of length 2, namely a path through k. Similarly, (A^m)(i,j) will be positive if there is a path of length m connecting i and j. If G(A) is connected, then there is a path of length at most n-1 connecting every pair of distinct nodes, and the m=0 term A^0 = I makes the diagonal entries of the sum positive. The statement of the lemma follows. QED
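
Lemma 1 is easy to check numerically. Here is a small sketch using the adjacency matrix of a 5-cycle, an arbitrary choice of nonnegative matrix with connected G(A):

    import numpy as np

    # Adjacency matrix of a connected 5-cycle (nonnegative, G(A) connected).
    n = 5
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i+1) % n] = A[(i+1) % n, i] = 1

    # S = sum_{m=0,...,n-1} A^m, accumulated by repeated multiplication.
    S = np.zeros((n, n))
    P = np.eye(n)              # P = A^m, starting with A^0 = I
    for m in range(n):
        S += P
        P = P @ A
    print((S > 0).all())       # True: every entry of S is positive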

Definition. A symmetric matrix with all nonnegative eigenvalues is called positive semidefinite. If the eigenvalues are all positive, it is called positive definite.

Lemma 2. If A is n-by-n and symmetric with eigenvalues lambda_1 <= ... <= lambda_n, then

       lambda_1 = min_{v!=0}  v'*A*v / v'*v
       lambda_n = max_{v!=0}  v'*A*v / v'*v

Proof of Lemma 2. It follows simply from the eigendecomposition A = Q*Lambda*Q', where Q is an orthogonal matrix whose columns are eigenvectors, and Lambda = diag(lambda_1,...,lambda_n), using the substitution y = Q'*v:

   v'*A*v / v'*v 
         = v'*Q*Lambda*Q'*v / v'*Q*Q'*v 
         = y'*Lambda*y / y'*y
           sum_{i=1,...,n} lambda_i*y(i)^2 
         = -------------------------------
               sum_{i=1,...,n} y(i)^2
The last expression is a weighted average of the eigenvalues lambda_i, so it lies between lambda_1 and lambda_n, and choosing v to be the first or last eigenvector attains the bounds.
Details are left to the reader.
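
As a numerical illustration of Lemma 2 (the random symmetric matrix below is an arbitrary choice), the Rayleigh quotient of any nonzero v lies between lambda_1 and lambda_n, and the extreme eigenvectors attain the bounds:

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2                          # random symmetric matrix
    vals, vecs = np.linalg.eigh(A)             # lambda_1 <= ... <= lambda_n

    def rayleigh(A, v):
        return v @ A @ v / (v @ v)

    # Every nonzero v gives a quotient in [lambda_1, lambda_n] ...
    for _ in range(1000):
        v = rng.standard_normal(6)
        assert vals[0] - 1e-12 <= rayleigh(A, v) <= vals[-1] + 1e-12
    # ... and the extreme eigenvectors attain the bounds.
    print(np.isclose(rayleigh(A, vecs[:, 0]), vals[0]),
          np.isclose(rayleigh(A, vecs[:, -1]), vals[-1]))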

Cauchy Interlace Theorem (R. Horn and C. Johnson, "Matrix Analysis", 1988). Let A be an n-by-n symmetric matrix with eigenvalues lambda_1 <= ... <= lambda_n. Let B = A(1:n-1,1:n-1), the leading (n-1)-by-(n-1) submatrix of A. Let the eigenvalues of B be mu_1 <= ... <= mu_{n-1}. Then for all i, lambda_i <= mu_i <= lambda_{i+1}. Applying this result recursively, we can show that if C = A(i:j,i:j) for any i and j, and the eigenvalues of C are chi_1 <= ... <= chi_{j-i+1}, then A has at least k eigenvalues <= chi_k. In particular lambda_1 <= chi_1.
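
The interlacing inequalities can be checked numerically; here is a sketch with a random symmetric matrix (an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((7, 7))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)                # lambda_1 <= ... <= lambda_n
    mu  = np.linalg.eigvalsh(A[:-1, :-1])      # eigenvalues of B = A(1:n-1,1:n-1)

    # Cauchy interlacing: lambda_i <= mu_i <= lambda_{i+1} for i = 1,...,n-1.
    print(np.all(lam[:-1] <= mu + 1e-12) and np.all(mu <= lam[1:] + 1e-12))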

Corollary to the Cauchy Interlace Theorem. Let the symmetric matrix A be positive (semi)definite. Then any submatrix C=A(i:j,i:j) is also positive (semi)definite.

Lemma 3. If A is symmetric and positive (semi)definite, so is X'*A*X for any nonsingular matrix X.

Proof of Lemma 3. From Lemma 2, the smallest eigenvalue of X'*A*X is

       min_{v!=0} v'*X'*A*X*v / v'*v
     = min_{v!=0} ( v'*X'*A*X*v / v'*X'*X*v ) * 
       ( v'*X'*X*v / v'*v )
    >= min_{v!=0} ( v'*X'*A*X*v / v'*X'*X*v ) * 
       min_{v!=0} ( v'*X'*X*v / v'*v )
     = min_{X*v!=0} ( (X*v)'*A*(X*v) / (X*v)'*(X*v) ) * 
       min_{v!=0} ( v'*X'*X*v / v'*v )
     = lambda_1(A) * lambda_1(X'*X)
The inequality holds because both factors are nonnegative, and the next equality holds because X is nonsingular: v != 0 exactly when X*v != 0, so X*v ranges over all nonzero vectors and the first factor becomes the minimized Rayleigh quotient of A. Since v'*X'*X*v = (X*v)'*(X*v) is a sum of squares, it is nonnegative. Thus lambda_1(X'*X) >= 0. Since X is nonsingular, so is X'*X, so it can't have a zero eigenvalue. Thus lambda_1(X'*X) > 0. The result follows. QED
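
A quick numerical sanity check of Lemma 3 (the random A and X below are arbitrary choices): congruence by a nonsingular X changes the eigenvalues but preserves positive definiteness.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((5, 5))
    A = B.T @ B + np.eye(5)                    # symmetric positive definite
    X = rng.standard_normal((5, 5))            # generically nonsingular
    C = X.T @ A @ X                            # congruent to A

    print(np.linalg.eigvalsh(A).min() > 0,     # A is positive definite
          np.linalg.eigvalsh(C).min() > 0)     # so is X'*A*X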

Lemma 4. If A is a symmetric matrix with rho(A) < 1, then I-A is invertible and

     (I-A)^{-1} = sum_{i=0,...,infinity} A^i

Proof of Lemma 4. Since the eigenvalues of A are strictly between -1 and 1, the eigenvalues of I-A are strictly between 0 and 2, so I-A is positive definite and so nonsingular. Writing the eigendecomposition A = Q*Lambda*Q', we see that A^i = Q*Lambda^i*Q', so the entries of A^i go to zero geometrically, like rho(A)^i or faster. Thus sum_{i=0,...,infinity} A^i converges. Since

     (I-A) * sum_{i=0,...,m} A^i = I - A^{m+1}
it is easy to see that S(m) = sum_{i=0,...,m} A^i converges to (I-A)^{-1}, since (I-A)*S(m) converges to I. QED
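
A short numerical check of Lemma 4 (the random matrix, the scaling to rho(A) = 0.5, and the truncation length 50 are arbitrary choices): the partial sums of the series approach (I-A)^{-1}.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2
    A *= 0.5 / np.abs(np.linalg.eigvalsh(A)).max()   # scale so rho(A) = 0.5 < 1

    S = np.zeros((4, 4))
    P = np.eye(4)                                    # P = A^i
    for i in range(50):                              # truncated Neumann series
        S += P
        P = P @ A
    print(np.allclose(S, np.linalg.inv(np.eye(4) - A)))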

Partial proof of Theorem 2. (M. Fiedler, "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory", Czech. Math. J. 25:619--637, 1975.) We consider the special (but generic) case where v2 is unique (modulo multiplication by a scalar) and v2 has only nonzero entries. We will use proof by contradiction: Assume that N+ is not connected, and in fact consists of k connected components. Suppose for illustration that k=2 (the general case is no harder). Then we can renumber the rows and columns of A, the Laplacian matrix of G, so that

             n1   n2     n3
           [ A11   0    A13 ] n1              [ v1 ] n1
   A   =   [  0   A22   A23 ] n2  ,  v2   =   [ v2 ] n2
            [ A13' A23'  A33 ] n3              [ v3 ] n3
where v1 > 0, v2 > 0 and v3 < 0. The two zero blocks in A occur because there are no edges connecting the first n1 nodes (the first connected component of N+) and the following n2 nodes (the second connected component of N+). Then A*v2 = lambda_2*v2 implies
       A11*v1 + A13*v3 = lambda_2*v1              (1)

Note that A13 <= 0 (its entries are offdiagonal entries of the Laplacian A), and v3 < 0, so each term in the product A13*v3 is nonnegative and thus A13*v3 >= 0. In fact A13*v3 is nonzero, since otherwise A13 would have to be zero, and so the first n1 nodes alone would form a connected component of G, contradicting our assumption that G is connected.

By the Corollary to the Cauchy Interlace Theorem above, A11 is positive semidefinite since A is. Now let eps be any positive number. Then adding eps*v1 to both sides of (1) yields

   (eps*I + A11)*v1 + A13*v3 = (eps+lambda_2)*v1    (2)
The eigenvalues of eps*I + A11 are all at least eps, so eps*I + A11 is positive definite. Write eps*I + A11 = D - N, where D is diagonal and N is zero on the diagonal (-N holds all the offdiagonal entries of eps*I + A11); N >= 0 because the offdiagonal entries of A11, like those of A, are nonpositive. Then
    eps*I + A11 
       = D - N
       = Dh * ( I - Dh^{-1}*N*Dh^{-1} ) * Dh    
       = Dh * (I-M) * Dh
where 
    Dh = D^{1/2} = diag(sqrt(D(1,1)),...,sqrt(D(n1,n1)))
and
     M = Dh^{-1}*N*Dh^{-1}
By Lemma 3, I-M is positive definite since D-N is positive definite and Dh is nonsingular. Since the eigenvalues of I-M are 1 minus the eigenvalues of M, the eigenvalues of M must be less than 1. All the eigenvalues of M must also be greater than -1, because by Lemma 2,
        lambda_1(M) = min_{v!=0} v'*M*v / v'*v
                   >= min_{v!=0} -|v|'*M*|v| / v'*v      (since M >= 0)
                    = -max_{v!=0} |v|'*M*|v| / v'*v
                   >= -max_{v!=0} v'*M*v / v'*v
                    = -lambda_{n1}(M)
                    > -1
Thus | lambda_j(M) | < 1 for all j, i.e. rho(M) < 1. By Lemma 4,
     Y = (eps*I + A11)^{-1}
       = Dh^{-1} * (I-M)^{-1} * Dh^{-1}
       = Dh^{-1} * ( sum_{i=0,...,infinity} M^i ) * Dh^{-1}
is nonnegative, since M and all its powers M^i are nonnegative. In fact Y is positive: G(M) is the graph of the first connected component of N+, which is connected, so by Lemma 1 the partial sum sum_{i=0,...,n1-1} M^i is already positive, and multiplying by the positive diagonal matrix Dh^{-1} on both sides preserves positivity.
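
This construction can be checked on a small example. Here A11 is taken to be the Laplacian of a path on 4 nodes and eps = 0.1; both are arbitrary illustrative choices, but they have the properties the proof uses (A11 positive semidefinite with nonpositive offdiagonal entries and connected G(A11)).

    import numpy as np

    # Laplacian of a path on 4 nodes plays the role of A11.
    A11 = np.array([[ 1., -1.,  0.,  0.],
                    [-1.,  2., -1.,  0.],
                    [ 0., -1.,  2., -1.],
                    [ 0.,  0., -1.,  1.]])
    eps = 0.1

    epsA = eps * np.eye(4) + A11          # eps*I + A11 = D - N
    D  = np.diag(np.diag(epsA))           # diagonal part
    N  = D - epsA                         # N >= 0, zero on the diagonal
    Dh = np.sqrt(D)                       # Dh = D^{1/2}
    M  = np.linalg.inv(Dh) @ N @ np.linalg.inv(Dh)

    print(np.abs(np.linalg.eigvalsh(M)).max() < 1)   # rho(M) < 1
    Y = np.linalg.inv(epsA)                          # Y = (eps*I + A11)^{-1}
    print((Y > 0).all())                             # Y is entrywise positive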

Multiplying equation (2) by Y yields

     v1 + Y*A13*v3 = (eps+lambda_2)*Y*v1
Multiplying by v1' yields
 
     v1'*v1 + v1'*Y*A13*v3 = (eps+lambda_2) * v1'*Y*v1
so by Lemma 2
  (eps+lambda_2) * lambda_{n1}(Y) 
          = max_{v!=0} (eps+lambda_2) * v'*Y*v / v'*v
         >= (eps+lambda_2) * v1'*Y*v1 / v1'*v1
          = (v1'*v1 + v1'*Y*A13*v3) / v1'*v1
          = 1 + v1'*Y*A13*v3 / v1'*v1
As stated above, A13*v3 >= 0 and is nonzero. Since Y>0, Y*A13*v3 > 0, and so v1'*Y*A13*v3 > 0. Thus
   (eps+lambda_2) * lambda_{n1}(Y) > 1
Since the eigenvalues of Y are positive and are the reciprocals of the eigenvalues of eps*I + A11, we get
   (eps+lambda_2) / lambda_1(eps*I + A11) > 1
Since lambda_1(eps*I + A11) = eps + lambda_1(A11), we can rearrange to get
   lambda_1(A11) < lambda_2
The same logic applies to A22, so lambda_1(A22) < lambda_2. Thus, the leading (n1+n2)-by-(n1+n2) submatrix of A,
     [ A11   0  ]
     [  0   A22 ]
has two eigenvalues less than lambda_2. By the Cauchy Interlace Theorem, this means A has at least two eigenvalues less than lambda_2. But this contradicts the fact that lambda_2 is the second smallest eigenvalue of A. This contradiction proves the theorem. QED
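
Finally, the statement of Theorem 2 itself is easy to test numerically. The sketch below uses an arbitrary small graph (two triangles joined by an edge), computes the Fiedler vector, splits the nodes by sign, and checks that both halves induce connected subgraphs, using the criterion of Lemma 1 as the connectivity test.

    import numpy as np

    def is_connected(adj):
        # Lemma 1 criterion: sum_{m=0,...,n-1} adj^m is positive iff G(adj) is connected.
        n = len(adj)
        S, P = np.zeros((n, n)), np.eye(n)
        for _ in range(n):
            S += P
            P = P @ adj
        return (S > 0).all()

    # A small connected graph: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    n = 6
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1

    lap = np.diag(adj.sum(axis=1)) - adj
    v2 = np.linalg.eigh(lap)[1][:, 1]                  # Fiedler vector
    n_minus = np.where(v2 < 0)[0]
    n_plus  = np.where(v2 >= 0)[0]

    # Theorem 2: both halves induce connected subgraphs (no v2 entry is zero here).
    print(is_connected(adj[np.ix_(n_minus, n_minus)]),
          is_connected(adj[np.ix_(n_plus, n_plus)]))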