Math 110 - Fall 05 - Lecture notes # 20 - Oct 14 (Friday)

We will cover the contents of Chapter 3 slightly differently than the
textbook, showing how to compute
    range(A), rank(A), nullspace(A), nullity(A), A^{-1} (if it exists)
    solve Ax=b (determine if there are no solutions, 1 solution,
                or infinitely many solutions)
all by computing a single "matrix factorization" of A, called the
"LU factorization" or "LU decomposition", which is a formula for A as a
product of much simpler matrices. The algorithm for computing this
factorization is just Gaussian elimination, where we keep track of all the
elementary matrix operations we do:
    reordering rows and/or columns
    adding a multiple of one row to another
    multiplying a row or column by a scalar
as a product of simple matrices. When you use computer software to do matrix
operations, it will return your answer as a product of simpler matrices much
like the ones we show here.

LU decomposition plays an important role in government high technology
policy: the largest computers in the world (called "supercomputers") are
regularly measured by how fast they can solve Ax=b using LU decomposition on
very large matrices, with the ratings appearing on the web site
www.top500.org. This web site keeps track of the 500 fastest computers
anywhere in the world (at least those not at secret government sites). For
example, as of June 2005, the top computer was an IBM Blue Gene/L, consisting
of 65,536 parallel processors. The machine solved a linear system of
n = 1,277,951 equations using the LU decomposition algorithm described below
at a speed of 136.8 Teraflops, that is 136.8 trillion = 136.8 x 10^12
arithmetic operations per second. Since the number of arithmetic operations
required by the algorithm will turn out to be about 2/3*n^3, the time
required to solve this single linear system was
    (2/3)*(1,277,951)^3 / 136.8x10^12 = 10161 seconds ~ 2.8 hours
Recall that multiplying 2 matrices takes about 2*n^3 operations, 3 times more
operations.

A few years ago the fastest machine on the Top 500 list was the NEC "Earth
Simulator", which ran at a speed of 35.9 Teraflops, as fast as the next four
machines combined. This caught the attention of the parts of the US
government responsible for national security, because NEC is a Japanese
company, and one national security goal of the US government is to have
better technology, including faster computers, than anyone else in the world.
This is because these computers are used for breaking secret codes, designing
weapons, and other national security purposes. Indeed, the largest computers
in the US have been built for these purposes for decades. The speed of the
NEC machine led to a flurry of government studies and reports to analyze the
situation, for example www.sc.doe.gov/ascr.FOSCfinalreport.pdf. More aspects
of implementing linear algebra algorithms on parallel machines are discussed
in the course CS267, see www.cs.berkeley.edu/~demmel/cs267_Spr05.

One of the most widely used software packages for solving linear algebra
problems is called LAPACK (its parallel version is ScaLAPACK); for example it
is used within Matlab. This software was produced in a joint project between
UC Berkeley, the University of Tennessee, and other groups. We have recently
been funded to produce a new version. See www.netlib.org/lapack for the
current software.
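As a quick check of the 2.8-hour estimate above, here is a small Python
sketch; the operation count 2/3*n^3 is derived at the end of these notes, and
the machine figures are the ones quoted above.

    # Rough solve-time estimate from the Top 500 figures quoted above.
    n = 1_277_951                 # number of equations in the benchmark system
    flops = (2.0 / 3.0) * n**3    # approximate operation count of LU decomposition
    rate = 136.8e12               # 136.8 Teraflops = 136.8 x 10^12 operations/second
    seconds = flops / rate
    print(round(seconds), round(seconds / 3600, 1))   # about 10^4 seconds ~ 2.8 hours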
Ex: Suppose L is square and lower triangular:
        L = [ 1  0  0 ]
            [ 1  2  0 ]
            [ 3 -1  3 ]
    Then we can solve Lx = b for x given any b by "substitution":
        Solve 1*x_1                 = b_1  for x_1
        Solve 1*x_1 + 2*x_2         = b_2  for x_2
        Solve 3*x_1 - 1*x_2 + 3*x_3 = b_3  for x_3
    As long as each L_ii is nonzero, so we can divide by it, we can solve any
    Lx=b for x and get a unique answer. In particular, the only solution of
    Lx = 0 is x=0, so that L is one-to-one. This implies that L is invertible.
    We can compute L^{-1} by solving for it one column at a time:
    Let X = L^{-1}, so L*X = I, and e_j = j-th standard basis vector, so
        L*X*e_j = I*e_j, or L*(X*e_j) = e_j, or L*(column j of X) = e_j
    so we can solve for X = L^{-1} one column at a time.

Ex: Suppose U is square, upper triangular, and each U_ii is nonzero. Then we
    can similarly solve any U*x=b for a unique x given b by substitution,
    solving first for x_n, then x_{n-1}, and so on. We can also compute
    U^{-1} as above.

Ex: Suppose A = L*U, where L and U are n by n and nonsingular. Then A is
    invertible (why?), and A^{-1} = (L*U)^{-1} = U^{-1} * L^{-1}, and solving
    A*x = b for x given b means computing
        x = A^{-1}*b = (U^{-1} * L^{-1})*b
          = U^{-1} * (L^{-1}*b)   ... by associativity
          = U^{-1} * y            ... where we compute y = L^{-1}*b,
                                      i.e. solve L*y=b by substitution
    where we finally compute x = U^{-1}*y, i.e. solve U*x = y by substitution
    again. In other words, if we know A = L*U where L and U are invertible
    triangular matrices, we can solve A*x = b by 2 steps (as in the Python
    sketch below):
        (1) solve L*y = b by substitution
        (2) solve U*x = y by substitution
    If we want to compute A^{-1} we use the same idea as above, solving
    A*X = I for X one column of X at a time.

Stop&Ask: Suppose A = U*L where U and L are n by n and invertible.
          How do we solve A*x=b?

It may seem strange to expect to know A in the form A = L*U, but this is in
fact exactly what Gaussian elimination applied to A computes.

Ex: A = [ 3 -1   2 ]
        [ 6  0  10 ]
        [-3 -3 -17 ]
    To apply Gaussian elimination, we do the following:
        for i = 1 to n-1  (3-1 = 2 in this case)
            subtract multiples of row i from rows i+1 through n,
            in order to zero out A below the diagonal
    Subtract  2 * row 1 from row 2,
    Subtract -1 * row 1 from row 3, yielding
        A'  = [ 3 -1   2 ; 0  2   6 ; 0 -4 -15 ]
    Subtract -2 * row 2 from row 3, yielding
        A'' = [ 3 -1   2 ; 0  2   6 ; 0  0  -3 ]
    Note that A'' = U is upper triangular. Let's take the multipliers we
    computed and put them into a lower triangular matrix L:
        L = [ 1  0  0 ]
            [ 2  1  0 ]
            [-1 -2  1 ]
    Then we can confirm that L*U = A. We will see that this is true in
    general, as long as we never get a 0 on the diagonal.

ASK & WAIT: What happens if A_11 = 0? What happens at any point if some
            diagonal entry = 0?

If A is m-by-n, we will only get a zero on the diagonal when
rank(A) < min(m,n), as we will see. First we will see what happens assuming
this is not the case, and then show the most general case.
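Here is a short Python sketch (using numpy) of the two-step substitution
solve described above, applied to the worked 3 by 3 example; the helper names
forward_sub and back_sub are just illustrative choices, not standard library
routines.

    import numpy as np

    def forward_sub(L, b):
        """Solve L*y = b for y, L lower triangular with nonzero diagonal."""
        n = L.shape[0]
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_sub(U, y):
        """Solve U*x = y for x, U upper triangular with nonzero diagonal."""
        n = U.shape[0]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    # The worked example above: A = L*U
    L = np.array([[1., 0., 0.], [2., 1., 0.], [-1., -2., 1.]])
    U = np.array([[3., -1., 2.], [0., 2., 6.], [0., 0., -3.]])
    A = L @ U                            # recovers [[3,-1,2],[6,0,10],[-3,-3,-17]]

    b = np.array([1., 2., 3.])
    x = back_sub(U, forward_sub(L, b))   # steps (1) and (2): solve A*x = b
    print(np.allclose(A @ x, b))         # True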
We will use induction. To do so, we need some basic facts about "block
matrices", which are a useful notation to keep track of what parts of the
matrix we're working on:

Def: Let A be a matrix with n = n1 + n2 rows and m = m1 + m2 columns. Then we
can write A as a "block matrix"
    A = [ A_11  A_12 ]
        [ A_21  A_22 ]
where A_11 and A_21 have m1 columns, A_12 and A_22 have m2 columns, A_11 and
A_12 have n1 rows, and A_21 and A_22 have n2 rows.

We sometimes write the dimensions around the matrix to indicate this:
          m1     m2
    A = [ A_11  A_12 ]  n1
        [ A_21  A_22 ]  n2

Here is perhaps the most useful fact about block matrices, that when we
multiply them, we can just multiply and add the blocks, as though they were
scalars:

Lemma 1: Let A and B be block matrices, where A's column block dimensions
m1 and m2 match B's row block dimensions:
          m1     m2                       p1     p2
    A = [ A_11  A_12 ]  n1        B = [ B_11  B_12 ]  m1
        [ A_21  A_22 ]  n2            [ B_21  B_22 ]  m2
This means that matrix products like A_11*B_11 are defined, and in fact A*B
can be written as the block matrix
                  p1                         p2
    A*B = [ A_11*B_11 + A_12*B_21    A_11*B_12 + A_12*B_22 ]  n1
          [ A_21*B_11 + A_22*B_21    A_21*B_12 + A_22*B_22 ]  n2
Note that this is the usual formula for multiplying 2 by 2 matrices, when all
the blocks A_ij and B_ij are 1 by 1, i.e. scalars.

Proof: (Picture) When (i,j) is in the top left block, we see
    (A*B)_ij = sum_{k=1 to m} A_ik*B_kj
             = sum_{k=1 to m1} A_ik*B_kj + sum_{k=m1+1 to m} A_ik*B_kj
             = sum_{k=1 to m1} (A_11)_ik*(B_11)_kj
               + sum_{k=1 to m2} (A_12)_ik*(B_21)_kj
             = (A_11*B_11)_ij + (A_12*B_21)_ij
or: the leading n1 by p1 block of A*B equals A_11*B_11 + A_12*B_21.
The other blocks of A*B are similar.

Corollary 1: Let
           m1     m2
    L = [ I^m1    0   ]  m1
        [ L_21   I^m2 ]  m2
be a lower triangular matrix, where I^m1 is an m1 by m1 identity matrix, I^m2
is an m2 by m2 identity matrix, and 0 is an m1 by m2 zero matrix. Then
                m1     m2
    L^{-1} = [ I^m1    0   ]  m1
             [-L_21   I^m2 ]  m2

Proof: We just confirm L*L^{-1} = I^m = m by m identity matrix. The
dimensions of the blocks match, letting us use Lemma 1 to compute
    [ I^m1    0   ] * [ I^m1    0   ]
    [ L_21   I^m2 ]   [-L_21   I^m2 ]
        = [ I^m1*I^m1 + 0*(-L_21)       I^m1*0 + 0*I^m2    ]
          [ L_21*I^m1 + I^m2*(-L_21)    L_21*0 + I^m2*I^m2 ]
        = [ I^m1 + 0        0 + 0    ]
          [ L_21 - L_21     0 + I^m2 ]
        = [ I^m1    0   ]
          [  0     I^m2 ]
        = I^m

Now we can show how to compute a factorization A = LU assuming we never get a
zero on the diagonal. We present it for A rectangular.

Thm 1 (Gaussian Elimination, assuming no 0s appear on the diagonal):
Let A be m by n. Then there is a lower triangular matrix L and an upper
triangular matrix U such that A = L*U:
    (1) When A is square (m=n), then L and U are also n by n
    (2) When A has more rows than columns (m>n), then L is m by n and U is
        n by n (picture)
    (3) When A has more columns than rows (m<n), then L is m by m and U is
        m by n (picture)
In all cases L can be chosen with ones on its diagonal.

Proof: We use induction on min(m,n). Base cases, when min(m,n) = 1:
    (1) When m = n = 1, so A = [a_11] with a_11 neq 0, pick L = [1] and
        U = [a_11]
    (2) When m > n = 1, so A = [a_11]  with a_11 neq 0,
                               [a_21]
                               [ ... ]
                               [a_m1]
        pick L = [ 1         ]   and U = a_11
                 [ a_21/a_11 ]
                 [ ...       ]
                 [ a_m1/a_11 ]
    (3) When m = 1 < n, so A = [a_11 a_12 ... a_1n] with a_11 neq 0, pick
        L = 1 and U = A

Now we do induction. We assume the theorem is true for m-1 by n-1 matrices,
and prove it is true for an m by n matrix A. By assumption a_11 neq 0. We
write A as a block matrix (note that A_11 is just a_11):
           1     n-1
    A = [ A_11  A_12 ]  1
        [ A_21  A_22 ]  m-1
Subtracting a multiple of row 1 from row i to zero out a_i1 means the
multiplier must be a_i1/a_11, since this changes the (i,1) entry from a_i1 to
a_i1 - (a_i1/a_11)*a_11 = 0. We express this by premultiplying A by
                                                    1           m-1
    L' = [  1           0  0  0 ... 0 ]  =  [  1          0       ]  1
         [ -a_21/a_11   1  0  0 ... 0 ]     [ -A_21/A_11  I^{m-1} ]  m-1
         [ -a_31/a_11   0  1  0 ... 0 ]
         [    ...                     ]
         [ -a_m1/a_11   0  ...  0   1 ]
where -A_21/A_11 is m-1 by 1, and 0 is 1 by m-1. Lemma 1 lets us multiply
    L' * A = [  1           0       ] * [ A_11  A_12 ]
             [ -A_21/A_11   I^{m-1} ]   [ A_21  A_22 ]
           = [ 1*A_11 + 0*A_21                    1*A_12 + 0*A_22                  ]
             [ (-A_21/A_11)*A_11 + I^{m-1}*A_21   (-A_21/A_11)*A_12 + I^{m-1}*A_22 ]
           = [ A_11    A_12                  ]
             [  0      A_22 - A_21*A_12/A_11 ]
confirming we have zeroed out the first column beneath the diagonal as
desired.
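As a numerical illustration of this elimination step (a small sketch in
Python/numpy, using the 3 by 3 example from earlier), forming L' explicitly
and multiplying confirms the block formula, and also checks Corollary 1:

    import numpy as np

    A = np.array([[ 3., -1.,   2.],
                  [ 6.,  0.,  10.],
                  [-3., -3., -17.]])

    # Multipliers X = A_21/A_11, and L' = [1 0; -X I] as in the proof above.
    X = A[1:, 0] / A[0, 0]
    Lp = np.eye(3)
    Lp[1:, 0] = -X

    LpA = Lp @ A
    print(LpA)                  # first column is zero below the (1,1) entry

    # The (2,2) block of L'*A equals A_22 - A_21*A_12/A_11:
    print(np.allclose(LpA[1:, 1:], A[1:, 1:] - np.outer(X, A[0, 1:])))   # True

    # Corollary 1: flipping the sign of the off-diagonal block inverts L'.
    Lp_inv = np.eye(3)
    Lp_inv[1:, 0] = X
    print(np.allclose(Lp @ Lp_inv, np.eye(3)))                           # True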
Another way to see what it means to multiply L'*A is to look at it row by
row:
    row j of L'*A = e_j^t *(L'*A)    ... e_j = j-th standard basis vector
                  = (e_j^t*L')*A     ... by associativity
                  = (row j of L')*A
When j=1, row j of L' = e_1^t, so row 1 of L'*A = row 1 of A.
When j=2, row j of L' = [-a_21/a_11, 1, 0, ..., 0], so
    row 2 of L'*A = (-a_21/a_11)*(row 1 of A) + (1)*(row 2 of A)
Similarly, for other j,
    row j of L'*A = (-a_j1/a_11)*(row 1 of A) + (row j of A)

The quantity S = A_22 - A_21 * A_12 / A_11 is called a "Schur complement".
Note that L'*A is closer to upper triangular, because it is zero below the
first diagonal entry. Now S is m-1 by n-1, so we can apply induction to write
S = L_S * U_S (where we again assume no zeros appear on the diagonal):
                1     n-1
    L' * A = [ A_11  A_12 ]  1      =  [ A_11  A_12    ]
             [  0     S   ]  m-1       [  0    L_S*U_S ]
           = [ 1   0  ] * [ A_11  A_12 ]
             [ 0  L_S ]   [  0    U_S  ]
Note that the second matrix above is upper triangular; we will call it U.
Then
    A = L'^{-1} * [ 1   0  ] * U
                  [ 0  L_S ]
      = [ 1   0      ]^{-1} * [ 1   0  ] * U    ... where X = -A_21/A_11
        [ X  I^{m-1} ]        [ 0  L_S ]            for short
      = [ 1   0      ] * [ 1   0  ] * U         ... by Corollary 1
        [-X  I^{m-1} ]   [ 0  L_S ]
      = [ 1*1 + 0*0            1*0 + 0*L_S        ] * U
        [-X*1 + I^{m-1}*0     -X*0 + I^{m-1}*L_S  ]
      = [ 1   0  ] * U
        [-X  L_S ]
      = L*U
where
    L = [ 1   0  ]
        [-X  L_S ]
is lower triangular with ones on the diagonal as desired, and with the
multipliers from Gaussian elimination below the diagonal.
This completes the proof of Thm 1.
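The induction in the proof of Thm 1 translates directly into a recursive
algorithm. Here is a minimal Python sketch (assuming numpy, square A, no
pivoting, and that no zero pivot is ever encountered), written for clarity
rather than efficiency:

    import numpy as np

    def lu_no_pivot(A):
        """Recursive LU factorization following the proof of Thm 1.
        Assumes A is square and no zero pivot is ever encountered."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        if n == 1:                        # base case: 1 by 1
            return np.array([[1.0]]), A.copy()
        X = A[1:, 0] / A[0, 0]            # multipliers A_21/A_11
        S = A[1:, 1:] - np.outer(X, A[0, 1:])   # Schur complement
        L_S, U_S = lu_no_pivot(S)         # induction on the (n-1) by (n-1) block
        L = np.eye(n)
        L[1:, 0] = X                      # multipliers go below the diagonal of L
        L[1:, 1:] = L_S
        U = np.zeros_like(A)
        U[0, :] = A[0, :]                 # first row of U is first row of A
        U[1:, 1:] = U_S
        return L, U

    A = np.array([[3., -1., 2.], [6., 0., 10.], [-3., -3., -17.]])
    L, U = lu_no_pivot(A)
    print(np.allclose(L @ U, A))          # True; L, U match the worked example above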
Now we generalize this result to the case where zeros may appear on the
diagonal. For example, if A_11 had been 0 above, we would have tried to
divide by zero, and it would not have worked. But as long as there is a
nonzero somewhere in the matrix, we can always reorder the rows and columns
to put that nonzero in the (1,1) entry. If, after the first step, the (2,2)
entry is zero, we again permute rows and columns to put a nonzero there if
one exists. We will also use matrix multiplication, by "permutation
matrices", to keep track of this bookkeeping.

Ex: If A = [ 0 1 ], then we can't write
           [ 1 2 ]
        A = LU = [ l_11    0  ] * [ u_11  u_12 ]
                 [ l_21  l_22 ]   [  0    u_22 ]
               = [ l_11*u_11   l_11*u_12             ]
                 [ l_21*u_11   l_21*u_12 + l_22*u_22 ]
    unless l_11 = 0 or u_11 = 0 to match a_11 = 0. But this would mean either
    a_12 = 0 (if l_11=0) or a_21 = 0 (if u_11=0), a contradiction. But if we
    permute A's rows to get [ 1 2 ; 0 1 ], or A's columns to get
    [ 1 0 ; 2 1 ], then we can do LU factorization.

Def: A permutation matrix P is an n by n matrix with exactly one 1 in each
row, exactly one 1 in each column, and the other entries equal to 0. An
equivalent definition is that a permutation matrix P is gotten by taking the
identity matrix and permuting its rows (or its columns).

Ex: P1 = [ 1 0 ],   P2 = [ 0 1 ],   P3 = [ 0 0 1 ]
         [ 0 1 ]         [ 1 0 ]         [ 1 0 0 ]
                                         [ 0 1 0 ]

Stop&Ask: How do we construct P2 and P3 by permuting rows or columns of I?

Lemma 2: Let P be a permutation matrix. Then
    (1) If y = P*x, then y has the same entries as x but in a permuted order
    (2) If Y = P*X, then Y has the same rows as X but in a permuted order
    (3) If Y = X*P, then Y has the same columns as X but in a permuted order
    (4) If P1 and P2 are permutation matrices, so is P1*P2
    (5) If P is a permutation matrix, so is P^t = P^{-1}

Ex: P1*[x1] = [x1],   P2*[x1] = [x2],   P3*[x1] = [x3]
       [x2]   [x2]       [x2]   [x1]       [x2]   [x1]
                                           [x3]   [x2]

Proof: (1) Recall that y = P*x means
        y_i = e_i^t*y = e_i^t*(P*x) = (e_i^t*P)*x = (row i of P)*x.
    Since row i of P is a row of the identity matrix, say e_j^t, then
    y_i = e_j^t * x = x_j. Since each row of P is a different row of the
    identity, each y_i is a different component of x.
    (2) Apply (1) to each column of Y and X, since Y = P*X means that for all
    j, column j of Y = Y*e_j = (P*X)*e_j = P*(X*e_j) = P*(column j of X).
    (3), (4) and (5): homework!

Finally, we can do the general case of Gaussian elimination:

Thm 2 (Gaussian Elimination, or LU decomposition in the general case):
Let A be m by n and nonzero. Then we can write A = P_L * L * U * P_R where,
for some r with 1 <= r <= min(m,n),
    P_L is an m by m permutation matrix (needed to permute A's rows)
    L   is an m by r lower triangular matrix (with ones on the diagonal)
    U   is an r by n upper triangular matrix (with nonzero diagonal)
    P_R is an n by n permutation matrix (needed to permute A's columns)
We can write A = P_L*L*U*P_R equivalently as P_L^t*A*P_R^t = L*U, i.e. A's
rows and columns can be reordered (by P_L^t and P_R^t, respectively) so that
the reordered matrix has an LU decomposition.

Proof: We use induction as before. This time the base case is more
complicated.
Base cases: When min(m,n)=1, we factor A as follows:
    (1) When A = [ a_11 ] is 1 by 1 (and so a_11 neq 0), pick r=1 and write
            A = 1 * 1 * a_11 * 1 = P_L * L * U * P_R
    (2) When m > n = 1, so A = [a_11]  with a_11 neq 0, pick r = 1 and write
                               [a_21]
                               [ ... ]
                               [a_m1]
            A = I^m * [ 1         ] * a_11 * 1
                      [ a_21/a_11 ]
                      [ ...       ]
                      [ a_m1/a_11 ]
              = P_L * L * U * P_R
        When m > n = 1, with a_11 = 0 but some other a_i1 neq 0, then pick
        r = 1 and P_L to swap entries 1 and i
        ASK & WAIT: What does P_L look like?
        and write
            A = P_L * [ 1         ] * a_i1 * 1
                      [ a_21/a_i1 ]
                      [ a_31/a_i1 ]
                      [ ...       ]
                      [ a_11/a_i1 ]   <- i-th location
                      [ ...       ]
                      [ a_m1/a_i1 ]
              = P_L * L * U * P_R
    (3) When m = 1 < n, so A = [a_11 a_12 ... a_1n] with a_11 neq 0, then
        pick r = 1 and write
            A = 1 * 1 * A * I^n = P_L * L * U * P_R
        When m = 1 < n with a_11 = 0 but some other a_1i neq 0, then pick
        r = 1 and P_R (as described above) to swap entries 1 and i, so
            A = 1 * 1 * [a_1i  a_12 ... a_11 ... a_1n] * P_R
                                       ^^^^ location i
              = P_L * L * U * P_R

Now for the induction. Let A be m by n and assume Theorem 2 is true for
(m-1) by (n-1) matrices. If A_11 neq 0, let P'_L = I^m and P'_R = I^n.
Otherwise, if some other A_ij is nonzero, pick P'_L (to swap rows 1 and i)
and P'_R (to swap columns 1 and j) so that P'_L*A*P'_R has a nonzero entry
(A_ij) in the (1,1) position. Then as above write
                      1     n-1
    P'_L*A*P'_R = [ A_11  A_12 ]  1
                  [ A_21  A_22 ]  m-1
                = [ 1           0       ] * [ A_11  A_12                  ]
                  [ A_21/A_11   I^{m-1} ]   [  0    A_22 - A_21/A_11*A_12 ]
                = [ 1   0       ] * [ A_11  A_12 ]
                  [ X   I^{m-1} ]   [  0     S   ]
where X = A_21/A_11 and S = A_22 - A_21/A_11*A_12 for short.
There are 2 cases:
(1) If S = 0, then we can confirm that we can write
        P'_L*A*P'_R = [ 1   0       ] * [ A_11  A_12 ]
                      [ X   I^{m-1} ]   [  0     S   ]
                    = [ 1   0       ] * [ A_11  A_12 ]
                      [ X   I^{m-1} ]   [  0     0   ]
                    = [ A_11    A_12   ]
                      [ X*A_11  X*A_12 ]
                    = [ 1 ] * [ A_11  A_12 ]  =  L * U
                      [ X ]
    Thus
        A = (P'_L)^{-1} * L * U * (P'_R)^{-1} = P_L * L * U * P_R,
    where P_L = P'_L^t and P_R = P'_R^t are permutations by Lemma 2 part (5),
    is the desired LU factorization.
(2) If S is nonzero, we can apply the induction hypothesis to S to get
        S = P_LS * L_S * U_S * P_RS,   where L_S (U_S) has r' columns (rows).
    Plugging this in we get
        A = (P'_L)^t * [ 1  0 ] * [ A_11  A_12                ] * (P'_R)^t
                       [ X  I ]   [  0    P_LS*L_S*U_S*P_RS   ]
              ... substituting for S
          = (P'_L)^t * [ 1  0 ] * [ 1     0       ] * [ A_11  A_12     ] * (P'_R)^t
                       [ X  I ]   [ 0  P_LS*L_S   ]   [  0    U_S*P_RS ]
              ... factoring the 3rd matrix
              ... Note that P_LS*L_S is m-1 by r' and U_S*P_RS is r' by n-1
          = (P'_L)^t * [ 1   0        ] * [ A_11  A_12     ] * (P'_R)^t
                       [ X   P_LS*L_S ]   [  0    U_S*P_RS ]
              ... multiplying factors 2 and 3
          = (P'_L)^t * [ 1   0    ] * [ 1          0   ] * [ A_11  A_12     ] * (P'_R)^t
                       [ 0   P_LS ]   [ P_LS^t*X   L_S ]   [  0    U_S*P_RS ]
              ... factoring the 2nd matrix
              ... Note that L_S is m-1 by r' and P_LS^t*X is m-1 by 1
          = P_L * [ 1          0   ] * [ A_11  A_12     ] * (P'_R)^t
                  [ P_LS^t*X   L_S ]   [  0    U_S*P_RS ]
              ... multiplying factors 1 and 2, both permutation matrices,
              ... to get the final permutation matrix P_L, by Lemma 2 part (4)
          = P_L * [ 1          0   ] * [ A_11  A_12*P_RS^t ] * [ 1   0    ] * (P'_R)^t
                  [ P_LS^t*X   L_S ]   [  0    U_S         ]   [ 0   P_RS ]
              ... factoring the 3rd matrix
              ... Note that U_S is r' by n-1 and A_12*P_RS^t is 1 by n-1
          = P_L * [ 1          0   ] * [ A_11  A_12*P_RS^t ] * P_R
                  [ P_LS^t*X   L_S ]   [  0    U_S         ]
              ... multiplying factors 4 and 5, both permutation matrices,
              ... to get the final permutation matrix P_R, by Lemma 2 part (4)
          = P_L * L * U * P_R
              ... the final factorization, by noticing that the 2nd matrix
              ... is lower triangular with 1s on the diagonal as desired,
              ... and the 3rd matrix is upper triangular with nonzeros
              ... on the diagonal, as desired.
              ... Note that L has r = r'+1 columns and U has r rows.

Corollary 2: Let A = P_L * L * U * P_R be the above factorization of the
m by n nonzero matrix A, where L (U) has r columns (rows). Then
    (1) range(A) = span(P_L*L)
    (2) rank(A) = r
    (3) nullity(A) = n-r
    (4) nullspace(A) may be computed as follows. Write U = [U1, U2] where U1
        is r by r (and so upper triangular) and U2 is r by n-r. Then
            nullspace(A) = span( P_R^t * [ -U1^{-1}*U2 ] )
                                         [  I^{n-r}    ]
    (5) A is invertible if and only if m = n = r, in which case
            A^{-1} = P_R^t * U^{-1} * L^{-1} * P_L^t
    (6) The complete solution set of Ax = b may be computed from the entries
        of the LU factorization of A (see below).

Proof: Write
          r                        r    n-r
    L = [ L1 ]  r      and  U = [ U1    U2 ]  r
        [ L2 ]  m-r
so that L1 is square and lower triangular and U1 is square and upper
triangular. Since the diagonal entries of L1 and U1 are nonzero, L1 and U1
are invertible, and have rank = r. Thus L has nullity(L) = 0 and so
rank(L) = r, since
    L*x = 0  =>  [ L1*x ] = 0  =>  L1*x = 0  =>  x = 0  =>  L is one-to-one.
                 [ L2*x ]
(1): range(A) = {A*x, all x in F^n}
              = {P_L*L*U*P_R*x, all x in F^n}
              = {P_L*L*U*x, all x in F^n}
                  ... since {x: x in F^n} = {P_R*x: x in F^n}
              = {P_L*L*U*[ x1 ], all x1 in F^r and x2 in F^{n-r}}
                         [ x2 ]
              = {P_L*L*(U1*x1 + U2*x2), all x1 in F^r and x2 in F^{n-r}}
                  ... by multiplying out
              = {P_L*L*(x1 + U2*x2), all x1 in F^r and x2 in F^{n-r}}
                  ... since U1 is invertible,
                      {x1: x1 in F^r} = {U1*x1: x1 in F^r}
              = {P_L*L*x1, all x1 in F^r}
                  ... since F^r = {x1: x1 in F^r}
                             = {x1 + U2*x2: x1 in F^r and x2 in F^{n-r}}
              = range(P_L*L)
(2) rank(A) = dim(range(A)) = dim(range(P_L*L)) = dim(P_L*range(L))
            = dim(range(L))   ... since P_L is invertible
            = rank(L) = r
(3) By the dimension theorem, nullity(A) = #columns(A) - rank(A) = n-r
(4) Ax=0 implies that P_L*L*U*P_R*x = 0, and hence that U*P_R*x = 0, since
    P_L is invertible and nullity(L) = 0. Let
        P_R*x = [ x1 ]  r
                [ x2 ]  n-r
    so that Ax = 0 implies
        0 = [U1  U2] * [ x1 ] = U1*x1 + U2*x2.
                       [ x2 ]
    Since U1 is invertible, this means x1 = -U1^{-1}*U2*x2. In other words,
    a vector x is in nullspace(A) iff
        x = P_R^t * [ x1 ] = P_R^t * [ -U1^{-1}*U2 ] * x2
                    [ x2 ]           [  I^{n-r}    ]
    for some x2 in F^{n-r}, i.e. x is in
        span( P_R^t * [ -U1^{-1}*U2 ] )
                      [  I^{n-r}    ]
(5) A is invertible iff it is square (m=n) and has rank(A) = r = n.
    In this case
        A^{-1} = (P_L * L * U * P_R)^{-1}
               = P_R^{-1} * U^{-1} * L^{-1} * P_L^{-1}
               = P_R^t * U^{-1} * L^{-1} * P_L^t
(6) We can solve Ax=b if b is in range(A) = range(P_L*L), i.e. if P_L^t*b is
    in range(L) = range([ L1 ]), i.e.
                        [ L2 ]
        P_L^t*b = [ L1 ] * x' = [ L1*x' ]   for some x'.
                  [ L2 ]        [ L2*x' ]
    Write
        P_L^t*b = [ b1 ]  r       so b1 = L1*x' and b2 = L2*x'.
                  [ b2 ]  m-r
    Thus x' = L1^{-1}*b1 and b2 = L2*L1^{-1}*b1. In other words, we can solve
    Ax=b if and only if
        b2 = L2*L1^{-1}*b1   where   P_L^t * b = [ b1 ]
                                                 [ b2 ]
    If this condition holds, we can find all the solutions as follows.
    Suppose there are two solutions, x and y. Then
        A*(x-y) = A*x - A*y = b - b = 0
    so x-y must be in nullspace(A). In other words, if we find one solution
    x, all the others can be found by taking x plus any member of
    nullspace(A) from part (4). To find one solution x, we solve
    x' = U * P_R * x for x, or
        x' = [U1, U2] * P_R * x = [U1, U2] * [ x1 ] = U1*x1 + U2*x2
                                             [ x2 ]
    One solution is gotten by setting x2 = 0, and solving x' = U1*x1 for
    x1 = U1^{-1}*x' = U1^{-1}*L1^{-1}*b1. Thus the complete solution set of
    Ax = b, if a solution exists at all, is
        { x = P_R^t * [ U1^{-1}*L1^{-1}*b1 ] + z,  for all z in nullspace(A) }
                      [         0          ]

Finally, we consider the number of arithmetic operations required to compute
the LU factorization. We note that multiplying by permutation matrices does
not require any arithmetic operations, only reordering rows (or columns). We
assume for simplicity that A is n by n with rank n (if the rank is lower, the
number of operations is lower too). Let C(n) be the "cost", i.e. the number
of operations. Then by the proof of Theorem 2
    C(n) = number of operations to compute X = A_21/A_11
         + number of operations to compute S = A_22 - X*A_12
         + number of operations to compute the LU factorization of S
         = (n-1) divisions
         + (n-1)^2 multiplications to form X*A_12
         + (n-1)^2 subtractions to subtract A_22 - X*A_12
         + C(n-1)
This gives us the recurrence
    C(n) = 2*(n-1)^2 + (n-1) + C(n-1)
along with the initial value C(1) = 0, or
    C(n) = sum_{j=1 to n-1} (2*j^2 + j) = 2/3*n^3 + "lower order terms"
where "lower order terms" means a quadratic function of n. When n is large,
C(n) is well approximated by 2/3*n^3. It is interesting to note that solving
a system of linear equations costs 2/3*n^3 + lower order terms, a third as
much as multiplying two matrices of the same size, 2*n^3.
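To tie the section together, here is a Python/numpy sketch of the
factorization of Thm 2 and the uses described in Corollary 2. It is written
for clarity, not efficiency or numerical robustness; the function name
lu_full_pivot, the tolerance tol, and the 3 by 4 example matrix are all
illustrative choices and not part of the notes (production code would call a
library such as LAPACK, mentioned above). Choosing the largest remaining
entry as the pivot ("full pivoting") is one particular way to satisfy Thm 2's
requirement that a nonzero entry be moved to the (1,1) position at each step.

    import numpy as np

    def lu_full_pivot(A, tol=1e-12):
        """Compute A = P_L @ L @ U @ P_R as in Thm 2 (sketch, full pivoting).
        Returns (P_L, L, U, P_R, r) with L m-by-r unit lower triangular and
        U r-by-n upper triangular with nonzero diagonal."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        rowperm = np.arange(m)
        colperm = np.arange(n)
        r = 0
        for k in range(min(m, n)):
            sub = np.abs(A[k:, k:])
            i, j = np.unravel_index(np.argmax(sub), sub.shape)
            if sub[i, j] <= tol:             # remaining block is (numerically) zero
                break
            # Swap rows k and k+i, columns k and k+j (bookkeeping in rowperm/colperm).
            A[[k, k + i], :] = A[[k + i, k], :]
            rowperm[[k, k + i]] = rowperm[[k + i, k]]
            A[:, [k, k + j]] = A[:, [k + j, k]]
            colperm[[k, k + j]] = colperm[[k + j, k]]
            # Eliminate below the pivot, storing multipliers in place of the zeros.
            A[k + 1:, k] /= A[k, k]
            A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
            r += 1
        L = np.tril(A[:, :r], -1) + np.eye(m, r)     # multipliers, unit diagonal
        U = np.triu(A[:r, :])                        # pivots on the diagonal
        P_L = np.eye(m)[:, rowperm]                  # A = P_L @ L @ U @ P_R
        P_R = np.eye(n)[colperm, :]
        return P_L, L, U, P_R, r

    # A 3 by 4 example of rank 2 (row 3 = row 1 + row 2), and a right-hand
    # side b chosen in range(A).
    A = np.array([[3., -1.,  2., 2.],
                  [6.,  0., 10., 4.],
                  [9., -1., 12., 6.]])
    b = np.array([2., 6., 8.])

    P_L, L, U, P_R, r = lu_full_pivot(A)
    m, n = A.shape
    print(np.allclose(P_L @ L @ U @ P_R, A))      # True
    print("rank =", r, " nullity =", n - r)       # Corollary 2 parts (2), (3)

    # Nullspace basis, Corollary 2 part (4): columns of P_R^t @ [-U1^{-1}*U2; I].
    U1, U2 = U[:, :r], U[:, r:]
    N = P_R.T @ np.vstack([-np.linalg.solve(U1, U2), np.eye(n - r)])
    print(np.allclose(A @ N, 0))                  # True: columns of N are null vectors

    # Solving A x = b, Corollary 2 part (6).
    L1, L2 = L[:r, :], L[r:, :]
    bp = P_L.T @ b
    b1, b2 = bp[:r], bp[r:]
    xp = np.linalg.solve(L1, b1)                  # x' with L1*x' = b1
    if np.allclose(L2 @ xp, b2):                  # consistency: b2 = L2*L1^{-1}*b1
        x = P_R.T @ np.concatenate([np.linalg.solve(U1, xp), np.zeros(n - r)])
        print(np.allclose(A @ x, b))              # True: one particular solution
    else:
        print("no solution")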