Here's a clarification on applying linear cryptanalysis
to find a good approximation for a linear function.  Sorry
for sowing confusion in class today on this topic.

Let L be an nxn boolean matrix.  Then we can view it as a
linear function that takes n-bit strings to n-bit strings
(x maps to Lx, with all arithmetic done mod 2), and every
linear function on n-bit strings can be viewed this way.
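Here is a small Python sketch of both directions of that correspondence.  It assumes bit strings are lists of 0/1 ints and a matrix is a list of rows; the helpers mat_vec, xor and matrix_of, and the rotate_left example, are hypothetical names used only for illustration.  Going from a linear function back to its matrix uses the standard fact that column j of L is L(e_j), where e_j is the j-th unit vector.

```python
import itertools
import random

def mat_vec(L, x):
    """Multiply the nxn 0/1 matrix L by the n-bit column vector x, mod 2."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in L]

def xor(u, v):
    """Add two n-bit vectors mod 2 (bitwise XOR)."""
    return [a ^ b for a, b in zip(u, v)]

def matrix_of(f, n):
    """Recover the matrix of a linear map f on n-bit vectors: column j is f(e_j)."""
    cols = [f([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

n = 4
all_x = [list(bits) for bits in itertools.product([0, 1], repeat=n)]

# Matrix -> linear function: x -> Lx respects XOR, i.e. L(x ^ y) = L(x) ^ L(y).
rng = random.Random(1)
L = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
print(all(mat_vec(L, xor(x, y)) == xor(mat_vec(L, x), mat_vec(L, y))
          for x in all_x for y in all_x))  # True

# Linear function -> matrix: a hypothetical example, rotating the bits left by one.
def rotate_left(x):
    return x[1:] + x[:1]

R = matrix_of(rotate_left, n)
print(all(mat_vec(R, x) == rotate_left(x) for x in all_x))  # True
```

The list-of-rows representation is chosen to mirror the matrix view in the text rather than for speed; packing bits into integers would be faster but hides the row/column structure.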

Recall that the probability of an approximation gamma -> gamma'
for a function F is
  Pr[gamma' . F(x) = gamma . x],
with x a random n-bit string and the probability taken
over the choice of x.  Here s.t denotes the dot product
(mod 2) of the n-bit strings s and t.  Let's view both s and t
as column vectors.  Also, write s^T for the transpose of s,
so that s^T is an n-bit row vector.  Note that the
dot product s.t is just s^T t, i.e., we take the transpose
of s (a row vector) and multiply it by t.  (This
multiplication is well defined, since we're multiplying a
1xn vector by an nx1 vector, and we get a 1x1 vector, i.e.,
a single bit, as expected.)
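To make the definition concrete, here is a minimal Python sketch that computes the probability of an approximation exactly by enumerating all 2^n inputs, which is only feasible for small n.  The names dot, approx_prob and toy_F are hypothetical, and toy_F is a made-up nonlinear function, not part of any real cipher.

```python
import itertools
from fractions import Fraction

def dot(s, t):
    """s . t = s^T t: a 1xn row times an nx1 column, giving one bit (mod 2)."""
    return sum(a & b for a, b in zip(s, t)) % 2

def approx_prob(F, gamma, gamma_prime, n):
    """Pr[gamma' . F(x) = gamma . x] for uniform x, by trying all 2^n inputs."""
    hits = sum(
        dot(gamma_prime, F(list(bits))) == dot(gamma, list(bits))
        for bits in itertools.product([0, 1], repeat=n)
    )
    return Fraction(hits, 2 ** n)

def toy_F(x):
    """Hypothetical nonlinear function on 2-bit strings: (x0 AND x1, x0)."""
    return [x[0] & x[1], x[0]]

print(approx_prob(toy_F, [0, 0], [1, 0], 2))  # 3/4: x0 AND x1 is 0 for 3 of 4 inputs
print(approx_prob(toy_F, [1, 0], [0, 1], 2))  # 1: the second output bit is exactly x0
```

Returning a Fraction keeps the probability exact, so a value of exactly 1 is not blurred by floating-point rounding.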

A key property of the transpose is that (A B)^T = B^T A^T.
Another property is that (A^T)^T = A.
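Both properties hold for matrices over any field, including arithmetic mod 2.  The following Python sketch spot-checks them on random 0/1 matrices of assorted shapes (so it also covers the 1xn times nx1 case above).  It is a sanity check, not a proof, and the helper names are hypothetical.

```python
import random

def transpose(A):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Product of 0/1 matrices A (m x k) and B (k x p), with arithmetic mod 2."""
    Bt = transpose(B)
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in Bt] for row in A]

def random_matrix(rng, rows, cols):
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

def properties_hold(trials, seed=0):
    """Check (AB)^T = B^T A^T and (A^T)^T = A on random mod-2 matrices."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, k, p = rng.randint(1, 5), rng.randint(1, 5), rng.randint(1, 5)
        A, B = random_matrix(rng, m, k), random_matrix(rng, k, p)
        if transpose(mat_mul(A, B)) != mat_mul(transpose(B), transpose(A)):
            return False
        if transpose(transpose(A)) != A:
            return False
    return True

print(properties_hold(200))  # True
```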

I'm finally ready to state the claim about how to approximate
a linear function L.  We can take gamma' to be arbitrary,
and then we choose gamma = L^T gamma'.  Notice that L^T is
the transpose of L, and thus is another nxn boolean matrix;
also, viewing gamma' as an n-bit column vector, we can multiply
L^T by gamma' to get an n-bit column vector, as required, so
the dimensions check out.
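Computing gamma from gamma' is a single matrix-vector product.  Here is a minimal Python sketch, assuming the same list-of-rows representation; input_mask_for is a hypothetical name, and the 3x3 matrix is an arbitrary example chosen for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_vec(A, v):
    """Matrix times column vector, arithmetic mod 2."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in A]

def input_mask_for(L, gamma_prime):
    """Return gamma = L^T gamma' for an nxn 0/1 matrix L and n-bit mask gamma'."""
    n = len(L)
    if any(len(row) != n for row in L) or len(gamma_prime) != n:
        raise ValueError("L must be nxn and gamma' must have n bits")
    return mat_vec(transpose(L), gamma_prime)

# Example: L(x) = (x0 XOR x1, x1 XOR x2, x0).
L = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]
print(input_mask_for(L, [1, 0, 1]))  # [0, 1, 0]
```

As a hand check: gamma' . L(x) = (x0 XOR x1) XOR x0 = x1, which is exactly gamma . x for gamma = (0, 1, 0).  Equivalently, L^T gamma' is the XOR of the rows of L selected by the 1-bits of gamma' (here rows 0 and 2).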

With this choice, I claim Pr[gamma' . L(x) = gamma . x] = 1.
This can be verified with some linear algebra:
  gamma' . L(x)
    = gamma'^T L(x)        (dot product as s^T t)
    = gamma'^T L x         (L(x) is the product Lx)
    = gamma'^T (L^T)^T x   (2nd property of the transpose)
    = (L^T gamma')^T x     (1st property of the transpose)
    = gamma^T x            (by definition of gamma)
    = gamma . x,
from which the claim follows.  Therefore, for every output mask
gamma', every linear operation in a cipher has an approximation
gamma -> gamma' that holds with probability 1 (the maximum
possible bias).
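The derivation can also be sanity-checked by brute force.  The Python sketch below draws random nxn matrices and, for every output mask gamma' (including the trivial gamma' = 0), checks gamma' . L(x) = gamma . x for all 2^n inputs x with gamma = L^T gamma'.  This is exhaustive only over x and gamma' for the sampled matrices and small n, so it illustrates the claim rather than proving it; claim_holds is a hypothetical name.

```python
import itertools
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_vec(A, v):
    """Matrix times column vector, arithmetic mod 2."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in A]

def dot(s, t):
    """Dot product of bit vectors, mod 2."""
    return sum(a & b for a, b in zip(s, t)) % 2

def claim_holds(n, trials, seed=0):
    """For random nxn L and every mask gamma', check gamma' . L(x) == gamma . x
    on all 2^n inputs x, where gamma = L^T gamma'."""
    rng = random.Random(seed)
    vectors = [list(bits) for bits in itertools.product([0, 1], repeat=n)]
    for _ in range(trials):
        L = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        Lt = transpose(L)
        for gamma_prime in vectors:
            gamma = mat_vec(Lt, gamma_prime)
            if any(dot(gamma_prime, mat_vec(L, x)) != dot(gamma, x) for x in vectors):
                return False
    return True

print(claim_holds(n=4, trials=50))  # True
```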