CS 70 - Lecture 30 - Apr 6, 2011 - 10 Evans

Goals for today:  Law of Large Numbers 
                  Independent Random Variables
                  (Please read Note 17, beginning of 18)

Question: Suppose we want to take a poll on whether taxes should be
increased in California. How many people (n) do we need to ask to be
95% sure that our estimate of the fraction p of "yes" votes is within 5%
of the actual value?

Let's formalize this as follows: Suppose we ask n randomly selected people, 
and the number of "yes" votes in this group is S_n. Then our estimate of
p will be A_n = S_n/n. Intuitively, when n is large A_n should be close to p.
To see how close, write S_n = X_1 + X_2 + ... + X_n
where X_i = { 1 if the i-th person says "yes"
            { 0 if the i-th person says "no"
Since person i is chosen randomly, X_i is a random variable where
      X_i = { 1 with probability p
            { 0 with probability 1-p
Thus E(X_i) = 1*p + 0*(1-p) = p, E(S_n) = sum_i E(X_i) = n*p,
and E(A_n) = (1/n)*E(S_n) = p as desired.
To see how close A_n is to p, we need to know its variance, and use
Chebyshev's inequality.

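First, we compute V(S_n) as we have done before, using
(X_i)^2 = X_i (so E((X_i)^2) = p) and, for i neq j,
E(X_i*X_j) = P(X_i=1 and X_j=1) = p^2, because the people
are chosen independently: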
  V(S_n) = E(S_n^2) - (E(S_n))^2
         = E((sum_i X_i)^2) - (n*p)^2
         = E(sum_i (X_i)^2 + sum_{i neq j} X_i*X_j) - (n*p)^2
         = sum_i E((X_i)^2) + sum_{i neq j} E(X_i*X_j) - (n*p)^2
         = sum_i (p)        + sum_{i neq j} p^2        - (n*p)^2
         = n*p + (n^2-n)*p^2 - (n*p)^2
         = n*p*(1-p)
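
As a sanity check on this formula, here is a minimal Python sketch
(not part of the lecture). It assumes the n answers are independent,
so that S_n has the binomial distribution, and computes V(S_n)
exactly with rational arithmetic. The name binomial_variance is
hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_variance(n, p):
    """Exact V(S_n) = E(S_n^2) - (E(S_n))^2, where S_n counts the "yes"
    answers among n independent people, each saying "yes" with prob. p."""
    # P(S_n = k) = C(n,k) * p^k * (1-p)^(n-k)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pmf[k] for k in range(n + 1))
    mean_sq = sum(k * k * pmf[k] for k in range(n + 1))
    return mean_sq - mean**2

p = Fraction(1, 3)
n = 10
# The two printed values should agree: V(S_n) and n*p*(1-p).
print(binomial_variance(n, p), n * p * (1 - p))
```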

Next we use

Thm If S is a sample space, P a probability function, 
    f is a random variable, and a and b are constants, then
    E(a*f+b) = a*E(f) + b
    V(a*f+b) = a^2*V(f) 
Proof:
    E(a*f+b) = sum_x (a*f(x)+b)*P(x) 
             = a*sum_x f(x)*P(x) + b*sum_x P(x)
             = a*E(f) + b          (since sum_x P(x) = 1)
    V(a*f+b) = E([(a*f+b) - E(a*f+b)]^2)
             = E([(a*f+b) - (a*E(f)+b)]^2)
             = E([a*f - a*E(f)]^2)
             = E(a^2*[f - E(f)]^2)
             = a^2*E([f - E(f)]^2)
             = a^2*V(f)
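
The two identities can be checked numerically on a small sample
space. The sketch below is only an illustration: the choice of a
fair die, the constants a = 3 and b = -2, and the helper names
expectation and variance are all assumptions made for the example.

```python
from fractions import Fraction

def expectation(f, P):
    """E(f) = sum_x f(x)*P(x) over a finite sample space (dict x -> P(x))."""
    return sum(f(x) * P[x] for x in P)

def variance(f, P):
    """V(f) = E([f - E(f)]^2)."""
    mu = expectation(f, P)
    return expectation(lambda x: (f(x) - mu) ** 2, P)

# Sample space: one roll of a fair die; f(x) = x is the value shown.
P = {x: Fraction(1, 6) for x in range(1, 7)}
f = lambda x: x
a, b = 3, -2
g = lambda x: a * f(x) + b

print(expectation(g, P), a * expectation(f, P) + b)   # both 17/2
print(variance(g, P), a**2 * variance(f, P))          # both 105/4
```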

Thus V(A_n) = V(S_n/n) = (1/n)^2*V(S_n) = p*(1-p)/n.
In particular, the variance V(A_n) gets smaller as the
sample size n grows, i.e. A_n gets closer to its mean,
as we expect. To say how close, we use Chebyshev's
inequality, together with E(A_n) = p and V(A_n) = p*(1-p)/n:
   Prob( | A_n - E(A_n) | >= r ) <= V(A_n)/r^2
   Prob( | A_n - p | >= r ) <= p*(1-p)/(n*r^2)
Recall that our goal was to pick n large enough so that
we were 95% sure of A_n being within 5% of p; this means
    r = .05 
    p*(1-p)/(n*r^2) <= 1 - 95% = .05
    n >= p*(1-p)/.05^3 = 8000*p*(1-p)
How do we do this if we don't know p? It is easy
to see that p*(1-p) is maximized by p=1/2, so
it is enough to choose
    n >= 8000*.25 = 2000  people to poll.
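
The same arithmetic can be packaged as a small function. This is a
sketch under the assumptions of the derivation above (Chebyshev's
bound, worst case p = 1/2 by default); the name
chebyshev_sample_size is hypothetical. Fractions are used so that
the ceiling is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import ceil

def chebyshev_sample_size(r, confidence, p=Fraction(1, 2)):
    """Smallest integer n with p*(1-p)/(n*r^2) <= 1 - confidence."""
    failure = 1 - confidence
    return ceil(p * (1 - p) / (failure * r**2))

# 95% sure of being within 5%: reproduces n = 2000.
print(chebyshev_sample_size(Fraction(5, 100), Fraction(95, 100)))  # 2000
# Tightening the error to 1% multiplies n by 25.
print(chebyshev_sample_size(Fraction(1, 100), Fraction(95, 100)))  # 50000
```

Note that n grows like 1/r^2: a 5 times smaller error needs 25
times as many people.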

This is a special case of a more general result, called
the Law of Large Numbers, which gives conditions under
which the average of more general random variables
approaches its expectation.  To state it, we need to 
generalize the idea of independent events to independent
random variables. 

Recall that events A and B (subsets of a sample space S)
are independent if P(A and B) = P(A)*P(B). 
A typical example is flipping a fair coin twice, 
so   S = {HH,HT,TH,TT},
with A = "first coin is H"  = {HH,HT},
and  B = "second coin is H" = {HH,TH}
with P(A and B) = P(HH) = .25 = .5*.5 = P(A)*P(B).

Now consider two random variables f1 and f2 where
f1 = {1 if first coin H
     {0 otherwise
f2 = {1 if second coin H
     {0 otherwise
Clearly A = {x: f1(x)=1}; we abbreviate this by {f1=1}.
Similarly B = {f2=1}.
So we can restate the independence property as
   P(A and B) = P({f1=f2=1})
              = P({f1=1}) * P({f2=1})

More generally, we have
Def: Random variables f1 and f2 are independent if
for every a and b,
   P({f1=a and f2=b}) = P({f1=a}) * P({f2=b})

Recall that we say A and B are independent sets when 
knowing whether or not x is in A
tells you nothing about whether x is in B:
  P(B|A) = P(B and A)/P(A) = P(B)*P(A)/P(A) = P(B)
Similarly, f1 and f2 are independent random variables
when knowing the value of f1 tells you nothing about
the value of f2:
  P({f2=b}|{f1=a}) = P({f2=b and f1=a})/P({f1=a})
                   = P({f2=b})*P({f1=a})/P({f1=a})
                   = P({f2=b})
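
On a finite sample space the definition can be checked by brute
force. The sketch below is illustrative only; the helper names prob
and are_independent, and the extra random variable f3 (the number of
heads), are assumptions made for the example. It is enough to test
values a and b that f1 and f2 actually take, since otherwise both
sides of the definition are 0.

```python
from fractions import Fraction
from itertools import product

def prob(event, P):
    """P(event) = sum of P(x) over outcomes x satisfying the predicate."""
    return sum(P[x] for x in P if event(x))

def are_independent(f1, f2, P):
    """Check P({f1=a and f2=b}) = P({f1=a})*P({f2=b}) for all a, b
    in the ranges of f1 and f2."""
    values1 = {f1(x) for x in P}
    values2 = {f2(x) for x in P}
    return all(
        prob(lambda x: f1(x) == a and f2(x) == b, P)
        == prob(lambda x: f1(x) == a, P) * prob(lambda x: f2(x) == b, P)
        for a, b in product(values1, values2)
    )

# Two flips of a fair coin.
P = {x: Fraction(1, 4) for x in ("HH", "HT", "TH", "TT")}
f1 = lambda x: 1 if x[0] == "H" else 0   # first coin is H
f2 = lambda x: 1 if x[1] == "H" else 0   # second coin is H
f3 = lambda x: f1(x) + f2(x)             # number of heads

print(are_independent(f1, f2, P))  # True
print(are_independent(f1, f3, P))  # False: f3 = 0 forces f1 = 0
```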

The next two theorems use the property of f1 and f2 being
independent to simplify computing expectations and variances
of combinations of f1 and f2:

Thm: If f1 and f2 are independent, then E(f1*f2) = E(f1)*E(f2)
Proof: E(f1*f2) = sum_x f1(x)*f2(x)*P(x)
                = sum_{a,b} sum_{x: f1(x)=a and f2(x)=b} f1(x)*f2(x)*P(x)
                = sum_{a,b} sum_{x: f1(x)=a and f2(x)=b} a*b*P(x)
                = sum_{a,b} a*b* sum_{x: f1(x)=a and f2(x)=b} P(x)
                = sum_{a,b} a*b*P({f1=a and f2=b})
                = sum_{a,b} a*b*P({f1=a})*P({f2=b})
                = sum_{a} sum_{b} a*b*P({f1=a})*P({f2=b})
                = sum_{a} a*P({f1=a}) * sum_{b} b*P({f2=b})
                = E(f1) * E(f2)

Ex: Roll 2 dice, f1 = value on first die, f2 = value on second die
E(f1*f2) = E(f1)*E(f2) = (7/2)^2 = 49/4

Ex: To contrast, let f2 = f1; these have the same distribution
as in the last example, but are clearly not independent.
Then E(f1*f2) = E(f1^2) =  91/6 neq 49/4
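
Both dice examples can be verified by enumerating the 36 equally
likely outcomes. This is a minimal sketch, assuming fair dice; the
helper name expectation is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), all equally likely.
P = {x: Fraction(1, 36) for x in product(range(1, 7), repeat=2)}

def expectation(f):
    """E(f) = sum_x f(x)*P(x)."""
    return sum(f(x) * P[x] for x in P)

f1 = lambda x: x[0]   # value on first die
f2 = lambda x: x[1]   # value on second die

print(expectation(lambda x: f1(x) * f2(x)))  # 49/4
print(expectation(f1) * expectation(f2))     # 49/4
# Dependent case f2 = f1: the product rule fails.
print(expectation(lambda x: f1(x) * f1(x)))  # 91/6
```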

Thm: if f1 and f2 are independent, then V(f1+f2) = V(f1) + V(f2)
Proof: V(f1+f2) = E((f1+f2)^2) - (E(f1+f2))^2
                = E(f1^2 + 2*f1*f2 + f2^2) - (E(f1)+E(f2))^2
                = E(f1^2) + 2*E(f1*f2) + E(f2^2) - 
                   [ (E(f1))^2 + 2*E(f1)*E(f2) + (E(f2))^2 ]
                = E(f1^2) - (E(f1))^2
                   + E(f2^2) - (E(f2))^2
                     + 2*( E(f1*f2) - E(f1)*E(f2) )
                = V(f1) + V(f2) + 0
where the last term vanishes because E(f1*f2) = E(f1)*E(f2)
by the previous theorem.
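
To illustrate, take the two dice again. The sketch below (fair dice
assumed; helper names are hypothetical) checks that variances add
for the independent pair f1, f2, and shows that they do not add for
the dependent pair f1, f1.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), all equally likely.
P = {x: Fraction(1, 36) for x in product(range(1, 7), repeat=2)}

def expectation(f):
    return sum(f(x) * P[x] for x in P)

def variance(f):
    """V(f) = E(f^2) - (E(f))^2."""
    return expectation(lambda x: f(x) ** 2) - expectation(f) ** 2

f1 = lambda x: x[0]
f2 = lambda x: x[1]

print(variance(f1) + variance(f2))          # 35/6
print(variance(lambda x: f1(x) + f2(x)))    # 35/6, independent: equal
print(variance(lambda x: f1(x) + f1(x)))    # 35/3, dependent: not equal
```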

Just as we generalized the idea of 2 independent sets to
n mutually independent sets (so we could deal with flipping a coin
n times), we need to generalize 2 independent random variables to
n mutually independent random variables.

Def: Random variables f1, f2, ... ,fn are mutually independent
if for every subset J of {1,2,...,n}, and every set of constants
a_i for i in J,
   P({for all i in J, fi(x)=a_i}) = prod_{i in J} P({fi(x)=a_i})

Ex: Flip a coin n times, let 
    fi(x) = {1 if i-th flip is H
            {0 otherwise
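
For small n the definition can be checked directly by looping over
every subset J and every choice of constants. The sketch below
assumes a fair coin and n = 3, and only tries the constants 0 and 1
because the fi are indicators; the names f, prob and
mutually_independent are hypothetical. The variable g is an extra
illustration, not from the lecture: it is independent of each of the
first two flips separately, but the three together are not mutually
independent.

```python
from fractions import Fraction
from itertools import combinations, product

n = 3
# Sample space: all 2^n sequences of H/T, equally likely (fair coin).
P = {x: Fraction(1, 2**n) for x in product("HT", repeat=n)}

def f(i):
    """Indicator random variable: 1 if flip i (0-based) is H, else 0."""
    return lambda x: 1 if x[i] == "H" else 0

def prob(event):
    return sum(P[x] for x in P if event(x))

def mutually_independent(fs):
    """Check the product rule for every nonempty subset J and all 0/1 values."""
    for size in range(1, len(fs) + 1):
        for J in combinations(range(len(fs)), size):
            for values in product([0, 1], repeat=size):
                joint = prob(lambda x: all(fs[i](x) == a
                                           for i, a in zip(J, values)))
                marginals = Fraction(1)
                for i, a in zip(J, values):
                    marginals *= prob(lambda x: fs[i](x) == a)
                if joint != marginals:
                    return False
    return True

flips = [f(i) for i in range(n)]
print(mutually_independent(flips))               # True

# g = 1 exactly when the first two flips differ.
g = lambda x: f(0)(x) ^ f(1)(x)
print(mutually_independent([f(0), f(1), g]))     # False
```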

Thm: If f1,...,fn are mutually independent then
     E(f1*f2*...*fn) = E(f1)*E(f2)*...*E(fn)
Proof: same idea as above for f1*f2

Thm: If f1,...,fn are mutually independent then
    V(f1+f2+...+fn) = V(f1) + V(f2) + ... + V(fn)
Proof: same idea as above for f1+f2

Def: Random variables f1, f2, ..., fn are independent and
identically distributed (i.i.d.) if 
  (1) they are mutually independent
  (2) they have the same distribution, that is P(fi(x)=j) only 
      depends on j, not i

Ex: Same one as before, flipping a coin n times

Corollary: if S_n = f1 + f2 + ... + fn is a sum
of i.i.d random variables, then E(S_n) = n*E(f1)
and V(S_n) = n*V(f1).
Proof: Expectations always add (no independence is needed), and 
by independence the variances add too. Since the fi all have the
same distribution, each term equals the one for f1.
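
As an illustration of the Corollary, the sketch below builds the
exact distribution of the sum of n = 4 fair dice and compares its
mean and variance with n*E(f1) and n*V(f1). The dice, the value of
n, and the helper names convolve, mean and var are assumptions made
for the example.

```python
from fractions import Fraction

def convolve(d1, d2):
    """Distribution of the sum of two independent random variables,
    each given as a dict value -> probability."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def mean(d):
    return sum(v * p for v, p in d.items())

def var(d):
    return sum(v * v * p for v, p in d.items()) - mean(d) ** 2

die = {v: Fraction(1, 6) for v in range(1, 7)}
n = 4
s = die
for _ in range(n - 1):      # s becomes the distribution of S_n
    s = convolve(s, die)

print(mean(s), n * mean(die))  # both 14
print(var(s), n * var(die))    # both 35/3
```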

Thm (Law of Large Numbers) Let f1,f2,f3,... be i.i.d.
random variables with common expectation mu and finite variance.
Let A_n = (f1 + ... + fn)/n  be their average. Then for any r>0,
   lim_{n -> infinity} P(|A_n - mu| > r) = 0
In other words, for any fixed tolerance r, the probability that
the average A_n differs from mu by more than r goes to zero
as n grows.

Proof: The last Corollary tells us that 
   E(A_n) = E(S_n)/n = E(f1) = mu
   V(A_n) = (1/n^2)*V(S_n) = V(f1)/n
So by Chebyshev's Inequality, for any r>0
   P( |A_n - mu| > r ) <= P( |A_n - mu| >= r ) <= V(f1)/(n*r^2)
Since V(f1) is finite and r is fixed, the right-hand side goes
to zero as n -> infinity, as desired.
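
The theorem can also be watched in a simulation. The sketch below
is only an illustration: it assumes fair coin flips (mu = 1/2),
r = 0.1, 2000 repetitions and a fixed seed, and the function names
are hypothetical. For each n it estimates P(|A_n - mu| >= r) and
prints it next to the Chebyshev bound V(f1)/(n*r^2). The estimates
are random, so only the trend matters: both columns shrink as n
grows, and the bound is usually far from tight.

```python
import random

def deviation_frequency(n, r, trials, rng, p=0.5):
    """Fraction of trials in which |A_n - p| >= r, where A_n is the
    average of n simulated coin flips with P(heads) = p."""
    bad = 0
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        # Small tolerance so that floating-point rounding does not
        # drop the boundary case |A_n - p| = r.
        if abs(heads / n - p) >= r - 1e-12:
            bad += 1
    return bad / trials

def chebyshev_bound(n, r, p=0.5):
    """V(f1)/(n*r^2) with V(f1) = p*(1-p) for a single coin flip."""
    return p * (1 - p) / (n * r * r)

rng = random.Random(0)   # fixed seed so the run is repeatable
r = 0.1
for n in (10, 100, 1000):
    print(n, deviation_frequency(n, r, 2000, rng), chebyshev_bound(n, r))
```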