CS 70 - Lecture 30 - Apr 6, 2011 - 10 Evans

Goals for today:  Law of Large Numbers
                  Independent Random Variables
(Please read Note 17, beginning of 18)

Question: Suppose we want to take a poll on whether taxes should be
increased in California. How many people (n) do we need to ask to be
95% sure that our estimate of the fraction p of "yes" votes is within
5% of the actual value?

Let's formalize this as follows: Suppose we ask n randomly selected
people, and the number of "yes" votes in this group is S_n. Then our
estimate of p will be A_n = S_n/n. Intuitively, when n is large A_n
should be close to p. To see how close, write

   S_n = X_1 + X_2 + ... + X_n

where

   X_i = { 1 if the i-th person says "yes"
         { 0 if the i-th person says "no"

Since person i is chosen randomly, X_i is a random variable where

   X_i = { 1 with probability p
         { 0 with probability 1-p

Thus E(X_i) = 1*p + 0*(1-p) = p, E(S_n) = sum_i E(X_i) = n*p, and
E(A_n) = (1/n)*E(S_n) = p, as desired.

To see how close A_n is to p, we need to know its variance, and then
use Chebyshev's inequality. First, we compute V(S_n) as we have done
before (using the fact that the people are chosen independently, so
E(X_i*X_j) = E(X_i)*E(X_j) = p^2 when i neq j; we justify this step
below when we define independent random variables):

   V(S_n) = E(S_n^2) - (E(S_n))^2
          = E((sum_i X_i)^2) - (n*p)^2
          = E(sum_i (X_i)^2 + sum_{i neq j} X_i*X_j) - (n*p)^2
          = sum_i E((X_i)^2) + sum_{i neq j} E(X_i*X_j) - (n*p)^2
          = sum_i (p) + sum_{i neq j} p^2 - (n*p)^2
          = n*p + (n^2-n)*p^2 - (n*p)^2
          = n*p*(1-p)

Next we use

Thm: If S is a sample space, P a probability function, f is a random
variable, and a and b are constants, then
   E(a*f+b) = a*E(f) + b
   V(a*f+b) = a^2*V(f)
Proof:
   E(a*f+b) = sum_x (a*f(x)+b)*P(x)
            = a*sum_x f(x)*P(x) + b*sum_x P(x)
            = a*E(f) + b
   V(a*f+b) = E([(a*f+b) - E(a*f+b)]^2)
            = E([(a*f+b) - (a*E(f)+b)]^2)
            = E([a*f - a*E(f)]^2)
            = E(a^2*[f - E(f)]^2)
            = a^2*E([f - E(f)]^2)
            = a^2*V(f)

Thus V(A_n) = V(S_n/n) = (1/n)^2*V(S_n) = p*(1-p)/n. In particular,
the variance V(A_n) gets smaller as the sample size n grows, i.e. A_n
gets closer to its mean, as we expect.
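We can sanity-check these formulas numerically. A minimal simulation
sketch (not part of the lecture; the function name `poll` and the
parameter choices are ours): for many simulated polls, the sample mean
of the estimates A_n should be near p and their sample variance near
p*(1-p)/n.

```python
import random

def poll(n, p, trials=20000, seed=0):
    """Simulate `trials` independent polls of n people, each answering
    "yes" with probability p; return the estimates A_n = S_n/n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        s_n = sum(1 for _ in range(n) if rng.random() < p)  # S_n = X_1+...+X_n
        estimates.append(s_n / n)
    return estimates

p, n = 0.3, 100
ests = poll(n, p)
mean = sum(ests) / len(ests)                           # near E(A_n) = p
var = sum((a - mean) ** 2 for a in ests) / len(ests)   # near V(A_n) = p*(1-p)/n
```

With p = 0.3 and n = 100, the predicted variance is .3*.7/100 = .0021,
and the simulated variance lands close to it.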
To say how close, we will use Chebyshev's inequality:

   Prob( |A_n - E(A_n)| >= r ) <= V(A_n)/r^2

or

   Prob( |A_n - p| >= r ) <= p*(1-p)/(n*r^2)

Recall that our goal was to pick n large enough so that we are 95%
sure of A_n being within 5% of p; this means r = .05 and

   p*(1-p)/(n*r^2) <= 1 - 95% = .05

or

   n >= p*(1-p)/(.05*r^2) = p*(1-p)/.05^3 = 8000*p*(1-p)

How do we do this if we don't know p? It is easy to see that p*(1-p)
is maximized by p = 1/2, so it is enough to choose

   n >= 8000*.25 = 2000

people to poll.

This is a special case of a more general result, called the Law of
Large Numbers, which gives conditions under which the average of more
general random variables approaches an expectation. To state it, we
need to generalize the idea of independent events to independent
random variables.

Recall that events A and B (subsets of a sample space S) are
independent if P(A and B) = P(A)*P(B). A typical example is flipping
a fair coin twice, so S = {HH,HT,TH,TT}, with
   A = "first coin is H"  = {HH,HT}, and
   B = "second coin is H" = {HH,TH}
with P(A and B) = P(HH) = .25 = .5*.5 = P(A)*P(B).

Now consider two random variables f1 and f2 where
   f1 = { 1 if first coin H
        { 0 otherwise
   f2 = { 1 if second coin H
        { 0 otherwise
Clearly A = {x: f1(x)=1}; we abbreviate this by {f1=1}. Similarly
B = {f2=1}.
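The n >= 2000 answer can be checked by simulation. A sketch (our own
function name and trial counts; worst case p = 1/2): poll 2000 people
many times and count how often the estimate lands within .05 of p.

```python
import random

def success_rate(n, p, r=0.05, trials=1000, seed=1):
    """Fraction of simulated polls of n people whose estimate A_n
    lands within r of the true fraction p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a_n = sum(1 for _ in range(n) if rng.random() < p) / n
        if abs(a_n - p) < r:
            hits += 1
    return hits / trials

# Chebyshev says n = 2000 suffices for ANY p; try the worst case p = 1/2.
rate = success_rate(2000, 0.5)
```

In practice the observed rate is well above 95%: Chebyshev's
inequality is quite conservative, so n = 2000 is more than enough.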
So we can restate the independence property as

   P(A and B) = P({f1=1 and f2=1}) = P({f1=1}) * P({f2=1})

More generally, we have

Def: Random variables f1 and f2 are independent if for every a and b,
   P({f1=a and f2=b}) = P({f1=a}) * P({f2=b})

Recall that we say A and B are independent sets when knowing whether
or not x is in A tells you nothing about whether x is in B:
   P(B|A) = P(B and A)/P(A) = P(B)*P(A)/P(A) = P(B)
Similarly, f1 and f2 are independent random variables when knowing
the value of f1 tells you nothing about the value of f2:
   P({f2=b}|{f1=a}) = P({f2=b and f1=a})/P({f1=a})
                    = P({f2=b})*P({f1=a})/P({f1=a})
                    = P({f2=b})

The next two theorems use the property of f1 and f2 being independent
to simplify computing expectations and variances of combinations of
f1 and f2:

Thm: If f1 and f2 are independent, then E(f1*f2) = E(f1)*E(f2)
Proof:
   E(f1*f2) = sum_x f1(x)*f2(x)*P(x)
            = sum_{a,b} sum_{x: f1(x)=a and f2(x)=b} f1(x)*f2(x)*P(x)
            = sum_{a,b} sum_{x: f1(x)=a and f2(x)=b} a*b*P(x)
            = sum_{a,b} a*b* sum_{x: f1(x)=a and f2(x)=b} P(x)
            = sum_{a,b} a*b*P({f1=a and f2=b})
            = sum_{a,b} a*b*P({f1=a})*P({f2=b})
            = sum_{a} sum_{b} a*b*P({f1=a})*P({f2=b})
            = sum_{a} a*P({f1=a}) * sum_{b} b*P({f2=b})
            = E(f1) * E(f2)

Ex: Roll 2 dice, f1 = value on first die, f2 = value on second die.
Then E(f1*f2) = E(f1)*E(f2) = (7/2)^2 = 49/4.

Ex: To contrast, let f2 = f1; these have the same distribution as in
the last example, but are clearly not independent.
Then E(f1*f2) = E(f1^2) = (1+4+9+16+25+36)/6 = 91/6 neq 49/4.

Thm: If f1 and f2 are independent, then V(f1+f2) = V(f1) + V(f2)
Proof:
   V(f1+f2) = E((f1+f2)^2) - (E(f1+f2))^2
            = E(f1^2 + 2*f1*f2 + f2^2) - (E(f1)+E(f2))^2
            = E(f1^2) + 2*E(f1*f2) + E(f2^2)
              - [ (E(f1))^2 + 2*E(f1)*E(f2) + (E(f2))^2 ]
            = E(f1^2) - (E(f1))^2 + E(f2^2) - (E(f2))^2
              + 2*( E(f1*f2) - E(f1)*E(f2) )
            = V(f1) + V(f2) + 0

Just as we generalized the idea of 2 independent sets to n mutually
independent sets (so we could deal with flipping a coin n times), we
need to generalize 2 independent random variables to n mutually
independent random variables.

Def: Random variables f1, f2, ..., fn are mutually independent if for
every subset J of {1,2,...,n}, and every set of constants a_i for
i in J,
   P({for all i in J, fi=a_i}) = prod_{i in J} P({fi=a_i})

Ex: Flip a coin n times, let
   fi = { 1 if i-th flip is H
        { 0 otherwise

Thm: If f1,...,fn are mutually independent, then
   E(f1*f2*...*fn) = E(f1)*E(f2)*...*E(fn)
Proof: same idea as above for f1*f2

Thm: If f1,...,fn are mutually independent, then
   V(f1+f2+...+fn) = V(f1) + V(f2) + ... + V(fn)
Proof: same idea as above for f1+f2

Def: Random variables f1, f2, ..., fn are independent and identically
distributed (i.i.d.) if
   (1) they are mutually independent
   (2) they have the same distribution, that is, P({fi=j}) only
       depends on j, not i
Ex: Same one as before, flipping a coin n times.

Corollary: If S_n = f1 + f2 + ... + fn is a sum of i.i.d. random
variables, then E(S_n) = n*E(f1) and V(S_n) = n*V(f1).
Proof: Expectations always add, and by independence the variances add
too; each term equals the corresponding value for f1.

Thm (Law of Large Numbers): Let f1, f2, f3, ... be i.i.d. random
variables with common expectation mu and finite variance. Let
A_n = (f1 + ... + fn)/n be their average. Then for any r > 0,
   lim_{n -> infinity} P(|A_n - mu| > r) = 0
In other words, the average A_n gets closer and closer to the single
value mu, with high probability.
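The Law of Large Numbers is easy to see in simulation. A sketch using
fair die rolls as the i.i.d. variables (function name and parameters
are ours): estimate P(|A_n - mu| > r) for a small and a large n; the
probability should shrink toward 0 as n grows.

```python
import random

MU = 3.5  # expectation mu of one fair die roll

def fraction_far(n, r=0.25, trials=2000, seed=0):
    """Estimate P(|A_n - mu| > r), where A_n averages n i.i.d. die rolls."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        a_n = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(a_n - MU) > r:
            far += 1
    return far / trials

p10, p1000 = fraction_far(10), fraction_far(1000)  # p1000 is far smaller
```

For n = 10 the average strays beyond .25 of mu quite often; for
n = 1000 it essentially never does, as the theorem predicts.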
Proof: The last Corollary tells us that
   E(A_n) = E(S_n)/n = E(f1) = mu
and
   V(A_n) = (1/n^2)*V(S_n) = V(f1)/n
So by Chebyshev's Inequality, for any r > 0,
   P( |A_n - mu| >= r ) <= V(f1)/(n*r^2)
Taking the limit as n -> infinity yields zero, as desired.
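The bound in the proof can be compared against simulation. A sketch
(our own function names; fair die rolls again, so mu = 7/2 and, from
the dice example, V(f1) = 91/6 - 49/4 = 35/12): compute the Chebyshev
bound V(f1)/(n*r^2) and a simulated tail probability for one n and r.

```python
import random

MU, VAR = 7 / 2, 35 / 12   # mean and variance of one fair die roll

def tail_prob(n, r, trials=3000, seed=0):
    """Simulated P(|A_n - mu| >= r) for the average of n die rolls."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        a_n = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(a_n - MU) >= r:
            count += 1
    return count / trials

n, r = 200, 0.3
bound = VAR / (n * r ** 2)   # Chebyshev bound V(f1)/(n*r^2), about 0.16
observed = tail_prob(n, r)   # the simulated tail probability
```

The observed tail probability sits well below the bound: Chebyshev
uses only the variance, so it is valid for every distribution but
loose for any particular one.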