CS 70 - Lecture 23 - Mar 14, 2011 - 10 Evans

Goals for today (read Note 14):
    random variables
    expectation (average, mean) of random variables

DEF: Let S be the sample space of a given experiment, with probability
     function P. A _random variable_ is a function f: S -> Reals.

EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p,
      f1(x) = { +1 if x=H, -1 if x=T }
      f1 = amount you win (if f1>0) (or lose, if f1<0) if you bet $1 on H.

EX 2: Flip a biased coin n times,
      S2 = {all sequences of H, T of length n}
      ASK&WAIT: What is P2(x), if x has i Heads?
      f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
      f2 = amount you win (or lose) if you bet $1 on H on each flip

EX 3: Let S3 = result of rolling a die once, P3(any face) = 1/6
      Let f3(die) = value on top of die (an integer from 1 to 6)

EX 4: Let S4 = result of rolling a pair of red and blue dice 24 times
         = { ((1,1),(1,1),...,(1,1)), ..., ((6,6),(6,6),...,(6,6)) }
               <----- 24 times ----->        <----- 24 times ----->
      ASK&WAIT: What is P4(x) for any x in S4?
      Let f4(x) = { +1 if a pair of sixes appears in x }
                  { -1 otherwise                       }
      We can interpret f4 as the amount of money we win (or lose) by
      betting on getting a pair of sixes.

EX 5: S5 = {US population}, P5(person x in S5) = 1/|S5|
      Let f5(person x in S5) = { +1 if x has a particular disease }
                               {  0 if x does not                 }

EX 6: Suppose you have a pile of n graded homework assignments to hand
      back to a class. But you shuffle them randomly, so all permutations
      are equally likely, and hand them back in that order, so each
      student gets a random homework. How many students get their own
      homework back?
      Let the students be named {1,2,...,n}.
      S6 = {all permutations of 1 to n}, P6(any permutation) = 1/n!
      Let a particular permutation of (1,...,n) be
         sigma = (sigma(1), sigma(2), ..., sigma(n));
      then student 1 gets homework sigma(1), student 2 gets homework
      sigma(2), and in general student i gets homework sigma(i).
      So student i gets her own homework back if and only if i = sigma(i).
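The idea that a random variable is just a function on the sample space can be made concrete in code. The sketch below (names like sample_space and the bias p = 0.6 are my own choices for illustration) enumerates S2 for n = 3 flips of a biased coin, computes P2(x) = p^i * (1-p)^(n-i) for a sequence with i Heads, and evaluates f2(x) = 2*#H - n on each outcome.

```python
from itertools import product

p = 0.6          # assumed bias: probability of H, chosen for illustration
n = 3            # number of flips

# S2 = all sequences of H, T of length n
sample_space = ["".join(seq) for seq in product("HT", repeat=n)]

def P2(x):
    """Probability of a particular sequence x with i Heads: p^i * (1-p)^(n-i)."""
    i = x.count("H")
    return p**i * (1 - p)**(n - i)

def f2(x):
    """Random variable: #H - #T = 2*#H - n."""
    return 2 * x.count("H") - n

# Sanity check: a probability function must sum to 1 over the sample space
assert abs(sum(P2(x) for x in sample_space) - 1.0) < 1e-9

for x in sample_space:
    print(x, P2(x), f2(x))
```

Note that f2 is a perfectly ordinary (deterministic) function; all the randomness lives in which x the experiment produces.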
      Let g_i(sigma) = 1 if i = sigma(i), and 0 otherwise.
      ASK&WAIT: What random variable tells us how many students get
                their own homework back?

EX: Suppose you flip a fair coin, and win $1 if it comes up H,
    lose $1 if it comes up T.
    ASK&WAIT: What is the "average" amount you expect to win after N flips?

DEF: Given S, P and random variable f, the _Expected Value_ (also called
     Mean or Average) of f is
        E(f) = sum_{all x in S} P(x)*f(x)
     This is the "average" value of f one gets if one repeats the
     experiment a great number of times.

EX 1: With S1, P1, f1 as before (flip coin once, bet $1 on H):
      E(f1) = (+1)*p + (-1)*(1-p) = 2*p - 1 = 0 if coin fair (p=1/2)
      Imagine betting $1 on getting H. Then E(f1) is the amount you
      expect to win (if E(f1)>0) or lose (if E(f1)<0) on the bet.
      If E(f1)=0, you break even.

EX 2: With S2, P2 and f2 as before (flip coin N times, bet $1 on H):
      E(f2) is the amount we expect to win betting $1 on H on each of
      the N flips; intuitively this should be N*E(f1) = N*(2*p-1).
      Formally, we get
         E(f2) = sum_{sequences x of N Hs and Ts} f2(x)*P2(x)
               = sum_{sequences x of N Hs and Ts} (#H - #T in x)*P2(x)
      This looks complicated, but later we will see that our intuition
      was right, and there is an easier way to do it that matches our
      intuitive approach.

EX 3: With S3, P3, f3 as before (roll die once):
      E(f3) = (1/6)*1 + (1/6)*2 + ... + (1/6)*6 = 21/6 = 7/2

EX 5: With S5, P5, f5 as before (choose random person, are they sick?):
      E(f5) = sum_{persons x} f5(x)*P5(x)
            = sum_{sick persons x} f5(x)*P5(x)
              + sum_{healthy persons x} f5(x)*P5(x)
            = sum_{sick persons x} 1*(1/|S5|)
              + sum_{healthy persons x} 0*(1/|S5|)
            = P(random person is sick)

EX 6: With S6, P6, g as before (return homeworks at random, who gets
      their own back?):
      E(g) = average number of students who get back their own homework.
      Looks like a sum over n! terms; again we need a better approach...
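The definition E(f) = sum_{x in S} P(x)*f(x) can be computed directly for the small examples above. A sketch (the helper name expected_value is my own; Fraction keeps the arithmetic exact):

```python
from fractions import Fraction

def expected_value(space, P, f):
    """E(f) = sum over all x in S of P(x)*f(x), straight from the definition."""
    return sum(P(x) * f(x) for x in space)

# EX 3: roll one die, f3 = value shown, P3(any face) = 1/6
S3 = [1, 2, 3, 4, 5, 6]
E_f3 = expected_value(S3, lambda x: Fraction(1, 6), lambda x: x)
print(E_f3)   # 7/2

# EX 1: one biased coin flip, bet $1 on H; take p = 1/2 (fair coin) here
p = Fraction(1, 2)
S1 = ["H", "T"]
E_f1 = expected_value(S1,
                      lambda x: p if x == "H" else 1 - p,
                      lambda x: 1 if x == "H" else -1)
print(E_f1)   # 0, i.e. 2*p - 1 with p = 1/2: you break even
```

This brute-force sum over S is fine when S is tiny, but (as the notes observe for EX 2 and EX 6) it becomes hopeless when |S| is 2^N or n!, which motivates the theorems that follow.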
EX 4: With S4, P4, f4 (roll red/blue dice 24 times, bet on pair of sixes):
      it seems like you need to sum over all 6^48 sequences.
      We need a simpler way:

DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

EX 1: With S1, P1 and f1 as before (flip coin once, bet $1 on H):
      P1(f1=1) = P1(H) = p,   P1(f1=-1) = P1(T) = 1-p

EX 2: With S2, P2 and f2 as before (flip coin N times, bet $1 on H):
      ASK&WAIT: What is P2(f2=i)?

EX 3: With S3, P3 and f3 as before (roll die once):
      P3(f3=k) = 1/6 for k=1,2,...,6, and P3(f3=k) = 0 otherwise

EX 4: With S4, P4 and f4 as before (roll red/blue dice 24 times, bet on
      pair of sixes):
      P4(f4=1) = sum_{all x in which a pair of sixes appears} P4(x)
               = P4(a pair of sixes appears)
      ASK&WAIT: What is P4(f4=-1)?

EX 5: With S5, P5 and f5 as above (choose random person, are they sick?):
      ASK&WAIT: What is P5(f5=1)? P5(f5=0)?

Thm: E(f) = sum_{numbers r in range of f} r*P(f=r)

Proof: We write down the proof for S finite, but it is the same for S
   countably infinite.
   Let {r1,r2,...,rk} be the numbers in the range of f, and write
      S = S1 U S2 U ... U Sk
   where Si = {x in S such that f(x)=ri}, and so P(Si) = P(f=ri).
   Note that all the Si are pairwise disjoint, so we can write
      E(f) = sum_{x in S} f(x)*P(x)
           = sum_{x in S1} f(x)*P(x) + sum_{x in S2} f(x)*P(x)
             + ... + sum_{x in Sk} f(x)*P(x)
           = sum_{x in S1} r1*P(x) + sum_{x in S2} r2*P(x)
             + ... + sum_{x in Sk} rk*P(x)
   Look at one term:
      sum_{x in Si} ri*P(x) = ri * sum_{x in Si} P(x)
                            = ri * P(Si)
                            = ri * P(f=ri)
   so E(f) = r1*P(f=r1) + r2*P(f=r2) + ... + rk*P(f=rk)
           = sum_{numbers r in range of f} r*P(f=r)
   as desired.

EX 3: With S3, P3 and f3 as above (roll die once):
      E(f3) = sum_{k=1 to 6} k*P(f3=k) = sum_{k=1 to 6} k*(1/6) = 7/2
      as before.

EX 4: With S4, P4, f4 as above (roll red/blue dice 24 times, bet $1 on
      pair of sixes): E(f4) is the average amount one wins (if E(f4)>0)
      or loses (if E(f4)<0) every time one plays.
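Before applying the theorem to EX 4, it can be sanity-checked in code: compute E(f2) for a small number of flips once by summing over the whole sample space, and once by grouping outcomes by the value of f2. The sketch below (N = 4 and p = 2/3 are arbitrary choices of mine; Fraction keeps everything exact) shows the two sums agree.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

p, N = Fraction(2, 3), 4       # assumed bias and number of flips, for the check
space = list(product("HT", repeat=N))

def P2(x):
    i = x.count("H")
    return p**i * (1 - p)**(N - i)

def f2(x):
    return 2 * x.count("H") - N

# Way 1: E(f) = sum_{x in S} f(x)*P(x), over all 2^N outcomes
E_direct = sum(f2(x) * P2(x) for x in space)

# Way 2: E(f) = sum_{r in range of f} r*P(f=r), grouping outcomes by value
P_f_eq = defaultdict(Fraction)       # P(f=r), built by accumulating P(x)
for x in space:
    P_f_eq[f2(x)] += P2(x)
E_grouped = sum(r * pr for r, pr in P_f_eq.items())

assert E_direct == E_grouped == N * (2*p - 1)
print(E_direct)
```

The final assertion also matches the intuitive answer N*(2p-1) from EX 2, which the notes confirm later via linearity of expectation.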
      E(f4) = sum_{numbers r in range of f4} r*P(f4=r)
            = (+1)*P4(getting pair of sixes) + (-1)*P4(not getting pair of sixes)
            = P4(getting pair of sixes) - P4(not getting pair of sixes)
      ASK&WAIT: What is P4(not getting pair of sixes)?
      P4(getting pair of sixes) = 1 - P4(not getting pair of sixes)
                                ~ 1 - .5086 = .4914
      and E(f4) = .4914 - .5086 = -.0172, so you lose in the long run.
      Note: In 1654 the gambler Gombaud asked Fermat and Pascal whether
            this was a good bet, inadvertently starting the field of
            probability theory.
      Note: If we do 25 rolls instead of 24,
            P4(not getting a pair of sixes) drops to (35/36)^25 ~ .4945, and
            P4(getting pair of sixes) grows to .5055, so it is a good bet.

EX 5: Let S5, P5, f5 be as above (pick random person, are they sick?):
      E(f5) = (+1)*P(f5=1) + 0*P(f5=0) = P(f5=1) = P(person sick)
      This is a special case of the following lemma:

Lemma: Let S be a sample space, E subset S any event, and
          f(x) = { 1 if x in E     }
                 { 0 if x not in E }
       Then E(f) = P(E).
       ASK&WAIT: proof?

EX 2: S2, P2, f2 as above (flip coin N times, bet $1 on H):
      E(f2) = expected win betting $1 on a coin N times
            = sum_{i=-N to N} i*P2(getting i = #H-#T)
            = sum_{i=-N to N, i+N even} i*C(N,(N+i)/2)*p^((N+i)/2)*(1-p)^((N-i)/2)
      This still isn't simple, so we need a new idea:

Thm: Let S and P be a sample space and probability function, and let f
     and g be two random variables. Then E(f+g) = E(f) + E(g).

Proof: Let h = f+g be a new random variable. Then
   E(h) = sum_{x in S} h(x)*P(x)
        = sum_{x in S} (f(x)+g(x))*P(x)
        = sum_{x in S} f(x)*P(x) + sum_{x in S} g(x)*P(x)
        = E(f) + E(g)

Corollary: Let S and P be as above, and h = f1 + f2 + ... + fn.
   Then E(h) = E(f1) + E(f2) + ... + E(fn).

EX 2: Let S2, P2, f2 be as before (flip coin N times, bet $1 on H):
      Then we can write f2 = g1 + g2 + ... + gN where
         gi(x) = { +1 if i-th flip = H }
                 { -1 if i-th flip = T }
      and E(f2) = E(g1) + E(g2) + ...
          + E(gN)
      For any i, E(gi) = (+1)*P(H) + (-1)*P(T) = p - (1-p) = 2*p - 1,
      so E(f2) = N*(2*p-1), which matches our original intuition about
      making N independent bets in a row (whew!)

EX 6: Let S6, P6 be as before (return homework at random, how many get
      their own back?):
      Let sigma be a permutation,
         g_i(sigma) = 1 if sigma(i)=i, 0 otherwise
                    = 1 if i-th student gets her own homework back
         g(sigma) = sum_{i=1 to n} g_i(sigma)
                  = number of students getting their own homework back
      E(g) = E(sum_{i=1 to n} g_i) = sum_{i=1 to n} E(g_i)
      So all we need is the probability that the i-th student gets the
      right homework:
      E(g_i) = P(student i gets right homework)
             = (# permutations where student i gets right homework)/n!
             = (# permutations of other (n-1) homeworks)/n!
             = (n-1)!/n! = 1/n
      Thus E(g) = sum_{i=1 to n} 1/n = 1.
      So the answer is 1 student, independent of the class size n.

EX 4: Let S4, P4, and f4 be as before (roll red/blue dice 24 times, bet
      $1 on pair of sixes). Suppose you also make the side bet that you
      win $2 if at least 8 fives come up, and lose $2.50 if fewer than
      8 fives come up. Is this joint bet worth making?
      Answer: Let g(x) = { +2   if at least 8 fives come up in x }
                         { -2.5 if at most 7 fives come up in x  }
      P(g=+2) = P(at least 8 fives)
              = sum_{i=8 to 48} C(48,i) * (1/6)^i * (5/6)^(48-i) ~ .55992
      P(g=-2.5) = P(at most 7 fives) = 1 - P(at least 8 fives)
                = 1 - .55992 = .44008
      E(g) ~ +2*.55992 - 2.5*.44008 ~ .0196
      Then the value of the joint bet f4+g is
         E(f4+g) = E(f4) + E(g) ~ -.0172 + .0196 = .0024
      and, being positive, the joint bet is worth making (barely).

EX: Suppose you shoot at a target, and miss it with probability p each
    time you try. What is the expected number of times you have to try
    before getting a hit?
    S = { H, MH, MMH, MMMH, ...
    }
    P( MM...MH ) = p^#M * (1-p)
    f( MM...MH ) = #shots = #M + 1
    We want E(f) = sum_{m=0}^infinity (m+1)*p^m*(1-p)
    Recall sum_{m=0}^infinity p^m = 1/(1-p), so
       d/dp ( sum_{m=0}^infinity p^m ) = d/dp ( 1/(1-p) )
    or sum_{m=0}^infinity m*p^(m-1) = 1/(1-p)^2
    or sum_{m=0}^infinity m*p^m*(1-p) = p/(1-p)
    so sum_{m=0}^infinity (m+1)*p^m*(1-p) = p/(1-p) + (1-p)/(1-p) = 1/(1-p)
    so E(f) = 1/(1-p) = 1/P(hit)
    So if P(M) = .99, you need to take 1/(1-.99) = 100 shots on average
    to hit.
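The closed form E(f) = 1/(1-p) can be checked numerically by truncating the infinite sum. A sketch (the function name and the cutoff M = 5000 are my own choices; the tail beyond M is negligible for the values of p shown):

```python
def expected_shots(p, M=5000):
    """Truncated sum_{m=0}^{M} (m+1) * p^m * (1-p): the expected number
    of shots when each shot independently misses with probability p."""
    return sum((m + 1) * p**m * (1 - p) for m in range(M + 1))

for p in (0.5, 0.9, 0.99):
    print(p, expected_shots(p), 1 / (1 - p))   # truncated sum vs 1/(1-p)

# In particular, with P(miss) = .99 you need about 100 shots on average
assert abs(expected_shots(0.99) - 100) < 1e-4
```

This is the expectation of a geometric distribution; the series trick above (differentiating the geometric series) is the standard way to evaluate it by hand.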