CS 70 - Lecture 23 - Mar 14, 2011 - 10 Evans

Goals for today (read Note 14):    
    random variables
    expectation (average, mean) of random variables

 DEF: Let S be the sample space of a given experiment, with probability 
      function P. A _random variable_ is a function f:S -> Reals.

 EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p, 
       f1(x) = {1 if x=H, -1 if x=T}
       f1 = amount you win (if f1>0) (or lose, if f1<0) if you bet $1 on H.

 EX 2: Flip a biased coin n times, S2 = {all sequences of H, T of length n}
ASK&WAIT: What is P2(x), if x has i Heads?
       f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
       f2 = amount you win (or lose) if you bet $1 on H on each flip

 EX 3: Let S3 = result of rolling a die once, P3(any face) = 1/6
     Let f3(die) = value on top of die (an integer from 1 to 6)

 EX 4: Let S4 = result of rolling a pair of red and blue dice 24 times
           = { ((1,1),(1,1),...,(1,1)),...,((6,6),(6,6),...,(6,6))}
               <----- 24 times ------>     <----- 24 times ------>
ASK&WAIT:   What is P4(x) for any x in S4?
     Let f4(x) = { +1 if a pair of sixes appears in x }
                 { -1 otherwise                       }
     We can interpret f4 as the amount of money we win (or lose) by 
     betting on getting a pair of sixes

 EX 5: S5 = {US population}, P5(person x in S5) = 1/|S5|, 
     Let f5(person x in S) = { +1 if x has a particular disease }
                             {  0 if x does not                 }

 EX 6: Suppose you have a pile of n graded homework assignments to hand
     back to a class. But you shuffle them randomly, so all permutations
     are equally likely, and hand them back in that order, so each student
     gets a random homework. How many students get their own homework back?

     Let the students be named {1,2,...n}.
     S6 = {all permutations of 1 to n}, P6(any permutation) = 1/n!
     Let a particular permutation of (1,...,n) be 
     sigma = (sigma(1),sigma(2),...,sigma(n));
     then student 1 gets homework sigma(1), student 2 gets homework sigma(2),
      and student i gets homework sigma(i). So student i gets her own homework
     back if and only if i = sigma(i).

     g_i(sigma) = 1 if i = sigma(i), and 0 otherwise
ASK&WAIT: What random variable tells us how many students get their 
     own homework back?

EX: Suppose you flip a fair coin, and win $1 if it comes up H, 
    lose $1 if it comes up T
ASK&WAIT: What is the "average" amount you expect to win after N flips?

 DEF: Given S, P and random variable f, the _Expected Value_
      (also called Mean or Average) of f is 
          E(f) = sum_{all x in S} P(x)*f(x)

 This is the "average" value of f one gets if one repeats the experiment
    a great number of times. 
 EX 1: With S1, P1, f1 as before, (flip coin once, bet $1 on H)
          E(f1) = (+1)*(p) + (-1)*(1-p) = 2*p-1
                = 0 if coin fair (p=1/2)
     Imagine betting $1 on getting H. Then E(f1) is the amount you expect to
     win (if E(f1)>0) or lose (E(f1)<0) on the bet. If E(f1)=0, you break even
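
 A quick simulation makes the "average over many repetitions" reading concrete
 (a sketch; the bias p=0.6 and the trial count are just example choices):

```python
import random

def simulate_f1(p, trials=500_000, seed=1):
    """Empirical average of f1: bet $1 on H with a coin of bias p, many times."""
    rng = random.Random(seed)
    winnings = sum(1 if rng.random() < p else -1 for _ in range(trials))
    return winnings / trials

# The empirical average lands close to E(f1) = 2*p - 1, e.g. 0.2 for p = 0.6.
print(simulate_f1(0.6))
```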

 EX 2: With S2, P2 and f2 as before, (flip coin N times, bet $1 on H)
     If we flip a coin N times, we expect E(f2) to be the amount we win 
     betting $1 on flip to get H; and intuitively this should be 
     N*E(f1) = N*(2*p-1)
     Formally, we get
      E(f2) = sum_{sequences x of n Hs and Ts} f2(x)*P2(x)
            = sum_{sequences x of n Hs and Ts} (#H-#T in x)*P2(x)
      looks complicated, but later we will see that our intuition was right,
      and there is an easier way to do it that matches our intuitive approach

 EX 3: With S3, P3, f3 as before, (roll die once)
          E(f3) = (1/6)*1 + (1/6)*2 + ... + (1/6)*6 = 21/6 = 7/2

 EX 5: With S5, P5, f5 as before, (choose random person, are they sick?)
     E(f5) = sum_{persons x} f5(x)*P5(x)
           = sum_{sick persons x} f5(x)*P5(x) + sum_{healthy persons x} f5(x)*P5(x)
           = sum_{sick persons x} 1*(1/|S5|) + sum_{healthy persons x} 0*(1/|S5|)
           = P(random person is sick)
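
 The same computation on a hypothetical toy population (the 0/1 data below is
 made up purely for illustration):

```python
from fractions import Fraction

# Hypothetical toy population: 1 = has the disease, 0 = does not.
population = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# E(f5) = sum over persons x of f5(x) * 1/|S5|.
E_f5 = sum(Fraction(fx, len(population)) for fx in population)
print(E_f5)   # 1/5, exactly the fraction of sick people
```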

 EX 6: S6, P6, g as before (return homeworks at random; who gets their own back?)
     Then E(g) = average number of students who get back their own homework.
     This looks like a sum over n! terms, so again we need a better approach...

 EX 4: With S4, P4, f4, (roll red/blue dice 24 times, bet on pair of sixes)
     it seems like you need to sum over all 6^48 sequences;
     we need a simpler way:

 DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

 EX 1:  With S1, P1 and f1 as before (flip coin once, bet $1 on H)
      P1(f1=1) = P1(H) = p, P1(f1=-1) = P1(T) = 1-p
 EX 2:  With S2, P2 and f2 as before (flip coin N times, bet $1 on H)
ASK&WAIT: What is P2(f2=i)?

 EX 3:  With S3, P3 and f3 as before (roll die once)
      P3(f3=k) = 1/6 for k=1,2,...,6 and P3(f3=k)=0 otherwise

 EX 4:  With S4, P4 and f4 as before (roll red/blue dice 24 times, bet on pair of sixes)
      P4(f4=1) = sum_{all x in which a pair of sixes appears} P4(x)
               = P4(a pair of sixes appears)
ASK&WAIT: What is P4(f4=-1)?

 EX 5:  With S5, P5 and f5 as above, (choose random person, are they sick?)
ASK&WAIT: what is P5(f5=1)? P5(f5=0)?

 Thm: E(f) = sum_{numbers r in range of f} r*P(f=r)
    Proof: Write down proof for S finite, but same for S countably infinite
           Let {r1,r2,...,rk} be numbers in range of f, and write
           S = S1 U S2 U ... U Sk where
           Si = {x in S such that f(x)=ri}
           and so P(Si) = P(f=ri)
           Note that all Si are pairwise disjoint, so we can write
           E(f) = sum_{x in S} f(x)*P(x) 
                = sum_{x in S1} f(x)*P(x) + sum_{x in S2} f(x)*P(x) 
                  + ... + sum_{x in Sk} f(x)*P(x)
                = sum_{x in S1} r1*P(x) + sum_{x in S2} r2*P(x) 
                  + ... + sum_{x in Sk} rk*P(x)
           Look at one term: 
           sum_{x in Si} ri*P(x) = ri * sum_{x in Si} P(x)
                                 = ri * P(Si)
                                 = ri * P(f=ri)  
            so E(f) = r1*P(f=r1) + r2*P(f=r2) + ... + rk*P(f=rk)
                   = sum_{number r in range of f} r*P(f=r)
           as desired.

 EX 3: With S3, P3 and f3 as above, (roll die once)
     E(f3) = sum_{k=1 to 6} k*P3(f3=k) = sum_{k=1 to 6} k*(1/6) = 7/2 as before
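
 Both formulas for the expectation can be checked against each other on the
 die example using exact arithmetic (a quick sketch):

```python
from fractions import Fraction

S3 = range(1, 7)                       # faces of the die
P3 = {x: Fraction(1, 6) for x in S3}   # uniform probabilities

def f3(x):
    return x                           # value on top of the die

# Definition of expectation: sum over all outcomes x in S3.
E_def = sum(f3(x) * P3[x] for x in S3)

# Theorem: sum over values r in the range of f3 of r * P(f3 = r).
E_thm = sum(r * sum(P3[x] for x in S3 if f3(x) == r)
            for r in {f3(x) for x in S3})

print(E_def, E_thm)   # 7/2 7/2
```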

 EX 4: With S4, P4, f4 as above (roll red/blue dice 24 times, bet $1 on pair of sixes.)
     E(f4) is the average amount one wins (if E(f4)>0) or loses (if E(f4)<0)
     every time one plays.  
     E(f4) = sum_{numbers r in range of f} r*P(f4=r)
           = +1*P4(getting pair of sixes) + (-1)*P4(not getting pair of sixes)
           = P4(getting pair of sixes) - P4(not getting pair of sixes) 
ASK&WAIT:     What is P4(not getting pair of sixes)?
     P4(not getting pair of sixes) = (35/36)^24 ~ .5086, so
     P4(getting pair of sixes) = 1 - P4(not getting pair of sixes)
                               ~ 1 - .5086 = .4914
     and E(f4) = .4914 - .5086 = -.0172, so you lose in the long run

     Note: In 1654 the gambler Antoine Gombaud (the Chevalier de Méré) asked
           Fermat and Pascal whether this was a good bet, inadvertently
           starting the field of probability theory
     Note: If we do 25 rolls instead of 24, 
           P4(not getting a pair of sixes) drops to (35/36)^25 ~ .4945
           P4(getting pair of sixes) grows to .5055, so it is a good bet.
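
 These probabilities are easy to recompute directly (a quick sketch):

```python
# P(no double six on one roll of two dice) = 35/36.
p24 = 1 - (35/36)**24   # P(at least one double six in 24 rolls)
p25 = 1 - (35/36)**25   # same for 25 rolls
E_24 = p24 - (1 - p24)  # expected value of the $1 bet with 24 rolls

print(round(p24, 4), round(p25, 4), round(E_24, 4))
# 0.4914 0.5055 -0.0172
```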

 EX 5: Let S5, P5, f5 be as above. (pick random person, are they sick?)
     E(f5) = (+1)*P(f5=1) + 0*P(f5=0)
           = P(f5=1) = P(person sick)
     This is a special case of the following lemma:

 Lemma: Let S be a sample space, let A subset S be any event, and let
        f(x) = {1 if x in A     }
               {0 if x not in A }
        Then E(f) = P(A)
ASK&WAIT: proof?

EX 2: S2, P2, f2 as above (flip coin N times, bet $1 on H)
    E(f2) = expected win betting $1 on a coin N times
          = sum_{i=-N to N} i*P2(getting i=#H-#T)
          = sum_{i=-N to N, i+N even} i*C(N,(N+i)/2)*p^((N+i)/2)*(1-p)^((N-i)/2)
    This still isn't simple, so we need a new idea:

 Thm: Let S and P be a sample space and probability function, and
      let f and g be two random variables. Then
            E(f+g) = E(f) + E(g)
      Proof: Let h=f+g be a new random variable.
            Then E(h) = sum_{x in S} h(x)*P(x)
                      = sum_{x in S} (f(x)+g(x))*P(x)
                      = sum_{x in S} f(x)*P(x) + sum_{x in S} g(x)*P(x)
                      = E(f)                   + E(g)

  Corollary: Let S and P be as above, and h = f1 + f2 + ... + fn
      Then E(h) = E(f1) + E(f2) + ... + E(fn)

  EX 2: Let S2, P2, f2 be as before (flip coin N times, bet $1 on H)
      Then we can write
      f2 = g1 + g2 + ... + gN where
      gi(x) = { +1 if i-th flip = H }
              { -1 if i-th flip = T }
      and E(f2) = E(g1) + E(g2) + ... + E(gN)
      For any i, E(gi) = (+1)*P(H) + (-1)*P(T) = p - (1-p) = 2*p-1
      so E(f2) = N*(2*p-1) 
      which matches our original intuition about making N independent
      bets in a row (whew!)
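
 The complicated binomial sum and the linearity answer can be checked against
 each other numerically (a sketch; N=10 and p=0.6 are example values):

```python
from math import comb

def E_f2(N, p):
    """E(#H - #T) computed directly from the binomial distribution of #H."""
    return sum((2*h - N) * comb(N, h) * p**h * (1 - p)**(N - h)
               for h in range(N + 1))

# Matches the linearity answer N*(2*p - 1) for any N and p.
print(round(E_f2(10, 0.6), 6))   # 2.0
```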

  EX 6: Let S6, P6 be as before (return homework at random, how many get their own back?)
      Let sigma be a permutation, 
      g_i(sigma) = 1 if sigma(i)=i, 0 otherwise
                 = 1 if i-th student gets her own homework back
      g(sigma) = sum_{i=1 to n} g_i(sigma)
               = number of students getting their own homework back
      E(g) = E(sum_{i=1 to n} g_i)
           = sum_{i=1 to n} E(g_i)
      So all we need is the probability that the i-th student gets the right homework:
         E(g_i) = P(student i gets right homework)
                = (# permutations where student i gets right homework)/n!
                = (# permutations of other (n-1) homeworks)/n!
                = (n-1)! / n! = 1/n
      Thus E(g) = sum_{i=1 to n} 1/n = 1
      So the answer is 1 student, independent of the class size n.
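
 For small class sizes the n!-term sum is still feasible by brute force, and it
 confirms the answer of 1 exactly (a quick sketch):

```python
from itertools import permutations
from math import factorial
from fractions import Fraction

def expected_own_homework(n):
    """Brute-force E(g): average number of fixed points over all n! permutations."""
    total = sum(sum(1 for i in range(n) if sigma[i] == i)
                for sigma in permutations(range(n)))
    return Fraction(total, factorial(n))

# Exactly 1 for every class size n, as linearity of expectation predicts.
print([expected_own_homework(n) for n in range(1, 7)])
```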

  EX 4: Let S4, P4, and f4 be as before (roll red/blue dice 24 times, bet $1 on pair of sixes)
      Suppose you also make the side bet
      that you win $2 if at least 8 fives come up, and lose $2.5 if
      fewer than 8 fives come up. Is this joint bet worth making?
      Answer: Let g(x) = { +2 if at least 8 fives come up in x  }
                         { -2.5 if at most 7 fives come up in x }
      P(g=+2) = P(at least 8 fives) 
              = sum_{i=8 to 48} C(48,i) * (1/6)^i * (5/6)^(48-i)
              ~ .55992
      P(g=-2.5) = P(at most 7 fives)  
                = 1 - P(at least 8 fives)
                = 1 - .55992 = .44008
      E(g) ~ +2*.55992 - 2.5*.44008 ~ .0196
      Then the value of the joint bet f4+g is 
             E(f4+g) = E(f4)+E(g) ~ -.0172+.0196 = .0024
      and since this is positive, the joint bet is worth making (barely)
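
 Both expectations can be recomputed directly; note that 24 rolls of a pair of
 dice give 48 individual dice among which to count fives (a quick sketch):

```python
from math import comb

# P(at least 8 fives among 48 dice): binomial tail with success probability 1/6.
p_hi = sum(comb(48, i) * (1/6)**i * (5/6)**(48 - i) for i in range(8, 49))

E_g  = 2 * p_hi - 2.5 * (1 - p_hi)   # the side bet on fives
E_f4 = 2 * (1 - (35/36)**24) - 1     # the double-six bet from before

print(round(p_hi, 5), round(E_g, 4), round(E_f4 + E_g, 4))
```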

 EX: Suppose you shoot at a target, and miss it with probability p each
     time you try. What is the expected number of times you have to try
     before getting a hit?
     S = { H, MH, MMH, MMMH, .... }
     P( MM...MH ) = p^#M * (1-p)
     f( MM...MH ) = #shots = #M + 1
     We want E(f) = sum_{m=0}^infinity (m+1)*p^m*(1-p)
          Recall    sum_{m=0}^infinity p^m = 1/(1-p)
          so d/dp ( sum_{m=0}^infinity p^m ) = d/dp ( 1/(1-p) )
           or    sum_{m=0}^infinity m*p^(m-1) = 1/(1-p)^2
          or    sum_{m=0}^infinity m*p^m*(1-p) = p/(1-p)
             so sum_{m=0}^infinity (m+1)*p^m*(1-p) = 
                       p/(1-p) + (1-p)/(1-p) = 1/(1-p)
          so E(f) = 1/P(hit)
     So if P(M)=.99, you need to take 1/(1-.99) = 100 shots on average to hit
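
 The series can be checked numerically against the closed form 1/(1-p) by
 taking a long partial sum (a sketch; the cutoff of 100,000 terms is arbitrary
 but far past the point where the tail matters):

```python
def expected_shots(p_miss, terms=100_000):
    """Partial sum of sum_{m>=0} (m+1) * p^m * (1-p): expected shots until first hit."""
    return sum((m + 1) * p_miss**m * (1 - p_miss) for m in range(terms))

# Agrees with the closed form 1/(1-p) = 1/P(hit).
print(round(expected_shots(0.99), 6))   # 100.0
```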