CS 70 - Lecture 26 - Mar 28, 2011 - 10 Evans

Goals for today (read Note 15):    
    quick review of random variables and expectation
    important distributions

 DEF: Let S be the sample space of a given experiment, with probability 
      function P. A _random variable_ is a function f:S -> Reals.

 EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p, 
       f1(x) = {1 if x=H, -1 if x=T}
       f1 = amount you win (if f1 = 1) (or lose, if f1 = -1) if you bet $1 on H.

 EX 2: Flip a biased coin n times, S2 = {all sequences of H, T of length n}
       P2(x) = p^#H * (1-p)^(n-#H)   where #H = #Heads in x, n-#H = #Tails
       f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
       f2 = amount you win (or lose) if you bet $1 on H on each flip

 DEF: Given S, P and random variable f, the _Expected Value_
      (also called Mean or Average) of f is 
          E(f) = sum_{all x in S} P(x)*f(x)

 EX 1: E(f1) = 1*p + (-1)*(1-p) = 2*p-1
 EX 2: E(f2) = sum over all 2^n sequences x of n Heads and Tails, of
                 (2*#H-n) * p^#H*(1-p)^(n-#H)
       prefer something simpler to sum

 DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

 DEF: We call the set of pairs of values 
        {(r,P(f=r)) for all r in the range of f} 
      the distribution of f.

 Thm 1: E(f) = sum_{numbers r in range of f} r*P(f=r)

 EX 1: P(H) = P(f1=1) = p, P(T) = P(f1=-1) = 1-p
       so E(f1) = 1*p + (-1)*(1-p) = 2*p-1, same as before

 EX 2: P(f2 = r) = P(#H - #T = r) = P(2*#H - n = r)
       = P(#H = (n+r)/2)
       = { C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2)  if (n+r) is even
         { 0 otherwise,   eg when n=1 and r=0
       so E(f2) = sum_{r = -n to n by steps of 2}
                  r * C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2)
       This only has n+1 terms to sum instead of 2^n as before,
       but we still want something simpler.

 Thm 2: If g_1(x),...,g_n(x) are random variables, then
        g(x) = sum_{i=1 to n} g_i(x) is another random variable
        with expectation
        E(g) = sum_{i=1 to n} E(g_i)

 EX 2: Let g_i(x) = {+1 if the i-th toss is Heads
                    {-1 if the i-th toss is Tails
                  = how much you win or lose on the i-th toss
       so f2(x) = sum_{i=1 to n} g_i(x)
                = total you win or lose on all n tosses
       so E(f2) = sum_{i=1 to n} E(g_i)
       Note that each g_i has the same distribution as f1, how much you
       win or lose on 1 toss, so E(g_i) = E(f1) = 2*p-1, and
       E(f2) = n*(2*p-1)
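
       Here is a quick Python sketch that checks this result three ways
       (the values n=10, p=0.6 and the trial count are illustrative): by
       summing over the n+1 values of r in the distribution from EX 2, by
       linearity, and by simulation. All three should come out near 2.0.

           # Sketch (illustrative values n=10, p=0.6): check E(f2) = n*(2p-1).
           from math import comb
           import random

           n, p = 10, 0.6

           # (1) Sum over the distribution: r runs over -n, -n+2, ..., n and
           #     #H = (n+r)/2, so P(f2=r) = C(n,(n+r)/2)*p^((n+r)/2)*(1-p)^((n-r)/2).
           e_dist = sum(r * comb(n, (n + r)//2) * p**((n + r)//2) * (1 - p)**((n - r)//2)
                        for r in range(-n, n + 1, 2))

           # (2) Linearity of expectation: each toss contributes E(g_i) = 2p - 1.
           e_lin = n * (2*p - 1)

           # (3) Monte Carlo estimate over many simulated games of n tosses.
           trials = 200_000
           e_sim = sum(sum(1 if random.random() < p else -1 for _ in range(n))
                       for _ in range(trials)) / trials

           print(e_dist, e_lin, e_sim)   # all close to 2.0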

End of review!

Recall that the distribution of a random variable is the set
   {(r,P(f=r)) where r is in the range of f} 

There are certain important distributions that come up repeatedly,
that we need to recognize. The first we have seen already:

DEF: if P(f=r) = C(n,r) * p^r * (1-p)^(n-r) for r=0,...,n,
     then we say f has a binomial distribution, and abbreviate this as
     f ~ Bin(n,p)
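
A quick Python sketch (illustrative values n=10, p=0.3) checking that the
Bin(n,p) probabilities sum to 1 and that the mean number of Heads is n*p:

    from math import comb

    n, p = 10, 0.3
    pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    print(sum(pmf))                                     # ~ 1.0
    print(sum(r * q for r, q in enumerate(pmf)), n*p)   # both ~ 3.0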

Geometric Distribution:

 EX: Suppose you shoot at a target, and hit with probability p each
     time you try. What is the expected number of times you have to try
     before getting a hit?
     S = { H, MH, MMH, MMMH, .... }
     P( MM...MH ) = (1-p)^#M * p
     f( MM...MH ) = #shots = #M + 1
     so P(f=r) = P(#M = r-1) = (1-p)^(r-1) * p

DEF: We say that f has the geometric distribution with parameter p
     if f:S-> {1,2,3,...} and
     P(f=r) = (1-p)^(r-1) * p
     We abbreviate this by f ~ Geom(p)

Check that 1 = sum_{r=1 to infinity} P(f=r)
             = sum_{r=1 to infinity} (1-p)^(r-1) * p
             = p * sum_{r=1 to infinity} (1-p)^(r-1) 
             = p * 1/(1-(1-p)) = p / p = 1 - ok!

We want E(f) = sum_{r=1}^infinity r*(1-p)^(r-1)*p
     Start from geometric sum:    sum_{r=1}^infinity (1-p)^(r-1) = 1/p
     so d/dp ( sum_{r=1}^infinity (1-p)^(r-1) ) = d/dp ( 1/p )
     or    sum_{r=1}^infinity -(r-1)*(1-p)^(r-2) = -1/p^2
     or    sum_{r=1}^infinity (r-1)*(1-p)^(r-2)*p = 1/p   (multiplying both sides by -p)
     Shifting the index (replace r-1 by r), the left side is
           sum_{r=0}^infinity r*(1-p)^(r-1)*p = E(f)
     so E(f) = 1/p = 1/P(hit)
So if P(hit)=.01, you need to take 1/.01 = 100 shots on average to hit
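
    A small simulation sketch of this (p = .01 as above; the trial count is an
    illustrative choice): the average number of shots until the first hit
    should come out near 1/p = 100.

        import random

        p = 0.01
        trials = 100_000

        def shots_until_hit():
            r = 1
            while random.random() >= p:   # this shot misses with probability 1-p
                r += 1
            return r

        print(sum(shots_until_hit() for _ in range(trials)) / trials)   # ~ 100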

Here is another way to compute E(f), for random variables whose
values are positive integers N+ = {1,2,3,...}

Thm: Let f:S->N+ be a random variable. Then
     E(f) = sum_{i=1 to infinity} P(f >= i)
Proof: Write P(f=i) as p_i, so
       E(f) = sum_{i=1 to infinity} i * p_i
            = p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + 
              (p_4 added 4 times) + ... (p_i added i times) ...
            = (p_1 + p_2 + p_3 + p_4 + ...)
             +(      p_2 + p_3 + p_4 + ...)
             +(            p_3 + p_4 + ...)
             +(                  p_4 + ...) + ...
            = P(f >= 1) + P(f >= 2) + P(f >= 3) + P(f >= 4) + ...
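
    A quick numerical sketch of the theorem for a fair 6-sided die roll
    (an illustrative example): both sums should equal 7/2.

        from fractions import Fraction

        # pmf of a fair die: P(f=i) = 1/6 for i = 1,...,6
        pmf = {i: Fraction(1, 6) for i in range(1, 7)}
        e_direct = sum(i * q for i, q in pmf.items())                                 # sum_i i*P(f=i)
        e_tail = sum(sum(q for j, q in pmf.items() if j >= i) for i in range(1, 7))   # sum_i P(f>=i)
        print(e_direct, e_tail)   # both 7/2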

Ex: When P(f=r) = (1-p)^(r-1)*p, then
         P(f>=i) = sum_{r = i to infinity} (1-p)^(r-1)*p
                 = (1-p)^(i-1) * p * sum_{r=0 to infinity} (1-p)^r
                 = (1-p)^(i-1) * p * (1/(1-(1-p)))
                 = (1-p)^(i-1)
    so E(f) = sum_{i=1 to infinity} (1-p)^(i-1)
            = 1/(1-(1-p)) = 1/p as before 

Summarizing we have
Thm: If f has a geometric distribution with parameter p, that is
     P(f=r) = (1-p)^(r-1)*p, then
     P(f >= i) = (1-p)^(i-1)
     E(f) = 1/p

Besides the number of shots needed to hit a target, there are
various other examples:
   number of runs before a system fails
   number of retransmissions of a packet before it is successfully sent, etc.

Ex: Coupon Collector, revisited
   We buy cereal boxes, each of which contains one of n random baseball cards.
   What is the expected number of boxes we have to buy to have at least one
   of each card?

   To define the sample space carefully, let the different baseball cards be 
   numbered from 1,...,n, so the sample space is
   S = {all sequences w of numbers from 1 to n such that
          each number appears at least once, and
          the last number in the sequence appears exactly once}
   The random variable whose expectation we want is 
    f(w) = length of w
   which is rather complicated to compute directly.
   So instead we write f(w) as a sum of other random variables,
   each of which has a geometric distribution, and just add their expectations.

   Let f_i(w) = number of additional boxes we need to buy to get the
     i-th distinct card in w, after getting the (i-1)-st distinct card.
   Ex: suppose n=3, and w = 111221123. Then
       f_1(w) = 1 (the first box always contains a card we haven't seen yet)
       f_2(w) = 3 (since we need 3 more boxes to get the first card besides 1)
       f_3(w) = 5 (since we need 5 more boxes to get the first card besides 1 and 2)
   Then clearly f(w) = length of w = f_1(w) + f_2(w) + ... + f_n(w)
   and so E(f) = sum_{i=1 to n} E(f_i)
  
   We claim that each f_i has a geometric distribution:
     f_1 is trivially geometric with p = probability that the first card
         is different from previous ones = 1
     f_2 is geometric with p = probability that the next card
         is different from the 1 already gotten = (n-1)/n
     f_3 is geometric with p = probability that the next card
         is different from the 2 already gotten = (n-2)/n
 ... f_i is geometric with p = probability that the next card
         is different from the i-1 already gotten = (n-(i-1))/n

   Thus E(f_i) = 1/((n-(i-1))/n) = n/(n-i+1)
   and E(f) = sum_{i=1 to n} n/(n-i+1)
            = n * sum_{j=1 to n} 1/j

   We can approximate this sum by an integral:
    integral_{1 to n+1} dx/x <= sum_{j=1 to n} 1/j <= 1 + integral_{1 to n} dx/x
   or
        ln(n) <= ln(n+1) <= sum_{j=1 to n} 1/j <= 1 + ln(n)  
   In fact, it is known that 
       sum_{j=1 to n} 1/j ~ ln(n) + gamma
   where gamma = .5772... is called Euler's constant.

   Finally we get E(f) ~ n*(ln(n) + gamma).
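
   Here is a simulation sketch (n = 50 card types and the trial count are
   illustrative choices) comparing the average number of boxes bought to the
   exact value n * sum_{j=1 to n} 1/j and to the approximation n*(ln(n)+gamma):

       import math
       import random

       n = 50
       trials = 5_000

       def boxes_needed():
           seen = set()
           count = 0
           while len(seen) < n:
               seen.add(random.randrange(n))   # each box holds a uniformly random card
               count += 1
           return count

       sim = sum(boxes_needed() for _ in range(trials)) / trials
       exact = n * sum(1/j for j in range(1, n + 1))        # n * H_n
       approx = n * (math.log(n) + 0.5772156649)            # n * (ln(n) + gamma)
       print(sim, exact, approx)   # all roughly 225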

   Recall that last time we looked at this problem, we showed
   that if you bought >= n*ln(2*n) boxes, then the probability of
   having at least one of each card is >= 1/2, so we got a similar result.

Poisson Distribution:

Suppose we throw n balls into n/lambda bins, where lambda is a constant.
We want to know the probability that i balls land in a particular bin,
when n is very large. The exact distribution is binomial, with
the probability of landing in bin 1 (say) equal to 1/(n/lambda) = lambda/n, so

   P(i balls land in bin 1) = C(n,i) *  (lambda/n)^i * (1- lambda/n)^(n-i)

The problem is that all we know about n is that it is very large;
what do we do? It is easiest to start with
   P(0 balls land in bin 1) = C(n,0) * (1 - lambda/n)^n
         = (1 - lambda/n)^((n/lambda)*lambda)
As n grows toward infinity, x = lambda/n shrinks to zero, and we know
   lim_{x -> 0} (1 - x)^(1/x) = 1/e = 1/2.71828... = .367...
Therefore
   lim_{n -> infinity} P(0 balls land in bin 1) = (1/e)^lambda = exp(-lambda)
More generally, we want simple expressions for
   p_i = lim_{n -> infinity} P(i balls land in bin 1)
       = lim_{n -> infinity} C(n,i) * (lambda/n)^i * (1 - lambda/n)^(n-i)
We know p_0 = exp(-lambda), and the easiest way to figure out the rest
is to look at the ratios
   p_i / p_{i-1} = lim_{n -> infinity} 
     [ C(n,i)   * (lambda/n)^i     * (1 - lambda/n)^(n-i)     ] /
     [ C(n,i-1) * (lambda/n)^(i-1) * (1 - lambda/n)^(n-(i-1)) ]
   = lim_{n -> infinity} 
     [ C(n,i) / C(n,i-1) ] * (lambda/n) * (1 - lambda/n)^(-1)
   = lim_{n -> infinity} 
     [ (n!/(i!*(n-i)!)) / (n!/((i-1)!*(n-i+1)!)) ] * (lambda/n) * (1 - lambda/n)^(-1)
   = lim_{n -> infinity} 
     [ (n-i+1)/i ] * (lambda/n) * (1 - lambda/n)^(-1)
   = (lambda/i) * lim_{n -> infinity} [ (n-i+1)/n ] * (1 - lambda/n)^(-1)
   = lambda/i
or p_i = (lambda/i)*p_{i-1}
This yields a simple recurrence:
  p_0 = exp(-lambda)
  p_1 = exp(-lambda)*lambda
  p_2 = exp(-lambda)*lambda^2/2
  p_3 = exp(-lambda)*lambda^3/(3*2)
  p_4 = exp(-lambda)*lambda^4/(4*3*2)
  ...
  p_i = exp(-lambda)*lambda^i/i!
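
A numerical sketch (lambda = 2 and n = 10^6 are illustrative values) comparing
the exact binomial probabilities with the limiting values exp(-lambda)*lambda^i/i!,
computed via the recurrence p_i = (lambda/i)*p_{i-1}:

    from math import comb, exp

    lam, n = 2.0, 1_000_000

    p = exp(-lam)                  # p_0
    for i in range(6):
        binom = comb(n, i) * (lam/n)**i * (1 - lam/n)**(n - i)
        print(i, binom, p)         # the two columns should nearly agree
        p *= lam / (i + 1)         # recurrence: p_{i+1} = (lambda/(i+1)) * p_i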

Let's check that
  1 = sum_{i=0 to infinity} p_i
    = sum_{i=0 to infinity} exp(-lambda) * lambda^i/i!
    = exp(-lambda) * sum_{i=0 to infinity} lambda^i/i!
    = exp(-lambda) * exp(lambda) = 1 as desired.

DEF: A random variable f for which
       P(f=i) = exp(-lambda)*lambda^i/i!
     for i=0,1,2,... is said to have a Poisson distribution
     with parameter lambda, or more briefly f ~ Poiss(lambda)

If f ~ Poiss(lambda), we compute its expectation as follows:

E(f) = sum_{i=0 to infinity} i * exp(-lambda) * lambda^i/i!
     = exp(-lambda) * sum_{i=1 to infinity} lambda^i/(i-1)!
     = lambda * exp(-lambda) * sum_{i=1 to infinity} lambda^(i-1)/(i-1)!
     = lambda * exp(-lambda) * exp(lambda)
     = lambda
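
A quick numerical check of this (lambda = 3.5 is an illustrative value; the
infinite sum is truncated at i = 100, which is plenty for this lambda):

    from math import exp, factorial

    lam = 3.5
    print(sum(i * exp(-lam) * lam**i / factorial(i) for i in range(100)))   # ~ 3.5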

If you plot P(f=i) versus i, it increases until i ~ lambda,
and then decreases.

The Poisson distribution is useful for modeling "rare events",
as the following example shows.

Ex: Suppose a web server gets an average of 100K requests/day.
You want to make sure you have enough computers to handle most
bursts in activity. One computer takes 1 second to handle one request.
How many computers do you need to handle bursts in requests?

The "rare event" here is not a request arriving at the web server,
but one person choosing to make a request, which we assume is
done independently by each person (maybe not true if the web server
is handling news events and everyone wants to find out about
some big event at the same time). 

In this case we have some large but unknown number of people n,
and some tiny but unknown probability p that each person will make
a request in any 1 second period of time, so the probability
that i people make a request in a 1 second period is binomial:
   P(i people out of n make a request) = C(n,i) * p^i * (1-p)^(n-i)
But we don't know n or p, so what do we do? Since there are
100K requests/day, there are an average of 
   100K/(24*3600) ~ 1.2 = lambda 
requests per second. In other words, we can directly measure
the expectation E(f) of the binomial random variable 
  f = number of people making a request in a 1 second period
That is lambda = E(f) = n*p. So we can measure the product n*p even
though we don't know n or p individually. Thus
  P(i people out of n make a request) 
      = C(n,i) * (lambda/n)^i * (1-lambda/n)^(n-i)
This is the expression we used to introduce the Poisson distribution;
for n large (and the number of people is certainly large),
this gets close to
  P(i people out of n make a request) ~ exp(-lambda) * lambda^i / i!
For lambda = 1.2, we get the following table:
   i  P(i)  Sum
   0  .301  .301
   1  .361  .662
   2  .217  .879
   3  .087  .966
   4  .026  .992
   5  .006  .998
So if you have 5 computers, there is about a 99.8% chance
that you will be able to handle all the requests arriving
in any 1-second period without a conflict.
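
The table above can be reproduced with a few lines of Python (a sketch, with
lambda = 1.2 as in the example):

    from math import exp, factorial

    lam = 1.2
    total = 0.0
    for i in range(6):
        p_i = exp(-lam) * lam**i / factorial(i)   # P(i requests in one second)
        total += p_i
        print(i, round(p_i, 3), round(total, 3))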