CS 70 - Lecture 26 - Mar 28, 2011 - 10 Evans

Goals for today (read Note 15):
   quick review of random variables and expectation
   important distributions

DEF: Let S be the sample space of a given experiment, with probability
function P. A _random variable_ is a function f:S -> Reals.

EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p,
      f1(x) = {  1 if x=H
              { -1 if x=T
      f1 = amount you win (if f1 = 1) or lose (if f1 = -1) if you bet $1 on H.

EX 2: Flip a biased coin n times,
      S2 = {all sequences of H, T of length n}
      P2(x) = p^#H * (1-p)^(n-#H), where #H = #Heads in x, n-#H = #Tails
      f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
      f2 = amount you win (or lose) if you bet $1 on H on each flip

DEF: Given S, P and random variable f, the _Expected Value_ (also called
Mean or Average) of f is
      E(f) = sum_{all x in S} P(x)*f(x)

EX 1: E(f1) = 1*p + (-1)*(1-p) = 2*p-1

EX 2: E(f2) = sum over all 2^n sequences x of n Heads and Tails, of
              (2*#H-n) * p^#H * (1-p)^(n-#H)
      We would prefer something simpler to sum.

DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

DEF: We call the set of pairs of values {(r, P(f=r)) for all r in the range of f}
     the distribution of f.

Thm 1: E(f) = sum_{numbers r in range of f} r*P(f=r)

EX 1: P(H) = P(f1=1) = p, P(T) = P(f1=-1) = 1-p
      so E(f1) = 1*p + (-1)*(1-p) = 2*p-1, same as before

EX 2: P(f2 = r) = P(#H - #T = r) = P(2*#H - n = r) = P(#H = (n+r)/2)
         = { C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2) if (n+r) is even
           { 0 otherwise, eg when n=1 and r=0
      so E(f2) = sum_{r = -n to n by steps of 2} r * C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2)
      This only has n+1 terms to sum instead of 2^n as before,
      but we still want something simpler.

Thm 2: If g_1(x),...,g_n(x) are random variables, then
       g(x) = sum_{i=1 to n} g_i(x)
       is another random variable with expectation
       E(g) = sum_{i=1 to n} E(g_i)

EX 2: Let g_i(x) = { +1 if the i-th toss is Heads
                   { -1 if the i-th toss is Tails
                 = how much you win or lose on the i-th toss
      so f2(x) = sum_{i=1 to n} g_i(x) = total you win or lose on all n tosses
      so E(f2) = sum_{i=1 to n} E(g_i)
      Note that g_i has the same distribution as f1, how much you win or lose
      on 1 toss, so E(g_i) = E(f1) = 2*p-1, and E(f2) = n*(2*p-1)

End of review!

Recall that the distribution of a random variable is the set
   {(r, P(f=r)) where r is in the range of f}
There are certain important distributions that come up repeatedly,
which we need to recognize. The first we have seen already:

DEF: If P(f=r) = C(n,r) * p^r * (1-p)^(n-r) for r=0,...,n, then we say
     f has a binomial distribution, and abbreviate this as f ~ Bin(n,p)

Geometric Distribution:

EX: Suppose you shoot at a target, and hit with probability p each time
    you try. What is the expected number of times you have to try before
    getting a hit?
    S = { H, MH, MMH, MMMH, .... }
    P( MM...MH ) = (1-p)^#M * p
    f( MM...MH ) = #shots = #M + 1
    so P(f=r) = P(#M+1 = r) = (1-p)^(r-1) * p

DEF: We say that f has the geometric distribution with parameter p if
     f:S -> {1,2,3,...} and P(f=r) = (1-p)^(r-1) * p
     We abbreviate this by f ~ Geom(p)

Check that 1 = sum_{r=1 to infinity} P(f=r)
             = sum_{r=1 to infinity} (1-p)^(r-1) * p
             = p * sum_{r=1 to infinity} (1-p)^(r-1)
             = p * 1/(1-(1-p)) = p / p = 1 - ok!
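As a quick numerical sanity check (not part of the original notes), here is a
minimal Python sketch; the values n=10, p=0.6 and p=0.3 are arbitrary
illustrative choices. It verifies that E(f2) computed via Thm 1 matches
n*(2*p-1), and that the geometric probabilities (1-p)^(r-1)*p behave as claimed.

import math
import random

# Check the review result: E(f2) computed from the distribution (Thm 1)
# equals n*(2*p-1) from linearity of expectation (Thm 2).
n, p = 10, 0.6                       # arbitrary illustrative values
E_f2 = sum((2*h - n) * math.comb(n, h) * p**h * (1-p)**(n-h)
           for h in range(n + 1))    # sum of r*P(f2=r), with r = 2*h - n
print(E_f2, n*(2*p - 1))             # both print ~2.0

# Check the geometric normalization: sum_{r>=1} (1-p)^(r-1)*p ~ 1.
p = 0.3
print(sum((1-p)**(r-1) * p for r in range(1, 500)))   # ~1.0 (truncated sum)

# Simulate "shoot until the first hit" and compare the empirical
# frequency of f = r against the formula (1-p)^(r-1)*p.
def shots_until_hit(p):
    r = 1
    while random.random() >= p:      # miss with probability 1-p
        r += 1
    return r

trials = [shots_until_hit(p) for _ in range(100_000)]
for r in range(1, 5):
    print(r, trials.count(r)/len(trials), (1-p)**(r-1) * p)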
We want E(f) = sum_{r=1 to infinity} r*(1-p)^(r-1)*p

Start from the geometric sum:
    sum_{r=1 to infinity} (1-p)^(r-1) = 1/p
so  d/dp ( sum_{r=1 to infinity} (1-p)^(r-1) ) = d/dp ( 1/p )
or  sum_{r=1 to infinity} -(r-1)*(1-p)^(r-2) = -1/p^2
or  sum_{r=1 to infinity} (r-1)*(1-p)^(r-2)*p = 1/p
Reindexing the left side with s = r-1 gives
    sum_{s=1 to infinity} s*(1-p)^(s-1)*p = E(f)
so  E(f) = 1/p = 1/P(hit)
So if P(hit) = .01, you need to take 1/.01 = 100 shots on average to hit.

Here is another way to compute E(f), for random variables whose values are
positive integers N+ = {1,2,3,...}

Thm: Let f:S->N+ be a random variable. Then E(f) = sum_{i=1 to infinity} P(f >= i)

Proof: Write P(f=i) as p_i, so
   E(f) = sum_{i=1 to infinity} i * p_i
        = p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 added 4 times) + ... + (p_i added i times) + ...
        =  (p_1 + p_2 + p_3 + p_4 + ...)
          +(      p_2 + p_3 + p_4 + ...)
          +(            p_3 + p_4 + ...)
          +(                  p_4 + ...)
          + ...
        = P(f >= 1) + P(f >= 2) + P(f >= 3) + P(f >= 4) + ...

Ex: When P(f=r) = (1-p)^(r-1)*p, then
    P(f>=i) = sum_{r=i to infinity} (1-p)^(r-1)*p
            = (1-p)^(i-1) * p * sum_{r=0 to infinity} (1-p)^r
            = (1-p)^(i-1) * p * (1/(1-(1-p)))
            = (1-p)^(i-1)
    so E(f) = sum_{i=1 to infinity} (1-p)^(i-1) = 1/(1-(1-p)) = 1/p as before

Summarizing, we have
Thm: If f has a geometric distribution with parameter p, that is
     P(f=r) = (1-p)^(r-1)*p, then
         P(f >= i) = (1-p)^(i-1)
         E(f) = 1/p

Besides the number of shots until we hit a target, there are various other
examples:
   number of runs before a system fails
   number of retransmissions of a packet before it is successfully sent, etc.

Ex: Coupon Collector, revisited
We buy cereal boxes, each of which contains one of n random baseball cards.
What is the expected number of boxes I have to buy to have at least one of
each card?

To define the sample space carefully, let the different baseball cards be
numbered 1,...,n, so the sample space is
   S = {all sequences w of numbers from 1 to n such that
        each number appears at least once and
        the last number in the sequence appears exactly once}
The random variable whose expectation we want is
   f(w) = length of w
which is rather complicated to compute directly. So instead we write f(w)
as a sum of other random variables, each of which has a geometric
distribution, and just add their expectations.

Let f_i(w) = number of additional boxes we need to buy to get the i-th
distinct card in w, after getting the (i-1)-st distinct card.

Ex: suppose n=3, and w = 111221123. Then
    f_1(w) = 1 (the first box always contains a card we haven't seen yet)
    f_2(w) = 3 (since we need 3 more boxes to get the first card besides 1)
    f_3(w) = 5 (since we need 5 more boxes to get the first card besides 1 and 2)

Then clearly f(w) = length of w = f_1(w) + f_2(w) + ... + f_n(w)
and so E(f) = sum_{i=1 to n} E(f_i)

We claim that each f_i has a geometric distribution:
   f_1 is trivially geometric with p = probability that the first card is
       different from previous ones = 1
   f_2 is geometric with p = probability that the next card is different
       from the 1 already gotten = (n-1)/n
   f_3 is geometric with p = probability that the next card is different
       from the 2 already gotten = (n-2)/n
   ...
   f_i is geometric with p = probability that the next card is different
       from the i-1 already gotten = (n-(i-1))/n

Thus E(f_i) = 1/((n-(i-1))/n) = n/(n-i+1) and
     E(f) = sum_{i=1 to n} n/(n-i+1) = n * sum_{j=1 to n} 1/j

We can approximate this sum by an integral:
   integral_{1 to n+1} dx/x <= sum_{j=1 to n} 1/j <= 1 + integral_{1 to n} dx/x
or
   ln(n) <= ln(n+1) <= sum_{j=1 to n} 1/j <= 1 + ln(n)
In fact, it is known that sum_{j=1 to n} 1/j ~ ln(n) + gamma, where
gamma = .5772... is called Euler's constant. Finally we get
   E(f) ~ n*(ln(n) + gamma)
Recall that last time we looked at this problem, we showed that if you
bought >= n*ln(2*n) boxes, then the probability of having at least one of
each card is >= 1/2, so we got a similar result.
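A small simulation sketch (Python, not from the notes; the helper name
boxes_needed and the choice n = 100 are illustrative assumptions) agrees with
the estimate E(f) = n*H_n ~ n*(ln(n) + gamma):

import math
import random

def boxes_needed(n):
    """Buy boxes with uniformly random cards (labeled 0..n-1) until all
    n distinct cards have been collected; return the number of boxes."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        boxes += 1
    return boxes

n = 100
trials = [boxes_needed(n) for _ in range(10_000)]
empirical = sum(trials) / len(trials)
exact = n * sum(1/j for j in range(1, n + 1))     # n * H_n
approx = n * (math.log(n) + 0.5772)               # n * (ln n + gamma)
print(empirical, exact, approx)                   # all roughly 518-519 for n = 100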
Poisson Distribution:

Suppose we throw n balls into n/lambda bins, where lambda is a constant.
We want to know the probability that i balls land in a particular bin,
when n is very large. The exact distribution is binomial, with the
probability of landing in bin 1 (say) equal to 1/(n/lambda) = lambda/n, so
   P(i balls land in bin 1) = C(n,i) * (lambda/n)^i * (1 - lambda/n)^(n-i)
The problem is that all we know about n is that it is very large;
what do we do?

It is easiest to start with
   P(0 balls land in bin 1) = C(n,0) * (1 - lambda/n)^n
                            = (1 - lambda/n)^((n/lambda)*lambda)
As n grows toward infinity, x = lambda/n shrinks to zero, and we know
   lim_{x -> 0} (1 - x)^(1/x) = 1/e = 1/2.71828... = .367...
Therefore
   lim_{n -> infinity} P(0 balls land in bin 1) = (1/e)^lambda = exp(-lambda)

More generally, we want simple expressions for
   p_i = lim_{n -> infinity} P(i balls land in bin 1)
       = lim_{n -> infinity} C(n,i) * (lambda/n)^i * (1 - lambda/n)^(n-i)
We know p_0 = exp(-lambda), and the easiest way to figure out the rest is
to look at the ratios
   p_i / p_{i-1}
     = lim_{n -> infinity} [ C(n,i) * (lambda/n)^i * (1 - lambda/n)^(n-i) ]
                         / [ C(n,i-1) * (lambda/n)^(i-1) * (1 - lambda/n)^(n-(i-1)) ]
     = lim_{n -> infinity} [ C(n,i) / C(n,i-1) ] * (lambda/n) * (1 - lambda/n)^(-1)
     = lim_{n -> infinity} [ (n!/(i!*(n-i)!)) / (n!/((i-1)!*(n-i+1)!)) ] * (lambda/n) * (1 - lambda/n)^(-1)
     = lim_{n -> infinity} [ (n-i+1)/i ] * (lambda/n) * (1 - lambda/n)^(-1)
     = (lambda/i) * lim_{n -> infinity} [ (n-i+1)/n ] * (1 - lambda/n)^(-1)
     = lambda/i
or p_i = (lambda/i)*p_{i-1}. This yields a simple recurrence:
   p_0 = exp(-lambda)
   p_1 = exp(-lambda)*lambda
   p_2 = exp(-lambda)*lambda^2/2
   p_3 = exp(-lambda)*lambda^3/(3*2)
   p_4 = exp(-lambda)*lambda^4/(4*3*2)
   ...
   p_i = exp(-lambda)*lambda^i/i!

Let's check that
   1 = sum_{i=0 to infinity} p_i
     = sum_{i=0 to infinity} exp(-lambda) * lambda^i/i!
     = exp(-lambda) * sum_{i=0 to infinity} lambda^i/i!
     = exp(-lambda) * exp(lambda) = 1
as desired.

DEF: A random variable f for which P(f=i) = exp(-lambda)*lambda^i/i!
     for i=0,1,2,... is said to have a Poisson distribution with
     parameter lambda, or more briefly f ~ Poiss(lambda)

If f ~ Poiss(lambda), we compute its expectation as follows:
   E(f) = sum_{i=0 to infinity} i * exp(-lambda) * lambda^i/i!
        = exp(-lambda) * sum_{i=1 to infinity} lambda^i/(i-1)!
        = lambda * exp(-lambda) * sum_{i=1 to infinity} lambda^(i-1)/(i-1)!
        = lambda * exp(-lambda) * exp(lambda)
        = lambda

If you plot P(f=i) versus i, it increases until i ~ lambda, and then decreases.
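To see the limit numerically, here is a minimal Python sketch (lambda = 2 and
the values of n are arbitrary illustrative assumptions, not from the notes)
comparing the exact binomial probabilities C(n,i)*(lambda/n)^i*(1-lambda/n)^(n-i)
with the Poisson formula exp(-lambda)*lambda^i/i! as n grows:

import math

lam = 2.0                                  # lambda; arbitrary illustrative value
poiss = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(5)]
print("Poiss(2):", [round(q, 4) for q in poiss])

# Binomial(n, lambda/n) probabilities for i = 0..4 approach the Poisson
# limit as n grows.
for n in (10, 100, 10_000):
    p = lam / n
    binom = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(5)]
    print("n =", n, [round(b, 4) for b in binom])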
The Poisson distribution is useful for modeling "rare events", as the
following example shows.

Ex: Suppose a web server gets an average of 100K requests/day. You want to
make sure you have enough computers to handle most bursts in activity.
One computer takes 1 second to handle one request. How many computers do
you need to handle bursts in requests?

The "rare event" here is not a request arriving at the web server, but one
person choosing to make a request, which we assume is done independently
by each person (maybe not true if the web server is handling news events
and everyone wants to find out about some big event at the same time).

In this case we have some large but unknown number of people n, and some
tiny but unknown probability p that each person will make a request in any
1 second period of time, so the probability that i people make a request
in a 1 second period is binomial:
   P(i people out of n make a request) = C(n,i) * p^i * (1-p)^(n-i)
But we don't know n or p, so what do we do?

Since there are 100K requests/day, there are an average of
lambda = 100K/(24*3600) ~ 1.2 requests per second. In other words, we can
directly measure the expectation E(f) of the binomial random variable
   f = number of people making a request in a 1 second period
That is, lambda = E(f) = n*p. So we can measure the product n*p even
though we don't know n or p individually. Thus
   P(i people out of n make a request) = C(n,i) * (lambda/n)^i * (1-lambda/n)^(n-i)
This is the expression we used to introduce the Poisson distribution;
for n large (and the number of people is certainly large), this gets close to
   P(i people out of n make a request) ~ exp(-lambda) * lambda^i / i!

For lambda = 1.2, we get the following table:

   i    P(i)    Sum
   0    .301    .301
   1    .361    .662
   2    .217    .879
   3    .087    .966
   4    .026    .992
   5    .006    .999

So if you have 5 servers, there is a 99.9% chance that you will be able to
handle all the requests that arrive in a given second without a conflict.
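The table above can be reproduced with a few lines of Python (a sketch using
the rounded value lambda = 1.2, as in the notes):

import math

lam = 1.2
cumulative = 0.0
for i in range(6):
    p_i = math.exp(-lam) * lam**i / math.factorial(i)   # Poisson probability
    cumulative += p_i
    print(i, round(p_i, 3), round(cumulative, 3))
# The cumulative probability P(f <= 5) comes out to about 0.9985,
# consistent with the ~99.9% figure above.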