CS 70 - Lecture 35 - Apr 18, 2011 - 10 Evans

Goals for today: Continuous probability (Please read Note 19)

So far all our examples of sample spaces S have had a discrete set of objects assigned probabilities:

  Flip a coin (2 members of S: H and T)
  Roll a die (6 members of S: 1 through 6)
  Send a signal of n bits (2^n members of S)
  Ask a random person their salary (at most a few billion members of S, i.e. the number of different people you could ask)
  Keep flipping a coin until you get a Head (this is an example of a geometric distribution: there are infinitely many members of S, but it is still "discrete")

Now it is time to consider examples where S is not discrete:

Ex 1: Spin a "wheel of fortune", i.e. a sharp arrow spinning on an axis, so its tip traces out a circle of circumference L. When it stops spinning, the tip is equally likely to be pointing at any point along the circle. In other words the probability space S is a circle of circumference L. What is the probability that the tip points at a point in the right half of the circle when it stops (i.e. between 12 and 6 o'clock, if the circle were a clock)? At a point in the upper right or lower left (i.e. either between 12 and 3 o'clock or between 6 and 9 o'clock)? At the point exactly at 12 o'clock?

Ex 2: You throw a dart at a square target with sides of length 1, so that it is equally likely to land anywhere in the square. In other words the probability space S is a square. What is the probability that the dart lands in a circle of diameter 1 centered in the square?

Ex 3: (Buffon's Needle Problem): You drop a needle of length L so it lands anywhere at random on a board covered with parallel lines a distance L apart. What is the probability that the needle intersects one of these lines?

Ex 4: You have a Geiger counter measuring radiation from a leaky nuclear plant. Every time a gamma ray hits it, it emits a click. Let T be the time between consecutive clicks. What is the probability that T is less than 10 seconds?
1 second? A millisecond?

Let's consider Example 1: Could we treat the (infinitely many) points on the circle the same way as we have done so far, assigning each a positive probability? Since each point is equally likely (a uniform probability distribution), we'd have to assign them all the same number. But then how would we compute the probability of landing between 12 and 6 o'clock? There are infinitely many points in this interval (or any interval of positive length). So if they all have positive probability, adding them all up would yield infinity. And if they all had zero probability, adding them up would yield 0. We need a different approach.

Let us think of the circumference of the circle as the interval of numbers from 0 to L, that is [0,L]. To be careful, the points 0 and L are the same, since it is a circle, but that won't change any answers we compute. Then it is natural to say that the chance that the arrow stops at a point in an interval [a,b] lying inside [0,L] is

   P([a,b]) = (b-a)/L

This has the natural properties that

   P([0,L]) = L/L = 1 ... the probability that you land somewhere is 1

   If [a1,b1] is shorter than [a,b], so b1-a1 < b-a, then P([a1,b1]) < P([a,b]), i.e. it is less likely to land in the shorter segment

   P([0,L/2]) = P([L/2,L]) = 1/2 ... the probability that you land in the left half = the probability that you land in the right half = 1/2

One important property of probability theory that we have used many times so far is that if A and B are disjoint events, then P(A union B) = P(A) + P(B). In our example, suppose A = [a1,a2] and B = [b1,b2] are disjoint subintervals of [0,L]. Then it is natural to also insist that

   P(A U B) = (a2-a1)/L + (b2-b1)/L = length(A U B)/L

More generally, given any finite union of intervals C, its length is well defined, and we say

   P(C) = length(C)/L

We would use the same definition if C were a union of infinitely many intervals; how to do this carefully is a subject called "measure theory", and beyond the scope of this course.
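The rule P([a,b]) = (b-a)/L can be sanity-checked by simulating many spins of the wheel. Below is a minimal sketch; the function name, trial count, and seed are our own illustrative choices, not part of the notes.

```python
import random

def spin_probability(a, b, L, trials=100_000, seed=0):
    """Estimate P(arrow stops in [a,b]) on a wheel of circumference L
    by simulating many uniform spins and counting hits."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a <= rng.uniform(0, L) <= b)
    return hits / trials

# Exact answer is (b-a)/L; for L = 2 and [a,b] = [0, 0.5] that is 1/4.
est = spin_probability(0.0, 0.5, 2.0)
```

With this many trials the empirical fraction should land very close to the exact value (b-a)/L.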
Suffice it to say that we need to exclude some weird sets whose "length" is hard to define.

Ex 1a: If you spin the wheel of fortune, what is the probability that when the arrow stops, it is between 8 and 9 o'clock, or between 1 and 1:30? Answer: There are 12*60 = 720 minutes in the whole circle, and the total length of the 2 intervals is 60 + 30 = 90 minutes, so the probability is 90/720 = 1/8.

Ex 2: We pick a "random point" in the unit square, i.e. each point is equally likely (a uniform distribution), and ask for the probability that it falls in the circle of diameter 1 centered in the square. So how do we define a "random point" (x,y) in the unit square? We can get such a point by spinning the wheel of fortune, with L = 1, to get a random x lying in the interval [0,1], and then spinning it again to get y. Thus x and y are independent, for the same reason that flipping a coin twice gives you independent results.

What is the probability that (x,y) falls in the left half of the square? This is the same as
   P(x in [0,1/2]) = 1/2 = area of left half of (unit) square
What is the probability that (x,y) falls in the upper half of the square? This is the same as
   P(y in [1/2,1]) = 1/2 = area of top half of (unit) square
What is the probability that (x,y) falls in the upper left quarter of the square? This is the same as
   P(x in [0,1/2] and y in [1/2,1])
      = P(x in [0,1/2]) * P(y in [1/2,1]) ... by independence
      = 1/2 * 1/2 = 1/4 = area of upper left quarter of the square

More generally, we see that for any rectangle A, P(A) = area(A), and by the same rule as above,
   P(A U B) = P(A) + P(B) if A and B are disjoint,
we see that P(A) = area(A) for any union of rectangles. Since (many!) regions can be approximated as closely as you like by a union of disjoint rectangles, it is natural to define
   P(A) = area(A)/area(unit square) = area(A)
Again, care is needed to define this carefully (outside the scope of CS70).

Finally, we can answer the question: What is the chance that the randomly thrown dart lands in the circle?
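Before computing the answer exactly, we can estimate it by throwing many simulated darts at the square and counting how many land inside the circle. A sketch (helper name and trial count are our own):

```python
import random

def circle_hit_fraction(trials=200_000, seed=1):
    """Throw `trials` uniform random darts at the unit square and return
    the fraction landing in the circle of diameter 1 centered at (1/2, 1/2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()            # uniform point in the square
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:  # inside radius 1/2
            hits += 1
    return hits / trials
```

The fraction of hits approximates the area of the circle, in agreement with the exact computation: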
P(circle) = area(circle) = pi*(1/2)^2 = pi/4 ~ .785

Ex 3: What is the chance that the randomly dropped needle of length L intersects one of the parallel lines drawn a distance L apart? Now the sample space S is defined by two parameters:
   y = distance from the middle of the needle to the closest parallel line, so 0 <= y <= L/2
   theta = angle of the needle with respect to a line perpendicular to all the parallel lines, so -pi/2 <= theta <= pi/2
If we throw the needle randomly, both y and theta will be uniformly distributed in their intervals, so we are picking a random point (y,theta) in the rectangle [0,L/2] x [-pi/2,pi/2]. For some points (y,theta) the needle will intersect the closest line, and for others it will not. The probability is the area of the set of points where it intersects, divided by the total area (L/2)*pi. By elementary trigonometry the needle intersects the closest line exactly when y <= (L/2)*cos(theta), so the desired area is
   integral_{-pi/2}^{pi/2} (L/2)*cos(theta) d theta = (L/2)*sin(theta) |_{-pi/2}^{pi/2}
      = (L/2)*sin(pi/2) - (L/2)*sin(-pi/2) = L
The total area of the rectangle is (L/2)*pi, so the desired probability of intersection is
   L/((L/2)*pi) = 2/pi ~ .637

This gives us a (very slow) algorithm to estimate pi:
   Drop n needles on the board, one at a time, where n is very large
   Count the number m that intersect a line
   Estimate pi ~ 2*n/m

Continuous random variables

The next idea we need to generalize from previous chapters is that of a random variable. These arose naturally in the previous examples:
   Ex 1: Spinning wheel of fortune: X = final position of arrow on circle
   Ex 2: Throwing a dart at a square: (X,Y) = coordinates of the randomly thrown dart
   Ex 3: Buffon's Needle problem: (Y,Theta) = (distance of center of needle from line, angle of needle)
All these are examples of uniform random variables, where each point is equally likely.
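The pair (Y, Theta) from Ex 3 gives a concrete use of uniform random variables: each needle drop samples both uniformly, and counting intersections runs the pi-estimation algorithm described above. A minimal sketch (since L cancels out of the ratio 2*n/m, we fix L = 1):

```python
import math
import random

def buffon_pi_estimate(n=200_000, seed=2):
    """Drop n random needles of length L on lines spaced L apart, and
    estimate pi as 2*n/m, where m is the number of intersections."""
    rng = random.Random(seed)
    L = 1.0
    m = 0
    for _ in range(n):
        y = rng.uniform(0, L / 2)                       # distance to closest line
        theta = rng.uniform(-math.pi / 2, math.pi / 2)  # needle angle
        if y <= (L / 2) * math.cos(theta):              # intersection condition
            m += 1
    return 2 * n / m
```

As the notes say, this converges very slowly: the error shrinks only like 1/sqrt(n), so each additional digit of pi costs roughly 100 times more needles.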
This is not always the case:
   Ex 1: If the axis on which the arrow turns is "sticky" on one side, it is more likely to stop there.
   Ex 2: If you aim at the center of the square, you are more likely to land close to the center than far away.
   Ex 4: The time between Geiger counter clicks is likely to be shorter the more radioactive the environment (indeed, this is exactly what you need to measure!).

To deal with these cases, we need to be able to express the fact that some parts of the sample space (e.g. the interval [0,L] in Example 1) may be more likely than others. To do this for the interval [0,L], we need a way to assign a probability to any (reasonable) subset of [0,L]. We start with the simplest subset, an interval [a,b]:

Def: A probability density function f(x) for a random variable X satisfies
   P(a < X <= b) = integral_a^b f(x) dx for all a <= b.

Ex: Suppose X is uniformly distributed in [0,L] as in Example 1 above. Then we let
   f(x) = 1/L if 0 <= x <= L, 0 otherwise
So if [a,b] is inside [0,L],
   P(a < X <= b) = integral_a^b 1/L dx = (b-a)/L
as expected. And if [a,b] is not inside [0,L], we get the right answer too:
   P(X < -1) = integral_{-inf}^{-1} 0 dx = 0
   P(L/2 < X <= 2*L) = integral_{L/2}^{2*L} f(x) dx
      = integral_{L/2}^{L} 1/L dx + integral_{L}^{2*L} 0 dx = 1/2

What properties must a probability density function satisfy? Clearly
   (1) f(x) >= 0. Because if f were negative on some small interval [a,b], however small, we'd have integral_a^b f(x) dx < 0, and we can't have negative probabilities.
   (2) integral_{-inf}^{inf} f(x) dx = 1. In other words, the random variable must have some value, with probability one. This is the analogue of the requirement that sum_{x in S} P(x) = 1 for a discrete probability distribution.

The point of probability density functions is to permit non-uniform distributions:

Ex: f(x) = (2/L^2)*x for 0 <= x <= L, and 0 otherwise.
To see that f(x) is a legitimate probability density function, we need to check
   (1) f(x) is nonnegative
   (2) integral_{-inf}^{inf} f(x) dx = integral_0^L (2/L^2)*x dx = (1/L^2)*x^2 |_0^L = 1
as desired. Now
   P(X < L/2) = 1/4 is not the same as P(X >= L/2) = 3/4
Note that if L is small, say L = .1, then f(L) = 2/L = 20 > 1. So unlike a probability itself, f(x) does not have to be less than or equal to 1; only its integral does.

Sometimes it is easier to work with the integral of f(x) directly instead of f(x) itself:

Def: The Cumulative Distribution Function (CDF) F(x) of a random variable X is
   F(a) = P(X <= a)

Knowing the CDF of X is enough to compute the probability of any interval we like: Since we can express the interval (-inf,b] as a disjoint union of intervals
   (-inf,b] = (-inf,a] U (a,b] where a < b
we get
   P(X <= b) = P(X <= a) + P(a < X <= b)
or
   P(a < X <= b) = P(X <= b) - P(X <= a) = F(b) - F(a)
and so we can get the probability of any union of disjoint intervals (by summing). Thus
   F(a) = integral_{-inf}^a f(x) dx
or, by the Fundamental Theorem of Calculus (remember!),
   dF(x)/dx = f(x)
Just as in calculus, we will use whichever function is more convenient, the derivative f(x) or the integral F(x).

Ex: Suppose we shoot a dart at a circular target of radius L, and the dart can land anywhere with equal probability (uniform distribution). Let X be the distance of the dart from the center of the circle. What is the CDF F(a) = P(X <= a)? What is the probability density function f(x)? Since the dart lands everywhere with equal likelihood, the probability that it lands in a particular region is proportional to the area of the region. Thus
   F(a) = P(X <= a) = P(dart lands inside circle of radius a)
      = area(circle of radius a) / area(whole target)
      = area(circle of radius a) / area(circle of radius L)
      = pi*a^2 / (pi*L^2) = (a/L)^2 for 0 <= a <= L.
Thus f(x) = d/dx (x/L)^2 = 2*x/L^2.
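The formula F(a) = (a/L)^2 can also be checked numerically: sample uniform points from the disk (by rejection sampling from the bounding square, a standard trick not discussed in the notes) and count how often the distance to the center is at most a. A sketch:

```python
import math
import random

def cdf_estimate(a, L=1.0, trials=100_000, seed=3):
    """Estimate F(a) = P(X <= a), where X is the distance from the center
    of a dart landing uniformly on a disk of radius L."""
    rng = random.Random(seed)
    hits = total = 0
    while total < trials:
        x, y = rng.uniform(-L, L), rng.uniform(-L, L)
        if x * x + y * y <= L * L:      # keep only darts that hit the disk
            total += 1
            if math.hypot(x, y) <= a:   # within distance a of the center
                hits += 1
    return hits / total
```

For example, with L = 1 and a = 1/2 the exact CDF value is (1/2)^2 = 1/4, and the empirical fraction should be close to that.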
The most natural questions to ask about a random variable X are: what are its expectation E(X) and variance V(X)?

Def: If X is a random variable with density function f(x), its expectation is defined as
   E(X) = integral_{-inf}^{inf} x*f(x) dx
This is entirely analogous to the definition for discrete probability, E(X) = sum_r r*P(X=r), and has lots of similar properties:
   E(X+Y) = E(X) + E(Y)
   E(X+c) = E(X) + c
   E(a*X) = a*E(X)

Ex: Spin the wheel of fortune, let X = stopping position. Then
   E(X) = integral_0^L x*(1/L) dx = x^2/(2*L) |_0^L = L/2, as expected.

Ex: Throw the dart at the circular target, let X = distance from center. Then
   E(X) = integral_0^L x*(2*x/L^2) dx = (2/3)*x^3/L^2 |_0^L = (2/3)*L

Def: If X is a random variable with density function f(x), its variance is
   V(X) = E(X^2) - (E(X))^2 = integral_{-inf}^{inf} x^2*f(x) dx - (E(X))^2

Ex: Spin the wheel of fortune, let X = stopping position. Then
   V(X) = integral_0^L x^2*(1/L) dx - (L/2)^2 = (1/3 - 1/4)*L^2 = (1/12)*L^2

To see the similarity to picking a random integer Y from 1 to L, we recall
   E(Y) = sum_{i=1 to L} i*(1/L) = (L+1)/2
   V(Y) = sum_{i=1 to L} i^2*(1/L) - ((L+1)/2)^2 = (L^2 - 1)/12
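The wheel-of-fortune computations E(X) = L/2 and V(X) = L^2/12 can be checked by simulation, replacing the integrals with sample averages. A sketch (function name and parameters are our own):

```python
import random

def wheel_moments(trials=200_000, L=1.0, seed=4):
    """Estimate E(X) and V(X) for X = stopping position of the wheel,
    uniform on [0, L], using sample moments."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, L) for _ in range(trials)]
    mean = sum(xs) / trials                              # estimates E(X)
    var = sum(x * x for x in xs) / trials - mean * mean  # estimates E(X^2) - E(X)^2
    return mean, var
```

With L = 1 the estimates should land near the exact values 1/2 and 1/12 ~ .0833.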