CS 70 - Lecture 35 - Apr 18, 2011 - 10 Evans

Goals for today:  Continuous probability
                  (Please read Note 19)

So far all our examples of sample spaces S have had a discrete set
of objects assigned probabilities:
   Flip a coin (2 members of S: H and T)
   Roll a die  (6 members of S: 1 through 6)
   Send a signal of n bits (2^n members of S)
   Ask a random person their salary (at most a few billion members of S, 
      i.e. the number of different people you could ask)
   Keep flipping a coin until you get a Head
      (this is an example of a geometric distribution:
       there are infinitely many members of S, but it is still "discrete")
   
Now it is time to consider examples where S is not discrete:

Ex 1: Spin a "wheel of fortune", i.e. a sharp arrow spinning on an
axis, so its tip traces out a circle of circumference L. When it
stops spinning, the tip is equally likely to be pointing at any
point along the circle. In other words the probability space
S is a circle of circumference L.
What is the probability that the tip points at a point in the
right half of the circle when it stops (i.e. between 12 and 6 o'clock,
if the circle were a clock)?
At a point in the upper right or lower left (i.e. either
between 12 and 3 o'clock or between 6 and 9 o'clock)?
At the point exactly at 12 o'clock?

Ex 2: You throw a dart at a square target of sides equal to 1,
so that it is equally likely to land anywhere in the square.
In other words the probability space S is a square.
What is the probability that the dart lands in a circle
of diameter 1 centered in the square?

Ex 3: (Buffon's Needle Problem): you drop a needle of length L
so it lands anywhere at random on a board covered with
parallel lines a distance L apart. What is the probability
that the needle intersects one of these lines?

Ex 4: You have a Geiger counter measuring radiation from
a leaky nuclear plant. Every time a gamma ray hits it, it emits
a click. Let T be the time between consecutive clicks.
What is the probability that T is less than 10 seconds? 1 second? 
A millisecond?

Let's consider Example 1: Could we treat the (infinitely many)
points on the circle the same way as we have done so far, assigning
each a positive probability? Since each point is equally likely
(a uniform probability distribution), we'd have to assign them all
the same number. But then how would
we compute the probability of landing between 12 and 6 o'clock?
There are infinitely many points in this interval (or any
interval of positive length). So if they all have positive
probability, adding them all up would yield infinity.
And if they had zero probability, adding them up would yield 0.
We need a different approach.

Let us think of the circumference of the circle as the interval
of numbers from 0 to L, that is [0,L]. To be careful, the points
0 and L are the same, since it is a circle, but that won't change
any answers we compute. Then it is natural to say that the
chance that the arrow stops at a point in an interval [a,b]
lying inside [0,L] is
   P([a,b]) =  (b-a)/L
This has the natural properties that 
   P([0,L]) = L/L = 1 ... probability that you land somewhere is 1
   If [a1,b1] is shorter than [a,b], so b1-a1 < b-a, then
      P([a1,b1]) < P([a,b]), i.e. it is less likely to land in the 
      shorter segment
   P([0,L/2]) = P([L/2,L]) = 1/2  
      ... probability that you land in left half 
      ...  = probability that you land in right half = 1/2
     
One important property of probability theory that we have used
many times so far is that if A and B are disjoint events, then
   P(A union B) = P(A) + P(B)
In our example, suppose A = [a1,a2] and B=[b1,b2] are disjoint
subintervals of [0,L]. Then it is natural to also insist that
   P(A U B) = (a2-a1)/L + (b2-b1)/L = length(A U B)/L
More generally, given any finite union of intervals C, its length
is well defined, and we say
   P(C) = length(C)/L
We would use the same definition if C were a union of infinitely
many intervals; how to do this carefully is a subject called
"measure theory", and beyond the scope of this course. Suffice it
to say that we need to exclude some weird sets whose "length"
is hard to define.

Ex 1a: If you spin the wheel of fortune, what is the probability that
when the arrow stops, it is between 8 and 9 o'clock, or between
1 and 1:30?
Answer: There are 12*60=720 minutes in the whole circle,
and the length of the 2 intervals is 90 minutes, 
so the probability is 90/720 = 1/8.
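
As a sanity check, here is a minimal Monte Carlo sketch in Python
(assuming the spin is modeled by random.uniform; the function name
spin_in_intervals is ours, chosen for illustration):

   # Estimate the probability of landing between 8 and 9 o'clock
   # or between 1 and 1:30, measuring position in minutes (0 to 720).
   import random

   def spin_in_intervals(trials=1_000_000):
       hits = 0
       for _ in range(trials):
           t = random.uniform(0, 720)      # uniform position on the circle
           # 8:00-9:00 is [480,540); 1:00-1:30 is [60,90)
           if 480 <= t < 540 or 60 <= t < 90:
               hits += 1
       return hits / trials                # should be close to 90/720 = 0.125

   print(spin_in_intervals())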

Ex 2: We pick a "random point" in the unit square, i.e. each
point is equally likely (a uniform distribution) and ask for the
probability that it falls in the unit circle. So how do we define
a "random point" (x,y) in the unit square? We can get such a point
by spinning the wheel of fortune, with L=1, to get a random x
lying in the interval [0,1], and then spinning it again to get y.
Thus x and y are independent for the same reason that flipping
a coin twice gives you independent results.

What is the probability that (x,y) falls in the left half of the square? 
This is the same as 
  P(x in [0,1/2]) = 1/2 = area of left half of (unit) square

What is probability that (x,y) falls in the upper half of the square?
This is the same as 
  P(y in [1/2,1]) = 1/2 = area of top half of (unit) square

What is probability that (x,y) falls in the upper left quarter of the square?
This is the same as 
  P(x in [0,1/2] and y in [1/2,1])
     = P(x in [0,1/2]) * P(y in [1/2,1]) ... by independence
     = 1/2 * 1/2 = 1/4 = area of upper left quarter of the square

More generally, we see that for any rectangle A, P(A) = area(A),
and by the same rule as above
   P(A U B) = P(A) + P(B) if A and B are disjoint
we see that P(A) = area(A) for any union of rectangles.
Since (many!) regions can be approximated as closely as you like
by a union of disjoint rectangles, it is natural to define
   P(A) = area(A)/area(unit square) = area(A)
Again, care is needed to define this carefully (outside the scope of CS70).

Finally, we can answer the question: What is the chance that the randomly
thrown dart lands in the circle? P(circle) = area(circle) = pi*(1/2)^2 = pi/4 ~ .785.
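
Here is a minimal Monte Carlo sketch in Python that checks this value
by throwing random "darts" at the unit square (the function name
dart_in_circle is ours):

   # Count the fraction of uniform random points in the unit square
   # that land in the inscribed circle of radius 1/2 centered at (1/2,1/2).
   import random

   def dart_in_circle(trials=1_000_000):
       hits = 0
       for _ in range(trials):
           x, y = random.random(), random.random()   # uniform in [0,1)^2
           if (x - 0.5)**2 + (y - 0.5)**2 <= 0.25:   # inside radius 1/2
               hits += 1
       return hits / trials                          # ~ pi/4 ~ 0.785

   print(dart_in_circle())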

Ex 3: What is the chance that the randomly dropped needle of length L
intersects one of the parallel lines drawn a distance L apart?
Now the sample space S is defined by two parameters: 
  y = distance of the middle of the needle to the closest parallel line,
      so 0 <= y <= L/2
  theta = angle of needle with respect to a line perpendicular to
          all the parallel lines, so -pi/2 <= theta <= pi/2
If we throw the needle randomly, both y and theta will be uniformly
distributed in their intervals, so we are picking a random point (y,theta)
in the rectangle [0,L/2] x [-pi/2, pi/2].
For some points (y,theta) the needle will intersect the closest line,
and for others it will not. The probability is the area of the
set of points where it intersects, divided by the total area (L/2)*pi.
By elementary trigonometry the needle intersects the closest line
exactly when y <= (L/2)*cos(theta), so the desired area is
  integral_{-pi/2}^{pi/2} (L/2)*cos(theta) d theta
    = (L/2)*sin(theta) |_{-pi/2}^{pi/2}
    = (L/2)*sin(pi/2) - (L/2)*sin(-pi/2)
    = L
The total area of the rectangle is (L/2)*pi, so the desired probability
of intersection is L/((L/2)*pi) = 2/pi ~ .637

This gives us a (very slow) algorithm to estimate pi:
   Drop n needles on the board, one at a time, where n is very large
   Count the number m that intersect a line
   Estimate 2*n/m ~ pi
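
Here is a minimal Python sketch of this algorithm (with L = 1). Note
that it needs math.pi to sample the angle theta, so it is really a
check of the analysis above rather than a pi-free way to compute pi:

   # Simulate n needle drops by sampling (y, theta) uniformly from
   # [0, L/2] x [-pi/2, pi/2] and count intersections m; return 2*n/m.
   import math, random

   def estimate_pi(n=1_000_000, L=1.0):
       m = 0
       for _ in range(n):
           y = random.uniform(0, L / 2)                 # distance to closest line
           theta = random.uniform(-math.pi / 2, math.pi / 2)
           if y <= (L / 2) * math.cos(theta):           # needle crosses the line
               m += 1
       return 2 * n / m                                 # estimate of pi

   print(estimate_pi())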

Continuous random variables 

The next idea we need to generalize from previous chapters is
that of a random variable. These arose naturally in the previous examples:

Ex 1: Spinning wheel of fortune: X = final position of arrow on circle

Ex 2: Throwing a dart at a square: (X,Y) = coordinates of the randomly thrown dart

Ex 3: Buffon Needle problem: (Y, Theta) = (distance of center of needle from
      line, angle of needle)

All these are examples of uniform random variables, where each point is equally
likely. This is not always the case:

Ex 1: If the axis on which the arrow turns is "sticky" on one side, the
arrow is more likely to stop there.

Ex 2: If you aim at the center of the square, you are more likely to land
close to the center than far away.

Ex 4: The time between Geiger counter clicks is likely to be shorter the
more radioactive the environment (indeed, this is exactly what you
need to measure!).

To deal with these cases, we need to be able to express the
fact that some parts of the sample space (e.g. the interval [0,L]
in Example 1) may be more likely than others. To do this
for the interval [0,L], we need to have a way to assign a probability
to any (reasonable) subset of [0,L]. We start with the simplest subset,
an interval [a,b]:

Def: A probability density function f(x) for a random variable X
satisfies  P(a < X <= b) = integral_a^b f(x) dx for all a <= b.

Ex: Suppose X is uniformly distributed in [0,L] as in Example 1 above.
Then we let f(x) = 1/L if 0 <= x <= L, and 0 otherwise.
So if [a,b] is inside [0,L], 
    P(a < X <= b) = integral_a^b 1/L dx = (b-a)/L
as expected. And if [a,b] is not inside [0,L], we get the right answer too:
    P(X < -1) = integral_{-inf}^{-1} 0 dx = 0
    P(L/2 < X <= 2*L) = integral_{L/2}^{2*L} f(x) dx = 
                      = integral_{L/2}^{L} 1/L dx +
                        integral_{L}^{2*L} 0 dx 
                      = 1/2

What properties must a probability density function satisfy? Clearly
(1) f(x)>=0. Because if it were negative in some small interval [a,b],
however small, we'd have integral_a^b f(x) dx < 0, and we can't have
negative probabilities.
(2) integral_{-inf}^{inf} f(x) dx = 1. In other words, the random
variable must have some value, with probability one. This is the
analogue of the requirement that sum_{x in S} P(x) = 1 for a discrete
probability function.

The point of probability density functions is to permit non-uniform
distributions:

Ex: f(x) = (2/L^2)*x for 0 <= x <= L, and 0 otherwise. To see that
f(x) is a legitimate probability density function, we need to check
(1) f(x) is nonnegative
(2) integral_{-inf}^{inf} f(x) dx = integral_0^L (2/L^2)*x dx
        = (1/L^2)*x^2 |_0^L = 1 as desired
Now P(X < L/2) = 1/4 is not the same as P(X >= L/2) = 3/4

Note that if L is small, say L = .1, then f(L) = 2/L = 20 > 1.
So unlike a probability itself, f(x) does not have to be less than
or equal to 1; only its integrals, which are probabilities, do.
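
To check P(X < L/2) = 1/4 empirically, here is a minimal Python sketch.
It relies on the fact that X = L*sqrt(U) has exactly this density when
U is uniform on [0,1] (an inverse-CDF transform; compare the CDF
discussion below):

   # Sample X = L*sqrt(U) and estimate P(X < L/2); expect about 1/4.
   import math, random

   def check_quarter(trials=1_000_000, L=1.0):
       hits = sum(1 for _ in range(trials)
                  if L * math.sqrt(random.random()) < L / 2)
       return hits / trials

   print(check_quarter())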

Sometimes it is easier to work with the integral of f(x) directly
instead of f(x) itself:

Def: The Cumulative Distribution Function (CDF) F(x) of random variable X
is F(a) = P(X <= a)

Knowing the CDF of X is enough to compute the probability of any interval we like:
Since we can express the interval (-inf,b] as a disjoint union of intervals
   (-inf,b] = (-inf,a] U (a,b]
where a < b, we get 
   P(X <= b) = P(X <= a) + P(a < X <= b)
or
   P(a < X <= b) = P(X <= b) - P(X <= a) = F(b) - F(a)
and so we can get the probability of any union of disjoint intervals (by summing).

Thus  
   F(a) = integral_{-inf}^a f(x) dx
or, by the Fundamental Theorem of Calculus (remember!)
   dF(x)/dx = f(x)

Just as in calculus, we will use whichever function is more
convenient, the derivative f(x) or the integral F(x).

Ex: Suppose we shoot a dart at a circular target of radius L, and the
dart can land anywhere with equal probability (uniform
distribution). Let X be the distance of the dart from the
center of the circle. What is the CDF F(a) = P(X <= a)?
What is the probability density function f(x)?

Since the dart lands everywhere with equal likelihood, the
probability it lands in a particular region is proportional
to the area of the region. Thus
   F(a) = P(X <= a) 
        = P(dart lands inside circle of radius a)
        = area(circle of radius a) / area(whole target)
        = area(circle of radius a) / area(circle of radius L)
        = pi*a^2 / (pi*L^2) = (a/L)^2
for 0 <= a <= L.
Thus f(x) = d/dx (x/L)^2 = 2*x/L^2.
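
As a quick empirical check of F(a) = (a/L)^2, here is a Python sketch
that samples uniform points on the disk by rejection from the bounding
square (the helper name empirical_cdf is ours):

   # Estimate P(X <= a) for the distance X of a uniform point on a
   # disk of radius L from the center; expect (a/L)^2.
   import math, random

   def empirical_cdf(a, trials=100_000, L=1.0):
       inside = 0
       hits = 0
       while inside < trials:
           x = random.uniform(-L, L)
           y = random.uniform(-L, L)
           r = math.hypot(x, y)
           if r <= L:            # keep only darts that hit the target
               inside += 1
               if r <= a:
                   hits += 1
       return hits / inside

   print(empirical_cdf(0.5))     # expect about 0.25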

The most natural questions to ask about a random variable X are
what is its expectation E(X) and variance V(X):

Def: If X is a random variable with density function f(x), its
expectation is defined as 
    E(X) = integral_{-inf}^inf x*f(x) dx

This is entirely analogous to the definition for discrete probability
    E(X) = sum_r  r*P(X=r)
and has lots of similar properties:
    E(X+Y) = E(X) + E(Y)
    E(X+c) = E(X) + c
    E(a*X) = a*E(X)

Ex: Spin the wheel of fortune, let X = stopping position. Then
     E(X) = integral_0^L x*(1/L) dx = x^2/(2*L) |_0^L = L/2, 
as expected.

Ex: Throw the dart at the circular target, let X = distance from center. Then
     E(X) = integral_0^L x*(2*x/L^2) dx = (2/3)*x^3/L^2 |_0^L = (2/3)*L
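
The same rejection-sampling idea gives a quick numerical check of
E(X) = (2/3)*L (a sketch, with L = 1):

   # Average the distance from the center over uniform points on the disk.
   import math, random

   def mean_distance(trials=100_000, L=1.0):
       total, n = 0.0, 0
       while n < trials:
           x, y = random.uniform(-L, L), random.uniform(-L, L)
           r = math.hypot(x, y)
           if r <= L:
               total += r
               n += 1
       return total / n          # expect about (2/3)*L ~ 0.667

   print(mean_distance())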

Def: If X is a random variable with density function f(x), its
variance is V(X) = E(X^2) - (E(X))^2 
                 = integral_{-inf}^{inf} x^2*f(x) dx - (E(X))^2

Ex: Spin the wheel of fortune, let X = stopping position. Then
     V(X) = integral_0^L x^2*(1/L) dx - (L/2)^2 = (1/3 - 1/4)*L^2 = (1/12)*L^2

To see the similarity to picking a random integer Y from 1 to L, we recall
     E(Y) = sum_{i=1 to L} i*(1/L) = (L+1)/2
     V(Y) = sum_{i=1 to L} i^2*(1/L) - ((L+1)/2)^2 = (L^2 - 1)/12
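
A quick simulation confirms that the continuous and discrete answers
line up (a sketch; with L = 12 we expect variances near L^2/12 = 12
and (L^2-1)/12 ~ 11.92 respectively):

   # Compare the sample variance of uniform [0,L] with that of a
   # uniform random integer in 1..L.
   import random

   def sample_variance(xs):
       m = sum(xs) / len(xs)
       return sum((x - m)**2 for x in xs) / len(xs)

   trials, L = 1_000_000, 12
   print(sample_variance([random.uniform(0, L) for _ in range(trials)]))
   print(sample_variance([random.randint(1, L) for _ in range(trials)]))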