Math 55 - Fall 2007 - Lecture notes # 26 - October 29 (Monday)

Begin reading Chapter 6 of Rosen.
Read sections 1-5 of Lenstra's notes (see class web page for pointer).
Additional reading: CS70 notes, inst.eecs.berkeley.edu/~cs70/sp07, starting with the second half of Lecture 16.

Goals for today: Introduce discrete probability theory

Long term goal: we would like to make sense of statements like:
1) "The chance of getting a "flush" (all cards the same suit) in 5-card poker is about 2 in 1000."
2) "If you flip a fair coin 50 times, and each time it comes up heads, then the chance you get a head the 51st time is still 50%."
3) "If quicksort picks a random "pivot item" at each step, then it will sort n numbers in O(n log n) time with high probability."
4) "With this algorithm for balancing the workload among computers, the probability (or chance) that a user has to wait more than 1 minute is 2%."
5) "There is a 60% chance of the Big One (large earthquake) hitting Northern California in the next 30 years."
6) In an election, B's reported vote total is 2,912,790 and G's reported vote total is 2,912,253, i.e. a difference of 537 votes. If the chance of a vote having been mistakenly counted wrong is 1 out of 1000, so that about 2900 votes may have been counted wrong, what is the probability that G actually won the election? (Recall an election in Florida in 2000.)

To understand these examples and others, we must specify:
1) an experiment, whose outcome is "random"
2) the set of possible outcomes (the "sample space")
3) the probability of each possible outcome

EX 1): The experiment is shuffling a deck of cards and dealing 5 of them.
The set of outcomes is all possible 5-card subsets of the 52 cards.
ASK&WAIT: How many ways are there?
The probability of each outcome is equal (assuming we shuffle well).
ASK&WAIT: What is the probability of any particular hand?
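The count in EX 1 is the binomial coefficient C(52,5), and under equal likelihood each hand gets probability 1/C(52,5); a quick check in Python:

```python
from math import comb

# Number of 5-card hands dealt from a 52-card deck: C(52,5)
num_hands = comb(52, 5)
print(num_hands)        # 2598960
# With all hands equally likely, each particular hand has probability 1/C(52,5)
print(1 / num_hands)
```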
EX 2): The experiment is flipping a fair coin 51 times.
The set of outcomes is all possible sequences of 51 H's and T's.
ASK&WAIT: How many outcomes are there?
The probability of each outcome is equal (assuming the coin is fair).
ASK&WAIT: What is the probability of any particular sequence?

EX 3 through 5 are trickier, especially 5) (ask your local geologist!). We'll return to EX 6 later.

DEF: A _sample space_ is a finite (or countable) set S together with a function (called a probability function, or just a probability) P: S --> [0,1] such that sum_{x in S} P(x) = 1. S is the set of all possible outcomes of the experiment, with P(x) equal to the probability that the outcome is x.

The most notable case is when S is finite and P(x) has the same value for all x in S, i.e. when all outcomes are equally likely. In this case, we say S has a "uniform probability distribution." What is P(x) equal to in this case? We have 1 = sum_{x in S} P(x) = P(x)*|S| for all x in S, so, in the uniform distribution case, we have P(x) = 1/|S| for all x in S. In EX 1 and 2 above: |S| = C(52,5) or |S| = 2^51.

DEF: An _event_ E is a subset of the sample space S, and the _probability_ P(E) of an event E is given by P(E) = sum_{x in E} P(x). Note: the empty set has probability 0, and the whole sample space S has probability 1. In the case of a uniform distribution, we have P(E) = sum_{x in E} P(x) = sum_{x in E} 1/|S| = |E|/|S|.

EX: One toss of a fair coin. S = {H,T}, P(H) = 1/2, P(T) = 1/2.

EX: 3 tosses of a fair coin. S = {HHH, HHT, ..., TTT}, P(any particular outcome) = 1/8. Let E = {2 heads and a tail}.
ASK&WAIT: What is P(E)?

EX: 3 tosses of a biased coin. This means P(H) is not equal to P(T), i.e. not 1/2. Suppose P(H) = 1/3.
ASK&WAIT: What is P(T)?
ASK&WAIT: What is P(HHH)? P(HTH)?
ASK&WAIT: What is P(E), E as above?

EX: A roll of a die. S = {1,2,3,4,5,6}, P(x) = 1/6 for all x in S. Let E = "the roll of the die is odd" = {1,3,5}.
ASK&WAIT: What is P(E)?

EX: A roll of two dice, one red and one blue.
S = {1,2,3,4,5,6}x{1,2,3,4,5,6}, i.e. all pairs: S = {(i,j), 1 <= i <= 6, 1 <= j <= 6}. P(x) = 1/36 for all x in S, since |S| = 36.
ASK&WAIT: What is P(E), E = "the first die is a 6"?
ASK&WAIT: What is P(E), E = "at least one die is a 6"?
ASK&WAIT: What is P(E), E = "the dice sum to 7"?
ASK&WAIT: What is P(E), E = "the dice sum to 10"?

EX: A roll of two indistinguishable dice (e.g. both blue). Indistinguishable means that, say, (1,6) and (6,1) are no longer different.
S = {(1,1),(1,2),...,(1,6),(2,2),(2,3),...,(2,6),(3,3),...,(3,6),(4,4),...,(6,6)}
  = {(i,j), 1 <= i <= j <= 6}
ASK&WAIT: What is P(i,i)? What is P(i,j) for i not = j?
ASK&WAIT: What is P(E), E = "dice sum to 10"?

EX: A single poker hand, gotten by shuffling a deck of 52 cards and taking 5. S has C(52,5) = 2,598,960 elements, which I will not list here. P(x) = 1/2,598,960 for all x in S.
ASK&WAIT: What is P(E), E = "royal flush" = "A,K,Q,J,10 of the same suit"?
ASK&WAIT: What is P(E), E = "the hand has four of a kind"?
ASK&WAIT: What is P(E), E = "the hand contains a full house"?
ASK&WAIT: What is P(E), E = "the hand contains a flush"?

EX: Balls and bins. Suppose you take 20 distinguishable balls (tasks) and throw them into 10 distinguishable bins (computers) so that each ball has an equal chance of landing in each bin. (This is a common way of distributing work among multiple computers, e.g. web requests coming into a company.)
ASK&WAIT: What is S? |S|? P(any particular outcome)? What is P(E), E = {each bin has at most 4 balls} = {each computer has at most 4 requests}? We will eventually answer this...
ASK&WAIT: What is P({each bin has exactly 2 balls}), i.e. that the balls are perfectly evenly distributed?

EX: Strategy for a TV game show, where you have to pick one of 3 doors, and you win whatever is behind the door. First a prize is placed behind one of three doors, each with equal probability. You are then allowed first to choose one door.
Then, one of the other two doors is revealed (behind which, of course, no prize appears). Finally, you are allowed the option of switching to another door. You will win whatever is behind the door you select. Should you switch to the third door, stay where you are, or does it not matter?

Answer: you should switch, always! Why? Let's figure out the sample space describing the situation up to the moment you have to choose whether to switch:
S = {(i,j,k) where i=1,2 or 3 indicates the door where the prize is, j=1,2 or 3 indicates the door you originally choose, and k=1,2 or 3 indicates the door opened on the show}
So i and j can take any values from {1,2,3} with equal probability. But k is restricted: if i not = j, then k must be chosen not to equal either i or j, so its value is determined. But if i=j, then k can equal either of the other 2 values with equal probability. So here is the sample space with probabilities shown below each outcome in parentheses. For example, i=2,j=1 means k=3, and has probability (1/3)*(1/3)=1/9. For example, i=2,j=2 means k=1 or 3, and i=2,j=2,k=3 has probability (1/3)*(1/3)*(1/2)=1/18.

            i=1              i=2              i=3
      ---------------  ---------------  ---------------
j=1   k=2 or k=3       k=3              k=2
      (1/18) (1/18)    (1/9)            (1/9)
j=2   k=3              k=1 or k=3       k=1
      (1/9)            (1/18) (1/18)    (1/9)
j=3   k=2              k=1              k=1 or k=2
      (1/9)            (1/9)            (1/18) (1/18)

Now suppose that your strategy is not to switch doors; what is the probability of the event E = {you win!}?
ASK&WAIT: Can you indicate which parts of the sample space are in E? What is P(E)?
Now suppose that your strategy is to switch doors; what is the probability of the event E = {you win!}?
ASK&WAIT: Can you indicate which parts of the sample space are in E? What is P(E)?
ASK&WAIT: What is the best strategy, switch or not?

Now we go on to techniques that make it easier to compute the probabilities of certain events.

THEOREM: Let E be an event in a sample space S. The probability of the event S-E, the complement of E in S, is given by P(S-E) = 1-P(E).
PROOF: 1 = sum_{x in S} P(x) = sum_{x in E} P(x) + sum_{x in S-E} P(x) = P(E) + P(S-E). So P(S-E) = 1 - P(E).

THEOREM: Let E and F be events in a sample space S. Then P(E union F) = P(E) + P(F) - P(E intersect F).
PROOF: Similar to the proof of inclusion-exclusion, which may be stated as
  sum_{x in E union F} 1 = sum_{x in E} 1 + sum_{x in F} 1 - sum_{x in E intersect F} 1.
Just replace the 1's in the sums above by P(x).

EX: What is the probability that a randomly chosen integer between 1 and 100 is divisible by 5 or 7?

THEOREM: Let E1, E2, ..., En be pairwise disjoint events in a sample space S. Then P(E1 U E2 U ... U En) = P(E1) + P(E2) + ... + P(En).
ASK&WAIT: What is the proof?

Goals of next section:
  Continue discrete probability theory
  Conditional probability
  Independence
  Bernoulli trials

Now we start discussing conditional probability. Here is an example that we would like to understand: A pharmaceutical company is marketing a new test for a certain medical condition. According to clinical trials, the test has the following properties:
1. When applied to an affected person, the test comes up positive in 90% of cases, and negative in 10% ("false negatives").
2. When applied to a healthy person, the test comes up negative in 80% of cases and positive in 20% ("false positives").
Suppose that 5% of the US population has the condition; in other words, a random person has a 5% chance of being affected. When a random person is tested and comes up positive, what is the probability that the person actually has the condition?

This is an example of conditional probability: what is the probability of event A (person is affected) given that we know event B occurs (the person tests positive)? We write this P(A|B), the probability of A given B.

Def: P(A|B) = P(A inter B)/P(B)

Justification: Let S be the original sample space, and P() the original probability function on S. Since we know B occurs, we have a new sample space, namely B subset S. What is the new probability function?
If x in B, then P(x|B) must satisfy 1 = sum_{x in B} P(x|B), so the obvious choice is P(x|B) = P(x)/P(B). So if A subset B is any event in the new sample space B, then P(A|B) = sum_{x in A} P(x|B) = sum_{x in A} P(x)/P(B) = P(A)/P(B). What if A is not a subset of B? If x in A but x not in B, then clearly P(x|B) = 0; if B occurs then x cannot occur. Thus we finally get P(A|B) = P(A inter B)/P(B).

Returning to medical testing, let N = the US population. The population consists of 4 groups:
1) TP (true positives):  |TP| = 90% of  5% of N = ( 9/200)*N, so P(TP) =  9/200
2) FP (false positives): |FP| = 20% of 95% of N = (19/100)*N, so P(FP) = 19/100
3) TN (true negatives):  |TN| = 80% of 95% of N = (76/100)*N, so P(TN) = 76/100
4) FN (false negatives): |FN| = 10% of  5% of N = ( 1/200)*N, so P(FN) =  1/200
Now let A = {person is affected} = TP U FN, and B = {person tests positive} = TP U FP, so A inter B = TP, and finally
  P(A|B) = P(TP)/P(TP U FP) = (9/200)/(9/200 + 19/100) = 9/47 ~ .19
So if a random person tests positive, there is only a 19% chance that they really have the condition.
ASK&WAIT: What is P(B|A) = P(person tests positive | person is affected)?
ASK&WAIT: What is P(test correct when given to a random person)?
ASK&WAIT: Let a "phony test" simply declare everyone healthy; what is P(phony test correct when given to a random person)?

Ex: Suppose we toss 3 balls into 3 bins.
ASK&WAIT: What is P(first bin empty)?
ASK&WAIT: What is P(second bin empty | first bin empty)?

Ex: Roll two fair dice; what is P(rolling a 6 | sum of dice is 10)?
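The medical-testing arithmetic above can be checked with exact fractions; a minimal sketch (the variable names are mine):

```python
from fractions import Fraction

p_affected = Fraction(5, 100)          # 5% of the population has the condition
p_pos_if_affected = Fraction(90, 100)  # 90% positive on affected people
p_pos_if_healthy = Fraction(20, 100)   # 20% false positives on healthy people

p_tp = p_affected * p_pos_if_affected          # P(TP) = 9/200
p_fp = (1 - p_affected) * p_pos_if_healthy     # P(FP) = 19/100
# P(A|B) = P(A inter B)/P(B) = P(TP)/P(TP U FP)
p_affected_given_pos = p_tp / (p_tp + p_fp)
print(p_affected_given_pos)            # 9/47
print(float(p_affected_given_pos))     # ~0.19
```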
Ex: Flip two fair coins; what is P(second is a head | first is a head)?

Def: Two events A and B are independent if P(A inter B) = P(A)*P(B).

EX: Flip two coins. A = {HH, TH}, B = {HH, HT}, A inter B = {HH}. P(A) = 1/2 = P(B), and P(A inter B) = 1/4.

Prop: If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B).
Proof: P(A|B) = P(A inter B)/P(B) = P(A)*P(B)/P(B) = P(A), and P(B|A) = P(A inter B)/P(A) = P(A)*P(B)/P(A) = P(B).

ASK&WAIT: Throw 3 balls into 3 bins; are A = {first bin empty} and B = {second bin empty} independent?
ASK&WAIT: Throw 2 dice; are A = {rolling a 6} and B = {sum=10} independent?
ASK&WAIT: Throw 2 dice; are A = {sum even} and B = {first die even} independent?

Def: Events A1, A2, ..., An are mutually independent if for every i and every subset J of {1,2,...,n} - {i}, we have P(Ai | inter_{j in J} Aj) = P(Ai), i.e. Ai does not depend on any combination of the other events.

Thm: P(B inter A) = P(B)*P(A|B).
Proof: follows from the definition of P(A|B).

Thm: P(A1 inter A2 inter ... inter An) = P(A1) * P(A2|A1) * P(A3|A1 inter A2) * P(A4|A1 inter A2 inter A3) * ... * P(An|A1 inter A2 inter ... inter An-1).
Proof: induction on n.
Base case: n=1: P(A1) = P(A1).
Induction step: Assume P(A1 inter ... inter An-1) = P(A1) * ... * P(An-1 | A1 inter ... inter An-2). Then
  P(A1 inter ... inter An) = P(A1 inter ... inter An-1) * P(An | A1 inter ... inter An-1)
                           = P(A1) * ... * P(An-1 | A1 inter ... inter An-2) * P(An | A1 inter ... inter An-1)
as desired.

Corollary: Suppose A1, A2, ..., An are mutually independent. Then P(A1 inter A2 inter ... inter An) = P(A1)*P(A2)*...*P(An).
Proof: in the above proof, each P(Ai | A1 inter ... inter Ai-1) = P(Ai) by mutual independence.

EX: Toss a fair coin 3 times. Let A = {HHH}, A1 = {Hxx}, A2 = {xHx}, A3 = {xxH}, so A = A1 inter A2 inter A3 and
  P(A) = P(A1) * P(A2|A1) * P(A3|A1 inter A2) = P(A1) * P(A2) * P(A3) = 1/2 * 1/2 * 1/2 = 1/8
as expected.

EX: Toss a biased coin 3 times, with P(H) = p.
ASK&WAIT: What is P(A)?
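The chain-rule computation in the fair-coin example can be verified by brute-force enumeration of the 8-outcome sample space; a sketch (the helper names P and P_cond are mine):

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))   # 8 equally likely outcomes of 3 tosses

def P(event):
    # probability of an event (a predicate on outcomes), uniform distribution
    return Fraction(sum(1 for x in S if event(x)), len(S))

def P_cond(a, b):
    # conditional probability P(A|B) = P(A inter B)/P(B)
    return P(lambda x: a(x) and b(x)) / P(b)

A1 = lambda x: x[0] == "H"            # {Hxx}
A2 = lambda x: x[1] == "H"            # {xHx}
A3 = lambda x: x[2] == "H"            # {xxH}
A12 = lambda x: A1(x) and A2(x)       # A1 inter A2

# chain rule: P(A1 inter A2 inter A3) = P(A1)*P(A2|A1)*P(A3|A1 inter A2)
lhs = P(lambda x: A1(x) and A2(x) and A3(x))
rhs = P(A1) * P_cond(A2, A1) * P_cond(A3, A12)
print(lhs, rhs)   # both 1/8, as computed above
```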
Def: a Bernoulli trial is an experiment with exactly two outcomes; a sequence of Bernoulli trials consists of independent, identical such experiments.

EX: Suppose we flip a fair coin 100 times. What is P(50 heads)?
The sample space S = {all sequences of 100 H's and T's}, each with P(x) = 1/2^100, because P(HTH...) = P(1st = H)*P(2nd = T)*P(3rd = H)*... = 1/2^100 (or because it is a uniform distribution over 2^100 possibilities). Let E = {all sequences with 50 heads and 50 tails}.
ASK&WAIT: What is |E|? P(E)?
ASK&WAIT: Let E(i) = {i heads out of n flips}. What is |E(i)|? P(E(i))?
Note that E(i) and E(j) are disjoint for i not = j, and S = E(0) U E(1) U ... U E(n), so P(S) = P(E(0)) + ... + P(E(n)) = 1. Check this:
  sum_{i=0 to n} P(E(i)) = sum_{i=0 to n} C(n,i)/2^n
                         = 2^(-n) * sum_{i=0 to n} C(n,i)
                         = 2^(-n) * (1+1)^n   ... by the Binomial Theorem
                         = 1
as desired.

EX: Now flip a biased coin, with P(H) = p and P(T) = 1-p, 100 times. The sample space is the same as above, but not all P(x) are the same.
ASK&WAIT: What is P(50 Hs followed by 50 Ts)?
ASK&WAIT: What is P(50 Hs and 50 Ts, in some fixed order)?
ASK&WAIT: What is P(50 Hs and 50 Ts, in any order)?
Now flip a biased coin n times.
ASK&WAIT: What is P(i Hs and n-i Ts, in any order)?
ASK&WAIT: What is sum_{i=0 to n} P(i Hs and n-i Ts, in any order)?

Theorem: If you flip a biased coin n times, with P(H) = p, the probability of getting i heads is C(n,i)*p^i*(1-p)^(n-i).

What does P(getting i heads out of n flips) look like as a function of i? Let's look at n=100, p = .5, and at n=100, p = .7. Comments on the plots (shown in the pdf version of the notes): when p = .5, the probability is largest at i=50 (equal numbers of heads and tails), and quickly gets smaller for larger or smaller i. It gets so small that it is easier to look at a logarithmic scale (second plot), where the probability of getting 30 Hs and 70 Ts (or 70 Hs and 30 Ts) is about 10^(-5), and the probability of getting 10 Hs (or 10 Ts) is down to 10^(-17).
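The theorem can be turned into a one-line function; as a sanity check, the probabilities over i = 0..n sum to 1 (the Binomial Theorem argument above), and for p = .7 the largest term lands near i = 70:

```python
from math import comb

def binom_prob(n, i, p):
    # P(exactly i heads in n flips of a coin with P(H) = p)
    return comb(n, i) * p**i * (1 - p)**(n - i)

# the E(i) partition the sample space, so the probabilities sum to 1
total = sum(binom_prob(100, i, 0.7) for i in range(101))
print(abs(total - 1) < 1e-12)   # True

# the peak of the p = .7 curve, matching the plots discussed above
peak = max(range(101), key=lambda i: binom_prob(100, i, 0.7))
print(peak)   # 70
```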
when p = .7, the most noticeable feature of the 3rd plot is that it looks very much like the first plot, except slid over to have its peak at 70 Hs instead of 50 Hs. This makes sense because with P(H) = .7, one expects close to 70 Hs out of 100. We will return later to explain the remarkable resemblance of these two plots when we discuss the Central Limit Theorem.

EX: Probability of {a flush in poker} (5 cards of the same suit).
P(flush) = 4*P(A), where A = {flush in hearts}. Write A = A1 inter A2 inter ... inter A5, where Ai = {ith card is a heart}. By the Theorem: P(A) = P(A1) * P(A2|A1) * P(A3|A1 inter A2) * ...
ASK&WAIT: What is P(flush)?

EX: You go to a casino, which advertises the following game: You pick a number from 1 to 6. Then they roll 3 dice, and you win if your number comes up at least once.
ASK&WAIT: The casino claims that your chance of winning is 50%, since it is 1/6 for each die, and each die is independent, so the probability is 3*(1/6) = 1/2. Is this argument reasonable?
Let's figure out the real probability of winning at this game. Let Ai = {your number comes up on die i}, and A = A1 U A2 U A3. We want P(A). The casino said P(A) = P(A1) + P(A2) + P(A3) = 3*(1/6) = 1/2. But this is only true if the Ai are disjoint, which they are not (your number can come up twice). So we need inclusion-exclusion. Recall: P(A1 U A2) = P(A1) + P(A2) - P(A1 inter A2).
ASK&WAIT: What is P(A1 U A2 U A3)?
ASK&WAIT: What is P(Ai inter Aj)?
ASK&WAIT: What is P(A1 inter A2 inter A3)?
ASK&WAIT: What is P(A) = P(winning)? Should you take an even bet?
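Once you have worked out the inclusion-exclusion answer for the casino game, it can be checked by enumerating all 216 equally likely rolls; a sketch:

```python
from fractions import Fraction
from itertools import product

# pick the number 6, say; by symmetry the answer is the same for any number
rolls = list(product(range(1, 7), repeat=3))   # 6^3 = 216 outcomes
p_win = Fraction(sum(1 for r in rolls if 6 in r), len(rolls))

# inclusion-exclusion: 3*P(Ai) - 3*P(Ai inter Aj) + P(A1 inter A2 inter A3)
p_ie = 3 * Fraction(1, 6) - 3 * Fraction(1, 36) + Fraction(1, 216)
print(p_win, p_ie)   # both 91/216, about 0.42 -- less than the claimed 1/2
```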