Chernoff bounds are another kind of tail bound. Like the Markov and Chebyshev bounds, they bound the probability that a random variable Y lands in the "tail", i.e. far from the mean.
Recall that Markov bounds apply to any non-negative random variable Y and have the form
Pr[Y ≥ a] ≤ E[Y]/a for any a > 0
Markov bounds don't depend on any knowledge of the distribution of Y. Chebyshev bounds use knowledge of the standard deviation to give a tighter bound. The Chebyshev bound for a random variable X with standard deviation σ is:
Pr[|X − E[X]| ≥ kσ] ≤ 1/k²
But we already saw that some random variables (e.g. the number of balls in a bin) fall off exponentially with distance from the mean, so Markov and Chebyshev are very poor bounds for those kinds of random variables. The Chernoff bound applies to a class of random variables and does give exponential fall-off of probability with distance from the mean. The critical condition needed for a Chernoff bound is that the random variable be a sum of independent indicator random variables. Since that's true for balls in bins, Chernoff bounds apply.
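To see concretely how loose Markov and Chebyshev are in this setting, here is a minimal numeric sketch (assuming Python 3.8+ for math.comb, with illustrative variable names) that treats the number of balls landing in one particular bin as Binomial(n, 1/n) and compares both bounds with the exact tail probability:

    from math import comb

    n = 100                     # balls thrown into n bins
    p = 1.0 / n                 # probability a given ball lands in one particular bin
    mean = n * p                # E[Y] = 1
    var = n * p * (1 - p)       # binomial variance

    def exact_tail(a):
        # Pr[Y >= a] for Y ~ Binomial(n, p)
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

    for a in (2, 4, 8):
        markov = mean / a                    # Markov: Pr[Y >= a] <= E[Y]/a
        chebyshev = var / (a - mean) ** 2    # Chebyshev: Pr[|Y - E[Y]| >= a - mean] <= var/(a - mean)^2
        print(a, exact_tail(a), markov, chebyshev)

The exact tail drops off far faster than either bound as a grows, which is exactly the gap the Chernoff bound closes.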
The first kind of random variable that Chernoff bounds work for is a sum of indicator variables with the same distribution. That is, if Xi is a random variable with Pr[Xi = 1] = p and Pr[Xi = 0] = 1 − p, and the Xi are all independent, then each Xi is called a Bernoulli trial. So tossing a coin is a Bernoulli trial. So is the event that a randomly tossed ball falls into a particular one of n bins (p = 1/n). If
X = X1 + X2 + ⋯ + Xn
then X has a Binomial distribution. We derived this already for coins and for balls into bins. It is
Pr[X = k] = (n choose k) p^k (1 − p)^(n−k)
The name binomial distribution comes from the binomial coefficients that appear in the expression for the probability.
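As a quick illustration of the formula above, the following sketch (function and parameter values are just for illustration) evaluates the binomial pmf and checks it against a simulation of n independent Bernoulli(p) trials:

    import random
    from math import comb

    def binom_pmf(n, p, k):
        # Pr[X = k] = (n choose k) * p^k * (1-p)^(n-k)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    n, p, trials = 20, 0.3, 200_000
    counts = [0] * (n + 1)
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < p)   # one Binomial(n, p) sample
        counts[x] += 1

    for k in range(8):
        print(k, binom_pmf(n, p, k), counts[k] / trials)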
There is a slightly more general setting for which we can derive Chernoff bounds. If instead of a fixed probability we allow every Xi to have a different probability, Pr[Xi = 1] = pi and Pr[Xi = 0] = 1 − pi, then these events are called Poisson trials. A Poisson trial by itself is really just a Bernoulli trial; it is only when you have many of them together with different probabilities that they are called Poisson trials. But it is very important that the Xi still be independent.
That leads us to one statement of the Chernoff bound. Let X1, X2, …, Xn be independent Poisson trials with Pr[Xi = 1] = pi. Then if X is the sum of the Xi and μ = E[X], for any δ > 0,
Pr[X > (1 + δ)μ] < (e^δ / (1 + δ)^(1+δ))^μ
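This bound is easy to check numerically. Below is a hedged sketch that picks an arbitrary set of probabilities pi, estimates Pr[X > (1+δ)μ] by simulation, and compares it with the right-hand side above; the simulated tail should always come out below the bound (up to Monte Carlo noise):

    import random
    from math import exp, log

    ps = [0.1, 0.2, 0.05, 0.3, 0.15] * 20       # 100 independent Poisson trials (arbitrary p_i)
    mu = sum(ps)                                 # E[X]

    def chernoff_upper(delta):
        # (e^delta / (1+delta)^(1+delta))^mu, written via the exponent form
        return exp(mu * (delta - (1 + delta) * log(1 + delta)))

    def simulated_tail(delta, runs=20_000):
        threshold = (1 + delta) * mu
        hits = sum(1 for _ in range(runs)
                   if sum(1 for p in ps if random.random() < p) > threshold)
        return hits / runs

    for delta in (0.25, 0.5, 1.0):
        print(delta, simulated_tail(delta), chernoff_upper(delta))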
The right-hand side of the Chernoff bound looks very strange. At first it's not clear that the probability is even less than one, but it is, because e^δ < (1 + δ)^(1+δ) for any positive δ. By proving that, we will get the bound into a more comfortable (but not quite as strong) form. First, write the RHS as
exp(μ(δ − (1 + δ)ln(1 + δ)))
Let's concentrate on the part of the exponent inside the parentheses, that is:
δ − (1 + δ)ln(1 + δ)
After Taylor expanding ln(1 + δ), we get
δ − (1 + δ)(δ − δ²/2 + δ³/3 − δ⁴/4 + …)
which becomes
δ − (δ + δ²/2 − δ³/6 + δ⁴/12 − …) = −(δ²/2 − δ³/6 + δ⁴/12 − …) ≈ −δ²/2 for small δ
To give a concrete bound, if δ < 2e − 1, this series is less than −δ²/4, so we can write
Pr[X > (1 + δ)μ] < exp(−μδ²/4)
for δ in that range. Normally δ should be small, because Chernoff is about getting good bounds near the mean. If δ is outside that range, we can use another simplification:
Pr[X > (1 + δ)μ] < 2^(−(1+δ)μ) for δ > 2e − 1
So we still get exponential decrease in probability away from the mean.
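The relationship between the full Chernoff expression and these two simplified forms is easy to check numerically. A minimal sketch (μ is an arbitrary example value):

    from math import exp, log, e

    mu = 10.0   # an arbitrary example mean

    def full_bound(d):
        # (e^d / (1+d)^(1+d))^mu, via the exponent form
        return exp(mu * (d - (1 + d) * log(1 + d)))

    def small_delta_bound(d):     # the exp(-mu*d^2/4) form, valid for 0 < d < 2e - 1
        return exp(-mu * d * d / 4)

    def large_delta_bound(d):     # the 2^(-(1+d)*mu) form, valid for d > 2e - 1
        return 2.0 ** (-(1 + d) * mu)

    for d in (0.5, 1.0, 2.0, 5.0, 10.0):
        simple = small_delta_bound(d) if d < 2 * e - 1 else large_delta_bound(d)
        print(d, full_bound(d), simple, full_bound(d) <= simple)

In every case the full bound is at most the simplified one, i.e. the simplified forms are valid (if slightly weaker) bounds.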
A slightly different version of the Chernoff bound applies to values less than the mean. Let X1, X2, …, Xn be independent Poisson trials with Pr[Xi = 1] = pi. Then if X is the sum of the Xi and μ = E[X], for any δ in (0, 1],
Pr[X < (1 − δ)μ] < exp(−μδ²/2)
Once again the bound is exponential in the square of δ.
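As with the upper tail, the lower-tail bound can be sanity-checked by simulation. A short sketch with illustrative parameters:

    import random
    from math import exp

    ps = [0.4] * 50     # 50 identical trials, so mu = 20 (values are illustrative)
    mu = sum(ps)

    def simulated_lower_tail(delta, runs=50_000):
        threshold = (1 - delta) * mu
        hits = sum(1 for _ in range(runs)
                   if sum(1 for p in ps if random.random() < p) < threshold)
        return hits / runs

    for delta in (0.2, 0.4, 0.6):
        print(delta, simulated_lower_tail(delta), exp(-mu * delta * delta / 2))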
Note: the proof below applies only to the lower tail result. The upper tail proof is different.
First write the inequality as an inequality between exponentials, multiplying by a parameter t > 0:
Pr[X < (1 − δ)μ] = Pr[−tX > −t(1 − δ)μ] = Pr[exp(−tX) > exp(−t(1 − δ)μ)]
And notice that exp(−tX) is a product of independent random variables exp(−tXi), so its expected value is the product of the expected values E[exp(−tXi)]. Now we can apply the Markov inequality to the RHS above:
Pr[exp(−tX) > exp(−t(1 − δ)μ)] < E[exp(−tX)] / exp(−t(1 − δ)μ) = ∏i E[exp(−tXi)] / exp(−t(1 − δ)μ)
Now E[exp(−tXi)] is given by
E[exp(−tXi)] = pi e^(−t) + (1 − pi) = 1 − pi(1 − e^(−t))
Because 1 − x < exp(−x), therefore
E[exp(−tXi)] < exp(−pi(1 − e^(−t))), and so ∏i E[exp(−tXi)] < exp(−(1 − e^(−t))(p1 + ⋯ + pn)) = exp(−μ(1 − e^(−t)))
(because the sum of the pi is μ). And the overall bound becomes
Pr[X < (1 − δ)μ] < exp(−μ(1 − e^(−t))) / exp(−t(1 − δ)μ)
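The key step above is that independence lets the expectation of the product factor into a product of expectations. A small Monte Carlo sketch (with arbitrary pi and t) of that identity:

    import random
    from math import exp

    ps = [0.1, 0.3, 0.5, 0.7, 0.2]   # arbitrary trial probabilities
    t = 0.8                          # arbitrary positive t

    # product of E[exp(-t*Xi)] = 1 - pi*(1 - e^(-t))
    product = 1.0
    for p in ps:
        product *= 1 - p * (1 - exp(-t))

    # Monte Carlo estimate of E[exp(-t*X)] where X = sum of the Xi
    runs = 200_000
    total = 0.0
    for _ in range(runs):
        x = sum(1 for p in ps if random.random() < p)
        total += exp(-t * x)

    print(product, total / runs)     # the two numbers should agree closely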
The bound above is close to what we want, but it has an arbitrary parameter t in it. By choosing a good value of t, we can make the bound as tight as possible. It turns out that t = ln(1/(1 − δ)) works well. After that substitution (so that e^(−t) = 1 − δ), we get
Pr[X < (1 − δ)μ] < (e^(−δ) / (1 − δ)^(1−δ))^μ
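That choice of t is not magic: it minimizes the pre-substitution bound exp(−μ(1 − e^(−t)) + t(1 − δ)μ) over t > 0. A coarse grid search (sketch, with arbitrary μ and δ) confirms it:

    from math import exp, log

    mu, delta = 20.0, 0.3    # arbitrary example values

    def pre_optimization_bound(t):
        # exp(-mu*(1 - e^(-t)) + t*(1 - delta)*mu), the bound before choosing t
        return exp(-mu * (1 - exp(-t)) + t * (1 - delta) * mu)

    best_t = min((i / 1000 for i in range(1, 5000)), key=pre_optimization_bound)
    print(best_t, log(1 / (1 - delta)))    # should agree to about 1e-3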
To get the final bound, we notice that for δ in (0, 1]:
(1 − δ)^(1−δ) > exp(−δ + δ²/2)
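This inequality is also easy to check numerically on a grid of δ values (a quick sketch):

    from math import exp

    for i in range(1, 100):
        d = i / 100
        assert (1 - d) ** (1 - d) > exp(-d + d * d / 2), d
    print("inequality holds at every grid point")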
Making that substitution gives:
Pr[X < (1 − δ)μ] < (e^(−δ) · e^(δ − δ²/2))^μ = exp(−μδ²/2)
which is the Chernoff bound for the lower tail.