Markov’s inequality bounds the tail probability of a nonnegative random variable x
based only on its expectation. For a >0,
$$\text{Prob}(x > a) \le \frac{E(x)}{a}.$$
As $a$ grows, the bound drops off as $1/a$. Given the second moment of $x$, Chebyshev's inequality, which does not assume $x$ is a nonnegative random variable, gives a tail bound falling off as $1/a^2$:
$$\text{Prob}(|x - E(x)| \ge a) \le \frac{E\big((x - E(x))^2\big)}{a^2}.$$
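As a rough illustration of how loose or tight these two bounds can be, the sketch below compares them with exact tail probabilities for an exponential random variable with density $e^{-t}$, for which $E(x) = 1$ and $\text{Var}(x) = 1$. The function names are hypothetical helpers chosen for this example, not part of the text.

```python
import math

def markov_bound(mean, a):
    """Markov: Prob(x > a) <= E(x)/a for nonnegative x and a > 0."""
    return mean / a

def chebyshev_bound(variance, a):
    """Chebyshev: Prob(|x - E(x)| >= a) <= Var(x)/a^2 for a > 0."""
    return variance / a ** 2

def exp_upper_tail(a):
    """Exact Prob(x > a) for the exponential density e^{-t}, t > 0."""
    return math.exp(-a)

def exp_deviation_tail(a):
    """Exact Prob(|x - 1| >= a) for the same distribution (mean 1)."""
    upper = math.exp(-(1 + a))                        # Prob(x >= 1 + a)
    lower = 1 - math.exp(-(1 - a)) if a < 1 else 0.0  # Prob(x <= 1 - a)
    return upper + lower

# Columns: a, exact upper tail, Markov bound, exact deviation tail, Chebyshev bound
for a in (2, 4, 8):
    print(a, exp_upper_tail(a), markov_bound(1.0, a),
          exp_deviation_tail(a), chebyshev_bound(1.0, a))
```

Both bounds hold for every $a > 0$, but the exact tails decay exponentially while the bounds decay only polynomially, which is the motivation for using higher moments.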
Higher moments yield bounds by applying either of these two theorems. For example, if $r$ is a nonnegative even integer, then $x^r$ is a nonnegative random variable even if $x$ takes on negative values. Applying Markov's inequality to $x^r$,
$$\text{Prob}(|x| \ge a) = \text{Prob}(x^r \ge a^r) \le \frac{E(x^r)}{a^r},$$
a bound that falls off as $1/a^r$. The larger the $r$, the greater the rate of fall, but a bound on $E(x^r)$ is needed to apply this technique.
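The trade-off in the choice of $r$ can be seen numerically. The sketch below assumes a standard Gaussian $x$, whose even moments are $E(x^r) = (r-1)!! = 1 \cdot 3 \cdots (r-1)$, and evaluates the bound $E(x^r)/a^r$ for several even $r$ at $a = 4$. The helper names and the choice $a = 4$ are assumptions made for illustration.

```python
import math

def gaussian_even_moment(r):
    """E(x^r) for a standard Gaussian and even r: (r-1)!! = 1*3*...*(r-1)."""
    return math.prod(range(1, r, 2))

def moment_tail_bound(a, r):
    """Markov applied to x^r: Prob(|x| >= a) <= E(x^r) / a^r."""
    return gaussian_even_moment(r) / a ** r

def exact_tail(a):
    """Exact Prob(|x| >= a) for a standard Gaussian."""
    return math.erfc(a / math.sqrt(2))

a = 4.0
bounds = {r: moment_tail_bound(a, r) for r in range(2, 31, 2)}
best_r = min(bounds, key=bounds.get)   # even r giving the smallest bound
print(best_r, bounds[best_r], exact_tail(a))
```

For this distribution, consecutive bounds differ by the factor $(r-1)/a^2$, so the bound improves only while $r - 1 < a^2$. Taking $r$ larger than that makes the bound worse, because the moment grows faster than $a^r$.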
For a random variable $x$ that is the sum of a large number of independent random variables, $x_1, x_2, \ldots, x_n$, one can derive bounds on $E(x^r)$ for high even $r$. There are many situations where the sum of a large number of independent random variables arises. For example, $x_i$ may be the amount of a good that the $i$th consumer buys, the length of the $i$th message sent over a network, or the indicator random variable of whether the $i$th record in a large database has a certain property. Each $x_i$ is modeled by a simple probability distribution. Gaussian, exponential (probability density $e^{-t}$ at any $t > 0$), and binomial distributions are typically used for the three examples here, respectively. If the $x_i$ have 0-1 distributions, there are a number of theorems called Chernoff bounds, bounding the tails of $x = x_1 + x_2 + \cdots + x_n$, typically proved by the so-called moment-generating function method (see Section 11.4.11 of the appendix). But exponential and Gaussian random variables are not bounded and these methods do not apply. However, good bounds on the moments of these two distributions are known. Indeed, for any integer $s > 0$, the $s$th moment for the unit variance Gaussian and the exponential are both at most $s!$.
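The moment claim can be checked directly from the standard closed forms, which are assumed here rather than derived: the exponential with density $e^{-t}$ has $E(x^s) = s!$ exactly, and the zero-mean, unit-variance Gaussian has $E(x^s) = 0$ for odd $s$ and $(s-1)!!$ for even $s$. A minimal sketch with hypothetical helper names:

```python
import math

def exponential_moment(s):
    """E(x^s) for density e^{-t} on t > 0; equals Gamma(s + 1) = s!."""
    return math.gamma(s + 1)

def gaussian_moment(s):
    """E(x^s) for a zero-mean, unit-variance Gaussian."""
    if s % 2 == 1:
        return 0                          # odd moments vanish by symmetry
    return math.prod(range(1, s, 2))      # (s-1)!! for even s

# Columns: s, Gaussian moment, exponential moment, s!
for s in range(1, 11):
    print(s, gaussian_moment(s), exponential_moment(s), math.factorial(s))
```

The exponential meets the bound with equality, while the Gaussian moments fall well below $s!$ as $s$ grows.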
Given bounds on the moments of the individual $x_i$, the following theorem proves moment bounds on their sum. We use this theorem to derive tail bounds not only for sums of 0-1 random variables, but also for Gaussians, exponentials, Poisson, etc.
The gold standard for tail bounds is the central limit theorem for independent, identically distributed random variables $x_1, x_2, \ldots, x_n$ with zero mean and $\text{Var}(x_i) = \sigma^2$, which states that as $n \to \infty$ the distribution of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ tends to the Gaussian density with zero mean and variance $\sigma^2$. Loosely, this says that in the limit, the tails of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ are bounded by those of a Gaussian with variance $\sigma^2$. But this theorem holds only in the limit, whereas we prove a bound that applies for all $n$.

In the following theorem, $x$ is the sum of $n$ independent, not necessarily identically distributed, random variables $x_1, x_2, \ldots, x_n$, each of zero mean and variance at most $\sigma^2$. By the central limit theorem, in the limit the probability density of $x$ goes to that of the Gaussian with variance at most $n\sigma^2$. In a limit sense, this implies an upper bound of $c\,e^{-a^2/(2n\sigma^2)}$ for the tail probability $\text{Prob}(|x| > a)$ for some constant $c$. The following theorem assumes bounds on higher moments, but asserts a quantitative upper bound of $3e^{-a^2/(8n\sigma^2)}$ on the tail probability, not just in the limit, but for every $n$. We will apply this theorem to get tail bounds on sums of Gaussian, binomial, and power law distributed random variables.
Theorem 2.10 Let $x = x_1 + x_2 + \cdots + x_n$, where $x_1, x_2, \ldots, x_n$ are mutually independent random variables with zero mean and variance at most $\sigma^2$. If for $3 \le s \le a^2/(4n\sigma^2)$, $|E(x_i^s)| \le \sigma^2 s!$, then for $0 \le a \le \sqrt{2}\,n\sigma^2$,
$$\text{Prob}(|x_1 + x_2 + \cdots + x_n| \ge a) \le 3e^{-a^2/(8n\sigma^2)}.$$
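To get a feel for the statement, the sketch below evaluates the theorem's bound for a sum of $n$ independent random signs (each $x_i$ is $+1$ or $-1$ with probability $1/2$), an example chosen here for illustration. These have zero mean, variance $\sigma^2 = 1$, and $|E(x_i^s)| \le 1 \le \sigma^2 s!$, so the hypotheses hold, and the exact tail is available from binomial coefficients. The function names are hypothetical.

```python
import math

def theorem_bound(a, n, sigma2):
    """Right-hand side of the theorem: 3 * exp(-a^2 / (8 n sigma^2))."""
    if not 0 <= a <= math.sqrt(2) * n * sigma2:
        raise ValueError("theorem requires 0 <= a <= sqrt(2) * n * sigma^2")
    return 3 * math.exp(-a ** 2 / (8 * n * sigma2))

def exact_sign_sum_tail(a, n):
    """Exact Prob(|x| >= a) when x is a sum of n independent +/-1 signs."""
    # If k of the n signs are +1, the sum equals 2k - n.
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= a)
    return count / 2 ** n

n = 100
for a in (10, 20, 40, 60):
    print(a, exact_sign_sum_tail(a, n), theorem_bound(a, n, 1.0))
```

In this example the bound is valid but far from tight, and for small $a$ it exceeds 1. Its value is that it holds for every $n$ and for unbounded summands whose moments are controlled.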
Proof: We first prove an upper bound on $E(x^r)$ for any even positive integer $r$ and then use Markov's inequality as discussed earlier. Expand $(x_1 + x_2 + \cdots + x_n)^r$:
$$(x_1 + x_2 + \cdots + x_n)^r = \sum \binom{r}{r_1, r_2, \ldots, r_n} x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n} = \sum \frac{r!}{r_1!\,r_2! \cdots r_n!}\, x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n},$$
where the $r_i$ range over all nonnegative integers summing to $r$. By independence,
$$E(x^r) = \sum \frac{r!}{r_1!\,r_2! \cdots r_n!}\, E(x_1^{r_1})\, E(x_2^{r_2}) \cdots E(x_n^{r_n}).$$
If in a term any $r_i = 1$, the term is zero since $E(x_i) = 0$. Assume henceforth that $(r_1, r_2, \ldots, r_n)$ runs over sets of nonzero $r_i$ summing to $r$ where each nonzero $r_i$ is at least two. There are at most $r/2$ nonzero $r_i$ in each set. Since $|E(x_i^{r_i})| \le \sigma^2 r_i!$,
$$E(x^r) \le r! \sum_{(r_1, r_2, \ldots, r_n)} \sigma^{2(\text{number of nonzero } r_i \text{ in set})}.$$
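To make the bookkeeping concrete, here is a small worked case, offered as an illustrative check rather than as part of the proof. Take $n = 2$ and $r = 4$, and assume the moment bound $|E(x_i^s)| \le \sigma^2 s!$ for $s = 2, 3, 4$ (for $s = 2$ it follows from the variance bound).

```latex
\begin{align*}
E\big((x_1+x_2)^4\big)
  &= E(x_1^4) + 4E(x_1^3)E(x_2) + 6E(x_1^2)E(x_2^2) + 4E(x_1)E(x_2^3) + E(x_2^4) \\
  &= E(x_1^4) + 6E(x_1^2)E(x_2^2) + E(x_2^4)
     && \text{since } E(x_1) = E(x_2) = 0 \\
  &\le 4!\,\sigma^2 + \frac{4!}{2!\,2!}\,(2!\,\sigma^2)(2!\,\sigma^2) + 4!\,\sigma^2 \\
  &= 4!\,\big(\sigma^2 + \sigma^4 + \sigma^2\big).
\end{align*}
```

The three surviving exponent vectors are $(4,0)$, $(2,2)$, and $(0,4)$, with one, two, and one nonzero entries respectively. Each multinomial coefficient cancels against the factorials in the moment bounds, leaving $r!$ times $\sigma^2$ raised to the number of nonzero $r_i$.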
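Collect terms of the summation with $t$ nonzero $r_i$ for $t = 1, 2, \ldots, r/2$. There are $\binom{n}{t}$ subsets of $\{1, 2, \ldots, n\}$ of cardinality $t$. Once a subset is fixed as the set of $t$ values of $i$ with nonzero $r_i$, assign two to each of these $r_i$ (each must be at least two) and allocate the remaining $r - 2t$ to the $t$ $r_i$ arbitrarily. The number of such allocations is just $\binom{r-2t+t-1}{t-1} = \binom{r-t-1}{t-1}$. So
$$E(x^r) \le r! \sum_{t=1}^{r/2} f(t), \quad \text{where } f(t) = \binom{n}{t}\binom{r-t-1}{t-1}\sigma^{2t}.$$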
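The allocation count can be confirmed by brute force for small cases. The sketch below enumerates every tuple $(r_1, \ldots, r_t)$ with each $r_i \ge 2$ and total $r$, and compares the count with $\binom{r-t-1}{t-1}$. The function names are hypothetical, and the enumeration is only practical for small $r$.

```python
import math
from itertools import product

def count_allocations(r, t):
    """Brute force: tuples (r_1, ..., r_t) with every r_i >= 2 and sum r."""
    return sum(1 for rs in product(range(2, r + 1), repeat=t) if sum(rs) == r)

def closed_form(r, t):
    """Stars-and-bars count from the text: binom(r - t - 1, t - 1)."""
    return math.comb(r - t - 1, t - 1)

r = 10
for t in range(1, r // 2 + 1):
    print(t, count_allocations(r, t), closed_form(r, t))
```

The two columns agree because distributing the $r - 2t$ leftover units among $t$ exponents is a standard stars-and-bars count.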
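Thus $f(t) \le h(t)$, where $h(t) = \frac{(n\sigma^2)^t}{t!}\,2^{r-t-1}$. By the hypotheses of the theorem, $a \le \sqrt{2}\,n\sigma^2$ and $s \le \frac{a^2}{4n\sigma^2}$, so every moment order $r$ used is at most $a^2/(4n\sigma^2) \le n\sigma^2/2$. For $t \le r/2$, $h(t)/h(t-1) = n\sigma^2/(2t)$, which is at least two since $t \le r/2 \le n\sigma^2/4$, so the terms at least double with each step in $t$. This gives
$$E(x^r) \le r! \sum_{t=1}^{r/2} f(t) \le r!\,h(r/2)\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \le \frac{r!}{(r/2)!}\,2^{r/2}(n\sigma^2)^{r/2}.$$
Applying Markov's inequality,
$$\text{Prob}(|x| > a) = \text{Prob}(|x|^r > a^r) \le \frac{r!\,(n\sigma^2)^{r/2}\,2^{r/2}}{(r/2)!\,a^r} = g(r).$$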
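The comparison between $f$ and $h$ and the geometric-series step can be checked on a small instance. The values $n = 100$, $\sigma^2 = 1$, and $r = 8$ below are arbitrary illustrative choices satisfying $r \le n\sigma^2/2$, and the function names simply mirror the symbols in the proof.

```python
import math

def f(t, n, r, sigma2):
    """f(t) = binom(n, t) * binom(r - t - 1, t - 1) * sigma^(2t)."""
    return math.comb(n, t) * math.comb(r - t - 1, t - 1) * sigma2 ** t

def h(t, n, r, sigma2):
    """h(t) = (n sigma^2)^t / t! * 2^(r - t - 1)."""
    return (n * sigma2) ** t / math.factorial(t) * 2 ** (r - t - 1)

n, sigma2, r = 100, 1.0, 8        # r <= n * sigma2 / 2, as the proof requires
ts = range(1, r // 2 + 1)
total_f = sum(f(t, n, r, sigma2) for t in ts)
# Geometric-series step: sum of f(t) is at most 2 * h(r/2).
print(total_f, 2 * h(r // 2, n, r, sigma2))
```

The inequality $f(t) \le h(t)$ uses $\binom{n}{t} \le n^t/t!$ and $\binom{r-t-1}{t-1} \le 2^{r-t-1}$. Because $h$ at least doubles with each step, the whole sum is dominated by twice its last term.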
For even $r$, $g(r)/g(r-2) = \frac{4(r-1)n\sigma^2}{a^2}$, and so $g(r)$ decreases as long as $r - 1 \le a^2/(4n\sigma^2)$. Taking $r$ to be the largest even integer less than or equal to $a^2/(4n\sigma^2)$, the tail probability is at most $e^{-r/2}$, which is at most $e \cdot e^{-a^2/(8n\sigma^2)} \le 3e^{-a^2/(8n\sigma^2)}$, proving the theorem.
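The final step of the proof can be traced numerically. The sketch below evaluates $g(r)$, picks $r$ as the largest even integer not exceeding $a^2/(4n\sigma^2)$, and prints the result next to the theorem's bound. The values $n\sigma^2 = 100$ and $a = 60$ are arbitrary illustrative choices within the allowed range $a \le \sqrt{2}\,n\sigma^2$, and `choose_r` is a hypothetical helper name.

```python
import math

def g(r, a, n_sigma2):
    """g(r) = r! (n sigma^2)^(r/2) 2^(r/2) / ((r/2)! a^r) for even r."""
    half = r // 2
    return (math.factorial(r) / math.factorial(half)) * (2 * n_sigma2 / a ** 2) ** half

def choose_r(a, n_sigma2):
    """Largest even integer <= a^2 / (4 n sigma^2)."""
    return 2 * math.floor(a ** 2 / (8 * n_sigma2))

n_sigma2, a = 100.0, 60.0          # a <= sqrt(2) * n sigma^2 holds
r = choose_r(a, n_sigma2)
# Columns: chosen r, g(r), theorem bound 3 * exp(-a^2 / (8 n sigma^2))
print(r, g(r, a, n_sigma2), 3 * math.exp(-a ** 2 / (8 * n_sigma2)))
```

Here $a^2/(4n\sigma^2) = 9$, so the chosen order is $r = 8$, and the Markov bound $g(8)$ falls below the theorem's stated bound for these values.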