
Markov’s inequality bounds the tail probability of a nonnegative random variable $x$ based only on its expectation. For $a > 0$,

$$\mathrm{Prob}(x \ge a) \le \frac{E(x)}{a}.$$
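As a numerical illustration, the following minimal Python sketch compares Markov's bound with the exact tail of the unit exponential, for which $E(x) = 1$ and $\mathrm{Prob}(x \ge a) = e^{-a}$. The helper names `markov_bound` and `exponential_tail` are hypothetical, chosen only for this example.

```python
import math


def markov_bound(mean: float, a: float) -> float:
    """Markov's inequality: Prob(x >= a) <= E(x)/a for nonnegative x and a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return mean / a


def exponential_tail(a: float) -> float:
    """Exact Prob(x >= a) for the unit exponential, whose density is e^{-t} for t > 0."""
    return math.exp(-a)


for a in (1.0, 2.0, 5.0, 10.0):
    # The unit exponential has E(x) = 1, so the Markov bound is simply 1/a.
    print(f"a={a:5.1f}  exact={exponential_tail(a):.2e}  markov={markov_bound(1.0, a):.2e}")
```

The exact tail decays exponentially, while the bound decays only as $1/a$; that is the price of using nothing but the expectation.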

As $a$ grows, the bound drops off as $1/a$. Given the second moment of $x$, Chebyshev’s inequality, which does not assume $x$ is a nonnegative random variable, gives a tail bound falling off as $1/a^2$:

$$\mathrm{Prob}\big(|x - E(x)| \ge a\big) \le \frac{E\big((x - E(x))^2\big)}{a^2}.$$
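The same kind of comparison for Chebyshev's inequality can be sketched under the assumption that $x$ is a standard Gaussian (mean 0, variance 1). The exact two-sided tail is then $\mathrm{Prob}(|x| \ge a) = \mathrm{erfc}(a/\sqrt{2})$, computed with the standard library's `math.erfc`; the function names are illustrative only.

```python
import math


def chebyshev_bound(variance: float, a: float) -> float:
    """Chebyshev: Prob(|x - E(x)| >= a) <= Var(x)/a^2 (capped at 1, since it is a probability)."""
    return min(1.0, variance / a**2)


def gaussian_two_sided_tail(a: float) -> float:
    """Exact Prob(|x| >= a) for a standard Gaussian: erfc(a / sqrt(2))."""
    return math.erfc(a / math.sqrt(2.0))


for a in (1.0, 2.0, 3.0, 4.0):
    print(f"a={a:.1f}  exact={gaussian_two_sided_tail(a):.2e}  chebyshev={chebyshev_bound(1.0, a):.2e}")
```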

Higher moments yield bounds by applying either of these two theorems. For example, if $r$ is a positive even integer, then $x^r$ is a nonnegative random variable even if $x$ takes on negative values. Applying Markov’s inequality to $x^r$,

$$\mathrm{Prob}(|x| \ge a) = \mathrm{Prob}(x^r \ge a^r) \le \frac{E(x^r)}{a^r},$$

a bound that falls off as $1/a^r$. The larger the $r$, the greater the rate of fall, but a bound on $E(x^r)$ is needed to apply this technique.
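To see the effect of choosing $r$, the sketch below assumes $x$ is a standard Gaussian, whose even moments are $E(x^r) = (r-1)!! = 1 \cdot 3 \cdots (r-1)$, and evaluates $E(x^r)/a^r$ for even $r$ up to 30 at $a = 4$ (an arbitrary choice). The helper names are hypothetical.

```python
import math


def gaussian_even_moment(r: int) -> int:
    """E(x^r) for a standard Gaussian and even r: (r-1)!! = 1 * 3 * ... * (r-1)."""
    if r % 2:
        raise ValueError("r must be even")
    return math.prod(range(1, r, 2))


def moment_tail_bound(r: int, a: float) -> float:
    """Markov's inequality applied to x^r: Prob(|x| >= a) <= E(x^r) / a^r."""
    return gaussian_even_moment(r) / a**r


a = 4.0
bounds = {r: moment_tail_bound(r, a) for r in range(2, 31, 2)}
best_r = min(bounds, key=bounds.get)
print("best r:", best_r, "bound:", bounds[best_r])
print("Chebyshev (r = 2):", bounds[2])
print("exact Gaussian tail:", math.erfc(a / math.sqrt(2.0)))
```

Consecutive bounds differ by the factor $(r+1)/a^2$, so here the bound improves up to $r = 16$ and then worsens: a larger $r$ helps only while the moments grow more slowly than $a^r$.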

For a random variable $x$ that is the sum of a large number of independent random variables $x_1, x_2, \ldots, x_n$, one can derive bounds on $E(x^r)$ for high even $r$. There are many situations where the sum of a large number of independent random variables arises. For example, $x_i$ may be the amount of a good that the $i$th consumer buys, the length of the $i$th message sent over a network, or the indicator random variable of whether the $i$th record in a large database has a certain property. Each $x_i$ is modeled by a simple probability distribution. Gaussian, exponential (probability density at any $t > 0$ is $e^{-t}$), or binomial distributions are typically used; in fact, they are used respectively in the three examples here.

If the $x_i$ have 0-1 distributions, there are a number of theorems, called Chernoff bounds, that bound the tails of $x = x_1 + x_2 + \cdots + x_n$; they are typically proved by the so-called moment-generating function method (see Section 11.4.11 of the appendix). But exponential and Gaussian random variables are not bounded, and these methods do not apply. However, good bounds on the moments of these two distributions are known. Indeed, for any integer $s > 0$, the $s$th moment of the unit variance Gaussian and of the exponential are both at most $s!$.
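These moment claims can be checked directly from standard closed forms: for the unit-variance Gaussian, $E(x^s) = 0$ for odd $s$ and $(s-1)!!$ for even $s$; for the exponential with density $e^{-t}$, $E(t^s) = \Gamma(s+1) = s!$. This is a sketch with hypothetical helper names.

```python
import math


def gaussian_moment(s: int) -> int:
    """E(x^s) for a zero-mean, unit-variance Gaussian: 0 for odd s, (s-1)!! for even s."""
    if s % 2:
        return 0
    return math.prod(range(1, s, 2))


def exponential_moment(s: int) -> float:
    """E(t^s) for the unit exponential: the integral of t^s e^{-t} over t > 0, i.e. Gamma(s+1)."""
    return math.gamma(s + 1)


for s in range(1, 11):
    print(s, gaussian_moment(s), exponential_moment(s), math.factorial(s))
```

The exponential meets the bound $s!$ with equality, while the Gaussian's even moments $(s-1)!!$ are far below it.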

Given bounds on the moments of the individual $x_i$, the following theorem proves moment bounds on their sum. We use this theorem to derive tail bounds not only for sums of 0-1 random variables, but also for sums of Gaussians, exponentials, Poisson random variables, etc.

The gold standard for tail bounds is the central limit theorem for independent, identically distributed random variables $x_1, x_2, \ldots, x_n$ with zero mean and $\mathrm{Var}(x_i) = \sigma^2$, which states that as $n \to \infty$ the distribution of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ tends to the Gaussian density with zero mean and variance $\sigma^2$. Loosely, this says that in the limit, the tails of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ are bounded by those of a Gaussian with variance $\sigma^2$. But this theorem holds only in the limit, whereas we prove a bound that applies for all $n$. In the following theorem, $x$ is the sum of $n$ independent, not necessarily identically distributed, random variables $x_1, x_2, \ldots, x_n$, each of zero mean and variance at most $\sigma^2$.

By the central limit theorem, in the limit the probability density of $x$ goes to that of the Gaussian with variance at most $n\sigma^2$. In a limit sense, this implies an upper bound of $ce^{-a^2/(2n\sigma^2)}$ for the tail probability $\mathrm{Prob}(|x| > a)$ for some constant $c$. The following theorem assumes bounds on higher moments, but asserts a quantitative upper bound of $3e^{-a^2/(8n\sigma^2)}$ on the tail probability, not just in the limit, but for every $n$. We will apply this theorem to get tail bounds on sums of Gaussian, binomial, and power law distributed random variables.

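Theorem 2.10 Let $x = x_1 + x_2 + \cdots + x_n$, where $x_1, x_2, \ldots, x_n$ are mutually independent random variables with zero mean and variance at most $\sigma^2$. Let $0 \le a \le \sqrt{2}\, n\sigma^2$. If $|E(x_i^s)| \le \sigma^2 s!$ for all $i$ and all integers $s$ with $3 \le s \le a^2/(4n\sigma^2)$, then

$$\mathrm{Prob}\big(|x_1 + x_2 + \cdots + x_n| \ge a\big) \le 3e^{-a^2/(8n\sigma^2)}.$$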
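As an illustration of the statement (not part of the proof), the sketch below assumes each $x_i$ is $+1$ or $-1$ with probability $1/2$. Such $x_i$ have zero mean, variance $\sigma^2 = 1$, and $|E(x_i^s)| \le 1 \le s!$, so the hypotheses hold, and the exact tail of the sum follows from the binomial distribution. The function names are hypothetical.

```python
import math


def theorem_bound(a: float, n: int, sigma2: float) -> float:
    """The bound 3 exp(-a^2 / (8 n sigma^2)); only claimed for 0 <= a <= sqrt(2) n sigma^2."""
    if not 0 <= a <= math.sqrt(2) * n * sigma2:
        raise ValueError("a is outside the range covered by the theorem")
    return 3.0 * math.exp(-a * a / (8.0 * n * sigma2))


def rademacher_sum_tail(n: int, a: float) -> float:
    """Exact Prob(|x_1 + ... + x_n| >= a) when each x_i is +1 or -1 with probability 1/2."""
    # x = 2K - n, where K ~ Binomial(n, 1/2) counts the +1's; integer arithmetic is exact.
    hits = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= a)
    return hits / 2**n


n = 100
for a in (20, 40, 60, 100):
    print(f"a={a:3d}  exact={rademacher_sum_tail(n, a):.3e}  bound={theorem_bound(a, n, 1.0):.3e}")
```

For $n = 100$ the bound exceeds 1, and so says nothing, until $a$ passes $\sqrt{8n \ln 3} \approx 30$; beyond that it is far from tight for this example but decays at a Gaussian rate in $a$.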

Proof: We first prove an upper bound on $E(x^r)$ for any even positive integer $r$ and then use Markov’s inequality as discussed earlier. Expand $(x_1 + x_2 + \cdots + x_n)^r$:

$$(x_1 + x_2 + \cdots + x_n)^r = \sum \binom{r}{r_1, r_2, \ldots, r_n} x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n} = \sum \frac{r!}{r_1!\, r_2! \cdots r_n!}\, x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n},$$

where the $r_i$ range over all nonnegative integers summing to $r$. By independence,

$$E(x^r) = \sum \frac{r!}{r_1!\, r_2! \cdots r_n!}\, E(x_1^{r_1})\, E(x_2^{r_2}) \cdots E(x_n^{r_n}).$$
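The expansion can be checked mechanically on small cases. The sketch below assumes $\pm 1$ summands purely for illustration (so $E(x_i^k)$ is 1 for even $k$ and 0 for odd $k$), evaluates the multinomial sum over all $(r_1, \ldots, r_n)$, and compares it with brute-force enumeration of all $2^n$ outcomes. All names are hypothetical.

```python
import itertools
import math


def compositions(r: int, n: int):
    """Yield every tuple (r_1, ..., r_n) of nonnegative integers with sum r."""
    if n == 1:
        yield (r,)
        return
    for first in range(r + 1):
        for rest in compositions(r - first, n - 1):
            yield (first,) + rest


def moment_via_expansion(r: int, moments) -> int:
    """E((x_1 + ... + x_n)^r) from the multinomial expansion, using independence.

    moments[i](k) must return E(x_i^k); this sketch assumes integer-valued moments.
    """
    total = 0
    for rs in compositions(r, len(moments)):
        # Multinomial coefficient r! / (r_1! ... r_n!), computed exactly with integers.
        coeff = math.factorial(r)
        for ri in rs:
            coeff //= math.factorial(ri)
        term = coeff
        for moment, ri in zip(moments, rs):
            term *= moment(ri)  # E(x_i^{r_i}), factored out by independence
        total += term
    return total


def rademacher_moment(k: int) -> int:
    """E(y^k) for y = +1 or -1 with probability 1/2: 1 if k is even, else 0."""
    return 1 if k % 2 == 0 else 0


def brute_force_moment(r: int, n: int) -> float:
    """E((x_1 + ... + x_n)^r) for +/-1 summands, averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** r for signs in itertools.product((-1, 1), repeat=n))
    return total / 2**n


print(moment_via_expansion(4, [rademacher_moment] * 3), brute_force_moment(4, 3))
```

Every tuple with some $r_i = 1$ contributes zero in this example, which is exactly the observation the proof uses next.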

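If in a term any $r_i = 1$, the term is zero since $E(x_i) = 0$. Assume henceforth that $(r_1, r_2, \ldots, r_n)$ runs over tuples of nonnegative integers summing to $r$ in which each nonzero $r_i$ is at least two. There are at most $r/2$ nonzero $r_i$ in each tuple. Since $|E(x_i^{r_i})| \le \sigma^2 r_i!$ for each nonzero $r_i$ (for $r_i = 2$ this follows from the variance bound),

$$E(x^r) \le r! \sum_{(r_1, r_2, \ldots, r_n)} \sigma^{2(\text{number of nonzero } r_i \text{ in the tuple})}.$$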

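Collect terms of the summation with $t$ nonzero $r_i$ for $t = 1, 2, \ldots, r/2$. There are $\binom{n}{t}$ subsets of $\{1, 2, \ldots, n\}$ of cardinality $t$. Once a subset is fixed as the set of $t$ values of $i$ with nonzero $r_i$, each of these $r_i$ must be at least two: allocate two to each of them and then allocate the remaining $r - 2t$ to the $t$ values $r_i$ arbitrarily. The number of such allocations is just $\binom{r-2t+t-1}{t-1} = \binom{r-t-1}{t-1}$. So

$$E(x^r) \le r! \sum_{t=1}^{r/2} f(t), \quad \text{where } f(t) = \binom{n}{t}\binom{r-t-1}{t-1}\sigma^{2t}.$$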
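The count $\binom{r-t-1}{t-1}$ of ordered tuples $(r_1, \ldots, r_t)$ with every $r_i \ge 2$ and sum $r$ is easy to confirm by brute force for small values. This is a sketch with hypothetical helper names.

```python
from itertools import product
from math import comb


def count_allocations(r: int, t: int) -> int:
    """Brute force: ordered tuples (r_1, ..., r_t), every r_i >= 2, summing to r."""
    return sum(1 for rs in product(range(2, r + 1), repeat=t) if sum(rs) == r)


def allocation_formula(r: int, t: int) -> int:
    """Closed form from the text: give each r_i two, then split r - 2t freely among t parts."""
    return comb(r - t - 1, t - 1)


for r in range(2, 11, 2):
    for t in range(1, r // 2 + 1):
        print(r, t, count_allocations(r, t), allocation_formula(r, t))
```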

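Since $\binom{n}{t} \le n^t/t!$ and $\binom{r-t-1}{t-1} \le 2^{r-t-1}$, we have $f(t) \le h(t)$, where

$$h(t) = \frac{(n\sigma^2)^t}{t!}\, 2^{r-t-1}.$$

The argument uses the moment hypothesis for exponents up to $r$, so we restrict to even $r \le a^2/(4n\sigma^2)$; together with the hypothesis $a \le \sqrt{2}\, n\sigma^2$, this gives $r \le n\sigma^2/2$. For $2 \le t \le r/2$,

$$\frac{h(t)}{h(t-1)} = \frac{n\sigma^2}{2t} \ge 2,$$

since $t \le r/2 \le n\sigma^2/4$. That is, increasing $t$ by one at least doubles $h(t)$. This gives

$$E(x^r) \le r! \sum_{t=1}^{r/2} f(t) \le r!\, h(r/2)\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \le \frac{r!}{(r/2)!}\, 2^{r/2} (n\sigma^2)^{r/2}.$$

Applying Markov’s inequality,

$$\mathrm{Prob}(|x| \ge a) = \mathrm{Prob}(|x|^r \ge a^r) \le \frac{r!\,(n\sigma^2)^{r/2}\, 2^{r/2}}{(r/2)!\, a^r} = g(r).$$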
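The estimates leading to $g(r)$ can be spot-checked numerically. The sketch below assumes $n = 100$ and $\sigma^2 = 1$, uses $\pm 1$ summands (which satisfy the hypotheses) for the exact moments, and compares $f(t)$ with $h(t)$ and the exact $E(x^r)$ with the bound $r!\,2^{r/2}(n\sigma^2)^{r/2}/(r/2)!$ for even $r \le n\sigma^2/2$. The function names mirror the symbols in the text but are otherwise hypothetical.

```python
from math import comb, factorial


def f(t: int, n: int, r: int, sigma2: float) -> float:
    """f(t) = C(n, t) * C(r - t - 1, t - 1) * sigma^(2t), as defined in the proof."""
    return comb(n, t) * comb(r - t - 1, t - 1) * sigma2**t


def h(t: int, n: int, r: int, sigma2: float) -> float:
    """h(t) = (n sigma^2)^t / t! * 2^(r - t - 1), the upper bound on f(t)."""
    return (n * sigma2) ** t / factorial(t) * 2 ** (r - t - 1)


def moment_bound(r: int, n: int, sigma2: float) -> float:
    """r!/(r/2)! * 2^(r/2) * (n sigma^2)^(r/2), the bound on E(x^r) for even r."""
    half = r // 2
    return factorial(r) / factorial(half) * 2**half * (n * sigma2) ** half


def rademacher_sum_moment(r: int, n: int) -> float:
    """Exact E(x^r) for x = x_1 + ... + x_n with independent +/-1 summands."""
    # x = 2K - n with K ~ Binomial(n, 1/2); the numerator is exact integer arithmetic.
    return sum(comb(n, k) * (2 * k - n) ** r for k in range(n + 1)) / 2**n


n, sigma2 = 100, 1.0
for r in (2, 10, 20, 50):
    print(r, rademacher_sum_moment(r, n), moment_bound(r, n, sigma2))
```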

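For even $r$,

$$\frac{g(r)}{g(r-2)} = \frac{4(r-1)n\sigma^2}{a^2},$$

and so $g(r)$ decreases as long as $r - 1 \le a^2/(4n\sigma^2)$. Let $\rho = a^2/(4n\sigma^2)$. If $\rho < 2$, then $3e^{-a^2/(8n\sigma^2)} = 3e^{-\rho/2} > 3/e > 1$ and there is nothing to prove. Otherwise, take $r = 2m$ to be the largest even integer less than or equal to $\rho$, and write $\rho = r + \delta$ with $0 \le \delta < 2$. Since $2n\sigma^2/a^2 = 1/(2\rho)$ and $r!/(r/2)! = 2^m (2m-1)!!$,

$$g(r) = \frac{(2m-1)!!}{\rho^m} < \sqrt{2}\left(\frac{2m}{e\rho}\right)^m = \sqrt{2}\, e^{-m}\left(1 + \frac{\delta}{r}\right)^{-m} \le \sqrt{2}\, e^{-m - \delta/2 + \delta^2/(8m)} \le \sqrt{2e}\; e^{-\rho/2},$$

using $(2m-1)!! = (2m)!/(2^m m!) < \sqrt{2}\,(2m/e)^m$ (from Stirling's formula) and $\ln(1+y) \ge y - y^2/2$ for $y \ge 0$. Since $\sqrt{2e} < 3$ and $\rho/2 = a^2/(8n\sigma^2)$, the tail probability is at most $3e^{-a^2/(8n\sigma^2)}$, proving the theorem.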
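The choice of $r$ in the last step can be sketched as code. Under the illustrative assumptions $n = 100$ and $\sigma^2 = 1$, the hypothetical helper `best_even_r` picks the largest even $r \le a^2/(4n\sigma^2)$, and a short loop compares $g(r)$ with $3e^{-a^2/(8n\sigma^2)}$ for a few values of $a$ in the allowed range.

```python
import math


def g(r: int, a: float, n: int, sigma2: float) -> float:
    """g(r) = r! (n sigma^2)^(r/2) 2^(r/2) / ((r/2)! a^r), written as r!/(r/2)! * (2 n sigma^2 / a^2)^(r/2)."""
    half = r // 2
    return math.factorial(r) / math.factorial(half) * (2 * n * sigma2 / a**2) ** half


def best_even_r(a: float, n: int, sigma2: float):
    """Largest even r <= a^2 / (4 n sigma^2), or None when that is below 2."""
    r = int(a * a / (4 * n * sigma2))
    r -= r % 2
    return r if r >= 2 else None


n, sigma2 = 100, 1.0
for a in (40.0, 80.0, 120.0, 141.0):
    r = best_even_r(a, n, sigma2)
    print(f"a={a:5.1f}  r={r}  g(r)={g(r, a, n, sigma2):.3e}  "
          f"theorem={3 * math.exp(-a * a / (8 * n * sigma2)):.3e}")
```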
