1.2 Two complementary results
1.3 Illustrative cases
2 Legendre-Fenchel dual of the Log-Laplace
3 Loss and noise
4 General exponential families, properties
In this chapter, we focus on arguably one of the oldest and most powerful tools in statistics, namely the log-Laplace transform, or cumulant generating function, of a random variable. We provide below a short list of useful properties of this crucial quantity.
1 Control of probability tails

Let us start with a simple property of non-negative random variables.
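Lemma 1.1 (Non-negative random variables) Let $X$ be an $\mathbb{R}_+$-valued random variable with $\mathbb{E}(X) < \infty$. Then
(Markov inequality) $\mathbb{E}(X) \geq \varepsilon\, \mathbb{P}(X \geq \varepsilon)$ for each $\varepsilon \in \mathbb{R}_+$,
(Fubini formula) $\mathbb{E}(X) = \int_{\mathbb{R}_+} \mathbb{P}(X \geq x)\, d\mu(x)$, where $\mu$ is the Lebesgue measure.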
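Both statements can be checked exactly on a small discrete example. The sketch below assumes an arbitrary four-point distribution (uniform on $\{0, 1, 2, 5\}$) and uses Python's `fractions` module so that the Fubini formula holds with exact equality; the helper names `mean`, `tail_ge` and `tail_integral` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

# Illustrative non-negative distribution, written as {value: probability}.
dist = {Fraction(0): Fraction(1, 4), Fraction(1): Fraction(1, 4),
        Fraction(2): Fraction(1, 4), Fraction(5): Fraction(1, 4)}

def mean(d):
    """E(X) for a finite distribution."""
    return sum((v * q for v, q in d.items()), Fraction(0))

def tail_ge(d, x):
    """P(X >= x) for a finite distribution."""
    return sum((q for v, q in d.items() if v >= x), Fraction(0))

def tail_integral(d):
    """Exact integral of P(X >= x) over x in R_+.

    Between two consecutive points a < b of {0} union support, the integrand
    is constant and equal to P(X >= b); beyond the largest point it is 0.
    """
    points = sorted({Fraction(0)} | set(d))
    return sum(((b - a) * tail_ge(d, b) for a, b in zip(points, points[1:])), Fraction(0))

# Markov inequality on the grid of thresholds eps = 0, 1/2, ..., 6.
markov_ok = all(mean(dist) >= eps * tail_ge(dist, eps)
                for eps in (Fraction(k, 2) for k in range(13)))

print(mean(dist), tail_integral(dist), markov_ok)  # prints: 2 2 True
```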
1.1 A first consequence
We can apply this result immediately to real-valued random variables by remarking that, for any random variable distributed according to $\nu$ (which we denote $X \sim \nu$) and any $\lambda \in \mathbb{R}$, the random variable $\exp(\lambda X)$ is non-negative. Thus, if we define the domain of $\nu$ by $\mathcal{D}_\nu = \{\lambda : \mathbb{E}[\exp(\lambda X)] < \infty\}$, we deduce by applying Markov's inequality that for all $t > 0$,
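$$\forall \lambda \in \mathbb{R}_+^\star \cap \mathcal{D}_\nu, \quad \mathbb{P}(X \geq t) = \mathbb{P}\big(\exp(\lambda X) \geq \exp(\lambda t)\big) \leq \exp(-\lambda t)\, \mathbb{E}[\exp(\lambda X)], \tag{1.1}$$
$$\forall \lambda \in \mathbb{R}_-^\star \cap \mathcal{D}_\nu, \quad \mathbb{P}(X \leq t) = \mathbb{P}\big(\exp(\lambda X) \geq \exp(\lambda t)\big) \leq \exp(-\lambda t)\, \mathbb{E}[\exp(\lambda X)]. \tag{1.2}$$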
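To see how the choice of $\lambda$ in (1.1) matters, here is a minimal sketch assuming $X \sim \mathcal{N}(0, 1)$, for which $\mathbb{E}[\exp(\lambda X)] = \exp(\lambda^2/2)$ and the exact tail is given by `math.erfc`. Every $\lambda > 0$ yields a valid bound; the best one, $\exp(-t^2/2)$, is reached at $\lambda = t$. The grid of $\lambda$ values and the function names are illustrative choices.

```python
import math

def gaussian_tail(t: float) -> float:
    """Exact P(X >= t) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def chernoff_bound(lam: float, t: float) -> float:
    """Right-hand side of (1.1) for X ~ N(0, 1), using E[exp(lam X)] = exp(lam^2 / 2)."""
    return math.exp(-lam * t + 0.5 * lam * lam)

lambdas = [0.01 * k for k in range(1, 501)]  # grid of lambda values in (0, 5]
results = {}
for t in (0.5, 1.0, 2.0, 3.0):
    best = min(chernoff_bound(lam, t) for lam in lambdas)
    # Store (exact tail, best bound over the grid, optimal bound exp(-t^2 / 2)).
    results[t] = (gaussian_tail(t), best, math.exp(-0.5 * t * t))
    print(f"t={t}: true tail={results[t][0]:.4g}, best bound on grid={best:.4g}")
```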
A first immediate consequence is the following:
Part I Chapter 1. log E exp(λX)
Lemma 1.2 (Chernoff's rule) Let $X \sim \nu$ be a real-valued random variable. Then $\log \mathbb{E}\exp(X) \leq 0$ implies $\forall \delta \in (0, 1],\ \mathbb{P}\big(X \geq \ln(1/\delta)\big) \leq \delta$.
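A classical situation where $\log \mathbb{E}\exp(X) = 0$ holds is a log-likelihood ratio: if $Y \sim p$ and $X = \log(q(Y)/p(Y))$ for another distribution $q$, then $\mathbb{E}[\exp(X)] = \sum_y q(y) = 1$. The sketch below checks Lemma 1.2 on such a case; the two four-point distributions `p` and `q` are hypothetical choices (dyadic values, to limit rounding).

```python
import math

# Hypothetical example: Y ~ p, and X = log(q(Y) / p(Y)).
# In exact arithmetic E[exp(X)] = sum_y q(y) = 1, so log E exp(X) = 0.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.125, 0.125, 0.25, 0.5]
x_values = [math.log(qy / py) for py, qy in zip(p, q)]

log_mgf_at_1 = math.log(sum(py * math.exp(x) for py, x in zip(p, x_values)))

def tail(threshold: float) -> float:
    """P(X >= threshold) when Y ~ p."""
    return sum(py for py, x in zip(p, x_values) if x >= threshold)

# Chernoff's rule predicts P(X >= ln(1/delta)) <= delta for every delta in (0, 1].
checks = {delta: tail(math.log(1.0 / delta)) for delta in (1.0, 0.5, 0.25, 0.1, 0.01)}
for delta, prob in checks.items():
    print(f"delta={delta}: P(X >= ln(1/delta)) = {prob}")
```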
The proof is immediate by taking $t = \ln(1/\delta)$ and $\lambda = 1$ in (1.1), which is allowed since $\mathbb{E}\exp(X) \leq 1$ ensures $1 \in \mathcal{D}_\nu$. More generally, one can obtain a control of the tail of a random variable $X$ from a control of its log-Laplace transform. The following result shows that, conversely, a control of the tails of $X$ induces a control of the log-Laplace transform:
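Lemma 1.3 (Tails and log-Laplace) Let $X$ be an $\mathbb{R}$-valued random variable. Then
$$\exists \lambda \in \mathbb{R}_+ : \log \mathbb{E}\exp(\lambda X) \leq \varphi(\lambda) \implies \forall t \in \mathbb{R},\ \mathbb{P}(X \geq t) \leq \exp\big(\varphi(\lambda) - \lambda t\big),$$
$$\forall t \in \mathbb{R},\ \mathbb{P}(X \geq t) \leq \alpha(t) \implies \forall \lambda \in \mathbb{R}_+^\star,\ \log \mathbb{E}\exp(\lambda X) \leq \log \int_{\mathbb{R}} \alpha(u/\lambda)\, e^{u}\, d\mu(u).$$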
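Proof: Indeed, for any $\lambda > 0$,
$$\mathbb{P}(X \geq t) = \mathbb{P}\big(\exp(\lambda X) \geq \exp(\lambda t)\big) \leq \mathbb{E}[\exp(\lambda X)] \exp(-\lambda t) \leq \exp\big(\varphi(\lambda) - \lambda t\big),$$
where we applied Markov's inequality to the $\mathbb{R}_+$-valued random variable $Z = \exp(\lambda X)$ (the case $\lambda = 0$ is trivial, since then the right-hand side is at least $1$).
For the reverse inequality, we apply the Fubini formula to $Z$ and conclude with the change of variable $x = e^{u}$:
$$\log \mathbb{E}\exp(\lambda X) = \log \int_{\mathbb{R}_+} \mathbb{P}\big(\exp(\lambda X) \geq x\big)\, d\mu(x) = \log \int_{\mathbb{R}_+} \mathbb{P}\big(X \geq \log(x)/\lambda\big)\, d\mu(x) = \log \int_{\mathbb{R}} \mathbb{P}(X \geq u/\lambda)\, e^{u}\, d\mu(u) \leq \log \int_{\mathbb{R}} \alpha(u/\lambda)\, e^{u}\, d\mu(u).$$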
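As a worked sanity check of the second implication (an illustration, not part of the original argument), assume $X$ is exponential with parameter $1$ and take for $\alpha$ its exact tail. For $0 < \lambda < 1$,

$$\alpha(t) = \mathbb{P}(X \geq t) = \min(1, e^{-t}), \qquad \mathbb{E}\exp(\lambda X) = \frac{1}{1 - \lambda},$$
$$\int_{\mathbb{R}} \alpha(u/\lambda)\, e^{u}\, d\mu(u) = \int_{-\infty}^{0} e^{u}\, du + \int_{0}^{\infty} e^{-u(1/\lambda - 1)}\, du = 1 + \frac{\lambda}{1 - \lambda} = \frac{1}{1 - \lambda}.$$

So the bound of Lemma 1.3 is an equality when $\alpha$ is the exact tail, and for $\lambda \geq 1$ both sides are infinite.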
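Why the logarithm? In these derivations, the exponential transform may seem arbitrary, and one could indeed use more general transforms. Lemma 1.3 is stated using the log and exp functions, but there is nothing very specific about the function log here. Indeed, let $(f, \bar f)$ be any pair of functions such that $f : \mathbb{R} \to \mathbb{R}_+^\star$ is increasing with $f(\mathbb{R}) = \mathbb{R}_+^\star$, and such that $f \circ \bar f$ and $\bar f \circ f$ are identity mappings. Then
$$\exists \lambda \in \mathbb{R}_+ : \bar f\big(\mathbb{E} f(\lambda X)\big) \leq \varphi(\lambda) \implies \forall t \in \mathbb{R},\ \mathbb{P}(X \geq t) \leq \frac{f(\varphi(\lambda))}{f(\lambda t)}.$$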
However, using the pair (log, exp) makes the quantity $\lambda t - \varphi(\lambda)$ appear, which, when optimized over $\lambda$, corresponds to the Legendre-Fenchel dual of $\varphi$, another powerful mathematical tool.
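The generalized bound with a pair $(f, \bar f)$ is again Markov's inequality, applied to $f(\lambda X)$. A minimal sketch, assuming the non-exponential choice $f(x) = x + \sqrt{x^2 + 1}$ (an increasing bijection from $\mathbb{R}$ onto $\mathbb{R}_+^\star$, equal to $\exp(\operatorname{asinh}(x))$) with inverse $\bar f(y) = (y - 1/y)/2$, and taking $\varphi(\lambda) = \bar f(\mathbb{E} f(\lambda X))$ for a hypothetical $X$ uniform on $\{-2, \dots, 5\}$:

```python
import math

def f(x: float) -> float:
    """Increasing bijection from R onto (0, inf); it equals exp(asinh(x))."""
    return x + math.sqrt(x * x + 1.0)

def f_bar(y: float) -> float:
    """Inverse of f on (0, inf); it equals sinh(log(y))."""
    return (y - 1.0 / y) / 2.0

# Hypothetical example: X uniform on {-2, ..., 5}.
support = list(range(-2, 6))
prob = [1.0 / len(support)] * len(support)

def tail(t: float) -> float:
    """P(X >= t)."""
    return sum(w for x, w in zip(support, prob) if x >= t)

def generalized_bound(lam: float, t: float) -> float:
    """f(phi(lam)) / f(lam * t), with phi(lam) taken equal to f_bar(E f(lam X))."""
    phi = f_bar(sum(w * f(lam * x) for x, w in zip(support, prob)))
    return f(phi) / f(lam * t)

violations = [(lam, t) for lam in (0.5, 1.0, 2.0) for t in range(-3, 7)
              if tail(t) > generalized_bound(lam, t)]
print("violations:", violations)
```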
Another natural explanation is that logarithms are well suited to product measures. Suppose we have two measures $p_1, p_2$, and form the product measure $p = p_1 \otimes p_2$. Since we are generally happier
Chapter 1 1. Control of probability tails
with summing things, let us look for a function $h$ such that $h(p_1 \otimes p_2) = h(p_1) + h(p_2)$ (additivity). It turns out that there are not many choices. Indeed, let us first note that every nonzero continuous morphism from $(\mathbb{R}_+^\star, \times)$ to $(\mathbb{R}, +)$ is a logarithm, that is, of the form $x \mapsto c \log x$ for some $c \neq 0$. Such a morphism necessarily maps the neutral element $e_\times = 1$ to the neutral element $e_+ = 0$, and the normalization $c = 1$ gives the natural logarithm $\log$. Now we want to build a function acting on probability measures, not just on $\mathbb{R}_+$. A natural way to do so is to combine the measures $p(S) \in \mathbb{R}$ of Borel sets $S$. Hence, given two Borel sets $S_1$ and $S_2$ and measures $p_1, p_2$, we may want a function $f$ such that $f(p_1(S_1)p_2(S_2)) = f(p_1(S_1)) + f(p_2(S_2))$. While a logarithm works, the remaining dependency on $S_1, S_2$ is not desirable. It can be removed by replacing evaluation at a Borel set with integration over the space. For illustration, let us assume that $p_1$ and $p_2$ are probability measures on a discrete set $\mathcal{X}$. Discrete integration (summing) over $\mathcal{X}$ reveals, using the fact that $p_1(\mathcal{X}) = p_2(\mathcal{X}) = 1$, that
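$$\begin{aligned}
\sum_{(i,j) \in \mathcal{X}^2} p_1(i)p_2(j) \log\big(p_1(i)p_2(j)\big) &= \sum_{i,j} p_1(i)p_2(j) \log p_1(i) + \sum_{i,j} p_1(i)p_2(j) \log p_2(j) \\
&= \sum_{i} p_1(i) \log p_1(i) + \sum_{j} p_2(j) \log p_2(j).
\end{aligned}$$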
Hence $p \mapsto \sum_i p(i) \log p(i)$ is a good candidate for $h$. Changing the sign gives the entropy function $H(p) = -\sum_i p(i) \log p(i)$ (any multiplicative factor works as well). It turns out that not much more is required to determine $H$ uniquely. Indeed, additivity for any $p_1, p_2$, together with the assumption that, when $\mathcal{X}$ is discrete, $h(p) = \sum_{x \in \mathcal{X}} g(p(x))$ for some measurable $g$ vanishing at $0$, is enough to ensure uniqueness of $H$ up to a constant factor, see Daróczy (1971), Csiszár (2008).
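The additivity computation can be checked numerically. The sketch below uses two arbitrary probability vectors and, for contrast, the sum-form functional $\sum_i p(i)^2$ (that is, $g(x) = x^2$, an illustrative choice not taken from the text), which is multiplicative rather than additive on product measures.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p(i) log p(i), with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def collision(p):
    """Sum-form functional sum_i p(i)^2 (g(x) = x^2), used for contrast."""
    return sum(x * x for x in p)

def product(p1, p2):
    """Probability vector of the product measure p1 (x) p2 on X^2."""
    return [a * b for a in p1 for b in p2]

p1 = [0.5, 0.25, 0.25]
p2 = [0.1, 0.6, 0.3]
p = product(p1, p2)

# Entropy is additive on product measures ...
print(entropy(p), entropy(p1) + entropy(p2))
# ... whereas sum_i p(i)^2 is multiplicative, not additive.
print(collision(p), collision(p1) * collision(p2), collision(p1) + collision(p2))
```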
1.2 Two complementary results
One can now consider two complementary points of view. The first one is to fix the value of $t$ in (1.1) and (1.2) and minimize the probability level (the right-hand side of the inequality) over $\lambda$. The second one is to fix the probability level and optimize the value of $t$. This leads to the following lemmas.
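Lemma 1.4 (Cramér-Chernoff) Let $X \sim \nu$ be a real-valued random variable. Let us introduce the log-Laplace transform and its Legendre transform:
$$\forall \lambda \in \mathbb{R},\ \varphi_\nu(\lambda) = \log \mathbb{E}[\exp(\lambda X)], \qquad \forall t \in \mathbb{R},\ \varphi_\nu^\star(t) = \sup_{\lambda \in \mathbb{R}} \big(\lambda t - \varphi_\nu(\lambda)\big),$$
and let $\mathcal{D}_\nu = \{\lambda \in \mathbb{R} : \varphi_\nu(\lambda) < \infty\}$. If $\mathcal{D}_\nu \cap \mathbb{R}_+^\star \neq \emptyset$, then $\mathbb{E}[X] < \infty$ and for all $t \geq \mathbb{E}[X]$,
$$\log \mathbb{P}(X \geq t) \leq -\varphi_\nu^\star(t).$$
Likewise, if $\mathcal{D}_\nu \cap \mathbb{R}_-^\star \neq \emptyset$, then $\mathbb{E}[X] > -\infty$ and for all $t \leq \mathbb{E}[X]$,
$$\log \mathbb{P}(X \leq t) \leq -\varphi_\nu^\star(t).$$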
Remark 1.1 The log-Laplace transform $\varphi_\nu$ is also known as the cumulant generating function.
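Lemma 1.4 can be illustrated on a sum $S = X_1 + \dots + X_n$ of i.i.d. Bernoulli($p$) variables. Then $\varphi_S = n\varphi$, where $\varphi(\lambda) = \log(1 - p + p e^{\lambda})$, so $\varphi_S^\star(k) = n \varphi^\star(k/n)$, and the Legendre transform of the Bernoulli log-Laplace has the well-known closed form $\varphi^\star(a) = a \log(a/p) + (1 - a)\log((1 - a)/(1 - p))$ for $a \in (0, 1)$. The sketch below computes $\varphi^\star$ numerically by a ternary search (an illustrative choice, relying on concavity of $\lambda \mapsto \lambda t - \varphi(\lambda)$), compares it with the closed form, and checks the bound against the exact binomial tail; the values $n = 100$, $p = 0.3$ and the search interval $[0, 50]$ are assumptions.

```python
import math

def log_laplace_bernoulli(lam: float, p: float) -> float:
    """phi(lam) = log E[exp(lam X)] for X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(lam))

def legendre(phi, t: float, lo: float = 0.0, hi: float = 50.0, iters: int = 200) -> float:
    """Numerical sup over lam in [lo, hi] of lam * t - phi(lam).

    The objective is concave because phi is convex, so a ternary search converges.
    """
    objective = lambda lam: lam * t - phi(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def kl_bernoulli(a: float, p: float) -> float:
    """Closed form of the Legendre transform of the Bernoulli(p) log-Laplace at a in (0, 1)."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Exact P(S >= k) for S ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

n, p = 100, 0.3
rows = []
for k in (35, 40, 50):  # thresholds k >= E[S] = n p = 30
    # phi_S^*(k) = n * phi^*(k / n) for a sum of n i.i.d. copies.
    rate = n * legendre(lambda lam: log_laplace_bernoulli(lam, p), k / n)
    rows.append((k, binomial_upper_tail(n, p, k), math.exp(-rate), n * kl_bernoulli(k / n, p)))
for k, exact, bound, _ in rows:
    print(f"k={k}: P(S >= k) = {exact:.3e} <= exp(-phi*(k)) = {bound:.3e}")
```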
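Proof of Lemma 1.4:
First, note that $\{\lambda \in \mathbb{R} : \mathbb{E}[\exp(\lambda X)] < \infty\}$ coincides with $\{\lambda \in \mathbb{R} : \varphi_\nu(\lambda) < \infty\}$. Using equations (1.1) and (1.2), it holds:
$$\mathbb{P}(X \geq t) \leq \inf_{\lambda \in \mathbb{R}_+^\star \cap \mathcal{D}_\nu} \exp\big(-\lambda t + \log \mathbb{E}[\exp(\lambda X)]\big), \qquad \mathbb{P}(X \leq t) \leq \inf_{\lambda \in \mathbb{R}_-^\star \cap \mathcal{D}_\nu} \exp\big(-\lambda t + \log \mathbb{E}[\exp(\lambda X)]\big).$$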
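The Legendre transform $\varphi_\nu^\star$ of the log-Laplace function $\varphi_\nu$ unifies these two cases. Indeed, a striking property of $\varphi_\nu$ is that if $\lambda \in \mathcal{D}_\nu$ for some $\lambda > 0$, then $\mathbb{E}[X] < \infty$. This can be seen by Jensen's inequality applied to the concave function $\ln$: it holds $\lambda \mathbb{E}[X] = \mathbb{E}[\ln \exp(\lambda X)] \leq \varphi_\nu(\lambda)$. Further, for all $t \geq \mathbb{E}[X]$, it holds
$$\varphi_\nu^\star(t) = \sup_{\lambda \in \mathbb{R}_+ \cap \mathcal{D}_\nu} \big(\lambda t - \varphi_\nu(\lambda)\big),$$
since for $\lambda < 0$ the same Jensen argument gives $\lambda t - \varphi_\nu(\lambda) \leq \lambda(t - \mathbb{E}[X]) \leq 0 = 0 \cdot t - \varphi_\nu(0)$. Note that this also applies if $\mathbb{E}[X] = -\infty$. Likewise, if $\lambda \in \mathcal{D}_\nu$ for some $\lambda < 0$, then $\mathbb{E}[X] > -\infty$ and for all $t \leq \mathbb{E}[X]$, it holds
$$\varphi_\nu^\star(t) = \sup_{\lambda \in \mathbb{R}_- \cap \mathcal{D}_\nu} \big(\lambda t - \varphi_\nu(\lambda)\big).$$
Combined with the two bounds above (the value $\lambda = 0$ only adds the trivial bound $\log \mathbb{P}(\cdot) \leq 0$), this yields the claim.

Alternatively, the second point of view is to fix the confidence level $\delta \in (0, 1]$, and then to solve the equation $\exp(-\lambda t)\mathbb{E}[\exp(\lambda X)] = \delta$ in $t = t(\delta, \lambda)$. We then optimize over $\lambda$. This leads to: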
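Lemma 1.5 (Alternative Cramér-Chernoff) Let $X \sim \nu$ be a real-valued random variable and let $\mathcal{D}_\nu = \{\lambda \in \mathbb{R} : \log \mathbb{E}\exp(\lambda X) < \infty\}$. Then, for all $\delta \in (0, 1]$,
$$\mathbb{P}\Big(X \geq \inf_{\lambda \in \mathcal{D}_\nu \cap \mathbb{R}_+^\star} \Big\{\frac{1}{\lambda}\log \mathbb{E}[\exp(\lambda X)] + \frac{\log(1/\delta)}{\lambda}\Big\}\Big) \leq \delta,$$
$$\mathbb{P}\Big(X \leq \sup_{\lambda \in (-\mathcal{D}_\nu) \cap \mathbb{R}_+^\star} \Big\{-\frac{1}{\lambda}\log \mathbb{E}[\exp(-\lambda X)] - \frac{\log(1/\delta)}{\lambda}\Big\}\Big) \leq \delta.$$
Proof of Lemma 1.5: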
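Solving $\exp(-\lambda t)\mathbb{E}[\exp(\lambda X)] = \delta$ for $\delta \in (0, 1]$ and $\lambda \neq 0$, we obtain the following equivalences:
$$-\lambda t + \log \mathbb{E}[\exp(\lambda X)] = \log(\delta) \iff \lambda t = -\log(\delta) + \log \mathbb{E}[\exp(\lambda X)] \iff t = \frac{1}{\lambda}\log(1/\delta) + \frac{1}{\lambda}\log \mathbb{E}[\exp(\lambda X)].$$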
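Thus, we deduce from (1.1) and (1.2) that
$$\forall \lambda > 0, \quad \mathbb{P}\Big(X \geq \frac{1}{\lambda}\log(1/\delta) + \frac{1}{\lambda}\log \mathbb{E}[\exp(\lambda X)]\Big) \leq \delta,$$
$$\forall \lambda > 0, \quad \mathbb{P}\Big(X \leq -\frac{1}{\lambda}\log(1/\delta) - \frac{1}{\lambda}\log \mathbb{E}[\exp(-\lambda X)]\Big) \leq \delta.$$
Optimizing these thresholds over $\lambda$ yields the claim.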
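For a centered Gaussian $X \sim \mathcal{N}(0, \sigma^2)$, $\log \mathbb{E}[\exp(\lambda X)] = \lambda^2\sigma^2/2$, and the infimum in Lemma 1.5 is $\inf_{\lambda > 0} \{\lambda\sigma^2/2 + \log(1/\delta)/\lambda\} = \sigma\sqrt{2\log(1/\delta)}$. The sketch below recovers this threshold numerically, via a ternary search (an illustrative choice, justified because the objective is quasi-convex in $\lambda > 0$ when $\varphi$ is convex, and here even strictly convex), and checks the exact Gaussian tail at it. The value $\sigma = 2$, the search interval and the function name are assumptions.

```python
import math

def alt_chernoff_threshold(phi, delta: float, lo: float = 1e-6, hi: float = 100.0,
                           iters: int = 300) -> float:
    """Numerical inf over lam in [lo, hi] of (phi(lam) + log(1/delta)) / lam, as in Lemma 1.5."""
    c = math.log(1.0 / delta)
    objective = lambda lam: (phi(lam) + c) / lam
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) > objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

sigma = 2.0
phi_gauss = lambda lam: 0.5 * (lam * sigma) ** 2  # log E exp(lam X) for X ~ N(0, sigma^2)

rows = []
for delta in (0.1, 0.01, 0.001):
    t = alt_chernoff_threshold(phi_gauss, delta)
    closed_form = sigma * math.sqrt(2.0 * math.log(1.0 / delta))
    exact_tail = 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))  # P(X >= t)
    rows.append((delta, t, closed_form, exact_tail))
    print(f"delta={delta}: t(delta)={t:.6f} (closed form {closed_form:.6f}), P(X >= t)={exact_tail:.2e}")
```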
The rescaled log-Laplace transform $\lambda \mapsto \frac{1}{\lambda}\log \mathbb{E}[\exp(\lambda X)]$ is sometimes called the entropic risk measure. Note that Lemmas 1.4 and 1.5 involve slightly different quantities, depending on whether we focus on the probability level $\delta$ or on the threshold on $X$.
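For a finite distribution, the rescaled log-Laplace transform is nondecreasing in $\lambda > 0$ (it is the logarithm of a power mean of $e^{X}$), tends to $\mathbb{E}[X]$ as $\lambda \to 0$ and to the largest value of $X$ as $\lambda \to \infty$. A minimal sketch on a hypothetical four-point distribution, computed with the log-sum-exp trick for numerical stability:

```python
import math

def entropic_risk(values, probs, lam: float) -> float:
    """(1 / lam) * log E[exp(lam X)] for a finite distribution and lam > 0.

    Uses the log-sum-exp trick: the largest exponent m is factored out before exponentiating.
    """
    m = max(lam * v for v in values)
    s = sum(q * math.exp(lam * v - m) for v, q in zip(values, probs))
    return (m + math.log(s)) / lam

# Hypothetical distribution with mean 0.3 and largest value 3.
values = [-1.0, 0.0, 2.0, 3.0]
probs = [0.4, 0.3, 0.2, 0.1]
mean = sum(v * q for v, q in zip(values, probs))

lams = [0.001, 0.1, 1.0, 10.0, 100.0]
risks = [entropic_risk(values, probs, lam) for lam in lams]
for lam, r in zip(lams, risks):
    print(f"lambda={lam}: entropic risk = {r:.4f}")
```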