2. Chapter II Theoretical Framework
2.4. Conceptual framework
tion with mean 0 and finite variance if and only if its ch.f. has
\[ \log\varphi(t) = \int (e^{itx} - 1 - itx)\, x^{-2}\, \nu(dx) \]
Here the integrand is −t²/2 at 0, ν is called the canonical measure, and var(Z) = ν(R).
To explain the formula, note that if Zλ has a Poisson distribution with mean λ, then
\[ E\exp(itx(Z_\lambda - \lambda)) = \exp(\lambda(e^{itx} - 1 - itx)) \]
so the measure for Z = x(Zλ − λ) has ν({x}) = λx².
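The Poisson identity above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the truncation at 200 terms, and the sample values of t, x and λ are all illustrative assumptions.

```python
import cmath
import math

def centered_poisson_chf(t, x, lam, terms=200):
    """E exp(i t x (Z - lam)) for Z ~ Poisson(lam), summed directly from the pmf."""
    total = 0j
    pmf = math.exp(-lam)              # P(Z = 0)
    for k in range(terms):
        total += pmf * cmath.exp(1j * t * x * (k - lam))
        pmf *= lam / (k + 1)          # P(Z = k + 1) obtained from P(Z = k)
    return total

def closed_form(t, x, lam):
    """exp(lam (e^{itx} - 1 - itx)), the right-hand side of the identity."""
    return cmath.exp(lam * (cmath.exp(1j * t * x) - 1 - 1j * t * x))

# Illustrative parameter values (assumed, not from the text).
lhs = centered_poisson_chf(t=0.7, x=1.5, lam=2.0)
rhs = closed_form(t=0.7, x=1.5, lam=2.0)
print(abs(lhs - rhs))                 # should be at the level of rounding error
```

The truncated series and the closed form should agree up to floating-point rounding, since the Poisson tail beyond 200 terms is negligible for λ = 2.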
3.9
Limit Theorems in R^d
Let X = (X1, . . . , Xd) be a random vector. We define its distribution function by F(x) = P(X ≤ x). Here x ∈ R^d, and X ≤ x means Xi ≤ xi for i = 1, . . . , d. As in one dimension, F has three obvious properties:
(i) It is nondecreasing, i.e., if x ≤ y then F(x) ≤ F(y).
(ii) lim_{x→∞} F(x) = 1, lim_{xi→−∞} F(x) = 0.
(iii) F is right continuous, i.e., lim_{y↓x} F(y) = F(x).
Here x → ∞ means each coordinate xi goes to ∞, xi → −∞ means we let xi → −∞ keeping the other coordinates fixed, and y ↓ x means each coordinate yi ↓ xi.
As discussed in Section 1.1, an additional condition is needed to guarantee that F is the distribution function of a probability measure. Let
\[ A = (a_1, b_1] \times \cdots \times (a_d, b_d], \qquad V = \{a_1, b_1\} \times \cdots \times \{a_d, b_d\} \]
so that V is the set of vertices of the rectangle A. If v ∈ V, let sgn(v) = (−1)^{# of a's in v}. The inclusion-exclusion formula implies
\[ P(X \in A) = \sum_{v \in V} \operatorname{sgn}(v)\, F(v) \]
So if we use ∆_A F to denote the right-hand side, we need
(iv) ∆_A F ≥ 0 for all rectangles A.
The last condition guarantees that the measure assigned to each rectangle is ≥ 0. At this point we have defined the measure on the semialgebra S_d defined in Example 1.1.3. Theorem 1.1.6 now implies that there is a unique probability measure with distribution F.
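The inclusion-exclusion sum ∆_A F can be computed directly by looping over the 2^d vertices. The sketch below is illustrative only: `delta_A`, `uniform_cdf` and `bad_F` are hypothetical names, and the two test functions are assumed examples (independent uniform coordinates, and a two-dimensional function that is nondecreasing and right continuous yet violates (iv)).

```python
from itertools import product

def delta_A(F, a, b):
    """Sum of sgn(v) F(v) over the vertices v of (a1, b1] x ... x (ad, bd]."""
    d = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=d):      # 0 picks a_i, 1 picks b_i
        v = [b[i] if choice[i] else a[i] for i in range(d)]
        num_a = d - sum(choice)                   # number of a's in v
        total += (-1) ** num_a * F(v)
    return total

def uniform_cdf(x):
    """Distribution function of independent Uniform(0,1) coordinates (assumed example)."""
    p = 1.0
    for xi in x:
        p *= min(max(xi, 0.0), 1.0)
    return p

def bad_F(x):
    """Nondecreasing and right continuous on R^2, but fails condition (iv) (assumed example)."""
    x1, x2 = x
    if x1 >= 1 and x2 >= 1:
        return 1.0
    if (x1 >= 1 and 0 <= x2 < 1) or (x2 >= 1 and 0 <= x1 < 1):
        return 2.0 / 3.0
    return 0.0

good = delta_A(uniform_cdf, (0.2, 0.1), (0.5, 0.7))   # area of the rectangle
bad = delta_A(bad_F, (0.0, 0.0), (1.0, 1.0))          # 1 - 2/3 - 2/3 + 0
print(good, bad)
```

For the uniform case the sum recovers the area (0.5 − 0.2)(0.7 − 0.1) of the rectangle. For `bad_F` the sum is negative, so properties (i)–(iii) alone do not force (iv).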
Exercise 3.9.1. If F is the distribution of (X1, . . . , Xd) then Fi(x) = P(Xi ≤ x) are its marginal distributions. How can they be obtained from F?
Exercise 3.9.2. Let F1, . . . , Fd be distributions on R. Show that for any α ∈ [−1, 1]
\[ F(x_1, \ldots, x_d) = \Bigl\{ 1 + \alpha \prod_{i=1}^{d} (1 - F_i(x_i)) \Bigr\} \prod_{j=1}^{d} F_j(x_j) \]
is a d.f. with the given marginals. The case α = 0 corresponds to independent r.v.'s.
Exercise 3.9.3. A distribution F is said to have a density f if
\[ F(x_1, \ldots, x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(y)\, dy_k \cdots dy_1 \]
Show that if f is continuous, ∂^k F/∂x1 · · · ∂xk = f.
If Fn and F are distribution functions on R^d, we say that Fn converges weakly to F, and write Fn ⇒ F, if Fn(x) → F(x) at all continuity points of F. Our first task is to show that there are enough continuity points for this to be a sensible definition. For a concrete example, consider
\[ F(x, y) = \begin{cases} 1 & \text{if } x \ge 0,\ y \ge 1 \\ y & \text{if } x \ge 0,\ 0 \le y < 1 \\ 0 & \text{otherwise} \end{cases} \]
F is the distribution function of (0, Y) where Y is uniform on (0, 1). Notice that this distribution has no atoms, but F is discontinuous at (0, y) when y > 0.
Keeping the last example in mind, observe that if xn < x, i.e., x_{n,i} < xi for all coordinates i, and xn ↑ x as n → ∞ then
\[ F(x) - F(x_n) = P(X \le x) - P(X \le x_n) \downarrow P(X \le x) - P(X < x) \]
In d = 2, the last expression is the probability X lies in
\[ \{(a, x_2) : a \le x_1\} \cup \{(x_1, b) : b \le x_2\} \]
Let H^i_c = {x : xi = c} be the hyperplane where the ith coordinate is c. For each i, the H^i_c are disjoint so D^i = {c : P(X ∈ H^i_c) > 0} is at most countable. It is easy to see that if x has xi ∉ D^i for all i then F is continuous at x. This gives us more than enough points to reconstruct F.
As in Section 3.2, it will be useful to have several equivalent definitions of weak convergence. In Chapter 8, we will need to know that this is valid for an arbitrary metric space (S, ρ), so we will prove the result in that generality and insert another equivalence that will be useful there. f is said to be Lipschitz continuous if there is a constant C so that |f(x) − f(y)| ≤ Cρ(x, y).
Theorem 3.9.1. The following statements are equivalent to Xn ⇒ X∞.
(i) Ef(Xn) → Ef(X∞) for all bounded continuous f.
(ii) Ef(Xn) → Ef(X∞) for all bounded Lipschitz continuous f.
(iii) For all closed sets K, lim sup_{n→∞} P(Xn ∈ K) ≤ P(X∞ ∈ K).
(iv) For all open sets G, lim inf_{n→∞} P(Xn ∈ G) ≥ P(X∞ ∈ G).
(v) For all sets A with P(X∞ ∈ ∂A) = 0, lim_{n→∞} P(Xn ∈ A) = P(X∞ ∈ A).
(vi) Let D_f = the set of discontinuities of f. For all bounded functions f with P(X∞ ∈ D_f) = 0, we have Ef(Xn) → Ef(X∞).
Proof. We will begin by showing that (i)–(vi) are equivalent.
(i) implies (ii): Trivial.
(ii) implies (iii): Let ρ(x, K) = inf{ρ(x, y) : y ∈ K}, ϕj(r) = (1 − jr)^+, and fj(x) = ϕj(ρ(x, K)). fj is Lipschitz continuous, has values in [0, 1], and ↓ 1_K(x) as j ↑ ∞. So
\[ \limsup_{n\to\infty} P(X_n \in K) \le \lim_{n\to\infty} E f_j(X_n) = E f_j(X_\infty) \downarrow P(X_\infty \in K) \quad \text{as } j \uparrow \infty \]
(iii) is equivalent to (iv): As in the proof of Theorem 3.2.5, this follows easily from two facts: A is open if and only if A^c is closed; P(A) + P(A^c) = 1.
(iii) and (iv) imply (v): Let K = Ā, G = A^o, and reason as in the proof of Theorem 3.2.5.
(v) implies (vi): Suppose |f(x)| ≤ K and pick α0 < α1 < . . . < αℓ so that P(f(X∞) = αi) = 0 for 0 ≤ i ≤ ℓ, α0 < −K < K < αℓ, and αi − αi−1 < ε. This is always possible since {α : P(f(X∞) = α) > 0} is a countable set. Let Ai = {x : αi−1 < f(x) ≤ αi}. ∂Ai ⊂ {x : f(x) ∈ {αi−1, αi}} ∪ D_f, so P(X∞ ∈ ∂Ai) = 0, and it follows from (v) that
\[ \sum_{i=1}^{\ell} \alpha_i P(X_n \in A_i) \to \sum_{i=1}^{\ell} \alpha_i P(X_\infty \in A_i) \]
The definition of the αi implies
\[ 0 \le \sum_{i=1}^{\ell} \alpha_i P(X_n \in A_i) - E f(X_n) \le \epsilon \quad \text{for } 1 \le n \le \infty \]
Since ε is arbitrary, it follows that Ef(Xn) → Ef(X∞).
(vi) implies (i): Trivial.
It remains to show that the six conditions are equivalent to weak convergence (⇒).
(v) implies (⇒): If F is continuous at x, then A = (−∞, x1] × . . . × (−∞, xd] has µ(∂A) = 0, so Fn(x) = P(Xn ∈ A) → P(X∞ ∈ A) = F(x).
(⇒) implies (iv): Let D^i = {c : P(X∞ ∈ H^i_c) > 0} where H^i_c = {x : xi = c}. We say a rectangle A = (a1, b1] × . . . × (ad, bd] is good if ai, bi ∉ D^i for all i. (⇒) implies that for all good rectangles P(Xn ∈ A) → P(X∞ ∈ A). This is also true for B that are a finite disjoint union of good rectangles. Now any open set G is an increasing limit of Bk's that are a finite disjoint union of good rectangles, so
\[ \liminf_{n\to\infty} P(X_n \in G) \ge \liminf_{n\to\infty} P(X_n \in B_k) = P(X_\infty \in B_k) \uparrow P(X_\infty \in G) \]
as k → ∞. The proof is complete.
Remark. In Section 3.2, we proved that (i)–(v) are consequences of weak convergence by constructing r.v.'s with the given distributions so that Xn → X∞ a.s. This can be done in R^d (or any complete separable metric space), but the construction is rather messy. See Billingsley (1979), p. 337–340 for a proof in R^d.
Exercise 3.9.4. Let Xn be random vectors. Show that if Xn ⇒ X then the coordinates X_{n,i} ⇒ Xi.
A sequence of probability measures µn is said to be tight if for any ε > 0, there is an M so that lim inf_{n→∞} µn([−M, M]^d) ≥ 1 − ε.
Theorem 3.9.2. If µn is tight, then there is a weakly convergent subsequence.
Proof. Let Fn be the associated distribution functions, and let q1, q2, . . . be an enumeration of Q^d = the points in R^d with rational coordinates. By a diagonal argument like the one in the proof of Theorem 3.2.6, we can pick a subsequence so that F_{n(k)}(q) → G(q) for all q ∈ Q^d. Let
\[ F(x) = \inf\{G(q) : q \in Q^d,\ q > x\} \]
where q > x means qi > xi for all i. It is easy to see that F is right continuous. To check that it is a distribution function, we observe that if A is a rectangle with vertices in Q^d then ∆_A Fn ≥ 0 for all n, so ∆_A G ≥ 0, and taking limits we see that the last conclusion holds for F for all rectangles A. Tightness implies that F has properties (i) and (ii) of a distribution F. We leave it to the reader to check that Fn ⇒ F. The proof of Theorem 3.2.6 works if you read inequalities such as r1 < r2 < x < s as the corresponding relations between vectors.
The characteristic function of (X1, . . . , Xd) is ϕ(t) = E exp(it·X) where t·X = t1X1 + · · · + tdXd is the usual dot product of two vectors.
Theorem 3.9.3. Inversion formula. If A = [a1, b1] × . . . × [ad, bd] with µ(∂A) = 0 then
\[ \mu(A) = \lim_{T\to\infty} (2\pi)^{-d} \int_{[-T,T]^d} \prod_{j=1}^{d} \psi_j(t_j)\, \varphi(t)\, dt \]
where ψj(s) = (exp(−isaj) − exp(−isbj))/is.
Proof. Fubini's theorem implies
\[ \int_{[-T,T]^d} \int \prod_{j=1}^{d} \psi_j(t_j) \exp(i t_j x_j)\, \mu(dx)\, dt = \int \prod_{j=1}^{d} \int_{-T}^{T} \psi_j(t_j) \exp(i t_j x_j)\, dt_j\, \mu(dx) \]
It follows from the proof of Theorem 3.3.4 that
\[ \int_{-T}^{T} \psi_j(t_j) \exp(i t_j x_j)\, dt_j \to \pi \bigl( 1_{(a_j, b_j)}(x_j) + 1_{[a_j, b_j]}(x_j) \bigr) \]
so the desired conclusion follows from the bounded convergence theorem.
Exercise 3.9.5. Let ϕ be the ch.f. of a distribution F on R. What is the distribution on R^d that corresponds to the ch.f. ψ(t1, . . . , td) = ϕ(t1 + · · · + td)?
Exercise 3.9.6. Show that random variables X1, . . . , Xk are independent if and only if
\[ \varphi_{X_1, \ldots, X_k}(t) = \prod_{j=1}^{k} \varphi_{X_j}(t_j) \]
Theorem 3.9.4. Convergence theorem. Let Xn, 1 ≤ n ≤ ∞ be random vectors with ch.f. ϕn. A necessary and sufficient condition for Xn ⇒ X∞ is that ϕn(t) → ϕ∞(t).
Proof. exp(it·x) is bounded and continuous, so if Xn ⇒ X∞ then ϕn(t) → ϕ∞(t). To prove the other direction it suffices, as in the proof of Theorem 3.3.6, to prove that the sequence is tight. To do this, we observe that if we fix θ ∈ R^d, then for all s ∈ R, ϕn(sθ) → ϕ∞(sθ), so it follows from Theorem 3.3.6 that the distributions of θ·Xn are tight. Applying the last observation to the d unit vectors e1, . . . , ed shows that the distributions of Xn are tight and completes the proof.
Remark. As before, if ϕn(t) → ϕ∞(t) with ϕ∞(t) continuous at 0, then ϕ∞(t) is the ch.f. of some X∞ and Xn ⇒ X∞.
Theorem 3.9.4 has an important corollary.
Theorem 3.9.5. Cramér-Wold device. A sufficient condition for Xn ⇒ X∞ is that θ·Xn ⇒ θ·X∞ for all θ ∈ R^d.
Proof. The indicated condition implies E exp(iθ·Xn) → E exp(iθ·X∞) for all θ ∈ R^d.
Theorem 3.9.5 leads immediately to
Theorem 3.9.6. The central limit theorem in R^d. Let X1, X2, . . . be i.i.d. random vectors with EXn = µ, and finite covariances
\[ \Gamma_{ij} = E((X_{n,i} - \mu_i)(X_{n,j} - \mu_j)) \]
If Sn = X1 + · · · + Xn then (Sn − nµ)/n^{1/2} ⇒ χ, where χ has a multivariate normal distribution with mean 0 and covariance Γ, i.e.,
\[ E\exp(i\theta\cdot\chi) = \exp\Bigl( -\sum_i \sum_j \theta_i \theta_j \Gamma_{ij} / 2 \Bigr) \]
Proof. By considering X′n = Xn − µ, we can suppose without loss of generality that µ = 0. Let θ ∈ R^d. θ·Xn is a random variable with mean 0 and variance
\[ E\Bigl( \sum_i \theta_i X_{n,i} \Bigr)^2 = \sum_i \sum_j E(\theta_i \theta_j X_{n,i} X_{n,j}) = \sum_i \sum_j \theta_i \theta_j \Gamma_{ij} \]
so it follows from the one-dimensional central limit theorem and Theorem 3.9.5 that Sn/n^{1/2} ⇒ χ where
\[ E\exp(i\theta\cdot\chi) = \exp\Bigl( -\sum_i \sum_j \theta_i \theta_j \Gamma_{ij} / 2 \Bigr) \]
which proves the desired result.
To illustrate the use of Theorem 3.9.6, we consider two examples. In each, e1, . . . , ed are the d unit vectors.
Example 3.9.1. Simple random walk on Z^d. Let X1, X2, . . . be i.i.d. with
\[ P(X_n = +e_i) = P(X_n = -e_i) = 1/(2d) \quad \text{for } i = 1, \ldots, d \]
EX_{n,i} = 0 and if i ≠ j then EX_{n,i}X_{n,j} = 0 since both components cannot be nonzero simultaneously. Since EX_{n,i}² = P(Xn = ei) + P(Xn = −ei) = 1/d, the covariance matrix is Γ = (1/d)I.
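The covariance in Example 3.9.1 can be computed exactly by enumerating the 2d possible steps. This is a small sketch with a hypothetical helper name `srw_covariance`; exact rational arithmetic is used so that no rounding is involved. The diagonal entries come out as EX_{n,i}² = 2 · 1/(2d) = 1/d.

```python
from fractions import Fraction

def srw_covariance(d):
    """Exact covariance matrix of one step of simple random walk on Z^d."""
    p = Fraction(1, 2 * d)                       # probability of each of the 2d steps
    steps = []
    for i in range(d):
        for sign in (1, -1):
            e = [0] * d
            e[i] = sign                          # the step +e_i or -e_i
            steps.append(e)
    mean = [sum(p * s[i] for s in steps) for i in range(d)]
    return [[sum(p * (s[i] - mean[i]) * (s[j] - mean[j]) for s in steps)
             for j in range(d)] for i in range(d)]

gamma = srw_covariance(3)
print(gamma)    # diagonal entries 1/3, off-diagonal entries 0
```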
Example 3.9.2. Let X1, X2, . . . be i.i.d. with P(Xn = ei) = 1/6 for i = 1, 2, . . . , 6. In words, we are rolling a die and keeping track of the numbers that come up. EX_{n,i} = 1/6 and EX_{n,i}X_{n,j} = 0 for i ≠ j, so Γij = (1/6)(5/6) when i = j and = −(1/6)² when i ≠ j. In this case, the limiting distribution is concentrated on {x : Σ_i xi = 0}.
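A short exact computation illustrates why the limit in Example 3.9.2 is degenerate. The sketch below (variable names are illustrative) builds Γ for the die and checks that every row sums to 0, which means the variance of the sum of the coordinates is 0.

```python
from fractions import Fraction

p = Fraction(1, 6)
# The six outcomes e_1, ..., e_6 as 0/1 indicator vectors.
outcomes = [[1 if k == i else 0 for k in range(6)] for i in range(6)]
mean = [sum(p * e[i] for e in outcomes) for i in range(6)]
gamma = [[sum(p * (e[i] - mean[i]) * (e[j] - mean[j]) for e in outcomes)
          for j in range(6)] for i in range(6)]

# Row sums of Gamma; all zero means Var(x_1 + ... + x_6) = 0 under the limit law.
row_sums = [sum(row) for row in gamma]
print(gamma[0][0], gamma[0][1], row_sums)
```

The diagonal entry is (1/6)(5/6) = 5/36, each off-diagonal entry is −1/36, and 5/36 − 5 · (1/36) = 0.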
Our treatment of the central limit theorem would not be complete without some discussion of the multivariate normal distribution. We begin by observing that Γij = Γji and if EXi = 0 and EXiXj = Γij
\[ \sum_i \sum_j \theta_i \theta_j \Gamma_{ij} = E\Bigl( \sum_i \theta_i X_i \Bigr)^2 \ge 0 \]
so Γ is symmetric and nonnegative definite. A well-known result implies that there is an orthogonal matrix U (i.e., one with U^tU = I, the identity matrix) so that Γ = U^tVU, where V ≥ 0 is a diagonal matrix. Let W be the nonnegative diagonal matrix with W² = V. If we let A = WU, then Γ = A^tA. Let Y be a d-dimensional vector whose components are independent and have normal distributions with mean 0 and variance 1. If we view vectors as 1 × d matrices and let χ = YA, then χ has the desired normal distribution. To check this, observe that
\[ \theta \cdot YA = \sum_i \theta_i \sum_j Y_j A_{ji} \]
has a normal distribution with mean 0 and variance
\[ \sum_j \Bigl( \sum_i A_{ji} \theta_i \Bigr)^2 = \sum_j \Bigl( \sum_i \theta_i A^t_{ij} \Bigr) \Bigl( \sum_k A_{jk} \theta_k \Bigr) = \theta A^t A \theta^t = \theta \Gamma \theta^t \]
so E(exp(iθ·χ)) = exp(−(θΓθ^t)/2).
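As a worked illustration of the construction Γ = A^tA, consider a 2 × 2 case. The particular matrix below is an assumed example, not taken from the text.

\[ \Gamma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad U = \frac{1}{\sqrt 2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad V = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \]

Here U^tU = I and

\[ U^t V U = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 3 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \Gamma \]

Taking W = diag(√3, 1) gives

\[ A = WU = \frac{1}{\sqrt 2} \begin{pmatrix} \sqrt 3 & \sqrt 3 \\ 1 & -1 \end{pmatrix}, \qquad A^t A = \frac{1}{2} \begin{pmatrix} 3 + 1 & 3 - 1 \\ 3 - 1 & 3 + 1 \end{pmatrix} = \Gamma \]

so with Y = (Y1, Y2) independent standard normals, χ = YA = ((√3 Y1 + Y2)/√2, (√3 Y1 − Y2)/√2) has variances (3 + 1)/2 = 2 and covariance (3 − 1)/2 = 1, as required.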
If the covariance matrix has rank d, we say that the normal distribution is nondegenerate. In this case, its density function is given by
\[ (2\pi)^{-d/2} (\det \Gamma)^{-1/2} \exp\Bigl( -\sum_{i,j} y_i \Gamma^{-1}_{ij} y_j / 2 \Bigr) \]
The joint distribution in degenerate cases can be computed by using a linear transformation to reduce to the nondegenerate case. For instance, in Example 3.9.2 we can look at the distribution of (X1, . . . , X5).
Exercise 3.9.7. Suppose (X1, . . . , Xd) has a multivariate normal distribution with mean vector θ and covariance Γ. Show X1, . . . , Xd are independent if and only if Γij = 0 for i ≠ j. In words, uncorrelated random variables with a joint normal distribution are independent.
Exercise 3.9.8. Show that (X1, . . . , Xd) has a multivariate normal distribution with mean vector θ and covariance Γ if and only if every linear combination c1X1 + · · · + cdXd
Chapter 4
Random Walks
Let X1, X2, . . . be i.i.d. taking values in R^d and let Sn = X1 + . . . + Xn. Sn is a
random walk. In the last chapter, we were primarily concerned with the distribution of Sn. In this one, we will look at properties of the sequence S1(ω), S2(ω), . . . For example, does the last sequence return to (or near) 0 infinitely often? The first section introduces stopping times, a concept that will be very important in this and the next two chapters. After the first section is completed, the remaining three can be read in any order or skipped without much loss. The second section is not starred since it contains some basic facts about random walks.
4.1
Stopping Times
Most of the results in this section are valid for i.i.d. X's taking values in some nice measurable space (S, S) and will be proved in that generality. For several reasons, it is convenient to use the special probability space from the proof of Kolmogorov's extension theorem:
Ω = {(ω1, ω2, . . .) : ωi ∈ S}
F = S × S × . . .
P = µ × µ × . . . , where µ is the distribution of Xi
Xn(ω) = ωn
So, throughout this section, we will suppose (without loss of generality) that our random variables are constructed on this special space.
Before taking up our main topic, we will prove a 0-1 law that, in the i.i.d. case, generalizes Kolmogorov's. To state the new 0-1 law we need two definitions. A finite permutation of N = {1, 2, . . .} is a map π from N onto N so that π(i) ≠ i for only finitely many i. If π is a finite permutation of N and ω ∈ S^N we define (πω)_i = ω_{π(i)}. In words, the coordinates of ω are rearranged according to π. Since Xi(ω) = ωi this is the same as rearranging the random variables. An event A is permutable if π^{−1}A ≡ {ω : πω ∈ A} is equal to A for any finite permutation π, or in other words, if its occurrence is not affected by rearranging finitely many of the random variables. The collection of permutable events is a σ-field. It is called the exchangeable σ-field and denoted by E.
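The action (πω)_i = ω_{π(i)} can be made concrete on a finite prefix of ω. In this sketch the helper names, the sample sequence, and the representation of π as a dictionary of its finitely many moved points are all assumptions made for illustration. It also shows that the partial sums of ω and πω agree once n is past the last index moved by π.

```python
def apply_permutation(pi, omega):
    """Return pi*omega with (pi*omega)_i = omega_{pi(i)}, using 1-based indices.

    pi is a dict listing only the finitely many i with pi(i) != i.
    omega is a finite prefix of the sequence, long enough to cover pi.
    """
    return [omega[pi.get(i, i) - 1] for i in range(1, len(omega) + 1)]

def partial_sums(omega):
    """S_1, S_2, ... for the given coordinates."""
    sums, s = [], 0
    for w in omega:
        s += w
        sums.append(s)
    return sums

omega = [3, -1, 4, -1, 5, -9, 2, 6]   # sample values, chosen arbitrarily
pi = {1: 3, 3: 2, 2: 1}               # a 3-cycle; every other index is fixed
permuted = apply_permutation(pi, omega)

print(permuted)
print(partial_sums(omega))
print(partial_sums(permuted))         # agrees with the line above from n = 3 on
```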
To see the reason for interest in permutable events, suppose S = R and let Sn(ω) = X1(ω) + · · · + Xn(ω). Two examples of permutable events are
(i) {ω : Sn(ω) ∈ B i.o.}
(ii) {ω : lim sup_{n→∞} Sn(ω)/cn ≥ 1}
In each case, the event is permutable because Sn(ω) = Sn(πω) for large n. The list of examples can be enlarged considerably by observing:
(iii) All events in the tail σ-field T are permutable.
To see this, observe that if A ∈ σ(Xn+1, Xn+2, . . .) then the occurrence of A is unaffected by a permutation of X1, . . . , Xn. (i) shows that the converse of (iii) is false. The next result shows that for an i.i.d. sequence there is no difference between E and T. They are both trivial.
Theorem 4.1.1. Hewitt-Savage 0-1 law. If X1, X2, . . . are i.i.d. and A ∈ E then P(A) ∈ {0, 1}.
Proof. Let A ∈ E. As in the proof of Kolmogorov's 0-1 law, we will show A is independent of itself, i.e., P(A) = P(A ∩ A) = P(A)P(A) so P(A) ∈ {0, 1}. Let An ∈ σ(X1, . . . , Xn) so that
(a) P(An∆A) → 0
Here A∆B = (A − B) ∪ (B − A) is the symmetric difference. The existence of the An's is proved in part ii of Lemma A.2.1. An can be written as {ω : (ω1, . . . , ωn) ∈ Bn} with Bn ∈ S^n. Let
\[ \pi(j) = \begin{cases} j + n & \text{if } 1 \le j \le n \\ j - n & \text{if } n + 1 \le j \le 2n \\ j & \text{if } j \ge 2n + 1 \end{cases} \]
Observing that π² is the identity (so we don't have to worry about whether to write π or π^{−1}) and the coordinates are i.i.d. (so the permuted coordinates are) gives
(b) P(ω : ω ∈ An∆A) = P(ω : πω ∈ An∆A)
Now {ω : πω ∈ A} = {ω : ω ∈ A}, since A is permutable, and
{ω : πω ∈ An} = {ω : (ωn+1, . . . , ω2n) ∈ Bn}
If we use A′n to denote the last event then we have
(c) {ω : πω ∈ An∆A} = {ω : ω ∈ A′n∆A}
Combining (b) and (c) gives
(d) P(An∆A) = P(A′n∆A)
It is easy to see that
|P(B) − P(C)| ≤ P(B∆C)
so (d) implies P(An), P(A′n) → P(A). Now A − C ⊂ (A − B) ∪ (B − C), and together with the similar inclusion for C − A this implies A∆C ⊂ (A∆B) ∪ (B∆C). The last inclusion, (d), and (a) imply
P(An∆A′n) ≤ P(An∆A) + P(A∆A′n) → 0
The last result implies
0 ≤ P(An) − P(An ∩ A′n) ≤ P(An ∪ A′n) − P(An ∩ A′n) = P(An∆A′n) → 0
so P(An ∩ A′n) → P(A). But An and A′n are independent, so
P(An ∩ A′n) = P(An)P(A′n) → P(A)²
This shows P(A) = P(A)², and proves Theorem 4.1.1.
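The permutation π used in the proof swaps the first block of n coordinates with the second block. The sketch below, with a hypothetical helper `block_swap` and arbitrary stand-in values for the coordinates, checks the two facts the proof relies on: π² is the identity, and the first n coordinates of πω are (ω_{n+1}, . . . , ω_{2n}).

```python
def block_swap(n):
    """Permutation that swaps {1..n} with {n+1..2n} and fixes every j >= 2n+1."""
    def pi(j):
        if 1 <= j <= n:
            return j + n
        if n + 1 <= j <= 2 * n:
            return j - n
        return j
    return pi

n = 4
pi = block_swap(n)
omega = list(range(101, 113))                     # stand-ins for omega_1..omega_12
pi_omega = [omega[pi(i) - 1] for i in range(1, len(omega) + 1)]

print(pi_omega[:n], omega[n:2 * n])               # the two lists coincide
```

Because the event An depends only on the first n coordinates, evaluating it at πω amounts to evaluating Bn on coordinates n+1 through 2n, which is why A′n is independent of An.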
A typical application of Theorem 4.1.1 is
Theorem 4.1.2. For a random walk on R, there are only four possibilities, one of which has probability one.
(i) Sn = 0 for all n.
(ii) Sn → ∞.
(iii) Sn → −∞.
(iv) −∞ = lim inf Sn < lim sup Sn = ∞.
Proof. Theorem 4.1.1 implies lim sup Sn is a constant c ∈ [−∞, ∞]. Let S′n = Sn+1 − X1. Since S′n has the same distribution as Sn, it follows that c = c − X1. If c is finite, subtracting c from both sides we conclude X1 ≡ 0 and (i) occurs. Turning the last statement around, we see that if X1 ≢ 0 then c = −∞ or ∞. The same analysis applies to the liminf. Discarding the impossible combination lim sup Sn = −∞ and lim inf Sn = +∞, we have proved the result.
Exercise 4.1.1. Symmetric random walk. Let X1, X2, . . . ∈ R be i.i.d. with a distribution that is symmetric about 0 and nondegenerate (i.e., P(Xi = 0) < 1). Show that we are in case (iv) of Theorem 4.1.2.
Exercise 4.1.2. Let X1, X2, . . . be i.i.d. with EXi = 0 and EXi² = σ² ∈ (0, ∞). Use the central limit theorem to conclude that we are in case (iv) of Theorem 4.1.2. Later in Exercise 4.1.11 you will show that EXi = 0 and P(Xi = 0) < 1 is sufficient.
The special case in which P(Xi = 1) = P(Xi = −1) = 1/2 is called simple random walk. Since a simple random walk cannot skip over any integers, it follows from either exercise above that with probability one it visits every integer infinitely many times.
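The "cannot skip over any integers" property is easy to see in a simulation. The following is a minimal sketch; the function name, the seed and the number of steps are arbitrary assumptions. Whatever path the generator produces, the set of visited sites is a full interval of integers, because every step is ±1.

```python
import random

def simple_random_walk(n_steps, seed=0):
    """Path S_0 = 0, S_1, ..., S_n of simple random walk on Z."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    s, path = 0, [0]
    for _ in range(n_steps):
        s += rng.choice((-1, 1))       # each step is +1 or -1 with probability 1/2
        path.append(s)
    return path

path = simple_random_walk(10_000)
visited = set(path)
# Every integer between the minimum and the maximum of the path has been visited.
print(min(path), max(path), visited == set(range(min(path), max(path) + 1)))
```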
Let Fn = σ(X1, . . . , Xn) = the information known at time n. A random variable