2.4 Ejecución de la evaluación ambiental
2.4.2 Problemática Ambiental existente
We have shown in the previous section that under regularity conditions of Theorems
10,11, and12, the AirMCMC algorithm produces a chain with various convergence properties. However, without any additional assumptions the chain might fail to converge in distribution, as we demonstrate in Example1 below. On the other, we show in Theorem13 that imposing an additional diminishing adaptation condition
(C1), guarantees ergodicity (i.e., convergence in distribution) of the AirMCMC algorithm. We argue that this is a minor condition that either holds in practice or can be easily enforced. In Theorem 14 we introduce an AirMCMC Algorithm
18, where the sequence of increasing lags {nk} is randomised, which ensures the
diminishing adaptation condition.
Recall that the diminishing adaptation condition(C1)is a restriction on the adaptation size of the algorithm:
sup
x∈X
kPγn(x,·)−Pγn+1(x,·)kT V
P
−→0 as n→ ∞. (3.15)
The following theorem demonstrates that the regularity conditions of the pre- vious section together with the diminishing adaptation condition imply convergence in distribution of the AirMCMC algorithms.
Theorem 13. Suppose that the diminishing adaptation condition (3.15) and As- sumption 1 hold. Let also one of the following conditions hold
(a) Assumption 2 and for the corresponding drift function V, sup
x∈C
V(x)<∞ and
EV(X0)<∞;
(b) Assumption 4 and for the corresponding collection of drift functionsVγ,
sup
x∈C
Vγ(x)<∞ andEVγ(X0)<∞ for all γ ∈Γ;
(c) Assumption 3 and for the corresponding drift function V, EV(X0) <∞ and
every level setCd:={x|V(x)≤d} is a uniform small set, i.e., satisfies (3.8).
Then any AirMCMC Algorithm 4 (or, in case (b), any modified AirMCMC Algorithm 17, where the set B in the algorithm settings is such that for the corre- sponding drift functionsVγ, sup
x∈B
Vγ(x)<∞,γ ∈Γ) produces an ergodic chain{Xn}, i.e.,
kL(Xn)−πkT V →0,
where L(Xn) is the distribution law of Xn.
As argued inRoberts and Rosenthal[2007], the diminishing adaptation con- dition is not an issue in practice. The condition holds for many typical adaptive MCMC algorithms (e.g., as for the standard Adaptive Metropolis or Adaptive Gibbs Samplers). For the adaptive algorithms where the condition does not hold (e.g., as for KAMH Sejdinovic et al. [2014]) or it is hard to verify the condition, we could, nevertheless, easily modify the algorithms in order to enforce (3.15). For example, at the adaptation timesNi, we could flip a coin with success probabilitypi to decide whether to adapt the Markov kernel. If limpi
i→∞
= 0, then (3.15) holds. Notice that the sequencepi can decay arbitrarily slowly.
Alternatively, for the AirMCMC algorithms, we could allow the sequence of increasing lags{nk}to be random. More precisely, let sequence{n?k}be determinis-
tic that satisfies (3.14) for someβ >0. We could consider an AirMCMC Algorithm
4, where in Step 2 we set nk = n?k+ Uniform[0,bkκc] for some κ ∈ (0, β). Since,
{nk} satisfies (3.14), we could still prove the statements of Theorems10,11,12 for
this randomised version of the AirMCMC. Moreover, the resulting Algorithm 18
would be ergodic and satisfy the statements of Theorems 10, 11 or 12 under the corresponding regularity conditions. We summarise our observations in Theorem14
below.
Theorem 14. Consider settings of Theorem 10 (alternatively, of Theorem 11 or
Algorithm 18: Randomised AirMCMC Sampler
Set some initial values forX0 ∈ X;γ0 ∈Γ;γ :=γ0. Let{n?k}be an
increasing sequence of positive integers. Fix some δ∈(0,1). Set
k:= 1;n:= 0,n1:=n?1.
Beginning of the loop 1. Fori= 1, .., nk
1.1. sampleXn+i ∼Pγ(Xn+i−1,·);
1.2. given{X0, .., Xn+i, γ0, .., γn+i−1}updateγn+i according to some
adaptation rule.
2. Setn:=n+nk,k:=k+ 1, nk=n?k+ Uniform
h
0,bkcδi,γ :=γn.
Go toBeginning of the loop
Algorithm18(in case of the settings of Theorem11, we allow adaptations in Step2
to happen only if the chain hits the corresponding set B). Then the adaptive chain
{Xn} produced by the algorithm satisfies statements of Theorem 10 (alternatively,
of Theorem 11or 12, respectively).
Moreover, for any sequence of lags {n∗k}, the AirMCMC Algorithm 18 sat- isfies the diminishing adaptation condition (3.15). Under regularity conditions of Theorem 13 (in case of the settings (b) of the theorem, we allow adaptations to happen only if the chain hits the corresponding set B), the adaptive chain {Xn}
produced by the algorithm converges in distribution.
We conclude this section with a counterexample that demonstrates that an AirMCMC Algorithm 4 might fail to be ergodic (i.e., the corresponding adaptive chain does not converge in distribution) without the diminishing adaptation condi- tion.
Example 1. This example is a modified version of Example 4 of Roberts & Rosenthal Roberts and Rosenthal[2007]. Our goal is to construct an AirMCMC algorithm that satisfies conditions of Theorem10 but fails to be ergodic. Let X ={1,2,3,4}. For someε >0, define a target asπ({1}) :=ε,π({2}) :=ε3,π({3}) =π({4}) := 1−ε2−ε3. Forγ ∈Γ :={1,2}, letPγcorrespond to a Metropolis-Hastings kernel with proposals
Q1(x,·)∼Uniform{x−1, x+ 1}, Q2(x,·)∼Uniform{x−2, x−1, x+ 1, x+ 2}.
Qγ(Xn,·), with probability min
(
1,π(Yn+1)
π(Xn)
)
set Xn+1 := Yn+1, otherwise, reject
the proposal, i.e.,Xn+1 := Xn. If the proposal is outsideX, then we always reject
it. Consider the following adaptive Algorithm19.
Algorithm 19:AMCMC with the SLLN but failing convergence in dis- tribution
Start withX0 =X1=X2 = 1, γ0 =γ1 =γ2 = 1,k= 1.
Beginning of the loop 1. SampleX2k2+1∼Pγ
2k2(X2k2,·);
2. Ifγ2k2 = 1, then update as follows. IfX2k2+1 6=X2k2, i.e., the proposal is accepted,γ2(k+1)2 := 2. Otherwise, γ2(k+1)2 := 1;
Ifγ2k2 = 2, then setγ2(k+1)2 = 1 if X2k2+1= 1, otherwise, set γ2(k+1)2 = 2.
3. Forn∈N, 2k2+ 2≤n≤2(k+1)2 run a Markov chain with the kernel
Pγ
2(k+1)2; 4. k:=k+ 1.
Go toBeginning of the loop
Proposition 8. KernelsPγ,γ ∈Γ satisfy the simultaneous minorisation Assump-
tion 1 and the simultaneous drift Assumption 2. Therefore, by virtue of Theorem
10, the SLLN holds for Algorithm 19. However, the adaptive chain produced by the algorithm does not converge in distribution for sufficiently smallε >0 in the setting of the underlying target distribution.
Proof of Proposition8. Algorithm19is designed in such a way, that Steps1and
2 “drift” the adaptive {Xn} chain away from the correct stationary distribution. Since the chain approaches the stationary distribution arbitrarily closely after Step
3, we conclude that at times 2k2 + 2,
L X2k2+2|X0, γ0 −π T V > δ
for some δ > 0. Therefore, {Xn} does not converge in distribution. We provide a
detailed proof of the proposition in Appendix A.