

Atchadé et al. [2011] and Roberts and Rosenthal [2014] give practical guidance on setting up the temperature schedule in an optimal way. From the perspective of the ST/PT algorithm, both Atchadé et al. [2011] and Roberts and Rosenthal [2014] seek a Markov chain that can mix in the inverse temperature component “optimally”.

Heuristically, this means that their approach seeks a setup that produces a chain that can move from the hot state to the cold state (and vice versa) as quickly as possible. Such an approach does not consider how well the within-temperature chains are mixing or, even worse, whether the temperature marginal component is being tuned to work in only a subset of the modes. The latter issue is the focus of study in Chapters 4 and 5.
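In PT, movement between the hot and cold states happens through swap moves between chains at different inverse temperatures. The following is a minimal sketch of the standard swap acceptance probability, assuming tempered targets proportional to π(x)^β. The bimodal `log_target` and the function names are hypothetical choices for illustration and are not taken from the cited papers.

```python
import math


def log_target(x):
    """Log-density (up to a constant) of a hypothetical bimodal target:
    an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_acceptance(beta_i, beta_j, x_i, x_j):
    """Probability of accepting a swap of the states x_i and x_j held by
    the chains at inverse temperatures beta_i and beta_j.

    With tempered targets pi(x)**beta, the Metropolis-Hastings ratio is
    exp((beta_i - beta_j) * (log pi(x_j) - log pi(x_i))).
    """
    log_ratio = (beta_i - beta_j) * (log_target(x_j) - log_target(x_i))
    return math.exp(min(0.0, log_ratio))


# The colder chain (beta = 1.0) sits between the modes and the hotter chain
# (beta = 0.5) sits on a mode: the swap moves the better state to the cold level.
p_favourable = swap_acceptance(1.0, 0.5, 0.0, 4.0)

# The reverse configuration: the swap would move the cold chain off its mode.
p_unfavourable = swap_acceptance(1.0, 0.5, 4.0, 0.0)
```

The acceptance probability depends on the temperature spacing only through the difference of the two inverse temperatures, which is why the choice of schedule governs how freely the chain moves through the temperature levels.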

The performance of the full chain for an ST/PT approach has been studied in detail in Zheng [2003], Madras and Zheng [2003], Woodard et al. [2009b] and Woodard et al. [2009a]. Essentially, these studies partition the state space into regions (typically containing a mode) and study the resulting ST Markov chain by breaking its mixing efficiency into three intuitive core components:

1. The mixing of the chain between regions at the hot state;

2. The mixing of the chain within a region;

3. The mixing of the chain through the temperature levels via the swap move.

Woodard et al. [2009a] and Woodard et al. [2009b] have the most informative results regarding the scalability of the ST and PT approaches, motivating the core strategies in this thesis. Section 1.2.2 gave the core definitions and motivation for analysing the spectral gap of a Markov chain, and it is this quantity that is studied in detail in Woodard et al. [2009a] and Woodard et al. [2009b].

The spectral gap gives (a bound on) the rate of convergence of the Markov chain to invariance, and so analysing its behaviour as the dimensionality of the state space increases indicates how robust the algorithm is to the curse of dimensionality. Inevitably, the rate of convergence will decrease as the dimensionality increases, and hence the spectral gap will decrease. Characterising this decrease is the focus of Woodard et al. [2009a] and Woodard et al. [2009b].

Definition 1.6.1 (Rapid and Torpid Mixing). As in Woodard et al. [2009a] and Woodard et al. [2009b]:

• A Markov chain is said to be Rapidly Mixing if the spectral gap, defined in (1.7), decays at most polynomially quickly with respect to the state space dimensionality.

• A Markov chain is said to be Torpidly Mixing if the spectral gap, defined in (1.7), decays at least exponentially quickly with respect to the state space dimensionality.

Of the two types of mixing characterised in Definition 1.6.1, Rapid mixing is preferable, since it scales far less badly as the dimensionality of the problem grows.
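One way to make Definition 1.6.1 concrete is sketched below. The notation is an assumption introduced here for illustration: d denotes the dimensionality of the state space and Gap(P_d) the spectral gap of the chain with kernel P_d on the d-dimensional problem. The precise statements are those in Woodard et al. [2009a] and Woodard et al. [2009b].

\[
\text{Rapid mixing:}\quad \exists \text{ a polynomial } p \text{ such that } \mathrm{Gap}(P_d) \ge \frac{1}{p(d)} \text{ for all } d,
\]
\[
\text{Torpid mixing:}\quad \exists\, c_1, c_2 > 0 \text{ such that } \mathrm{Gap}(P_d) \le c_1 e^{-c_2 d} \text{ for all } d.
\]

Under this reading, a Rapidly mixing chain needs a number of iterations that grows only polynomially in d to reach a fixed accuracy, whereas a Torpidly mixing chain needs a number that grows exponentially.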

The result that is rather condemning for the scalability of the PT and ST algorithms is given in Woodard et al. [2009b][Corollary 3.2]; full details can be found in their paper. It states that the ensuing ST/PT algorithm will be Torpidly mixing if the following three properties hold, that is, if there exists a region A and inverse temperature values β∗ < β∗∗ such that:

1. The supremum of the conductance of A (a measure of the chain’s ability to escape the local region, A) over inverse temperatures above a threshold value, β∗, is exponentially decreasing with dimension. More formally, the conductance of a set A ∈ B with respect to a target distribution π is given by

\[
\frac{\int_A P(x, A^c)\,\pi(dx)}{\pi(A)\,\pi(A^c)}
\]

where P is the transition kernel of the Markov chain and A^c denotes the complement of A.

2. The supremum of the persistence of A (a measure of the decrease in probability weight of A at a hot temperature relative to its weight at the cold target state) over inverse temperatures, in the range [β∗, β∗∗), is exponentially decreasing.

3. The supremum of the overlap (a measure of the weight indifference of A between a pair of temperature levels) for all pairings β ∈ [0, β∗) and β′ ∈ (β∗∗, 1] is exponentially decreasing with dimension.
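The conductance in property 1 can be computed exactly on a finite state space, where the integral becomes a sum. The following is a minimal sketch under stated assumptions: the grid target, the nearest-neighbour Metropolis kernel and all function names are hypothetical choices for illustration, not the setting analysed in Woodard et al. [2009b].

```python
import math


def tempered(weights, beta):
    """Normalise weights**beta into a probability vector, i.e. the
    target distribution at inverse temperature beta."""
    powered = [w ** beta for w in weights]
    total = sum(powered)
    return [p / total for p in powered]


def metropolis_kernel(pi):
    """Transition matrix of a nearest-neighbour Metropolis chain on
    {0, ..., n-1} with invariant distribution pi."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])  # rejected or out-of-range proposals stay put
    return P


def conductance(P, pi, A):
    """Discrete analogue of int_A P(x, A^c) pi(dx) / (pi(A) pi(A^c))."""
    A = set(A)
    Ac = [y for y in range(len(pi)) if y not in A]
    flow = sum(pi[x] * sum(P[x][y] for y in Ac) for x in A)
    pi_A = sum(pi[x] for x in A)
    pi_Ac = sum(pi[y] for y in Ac)
    return flow / (pi_A * pi_Ac)


# Hypothetical bimodal target on the grid {0, ..., 15} with modes at 3 and 12;
# the region A is the left half of the grid, containing the first mode.
weights = [math.exp(-0.5 * (x - 3) ** 2) + math.exp(-0.5 * (x - 12) ** 2)
           for x in range(16)]
A = range(8)


def conductance_at(beta):
    pi = tempered(weights, beta)
    return conductance(metropolis_kernel(pi), pi, A)


cold = conductance_at(1.0)  # cold target: A is hard to escape
hot = conductance_at(0.1)   # hot level: the barrier between modes is flattened
```

In this toy example the conductance of A is much larger at the hot level than at the cold level, which is the effect that tempering relies on. Property 1 concerns the case where the conductance remains exponentially small for every inverse temperature above β∗.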

Woodard et al. [2009b] illustrate that the important canonical setting (that this thesis focuses on) of the Gaussian mixture target with non-identical covariance structures is Torpidly mixing. An essential failing in this case is the persistence property; this will be a key focus of the work in Chapter 4.

Additionally, Section 2 attempts to overcome some of the issues arising from the overlap property, which restricts how ambitious the temperature spacings can be, so that a less dense schedule is needed in certain settings.

It is worth noting that Woodard et al. [2009a] provide an interesting result that gives conditions guaranteeing Rapid mixing for the ST and PT approaches. The quantities bounding the spectral gap from below this time are similar to those sufficient for Torpid mixing in Woodard et al. [2009b]. Details are in the paper but, heuristically, the conditions guaranteeing Rapid mixing concern: the mixing quality in each (unimodal) region; the mixing speed of the chain at the hottest levels between regions; a variation of the persistence property, described above, decaying only polynomially with dimension; and a variation of the overlap property, described above, decaying only polynomially with dimension.

For the canonical Gaussian mixture target setting, Woodard et al. [2009a] illustrate that Rapid mixing can be achieved for a symmetric mode setup. This is the form of the target distributions primarily used for the empirical examples in Chapter 2 and so it is worth noting that even though the PT algorithm used in those simulations is theoretically geometrically ergodic, in practice, the performance is poor for a finite run of the algorithm.
