There is no denying that the practical implementation of the HAT algorithm is significantly more expensive per iteration (ignoring inferential quality) than the PT algorithm.
This is due to the expense incurred in evaluating the target distribution for the HAT targets, which involves the numerical calculation (and inversion) of a Hessian at every evaluation of the target distribution. This is an $O(d^3)$ operation, whether or not eigenvalues are calculated to assess whether the Hessian is suggestive of a “proper” positive definite covariance matrix.
Figure 4.15: Three trace plots of the cold state chain targeting the distribution in equation (4.44) using the PT, HAT and Ideal algorithms respectively. Unlike the runs in Figure 4.13, the temperature schedules used a more ambitious, sub-optimal spacing. All cold-level swap move acceptance rates were approximately 0.16, even for the PT runs.
For comparison, in the five-dimensional example of Section 4.4.1, a HAT target evaluation typically takes around 240 times longer than an evaluation of the toy (very cheap) powered version. Much of the expense can be put down to the chosen tuning of the numerical “hessian” function in R, from the package of Gilbert et al. [2006], which has been tuned for high accuracy over speed. The Hessian calculation computes 25 entries for a 5×5 matrix, and for each entry the accuracy tuning parameter was allowed 10 iterations until convergence, which immediately explains the expense factor of approximately 250. A key question is how the performance scales with dimension. The cost of forming and inverting a Hessian in $d$ dimensions is $O(d^3)$, which is undeniably expensive.
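To make the evaluation count concrete, the following is a minimal sketch of a plain central-difference Hessian that also counts how many times the function is called. It is not the tuned R routine referred to above, which iterates each entry for accuracy; the function name `numerical_hessian`, the step size `h` and the quadratic test function are assumptions made purely for illustration.

```python
def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at x.

    Returns (H, calls), where calls is the number of evaluations of f.
    Illustrative sketch only: one pass per entry, no accuracy iterations.
    """
    d = len(x)
    calls = 0

    def shifted(i, di, j, dj):
        # Evaluate f with coordinates i and j perturbed by di and dj.
        nonlocal calls
        calls += 1
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)

    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):  # symmetry: only the upper triangle is computed
            val = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                   - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
            H[i][j] = H[j][i] = val
    return H, calls


def quadratic(x):
    # Hypothetical test function with a known Hessian:
    # diagonal entries 1, 2, ..., d and a single off-diagonal 1 at (0, 1).
    return 0.5 * sum((k + 1) * v * v for k, v in enumerate(x)) + x[0] * x[1]


H, calls = numerical_hessian(quadratic, [0.3, -0.2, 0.1, 0.5, -0.4])
print(calls)  # 15 upper-triangle entries x 4 evaluations each = 60
```

Even this single-pass scheme needs a number of function evaluations that grows quadratically in $d$. Allowing several refinement iterations per entry, as in the tuned routine, multiplies that count further.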
Particularly in the colder states, where covariance structures differ significantly between modes, location-dependent moves would be essential to ensure fast intra-modal mixing. Position-dependent RWM moves, Livingstone [2015], which use the Hessian at the current location to estimate the local covariance structure, would be one approach. This would make the iterative cost of the standard PT algorithm $O(d^3)$ too. At each step of the HAT algorithm the Hessian can be calculated and stored for use in a position-dependent RWM framework, making the within-temperature moves more efficient at no significant extra cost.
4.5.1 Limiting Diffusion for the Ideal Algorithm
A key insight into the scalability of HAT comes from collaborative work, carried out in conjunction with this thesis, by Professor Gareth Roberts (University of Warwick) and Professor Jeffrey Rosenthal (University of Toronto). This work is in the process of being written up for publication at the time of submission of this thesis.
The work analyses a simulated tempering algorithm targeting a bimodal Gaussian target in d-dimensions, with inverse temperature level targets suggested by the ideal algorithm of Section 4.2, i.e.
\[
\pi_\beta(x) \propto \sum_{k=1}^{2} w_k \, \phi_{\left(\mu_k,\, \Sigma_k/\beta\right)}(x) \tag{4.47}
\]
where $\phi_{(\mu,\Sigma)}(\cdot)$ is the density function of a $d$-dimensional Gaussian with mean $\mu$ and variance matrix $\Sigma$. The weights of the modes are even, with $w_1 = w_2 = 0.5$, and $\mu_1 = (-1, \ldots, -1)$, $\mu_2 = (1, \ldots, 1)$, $\Sigma_1 = I_d$ and $\Sigma_2 = \sigma^2 I_d$.
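As an illustration of equation (4.47), the following minimal sketch evaluates the log-density of the tempered mixture for the isotropic covariances given above. The function name `log_ideal_target` is hypothetical, and `sigma2` is passed as an argument because no numerical value of $\sigma^2$ is fixed in the text.

```python
import math


def log_ideal_target(x, beta, sigma2, weights=(0.5, 0.5)):
    """Log of sum_k w_k * phi_{(mu_k, Sigma_k / beta)}(x) for the bimodal example.

    Mode 1: mean (-1, ..., -1), covariance I_d / beta.
    Mode 2: mean ( 1, ...,  1), covariance sigma2 * I_d / beta.
    """
    d = len(x)
    means = (-1.0, 1.0)
    base_variances = (1.0, sigma2)
    log_terms = []
    for w_k, m_k, v_k in zip(weights, means, base_variances):
        var = v_k / beta  # tempering rescales the covariance, not the weight
        sq_dist = sum((x_i - m_k) ** 2 for x_i in x)
        log_phi = -0.5 * d * math.log(2.0 * math.pi * var) - 0.5 * sq_dist / var
        log_terms.append(math.log(w_k) + log_phi)
    # Log-sum-exp for numerical stability when the modes are far apart.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


print(log_ideal_target([0.0] * 5, beta=0.5, sigma2=4.0))
```

Because each component is a normalised Gaussian density at every value of $\beta$, the mass attached to each mode stays at $w_k$ across temperature levels in this sketch.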
Previous analysis in Roberts and Rosenthal [2014] focussed on the ST approach with power-based tempered targets and made the unrealistic assumption that exact, immediate mixing was happening within each temperature level.
The new work analyses the performance of a simulated tempering algorithm where the hot states are given as in equation (4.47), following the idealised target concept of Section 4.2. It makes two far more realistic assumptions about the mixing of the chain:
1. Immediate mixing within a single mode. Hence, conditional on being in one of the mixture components, the Markov chain immediately mixes to invariance within that component.
2. Immediate mixing between modes at the hottest temperature level only.
The first assumption is very realistic, while the second assumption is less so; in particular this will likely be violated for the HAT algorithm (see Section 4.6).
Further to this, the temperature spacings are geometric with $O(d^{-1/2})$ spacings, which are suitable and indeed optimal considering the associated optimal scaling results that will follow in Chapter 5. Additionally, the hottest (smallest) inverse temperature is assumed to be $O(d^{-1})$ to induce stable probabilities of swapping between regions.
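One way to picture such a schedule is sketched below. It assumes, purely for illustration, that “geometric” means consecutive inverse temperatures differ by a constant step `ell / sqrt(d)` on the log scale, and that the hottest level is exactly `1 / d`. The constant `ell`, the function name `geometric_schedule` and the handling of the final level are not taken from the analysis being described.

```python
import math


def geometric_schedule(d, ell=1.0):
    """Inverse temperatures from 1 (cold) down to 1/d (hot), for d >= 2.

    Assumed construction: log(beta) decreases in steps of ell / sqrt(d);
    the final step is shortened so the schedule ends exactly at 1/d.
    """
    ratio = math.exp(-ell / math.sqrt(d))
    beta_min = 1.0 / d
    betas = [1.0]
    while betas[-1] * ratio > beta_min:
        betas.append(betas[-1] * ratio)
    betas.append(beta_min)
    return betas


for d in (10, 100, 1000):
    print(d, len(geometric_schedule(d)))
```

Under this assumed construction the number of levels grows like $\sqrt{d}\,\log(d)$, since the log-scale distance from $1$ to $1/d$ is $\log(d)$ and each step covers $\ell/\sqrt{d}$ of it.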
The $(d+1)$-dimensional chain at time $t$ is denoted as $(\beta_t, X_t)$, where $X_t$ is the location in the state space, $\mathcal{X}$, and $\beta_t$ is the inverse temperature level. The aim is to find a limiting diffusion for the signed “temperature” component of the chain, defined as
\[
Y_t = \operatorname{sgn}(X_t)\, \frac{\log(\beta_t/\beta_{\min})}{\log(1/\beta_{\min})} \in [-1, 1]
\]
where $\operatorname{sgn}(X_t)$ is $1$ if the chain is assigned to the mode centred on $\{1\}^d$ or $-1$ if the chain is assigned to the mode centred on $\{-1\}^d$, and $\beta_{\min}$ is the minimum of the inverse temperature levels (i.e. the hottest state).
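The definition of $Y_t$ can be sketched in a few lines. The rule for assigning a location to a mode is not specified in the text; the helper `mode_sign` below assumes assignment to the nearer of the two centres in Euclidean distance, which reduces to the sign of the coordinate sum. Both function names are hypothetical.

```python
import math


def mode_sign(x):
    # Assumption: assign x to the nearer of the centres {1}^d and {-1}^d.
    # ||x - 1||^2 - ||x + 1||^2 = -4 * sum(x), so only the sign of sum(x) matters.
    return 1 if sum(x) >= 0 else -1


def signed_temperature(x, beta, beta_min):
    """Y = sgn(x) * log(beta / beta_min) / log(1 / beta_min), a value in [-1, 1]."""
    return mode_sign(x) * math.log(beta / beta_min) / math.log(1.0 / beta_min)


beta_min = 0.01
print(signed_temperature([1.0, 1.0], 1.0, beta_min))      # cold level, positive mode
print(signed_temperature([-1.0, -1.0], 1.0, beta_min))    # cold level, negative mode
print(signed_temperature([1.0, 1.0], beta_min, beta_min)) # hottest level
```

The cold level maps to $\pm 1$ and the hottest level maps to $0$, so under the second assumption above the process can only change sign when it reaches $0$.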
With suitable scaling of the process, and using the two assumptions above, it is concluded that $Y_t$ converges to a limiting process characterised as a skew Brownian motion. The scaling that is required to obtain this non-trivial limiting process gives insight into the convergence rate of this particular algorithm as dimensionality grows. It turns out that time must be “sped up” by a factor of $O(d \log(d)^2)$ to obtain a non-trivial limiting process. This suggests that the convergence time of the algorithm is polynomial in dimension.
This is a positive result for the HAT algorithm since, assuming similar behaviour, the added $O(d^3)$ complexity that Hessian information requires suggests that HAT converges in $O(d^4 \log(d)^2)$, which is still polynomial in dimension. Compare this with the standard ST approach for this example, which is torpidly mixing, so that its convergence time grows exponentially in dimension; see Woodard et al. [2009b]. This result is therefore very positive and supportive of the HAT approach. Alas, there are still open issues with the HAT approach that will likely cause problems with the mixing at the hot temperature. Details are given in the following section.