
7.4.1 The Theory of Convergence

The convergence diagnostics for Bayesian iterative simulation methods are very different from those for the EM algorithm. As discussed in Chapter 6, the EM algorithm converges when the parameter estimates no longer change across successive iterations. But, as discussed in this chapter, the fundamental mechanism of Bayesian iterative simulation methods is to draw parameter estimates randomly from their posterior distribution. This means that the draws from one iteration will almost always differ from the draws from any other iteration. Hence, Bayesian iterative simulation requires a different notion of convergence from that of the EM algorithm.

However, a Bayesian iterative simulation has to stop at some iteration. Generally speaking, it converges when the distribution of the parameter estimates becomes stable and stationary. Why does a stable and stationary distribution of the parameter estimates mean convergence? Because once a draw of the parameter estimates from the target distribution has been obtained during the simulation, all subsequent draws will also come from that distribution. This means that, for subsequent draws $\theta_t$ and $\theta_{t+k}$, where $k$ is any integer, the joint distribution of the parameter estimates $(\theta_{t_1}, \ldots, \theta_{t_n})$ is the same as that of $(\theta_{t_1+k}, \ldots, \theta_{t_n+k})$ for all $n$ and $k$, given $t_1, \ldots, t_n$. Hence, the posterior distribution becomes stable, and we use this criterion as an indicator of the convergence of Bayesian iterative simulation methods.

As an aside, the convergence diagnostics for Bayesian iterative simulation methods applied to the imputation of missing data are no different from those for any other Bayesian iterative simulation that does not involve missing data. Regardless of whether there is any missing data, the convergence diagnostics all focus on measuring changes in the parameter estimates of the posterior distribution. Theoretically, we could check the convergence of the distribution of the individual missing data points themselves, because the Bayesian method treats both the individual missing data and the parameters as random quantities. In practice, however, a dataset normally has a large number of missing values, so checking convergence for each missing data point would be very hard and complex.

7.4.2 Pre-convergence: the burn-in period

The “burn-in” refers to the part of a Bayesian iterative simulation chain where the current state of the chain still depends on its starting point (Sahlin 2011). In other words, it is the part of the chain before convergence. Researchers normally discard the burn-in iterations and base their calculations on the iterates from the converged part of the chain.
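Discarding the burn-in can be sketched in a few lines. This is only an illustration under stated assumptions: `simulate_chain` is a hypothetical AR(1) toy chain standing in for a real sampler, and the burn-in length of 200 is an arbitrary choice for this toy example, not a recommended value.

```python
import random
import statistics


def simulate_chain(n_iter, start, target_mean=0.0, rho=0.9, seed=1):
    """Toy AR(1) chain (an assumption for illustration, not a real posterior sampler)."""
    rng = random.Random(seed)
    theta = start
    draws = []
    for _ in range(n_iter):
        # Each draw depends on the previous one, so early draws remember `start`.
        theta = target_mean + rho * (theta - target_mean) + rng.gauss(0.0, 1.0)
        draws.append(theta)
    return draws


def discard_burn_in(draws, burn_in):
    """Drop the first `burn_in` iterations and keep the rest."""
    return draws[burn_in:]


# Start far from the centre of the stationary distribution (mean 0).
chain = simulate_chain(n_iter=2000, start=500.0)
kept = discard_burn_in(chain, burn_in=200)

mean_all = statistics.fmean(chain)   # pulled towards the starting value
mean_kept = statistics.fmean(kept)   # based on post-burn-in draws only
print(f"mean with burn-in: {mean_all:.2f}, mean without burn-in: {mean_kept:.2f}")
```

In this toy setting the early draws sit far above the rest of the chain, so a summary that includes them is pulled towards the starting point, while the summary based on `kept` is not.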

7.4.3 Some Popular Methods of Convergence Diagnostics

Time Series Plots: Intuitively, the simplest way to look for convergence of a Bayesian iterative simulation is to plot the simulated parameter estimates against the iteration number, as in a time series plot. The vertical axis shows the values of the parameter estimates, and the horizontal axis shows the iteration number, which plays the role of time. We then inspect the plot to see whether, and beyond which iteration, the series becomes stationary.

A burning question is: “How many iterations do we need before the series turns stationary?” The answer is that we do not know in advance. Hence, we can only increase the number of iterations until the series becomes stable and stationary. According to Enders (2010, p. 206), if the series becomes stable and stationary at iteration t, we normally double or triple t for an extra margin of safety.
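When a plot is not at hand, the visual check can be roughly approximated numerically. The sketch below is only an illustration under stated assumptions: `toy_chain` is a hypothetical AR(1) stand-in for a sampler, and `first_stable_iteration`, with its `block_size` and `tol` parameters, is an ad hoc rule of thumb introduced here, not a method from the source. Only the final doubling step reflects the margin of safety attributed to Enders (2010).

```python
import random
import statistics


def toy_chain(n_iter, start, rho=0.9, seed=7):
    """Hypothetical AR(1) chain with stationary mean 0, used as a stand-in sampler."""
    rng = random.Random(seed)
    theta = start
    draws = []
    for _ in range(n_iter):
        theta = rho * theta + rng.gauss(0.0, 1.0)
        draws.append(theta)
    return draws


def block_means(draws, block_size):
    """Mean of each consecutive, non-overlapping block of `block_size` iterations."""
    return [
        statistics.fmean(draws[i:i + block_size])
        for i in range(0, len(draws) - block_size + 1, block_size)
    ]


def first_stable_iteration(draws, block_size, tol):
    """Ad hoc rule: start of the first block whose mean is within `tol`
    of the average of all later block means. Returns None if none qualifies."""
    means = block_means(draws, block_size)
    for b in range(len(means) - 1):
        later = statistics.fmean(means[b + 1:])
        if abs(means[b] - later) <= tol:
            return b * block_size
    return None


chain = toy_chain(n_iter=2000, start=500.0)
t_stable = first_stable_iteration(chain, block_size=100, tol=5.0)

# Double t for an extra margin of safety (tripling would use 3 instead of 2).
planned_iterations = None if t_stable is None else 2 * t_stable
print(t_stable, planned_iterations)
```

The block means play the role of a coarse trace plot: the first block is dominated by the starting value, and later blocks level off. The choice of `block_size` and `tol` is as subjective as reading the plot itself.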

Figure 7.4 displays two time series plots of the simulated SURF’s Income means. If a series has no obvious trend and stays stable and stationary after some iteration, we say that it has possibly converged. The figure shows that the distribution of the income means simulated by the MH algorithm (the top plot) does not converge even after 10000 iterations. On the other hand, the distribution of the income means generated by the Gibbs sampler (the bottom plot) converges very quickly.

[Figure 7.4, top panel: “The MH algorithm − Income means”; horizontal axis: iteration (0 to 10000); vertical axis: Mean.]

[Figure 7.4, bottom panel: “The Gibbs sampler − Income means”; horizontal axis: iteration (0 to 1000); vertical axis: Mean.]

Figure 7.4: Time series plots for the simulated SURF’s Income mean. The top plot shows the results of the MH algorithm; the bottom plot shows the results of the Gibbs sampler.

Gelman and Rubin’s method: Time series plots are simple and easy-to-use convergence diagnostics, but they are more or less subjective⁶. More importantly, in practice they are only suitable for a single sequence of the Bayesian iterative simulation, because it would be very tedious to plot multiple sequences for each parameter whose convergence we want to check. But why would we want to run multiple sequences? Because the value to which a single sequence converges might correspond to a local maximum instead of the global maximum if the posterior distribution is not unimodal⁷ (Hoeschele 1989). Hence, time series plots and autocorrelation function plots are only recommended for well-understood models and straightforward data sets (Little & Rubin 2002, pg. 206).

For less well-understood models and complex data sets, or for known and unknown distributions in general, Gelman & Rubin (1992) propose a general approach to monitoring the convergence of Bayesian iterative simulation methods: simulate D > 1 sequences with starting values dispersed throughout the parameter space. This means that the starting values for the parameter estimates can be far away from the centre of their respective posterior distributions. Convergence is then declared if the variation between the D simulated sequences and the variation within them are roughly equal. Convergence monitored in this way has a reduced risk of corresponding to a local mode, for two reasons.

⁶ The decision of convergence depends on the shape of the plots.

⁷ A unimodal probability distribution is a probability distribution which has a single mode. A mode is the

The first reason is that D dispersed starting points increase the chance that different sequences reach different local modes if the posterior distribution is not unimodal. The second reason is that the between-sequence variation would not equal the within-sequence variation if each sequence had only converged to its own local maximum. In addition, a single sequence might, by chance, have a starting value that is very close to or very far away from the centre of the posterior distribution, which makes convergence look either too fast or too slow. Hence, multiple sequences provide a more conservative guess of the convergence speed than a single sequence (Enders 2010, pg. 209).
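The two reasons can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not the setup used in this chapter: the bimodal target (an equal mixture of N(-10, 1) and N(10, 1)), the random-walk Metropolis sampler, and the names `log_target` and `random_walk_metropolis` are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def log_target(theta):
    """Log density (up to a constant) of an assumed bimodal target:
    an equal mixture of N(-10, 1) and N(10, 1)."""
    a = -0.5 * (theta + 10.0) ** 2
    b = -0.5 * (theta - 10.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def random_walk_metropolis(start, n_iter, step_sd=1.0, seed=0):
    """Random-walk Metropolis chain with a symmetric normal proposal."""
    rng = random.Random(seed)
    theta = start
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step_sd)
        log_ratio = log_target(proposal) - log_target(theta)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        draws.append(theta)
    return draws


# Two sequences with dispersed starting values.
low = random_walk_metropolis(start=-30.0, n_iter=5000, seed=1)
high = random_walk_metropolis(start=30.0, n_iter=5000, seed=2)

mean_low, mean_high = statistics.fmean(low), statistics.fmean(high)
print(f"chain means: {mean_low:.2f} and {mean_high:.2f}")
print(f"within-chain sd: {statistics.stdev(low):.2f} and {statistics.stdev(high):.2f}")
```

Each sequence looks stable on its own, yet the two settle around different modes, so the spread between the chain means is much larger than the spread within either chain. A single sequence would not reveal this.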

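The actual method is rather straightforward. Suppose we have $D$ sequences, each with $T$ iterations, indexed by $d = 1, \ldots, D$ and $t = 1, \ldots, T$. Then the between-sequence variance is

\[
B = \frac{T}{D-1} \sum_{d=1}^{D} \left( \bar{\theta}_{.d} - \bar{\theta}_{..} \right)^2,
\quad \text{where} \quad
\bar{\theta}_{.d} = \frac{1}{T} \sum_{t=1}^{T} \hat{\theta}_{t,d},
\qquad
\bar{\theta}_{..} = \frac{1}{D} \sum_{d=1}^{D} \bar{\theta}_{.d},
\]

and the within-sequence variance is

\[
\bar{V} = \frac{1}{D} \sum_{d=1}^{D} s_d^2,
\quad \text{where} \quad
s_d^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( \hat{\theta}_{t,d} - \bar{\theta}_{.d} \right)^2.
\]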

Then we estimate the overall variance $\hat{V}_{total}$ by a weighted average of the within and between variances:

\[
\hat{V}_{total} = \frac{T-1}{T} \bar{V} + \frac{1}{T} B \qquad (7.9)
\]

If the chains have not converged, the first term on the right-hand side of Equation (7.9) underestimates the variance, since the individual chains have not had time to range over the whole stationary distribution, and the second term overestimates the variance, since the starting points were chosen to be dispersed. As a result, the within variance ($\bar{V}$) should be smaller than the between variance ($B$) (Gelman et al. 1995, pg. 332). However, as $T \to \infty$ in Equation (7.9), the first term $\frac{T-1}{T} \bar{V} \to \bar{V}$ and the second term $\frac{1}{T} B \to 0$, so $\hat{V}_{total} \approx \bar{V}$; that is, the expectation of the within variance ($\bar{V}$) approaches the total variance ($V_{total}$). Therefore, Gelman & Rubin (1992) establish a single explicit monitoring statistic $R$ that compares $\hat{V}_{total}$ and $\bar{V}$:

\[
R = \sqrt{\frac{\hat{V}_{total}}{\bar{V}}},
\]

which declines to 1 as $T \to \infty$. So if $R$ is close to 1, we have convergence. Otherwise, either the simulation runs should be continued, or the result suggests that the simulation algorithm is not efficient.
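The statistic can be computed directly from the definitions of $B$, $\bar{V}$ and Equation (7.9). The following is a minimal sketch, not a reference implementation: the function name `gelman_rubin`, the hypothetical AR(1) `toy_chain` used as a stand-in sampler, and the particular starting values are all assumptions made for illustration.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """R = sqrt(V_total / V_bar) for D chains of equal length T."""
    D = len(chains)
    T = len(chains[0])
    if D < 2 or T < 2 or any(len(c) != T for c in chains):
        raise ValueError("need D > 1 chains of equal length T > 1")
    chain_means = [statistics.fmean(c) for c in chains]      # theta_bar_{.d}
    grand_mean = statistics.fmean(chain_means)               # theta_bar_{..}
    # Between-sequence variance B.
    B = T / (D - 1) * sum((m - grand_mean) ** 2 for m in chain_means)
    # Within-sequence variance V_bar: average of the sample variances s_d^2.
    V_bar = statistics.fmean([statistics.variance(c) for c in chains])
    # Equation (7.9): weighted average of within and between variances.
    V_total = (T - 1) / T * V_bar + B / T
    return math.sqrt(V_total / V_bar)


def toy_chain(n_iter, start, rho=0.9, seed=0):
    """Hypothetical AR(1) chain with stationary mean 0, used as a stand-in sampler."""
    rng = random.Random(seed)
    theta = start
    draws = []
    for _ in range(n_iter):
        theta = rho * theta + rng.gauss(0.0, 1.0)
        draws.append(theta)
    return draws


starts = [-500.0, -100.0, 100.0, 500.0]   # dispersed starting values (assumed)
short_runs = [toy_chain(50, s, seed=i) for i, s in enumerate(starts)]
long_runs = [toy_chain(5000, s, seed=i) for i, s in enumerate(starts)]

r_short = gelman_rubin(short_runs)
r_long = gelman_rubin(long_runs)
print(f"R after 50 iterations: {r_short:.3f}; after 5000 iterations: {r_long:.3f}")
```

With only a few iterations the chains are still dominated by their dispersed starting values, so $B/T$ is large relative to $\bar{V}$ and $R$ sits well above 1. With many more iterations $R$ moves towards 1, in line with the limit argument above.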

7.5 Applying Gibbs sampler to multiple regression with miss-
