
CHAPTER 1. INTRODUCTION AND BACKGROUND MATERIAL 13

Convergence: The chain is unlikely to have been initialised from its stationary distribution (since if this were straightforward there would be no need for MCMC) and so a certain number of iterations are required for elements of the chain to be samples from the target distribution π(·).

Mixing: Once stationarity has been achieved the chain produces dependent identically distributed samples from π(·). A certain number of iterations are required to explore the target well enough to produce Monte Carlo estimates of the desired accuracy. This number of iterations is in general more than would be necessary if the elements of the chain were independent.

Note: in practice most chains never achieve perfect stationarity. In this thesis a chain is referred to as having ‘reached stationarity’ or ‘converged’ when the distribution from which an element is sampled is so close to the stationary distribution as to make no practical difference to any Monte-Carlo estimates.

For an efficient algorithm both the number of iterations required for convergence and the number of iterations needed to explore the target should be relatively small. Figure 1.2 shows so-called “traceplots” of the first 1000 iterations for each of three chains exploring a standard one-dimensional Gaussian target distribution π(x) = φ(x) and initialised at x = 20. The first of these converges slowly and then mixes poorly; the second converges quickly but mixes poorly; and the third converges relatively quickly and mixes well.
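Chains of the kind shown in Figure 1.2 can be reproduced with a short simulation. The following is a minimal sketch, not the code used for the figure: it assumes that the proposal scale parameter λ is the standard deviation of the Gaussian random walk increment, and the function name `rwm_gaussian` and the seed are hypothetical choices made for illustration.

```python
import numpy as np


def rwm_gaussian(n_iter, scale, x0, seed=0):
    """Random walk Metropolis chain targeting a standard Gaussian.

    Assumption: `scale` is the standard deviation of the Gaussian proposal
    increment. Returns an array of length n_iter + 1 whose first entry is x0.
    """
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter + 1)
    chain[0] = x0
    x = x0
    for i in range(1, n_iter + 1):
        proposal = x + scale * rng.standard_normal()
        # Log of the ratio phi(proposal) / phi(x) for the standard Gaussian.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if np.log(rng.uniform()) < log_ratio:
            x = proposal  # accept the proposed move
        chain[i] = x      # on rejection the current value is repeated
    return chain


# The three scales quoted for panels (a), (b) and (c) of Figure 1.2.
for panel, scale in [("a", 0.24), ("b", 24.0), ("c", 2.4)]:
    chain = rwm_gaussian(1000, scale, x0=20.0)
    accept_rate = np.mean(chain[1:] != chain[:-1])
    print(panel, scale, round(float(accept_rate), 3))
```

Plotting each returned array against its index gives a trace plot. The printed acceptance rates give a rough numerical counterpart to the visual impression: a very small scale accepts almost every move but takes tiny steps, while a very large scale rejects most proposals.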

In Chapter 2 of this thesis we will be concerned with practical determination of a point at which a chain has converged. The method we employ is simple heuristic examination of the trace plots for the different components of the chain. Note that

[Figure 1.2: three trace-plot panels, (a) λ = 0.24, (b) λ = 24, (c) λ = 2.4. Horizontal axis: index in chain (0 to 1000); vertical axis: x (0 to 20).]

Figure 1.2: Trace plots for exploration of a standard Gaussian initialised from x = 20 and using the random walk Metropolis algorithm with Gaussian proposal. Proposal scale parameters for the three plots were respectively (a) 0.24, (b) 24, and (c) 2.4.


since the state space is multi-dimensional it is not sufficient to simply examine a single component. Alternative techniques are discussed in Chapter 7 of Gilks et al. (1996).

The degree to which a chain has converged can be measured by the total variational distance. For two distributions $\nu_1$ and $\nu_2$ on state space $E$ with sigma algebra $\sigma(E)$, this is defined as

$$\|\nu_1 - \nu_2\| := 2 \sup_{A \in \sigma(E)} |\nu_1(A) - \nu_2(A)|$$

A measure of the degree of convergence of a chain initialised at $x$ and run for $n$ iterations is therefore $\|P^n(x,\cdot) - \pi(\cdot)\|$, where $P^i(x,\cdot)$ is the distribution of a chain after $i$ iterations from initial point $x$.
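On a finite state space this distance can be computed exactly, because with the factor-of-two convention above it reduces to the sum of absolute differences between the two probability vectors. The sketch below tracks the distance between the n-step distribution and the stationary distribution for a small chain. The 3-state transition matrix and all function names are hypothetical, chosen only for illustration.

```python
import numpy as np


def tv_distance(p, q):
    """2 * sup_A |p(A) - q(A)|, which on a finite space equals sum_k |p_k - q_k|."""
    return float(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum())


def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def tv_to_stationarity(P, x, n_max):
    """Distances ||P^n(x, .) - pi(.)|| for n = 0, ..., n_max."""
    pi = stationary_distribution(P)
    dist = np.zeros(P.shape[0])
    dist[x] = 1.0                  # point mass at the initial state x
    out = []
    for _ in range(n_max + 1):
        out.append(tv_distance(dist, pi))
        dist = dist @ P            # advance the distribution by one iteration
    return np.array(out)


# Hypothetical transition matrix (rows sum to 1), used only as an example.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

d = tv_to_stationarity(P, x=0, n_max=50)
print(d[:5])
print(d[1:] / d[:-1])  # ratios of successive distances
```

For this example the ratios of successive distances settle near the second-largest eigenvalue of the matrix, 0.9, so the distance shrinks by a roughly constant factor at each iteration.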

Theoretical criteria for ensuring convergence of MCMC Markov chains are examined in detail in Chapters 3 and 4 of Gilks et al. (1996) and references therein, and will not be discussed here. We do however wish to highlight the concept of geometric ergodicity. A Markov chain is geometrically ergodic with stationary distribution π(·) if

$$\|P^n(x,\cdot) - \pi(\cdot)\| \le M(x)\, r^n \qquad (1.4)$$

for some positive r < 1 and M(·). Geometric convergence of the Gibbs sampler and of the RWM is discussed in Chapter 3 of Gilks et al. (1996). As well as relating to the speed of convergence, geometric ergodicity also guarantees a central limit theorem for Monte Carlo estimates such as (1.1) for functions f(·) such that

$$\int \pi(dx)\, |f(x)|^{2+\epsilon} < \infty \qquad (1.5)$$

for some small $\epsilon > 0$. In this case

$$N^{1/2}\left(\hat{f}_N - E_\pi[f(X)]\right) \Rightarrow N(0, \sigma_f^2) \qquad (1.6)$$

where $\Rightarrow$ denotes convergence in distribution and $\sigma_f^2 := \mathrm{Var}_\pi[f(X)] < \infty$.

The central limit theorem (1.6) guarantees not only convergence of the Monte Carlo estimate (1.1) but also supplies its standard error, which decreases as $N^{-1/2}$.

Note: Total variation distance is a natural measure for defining convergence since (e.g. Meyn and Tweedie, 1993) geometric convergence as defined in (1.4) actually guarantees the given level of convergence for $f(X)$ for all integrable functions $f(\cdot)$. For more general functions, other distance measures may be used to define convergence, for example the f-norm and the V-norm, which is defined in terms of the f-norm (e.g. Meyn and Tweedie, 1993):

$$\|\nu_1 - \nu_2\|_f := \sup_{g : |g| \le f} \left| \int d\nu_1(x)\, g(x) - \int d\nu_2(x)\, g(x) \right|$$

$$|||P_1(x,\cdot) - P_2(x,\cdot)|||_V := \sup_x \frac{\|P_1(x,\cdot) - P_2(x,\cdot)\|_V}{V(x)}$$

for some $f \ge 0$, some $1 \le V < \infty$, and a chain initialised at $x$. In this thesis, however, our interest in the convergence of a chain is motivated by the desire for a central limit theorem such as (1.6); this theorem is used implicitly in Chapter 2. The likelihood of an MMPP with maximum and minimum Poisson intensities $\lambda_{\max}$ and $\lambda_{\min}$, and with $n$ events observed over a time window of length $t_{\mathrm{obs}}$, is bounded above by $\lambda_{\max}^{n} e^{-\lambda_{\min} t_{\mathrm{obs}}}$. In Chapter 2 only parameters and their logarithms are


We therefore make no further discussion of norms.

A more accurate estimate than (1.1) is likely to be obtained by discarding the portion of the chain $X_0, \dots, X_m$ up until the point at which it was deemed to have reached stationarity; iterations $1, \dots, m$ are commonly termed “burn in”. Using only the remaining elements $X_{m+1}, \dots, X_{m+n}$ (with $m + n = N$) our Monte Carlo estimator becomes

$$\hat{f}_n := \frac{1}{n} \sum_{i=m+1}^{m+n} f(X_i) \qquad (1.7)$$
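In code, this estimator amounts to dropping the first m + 1 stored values before averaging. The sketch below assumes the chain is held in an array whose entry at index 0 is X_0; the function name `burn_in_estimate` and the toy chain are hypothetical.

```python
import numpy as np


def burn_in_estimate(chain, m, f=lambda x: x):
    """Average f(X_i) over i = m+1, ..., m+n, discarding X_0, ..., X_m.

    Assumption: chain[0] holds X_0, so the retained elements are chain[m+1:].
    """
    kept = np.asarray(chain, dtype=float)[m + 1:]
    if kept.size == 0:
        raise ValueError("burn-in length m leaves no samples to average")
    return float(np.mean(f(kept)))


# Toy chain whose first two values are far from the bulk of the rest.
toy_chain = [20.0, 10.0, 1.0, -1.0, 1.0, -1.0]
print(np.mean(toy_chain))              # average over every element
print(burn_in_estimate(toy_chain, 1))  # average after discarding X_0 and X_1
```

The choice of m is not made by the code; in the approach described here it comes from inspecting the trace plots.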

Convergence and burn in are not discussed any further here and for the rest of this section the chain is assumed to have started at stationarity and continued for n further iterations. For a stationary chain, X0 is sampled from π(·), and so for all

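$k > 0$ and $i \ge 0$

$$\mathrm{Cov}[f(X_k), f(X_{k+i})] = \mathrm{Cov}[f(X_0), f(X_i)]$$

This is the autocovariance at lag $i$. Therefore at stationarity

$$\sigma_f^2 = \lim_{n \to \infty} n\, \mathrm{Var}\left[\hat{f}_n\right] = \mathrm{Var}[f(X_0)] + 2 \sum_{i=1}^{\infty} \mathrm{Cov}[f(X_0), f(X_i)]$$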

If elements of the stationary chain were independent then $\sigma_f^2$ would simply be $\mathrm{Var}[f(X_0)]$, and so a measure of the inefficiency of the Monte-Carlo estimate $\hat{f}_n$ relative to the perfect i.i.d. sample is

$$\frac{\sigma_f^2}{\mathrm{Var}[f(X_0)]} = 1 + 2 \sum_{i=1}^{\infty} \mathrm{Corr}[f(X_0), f(X_i)] \qquad (1.8)$$

This is the integrated autocorrelation time (ACT) and represents the effective number of dependent samples that is equivalent to a single independent sample. Alternatively, $n^* = n/\mathrm{ACT}$ may be regarded as the effective equivalent sample size had the elements of the chain been independent.
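As a worked illustration, consider a hypothetical stationary Gaussian AR(1) chain (not one of the chains studied in this thesis) with $f(x) = x$, for which the lag-$i$ autocorrelation is $\rho^i$. The series in (1.8) is then geometric and the ACT has a closed form:

```latex
% Assumed example: X_{i+1} = \rho X_i + \sqrt{1-\rho^2}\, Z_i with Z_i ~ N(0,1) i.i.d.,
% X_0 ~ N(0,1) and |\rho| < 1, so that Corr[X_0, X_i] = \rho^i.
\mathrm{ACT} = 1 + 2\sum_{i=1}^{\infty} \rho^{i}
             = 1 + \frac{2\rho}{1-\rho}
             = \frac{1+\rho}{1-\rho}.
% With \rho = 0.9 this gives ACT = 1.9 / 0.1 = 19, so a chain of
% n = 1000 elements is worth about n^* = 1000 / 19 \approx 52.6 independent samples.
```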

To estimate the ACT in practice one might examine the chain from the point at which it is deemed to have converged and estimate the lag-$i$ autocorrelation $\mathrm{Corr}[f(X_0), f(X_i)]$ by

$$\hat{\gamma}_i = \frac{1}{n-i} \sum_{j=1}^{n-i} \left(f(X_j) - \hat{f}_n\right)\left(f(X_{j+i}) - \hat{f}_n\right) \qquad (1.9)$$

Naively, substituting these into (1.8) gives an estimate of the ACT. But as noted, for example, in Geyer (1992), this estimate is not even consistent. For sensibly large $n$, most of the estimated terms (1.9) consist of the mean of products of two effectively independent realisations of $f(X) - \hat{f}_n$, and have finite variance $O(1/(n-i))$. This is evident in Figure 1.3c, which shows the estimated autocorrelation function from the last 700 iterations of the simulated chain in Figure 1.2(c). The sum of these terms consists of random noise with variance at least $O(1)$.

The simple solution employed in Chapter 2 is to visually inspect the estimated autocorrelations and then truncate the sum (1.8) at a lag $l$ after which the autocorrelations appear to be mostly noise. This gives the estimator

$$\mathrm{ACT}_{\mathrm{est}} := 1 + 2 \sum_{i=1}^{l} \hat{\gamma}_i \qquad (1.10)$$

Geyer (1992) gives references for regularity conditions under which this estimator is consistent. He also discusses extensions of this window estimator and compares these with alternatives.
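The truncated estimator (1.10) can be sketched as follows. Two assumptions are made that are not in the text: each lagged product sum is divided by the lag-0 sample variance so that the terms are correlations rather than covariances, and the truncation lag is passed in by hand rather than chosen by visual inspection. The AR(1) simulator is a hypothetical test case with a known ACT, and all function names are illustrative.

```python
import numpy as np


def autocorr_estimates(fx, max_lag):
    """Estimated autocorrelations at lags 1..max_lag.

    Assumption: each lagged mean of products (divisor n - i) is normalised
    by the lag-0 sample variance so that the result is a correlation.
    """
    fx = np.asarray(fx, dtype=float)
    n = fx.size
    centred = fx - fx.mean()
    var0 = np.mean(centred ** 2)
    est = np.empty(max_lag)
    for i in range(1, max_lag + 1):
        est[i - 1] = np.sum(centred[:n - i] * centred[i:]) / (n - i) / var0
    return est


def act_window(fx, lag):
    """Window estimate: 1 + 2 * (sum of estimated autocorrelations up to `lag`)."""
    return 1.0 + 2.0 * float(autocorr_estimates(fx, lag).sum())


def simulate_ar1(n, rho, seed=0):
    """Hypothetical test chain: stationary Gaussian AR(1) with correlation rho."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n) * np.sqrt(1.0 - rho ** 2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for j in range(1, n):
        x[j] = rho * x[j - 1] + noise[j]
    return x


# For rho = 0.5 the exact ACT is (1 + rho) / (1 - rho) = 3.
x = simulate_ar1(20000, 0.5)
print(act_window(x, 20))    # truncated at a lag where correlations are negligible
print(act_window(x, 5000))  # many more noisy terms are included in the sum
```

Comparing the two printed values gives a feel for the trade-off in choosing the truncation lag: a short window omits little signal when the correlations decay quickly, whereas a long window mostly adds noise.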

[Figure 1.3: three autocorrelation-function panels, (a) λ = 0.24, (b) λ = 24, (c) λ = 2.4. Horizontal axis: lag (0 to 60); vertical axis: ACF (−0.2 to 1.0).]

Figure 1.3: Estimated autocorrelation functions up to lag-60 for iterations 301 to 1000 of the trace plots shown in Figure 1.2. Graphs correspond to proposal scale parameters of respectively (a) 0.24, (b) 24, and (c) 2.4.
