

The aim of the Metropolis-Hastings algorithm is to sample the parameter space of a function in such a way that the sampled values follow a prescribed distribution. Consider a space $B \subset \mathbb{R}^n$, from which samples are taken, and a function termed the transition kernel, $p : B \times \mathcal{B}(B) \to [0,1]$, where $p(x, A)$ is the probability that a point $x \in B$ is mapped into the set $A \in \mathcal{B}(B)$. Here $\mathcal{B}(X)$ denotes the Borel field of the set $X$ (roughly, the collection of subsets of $X$ to which a probability can be assigned).

A constraint is introduced by a distribution $\pi^*(\cdot) : \mathcal{B}(\mathbb{R}^n) \to [0,1]$ with density $\pi(\cdot)$, describing the regions in which a value is most likely to be found. The relationship between $\pi$ and $p$ can be expressed as

$$\pi^*(dy) = \int_{\mathbb{R}^n} p(x, dy)\,\pi(x)\,dx. \tag{6.4}$$

This is equivalent to saying that $\pi^*$ is the invariant distribution of $p(\cdot,\cdot)$.

6.3.1 A useful result

Tierney [110] proved the following. If $p$ is reversible, that is,

$$\pi(x)\,p(x, y) = \pi(y)\,p(y, x), \tag{6.5}$$

then for all $A \in \mathcal{B}(\mathbb{R}^n)$

$$\int_B p(x, A)\,\pi(x)\,dx = \int_A \pi(y)\,dy, \tag{6.6}$$

which is equivalent to stating that $\pi^*$ is the invariant distribution of $p(\cdot,\cdot)$.
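The step from reversibility (6.5) to invariance (6.6) can be sketched as follows. This is only an outline under simplifying assumptions that the text does not state: the kernel is taken to have a density, so that $p(x, A) = \int_A p(x, y)\,dy$ with $A \subseteq B$, and $\int_B p(y, x)\,dx = 1$ for every $y$.

```latex
\begin{align*}
\int_B p(x, A)\,\pi(x)\,dx
  &= \int_B \int_A \pi(x)\,p(x, y)\,dy\,dx \\
  &= \int_A \int_B \pi(y)\,p(y, x)\,dx\,dy
     && \text{by (6.5), exchanging the order of integration} \\
  &= \int_A \pi(y) \left( \int_B p(y, x)\,dx \right) dy \\
  &= \int_A \pi(y)\,dy .
\end{align*}
```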

This result is key to the Metropolis algorithm, as explained in section 6.4. The problem is now one of finding a reversible function.

6.3.2 The algorithm

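[76, 44, 110, 15, 112] If a putative transition kernel $q(x, y)$ is proposed, it is unlikely to satisfy the reversibility condition (6.5). Instead, for some pair $(x, y)$ we may find, for example, that

$$\pi(x)\,q(x, y) > \pi(y)\,q(y, x). \tag{6.7}$$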

A function $\alpha(x, y)$ can be constructed to correct for the difference, so that the resulting kernel is reversible. The Metropolis-Hastings probability function is defined as

$$P_{MH}(x, y) := q(x, y)\,\alpha(x, y) \tag{6.8}$$

for $y \neq x$, and the probability of remaining at $x$ is

$$\alpha(x, x) = 1 - \int_B P_{MH}(x, y)\,dy. \tag{6.9}$$

Since, by the definition of $\alpha$, the system is reversible,

$$\pi(x)\,q(x, y)\,\alpha(x, y) = \pi(y)\,q(y, x)\,\alpha(y, x). \tag{6.10}$$

By equation (6.7), $\alpha(y, x)$ should be made as large as possible. Since it is a probability it cannot exceed 1, so setting $\alpha(y, x) = 1$ gives

$$\pi(x)\,q(x, y)\,\alpha(x, y) = \pi(y)\,q(y, x). \tag{6.11}$$

Rearranging,

$$\alpha(x, y) = \frac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)}. \tag{6.12}$$

Hence we define

$$\alpha(x, y) = \min\left\{ \frac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},\, 1 \right\}. \tag{6.13}$$

A Markov chain can be built using this function. Any initial value $x_1$ may be chosen. For any point $x_i$ in the chain, a candidate $y$ for the next value $x_{i+1}$ is proposed at random and accepted with probability $\alpha(x_i, y)$, in which case $x_{i+1} = y$. If it is not accepted, $x_{i+1} = x_i$. By the properties of $\alpha$ these values are generated by a reversible transition kernel and hence, by Tierney's result (equation (6.6)), are sampled from the invariant distribution of $p(x, y)$, as required.
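The construction above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the target density $\pi(x) \propto e^{-x}$ on $x > 0$, the multiplicative log-normal proposal, the scale `SIGMA`, and all function names are assumptions chosen only because they give an asymmetric $q$, so that the ratio $q(y, x)/q(x, y)$ in (6.13) does not cancel.

```python
import math
import random


def metropolis_hastings(log_pi, propose, log_q, x_start, n_steps, rng):
    """Build a Markov chain using the acceptance probability of (6.13).

    log_pi(x)       log of the target density pi(x), up to an additive constant
    propose(x, rng) draws a candidate y from q(x, .)
    log_q(x, y)     log of the proposal density q(x, y)
    """
    chain = [x_start]
    x = x_start
    for _ in range(n_steps):
        y = propose(x, rng)
        # log of pi(y) q(y, x) / (pi(x) q(x, y))
        log_ratio = (log_pi(y) + log_q(y, x)) - (log_pi(x) + log_q(x, y))
        alpha = math.exp(min(0.0, log_ratio))  # equation (6.13)
        if rng.random() < alpha:
            x = y  # accepted: x_{i+1} = y
        chain.append(x)  # a rejected move repeats the current value
    return chain


# Illustrative target (an assumption): pi(x) proportional to exp(-x) on x > 0.
def log_pi(x):
    return -x


SIGMA = 1.0  # hypothetical proposal scale


# Asymmetric proposal: multiply x by a log-normal factor, so q(x, y) != q(y, x).
def propose(x, rng):
    return x * math.exp(SIGMA * rng.gauss(0.0, 1.0))


def log_q(x, y):
    z = (math.log(y) - math.log(x)) / SIGMA
    return -math.log(y) - math.log(SIGMA) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


rng = random.Random(0)
chain = metropolis_hastings(log_pi, propose, log_q, 1.0, 20000, rng)
sample_mean = sum(chain) / len(chain)
print(sample_mean)  # the target density exp(-x) has mean 1
```

Working with logarithms of $\pi$ and $q$ avoids underflow when the densities are very small; only the final acceptance probability is exponentiated.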

6.3.3 Irreducibility and aperiodicity

Two important properties for convergence of Monte Carlo methods are irreducibility and aperiodicity. Here they are defined and their importance outlined. Consider a measure $\chi$ and a transition kernel $P : E \times \mathcal{B}(E) \to [0,1]$.

$P$ is defined to be $\chi$-irreducible if, for each $x \in E$ and each $A \in \mathcal{B}(E)$ with $\chi(A) > 0$, there exists $n \in \mathbb{N}$ such that $P^{(n)}(x, A) > 0$, where $P^{(n)}$ is the $n$-step kernel. This states that, regardless of the starting position, iterations of the transition kernel have a positive probability of eventually reaching every set of positive $\chi$-measure in $E$.

$P$ is defined to be periodic if there exist $d \in \mathbb{N}$, $d \geq 2$, and a sequence $(E_j)_{j=0}^{d-1}$ of non-empty disjoint subsets of $E$ such that, for each $i = 0, \ldots, d-1$ and each $x \in E_i$, $P(x, E_j) = 1$ where $j = i + 1 \bmod d$. In other words, there is some sequence of subsets of the space that the transition kernel cycles round in order. $P$ is aperiodic if it is not periodic.
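A small finite-state example may make the definition of periodicity concrete. The two transition matrices below are hypothetical and chosen only for illustration: the first cycles deterministically through three states ($d = 3$), and the second is a "lazy" variant that stays put with probability 1/2, which breaks the cycle. Both are irreducible.

```python
def step(dist, P):
    """One application of the kernel: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def n_step_from(state, P, n_steps):
    """Distribution after n_steps moves when starting in a single state."""
    dist = [1.0 if i == state else 0.0 for i in range(len(P))]
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Periodic kernel with d = 3: state 0 -> 1 -> 2 -> 0 with probability 1,
# so E_0 = {0}, E_1 = {1}, E_2 = {2} in the notation of the definition.
P_periodic = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

# Hypothetical lazy variant: stay put with probability 1/2.
P_lazy = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

for n in (30, 31, 32):
    print(n, n_step_from(0, P_periodic, n), n_step_from(0, P_lazy, n))
```

The periodic chain keeps all of its probability on a single state that depends on $n \bmod 3$, so its $n$-step distribution never settles, whereas the lazy chain approaches the uniform distribution on the three states.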

Tierney (1994) [110] has shown that, under the conditions of irreducibility and aperiodicity, the following is true. For a transition kernel $P$ with invariant distribution $\pi^*$, define

$$P^{(n)}(x, A) := \int_{\mathbb{R}^n} P^{(n-1)}(x, dy)\,P(y, A), \tag{6.14}$$

then

$$\lim_{n \to \infty} P^{(n-1)}(x, dy) \sim \pi^*(dy). \tag{6.15}$$

This means that iteration of a correct transition kernel will tend towards the invariant distribution. This property underpins Markov chain Monte Carlo, since it allows a Markov chain to be constructed from successive iterations of the transition kernel. Using the function $\alpha$ from the previous section, a putative transition kernel can be corrected so that the iterations eventually sample from the invariant distribution. Note that there is no general principle to determine the rate of convergence.
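On a finite state space the kernel is a matrix, so the limit (6.15) can be checked directly. The sketch below is illustrative only: the three-state target `pi` and the asymmetric proposal matrix `q` are made-up values, and `mh_kernel` and `mat_mul` are hypothetical helper names. It builds the corrected kernel from (6.8), (6.9) and (6.13), checks reversibility (6.5), and iterates the recursion (6.14).

```python
def mh_kernel(pi, q):
    """Metropolis-Hastings transition matrix on a finite state space.

    pi[i]   target probabilities (the desired invariant distribution)
    q[i][j] proposal probabilities, each row summing to 1
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and q[i][j] > 0.0:
                alpha = min(1.0, (pi[j] * q[j][i]) / (pi[i] * q[i][j]))  # (6.13)
                P[i][j] = q[i][j] * alpha  # (6.8)
        P[i][i] = 1.0 - sum(P[i])  # probability of staying put, as in (6.9)
    return P


def mat_mul(A, B):
    """Plain matrix product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Hypothetical target and asymmetric proposal on three states.
pi = [0.2, 0.3, 0.5]
q = [
    [0.0, 0.7, 0.3],
    [0.4, 0.0, 0.6],
    [0.5, 0.5, 0.0],
]

P = mh_kernel(pi, q)

# Reversibility (6.5): pi[i] P[i][j] equals pi[j] P[j][i] up to rounding.
max_imbalance = max(
    abs(pi[i] * P[i][j] - pi[j] * P[j][i]) for i in range(3) for j in range(3)
)

# Iterate P^(n) = P^(n-1) P as in (6.14); every row should approach pi.
Pn = P
for _ in range(200):
    Pn = mat_mul(Pn, P)

print(max_imbalance)
for row in Pn:
    print(row)
```

Each row of `Pn` is the distribution after many steps from a different starting state, so rows that agree with `pi` show the chain forgetting where it started.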

6.3.4 Application of the algorithm

We have applied the Metropolis-Hastings algorithm to the maximum likelihood problem. The transition kernel ($q(x, y)$ in the notation above) is the likelihood function $L(\theta)$ given in equation (6.3), and the prior distribution is entered as the density $\pi$.

The algorithm can be summarised as follows:

1. For a parameter value xi, propose a new value y by sampling from a

2. Calculate $\alpha(x, y)$ using equation (6.13), so

$$\alpha(x, y) = \min\left\{ \frac{\pi(y)\,L(y)}{\pi(x)\,L(x)},\, 1 \right\}. \tag{6.16}$$

In practice the logarithm of this function is often used.

3. Generate $u$ by sampling from the uniform distribution on the interval $[0, 1]$.

4. If $u < \alpha(x, y)$, accept $y$ as $x_{i+1}$; otherwise set $x_{i+1} = x_i$ for the next iteration.

Typically this is done by changing each component of the vector $x$ in turn, in which case one iteration is complete when every component of $x$ has been altered. After a predetermined number of steps, the Markov chain $(x_1, x_2, x_3, \ldots)$ is returned.
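The four steps, with components updated in turn and the acceptance test carried out on the log scale, can be sketched as below. This is a hedged illustration rather than the code used for this work: equation (6.3) is not reproduced here, so `log_likelihood` is a stand-in (normal data with unknown mean and log standard deviation), `log_prior`, `DATA`, the step size, and the burn-in length are all hypothetical, and a symmetric Gaussian random-walk proposal is assumed for step 1.

```python
import math
import random

# Hypothetical data, used only for illustration.
DATA = [4.1, 5.3, 4.8, 5.9, 5.2, 4.6, 5.0, 5.5]


def log_likelihood(theta):
    """Stand-in for log L(theta): normal data with mean mu and sd exp(log_sd)."""
    mu, log_sd = theta
    sd = math.exp(log_sd)
    return sum(-log_sd - 0.5 * ((d - mu) / sd) ** 2 for d in DATA)


def log_prior(theta):
    """Stand-in for log pi(theta): independent, wide normal priors."""
    mu, log_sd = theta
    return -0.5 * (mu / 10.0) ** 2 - 0.5 * (log_sd / 2.0) ** 2


def log_posterior(theta):
    return log_prior(theta) + log_likelihood(theta)


def run_chain(theta_start, n_iterations, step_size, rng):
    x = list(theta_start)
    log_post_x = log_posterior(x)
    chain = [tuple(x)]
    for _ in range(n_iterations):
        # One iteration updates every component of x in turn.
        for k in range(len(x)):
            y = list(x)
            y[k] += step_size * rng.gauss(0.0, 1.0)  # step 1 (assumed symmetric proposal)
            log_post_y = log_posterior(y)
            log_alpha = min(0.0, log_post_y - log_post_x)  # step 2, log of (6.16)
            log_u = math.log(1.0 - rng.random())  # step 3, with u in (0, 1]
            if log_u < log_alpha:  # step 4
                x, log_post_x = y, log_post_y
        chain.append(tuple(x))
    return chain


rng = random.Random(1)
chain = run_chain((0.0, 0.0), 20000, 0.5, rng)
kept = chain[2000:]  # discard an arbitrary burn-in period
mu_mean = sum(t[0] for t in kept) / len(kept)
print(mu_mean)  # expected to lie near the data mean, 5.05
```

Comparing $\log u$ with $\log \alpha$ is equivalent to comparing $u$ with $\alpha$, but it avoids forming the ratio of two very small numbers when the likelihood is a product over many observations.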

6.3.5 Convergence

In many simple cases, accompanying theorems guarantee convergence to within a given error after a finite number of samples [14]. In general, however, this is not the case. Cowles and Carlin [19] note that it is not possible to guarantee convergence by running diagnostics on the output either.

For the work presented here, note that the form of the likelihood function assumes a normal distribution of parameters. When the Markov chain appears to the eye to be sampling from normal distributions (ascertained using histogram plots), it is assumed to have converged.
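A crude text-mode stand-in for such a histogram plot is sketched below. It is an illustration only: the function names are hypothetical, the input is placeholder normal draws rather than output from the chains of this work, and the values are assumed not to be all identical (otherwise the bin width would be zero).

```python
import random


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    return lo, width, counts


def print_histogram(values, n_bins=15, max_bar=50):
    """Print one row per bin: the bin's left edge and a bar scaled to the peak."""
    lo, width, counts = histogram_counts(values, n_bins)
    peak = max(counts)
    for k, c in enumerate(counts):
        bar = "#" * round(max_bar * c / peak)
        print(f"{lo + k * width:8.3f} | {bar}")


# Placeholder draws standing in for one parameter of a chain (an assumption);
# in practice `values` would be that parameter's entries in the Markov chain.
rng = random.Random(2)
values = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print_histogram(values)
```

A roughly symmetric, single-peaked shape is what the visual check looks for; as noted above, such a check cannot by itself guarantee convergence.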