where T denotes the sample size. In addition to the Normal distribution, many widely used distributions belong to the exponential family, for example the Poisson, Gamma, Beta, Dirichlet, binomial and multinomial distributions (Gelman et al., 2014).
1.1.3 Markov Chain Monte Carlo method
The main aim of Bayesian inference is to obtain, or approximate, the posterior distribution $\Pr(\theta \mid y)$ in Equation (1.6) as a function of $\theta$. In high-dimensional models, where $\theta = (\theta_1, \ldots, \theta_k)$ is a multi-dimensional vector, we often need the marginal posterior distribution of a single parameter $\theta_i$ (where $1 \le i \le k$). In principle, the marginal posterior density of $\theta_i$ is the integral of the joint posterior density over all elements of $\theta$ except $\theta_i$. In practice, such integrals are often analytically intractable. They can instead be evaluated numerically using Markov chain Monte Carlo (MCMC) methods, in which a Markov chain is used to sample from the posterior distribution. The main idea behind MCMC is to approximate the posterior distribution by generating a sequence of sampled values of the unknown parameters, where each sampled value depends only on the previous one (Gelman et al., 2014). The MCMC approach rests on two key ideas, namely Markov chains and Monte Carlo integration, so it is useful to look at each of these concepts in turn.
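For concreteness, with $\theta = (\theta_1, \ldots, \theta_k)$ the marginal posterior described above is the $(k-1)$-dimensional integral below. The second expression sketches why MCMC sidesteps it. Suppose $\theta^{(1)}, \ldots, \theta^{(M)}$ are draws from a chain whose stationary distribution is $\Pr(\theta \mid y)$; this draw notation and the function $g$ are introduced here only for illustration. Posterior expectations of $g(\theta_i)$ can then be approximated from the $i$-th components of the draws alone, without evaluating the integral.

```latex
\Pr(\theta_i \mid y) = \int \cdots \int \Pr(\theta \mid y)\,
  d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_k,
\qquad
E[g(\theta_i) \mid y] \approx \frac{1}{M} \sum_{m=1}^{M} g\big(\theta_i^{(m)}\big).
```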
1.1.3.1 Markov chains
The MCMC method works by constructing a Markov chain whose stationary distribution is the posterior distribution of interest. A Markov chain is a particular type of discrete-time Markov process, $\{X(t); t \ge 0\}$, with state space $S = \{s_j; j = 1, 2, \ldots, K\}$, $K < \infty$. A sequence $X_0, X_1, X_2, \ldots$ of random variables is a Markov chain if the conditional distribution of $X_{t+1}$ given $X_0, \ldots, X_t$ depends only on $X_t$ (Geyer, 2011). We can write this as

$$\Pr(X_{t+1} = s_j \mid X_t = s_i, X_{t-1} = s_{i_{t-1}}, \ldots, X_0 = s_{i_0}) = \Pr(X_{t+1} = s_j \mid X_t = s_i), \tag{1.14}$$

for all $s_j, s_i, s_{i_{t-1}}, \ldots, s_{i_0} \in S$ and $t = 0, 1, 2, \ldots$. In other words, the future and past states are conditionally independent given the current state. This property is called the Markov property. The probability
$$p_{ij} = \Pr(X_{t+1} = s_j \mid X_t = s_i); \quad s_i, s_j \in S, \tag{1.15}$$

is called the transition probability from state $s_i$ to state $s_j$. If there are $K$ possible states, then

$$P = (p_{ij}); \quad i, j = 1, 2, \ldots, K, \tag{1.16}$$

is the $K \times K$ transition matrix. It contains the probabilities of all possible moves between these states, and each of its rows sums to one: $\sum_{j=1}^{K} p_{ij} = 1$ for every $i$. Let $\pi_j(t) = \Pr(X_t = s_j)$ denote the probability that the chain is in state $s_j$ at time $t$, and let $\pi(t) = \{\pi_1(t), \pi_2(t), \ldots, \pi_K(t)\}$ denote the $K$-length vector of these state probabilities at time $t$. The probability that the chain is in state $s_j$ at time (step) $t + 1$ is given by
$$\pi_j(t+1) = \Pr(X_{t+1} = s_j) = \sum_{i} \Pr(X_{t+1} = s_j \mid X_t = s_i)\,\Pr(X_t = s_i) = \sum_{i} p_{ij}\,\pi_i(t), \tag{1.17}$$

which describes how the state probabilities evolve from one step of the chain to the next. Using matrix notation, Equation (1.17) can be written as
$$\pi(t+1) = \pi(t)P. \tag{1.18}$$

Applying Equation (1.18) repeatedly, it follows that

$$\pi(t) = \pi(0)P^{t}. \tag{1.19}$$
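The sketch below illustrates Equations (1.17)–(1.19) on a small example. It computes $\pi(t)$ exactly by applying $\pi(t+1) = \pi(t)P$ repeatedly, and it simulates individual paths in which each $X_{t+1}$ is drawn using only the row of $P$ for the current state $X_t$. The 3-state matrix `P` and the helper names are hypothetical, chosen only for illustration. States are indexed from 0 in the code.

```python
import random
from fractions import Fraction

# Hypothetical 3-state transition matrix: row i holds the probabilities
# p_ij of moving from state s_i to each state s_j.
P = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
]


def is_stochastic(P):
    """Every entry is non-negative and every row sums to 1."""
    return all(sum(row) == 1 and all(p >= 0 for p in row) for row in P)


def step_distribution(pi, P):
    """Equations (1.17)/(1.18): pi_j(t+1) = sum_i p_ij * pi_i(t)."""
    K = len(P)
    return [sum(pi[i] * P[i][j] for i in range(K)) for j in range(K)]


def distribution_at(pi0, P, t):
    """Equation (1.19): pi(t) = pi(0) P^t, by applying (1.18) t times."""
    pi = list(pi0)
    for _ in range(t):
        pi = step_distribution(pi, P)
    return pi


def simulate_chain(P, x0, n_steps, rng):
    """Return a path X_0, ..., X_n. Each X_{t+1} is drawn from row X_t of P
    only, which is exactly the Markov property."""
    path = [x0]
    for _ in range(n_steps):
        weights = [float(p) for p in P[path[-1]]]
        path.append(rng.choices(range(len(P)), weights=weights)[0])
    return path
```

Take, for instance, a chain that starts in the first state, so $\pi(0) = (1, 0, 0)$. Then `distribution_at([1, 0, 0], P, 2)` returns $1/3$ for every state. The frequencies of $X_2$ over many simulated paths approach the same values. Exact fractions are used for $\pi(t)$ so that the result of Equation (1.19) can be compared without rounding error.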
The $n$-step transition probability, $p_{ij}^{(n)}$, is the probability that the chain is in state $s_j$ given that $n$ steps earlier it was in state $s_i$, i.e.,

$$p_{ij}^{(n)} = \Pr(X_{t+n} = s_j \mid X_t = s_i), \tag{1.20}$$

where $p_{ij}^{(n)}$ is simply the $(i, j)$th element of $P^{n}$. A Markov chain is said to be irreducible if, for all $i, j = 1, 2, \ldots, K$, there exists a positive integer $n_{ij}$ such that $p_{ij}^{(n_{ij})} > 0$. That is, any state in $S$ can be reached from any other state in a finite number of steps.
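Irreducibility depends only on which entries of $P$ are positive. We have $p_{ij}^{(n)} > 0$ for some $n \ge 1$ exactly when a path of positive-probability moves leads from $s_i$ to $s_j$, so the definition can be checked as graph reachability. A minimal sketch, assuming $P$ is stored as a list of rows; the function names are hypothetical.

```python
from collections import deque


def reachable_from(P, i):
    """States s_j with p_ij^(n) > 0 for some n >= 1, found by breadth-first
    search over the directed edges u -> v with p_uv > 0."""
    K = len(P)
    seen = set()
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in range(K):
            if P[u][v] > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_irreducible(P):
    """True if every state reaches every state (itself included) in a finite
    number of steps, matching the irreducibility definition in the text."""
    K = len(P)
    everything = set(range(K))
    return all(reachable_from(P, i) == everything for i in range(K))
```

The two-state chain `[[0, 1], [1, 0]]`, which swaps states at every step, is irreducible. By contrast, `[[1, 0], [0.5, 0.5]]` is not irreducible: its first state is absorbing ($p_{11} = 1$), so the second state can never be reached from it.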
A state of a Markov chain is classified as absorbing, transient or recurrent according to how often the state is visited and how long the chain takes between visits. Let $f_{ij}^{(n)}$ denote the probability that the chain first visits state $s_j$ at step $n$, given that it started in state $s_i$ at step 0, i.e.,

$$f_{ij}^{(n)} = \Pr(X_1 \neq s_j, X_2 \neq s_j, \ldots, X_{n-1} \neq s_j, X_n = s_j \mid X_0 = s_i), \tag{1.21}$$

with $f_{ii}^{(0)} = 1$ and $f_{ij}^{(0)} = 0$ for $j \neq i$. Further, define the sum of the probabilities of the first visit occurring at step $n = 1, 2, \ldots$,

$$f_{ij} = \sum_{n=1}^{\infty} f_{ij}^{(n)}, \tag{1.22}$$

which is the probability that the chain visits state $s_j$ in finite time when it starts in state $s_i$. In particular, $f_{ii}$ is the probability of returning to the starting state $s_i$ in finite time. A state $s_j$ is said to be transient if $f_{jj} < 1$, recurrent if $f_{jj} = 1$, and absorbing if $p_{jj} = 1$. A recurrent state $s_j$ is said to be positive recurrent if the mean time between revisits is finite, i.e.,
$$\sum_{n=1}^{\infty} n\, f_{jj}^{(n)} < \infty. \tag{1.23}$$
Otherwise, it is said to be null recurrent. If one state of an irreducible Markov chain is positive recurrent, then all of its states are positive recurrent. The period of a state $s_j$ is defined as
$$d_j = \gcd\{\, n \ge 1 : p_{jj}^{(n)} > 0 \,\}, \tag{1.24}$$
where $d_j$ is the greatest common divisor (gcd) of all step counts $n \ge 1$ at which a return to $s_j$ has positive probability. It can be shown that for an irreducible Markov chain, $d_j = d$ for all $j$. If $d > 1$, the chain is said to be periodic with period $d$. If $d = 1$, the chain is said to be aperiodic, which means that it is not forced into a cycle of fixed length between certain states. Every transition matrix has $1$ as an eigenvalue. For an irreducible chain with a finite state space, aperiodicity is equivalent to $\lambda = 1$ being the only eigenvalue of $P$ with modulus one. The limit $\lim_{n\to\infty} p_{jj}^{(n)}$ may or may not exist. For a transient or null recurrent state $s_j$, $\lim_{n\to\infty} p_{jj}^{(n)} = 0$, i.e., the probability of the chain being in state $s_j$ eventually goes to zero. If $s_j$ is positive recurrent and periodic, then $p_{jj}^{(n)}$ does not converge as $n \to \infty$. If $s_j$ is positive recurrent and aperiodic, then $p_{jj}^{(n)}$ converges to a steady-state probability $\pi_j > 0$. An irreducible, positive recurrent and aperiodic Markov chain therefore approaches a stationary distribution $\pi$, in which the probability of being in any given state does not depend on the initial distribution $\pi(0)$. The stationary distribution satisfies
$$\pi = \pi P. \tag{1.25}$$
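The sketch below illustrates two of the quantities just defined. It computes the period in Equation (1.24) from powers of $P$, and it finds the stationary distribution in Equation (1.25) by iterating $\pi \leftarrow \pi P$ until the vector stops changing. The function names, the tolerance and the `horizon` cut-off are assumptions of this illustration. Restricting $n$ to a finite range can only overstate $d_j$, but for a small chain a horizon of 50 steps is ample.

```python
from functools import reduce
from math import gcd


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    K = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(K)]
            for i in range(K)]


def period(P, j, horizon=50):
    """Equation (1.24) with n restricted to 1..horizon: the gcd of all step
    counts n for which p_jj^(n) > 0. Returns 0 if no return is seen."""
    Pn = P  # holds P^n, starting with n = 1
    return_times = []
    for n in range(1, horizon + 1):
        if Pn[j][j] > 0:
            return_times.append(n)
        Pn = mat_mul(Pn, P)
    return reduce(gcd, return_times, 0)


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P (Equation (1.18)) from the uniform vector until it
    stops changing; the limit then satisfies pi = pi P (Equation (1.25))."""
    K = len(P)
    pi = [1.0 / K] * K
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(K)) for j in range(K)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("no convergence; the chain may be periodic or reducible")
```

Consider the deterministic two-state swap `[[0, 1], [1, 0]]`. Here `period` returns 2 for both states, so $p_{jj}^{(n)}$ alternates between 0 and 1 and never converges. Now consider the aperiodic matrix `[[0.5, 0.25, 0.25], [1/3, 1/3, 1/3], [0, 0.5, 0.5]]`. Here `period` returns 1 for every state, and `stationary_distribution` returns approximately $(0.25, 0.375, 0.375)$, which can be checked by hand from $\pi = \pi P$.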
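A sufficient (but not necessary) condition for $\pi$ to be a stationary distribution of the chain is that the detailed balance, or time reversibility, condition

$$\pi_i\, p_{ij} = \pi_j\, p_{ji}, \quad \forall\, i, j, \tag{1.26}$$

is satisfied. If the chain is also irreducible, this stationary distribution is unique. Indeed, summing both sides of Equation (1.26) over $i$ gives $\sum_i \pi_i p_{ij} = \pi_j \sum_i p_{ji} = \pi_j$, which is Equation (1.25). Hence a Markov chain that is reversible with respect to a desired distribution $\pi$ has $\pi$ as its stationary distribution. Given a Markov chain $X_1, X_2, \ldots$, the transition probability distribution is said to be reversible with respect to an initial distribution if, when the chain is started from that distribution, the pairs $(X_t, X_{t+1})$ are exchangeable. Reversible Markov chains play a central role in MCMC methods (Fan and Sisson, 2011). Consider an irreducible, positive recurrent Markov chain $\{X_m\}$ with stationary distribution $\pi$, such as a reversible chain constructed to target $\pi$. For any function $h$ whose expectation under $\pi$ is finite, it follows that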
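$$\frac{1}{M}\sum_{m=1}^{M} h(X_m) \longrightarrow \int h(x)\,\pi(x)\,dx \quad \text{as } M \to \infty, \tag{1.27}$$

with probability 1, where $M$ is the number of iterations of the chain. Equation (1.27) links Markov chains to Monte Carlo methods. An MCMC algorithm constructs the chain so that its stationary distribution $\pi$ is the posterior density $\Pr(\theta \mid y)$. Posterior quantities can therefore be estimated by averaging over the sampled values using Equation (1.27).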
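The sketch below builds a reversible chain for a given discrete target $\pi$, checks the detailed balance condition (1.26) entrywise, and approximates $\sum_x h(x)\pi(x)$ by the ergodic average in Equation (1.27). The chain uses a Metropolis-type construction: the states lie on a ring, one of the two neighbouring states is proposed, and the move from $s_i$ to $s_j$ is accepted with probability $\min(1, \pi_j/\pi_i)$. This technique is not introduced in this section and is used here only as an illustration. The target, the function $h$, the ring-shaped proposal and the helper names are all hypothetical.

```python
import random


def metropolis_matrix(target):
    """Reversible transition matrix for a (hypothetical) target pi on states
    0..K-1 arranged in a ring: propose each neighbour with probability 1/2 and
    accept a move from i to j with probability min(1, pi_j / pi_i)."""
    K = len(target)
    P = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in ((i - 1) % K, (i + 1) % K):
            P[i][j] += 0.5 * min(1.0, target[j] / target[i])
        P[i][i] += 1.0 - sum(P[i])  # rejected proposals leave the chain at i
    return P


def satisfies_detailed_balance(target, P, tol=1e-12):
    """Equation (1.26): pi_i * p_ij == pi_j * p_ji for every pair (i, j)."""
    K = len(target)
    return all(abs(target[i] * P[i][j] - target[j] * P[j][i]) <= tol
               for i in range(K) for j in range(K))


def ergodic_average(P, h, x0, M, rng):
    """Left-hand side of Equation (1.27): (1/M) * sum_{m=1}^{M} h(X_m)."""
    K = len(P)
    x, total = x0, 0.0
    for _ in range(M):
        x = rng.choices(range(K), weights=P[x])[0]
        total += h(x)
    return total / M
```

Take `target = [0.1, 0.2, 0.3, 0.4]` and $h(x) = x$. The exact right-hand side of (1.27) is $0(0.1) + 1(0.2) + 2(0.3) + 3(0.4) = 2.0$, and the ergodic average over 100,000 iterations lands close to it. Unlike ordinary Monte Carlo draws, successive $X_m$ are correlated, so more iterations are typically needed to reach the same accuracy.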
1.1.3.2 Monte Carlo integration
In practice, inference usually requires integrating the posterior distribution over the parameter space. For complicated models, however, such integrals are difficult or impossible to evaluate analytically. For this reason, Monte Carlo integration is often used to approximate them (Rizzo, 2008).
For example, suppose one is interested in computing the expectation of some function $h$ of a random variable $Y$ with probability density function $f_Y(y)$ and distribution function $F_Y$:

$$E[h(Y)] = \int h(y)\, dF_Y(y) = \int h(y)\, f_Y(y)\, dy. \tag{1.28}$$
Hence, this integral can be approximated by the sample average of $h$ evaluated at independent draws $y_1, y_2, \ldots, y_T$ from the distribution of $Y$:
$$\widehat{h(y)} = \frac{1}{T}\sum_{t=1}^{T} h(y_t). \tag{1.29}$$

By the strong law of large numbers, $\widehat{h(y)}$ is a strongly consistent estimator of $E[h(Y)]$. That is, $\widehat{h(y)}$ converges to $E[h(Y)]$ with probability 1 as the sample size $T \to \infty$. Moreover, provided $h(Y)$ has finite variance, the Central Limit Theorem gives
$$\frac{\widehat{h(y)} - E[h(Y)]}{\sigma/\sqrt{T}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } T \to \infty, \tag{1.30}$$
where $\sigma^2 = \mathrm{Var}(h(Y))$ and $\sigma/\sqrt{T}$ is the standard error of the estimate $\widehat{h(y)}$. In practice, $\sigma$ is estimated by the sample standard deviation of $h(y_1), \ldots, h(y_T)$. In Bayesian inference, the Monte Carlo approach is used to evaluate integrals that involve the posterior distribution $\Pr(\theta \mid y)$ given earlier in Equation (1.6).
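As a sketch of Equations (1.29) and (1.30), the code below averages $h$ over $T$ independent draws and reports the estimated standard error $s/\sqrt{T}$, with the unknown $\sigma$ replaced by the sample standard deviation $s$. It also forms the approximate 95% interval implied by the normal limit. The choices $h(y) = y^2$ and $Y \sim N(0, 1)$ are hypothetical; they give $E[h(Y)] = 1$ and $\mathrm{Var}(h(Y)) = 2$ exactly, so the output can be checked. The function names are also illustrative.

```python
import math
import random


def monte_carlo_mean(h, sampler, T, rng):
    """Equation (1.29): average h over T independent draws from `sampler`,
    plus the estimated standard error s / sqrt(T) from Equation (1.30),
    with the unknown sigma replaced by the sample standard deviation s."""
    values = [h(sampler(rng)) for _ in range(T)]
    estimate = sum(values) / T
    s2 = sum((v - estimate) ** 2 for v in values) / (T - 1)
    return estimate, math.sqrt(s2) / math.sqrt(T)


def normal_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval implied by the normal limit in (1.30)."""
    return estimate - z * std_error, estimate + z * std_error
```

Call `monte_carlo_mean(lambda y: y * y, lambda r: r.gauss(0.0, 1.0), 100_000, random.Random(123))`. The returned estimate is close to 1, and the returned standard error is close to $\sqrt{2/100000} \approx 0.0045$. Halving the standard error requires four times as many draws, which is the $1/\sqrt{T}$ rate in Equation (1.30).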