As with other Markov Switching models, we add the history of regimes to the parameter set, and alternate between sampling the parameters given the regimes and sampling the regimes given the parameters. In general, however, we also have the unobservable states from the state-space model, which need to be sampled as well. So, the general process is to (in some order):
1. Sample the regimes given the states, switching and other parameters.
2. Sample the switching parameters given the regimes (nothing else should matter).
3. Sample the states given the regimes, switching and other parameters.
4. Sample the other parameters given states and regimes.
Number 2 is what we’ve seen in each section on Markov Switching. Numbers 3 and 4 are typical of Gibbs sampling for state-space models. What’s new here is number 1. Whether the efficient FFBS algorithm can apply will depend a great deal on how the regimes and the states interact. We’ll need Single-Move Sampling (page 83) for the Lam model, while we can use FFBS for the time-varying regression.
12.3.1 Lam Model by MCMC
The state-space setup used for estimation with the Kim filter can’t be used in a standard way for MCMC. Ordinarily, the measurement equation in first differenced form:
$$\Delta y_t = \delta(S_t) + \Delta x_t$$
Markov Switching State-Space Models 155 would be rearranged to
$$\Delta y_t - \delta(S_t) = [1, -1] X_t$$
for the purposes of sampling the states (X) given the regimes and parameters (number 3 on the list). And, in fact, that can be done to generate a series of xt. The problem is that this equation is the only place where St appears, and it has no error term. Given the sampled state series, and the δ, there’s only one possible set of St—the ones just used to create the x. Because X and S are linked by an identity, we can’t sample the S treating the X as given.
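To see the identity concretely, here is a small Python sketch. It is not part of the RATS example: the names `delta`, `regimes` and `dx`, and all the numbers, are made up for illustration. Given a cycle path and the δ, the differenced measurement equation pins down every regime exactly, so there is nothing left to sample.

```python
# Illustrative only: made-up drifts, regimes and cycle changes.
delta = [1.0, -0.5]                      # drift in regime 0 and regime 1 (hypothetical)
regimes = [0, 0, 1, 0, 1, 1]             # the regimes used to build the data
dx = [0.3, -0.2, 0.1, 0.4, -0.6, 0.2]    # changes in the cycle

# Differenced measurement equation: dy_t = delta(S_t) + dx_t, with no error term.
dy = [delta[s] + d for s, d in zip(regimes, dx)]


def implied_regimes(dy, dx, delta):
    """Recover S_t from dy_t - dx_t = delta(S_t) by matching to the nearest drift."""
    out = []
    for y, x in zip(dy, dx):
        gap = y - x
        out.append(min(range(len(delta)), key=lambda k: abs(gap - delta[k])))
    return out


recovered = implied_regimes(dy, dx, delta)
print(recovered == regimes)
```

Because the regimes are a deterministic function of the sampled cycle, a Gibbs step that conditions on the cycle can never move them.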
For this example, we’ll go back to the original specification of the model with y as the observable rather than its difference. The differenced form loses the connection to the level of y so it can’t really give an accurate estimate of the trend series itself. With q as the number of lags in the AR, the size of the state vector is q + 1 with the extra one being the trend variable τ which will be in the last position. The setup for the fixed parts of the state matrices is:
compute ndlm=q+1
*
dec rect a(ndlm,ndlm)
ewise a(i,j)=%if(i==ndlm,(i==j),(i==j+1))
*
dec rect c(ndlm,1)
compute c=%unitv(q,1)~~1.0
*
dec rect f(ndlm,1)
compute f=%unitv(ndlm,1)
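Read together with the Z formula used in the sampler, these matrices appear to correspond to the following state-space form. This is a sketch for q = 2, assuming DLM's convention of a state equation X_t = A X_{t-1} + Z_t + F w_t and a measurement equation y_t = C'X_t. The first row of A is left empty in the setup and is filled with the current φ inside the sampler.

```latex
X_t = \begin{bmatrix} x_t \\ x_{t-1} \\ \tau_t \end{bmatrix}, \quad
A = \begin{bmatrix} \phi_1 & \phi_2 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
Z_t = \begin{bmatrix} 0 \\ 0 \\ \delta(S_t) \end{bmatrix}, \quad
F = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
```

Written out, this gives y_t = x_t + τ_t, τ_t = τ_{t-1} + δ(S_t) and x_t = φ_1 x_{t-1} + φ_2 x_{t-2} + w_t.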
We need a prior for the transition matrix, which will again be Dirichlet weakly favoring staying in each state:
dec vect[vect] gprior(nstates)
dec vector tcounts(nstates) pdraw(nstates)
compute gprior(1)=||8.0,2.0||
compute gprior(2)=||2.0,8.0||
The initial values for the lag coefficients and the variance will come from an OLS regression of a preliminary estimate of the cycle on its lags.
linreg(define=areqn) x 1952:2 1984:4
# x{1 to q}
compute phi=%beta,sigsq=%seesq
The prior for the lag coefficients will be very loose, with a 0 mean and .5 standard deviation (precision is 4.0) on each:
dec vect bprior(%nreg)
dec symm hprior(%nreg,%nreg)
compute bprior=%zeros(%nreg,1)
compute hprior=%diag(%fill(%nreg,1,4.0))
We’ll use an uninformative prior for the variance, since that will always be estimated using the full sample:
compute s2prior=1.0
compute nuprior=0.0
Finally, we’ll use an uninformative prior for the δ.
The regimes will be sampled using Single-Move Sampling, using the template from page 84. We use the DLM instruction to compute the log likelihood. We first have to paste the current values of φ into the A matrix. In this model form, the switching comes in the Z component of the model, where the final component is the value of DELTA for the regime—since this changes with time, we need to put it in as the formula delta(MSRegime(t))*%unitv(ndlm,ndlm)
compute %psubmat(a,1,1,tr(phi))
dlm(a=a,f=f,c=c,sw=sigsq,y=y,presample=ergodic,$
   z=delta(MSRegime(t))*%unitv(ndlm,ndlm)) gstart gend
compute logplast=%logl
compute pstar=%mcergodic(p)
do time=xstart,gend
   compute oldregime=MSRegime(time)
   do i=1,nstates
      if oldregime==i
         compute logptest=logplast
      else {
         compute MSRegime(time)=i
         dlm(a=a,f=f,c=c,sw=sigsq,y=y,presample=ergodic,$
            z=delta(MSRegime(t))*%unitv(ndlm,ndlm)) gstart gend
         compute logptest=%logl
      }
      compute pleft =%if(time==xstart,pstar(i),p(i,MSRegime(time-1)))
      compute pright=%if(time==gend ,1.0,p(MSRegime(time+1),i))
      compute fps(i)=pleft*pright*exp(logptest-logplast)
      compute logp(i)=logptest
   end do i
   compute MSRegime(time)=%ranbranch(fps)
   compute logplast=logp(MSRegime(time))
end do time
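The logic of the single-move pass can be sketched in Python, independent of RATS. This is a minimal illustration under stated assumptions: `loglik` is a hypothetical callable standing in for the DLM likelihood evaluation, the transition matrix is stored as `p[i][j]` = probability of moving to regime i from regime j, and the weights are rescaled by the largest log likelihood before exponentiating (a numerical safeguard that the RATS listing does not include).

```python
import math
import random


def single_move_sweep(regimes, p, pstar, loglik, rng):
    """One single-move pass over the regime path (modified in place).

    p[i][j] : P(S_t = i | S_{t-1} = j), columns sum to one
    pstar   : initial (ergodic) regime probabilities
    loglik  : hypothetical callable returning the full-sample log likelihood
              for a complete regime path
    """
    n, last = len(pstar), len(regimes) - 1
    logplast = loglik(regimes)
    for t in range(len(regimes)):
        old = regimes[t]
        logp = [0.0] * n
        for i in range(n):
            if i == old:
                logp[i] = logplast           # already known, no re-evaluation
            else:
                regimes[t] = i
                logp[i] = loglik(regimes)
        top = max(logp)                      # rescale so exp() cannot overflow
        weights = []
        for i in range(n):
            pleft = pstar[i] if t == 0 else p[i][regimes[t - 1]]
            pright = 1.0 if t == last else p[regimes[t + 1]][i]
            weights.append(pleft * pright * math.exp(logp[i] - top))
        regimes[t] = rng.choices(range(n), weights=weights)[0]
        logplast = logp[regimes[t]]
    return regimes


# Toy likelihood (made up): heavily penalizes any departure from a target path.
target = [0, 0, 1, 1, 0]


def toy_loglik(path):
    return -1000.0 * sum(a != b for a, b in zip(path, target))


p = [[0.8, 0.2], [0.2, 0.8]]
draw = single_move_sweep([1, 1, 1, 1, 1], p, [0.5, 0.5], toy_loglik, random.Random(0))
```

Each time period costs one likelihood evaluation per alternative regime, which is why single-move sampling is so much slower than FFBS when the latter is available.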
Given the regimes, the transition matrix is drawn using the standard techniques.
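For reference, the usual conjugate update can be sketched as follows. This assumes that the standard technique meant here is a Dirichlet posterior draw for the transitions out of each regime, with the prior laid out like `gprior`; the function name and the example regime path are made up.

```python
import random


def draw_transition(regimes, prior, rng):
    """Draw p with p[i][j] = P(S_t = i | S_{t-1} = j) from a Dirichlet posterior.

    prior[j] holds the Dirichlet parameters for the transitions out of
    regime j (assumed to match the layout of gprior).
    """
    n = len(prior)
    counts = [[0] * n for _ in range(n)]     # counts[j][i]: moves from j to i
    for prev, cur in zip(regimes[:-1], regimes[1:]):
        counts[prev][cur] += 1
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # A Dirichlet draw is a vector of independent gammas, normalized.
        g = [rng.gammavariate(prior[j][i] + counts[j][i], 1.0) for i in range(n)]
        total = sum(g)
        for i in range(n):
            p[i][j] = g[i] / total
    return p


gprior = [[8.0, 2.0], [2.0, 8.0]]
p = draw_transition([0, 0, 0, 1, 1, 0, 0, 1, 1, 1], gprior, random.Random(1))
```

Each column of `p` is a separate draw, so the prior weights of 8 and 2 act like ten extra observed transitions out of each regime.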
As we described above, the cycle is almost completely determined once we know the regimes and the δ (the only randomness comes from the pre-sample values). However, we’ll sample it as you typically would with a state-space model, using DLM with the option TYPE=CSIMULATE (conditional simulation). The estimated cycle is the first element of the state vector; the final element holds a simulated value for the trend series. The simulation starts at XSTART, which is q entries before the official start of estimation. This allows generation of the pre-sample values.
compute %psubmat(a,1,1,tr(phi))
dlm(a=a,f=f,c=c,sw=sigsq,y=y,presample=ergodic,type=csim,$
   z=delta(MSRegime(t))*%unitv(ndlm,ndlm)) xstart gend xstates
set x xstart gend = xstates(t)(1)
Given the generated x series, the φ (and variance) can be drawn using standard Bayesian procedures for a least squares regression. We’ll reject non-stationary estimates, doing a redraw if we have an unstable root.
cmom(equation=areqn) gstart gend
:redraw
compute phi =%ranmvpostcmom(%cmom,1.0/sigsq,hprior,bprior)
compute %eqnsetcoeffs(areqn,phi)
compute cxroots=%polycxroots(%eqnlagpoly(areqn,x))
if %cabs(cxroots(%rows(cxroots)))<=1.00 {
   disp "PHI draw rejected"
   goto redraw
}
compute sumsqr=%rsscmom(%cmom,phi)
compute sigsq =(sumsqr+s2prior*nuprior)/%ranchisqr(%nobs+nuprior)
All that remains is to draw the δ. There are two possible approaches to this.
First, we can unwind τt as in (12.5). Other than τ0 (for which we just produced a simulated value), this is a linear function of the δ. The equation
$$y_t = \tau_0 + \delta_1 c_{1t} + \delta_2 c_{2t} + x_t$$
is in the form of a linear regression with serially correlated errors with a known form for the covariance matrix, since the φ and σ² are assumed known. We can filter the data and sample as if it were a least squares regression. There is one potential problem with this for this particular model and data set: the second regime (low drift) is likely to be quite sparse, so δ2 won’t be very well determined, and we might need a prior that is more informative than we would like in order to keep the sampler working properly.
Instead, we’re choosing to use (random walk) Metropolis within Gibbs. Our proposal density will be the current value plus a Normal increment. After a bit of experimenting, we came up with (independent) Normal increments with standard deviation .10:
compute fdelta=||.10,0.0|0.0,.10||
The Metropolis code is:
compute %psubmat(a,1,1,tr(phi))
dlm(a=a,f=f,c=c,sw=sigsq,y=y,presample=ergodic,$
   z=delta(MSRegime(t))*%unitv(ndlm,ndlm)) gstart gend
compute logplast=%logl
*
compute [vector] deltatest=delta+%ranmvnormal(fdelta)
dlm(a=a,f=f,c=c,sw=sigsq,y=y,presample=ergodic,$
   z=deltatest(MSRegime(t))*%unitv(ndlm,ndlm)) gstart gend
compute logptest=%logl
compute alpha=exp(logptest-logplast)
if alpha>1.0.or.%uniform(0.0,1.0)<alpha
   compute delta=deltatest,accept=accept+1
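The accept/reject rule is the generic random walk Metropolis step, which can be sketched in Python as below. The function name and `logpost` argument are hypothetical; `logpost` stands in for the DLM log likelihood, which is all that matters here because the prior on δ is flat. The comparison is done on the log scale so that `exp` is only evaluated for negative arguments.

```python
import math
import random


def rw_metropolis_step(current, logpost, step_sd, rng):
    """One random walk Metropolis update; returns (value, accepted)."""
    # Independent Normal increments with a common standard deviation.
    proposal = [c + rng.gauss(0.0, step_sd) for c in current]
    diff = logpost(proposal) - logpost(current)
    # Accept with probability min(1, exp(diff)).
    if diff >= 0.0 or rng.random() < math.exp(diff):
        return proposal, True
    return current, False


# With a flat target every proposal is accepted.
flat = lambda d: 0.0
value, accepted = rw_metropolis_step([0.9, -0.3], flat, 0.10, random.Random(0))
```

The step size trades off acceptance rate against how far each accepted move travels, which is why the .10 in the text was chosen by experiment.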
The results are surprisingly different from those from the Kim filter. The filtered estimates of the probabilities of the high-mean regime from the Kim filter are in Figure 12.1:
[Figure 12.1: Filtered Probabilities from Kim filter estimates. Horizontal axis: 1952 to 1982; vertical axis: probability, 0.0 to 1.0.]
The estimates from the MCMC procedure are in Figure 12.2 (see footnote 2):
The Kim filter has picked up a mode which basically just identifies the outliers—quarters with sharply negative growth. MCMC comes up with a recession-expansion breakdown similar to what comes out of the Hamilton model. Because the data set is small enough, it’s possible to do exact maximum likelihood (rather than the Kim approximation) and that finds that the two modes have very similar likelihoods, barely favoring the recession-expansion mode. However, because the outlier mode is so narrow, it’s very hard to move to it using Gibbs sampling.
2 Estimates of the regimes coming out of MCMC are smoothed, since they’re produced using the full data set, but the smoothed estimates from the Kim filter are almost identical to the filtered ones.
[Figure 12.2: Smoothed Probabilities from MCMC Estimates. Horizontal axis: 1951 to 1984; vertical axis: probability, 0.0 to 1.0.]
The Kim filter approximation also identifies the two modes, but much more clearly favors the outlier mode—since the data set is largely classified as one regime, the approximation will be more accurate than with the mode where there are many data points in each regime.
12.3.2 Time-varying parameters by MCMC
This is Example 12.4. The sampler for the regimes is simpler here than in the previous case. We can add the measurement errors and shocks to the regression coefficients to the parameter set and simulate them using DLM (given the previous settings for the regimes). Taking the measurement errors as given, the regimes can be sampled using a simple FFBS algorithm exactly as in Example 8.3. While there is some correlation between the regimes and the measurement errors (almost no Gibbs sampler will avoid some correlation among blocks), this is nothing like the identity we had in the previous example, and isn’t as tight a relationship as we would have if the regime-switching controlled the mean (rather than variance) of a process (see footnote 3).
The main problems come from it being a time-varying parameters regression.
The difficulty coming out of the diffuse initial conditions won’t be the problem here because the PRESAMPLE=DIFFUSE option on DLM can apply since there’s only one branch that needs to be evaluated. The possibility of the variance in a drift being (optimally) zero remains. The variances will be drawn as inverse chi-squareds, which will never be true zero. However, if you use conditional simulation on a state-space model with a component variance being effectively zero, the element of the disturbances that that variance controls will be forced to be (near) zero as well, causing the next estimate of the variance to again be near zero. Thus, the Gibbs sampler will have an absorbing state at zero for each of the variances of the coefficient drifts, unless we use a non-zero informative prior which ensures that if the variance goes to zero that it has a chance of being sampled non-zero in a future sweep.
3 The modal value of a Normal is still zero whether the variance is high or low.
As we did with the Kim filter estimates for this model, we’ll start with a linear regression to get initial values. We’ll start with the variances on the drifts as values too large to be reasonable.
compute sigmae(1)=.25*%seesq,sigmae(2)=2.5*%seesq
compute sigmav=%stderrs.^2
The switching variance for the equation will be handled as before with a hierarchical prior, using a non-informative prior for the common variance scale:
compute nucommon=0.0
compute s2common=.5*%seesq
dec vect nuprior(nstates)
dec vect s2prior(nstates)
compute nuprior=%fill(nstates,1,4.0)
compute s2prior=%fill(nstates,1,1.0)
compute scommon =%seesq
The prior on the coefficient drift variances is (very) weakly informative, centered on a small multiple of the least squares variances.
dec vect nusw(ndlm)
dec vect s2sw(ndlm)
*
ewise nusw(i)=1.0
ewise s2sw(i)=(.01*%stderrs(i))^2
The Gibbs sampling loop starts by simulating the state-space model given the settings for the variances and the current values of the regimes. This will give us simulated coefficient drifts in WHAT, which is a SERIES of VECTORS, and equation errors in VHAT, similarly a SERIES of VECTORS (size one in this case, since there’s only one observable). This also includes simulated values for the state vector, which will here be the (time-varying) regression coefficients.
dlm(y=m1gr,c=%eqnxvector(mdeq,t),sv=sigmae(MSRegime(t)),$
sw=%diag(sigmav),presample=diffuse,type=csimulate,$
what=what,vhat=vhat) gstart gend xstates vstates
We then treat the WHAT and VHAT as given in drawing the variances. This does the coefficient drift variances:
do i=1,ndlm
   sstats gstart+ncond gend what(t)(i)^2>>sumsqr
   compute sigmav(i)=(sumsqr+nusw(i)*s2sw(i))/$
      %ranchisqr(%nobs+nusw(i))
end do i
and this does the (switching) equation variances using a hierarchical prior:
sstats gstart+ncond gend vhat(t)(1)^2/sigmae(MSRegime(t))>>sumsqr
compute scommon=(scommon*sumsqr+nucommon*s2common)/$
   %ranchisqr(%nobs+nucommon)
do k=1,nstates
   sstats(smpl=MSRegime(t)==k) gstart+ncond gend vhat(t)(1)^2>>sumsqr
   compute sigmae(k)=(sumsqr+nuprior(k)*scommon)/$
      %ranchisqr(%nobs+nuprior(k))
end do k
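All of these variance draws share one form: a sum of squares plus a prior contribution, divided by a chi-squared draw. The Python sketch below (hypothetical function name, made-up numbers) shows that form and why the informative prior on the drift variances matters: with a flat prior, a zero sum of squares yields a draw of exactly zero, while even one prior degree of freedom keeps the draw strictly positive.

```python
import random


def draw_variance(sumsqr, nobs, nu, s2, rng):
    """Scaled inverse chi-squared draw: (sumsqr + nu*s2) / chi2(nobs + nu)."""
    # A chi-squared with k degrees of freedom is Gamma(shape=k/2, scale=2).
    chi2 = rng.gammavariate((nobs + nu) / 2.0, 2.0)
    return (sumsqr + nu * s2) / chi2


rng = random.Random(0)
# Simulated drifts that have collapsed to zero, so the sum of squares is zero.
stuck = draw_variance(0.0, 100, 0.0, 1e-4, rng)     # flat prior: draw is zero
escaped = draw_variance(0.0, 100, 1.0, 1e-4, rng)   # informative prior: positive
```

In practice the sum of squares is near zero rather than exactly zero, but the effect is the same: without the prior term the sampler has no way back from a collapsed variance.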
The regimes are sampled using FFBS:
compute pstar=%mcergodic(p)
do time=gstart,gend
   compute pt_t1(time)=%mcstate(p,pstar)
   compute pstar=%msupdate(RegimeF(time),pt_t1(time),fpt)
   compute pt_t(time)=pstar
end do time
@%mssample p pt_t pt_t1 MSRegime
which requires the function RegimeF which returns the vector of likelihoods given the simulated VHAT:
function RegimeF time
type integer time
type vector RegimeF
local integer i
*
dim RegimeF(nstates)
ewise RegimeF(i)=exp(%logdensity(sigmae(i),vhat(time)(1)))
end
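The forward filter and backward sampler can be written out in a few lines of Python to show what the loop and the sampling procedure are doing together. This is an illustrative sketch, not a translation of the RATS internals: the function names and the numbers are made up, and `p[i][j]` is again the probability of moving to regime i from regime j.

```python
import math
import random


def regime_likelihoods(v, sigmae):
    """Normal density of residual v under each regime's variance (as in RegimeF)."""
    return [math.exp(-0.5 * v * v / s) / math.sqrt(2.0 * math.pi * s) for s in sigmae]


def ffbs(liks, p, pstar, rng):
    """Forward filter, backward sample.

    liks[t][i] : likelihood of observation t under regime i
    p[i][j]    : P(S_t = i | S_{t-1} = j)
    pstar      : pre-sample regime probabilities
    """
    n = len(pstar)
    filtered, prev = [], pstar
    for lik in liks:
        # Predict one step ahead, then update with the likelihoods.
        pred = [sum(p[i][j] * prev[j] for j in range(n)) for i in range(n)]
        post = [lik[i] * pred[i] for i in range(n)]
        total = sum(post)
        prev = [x / total for x in post]
        filtered.append(prev)
    # Backward pass: draw the last regime, then each earlier one given its successor.
    path = [0] * len(liks)
    path[-1] = rng.choices(range(n), weights=filtered[-1])[0]
    for t in range(len(liks) - 2, -1, -1):
        w = [p[path[t + 1]][i] * filtered[t][i] for i in range(n)]
        path[t] = rng.choices(range(n), weights=w)[0]
    return path, filtered


sigmae = [0.25, 2.5]                      # hypothetical regime variances
vhat = [0.1, -0.3, 2.8, -3.1, 0.2]        # hypothetical simulated residuals
p = [[0.9, 0.1], [0.1, 0.9]]
liks = [regime_likelihoods(v, sigmae) for v in vhat]
path, filtered = ffbs(liks, p, [0.5, 0.5], random.Random(0))
```

Unlike single-move sampling, this needs only one pass forward and one pass backward, because the likelihood at each time period depends on the regime at that period alone once the residuals are taken as given.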
Note that (at least with this data set), this requires a very large number of draws to get the numerical standard errors on the coefficient variances down to a reasonable level (compared to their means). The switching variance and the transition probabilities are quite a bit more stable.