• No se han encontrado resultados

Modelos No Lineales Krolzig

N/A
N/A
Protected

Academic year: 2018

Share "Modelos No Lineales Krolzig"

Copied!
50
0
0

Texto completo

(1)

Regime–Switching Models

HANS-MARTIN KROLZIG

Department of Economics and Nuffield College, University of Oxford.

[email protected] Hilary Term 2002

The course offers an introduction to regime-switching models, covering their theoretical prop-erties and the statistical tools for empirical research (including maximum likelihood estima-tion, model evaluaestima-tion, model selection and forecasting). With the Markov-switching vector autoregressive model, it presents a systematic and operational approach to the econometric modelling of time series subject to shifts in regime. The theory will be linked to empirical studies of the business cycle, using MSVAR for OX.

Course structure

(1) Introduction

(2) Types of regime-switching models (Assumptions, properties and estimation)

Structural change and switching regression models

Threshold models

Smooth transition autoregressive models Markov-switching vector autoregressions

(3) Assessing business cycles with regime-switching models (Markov-switching VECM of the UK labour market)

(2)

Basic literature

Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57, 357–384.

Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press. Chapter 22.

Hansen, B. (1999), Testing for Linearity, Journal of Economic Surveys, 13, 551–576. Krolzig, H.-M., Marcellino, M. and G. E. Mizon, A Markov–Switching Vector

Equilib-rium Correction Model of the UK Labour Market, Empirical Economics, forthcoming. Potter, S. (1999), Nonlinear time series modelling: An introduction, Journal of

Eco-nomic Surveys, 13, 505–528.

Ter¨asvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models, Journal of the American Statistical Association, 89, 208–218.

Monographies

Franses, H.P. and D. van Dijk (2000). Nonlinear Time Series Models in Empirical

Fin-ance, Cambridge: Cambridge University Press.

Granger, C.W.J. and T. Ter¨asvirta (1993). Modelling Nonlinear Economic

Relation-ships, Oxford, Oxford University Press.

Kim, C.J. and C.R. Nelson (1999). State-Space Models with Regime Switching, Cam-bridge, MA: MIT Press.

Krolzig, H.-M. (1997). ‘Markov-Switching Vector Autoregressions. Modelling, Statist-ical Inference and Application to Business Cycle Analysis’, Lecture Notes in Economics

(3)

1 Introduction

1.1 Linear time series models

Since Sims (1980) critique of traditional macroeconometric modeling, vector autoregressive (VAR) models are widely used in macroeconometrics. Their popularity is due to the flexib-ility of the VAR framework and the ease of producing macroeconomic models with useful descriptive characteristics, within statistical tests of economically meaningful hypothesis can be executed. Over the last two decades VARs have been applied to numerous macroeconomic data sets providing an adequate fit of the data and fruitful insight on the interrelations between economic data.

In the vector autoregressive model, theK-dimensional time series vectoryt= (y1t, . . . , yKt)0

is generated by a vector autoregressive process of orderp

yt=ν+A1yt1+· · ·Apytp+εt (1)

wheret= 1, . . . , T, theν is a vector of intercepts andAi are coefficient matrices. The error process εt = (ε1t, . . . , εKt)0 is an unobservable, usually Gaussian, zero-mean white noise process,

εtWN(0,Σ).

that is, E[εt] = 0, E[εtε0t] = Σ, and E[εtε0s] = 0for s 6= t, where the variance-covariance matrixΣis time-invariant, positive-definite and non-singular.

The errors are such that the innovations can be interpreted as the one-step prediction errors of the system

εt=ytE[yt|Yt1],

while the expectation ofytconditional on the information setYt1 = (yt1, yt2, . . . , y1p)

is given by the vector autoregression:

E[yt|Yt1] =ν+

p

X

j=1

Ajytj.

Although, in the past macroeconomic fluctuations and growth have been largely investigated using linear time series models, it is now increasingly recognized that the implications of the linear models

linearity (invariance of dynamic multipliers with regard to the history of the system, size and sign of the shocks)

time-invariance of parameters Gaussianity

(4)

1.2 Regime-switching models

While the importance of regime shifts seems to be generally accepted, there is no established theory suggesting a unique approach for specifying econometric models that embed changes in regime. Increasingly, regime shifts are not considered as singular deterministic events, but the unobservable regime is assumed to be governed by an exogenous stochastic process. Thus regime shifts of the past are expected to occur in the future in a similar fashion.

When a time series is subject to regime shifts, the parameters of the statistical model will be varying. The basic idea of regime-switching models is that the process is time-invariant conditional on a regime variablestindicating the regime prevailing at timet. Regime-switching models characterize a non-linear data generating process as piecewise linear by re-stricting the process to be linear in each regime, where the regime might be unobservable, and only a discrete number of regimes are feasible. The models within this class differ in their assumptions concerning the stochastic process generating the regime.

The primary objective of regime-switching models is to provide a systematic econometric ap-proach for the statistical analysis of multiple time series when the mechanism which generated the data is subject to regime shifts:

(i) extracting the information in the data about regime shifts in the past, (ii) estimating the parameters of the model consistently and efficiently, (iii) detecting recent regime shifts,

(iv) correcting the vector autoregressive model at times when the regime alters, (v.) incorporating the probability of future regime shifts into forecasts.

Regime-switching models studied represent a very general class which encompasses some alternative non-linear and time-varying models. In general, the model generate conditional heteroscedasticity and non-normality; prediction intervals are asymmetric and reflect the pre-vailing uncertainty about the regime.

(5)

1.2.1 Regime shifts

Characteristics

finite number — infinite number

deterministic — stochastic

single event — reoccurring within sample — reoccurring out of sample

observable — observable if DGP is known — unobservable even if DGP is known

(strongly) exogenous — endogenous

permanent — persistent — transitory

predictable — unpredictable

common — interrelated — independent

Granger causal — Granger noncausal

Implications

nonlinearity

time-varying parameters

(6)

1.2.2 The Conditional Process

The statistical model ofytdefined conditional upon the regimest∈ {1, . . . , M}. :

p(yt|Yt1, Xt, st) =     

f(yt|Yt1, Xt, θ1) ifst= 1

.. .

f(yt|Yt1, Xt, θM) ifst=M.

wherep(yt|Yt1, Xt, st)is the probability density function of the vector of endogenous vari-ables yt = (y1t, . . . , yKt)0 conditional upon the history of the process, Yt1 = {yti}∞i=1,

some (strongly) exogenous variablesXt = {xti}∞i=0 and the regime variablest.. θm is the parameter vector present in regimem.

It is usually assumed that the statistical model is linear in each regime, say st = m. In the following we focus on autoregressive processes

yt=νm+αm1yt1+. . .+αmpytp+εt, εtIID(0, σ2m),

and their multivariate generalization: the vector autoregressive (VAR) process

yt=νm+Am1yt1+. . .+Ampytp+εt, εtIID(0,Σm).

1.2.3 The Regime Generating Process

If the stochastic process of yt is defined conditionally upon the (unobservable) regime st, a complete description of the data generating mechanism requires the specification of the stochastic process which generates the regime:

Pr(st|Yt1, St1, Xt;ρ)

(7)

2 Types of regime-switching models

2.1 Structural change and switching regression models

2.1.1 Structural break models

Structural break at timet=τ :

yt= (

ν1+Ppi=1α1iyti+εt fort < τ

ν2+Ppi=1α2iyti+εt fort≥τ (2)

whereεtIID(0, σ2).By using the indicator functionI(t;τ) :

I(t;τ) = (

1 fort > τ 0 fort≤τ.

the DGP can be rewritten as

yt= ν1+

p

X

i=1

α1iyti !

(1I(t;τ)) + ν2+

p

X

i=1

α2iyti !

I(t;γ) +εt.

Two different assumptions regarding the information structure

τ is known: break is deterministic τ is unknown: break is stochastic

2.1.2 Switching regression model

Closely related to the structural change model is the switching regression model, where the regime shifts are driven by an observable regime variablest:

yt= ν1+

p

X

i=1

α1iyti !

(1I(st= 1)) + ν2+

p

X

i=1

α2iyti !

(8)

2.1.3 Maximum likelihood estimation under normality

Structural break at timet=τ :

yt= (

ν1+Ppi=1α1iyti+εt fort < τ ν2+Ppi=1α2iyti+εt fort≥τ

whereεtNID(0, σ2).

Two different assumptions regarding the information structure

τ is known: break is deterministic

– Estimation: Split sample andOLSfor each regime;

– Test ofβ1 =β2 has standard asymptotics; whereβm = (νm, α1, . . . , αp). – The same technique can be used for switching regression models.

τ is unknown: break is stochastic

– Grid search forτ [0.15,0.85]T :

τ∗ = arg min

τ RSS(τ)

= arg min

τ τσˆ

2

1(τ) + (1−τσ22(τ)

– Test ofβ1 =β2 has non-standard asymptotics asτ becomes nuisance variable. – See, inter alia, Andrews (1993), and Andrews and Ploberger (1994) and Banerjee,

(9)

2.2 Threshold models

2.2.1 The TAR model

In the threshold autoregressive model, the regime shifts are triggered by an observable, exo-genous transition variablextcrossing the thresholdc:

yt= ν1+

p

X

i=1

α1iyti !

(1I(xt;c)) + ν2+

p

X

i=1

α2iyti !

I(xt;c) +εt (4)

whereεtIID(0, σ2).The indicator functionI(xt;c)is of the type

I(x;c) =

(

1 ifg(xt)> c 0 ifg(xt)≤c.

Forxt=ta model with a structural break at timet=coccurs

2.2.2 The SETAR model

If the transition variable is a lagged endogenous variable ytd with delay d > 0, the self-exciting threshold autoregressive model results:

yt= ν1+

p

X

i=1

α1iyti !

(1I(ytd;c)) + ν2+

p

X

i=1

α2iyti !

I(ytd;c) +εt (5)

whereεtIID(0, σ2). . cis again the threshold. Note that the model can be written as:

yt=ν(st) +

p

X

i=1

αi(st)yti+εt

where for a given but unknown threshold c, the ‘probability’ of the unobservable regime, say

st= 2is given by

Pr (st= 1|St1, Yt1) =I(ytd;c) = (

1 if g(ytd)> c 0 if g(ytd)≤c.

Thus in the self-exciting threshold autoregressive (SETAR) model, the regime-generating pro-cess is not assumed to be exogenous but directly linked to the lagged endogenous variable

(10)

SETAR Models of US GNP of Tiao and Tsay (1994) and Potter (1993)

Quarterly growth rate of U.S. GNP,∆yt:

∆yt=µ(st) +X5

i=1

αi(st)∆yti+ut, utIID(0, σ2(st))

2-regime SETAR withd= 2.

Empirical models:

Thresholdr≈0:st= (

1 if∆yt2 > r 2 if∆yt2 ≤r

Moving swiftly out of recessions: α(2L) <<0

2.2.3 Maximum likelihood estimation under normality

(i) For given delayd,and thresholdc:

Sample split according toI(ytd;c). OLSregression for each regime separately:

ˆ

βm = (X0mXm)1X0mym

ˆem =

IXm(X0mXm)1X0m

ym

ˆ

σm2 = Tm1ˆe0mˆem

whereXmandymcollect the observations from regimem,i.e. those observations at timetwithst=m. Tmis the number of observations in regimem.

Alternative indicator functions can be used in a single regression, constraining the residual error variance to be constant across regimes (see, for example, Potter, 1993, p.113.).

(ii) Grid search overdand c: select the pair (c, d) that minimizes the overall residual sum of squares (RSS)

(c, d)= arg min

(c,d)RSS(c, d) = arg min(c,d) M

X

m=1

Tmσˆ2m

Usually the search overc(givend)is restricted such that

minTm 0.15T.

(iii) Whenpis unknown, fit is usually traded against parsimony. A search is made over all values ofp pmax, and the preferred order is often taken to be that which minimizes

AIC.

p∗ = arg min

p

(

AIC(p) = XM

m=1

Tmln ˆσm2 + 2 (p+ 1) )

.

(11)

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 -2

0 2

4 Actual and fitted values from an AR(3), 1948:1 - 1990:4

actual fitted

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 -2

0 2

4 Actual and fitted values from an AR(2), 1959:4 - 1996:2

actual fitted

Figure 1 Linear AR model of US GNP growth.

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 -2

0 2

4 Actual and fitted values from a SETAR(2;2,2), 1947:4 - 1990:4

actual fitted

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 -2

0 2

4 Actual and fitted values from SETAR(2;2,2), 1959:4 - 1996:2

actual fitted

(12)

2.3 Smooth transition autoregressive models

2.3.1 The STAR model

In the smooth transition autoregressive model popularized by Granger and Ter¨asvirta (1993), the weight attached to the regimes depends on the realization of exogenous or lagged endo-genous variableszt:

Pr(st= 2|St1, Yt1, Xt) =G(zt;γ, c),

where the transition functionG(zt;γ, c)is a continuous function determining the weight of regime 2, and usually bounded between 0 and 1.

The STAR model is closely associated with the work of Ter¨asvirta (1994), (1998)

yt= ν1+

p

X

i=1

α1iyti !

(1−G(zt;γ, c)) + ν2+

p

X

i=1

α2iyti !

G(zt;γ, c) +εt (6)

whereεtIID(0, σ2).

The transition variable zt can be a lagged endogenous variable (zt = ytd for d > 0),

an exogenous variable (zt = xt), or a function of some lagged endogenous and exogenous variables: zt = g(ytd, xt).Forzt = ta model with smoothly changing parameters results (see Lin and Ter¨asvirta, 1994). cis the threshold,γ is the smoothness parameter.

The STAR model (6) exhibits two regimes

associated with the extreme values of the transition function: G(zt;γ, c) = 1 and

G(zt;γ, c) = 0;

transition from one regime to the other is gradual;

the regime occurring at timetis observable (for givenzt;γ, c) and can be determined byG(zt;γ, c).

For multiple-regime STAR models: see Dijk (1999).

Choices for the transition functionG(zt;γ, c) :

logistic cumulative density function (LSTAR): different behavior for positive versus negative values ofztrelatively toc

G(zt;γ, c) = 1

1 + exp{−γ(zt−c)}.

Forγ → ∞:LSTARSETAR:

G(zt;γ, c) =I(zt> c);

Forγ 0 :LSTARlinear AR

(13)

exponential function (ESTAR): different behavior for small versus large deviations ofzt

from the thresholdc:

G(zt;γ, c) = 1−exp−γ(zt−c)2 .

Forγ → ∞andγ 0 :ESTARlinear AR:

G(zt;γ, c) = 0.

quadratic logistic function:

G(zt;γ, c) = 1

1 + exp{−γ(zt−c1)(zt−c2)}.

Forγ → ∞:quadratic LSTAR3-regime SETAR:

G(zt;γ, c) = 1−I(c1 < zt< c2);

Forγ 0 :quadratic LSTARlinear AR

G(zt;γ, c) = 0.5.

Properties of STAR models

Little is known about the conditions under which STAR models are stationary; Stationarity has to be evaluated by numerical procedures;

Even under stationarity: Rich variability of the implied dynamics – unique equilibrium

– multiple equilibria – limit cycles

– strange attractors (chaos)

STAR models of US Industrial Production

Ter¨asvirta and Anderson (1992): 2-regime LSTAR model of the annual growth rate of US Industrial Production (quarterly data from 1961-1986):

4yt= ν1+

9

X

i=1

α1i4yti !

(1−G(·)) + ν2+

9

X

i=1

α2i4yti !

G(·) +εt

with the transition function

G(yt3;γ, c) = 1

1 + exp{−45(∆yt30.0061)/σy}.

Properties of business cycle

expansion: ∆yt3 >0.61%

largest root ofα1(L): modulus= 0.76and period= 61quarters contraction:∆yt3 <0.61%

largest root ofα2(L): modulus= 1.1and period= 8.9quarters

(14)

Multivariate Smooth Transition Models

yt= ν1+

p

X

i=1

A1iyti !

(1−G(zt;γ, c)) + ν2+

p

X

i=1

A2iyti !

G(zt;γ, c) +εt

whereyt= (y1t,· · ·, yKt)0, εtIID(0,Σ), Amiis a(K×K)matrix,νmis(K×1). Tsay (1998) describes a specification procedure for multivariate threshold models.

Suppose now that yt isI(1),but a linear combination et = β0ytis stationary with mean µ.

Then a smooth transition equilibrium correction model is of interest:

Asymmetric VECMs

∆yt=α1(1−G(et1;γ, µ)) (et1−µ) +α2G(et1;γ, µ) (et1−µ) +εt.

LSTAR: positive versus negative deviations from equilibrium

G(et1;γ, µ) = 1

1 + exp{−γ(et1−µ)}.

SETAR results forγ → ∞

G(et1;γ, µ) =I(et1 > µ)

ESTAR: small versus large deviations from equilibrium

G(et1;γ, µ) = 1−exp−γ(et1−µ)2 .

Interesting case: random walk behavior in regime 1 (β1 = 0) and mean adjustment in regime 2 (β0α2 <0)

(15)

2.3.2 Maximum likelihood estimation

STAR model

yt = x0tβ1(1−G(zt;γ, c)) +x0tβ2G(zt;γ, c) +εt, εtIID(0, σ2)

Non-linear least squares (NLS) estimation ofθ= (β10, β20;γ, c)0 :

ˆ

θ= arg min

θ RSS= arg minθ T

X

t=1

ε2t(θ)

whereεt(θ) =yt[x0tβ1(1−G(zt;γ, c)) +x0tβ2G(zt;γ, c)].

Under the assumption of normality,εtNID(0, σ2) :NLS=ML.

Estimation via numerical optimization procedure (see e.g. Hendry, 1995, Appendix A5).

– local maxima! – convergence?

Starting values:

– Conditional uponγ andc:OLSestimation ofβ= (β100, β20)0

ˆ

β(γ, c) =

T

X

t=1

xt(γ, c)xt(γ, c)0−1xt(γ, c)yt

wherext(γ, c) = (x0t(1−G(zt;γ, c)), x0tG(zt;γ, c))0;

– Grid search overγandc: minRSS(γ, c).

Concentrating the likelihood (RSS) function:

– Conditional uponγ andc:OLSestimation ofβ= (β100, β20)0; – NLS ofγandc: minRSS(γ, c).

Problem: precise estimation ofγ

– reason: for large values ofγ,the shape of the logistic function changes only little – accurate estimate ofγrequires many observations in the immediate neighbourhood

of the thresholdc.

(16)

2.3.3 Model selection

An empirical specification procedure

Ter¨asvirta (1994) based on the Granger and Ter¨asvirta (1993) recommendation of a specific-to-general procedure for non-linear models.

(1) Specify appropriate linear AR(p) model;

(2) Test the null hypothesis of linearity against the STAR alternative; (3) If linearity is rejected, selectztand specifyG(zt;γ, c);

(4) Estimate the STAR model;

(5) Evaluate the STAR model using diagnostic tests; (6) If misspecification is detected, modify the model; (7) Use the model for descriptive or forecasting purposes.

Testing for STAR nonlinearity

Problem: Under the null of linearity, some ‘nuisance’ parameters are not identified

null hypothesis nuisance parameters

1, α11, . . . α1p) = (ν2, α21, . . . α2p) γ; c

γ = 0 (ν1, α11, . . . α1p)1, α11, . . . α1p); c

conventional statistical theory can not be applied (see Davies, 1977, Davies, 1987 and Hansen, 1996b)

non-standard distributions

critical values have to be determined by means of simulation methods.

Solution proposed by Luukkonen, Saikkonen and Ter¨asvirta (1988):

Replace the transition functionG(zt;γ, c)by a suitable Taylor approximation. In the reparametrized model, the identification problem is no longer present. Linearity can be tested by means of a Lagrange multiplier (LM) statistic, which has a standard asymptoticχ2distribution under the null.

→Test against LSTAR: Luukkonen et al. (1988). Test against LSTAR: Granger and Ter¨asvirta (1993).

(17)

Diagnostic checking in STAR models

Eitrheim and Ter¨asvirta (1996) discuss formal diagnostic tests for STAR models

Jarque-Bera test for normality of the residuals LM type test for serial autocorrelation

LM test for remaining nonlinearity (two-regime STAR against the alternative of an ad-ditive STAR model)

(18)

Hans–Martin Krolzig Hilary Term 2002

Regime–Switching Models

2.4 Markov-switching vector autoregressions

2.4.1 The MS-VAR model

In Markov-switching vector autoregressive (MS-VAR) models it is assumed that the regimest

is generated by a hidden discrete-state homogeneous and ergodic Markov chain:

Pr(st|St1, Yt1, Xt) = Pr(st|st1;ρ)

defined by the transition probabilities

pij = Pr(st+1=j|st=i).

The conditional process is a VAR(p) with

shift in the mean (MSM-VAR): once-and-for-all jump in the time series

yt−µ(st) =A1(st) (yt1−µ(st1)) +. . .+Ap(st) (ytp−µ(stp)) +ut,

shift in the intercept (MSI-VAR): smooth adjustment of the time series

yt=ν(st) +A1(st)yt1+. . .+Ap(st)ytp+ut,

A major advantage of the MS-VAR is its flexibility, see Krolzig (1997).

Special MS-VAR Models

MSM MSI Specification

µvarying µinvariant νvarying νinvariant

Aj Σinvariant MSM–VAR linear MVAR MSI–VAR linear VAR

invariantΣvarying MSMH–VAR MSH–MVAR MSIH–VAR MSH–VAR

(19)

1955 1960 1965 1970 1975 1980 1985 0

2.5

MSM(2)-AR(4), 1952 (2) - 1984 (4)

1955 1960 1965 1970 1975 1980 1985 .5

1 Probabilities of Regime 1

1955 1960 1965 1970 1975 1980 1985 .5

1 Probabilities of Regime 2

Figure 3 Hamilton’s MSM(2)-AR(4) model.

Markov-switching autoregressive models of US GNP

Hamilton (1989): 2-regime MS-AR model for the quarterly growth rate of U.S. GNP:

∆yt−µ(st) =X4

k=1

αk(∆ytk−µ(stk)) +ut, ut|stNID(0, σ2)

Two regimes “state of the business cycle”

µ(st) = (

µ1 >0 ifst= 1(‘expansion’)

µ2 <0 ifst= 2(‘contraction’)

generated by an ergodic Markov chain

(20)

50 60 70 80 90 .5

1 MSM(2)-AR(4) Model, 1959:2 - 1996:2 50 60 70 80 90 .5

1 MSM(2)-AR(4) Model, 1959:2 - 1996:2 50 60 70 80 90 .5

1 MSM(2)-AR(4) Model, 1947:2 - 1990:4

50 60 70 80 90 .5

1 MSM(2)-AR(4) Model, 1947:2 - 1984:4

1950 1960 1970 1980 1990 .5

1 MSM(2)-AR(2) Model, 1959:2 - 1996:2 1950 1960 1970 1980 1990 .5

1 MSM(2)-AR(2) Model, 1947:1 - 1990:4

1950 1960 1970 1980 1990 .5

1 MSM(2)-AR(2) Model, 1959:2 - 1990:4 1950 1960 1970 1980 1990 .5

1 MSM(2)-AR(2) Model, 1947:2 - 1984:4

Figure 4 MSM(2)-AR models of US GNP growth.

50 60 70 80 90 .5

1‘High’ Growth Regime, H

1948:2-1990:4

50 60 70 80 90 .5

1

1948:2-1984:4

50 60 70 80 90 .5

1‘Recession’ Regime, L

50 60 70 80 90 .5

1

50 60 70 80 90 .5

1

1960:2-1990:4

50 60 70 80 90 .5

1

50 60 70 80 90 .5

1

1960:2-1996:2

50 60 70 80 90 .5

1

(21)

2.4.2 State-Space Representation

The framework for the statistical analysis of MS-VAR models is the state-space form. The advantage of viewing MS-VAR models in this way is that general concepts as the likelihood principle and a recursive filter algorithm can be introduced. The state-space model consists of the set of measurement and transition equations.

Measurement or observation equation (conditional process): The measurement equation describes the relation between the unobserved state vector ξt and the observed time series vectoryt. Here, the predetermined variablesYt1and the vector of Gaussian disturbances ut

enter the model.

Example: MSI(M)-VAR(1) model

yt = t+A1yt1+ut

whereM=

h

ν1 · · · νM i

andξt=   

I(st= 1)

.. .

I(st=M)

 

 withI(st=m) =

(

1 ifst=m

0 otherwise

State or transition equation (regime generating process): The state vector ξt follows a Markov chain subject to a discrete adding-up restriction. The Markov chain governing the state vectorξtcan be represented as a first-order vector autoregression (cf. Hamilton, 1994b):

ξt+1 =Fξt+vt+1, vt+1 ≡ξt+1E[ξt+1|{ξtj}∞j=0]

(22)

MSM-VAR processes as linearly transformed VAR processes

MSM(M)–VAR(p) Process,p≥0

A(L) (yt−µ(st)) =ut ⇐⇒     

yt = µ(st) +zt µt = Mξt,

A(L)zt = ut, uti.i.d. WN(0,Σu).

State Space Representation

yt−µy = t+Jzt ζt = t1+vt

zt = Azt−1+ut

⇐⇒           

yt−µy = h

M J

i " ζ

t zt # " ζt zt # = " F 0 0 A # " ζt1

zt−1

# + " vt ut #

ζt =   

ξ1,t

.. .

ξM1,t       ¯ ξ1 .. . ¯ ξM1

   F =   

p1,1−pM,1 . . . pM1,1−pM,1

..

. ...

p1,M1−pM,M1 . . . pM1,M1−pM,M1   ,

M = h µ1−µM . . . µM1−µM i

,

zt =

      zt zt1

.. .

ztp+1     

, A=      

A1 . . . Ap1 Ap

IK 0 0

. .. ... ...

0 . . . IK 0     

, ut=       ut 0 .. . 0      ,

J = e01IK.

A VARMA-Representation Theorem

MSM(M)VAR(p)

yt = µy+t+zt

zt = A(L)1ut, A(L) =IK−A1L−. . .−ApLp ζt = F(L)1vt, F(L) =IM1− FL

Moving-average representation:

yt=µy+MF(L)1vt+A(L)1ut

Final-equations-form VARMA(M+Kp−1,M+Kp−2):

(23)

2.4.3 Related models

Mixture of normals

The mixture of normals model is characterized by serially independently distributed regimes:

Pr(st|St1, Yt1) = Pr(st;ρ).

This is a special case of the MS-AR model, which results when the transition probabilities are independent of the history of the regime.

The conditional probability distribution ofytis independent ofSt1,

p(yt|Yt1, St1) =p(yt|Yt1),

and the regimes are Granger non-causal foryt. Even so, this model can be considered as a restricted MS-VAR model where the transition matrix has rank one. Moreover, if only level of the time series is regime-dependent, the model is observationally equivalent to time-invariant linear processes with non-normal errors.

Time-varying transition probabilities (endogenous switching)

All the previously mentioned models are special cases of an endogenous selection model: The transition probabilities pij are not time-invariant parameters, but functions of the observed time series vectorytdor some exogenous variablesxt:

Pr(st= 1|St1, Yt1, Xt) =F(zt, st1;γ, c) = (

1−F12(zt;γ, c) if st1 = 1 F21(zt;γ, c) if st1 = 2.

For example, in the case of an exponential function the time-varying transition probabilities are given by:

pijt=Fij(zt;γ, c) = 1−exp−γij(zt−cij)2 fori6=j

andpiit = 1PMj=1pijt.

In contrast to an MS-AR model, the regime switching rule also depends on the history of the observed variables. Since the observed variables contain additional information on the conditional probability distribution of the states, the regime generating process is no longer Markovian:

Pr(st|St1, Yt1)a.e.6= Pr(st|st1).

(24)

−5 −4 −3 −2 −1 0 1 2 3 4 5 0.2

0.4

Regime−dependent densities

p(y

t|st=1,Yt−1)

p(y

t|st=2,Yt−1)

−5 −4 −3 −2 −1 0 1 2 3 4 5

0.1 0.2

0.3 Density of yt given Yt−1

p(yt|Yt1) for Pr(st=1|Yt1)=.3 p(yt|Yt1) for Pr(st=1|Yt1)=.5

−5 −4 −3 −2 −1 0 1 2 3 4 5

0.0 0.5

1.0 Regime inference after observation of yt

Pr(st=1|Yt) for Pr(st=1|Yt1)=.3 Pr(st=1|Yt) for Pr(st=1|Yt1)=.5

Figure 6 Regime inference.

2.4.4 Regime inference

The discrete support of the state in the MS-AR model allows to derive the complete conditional distribution of the unobservable state variable

instead of deriving the first two moments, as in the Kalman filter (cf. Kalman, 1960, Kalman and Bucy, 1961, and Kalman, 1963) for Gaussian linear state-space models, the grid-approximation suggested by Kitagawa (1987) for non-linear, non-normal

state-space models.

Literature

The filtering and smoothing algorithms for time series models with Markov-switching regimes are closely related to Hamilton (1988, 1989, 1994a) building upon ideas of Cosslett and Lee (1985).

The basic filtering and smoothing recursions had been introduced by Baum, Petrie, Soules and Weiss (1970) for the reconstruction of hidden Markov chains.

Lindgren (1978) applied their algorithms to regression models with Markovian regime switches.

(25)

Filtering

The filter introduced by Hamilton (1989) can be described as an iterative algorithm for calcu-lating the optimal inference ofξt+1 on the basis of the information set intconsisting of the observed values ofyt, namelyYt= (yt0, yt01, . . . , y01p)0. It might also be viewed as a discrete version of the Kalman filter for the state-space model

yt = XtBξt+ut, ξt+1 = Fξt+vt+1.

For given parameters, the discrete-state algorithm under consideration summarizes the condi-tional probability distribution of the state vectorξtby

ˆ

ξt|t=E[ξt|Yt] =   

Pr(ξt=ι1|Yt)

.. .

Pr(ξt=ιN|Yt)   .

Since each component ofξˆt|tis a binary variable,ξˆt|tpossesses not only the interpretation as the conditional mean, which is the optimal inference of ξt givenYt, but it also presents the probability distribution ofξtconditional onYt.

The filtering algorithm computes ξˆt|t by deriving the joint probability density of ξt and yt

conditioned on observationsYt.

By invoking the law of Bayes, the posterior probabilitiesPr(ξt|yt, Yt1)are given by

Pr(ξt|Yt)Pr(ξt|yt, Yt1) = p(yt|ξt, Yt−1)Pr(ξt|Yt−1)

p(yt|Yt1) ,

with the prior probability

Pr(ξt|Yt1) =X

ξt−1

Pr(ξtt1)Pr(ξt1|Yt1)

and the density

p(yt|Yt1) =X

ξt

p(yt, ξt|Yt1) =X

ξt

Pr(ξt|Yt1)p(yt|ξt, Yt1).

Note that the summation involves all possible values ofξtandξt1. Letηtbe the vector of the densities ofytconditional onξtandYt1

ηt=   

p(yt1, Yt1)

.. . p(ytN, Yt1)

  =   

p(ytt=ι1, Yt1)

.. .

p(ytt=ιN, Yt1)   ,

(26)

Then, the contemporaneous inferenceξˆt|tis given in matrix notation by

ˆ

ξt|t = ηt ˆ ξt|t1

10

Nˆt|t−1)

, (7)

where denotes the element-wise matrix multiplication and 1N = (1, . . . ,1)0 is a vector consisting of ones. The filter weights for each regime the conditional density of the observation

yt, given the vectorθmof AR parameters of regimem, with the predicted probability of being in regime m at time t given the information set Yt1. Thus, the instruction (7) describes the filtered regime probabilities ξt|t as an update of the estimate ξt|t1 ofξt given the new informationyt.

The transition equation implies that the vectorξˆt+1|tof predicted probabilities is a linear func-tion of the filtered probabilitiesξˆt|t:

ˆ

ξt+1|t= Fξˆt|t. (8)

The sequenceˆt|t1}Tt=1can therefore be generated by iterating on (7) and (8), which can be summarized as:

ˆ

ξt+1|t = Ft ˆ ξt|t1)

10

ˆt|t−1)

. (9)

In the prevailing Bayesian context, ξˆt|t1 is the prior distribution ofξt. The posterior distri-butionξˆt|tis calculated by linking the new informationytwith the prior via Bayes’ law. The posterior distributionξˆt|tbecomes the prior distribution for the next stateξt+1and so on.

Smoothing

The filter recursions deliver estimates forξt, t = 1, . . . , T based on information up to time point t. This is a limited information technique, as we have observations up tot=T. In the following, full-sample information is used to make an inference about the unobserved regimes by incorporating the previously neglected sample informationYt+1.T = (y0t+1, . . . , y0T)0 into the inference aboutξt. Thus, the smoothing algorithm gives the best estimate of the unobserv-able state at any point within the sample.

The smoothing algorithm proposed by Kim (1994) may be interpreted as a backward filter that starts at the end pointt=T of the previously applied filter.

The full–sample smoothed inferences ξˆt|T can be found by iterating backward fromt=T 1,· · ·,1by starting from the last output of the filterξˆT|T and by using the identity

Pr(ξt|YT) = X

ξt+1

Pr(ξt, ξt+1|YT)

= X

ξt+1

(27)

For pure AR models with Markovian parameter shifts, the probability laws for yt and ξt+1

depend only on the current stateξtand not on the former history of states. Thus, we have

Pr(ξt|ξt+1, YT) Pr(ξt|ξt+1, Yt, Yt+1.T)

= p(Yt+1.T|ξt, ξt+1, Yt)Pr(ξt|ξt+1, Yt)

p(Yt+1.Tt+1, Yt) = Pr(ξtt+1, Yt).

It is therefore possible to calculate the smoothed probabilitiesξˆt|T by getting the last term from the previous iteration of the smoothing algorithm ξˆt+1|T, while it can be shown that the first term can be derived from the filtered probabilities ξˆt|t,

Pr(ξtt+1, Yt) = Pr(ξt+1|ξt, Yt)Pr(ξt|Yt) Pr(ξt+1|Yt)

= Pr(ξt+1|ξt)Pr(ξt|Yt)

Pr(ξt+1|Yt) . (11)

If there is no deviation between the full information estimate,ξˆt+1|T, and the inference based on the partial information, ξˆt+1|t, then there is no incentive to update ξˆt|T = ˆξt|t and the filtering solutionξˆt|tcannot be further improved.

In matrix notation, (10) and (11) can be condensed to

ˆ ξt|T =

F0ξ

t+1|T ξˆt+1|t)

ξˆt|t, (12)

(28)

2.4.5 Maximum Likelihood estimation

The Likelihood Function

In econometrics the so-called Markov model of switching regressions considered by Goldfeld and Quandt (1973)

yt=x0tβm+umt, umtNID(0, σ2m)form= 1,2

has been one of the first attempts to analyze regressions with Markovian regime shifts. Gold-feld and Quandt (1973) claimed to derive maximum likelihood estimates by maximizing their “likelihood” function, which would be in terms of our model

Q(θ, ρ, ξ0) =YT

t=1

ηt(θ)t|0(ρ, ξ0),

whereηtis again an(M×1)vector collecting the conditional densitiesp(yt|Yt1, θm), m= 1, . . . , M, andξt|0= F0are the unconditional regime probabilities.

Unfortunately, the functionQ(θ, ρ, ξ0)isnotthe likelihood function as pointed out by Cosslett and Lee (1985).

Derivation of the likelihood function as a by–product of the filter:

L(λ|Y) := p(YT|Y0;λ)

= YT

t=1

p(Yt|Yt1, λ)

= T Y t=1 X ξt

p(yt|ξt, Yt1, θ) Pr(ξt|Yt1, λ)

= YT

t=1

η0tξˆt|t1

= YT

t=1

η0tFξˆt1|t1.

The conditional densitiesp(ytt1 =ιi, Yt1)are mixtures of normals. Thus, the likelihood function is non-normal:

L(λ|Y) =

T Y t=1 N X i=1 N X j=1

pij Pr(ξt1 =ιi|Yt1, λ)p(ytt=ιj, Yt1, θ)

= YT

t=1 N X i=1 N X j=1

pij ξˆi.t1|t1

(2π)−K/2|Σ

j|−1/2exp

12u0jtΣj1ujt

,

where ujt = ytE[ytt = ιj, Yt1]and N = Mp+1 in MSM specifications orN = M

(29)

Normal Equations of theMLEstimator

The maximum likelihood (ML) estimates can be derived by maximization of likelihood func-tionL(λ|Y)subject to the adding-up restrictions:

P1M = 1

10Mξ0 = 1

and the non-negativity restrictions

ρ≥0, σ≥0, ξ00.

If the non-negativity can be ensured, theMLestimateλ˜ is given by the first-order conditions (FOCs) of the constrained log-likelihood function

lnL∗(λ) := lnL(λ|YT)−κ01(P1M 1M)−κ2(10Mξ01). (13)

Then the FOCs are given by the set of simultaneous equations

lnL(λ|Y)

∂θ0 = 0

lnL(λ|Y) ∂ρ0 −κ

0

1(10M IM) = 0

lnL(λ|Y) ∂ξ00 −κ21

0

M = 0,

where it is assumed that the interior solution of these conditions exits and is well-behaved, such that the non-negativity restrictions are not binding.

The derivation of the log-likelihood function concerning the parameter vector θleads to the score function

lnL(λ|Y)

∂θ0 =

1

L

Z

p(Y|ξ, θ)

∂θ0 Pr(ξ0, ρ)dξ = 1

L

Z

lnp(Y|ξ, θ)

∂θ0 p(Y|ξ, θ)Pr(ξ|ξ0, ρ)dξ =

Z

lnp(Y|ξ, λ)

∂θ0 Pr(ξ|Y, λ)dξ

= XT

t=1

X

ξt

lnp(ytt, Yt1, λ)

∂θ0 Pr(ξt|YT, λ)

Maximization of the constrained likelihood function with respect to the parameter vectorρof the hidden Markov chain leads to

lnL(λ|Y)

∂ρ0 =

1

L

Z

p(Y|ξ, θ)∂Pr(ξ|ξ0, ρ) ∂ρ0

= 1

L

Z

ln Pr(ξ0, ρ)

∂ρ0 p(Y|ξ, θ)Pr(ξ|ξ0, ρ)dξ =

Z

ln Pr(ξ0, ρ)

(30)

Thus, the MLestimator of the vector of transition probabilities ρ is equal to the transition probabilities in the sample calculated with the smoothed regime probabilities:

˜ pij =

PT

t=1PPr(st=j, st−1 =i|YT;λ) T

t=1Pr(st−1=i|YT;λ)

.

The EM Algorithm

As shown in Hamilton (1990), the Expectation-Maximization (EM) algorithm introduced by Dempster, Laird and Rubin (1977) can be used in conjunction with the filter to obtain the maximum likelihood estimates of the model’s parameters.

The EM algorithm is an iterative ML estimation technique designed for a general class of models where the observed time series depends on some unobservable stochastic variables. For the hidden Markov-chain model an early precursor to the EM algorithm was provided by Baum et al. (1970) building upon ideas in Baum and Eagon (1967). The consistency and asymptotic normality of the proposed MLestimator were studied in Baum and Petrie (1966) and Petrie (1969). Their work has been extended by Lindgren (1978) to the case of regression models with Markov-switching regimes.

Each iteration of the EM algorithm consists of two steps:

In the expectation step (E), the unobserved states ξt are estimated by their smoothed probabilitiesξˆt|T. The conditional probabilitiesPr(ξ|Y, λ(j−1))are calculated with the filter and smoother by using the estimated parameter vectorλ(j−1)of the last maximiz-ation step instead of the unknown true parameter vectorλ.

In the maximization step (M), an estimate ofλis derived as a solutionλ˜of the FOCs of

MLestimation, where the conditional regime probabilitiesPr(ξt|Y, λ) are replaced by the smoothed probabilitiesξˆt|T(λ(j−1))of the last expectation step. Thus, the dominant source of non-linearities in the FOCs is eliminated. If the score, i.e. the gradient of

lnL(λ|YT), would have been linear inξ, this procedure were equivalent to replacing the unobserved latent variablesξin the FOCs with their expectationξˆt|T.

Equipped with the new parameter vectorλthe filtered and smoothed probabilities are updated and so on. Thus, each EM iteration involves a pass through the filter and smoother, followed by an update of the first order conditions and the parameter estimates and is guaranteed to increase the value of the likelihood function.

(31)

Determination of the number of regimes in MS-VAR models

Testing for the number of regimes in an MS-VAR model is a difficult enterprise:

Conventional testing approaches are not applicable due to the presence of unidentified nuis-ance parameters under the null of linearity.

null hypothesis nuisance parameters

µ1=µ2 p12, p21 p12= 0(s0 = 1) µ2

The presence of the nuisance parameters gives the likelihood surface sufficient freedom so that one cannot reject the possibility that the apparently significant parameters could simply be due to sampling variation. The scores associated with parameters of interest under the alternative may be identically zero under the null.

Davies (1977, 1987) derived an upper bound for the significance level of the likelihood ratio test statistic under nuisance parameters.

Formal tests of the Markov-switching model against linear alternative employing standardized likelihood ratio test designed to deliver (asymptotically) valid inference have been proposed by Hansen (1992, 1996a), Garcia (1998), but they are computationally demanding.

The results of Ang and Bekaert (1998) indicate that critical values of theχ2(r+n)distribution can be used approximately whereris the number of restricted parameters andnis the number of nuisance parameters.

Alternatives

Information criteria:

AIC = 2 logL/T + 2n/T,

SC = 2 logL/T +nlog(T)/T,

HQ = 2 logL/T + 2nlog(log(T))/T,

whereLis the maximized likelihood,nis the number of parameters andT is the sample size: see Akaike (1985), Schwarz (1978), and Hannan and Quinn (1979).

(32)

Hans–Martin Krolzig Hilary Term 2002

Regime–Switching Models

3 Prediction and structural analysis with regime-switching models

Forecasting and structural analysis with regime-switching models is considerably more in-volved than with linear ones. Various techniques have been proposed to overcome these prob-lems (see, inter alia, Granger and Ter¨asvirta, 1993). Though the main probprob-lems are common to all non-linear models, we will focus on the MS-VAR approach in the following.

3.1 Predictions of linear and nonlinear stochastic processes

For the mean square prediction error (MSPE) criterion,

min

ˆ

y E

h

(yt+h−y)ˆ 2ti,

the optimal predictor ofyt+his given by the conditional expectation for the given information setΩt:

ˆ

yt+h|t=E[yt+h|t],

whereΩtis the available information set, i.e. the past of the stochastic process up to time t,

t=Yt. The prediction error associated with the optimal predictoryˆt+h|tis given by

ˆ

(33)

3.1.1 Linear AR(1) model

yt=αyt1+εt, εtIID(0, σ2).

One-step prediction

ˆ

yt+1|t=E[αyt+εt+1|t] =αyt.

Multi-step prediction

ˆ

yt+h|t=E[αyt+h1+εt+h|t] =αyˆt+h1|t=αhyt=Fh(yt, α).

3.1.2 Nonlinear AR(1) model

yt=F(yt1;θ) +εt, εtIID(0, σ2)

whereF(yt1;θ)is some nonlinear function. One-step prediction

ˆ

yt+1|t=E[F(yt;θ) +εt+1|t] =F(yt;θ).

Multi-step prediction, sayh= 2 :

ˆ

yt+2|t = E[F(yt+1;θ) +εt+2|t] = E[F(yt+1;θ)|t]

6

(34)

3.1.3 Methods of calculating multi-step forecasts in nonlinear models

(1) ‘Naive’ approach

ˆ

yt(+2n)|t=F yˆt+1|t;θ

biased.

(2) ‘Exact’ approach (closed form forecast)

ˆ

yt(+2e)|t =

Z +

−∞ F(F(yt;θ) +εt+1;θ) ft+1)dεt+1 =

Z +

−∞ F(yt+1;θ) g(yt+1|t)dyt+1 =

Z +

−∞ E[yt+2|yt+1]g(yt+1|t)dyt+1

whereft+1)is the pdf ofεt+1 andg(yt+1|t) =p(yt+1−F(yt;θ))is the pdf ofyt+1

conditional onΩt.

approximation by numerical integration; time-consuming forh >2

normal forecast error method: assumes normality ofg(yt+h1|t). (3) ‘Monte-Carlo’ method

ˆ

yt(+2mc|)t= 1 N

N

X

i=1

F(F(yt;θ) +εi;θ)

whereN is large andεiis drawn from the presumed distribution ofεt.

approximation ofg(yt+h1|t)by simulation (4) ‘Bootstrap’ method

ˆ

y(t+2bs)|t= 1 T

T

X

i=1

F(F(yt;θ) + ˆεi;θ)

where the residualsεˆifrom the estimated model are used. distribution-free

(5) ‘Direct’ approach (Multi-step estimation)

Figure

Figure 1 Linear AR model of US GNP growth.
Figure 3 Hamilton’s MSM(2)-AR(4) model.
Figure 4 MSM(2)-AR models of US GNP growth.
Figure 6 Regime inference.
+3

Referencias

Documento similar

of the magnetization is generally observed in all the NP chain structures (from a to i), mainly at frontiers. Other interesting features can be observed, as an onion state in the

“The Keynesian hypothesis that investment…can be treated as an independent variable…together with the assumption of full employment, also implies that the level of prices in

Solving generalized Lypunov differential operator differential equations without the exponential operator function We begin this section with an algebraic result that provides a

Taking into account that the relative dominance of regional shocks in local output is considered a key indicator to adopt a regional currency, we use a structural vector

As a consequence of this characterization it follows that, in a conformally stationary spacetime whose conformal Killing vector field is irrotational, the position vector

Theorem 8 Let (M, g) be a spacetime, which admits a causal conformal vector field such that the gradient of the conformal factor is a causal past-directed vector field, then

The main goal of this work is to extend the Hamilton-Jacobi theory to different geometric frameworks (reduction, Poisson, almost-Poisson, presymplectic...) and obtain new ways,

This very different electronic structure is related to the fact that the lowest 4f 13 5d(t 2g ) 1 1 3 T 1u state is isolated from the rest of close-lying terms by an energy gap,