CHAPTER 3. SYSTEM RESULTS
3.4 Chapter conclusions
We restrict our attention to linear Multiregression Dynamic Models (MDMs), in which the error distributions are Gaussian and the column vector $F_t(r)$, of dimension $p_r \times 1$, is a linear function of $x_t(r)$. Under these assumptions, the MDM equations 1.4.3a, 1.4.3b and 1.4.3c outlined by Queen and Smith (1993) simplify so that each individual node $r$ may be treated as a univariate Dynamic Linear Model (DLM), as described by West and Harrison (1997). If each state matrix $G_t(r)$ is the $p_r \times p_r$ identity matrix and the observation variance is assumed to be constant over time, the DLM equations are
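$$
\begin{array}{lll}
\text{Observation equation} & Y_t(r) = F_t(r)^\top \theta_t(r) + v_t(r), & v_t(r) \sim N[0, \phi(r)^{-1}] \\
\text{State equation} & \theta_t(r) = \theta_{t-1}(r) + w_t(r), & w_t(r) \sim N[0, W_t(r)] \\
\text{Initial information} & \theta_0(r) \mid D_0 \sim N[m_0(r), C_0(r)]. &
\end{array}
$$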
At each time $t$, there is a $p_r \times 1$ state vector $\theta_t(r)$. The $p_r \times 1$ state error vector is denoted by $w_t(r)$ and follows a mean-zero multivariate normal distribution with $p_r \times p_r$ covariance matrix $W_t(r)$. The observation error $v_t(r)$ is assumed to be normally and independently distributed with mean zero and constant variance $\phi(r)^{-1}$. At time $t = 0$, any information known about the system may be represented in the initial information set $D_0$. This may include, for notational convenience, the (known) values of $F_t(r)$ for all $t$. The $p_r \times 1$ prior mean vector $m_0(r)$ and the $p_r \times p_r$ covariance matrix $C_0(r)$ must be specified a priori.
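To make the model equations concrete, the following is a minimal simulation sketch for a single node. It assumes, purely for illustration, a state covariance `W` that is constant over time (the text instead derives $W_t(r)$ from a discount factor); the function name `simulate_dlm` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_dlm(F, theta0, W, phi, rng):
    """Simulate Y_t = F_t' theta_t + v_t with a random-walk state (G_t = I).

    F      : (T, p) array whose row t holds the regressor vector F_t
    theta0 : length-p initial state
    W      : (p, p) state covariance, held constant here (an assumption)
    phi    : observation precision, so Var(v_t) = 1 / phi
    """
    T, p = F.shape
    theta = np.empty((T, p))
    y = np.empty(T)
    prev = np.asarray(theta0, dtype=float)
    for t in range(T):
        # State equation: theta_t = theta_{t-1} + w_t
        prev = prev + rng.multivariate_normal(np.zeros(p), W)
        theta[t] = prev
        # Observation equation: Y_t = F_t' theta_t + v_t
        y[t] = F[t] @ prev + rng.normal(0.0, phi ** -0.5)
    return theta, y

rng = np.random.default_rng(0)
F = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + one parent
theta, y = simulate_dlm(F, theta0=[0.0, 1.0], W=0.01 * np.eye(2), phi=4.0, rng=rng)
```

Setting `W` to the zero matrix recovers a static regression, in which the state never moves away from `theta0`.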
As the state variance $W_t(r)$ is unknown, it is encoded through a scalar discount factor $\delta(r) \in [0.5, 1]$, such that
$$
W_t(r) = \frac{1 - \delta(r)}{\delta(r)} \, C_{t-1}(r) \tag{1.4.6}
$$
where $C_{t-1}(r)$ is the posterior variance of the state vector at time $t-1$. From equation 1.4.6, it is straightforward to see that if $\delta(r) = 1$ then $W_t(r) = 0$ for all $t$, and the corresponding model is static. Lower values of $\delta(r)$ treat the state variance as a fraction of the posterior variance at the previous time point; while this fraction is fixed, $C_{t-1}(r)$ (and therefore $W_t(r)$) may vary over time.
The posterior variance then becomes the 'prior' variance $R_t(r)$ at time $t$, that is,
$$
R_t(r) = C_{t-1}(r) + W_t(r) = \frac{C_{t-1}(r)}{\delta(r)}.
$$
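The effect of the discount factor can be checked numerically. The sketch below computes $W_t(r)$ from equation 1.4.6 and confirms that the prior variance is the previous posterior variance inflated by $1/\delta(r)$. The helper name `discount_evolution` and the example matrix are illustrative assumptions.

```python
import numpy as np

def discount_evolution(C_prev, delta):
    """Return (W_t, R_t) implied by a scalar discount factor delta."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    W = (1.0 - delta) / delta * C_prev  # equation 1.4.6
    R = C_prev + W                      # algebraically equal to C_prev / delta
    return W, R

C_prev = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical posterior variance
W, R = discount_evolution(C_prev, delta=0.8)
print(np.allclose(R, C_prev / 0.8))
```

With `delta=1.0` the function returns a zero `W`, which corresponds to the static model described above.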
The posterior variance Ct(r) is updated at each time point t using the most recent observation yt(r).
The variances in the DLM that we need to estimate, namely the prior variance $R_t$, the forecast variance $Q_t$ and the posterior variance $C_t$, may all be expressed as the product of the observation variance (inverse precision) $\phi(r)^{-1}$ and a 'starred scale-free' variance parameter (West and Harrison, 1997, p. 109), denoted by a $*$, i.e.
$$
R_t(r) = \phi(r)^{-1} R^*_t(r), \qquad Q_t(r) = \phi(r)^{-1} Q^*_t(r), \qquad C_t(r) = \phi(r)^{-1} C^*_t(r).
$$
Defining 'scale-free' variances in this way allows these variance expressions to be updated via the DLM updating equations without any knowledge of $\phi(r)^{-1}$.
Define $D_t = \{D_0, y_1(r), \ldots, y_t(r)\}$, that is, the initial information together with the set of observations available up to and including time $t$. Denote the posterior mean for $\theta_t(r)$ at time $t$ by $m_t(r)$, and the forecast mean at time $t$ by $f_t(r)$. Then the system evolves according to
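$$
\begin{array}{ll}
\text{Posterior at time } t-1 & p[\theta_{t-1}(r) \mid \phi(r), D_{t-1}] \sim N[m_{t-1}(r), \phi(r)^{-1} C^*_{t-1}(r)] \\
\text{Prior at time } t & p[\theta_t(r) \mid \phi(r), D_{t-1}] \sim N[m_{t-1}(r), \phi(r)^{-1} R^*_t(r)] \\
\text{One-step forecast} & p[Y_t(r) \mid \phi(r), D_{t-1}] \sim N[f_t(r), \phi(r)^{-1} Q^*_t(r)] \\
\text{Posterior at time } t & p[\theta_t(r) \mid \phi(r), D_t] \sim N[m_t(r), \phi(r)^{-1} C^*_t(r)]
\end{array}
$$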
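with the parameters updated through
$$
\begin{aligned}
f_t(r) &= F_t(r)^\top m_{t-1}(r), \\
Q^*_t(r) &= F_t(r)^\top R^*_t(r) F_t(r) + 1, \\
m_t(r) &= m_{t-1}(r) + \frac{R^*_t(r) F_t(r) \, [Y_t(r) - f_t(r)]}{Q^*_t(r)}, \\
C^*_t(r) &= R^*_t(r) - \frac{R^*_t(r) F_t(r) F_t(r)^\top R^*_t(r)}{Q^*_t(r)}.
\end{aligned}
$$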
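The updating equations translate almost line by line into code. The following is a minimal sketch of a single scale-free update for one node, assuming the prior variance is obtained by discounting, $R^*_t(r) = C^*_{t-1}(r)/\delta(r)$. The function name `dlm_update` and the example inputs are assumptions made for illustration.

```python
import numpy as np

def dlm_update(m_prev, C_star_prev, F, y, delta):
    """One scale-free DLM update; returns (f, Q_star, m, C_star)."""
    R_star = C_star_prev / delta            # prior scale-free variance
    f = F @ m_prev                          # one-step forecast mean
    Q_star = F @ R_star @ F + 1.0           # scale-free forecast variance
    A = R_star @ F / Q_star                 # adaptive vector R* F / Q*
    m = m_prev + A * (y - f)                # posterior mean
    C_star = R_star - np.outer(A, A) * Q_star  # equals R* - R* F F' R* / Q*
    return f, Q_star, m, C_star

m_prev = np.array([0.0, 1.0])       # hypothetical m_{t-1}
C_star_prev = np.eye(2)             # hypothetical C*_{t-1}
F_t = np.array([1.0, 0.3])          # hypothetical regressor vector
f, Q_star, m, C_star = dlm_update(m_prev, C_star_prev, F_t, y=0.7, delta=0.9)
```

None of these quantities involves $\phi(r)$, which is the reason for working with the starred variances.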
At $t = t_0$, the prior on the precision is
$$
p[\phi(r) \mid D_0] \sim G\!\left(\frac{n_0(r)}{2}, \frac{d_0(r)}{2}\right) \tag{1.4.8}
$$
where $G(\cdot, \cdot)$ denotes the gamma distribution with shape and rate parameters. The prior hyperparameters $n_0(r)$ and $d_0(r)$ must be specified a priori. Specification of the hyperparameters will be discussed further in subsection 1.6.1. At any time $t$, the updated prior on the precision is
$$
p[\phi(r) \mid D_t] \sim G\!\left(\frac{n_t(r)}{2}, \frac{d_t(r)}{2}\right)
$$
with the hyperparameters updated at each time point using
$$
\begin{aligned}
n_t(r) &= n_{t-1}(r) + 1, \\
d_t(r) &= d_{t-1}(r) + \frac{[Y_t(r) - f_t(r)]^2}{Q^*_t(r)}.
\end{aligned}
$$
At time $t$, the updated estimate for the observation variance is given by
$$
S_t(r) = \frac{1}{E[\phi(r) \mid D_t]} = \frac{d_t(r)}{n_t(r)}.
$$
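The variance learning step is a pair of scalar recursions. The sketch below applies one update and returns the point estimate $S_t(r)$. The function name `update_precision` and the numerical inputs are illustrative assumptions.

```python
def update_precision(n_prev, d_prev, y, f, Q_star):
    """Update the gamma hyperparameters and return (n_t, d_t, S_t)."""
    n = n_prev + 1
    d = d_prev + (y - f) ** 2 / Q_star  # squared forecast error, scale-free
    S = d / n                           # S_t = 1 / E[phi | D_t]
    return n, d, S

# Hypothetical values: n_0 = d_0 = 1, observation 2.0, forecast 0.0, Q*_t = 2.0
n, d, S = update_precision(1, 1.0, y=2.0, f=0.0, Q_star=2.0)
print(n, d, S)  # prints "2 3.0 1.5"
```

A large standardised forecast error increases $d_t(r)$ and hence the variance estimate, while each additional observation adds one degree of freedom.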
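Let $T_{\cdot}(\cdot, \cdot)$ denote the t-distribution, indexed by its degrees of freedom, with location and scale parameters. The final marginal distributions are then
$$
\begin{array}{llr}
\text{Posterior at time } t-1 & p[\theta_{t-1}(r) \mid D_{t-1}] \sim T_{n_{t-1}(r)}[m_{t-1}(r), C_{t-1}(r)] & \text{(1.4.10a)} \\
\text{Prior at time } t & p[\theta_t(r) \mid D_{t-1}] \sim T_{n_{t-1}(r)}[m_{t-1}(r), R_t(r)] & \text{(1.4.10b)} \\
\text{One-step forecast} & p[Y_t(r) \mid D_{t-1}] \sim T_{n_{t-1}(r)}[f_t(r), Q_t(r)] & \text{(1.4.10c)} \\
\text{Posterior at time } t & p[\theta_t(r) \mid D_t] \sim T_{n_t(r)}[m_t(r), C_t(r)]. & \text{(1.4.10d)}
\end{array}
$$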
The estimates for the scale parameters are
$$
R_t(r) = S_{t-1}(r) R^*_t(r), \qquad Q_t(r) = S_{t-1}(r) Q^*_t(r), \qquad C_t(r) = S_t(r) C^*_t(r).
$$
Retrospective Distributions
Equations 1.4.10b and 1.4.10c give the one-step-ahead forecast distributions for $\theta_t(r)$ and $Y_t(r)$. The one-step forecast for $Y_t(r)$ provides a simple, closed-form expression for the likelihood stated in equation 1.6.1, while $\theta_t(r)$ estimates the strength of the regressors (the parent nodes) at time $t$ given data $y_1(r), \ldots, y_t(r)$. When examining the behaviour of $\theta(r)$ over time, it is informative to consider not only the one-step estimates but also the retrospective estimates of $\{\theta_T(r), \theta_{T-1}(r), \ldots, \theta_1(r)\}$ given all the data, $y(r) = \{y_1(r), \ldots, y_T(r)\}$. These may be obtained in a similar, one-step manner via the recursive relations outlined below. To maintain the notation used by West and Harrison (1997), the $(r)$ notation is dropped temporarily, so that $\theta_t(r) = \theta_t$, $\phi(r) = \phi$, etc. The bracket notation $(-k)$ then denotes the parameters $k$ steps back in time. We have
$$
p(\theta_{t-k} \mid D_t) \sim T_{n_t}\!\left[a_t(-k), \frac{S_t}{S_{t-k}} R^*_t(-k)\right], \qquad k \geq 0. \tag{1.4.11}
$$
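The parameters of this distribution may be obtained using the recursive relations
$$
\begin{aligned}
a_t(-k) &= m_{t-k} + B_{t-k}[a_t(-k+1) - m_{t-k}], & a_t(0) &= m_t, \\
R_t(-k) &= C_{t-k} + B_{t-k}[R_t(-k+1) - R_{t-k+1}] B_{t-k}, & R_t(0) &= C_t,
\end{aligned} \tag{1.4.12a}
$$
where
$$
B_t = C_t R_{t+1}^{-1}.
$$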
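Note that $C_t R_{t+1}^{-1} = \phi^{-1} \phi \, C^*_t (R^*_{t+1})^{-1}$ and $R^*_t(0) = C^*_t$. For unknown variance $\phi^{-1}$, we may write equation 1.4.12a in terms of $S_t$, its best estimate at time $t$:
$$
\begin{aligned}
S_t R^*_t(-k) &= S_{t-k} C^*_{t-k} + B_{t-k}\left[S_t R^*_t(-k+1) - S_{t-k} R^*_{t-k+1}\right] B_{t-k} \\
&= S_{t-k}\left[C^*_{t-k} + B_{t-k}\left[\frac{S_t}{S_{t-k}} R^*_t(-k+1) - R^*_{t-k+1}\right] B_{t-k}\right].
\end{aligned}
$$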
Dynamic Linear Model theory is outlined in detail in West and Harrison (1997, Chapter 4).
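Using these relations, it is possible to construct
$$
p[\theta_t(r) \mid y(r)] \sim T_{n_T(r)}[\mu_t(r), \Sigma_t(r)] \tag{1.4.13}
$$
with
$$
\begin{aligned}
\mu_t(r) &= m_t(r) + C_t(r) R_{t+1}(r)^{-1} [\mu_{t+1}(r) - m_t(r)] && \text{(1.4.14a)} \\
\Sigma^*_t(r) &= C^*_t(r) + C^*_t(r) R^*_{t+1}(r)^{-1} [\Sigma^*_{t+1}(r) - R^*_{t+1}(r)] \, C^*_t(r) R^*_{t+1}(r)^{-1} && \text{(1.4.14b)} \\
\Sigma_t(r) &= S_T(r) \Sigma^*_t(r). && \text{(1.4.14c)}
\end{aligned}
$$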
In this work, we use $m_t(r)$ and $C_t(r)$ to denote the parameters of equation 1.4.10d (that is, estimates for $\theta_t(r)$ given the observations up until time $t$). We use $\mu_t(r)$ and $\Sigma_t(r)$ to denote the parameters of equation 1.4.13 (estimates for $\theta_t(r)$ given all the data $y(r)$).
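The forward filter and the backward recursions 1.4.14a to 1.4.14c can be combined in a short sketch for a single node. It assumes a discount-factor model, so that $R^*_{t+1} = C^*_t/\delta$ and $B_t = C^*_t (R^*_{t+1})^{-1}$ reduces to $\delta I$; this is why writing the right-hand factor as a transpose in the code is equivalent to the form in the text. The function names, the initial scale-free variance `C0_star` and all numerical values are hypothetical.

```python
import numpy as np

def filter_dlm(F, y, m0, C0_star, delta, n0, d0):
    """Forward pass; array index t corresponds to time t + 1 in the text."""
    T, p = F.shape
    m = np.empty((T, p))
    C = np.empty((T, p, p))
    R = np.empty((T, p, p))
    m_prev = np.asarray(m0, dtype=float)
    C_prev = np.asarray(C0_star, dtype=float)
    n, d = n0, d0
    for t in range(T):
        R[t] = C_prev / delta                  # scale-free prior variance
        f = F[t] @ m_prev                      # forecast mean
        Q = F[t] @ R[t] @ F[t] + 1.0           # scale-free forecast variance
        A = R[t] @ F[t] / Q
        m[t] = m_prev + A * (y[t] - f)
        C[t] = R[t] - np.outer(A, A) * Q
        n, d = n + 1, d + (y[t] - f) ** 2 / Q  # variance learning
        m_prev, C_prev = m[t], C[t]
    return m, C, R, n, d

def smooth_dlm(m, C_star, R_star, S_T):
    """Backward pass implementing equations 1.4.14a to 1.4.14c."""
    T = m.shape[0]
    mu = m.copy()                  # mu_T = m_T
    Sig_star = C_star.copy()       # Sigma*_T = C*_T
    for t in range(T - 2, -1, -1):
        B = C_star[t] @ np.linalg.inv(R_star[t + 1])
        mu[t] = m[t] + B @ (mu[t + 1] - m[t])
        Sig_star[t] = C_star[t] + B @ (Sig_star[t + 1] - R_star[t + 1]) @ B.T
    return mu, S_T * Sig_star

rng = np.random.default_rng(0)
F = np.column_stack([np.ones(40), rng.normal(size=40)])
y = F @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=40)
m, C, R, n, d = filter_dlm(F, y, np.zeros(2), 10.0 * np.eye(2), 0.9, 1.0, 1.0)
mu, Sigma = smooth_dlm(m, C, R, S_T=d / n)
```

At the final time point the retrospective and filtered estimates coincide; at earlier time points the retrospective mean is pulled towards later estimates with weight $\delta$.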