I congratulate the authors on delivering a stimulating and thought-provoking article. In this comment, I consider data observed over time and suggest a way to investigate the causes of disagreement when one model has a posterior probability close to one but a stacking weight much smaller than one.

Here we focus on the case when data arrive over time. To simplify the discussion, let us assume that data are observed at discrete time points. Let $y_t$ be a vector that contains all the data observed at time $t$, $t = 1, \ldots, T$. Further, let $y_{1:t} = (y_1, \ldots, y_t)$. Then, instead of using the leave-one-out predictive density $p(y_i \mid y_{-i}, M_k)$, we may consider the one-step-ahead predictive density $p(y_t \mid y_{1:(t-1)}, M_k)$, which is given by

$$p(y_t \mid y_{1:(t-1)}, M_k) = \int p(y_t \mid y_{1:(t-1)}, \theta_k, M_k)\, p(\theta_k \mid y_{1:(t-1)}, M_k)\, d\theta_k.$$
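As a rough illustration of how this integral might be approximated in practice, the sketch below averages $p(y_t \mid \theta_k)$ over posterior draws of $\theta_k$ and compares the result with a closed form. The model is an assumption made purely for illustration (independent normal observations with unknown mean, known variance, and a conjugate normal prior); the function names, prior settings, and data values are hypothetical and do not come from the discussion.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mean_var(past, prior_mean=0.0, prior_var=100.0, obs_var=1.0):
    """Conjugate posterior of theta given y_{1:(t-1)} in the assumed toy model."""
    precision = 1.0 / prior_var + len(past) / obs_var
    mean = (prior_mean / prior_var + sum(past) / obs_var) / precision
    return mean, 1.0 / precision


def one_step_density_mc(y_t, past, n_draws=20000, seed=0, obs_var=1.0):
    """Monte Carlo estimate: average p(y_t | theta) over posterior draws of theta."""
    rng = random.Random(seed)
    mean, var = posterior_mean_var(past, obs_var=obs_var)
    sd = math.sqrt(var)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(mean, sd)          # draw from p(theta | y_{1:(t-1)})
        total += normal_pdf(y_t, theta, obs_var)
    return total / n_draws


def one_step_density_exact(y_t, past, obs_var=1.0):
    """Closed form for the toy model: N(posterior mean, posterior var + obs var)."""
    mean, var = posterior_mean_var(past, obs_var=obs_var)
    return normal_pdf(y_t, mean, var + obs_var)


past = [0.8, 1.3, 0.4, 1.1]   # y_1, ..., y_{t-1} (made-up numbers)
y_t = 1.6                     # newly observed y_t (made-up number)
mc_value = one_step_density_mc(y_t, past)
exact_value = one_step_density_exact(y_t, past)
print(mc_value, exact_value)
```

For models without a conjugate form, the same averaging step would presumably be applied to draws produced by whatever posterior sampler is available for $M_k$.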

We note that after $y_t$ has been observed, comparing $p(y_t \mid y_{1:(t-1)}, M_1), \ldots, p(y_t \mid y_{1:(t-1)}, M_K)$ allows one to evaluate the relative ability of each model to predict, at time $t-1$, the vector of observations $y_t$. Hence, in the context of data observed over time, instead of $p(y_i \mid y_{-i}, M_k)$, it seems more natural to use the one-step-ahead predictive density $p(y_t \mid y_{1:(t-1)}, M_k)$. Thus, for data observed over time, the stacking of predictive distributions would choose weights

$$\hat{w} = \arg\max_{w \in S_K} \frac{1}{T} \sum_{t=t^*+1}^{T} \log \sum_{k=1}^{K} w_k\, p(y_t \mid y_{1:(t-1)}, M_k), \tag{1}$$

where the summation over $t$ starts at $t^* + 1$ because the first $t^*$ observations are used to train the models, reducing dependence on the priors for the parameters. We note that the above equation is very similar to that of the optimal prediction pools of Geweke and Amisano (2011, 2012), except that they start the summation at $t = 1$.
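One possible way to compute the weights in (1), given a table of one-step-ahead predictive densities, is sketched below. It uses an EM-style fixed-point update for mixture weights with fixed components, which is a technique swapped in here for convenience rather than anything prescribed in the discussion; a generic constrained optimizer would serve equally well. The function names and the density values are hypothetical.

```python
import math


def stacking_weights(dens, n_iter=2000, tol=1e-12):
    """dens[t][k] holds p(y_t | y_{1:(t-1)}, M_k) for t = t*+1, ..., T.

    Maximizes sum_t log sum_k w_k * dens[t][k] over the simplex using the
    EM-style update w_k <- w_k * mean_t(dens[t][k] / sum_j w_j * dens[t][j]).
    """
    n_models = len(dens[0])
    w = [1.0 / n_models] * n_models          # start at the centre of the simplex
    for _ in range(n_iter):
        new_w = [0.0] * n_models
        for row in dens:
            mix = sum(wk * pk for wk, pk in zip(w, row))
            for k in range(n_models):
                new_w[k] += w[k] * row[k] / mix
        new_w = [v / len(dens) for v in new_w]
        converged = max(abs(a - b) for a, b in zip(w, new_w)) < tol
        w = new_w
        if converged:
            break
    return w


def log_score(w, dens):
    """Average log density of the pooled forecast.

    The constant in front of the sum does not change the maximizing weights.
    """
    total = sum(math.log(sum(wk * pk for wk, pk in zip(w, row))) for row in dens)
    return total / len(dens)


# Made-up densities for three models over six time points after training.
dens = [
    [0.40, 0.10, 0.20],
    [0.35, 0.15, 0.20],
    [0.05, 0.30, 0.20],
    [0.45, 0.05, 0.20],
    [0.02, 0.40, 0.20],
    [0.38, 0.12, 0.20],
]
weights = stacking_weights(dens)
print(weights, log_score(weights, dens))
```

The update keeps the weights nonnegative and summing to one at every iteration, so no projection step is needed; the price is that convergence can be slow when the optimum sits on the boundary of the simplex.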

It is also helpful to consider the formula for the posterior probability of each model. To keep the exposition simple, let us assume equal prior probabilities for the competing models. Further, we assume that the first $t^*$ observations are used for training. Then, the posterior probability of model $M_k$ is

$$P(M_k \mid y_{1:T}) = \frac{\prod_{t=t^*+1}^{T} p(y_t \mid y_{1:(t-1)}, M_k)}{\sum_{j=1}^{K} \prod_{t=t^*+1}^{T} p(y_t \mid y_{1:(t-1)}, M_j)}. \tag{2}$$
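The posterior probabilities in (2) involve products of many small densities, so in practice they would likely be computed on the log scale. The following minimal sketch does this with the usual subtract-the-maximum trick; the function name and the input values are hypothetical, and equal prior model probabilities are assumed as in the text.

```python
import math


def posterior_model_probs(log_dens):
    """log_dens[t][k] holds log p(y_t | y_{1:(t-1)}, M_k) for the post-training times.

    Returns P(M_k | y_{1:T}) under equal prior model probabilities.
    """
    n_models = len(log_dens[0])
    # Summing logs over time gives the log of the product of one-step-ahead densities.
    totals = [sum(row[k] for row in log_dens) for k in range(n_models)]
    top = max(totals)                         # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in totals]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


# Made-up example: model 1 is better by 0.2 log units at each of 50 time points.
log_dens = [[-1.0, -1.2] for _ in range(50)]
probs = posterior_model_probs(log_dens)
print(probs)
```

Even a modest per-observation advantage accumulates over time, which is one reason a posterior probability can approach one while the stacking weight does not.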

Keeping (1) and (2) in mind, what can we infer when a model $M$ has posterior probability close to one but its weight $w$ in the stacking of predictive distributions is much smaller than one? The posterior probability being close to one means that $M$ is probably, amongst the $K$ models being considered, the model closest in the Kullback–Leibler sense to the true data generating mechanism. But its weight $w$ being much smaller than one means that there are important aspects of the true data generating mechanism that have not been incorporated in $M$.

We note that both (1) and (2) depend on the data only through the one-step-ahead predictive densities $p(y_t \mid y_{1:(t-1)}, M_k)$. Thus, for data observed over time, when the posterior probabilities of models and the stacking weights disagree, an examination of the one-step-ahead predictive densities $p(y_t \mid y_{1:(t-1)}, M_k)$, such as plotting them over time as in Vivar and Ferreira (2009), may help identify which aspects of the true data generating mechanism are being neglected by model $M$.

For example, an examination of $p(y_t \mid y_{1:(t-1)}, M_k)$ may indicate that model $M$ provides better probabilistic predictions 95% of the time, but that in the remaining 5% of the time the observations are outliers with respect to $M$ but not with respect to a model $M^*$ that has fatter tails than $M$. In that situation, the outlying observations would prevent $w$ from being close to one. Further examination of the outlying observations could suggest ways to improve model $M$ to bring it closer to the true data generating mechanism.
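The fat-tails scenario can be mimicked with a small numerical sketch. The density values below are invented solely to reproduce the 95%/5% pattern described in the text, and the function names are hypothetical. For two models the stacking objective is concave in the weight on $M$, so the sketch locates the maximizer by bisection on the derivative.

```python
import math

# Hypothetical one-step-ahead densities over 100 time points.
# M is sharper on 95 ordinary points; M* (fatter tails) copes better with 5 outliers.
dens_m = [0.50] * 95 + [0.01] * 5
dens_star = [0.30] * 100


def posterior_prob_m(p, q):
    """P(M | data) for two models with equal prior probabilities."""
    log_ratio = sum(math.log(b) - math.log(a) for a, b in zip(p, q))
    return 1.0 / (1.0 + math.exp(log_ratio))


def stacking_weight_m(p, q, n_steps=60):
    """Weight on M maximizing sum_t log(w * p_t + (1 - w) * q_t), by bisection."""
    def slope(w):
        # Derivative of the objective with respect to w; decreasing in w.
        return sum((a - b) / (w * a + (1.0 - w) * b) for a, b in zip(p, q))

    if slope(0.0) <= 0.0:
        return 0.0
    if slope(1.0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


post_m = posterior_prob_m(dens_m, dens_star)
weight_m = stacking_weight_m(dens_m, dens_star)
print(post_m, weight_m)
```

With these made-up numbers the posterior probability of $M$ is essentially one, while its stacking weight is about 0.91: the five outlying points alone keep the weight away from one.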

As another example, an examination of $p(y_t \mid y_{1:(t-1)}, M_k)$ may indicate that $M$ and another model $M^*$ take turns at providing better probabilistic predictions. For example, say that for a certain environmental process, $M$ provides better predictions during a certain period of time, then $M^*$ provides better predictions, then $M$ again, and so on. In that case, the environmental process probably has different regimes, and thus, for example, a Markov switching model (Frühwirth-Schnatter, 2006) may be adequate to model it.
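A simple way to look for this turn-taking pattern, offered only as a sketch, is to record which model has the higher one-step-ahead log predictive density at each time point and collapse the result into runs. The function name, the labels, and the input values are hypothetical; long alternating runs would be a hint of regimes, not a formal test.

```python
def winning_runs(log_dens_m, log_dens_star):
    """Collapse the per-time winner into runs: [(label, start_index, length), ...].

    start_index is the 0-based position in the input lists, not the time t itself.
    Ties are attributed to M.
    """
    runs = []
    for t, (a, b) in enumerate(zip(log_dens_m, log_dens_star)):
        label = "M" if a >= b else "M*"
        if runs and runs[-1][0] == label:
            prev_label, start, length = runs[-1]
            runs[-1] = (prev_label, start, length + 1)   # extend the current run
        else:
            runs.append((label, t, 1))                   # start a new run
    return runs


# Made-up log densities: M wins for 4 points, then M* for 3, then M for 5.
log_dens_m = [-1.0] * 4 + [-2.5] * 3 + [-1.0] * 5
log_dens_star = [-1.8] * 4 + [-1.1] * 3 + [-1.9] * 5
runs = winning_runs(log_dens_m, log_dens_star)
print(runs)
```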

I would imagine that a sensibly estimated leave-one-out predictive density $p(y_i \mid y_{-i}, M_k)$ could also be used for diagnostics. I would appreciate it if the authors could comment on the advantages and difficulties associated with such use.

Finally, in the M-closed case, will the stacking weight of the true model converge to one as the sample size increases?

References

Frühwirth-Schnatter, S. (2006). Finite mixture and Markov switching models. Springer. MR2265601.

Geweke, J. and Amisano, G. (2011). “Optimal prediction pools.” Journal of Econometrics, 164(1): 130–141. MR2821798. doi: https://doi.org/10.1016/j.jeconom.2011.02.017.

Geweke, J. and Amisano, G. (2012). “Prediction with misspecified models.” American Economic Review, 102(3): 482–86.

Vivar, J. C. and Ferreira, M. A. R. (2009). “Spatio-temporal models for Gaussian areal data.” Journal of Computational and Graphical Statistics, 18: 658–674. MR2751645.

Contributed Discussion
