3.1 Reliance on the list of models and the restriction to linear combinations

We agree with Clyde and Zhou that the performance of stacking depends on the choice of model list, since stacking can do no better than the optimal linear combination of the models in the list. Stacking is not strongly sensitive to misspecified models (see Section 4.1 of our paper), but it is sensitive to how good an approximation is possible within the ensemble space.

We discuss the concern about the inflexibility of the linear-additive form of density combination in Section 5.2, where we construct the same orthogonal regression example as Clyde, in which stacking cannot approximate a true model that is a convolution of the individual densities. By optimizing the leave-one-out performance of the combined prediction, the stacking framework can be extended to more general combination forms, such as the posterior family used in the Bayesian predictive synthesis (BPS) literature. Furthermore, simplex constraints become unnecessary once the combination goes beyond a linear combination of densities. We are interested in testing such approaches. Yoo proposes another way to obtain convolutional combinations, by stacking in the Fourier domain.
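To make the linear-combination case concrete, the following is a minimal sketch of stacking under the log score. It assumes the leave-one-out predictive densities p(y_i | y_{-i}, M_k) have already been computed and stored in a hypothetical matrix loo_dens. The function names and the EM-style fixed-point update are illustrative choices and do not describe any particular package; any solver for a concave objective on the simplex would serve equally well.

```python
import math


def log_score(weights, loo_dens):
    """Sum over data points of the log of the weighted LOO predictive density."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row))) for row in loo_dens
    )


def stacking_weights(loo_dens, n_iter=5000, tol=1e-12):
    """Maximize the LOO log score of a linear density combination on the simplex.

    loo_dens[i][k] is assumed to hold p(y_i | y_{-i}, M_k), computed elsewhere.
    The update below is the EM iteration for mixture proportions with fixed
    components; it keeps the weights nonnegative and summing to one, and it
    never decreases the objective.
    """
    n = len(loo_dens)
    n_models = len(loo_dens[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(n_iter):
        new_weights = [0.0] * n_models
        for row in loo_dens:
            mix = sum(w * p for w, p in zip(weights, row))
            for k in range(n_models):
                new_weights[k] += weights[k] * row[k] / mix / n
        change = max(abs(a - b) for a, b in zip(weights, new_weights))
        weights = new_weights
        if change < tol:
            break
    return weights


# Toy input (made-up numbers): three observations, two candidate models.
toy_dens = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
toy_weights = stacking_weights(toy_dens)
```

Because the iteration starts at uniform weights and is monotone, the returned combination scores at least as well as the equal-weight average on the leave-one-out criterion. It cannot, however, leave the linear family, which is exactly the restriction discussed above.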

3.2 Model expansion as an alternative

One setting where stacking can be used, but full model expansion could be more difficult, is when a set of different sorts of models has been fit separately. The same idea is summarized by Pericchi as “careful consideration of all the entertained models and admissible estimators for parameters should be considered prior to the optimization procedures.” We are less concerned about the situation described by Belitser and Nurushev, Shin, and Zhou, in which the number of models is so large that stacking can be both computationally expensive and theoretically inconsistent, because in that setting we would recommend moving to a continuous model space that encompasses the separate models in the list.

Stacking is not designed for model selection, but for model averaging to get good predictions. We do not recommend using it for model selection, although models with zero weight could be discarded from the average. For large p and small n, instead of stacking or other model averaging methods, we recommend using an encompassing model with all variables and prior information about the desired level of sparsity (Piironen and Vehtari, 2017b,c). For example, the regularized horseshoe prior can be considered as a continuous extension of the spike-and-slab prior with discrete model averaging over models with different variable combinations (Piironen and Vehtari, 2017c). For high-dimensional variable selection we recommend a projection predictive approach (Piironen and Vehtari, 2016, 2017a), which has a smaller variance in the selection process because the encompassing model is used as a reference model, and better predictive performance because inference is made conditional on the selection process and the encompassing model.
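As an illustration of why the regularized horseshoe behaves like a continuous spike-and-slab, the sketch below draws coefficients from the prior in the parameterization we understand from Piironen and Vehtari (2017c): beta_j ~ N(0, tau^2 * lambda_tilde_j^2), with lambda_tilde_j^2 = c^2 * lambda_j^2 / (c^2 + tau^2 * lambda_j^2) and lambda_j ~ half-Cauchy(0, 1). Here tau and c are held fixed for simplicity (in practice they would typically receive their own priors), and the function names are hypothetical.

```python
import math
import random


def shrunken_scale(lam, tau, c):
    """Prior standard deviation of a coefficient given its local scale lam.

    For small tau * lam this is close to tau * lam (the "spike" regime);
    for large tau * lam it approaches the slab scale c from below.
    """
    lam_tilde_sq = (c * c * lam * lam) / (c * c + tau * tau * lam * lam)
    return tau * math.sqrt(lam_tilde_sq)


def regularized_horseshoe_draw(p, tau, c, rng):
    """Draw p coefficients from the prior with fixed global scale tau and slab scale c."""
    betas = []
    for _ in range(p):
        # Half-Cauchy(0, 1) draw via the inverse CDF of a standard Cauchy.
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        betas.append(rng.gauss(0.0, shrunken_scale(lam, tau, c)))
    return betas


# Example call with made-up settings: 100 coefficients, strong global shrinkage.
example_betas = regularized_horseshoe_draw(100, tau=0.05, c=2.0, rng=random.Random(1))
```

Most draws land near zero while a few escape to the slab scale, which is the continuous analogue of averaging over models that include or exclude each variable.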

3.3 Nonparametric approaches

Li, as well as Iacopini and Tonellato, suggest the use of nonparametric reference models to eliminate the need for cross-validation. If we are able to build a good nonparametric model, there is probably no need for model averaging. Although model averaging might be used as part of model reduction, instead of using component models p(·|y, M_k) we would prefer to form the component models using a projection predictive approach, which projects the information from the reference model to the restricted models (Piironen and Vehtari, 2016, 2017a).

Zhou suggests Bayesian nonparametric (BNP) models as an alternative to model averaging. Indeed, the spline models used in the experiments in Section 4.6 of our paper can be considered as BNP models. We can also compute fast LOO-CV for Gaussian processes and other Gaussian latent variable models (Vehtari et al., 2016).

3.4 Logarithmic scoring rules

Finally, we emphasize that the choice of scoring rule in stacking depends on the underlying application, and no single choice is likely to be optimal in every situation in advance. As Winkler, Jose, Lichtendahl, and Grushka-Cockayne, and Grünwald and Heide, point out, there is no need to use the log score if the focus is on some other utility. Our proposed stacking framework is applicable to any scoring rule. We are particularly interested in interval stacking that optimizes the interval score, which is likely to provide better interval estimation and posterior uncertainties.
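For reference, a minimal sketch of the interval score for a central (1 - alpha) prediction interval is given below, in its standard form: the interval width plus a penalty of 2/alpha times the distance by which the observation falls outside the interval. Lower values are better. The function name and argument names are hypothetical, and the sketch only evaluates the score; it does not perform the stacking optimization itself.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of the central (1 - alpha) interval [lower, upper] at observation y."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    score = upper - lower  # narrow intervals are rewarded
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # miss below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # miss above the interval
    return score
```

An observation inside the interval is charged only the width, so the score trades sharpness against coverage; replacing the log score with the average of this quantity over held-out points gives one possible objective for interval stacking.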

We thank Franck for numerically verifying that stacking outperforms intrinsic Bayesian model averaging (iBMA) in simulations. This result suggests that the stacking procedure’s prior invariance property is a convenient bonus but not the only reason for its impressive performance.

References

Bernardo, J. M. and Smith, A. F. (1994). Bayesian theory. John Wiley & Sons. MR1274699. doi: https://doi.org/10.1002/9780470316870.

Buerkner, P., Vehtari, A., and Gabry, J. (2018). “PSIS assisted m-step-ahead predictions for time-series models.” Technical report. URL http://mc-stan.org/loo/articles/m-step-ahead-predictions.html

Dawid, A. P. (1984). “Present position and potential developments: Some personal views: Statistical theory: The prequential approach.” Journal of the Royal Statistical Society. Series A, 278–292. MR0763811. doi: https://doi.org/10.2307/2981683.

Geweke, J. and Amisano, G. (2011). “Optimal prediction pools.” Journal of Econometrics, 164(1): 130–141. MR2821798. doi: https://doi.org/10.1016/j.jeconom.2011.02.017.

Geweke, J. and Amisano, G. (2012). “Prediction with misspecified models.” American Economic Review, 102(3): 482–486.

Kamary, K., Mengersen, K., Robert, C. P., and Rousseau, J. (2014). “Testing hypotheses via a mixture estimation model.” arXiv preprint arXiv:1412.2044.

McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. (2017). “Multivariate Bayesian Predictive Synthesis in Macroeconomic Forecasting.” arXiv preprint arXiv:1711.01667.

McAlinn, K. and West, M. (2017). “Dynamic Bayesian predictive synthesis in time series forecasting.” arXiv preprint arXiv:1601.07463. MR3664859.

Piironen, J. and Vehtari, A. (2016). “Projection predictive model selection for Gaussian processes.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6.

Piironen, J. and Vehtari, A. (2017a). “Comparison of Bayesian predictive methods for model selection.” Statistics and Computing, 27(3): 711–735.

Piironen, J. and Vehtari, A. (2017b). “On the hyperprior choice for the global shrinkage parameter in the horseshoe prior.” In Artificial Intelligence and Statistics, 905–913.

Piironen, J. and Vehtari, A. (2017c). “Sparsity information and regularization in the horseshoe and other shrinkage priors.” Electronic Journal of Statistics, 11(2): 5018–5051.

Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., et al. (2017). “Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure.” Ecography, 40(8): 913–929.

Shimodaira, H. (2000). “Improving predictive inference under covariate shift by weighting the log-likelihood function.” Journal of Statistical Planning and Inference, 90(2): 227–244. MR1795598. doi: https://doi.org/10.1016/S0378-3758(00)00115-4.

Sugiyama, M., Krauledat, M., and Müller, K.-R. (2007). “Covariate shift adaptation by importance weighted cross validation.” Journal of Machine Learning Research, 8(May): 985–1005.

Sugiyama, M. and Müller, K.-R. (2005). “Input-dependent estimation of generalization error under covariate shift.” Statistics & Decisions, 23(4/2005): 249–279. MR2255627. doi: https://doi.org/10.1524/stnd.2005.23.4.249.

Vehtari, A., Buerkner, P., and Gabry, J. (2018a). “Leave-one-out cross-validation for non-factorizable models.” Technical report. URL http://mc-stan.org/loo/articles/loo2-non-factorizable.html

Vehtari, A., Gabry, J., Yao, Y., and Gelman, A. (2018b). “loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models.” R package version 2.0.0.

Vehtari, A., Gelman, A., and Gabry, J. (2017). “Pareto smoothed importance sampling.” arXiv preprint arXiv:1507.02646.

Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., and Winther, O. (2016). “Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models.”
