In this chapter, we present a meta-learning strategy to combine the available forecasting models in a dynamic way. However, contrary to stacking, we separately model the individual expertise of each forecasting model, assuming these to be specialists in different parts of the time series. Consequently, the forecasting models are combined in such a way that they are only selected for predicting examples that they are expected to be good at. Moreover, as opposed to tracking the error on past instances, our combination approach is more proactive as it is based on predictions of future loss of models. This can result in a faster adaptation to changes in the environment.
[...] different areas of expertise across the input space. Moreover, it is common for the underlying process generating the time series to have recurrent structures due to factors such as seasonality (Gama and Kosina, 2009, 2014). In this context, we hypothesise that a meta-learning strategy enables the ensemble to better detect changes in the relative performance of models, or changes between different regimes governing a time series, and quickly adapt itself to the environment.
The proposed meta-learning strategy, hereby denoted as Arbitrated Dynamic Ensemble (ADE), is based on arbitrating (Koppel and Engelson, 1996; Ortega et al., 2001a), a method from the family of mixture of experts (Jacobs et al., 1991; Masoudnia and Ebrahimpour, 2014). A meta-learner is created for each forecasting model that is part of the ensemble. Each meta-learner is specifically designed to model how apt its base counterpart is to make an accurate prediction
for a given test example. This is accomplished by analysing how the error
incurred by a given learning model relates to the characteristics of the data. At test time, the base-learners are weighted according to their expected degree of competence in the input observation, estimated by the predictions of the meta-learners. This is illustrated in Figure 4.1.
Figure 4.1: Workflow of ADE for a new prediction. The base-learners $M$ produce the predictions $\hat{y}_i$, $i \in \{1, \dots, s\}$, for the next value of the time series. In parallel, the meta-learners $Z$ produce the weights $w_i$ of each base-learner according to the predictions of their error ($\hat{e}_i$), which are scaled before being turned into weights. The final prediction $\hat{y}$ is computed using a weighted combination of the base predictions, $\hat{y} = \sum_{i=1}^{s} \hat{y}_i \, w_i$.
Let $M$ and $Z$ denote a set of $s$ base models and a set of $s$ meta models, respectively. While a given base-learner $m_i$ is trained to model the future values of the time series, its meta-learning associate $z_i$ is trained to model the error of $m_i$. The model $z_i$ is an arbiter that can make predictions regarding the error that $m_i$ will incur when predicting the future values of the time series. The larger the estimates produced by $z_i$ (relative to the other models in the ensemble), the lower the weight of $m_i$ will be in the combination rule.
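The combination rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the tsensembler implementation: the text only states that larger predicted errors lead to lower weights, so the min-max scaling followed by a softmax over negated errors is an assumption made here, and the function names `arbiter_weights` and `combine` are hypothetical.

```python
import numpy as np

def arbiter_weights(predicted_errors):
    """Map the arbiters' predicted errors (one per base model) to weights.

    Assumption: errors are min-max scaled to [0, 1] and passed through a
    softmax of their negation, so a larger predicted error gives a lower weight.
    """
    e = np.abs(np.asarray(predicted_errors, dtype=float))
    spread = e.max() - e.min()
    if spread == 0.0:
        # All arbiters predict the same error: fall back to uniform weights.
        return np.full(e.shape, 1.0 / e.size)
    scaled = (e - e.min()) / spread
    scores = np.exp(-scaled)
    return scores / scores.sum()

def combine(base_predictions, predicted_errors):
    """Weighted combination of the base predictions (weights sum to 1)."""
    w = arbiter_weights(predicted_errors)
    return float(np.dot(w, np.asarray(base_predictions, dtype=float)))
```

For instance, `combine([10.0, 20.0, 30.0], [0.1, 0.5, 0.9])` leans towards the first model, whose arbiter predicts the smallest error, while still using the other two predictions.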
Diversity among the experts is a fundamental component in building ensemble methods (Brown et al., 2005b). We start by addressing this issue implicitly, by using experts with different learning strategies, i.e. heterogeneous ensembles. We assume that the ensemble heterogeneity is useful to cope with the different dynamic regimes of time series. Besides heterogeneity, we encourage diversity explicitly during the aggregation of the output of experts. This is achieved by taking into account not only predictions of performance produced by the arbiters but also the correlation among experts in a recent window of observations.
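One way to picture the correlation-aware aggregation is the sketch below. It is an illustrative scheme under stated assumptions rather than the exact procedure used by ADE: experts are visited in decreasing order of weight, and each expert takes weight away from every lower-ranked expert in proportion to their positive correlation over a recent window of predictions. The function name `sequential_reweight`, the penalty formula, and the clipping of negative correlations to zero are all assumptions introduced for illustration.

```python
import numpy as np

def sequential_reweight(weights, recent_predictions):
    """Shift weight away from experts that are redundant with stronger ones.

    weights: initial weights, non-negative and summing to 1 (one per expert).
    recent_predictions: array of shape (window, n_experts) holding the
        experts' outputs over a recent window of observations.
    """
    w = np.asarray(weights, dtype=float).copy()
    preds = np.asarray(recent_predictions, dtype=float)
    # Pairwise correlation between expert outputs; columns are experts.
    corr = np.nan_to_num(np.corrcoef(preds, rowvar=False))
    order = np.argsort(-w)  # strongest expert first
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            # Assumed penalty: only positive correlation counts as redundancy.
            penalty = max(corr[i, j], 0.0) * w[i] * w[j]
            w[j] -= penalty
            w[i] += penalty
    return w
```

Because every penalty is transferred from one expert to another, the weights still sum to one, and since each penalty is at most `w[j]`, no weight becomes negative.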
We validate the proposed method on 62 real-world time series. Empirical experiments suggest that our method is competitive with different adaptive methods for combining experts and other meta-learning approaches such as stacking (Wolpert, 1992). In the interest of reproducible research, ADE is publicly available as an R software package¹. Moreover, all experiments reported in the chapter are also reproducible².
In summary, the contributions presented in this chapter are the following:
• ADE, a novel method based on meta-learning for dynamically combining a portfolio of forecasting models;
• The introduction of a blocked prequential procedure in the arbitrage approach to obtain out-of-bag predictions in the training set in order to increase the data used to train the meta-learning models;
• A sequential re-weighting strategy for controlling the redundancy among the output of the experts using their correlation in a recent window of observations;
• An extensive empirical study encompassing: statistical comparisons with state of the art approaches; analysis on the different deployment strategies of the proposed method; sensitivity analysis on the main parameters of the proposed method; relative scalability analysis in terms of execution time; and a study on the value of increasing the number of experts in the ensemble.

¹ tsensembler: on CRAN or at https://github.com/vcerqueira/tsensembler
² Instructions at: https://github.com/vcerqueira/forecasting_experiments
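The blocked prequential idea mentioned in the contributions can be illustrated with a small Python sketch. It assumes a growing-window variant: the training set is split into contiguous blocks, and each block after the first is predicted by a model trained only on the blocks that precede it. The helper names (`blocked_prequential_errors`, `fit_linear`, `predict_linear`), the growing-window choice, and the least-squares base model are placeholders chosen for illustration and are not taken from the chapter.

```python
import numpy as np

def blocked_prequential_errors(X, y, fit, predict, n_blocks=5):
    """Collect out-of-bag absolute errors of a base model on the training set.

    The data are split into n_blocks contiguous blocks. For each block k >= 1,
    the model is fitted on blocks 0..k-1 and evaluated on block k, so every
    error comes from observations the model has not seen.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    indices, errors = [], []
    for k in range(1, n_blocks):
        train_idx = np.concatenate(blocks[:k])
        test_idx = blocks[k]
        model = fit(X[train_idx], y[train_idx])
        errors.append(np.abs(y[test_idx] - predict(model, X[test_idx])))
        indices.append(test_idx)
    return np.concatenate(indices), np.concatenate(errors)

def fit_linear(X, y):
    """Placeholder base learner: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef
```

The returned indices and errors could then serve as training targets for the corresponding arbiter, which would learn to map the features `X[indices]` to the base model's out-of-bag error.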