Despite the wide range of contributions to the forecasting literature, it is widely accepted that there is no method that is applicable to all time series (Chatfield, 2000). This statement is corroborated by experiments performed by Aiolfi and Timmermann (2006) relative to the performance of forecasting models over a time series. They reported systematic evidence that some forecasting models have varying relative performance over time and that some forecasting models are persistently good (or bad) throughout the time series.

The idea that all predictive models have some limitations has been extensively explored in the literature of ensemble learning methods. Several empirical and theoretical studies have shown that combining several individual models leads to better predictive performance (Ueda and Nakano, 1996; Breiman, 1996; Dietterich et al., 2000). Particularly in forecasting, combining different models is a well-studied topic (Bates and Granger, 1969; Armstrong, 1989). For example, Clemen (1989) presented an annotated bibliography comprising over 200 approaches.

Still, it is not clear how we should combine the predictions of a set of models. There are two groups of approaches devised to accomplish this task: static methods and dynamic methods. Static methods assign a weight to each model in the ensemble, which is constant for all observations. The most common approach is to average the predictions of the available models. In this particular approach, all individual models have equal weights.

The main limitation of static methods in time-dependent domains is that they may fail to capture the evolving dynamics of a time series and to cope with concept drift. As we described before, there is compelling evidence in the literature that not all models will perform equally well at any given prediction point (Aiolfi and Timmermann, 2006). In these scenarios, it is more common to adopt dynamic methods, in which the weight assigned to each individual model varies over time. This type of approach falls within the scope of online learning. Online learning denotes a learning paradigm in which a predictive model is updated when a new observation, or set of observations, becomes available (Littlestone and Warmuth, 1994). We will overview some dynamic methods used to combine a set of forecasting models. We split these into three groups: windowing approaches, regret minimisation approaches, and meta-learning approaches.

Windowing Approaches

Determining the weights of different predictive models at each time step is a difficult task, and several methods have been proposed to accomplish this. Particularly in forecasting, the simple average of the available experts (equal weights) has been shown to be a robust combination method (Clemen and Winkler, 1986) (Simple). Its competitive performance relative to approaches using estimated weights is known in the forecasting literature as the “forecasting combination puzzle” (Genre et al., 2013). Using the median value of the available predictions has also been explored (Marcellino, 2004). Nonetheless, more sophisticated approaches have been proposed.

Simple averages are sometimes complemented with model selection before aggregation, also known as trimmed means (SimpleTrim). For example, Jose and Winkler (2008) propose trimming a percentage of the worst forecasters in past data and averaging the output of the remaining experts.
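The three aggregation rules mentioned so far (simple average, median, and trimmed mean) can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited papers: the function names, the `past_errors` summary (for example, a mean absolute error per expert) and the default `trim_fraction` are assumptions made for the example.

```python
from statistics import mean, median


def simple_combination(preds):
    """Equal-weight average of the experts' predictions for one time step."""
    return mean(preds)


def median_combination(preds):
    """Median of the experts' predictions for one time step."""
    return median(preds)


def trimmed_combination(preds, past_errors, trim_fraction=0.25):
    """Drop the worst-performing fraction of experts, then average the rest.

    past_errors[i] is a hypothetical summary of the error of expert i
    on past data (lower is better).
    """
    n_drop = int(len(preds) * trim_fraction)
    # Rank experts from best (lowest past error) to worst.
    ranked = sorted(range(len(preds)), key=lambda i: past_errors[i])
    kept = ranked[:len(preds) - n_drop]
    return mean([preds[i] for i in kept])
```

With four experts predicting 10, 12, 11 and 30, where the last one also has the largest past error, the simple average is pulled towards the outlying forecast, while the median and the trimmed mean are not.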

One of the most common and successful approaches to combine predictive models in time-dependent data is to weight them according to their performance. Typically, the performance is determined on a window of recent data, or by using some other forgetting mechanism that promotes the importance of recency. The idea is that recent observations are more similar to the one we intend to predict, and thus, they are considered more relevant. For example, Newbold and Granger (1974) use this approach to combine forecasting models (WindowLoss). More recently, van Rijn et al. (2015) proposed a method for data stream classification dubbed Blast. As opposed to fusing experts, they select the expert with the best recent performance to classify the next observation. Bunn (1975) proposes an approach based on out-performance, where the weights of experts are determined by the number of times they have been the best in the past.
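As a rough sketch of the windowing idea, the code below weights each expert by the inverse of its mean error over the most recent observations, and also shows the alternative of selecting the single best recent expert. The inverse-loss normalisation, the function names and the default window size are assumptions chosen for illustration; they are not the exact schemes of Newbold and Granger (1974) or van Rijn et al. (2015).

```python
def recent_loss(errors, window=3):
    """Mean error of each expert over its last `window` observations.

    errors[i] is the list of past absolute errors of expert i, oldest
    first; each list is assumed to contain at least one value.
    """
    return [sum(e[-window:]) / len(e[-window:]) for e in errors]


def window_loss_weights(errors, window=3):
    """Weights proportional to the inverse of the recent loss."""
    # The small constant avoids a division by zero for a perfect expert.
    inverse = [1.0 / (loss + 1e-12) for loss in recent_loss(errors, window)]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(preds, weights):
    """Weighted combination of the experts' predictions."""
    return sum(w * p for w, p in zip(weights, preds))


def select_best_recent(errors, window=3):
    """Index of the expert with the lowest recent loss (selection, not fusion)."""
    losses = recent_loss(errors, window)
    return min(range(len(losses)), key=lambda i: losses[i])
```

Note that the first expert in a history such as `[[5, 1, 1, 1], [1, 2, 2, 2]]` receives the larger weight with `window=3`, because its early large error falls outside the window.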

AEC (Adaptive Ensemble Combination) is a method for adaptively combining a set of forecasters (Sánchez, 2008). It uses an exponential re-weighting strategy to combine forecasters according to their past performance, including a forgetting factor to give more importance to recent values. Timmermann (2008) argues that models have only short-lived periods of predictability for the prediction of stock returns. He proposes an adaptive combination based on the recent $R^2$ of forecasters. If all models have a low explained variance (low $R^2$) in the recent observations, then the forecast is set to the mean value of those observations. Otherwise, the experts are combined by averaging their predictions with the arithmetic mean (ERP).
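The $R^2$-based rule described above can be sketched as a simple gate. The threshold value, the function names and the handling of a constant recent window are assumptions made for this example; the source does not state them.

```python
def r_squared(actual, predicted):
    """Coefficient of determination of one expert on recent observations."""
    mean_y = sum(actual) / len(actual)
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_tot == 0:
        # Assumption: treat a constant window as having no explained variance.
        return 0.0
    return 1.0 - ss_res / ss_tot


def r2_gated_forecast(recent_actual, recent_preds, next_preds, threshold=0.1):
    """Fall back to the recent mean when every expert has a low R^2.

    recent_preds[i] holds the predictions of expert i for recent_actual;
    next_preds[i] is its forecast for the next observation.
    `threshold` is a hypothetical cut-off for "low" explained variance.
    """
    scores = [r_squared(recent_actual, p) for p in recent_preds]
    if all(s < threshold for s in scores):
        return sum(recent_actual) / len(recent_actual)
    # Otherwise combine all experts with the arithmetic mean.
    return sum(next_preds) / len(next_preds)
```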

Regret Minimisation

Several strategies have been proposed for aggregating the output of forecasting models, which are based on the idea of regret minimisation. Regret is the average error suffered with respect to the best we could have obtained. Several approaches dynamically combine a set of predictive models by optimising this metric, namely the exponentially weighted average (EWA) (Vovk, 1990; Littlestone and Warmuth, 1994), the polynomially weighted average (Cesa-Bianchi and Lugosi, 2003) (MLpol), or the fixed share aggregation (Herbster and Warmuth, 1998) (FixedShare). For a thorough review of these methods and their theoretical properties, we refer to the seminal work by Cesa-Bianchi and Lugosi (2006). Zinkevich (2003) proposed an online convex programming approach based on gradient descent that also guarantees regret bounds (OGD).
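To make the exponentially weighted average concrete, the sketch below uses its textbook form: each expert receives a weight proportional to exp(-eta × cumulative loss), and the weights are updated after every observation. The squared-error loss, the fixed learning rate `eta` and the function names are assumptions for illustration; the cited works also cover other losses and tuned learning rates.

```python
import math


def ewa_weights(cumulative_losses, eta=0.5):
    """Weights proportional to exp(-eta * cumulative loss)."""
    # Subtracting the minimum loss leaves the normalised weights
    # unchanged and avoids underflow for large losses.
    best = min(cumulative_losses)
    raw = [math.exp(-eta * (loss - best)) for loss in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def run_ewa(expert_preds, targets, eta=0.5):
    """Combine experts online with the exponentially weighted average.

    expert_preds[t][i] is the prediction of expert i at time t.
    Returns the combined forecasts and the final weights.
    """
    n_experts = len(expert_preds[0])
    cumulative = [0.0] * n_experts
    forecasts = []
    for preds, y in zip(expert_preds, targets):
        # Forecast with the weights available before observing y.
        weights = ewa_weights(cumulative, eta)
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # Update each expert's cumulative squared error once y is observed.
        cumulative = [c + (p - y) ** 2 for c, p in zip(cumulative, preds)]
    return forecasts, ewa_weights(cumulative, eta)
```

The first forecast uses equal weights, since no loss has been observed yet; later forecasts move towards the experts that have accumulated the lowest loss.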

Combining by Learning

Meta-learning provides a way of modelling the learning process of a learning algorithm (Brazdil et al., 2008). Several methods use this approach to improve the combination or selection of models (Wolpert, 1992; Todorovski and Džeroski, 2003). We overview some meta-learning approaches that have been designed to combine or select a set of predictive models in time-dependent domains. Some of these approaches are static, for example, stacking, but in practice are often used in time-dependent domains.

A popular and successful approach for dynamically combining experts is to apply multiple regression on the output of the set of forecasting models. For example, Gaillard and Goude (2015) describe a setup in which Ridge regression is used to aggregate experts by minimising the L2-regularised least-squares criterion (Hoerl and Kennard, 1970; Marcellino, 2004). The idea behind these approaches is similar to stacking (Wolpert, 1992), a widely used approach to combine predictive models (Stacking).
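A minimal sketch of regression-based aggregation follows. It fits combination weights by solving the ridge normal equations $(X^\top X + \lambda I)\,w = X^\top y$, where each row of $X$ holds the experts' past predictions. It is a batch fit without an intercept, written in plain Python for self-containment; the function names and these simplifications are our assumptions, and an online setup such as the one described by Gaillard and Goude (2015) would refit or update the weights as observations arrive.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_weights(expert_preds, targets, lam=1.0):
    """Ridge combination weights for a set of experts.

    expert_preds[t][i] is the prediction of expert i at time t.
    Minimises sum_t (y_t - w . x_t)^2 + lam * ||w||^2 (no intercept).
    """
    n = len(expert_preds[0])
    gram = [[sum(x[i] * x[j] for x in expert_preds) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(x[i] * y for x, y in zip(expert_preds, targets))
           for i in range(n)]
    return solve_linear(gram, rhs)


def ridge_forecast(weights, next_preds):
    """Combine the experts' next predictions with the fitted weights."""
    return sum(w * p for w, p in zip(weights, next_preds))
```

Unlike the averaging schemes, the fitted weights are not constrained to be positive or to sum to one; larger values of `lam` shrink them towards zero.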

Rossi et al. (2014) present MetaStream for the dynamic selection of regression models in a data stream environment. MetaStream works by having a meta-learning model that periodically selects the most appropriate regression method to be used in the next few observations.

Gama and Kosina (2014) present a meta-learning approach designed to cope with concept drift in data stream classification problems. The system proposed by the authors is focused on recurring drift. It can be split into two layers: a base layer, where a predictive model is devised to solve the original problem; and a meta layer, which manages the learning process. When concept drift is detected, the meta layer decides whether to train a new model using recent observations or to re-activate a base model trained previously.
