As explained earlier, NNs have multiple parameters that must be set, and these have a significant impact on the resulting model and its performance. The same is true of other algorithms. Moreover, it is also necessary to choose the algorithms best suited to the task at hand.
Metalearning (MtL) aims to assist in selecting a predictive algorithm for a given dataset (Brazdil et al., 2008). MtL is a sub-field of ML in which algorithms are applied to (meta)data about ML experiments. Its objective is to take advantage of information obtained from previous tasks to improve performance on new tasks. A recent survey on this subject can be found in Lemke et al. (2015), where the authors give an overview of MtL and its most common techniques.
This technique is mainly used for the algorithm selection problem (Brazdil and Giraud-Carrier, 2018) and has been applied to the most common tasks: classification (Brazdil et al., 2003; Ali et al., 2018), regression (Gama and Brazdil, 1995), time series (Prudêncio and Ludermir, 2004) and clustering (Pimentel and de Carvalho, 2018). These approaches were later extended, for instance, to selecting parameter settings for a single algorithm (Gomes et al., 2012); the whole data mining process (Serban et al., 2013); problems from domains other than machine learning, such as various optimisation problems (Abreu et al., 2009; Smith-Miles, 2009; Pavelski et al., 2018; Gutierrez-Rodríguez et al., 2019; Chu et al., 2019); and data streams (Gama and Kosina, 2011). MtL approaches have also been used for automatic parameter tuning (Molina et al., 2012). Finally, a preliminary study developed within this research, included in Chapter 4, applies MtL to parameter selection in neural networks (Félix et al., 2017).
2.3.1 Algorithm Recommendation
The Algorithm Selection Problem, originally formulated by Rice (1976), consists of determining the best algorithm to use for a given dataset. To address this problem, MtL can take advantage of information previously obtained by applying several algorithms to several datasets (Brazdil et al., 2008). This knowledge is used to build a metamodel that, given a new dataset, enables the system to recommend the best algorithm(s).
Figure 2.2 illustrates the MtL process for the algorithm recommendation problem.
[Figure 2.2: diagram with datasets d1, ..., dD, algorithms a1, ..., aA, Performance(a,d), Metafeatures, Metadata and Metamodel, and a new dataset dnew whose metafeatures are computed before the metamodel is applied.]
Figure 2.2: Metalearning process for the algorithm recommendation problem. The process starts (in the left part of the figure) with D datasets d1, . . . , dD and A algorithms a1, . . . , aA (possibly with associated parameter settings). In a preliminary phase, the algorithms are applied to the datasets, and the performance obtained by each algorithm on each dataset is saved.
Then the metalearning process is performed (shaded part of the figure). The datasets are characterised, and the resulting characteristics, the metafeatures (Subsection 2.3.2), are saved. The metadata consists of the performances obtained in the previous phase together with the metafeatures computed here.
The metadata is used to build a metadataset with the same structure as a general ML dataset (described at the beginning of this section): E instances of I independent variables x1, . . . , xI (the metafeatures) and one dependent variable y. The dependent variable may be, for example, the best algorithm, the performance of a given algorithm, or the performance rank of a given algorithm.
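As an illustration of how the metadataset might be assembled, the following minimal Python sketch turns a table of past performances and a table of metafeatures into instances (x1, ..., xI) paired with one of the three dependent variables mentioned above. The names `performance`, `metafeatures`, `rank_algorithms` and `build_metadataset`, the example values, and the convention that higher scores are better are hypothetical assumptions made for illustration only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical inputs (illustrative values only):
# performance[dataset][algorithm] -> score, assuming higher is better (e.g. accuracy)
# metafeatures[dataset] -> metafeature vector x_1, ..., x_I of that dataset
Performance = Dict[str, Dict[str, float]]
Metafeatures = Dict[str, List[float]]


def rank_algorithms(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank the algorithms on one dataset (1 = best); ties are broken by name."""
    ordered = sorted(scores, key=lambda alg: (-scores[alg], alg))
    return {alg: pos + 1 for pos, alg in enumerate(ordered)}


def build_metadataset(
    performance: Performance,
    metafeatures: Metafeatures,
    target: str = "best",
    algorithm: str = "",
) -> Tuple[List[List[float]], List[Union[str, float, int]]]:
    """Return (X, y) with one instance per past dataset.

    target="best": y is the name of the best algorithm
    target="perf": y is the performance of `algorithm`
    target="rank": y is the rank of `algorithm` on that dataset
    """
    X: List[List[float]] = []
    y: List[Union[str, float, int]] = []
    for dataset in sorted(performance):
        scores = performance[dataset]
        ranks = rank_algorithms(scores)
        X.append(list(metafeatures[dataset]))
        if target == "best":
            y.append(min(ranks, key=ranks.get))  # the algorithm ranked 1
        elif target == "perf":
            y.append(scores[algorithm])
        elif target == "rank":
            y.append(ranks[algorithm])
        else:
            raise ValueError(f"unknown target: {target!r}")
    return X, y


# Example with two past datasets, three algorithms and two metafeatures
# (number of examples, number of attributes); values are made up.
performance = {
    "d1": {"a1": 0.81, "a2": 0.86, "a3": 0.79},
    "d2": {"a1": 0.92, "a2": 0.88, "a3": 0.90},
}
metafeatures = {"d1": [150.0, 4.0], "d2": [1000.0, 20.0]}

X, y = build_metadataset(performance, metafeatures)
print(X, y)  # [[150.0, 4.0], [1000.0, 20.0]] ['a2', 'a1']
```

Ties are broken alphabetically here only to keep the sketch deterministic; averaging the ranks of tied algorithms would be a reasonable alternative.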
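Then, metalearning computes a (meta)model $m'$ that fits a function $f$ such that

$$\hat{y} = f(x_1, \ldots, x_I) \qquad (2.18)$$

best approximates the true value of $y$. The learning task applied to the metadataset depends on the nature of the dependent variable: for example, predicting the best algorithm is a classification task, whereas predicting the performance of a given algorithm is a regression task.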
When a new dataset (dnew) is studied (right part of the figure), the first step is to compute its metafeatures. The previously obtained metamodel is then applied to this new metadata in order to select the algorithm and/or set the parameter values that best suit the new dataset.
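The fit-and-apply cycle described above can be sketched with a very simple metamodel. The minimal Python example below assumes a k-nearest-neighbour metamodel, a common choice in MtL but only one of many: it finds the k past datasets whose min-max scaled metafeatures are closest to those of dnew and recommends algorithms ordered by their average rank on those neighbours. The class name `KNNMetamodel`, the Euclidean distance, the scaling and the example data are illustrative assumptions, not the method of any specific work cited here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class KNNMetamodel:
    """Hypothetical k-NN metamodel: recommends algorithms for a new dataset
    by averaging their ranks over the k most similar past datasets."""

    def __init__(self, k: int = 3) -> None:
        self.k = k

    def fit(self, X: List[List[float]], ranks: List[Dict[str, int]]) -> "KNNMetamodel":
        # X[i] holds the metafeatures of past dataset i; ranks[i] maps each
        # algorithm to its rank on that dataset (1 = best).
        self.X, self.ranks = X, ranks
        columns = list(zip(*X))
        self.lo = [min(col) for col in columns]
        self.hi = [max(col) for col in columns]
        return self

    def _scale(self, x: Sequence[float]) -> List[float]:
        # Min-max scaling with the training ranges, so that metafeatures with
        # large magnitudes (e.g. number of examples) do not dominate the distance.
        # Constant metafeatures carry no information and are mapped to 0.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(x, self.lo, self.hi)]

    def recommend(self, x_new: Sequence[float]) -> List[str]:
        """Return the algorithms ordered from most to least recommended."""
        query = self._scale(x_new)
        dists = [math.dist(query, self._scale(x)) for x in self.X]
        nearest = sorted(range(len(self.X)), key=lambda i: dists[i])[: self.k]
        mean_rank: Dict[str, float] = defaultdict(float)
        for i in nearest:
            for alg, rank in self.ranks[i].items():
                mean_rank[alg] += rank / len(nearest)
        return sorted(mean_rank, key=lambda alg: (mean_rank[alg], alg))


# Made-up metadata: metafeatures (number of examples, number of attributes)
# and the ranks of two algorithms on four past datasets.
X = [[100, 2], [120, 3], [5000, 40], [5200, 38]]
ranks = [{"a1": 1, "a2": 2}, {"a1": 1, "a2": 2},
         {"a1": 2, "a2": 1}, {"a1": 2, "a2": 1}]
model = KNNMetamodel(k=2).fit(X, ranks)

print(model.recommend([110, 2]))    # ['a1', 'a2']
print(model.recommend([5100, 39]))  # ['a2', 'a1']
```

Returning a ranking rather than a single algorithm lets the user try the top few recommendations; keeping only the first element corresponds to predicting the best algorithm.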
2.3.2 Metafeatures
Metafeatures are values that represent characteristics of an ML experiment, of its input or of its output, and aim to describe the knowledge obtained in the past. Designing metafeatures that contain useful information about the performance of algorithms is one of the main challenges in metalearning (Brazdil et al., 2008).
Metafeatures are typically categorised as: simple, statistical and information-theoretic; model-based; and landmarkers (Brazdil et al., 2008). Next, we describe these categories and give some examples. For a list of frequently used metafeatures, refer to Vanschoren (2018).
Simple, statistical and information-theoretic metafeatures: these represent characteristics of the dataset and are the most commonly used.
Simple metafeatures include the number of examples (Brazdil et al., 2003; Gama and Brazdil, 1995; Kalousis et al., 2004), number of attributes (Gama and Brazdil, 1995), number of nominal attributes (Ali et al., 2018), proportion of symbolic attributes (Brazdil et al., 2003; Kalousis et al., 2004) and proportion of missing values (Brazdil et al., 2003; Kalousis et al., 2004). Other examples are correlation and dissimilarity (Pimentel and de Carvalho, 2018) for selecting clustering algorithms, the number of jobs and machines (Pavelski et al., 2018) in an MtL approach to the flowshop problem, and capacity and demand (Gutierrez-Rodríguez et al., 2019) for selecting meta-heuristics for the vehicle routing problem.
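Several of the simple metafeatures listed above can be computed directly from the data table. The sketch below assumes a hypothetical representation in which the dataset (without the class column) is a list of rows, missing values are `None`, and an attribute counts as nominal (symbolic) if any of its values is a string; the function name `simple_metafeatures` is likewise illustrative.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def simple_metafeatures(rows: List[List[Value]]) -> Dict[str, float]:
    """Simple metafeatures of a dataset given as rows of attribute values."""
    n_examples = len(rows)
    n_attributes = len(rows[0]) if rows else 0
    # Assumption: an attribute is nominal if any of its values is a string.
    n_nominal = sum(
        any(isinstance(row[j], str) for row in rows) for j in range(n_attributes)
    )
    n_cells = n_examples * n_attributes
    n_missing = sum(value is None for row in rows for value in row)
    return {
        "n_examples": n_examples,
        "n_attributes": n_attributes,
        "n_nominal_attributes": n_nominal,
        "prop_symbolic_attributes": n_nominal / n_attributes if n_attributes else 0.0,
        "prop_missing_values": n_missing / n_cells if n_cells else 0.0,
    }


# Toy dataset: one numeric, one nominal and one numeric attribute.
rows = [
    [5.1, "red", None],
    [4.9, "blue", 1.0],
    [None, "red", 2.0],
    [6.0, None, 3.0],
]
mf = simple_metafeatures(rows)
print(mf)
```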
Statistical measures include skewness and kurtosis (Gama and Brazdil, 1995), as well as the mean, median and standard deviation of attributes (Chu et al., 2019) and default accuracy (Ali et al., 2018).
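Statistical measures like these are usually computed per attribute and then aggregated over all attributes. The sketch below assumes numeric attributes given as columns, uses the moment-based definitions of skewness and excess kurtosis, aggregates by the mean over attributes, and computes default accuracy as the relative frequency of the majority class. Other definitions and aggregations (e.g. maximum or standard deviation over attributes) are also found in the literature, and the function names are illustrative.

```python
import statistics
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence


def central_moment(x: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mu = statistics.fmean(x)
    return sum((v - mu) ** order for v in x) / len(x)


def skewness(x: Sequence[float]) -> float:
    m2 = central_moment(x, 2)
    return central_moment(x, 3) / m2 ** 1.5 if m2 > 0 else 0.0


def kurtosis(x: Sequence[float]) -> float:
    # Excess kurtosis: 0 for a normal distribution.
    m2 = central_moment(x, 2)
    return central_moment(x, 4) / m2 ** 2 - 3.0 if m2 > 0 else 0.0


def statistical_metafeatures(columns: List[Sequence[float]],
                             labels: Sequence[Hashable]) -> Dict[str, float]:
    def mean_over_attributes(stat: Callable[[Sequence[float]], float]) -> float:
        return statistics.fmean(stat(col) for col in columns)

    # Default accuracy: accuracy of always predicting the majority class.
    majority_count = Counter(labels).most_common(1)[0][1]
    return {
        "mean_skewness": mean_over_attributes(skewness),
        "mean_kurtosis": mean_over_attributes(kurtosis),
        "mean_of_means": mean_over_attributes(statistics.fmean),
        "mean_of_medians": mean_over_attributes(statistics.median),
        "mean_of_stdevs": mean_over_attributes(statistics.pstdev),
        "default_accuracy": majority_count / len(labels),
    }


columns = [[1, 2, 3, 4, 5], [2, 2, 2, 2, 10]]
labels = ["yes", "yes", "no", "yes", "no"]
mf = statistical_metafeatures(columns, labels)
print(mf)
```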
Information-theoretic measures include the entropy of the classes (Brazdil et al., 2003; Gama and Brazdil, 1995; Kalousis et al., 2004; Ali et al., 2018) or of the attributes (Gama and Brazdil, 1995; Ali et al., 2018) and the mean mutual information between class and attributes (Brazdil et al., 2003; Gama and Brazdil, 1995).
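The information-theoretic measures can be sketched for nominal attributes; numeric attributes would first have to be discretised, a step not shown here. The example below uses Shannon entropy in bits and computes the mutual information between an attribute A and the class C as I(A;C) = H(A) + H(C) - H(A,C). Averaging over attributes and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(attribute: Sequence[Hashable],
                       labels: Sequence[Hashable]) -> float:
    """I(A; C) = H(A) + H(C) - H(A, C), estimated from co-occurrence counts."""
    joint = list(zip(attribute, labels))
    return entropy(attribute) + entropy(labels) - entropy(joint)


def info_theoretic_metafeatures(columns: List[Sequence[Hashable]],
                                labels: Sequence[Hashable]) -> Dict[str, float]:
    n_attr = len(columns)
    return {
        "class_entropy": entropy(labels),
        "mean_attribute_entropy": sum(entropy(col) for col in columns) / n_attr,
        "mean_mutual_information":
            sum(mutual_information(col, labels) for col in columns) / n_attr,
    }


labels = ["yes", "yes", "no", "no"]
columns = [
    ["a", "a", "b", "b"],  # perfectly predicts the class
    ["p", "q", "p", "q"],  # independent of the class
]
print(info_theoretic_metafeatures(columns, labels))
# {'class_entropy': 1.0, 'mean_attribute_entropy': 1.0, 'mean_mutual_information': 0.5}
```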
Model-based metafeatures: these are based on the model applied to the data. Examples include the error correlation of pairs of algorithms among different datasets (Kalousis et al., 2004), the flowshop objective (Pavelski et al., 2018), fitness distance correlation (Chu et al., 2019) and the number of trees (Ali et al., 2018).
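To make the idea of a model-based metafeature concrete, the sketch below computes one possible form of error correlation between a pair of algorithms: the Pearson correlation between their 0/1 error indicators on the same test examples. This reading, the function name `error_correlation`, and the example predictions are assumptions; the exact definition used by Kalousis et al. (2004) may differ.

```python
import math
from typing import Hashable, Sequence


def error_correlation(y_true: Sequence[Hashable],
                      pred_a: Sequence[Hashable],
                      pred_b: Sequence[Hashable]) -> float:
    """Pearson correlation between the error indicators of two classifiers."""
    err_a = [float(p != t) for p, t in zip(pred_a, y_true)]
    err_b = [float(p != t) for p, t in zip(pred_b, y_true)]
    n = len(err_a)
    mean_a, mean_b = sum(err_a) / n, sum(err_b) / n
    cov = sum((ea - mean_a) * (eb - mean_b) for ea, eb in zip(err_a, err_b))
    var_a = sum((ea - mean_a) ** 2 for ea in err_a)
    var_b = sum((eb - mean_b) ** 2 for eb in err_b)
    if var_a == 0 or var_b == 0:
        # Undefined if an algorithm errs on all or on none of the examples;
        # returning 0.0 here is an arbitrary convention.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


y_true = [0, 1, 1, 0, 1, 0]
pred_a = [0, 1, 0, 0, 0, 0]  # wrong at indices 2 and 4
pred_b = [0, 1, 0, 1, 0, 0]  # wrong at indices 2, 3 and 4
print(round(error_correlation(y_true, pred_a, pred_b), 4))  # 0.7071
```

Values close to 1 indicate that the two algorithms tend to fail on the same examples.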