selection methods are constantly appearing using different strategies: a) combining several feature selection methods, which could be done by using algorithms from the same approach, such as two filters (Y. Zhang, Ding, & Li, 2008), or coordinating algorithms from two different approaches, usually filters and wrappers (Y. Peng, Wu, & Jiang, 2010; El Akadi, Amine, El Ouardighi, & Aboutajdine, 2011); b) combining feature selection approaches with other techniques, such as feature extraction (Vainer, Kraus, Kaminka, & Slovin, 2011) or tree ensembles (Tuv, Borisov, Runger, & Torkkola, 2009); c) reinterpreting existing algorithms (Sun & Li, 2006), sometimes to adapt them to specific problems (Sun, Todorovic, & Goodison, 2008a); d) creating new methods to deal with still unresolved situations (Chidlovskii & Lecerf, 2008; Loscalzo, Yu, & Ding, 2009) and e) using an ensemble of feature selection techniques to ensure a better behavior (Saeys, Abeel, & Van de Peer, 2008; Bolón-Canedo, Sánchez-Maroño, & Alonso-Betanzos, 2012).
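As a rough illustration of strategy e), the sketch below combines several feature rankings into one by averaging each feature's position. This is only one simple way to build an ensemble and is not taken from the cited works; the function name `aggregate_rankings` and the toy feature names are assumptions made for the example.

```python
def aggregate_rankings(rankings):
    """Combine rankings (best feature first) by mean position.

    Hypothetical helper: ties in mean position are broken alphabetically.
    """
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("all rankings must cover the same features")
    # Mean position of each feature across the individual rankings.
    mean_pos = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: (mean_pos[f], f))


# Toy rankings, as if produced by three different feature selection methods.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
combined = aggregate_rankings(rankings)
```

Here feature "a" has the lowest mean position and comes first in `combined`. Averaging positions is a deliberately simple choice; other aggregation rules (median position, voting) would fit the same interface.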

2.3 Summary

To confront the problem of the high dimensionality of data, feature selection algorithms have become indispensable components of the learning process. A correct selection of the features can lead to an improvement of the inductive learner in terms of learning speed, generalization capacity or simplicity of the induced model.

Feature selection methods may roughly be divided into three types: filters, wrappers and embedded methods. This chapter has summarized the advantages and disadvantages of each approach, as well as described the algorithms that will be used throughout this thesis.

Chapter 3. A critical review of feature selection methods

In the past few years, feature selection algorithms have become indispensable components of the learning process for getting rid of irrelevant and redundant features. As mentioned in the previous chapter, there are two major approaches in feature selection: individual evaluation and subset evaluation. Individual evaluation, also known as feature ranking (Guyon & Elisseeff, 2003), assesses individual features by assigning them weights according to their degrees of relevance. Subset evaluation, on the other hand, produces candidate feature subsets based on a certain search strategy. Besides this classification, feature selection methods can also be divided into three models: filters, wrappers and embedded methods (Guyon, 2006). With such a vast body of feature selection methods, the need arises for criteria that enable users to decide adequately which algorithm to use (or not) in certain situations.
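The difference between individual evaluation and subset evaluation can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not any of the algorithms reviewed in this thesis: absolute Pearson correlation with the class is assumed as the relevance weight, `subset_merit` is a hypothetical score that rewards relevance and penalizes redundancy, and the search strategy is a plain greedy forward search. The toy data set (`f1` relevant, `f2` a rescaled copy of `f1`, `f3` irrelevant) is invented for the example.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_features(columns, target):
    """Individual evaluation: weight each feature on its own, then sort."""
    weights = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(weights, key=weights.get, reverse=True)


def subset_merit(names, columns, target):
    """Hypothetical subset score: mean relevance minus mean pairwise redundancy."""
    relevance = mean(abs(pearson(columns[n], target)) for n in names)
    pairs = list(combinations(names, 2))
    if not pairs:
        return relevance
    redundancy = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs)
    return relevance - redundancy


def forward_selection(columns, target):
    """Subset evaluation: greedy forward search guided by subset_merit."""
    selected, best = [], float("-inf")
    remaining = set(columns)
    while remaining:
        scored = [
            (subset_merit(selected + [n], columns, target), n)
            for n in sorted(remaining)
        ]
        score, name = max(scored)
        if score <= best:  # no candidate improves the current subset
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected


# Toy data: f1 is relevant, f2 duplicates f1 (redundant), f3 is irrelevant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f1": [1, 2, 1, 2, 5, 6, 5, 6],
    "f2": [2, 4, 2, 4, 10, 12, 10, 12],
    "f3": [1, 0, 1, 0, 1, 0, 1, 0],
}
ranking = rank_features(columns, target)
subset = forward_selection(columns, target)
```

On this toy data the ranking places both `f1` and `f2` ahead of `f3`, because each feature is scored in isolation and the redundancy between them goes unnoticed. The subset search keeps only one of the two, since adding the duplicate lowers the merit of the subset.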

There are several situations that can hinder the process of feature selection, such as the presence of irrelevant and redundant features, noise in the data or interaction between attributes. In the presence of hundreds or thousands of features, as in DNA microarray analysis, researchers have noted (Y. Yang & Pedersen, 1997; Yu & Liu, 2004a) that a large number of features is commonly not informative because they are either irrelevant or redundant with respect to the class concept. Moreover, when the number of features is high but the number of samples is small, machine learning becomes particularly difficult, since the search space will be sparsely populated and the model will not be able to distinguish correctly between relevant data and noise (Provost, 2000).

This chapter reviews the most popular feature selection methods in the literature and checks their performance in an artificial controlled experimental scenario. In this way, the ability of the algorithms to select the relevant features and to discard the irrelevant ones, without permitting noise or redundancy to obstruct this process, is assessed. A scoring measure will be introduced to compute the degree of matching between the output given by the algorithm and the known optimal solution, alongside the classification accuracy.
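To make the idea of a matching score concrete, the sketch below compares a selected feature set against a known optimal solution. It is an illustrative assumption only and is not the scoring measure of this thesis, which is not defined in this passage: the function name `matching_score`, the `penalty` parameter and its default value are all hypothetical.

```python
def matching_score(selected, relevant, all_features, penalty=0.5):
    """Hypothetical score: share of relevant features found, minus a
    penalty proportional to the share of irrelevant features selected."""
    selected, relevant = set(selected), set(relevant)
    irrelevant = set(all_features) - relevant
    if not relevant:
        raise ValueError("the known optimal solution must not be empty")
    hit_rate = len(selected & relevant) / len(relevant)
    false_rate = len(selected & irrelevant) / len(irrelevant) if irrelevant else 0.0
    return hit_rate - penalty * false_rate


# Artificial scenario: ten features, of which only f1 and f2 are relevant.
all_features = [f"f{i}" for i in range(1, 11)]
relevant = ["f1", "f2"]

perfect = matching_score(["f1", "f2"], relevant, all_features)
partial = matching_score(["f1", "f5"], relevant, all_features)
```

A score of this kind is only computable in a controlled scenario, because it requires the relevant features to be known in advance.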

3.1 Existing reviews of feature selection methods

As an important activity in data preprocessing, feature selection has been widely studied in recent years by machine learning researchers. This technique has found success in many different real world applications, such as DNA microarray analysis (Yu & Liu, 2004b), intrusion detection (Bolón-Canedo, Sánchez-Maroño, & Alonso-Betanzos, 2011; W. Lee, Stolfo, & Mok, 2000), text categorization (Forman, 2003; Gomez, Boiy, & Moens, 2012) and information retrieval (Egozi, Gabrilovich, & Markovitch, 2008), including image retrieval (Dy, Brodley, Kak, Broderick, & Aisen, 2003) and music information retrieval (Saari, Eerola, & Lartillot, 2011).

Given the large number of feature selection methods available, carrying out a comparative study is an arduous task. A further problem is testing the effectiveness of these methods on real data sets, where the relevant features are usually unknown. In these cases the performance of a feature selection method clearly relies on the performance of the learning method used afterwards, and it can vary notably from one method to another. Moreover, performance can be measured using many different metrics, such as computer resources (memory and time), accuracy, ratio of features selected, etc. Besides, datasets may pose a great number of challenges: multiple class output, noisy data, a huge number of irrelevant features, redundant or repeated features, a ratio of number of samples to number of features very close to zero, and so on. A comparative study tackling all these considerations could be unapproachable; therefore, most of the interesting comparative studies focus on the problem to be solved. For example, Forman (2003) presented an empirical comparison of twelve feature selection methods evaluated on a benchmark of 229 text classification problem instances; another comparative study (Sun, Babbs, & Delp, 2006) addresses the detection of breast cancers in mammograms. Other works are devoted to a specific approach, such as an experimental study of eight typical filter feature selection algorithms based on mutual information on thirty-three datasets (Liu, Liu, & Zhang, 2008), or an evaluation of the capability of the survival ReliefF algorithm (sReliefF) and of a tuned sReliefF approach to properly select the causative pair of attributes (Beretta & Santaniello, 2011). Similarly, there are works examining different feature selection methods to obtain good performance results using a specific classifier (naive Bayes in (M.-L. Zhang, Peña, & Robles, 2009), C4.5 in (Perner & Apte, 2000) or the theoretical review for support vector machines (SVMs) in (Victo Sudha & Raj, 2011)). Regarding dataset challenges, several works try to face the problem of high dimensionality, in both di-