
4.6 Comparison of EvoDAG and State-of-the-Art Classifiers

After analyzing the performance of the different selection schemes, we now compare EvoDAG against state-of-the-art classifiers. We included EvoDAG only with the following combinations of selection schemes: acc-rnd, rnd-rnd, fit-fit, ads-rnd, and nvs-rnd. We chose acc-rnd because it is the heuristic that gives the best results; rnd-rnd because it uses the simplest schemes and still yields good results; fit-fit because it uses the traditional fitness-based selection schemes; and, finally, ads-rnd and nvs-rnd because they are the state-of-the-art selection schemes. We compared EvoDAG against sixteen classifiers from the scikit-learn Python library [80], all of them using their default parameters. Specifically, these classifiers are Perceptron, MLPClassifier, BernoulliNB, GaussianNB, KNeighborsClassifier, NearestCentroid, LogisticRegression, LinearSVC, SVC, SGDClassifier, PassiveAggressiveClassifier, DecisionTreeClassifier, ExtraTreesClassifier, RandomForestClassifier, AdaBoostClassifier, and GradientBoostingClassifier. Two automated machine learning libraries are also included in the comparison: autosklearn [23] and TPOT [76]. It is worth mentioning that TPOT (see Section 2.4) is a Genetic Programming tool that automatically constructs and optimizes machine learning pipelines using 14 preprocessors, 5 feature selectors, and 11 classifiers, all of them implemented in scikit-learn.
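As an illustration only, the setup of the sixteen scikit-learn baselines can be sketched as follows. This is a minimal sketch, not the experimental code used in this thesis: it assumes a single train/test split, uses the Iris dataset purely as a placeholder (the benchmark datasets are not listed in this section), and relies on the defaults of whatever scikit-learn version is installed, which may differ from the version used in the experiments. The helper name `macro_f1_scores` and the list name `BASELINES` are hypothetical.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import (
    LogisticRegression,
    PassiveAggressiveClassifier,
    Perceptron,
    SGDClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# The sixteen scikit-learn classifiers listed in the text; each is instantiated with defaults.
BASELINES = [
    Perceptron, MLPClassifier, BernoulliNB, GaussianNB, KNeighborsClassifier,
    NearestCentroid, LogisticRegression, LinearSVC, SVC, SGDClassifier,
    PassiveAggressiveClassifier, DecisionTreeClassifier, ExtraTreesClassifier,
    RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier,
]


def macro_f1_scores(X_train, y_train, X_test, y_test):
    """Fit every baseline with default parameters; return {name: macro-F1 on the test set}."""
    scores = {}
    with warnings.catch_warnings():
        # Default settings can trigger convergence or deprecation warnings; silence them here.
        warnings.simplefilter("ignore")
        for cls in BASELINES:
            model = cls()  # default hyperparameters, as in the comparison described above
            model.fit(X_train, y_train)
            scores[cls.__name__] = f1_score(y_test, model.predict(X_test), average="macro")
    return scores


if __name__ == "__main__":
    # Placeholder data only; the thesis benchmark datasets are not reproduced here.
    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    results = macro_f1_scores(X_tr, y_tr, X_te, y_te)
    for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name:30s} {score:.3f}")
```

Passing the classes rather than instances keeps the "default parameters" condition explicit: every model is constructed with no arguments.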

Figure 4.11 compares the classifiers based on their macro-F1 ranks. According to these experiments, the best classifier is TPOT, followed by EvoDAG acc-rnd, autosklearn, and EvoDAG rnd-rnd. Thus, our proposed selection heuristic, based on accuracy and negative random selection, improves the performance of EvoDAG and places it in second place. EvoDAG performs better than the scikit-learn classifiers, and it is competitive with the automated machine learning libraries.
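A minimal sketch of the average-rank computation behind a figure like this, assuming that classifiers are ranked separately on each dataset by macro-F1 (rank 1 is the best, ties receive the average rank) and that the per-dataset ranks are then averaged. The function names `average_ranks` and `order_by_rank` and the input layout (datasets as rows, classifiers as columns) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import rankdata


def average_ranks(scores):
    """Average rank of each classifier across datasets.

    scores: array-like of shape (n_datasets, n_classifiers) holding macro-F1 values,
    where higher is better. Within each dataset (row) the best classifier gets rank 1
    and tied classifiers share the average of the ranks they span.
    """
    scores = np.asarray(scores, dtype=float)
    # Negate so that the largest macro-F1 receives the smallest rank.
    per_dataset_ranks = rankdata(-scores, axis=1)
    return per_dataset_ranks.mean(axis=0)


def order_by_rank(scores, names):
    """Classifier names sorted from best (lowest) to worst average rank, with their ranks."""
    ranks = average_ranks(scores)
    order = np.argsort(ranks, kind="stable")
    return [(names[i], float(ranks[i])) for i in order]
```

For example, with the hypothetical scores `[[0.9, 0.8, 0.7], [0.6, 0.9, 0.6]]`, `average_ranks` returns `[1.75, 1.5, 2.75]`, so the second classifier would be listed first even though it wins on only one of the two datasets.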

Figure 4.11: Comparison of EvoDAG against state-of-the-art classifiers based on macro-F1 rank. Classifiers are sorted by average rank, and the average rank values are shown on the left. The blue boxplots represent EvoDAG systems. Source: Own elaboration.

To validate the results, we used the Wilcoxon signed-rank test [102], and the p-values were adjusted with the Holm-Bonferroni method [44] to account for multiple comparisons. The test was applied to the macro-F1 values. TPOT was found to be statistically better than LogisticRegression, KNeighborsClassifier, NearestCentroid, AdaBoostClassifier, SVC, LinearSVC, GaussianNB, BernoulliNB, PassiveAggressiveClassifier, Perceptron, and SGDClassifier. Nevertheless, some EvoDAG variants were not found to be statistically different from TPOT. Figure 4.12 shows the pairwise results of the Wilcoxon signed-rank test [102]. The results of TPOT are statistically different from those obtained with EvoDAG fit-fit, which uses the classical selection schemes, and with EvoDAG ads-rnd and EvoDAG nvs-rnd, which use the state-of-the-art schemes. However, no statistical differences were found between TPOT and EvoDAG acc-rnd or EvoDAG rnd-rnd. This supports the conclusion that our proposed selection schemes improve the performance of EvoDAG, to the point where it is no longer statistically distinguishable from TPOT.
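The testing procedure described above can be sketched with SciPy's `wilcoxon` function plus a hand-written Holm step-down adjustment. This is a sketch under stated assumptions, not the thesis's code: it treats all pairs of classifiers as a single family of hypotheses, uses SciPy's default test options, and the names `holm_adjust` and `pairwise_wilcoxon` are hypothetical. The same adjustment is also available as `multipletests(..., method="holm")` in `statsmodels`.

```python
from itertools import combinations

import numpy as np
from scipy.stats import wilcoxon


def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjustment of a family of p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    # Visit p-values from smallest to largest; the i-th smallest (0-based) is multiplied by m - i.
    for i, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, min(1.0, (m - i) * p[idx]))
        adjusted[idx] = running_max  # keeps the adjusted values monotone in the original order
    return adjusted


def pairwise_wilcoxon(scores, names, alpha=0.05):
    """Wilcoxon signed-rank test for every pair of classifiers, Holm-adjusted.

    scores: array of shape (n_datasets, n_classifiers) with macro-F1 values; two
    columns are paired dataset by dataset.
    Returns {(name_a, name_b): (adjusted_p, significant_at_alpha)}.
    """
    scores = np.asarray(scores, dtype=float)
    pairs = list(combinations(range(scores.shape[1]), 2))
    raw = [wilcoxon(scores[:, a], scores[:, b]).pvalue for a, b in pairs]
    adjusted = holm_adjust(raw)
    return {
        (names[a], names[b]): (float(p), bool(p < alpha))
        for (a, b), p in zip(pairs, adjusted)
    }
```

With `alpha=0.05`, a `True` flag corresponds to a black cell in a matrix like the one in Figure 4.12, that is, a pair found statistically different at the 95% confidence level.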

Figure 4.12: Statistical comparison (Wilcoxon test) of the different classifiers based on macro-F1. Black cells indicate that the pair of classifiers is statistically different at the 95% confidence level. Source: Own elaboration.

Figure 4.13 compares the classifiers based on the time they spend learning the model. The scikit-learn classifiers are the fastest; most of them spend from 0.007 to 0.009 seconds per sample. EvoDAG, with any of the selection schemes, spends more time than the scikit-learn classifiers in the learning phase: on average, from 0.5 to 5 seconds per sample. However, EvoDAG is considerably faster than the automated machine learning libraries, autosklearn and TPOT, which consume on average 11.5 and 57.68 seconds per sample, respectively, in the learning phase.
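The per-sample training time can be measured along these lines. This sketch assumes that the reported figure is the wall-clock time of the training (`fit`) call divided by the number of training samples, averaged over datasets; the exact timing procedure is not stated in this section, and the function names `training_time_per_sample` and `average_time_per_sample` are hypothetical.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def training_time_per_sample(model, X, y):
    """Wall-clock seconds spent in model.fit(X, y), divided by the number of samples."""
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    return elapsed / len(X)


def average_time_per_sample(make_model, datasets):
    """Mean per-sample training time over several (X, y) datasets.

    make_model is called once per dataset, so every fit starts from a fresh model.
    """
    times = [training_time_per_sample(make_model(), X, y) for X, y in datasets]
    return float(np.mean(times))


if __name__ == "__main__":
    # Synthetic placeholder datasets; they are not the benchmark used in the thesis.
    datasets = [make_classification(n_samples=n, random_state=0) for n in (100, 200)]
    print(f"{average_time_per_sample(DecisionTreeClassifier, datasets):.6f} s per sample")
```

Normalizing by the number of samples makes datasets of different sizes comparable, which is why the values above can be contrasted across classifiers despite the heterogeneous benchmark.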

Figure 4.13: Comparison of EvoDAG against state-of-the-art classifiers based on the time required by the classifiers' training phase. The time is presented in seconds, and it is the average time per sample. Classifiers are sorted by average time, and those values are shown on the left. The blue boxplots represent EvoDAG systems. Source: Own elaboration.

Once again, we performed a two-dimensional analysis comparing the classifiers based on performance (macro-F1 average rank) and time (seconds per sample) (see Figure 4.14). Recall that the closer a classifier is to the origin, the better it is in terms of both performance and time. The classifiers on the Pareto frontier are TPOT, EvoDAG acc-rnd, EvoDAG rnd-rnd, Gradient Boosting (GB), Extra Trees (ET), and Decision Tree (DT). This can be interpreted as follows. If you want a good performance and you do

have results very fast with good, but not the best, performance, use the Gradient Boosting Classifier. On the other hand, if you want considerable performance in a reasonable time, use EvoDAG acc-rnd.
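The Pareto frontier mentioned above can be computed with a simple dominance check. This sketch assumes both coordinates (average macro-F1 rank and seconds per sample) are to be minimized; the function name `pareto_front` is hypothetical, and the coordinates in the usage example are made up for illustration and are not the values behind Figure 4.14.

```python
def pareto_front(points):
    """Names of the non-dominated classifiers.

    points: {name: (average_rank, seconds_per_sample)}; lower is better on both axes.
    A classifier is dominated if another one is at least as good on both axes and
    strictly better on at least one of them.
    """
    front = []
    for name, (rank, secs) in points.items():
        dominated = any(
            r <= rank and s <= secs and (r < rank or s < secs)
            for other, (r, s) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical coordinates; they are not the values of Figure 4.14.
    example = {"A": (1.0, 50.0), "B": (2.0, 1.0), "C": (3.0, 0.01), "D": (4.0, 2.0)}
    print(pareto_front(example))  # prints ['A', 'B', 'C']; D is dominated by B
```

The quadratic scan is enough for a few dozen classifiers. Note that being on the frontier only means no other classifier is better on both axes at once; choosing a point on it, as in the recommendations above, still depends on how much time one is willing to trade for performance.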

Figure 4.14: Comparison of EvoDAG with state-of-the-art classifiers based on macro-F1 and the time required by the classifiers' training phase. The time is presented in seconds, and it is the average time per sample. The classifiers are: EvoDAG acc-rnd, EvoDAG rnd-rnd, EvoDAG fit-fit, TPOT, autosklearn, Perceptron (PER), MLPClassifier (MLP), BernoulliNB (NBB), GaussianNB (NB), KNeighborsClassifier (KN), NearestCentroid (NC), LogisticRegression (LR), LinearSVC (LSVC), SVC, SGDClassifier (SDG), PassiveAggressiveClassifier (PA), DecisionTreeClassifier (DT), ExtraTreesClassifier (ET), RandomForestClassifier (RF), AdaBoostClassifier (AB), and GradientBoostingClassifier (GB).
