Vol. 37, No. 13, 20 October 2006, 939–947
FE-CIDIM: fast ensemble of CIDIM classifiers
GONZALO RAMOS-JIMÉNEZ*, JOSÉ DEL CAMPO-ÁVILA and
RAFAEL MORALES-BUENO
Departamento de Lenguajes y Ciencias de la Computación, E.T.S. Ingeniería Informática, Universidad de Málaga. Málaga, 29071, Spain
(Received 30 December 2005; in final form 2 April 2006)
An active research area in machine learning is the construction of multiple classifier systems to increase the accuracy of simple classifiers. In this article, we present E-CIDIM, a multiple classifier system designed to improve the performance of CIDIM, and FE-CIDIM, an algorithm developed to speed up E-CIDIM. CIDIM is an algorithm that induces small and accurate decision trees. E-CIDIM keeps a maximum number of trees and induces new trees that may substitute the old trees in the ensemble. The substitution process finishes when a preconfigured number of consecutive new trees fail to improve on the accuracy of the worst tree in the ensemble. FE-CIDIM speeds up the convergence of E-CIDIM by using a more restrictive substitution method. We show that the accuracy obtained by a single instance of CIDIM can be improved by using these new multiple classifier systems.
Keywords: Machine Learning; Multiple classifier systems; CIDIM
1. Introduction
Classification and prediction are two of the most popular tasks in machine learning. Many approaches try to extract knowledge from data. These approaches are very diverse, but one of the most active research areas is multiple classifier systems, which build on the idea of using a committee, or ensemble, of models to perform these tasks.
In the literature we can find many approaches to defining a multiple classifier system. Thus, there are methods that mainly reduce variance, such as bagging (Breiman 1996) or boosting (Freund and Schapire 1996), and methods that reduce bias, such as stacked generalization (Wolpert 1992). Other multiple classifier methods, such as cascading (Gama and Brazdil 2000), generate new attributes from class probability estimates. Delegating (Ferri et al. 2004) is another method: it uses part of the examples in the data set in each classifier and delegates the rest of the examples to the next classifier. In short, there are many methods for generating multiple models. Voting is the most common way to combine classifiers, so that the errors introduced by one classifier can be corrected by the good decisions made by other classifiers. Several variants of voting have been proposed. The simplest is uniform voting, in which every classifier has the same importance. In weighted voting, every classifier has an associated weight that can increase or decrease its importance.
Many kinds of models can take part in a multiple classifier system. Decision trees, such as those induced by CART (Breiman et al. 1984), ID3 (Quinlan 1986), C4.5 (Quinlan 1993) and ITI (Utgoff et al. 1997), are widely used in machine learning and have some positive characteristics. They can split the hyperspace into subspaces and fit each subspace with a different model. Another good feature is their understandability.
Taking this into account, we propose two multiple classifier systems (called E-CIDIM and FE-CIDIM) whose basic classifiers are decision trees. These decision trees are induced by CIDIM [control of induction by sample division method (Ramos-Jiménez et al. 2005b)], an algorithm that will be presented briefly.
*Corresponding author.
International Journal of Systems Science
ISSN 0020–7721 print/ISSN 1464–5319 online © 2006 Taylor & Francis http://www.tandf.co.uk/journals
This article is organized as follows. In Section 2, we briefly describe CIDIM and its use in a multiple classifier system. Section 3 introduces E-CIDIM and how this method takes advantage of the design of CIDIM. FE-CIDIM, the algorithm designed to speed up E-CIDIM, is presented in Section 4. Some experimental results are shown in Section 5. Finally, in Section 6, we summarise our conclusions and suggest future lines of research.
2. CIDIM
CIDIM (Ramos-Jiménez et al. 2005b) was developed to induce small and accurate decision trees. It uses three ideas to reach this goal: it divides the training set into two subsets, it groups values, and it defines an internal bound condition for expansion. These characteristics are described in more detail below:
. The top-down induction of decision trees (TDIDT) algorithms (Quinlan 1986, 1993) generally divide the set of examples into two subsets: the training subset (used to induce the tree) and the test subset (used to test the results). CIDIM makes an additional division: it divides the training subset into two new subsets with the same class distribution and similar size, the construction subset (called CNS) and the control subset (called CLS). Every node has its corresponding CNS and CLS subsets. When an expansion is made, the CNS and CLS subsets of the parent node are divided into multiple CNS and CLS subsets, one for each child node. Thus, CNS and CLS become smaller the deeper the node lies in the tree.
. Let us consider an attribute with values $\{O_1, O_2, \ldots, O_n\}$. If this attribute is selected to expand a node, one branch is added to the tree for each possible value. Thus, all the possible values of a nominal attribute must be known, and a continuous attribute must first be divided into intervals. CIDIM uses a greedy algorithm, based on recursively splitting the values into groups, to find groups of consecutive values. Initially there is a single group containing all the values of the attribute under consideration. At each step, CIDIM evaluates whether the splits produce an improvement. The process continues until each group has only one value or until no further split produces an improvement. The pair (attribute, division) that produces the best improvement is selected to expand the node.
. Usually, the expansion of a tree finishes when all the examples associated with a node belong to the same class, which yields overly large trees. To avoid this overfitting, different algorithms impose external conditions (C5, an updated version of C4.5, demands that at least two branches contain at least a preconfigurable number of examples). CIDIM instead uses an internal condition: a node is expanded only if its expansion improves the accuracy calculated on CLS. Tree expansion supervision is local to every node and is driven by two indexes: the absolute index $I_A$ and the relative index $I_R$ (Equations (1) and (2)). At every step, a node is expanded only if at least one of the indexes increases; if either index decreases, the expansion is not made. The absolute and relative indexes are defined as
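$$I_A = \frac{\sum_{i=1}^{N} \mathrm{CORRECT}(e_i)}{N} \tag{1}$$
$$I_R = \frac{\sum_{i=1}^{N} P_{C(e_i)}(e_i)}{N} \tag{2}$$
where $N$ is the number of examples in CLS, $e$ a single example, $C(e)$ the class of example $e$, $P_m(e)$ the probability of class $m$ for example $e$, $k$ the number of classes, and
$$\mathrm{CORRECT}(e) = \begin{cases} 1 & \text{if } P_{C(e)}(e) = \max\{P_1(e), P_2(e), \ldots, P_k(e)\} \\ 0 & \text{otherwise.} \end{cases}$$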
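The two indexes and the expansion rule can be sketched in Python as follows. This is a minimal illustration, not CIDIM's implementation: it assumes the class probabilities $P_m(e)$ of every CLS example are already available as lists (how the tree estimates them is not detailed here), that classes are encoded as integer indexes, and it reads the rule as "expand only if at least one index increases and neither decreases". The function names `cidim_indexes` and `should_expand` are hypothetical.

```python
from typing import Sequence, Tuple


def cidim_indexes(probs: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Tuple[float, float]:
    """Compute (I_A, I_R) over the examples of a control subset (CLS).

    probs[i][m] plays the role of P_m(e_i); labels[i] is C(e_i) as a class index.
    """
    n = len(labels)
    if n == 0 or n != len(probs):
        raise ValueError("need one probability vector per CLS example")
    correct = 0
    relative = 0.0
    for p, c in zip(probs, labels):
        if p[c] == max(p):   # CORRECT(e) = 1 when the true class attains the maximum
            correct += 1
        relative += p[c]     # accumulate P_{C(e)}(e)
    return correct / n, relative / n


def should_expand(before: Tuple[float, float], after: Tuple[float, float]) -> bool:
    """Assumed reading of the rule: some index increases and neither decreases."""
    (ia0, ir0), (ia1, ir1) = before, after
    neither_decreases = ia1 >= ia0 and ir1 >= ir0
    one_increases = ia1 > ia0 or ir1 > ir0
    return neither_decreases and one_increases
```

For instance, `cidim_indexes([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]], [0, 0, 1])` gives $I_A = 2/3$ (a tie with the maximum counts as correct) and $I_R \approx 0.533$.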
A description of CIDIM is given in Algorithm 1.
Algorithm 1
1: CNS (ConstructioN Subset) and CLS (ControL Subset) are obtained by a random dichotomic division of the set of examples used to induce the tree
2: for each non-leaf node do:
   2.1: Select the best splitting (considering a given disorder measure)
   2.2: if splitting does not improve prediction then label node as a leaf-node
   2.3: if splitting improves prediction
Decision trees generated by CIDIM are usually smaller than those obtained with other TDIDT algorithms. This allows the induction of more general trees that are also more understandable for human experts. At the same time, their accuracy remains similar to that of decision trees induced by other TDIDT algorithms.
CIDIM can be applied to any problem with a finite number of attributes. These attributes must be nominal, either ordered or unordered. If the problem has continuous attributes, they can be discretized, resulting in ordered nominal attributes. The class attribute must have a finite number of unordered classes.
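Step 1 of Algorithm 1 divides the examples into CNS and CLS with the same class distribution and similar size. A minimal Python sketch of such a stratified random halving follows; the function name `split_cns_cls` and the policy of alternating which subset receives the odd example of a class are our own assumptions, not details given for CIDIM.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple


def split_cns_cls(examples: Sequence, labels: Sequence[Hashable],
                  seed: Optional[int] = None) -> Tuple[List[tuple], List[tuple]]:
    """Randomly divide (example, label) pairs into CNS and CLS, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append((x, y))
    cns_subset, cls_subset = [], []
    extra_to_cns = True  # alternate odd leftovers so the two sizes stay similar
    for y in sorted(by_class, key=repr):  # fixed class order for reproducibility
        group = by_class[y]
        rng.shuffle(group)  # random dichotomic division within each class
        half = len(group) // 2
        if len(group) % 2 == 1:
            if extra_to_cns:
                half += 1
            extra_to_cns = not extra_to_cns
        cns_subset.extend(group[:half])
        cls_subset.extend(group[half:])
    return cns_subset, cls_subset
```

Splitting per class is what keeps the class distribution of both halves (almost) identical; a plain random halving of the whole set would only match it on average.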
These advantages have been used to solve real problems, such as system modeling (Ruiz-Gómez et al. 2005) or the modeling of prognosis of breast cancer relapse (Jerez-Aragonés et al. 2003).
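The greedy grouping of consecutive values described at the beginning of this section can be sketched as a top-down search over cut points. This is an illustration under stated assumptions: the caller supplies a hypothetical `score` function that rates a partition of the ordered values (the text only says that CIDIM checks whether a split produces an improvement), and at each step the sketch applies the single best improving cut.

```python
from typing import Callable, List, Sequence


def group_consecutive_values(values: Sequence,
                             score: Callable[[List[list]], float]) -> List[list]:
    """Greedy top-down grouping of ordered values into runs of consecutive values."""
    groups: List[list] = [list(values)]  # initially one group with every value
    current = score(groups)
    while True:
        best = None
        for g, group in enumerate(groups):
            for cut in range(1, len(group)):  # cuts only between neighbouring values
                candidate = groups[:g] + [group[:cut], group[cut:]] + groups[g + 1:]
                s = score(candidate)
                if s > current and (best is None or s > best[0]):
                    best = (s, candidate)
        if best is None:  # no cut improves the score, or every group is a singleton
            return groups
        current, groups = best
```

With a score such as the majority-class accuracy of the groups, splitting stops as soon as no cut strictly increases the score, which mirrors the stopping condition described for CIDIM.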
3. E-CIDIM
Improving the generalization of classifiers is an aim of machine learning. Voting methods try to achieve this improvement. Many algorithms have been developed (Schapire 1990, Freund 1995, Freund and Schapire 1997, Breiman 1996) and numerous studies have been made about them (Aslam and Decatur 1998, Bauer and Kohavi 1999, Kearns and Vazirani 1999).
We can divide these algorithms into two categories: those that change the data set distribution depending on the previous steps of the algorithm (usually called boosting algorithms; Schapire 1990, Freund 1995, Freund and Schapire 1997) and those that do not change this distribution (usually called bagging algorithms; Breiman 1996).
E-CIDIM (Ramos-Jiménez et al. 2005a) is based on the bagging scheme. It uses CIDIM as the basic classifier, and it induces decision trees to build the ensemble. CIDIM is a ‘‘randomized’’ algorithm that makes a random division of the training set into two subsets (CNS and CLS), and this suits the bagging scheme very well.
E-CIDIM needs two parameters to induce the multiple classifier system: the maximum number of trees in the ensemble (max_number_of_trees) and the number of failed attempts before stopping (number_of_failed_attempts). For prediction, a voting method must also be selected.
First, E-CIDIM initializes the ensemble (Ensemble) with decision trees induced by CIDIM from the training set (E). After initialization, the ensemble contains half as many decision trees as the maximum number of trees (max_number_of_trees), rounded up. New decision trees are then induced by CIDIM from the same training set (E). These decision trees usually differ because CIDIM makes a random dichotomic division, so each execution induces its tree from different subsets (CNS and CLS). If the newly induced tree (New_tree) has a lower error rate than the worst tree in the ensemble (Worst_tree), the new tree is added to the ensemble. When the size of the ensemble exceeds the preconfigured maximum (max_number_of_trees), the decision tree with the worst success rate is removed from the ensemble. E-CIDIM iteratively tries to incorporate new decision trees into the ensemble, and it stops when it fails a preconfigured number of consecutive times (number_of_failed_attempts). Algorithm 2 describes E-CIDIM.
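The substitution loop just described can be sketched in Python as follows. CIDIM itself is not implemented here: `induce` (expected to return a new tree, each call using a fresh random CNS/CLS division) and `error_rate` are hypothetical callables supplied by the caller, and the data on which the error rate is measured is left to them.

```python
import math
from typing import Callable, List, TypeVar

Tree = TypeVar("Tree")


def e_cidim(induce: Callable[[], Tree],
            error_rate: Callable[[Tree], float],
            max_number_of_trees: int = 20,
            number_of_failed_attempts: int = 5) -> List[Tree]:
    """Substitution loop of E-CIDIM, following the steps of Algorithm 2."""
    if max_number_of_trees < 1:
        raise ValueError("max_number_of_trees must be positive")
    # Start with ceil(max_number_of_trees / 2) trees.
    ensemble = [induce() for _ in range(math.ceil(max_number_of_trees / 2))]
    failed_attempts = 0
    while failed_attempts < number_of_failed_attempts:
        worst_tree = max(ensemble, key=error_rate)  # highest error = lowest success rate
        new_tree = induce()
        if error_rate(new_tree) < error_rate(worst_tree):
            failed_attempts = 0                     # a success resets the counter
            ensemble.append(new_tree)
            if len(ensemble) > max_number_of_trees:
                ensemble.remove(worst_tree)         # keep the ensemble size bounded
        else:
            failed_attempts += 1
    return ensemble
```

As a toy check, if trees are represented directly by their error rates and `induce` yields 0.3, 0.2, 0.25, 0.1, 0.4, 0.4 in turn, then with `max_number_of_trees=2` and `number_of_failed_attempts=2` the loop ends with the trees of error 0.2 and 0.1.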
Algorithm 2
In: E, max_number_of_trees, number_of_failed_attempts
1: Ensemble = Ø
2: Initial_number_of_trees = ⌈max_number_of_trees / 2⌉
3: for 1 to Initial_number_of_trees do:
   3.1: New_tree ← Induce new tree with CIDIM using E
   3.2: Ensemble = Ensemble ∪ {New_tree}
4: Failed_attempts = 0
5: while Failed_attempts < number_of_failed_attempts do:
   5.1: Worst_tree = {x ∈ Ensemble | success_rate(x) < success_rate(y) ∀ y ∈ Ensemble, y ≠ x}
   5.2: New_tree ← Induce new tree with CIDIM using E
   5.3: if error_rate(New_tree) < error_rate(Worst_tree) then:
      5.3.1: Failed_attempts = 0
      5.3.2: Ensemble = Ensemble ∪ {New_tree}
      5.3.3: if |Ensemble| > max_number_of_trees then:
         5.3.3.1: Ensemble = Ensemble \ {Worst_tree}
   5.4: else
      5.4.1: Failed_attempts = Failed_attempts + 1
Out: Ensemble
Once the ensemble has been induced, it can be used for prediction. We have defined three kinds of voting: uniform voting, weighted by tree voting, and weighted by rule voting. Uniform voting is the simplest: every tree gives its prediction vector, and all trees have the same weight. A weighted voting method requires defining the weights. In weighted by tree voting, the prediction vector of every decision tree is weighted by the success rate of that tree, and the weighted vectors are combined into a new prediction vector. In weighted by rule voting, the prediction vector of every decision tree is weighted by the weight of the corresponding rule in that tree. The class predicted by the multiple classifier system is the majority class in the combined prediction vector.
Having described E-CIDIM, we now examine how the selected voting method and the parameters influence its performance.
Table 1 shows that uniform voting and weighted by rule voting obtain the best results. The differences between them are not significant, so we have set uniform voting as the default voting method because of its simplicity.
Table 2 shows that accuracy is generally better when E-CIDIM keeps more trees in the ensemble, although in some cases accuracy is not the best when E-CIDIM uses 20 trees. The average number of leaves in the ensemble is also smaller when E-CIDIM keeps more trees. Taking these observations into account, we set the default value of max_number_of_trees to 20. Keeping more trees in the ensemble has a disadvantage: the induction takes more time to finish. Nevertheless, we consider the overall performance of E-CIDIM in this configuration to be positive.
With the voting method and max_number_of_trees set to their default values, we now study the influence of number_of_failed_attempts on the performance of E-CIDIM. As Table 3 shows, the smallest trees, and in most data sets the most accurate ensembles, are obtained with the lowest number_of_failed_attempts. In addition, the fastest executions also correspond to the lowest number_of_failed_attempts. Thus, we set the default value of number_of_failed_attempts to 5.
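Uniform and weighted by tree voting, as described earlier in this section, can be sketched as a weighted sum of the trees' prediction vectors followed by taking the majority class. We assume each prediction vector holds one score per class (for example, class probabilities from the leaf reached). Weighted by rule voting would follow the same pattern with a per-rule weight, but how that weight is computed is not specified here, so it is left out. The name `vote` is hypothetical.

```python
from typing import Optional, Sequence


def vote(prediction_vectors: Sequence[Sequence[float]],
         weights: Optional[Sequence[float]] = None) -> int:
    """Combine one prediction vector per tree and return the majority class index.

    weights=None gives uniform voting; passing each tree's success rate gives
    weighted by tree voting.
    """
    if not prediction_vectors:
        raise ValueError("the ensemble produced no predictions")
    if weights is None:
        weights = [1.0] * len(prediction_vectors)  # uniform voting: equal importance
    n_classes = len(prediction_vectors[0])
    combined = [0.0] * n_classes
    for vector, weight in zip(prediction_vectors, weights):
        for k, p in enumerate(vector):
            combined[k] += weight * p              # accumulate weighted votes per class
    return max(range(n_classes), key=combined.__getitem__)
```

With three one-hot predictions `[[1, 0], [0, 1], [0, 1]]`, uniform voting returns class 1, while success-rate weights `[0.95, 0.4, 0.4]` make class 0 win (0.95 against 0.8), which shows how weighting can overturn a plain majority.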
4. FE-CIDIM
FE-CIDIM has been designed to speed up the convergence of E-CIDIM. Its main goal is to reduce the time the algorithm takes to stop, hence its name: fast E-CIDIM.
E-CIDIM adds a new tree to the ensemble if it is better than the worst tree in it; the counter of failed attempts (Failed_attempts) is then reset to zero, and the algorithm must try at least number_of_failed_attempts new trees before stopping (and none of them may improve on the worst tree for the algorithm to stop). However, the difference between the worst tree and the new one is sometimes very small, and in that case it would be preferable neither to include the new tree in the ensemble nor to reset the counter (Failed_attempts). This is the basic idea introduced by FE-CIDIM.
Table 1. Comparison between voting methods. Configuration: max_number_of_trees = 10 and number_of_failed_attempts = 10.
Data set     Voting method       Accuracy (%)
Balance      Uniform             77.57 ± 0.65
             Weighted by tree    77.55 ± 0.62
             Weighted by rule    77.77 ± 0.45
Ecoli        Uniform             80.60 ± 0.56
             Weighted by tree    80.66 ± 0.54
             Weighted by rule    80.93 ± 0.46
Ionosphere   Uniform             90.43 ± 0.01
             Weighted by tree    90.25 ± 0.01
             Weighted by rule    90.39 ± 0.01
Pima         Uniform             73.51 ± 0.63
             Weighted by tree    73.58 ± 0.65
             Weighted by rule    73.98 ± 0.73
Wdbc         Uniform             95.20 ± 1.21
             Weighted by tree    95.15 ± 1.20
             Weighted by rule    95.15 ± 1.58
Table 2. Number of leaves, accuracy and execution time depending on the maximum number of decision trees in the ensemble.
Data set     max_number_of_trees   Leaves          Accuracy (%)    Time (ms)
Balance      5                     70.14 ± 0.79    75.97 ± 2.06    425.54 ± 35.98
             10                    69.03 ± 0.99    77.57 ± 1.21    673.77 ± 77.76
             20                    67.74 ± 1.16    78.83 ± 1.41    1034.29 ± 108.54
Ecoli        5                     36.72 ± 0.87    80.27 ± 0.92    913.33 ± 115.80
             10                    36.20 ± 0.72    80.60 ± 0.63    1392.29 ± 99.72
             20                    35.83 ± 0.43    80.54 ± 0.86    2295.01 ± 139.75
Ionosphere   5                     20.36 ± 0.33    90.05 ± 0.88    1745.71 ± 127.07
             10                    20.26 ± 0.40    90.43 ± 0.61    2709.53 ± 248.81
             20                    19.89 ± 0.22    90.60 ± 0.68    4266.83 ± 285.88
Pima         5                     41.23 ± 2.06    73.29 ± 0.69    608.81 ± 116.14
             10                    40.02 ± 1.08    73.51 ± 0.56    960.54 ± 82.91
             20                    38.38 ± 1.30    73.86 ± 0.43    1526.34 ± 124.11
Wdbc         5                     20.62 ± 0.46    95.12 ± 0.86    1588.23 ± 157.61
             10                    20.07 ± 0.35    95.20 ± 0.65    2345.18 ± 194.93
             20                    19.85 ± 0.15    95.12 ± 0.51    3876.90 ± 210.14
Configuration: uniform voting and number_of_failed_attempts = 10. Boldface text indicates best values.
As can be seen in Algorithm 3, we have added a new parameter, which is used to restrict the condition for adding a new tree to the ensemble. The only differences between E-CIDIM and FE-CIDIM are the inclusion of this new parameter and its use in the condition of step 5.3. Now, to be included in the ensemble, a new tree must be more accurate than the worst tree in the ensemble, and the difference must be greater than a minimum improvement factor determined by the new parameter.
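The restricted condition can be sketched as a single predicate. The exact formula is not given in the text, so this is an assumption: we take the multiplicative form error_rate(New_tree) < α · error_rate(Worst_tree), which reduces to E-CIDIM's test when α = 1 and demands a larger relative improvement as α decreases, matching the behaviour described for the new parameter. The name `alpha` and the function `accepts_new_tree` are hypothetical.

```python
def accepts_new_tree(new_error: float, worst_error: float, alpha: float = 0.5) -> bool:
    """Assumed FE-CIDIM acceptance test: error(New_tree) < alpha * error(Worst_tree).

    alpha = 1.0 reproduces E-CIDIM's condition; a smaller alpha requires
    worst_error - new_error > (1 - alpha) * worst_error, a larger improvement.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return new_error < alpha * worst_error
```

Under this assumed form, with error_rate(Worst_tree) = 0.20 and α = 0.5 a new tree would need an error rate below 0.10; a tree with error 0.19 would be accepted by E-CIDIM but rejected here without resetting Failed_attempts.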
Algorithm 3
In: E, max_number_of_trees, number_of_failed_attempts,
1: Ensemble = Ø
2: Initial_number_of_trees = ⌈max_number_of_trees / 2⌉
3: for 1 to Initial_number_of_trees do:
   3.1: New_tree ← Induce new tree with CIDIM using E
   3.2: Ensemble = Ensemble ∪ {New_tree}
4: Failed_attempts = 0
5: while Failed_attempts < number_of_failed_attempts do:
   5.1: Worst_tree = {x ∈ Ensemble | success_rate(x) < success_rate(y) ∀ y ∈ Ensemble, y ≠ x}
   5.2: New_tree ← Induce new tree with CIDIM using E
   5.3: if error_rate(New_tree) < error_rate(Worst_tree) then:
      5.3.1: Failed_attempts = 0
      5.3.2: Ensemble = Ensemble ∪ {New_tree}
      5.3.3: if |Ensemble| > max_number_of_trees then:
         5.3.3.1: Ensemble = Ensemble \ {Worst_tree}
   5.4: else
      5.4.1: Failed_attempts = Failed_attempts + 1
Out: Ensemble
The allowed values for the new parameter are in (0, 1]. When its value is one, we obtain a particular case of FE-CIDIM: the algorithm E-CIDIM. The lower the value of the parameter, the less time is needed to obtain a solution. We have experimented with different values (1.0, 0.95, 0.9, 0.8, 0.7, 0.5, 0.3, 0.2, 0.1 and 0.01); some of them (1.0, 0.8 and 0.5) are presented in Table 4 and, for some data sets (Ecoli and Pima), we have plotted all the values (Figure 1).
As Table 4 shows, accuracy remains similar for the different values of the parameter: no configuration is clearly the best in terms of accuracy, and the differences are not significant. However, the improvement brought by FE-CIDIM is clear when we consider the execution time and the average number of leaves in the ensemble. The best value will depend on the problem, but we set the default value of the parameter to 0.5.
Table 3. Number of leaves, accuracy and execution time depending on the number of failed attempts to induce a better decision tree.
Data set     number_of_failed_attempts   Leaves          Accuracy (%)    Time (ms)
Balance      5                           64.47 ± 0.76    78.88 ± 1.30    619.14 ± 48.44
             10                          67.74 ± 1.16    78.83 ± 1.41    1034.29 ± 108.54
             20                          70.83 ± 0.86    78.02 ± 1.22    1891.98 ± 115.39
Ecoli        5                           34.66 ± 0.36    81.08 ± 0.66    1359.22 ± 114.26
             10                          35.83 ± 0.43    80.54 ± 0.86    2295.01 ± 139.75
             20                          37.11 ± 0.49    80.39 ± 0.76    4121.79 ± 229.32
Ionosphere   5                           19.37 ± 0.28    90.40 ± 0.81    2539.77 ± 216.45
             10                          19.89 ± 0.22    90.60 ± 0.68    4266.83 ± 285.88
             20                          20.53 ± 0.24    90.34 ± 0.57    7558.41 ± 494.31
Pima         5                           34.58 ± 0.46    73.55 ± 0.52    912.30 ± 45.60
             10                          38.38 ± 1.30    73.86 ± 0.43    1526.34 ± 142.11
             20                          42.64 ± 1.09    73.68 ± 0.46    2795.18 ± 300.12
Wdbc         5                           19.15 ± 0.28    95.29 ± 0.55    2179.77 ± 133.89
             10                          19.85 ± 0.15    95.12 ± 0.51    3876.90 ± 210.14
             20                          20.52 ± 0.23    94.87 ± 0.81    6847.52 ± 309.11
Configuration: uniform voting and max_number_of_trees = 20. Boldface text indicates best values.
Table 4. Number of leaves, accuracy and execution time depending on the value of the new parameter.
Data set     Parameter value   Leaves          Accuracy (%)    Time (ms)
Balance      1.0               64.47 ± 0.76    78.88 ± 1.30    646.48 ± 74.49
             0.8               57.67 ± 0.84    78.40 ± 0.70    304.76 ± 37.82
             0.5               53.06 ± 1.62    78.84 ± 0.86    169.78 ± 2.04
Ecoli        1.0               34.66 ± 0.36    81.08 ± 0.66    1370.68 ± 114.61
             0.8               33.15 ± 0.36    80.78 ± 0.85    746.52 ± 96.36
             0.5               31.10 ± 0.61    80.73 ± 0.53    381.94 ± 5.02
Ionosphere   1.0               19.37 ± 0.28    90.40 ± 0.81    2561.72 ± 224.70
             0.8               18.84 ± 0.23    90.65 ± 0.81    1895.61 ± 183.18
             0.5               18.36 ± 0.24    90.97 ± 0.58    1080.78 ± 62.76
Pima         1.0               34.58 ± 0.46    73.55 ± 0.52    924.81 ± 46.40
             0.8               23.25 ± 1.09    74.26 ± 0.62    255.65 ± 8.25
             0.5               22.72 ± 1.17    74.20 ± 0.53    243.14 ± 5.16
Wdbc         1.0               19.15 ± 0.28    95.29 ± 0.55    2233.67 ± 135.15
             0.8               18.53 ± 0.51    95.22 ± 0.53    1727.26 ± 180.10
             0.5               16.41 ± 0.47    95.19 ± 0.64    937.95 ± 125.96
Configuration: uniform voting, max_number_of_trees = 20 and number_of_failed_attempts = 5. Boldface text indicates best values.
5. Experimental Results
We now present the experiments we have carried out and the results we have obtained. Before turning to the particular experiments, some points must be explained:
. The data sets we have used have been taken from the UCI Machine Learning Repository (Blake and Merz 2000) and are available online. For the first experiment, where we show how the performance of an isolated CIDIM algorithm can be improved by using a multiple classifier system, we have used five data sets (balance-scale, ecoli (Escherichia coli), ionosphere, pima-indians-diabetes and breast-cancer-wisconsin). All of their attributes are continuous, and they have been discretized because CIDIM is designed to deal with nominal variables (ordered or unordered). For the second experiment, where we make a more detailed study of E-CIDIM and FE-CIDIM, we have used five additional data sets (breast-cancer, careval, KR-vs-KP, mushroom and nursery), whose attributes are nominal. By adding these five data sets to the second experiment, we increase the diversity of the experimental study, including unordered attributes and larger data sets (mushroom and nursery).
. E-CIDIM and FE-CIDIM have been compared with two other well-known methods: bagging (Breiman 1996) and boosting (Freund and Schapire 1996). For the experiments, we have used the implementations of bagging and boosting provided in Weka (Witten and Frank 2000). Both algorithms have been executed with J48 (Weka's implementation of C4.5) as their basic classifier. We have configured E-CIDIM with its default configuration (uniform voting, max_number_of_trees = 20 and number_of_failed_attempts = 5); accordingly, bagging and boosting have been configured to perform 20 iterations. FE-CIDIM has also been run with its default configuration (the default configuration of E-CIDIM, with the new parameter set to 0.5). All experiments have been run on a PC with a Pentium IV processor and 512 MB of memory.
. For each experiment, the presented values for accuracy, average size of trees and execution time have been obtained from a 10 × 10-fold cross-validation. Average and standard deviation values are given. To compare results, a statistical test must be made (Herrera et al. 2004), and a t-test has been conducted using the results of this 10 × 10-fold cross-validation. The t-test values have been calculated using the statistical package R (R Development Core Team 2004). A difference is considered significant if the p-value of the t-test is below 0.05. We have selected the results obtained by E-CIDIM with its default configuration as the reference values. Thus, one marker indicates that a value is significantly better than that of E-CIDIM, and another that it is significantly worse. In addition to this comparison, the best result for each experiment has been emphasized in boldface.
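The 10 × 10-fold cross-validation protocol above can be sketched as follows. This is an illustration with assumptions the text does not settle: folds are not stratified, `evaluate` is a hypothetical callback that trains on one list of example indexes and returns a score on the other, and the reported standard deviation is taken over the ten per-repetition averages.

```python
import random
import statistics
from typing import Callable, List, Tuple


def repeated_cv(n_examples: int,
                evaluate: Callable[[List[int], List[int]], float],
                k: int = 10,
                repeats: int = 10,
                seed: int = 0) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the per-repetition average score."""
    if repeats < 2:
        raise ValueError("need at least two repetitions for a standard deviation")
    rng = random.Random(seed)
    repetition_means = []
    for _ in range(repeats):
        order = list(range(n_examples))
        rng.shuffle(order)                       # new random partition each repetition
        folds = [order[i::k] for i in range(k)]  # k disjoint folds (not stratified)
        fold_scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            fold_scores.append(evaluate(train_idx, test_idx))
        repetition_means.append(statistics.mean(fold_scores))
    return statistics.mean(repetition_means), statistics.stdev(repetition_means)
```

Each repetition tests every example exactly once, so the ten repetition averages are comparable samples; a paired test between two algorithms would reuse the same `seed` so both see identical folds.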
The first experiment (Table 5) shows that the solution achieved by a single tree induced by CIDIM can be improved by using an ensemble of trees. In this case, we have only used E-CIDIM, in two variants.
In the second experiment (Table 6), we have compared the results given by bagging, boosting, E-CIDIM, and FE-CIDIM.
From the results shown in Table 6, we can draw some conclusions:
. The average size of the decision trees induced by E-CIDIM is significantly smaller than that of the trees induced by bagging or boosting in almost every experiment (the two exceptions are the balance and KR-vs-KP data sets). Here we can see one of the advantages of using CIDIM as the basic classifier: it induces small decision trees. Moreover, FE-CIDIM almost always yields the smallest ensembles (the exception is the KR-vs-KP data set), and its differences are always significantly better when E-CIDIM is taken as the reference.
. The accuracy reached by E-CIDIM is significantly better than that reached by bagging or boosting in many cases. Inducing small decision trees without losing much accuracy is possible because of the way in which E-CIDIM makes good use of CIDIM's advantages: it improves on the performance of a single CIDIM tree by combining several of them in an ensemble, as can be seen in Table 5. The accuracy achieved by FE-CIDIM is comparable with that of E-CIDIM and, in some cases (ionosphere and pima data sets), significantly better.
. The execution time of E-CIDIM is frequently the worst, but this can be overcome by using FE-CIDIM. As can be seen, the execution time of FE-CIDIM is always significantly better than that of E-CIDIM, and the differences are large. On the other hand, the time taken by FE-CIDIM is comparable with the time taken by bagging or boosting (usually better, but sometimes worse).
Table 5. Comparison between CIDIM, E-CIDIM with max_number_of_trees = 10 and E-CIDIM with default configuration.
Data set     Algorithm     Accuracy (%)
Balance      CIDIM         68.48 ± 1.18
             E-CIDIM-10    77.36 ± 1.33
             E-CIDIM-20    78.88 ± 1.30
Ecoli        CIDIM         77.29 ± 0.80
             E-CIDIM-10    80.49 ± 0.89
             E-CIDIM-20    81.08 ± 0.66
Ionosphere   CIDIM         88.71 ± 1.64
             E-CIDIM-10    90.39 ± 0.85
             E-CIDIM-20    90.40 ± 0.81
Pima         CIDIM         73.29 ± 0.74
             E-CIDIM-10    73.63 ± 0.59
             E-CIDIM-20    73.55 ± 0.52
Wdbc         CIDIM         92.48 ± 1.05
             E-CIDIM-10    95.15 ± 0.48
             E-CIDIM-20    95.29 ± 0.55
Average values and standard deviations are given. Significance tests are with respect to E-CIDIM with default configuration; a marker indicates a value significantly worse than that of E-CIDIM. Boldface emphasizes best values.
6. Conclusions
This article introduces E-CIDIM and FE-CIDIM, two multiple classifier systems that use CIDIM as their basic classifier. CIDIM is an algorithm that induces small and accurate decision trees, and E-CIDIM takes advantage of these characteristics: it improves on the performance of a single CIDIM tree by combining several of them in an ensemble. FE-CIDIM, in turn, restricts the substitution condition of E-CIDIM in order to speed it up.
We have compared the results obtained by E-CIDIM, FE-CIDIM, bagging and boosting over different data sets, and several observations can be made. E-CIDIM and FE-CIDIM induce ensembles of very small trees, while the accuracy achieved by E-CIDIM is comparable with the accuracies achieved by bagging and boosting. Thus, we can conclude that E-CIDIM performs reasonably well. However, it has a disadvantage: E-CIDIM takes more time than bagging or boosting to induce the ensemble. To overcome this problem, we can use FE-CIDIM, which is usually several times faster.
Table 6. Comparison between bagging, boosting, E-CIDIM and FE-CIDIM.
Data set     Algorithm      Leaves           Accuracy (%)     Time (ms)
Balance      Bagging-20     53.66 ± 0.72     74.29 ± 1.39     184.20 ± 3.68
             Boosting-20    99.55 ± 1.71     72.71 ± 1.11     317.60 ± 1.96
             E-CIDIM-20     64.47 ± 0.76     78.88 ± 1.30     646.48 ± 74.49
             FE-CIDIM-20    53.06 ± 1.62     78.84 ± 0.86     169.78 ± 2.04
Cancer       Bagging-20     28.90 ± 1.09     73.22 ± 0.69     179.90 ± 19.80
             Boosting-20    46.35 ± 0.80     67.91 ± 1.47     241.80 ± 40.61
             E-CIDIM-20     13.31 ± 2.19     69.33 ± 0.82     178.40 ± 31.01
             FE-CIDIM-20    7.79 ± 1.50      69.58 ± 0.44     64.91 ± 5.84
Car          Bagging-20     120.41 ± 0.57    93.44 ± 0.33     539.50 ± 67.04
             Boosting-20    153.77 ± 0.84    96.24 ± 0.23     1045.50 ± 107.71
             E-CIDIM-20     65.11 ± 0.93     92.96 ± 0.73     943.54 ± 30.89
             FE-CIDIM-20    51.23 ± 1.97     90.97 ± 0.81     307.61 ± 36.82
Ecoli        Bagging-20     65.85 ± 0.80     78.43 ± 0.94     172.90 ± 31.22
             Boosting-20    89.73 ± 2.18     75.75 ± 1.45     196.70 ± 11.52
             E-CIDIM-20     34.66 ± 0.36     81.08 ± 0.66     1370.68 ± 114.61
             FE-CIDIM-20    31.10 ± 0.61     80.73 ± 0.53     381.94 ± 5.02
Ionosphere   Bagging-20     40.20 ± 0.73     88.88 ± 0.55     342.50 ± 3.31
             Boosting-20    51.39 ± 0.46     92.33 ± 0.90     399.00 ± 42.25
             E-CIDIM-20     19.37 ± 0.28     90.40 ± 0.81     2561.72 ± 224.70
             FE-CIDIM-20    17.78 ± 0.24     90.97 ± 0.58     1080.78 ± 62.76
KR-vs-KP     Bagging-20     29.02 ± 0.17     99.41 ± 0.08     4091.35 ± 234.66
             Boosting-20    46.23 ± 0.65     99.63 ± 0.08     8159.25 ± 1030.37
             E-CIDIM-20     30.48 ± 0.35     99.29 ± 0.09     5376.20 ± 349.77
             FE-CIDIM-20    29.49 ± 0.34     99.27 ± 0.11     4009.72 ± 158.94
Mushroom     Bagging-20     24.41 ± 0.08     100.00 ± 0.00    2541.50 ± 128.76
             Boosting-20    24.88 ± 0.15     100.00 ± 0.00    215.20 ± 58.80
             E-CIDIM-20     13.04 ± 0.03     100.00 ± 0.00    2176.50 ± 155.40
             FE-CIDIM-20    13.04 ± 0.03     100.00 ± 0.00    2176.42 ± 153.90
Nursery      Bagging-20     337.92 ± 62.81   97.46 ± 0.14     4710.10 ± 370.26
             Boosting-20    512.07 ± 1.90    99.65 ± 0.05     8638.55 ± 302.50
             E-CIDIM-20     238.07 ± 1.76    97.84 ± 0.09     7095.45 ± 471.63
             FE-CIDIM-20    218.91 ± 1.45    97.53 ± 0.10     1871.68 ± 41.12
Pima         Bagging-20     96.88 ± 1.28     72.99 ± 0.68     520.90 ± 39.04
             Boosting-20    135.93 ± 2.72    70.71 ± 1.20     681.80 ± 14.08
             E-CIDIM-20     34.58 ± 0.46     73.55 ± 0.52     924.81 ± 46.40
             FE-CIDIM-20    22.72 ± 1.17     74.20 ± 0.53     243.14 ± 5.16
Wdbc         Bagging-20     56.51 ± 1.03     93.95 ± 0.69     421.80 ± 71.34
             Boosting-20    82.35 ± 0.88     95.38 ± 0.37     609.40 ± 26.22
             E-CIDIM-20     19.15 ± 0.28     95.29 ± 0.55     2233.67 ± 135.15
             FE-CIDIM-20    16.41 ± 0.47     95.19 ± 0.64     937.95 ± 124.77
Average values and standard deviations are given. Significance tests are with respect to E-CIDIM; separate markers indicate values significantly better and significantly worse than those of E-CIDIM. Boldface emphasizes best values.
Our aim of improving E-CIDIM and FE-CIDIM involves two issues:
. We are working to improve CIDIM by providing it with the ability to work with continuous attributes. In this way, real variables will not have to be discretized into ordered nominal variables, and CIDIM (as well as E-CIDIM and FE-CIDIM) could be executed automatically.
. We are also working to automate the selection of the parameters, so that no prior configuration would be needed.
Acknowledgement
This work has been partially supported by the MOISES-TA project, number TIN2005-08832-C03-01, of the MEC, Spain.
References
J.A. Aslam and S.E. Decatur, ‘‘General bounds on statistical query learning and PAC learning with noise via hypothesis boosting’’, Inform. Comput., 141, pp. 85–118, 1998.
E. Bauer and R. Kohavi, ‘‘An empirical comparison of voting classification algorithms: Bagging, Boosting and variants’’, Mach. Learn., 36, pp. 105–139, 1999.
C. Blake and C.J. Merz, ‘‘UCI repository of machine learning databases’’, University of California, Department of Information and Computer Science, 2000.
L. Breiman, ‘‘Bagging Predictors’’, Mach. Learn., 24(2), pp. 123–140, 1996.
L. Breiman and J.H. Friedman and R.A. Olshen and C.J. Stone, ‘‘Classification and Regression Trees’’, Chapman and Hall, New York, 1984.
C. Ferri and P. Flach and J. Hernández-Orallo, ‘‘Delegating classifiers’’, in Proceedings of the 21st International Conference on Machine Learning, Omnipress, 2004.
Y. Freund, ‘‘Boosting a weak learning algorithm by majority’’, Inform. Comput., 121, pp. 256–285, 1995.
Y. Freund and R. E. Schapire, ‘‘Experiments with a new boosting algorithm’’, in Proceedings of the 13th International Conference on Machine Learning, 1996, pp. 146–148.
Y. Freund and R. E. Schapire, ‘‘A decision-theoretic generalization of on-line learning and an application to boosting’’, J. Comput. Syst. Sci., 55, pp. 119–139, 1997.
J. Gama and P. Brazdil, ‘‘Cascade generalization’’, Mach. Learn., 41, pp. 315–343, 2000.
F. Herrera and C. Hervás and J. Otero and Luciano Sánchez, Tendencias de la Minería de Datos en España, Red Española Minería Datos, chapter Un estudio empírico preliminar sobre los tests estadísticos más habituales en el aprendizaje automático, 2004.
J.M. Jerez-Aragonés and J. A. Gómez-Ruiz and G. Ramos-Jiménez and J. Muñoz-Pérez and E. Alba-Conejo, ‘‘A combined neural network and decision trees model for prognosis of breast cancer relapse’’, Artif. Intell. Med., 27(1), pp. 45–63, 2003.
M.J. Kearns and U.V. Vazirani, ‘‘On the boosting ability of top-down decision tree learning algorithms’’, J. Comput. Syst. Sci., 58, pp. 109–128, 1999.
J.R. Quinlan, ‘‘Induction of Decision Trees’’, Mach. Learn., 1, pp. 81–106, 1986.
J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, 1993.
R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2004. ISBN 3-900051-07-0. http://www.R-project.org.
G. Ramos-Jiménez, J. Campo-Ávila and R. Morales-Bueno, ‘‘E-CIDIM: Ensemble of CIDIM classifiers’’, Lecture Notes in Artificial Intelligence, 3584, pp. 108–117, 2005a.
G. Ramos-Jiménez, J. Campo-Ávila and R. Morales-Bueno, ‘‘Induction of decision trees using an internal control of induction’’, Lecture Notes in Computer Science, 3512, pp. 795–803, 2005b.
J. Ruiz-Gómez, G. Ramos-Jiménez, J. del Campo-Ávila, A. García-Cerezo and R. Morales-Bueno, ‘‘Modelado de un robot móvil basado en aprendizaje inductivo’’, in Proceedings of the VI Jornadas de Transferencia Tecnológica de Inteligencia Artificial (TTIA-2005), Granada, 2005, pp. 175–182.
R.E. Schapire, ‘‘The strength of weak learnability’’,Mach. Learn., 5, pp. 197–227, 1990.
P.E. Utgoff and N.C. Berkman and J.A. Clouse, ‘‘Decision Tree Induction Based on Efficient Tree Restructuring’’, Mach. Learn., 29(1), pp. 5–44, 1997.
I.H. Witten and E. Frank,Data Mining: Practical machine learning tools with Java implementations, Morgan Kaufmann, San Francisco, 2000.
D. Wolpert, ‘‘Stacked generalization’’, Neural Networks, 5, pp. 241–260, 1992.