The three data mining tools were compared using four performance criteria, namely:

1. the classification accuracy of the rule set on training instances;

2. the generalization ability, measured as the classification accuracy on a test set;

3. the number of rules in the rule set; and

4. the average number of conditions per rule.

While the first two criteria quantify the accuracy of the rule sets, the last two express their complexity and, hence, their comprehensibility.
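The four criteria can be made concrete with a small sketch. It assumes a hypothetical rule representation (the names `Rule`, `classify`, `accuracy` and `complexity` are illustrative and not taken from BGP, CN2 or C4.5) in which a rule set is an ordered list: the first rule whose conditions all hold assigns the class, and a default class is used otherwise. Criteria 1 and 2 are the same accuracy function applied to the training and test sets; criteria 3 and 4 depend only on the rule set itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A condition tests one instance (a sequence of attribute values).
Condition = Callable[[Sequence[float]], bool]
Example = Tuple[Sequence[float], int]


@dataclass
class Rule:
    """Hypothetical rule: predicts `label` when all `conditions` hold."""
    conditions: List[Condition]
    label: int


def classify(rules: List[Rule], x: Sequence[float], default: int) -> int:
    # Ordered rule list (an assumption): the first rule whose conditions all hold fires.
    for rule in rules:
        if all(cond(x) for cond in rule.conditions):
            return rule.label
    return default


def accuracy(rules: List[Rule], data: List[Example], default: int) -> float:
    # Criteria 1 and 2: fraction of correctly classified instances.
    correct = sum(classify(rules, x, default) == y for x, y in data)
    return correct / len(data)


def complexity(rules: List[Rule]) -> Tuple[int, float]:
    # Criteria 3 and 4: number of rules and mean number of conditions per rule.
    n_rules = len(rules)
    if n_rules == 0:
        return 0, 0.0
    return n_rules, sum(len(r.conditions) for r in rules) / n_rules


# Illustrative rule set and data, made up for the example.
rules = [
    Rule([lambda x: x[0] < 2.5], label=0),
    Rule([lambda x: x[0] >= 2.5, lambda x: x[1] < 1.0], label=1),
]
train_set = [((1.0, 0.5), 0), ((3.0, 0.2), 1), ((3.0, 2.0), 0)]
test_set = [((2.0, 0.0), 0), ((4.0, 0.5), 1), ((4.0, 0.5), 0)]

print(accuracy(rules, train_set, default=0))  # 1.0
print(accuracy(rules, test_set, default=0))   # 0.6666666666666666
print(complexity(rules))                      # (2, 1.5)
```

The ordered-list semantics is only one choice; an unordered rule set would need a conflict-resolution strategy in `classify`, while the two complexity criteria would be computed the same way.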

For each database, each of the three tools was applied 30 times, on 30 randomly constructed training and test sets. Each triple of simulations (i.e., BGP, C4.5 and CN2) was run on the same training and test sets. The results reported for each performance criterion are averages over the 30 simulations, together with 95% confidence intervals. Paired t-tests were used to compare the results of each pair of algorithms in order to determine whether there is a significant difference in performance. For each of the data sets used in the experiments, the optimal values of the BGP parameters, summarized in Table 1, were first determined through cross-validation.
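A paired comparison of two algorithms run on the same splits can be sketched as follows. This is one standard way of computing a paired t-test interval and is not necessarily the exact procedure behind the reported tables; the function names are hypothetical, and the critical value 2.045 (two-sided 95%, 29 degrees of freedom, i.e. 30 paired runs) is hard-coded so that only the Python standard library is needed.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t for 29 degrees of freedom,
# i.e. 30 paired runs (hard-coded so only the standard library is needed).
T_CRIT_95_DF29 = 2.045


def paired_ci(acc_a: Sequence[float], acc_b: Sequence[float],
              t_crit: float = T_CRIT_95_DF29) -> Tuple[float, float]:
    """Mean of the per-split differences a - b and the half-width of its CI."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need at least two paired runs on the same splits")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean = statistics.fmean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, half_width


def significant(mean: float, half_width: float) -> bool:
    # Significant at the chosen level when the interval excludes zero.
    return abs(mean) > half_width


# Made-up accuracies on 4 splits; 3.182 is the two-sided 95% value for 3 df.
mean, hw = paired_ci([0.70, 0.72, 0.68, 0.71], [0.65, 0.66, 0.64, 0.67], t_crit=3.182)
print(round(mean, 4), round(hw, 4), significant(mean, hw))
```

In the protocol described above, `acc_a[i]` and `acc_b[i]` would be the test accuracies of, for example, BGP and CN2 on the i-th split; pairing by split removes the variation that comes from the split itself.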

Results

Table 1: BGP system parameters

Parameter                     Ionosphere  Iris  Monks1  Monks2  Monks3  Pima-diabetes
c                             0.1         0.1   0.1     0.1     0.1     0.1
L                             0.0         0.0   0.0     0.0     0.0     0.0
Tournament size, k            20          10    10      10      10      20
Initial temperature, T0       2000        300   200     1500    500     2000
Prob. RHS is attribute, PA    0.1         0.2   0.7     0.2     0.1     0.1
Mut. on RHS, MRHS             0.2         0.7   0.7     0.4     0.3     0.2
Mut. on rel. op., MRO         0.4         0.2   0.2     0.2     0.3     0.4
Probability pruning, PP       0.5         0.2   0.2     0.5     0.5     0.5
Probability crossover, PC     0.5         0.5   0.8     0.5     0.5     0.5

Tables 2 and 3 show the mean accuracy on the training and test sets over 30 runs for each algorithm, together with the confidence intervals and the standard deviations of the accuracies. The stars in the tables indicate, for each task, which algorithm has the highest accuracy. Table 2, which gives the accuracies on the training set, shows that CN2 consistently has the highest training accuracy on each task. Table 3, which summarizes the accuracies on the test set, shows that CN2 overfits the training data, since it does not perform as well on the test set; on Monks2, for example, its accuracy drops from 0.992 on the training set to 0.626 on the test set. The accuracies obtained by CN2 and C4.5 on the Monks1 task were very consistent: each run resulted in perfect classification of the test set. When the data set of a task does not contain any noise, CN2 and C4.5 will most probably find a perfect classifier. The Monks2 problem is an exception to this rule: although it contains no noise, CN2 and C4.5 reach a test accuracy of only about 63%. For Monks3 only one data set was available, so it was not possible to perform several runs of the algorithms; for this task, therefore, no confidence intervals, standard deviations or t-tests were computed.
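The overfitting argument can be quantified by the gap between training and test accuracy. The sketch below uses the mean accuracies reported in Tables 2 and 3; Pima is left out because its training row is not available in this text. A large positive gap means that a rule set fits the training data much better than unseen data.

```python
# Mean accuracies copied from Table 2 (training) and Table 3 (test).
# Pima is omitted because its Table 2 row is not available here.
TRAIN = {
    "Iono":   {"BGP": 0.895, "CN2": 0.989, "C4.5": 0.979},
    "Iris":   {"BGP": 0.967, "CN2": 0.987, "C4.5": 0.982},
    "Monks1": {"BGP": 0.994, "CN2": 1.000, "C4.5": 0.999},
    "Monks2": {"BGP": 0.715, "CN2": 0.992, "C4.5": 0.769},
    "Monks3": {"BGP": 0.934, "CN2": 1.000, "C4.5": 0.951},
}
TEST = {
    "Iono":   {"BGP": 0.892, "CN2": 0.921, "C4.5": 0.979},
    "Iris":   {"BGP": 0.941, "CN2": 0.943, "C4.5": 0.945},
    "Monks1": {"BGP": 0.993, "CN2": 1.000, "C4.5": 1.000},
    "Monks2": {"BGP": 0.684, "CN2": 0.626, "C4.5": 0.635},
    "Monks3": {"BGP": 0.972, "CN2": 0.907, "C4.5": 0.963},
}


def generalization_gaps(train, test):
    """Training minus test accuracy per task and algorithm (rounded to 3 decimals)."""
    return {
        task: {alg: round(train[task][alg] - test[task][alg], 3) for alg in train[task]}
        for task in train
    }


gaps = generalization_gaps(TRAIN, TEST)
for task, row in gaps.items():
    print(task, row)
```

On these numbers the largest gap is CN2 on Monks2 (0.992 - 0.626 = 0.366), while the gaps for BGP stay within a few hundredths.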

The results of the t-tests are given in Table 4. For the Iono task, both CN2 and C4.5 obtained significantly better results than BGP. On the other hand, BGP performed significantly better than both CN2 and C4.5 on the Monks2 data set. On the remaining tasks the differences in mean accuracy were not significant. BGP lacks the exploration power to find a good classifier when the search involves many continuous attributes, as in the Iono task. This could be improved by adding a local search over the threshold values.
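The significance statements above can be read off Table 4: a difference is significant at the 95% level when its confidence interval excludes zero. The sketch below applies that rule to the reported intervals; the function name `verdict` is hypothetical. The Monks1 BGP-vs-CN2 entry is taken as negative because CN2's test accuracy on Monks1 (1.000) exceeds BGP's (0.993); it is not significant under either sign.

```python
# Table 4: (mean difference in test accuracy, 95% half-width); positive = BGP better.
TABLE4 = {
    "Iono":   {"CN2": (-0.0286, 0.0263),   "C4.5": (-0.0385, 0.0142)},
    "Iris":   {"CN2": (-0.0237, 0.0267),   "C4.5": (-0.00400, 0.0115)},
    "Monks1": {"CN2": (-0.00657, 0.00933), "C4.5": (-0.00657, 0.00933)},
    "Monks2": {"CN2": (+0.0576, 0.0165),   "C4.5": (+0.0485, 0.0154)},
    "Pima":   {"CN2": (-0.0132, 0.0160),   "C4.5": (-0.00844, 0.0190)},
}


def verdict(mean: float, half_width: float) -> str:
    low, high = mean - half_width, mean + half_width
    if low > 0:
        return "BGP better"
    if high < 0:
        return "BGP worse"
    return "not significant"  # the interval contains zero


verdicts = {task: {alg: verdict(*ci) for alg, ci in row.items()}
            for task, row in TABLE4.items()}
for task, row in verdicts.items():
    print(task, row)
```

This reproduces the conclusions in the text: significant differences on Iono (in favour of CN2 and C4.5) and on Monks2 (in favour of BGP), and none elsewhere.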

Table 2: Accuracy on the training set, with 95% confidence intervals and the standard deviation of the accuracy. The star indicates the best accuracy on each task.

Task     BGP                        CN2                        C4.5
         Accuracy        Std. dev.  Accuracy        Std. dev.  Accuracy        Std. dev.
Iono     0.895 ± 0.120   0.013      0.989 ± 0.038*  0.003      0.979 ± 0.051   0.007
Iris     0.967 ± 0.064   0.012      0.987 ± 0.040*  0.012      0.982 ± 0.047   0.010
Monks1   0.994 ± 0.026   0.022      1.000 ± 0.000*  0.000      0.999 ± 0.008   0.003
Monks2   0.715 ± 0.161   0.012      0.992 ± 0.030*  0.004      0.769 ± 0.150   0.049
Monks3   0.934           n/a        1.000*          n/a        0.951           n/a

Table 3: Accuracy on the test set, with 95% confidence intervals and the standard deviation of the accuracy. The star indicates the best accuracy on each task.

Task     BGP                        CN2                        C4.5
         Accuracy        Std. dev.  Accuracy        Std. dev.  Accuracy        Std. dev.
Iono     0.892 ± 0.111   0.037      0.921 ± 0.097   0.040      0.979 ± 0.051*  0.007
Iris     0.941 ± 0.085   0.027      0.943 ± 0.083   0.034      0.945 ± 0.082*  0.030
Monks1   0.993 ± 0.029   0.025      1.000 ± 0.000*  0.000      1.000 ± 0.000*  0.000
Monks2   0.684 ± 0.166*  0.040      0.626 ± 0.173   0.039      0.635 ± 0.172   0.051
Monks3   0.972*          n/a        0.907           n/a        0.963           n/a
Pima     0.725 ± 0.160   0.031      0.739 ± 0.157*  0.024      0.734 ± 0.158   0.025

In comparing the three algorithms, the biggest difference was not in the resulting accuracies but in the mean number of rules extracted. As shown in Table 5, the classifiers produced by BGP consistently used fewer rules than those produced by CN2 and C4.5. What is especially striking in these results is that BGP performs no tree or rule pruning of the best individual, in contrast to both C4.5 and CN2. The difference in the number of rules extracted is nicely illustrated by the Monks2 task, where BGP extracted on average six rules, CN2 122.8 rules and C4.5 13.9 rules. The mean number of conditions per rule for BGP is slightly larger on the Iono and Iris tasks but smaller on the remaining tasks, showing that BGP extracted crisper (shorter) rules for most of the tasks.
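A rough way to combine the two complexity criteria is to multiply the mean number of rules by the mean number of conditions per rule, which approximates the total number of conditions in a rule set. This product of means is only an estimate (it is not the same as the mean of per-run totals), and the sketch below, with the hypothetical helper `total_conditions`, simply applies it to the values in Table 5.

```python
# Table 5: task -> {algorithm: (mean number of rules, mean conditions per rule)}.
TABLE5 = {
    "Iono":   {"BGP": (4.70, 2.39), "CN2": (17.07, 2.35), "C4.5": (8.57, 2.15)},
    "Iris":   {"BGP": (3.37, 2.02), "CN2": (5.33, 1.64),  "C4.5": (4.10, 1.60)},
    "Monks1": {"BGP": (4.37, 2.22), "CN2": (18.0, 2.37),  "C4.5": (21.5, 2.73)},
    "Monks2": {"BGP": (6.00, 2.96), "CN2": (122.8, 4.53), "C4.5": (13.9, 3.01)},
    "Monks3": {"BGP": (3, 1.67),    "CN2": (22, 2.17),    "C4.5": (12, 2.77)},
    "Pima":   {"BGP": (3.70, 1.97), "CN2": (35.8, 2.92),  "C4.5": (12.73, 3.90)},
}


def total_conditions(table):
    """Estimated conditions per rule set: mean rules x mean conditions per rule."""
    return {task: {alg: round(rules * conds, 2) for alg, (rules, conds) in row.items()}
            for task, row in table.items()}


totals = total_conditions(TABLE5)
for task, row in totals.items():
    print(task, row, "smallest:", min(row, key=row.get))
```

On these numbers BGP has the smallest estimate on every task except Iris, where C4.5's estimate (4.10 × 1.60 = 6.56) is slightly below BGP's (3.37 × 2.02 ≈ 6.81); on Monks2 the estimates are about 17.8, 556.3 and 41.8 conditions for BGP, CN2 and C4.5 respectively.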

Table 4: Comparison between BGP and CN2, and between BGP and C4.5, using paired t-tests over the 30 training and test sets: mean difference in test accuracy with its 95% confidence interval. A '+' means that BGP showed better results than the algorithm it is compared to and a '-' means that BGP's results were worse. A difference is significant when its interval excludes zero, which is the case for Iono and Monks2 in both comparisons.

Task     BGP vs. CN2            BGP vs. C4.5
Iono     -0.0286 ± 0.0263       -0.0385 ± 0.0142
Iris     -0.0237 ± 0.0267       -0.00400 ± 0.0115
Monks1   -0.00657 ± 0.00933     -0.00657 ± 0.00933
Monks2   +0.0576 ± 0.0165       +0.0485 ± 0.0154
Pima     -0.0132 ± 0.0160       -0.00844 ± 0.0190

Table 5: Mean number of rules per run and mean number of conditions per rule for each task and algorithm. The star indicates, for each row, the smallest number of rules.

Task     BGP                   CN2                   C4.5
         Rules    Cond./rule   Rules    Cond./rule   Rules    Cond./rule
Iono     4.70*    2.39         17.07    2.35         8.57     2.15
Iris     3.37*    2.02         5.33     1.64         4.10     1.60
Monks1   4.37*    2.22         18.0     2.37         21.5     2.73
Monks2   6.00*    2.96         122.8    4.53         13.9     3.01
Monks3   3*       1.67         22       2.17         12       2.77
Pima     3.70*    1.97         35.8     2.92         12.73    3.90

The running time of the algorithms was not among the performance criteria used to compare them, but since large differences in running time were observed between BGP on the one hand and CN2 and C4.5 on the other, it seems apt to discuss this topic here. Every time a recombination operator, such as crossover, is applied to a decision tree, the training instances need to be redistributed over the leaf nodes of the tree. Thus, the time complexity of one generation of BGP is of the order O(R · N · P), where R is the number of recombination operators, N is the number of training instances and P is the number of individuals in the population. For G generations the time complexity is linear in G, of the order O(G · R · N · P). BGP has a much longer running time than CN2 and C4.5 (of the order of hours versus minutes). This is a serious disadvantage of BGP: its computational complexity limits the application of this tool to small databases. However, strategies such as local search and windowing (as employed in C4.5) can be used to reduce the computational complexity of BGP.
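The per-generation cost in R, N and P can be illustrated by counting how often training instances are routed to a leaf. The sketch assumes a hypothetical binary decision-tree representation (`Node`, `leaf_for` and `one_generation` are illustrative names, not BGP's code) and, as the complexity estimate implies, that each of the R operators is applied once to each of the P individuals and is followed by re-routing all N instances. The operators themselves are omitted, and each routing step also costs time proportional to the tree depth, which the estimate treats as a constant.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical binary decision-tree node; `attr is None` marks a leaf."""
    attr: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0


def leaf_for(root: Node, x: Sequence[float]) -> Node:
    # Route one instance from the root to its leaf (cost grows with tree depth).
    node = root
    while node.attr is not None:
        node = node.left if x[node.attr] < node.threshold else node.right
    return node


def one_generation(population: List[Node],
                   data: List[Tuple[Sequence[float], int]],
                   n_operators: int) -> int:
    """Count instance-to-leaf redistributions in one generation (R * N * P)."""
    redistributions = 0
    for tree in population:                  # P individuals
        for _ in range(n_operators):         # R recombination operators
            # A real operator would modify `tree` here; the sketch only re-routes.
            for x, _label in data:           # N training instances
                leaf_for(tree, x)
                redistributions += 1
    return redistributions


stump = Node(attr=0, threshold=0.5, left=Node(label=0), right=Node(label=1))
data = [([i / 10], i % 2) for i in range(10)]
print(one_generation([stump] * 4, data, n_operators=3))  # 120 = 3 * 10 * 4
```

Over several generations the count simply scales with their number, which is where the linear dependence on the number of generations comes from.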
