6. Theoretical Framework
6.3 Education for Indigenous Peoples in Colombia
6.3.4 Ethno-education in Colombia
6.3.4.4 Ethno-education among the Wayuu People
6.3.4.4.2 Ethno-educational Background of the Wayuu People
For this set of experiments, we focus on poorly predicted classes for datasets with twenty or fewer classes but more than twenty shapelets. The reason for restricting our interest to datasets with twenty or fewer classes is the relationship between the number of shapelets and the number of classes. With twenty or fewer classes, we can retain at least one shapelet from each class, whereas we must lose shapelets representing some classes if there are more than twenty classes. We examine such cases in Section 6.5.4.

[Figure: plot of the F1 of the rule set against the F1 of the ensemble classifier, both axes running from 0.0 to 1.0, with regions marked "Ensemble Classifier better here" and "Rule set better here".]
Figure 6.4: Comparison of F1 score between rule set and ensemble classifier on all classes of low-dimensionality data. The ensemble is better on 59 of the 100 classes, the rule set is better on 41.
The traditional way to make Apriori tractable on larger datasets is to increase the minimum confidence and support constraints. We use a single record as the minimum support, and the base incidence rate of the class of interest in the training set as the minimum confidence. As shown in Section 6.7, using a higher minimum confidence and support alters the character of the rule set, and can eliminate rules that may be of interest, such as high-confidence, low-support exception rules. Such rules are particularly important for nugget discovery, as rules that target a minority class are very likely to be exception rules, simply because the number of records of that class is low. Prima facie, this method does not seem appropriate for our approach, as we are interested in classes that are difficult to predict. By using higher minimum support and confidence values, we may miss the rules we are looking for (see Section 6.7). Regardless, as a comparison method, we attempt to use the built-in constraints of Apriori to create rule sets on higher-dimensional data by adjusting the parameter settings.
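The default thresholds described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `apriori_thresholds` and the choice to express both thresholds as fractions of the training set are assumptions made for the example.

```python
from collections import Counter


def apriori_thresholds(train_labels, target_class):
    """Return (min_support, min_confidence) as fractions of the training set.

    Hypothetical helper: minimum support is one record, and minimum
    confidence is the base incidence rate of the class of interest.
    """
    n = len(train_labels)
    if n == 0:
        raise ValueError("training set is empty")
    counts = Counter(train_labels)
    min_support = 1 / n                         # a single record
    min_confidence = counts[target_class] / n   # base incidence of the class
    return min_support, min_confidence


# Toy example: 100 training records, 10 of which belong to the minority class.
labels = ["majority"] * 90 + ["minority"] * 10
support, confidence = apriori_thresholds(labels, "minority")
print(support, confidence)
```

With thresholds this low, any rule that beats the base rate on even one record is retained, which is why exception rules for a minority class survive.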
The first problem we encounter is that there appears to be no principled way to set a minimum support value, beyond trying a range of values. Different datasets have different itemsets for different classes; a support value that works for one class of a dataset may not work for another. Finding parameter settings that work is a time-consuming process, and one that may not result in the best rule set.
The second problem is that there is very little difference between parameter settings that will create an empty rule set, and settings that will deliver an explosion in time or space usage by the algorithm. Apriori was not designed to work with very high-dimensionality data, and small changes in the minimum support can have large effects on how the algorithm operates. For example, the DistalPhalanxTW dataset has 912 shapelets. If the minimum support is set to 19 records, no rules are generated. If the minimum support is decreased by one record, the smallest granularity possible, there is an explosion in the space requirement, and the software crashes due to inadequate RAM. This is the case even when the minimum confidence is set to 1. It may be possible to produce a rule set on the DistalPhalanxTW dataset by taking advantage of high-performance computing facilities with much greater quantities of RAM, but it seems likely that time would be a factor even with sufficient space. This is a consequence of the Apriori algorithm being exponentially complex in the attribute space.
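To give a sense of the scale involved, the short sketch below counts the distinct itemsets of bounded size that exist over a given number of items. It is an upper bound on the search space only: Apriori prunes candidates using the minimum support, so the number it actually enumerates depends on the data. Treating each shapelet as a single item is an assumption made for the illustration, and `candidate_itemsets` is a hypothetical helper.

```python
from math import comb


def candidate_itemsets(n_items, max_size):
    """Count the non-empty itemsets of size <= max_size over n_items items."""
    return sum(comb(n_items, k) for k in range(1, max_size + 1))


# Growth of the unpruned search space for 912 items versus 20 items.
for size in (1, 2, 3, 4):
    print(size, candidate_itemsets(912, size), candidate_itemsets(20, size))
```

With 20 items the full space holds 2^20 - 1 itemsets, which is small enough to enumerate; with 912 items the count at even modest itemset sizes is already far larger.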
Because of these problems, we do not make use of Apriori’s built-in constraints to deal with high-dimensionality data. Instead, we experiment with three methods of reducing the dimensionality of the data: truncation, class-specific truncation, and clustering. For each of these methods, we reduce the dimensionality to 20 shapelets, a size that is tractable for every problem we use.
The first dimensionality-reduction method is based on truncating the shapelet data to the first 20 shapelets. The attributes in the binary shapelet data are ordered, with the first shapelet being the most discriminative. To create the truncated data, we first ensure that at least one shapelet from each class is included. Then we add those shapelets that are higher in the order until 20 shapelets have been included.
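The truncation procedure can be sketched as below. This is an illustration under stated assumptions rather than the code used in the experiments: shapelets are represented only by the class each one comes from, listed in rank order (index 0 is the most discriminative), and the shapelet kept for each class in the first pass is assumed to be the highest-ranked one of that class. The name `truncate_shapelets` is hypothetical.

```python
def truncate_shapelets(shapelet_classes, limit=20):
    """Return the sorted indices of the shapelets retained after truncation.

    shapelet_classes[i] is the class of the i-th ranked shapelet.
    """
    if len(set(shapelet_classes)) > limit:
        raise ValueError("more classes than retained shapelets")

    chosen = []
    seen_classes = set()
    # Pass 1: keep the highest-ranked shapelet of every class.
    for idx, cls in enumerate(shapelet_classes):
        if cls not in seen_classes:
            seen_classes.add(cls)
            chosen.append(idx)

    # Pass 2: fill the remaining slots with the highest-ranked leftovers.
    chosen_set = set(chosen)
    for idx in range(len(shapelet_classes)):
        if len(chosen) >= limit:
            break
        if idx not in chosen_set:
            chosen.append(idx)
            chosen_set.add(idx)

    return sorted(chosen)


# Toy example with six ranked shapelets, three classes, and four slots.
print(truncate_shapelets(["a", "a", "a", "b", "a", "c"], limit=4))
```

In the toy example the shapelets at ranks 3 and 5 are kept ahead of better-ranked ones because they are the only representatives of their classes.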
The second dimensionality-reduction method we test involves keeping only those shapelets that correspond to the class we are attempting to predict. Again, we restrict the data to 20 shapelets.
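A minimal sketch of class-specific truncation follows. The text does not state which 20 shapelets of the target class are kept, so this example assumes the highest-ranked ones; the function name `class_specific_shapelets` and the rank-ordered list of shapelet classes are likewise assumptions for illustration.

```python
def class_specific_shapelets(shapelet_classes, target_class, limit=20):
    """Return indices of the first `limit` shapelets belonging to target_class.

    shapelet_classes[i] is the class of the i-th ranked shapelet
    (index 0 is the most discriminative).
    """
    matching = [i for i, cls in enumerate(shapelet_classes) if cls == target_class]
    return matching[:limit]


# Toy example: keep at most two shapelets of class "a".
print(class_specific_shapelets(["a", "b", "a", "a"], "a", limit=2))
```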
The final approach we try uses our existing clustering method to reduce the dimensionality of the data. The datasets are clustered using MDLStopCE clustering (see Section 5.6.2), a method that does not require any parameters. To reduce the dimensionality of the datasets with more than 20 shapelets, we enforce hierarchical clustering until there are twenty clusters, and select the best shapelet from each cluster to represent the cluster. We perform the binary transform on the data using the class transform approach, and use the correlation filter to remove any attributes that are entirely positively or negatively correlated.
We compare the three dimensionality-reduction methods using a Friedman test at a significance level of 0.01. The results are shown in Figure 6.5. There is no significant difference between the three methods of reducing dimensionality. We continue our experiments and evaluation using truncation, as it is the simplest of the three methods.
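The average ranks plotted in a critical-difference diagram, and the Friedman statistic derived from them, can be computed as in the sketch below. It is a plain-Python illustration, not the code used for the experiments: it uses the standard chi-square form of the statistic without a tie correction, and the function names and toy scores are assumptions. The statistic would then be compared against a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(scores):
    """Average rank of each method; scores[i][j] is method j's F1 on case i.

    Rank 1 is the highest score, and tied methods share the mean rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += mean_rank
            pos = end + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Toy scores for three methods on three cases (not results from the thesis).
toy = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]
print(average_ranks(toy), friedman_statistic(toy))
```

For k methods the average ranks always sum to k(k + 1) / 2, which is a quick sanity check on a reported diagram.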
We test the performance of nugget discovery on truncated data by comparing the F1 values with those of the ensemble (the classes we use are all poorly predicted by the ensemble). We use a Wilcoxon Signed Rank test with a significance level of 0.01. The test shows that there is no significant difference between the performance of the ensemble and the performance of the rule set.
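For readers unfamiliar with the test, the sketch below computes the two rank sums on which the Wilcoxon Signed Rank test is based. It is a plain-Python illustration only: zero differences are dropped, tied absolute differences share the mean rank, and no p-value is computed. The function name and the toy values are assumptions, not data from the experiments.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    pos = 0
    while pos < len(ordered):
        end = pos
        while end + 1 < len(ordered) and abs(ordered[end + 1]) == abs(ordered[pos]):
            end += 1
        for i in range(pos, end + 1):
            ranks[i] = (pos + end) / 2 + 1            # mean rank for ties
        pos = end + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Toy paired scores: first sequence minus second sequence.
print(wilcoxon_signed_rank([5, 3, 9, 2], [4, 6, 9, 0]))
```

The smaller of the two sums is the usual test statistic; similar sums, as in the toy example, indicate no systematic advantage for either method.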
[Figure: critical-difference (CD) diagram on a rank axis from 3 to 1. Average ranks: Truncated 1.7258, Class Specific 2.0645, Clustered 2.2097.]
Figure 6.5: Critical-difference diagram comparing three methods of reducing dimensionality in terms of the F1 score of the ensemble on the medium-dimensionality datasets reduced using each method. There is no significant difference between the methods.
Figure 6.6 shows the differences sorted by the original F1 score of the ensemble on that class. We see that nugget discovery performs better than the ensemble (indicated by negative values) where the initial F1 score is very poor. As the performance of the ensemble increases, the performance of nugget discovery on truncated data decreases. Interestingly, the relationship is stronger when the absolute performance of the ensemble is considered, rather than the performance of the ensemble relative to the base incidence of the class in the training data.
These findings suggest that nugget discovery on truncated data, despite not being significantly better than the ensemble, may still be useful in situations where performance is especially poor. A good example of this is the Earthquakes dataset, where the ensemble performs very poorly on class 1, achieving an F1 score of only 0.0541, while nugget discovery performs very well, scoring 0.444. In cases like this, nugget discovery may be a useful way to improve predictive accuracy on a minority class, even where the dimensionality of the shapelet data has been severely restricted (in the case of Earthquakes, from 2807 shapelets to 20).
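The per-class quantity being compared can be sketched as follows. The F1 formula is the standard one; the sign convention (ensemble minus rule set, so that negative values favour nugget discovery) follows the description of Figure 6.6. The function names are hypothetical, and the confusion counts in the usage lines are toy values, not results from the experiments.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_difference(ensemble_f1, rule_set_f1):
    """Ensemble F1 minus rule-set F1; negative means the rule set is better."""
    return ensemble_f1 - rule_set_f1


# Toy confusion counts for a single class.
print(f1_score(tp=2, fp=1, fn=1))

# Earthquakes class 1, using the two F1 scores reported in the text.
print(round(f1_difference(0.0541, 0.444), 4))
```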
[Figure: plot with axes "F1 of Ensemble" and "Difference in F1 between Ensemble and Rule Set".]
Figure 6.6: Differences in F1 between ensemble and nugget discovery on poorly predicted classes of medium-dimensionality data, sorted by the F1 of the ensemble on that class.
shapelet data is to restrict it to cases where the performance of the ensemble is very poor in absolute terms, as nugget discovery is unlikely to offer any greater accuracy than the ensemble if the accuracy is not very low, even if the ensemble is performing poorly relative to the incidence of the class in the training data. For the datasets we examine, no classes on which the ensemble has an F1 greater than 0.5 benefit from nugget discovery, and the general trend is an increasing improvement offered by nugget discovery as the ensemble F1 score decreases.