
The performance of a classifier is defined as inversely related to the number of classification errors it produces. The performance of a solution to a classification problem does not depend solely on how well the classifier fits the problem: besides choosing the right classifier, there are techniques that augment the performance of a given classifier.

This section discusses two such augmentation techniques, namely bagging [10] and boosting [18]. Both rest on the principle of taking multiple classifiers or classifier instances and having them vote on the class of an instance. The majority decides the class.

Bagging (an acronym for bootstrap aggregating) uses an unweighted vote: every classifier’s vote contributes equally to the choice of class for the instance. Bagging can be used with a multitude of classifiers, but achieves the best performance when used with a classifier that itself performs well, as tested with the C4.5/J48 tree classifier mentioned before [10].

The bagging technique trains multiple classifiers, each on a subset of the training data, and then aggregates their results, thereby pulling itself up “by the bootstraps”. More formally, the procedure starts with a learning set L, from which k subsets L_k are drawn as bootstrap samples (that is, sampled from L with replacement). For each of these subsets a predictor φ(x, L_k) is trained, which predicts the class y from the input x (predictor is used synonymously with classifier). Each predictor then votes on the class y, and the class with the most votes is chosen as the label for x.
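The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation evaluated in [10]: the base learner `train_stump` is a hypothetical one-feature threshold rule chosen only to keep the example self-contained, and the function names, the toy data and the seed are all made up for the sketch.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one subset L_k)."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical base learner: a threshold rule on a single feature.

    Tries every observed x as a threshold with every pair of labels and
    keeps the rule that makes the fewest errors on the sample.
    """
    labels = sorted({y for _, y in sample})
    best = None
    for threshold, _ in sample:
        for low, high in ((a, b) for a in labels for b in labels):
            errors = sum((low if x <= threshold else high) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def bagging_fit(data, k, seed=0):
    """Train k predictors, each on its own bootstrap sample of data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(k)]


def bagging_predict(predictors, x):
    """Unweighted vote: every predictor counts once, the majority wins."""
    votes = Counter(p(x) for p in predictors)
    return votes.most_common(1)[0][0]


# Toy learning set L of (x, y) pairs, invented for this sketch.
L = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (5.0, "b")]
ensemble = bagging_fit(L, k=11)
prediction = bagging_predict(ensemble, 0.0)
```

An odd k is used here so that a two-class vote cannot end in a tie; how ties are broken in general is left open in this sketch.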

The bagging procedure performs best when the underlying classifiers are unstable, that is, when small changes in the training data lead to different results, because this makes it possible to filter out bad results by voting. Breiman describes the technique as “making a silk purse out of a sow’s ear”: it can take a reasonably performant classifier and turn it into one that performs remarkably well. A downside of the technique is that it obscures its inner workings, as the underlying classifiers that produce the result are hidden from view.

Boosting is a technique similar to bagging, in the sense that it uses a vote on the class y of an instance. However, it differs in that in boosting,

Chapter 5. Literature Research

different weights are assigned to classifiers. In essence, boosting algorithms give more weight to the training instances that are more difficult to classify.

A well-known boosting algorithm called AdaBoost was created by Freund and Schapire, as described in [17] [18]. It takes a weak learning classifier (one that does not perform well on a training set) and “boosts” its results. The algorithm, more specifically the variant for two or more possible classes, AdaBoost.M2 [18], takes the steps below to reach a hypothesis on the class y belonging to some instance x.

1. Use the classifier to give a vector in [0,1]^k of class probabilities for the k classes, by feeding it a distribution over pairs of an example and an incorrect label. This distribution is called a mislabel distribution, which Freund and Schapire define as a distribution D_t over the set B of pairs (i, y), where i is a training example and y a class incorrectly associated with that example (formally, B = {(i, y) : i ∈ {1, ..., m}, y ≠ y_i}, for a total number of instances m). In D_t, weights are assigned to each mislabeled pair, where higher weights stand for more difficult examples.

2. The classifier produces a hypothesis, a function X × Y → [0,1] over the pairings, where 1 stands for high probability and 0 for low. If the classifier gives a value near one for the pairing of instance i with its correct class label y_i and a value near zero for the pairing of that instance with a mislabel y, it concludes that the correct classification y_i for that instance is plausible. If, conversely, the value for (i, y_i) is near zero and the value for the incorrect pairing is near one, then according to the weak classifier the instance should be labelled with the incorrect class label y. Values around 0.5 indicate that the classifier evaluates the incorrect and correct pairings as equally plausible.

3. The next step is to determine the pseudo-loss of the hypothesis that was produced by the classifier, a measure proposed by Freund and Schapire. The thought behind the pseudo-loss is that it emphasizes important misclassifications by using the (weighted) mislabel distribution D_t. The lower the pseudo-loss is, the better the hypothesis addresses the instances that are difficult to classify.

4. The weights of the instances that were classified in the hypothesis are updated, which yields the distribution D_{t+1} from D_t. Misclassified instances get a higher weight than correctly classified instances, thereby realizing the principle of the boosting algorithm: the predictor that is good at classifying the difficult cases has a stronger vote.

The output of the algorithm (that is, instance-class label pairings) is computed by finding the y which has the maximum weighted average value among the T hypotheses for an instance x. In this computation, hypotheses with high pseudo-loss are given little weight in determining the final hypothesis.
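Steps 3 and 4 and the final vote can be made concrete with a short Python sketch of one boosting round. The formulas for the pseudo-loss, the factor `beta` and the weight update are taken from Freund and Schapire's description of AdaBoost.M2 [18] and are not spelled out in the text above. The function names and the toy numbers are invented for illustration, and hypotheses are indexed by the example number i rather than by the instance x_i to keep the data structures simple.

```python
import math


def pseudo_loss(D, h, labels):
    """Pseudo-loss of hypothesis h under the mislabel distribution D.

    D maps each pair (i, y) with y != labels[i] to its weight.
    h maps each pair (i, y) to a value in [0, 1].
    """
    return 0.5 * sum(
        w * (1.0 - h[(i, labels[i])] + h[(i, y)]) for (i, y), w in D.items()
    )


def update_distribution(D, h, labels, beta):
    """Compute D_{t+1}: with beta < 1, pairs that h handled well shrink most."""
    raw = {
        (i, y): w * beta ** (0.5 * (1.0 + h[(i, labels[i])] - h[(i, y)]))
        for (i, y), w in D.items()
    }
    z = sum(raw.values())  # normalisation so the weights sum to one
    return {pair: w / z for pair, w in raw.items()}


def final_label(hypotheses, betas, i, classes):
    """Pick the class with the largest log(1/beta)-weighted total value."""
    def score(y):
        return sum(math.log(1.0 / b) * h[(i, y)] for h, b in zip(hypotheses, betas))
    return max(classes, key=score)


# Toy data (hypothetical): example 0 has true label "a", example 1 has "b".
labels = ["a", "b"]
classes = ["a", "b", "c"]
# Uniform mislabel distribution over B = {(i, y) : y != labels[i]}.
B = [(i, y) for i in range(len(labels)) for y in classes if y != labels[i]]
D = {pair: 1.0 / len(B) for pair in B}
# This hypothesis handles example 0 well and example 1 poorly.
h = {(0, "a"): 0.9, (0, "b"): 0.1, (0, "c"): 0.0,
     (1, "a"): 0.6, (1, "b"): 0.4, (1, "c"): 0.0}

eps = pseudo_loss(D, h, labels)
beta = eps / (1.0 - eps)
D_next = update_distribution(D, h, labels, beta)
```

In this toy round the pair (1, "a"), the mislabel that the hypothesis found most plausible, ends up with the largest weight in `D_next`, so the next weak classifier is pushed towards that difficult example.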

Performance. The boosting algorithm AdaBoost needs a weak classifier that performs (slightly) better than random [18]. In terms of the algorithm, a random classifier would give every pairing (i, y) the same likelihood, i.e. 0.5.
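As a worked check of the 0.5 figure, the pseudo-loss of such a random classifier can be computed directly. The definition used here is the one given by Freund and Schapire [18]; the symbols ε_t (pseudo-loss), h_t (hypothesis in round t) and β_t follow that paper and are assumptions with respect to the notation of this section.

```latex
\begin{align*}
\epsilon_t &= \tfrac{1}{2} \sum_{(i,y) \in B} D_t(i,y)\,\bigl(1 - h_t(x_i, y_i) + h_t(x_i, y)\bigr) \\
           &= \tfrac{1}{2} \sum_{(i,y) \in B} D_t(i,y)\,\bigl(1 - \tfrac{1}{2} + \tfrac{1}{2}\bigr)
            = \tfrac{1}{2} \sum_{(i,y) \in B} D_t(i,y) = \tfrac{1}{2}
\end{align*}
```

A pseudo-loss of 1/2 gives β_t = ε_t / (1 − ε_t) = 1, so the hypothesis receives weight log(1/β_t) = 0 in the final vote. A weak classifier only contributes when its pseudo-loss is below 1/2, which is the sense in which it must be slightly better than random.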

5.1. Correlation induction

The “boost” achieved by this algorithm has been shown experimentally [18] to be better than that of bagging in most cases when combined with a weak classifier. With a strong classifier (one that itself has good performance), the results of bagging and boosting are comparable. Furthermore, using the boosting algorithm in combination with a strong classifier still makes sense, as it increases performance in most cases, although it hurts performance in a few classification problems.

A simpler variant of the AdaBoost algorithm called AdaBoost.M1 (which differs from M2 in its error measure: it does not use pseudo-loss) shows more consistent but on average lower performance than M2. The combination of AdaBoost with a weak classifier has performance comparable to that of strong classifiers such as C4.5 [33].
