MATERIALS AND METHODOLOGY
6.1 Description of the technologies used during the experiments.
Multi-class classification refers to problems with more than two classes. These classification algorithms are investigated as a means to identify a faulty sensor once a fault has been detected. Whilst some of the binary classifiers discussed in section 3.5 are inherently multi-class, others need assistance to expand to k > 2 classes. There are two distinct methodologies by which binary classification algorithms are expanded for multi-class applications, namely the One-Versus-All and One-Versus-One transformations. All of the binary classifiers from section 3.5 are capable of multi-class classification when using the scikit-learn machine learning library [43]. This section presents and discusses the strategies used to expand the binary classifiers to a multi-class setting.
Classifier Name          Transformation Strategy
----------------------   -----------------------
Naïve Bayes              Inherently Multi-class
Decision Tree            Inherently Multi-class
k-Nearest Neighbors      Inherently Multi-class
Logistic Regression      One-Vs-All
Support Vector Machine   One-Vs-One
Table 3.4: Summary of multi-class classifiers.
3.6.1 Inherently Multi-class
3.6.1.1 Overview
The Naïve Bayes, Decision Tree and k-Nearest Neighbors classifiers are all inherently multi-class. The section below gives a brief overview of their multi-class operation and shows how the equations that govern them apply to a multi-class setting.
Naïve Bayes
The multi-class Naïve Bayes algorithm is a direct extension of its binary counterpart: the projections of the conditional probability distributions onto the respective feature axes are estimated for each of the k classes. A new data point is assigned to the class that maximises the conditional probability given by equation 3.5.16.
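Equation 3.5.16 is not reproduced in this section, so the following is only a minimal Python sketch. It assumes Gaussian per-feature likelihoods (Gaussian Naïve Bayes), and the function names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical. It illustrates that the binary estimates are repeated once per class and that prediction becomes an argmax over all k classes.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature Gaussian (mean, variance) for every class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n_c = len(rows)
        means = [sum(col) / n_c for col in zip(*rows)]
        # A small floor keeps the variance positive for constant features.
        variances = [
            max(sum((v - mu) ** 2 for v in col) / n_c, 1e-9)
            for col, mu in zip(zip(*rows), means)
        ]
        model[label] = (n_c / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Assign x to the class with the largest log posterior (up to a shared constant)."""

    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Naive assumption: features are independent given the class, so log-likelihoods add.
        for value, mu, var in zip(x, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var) + (value - mu) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Toy three-class dataset (hypothetical values, for illustration only).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
     [10.0, 0.1], [10.1, 0.0], [9.9, 0.2]]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [5.0, 5.0]))  # prints B
```

Nothing in the fitting step depends on k. Adding a class only adds one more entry to `model`.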
Decision Tree
A Decision Tree handles multiple classes by measuring the impurity of the split across all the classes. The Gini impurity equation 3.5.17 can be expanded to account for k classes, denoted by the subset $C = \{C_1, C_2, \ldots, C_k\}$, as follows:
$$H_G(C) = 1 - \sum_{i=1}^{k} p(C_i)^2 \tag{3.6.1}$$
where $p(C_i)$ is the class probability.
Similarly, the entropy impurity equation 3.5.18 can be rewritten as:
$$H_E(C) = -\sum_{i=1}^{k} p(C_i) \log p(C_i) \tag{3.6.2}$$
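Both multi-class impurity measures can be computed directly from a node's labels. The sketch below is a minimal Python version of equations 3.6.1 and 3.6.2. The function names are hypothetical, and the base-2 logarithm is an assumption because the text does not fix the base.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Empirical p(C_i) for every class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_impurity(labels):
    """H_G(C) = 1 - sum_i p(C_i)^2, equation (3.6.1)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy_impurity(labels):
    """H_E(C) = -sum_i p(C_i) log p(C_i), equation (3.6.2), assuming log base 2.

    Only classes that occur in the node are counted, so p > 0 and log(0) never arises.
    """
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


node = [1, 1, 2, 2, 3, 3]          # three equally represented classes (k = 3)
print(gini_impurity(node))         # 1 - 3 * (1/3)^2 = 2/3
print(entropy_impurity(node))      # log2(3), about 1.585
```

Both measures reach their maximum when the k classes are equally represented and drop to zero for a pure node.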
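Using either impurity measure, the information gain (equation 3.5.19) that assesses the quality of the split between the parent $S$ and the children $(S_L, S_R)$ in a multi-class problem is given by:
$$\Delta IG(S) = H(C) - p(S_L)\,H(C_L) - p(S_R)\,H(C_R) \tag{3.6.3}$$
where $H$ denotes either the Gini impurity $H_G$ or the entropy impurity $H_E$.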
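A worked version of equation 3.6.3 makes the weighting by $p(S_L)$ and $p(S_R)$ concrete. This Python sketch has some assumptions. `information_gain` is a hypothetical helper, $p(S_L)$ and $p(S_R)$ are taken as the fractions of the parent's points sent to each child, and entropy with a base-2 logarithm is the default impurity. Any other impurity function, such as a Gini function, can be passed in instead.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity over the classes present in the node (log base 2 assumed)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Delta IG(S) = H(C) - p(S_L) H(C_L) - p(S_R) H(C_R), equation (3.6.3)."""
    n = len(parent)
    p_left = len(left) / n      # fraction of the parent's points sent left
    p_right = len(right) / n    # fraction of the parent's points sent right
    return impurity(parent) - p_left * impurity(left) - p_right * impurity(right)


parent = [1, 1, 2, 2, 3, 3]
# Isolating class 1 gives a pure left child and a two-class right child:
# log2(3) - (2/6) * 0 - (4/6) * 1, about 0.918
print(information_gain(parent, [1, 1], [2, 2, 3, 3]))
```

A split that leaves both children with the same class mix as the parent yields zero gain. The tree-growing procedure therefore prefers splits that separate the classes.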
Once the tree is fully grown, each leaf node represents one of the k classes.
k-Nearest Neighbors
k-Nearest Neighbors is an intuitive algorithm whose operation is the same regardless of the number of classes: an unlabeled data point is still assigned to the class that holds the majority vote among its k nearest neighbors.
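The majority vote needs no change when more classes are added. As a minimal Python sketch, assuming Euclidean distance and a hypothetical helper `knn_predict`, the same code handles k = 2 or any larger number of classes.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Assign x to the majority class among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbours = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the class of the closer neighbour.
    return votes.most_common(1)[0][0]


X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
y_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(knn_predict(X_train, y_train, (5.5, 5.5)))  # prints 1
```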
3.6.1.2 Advantages and Disadvantages
The advantage of inherently multi-class algorithms is that they require only a single classifier, whose performance is comparable to that of its binary counterpart.
3.6.2 One-Versus-All
3.6.2.1 Overview
The One-Versus-All approach establishes k binary classifiers, each trained to identify one distinct class. This is achieved by assigning the training data belonging to that class a positive label and grouping the remaining k − 1 classes together under a single negative label. The target labels assigned to the i-th classifier are given by:
$$y_i = \begin{cases} +1 & \text{if } y = i \\ -1 & \text{otherwise} \end{cases} \tag{3.6.4}$$
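The relabelling in equation 3.6.4 produces one target vector per class. The following minimal Python sketch uses a hypothetical helper, `one_vs_all_targets`, to build all k of them from a single label list.

```python
def one_vs_all_targets(y):
    """Return {class i: target labels y_i} with +1 for class i and -1 otherwise (eq. 3.6.4)."""
    classes = sorted(set(y))
    return {i: [1 if label == i else -1 for label in y] for i in classes}


print(one_vs_all_targets([1, 2, 3, 1]))
# {1: [1, -1, -1, 1], 2: [-1, 1, -1, -1], 3: [-1, -1, 1, -1]}
```

Each of these target vectors is then used to train one binary classifier on the full set of training points.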
[Figure: three panels, Class 1 vs Rest, Class 2 vs Rest and Class 3 vs Rest, each labelling its class +1 and the rest −1]
Figure 3.13: The One-Versus-All approach to multi-class classification for the example dataset where k = 3 establishes separate Logistic Regression binary classifiers, with decision boundaries indicated by the dashed lines. Each binary classifier is trained to distinguish one class from the remainder of the dataset.
The separate binary classifiers are combined by assigning the new data point the label of the class that maximises the probability function:
$$P(y = i \mid \vec{x}, \vec{w}_i, b_i) = \frac{1}{1 + e^{-(\vec{w}_i^{T}\vec{x} + b_i)}} \tag{3.6.5}$$
where $\vec{w}_i$ and $b_i$ are the model parameters for the i-th classifier.
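The combination rule in equation 3.6.5 is an argmax over the k logistic outputs. The minimal Python sketch below assumes the per-class parameters $\vec{w}_i$ and $b_i$ have already been trained. The values in `weights` and `biases` are hypothetical placeholders rather than fitted results, and `ova_predict` is an illustrative name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def ova_predict(x, weights, biases):
    """Return the class i that maximises P(y=i | x, w_i, b_i), equation (3.6.5)."""
    scores = {
        i: sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + biases[i])
        for i, w in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical parameters for k = 3 two-feature classifiers (not trained values).
weights = {1: (-1.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
biases = {1: 0.0, 2: -2.0, 3: -2.0}
print(ova_predict((0.0, 5.0), weights, biases))  # prints 3
```

The sigmoid is monotonic, so the argmax of the probabilities coincides with the argmax of the raw scores $\vec{w}_i^{T}\vec{x} + b_i$. The probability form mainly helps interpretation.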
[Figure: combined decision regions for Class 1, Class 2 and Class 3]
Figure 3.14: The separate Logistic Regression binary classifiers, indicated by the dashed lines, are combined using equation 3.6.5 to establish the multi-class decision boundary, indicated by the solid lines and shaded regions.
3.6.2.2 Advantages and Disadvantages
The One-Versus-All approach is generic in the sense that it can be applied to any binary classification algorithm. Its first drawback is higher computational and memory requirements, as it requires k binary classifiers. Its second drawback is that the individual binary classifiers are biased, since each is trained on an imbalanced dataset that contains more negative than positive training data points.
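The imbalance appears even when the original dataset is perfectly balanced. Each One-Versus-All classifier sees one class as positive and the other k − 1 as negative. This small Python sketch, with the hypothetical helper `ova_positive_negative_counts`, counts the positives and negatives each classifier receives.

```python
from collections import Counter


def ova_positive_negative_counts(y):
    """For each One-Versus-All classifier, return (positives, negatives) in its training set."""
    counts = Counter(y)
    n = len(y)
    return {c: (counts[c], n - counts[c]) for c in sorted(counts)}


# A balanced three-class dataset with 10 points per class.
y = [0] * 10 + [1] * 10 + [2] * 10
print(ova_positive_negative_counts(y))
# {0: (10, 20), 1: (10, 20), 2: (10, 20)}
```

With k balanced classes, each classifier sees a positive-to-negative ratio of 1 : (k − 1), so the imbalance worsens as k grows.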
3.6.3 One-Versus-One
3.6.3.1 Overview
The One-Versus-One approach establishes $\frac{1}{2}k(k-1)$ binary classifiers, where each individual classifier is trained on a different pair of classes from the original training set. These individual classifiers are combined to form a single classifier. This is an ensemble approach to multi-class classification in which the majority vote is used to classify new data points.
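The pairwise training and voting can be sketched in a few lines of Python. To keep the example self-contained, each pairwise classifier here is a simple nearest-centroid rule. This is a stand-in, not the linear-kernel SVM of Figure 3.15, and `fit_one_vs_one` and `predict_one_vs_one` are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def fit_one_vs_one(X, y):
    """Train one nearest-centroid binary classifier per pair of classes: k(k-1)/2 in total."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        # Each pairwise model only sees the points of its own two classes.
        pts_a = [x for x, label in zip(X, y) if label == a]
        pts_b = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = (centroid(pts_a), centroid(pts_b))
    return models


def predict_one_vs_one(models, x):
    """Each pairwise classifier casts one vote; the class with most votes wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if math.dist(x, ca) <= math.dist(x, cb) else b] += 1
    # Ties go to the class that received its first vote earliest.
    return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = [1, 1, 2, 2, 3, 3]
models = fit_one_vs_one(X, y)
print(predict_one_vs_one(models, (9, 1)))  # prints 3 (two of the three votes)
```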
[Figure: three panels, Class 1 vs Class 2, Class 1 vs Class 3 and Class 2 vs Class 3, each showing Point A]
Figure 3.15: The One-Versus-One approach to multi-class classification for the example dataset where k = 3 establishes separate SVM binary classifiers using a linear kernel. Point A is a new data point that will be assigned to Class 3 by majority vote.
3.6.3.2 Advantages and Disadvantages
The number of binary classifiers required by the One-Versus-One approach grows quadratically with the number of classes k. This makes the approach computationally intensive when k is large. Its advantage over the One-Versus-All approach is that each binary classifier is trained on an unbiased training set, because it sees only the data points of its two classes rather than one class against all the others.
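The difference in scaling is simple arithmetic: One-Versus-All needs k classifiers, while One-Versus-One needs $\frac{1}{2}k(k-1)$. A one-line Python helper (hypothetical name `classifier_counts`) makes the comparison explicit.

```python
def classifier_counts(k):
    """Binary classifiers needed by One-Versus-All (k) and One-Versus-One (k(k-1)/2)."""
    # k(k-1) is always even, so integer division is exact.
    return {"one_vs_all": k, "one_vs_one": k * (k - 1) // 2}


for k in (3, 5, 10, 20):
    print(k, classifier_counts(k))
```

Both strategies need 3 classifiers for k = 3, matching Figures 3.13 and 3.15. For k = 20, One-Versus-One needs 190 pairwise classifiers against 20 for One-Versus-All, although each pairwise classifier is trained on a smaller subset of the data.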