
MATERIAL AND METHODOLOGY

6.1 Description of the technologies used during the experiments.

Multi-class classification refers to problems with more than two classes. These classification algorithms are investigated as a means to identify a faulty sensor once a fault has been detected. Whilst some of the binary classifiers discussed in section 3.5 are inherently multi-class, others need assistance to extend to k > 2 classes. There are two distinct methodologies by which binary classification algorithms are extended to multi-class applications, namely the One-Versus-All and One-Versus-One transformations. All the binary classifiers from section 3.5 are capable of multi-class classification when using the scikit-learn machine learning library [43]. This section presents and discusses the various strategies used to extend the binary classifiers to a multi-class setting.

Classifier Name | Transformation Strategy
Naïve Bayes | Inherently Multi-class
Decision Tree | Inherently Multi-class
k-Nearest Neighbors | Inherently Multi-class
Logistic Regression | One-Vs-All
Support Vector Machine | One-Vs-One

Table 3.4: Summary of multi-class classifiers.

3.6.1 Inherently Multi-class

3.6.1.1 Overview

The Naïve Bayes, Decision Tree and k-Nearest Neighbors classifiers are all inherently multi-class. This section gives a brief overview of their multi-class operation and shows how the equations that govern them apply to a multi-class setting.

Naïve Bayes

The multi-class Naïve Bayes algorithm is a direct extension of its binary counterpart: the projections of the conditional probability distributions onto the respective feature axes are simply estimated for each of the k classes. A new data point is assigned to the class that maximises the conditional probability given by equation 3.5.16.
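Because equation 3.5.16 is not reproduced in this section, the sketch below assumes the common Gaussian variant, in which each feature's class-conditional distribution is a normal density. It shows that moving from two to k classes only means estimating one prior and one set of per-feature distributions per class, then taking the class with the largest posterior. The function names `fit_gaussian_nb` and `predict_gaussian_nb` and the toy data are hypothetical; this is not the scikit-learn implementation used in the experiments.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature Gaussian (mean, variance) for each of the k classes."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n_c = len(rows)
        means = [sum(col) / n_c for col in zip(*rows)]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n_c, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n_c / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior plus summed per-feature log likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


X = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.9], [9.0, 1.0], [9.2, 1.1]]
y = [0, 0, 1, 1, 2, 2]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [5.0, 5.0]))  # → 1
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied; the shared evidence term is dropped since it does not change the arg max.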

Decision Tree

…of the split across all the classes. The Gini impurity, equation 3.5.17, can be expanded to account for k classes, denoted by the set $C = \{C_1, C_2, \ldots, C_k\}$, as follows:

$$H_G(C) = 1 - \sum_{i=1}^{k} p(C_i)^2 \tag{3.6.1}$$

where $p(C_i)$ is the class probability.
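As a worked check of equation 3.6.1, the short sketch below estimates each $p(C_i)$ from the class counts in a node. The function name `gini_impurity` and the example node are illustrative assumptions. For a node holding 4, 4 and 2 points of three classes, $H_G = 1 - (0.4^2 + 0.4^2 + 0.2^2) = 0.64$, while a pure node scores 0.

```python
from collections import Counter


def gini_impurity(labels):
    """H_G(C) = 1 - sum_i p(C_i)^2, with p(C_i) estimated from the class counts in the node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


node = ["C1"] * 4 + ["C2"] * 4 + ["C3"] * 2   # p = 0.4, 0.4, 0.2
print(gini_impurity(node))                     # approximately 1 - (0.16 + 0.16 + 0.04) = 0.64
print(gini_impurity(["C1"] * 10))              # a pure node has zero impurity
```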

Similarly, the entropy impurity, equation 3.5.18, can be rewritten as:

$$H_E(C) = -\sum_{i=1}^{k} p(C_i)\,\log p(C_i) \tag{3.6.2}$$
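A minimal sketch of equation 3.6.2 follows. The source does not state the logarithm base, so base 2 (impurity in bits) is an assumption here; another base only rescales the value. Summing only over classes present in the node sidesteps the $0 \log 0$ terms, which are conventionally taken as 0. The name `entropy_impurity` is hypothetical.

```python
import math
from collections import Counter


def entropy_impurity(labels, base=2.0):
    """H_E(C) = -sum_i p(C_i) log p(C_i); only classes present in the node are summed,
    so the 0 * log(0) terms (taken as 0) never need to be evaluated."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(labels).values())


print(entropy_impurity(["C1", "C2", "C3"]))   # uniform over k = 3 classes: log2(3), about 1.585
print(entropy_impurity(["C1"] * 5))           # pure node: zero impurity
```

The uniform node gives the largest possible value, $\log_2 k$, which grows with the number of classes, unlike the Gini impurity whose maximum $1 - 1/k$ stays below 1.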

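Using either impurity measure, the information gain, equation 3.5.19, which is used to assess the quality of the split between the parent $S$ and the children $(S_L, S_R)$ in a multi-class problem, is given by:

$$\Delta IG(S) = H_{*}(C) - p(S_L)\,H_{*}(C_L) - p(S_R)\,H_{*}(C_R) \tag{3.6.3}$$

where $H_{*}$ denotes either the Gini impurity $H_G$ or the entropy impurity $H_E$, and $p(S_L)$ and $p(S_R)$ are the fractions of the parent's data points sent to the left and right child.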
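The sketch below evaluates equation 3.6.3 for one candidate split, using the entropy impurity (base 2, an assumption) by default; any impurity function taking a list of labels could be passed instead. Names and data are illustrative only. Splitting a balanced three-class parent into one pure child and one evenly mixed two-class child gives $\log_2 3 - \tfrac{8}{12}\cdot 1 \approx 0.918$.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity H_E (base-2 logarithm) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Delta IG(S) = H(C) - p(S_L) H(C_L) - p(S_R) H(C_R), where p(S_L) and p(S_R)
    are the fractions of the parent's data points sent to each child."""
    if len(left) + len(right) != len(parent):
        raise ValueError("the children must partition the parent node")
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


parent = ["C1"] * 4 + ["C2"] * 4 + ["C3"] * 4
left = ["C1"] * 4                     # pure child
right = ["C2"] * 4 + ["C3"] * 4       # two classes, evenly mixed
print(information_gain(parent, left, right))   # log2(3) - (8/12) * 1, about 0.918
```

During tree growth, this quantity would be computed for every candidate split and the one with the largest gain kept.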

Once the tree is fully grown, each leaf node represents one of the k classes.

k-Nearest Neighbors

The k-Nearest Neighbors algorithm is intuitive, and its operation remains the same regardless of the number of classes: an unlabeled data point is still assigned to the class that receives the majority vote among its nearest neighbors.
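To illustrate that nothing changes when there are more than two classes, the sketch below implements a plain majority vote with Euclidean distance. To avoid a clash with k, the number of classes, the neighbor count is called `n_neighbors` here; that name, the function `knn_predict` and the data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, n_neighbors=3):
    """Assign x to the majority class among its n_neighbors nearest training points.

    Euclidean distance is used; the number of classes plays no role in the procedure."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [9, 0], [9, 1], [10, 0]]
y_train = ["C1", "C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3"]
print(knn_predict(X_train, y_train, [5.5, 5.5]))   # → C2
```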

3.6.1.2 Advantages and Disadvantages

The advantage of applying inherently multi-class algorithms is that only a single classifier needs to be established, with performance comparable to that of its binary counterpart.

3.6.2 One-Versus-All

3.6.2.1 Overview

The One-Versus-All approach establishes k binary classifiers, each trained to identify a distinct class. This is achieved by assigning the training data belonging to that class a positive label and grouping the remaining k − 1 classes together under a separate negative label. The target labels assigned to the i-th classifier are given by:

$$y_i = \begin{cases} +1 & \text{if } y = i \\ -1 & \text{otherwise} \end{cases} \tag{3.6.4}$$
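Equation 3.6.4 amounts to a simple relabelling of the training targets, repeated once per class. A minimal sketch, with the hypothetical helper name `one_vs_all_targets`:

```python
def one_vs_all_targets(y, i):
    """Targets for the i-th binary classifier (equation 3.6.4): +1 if y == i, else -1."""
    return [1 if label == i else -1 for label in y]


y = [1, 2, 3, 1, 3]
for i in (1, 2, 3):
    print(i, one_vs_all_targets(y, i))
```

Each of the k classifiers sees the same inputs; only the targets differ.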

[Figure: three panels, Class 1 vs. Rest, Class 2 vs. Rest and Class 3 vs. Rest, each with the target class labelled +1 and the remaining classes labelled -1.]

Figure 3.13: The One-Versus-All approach to multi-class classification for the example dataset where k = 3 establishes separate Logistic Regression binary classifiers, with decision boundaries indicated by the dashed lines. Each binary classifier is trained to identify one class from the remainder of the dataset.

The separate binary classifiers are combined by assigning the new data point the label of the class that maximises the probability function:

$$P(y = i \mid \vec{x}, \vec{w}_i, b_i) = \frac{1}{1 + e^{-(\vec{w}_i^{T}\vec{x} + b_i)}} \tag{3.6.5}$$

where $\vec{w}_i$ and $b_i$ are the model parameters of the i-th classifier.
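The sketch below puts the One-Versus-All pieces together: one logistic regression per class, trained on the relabelled targets, then combined by taking the class with the largest probability from equation 3.6.5. It is not the scikit-learn implementation used in the experiments. The training procedure (plain batch gradient descent on the log-loss), the learning rate, the epoch count, all function names and the toy triangle-shaped dataset are assumptions. The -1 targets are mapped to 0 because that is the form the log-loss expects.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_logreg(X, t, lr=0.5, epochs=3000):
    """Fit w, b by batch gradient descent on the log-loss; t holds 1 (positive) or 0 (negative)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, t):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_one_vs_all(X, y):
    """Train one binary classifier per class: +1 targets map to 1, -1 targets to 0."""
    return {c: train_binary_logreg(X, [1.0 if label == c else 0.0 for label in y])
            for c in sorted(set(y))}


def predict_one_vs_all(models, x):
    """Pick the class whose classifier gives the largest P(y = i | x, w_i, b_i), eq. 3.6.5."""
    def prob(c):
        w, b = models[c]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=prob)


X = [[0, 0], [0.5, 0], [0, 0.5], [4, 0], [3.5, 0], [4, 0.5], [2, 4], [2.5, 4], [2, 3.5]]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
models = fit_one_vs_all(X, y)
print([predict_one_vs_all(models, q) for q in ([0.2, 0.2], [3.8, 0.2], [2.2, 3.8])])  # → [1, 2, 3]
```

Taking the arg max over the k probabilities, rather than thresholding each at 0.5, guarantees exactly one label per point even where the individual classifiers disagree or all reject it.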

[Figure: decision regions labelled Class 1, Class 2 and Class 3.]

Figure 3.14: The separate Logistic Regression binary classifiers, indicated by the dashed lines, are combined using equation 3.6.5 to establish the multi-class decision boundary, indicated by the solid lines and shaded regions.

3.6.2.2 Advantages and Disadvantages

The One-Versus-All approach is generic in the sense that it can be applied to any binary classification algorithm. Its first drawback is that the computational and memory requirements are higher, since k binary classifiers must be trained and stored. The second distinct disadvantage is that the individual binary classifiers are biased, because each is trained on an unbalanced dataset containing more negative than positive training data points.
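The imbalance follows directly from the relabelling: even when the original k classes are perfectly balanced, each binary classifier sees positives and negatives in a ratio of 1 : (k − 1). A small sketch with the hypothetical helper `ova_class_balance`:

```python
from collections import Counter


def ova_class_balance(y):
    """For every One-Versus-All classifier, return (positive count, negative count)."""
    n = len(y)
    return {c: (m, n - m) for c, m in Counter(y).items()}


y = ["C1"] * 10 + ["C2"] * 10 + ["C3"] * 10    # a perfectly balanced 3-class dataset
print(ova_class_balance(y))   # → {'C1': (10, 20), 'C2': (10, 20), 'C3': (10, 20)}
```

With k = 10 balanced classes the ratio would already be 1 : 9.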

3.6.3 One-Versus-One

3.6.3.1 Overview

The One-Versus-One approach establishes $\frac{1}{2}k(k-1)$ binary classifiers, where each individual classifier is trained on a different pair of classes from the original training set. These individual classifiers are combined to form a single classifier. This is an ensemble approach to multi-class classification, in which a majority vote over the binary classifiers is used to classify new data points.
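The sketch below trains one classifier per unordered pair of classes and classifies by majority vote. To keep it short and dependency-free, a nearest-centroid rule is swapped in as a hypothetical stand-in for the linear SVM of Figure 3.15; any function that trains on two-class data and returns a predictor could be passed as `train_binary`. All names and the toy dataset are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def train_nearest_centroid(X, y):
    """Hypothetical stand-in for a binary SVM: predicts the label of the nearer class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda label: math.dist(centroids[label], x))


def fit_one_vs_one(X, y, train_binary=train_nearest_centroid):
    """Train one binary classifier per unordered pair of classes, k(k-1)/2 in total."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pair = [(x, lab) for x, lab in zip(X, y) if lab in (a, b)]
        models[(a, b)] = train_binary([x for x, _ in pair], [lab for _, lab in pair])
    return models


def predict_one_vs_one(models, x):
    """Majority vote over the pairwise classifiers; ties go to the class that was voted for first."""
    votes = Counter(model(x) for model in models.values())
    return votes.most_common(1)[0][0]


X = [[0, 0], [0.5, 0], [0, 0.5], [4, 0], [3.5, 0], [4, 0.5], [2, 4], [2.5, 4], [2, 3.5]]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
models = fit_one_vs_one(X, y)
print(len(models))                              # 3 pairs for k = 3
print(predict_one_vs_one(models, [2.2, 3.8]))   # → 3
```

With three classes a three-way tie (one vote each) is possible; this sketch simply keeps the first class voted for, whereas a practical implementation might break ties using classifier confidence.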

[Figure: three panels, Class 1 vs. Class 2, Class 1 vs. Class 3 and Class 2 vs. Class 3, each showing the new data point A.]

Figure 3.15: The One-Versus-One approach to multi-class classification for the example dataset where k = 3 establishes separate SVM binary classifiers using a linear kernel. Point A is a new data point that will be assigned to Class 3 by majority vote.

3.6.3.2 Advantages and Disadvantages

The number of binary classifiers required by the One-Versus-One approach grows quadratically with the number of classes k, which makes the approach computationally intensive when k is large. Its advantage over the One-Versus-All approach is that each binary classifier is trained on an unbiased training set, since it only sees the data of the two classes it separates rather than one class against all the others.
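The growth in classifier count can be made concrete with a few values of k. The helper `n_classifiers` below is hypothetical. Both strategies need three classifiers for k = 3, as in Figures 3.13 and 3.15, but One-Versus-One already needs 190 for k = 20 compared with 20 for One-Versus-All. Each One-Versus-One classifier is, however, trained on only two classes' worth of data.

```python
def n_classifiers(k):
    """Binary classifiers needed by each strategy for k classes."""
    return {"one_vs_all": k, "one_vs_one": k * (k - 1) // 2}


for k in (3, 5, 10, 20):
    print(k, n_classifiers(k))
```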