where C is the parameter for the trade-off between the maximal margin and the minimal classification error.

Many approaches have been proposed for solving this optimization problem, such as Sequential Minimal Optimization [87], Stochastic Gradient Descent [125, 102] and Dual Coordinate Descent [56]. The typical software packages for training SVM include SVMLight [61], LibSVM [23] and LiblinearSVM [41].

We list the advantages of using SVM as the text classification algorithm.

1. Good generalization capability. SVM generalizes well because it tries to maximize the margin between the positive and the negative examples. Moreover, with the soft margin [26] introduced in the optimization equation, SVM is robust even against noisy data in the training set.

2. Effective text classification. Usually, the word vocabulary for text data is very large. The high dimensionality of text datasets often leads to the training data being linearly separable [61, 41]. Thus, SVM with a linear kernel (e.g., LiblinearSVM) is often very suitable for text classification tasks.

Due to the advantages mentioned above, SVM is a good choice for our task. Furthermore, previous work [60, 61, 20, 121] empirically verifies that SVM is superior to other classification algorithms such as Naive Bayes and KNN, in terms of text classification performance. Thus, we choose SVM, specifically linear SVM, as the base classification algorithm to classify webpages into search engines (see Chapter 3).

2.1.4 Evaluation Measures

In this subsection, we review methods to evaluate the classification performance of classifiers on the testing set. We first introduce a tool, called a confusion matrix [107], for performance analysis. The confusion matrix is a table with two rows and two columns (see Figure 2.2). Each cell in the table reports the number of predictions of one specific kind: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).


Each of the four statistics is computed based on the classifier prediction and the actual true classes. For example, given a classifier $f$ and a testing dataset $D_{test}$, TP can be computed as

$$TP = \sum_{i=1}^{|D_{test}|} \mathbb{1}\big(f(x_i) = 1 \text{ and } y_i = 1\big)$$

where $\mathbb{1}$ is the indicator function, taking the value 0 or 1, defined on the logic expression "$f(x_i) = 1$ and $y_i = 1$".

Figure 2.2: Confusion matrix.
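The four counts can be computed directly from the indicator-sum definition. The following is a minimal sketch, not code from the original work: the function name `confusion_counts`, the toy data, and the convention that any label other than 1 counts as negative are all assumptions made for illustration.

```python
def confusion_counts(predictions, labels):
    """Count (TP, FP, FN, TN) for binary classification.

    Assumption: the positive class is encoded as 1; any other value
    (e.g., 0 or -1) is treated as the negative class.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    # Each sum mirrors the indicator-function definition of the statistic.
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y != 1)
    fn = sum(1 for p, y in pairs if p != 1 and y == 1)
    tn = sum(1 for p, y in pairs if p != 1 and y != 1)
    return tp, fp, fn, tn


# Hypothetical toy testing set with six examples.
labels = [1, 1, 0, 0, 0, 1]
predictions = [1, 0, 0, 1, 0, 1]
print(confusion_counts(predictions, labels))  # (2, 1, 1, 2)
```

The four counts always sum to the size of the testing set, which is a quick sanity check on any implementation.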

The simplest performance measure is the accuracy, which is defined as the proportion of correctly classified examples in the testing set:

$$M_{accuracy}(f) = \frac{TP + TN}{TP + FP + FN + TN} \tag{2.11}$$

Although the definition of accuracy is very intuitive, it is rarely used in real-world text classification due to the class imbalance problem [58]. For example, given a spam classification dataset with only ten positive examples (i.e., spam mails) and 90 negative examples (i.e., normal mails), it is trivial to achieve 90% accuracy by predicting all examples as negative (i.e., normal mails). However, such a highly accurate classifier is useless for spam classification, because it never detects a single spam mail.
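The spam example can be checked numerically. This is a small sketch under the assumption that the trivial classifier labels all 100 mails as negative, so the ten spam mails become false negatives and the 90 normal mails become true negatives; the function name `accuracy` is illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of correctly classified examples, as in Equation (2.11)."""
    return (tp + tn) / (tp + fp + fn + tn)


# Trivial classifier: every mail is predicted as negative (normal).
# The 10 spam mails are all missed (FN), the 90 normal mails are all correct (TN).
all_negative = accuracy(tp=0, fp=0, fn=10, tn=90)
print(all_negative)  # 0.9
```

The score is 90% even though no spam mail is detected, which is exactly why accuracy alone is a poor measure on imbalanced data.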

To make a realistic evaluation of classification performance on imbalanced datasets, people have developed several more effective measures. The most popular measures are precision and recall.

$$M_{precision}(f) = \frac{TP}{TP + FP} \tag{2.12}$$

$$M_{recall}(f) = \frac{TP}{TP + FN} \tag{2.13}$$

We can see that precision is actually the ratio of correctly predicted positive examples over all positive predictions; and recall is actually the ratio of correctly predicted positive examples over all actual positive examples.

We use an example to show the difference among the three measures. Consider again the spam classification dataset with ten positive examples and 90 negative examples. We assume that a classifier f makes a confusion matrix as shown in Table 2.1.

Table 2.1: An example of a confusion matrix.

                      Actual positive    Actual negative
Predicted positive    TP = 1             FP = 3
Predicted negative    FN = 9             TN = 87

Then, we have $M_{accuracy} = (1+87)/(1+87+3+9) = 0.88$, $M_{precision} = 1/(1+3) = 0.25$ and $M_{recall} = 1/(1+9) = 0.1$. The high accuracy is very misleading, while the performance measures based on precision and recall are more realistic. Specifically, precision shows that only 25% of the mails reported as spam are actually spam, and recall shows that this classifier detects only 10% of the spam mails. The results based on precision and recall lead us to reject this spam classifier for real-world deployment.
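The arithmetic for Table 2.1 can be reproduced with a few lines of code. This is an illustrative sketch: the function names are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
def accuracy(tp, fp, fn, tn):
    """Equation (2.11): correctly classified examples over all examples."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Equation (2.12). Returns 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """Equation (2.13). Returns 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Counts taken from Table 2.1.
tp, fp, fn, tn = 1, 3, 9, 87
print(accuracy(tp, fp, fn, tn))  # 0.88
print(precision(tp, fp))         # 0.25
print(recall(tp, fn))            # 0.1
```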

Usually, it is more convenient to judge classification performance by a single measure. To make a trade-off between precision and recall, we often use the harmonic mean of precision and recall, called the F1-score [121], in text classification:

$$M_{f1score}(f) = \frac{2}{\dfrac{1}{M_{precision}(f)} + \dfrac{1}{M_{recall}(f)}} = \frac{2 \times M_{precision}(f) \times M_{recall}(f)}{M_{precision}(f) + M_{recall}(f)} \tag{2.14}$$

Figure 2.3 plots the 3D value curve of the F1-score based on precision and recall. We can see that to achieve a high F1-score, classifiers must have both high precision and high recall. Optimizing a classifier for one measure while ignoring the other will not improve the F1-score much. For example, for the spam classification dataset discussed above, if a classifier predicts all examples as positive, it achieves 100% recall but its precision is only 10%. This results in an F1-score as low as 0.18.

Figure 2.3: The value curve of F1-score based on precision and recall.
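The F1-score values discussed above can be verified with a short sketch. The function name `f1_score` and the choice to return 0.0 when both precision and recall are zero are assumptions for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, as in Equation (2.14).

    Returns 0.0 when both inputs are zero (assumed convention to avoid
    division by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Classifier that predicts all 100 mails as positive:
# TP = 10, FP = 90, FN = 0, so precision = 0.1 and recall = 1.0.
all_positive_f1 = f1_score(precision=10 / (10 + 90), recall=10 / (10 + 0))
print(round(all_positive_f1, 2))  # 0.18

# Classifier from Table 2.1: precision = 0.25, recall = 0.1.
table_f1 = f1_score(precision=0.25, recall=0.1)
print(round(table_f1, 2))  # 0.14
```

Because the harmonic mean is dominated by the smaller of its two inputs, perfect recall cannot compensate for 10% precision.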

To conclude, based on the discussion above, we mainly use the F1-score to evaluate classification performance. Precision and recall will also be used to explain the results of the F1-score.

