4.1 Single-Phase Controller
4.1.2 Resistive and Inductive Load (R-L)
Figure 4-13 presents the organisation chart of this section with the associated subsections.
Figure 4-13 The section organisation chart: The section organisation chart provides the layout of this section with the associated subsections.
4.5.1.1 Score probabilities
The input variables of the testing data are scored to generate score probabilities in the range 0 ≤ f(x) ≤ 1, where x is the vector of input values used to predict the output y. The score probabilities provide a measure of the observations that are correctly predicted (TP and TN) and of the two prediction errors (FP and FN), all within the interval [0, 1]. These TP, TN, FP and FN observations, presented in the tables below, are counted to determine how many of their score probability values fall within each score bin between 0 and 1.
Therefore, the expected output is y = 0, or SQLIA negative, if the function of the predictor variables x is closer to 0, expressed as f(x) ≈ 0. Conversely, the prediction output is y = 1, or SQLIA positive, when the function of the predictor variables x is closer to 1, denoted as f(x) ≈ 1. The prediction error rates of FP and FN are also taken into consideration. Prediction errors are observations whose predicted output has been wrongly classified, and they are used to gauge the performance of a classifier.
4.5.1.2 Threshold
MAML Studio sets the default cut-off threshold for predicting TP and TN, including the errors FP and FN, to 0.5. This cut-off of 0.5 is a predetermined threshold employed by classification algorithms; it is applied to the function of x to predict y.
Therefore, a score probability value of f(x) < 0.5 is predicted as SQLIA negative (0), while f(x) ≥ 0.5 is predicted as SQLIA positive (1). The score probabilities are partitioned into ten score bins of 0.1 increments, bounded by the cut-offs {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. The TP, TN, FP and FN observations are collated at these cut-offs between 0.0 and 1.0. These prediction observations are used to calculate the statistical measures of accuracy, precision, recall and F1 score. Table 4-10 is a formula table that lists the abbreviations used in the statistical measures throughout this thesis.
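The thresholding and collation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure MAML Studio runs internally: the function name `confusion_counts` and the sample `scores` and `labels` are hypothetical, and the only rule taken from the text is that f(x) ≥ cut-off is predicted as SQLIA positive (1).

```python
from typing import Dict, List


def confusion_counts(scores: List[float], labels: List[int],
                     cutoff: float = 0.5) -> Dict[str, int]:
    """Count TP, TN, FP and FN for one cut-off (hypothetical helper)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0  # f(x) >= cut-off -> SQLIA positive
        if predicted == 1 and label == 1:
            counts["TP"] += 1
        elif predicted == 0 and label == 0:
            counts["TN"] += 1
        elif predicted == 1 and label == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical score probabilities and true labels (1 = SQLIA positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10, 0.55, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The eleven cut-offs 0.0, 0.1, ..., 1.0 that bound the ten score bins.
cutoffs = [i / 10 for i in range(11)]
counts_by_cutoff = {c: confusion_counts(scores, labels, c) for c in cutoffs}
```

Looking up `counts_by_cutoff[0.5]` gives the counts at the default threshold; the other keys show how the four counts shift as the cut-off moves.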
Table 4-10 Formula table: A table detailing how the performance metrics are calculated.
Terminology                   Formula
True Positive (TP)            -
True Negative (TN)            -
False Positive (FP)           -
False Negative (FN)           -
Positive events (PE)          TP + FN
Negative events (NE)          FP + TN
Positive observations (PO)    TP + FP
Negative observations (NO)    FN + TN
Total events (TE)             PO + NO
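The derived quantities in Table 4-10 follow directly from the four base counts. The sketch below assumes the counts are already available as integers; the function name `derived_totals` is illustrative and does not come from the thesis.

```python
from typing import Dict


def derived_totals(tp: int, tn: int, fp: int, fn: int) -> Dict[str, int]:
    """Compute the Table 4-10 quantities from the four base counts."""
    pe = tp + fn   # positive events: actual SQLIA positives
    ne = fp + tn   # negative events: actual SQLIA negatives
    po = tp + fp   # positive observations: predicted positive
    no = fn + tn   # negative observations: predicted negative
    te = po + no   # total events
    return {"PE": pe, "NE": ne, "PO": po, "NO": no, "TE": te}


# Hypothetical counts for illustration only.
totals = derived_totals(tp=90, tn=80, fp=20, fn=10)
```

Note that PE + NE and PO + NO both equal TE, which is a quick consistency check on any set of counts.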
4.5.1.3 Statistical measures
Figure 4-14 presents the organisation chart of this section with the associated subsections.
Figure 4-14 The section organisation chart: The section organisation chart provides the layout of this section with the associated subsections.
The statistical measures of a trained model are accuracy, precision, recall and F1 score. They are calculated by collating the score probability values of TP, TN, FP and FN at cut-offs of 0.1 increments over 0.0 ≤ x ≤ 1.0.
4.5.1.3.1 Accuracy
Accuracy is the proportion of correct results (TP and TN) out of the total cases. It is relevant for interpreting the performance of a trained model because it costs the same in a data set with an even distribution of features [46]. Accuracy is calculated from the formula elements in Table 4-10 above with Equation 4-5 below.
Accuracy (A) = (TP + TN) / TE        Equation 4-5
4.5.1.3.2 Precision
Precision is the proportion of the positive results returned by a trained model that are true positives. In both even and uneven distributions of features, the cost that precision captures is that of FP, a wrong positive prediction. In this scenario, that cost is the administrative overhead of wrongly predicting SQLIA positive. Precision is calculated from the formula elements in Table 4-10 above with Equation 4-6 below.
Precision (P) = TP / PO        Equation 4-6
4.5.1.3.3 Recall
Recall is the TP rate, which is the fraction of all actual positive events that the model correctly returns. In both even and uneven distributions of features, the cost that recall captures is that of the FN prediction error. In this scenario, it is the security loophole of wrongly predicting an actual SQLIA as negative, which leaves the system unprotected against data pilfering, with damaging ramifications. Recall is calculated from the formula elements in Table 4-10 above, as the prediction rate of a trained classifier, with Equation 4-7 below.
Recall (R) = TP / PE        Equation 4-7
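Equations 4-5 to 4-7 can be checked numerically with a short sketch. The counts below are hypothetical, the function names are illustrative, and returning 0.0 when a denominator is zero is an assumption made here so the sketch never divides by zero; the thesis does not state how that case is handled.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation 4-5: (TP + TN) / TE, with TE = TP + TN + FP + FN."""
    te = tp + tn + fp + fn
    return (tp + tn) / te if te else 0.0  # 0.0 on empty input is an assumption


def precision(tp: int, fp: int) -> float:
    """Equation 4-6: TP / PO, with PO = TP + FP."""
    po = tp + fp
    return tp / po if po else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation 4-7: TP / PE, with PE = TP + FN."""
    pe = tp + fn
    return tp / pe if pe else 0.0


# Hypothetical counts for illustration only.
tp, tn, fp, fn = 90, 80, 20, 10
a = accuracy(tp, tn, fp, fn)   # 170 / 200
p = precision(tp, fp)          # 90 / 110
r = recall(tp, fn)             # 90 / 100
```

With these counts, the 20 FP lower precision (the administrative overhead) while the 10 FN lower recall (the security loophole), which is why the two measures are reported separately.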
4.5.1.3.4 F1 Score
The F1 score is a measure of accuracy that balances precision and recall. It is relevant for interpreting the performance of a trained model because it costs the same in a data set with an even or an uneven distribution of features [46]. The F1 score is calculated from the formula elements in Table 4-10 with Equation 4-8.
F1 Score (F) = 2 × (R × P) / (R + P)        Equation 4-8
4.5.1.4 ROC curve and AUC
The ROC curve is a graphical plot that illustrates the performance of a binary classifier as the discrimination threshold is varied, with the TP rate (TPR), or recall, on the y-axis against the FP rate (FPR) on the x-axis. A curve towards the upper left corner indicates a better performing model. The AUC is the area under this curve. An excellent prediction model in this thesis is one with an AUC value of 0.9 ≤ x ≤ 1. The AUC of the ROC curve provides a measure of the performance of a classifier, which is graded as presented in Table 4-11.
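As a rough illustration of how ROC points and an AUC value arise from varying the threshold, the sketch below computes (FPR, TPR) pairs at the eleven cut-offs 0.0, 0.1, ..., 1.0 and integrates them with the trapezoidal rule. The data and function names are hypothetical. Because only eleven cut-offs are used, the result is a coarse approximation and may differ from an AUC computed over every distinct score, which is what ROC tooling typically reports.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int],
               cutoffs: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per cut-off, sorted by FPR then TPR."""
    pe = sum(1 for label in labels if label == 1)  # positive events
    ne = len(labels) - pe                          # negative events
    if pe == 0 or ne == 0:
        raise ValueError("both classes must be present to compute TPR and FPR")
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        points.append((fp / ne, tp / pe))
    return sorted(points)


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Approximate the area under the ROC points with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical score probabilities and true labels (1 = SQLIA positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10, 0.55, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cutoffs = [i / 10 for i in range(11)]

points = roc_points(scores, labels, cutoffs)
auc = auc_trapezoid(points)
```

The points run from (0, 0) at cut-off 1.0 to (1, 1) at cut-off 0.0; the closer the intermediate points sit to the upper left corner, the larger the resulting area.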
Table 4-11 AUC grading: A table showing the grading of AUC that sets the benchmark for the measure of performance metrics of a trained model [51].
AUC Values          Grade
0.9 ≤ x ≤ 1.0       Excellent
0.8 ≤ x < 0.9       Good
0.7 ≤ x < 0.8       Fair
0.6 ≤ x < 0.7       Poor
x < 0.6             Random prediction/worthless
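The grading in Table 4-11 can be expressed as a small lookup. The sketch assumes that each band includes its lower bound and excludes its upper bound, except for the top band, which includes 1.0; this is a reading of the intended inequalities, not something the table states explicitly. The function name `grade_auc` is illustrative.

```python
def grade_auc(auc: float) -> str:
    """Map an AUC value in [0, 1] to its Table 4-11 grade."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0.0 and 1.0")
    if auc >= 0.9:
        return "Excellent"
    if auc >= 0.8:
        return "Good"
    if auc >= 0.7:
        return "Fair"
    if auc >= 0.6:
        return "Poor"
    return "Random prediction/worthless"


# Hypothetical AUC values for illustration only.
grades = [grade_auc(value) for value in (0.95, 0.85, 0.72, 0.61, 0.50)]
```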