

Testing (or assessing) a model means quantifying the model error, that is, the differences between the predicted values and the observed values when a test dataset is presented to the model. To test the model effectively, the test dataset should not be used in calibrating or training the model. One common method of creating such a test dataset from a sample of data is to randomly split the sample into a calibration (or training) dataset and a test dataset. This strategy has often been used for testing infrastructure deterioration models (Micevski et al. 2002; Baik et al. 2006).
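The random split described above can be sketched as follows. This is a minimal illustration only: the function name `split_train_test`, the 25% test fraction, the fixed seed and the list of pipe identifiers are assumptions made for the example and are not taken from this study.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Randomly split records into a calibration (train) list and a test list.

    The test fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)      # local generator, so the split is repeatable
    shuffled = list(records)       # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


pipe_ids = list(range(100))        # hypothetical pipe identifiers
train, test = split_train_test(pipe_ids)
```

Because the two lists are disjoint slices of one shuffled copy, no record in the test list can take part in calibration.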

This study adopted two scalar performance measures, namely, false negative rate and overall success rate derived from the confusion matrix (Hajmeer and Basheer 2003) and the goodness-of-fit test (Micevski et al. 2002; Baik et al. 2006) for assessing the performance of the developed deterioration models on a test dataset.

3.4.1 Confusion Matrix

There are always four possible situations between a predicted case and an observed case: (1) true negative (TN) when the model correctly identifies a negative case (i.e. pipe in poor condition), (2) true positive (TP) when the model correctly identifies a positive case (i.e. pipe in good condition), (3) false negative (FN) when the model wrongly identifies a pipe actually in poor condition as in good condition, and (4) false positive (FP) when the model wrongly identifies a pipe actually in good condition as in poor condition. The consequence of an FP case is only the inspection cost. The consequence of an FN case is far more severe, since when that pipe fails, all costs including repair, penalty and disruption are incurred. In this study, a positive case (a pipe in good condition) is defined as a pipe in either condition 1 or 2. A pipe in condition 3 is defined as a negative case.

These four possible situations can be used to assess the predictive performance of a deterioration model by using the confusion matrix or the contingency table (Johnson and Wichern 2002) on a test dataset as given in Table 3-2. For example, FP21 in this table means the number of pipes in the observed condition 2 which were incorrectly predicted as condition 1. Furthermore, the total numbers of pipes which were observed in condition 1, 2 and 3 are O1, O2 and O3 respectively, and the total numbers of pipes which were predicted in condition 1, 2 and 3 are P1, P2 and P3 respectively.

Table 3-2: Confusion matrix

                        Predicted condition
Observed condition      1 (good)    2 (fair)    3 (poor)    Total
1 (good)                TP11        FP12        FP13        O1
2 (fair)                FP21        TP22        FP23        O2
3 (poor)                FN31        FN32        TN33        O3
Total                   P1          P2          P3
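As an illustration of how a matrix with the layout of Table 3-2 could be tallied from paired observations and predictions, a minimal sketch is given below. The function name `confusion_matrix` and the sample condition lists are hypothetical and made up for the example.

```python
def confusion_matrix(observed, predicted, conditions=(1, 2, 3)):
    """Count pipes by (observed, predicted) condition.

    matrix[i][j] is the number of pipes observed in conditions[i]
    and predicted in conditions[j] (rows observed, columns predicted).
    """
    index = {c: k for k, c in enumerate(conditions)}
    matrix = [[0] * len(conditions) for _ in conditions]
    for obs, pred in zip(observed, predicted):
        matrix[index[obs]][index[pred]] += 1
    return matrix


# Made-up conditions for eight pipes, for illustration only.
observed = [1, 1, 2, 2, 3, 3, 3, 1]
predicted = [1, 2, 2, 2, 3, 1, 3, 1]

cm = confusion_matrix(observed, predicted)
row_totals = [sum(row) for row in cm]          # O1, O2, O3
col_totals = [sum(col) for col in zip(*cm)]    # P1, P2, P3
```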

The overall success rate (OSR) and false negative rate (FNR) can be used to assess the predictive performance of the four deterioration models (i.e. MDDM, OPDM, NNDM and PNNDM) which were developed to predict the condition changes of individual pipes in this study. The OSR and FNR cannot be used for assessing the Markov model since this model was not able to predict the condition changes of a particular pipe due to the lack of longitudinal data. The OSR and FNR can be computed from the confusion matrix using Equations (3-29) and (3-30) respectively.

$$\mathrm{OSR} = \frac{TP_{11} + TP_{22} + TN_{33}}{O_1 + O_2 + O_3} \tag{3-29}$$

$$\mathrm{FNR} = \frac{FN_{31} + FN_{32}}{FN_{31} + FN_{32} + TN_{33}} \tag{3-30}$$

The OSR indicates how well the deterioration models predict the condition of individual pipes for all cases. The FNR indicates the risk associated with the use of the models. A ‘good’ deterioration model therefore requires a high OSR and a low FNR.
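Equations (3-29) and (3-30) can be evaluated directly from a 3 x 3 confusion matrix, as in the sketch below. The function names and the matrix entries are assumptions chosen for illustration; they are not results of this study.

```python
def overall_success_rate(cm):
    """OSR: correctly predicted pipes (diagonal) over all pipes, Equation (3-29)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def false_negative_rate(cm):
    """FNR: share of pipes observed in condition 3 that were predicted
    as condition 1 or 2, Equation (3-30)."""
    fn31, fn32, tn33 = cm[2]       # row for observed condition 3 (poor)
    return (fn31 + fn32) / (fn31 + fn32 + tn33)


# Made-up counts: rows are observed conditions, columns are predicted conditions.
cm = [
    [40, 8, 2],
    [6, 30, 4],
    [3, 2, 5],
]

osr = overall_success_rate(cm)     # (40 + 30 + 5) / 100
fnr = false_negative_rate(cm)      # (3 + 2) / (3 + 2 + 5)
```

In this made-up case the OSR looks acceptable while half of the pipes in poor condition are missed, which is why the two measures are read together.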

3.4.2 Goodness-of-Fit Test

The goodness-of-fit test using the Pearson chi-squared test statistic ($\chi^2$) is based on a null hypothesis that the observed frequency matches the estimated (or predicted) frequency (Micevski et al. 2002). This test can be used for the five deterioration models [...] the fitness of a model (Montgomery et al. 2004). The test statistic $\chi^2_M$ for the deterioration models in this study can be calculated using Equation (3-31).

$$\chi^2_M = \sum_{c=1}^{3} \frac{(O_c - P_c)^2}{P_c} \tag{3-31}$$

where:
$O_c$ is the observed number of pipes in condition $c$
$P_c$ is the predicted number of pipes in condition $c$

If the test statistic $\chi^2_M$ is larger than the critical value $\chi^2_{0.05,2}$ (95% confidence level and 2 degrees of freedom), the hypothesis is rejected. The goodness-of-fit test shows how confidently a model fits a set of observations. To ensure the accuracy of $\chi^2_M$, one rule of thumb should be enforced: the predicted number of pipes in any condition c must be at least 5 (Montgomery et al. 2004).
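The test procedure can be sketched as below, assuming three conditions and therefore 2 degrees of freedom as stated above. The function names and the observed and predicted counts are made up for illustration. The critical value is computed from the closed form that holds only for 2 degrees of freedom, where the chi-squared upper-tail probability is exp(-x/2); for other degrees of freedom a statistical table or library would be needed.

```python
import math


def chi_squared_statistic(observed, predicted):
    """Pearson statistic of Equation (3-31): sum of (O_c - P_c)^2 / P_c."""
    if min(predicted) < 5:
        # Rule of thumb: at least 5 predicted pipes in every condition.
        raise ValueError("predicted count below 5 in at least one condition")
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def critical_value_df2(alpha=0.05):
    """Critical chi-squared value for 2 degrees of freedom.

    Valid for 2 degrees of freedom only, where P(X > x) = exp(-x / 2).
    """
    return -2.0 * math.log(alpha)


# Made-up counts of pipes in conditions 1, 2 and 3.
observed = [50, 40, 10]
predicted = [49, 38, 13]

statistic = chi_squared_statistic(observed, predicted)
critical = critical_value_df2(0.05)     # about 5.99
reject = statistic > critical           # True means the hypothesis is rejected
```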

For the Markov model, the test statistic $\chi^2_M$ can be computed using the predicted proportions of pipes P1, P2 and P3 in each condition over a time interval (by Equation (3-2) in Section 3.3.1.3) and the computed proportions of pipes observed O1, O2 and O3 in condition 1, 2 and 3 from the test dataset. For the remaining four deterioration models, the test statistic $\chi^2_M$ can be computed by using P1, P2 and P3 (which are the column sums) and O1, O2 and O3 (which are the row sums) in Table 3-2.
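For the four models assessed with a confusion matrix, the row and column sums feed straight into the test statistic. The sketch below uses a made-up matrix and a hypothetical function name; it illustrates the bookkeeping only and does not reproduce any result of this study.

```python
def chi_squared_from_confusion_matrix(cm):
    """Compute the goodness-of-fit statistic from a confusion matrix.

    Rows are observed conditions and columns are predicted conditions,
    so row sums give O1..O3 and column sums give P1..P3.
    """
    observed_totals = [sum(row) for row in cm]
    predicted_totals = [sum(col) for col in zip(*cm)]
    statistic = sum(
        (o - p) ** 2 / p for o, p in zip(observed_totals, predicted_totals)
    )
    return observed_totals, predicted_totals, statistic


# Made-up counts for illustration only.
cm = [
    [40, 8, 2],
    [6, 30, 4],
    [3, 2, 5],
]

O, P, statistic = chi_squared_from_confusion_matrix(cm)
```

Note that the statistic depends only on the marginal totals, so a model can pass the goodness-of-fit test while still misclassifying individual pipes; this is one reason to read it alongside the OSR and FNR.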