To assess the quality of time series segmentation, we use a number of performance metrics defined in the context of binary classification. They extend to a larger number of classes by reporting the measures for each class independently or in combination. In our case there are three classes, so we calculate the following measures after merging two classes into one, such that only two classes remain. This procedure is repeated for all possible combinations. Accuracy is the ratio of correctly classified data points to the total number of data points; it can be a poor indicator of performance on a class-imbalanced dataset,
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
Sensitivity, or recall, is the proportion of the class of interest (e.g. the points where a change between segments occurs) that is recognized correctly,
$$\mathrm{Sensitivity} = \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{2}$$
For imbalanced class distributions, where the ratio of change points to total data points is small, one can use the G-mean, which combines the accuracy on the positive class (sensitivity) with the accuracy on the negative class (specificity),
$$G\text{-}\mathrm{mean} = \sqrt{\mathrm{Sensitivity} \cdot \mathrm{Specificity}}, \tag{3}$$
where
$$\mathrm{Specificity} = \frac{TN}{TN + FP}. \tag{4}$$
Precision is the ratio of true positive data points to all points classified as change points,
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{5}$$
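As an illustrative sketch (not the implementation used in the experiments), the measures (1)-(5) can be computed from a multi-class confusion matrix by treating one class as positive and merging all others. The function name `one_vs_rest_metrics` and the convention that rows are true classes and columns are estimated classes are assumptions made for this example; the matrix is the one reported for Scenario 1 in Table 1.

```python
import math


def one_vs_rest_metrics(cm, k):
    """Measures (1)-(5) for class index k, with all other classes merged.

    cm is a square confusion matrix given as a list of rows; rows are
    assumed to be true classes and columns estimated classes.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true class k, estimated as another class
    fp = sum(row[k] for row in cm) - tp   # estimated as k, true class differs
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "precision": tp / (tp + fp),
    }


# Confusion matrix of Scenario 1 (Table 1).
cm = [
    [9838, 155, 7],
    [207, 9483, 310],
    [15, 477, 9508],
]
metrics_class_1 = one_vs_rest_metrics(cm, 0)  # index 0 corresponds to class 1
```

For class 1 this gives an accuracy of 0.9872 and a sensitivity of 0.9838, in line with the values listed in Table 2.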
The time difference between the estimated segment boundaries and the actual change points can be treated as a performance measure as well, and a number of useful metrics can be applied in this case. The mean absolute error (MAE) measures the distance of the predicted change point $\hat{y}_i$ to the actual change point $y_i$, $i = 1, \dots, N$,
$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |\hat{y}_i - y_i|}{N}. \tag{6}$$
The mean squared error (MSE) can take large values if a sufficient number of outliers occur in the classified data,
$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}. \tag{7}$$
The mean signed difference (MSD) captures the direction of the error, i.e. whether the predicted point lies before or after the actual time position of the change point,
$$\mathrm{MSD} = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)}{N}. \tag{8}$$
The root mean squared error (RMSE) accumulates the squared error between predicted and actual change points and, by taking the square root, offsets the scaling introduced by squaring the differences,
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}}. \tag{9}$$
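Two methods can be used to normalize the root mean squared error (NRMSE), namely using the range of the observed change points or the mean of the observed change points,
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}}, \tag{10}$$
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{y}}, \tag{11}$$
where $y_{\max}$ and $y_{\min}$ are the largest and smallest observed change points and $\bar{y}$ is their mean.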
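The error measures for change-point locations can be sketched in a few lines. This is a minimal example under stated assumptions: the function name `change_point_errors`, the sign convention (predicted minus actual) and the toy values are chosen for illustration only and are not data from the experiments.

```python
import math


def change_point_errors(predicted, actual):
    """MAE, MSE, MSD, RMSE and both NRMSE variants for paired change points.

    Differences are taken as predicted minus actual (an assumption), so a
    positive MSD means the predictions tend to lie after the true positions.
    """
    n = len(actual)
    diffs = [p - a for p, a in zip(predicted, actual)]
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    return {
        "MAE": sum(abs(d) for d in diffs) / n,
        "MSE": mse,
        "MSD": sum(diffs) / n,
        "RMSE": rmse,
        # Normalization by the range fails if all actual values coincide.
        "NRMSE_range": rmse / (max(actual) - min(actual)),
        "NRMSE_mean": rmse / (sum(actual) / n),
    }


# Hypothetical toy values, not taken from the experiments.
predicted = [95, 110, 205, 190]
actual = [100, 100, 200, 200]
errors = change_point_errors(predicted, actual)
```

In this toy case the early and late predictions cancel, so the MSD is zero while the MAE is 7.5, which shows why the signed and absolute measures are reported side by side.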
3. Main results
The data was evaluated in a stratified tenfold cross-validation scheme: 90 % of the data was used to calculate features and train a naïve Bayes classifier, which was then tested on the remaining 10 % of examples. This procedure was repeated ten times, and the overall results are presented as confusion matrices in the following list.
As features for classification we used the mean, variance, kurtosis, skewness, interquartile range and the autocorrelation of orders 1 to 3. The performance measures (1)-(5) carry an index denoting the class of interest; the other two classes are merged into a single one. In this evaluation we assume that the time at which the behaviour possibly changes is known, and we only want to find the class of each segment.
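The eight features listed above could be computed per segment roughly as follows. This is a sketch only: the exact estimators used in the experiments are not specified, so the population variance, the non-excess kurtosis, the default quartile method of Python's `statistics.quantiles` and the biased autocorrelation estimator are all assumptions, as is the function name `extract_features`.

```python
import statistics


def extract_features(x):
    """Return [mean, variance, kurtosis, skewness, IQR, acf(1), acf(2), acf(3)].

    Assumes a non-constant series; a constant series has zero variance and
    would cause a division by zero.
    """
    n = len(x)
    mean = statistics.fmean(x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n            # population variance
    std = var ** 0.5
    skew = sum(c ** 3 for c in centered) / (n * std ** 3)
    kurt = sum(c ** 4 for c in centered) / (n * var ** 2)  # 3.0 for a Gaussian
    q1, _, q3 = statistics.quantiles(x, n=4)          # quartile cut points
    acf = [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n * var)
        for lag in (1, 2, 3)
    ]
    return [mean, var, kurt, skew, q3 - q1] + acf


features = extract_features([1, 2, 3, 4, 5, 6, 7, 8])
```

Each segment is thus reduced to a vector of eight numbers, which is what a classifier such as naïve Bayes would be trained on.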
1. Scenario 1:
Table 1: Confusion Matrix for Scenario 1
Estimated Class
True Class 1 2 3
1 9838 155 7
2 207 9483 310
3 15 477 9508
Table 2: Performance Measures for Scenario 1
Class 1 2 3
Accuracy 0.9872 0.9617 0.9730
Sensitivity 0.9838 0.9483 0.9508
Specificity 0.9889 0.9684 0.9841
G-Mean 0.9863 0.9582 0.9672
Precision 0.9779 0.9375 0.9677
2. Scenario 2:
Table 3: Confusion Matrix for Scenario 2
Estimated Class
True Class 1 2 3
1 7498 2337 165
2 2003 7667 330
3 15 186 9799
Table 4: Performance Measures for Scenario 2
Class 1 2 3
Accuracy 0.8493 0.8381 0.9768
Sensitivity 0.7498 0.7667 0.9799
Specificity 0.8991 0.8739 0.9752
G-Mean 0.8177 0.8168 0.9776
Precision 0.7879 0.7524 0.9519
3. Scenario 3:
Table 5: Confusion Matrix for Scenario 3
Estimated Class
True Class 1 2 3
1 9905 42 53
2 2 9998 0
3 112 0 9888
Table 6: Performance Measures for Scenario 3
Class 1 2 3
Accuracy 0.9930 0.9985 0.9945
Sensitivity 0.9905 0.9998 0.9888
Specificity 0.9943 0.9979 0.9973
G-Mean 0.9924 0.9988 0.9931
Precision 0.9886 0.9958 0.9947
4. Scenario 4:
Table 7: Confusion Matrix for Scenario 4
Estimated Class
True Class 1 2 3
1 9655 345 0
2 370 8249 1381
3 2 1474 8524
Table 8: Performance Measures for Scenario 4
Class 1 2 3
Accuracy 0.9761 0.8810 0.9048
Sensitivity 0.9655 0.8249 0.8524
Specificity 0.9814 0.9091 0.9310
G-Mean 0.9734 0.8649 0.8899
Precision 0.9629 0.8193 0.8606
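As Tables 1-8 show, all four scenarios lead to good classification results: accuracy stays above 0.83 throughout, with Scenario 2 being the hardest case (sensitivities of 0.7498 and 0.7667 for classes 1 and 2).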
Beyond the classification experiments, we also want to test the quality of the segmentation obtained with the classification algorithm described above. To this end we define a moving window of fixed length 100 and calculate, for every time series in the testing set, the classification result on the sub-series
$$x_i, \dots, x_{i+99} \to c_i, \quad i = 1, 11, \dots, 201,$$
which yields a sequence
$$c_1, c_{11}, \dots, c_{201}, \quad c_i \in \{1, 2, 3\}.$$
This sequence is transformed into a two-class problem by relabelling class 2 as either
$$2 \to 1 \quad \text{or} \quad 2 \to 3.$$
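Considering these sequences, we define change-points as positions $i$ at which the label differs from that of the next window, $c_i \neq c_{i+1}$ (with the windows renumbered consecutively, $i = 1, \dots, 20$), and compare them with the actual change-point. We transform such a position $i$ to a change point $j$ by
$$j = 10\,i + 45$$
and use $j$ to calculate our performance measures (6)-(9), which can be found in Table 9: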
Table 9: Change-point Performance Measures for Scenario 1-4
Scenario (Transformation) MAE MSE MSD RMSE
1 (2 → 3) 23.527 1025.5 2.3656 32.008
1 (2 → 1) 29.865 1430.6 -14.417 37.823
2 (2 → 3) 34.779 1856.8 13.764 43.091
2 (2 → 1) 32.994 2116.7 -20.600 46.008
3 (2 → 3) 22.860 815.01 -0.021398 28.549
3 (2 → 1) 28.094 1211.5 -8.6860 34.807
4 (2 → 3) 42.673 2664.7 38.41 51.621
4 (2 → 1) 21.452 573.39 -20.484 23.946
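The step from window labels to change-point estimates can be sketched as follows. The function name `estimate_change_points` and the example label sequence are hypothetical, and the sketch assumes that the index in the mapping to j counts consecutive windows starting at 1, which the text does not state explicitly.

```python
def estimate_change_points(labels, merge_to):
    """Map a sequence of window labels to estimated change-point positions.

    labels   : class (1, 2 or 3) of each sliding window, in order
    merge_to : 1 or 3, the class into which class 2 is merged
    """
    merged = [merge_to if c == 2 else c for c in labels]
    change_points = []
    for i in range(1, len(merged)):       # i is the 1-based window position
        if merged[i - 1] != merged[i]:    # label of window i differs from window i+1
            change_points.append(10 * i + 45)
    return change_points


# Hypothetical labels for the 21 windows of one series (not experimental data).
labels = [1] * 5 + [2] * 3 + [3] * 13
cp_merge_3 = estimate_change_points(labels, merge_to=3)
cp_merge_1 = estimate_change_points(labels, merge_to=1)
```

With this label sequence the two merging rules give different estimates (95 for 2 → 3 and 125 for 2 → 1), which illustrates why both transformations are reported separately in Table 9.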
4. Conclusion and Outlook
The classification results are of good quality, with high values for all considered performance measures: accuracy ranges from 83.81 % to 99.85 % in our experiments, while sensitivity ranges from 74.98 % to 99.98 %. In all our scenarios the different states are thus rather easy to distinguish with only a small set of features.
Of course, real scenarios pose additional problems, such as highly imbalanced data, more subtle changes and violations of the model assumptions. Moreover, our feature space is tailored to the changing properties of the time series; in real data, constructing appropriate features is crucial and a difficult task in its own right. The change-point performance measures are also satisfactory when compared with a random baseline: if we chose change points uniformly at random among the possible values of j, the resulting measures would be
MAE: 62.5, MSE: 5825, MSD: 50/-50, RMSE: 76.32.
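These baseline values can be reproduced with a short calculation under two assumptions that are not stated explicitly in the text: the candidate positions are j = 10i + 45 for i = 1, ..., 20, and the true change point lies at t = 100 (placing it at t = 200 instead flips the sign of the MSD).

```python
import math

# Assumed candidate positions: 55, 65, ..., 245.
candidates = [10 * i + 45 for i in range(1, 21)]
true_change_point = 100  # assumption inferred from the reported baseline values

# Expected errors when j is drawn uniformly from the candidates.
diffs = [j - true_change_point for j in candidates]
n = len(diffs)
baseline = {
    "MAE": sum(abs(d) for d in diffs) / n,
    "MSE": sum(d * d for d in diffs) / n,
    "MSD": sum(diffs) / n,
}
baseline["RMSE"] = math.sqrt(baseline["MSE"])
```

Under these assumptions the calculation yields MAE 62.5, MSE 5825, MSD 50 and RMSE of about 76.32, matching the figures quoted above.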
We expect that the proposed methodology can be successfully applied to the segmentation of real sensor measurements from industrial machines that serve as technical elements of critical infrastructure, in order to predict anomalies and operational quality in realistic settings.
Acknowledgements
This work has been supported by the COMET-K2 Center of the Linz Center of Mechatronics (LCM) funded by the Austrian federal government and the federal state of Upper Austria.