3.6 Performance, Analysis and Evaluation
3.6.5 Experimental Comparison
In general, this work proposes the use of intelligent approaches to improve the quality of the management and diagnostic information provided to physicians regarding foetal status after delivery. The aim is to improve the accuracy of diagnostic methods for the foetal hypoxic state in two main ways. The first is to improve the diagnosis, or first detection, of early pathological changes in the foetus using the pH and BDecf levels of different umbilical artery blood samples. The second is to find other combinations of parameters that may help in detecting hypoxic status using different machine learning classifiers, and to identify the most accurate classifier(s) for classification problems on real-world medical data.
This research is motivated by the urgent need for a new pathway that could reduce the burden on medical staff and, at the same time, enhance the quality of life of new-borns and their families. Machine-learning based diagnostic methods could reduce the need for specialist assessment, as they can learn from previously diagnosed new-borns in order to diagnose new cases. These machine-learning classifiers could also be used to train non-specialist doctors and so improve their decision-making.
Foetal hypoxia can be detected using two main parameters, namely pH and BDecf [16]. However, our goal is to discover whether foetal hypoxia can also be detected using other parameters such as BE, PCO2, Apgar1 and Apgar5 with various ML classifiers, and to identify the thresholds of these values that may lead to adverse neurological outcomes in foetuses.
To establish intelligent diagnostic methods, an experimental procedure was undertaken using six popular supervised machine-learning classifiers. This stage is usually known as the knowledge acquisition stage, in which the classifiers learn, identify patterns and gain knowledge from new-borns’ records in order to classify new hypoxic cases. Thereafter, we tested the learning and generalisation capabilities of the classifiers on samples that had not been used in the training process, over several trials with a different set of parameters in each trial. Using a number of statistical measures, we assessed the sensitivity, specificity and classification accuracy of each classifier to evaluate its performance.
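The three measures named above can be sketched as follows. This is a minimal illustration, assuming a binary labelling in which 1 marks a hypoxic case and 0 a normal case; the function names and the example labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def evaluate(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return {
        # Proportion of hypoxic cases that were correctly flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Proportion of normal cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Proportion of all cases that were classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Invented held-out labels (1 = hypoxic, 0 = normal), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = evaluate(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when hypoxic cases are rare, since a classifier that labels every case as normal can still reach a high accuracy.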
In this context, all the ML classifiers achieved acceptable performance across all trials. All classifiers apart from SVM and KNN improved their classification performance when using different variables: in the first trial they treated the combination of pH, BE, BDecf and PCO2 as important for classification, while in the second and third trials they also identified AS1 and AS5 as important variables. This gives a good combination of variables that can help physicians and other medical staff in the diagnosis of foetal hypoxia.
CART, for instance, shows good classification performance using all the important variables in every trial. CART is a non-linear supervised learning method that is typically used to classify data that are not linearly separable, and it can be represented graphically as a binary decision tree. The CART classifier uses an impurity-based measure, conventionally the Gini index, as its splitting criterion. The best split is the one that minimises the impurity of the resulting data subsets. The splitting process is then repeated on each resulting subset until a stopping criterion is met.
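The search for an impurity-minimising split can be sketched for a single numeric variable as follows. This is an illustrative sketch only: it assumes Gini impurity as the impurity measure, which is not necessarily the criterion configured in this study, and the function names and pH values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split `x <= threshold`.

    Returns (None, parent impurity) when no split improves on the parent.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented pH readings and labels (1 = hypoxic, 0 = normal), for illustration only.
ph = [7.30, 7.25, 7.10, 6.95, 7.05, 7.28]
label = [0, 0, 1, 1, 1, 0]
threshold, impurity = best_split(ph, label)
```

A full tree applies the same search to every candidate variable at each node, keeps the best variable and threshold, and recurses on the two child subsets until the stopping criterion is met.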
This study also employed two ensemble learning classifiers, namely the RF and GBM methods. Each can be considered a collection, or ensemble, of decision trees (i.e. CART). RF and GBM take the concept of CART a step further by generating dozens of trees. In contrast to CART, which uses all of the parameters along with the whole training dataset to build a classifier, RF and GBM select an arbitrary sample of the data and a particular subset of parameters to build each tree individually. CART and RF were the most accurate classifiers in the first and second trials, while in the third trial GBM was the most accurate classifier, followed by SVM and then the RF and CART classifiers.
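The sampling step described above can be sketched as follows. This is a simplified illustration that follows the per-tree description in the text: each tree receives a bootstrap sample of the rows and a random subset of the variables. Standard random forest implementations usually redraw the variable subset at every split rather than once per tree, and GBM fits its trees sequentially rather than independently. The function name, the row count and the subset size are hypothetical.

```python
import random


def draw_tree_inputs(n_rows, feature_names, n_features, rng):
    """Pick the row indices and variables that one tree of the ensemble sees."""
    # Bootstrap sample: rows are drawn with replacement, so some repeat
    # and some are left out of this tree's training set.
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Random subset of variables, drawn without replacement.
    features = rng.sample(feature_names, n_features)
    return row_idx, features


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
variables = ["pH", "BDecf", "BE", "PCO2", "AS1", "AS5"]
plans = [draw_tree_inputs(10, variables, 2, rng) for _ in range(3)]
```

Because each tree is trained on different rows and variables, the trees make partly independent errors, which is what allows their combined vote to be more stable than a single CART tree.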
Apart from the tree-based classifiers, this study also implemented the NNET classifier. NNET showed very good and reliable results in classifying foetal hypoxic cases, especially in the first and second trials, in which it was the third most accurate classifier. NNET ranked all four variables (pH, BDecf, BE, PCO2) as important in detecting hypoxic cases, although it considered AS5 the most important variable in the third trial. This may affect the choice of model in future work, since AS5 is regarded as the last parameter to consider in detecting hypoxic cases. The output of the NNET classifier can also be more difficult to interpret than that of the tree-based classifiers.
On the other hand, some of the ML classifiers, namely KNN and SVM, failed to achieve a reliable classification of the hypoxic cases in the first and second trials. Their classification ability is not reliable, as they are biased toward less important parameters. Although their performance improved in the third trial, with SVM ranking among the most accurate classifiers, this result cannot be depended upon, as both classifiers failed to identify pH as an important variable in the first trial. The pH level falls in both types of acidosis (respiratory and metabolic) associated with hypoxia, which makes it very important for the classification. In the second and third trials, they ranked the metabolic variables (BE, BDecf) as important while giving very little weight to the respiratory variable (PCO2). This may affect the final medical diagnosis, as the hypoxic process involves a combination of both respiratory and metabolic changes.
The failure of SVM and KNN to classify the cases using a reliable combination of variables can be explained by the way they work: both depend on the distribution of the inputs and on the distance between the inputs (i.e. the Euclidean distance). Under the Euclidean distance, not all parameters contribute equally. Choosing the wrong kernel for the SVM, or unsuitable values for other tuning parameters such as the sigma value or the number of neighbours K for KNN, can be considered further factors in the poor performance of these classifiers.
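The unequal contribution of parameters to the Euclidean distance can be illustrated with a small sketch. It assumes that the variables are measured on very different numeric scales, as pH and PCO2 are, and it uses z-score standardisation as one common remedy; the text does not state whether such scaling was applied in this study. The function names and the three (pH, PCO2) records are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardise(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant column falls back to 1.0.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Invented (pH, PCO2) records, for illustration only.
a = [7.30, 50.0]
b = [7.00, 52.0]  # very different pH, similar PCO2
c = [7.29, 53.0]  # almost the same pH, slightly different PCO2

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
za, zb, zc = standardise([a, b, c])
std_ab, std_ac = euclidean(za, zb), euclidean(za, zc)
```

On the raw values, record a is closer to b than to c, because the larger numeric range of PCO2 dominates the distance and the large difference in pH is almost ignored. After standardisation the ordering reverses and c becomes the nearest neighbour of a. This illustrates how a distance-based classifier can appear to overlook pH when the inputs are left unscaled.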