
Classification results obtained with the conventional 2D TA approach are in line with the current state of the art in the paediatric literature. For the commonly reported task of separating MB, PA and EP, the accuracies of 83% (SVM) and 87% (ANN) are comparable with the work reported in [36] (71% with T1 and 74% with T2) and [37] (63% EP, 81% PA and 94% MB).

The primary aim of this study was to determine whether the inclusion of multi-slice information obtained through 3D TA of conventional MRI could improve diagnostic classification of childhood brain tumours. The results suggest that the value of TA can be maximised using 3D features in paediatric settings. McNemar’s test indicates that this improvement in performance was significant for four of the six classifiers tested.

It is worth noting, however, that all six classifiers showed relatively low EP sensitivity. This is likely due to the highly imbalanced nature of the dataset: only 7 EP samples were present in the cohort, so the three tumour classes were not equally represented. In other words, the limited number of EP samples seems to have caused the classifiers to categorise most cases as MB or PA. Another possibility is that EP, being typically heterogeneous masses, have textural properties in common with the other two tumour types, which would make it difficult for classifiers to discriminate them accurately.

In terms of important features, most of those chosen during the feature selection stage were derived from GLCM and histogram statistics techniques. An important point to keep in mind is that the feature selection technique used is supervised, in the sense that it requires prior knowledge of the class label.
This means that an element of over-optimistic bias might have been introduced during the classification model validation stage, as feature selection
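
The paired comparison of 2D and 3D classifiers with McNemar’s test, mentioned in the discussion above, can be sketched as follows. This is a minimal illustration only: it assumes the exact (binomial) two-sided form of the test, which may differ from the variant actually used in this study, and the function name and toy labels are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (assumed variant).

    Returns (b, c, p), where b counts cases only classifier A got right,
    c counts cases only classifier B got right, and p is the p-value.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree in correctness
    k = min(b, c)
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical toy data: ten cases, labels are illustrative only.
truth   = ["MB", "PA", "EP", "MB", "PA", "MB", "PA", "EP", "MB", "PA"]
pred_2d = ["MB", "PA", "MB", "PA", "MB", "PA", "MB", "PA", "MB", "PA"]
pred_3d = list(truth)  # a 3D classifier that is correct on every case
print(mcnemar_exact(truth, pred_2d, pred_3d))
```

Only the discordant pairs (cases where exactly one of the two classifiers is correct) enter the test, which is why it suits comparisons made on the same cohort.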

features by finding a splitting value that yields the best gain in entropy. This is repeated recursively with a stopping criterion based on the Minimum Description Length principle. The use of this technique as a feature selection method rests on the assumption that, since a feature’s entropy can be used as a measure of its discriminative power, features rejected by the algorithm can be assumed to be redundant. A feature is not discretised if no appropriate cut-off points are found. Therefore, whilst features measured across different directions capture strongly correlated patterns (as shown in Figure 5.2), some of them do not carry sufficient information in terms of gain in entropy, and including them would not add value to the classification performance.

Another interesting finding is that only twelve features in the highly ranked 3D subset resulted from analysis along the z-axis (inter-slice information). Hence, the addition of inter-slice patterns has contributed to improvements in classification performance, but it is likely that the improvements were mostly due to classifiers being able to capture information from the whole volumetric ROI, rather than from a single selected slice whose features are not representative enough to classify tumours. The limited number of important through-plane features may be due to the presence of slice gaps, which ranged between 0.8 and 1.5 mm for T1 and between 0.6 and 1.5 mm for T2 in the dataset we used.
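
The entropy-based discretisation with a Minimum Description Length stopping rule, described above, can be sketched as follows. This is a simplified illustration assuming the standard Fayyad and Irani formulation of the acceptance criterion; the function names and toy data are hypothetical and this is not the implementation used in the study. A feature for which no cut point is accepted returns an empty list, which corresponds to the feature being rejected.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return (gain, cut, left_labels, right_labels) for the best boundary."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([l for _, l in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical feature values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, left, right)
    return best

def mdl_accepts(labels, gain, left, right):
    """MDL criterion (assumed form): accept the cut only if the gain pays for its cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (log2(n - 1) + delta) / n

def mdl_cut_points(values, labels):
    """Recursively collect accepted cut points; [] means the feature is rejected."""
    if len(set(values)) < 2:
        return []
    gain, cut, left, right = best_split(values, labels)
    if not mdl_accepts(labels, gain, left, right):
        return []
    lo = [(v, l) for v, l in zip(values, labels) if v <= cut]
    hi = [(v, l) for v, l in zip(values, labels) if v > cut]
    return (
        mdl_cut_points([v for v, _ in lo], [l for _, l in lo])
        + [cut]
        + mdl_cut_points([v for v, _ in hi], [l for _, l in hi])
    )

# Hypothetical toy features: one separates the classes, one does not.
values = list(range(1, 21))
print(mdl_cut_points(values, ["A"] * 10 + ["B"] * 10))  # informative feature
print(mdl_cut_points(values, ["A", "B"] * 10))          # uninformative feature
```

In this sketch the first feature is retained with a single cut point, whereas the second yields no accepted cut and would be discarded.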

performance shown by four of the tested classifiers is likely due to the fact that PCA computes new meta-features, namely principal components. These are linear combinations of the original attributes, which classifiers may find difficult to generalise from, as the feature space changes significantly and the original features may lose their meaning. The fact that PCA is a dimensionality reduction tool rather than a feature selection tool brings the further limitation that no definite subset of important features can be deduced, making it an impractical option for understanding relationships between the original textural features and classification performance.
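
The point that a principal component is a weighted mixture of the original attributes, rather than a selection among them, can be illustrated with a small sketch. It estimates only the first component, using power iteration on the sample covariance matrix as a stand-in for a full PCA routine; the function name and the three-column toy data are hypothetical.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Loadings of the first principal component via power iteration.

    Assumes the data have non-zero variance and that the uniform starting
    vector is not orthogonal to the dominant eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the original features.
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: feature 2 is twice feature 1, feature 3 is constant.
rows = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
print(first_principal_component(rows))
```

The resulting loadings are non-zero for both correlated features, so the component cannot be traced back to a single original textural feature.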

The exact accuracy of radiological reporting is unknown, and the study presented here suffers from the disadvantage that the radiologists did not have to offer a diagnosis or even a differential diagnosis². Despite this limitation, the review has the advantage that it was contemporaneous and gives some insight into the difficulties of radiological reporting. If only the reports where a single correct diagnosis is offered are taken as correctly diagnosed, the overall success rate is 14/47 (30%). However, if we exclude cases where no diagnosis was proposed, the accuracy is 14/30 (47%), and if the first diagnosis in a list is taken as the favoured one, 20/30 (67%) are correct. Whilst the accuracy of diagnosis for cases where no diagnosis was proposed could be greater than this, it would seem unlikely. In reality, the most common reason for not offering a diagnosis is uncertainty, and it is an interesting observation that some level of uncertainty exists in 27/47 (57%) of the reports. It may well be that a key role of a decision support system based on texture analysis is to reduce this uncertainty. To illustrate how the use of TA can help achieve this, consider Figure 5.10, which shows a summary of probabilities assigned to each individual diagnosis by the neural network classifier. Taking Figure 5.10(a) as an example, one can see how the two misclassified MB samples (marked with an asterisk) had close likelihoods of being MB and PA according to the classifier, suggesting limited confidence in the final diagnosis. Such information could be valuable as a diagnostic aid for radiologists in practical settings.

² Conventionally, radiologists produce an initial characterisation of the tumour’s appearance on the basis of a combination of their training, experience and individual judgement. The radiologist’s job is not to offer a final diagnosis, as the current gold standard is histopathological examination.

Figure 5.10: A bar plot summarising probabilities assigned to each individual diagnosis by the neural network classifier during LOOCV. Actual class is (a) Medulloblastoma, (b) Pilocytic Astrocytoma and (c) Ependymoma. Note that bars marked with an asterisk indicate a misclassification made by the classifier.
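
One way in which class probabilities of the kind summarised in Figure 5.10 could be turned into a confidence indicator is sketched below. The gap between the two most probable diagnoses is used as a simple margin; the function name, the 0.2 threshold and the probability values are hypothetical and are not taken from the study.

```python
def flag_uncertain(probabilities, margin=0.2):
    """Flag a prediction whose top two class probabilities are close.

    `probabilities` maps each diagnosis to the classifier's probability.
    The `margin` threshold is an illustrative assumption, not a tuned value.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (second_label, second_p) = ranked[0], ranked[1]
    gap = top_p - second_p
    return {
        "predicted": top_label,
        "runner_up": second_label,
        "margin": gap,
        "uncertain": gap < margin,
    }

# Hypothetical outputs for two cases.
print(flag_uncertain({"MB": 0.46, "PA": 0.44, "EP": 0.10}))
print(flag_uncertain({"MB": 0.90, "PA": 0.07, "EP": 0.03}))
```

A case flagged in this way would be presented to the radiologist together with the runner-up diagnosis, rather than as a single confident label.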