In section 6.4 we presented an extended feature-set for each patient, derived from the extensive feature selection experiments presented in the same chapter. In this experiment, we assess the predictability of the seizure state in each Patient-File using an extended feature-set of up to 204 extracted features, spanning all 6 EEG channel recordings. The outcome of this experiment will reveal the effect of an extended feature-set, comprising both new and original features, on the predictability of the seizure state of the brain.
7.4.1 Methods for Advance Prediction of Seizures on Extended Feature-set
The extended feature-set developed in chapter 6 (see section 6.4) is used as the new dataset for this experiment. From each channel, a further 20 features were extracted, increasing the number of features per channel to 34 and growing the feature-vector to f × 204 elements (6 channels × 34 features). This increase in the size of the feature matrix incurs additional computational cost. The experiment was distributed over a cluster of 8-core 64-bit CentOS machines, with each machine in the cluster running 8 Matlab pools in parallel. By modularising the experiment into several smaller workloads, each a single run of training and classification for one patient, each predictor module was implemented in a single thread, eliminating the need to distribute the model building over different workstations.
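The sizing and workload split described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiment: the job granularity (one job per patient and timepoint), the number of machines, the patient and timepoint ranges, and all function names are assumptions. Only the 6 channels, the 34 features per channel and the 8 pools per machine come from the text.

```python
from itertools import product

N_CHANNELS = 6               # EEG channels (from the text)
FEATURES_PER_CHANNEL = 34    # 14 original + 20 new features (from the text)
WORKERS_PER_MACHINE = 8      # Matlab pools per machine (from the text)


def feature_vector_length(n_channels=N_CHANNELS, per_channel=FEATURES_PER_CHANNEL):
    """Number of features in one multi-channel feature vector."""
    return n_channels * per_channel


def build_jobs(patient_ids, timepoints):
    """One independent, single-threaded job per (patient, timepoint) pair.

    ASSUMPTION: the text only says the experiment was split into single
    runs of training and classification per patient.
    """
    return list(product(patient_ids, timepoints))


def assign_round_robin(jobs, n_workers):
    """Deal jobs out in turn so worker loads differ by at most one job."""
    buckets = [[] for _ in range(n_workers)]
    for i, job in enumerate(jobs):
        buckets[i % n_workers].append(job)
    return buckets


# Hypothetical sizes, for illustration only.
patients = range(1, 22)      # 21 patients
timepoints = range(0, 21)    # t = 0 .. 20
n_machines = 4               # ASSUMPTION: cluster size is not given in the text

jobs = build_jobs(patients, timepoints)
buckets = assign_round_robin(jobs, n_machines * WORKERS_PER_MACHINE)

print(feature_vector_length())   # 204
print(len(jobs), [len(b) for b in buckets][:4])
```

Because every job is self-contained, no model building has to be shared between workers; each worker simply processes its own bucket.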
7.4.2 Results of Advance Seizure Prediction on Extended Feature-set
The four performance measures, Accuracy, Sensitivity, Specificity and S1-Score, of Delete are presented in Figure 7.8 and Table 7.6. Each curve in Figure 7.8 measures the performance of prediction on the extended feature-set, averaged over all the patient predictors (except patients 3, 10 and 14, which were removed as outliers).
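For reference, the measures named above can be computed from confusion-matrix counts as in the sketch below. Accuracy, Sensitivity and Specificity follow their standard definitions. The S1-Score is not redefined in this section, so the harmonic mean of Sensitivity and Specificity used here is only an assumption, as are the function name and the example counts.

```python
def performance_measures(tp, tn, fp, fn):
    """Return the four measures, in percent, from confusion-matrix counts."""

    def ratio(num, den):
        # Avoid division by zero when a class is absent from the data.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)   # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    # ASSUMPTION: S1-Score taken as the harmonic mean of Sensitivity and
    # Specificity; the source defines it elsewhere.
    s1 = ratio(2 * sensitivity * specificity, sensitivity + specificity)
    return {
        "ACC": 100 * accuracy,
        "SS": 100 * sensitivity,
        "SP": 100 * specificity,
        "S1": 100 * s1,
    }


# Hypothetical counts, for illustration only.
scores = performance_measures(tp=80, tn=990, fp=10, fn=20)
print({name: round(value, 2) for name, value in scores.items()})
```

Under this assumed definition the combined score is pulled towards the lower of its two inputs, which would be consistent with the S1-Score tracking Sensitivity when Specificity is close to 100%.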
In the results of Delete, the lines diverge from the patterns we have seen throughout this chapter. The Specificity is the highest we have seen thus far, with a mean of 99.70%, which indicates that the additional features improve this measure. Accuracy is also consistently high, although the line slopes downward towards the later timepoints. The Sensitivity varies over time, as we have seen in previous results; however, its lowest values are just below 70%.
The S1-Score once again mimics the behaviour of Sensitivity, the more variable of the two measures. The maximum S1-Score is 88.94% at t = 0, which indicates that prediction does not perform better than seizure detection. This maximum, however, is the lowest we have seen throughout the experiments in this chapter. The shape of the S1-Score curve also differs from what we have seen in other experimental results. The initial dip is sharper, bottoming out at timepoint t = 3 at a value approximately 9% lower. After the initial dip, the line trends upward, fluctuating between local maxima and minima, until it dips at timepoint t = 18 to its minimum value of 78.30%. It then ascends until it reaches the final timepoint at 82.90% (Table 7.6). The S1-Score line is fairly smooth between the dips and peaks, particularly at timepoints 12, 13, 14 and 15.
          ACC     t     SP      t     SS      t     S1      t
min       93.32   19    99.61   19    66.90   18    78.30   18
max       94.40   5     99.78   15    81.84   0     88.94   0
t = 0     94.27   0     99.62   0     81.84   0     88.94   0
t = 20    93.49   20    99.66   20    73.39   20    82.90   20
mean      94.05         99.70         73.33         83.05
median    94.09         99.71         73.40         83.17
mode      93.32         99.61         66.90         78.30
std       0.33          0.04          3.25          2.42
range     1.09          0.17          14.95         10.63
Table 7.6 Summary of important data statistics from the stepwise advance seizure prediction by Delete on 18 Multi-Channel Extended Feature-Set patients.
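The rows of Table 7.6 are ordinary descriptive statistics taken over the timepoints of one performance curve. A minimal sketch of how such a summary could be produced is given below; the curve values are hypothetical, the function name is illustrative, and two conventions are assumptions: the sample standard deviation is used, and when several modes exist the smallest is reported (which would explain why the mode row coincides with the min row when all values are distinct).

```python
import statistics


def summarise_curve(values):
    """Summarise one curve; values[i] is the score at timepoint t = i."""
    t_min = min(range(len(values)), key=values.__getitem__)
    t_max = max(range(len(values)), key=values.__getitem__)
    return {
        "min": (values[t_min], t_min),      # (value, timepoint)
        "max": (values[t_max], t_max),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # ASSUMPTION: report the smallest of multiple modes.
        "mode": min(statistics.multimode(values)),
        # ASSUMPTION: sample (n - 1) standard deviation.
        "std": statistics.stdev(values),
        "range": values[t_max] - values[t_min],
    }


# Hypothetical S1-Score curve over five timepoints, for illustration only.
curve = [88.9, 84.0, 80.0, 79.5, 83.0]
summary = summarise_curve(curve)
for name, value in summary.items():
    print(name, value)
```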
Inter-Patient Variability
In the box and whisker diagrams of the Delete experiment (Figure 7.9), the variability among distinct timepoints is quite low, except for the timepoints in the range 0 ≤ t ≤ 3, where the initial dip seen in Figure 7.8 takes place. Most boxes are long, with some spanning a range of 20%, indicating high variability among the predictors' performance at each timepoint. The median line is either centred or near the top of the box, indicating respectively that the underlying population is symmetric or left-skewed. This means that, despite the variability between some predictor values at various timepoints, the values at and above the median are densely populated and the variability lies mainly in the lower quartile. The values are among the lowest we have seen so far for the Delete experiments, with the lowest whisker in the range [48%, 60%]. Although outliers were removed from the average performance analysis, some of them performed extremely poorly, with values as low as 17%. The outliers for timepoint t = 0 (detection at seizure onset) are patients 13 and 2. Patients 14, 10 and 3 do not seem to appear among the outliers, suggesting that the extended feature-set may have improved their overall performance.
Table 7.7 summarises the paired t-tests between several predictive timepoints over all patients in the extended feature-set predictive experiments. The comparison of t = 1 and t = 0 shows a significant difference, which confirms the variation observed at these moments in Figure 7.8. We also examine t = 20 vs. t = 18 and t = 8 vs. t = 3, each pairing a local maximum with a local minimum in Figure 7.8. The results of these t-tests suggest that there are no significant differences between these moments, which in turn implies that, beyond the initial dip, the timepoints in the extended feature-set setting do not differ markedly from one another. The extended feature-set, as with the multi-channel setting, produces a more constant outcome than the single-channel scenario.
                        Paired Differences
                                   95% Confidence Interval
                                   of the Difference
Pair         Mean      Stdev      Lower      Upper      t         df    Sig. (2-tailed)
T=0,T=1      -2.733    2.819      -4.017     -1.450     -4.443    20    0.000
T=20,T=18    -2.694    7.478      -6.098      0.710     -1.651    20    0.114
T=8,T=3       1.939    9.165      -2.233      6.111      0.969    20    0.344
Table 7.7 Paired t-test of the mean S1-Score of the extended feature-set (204 features) over all patients at several timepoints. The test examines t=0 vs. t=1, t=20 vs. t=18 and t=8 vs. t=3. The values in bold indicate p≤.05 and are considered statistically significant.
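The t statistics and confidence intervals in Table 7.7 can be checked from the reported Mean and Stdev alone, as sketched below. The sketch assumes n = 21 paired observations (df + 1), takes the sign of each mean difference to match its reported t statistic and confidence interval, and hard-codes the two-tailed 95% critical value for 20 degrees of freedom (about 2.086). The function name is illustrative.

```python
import math

T_CRIT_DF20 = 2.086  # two-tailed 95% critical value of Student's t, df = 20


def paired_t_summary(mean_diff, sd_diff, n, t_crit=T_CRIT_DF20):
    """Return (t, df, lower, upper) for a paired t-test from summary statistics."""
    se = sd_diff / math.sqrt(n)        # standard error of the mean difference
    t_stat = mean_diff / se
    half_width = t_crit * se           # half-width of the 95% confidence interval
    return t_stat, n - 1, mean_diff - half_width, mean_diff + half_width


# Mean and Stdev as reported in Table 7.7; signs follow the reported t values.
pairs = {
    "T=0,T=1": (-2.733, 2.819),
    "T=20,T=18": (-2.694, 7.478),
    "T=8,T=3": (1.939, 9.165),
}

results = {name: paired_t_summary(m, s, n=21) for name, (m, s) in pairs.items()}
for name, (t_stat, df, lower, upper) in results.items():
    print(f"{name}: t = {t_stat:.3f}, df = {df}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Only the first interval excludes zero, which agrees with t = 0 vs. t = 1 being the only significant comparison.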
Figure 7.8 Summary of stepwise advance prediction by Delete on 18 Multi-Channel Extended Feature-Set patients.
Figure 7.9 The Box and Whiskers diagram for stepwise advance prediction by Delete on 21 Multi-Channel Extended Feature-Set patients.
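The quantities read off a box and whisker diagram such as Figure 7.9 (median, quartiles, whiskers and outliers) can be computed per timepoint as in the sketch below. The whisker convention is not stated in the text, so the common 1.5 × IQR rule is assumed; the per-patient scores and the function name are hypothetical.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Quartiles, whisker ends and outliers for one timepoint's scores."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # ASSUMPTION: Tukey fences at 1.5 * IQR beyond the quartiles.
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
        # A median nearer the top of the box suggests a left-skewed population.
        "median_near_top": (q3 - median) < (median - q1),
    }


# Hypothetical per-patient S1-Scores (%) at one timepoint, for illustration only.
scores_at_t = [17.0, 55.0, 78.0, 82.0, 85.0, 88.0, 90.0, 91.0, 93.0]
stats = box_stats(scores_at_t)
print(stats)
```

In this hypothetical sample the two lowest scores fall below the lower fence and are reported as outliers, while the median sits above the centre of the box, matching the left-skewed pattern described in the text.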
7.4.3 Discussion on Advance Prediction of Seizures on Extended Feature-set
In this experiment, we used a 204 dimensional feature-set with a mixture of the original feature-set (Table 6.1) and new features (Table 6.12) introduced in chapter 6. We saw in experiment II of this chapter that the introduction of additional features extracted from all recorded EEG channels did not improve advance prediction. In this experiment we used a far larger feature-set which is yet again derived from all 6 EEG channels, but with 20 additional features introduced per channel.
The Delete algorithm produced a similar S1-Score trend to that seen in the previous experiment, this time with larger oscillations between time steps. The overall range of values was not poor, with the S1-Score within [78.30%, 88.94%]. This reveals that i) regardless of changes to the feature-set, a common S1-Score trend over time-steps can be observed throughout all experiments seen thus far, supporting the hypothesis that seizure activity markers exist several minutes before the actual onset; and ii) the use of more features does not improve the predictability of our model, though the outcome is still relatively high. The full effect of introducing new features is not evident in this experiment, as it is overshadowed by the noise induced by the high number of features. The effects of a reduced subset of the new feature-set will be evaluated in a later section of this chapter.
7.5 Experiment IV: Advance Seizure Prediction on Subset of Extended Feature-