
When the same hyper-parameter search was carried out on the threshold dataset consisting of the top 4,998 Index SNPs, the findings were considerably harder to interpret. Performance differed markedly across all algorithms depending on the type of input entered (raw allele counts, scaled allele counts, or weighted allele counts). For this reason, the results are presented by input type rather than by split: unlike the figures for the GWAS significant dataset, only one of the five splits is displayed per figure, and the panels represent the different types of input that were entered into the models. All of the results can be seen in figures 3.17 to 3.21. As before, the x axes represent the different values of the hyper-parameters, and the y axes show the ROC score achieved.
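The three input types are not defined in this section, so the following is only a minimal sketch of one plausible construction. It assumes that scaled counts are per-SNP z-scores and that weighted counts multiply each raw count by a per-SNP effect size such as a log odds ratio. Both definitions, and the names `scale_columns`, `weight_columns` and `effect_sizes`, are illustrative assumptions rather than the procedure used here.

```python
from statistics import mean, pstdev

def scale_columns(raw):
    """Z-score each SNP column (assumed meaning of 'scaled allele counts')."""
    columns = list(zip(*raw))
    scaled_cols = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # A monomorphic SNP has zero spread; map it to all zeros.
        scaled_cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def weight_columns(raw, effect_sizes):
    """Multiply each count by its SNP's effect size (assumed meaning of 'weighted')."""
    return [[x * w for x, w in zip(row, effect_sizes)] for row in raw]

# Toy data: 3 samples x 2 SNPs, counts of the risk allele (0, 1 or 2).
raw = [[0, 2], [1, 2], [2, 2]]
effect_sizes = [0.5, -0.25]  # hypothetical per-SNP weights

scaled = scale_columns(raw)
weighted = weight_columns(raw, effect_sizes)
```

Under these assumptions the raw and weighted inputs keep zero as a meaningful value (no copies of the allele), whereas the scaled inputs are centred on zero for every SNP.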

The results for the linear kernel can be seen in figure 3.17. This was probably the most stable of the kernels: the plots show no variation in performance for either the raw or the scaled allele count inputs. For the weighted inputs, however, performance increased as C was reduced. Performance did, in fact, reach an asymptote at the higher value seen, so there was no point in reducing the parameter further. This suggests that, for this input type, the model performs best when it is allowed to generalise more and to tolerate more incorrect classifications in the training set when building the model.

[Figure 3.17 plot. Panels: raw allele counts, scaled allele counts, weighted allele counts. x axis: C. y axis: mean ROC score from cross validation.]

Figure 3.17.: The performance levels of the linear kernel across the different values of C for the three different types of input: raw allele counts, scaled allele counts and the weighted allele counts.

The results for the polynomial kernel across different values of C and γ are displayed in figures 3.18 and 3.19. These show that the non-scaled inputs, both weighted and non-weighted, give similar performance across the different degree values, whereas the scaled inputs show a pattern very similar to that seen in the GWAS significant dataset, with the even degree values performing very poorly and the degree of three performing better. The better performance of the even degrees for the non-scaled inputs could well be the result of an anomaly in the model building, which was highlighted in the XOR example. For this reason it is probably safer to assume that the scaled values are more reliable, but it was considered of interest to report the vast differences in performance that can arise with the different data types. Note that where points overlap in the plots, the differences can be hard to spot. This is because the plotting software (the ggplot2 package for the R language (Wickham, 2009)) draws the points in layers. The points for the higher degrees are therefore placed over those for the lower degrees and, although an attempt has been made to increase the transparency, there was no other way to plot such tightly clustered data and still display the required information. To show that this overlapping of points is happening, a figure of the weighted values for the polynomial kernel across different values of C, with separate panels for the different degree values, is given in figure A.6 in appendix A.

[Figure 3.18 plot. Panels: raw allele counts, scaled allele counts, weighted allele counts. x axis: C. y axis: mean ROC score from cross validation. Legend: polynomial degree (2, 3, 4).]

Figure 3.18.: The performance of the polynomial kernel across the different values of C for the three different types of input: raw allele counts, scaled allele counts and the weighted allele counts.

[Figure 3.19 plot. Panels: raw allele counts, scaled allele counts, weighted allele counts. x axis: γ. y axis: mean ROC score from cross validation. Legend: polynomial degree (2, 3, 4).]

Figure 3.19.: The performance of the polynomial kernel across the different values of γ for the three different types of input: raw allele counts, scaled allele counts and the weighted allele counts.

The RBF kernel performance is shown in figures 3.20 and 3.21. Only the weighted inputs were not highly sensitive to the γ parameter; the other two input types performed above chance levels only for the lowest permitted values of this parameter. It is difficult to know which of the input types would be best to use when interpreting the results from this kernel, as it is more robust than the polynomial kernel to the presence of zero values in the data, as was seen in section 3.2.1 using the XOR examples. It certainly showed more robust performance when the weighted inputs were used, but these levels do not seem to surpass those seen with the linear kernel. In the interest of consistency with the previous dataset, the main focus will be on the scaled allele counts.

[Figure 3.20 plot. Panels: raw allele counts, scaled allele counts, weighted allele counts. x axis: C. y axis: mean ROC score from cross validation.]

Figure 3.20.: The performance of the RBF kernel across the different values of C for the three different types of input: raw allele counts, scaled allele counts and the weighted allele counts.

[Figure 3.21 plot. Panels: raw allele counts, scaled allele counts, weighted allele counts. x axis: γ. y axis: mean ROC score from cross validation.]

Figure 3.21.: The performance of the RBF kernel across the different values of γ for the three different types of input: raw allele counts, scaled allele counts and the weighted allele counts.

The scores and hyper-parameters chosen for the scaled inputs can be seen in table 3.7. Probably the most notable point is how low the γ values are for the non-linear kernels. Because such extreme values were being used, it was of great interest to see how the models performed on the held out data, and whether the performance levels replicated on the OMNI chip data.

Table 3.7.: All results and hyper-parameters for the best performing models for the 4,998 alleles in the larger dataset.

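| Split | Linear score | Linear C | Poly-2 score | Poly-2 C | Poly-2 γ | Poly-3 score | Poly-3 C | Poly-3 γ | Poly-4 score | Poly-4 C | Poly-4 γ | RBF score | RBF C | RBF γ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.594 | 0.002 | 0.4768 | 1.168 | <.0001 | 0.6115 | 0.195 | <.0001 | 0.501 | 1.168 | <.0001 | 0.6359 | 0.195 | <.0001 |
| 2 | 0.594 | 0.167 | 0.4899 | 1.168 | <.0001 | 0.6169 | 1.168 | <.0001 | 0.502 | 0.195 | <.0001 | 0.627 | 0.195 | <.0001 |
| 3 | 0.603 | 0.002 | 0.494 | 2.44 | 0.002 | 0.6189 | 0.195 | <.0001 | 0.505 | 0.195 | <.0001 | 0.642 | 0.195 | <.0001 |
| 4 | 0.603 | 0.002 | 0.484 | 1.168 | <.0001 | 0.614 | 0.195 | <.0001 | 0.503 | 1.168 | <.0001 | 0.63 | 0.195 | <.0001 |
| 5 | 0.601 | 0.002 | 0.489 | 1.168 | <.0001 | 0.6116 | 1.168 | <.0001 | 0.503 | 0.195 | <.0001 | 0.6329 | 1.168 | <.0001 |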

Performance on the held out test data

As was done for the GWAS significant dataset, the models for all of the splits were tested on the 10% of the samples that were held out each time, using the models built with the scaled allele counts. Again, the panels represent the five different splits, and the bars show the single score obtained when testing the optimum models on the 10% held out test data. The barplots can be seen in figure 3.22. It is immediately evident that none of the kernels performed at the level of the polygenic score (shown with the green bars). Once again, the even degrees of the polynomial show a distinct drop in performance, but the linear kernel no longer shows the same dominance over the others. In every split, it is beaten by either the RBF kernel, the polynomial-3 kernel, or both.

[Figure 3.22 plot. Panels: split 1 to split 5. y axis: score. Legend: algorithm (polygenic score, linear, polynomial 2, polynomial 3, polynomial 4, RBF).]

Figure 3.22.: The performance of the best models on the 10% of held out data from each split for the larger dataset.
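Assuming that the "ROC score" reported throughout is the area under the ROC curve, each bar corresponds to a single number computed from a model's scores on the held out samples. A minimal pure-Python sketch of that calculation, using the pairwise (Mann-Whitney) formulation, is given here; the function name and the toy data are illustrative only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count as half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy held out set: decision scores from any model, 1 = case, 0 = control.
held_out_scores = [0.9, 0.4, 0.35, 0.8, 0.1]
held_out_labels = [1, 0, 1, 0, 0]
auc = roc_auc(held_out_scores, held_out_labels)
```

On this measure a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.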

Permutation procedure for all algorithms

Again, the results seen for the held out data showed the same pattern between the different algorithms as those seen during the CV process. The similarities between the test and CV results can be seen in table 3.8. The Mean CV Score column is the mean of the score columns for the respective algorithms in table 3.7, and the Mean test score column shows the averages of the test scores across the five splits, that is, the numbers represented by the bar heights in figure 3.22. As the results presented so far comprise only the hyper-parameter search and the single point scores from the held out data for the five splits, a permutation procedure similar to that used for the GWAS significant dataset was performed to show distributions of performance for the different algorithms.

Table 3.8.: The mean results of the scores from the CV procedure and the 10% test data splits for all of the algorithms using the 4,998 SNPs.

| Algorithm | Mean CV Score | Mean test score |
|---|---|---|
| Polygenic Score | N/A | 0.703 |
| Linear | 0.599 | 0.6 |
| RBF | 0.634 | 0.649 |
| Polynomial-2 | 0.487 | 0.491 |
| Polynomial-3 | 0.615 | 0.625 |
| Polynomial-4 | 0.503 | 0.508 |
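The Mean CV Score column can be checked directly against table 3.7. The short sketch below averages the five per-split scores for each kernel and rounds to three decimal places; the dictionary layout is just a convenient transcription of the table.

```python
from statistics import mean

# Per-split CV scores for the scaled inputs, transcribed from table 3.7.
cv_scores = {
    "Linear":       [0.594, 0.594, 0.603, 0.603, 0.601],
    "Polynomial-2": [0.4768, 0.4899, 0.494, 0.484, 0.489],
    "Polynomial-3": [0.6115, 0.6169, 0.6189, 0.614, 0.6116],
    "Polynomial-4": [0.501, 0.502, 0.505, 0.503, 0.503],
    "RBF":          [0.6359, 0.627, 0.642, 0.63, 0.6329],
}

mean_cv = {name: round(mean(scores), 3) for name, scores in cv_scores.items()}

for name, value in mean_cv.items():
    print(f"{name}: {value}")
```

The values reproduce the Mean CV Score column of table 3.8 (0.599, 0.487, 0.615, 0.503 and 0.634 respectively).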

The parameters for the permutations were set to match the best performances that were observed. For this threshold dataset, with the scaled allele counts, performance was actually quite robust across the different parameters, but there were still trends towards higher scores. The linear kernel displayed a small preference for very small values of C; the non-linear kernels favoured lower values of γ but were indifferent to the value of C. In scikit-learn, the default value for C is 1, and the default for γ was taken to be the reciprocal of the sample size used in the training set, $\frac{1}{n} = \frac{1}{0.75 \times 7731} = 0.000172$. This value matches the best performing values, so was deemed to be sufficient. Therefore, for this procedure, the default parameters were used with the exception of the linear kernel, which had a value of 0.002 for C, as this was most frequently the best parameter (table 3.7). The main intention here was to see whether the increase in performance for the RBF and polynomial-3 kernels would be consistent across different train/test permutations. As can be seen in figure 3.23, this was indeed the case, with the green and blue boxes of the RBF and polynomial-3 kernels positioned higher than the mustard-yellow box for the linear kernel. The median values shown by the centre lines in the boxes were as follows: linear 0.6008, RBF 0.6354, polynomial-3 0.6164. The differences between all pairs of algorithms were highly statistically significant on repeated pairwise t-tests, with all combinations showing $p < 2 \times 10^{-16}$ after Bonferroni correction for multiple comparison testing.
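The γ value quoted in this paragraph is simple arithmetic on the training-set size, reproduced in the sketch below. As a caveat, scikit-learn documents its `gamma='auto'` setting as the reciprocal of the number of features, and its `gamma='scale'` setting as also depending on the feature variance, so the effective default depends on the library version; passing γ explicitly avoids that ambiguity. The variable names are illustrative, and the feature-based value is shown for comparison only.

```python
n_samples = 7731        # total samples, as quoted in the text
train_fraction = 0.75   # share of samples used for training
n_features = 4998       # number of SNPs in the threshold dataset

n_train = train_fraction * n_samples   # 5798.25
gamma = 1.0 / n_train                  # reciprocal of the training-set size

# For comparison only: reciprocal of the number of features.
gamma_by_features = 1.0 / n_features

print(f"n_train = {n_train}, gamma = {gamma:.6f}")
print(f"1 / n_features = {gamma_by_features:.6f}")
```

Both values are of the same order of magnitude and fall in the low range of γ that the hyper-parameter search favoured for the non-linear kernels.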

All of the SVMs are greatly outperformed by the polygenic score analysis. The polygenic score is itself a linear combination of the features, so while the machine learning results alone suggest a role for interactions between the SNPs, the superior performance of the polygenic score method makes it difficult to claim that including information from interactions improves predictive power.

[Figure 3.23 plot. Axis: ROC score. Legend: algorithm (polygenic score, linear, RBF, polynomial 2, polynomial 3, polynomial 4).]

Figure 3.23.: Box plots showing the distributions of the 100 train/test permutations for the larger dataset for the I1M chip only.
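The permutation procedure can be pictured as repeatedly reshuffling the samples into new training and test sets and refitting each model. The sketch below only generates the index splits. It uses 100 permutations and a 75%/25% split of 7,731 samples, figures taken from the text, while the seed, the function name and the use of simple unstratified shuffling are assumptions made for illustration.

```python
import random

def permutation_splits(n_samples, train_fraction=0.75, n_permutations=100, seed=0):
    """Yield (train_idx, test_idx) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_train = int(train_fraction * n_samples)
    for _ in range(n_permutations):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]

splits = list(permutation_splits(7731))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))  # 100 5798 1933
```

Each model would then be fitted on `train_idx` and scored on `test_idx`, giving 100 scores per algorithm from which a box plot can be drawn.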

Inclusion of OMNI chip information

As was done with the GWAS significant dataset, these models were then tested on the samples of the OMNI dataset. The results show a similar pattern to those seen in the held out data and can be seen in table 3.9. The median score from the permutation procedure shown in figure 3.23 is shown in the third column. These values are the numbers represented by the centre line in the boxes.

Table 3.9.: The performance of the models for the 4,998 SNPs on the OMNI data, and how this compares with the median scores seen in the box plot.

| Algorithm | Score on OMNI data | Median Score on I1M data |
|---|---|---|
| Polygenic Score | 0.694 | 0.697 |
| Linear | 0.614 | 0.6 |
| Polynomial 2 | 0.497 | 0.49 |
| Polynomial 3 | 0.62 | 0.616 |
| Polynomial 4 | 0.506 | 0.496 |
| RBF | 0.6487 | 0.635 |

As these results suggest that it is safe to combine the data from the two chips together for the increased number of SNPs, the same permutation procedure was carried out with the combined data and is shown in figure 3.24. An immediate aspect to note in this figure is that the performance of the SVM models increases substantially with the inclusion of the extra OMNI data. This effect is not seen at all for the permutations of the polygenic score in the leftmost panel. In fact, the only effect seen there is that the variance of the distribution of scores has narrowed. In this plot, the RBF kernel has been moved to be positioned next to the linear kernel to allow for easier comparison.

[Figure 3.24 plot. Panels: polygenic score, linear, RBF, polynomial 2, polynomial 3, polynomial 4. Axis: ROC score. Legend: chip (I1M, both chips).]

Figure 3.24.: Box plot showing the distributions of results for all the algorithms on the larger dataset of 4998 SNPs. The results show the performances for both the I1M chip only and the combined chip data.

This figure shows that all of the SVM kernels benefited from the additional samples from the OMNI chip, but the polygenic score did not. This observation was supported by the results of independent t-tests for all of the different algorithms, as seen in table 3.10. There were also significant differences between all pairwise comparisons of the different algorithms using the information from both chips, on repeated pairwise t-tests. All of these tests showed $p < 2.2 \times 10^{-16}$ after Bonferroni correction.
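Bonferroni correction multiplies each p-value by the number of comparisons made, capping the result at 1. A minimal sketch follows. It assumes that the pairwise tests cover all six algorithms shown in figure 3.24, giving 15 comparisons, which is an inference from the figure rather than a number stated in the text, and the p-values are invented purely for illustration.

```python
from itertools import combinations

algorithms = ["Polygenic Score", "Linear", "RBF",
              "Polynomial 2", "Polynomial 3", "Polynomial 4"]

# Every unordered pair of algorithms is one comparison.
pairs = list(combinations(algorithms, 2))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented raw p-values, one per pair, for illustration only.
raw_p = [1e-20] * len(pairs)
adjusted = bonferroni(raw_p)
print(len(pairs), max(adjusted))
```

With very small raw p-values, multiplying by 15 leaves them far below any conventional threshold, which is why the correction does not change the conclusions reported here.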

Table 3.10.: Results of independent t-tests looking at the differences between using information from both chips, and the information from the I1M chip only, for the 4,998 SNPs.

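| Algorithm | t-value (df = 198) | p-value |
|---|---|---|
| Polygenic Score | -0.6 | 0.55 |
| Linear | 16.227 | $< 2.2 \times 10^{-16}$ |
| RBF | 20.03 | $< 2.2 \times 10^{-16}$ |
| Polynomial-2 | 7 | $< 3.97 \times 10^{-11}$ |
| Polynomial-3 | 19.94 | $< 2.2 \times 10^{-16}$ |
| Polynomial-4 | 9.5 | $< 2.2 \times 10^{-16}$ |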
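The 198 degrees of freedom are consistent with comparing two groups of 100 permutation scores using a pooled-variance independent t-test (100 + 100 - 2 = 198). The sketch below computes the statistic in that form. The pooled-variance choice, the direction of the difference (both chips minus I1M only) and the toy numbers are assumptions, not details confirmed by the text.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Toy scores standing in for 100 permutations per condition.
both_chips = [0.62 + 0.001 * (i % 10) for i in range(100)]
i1m_only = [0.60 + 0.001 * (i % 10) for i in range(100)]

t_value, df = independent_t(both_chips, i1m_only)
print(f"t({df}) = {t_value:.2f}")
```

Under this sign convention a positive t-value means higher scores with both chips, matching the pattern for the SVM kernels in table 3.10, while a value near zero, as for the polygenic score, indicates no detectable change.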
