
76 CHAPTER 5. RESULTS OF THE ANALYSIS

The test was first applied to the estimates from the bolasso method discussed in section 5.1. Only the estimates that bolasso set to a non-zero value had their significance assessed by the covariance test. For this step, therefore, the data matrix of each dataset included only the columns corresponding to the non-zero bolasso estimates. Before applying the covariance test, the true λ-knots had to be recomputed for those estimates with respect to the new data matrices. This was again done by the predictor-corrector algorithm. This time, however, no bootstrap was needed and the data matrices were smaller. Therefore, the complete predictor-corrector algorithm was used rather than the modified one.
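The column restriction described above can be sketched as follows. This is only an illustration on toy data: the function name `restrict_to_selected`, the tolerance argument and the example arrays are assumptions made here and are not taken from the analysis itself.

```python
import numpy as np

def restrict_to_selected(X, beta_bolasso, tol=0.0):
    """Keep only the columns of X whose bolasso estimate is non-zero.

    Returns the reduced matrix and the indices of the retained columns,
    so that results can be mapped back to the original variables.
    """
    beta_bolasso = np.asarray(beta_bolasso, dtype=float)
    selected = np.flatnonzero(np.abs(beta_bolasso) > tol)
    return X[:, selected], selected

# Toy data (hypothetical, not the data of this chapter).
X = np.arange(12, dtype=float).reshape(3, 4)
beta_bolasso = np.array([0.0, 1.5, 0.0, -0.2])

X_reduced, selected = restrict_to_selected(X, beta_bolasso)
print(selected)         # indices of the retained columns
print(X_reduced.shape)  # rows unchanged, fewer columns
```

The λ-knots would then be recomputed on `X_reduced` rather than on the full matrix.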

The p-values from the covariance test are shown in figure 5.7. The red line marks the 0.05 significance level on the y-axis. The order of the p-values along the x-axis corresponds to the order in which the coefficients enter the model: the first value belongs to the first coefficient to enter, and so on. According to Lockhart et al. [15], coefficients with a p-value larger than the chosen significance level are considered non-significant given the others already in the model. As previously stated, the interpretation of these p-values is the subject of considerable ongoing discussion. For now, however, we shall use the usual interpretation.

Coefficients with a p-value smaller than 0.05 are considered significant; that is, they should be kept in the model, while the others need not be. The total number of significant coefficients is higher for the N RD dataset than for the RD dataset, which is reasonable. The two significant coefficients in the RD dataset may be due to non-random structure that was generated by accident. This, however, cannot be investigated any further. The main conclusion from the figure is that the N RD dataset has more significant coefficients and that bolasso has indeed found them.
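The selection rule used here can be sketched in a few lines. The p-values and coefficient indices below are made up for illustration, and the function name `significant_coefficients` is hypothetical; only the rule itself (keep a coefficient when its p-value is below α = 0.05, with p-values listed in order of entry into the model) comes from the text.

```python
import numpy as np

def significant_coefficients(p_values, entry_order, alpha=0.05):
    """Return the coefficient indices whose p-value is below alpha.

    p_values[k] is the p-value of the k-th coefficient to enter the model,
    and entry_order[k] is the index of that coefficient.
    """
    p_values = np.asarray(p_values, dtype=float)
    entry_order = np.asarray(entry_order)
    keep = p_values < alpha
    return entry_order[keep]

# Hypothetical p-values, listed in order of entry into the model.
p_values = [0.001, 0.20, 0.03, 0.70]
entry_order = [7, 2, 5, 0]

sig = significant_coefficients(p_values, entry_order)
print(sig)  # indices of the coefficients kept at the 0.05 level
```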

Figure 5.7: This figure shows the p-values from the covariance test for the bolasso estimates. The right figure is from the N RD dataset, while the left is from the RD dataset. The red line in both figures is placed at y = 0.05, assuming a significance level of α = 0.05. For the N RD dataset, the bolasso chose 64 coefficients as non-zero, of which 19 are considered significant by the covariance test. For the RD dataset, the bolasso chose 33 coefficients as non-zero, of which only 2 are considered significant by the covariance test.

Taking into account only the significant coefficients, the lasso paths were computed again. Figure 5.8 shows the lasso paths of the significant bolasso coefficients. The left one is from the N RD dataset and the right one is from the RD dataset. We now clearly see differences from figure 5.6, and the background frequencies that were common to both datasets have disappeared. This was expected: even though the background frequencies were the same for the two datasets, the coefficients from N RD had weights for causing the MI event and are therefore significant.

5.2. SIGNIFICANCE TESTING 77

Figure 5.8: This figure gives a further reduction of figures 5.1 and 5.6, where only the significant coefficients are taken into account. Again, the left figure is from the N RD dataset while the right is from the RD dataset. Each coloured line in the plot corresponds to one coefficient. The y-axis is the value of the estimated coefficient and the x-axis consists of the λ-knots, computed by the predictor-corrector algorithm and placed in descending order.

Figure 5.9 shows only the significant estimates of the bolasso method. What is noteworthy here is that none of the remaining coefficients has a confidence interval that includes zero. That is, none of the bolasso estimates that took the value zero at least once among the bootstrap samples is considered significant. This applies to both datasets. Note, however, that, as stated in section 4.1, 100 drugs were given weights for the N RD dataset. Here, both the significant coefficients and those in the bolasso output number clearly fewer than 100. This has probably happened because of the reduction step in figure 4.2. At that step, we could not ensure that all of the 100 drugs would remain in the N RD dataset.

Figure 5.9: This figure is a reduced version of figure 5.5, where the non-significant estimates have been removed. The left figure corresponds to the N RD dataset, while the right figure corresponds to the RD dataset. The red points are the estimated coefficients from the bolasso, computed by taking the mean among the bootstrap samples. The black lines correspond to 95% confidence intervals of each estimate, which were obtained from the bolasso.
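The quantities plotted in figure 5.9 can be illustrated with a small sketch: the mean of each coefficient across bootstrap samples, an interval around it, and a check of whether that interval contains zero. Percentile intervals are an assumption made here, since the text does not say how the bolasso intervals were constructed; the function name `bootstrap_summary` and the simulated draws are likewise hypothetical.

```python
import numpy as np

def bootstrap_summary(boot_estimates, level=0.95):
    """Summarise bootstrap estimates (rows = bootstrap samples, columns = coefficients).

    Returns the bootstrap mean, the lower and upper percentile limits,
    and a boolean flag per coefficient that is True when the interval
    does not contain zero.
    """
    boot_estimates = np.asarray(boot_estimates, dtype=float)
    tail = (1.0 - level) / 2.0
    mean = boot_estimates.mean(axis=0)
    lower = np.quantile(boot_estimates, tail, axis=0)
    upper = np.quantile(boot_estimates, 1.0 - tail, axis=0)
    excludes_zero = (lower > 0.0) | (upper < 0.0)
    return mean, lower, upper, excludes_zero

# Simulated bootstrap draws (hypothetical): one coefficient far from zero,
# one centred on zero.
rng = np.random.default_rng(0)
boot = np.column_stack([
    rng.normal(2.0, 0.1, size=200),
    rng.normal(0.0, 0.1, size=200),
])

mean, lower, upper, excludes_zero = bootstrap_summary(boot)
print(excludes_zero)
```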

So far, the covariance test seems to work quite well. Any confusion that could have been caused by the intervals that included zero is now gone. Furthermore, it is quite remarkable that not all of the bolasso estimates are significant. Since the lasso method, in general, sets some coefficients exactly to zero, one would expect bolasso to set all the non-significant estimates to zero. This, however, does not seem to be the case.

Finally, the significant estimates were selected for each dataset and re-estimated. The re-estimation was done using cyclic coordinate descent on the matrices that contained only the columns of the significant coefficients. Furthermore, the estimation was done using λ = 0 this time, which corresponds to the usual maximum likelihood [15]. These are the final estimates for the two datasets from the bolasso method and will be presented in section 5.2.3.
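To illustrate why λ = 0 removes the penalty, the following is a minimal sketch of cyclic coordinate descent. It assumes a squared-error loss purely for illustration, so that λ = 0 reduces to ordinary least squares (maximum likelihood under Gaussian errors); the likelihood actually used in the analysis may differ. The function names and the simulated data are assumptions introduced here.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator; with gamma = 0 it returns z unchanged."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def cyclic_coordinate_descent(X, y, lam=0.0, n_sweeps=1000, tol=1e-10):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual, i.e. the
            # residual with column j's current contribution added back.
            rho = X[:, j] @ residual / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual -= X[:, j] * (beta[j] - old)
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta

# Simulated data (hypothetical): three retained columns, no penalty.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

beta_cd = cyclic_coordinate_descent(X, y, lam=0.0)
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_cd, beta_ls, atol=1e-6))
```

With `lam=0.0` the soft-thresholding step leaves each update unshrunk, so the iterations converge to the unpenalised fit on the retained columns.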