3. Calidad, funcionamiento y desarrollo de los SLEP
3.1.1 RESULTADOS SIMCE Y ESTÁNDARES DE APRENDIZAJE
First of all, we are interested in the preservation of type one error of correlation approach in each setting described above. Preservation of type one error is one
important aspect to determine the performance of the method. In dierent settings, we all consider a genomic region in which individuals may have a CNV. The purpose of the analysis is to detect and locate such CNVs associated with a particular phenotype.
The null hypothesis for our association test may hold in one of two ways: (1, No CNV) no individuals with CNVs are present in the sample, or (2, No association) individuals with CNVs are present in the sample, but the CNV does not aect the disease and thus dose not change the probability of developing the phenotype. The eect of transformations of p-values and association direction would be similar. In this section, I focus on the analysis for dierent transformations of p-values in signed case. Simulation results are shown in tables and gures.
4.5.1.1 large n, small J setting
For large n, small J setting, I set n = 1000 and J = 200, which is exactly the same setting as permutation approach. The results for the FWER analysis are summarized in Table 4.1 for dierent transformations of p-values in signed direction of associations.
Table 4.1 perfectly demonstrates that correlation method under such setting is guaranteed to preserves the correct type I error under both null hypothesis for dier-ent transformations in signed directions of association under linear model assumption between phenotype and genotype. This phenomenon is also illustrated graphically in Figure 4.6. Obviously, the correlation method to estimate the whole sample correla-tion is nice to apply since it preserves accurate type one error and p-values under the null appear to be uniformly distributed.
4.5.1.2 small n, large J setting
For small n, large J setting, I set n = 50 and J = 500 for simulation. As details described before, the rank of the estimated sample correlation does not depend on the
Table 4.1: Preservation of Type I error for correlation method in large n, small J
setting for dierent transformations in signed direction of association with nominal α = .05 in two possible settings for which the null hypothesis holds. The simulated genomic region contained 200 markers, 30 of which were spanned by a CNV with signal-to-noise ratio of 2. The CNV was present in either 0% or 50% of the 1000 samples, depending on the null hypothesis setting. A detailed description of the simulation data is given in Section 3.4.
Signed Signed Signed
None Normal Log
No CNV 0.041 0.042 0.046
No Association 0.043 0.046 0.046
No CNV
Figure 4.6: Ability of correlation approach under dierent transformations in signed directions of association for Large n, small J setting to maintain family-wise error rate under the two null scenarios.
number of markers but on the sample size n. Applying singular value decomposition, we simplify the correlation matrix and speed up the calculation. The results for the FWER analysis are concluded in Table 4.2 for signed direction of associations with
dierent transformations of p-values. And the distributions of p-values under two kinds of H0 are shown as well in Figure 4.7.
Table 4.2: Preservation of Type I error for correlation method in small n, large J setting for dierent transformations and directions of association with nominal α = .05 in two possible settings for which the null hypothesis holds. The simulated genomic region contained 500 markers, 30 of which were spanned by a CNV with signal-to-noise ratio of 2. The CNV was present in either 0% or 50% of the 50 samples, depending on the null hypothesis setting. A detailed description of the simulation data is given in Section 3.4.
Signed Signed Signed
None Normal Log
No CNV 0.049 0.049 0.048
No Association 0.046 0.047 0.048
From the table and gure, it is pretty good results in terms of the preservation of type I error and p-values under two null are close to uniform distribution. Therefore, although it is quite diculty to estimate the whole sample correlation matrix using tradition way under small n, large J setting, SVD approach provides a perfect al-ternative to estimate the correlation matrix and conduct the kernel-base aggregation of marker-level association test.
4.5.1.3 large n, large J setting
Similarly, under large n, large J setting, I take n = 300 and the total number of markers J = 500. Applying shrinkage approach, it is necessary to select an optimal number of sparse diagonals and an appropriate shrinkage intensity. As illustrated in Section 4.4.3, the larger the number of sparse diagonal, the more information the sparse correlation matrix includes and thus the more accurate estimate of correlation
No CNV
Figure 4.7: Ability of SVD approach under small n, large J setting for dierent transformations of p-values in signed association to maintain family-wise error rate under the two null scenarios.
matrix, which requires less shrinkage. It is suggested to pick d ≥ 3bw from simula-tions. Figure 4.5 shows that type one error tend to be stable when the number of sparse diagonals is larger than one specic number. Therefore, as we do not know the true CNV size, it will be better to pick a large d to preserve type I error and contain the important correlation part while choosing a smaller d to keep it sparse. For the simulation, I choose a bandwidth of 30 markers for kernel-based method and pick the number of sparse diagonal d = 150 and minimum shrinkage intensity λ = 0.012 to guarantee the sparse and positive-denite correlation matrix. The results of FWER analysis are present in Table 4.3 and Figure 4.8.
Under large n, large J, a lot of approaches can not be applied and thus
corre-Table 4.3: Preservation of Type I error for applying shrinkage approach on correlation method for dierent transformations in signed association with nominal α = .05 in two possible settings for which the null hypothesis holds. The simulated genomic region contained 500 markers, 30 of which were spanned by a CNV with signal-to-noise ratio of 2. The CNV was present in either 0% or 50% of the 300 samples, depending on the null hypothesis setting.
Signed Signed Signed
None Normal Log
No CNV 0.052 0.047 0.038
No Association 0.062 0.059 0.053
No CNV
Figure 4.8: Ability of shrinkage approach under dierent transformations in signed association to maintain family-wise error rate under the two null scenarios.
lation matrix is complicated to estimate. In terms of preservation of type one error, the shrinkage method seems to be a nice choice to estimate the correlation matrix
under this setting when picking a proper d and shrinkage intensity λ.