After quality control of the study data, no individuals were excluded under the filter requiring at least 90% of variants to be genotyped successfully. At the marker level, 839,176 variants had poor imputation quality (IMPUTE2 INFO < 0.5), 957,399 variants had minor allele frequencies (MAFs) below 0.01, and three variants had HWE p-values below 1 × 10⁻⁶ in females. Because some variants failed more than one filter, a total of 983,033 variants fell outside these quality control thresholds and were excluded from analysis, leaving 267,185 variants passing quality control.
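The marker-level filters above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the `Variant` record, its field names and the toy variants are hypothetical, and only the three thresholds are taken from the text. It also shows why the per-filter counts need not sum to the number of excluded variants.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds quoted in the text
INFO_MIN = 0.5      # IMPUTE2 INFO score
MAF_MIN = 0.01      # minor allele frequency
HWE_P_MIN = 1e-6    # HWE p-value (tested in females)


@dataclass(frozen=True)
class Variant:
    # Hypothetical record layout, for illustration only
    rsid: str
    info: float
    maf: float
    hwe_p_females: float


def failed_filters(variant):
    """Return the names of every QC filter the variant fails."""
    failures = []
    if variant.info < INFO_MIN:
        failures.append("INFO")
    if variant.maf < MAF_MIN:
        failures.append("MAF")
    if variant.hwe_p_females < HWE_P_MIN:
        failures.append("HWE")
    return failures


def apply_qc(variants):
    """Split variants into those passing QC and a per-filter failure tally."""
    per_filter = Counter()
    passed = []
    for variant in variants:
        failures = failed_filters(variant)
        per_filter.update(failures)
        if not failures:
            passed.append(variant)
    return passed, per_filter


# Toy data (not from the study)
toy_variants = [
    Variant("var_a", 0.95, 0.200, 0.40),  # passes all filters
    Variant("var_b", 0.30, 0.005, 0.70),  # fails INFO and MAF
    Variant("var_c", 0.80, 0.004, 0.50),  # fails MAF only
    Variant("var_d", 0.90, 0.150, 1e-9),  # fails HWE only
]
passed, per_filter = apply_qc(toy_variants)
n_excluded = len(toy_variants) - len(passed)
```

In the toy data, four filter failures are recorded but only three variants are excluded, because `var_b` fails two filters. The same overlap explains why 839,176 + 957,399 + 3 exceeds the 983,033 variants excluded in the study.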
3.3.1 Single Marker Tests
Linear regression was performed separately for males and females for each of the 267,185 chromosome X variants that passed quality control. In these individual analyses, no variants achieved genome-wide significant association (P < 5 × 10⁻⁸). A single variant (rs188930264) demonstrated a p-value of less than 1 × 10⁻⁵ for association with MSE at age 15 years in females (Table 3.2). No variants demonstrated P < 1 × 10⁻⁵ in the male-only analysis.
Table 3.2: Summary of results for variants on chromosome X achieving association test P-values less than 1 × 10⁻⁵ (females). EAF = effect allele frequency; Effect = effect size per copy of Effect Allele; SE = standard error of beta estimate; N = 2,146.
Variant Position Effect Allele Other Allele EAF Effect SE P-value Nearest Gene
Figure 3.2: Chromosome X quantile-quantile plot from sex-specific linear regression association tests with MSE at age 15 years. Blue squares = males; red circles = females. Y-axis shows observed negative log10 p-values and X-axis shows expected negative log10 p-values according to the null hypothesis of no genetic association. Red line = line of unity (observed = expected).
Genomic inflation factors (λGC) for males and females were 0.912 and 1.092, respectively. This can be partially observed as the slight deviation of observed p-values from the red line (null hypothesis) in the quantile-quantile plot, which shows a graphical summary of the sex-specific association test results for all variants examined (Figure 3.2). Due to this relatively large discrepancy in λGC values between the sexes, and as MSE was originally analysed as a quantitative trait, the analysis was repeated with MSE coded as a binary trait, with cases defined as individuals with MSE of ≤ -1.00 D, as used in previous investigations of children from this cohort (Guggenheim et al., 2012; Guggenheim et al., 2013b; Guggenheim et al., 2014; Shah et al., 2017). Tests for association were repeated for this binary trait using logistic regression, with λGC values now 1.044 and 1.003 for males and females, respectively. In addition, an alternative, hypothetical quantitative trait was simulated for the 4,070 individuals included in the original XWAS. This new trait was designed to follow a standard normal distribution, with a mean of zero and a standard deviation of one. Because this simulated trait breaks the relationship between genotype and phenotype, the resulting association test p-values would be expected to follow the null hypothesis of being uniformly distributed (λGC = 1). The λGC values for the simulated quantitative trait were 1.016 and 1.021 for males and females, respectively. Together, the above results for the binary trait and the simulated quantitative trait support the validity of the methods used to calculate λGC values for this XWAS investigation. Thus, the observed discrepancy in λGC between the sexes (0.912 vs. 1.092) is likely to be a chance finding.
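For reference, a common definition of λGC is the median of the observed 1-df χ² statistics divided by the median expected under the null (approximately 0.455). The sketch below assumes that definition, since the text does not state the exact formula used. The function name and the simulated uniform p-values are illustrative only and stand in for the simulated-trait check described above.

```python
import random
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549)
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided p-values.

    Each p-value (0 < p <= 1) is converted back to a 1-df chi-square
    statistic via the squared normal quantile of p / 2.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Under the null hypothesis p-values are uniform, so lambda should be near 1.
rng = random.Random(2024)
null_p_values = [1.0 - rng.random() for _ in range(20000)]  # values in (0, 1]
lambda_null = genomic_inflation(null_p_values)
```

With uniformly distributed p-values the estimate fluctuates around 1, which is the behaviour expected for a trait simulated independently of genotype.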
In the meta-analysis of the XWAS summary statistics for males and females, no marker achieved P < 5 × 10⁻⁸ (Table 3.3, Figure 3.3).
Table 3.3: Summary of results for variants on chromosome X from meta-analysis of association tests for MSE at age 15 years (P < 1 × 10⁻⁵). EAF = effect allele frequency; Effect = effect size per copy of the effect allele; SE = standard error of beta estimate; N = 4,070 for all variants.
Variant Position Effect Allele Other Allele EAF Effect SE P-value Nearest Gene
rs145471572 128366278 G C 0.012 -0.654 0.134 1.04 × 10⁻⁶ RPS26P56
rs189623102 118251135 C T 0.010 -0.579 0.120 1.41 × 10⁻⁶ KIAA1210
rs58142779 118182646 A C 0.012 -0.541 0.119 5.27 × 10⁻⁶ LOC727838
Figure 3.3: Chromosome X Manhattan and Quantile-Quantile plots from meta-analysed results. Panel A: Horizontal blue line denotes an arbitrary threshold for declaring “suggestive” evidence of association (P = 1 × 10⁻⁵). Panel B: Y-axis shows observed negative log10 p-values and X-axis shows expected negative log10 p-values according to the null hypothesis of no genetic association. Red line = line of unity (observed = expected).
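The text does not specify how the male and female summary statistics were combined. One widely used approach is fixed-effects inverse-variance weighting, sketched below purely as an assumption about the general technique. The function name and the input values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(estimates):
    """Fixed-effects inverse-variance meta-analysis.

    estimates: iterable of (beta, standard_error) pairs, one per stratum.
    Returns the pooled beta, its standard error and a two-sided p-value.
    """
    estimates = list(estimates)
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z_score))
    return pooled_beta, pooled_se, p_value


# Hypothetical sex-specific estimates for one variant: (beta, SE)
male_estimate = (-0.60, 0.18)
female_estimate = (-0.70, 0.20)
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(
    [male_estimate, female_estimate]
)
```

The pooled estimate lies between the two sex-specific estimates and has a smaller standard error than either, which is why a variant can reach a lower p-value in the meta-analysis than in either sex alone.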
3.3.2 Sex-Specific Effects
Figure 3.4 shows a graphical summary of the sex-specific association test results for all variants examined. This plot appears to be generally symmetrical between the upper and lower sections, representing females and males respectively. However, some variants appeared to demonstrate stronger association with MSE in one sex than the other, as shown by asymmetry of the peaks in the figure. Closer inspection of effect sizes (beta) estimated from the individual association tests showed virtually no correlation between males and females (r = -0.016; Figure 3.5). Moreover, Welch’s unpaired samples t-test identified no significant difference in mean effect size between males and females (t = -0.769, df = 498,640, P = 0.442). Further evaluation of these differences in effect size between the sexes showed that, for a false discovery rate of 5%, there were no significant differences. The ten variants demonstrating the greatest difference in effect size between the sexes are shown in Table 3.4.
Figure 3.4: Chromosome X Miami plot from sex-specific linear regression association tests for MSE at age 15 years. Upper portion (red) = females; lower portion = males.
Figure 3.5: Differences in effect size (beta) estimates for males and females from individual chromosome X association tests. Plot of beta estimates for males vs. females. Red line = line of unity (Females Beta = Males Beta). N = 4,070 across 267,185 variants.
Table 3.4: Summary of the 10 variants demonstrating the greatest difference in effect size (beta) between the sexes. EAF = effect allele frequency; SE = standard error of beta estimate; P SexDiff = p-value from the test for male and female beta estimates being identical.
Variant Position Effect Allele Other Allele N (Males) EAF (Males) Beta (Males) SE (Males) N (Females) EAF (Females) Beta (Females) SE (Females) P SexDiff
rs194287 119052285 T C 1924 0.145 0.077 0.042 2146 0.150 -0.223 0.054 1.13 × 10⁻⁵
rs194289 119053638 A G 1924 0.145 0.077 0.042 2146 0.150 -0.222 0.054 1.17 × 10⁻⁵
rs194284 119050148 C T 1924 0.181 0.087 0.040 2146 0.188 -0.196 0.051 1.24 × 10⁻⁵
rs5910718 119087645 A G 1924 0.169 -0.067 0.039 2146 0.167 0.214 0.051 1.30 × 10⁻⁵
rs5910714 119083274 C T 1924 0.169 -0.067 0.039 2146 0.167 0.214 0.051 1.31 × 10⁻⁵
rs138808720 119096689 C G 1924 0.244 -0.064 0.035 2146 0.239 0.186 0.046 1.50 × 10⁻⁵
rs194320 119078123 T C 1924 0.170 0.069 0.039 2146 0.167 -0.209 0.051 1.52 × 10⁻⁵
rs74945102 119039414 G A 1924 0.297 0.110 0.033 2146 0.303 -0.122 0.043 2.02 × 10⁻⁵
rs5910730 119101709 C T 1924 0.175 -0.059 0.040 2146 0.171 0.219 0.052 2.25 × 10⁻⁵
rs5910728 119101395 G A 1924 0.235 -0.066 0.035 2146 0.229 0.176 0.046 2.60 × 10⁻⁵
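The exact test behind P SexDiff is not described in this section. A standard choice for comparing two independent regression coefficients is a z-test on their difference, and the sketch below assumes that form. The function name is illustrative; the inputs are the rounded values for rs194287 from Table 3.4.

```python
from math import sqrt
from statistics import NormalDist


def sex_difference_test(beta_males, se_males, beta_females, se_females):
    """Two-sided z-test for equality of two independent beta estimates."""
    z_score = (beta_males - beta_females) / sqrt(se_males ** 2 + se_females ** 2)
    p_value = 2.0 * NormalDist().cdf(-abs(z_score))
    return z_score, p_value


# rs194287 (Table 3.4): males beta = 0.077 (SE 0.042),
# females beta = -0.223 (SE 0.054)
z_rs194287, p_rs194287 = sex_difference_test(0.077, 0.042, -0.223, 0.054)
```

With these rounded inputs the z-score is about 4.4 and the p-value is on the order of 1 × 10⁻⁵, in line with the tabulated 1.13 × 10⁻⁵. Exact agreement is not expected, because the table reports betas and standard errors to three decimal places.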
3.3.3 Gene-based and Gene-set Analyses
Gene-based tests were performed using VEGAS2 and MAGMA. The 10 genes demonstrating the strongest association are presented in Tables 3.5 and 3.6, respectively. The two methods largely agreed on which genes were most likely to be associated with MSE at age 15 years. However, after adjusting p-values for multiple testing using an FDR of 5% (0.05), the strength of the association signal differed between the two methods. Specifically, no genes from the VEGAS2 analysis demonstrated evidence of association, whereas the MAGMA analysis suggested GPM6B, PRPS1, ZNF449 and NRK as likely candidate genes.
A competitive gene-set analysis was also performed using MAGMA. In this analysis, in which 10,468 gene-sets were tested for association with MSE at age 15 years, a total of 13 gene-sets achieved an FDR < 0.05. Of these 13 gene-sets, five had association test p-values < 0.05 after Bonferroni correction (Table 3.7). The gene
Table 3.5: The 10 genes demonstrating strongest association from the VEGAS2 gene-based association test. Start and stop positions listed include ±50 kb flanking regions. nSNPs = number of variants included in gene region; Test Statistic = gene-based χ² test statistic with nSNPs degrees of freedom; P-value = obtained from Test Statistic and adjusting for LD between variants; FDR = false discovery rate; Lead Variant = variant within gene locus with strongest association signal from previous SNP-based association test. Total number of genes tested = 1,252.
Gene Start Stop nSNPs Test Statistic P-value FDR Lead Variant
GPM6B 13739061 14006831 468 1514.83 2.59 × 10⁻⁴ 0.324 rs6633386
ZNF449 134428695 134547338 131 701.04 9.60 × 10⁻⁴ 0.411 rs5930699
ZNF75D 134332535 134528012 177 794.71 1.30 × 10⁻³ 0.411 rs5975507
PRPS1 106821653 106944256 49 254.12 1.60 × 10⁻³ 0.411 rs9887704
BCORL1 129089163 129242058 94 279.94 1.64 × 10⁻³ 0.411 rs875080
MIR6086 13558410 13658465 159 524.77 2.43 × 10⁻³ 0.507 rs3747418
LOC100129520 124403968 124506950 151 933.25 3.19 × 10⁻³ 0.571 rs3135278
LOC100506790 134480353 134581672 97 411.68 4.04 × 10⁻³ 0.632 rs5930699
KIAA1210 118162597 118334542 178 485.18 6.18 × 10⁻³ 0.640 rs189623102
FRMPD3 106715679 106898474 35 95.19 6.43 × 10⁻³ 0.640 rs66874155
Table 3.6: The 10 genes demonstrating strongest association from the MAGMA gene-based association test. Start and stop positions listed include ±50 kb flanking regions. nSNPs = number of variants included in gene region; Z-Statistic = gene-based test statistic; P-value = obtained from Z-Statistic under the assumption of a normally distributed model; FDR = false discovery rate. Total number of genes tested = 803.
Gene Start Stop nSNPs Z-Statistic P-value FDR
GPM6B 13739062 14006861 479 3.82 6.59 × 10⁻⁵ 0.045
PRPS1 106821654 106944256 52 3.58 1.74 × 10⁻⁴ 0.045
ZNF449 134428696 134547338 157 3.57 1.79 × 10⁻⁴ 0.045
NRK 105016536 105252602 201 3.51 2.22 × 10⁻⁴ 0.045
ZNF75D 134369719 134528034 185 3.33 4.29 × 10⁻⁴ 0.069
LOC100129520 124403698 124509063 160 2.85 2.22 × 10⁻³ 0.296
FRMPD3 106719819 106898474 37 2.76 2.91 × 10⁻³ 0.334
BCORL1 129064277 129242058 132 2.64 4.11 × 10⁻³ 0.413
CTAG1A 153763418 153865075 41 2.47 6.74 × 10⁻³ 0.559
KIAA1210 118162598 118334542 196 2.43 7.45 × 10⁻³ 0.559
Table 3.7: Gene-sets demonstrating FDR < 0.05 from MAGMA gene-set association test. nGenes = number of genes included in gene set; Beta = gene-set test statistic; SE = standard error; FDR = false discovery rate; Bonferroni Adjusted P-value = P-value multiplied by the number of gene-sets tested. Total number of gene-sets tested = 10,468.
Gene-set nGenes Beta SE P-value FDR Bonferroni Adjusted P-value
CEBALLOS_TARGETS_OF_TP53_AND_MYC_DN 2 3.80 0.778 6.45 × 10⁻⁷ 0.002 0.007
WALLACE_PROSTATE_CANCER_DN 1 3.94 0.808 7.08 × 10⁻⁷ 0.002 0.007
ROSS_LEUKEMIA_WITH_MLL_FUSIONS 1 3.94 0.808 7.08 × 10⁻⁷ 0.002 0.007
PLASARI_TGFB1_SIGNALING_VIA_NFIC_10HR_UP 1 3.94 0.808 7.08 × 10⁻⁷ 0.002 0.007
RNTCANNRNNYNATTW_UNKNOWN 3 2.59 0.574 3.73 × 10⁻⁶ 0.008 0.039
GSE15324_NAIVE_VS_ACTIVATED_ELF4_KO_CD8_TCELL_DN 2 2.45 0.563 7.68 × 10⁻⁶ 0.013 0.080
GSE43955_TGFB_IL6_VS_TGFB_IL6_IL23_TH17_ACT_CD4_TCELL_60H_UP 4 1.77 0.409 8.89 × 10⁻⁶ 0.013 0.093
GCM_AQP4 2 2.60 0.605 1.04 × 10⁻⁵ 0.014 0.109
FINETTI_BREAST_CANCERS_KINOME_BLUE 1 2.33 0.565 2.15 × 10⁻⁵ 0.023 0.226
GSE41867_DAY6_VS_DAY15_LCMV_CLONE13_EFFECTOR_CD8_TCELL_UP 5 1.57 0.382 2.21 × 10⁻⁵ 0.023 0.232
HADDAD_B_LYMPHOCYTE_PROGENITOR 3 2.06 0.51 2.89 × 10⁻⁵ 0.027 0.302
GSE3920_IFNA_VS_IFNB_TREATED_ENDOTHELIAL_CELL_DN 6 1.53 0.394 5.49 × 10⁻⁵ 0.048 0.575
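The two multiple-testing adjustments reported in Table 3.7 can be illustrated as follows. The Bonferroni definition is taken from the table caption. The FDR procedure is not named in the text, so the Benjamini-Hochberg step-up method is assumed here. Function names are illustrative, and the inputs are the rounded p-values listed in Table 3.7.

```python
def bonferroni(p_values, n_tests):
    """Bonferroni adjustment: p multiplied by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


def benjamini_hochberg(p_values, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    n_tests may exceed len(p_values) when only the smallest p-values are
    available; the adjusted values are then upper bounds, because the
    unlisted larger p-values cannot be included in the running minimum.
    """
    m = n_tests if n_tests is not None else len(p_values)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # Step up from the largest listed p-value to the smallest
    for rank in range(len(order), 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


N_GENE_SETS = 10468  # total number of gene-sets tested (Table 3.7)

# P-values as listed in Table 3.7, in table order
table_p_values = [
    6.45e-7, 7.08e-7, 7.08e-7, 7.08e-7, 3.73e-6, 7.68e-6,
    8.89e-6, 1.04e-5, 2.15e-5, 2.21e-5, 2.89e-5, 5.49e-5,
]
bonferroni_p = bonferroni(table_p_values, N_GENE_SETS)
fdr = benjamini_hochberg(table_p_values, N_GENE_SETS)
n_bonferroni_significant = sum(p < 0.05 for p in bonferroni_p)
```

Under these assumptions the first four gene-sets share an adjusted FDR value of about 0.002, because the step-up procedure carries the fourth-ranked value back to the smaller p-values, and five gene-sets remain below 0.05 after Bonferroni correction.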
3.3.4 Power Calculation
The power to detect associations in the current investigation was calculated in order to identify likely causes for the lack of evidence supporting an association between variation on chromosome X and MSE at age 15 years. Table 3.8 shows that, based on the sample size used for the current investigation, there was only 2.72% power to detect variants with an effect size of 0.10 D and MAF of 0.50 at P < 5 × 10⁻⁸. Power decreased steadily for less common variants (Table 3.8, Figure 3.6A).
As the desired power to detect variants with true associations is typically defined as 80% (Cohen, 1992; Hong and Park, 2012), the power calculation was repeated to identify the sample size required to achieve this level of power. As shown in Figure 3.6B, a sample of approximately 35,000 individuals would have the ability to successfully detect 80% of true associations across variants with MAF > 0.1. A sample size of approximately 70,000 individuals would be required to detect variants with a MAF of 5%. To achieve 60% power for variants with MAF of 5%, the sample size required would be just over 55,000 individuals.
Table 3.8: Power (1 − β) to detect an effect size of 0.10 D for variants with MAFs ranging between 0.01 and 0.50. Based on a sample size of N = 4,070 for a normally distributed trait with mean (SD) = -0.382 (1.28) D and Type I error rate (α) of 5 × 10⁻⁸. MAF = minor allele frequency.
MAF Power (%) MAF Power (%)
0.01 <0.01 0.26 0.92
0.02 <0.01 0.27 1.02
0.03 <0.01 0.28 1.12
0.04 <0.01 0.29 1.22
0.05 <0.01 0.30 1.32
0.06 0.01 0.31 1.43
0.07 0.01 0.32 1.53
0.08 0.02 0.33 1.64
0.09 0.03 0.34 1.74
0.10 0.04 0.35 1.84
0.11 0.06 0.36 1.94
0.12 0.08 0.37 2.04
0.13 0.10 0.38 2.13
0.14 0.13 0.39 2.22
0.15 0.17 0.40 2.30
0.16 0.21 0.41 2.37
0.17 0.25 0.42 2.44
0.18 0.31 0.43 2.50
0.19 0.36 0.44 2.56
0.20 0.43 0.45 2.61
0.21 0.50 0.46 2.65
0.22 0.57 0.47 2.68
0.23 0.65 0.48 2.70
0.24 0.74 0.49 2.71
0.25 0.83 0.50 2.72
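The formula and software behind Table 3.8 are not stated here. As a rough check, the sketch below uses a textbook approximation for an additive variant affecting a quantitative trait: a 1-df test whose non-centrality parameter is N·q²/(1 − q²), where q² = 2·MAF·(1 − MAF)·effect²/SD² is the proportion of trait variance explained. This is an assumption, it uses the autosomal genotype variance and ignores male hemizygosity on chromosome X, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_quantitative_trait(n, maf, effect, trait_sd, alpha=5e-8):
    """Approximate power of a two-sided 1-df test for an additive variant."""
    # Proportion of trait variance explained by the variant
    variance_explained = 2.0 * maf * (1.0 - maf) * effect ** 2 / trait_sd ** 2
    noncentrality = n * variance_explained / (1.0 - variance_explained)
    shift = sqrt(noncentrality)
    z_critical = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    # P(|Z + shift| > z_critical) for a standard normal Z
    return _STD_NORMAL.cdf(shift - z_critical) + _STD_NORMAL.cdf(-shift - z_critical)


# Study settings from the Table 3.8 caption: N = 4,070, effect = 0.10 D, SD = 1.28 D
power_maf_050 = power_quantitative_trait(4070, 0.50, 0.10, 1.28)
power_maf_025 = power_quantitative_trait(4070, 0.25, 0.10, 1.28)
```

Under these assumptions the approximation gives roughly 2.7% power at MAF = 0.50 and roughly 0.8% at MAF = 0.25, close to the tabulated 2.72% and 0.83%.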
Figure 3.6: Power calculations. Panel A: Power to detect variants with an effect size of 0.10 D for the ALSPAC sample with MAFs ranging between 0.01 and 0.50. Based on a sample size of N = 4,070 for a normally distributed trait with mean (SD) = -0.382 (1.28) D and Type I error rate (α) of 5 × 10⁻⁸. MAF = minor allele frequency. Panel B: Minimum sample sizes required to detect variants with effect sizes of 0.10 D with 60-80% power. Based on a normally distributed trait with mean MSE (SD) = -0.382 (1.28) D and Type I error rate of 5 × 10⁻⁸. MAF = minor allele frequency.
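The sample sizes read from Figure 3.6B can be approximated by inverting a standard power formula. The sketch below assumes an additive 1-df test, the autosomal genotype variance 2·MAF·(1 − MAF), and a normal approximation that ignores the negligible opposite tail. It is not necessarily the calculation used to draw the figure, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(maf, effect, trait_sd, power=0.80, alpha=5e-8):
    """Approximate N needed to detect an additive variant at a given power."""
    standard_normal = NormalDist()
    # Proportion of trait variance explained by the variant
    variance_explained = 2.0 * maf * (1.0 - maf) * effect ** 2 / trait_sd ** 2
    z_critical = -standard_normal.inv_cdf(alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    # Non-centrality needed so that P(Z > z_critical - sqrt(ncp)) = power
    noncentrality = (z_critical + z_power) ** 2
    return ceil(noncentrality * (1.0 - variance_explained) / variance_explained)


# Effect size 0.10 D and trait SD 1.28 D, as in the Figure 3.6 caption
n_maf_010_power_80 = required_sample_size(0.10, 0.10, 1.28, power=0.80)
n_maf_005_power_80 = required_sample_size(0.05, 0.10, 1.28, power=0.80)
n_maf_005_power_60 = required_sample_size(0.05, 0.10, 1.28, power=0.60)
```

Under these assumptions the three results are roughly 36,000, 68,000 and 56,000 individuals, broadly consistent with the approximate values of 35,000, 70,000 and just over 55,000 quoted in the text.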