Pedigrees for FAM002 and FAM006 are illustrated in Figure 3.6. Pedigree for FAM007 was not available.
3.5.3.2 Single-SNP and aggregate tests for rare variants
The analysis so far has searched for variants using the filtering approach and has largely focused on loss-of-function variants only. A second, more practical approach that takes all sequenced variants into account was used to perform single-SNP and gene-level burden tests. The single-SNP test compared calls from 222 neurological disorder control exomes (captured with Agilent SureSelect version 1) with the coeliac exomes, similar to the test one would apply in a GWAS. An excess of rare variants in the HLA complex on chr6 was observed, with significant p values ranging from 10⁻⁴ to 10⁻⁷, as illustrated in the Manhattan plot (Figure 3.10). No other SNP reached p = 10⁻⁷ or a more significant value. A synonymous SNP in NDUFV2 on chr18 reached p = 1.26 × 10⁻⁶ (MAF 0.0187); mutations in this gene are associated with Parkinson's disease (Hattori, Yoshino et al. 1998) and bipolar disorder (Washizuka, Kakiuchi et al. 2003; Doyle, Dahl et al. 2011), suggesting that the signal is likely to be associated with one of the neurological diseases in the control exomes rather than with CD. While the test accounted for target capture efficiency and only calls with comparable call rates were used, there are still evident pitfalls in using different capture platforms (Agilent SureSelect for controls compared with Roche NimbleGen for coeliacs), and likely false positives were evident (see the quantile-quantile (Q-Q) plot in Figure 3.11).
Figure 3.10: Manhattan plot of single-SNP tests comparing the case data (n = 41) with the control samples (n = 222)
Figure 3.11: Q-Q plot of single-SNP tests comparing case data (n = 41) with control samples (n = 222)
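As a minimal sketch of how a Q-Q comparison of this kind can be computed, the following Python code pairs sorted test statistics with expected chi-squared (2 df) quantiles and derives a median-based inflation factor. The function names, the plotting positions (i - 0.5)/n and the use of the 2 df median as the reference value are assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
import math
from statistics import median


def expected_chi2_2df_quantiles(n):
    """Expected order statistics for n chi-squared (2 df) values.

    With 2 df the CDF is 1 - exp(-x / 2), so the quantile function has the
    closed form -2 * ln(1 - p). The plotting positions (i - 0.5) / n are an
    assumption; other conventions exist.
    """
    return [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


def qq_points(observed_stats):
    """Return (expected, observed) pairs, both in ascending order."""
    obs = sorted(observed_stats)
    return list(zip(expected_chi2_2df_quantiles(len(obs)), obs))


def inflation_factor(observed_stats):
    """Median observed statistic divided by the chi-squared (2 df) median.

    The theoretical median of a chi-squared (2 df) variable is 2 * ln(2).
    Values well above 1 suggest systematic inflation, for example from
    differences between capture platforms.
    """
    return median(observed_stats) / (2.0 * math.log(2.0))
```

Points from `qq_points` that rise above the diagonal across most of the range, together with an inflation factor clearly above 1, would be consistent with the systematic false positives discussed above rather than with a small number of true associations.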
An aggregate test for rare variants in a complex trait, using a minor allele frequency based on 1000G, offers a genome-wide approach that limits problems that can be associated with SNP filtering: within-gene heterogeneity and reduced penetrance. This type of test compares the number of variants within a gene to the genome-wide distribution of rare variants in the same functional category to derive a gene-based Fisher exact P-value (two-tailed) (Stitziel, Kiezun et al. 2011; Kiezun, Garimella et al. 2012). The test aggregates variants into discrete features (a natural grouping unit in the exome is a gene) to obtain greater statistical power. This is achieved by reducing multiple tests, as the number of genes containing aggregated rare variants is tested rather than one test per rare variant, and combining allele frequencies of aggregated variants to achieve a higher overall allele frequency compared to small individual rare variant allele frequencies.
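The gene-based comparison described above can be sketched as a two-tailed Fisher exact test on a 2x2 table. In this illustration the table contrasts the rare alleles in one gene (cases versus controls) with all other rare alleles of the same functional category genome-wide; that layout, the function names and any totals supplied by the caller are assumptions, and the exact table construction used by the cited methods may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def gene_burden_p(gene_cases, gene_controls, all_cases, all_controls):
    """Gene-level P-value from rare allele counts.

    gene_* are rare allele counts within the gene; all_* are genome-wide
    rare allele totals in the same functional category (assumed inputs).
    """
    if gene_cases > all_cases or gene_controls > all_controls:
        raise ValueError("gene counts cannot exceed genome-wide totals")
    return fisher_exact_two_sided(
        gene_cases, gene_controls,
        all_cases - gene_cases, all_controls - gene_controls,
    )
```

Because the genome-wide totals are not reported in this section, calling `gene_burden_p` with invented totals would not be expected to reproduce the P-values in the tables of this chapter.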
[Q-Q plot "Prion_celiac": observed versus expected values; expected distribution: chi-squared (2 df)]
Three tests were performed comparing SNP calls in cases and controls based on a genome-wide distribution of rare alleles. Single-SNP calls from multiple variants in a gene were combined, and a gene-level P-value was derived from a two-tailed Fisher exact test, allowing the same inferences as one would make in a genome-wide association test. Related exome-sequenced individuals were removed to eliminate bias, and the remaining 41 exomes were compared to 222 neurological control exomes; only variants with a MAF < 0.5% in the 1000G 2011 reference dataset were considered. Genes with rare variants in all deleterious functional categories are shown in Table 3.5. Table 3.6 lists genes harbouring loss of function variants only. Table 3.7 lists genes with loss of function variants in immune genes.
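The filtering and collapsing step described above can be sketched as follows. The record layout (keys `gene`, `ref_maf`, `case_alleles`, `control_alleles`) is hypothetical and chosen only for illustration; the 0.5% threshold is the one stated in the text.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.005  # 0.5% in the reference panel, as stated in the text


def rare_allele_counts(variants, maf_threshold=MAF_THRESHOLD):
    """Collapse per-variant allele counts into per-gene rare allele counts.

    Each variant is a dict with the hypothetical keys:
      gene            - gene symbol
      ref_maf         - minor allele frequency in the reference panel
      case_alleles    - minor allele count among unrelated cases
      control_alleles - minor allele count among controls
    Returns {gene: (case_total, control_total)} for variants below threshold.
    """
    counts = defaultdict(lambda: [0, 0])
    for v in variants:
        if v["ref_maf"] < maf_threshold:
            counts[v["gene"]][0] += v["case_alleles"]
            counts[v["gene"]][1] += v["control_alleles"]
    return {gene: tuple(pair) for gene, pair in counts.items()}
```

Restricting the input to one functional category (for example loss of function only) before calling the function would give per-gene counts of the kind reported in the tables that follow.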
Table 3.5: Top 5 most significant genes for the aggregate test for rare variants (LoF, non-synonymous and splice site) between cases and controls
Gene       Number of rare alleles in controls (n = 222)   Number of rare alleles in cases (n = 41*)   Fisher p
PER2       6                                              9                                           0.00057
PLEKHA6    0                                              4                                           0.00097
FLG        5                                              7                                           0.0026
SLC3A1     2                                              5                                           0.0029
WDR59      2                                              5                                           0.0029
*Total number of cases after removing additional exomes within each family. Rare allele is defined by a frequency less than 0.5% in the 1,000 genomes data (n = 1092).
Table 3.6: Top 3 most significant genes for the aggregate test for rare LoF variants only between cases and controls
Gene       Number of rare alleles in controls (n = 222)   Number of rare alleles in cases (n = 41*)   Fisher p
ITGAE      0                                              2                                           0.027
TEX14      0                                              2                                           0.027
CUBN       2                                              3                                           0.043
*Total number of cases after removing additional exomes within each family. Rare allele is defined by a frequency less than 0.5% in the 1,000 genomes data (n = 1092).
Table 3.7: Top 15 most significant genes for the aggregate test for rare LoF variants in immune genes between cases and controls
Gene       Number of rare alleles in controls (n = 222)   Number of rare alleles in cases (n = 41*)   Fisher p
CD1C       0                                              3                                           0.005
CERK       0                                              3                                           0.005
CRLF3      0                                              3                                           0.005
DDR1       2                                              4                                           0.010
HLA-DOA    4                                              5                                           0.012
ZFYVE16    4                                              5                                           0.012
IKZF3      1                                              3                                           0.016
RPS6KA2    1                                              3                                           0.016
CDH17      3                                              4                                           0.020
LPP        5                                              5                                           0.020
CD180      0                                              2                                           0.022
CTGF       0                                              2                                           0.022
DNM1L      0                                              2                                           0.022
EB13       0                                              2                                           0.022
IFNW1      0                                              2                                           0.022
*Total number of cases after removing additional exomes within each family. Rare allele is defined by a frequency less than 0.5% in the 1,000 genomes data (n = 1092).
The results in Tables 3.6 and 3.7 are corrected for multiple testing, hence the observed differences in P values; the test in Table 3.7 covered fewer genes than the test in Table 3.6, so the penalty for multiple testing was smaller.
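The effect of the number of genes tested on the multiple testing penalty can be illustrated with a simple Bonferroni-style adjustment. This is an assumption made for illustration only: the correction method actually applied is not named in this section, and the gene counts passed to the function in any example would be hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value: raw P times the number of tests, capped at 1.

    Illustrative only; the correction actually used in the analysis is not
    specified in the text.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return min(1.0, p_value * n_tests)
```

For the same raw P-value, a test restricted to a smaller gene set (such as immune genes only) yields a smaller adjusted value than a test across all genes, which is the direction of the effect described above.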
Genes in Table 3.5 did not appear to have any potential function in CD susceptibility, or in any overlapping disease from which a shared function could be deduced. For example, an excess of rare variants in cases and controls in PER2 is possibly owing to its function as a circadian pacemaker in the mammalian brain involved in behavioural and metabolic factors, rather than to enrichment for CD risk variants; mutations in SLC3A1 are associated with cystinuria, an autosomal recessive disease characterized by kidney stones (Pras, Raben et al. 1995); FLG, a gene that encodes the filaggrin protein that forms a component of the skin barrier, has LoF variants strongly associated with atopic eczema and ichthyosis vulgaris (Sandilands, Terron-Kwiatkowski et al. 2007), but no association has been reported with InBD susceptibility (Van Limbergen, Russell et al. 2009).
Based on protein function, ITGAE and CUBN were suggestive candidates for further screening. ITGAE, also known as CD103, encodes an alpha integrin involved in tissue-specific retention of T lymphocytes at the basolateral surface of intestinal epithelial cells and may serve an accessory function in the activation of epithelial cells (Cepek, Parker et al. 1993; Sheridan and Lefrancois 2011). Two confirmed novel stop gain (nonsense) SNVs in ITGAE, c.2962G>T (p.Glu988X) (identified in SAL-12553-6 from FAM014) and c.314T>A (p.Leu105X) (identified in Neu7058-39198 from Neu7058), were not present in 222 controls. Both variants were tested for segregation in all affected and unaffected individuals of FAM014 and Neu7058. The c.314T>A substitution was present in four individuals in Neu7058, three of whom were unaffected. Only one other individual in FAM014, who was unaffected, carried the c.2962G>T substitution. Neither mutation segregated with disease in either family.
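The segregation check described above can be expressed as a small rule. The sketch below assumes a hypothetical record layout (keys `affected` and `carrier`) and a deliberately simple definition: a variant segregates if every affected family member carries it and, in the strict mode, no unaffected member does. Relaxing the strict mode is one way to allow for reduced penetrance; neither definition is claimed to be the criterion applied in this study.

```python
def segregates(family, strict=True):
    """Return True if a variant segregates with disease in one family.

    family is a list of dicts with the hypothetical boolean keys
    'affected' and 'carrier'.
    strict=True  : all affected carry the variant and no unaffected do.
    strict=False : all affected carry it; unaffected carriers are tolerated
                   (a simple allowance for reduced penetrance).
    """
    affected = [m for m in family if m["affected"]]
    unaffected = [m for m in family if not m["affected"]]
    # A family with no affected members cannot show segregation
    all_affected_carry = bool(affected) and all(m["carrier"] for m in affected)
    if not strict:
        return all_affected_carry
    return all_affected_carry and not any(m["carrier"] for m in unaffected)
```

Under the strict rule, a family in which most carriers are unaffected, as reported for c.314T>A in Neu7058, would not count as showing segregation.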
CUBN (cubilin) is located on chromosome 10p21.1 and is expressed within the epithelium of the intestine, where it acts as a receptor for intrinsic factor-vitamin B12 complexes (Fodinger, Wagner et al. 2001). Missense and insertion mutations in this gene have been associated with megaloblastic anaemia in Finnish families (Aminoff, Carter et al. 1999), a rare autosomal recessive condition characterized by selective intestinal vitamin B12 malabsorption. It is not known whether the three individuals bearing a nonsense mutation in this gene have megaloblastic anaemia; it is common for CD patients to have low B12 and folate levels, causing pernicious anaemia. A recent meta-analysis to identify risk variants for albuminuria, aimed at early prevention of chronic kidney disease, found a risk variant in CUBN to be associated with albuminuria level in individuals with diabetes (Boger, Chen et al. 2011). Three novel stop gain (nonsense) mutations in CUBN (RefSeq accession number NM_001081) were observed in three separate individuals: c.4459C>T (p.Arg1487X), c.5428C>T (p.Arg1810X) and c.6359G>A (p.Trp2120X). All substitutions were predicted by PolyPhen to be possibly damaging, and GAIIx sequence pile-up data indicated real heterozygotes with a high read depth (173, 44 and 53 respectively), which were confirmed by Sanger sequencing.
Overall, candidate genes harbouring true (as confirmed by Sanger sequencing) rare variants, i) shared by related exomes, ii) that showed a higher burden in cases than controls, and iii) that segregated in familial disease cases, were selected for resequencing based on interesting immune function, size and number of exons.
3.6 Chapter discussion
Strategies to discover rare major impact variants in common disease have been widely discussed (Cirulli and Goldstein 2010; Eichler, Flint et al. 2010), and exome-sequencing based studies are a popular approach to test for association of rare coding variants with complex phenotypes. The empirical successes of candidate gene resequencing (Ji, Foo et al. 2008; Johansen, Wang et al. 2010) and Mendelian studies suggest that a large portion of disease-associated variation lies within coding exons (Cooper, Ball et al. 1998; Botstein and Risch 2003; Glazov, Zankl et al. 2011). On this basis, it was considered likely that many rare mutations contributing to the missing disease heritability would be found within one or more genes.
The 75 coeliac sample dataset contained an abundance of rare coding variants (~33,000), and sequencing additional samples would probably continue to reveal further rare variants. Kiezun et al. discovered that as sample size increases the number of observed variants increases (an average of 40 times more