FUNDACION EDUCATIVA INSUTEC AREA DE BIENESTAR


We observed homozygous deletions in 894 CNVRs across the genome, with 376 (42.1%) homozygous deletion CNVRs residing on segmental duplications (Suppl. Fig. 4). While 70.6% of homozygous deletion regions were observed in only a single individual, 10% were observed in 10 or more individuals and encompassed 60 Mb of sequence, suggesting that approximately 2% of the human genome may be “disposable.” However, phenotypic information on these individuals is of particular interest with respect to the potential role of a given disease gene and the direction of intervention at the gene or biological pathway level.

To determine whether CNVs cluster at specific genomic hotspots, we investigated the sequence content at the sites of CNV. Among the 5,378 CNVRs uncovered, 1,725 (32.9%) deletion CNVRs and 1,150 (42.5%) duplication CNVRs reside on segmental duplications. The majority of CNVRs harbored both deletions and duplications: 5,091 (97.2%) of the deletion CNVRs also have duplications and 2,623 (96.9%) of the duplication CNVRs also have deletions at these loci. Segmental duplication rearrangements are generated by non-allelic homologous recombination; however, not all annotated segmental duplications are fixed in humans, but rather are CNVs. Thus, CNVRs harbor both deletions and duplications, whereas pairs of segmental duplications with high sequence similarity, including dispersed repetitive elements (Alu elements), retrotransposons, and sequence homology within 100bp segments, are all features of the human genome that contribute to extensive CNV aggregation over generations (43).

Recombination hotspots of the genome predispose to CNVs and were found to be enriched for CNVs (Sup. Figure 3.8), as previously published (39). To emphasize this point, we overlaid our CNVRs on publicly available recombination hotspot maps, which shows that recombination hotspots correlate with CNV boundaries (Sup. Figure 3.9).

To explore the potential of lethal homozygosity loci as determined by absence of

with significantly low homozygous deletion rate in search for loci out of Hardy–Weinberg equilibrium that are likely to be homozygote lethal. We observed ATP binding, intracellular organelle lumen, transmembrane transport, and metal ion binding genes to meet these criteria (Sup. Table 3.17), suggesting that these genes are of fundamental biological importance for survival.
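The idea of a locus with too few homozygous deletions relative to Hardy–Weinberg expectation can be illustrated with a short calculation. The sketch below is only an assumption about how such a screen might be set up: the function name `hwe_deletion_test`, the one-degree-of-freedom chi-square test, and the example counts are hypothetical, and the text does not state which test was actually used.

```python
import math

def hwe_deletion_test(n_cn2, n_cn1, n_cn0):
    """Chi-square test (1 df) of Hardy-Weinberg proportions at a biallelic
    deletion locus, given counts of individuals with copy number 2, 1 and 0.

    Returns (expected homozygous deletions, chi-square statistic, p-value).
    """
    n = n_cn2 + n_cn1 + n_cn0
    q = (n_cn1 + 2 * n_cn0) / (2 * n)      # deletion allele frequency
    p = 1.0 - q                            # normal allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cn2, n_cn1, n_cn0)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected[2], chi2, p_value

# Hypothetical locus: 2,000 hemizygous carriers but no homozygous deletions.
expected_hom, chi2, p_value = hwe_deletion_test(8000, 2000, 0)
```

With these made-up counts the deletion allele frequency is 0.1, so about 100 homozygous deletions would be expected among 10,000 individuals; observing none is a strong departure from equilibrium, which is the kind of signal that would flag a candidate homozygote-lethal locus.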

We performed principal component analysis (PCA) on the raw GWAS data to address population stratification and to verify reported ethnicity. Including the resulting components as covariates in the logistic regression test adjusts the association statistic for confounding by ancestry.

Regarding the novelty of the CNV content uncovered, 17% of the CNVs we observed are novel and 83% concur with previous reports, of which about 15% would be classified as large CNVs (i.e., above 100kb). Of the 17% novel CNVs, all CNVs represented by 10 or more SNPs were experimentally validated without failure. Over 95% of the large CNVs (>100kb) are captured by more than 10 SNPs. These CNVs replicate between ethnicities in our study, and the frequencies observed here are comparable to those of published studies such as Conrad et al., typically used as the gold standard. The ParseCNV algorithm used for the analysis (70) has been extensively validated for CNV confidence measures, providing another level of QC for CNV call validation.

Figure 3.4. Frequency, Length and Gene Impact Features of CNVRs detected in this study.

Increased frequency CNVRs tend to be biased away from genes and be restrained to smaller genomic regions. Duplications appear to be less constrained.

It is noteworthy that in general, deletions tend to be biased away from genes, whereas ancestral duplications appear to cluster on certain gene families throughout the course of evolution (Figure 3.4). While it can be difficult to define the exact CNV breakpoints, it is usually clear if a CNV disrupts genes/exons or not. Common CNVs are less likely to disrupt genes and are therefore less likely to impact on disease than are rare CNVs. Common variants typically flank disease associated regions, consistent with the intricate and fragile balance of such variation.

3.3 Functional impact of CNV loci and relations to specific genomic elements

To evaluate the relationship between CNV location and disease impact, we investigated functional elements of the genome to see if CNVs were observed in critical regions including RefSeq genes, OMIM genes, ultra-conserved elements, conserved non-coding elements, non-coding RNAs, gene exons, and OMIM morbid (Table 3.1), all of which have the ability to influence phenotype expression.

We used DAVID (46) to evaluate genes impacted by CNVRs for functional annotation clustering by searching through Gene Ontology, INTERPRO and several other functional databases. We observed functional enrichment of deletion CNVRs impacting several gene classes, including secreted proteins, growth factor mediators, molecules involved with regulation of protein kinase cascade, regulation of protein amino acid phosphorylation, and tumor necrosis factor-like molecules. In contrast, we observed significant functional enrichment of duplication CNVRs in molecules involved with negative regulation of signal transduction, negative regulation of cell communication, phosphoprotein, DNA binding, as well as in several sequence variants affecting diversity of adult human height, with effects largely opposing those of the deletion CNVRs. For homozygous deletion CNVRs, we observed significant enrichment for gene classes involving intermediate filament protein and cytoskeletal keratin molecules. The CNV enriched regions of most interest included Coil 1A, Coil 1B, Coil 2, Head, Linker 1, Linker 12, Rod, Tail, all of which are fundamentally biologically relevant with respect to disease influence (Sup. Figure 3.6).

Table 3.1. Impact of CNVR Loci on Functional Elements at the Genome-Wide Level

| Unit | CNVRs | RefSeq genes | OMIM genes | Ultra-conserved elements | Conserved non-coding elements | Non-coding RNAs | Gene exons | OMIM morbid | DGV | CNV Map Study Freq High Conserved >1% | NHGRI GWAS Catalog |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Loci | Deletions | 1.11 | 1.13 | 0.92 | 0.67 | 2.47 | 1.18 | 2.24 | 1.41 | 0.44 | 1.60 |
| Loci | Duplications | 1.10 | 1.13 | 0.87 | 0.60 | 2.68 | 1.17 | 2.19 | 1.42 | 0.27 | 1.40 |
| Loci | CN=0 Deletions | 0.97 | 0.98 | 0.96 | 0.95 | 4.00 | 1.04 | 7.00 | 1.33 | 1.67 | 3.87 |
| Genes | Deletions | 1.29 | 1.07 | 1.59 | 0.63 | 1.70 | 0.36 | 1.51 | 2.14 | 0.31 | 1.73 |
| Genes | Duplications | 1.41 | 0.91 | 1.70 | 0.46 | 1.48 | 0.09 | 1.56 | 2.24 | 0.22 | 6.12 |
| Genes | CN=0 Deletions | 0.96 | 1.32 | 0.88 | 1.17 | 5.00 | 1.14 | 8.00 | 2.00 | 2.15 | 10.82 |

GWAS has been a powerful tool in uncovering disease loci and unfolding new biology in hundreds of complex medical disorders; thus, we leveraged the GWAS genotyping data from over 68k individuals to detect copy number variation. CNVs likely complement the genetic burden of many genes identified by genotype association. Among 5,378 CNVR loci uncovered, 1,409 resided in GWAS regions associating with one or more complex OMIM disease traits (Sup. Table 3.9). Moreover, 28% of deletions, 34% of duplications and 39% of homozygous deletions overlapped significant GWAS signals at P < 5×10^-8. For comparison, we modeled the null distribution by generating random SNP-seeded CNVR windows equal in number and size to the observed CNVRs. Under this null, 17% of deletions, 24% of duplications and 10% of homozygous deletions overlapped reported GWAS signals at P < 5×10^-8, giving p = 3.96×10^-38 for deletions, p = 5.94×10^-15 for duplications and p = 1.31×10^-47 for homozygous deletions (p = 4.56×10^-78 combined) in favor of CNV enrichment for GWAS loci. Co-localization of CNVs with GWAS genomic regions is significantly above expectations, suggesting complementary genetic mechanisms perturbing disease genes through both common and rare variants that co-exist at GWAS loci.
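The null model described above, random SNP-seeded windows matched to the observed CNVRs in number and size, can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are treated as positions on a single chromosome, all function and variable names are hypothetical, the toy data are invented, and an empirical permutation p-value is used as a stand-in because the text does not say which test produced the reported p-values.

```python
import bisect
import random

def overlaps(start, end, loci):
    """True if the half-open window [start, end) contains any position in
    the sorted list `loci`."""
    i = bisect.bisect_left(loci, start)
    return i < len(loci) and loci[i] < end

def overlap_fraction(windows, loci):
    """Fraction of windows that contain at least one GWAS locus."""
    return sum(overlaps(s, e, loci) for s, e in windows) / len(windows)

def null_fractions(windows, snp_positions, loci, n_perm, rng):
    """For each permutation, re-seed every window at a randomly chosen array
    SNP while keeping its length, then record the overlap fraction."""
    fractions = []
    for _ in range(n_perm):
        shuffled = []
        for s, e in windows:
            seed = rng.choice(snp_positions)
            shuffled.append((seed, seed + (e - s)))
        fractions.append(overlap_fraction(shuffled, loci))
    return fractions

def empirical_p(observed, null):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(f >= observed for f in null)) / (1 + len(null))

# Toy data (hypothetical): three CNVRs, three GWAS loci, SNPs every 10 bp.
gwas_loci = [100, 500, 900]                      # must be sorted
cnvrs = [(90, 110), (490, 510), (2000, 2020)]
array_snps = list(range(0, 10000, 10))

observed = overlap_fraction(cnvrs, gwas_loci)
null = null_fractions(cnvrs, array_snps, gwas_loci, 200, random.Random(0))
p_emp = empirical_p(observed, null)
```

Seeding the random windows at array SNP positions, rather than at arbitrary coordinates, keeps the null restricted to regions the platform can actually interrogate. A permutation p-value cannot fall below 1/(n_perm + 1), so p-values as small as those reported would require a parametric comparison of the observed and null overlap rates.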

There are several genomic regions in the human genome that are unstable and hard to characterize. The reasons for this vary, but in general these regions are highly duplicated, polymorphically inverted, contain assembly sequence gaps, or may be flanked by segmental duplications of variable copy number. All of these features are being increasingly observed in CNV regions of the human genome and their biological implications are likely to unfold in the near future. Genotype calls in regions of CNVs characterized by homozygous deletions are essentially random, since there is no DNA template for the probes to bind. Mendelian discrepancies in families are more often observed in deletions and Hardy–Weinberg disequilibrium regions, whereas no-call SNP genotypes are more often observed in duplications at the population level. The latter can also flag CNVs based on a region of genotypes (172).

Due to the design of the Illumina SNP-array platform, common CNVs are poorly captured, because SNPs that reside in such regions are omitted from the array. The platform’s SNP tagging approach is based on linkage disequilibrium (LD), which is a measure of correlation between markers. When CNVs occur in LD regions, SNP genotype studies have the power to tag them and associate them with the trait under study. When the LD between any two variations (r^2) is close to 1, either variation can be typed and the other inferred by the tagging approach. We calculated LD for each of the 48 common CNVRs we detected with frequency >5%. CNV tagging by SNP genotypes was poor, with only 5 r^2 values exceeding 0.8. Loci showing r^2 of 0.6-0.8 accounted for 5 CNVRs. Loci showing r^2 of 0.3-0.6 accounted for 11 CNVRs. Loci showing r^2 > 0.1 accounted for 32 CNVRs. Thus, only 10% of CNV events could be effectively tagged by SNP genotypes in the surrounding region (Sup. Table 3.10). Since the CNV events dominantly captured by the platform are relatively rare (<1% population frequency) for the majority of loci, while SNP genotypes are typically common (>1% population frequency), the common GWAS SNPs have diminished ability to tag rare CNVs. Therefore, these CNVs are rare events rather than copy number polymorphisms (CNPs), which could be more amenable to SNP genotype tagging. This underscores the value of CNV detection in addition to SNP genotype association to reveal novel insights into disease pathogenesis, as these are independent variants.
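The tagging criterion can be made concrete with a small calculation of r^2 between a CNV and nearby SNPs. The sketch below is an assumption, not the procedure used in the study: it approximates r^2 by the squared Pearson correlation between per-individual copy number and SNP allele dosage, and the names `r_squared`, `best_tag`, and the toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors.
    Returns 0.0 when either vector has no variation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)

def best_tag(cnv_dosage, snp_dosages):
    """Return (snp_id, r2) for the SNP most correlated with the CNV."""
    scored = ((snp, r_squared(cnv_dosage, g)) for snp, g in snp_dosages.items())
    return max(scored, key=lambda pair: pair[1])

# Toy data (hypothetical): copy number per individual at one deletion CNVR,
# and allele dosages (0, 1, 2) at two surrounding SNPs.
cnv = [2, 2, 1, 1, 0, 2, 1, 2]
snps = {
    "snp_a": [0, 0, 1, 1, 2, 0, 1, 0],   # minor allele travels with the deletion
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],   # largely unrelated to the deletion
}

tag_snp, tag_r2 = best_tag(cnv, snps)
is_tagged = tag_r2 > 0.8                 # threshold used in the text
```

Applying the same 0.8 threshold to the counts reported above, 5 of the 48 common CNVRs qualify, which is the roughly 10% tagged fraction stated in the text.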

The recent Wellcome Trust Case Control Consortium (WTCCC) CNV study typed 19,000 individuals on a targeted Agilent Comparative Genomic Hybridization (CGH) array, uncovering 3,432 polymorphic common CNVs (39). However, testing these CNVs for association with disease revealed exactly the same loci as the earlier SNP genotype GWAS (2), suggesting that analysis of common CNVs may be somewhat redundant to SNP genotyping. Logically, it follows that rare CNV association may reveal novel disease association loci. Comparing the regions with >5% CNV occurrence in the current study with those reported by WTCCC, 16/29 deletions agree while 2/5 duplications agree, for an overall concordance rate of 51% (Sup. Table 3.11). After reviewing the clustering of probes underlying these regions, we conclude that the discordant calls are most likely due to incorrect or biased cluster definition caused by high CNV frequency, which makes the diploid cluster ambiguous on the intensity-only CGH array used by WTCCC. Thus, the apparent lack of overlap with the previous WTCCC study (39) results from the fundamental difference between the platforms used: our focus is on rare recurrent CNVs, for which the Illumina platform is well suited, whereas the WTCCC design is tailored towards common CNPs. The two approaches have little in common and yield complementary findings.
