Chapter 7
Future work and directions
The research in this thesis has attempted to identify rare variation predisposing to CD. The current dataset leads to the conclusion that rare variation does not contribute to disease risk in this family-based cohort, but future work building on the results here may lead to a different outcome. Below is a list of proposals for a future PhD student:
• Test EPAS1 in a larger case-control sample: since the P value in the 4,608-sample dataset was 0.007, it may be worth testing this gene in a larger sample for any significant rare-variant disease associations. A custom TaqMan genotyping assay containing all EPAS1 coding SNPs is the simplest experiment that would quickly answer this question.
• Resequence CUBN: this gene was too large to incorporate into the Fluidigm targeted resequencing assay. A single-gene resequencing experiment in a medium-sized case-control sample (approximately 500 cases and controls to begin with) would test whether there is an excess of rare variation in coeliac cases compared to controls. Fluidigm resequencing technology can still be used here, but a 48-plex assay would suffice.
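The case-control comparison proposed for CUBN can be illustrated with a simple carrier-based burden test. The sketch below is only an illustration and not the analysis used in this thesis: it assumes each individual is collapsed to carrier or non-carrier status for any rare coding variant in the gene, and it applies a one-sided Fisher's exact test for an excess of carriers among cases. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact P value for an excess of carriers among cases.

    Assumption: every individual has been collapsed to carrier / non-carrier
    of at least one rare variant in the gene of interest.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    # Number of ways to choose which individuals are the cases
    denominator = comb(total, case_total)
    # Sum the hypergeometric probabilities of all tables with at least
    # as many case carriers as observed (margins held fixed)
    numerator = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: 12 carriers in 500 cases, 4 carriers in 500 controls
    p_value = fisher_exact_greater(12, 500, 4, 500)
    print(f"one-sided P = {p_value:.4f}")
```

Collapsing to carrier status is the simplest possible choice; gene-based tests that weight variants by frequency or predicted effect would need a different statistic.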
If one were to continue the search for rare variation in CD using familial samples, both to enrich for disease mutations and to account for familial clustering of disease, another design would be to exome sequence every affected individual from many coeliac pedigrees and compare them to a matching population control dataset, e.g. the UK10K exome dataset. This would provide a highly annotated dataset of every coding mutation in all sequenced individuals; however, it may also lead to some data being discarded due to the sharing of chromosomal regions within families. Additionally, up to thousands of samples may be required to achieve the statistical power needed in a complex disease, although many different sample designs can be applied to achieve the best statistical result. It has been shown that exome sequencing trios and then performing a family-based association test may be particularly useful for rare variants, since the sample set would be robust to population stratification and Mendelian errors can be checked to reduce the false positive rate (De, Yip et al. 2013). Furthermore, there is evidence of increased sensitivity to detect lower effect sizes with the use of an enriched trio (one sibling from an ASP) in gene-based tests (Preston and Dudbridge 2013 in press). Since the study here utilized a family design and a case-control design on candidate genes, it provides a clue that the search for heritability may yield positive results if focused elsewhere. The following section discusses future research in the field of genetics that can be applied to CD, if one were to move away from attempting to locate rare disease variation.
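The two properties of the trio design mentioned above, Mendelian error checking and a family-based association test, can be sketched as follows. This is a minimal illustration using the classical transmission disequilibrium test (TDT) on a single biallelic autosomal variant; it is not the rare-variant method of the cited studies. Genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2), all children are assumed to be affected, and the function names and example trios are hypothetical.

```python
def is_mendelian_consistent(father, mother, child):
    """Check a trio at one biallelic autosomal site (genotypes coded 0, 1 or 2)."""
    # Alternate-allele counts each parental genotype can transmit
    transmittable = {0: (0,), 1: (0, 1), 2: (1,)}
    return any(
        f + m == child
        for f in transmittable[father]
        for m in transmittable[mother]
    )


def tdt_statistic(trios):
    """Return (b, c, chi_square) for (father, mother, child) genotype trios.

    b counts alternate alleles transmitted by heterozygous parents,
    c counts alternate alleles not transmitted by heterozygous parents.
    Trios with a Mendelian error are skipped.
    """
    b = c = 0
    for father, mother, child in trios:
        if not is_mendelian_consistent(father, mother, child):
            continue
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        # Homozygous parents transmit a known allele (0 -> ref, 2 -> alt)
        from_homozygous = sum(g // 2 for g in parents if g != 1)
        alt_from_het = child - from_homozygous
        b += alt_from_het
        c += n_het - alt_from_het
    chi_square = (b - c) ** 2 / (b + c) if b + c else 0.0
    return b, c, chi_square


if __name__ == "__main__":
    # Hypothetical trios; the last one is a Mendelian error and is ignored
    example = [(1, 0, 1), (1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 0, 1)]
    print(tdt_statistic(example))
```

Because only transmissions from heterozygous parents are counted, the statistic compares alleles within families, which is the reason such tests are robust to population stratification.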
7.1 Further research in the field of coeliac disease genetics
Immunochip findings in CD show that most of the association signals are localized around transcription start sites and 3’ UTR regions (Trynka, Hunt et al. 2011). Additionally, ENCODE findings revealed that most disease variants lie in regulatory regions, and the significant activity in these areas, which affects how much of a protein is produced rather than modifying its structure, shows that there is much more occurring in non-coding regions than previously thought (Schaub, Boyle et al. 2012). For further genetic studies in CD, it may be worthwhile to revisit findings from GWAS and fine-mapping studies and attempt to link variant signals with a causal variant, including signals not reaching GWAS significance, as these probably fall under the umbrella of undetected loci. Studies have shown that SNPs associated with common traits are enriched for expression quantitative trait loci (eQTL) (Lango Allen, Estrada et al. 2010; Nica, Montgomery et al. 2010; Nicolae, Gamazon et al. 2010), and even the last CD GWAS found significant eQTLs in 20/38 non-HLA coeliac loci (Dubois, Trynka et al. 2010). The best example is the SORT1 gene associated with plasma LDL concentration, where the associated variant modifies a CEBPB transcription factor binding site located in an enhancer, directly altering the expression of SORT1 (Musunuru, Rader et al. 2010). Since SNPs associated with common traits may be acting by altering gene regulatory regions, assessing cell subtypes with phenotypic associations might identify true causal variants. The ENCODE project revealed that SNPs associated with a disease phenotype were also associated with a specific cell type or transcription factor (Dunham, Kundaje et al. 2012). A study by Trynka et al. identifying chromatin marks in cell types supports this finding (Trynka, Sandor et al. 2013). They show that chromatin peaks overlap with SNPs associated with common traits, e.g. 31 SNPs from RA regions overlap with chromatin marks in CD4+ regulatory T cells. Their findings highlight that cell-type-specific chromatin marks associated with phenotype can identify causal cell types. Looking deeper into immune cell subtypes in CD-associated loci may therefore be the next step to further elucidate specific causal pathways.
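The basic operation behind such analyses, asking which trait-associated SNPs fall inside chromatin peaks of a given cell type, can be sketched as an interval lookup. The example below is an illustration only and not the method of the cited study: it assumes peaks are non-overlapping half-open intervals (start inclusive, end exclusive) in the same coordinate system as the SNP positions, and all SNP names, coordinates and function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort them by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def snps_in_peaks(snps, peaks):
    """Return the names of (name, chrom, pos) SNPs that lie inside any peak.

    Assumption: peaks on a chromosome do not overlap, so only the last
    peak starting at or before the SNP position needs to be checked.
    """
    by_chrom = index_peaks(peaks)
    hits = []
    for name, chrom, pos in snps:
        intervals = by_chrom.get(chrom, [])
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical peaks for one cell type and hypothetical SNPs
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    snps = [("snpA", "chr1", 150), ("snpB", "chr1", 300), ("snpC", "chr2", 79)]
    print(snps_in_peaks(snps, peaks))
```

Repeating the count for each cell type and comparing it with the count obtained for matched background SNPs would be one way to ask whether the overlap is greater than expected by chance.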
Methods for single-cell analysis can be applied to enable deeper resolution of cell types. Earlier methods employed whole-genome amplification (WGA) of single cells (Zhang, Cui et al. 1992) and degenerate oligonucleotide PCR-based methods, but the latter generate short products that are not useful for many applications (Telenius, Carter et al. 1992). Multiple displacement amplification using hexamer primers and Phi 29 DNA polymerase generates much larger products (>10 kb) (Dean, Nelson et al. 2001) and is used for genotyping SNPs on Illumina chips, for example (Barker, Hansen et al. 2004). New methodologies are continuously being published to increase the coverage required for single-cell sequencing. A recent study reported a new WGA method named MALBAC, which reduces the amplification bias associated with previous WGA methods (Zong, Lu et al. 2012). The authors designed primers to anneal randomly to single-cell DNA molecules, performed PCR with a DNA polymerase with displacement activity to create semi-amplicons, and then used these as templates to produce full amplicons (Figure 7.1). With this technique, they were able to identify SNVs from MALBAC amplicons with no false positives and to measure mutation rates of cancer cell lines.
Figure 7.1: MALBAC single-cell WGA to decrease amplification bias
MALBAC = multiple annealing and looping-based amplification cycles. Taken from Zong, Lu et al. 2012.
Advances in NGS have now enabled direct analysis of single-cell genomes. A recently published study applied single-cell RNA sequencing to dendritic cells from the bone marrow of mice to investigate heterogeneity in the response of these cells to lipopolysaccharide (Shalek, Satija et al. 2013). The study revealed interesting findings surrounding variation across single cells, such as bimodal splicing patterns with one isoform having a distinct function, differential activity in clusters of genes (i.e. in antiviral regulatory genes, where co-variation in different cell transcripts helped to identify the antiviral cell circuit), and variation in expression patterns reflecting different cell developmental states. If such variation is observed across immune cells, there is further scope in linking disease genotypes to single-cell phenotypes.
Commercial companies, such as Fluidigm, have also moved into single-cell genomics. Fluidigm’s integrated microfluidics system has been developed for the preparation of hundreds of cDNA libraries from single-cell samples for mRNA sequencing, enabling single-cell gene expression profiling. The technology performs 96 cDNA library preparations in parallel on an array (Figure 7.2). The amplified cDNA samples are then subjected to library preparation for Illumina sequencing. Fluidigm’s Research and Development group has shown that the method produces high-quality sequencing libraries, and has also confirmed transcriptional heterogeneity within homogeneous cell populations (Shug, Chen et al. 2013). Using this technology to assess single-cell expression in CD might detect whether there are specific variations within cells from CD-associated immune loci.
Figure 7.2: Fluidigm IFC cell capture illustration
The IFC array performs single-cell cDNA library preparations in tiny compartments. Taken from Shug, Chen et al. 2013.
To summarize, the proposals outlined at the start of this chapter can be undertaken to continue the search for rare variation in CD: EPAS1 and CUBN might hold key genetic variants predisposing to CD risk and are likely candidate genes based on their function and the findings in this thesis. If these experiments do not