Table 7 shows the significance and estimated locations for the CDH3/CDH1 and IRF8 windows. The original analysis of the WTCCC data had shown that both windows were significantly associated with CD (1 × 10⁻⁸ and 6 × 10⁻⁹, respectively). Neither of these signals was, however, initially replicated in the NIDDK GWA scan of the full data set, but both showed significant association when
the analysis was restricted to a subset defined using information provided by the NIDDK IBDGC. This subset included patients with ileal CD who had involvement of at least one extra-ileal intestinal location, i.e.
jejunal, colorectal or perianal. The signal near CDH3/CDH1 was replicated using this subset of the NIDDK data despite the much smaller number of cases (Table 7). For both GWA scans, the estimated location Ŝ for the CDH3/CDH1 window was between CDH3 and CDH1, within an LDU block that spans 65 kb (Figure 8). The causal locus could be anywhere within that block, which extends from the 3' region of CDH3 through to CDH1 intron 2. Notably, the block includes a functional promoter SNP, rs16260 (Li et al., 2000), previously associated with Irritable Bowel Syndrome (IBS).
Figure 8. Localisation within the CDH3/CDH1 region.
The LD map of the region is illustrated by the black line, which is obtained by plotting HapMap LDUs (Y axis) against kb (X axis).
The red vertical arrow is the estimated location Ŝ for both data sets; because of the very long LD block, however, the causal location(s) could reside anywhere in this block. The SNP rs16260 is a functional variant within this block, located 365 nucleotides upstream of the transcription start site for CDH1. NIDDK Ileal+ data: ileal CD with involvement of at least one extra-ileal intestinal location. The blue horizontal dashed line shows the 95% Confidence Interval for the WTCCC CD dataset. The purple horizontal dashed line represents the 95% Confidence Interval for the NIDDK Combined Ileal+ data. The blue diamonds represent the SNPs on the Affymetrix 500K genotyping array used in the WTCCC study. The orange triangles represent the SNPs on the Illumina HumanHap 300v.1 genotyping array used in the NIDDK study.
For the IRF8 window, the WTCCC data yielded an Ŝ location 29 kb downstream of the gene, within a small block that is flanked by LD breakdown (Figure 9). The analysis of the NIDDK data for patients with any extra-ileal intestinal involvement showed a signal, which is 1.7 kb telomeric to IRF8 within a region of LD breakdown (Figure 9).
Table 7. Association statistics and estimated location of the causal variation for two different windows covering CDH3 (68,678-68,733 kb), CDH1 (68,711-68,869 kb) and IRF8 (85,933-85,956 kb) in relation to the
* Ileal CD with involvement of at least one other extra-ileal intestinal location. § 432 Jewish and 515 non-Jewish controls were used for these analyses. † A window χ² using composite likelihood. Note that the 95% CIs for IRF8 for the UK and North American data are non-overlapping, suggesting heterogeneity of location of the causative change, i.e. allelic heterogeneity.
Figure 9. Localisation within the IRF8 region.
The LD map of the region is illustrated by the black line, which is obtained by plotting HapMap LDUs (Y axis) against kb (X axis).
The red vertical arrow is the estimated location Ŝ for both data sets. NIDDK Ileal+ data: ileal CD with involvement of at least one extra-ileal intestinal location. Two SNPs have been identified for UC and MS in GWA meta-analyses. The short light blue horizontal dashed line shows the 95% Confidence Interval for the WTCCC CD dataset. The purple horizontal dashed line represents the 95% Confidence Interval for the NIDDK Jewish Ileal+ data. The green horizontal dashed line represents the 95% Confidence Interval for the NIDDK non-Jewish Ileal+ data. The 95% CIs are non-symmetrical due to the block-step structure of the human genome. The blue diamonds represent the SNPs on the Affymetrix 500K genotyping array used in the WTCCC study. The orange triangles represent the SNPs on the Illumina HumanHap 300v.1 genotyping array used in the NIDDK study.
These two NIDDK datasets yielded essentially identical Ŝ locations. The 95%
CI for both the Jewish and non-Jewish data included part of the IRF8 gene with an estimated location 24 kb upstream of the WTCCC signal (Table 7, Figure 9), further suggesting allelic heterogeneity.
In this analysis, 98 windows were tested. The statistical significance observed in the WTCCC data passed the Bonferroni threshold calculated on the basis of these 98 tests (see the Methods section of this chapter). These signals were replicated using the NIDDK data despite its smaller sample size and lower SNP coverage.
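The Bonferroni comparison described above can be illustrated with a short calculation. This is only a sketch: the family-wise significance level of 0.05 is an assumption (the value actually used is given in the Methods section, not here), the function name is hypothetical, and the two WTCCC significance values are those quoted for the CDH3/CDH1 and IRF8 windows at the start of this section, read here as 1 × 10⁻⁸ and 6 × 10⁻⁹.

```python
# Sketch of a Bonferroni check for the 98 windows tested.
# ASSUMPTION: a family-wise alpha of 0.05, which is not stated in this section.


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


N_WINDOWS = 98  # number of windows tested (from the text)
ALPHA = 0.05    # assumed family-wise error rate

threshold = bonferroni_threshold(ALPHA, N_WINDOWS)

# WTCCC window significance values as read from the text of this section.
wtccc_p = {"CDH3/CDH1": 1e-8, "IRF8": 6e-9}

# A window passes if its p-value is below the corrected threshold.
passes = {window: p < threshold for window, p in wtccc_p.items()}

print(f"Bonferroni threshold: {threshold:.2e}")
for window, ok in passes.items():
    print(f"{window}: p = {wtccc_p[window]:.0e}, passes = {ok}")
```

Under the assumed alpha, the per-window threshold is roughly 5 × 10⁻⁴, so both WTCCC values fall several orders of magnitude below it; a different alpha would change the threshold but not that conclusion.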
IV. DISCUSSION
Here, using the multi-marker genetic mapping approach on the available WTCCC data, three novel significant signals of association with CD were found on chromosome 16q alone. All three were replicated using the NIDDK GWAS data, supporting the presence of genetic heterogeneity on chromosome 16q.
Evidence is presented for the first time that there is an independent involvement of a locus in the vicinity of NOD2, near the neighbouring gene CYLD. The pattern of association shows clear evidence of an independent signal but does not formally show that the CYLD gene is involved. A distant enhancer affecting NOD2 expression could be implicated, but other evidence suggests that CYLD itself may be important.
First, in support of this notion, genome-wide cDNA microarray analysis demonstrates that CYLD gene expression is down-regulated in CD (Costello et al., 2005). NOD2 interacts with Nuclear Factor-κB (NF-κB) signalling in a complex way, which includes involvement of ubiquitinylation (Abbott et al., 2004).
Functional studies have so far provided conflicting evidence on the effect that the three NOD2 mutations have with respect to inflammation and bacterial recognition (Eckmann and Karin, 2005). On the one hand, it was reported that upon bacterial LPS stimulation, the three mutant NOD2 alleles result in the
ablation of the NOD2 peptidoglycan-sensing activities and the failure to activate NF-κB (Bonen et al., 2003). On the other hand, a mouse model of the frameshift 3020insC mutation showed increased levels of the pro-inflammatory cytokine IL-1β after stimulation with MDP (Maeda et al., 2005). It is of interest that CYLD is a deubiquitinating enzyme (i.e. one that removes ubiquitin), which has been shown to regulate cell proliferation, cell survival and inflammatory responses (Brummelkamp et al., 2003) and is also involved in NF-κB signalling.
Dysregulation of NF-κB signalling leads to a defective immune system causing an immunodeficient or autoimmune phenotype depending on whether NF-κB function is impaired or persistent (Courtois and Gilmore, 2006).
CYLD is important in immune homeostasis since it prevents the spontaneous activation of NF-κB in peripheral T and B lymphocytes. The peripheral T-cells from CYLD-deficient mice have increased sensitivity and a heightened response to T-Cell Receptor (TCR) stimulation, which leads to spontaneous inflammation in the colon (Reiley et al., 2007) and colitis-associated tumorigenesis (Zhang et al., 2006). Inflammation is the major underlying phenotype of CD, and some CD patients develop colon cancer at a later stage in life. In this study a causal locus downstream of CYLD was mapped, which lies 2 kb upstream of a putative gene regulatory element (Pennacchio et al., 2006) and approximately 7 kb upstream of a Leprosy susceptibility GWAS SNP (Zhang et al., 2011). The element is located in an LD block spanning 8.3 kb (Figure 7A, B). This regulatory element could potentially be an enhancer for CYLD, especially since CYLD expression has been found to be down-regulated in CD. However, the estimated
location of the putative causal agent in this possible regulatory region could also point to a mutation in a sequence that acts as a distant regulatory element in relation to NOD2.
The lower significance in the NIDDK data, as compared with the WTCCC result, is probably due not only to the difference in sample size but also to the different phenotypic classifications used in the two studies. The WTCCC included any subtype of CD and not just the ileal form of the condition. The 95%
Confidence Interval (CI) for the CYLD window is very narrow for both datasets because the Ŝ is within a region of LD breakdown caused by a recombination hot spot.
Unfortunately, the data analysed in this study did not contain the actual NOD2 mutations, and the results are therefore based on data stratified according to a SNP that is in complete LD with the three common NOD2 mutations.
Subsequent to the publication of the Chromosome 16q study (Elding et al., 2011), a meta-analysis on Leprosy (Zhang et al., 2011) identified a susceptibility GWAS SNP located close to the CYLD signals identified in this Chapter using the LDU multi-marker mapping approach. This is of interest since CD and Leprosy share immunological characteristics, including a Th1-cell response with granuloma formation (Zhang et al., 2009). It has also been suggested that some CD cases might have a mycobacterial cause, as seen in Leprosy (Schurr and Gros, 2009). Additionally, the ENCODE (ENCyclopaedia Of DNA Elements) Consortium subsequently published a comprehensive list of all the DNA
sites and their respective transcription factors, as well as DNase I Hypersensitive Sites, which are commonly found in transcriptionally active genes. It was interesting to find that the estimated location of the putative causal agent and the 95% CI for the CYLD signal identified in this Chapter contained several transcription factor binding sites. Indeed, the estimated location of the putative causal agent mapped precisely upstream of a binding site for the transcription factor CEBPB, in a DNase I Hypersensitive Site. The gene that codes for this transcription factor is essential in the regulation of genes that are implicated in inflammatory and immune responses (Boi et al., 2012; Mayer et al., 2013). The IBDase FP7 European consortium performed an independent conditional analysis in which patients were stratified on the basis of the actual NOD2 mutations, and subsequently identified the involvement of CYLD, independently of that of NOD2, in patients who do not carry the common NOD2 mutations (I. Cleynen, 2011). This to an extent reinforces the findings presented here and demonstrates the importance of considering phenotypic heterogeneity.
The second region covers the genes CDH3 and CDH1 encoding cadherin proteins that participate in cell recognition, signalling, morphogenesis and tumour progression. CDH1 encodes an epithelial cadherin (E-Cadherin) expressed in the intestine with essential functions in intestinal homeostasis (Schneider et al.,
2010). The loss of E-cadherin expression leads to apoptosis and shedding of cells and to disruption of the maturation of Paneth and goblet cells, which are important in the innate immune system and microbial defence (Schneider et al., 2010). E-cadherin helps to maintain the intestinal epithelial defence system, and reduced CDH1 expression is a feature of CD and UC patients with an inflamed intestinal epithelium (Gassler et al., 2001). A genome-wide linkage analysis on CD reported evidence of linkage on 16q, some 25 cM downstream of NOD2, in families that did not carry the NOD2 mutations (van Heel et al., 2003). Notably, CDH3 and CDH1 are at that linkage peak, approximately 25 Mb downstream of NOD2.
However, GWA studies and meta-analyses on CD did not detect these genes despite the large number of samples and imputation based on the latest HapMap samples. A GWA study on Ulcerative Colitis (UC), on the other hand, did report CDH1 as a new susceptibility locus (Barrett et al., 2009), with the most significant SNP for UC lying 180 kb upstream of the gene; this will be discussed further in Chapter IV.
Although CDH1 has not been reported for CD by others to date, this analysis demonstrates that the most likely location of the causal variant is within an interval that is flanked by CDH3 and CDH1. This interval is an LD block that includes a functional promoter variant (rs16260; Figure 8) (Li et al., 2000) that is also associated with IBS (Villani et al., 2010), a chronic inflammatory condition involving recurrent abdominal discomfort. CDH1 was also detected in a GWA meta-analysis for colorectal cancer and the reported SNPs fall within our confidence interval (Houlston et al., 2008). CDH3, which encodes
Placental-Cadherin (P-cadherin), is also implicated in colorectal carcinomas (Milicic et al., 2008).
The third region became statistically significant after separating the data further into Jewish and non-Jewish patients, an effect that shows the importance of considering heterogeneity of ancestry. This region covers the IRF8 gene, which encodes the transcription factor also known as Interferon Consensus Sequence-Binding Protein and plays a negative regulatory role in cells of the immune system. A subset of the NIDDK data (Jewish patients with ileal CD plus additional extra-ileal involvement) yielded an estimated causal location 24 kb away from the estimate obtained using the WTCCC data. The difference in localisation may reflect allelic heterogeneity.
The significant association in the IRF8 region on chromosome 16q was originally identified in a meta-analysis on Multiple Sclerosis (MS) (De Jager et al., 2009), which was shortly followed by a meta-analysis that identified an association between the region harbouring IRF8 and UC (Anderson et al., 2011), the other sub-type of IBD. However, the latest association studies on CD failed to reveal any CD-specific associations on Chromosome 16q other than NOD2.
Thus, the analysis presented here, together with other GWASs, confirms the continually emerging evidence for the presence of shared loci among different immune and inflammatory-mediated complex diseases. IRF8 has additionally been implicated in Systemic Lupus Erythematosus (SLE) (Lessard et al., 2012) as well as MS (De Jager et al., 2009), UC (Anderson et al., 2011) and CD
(Elding et al., 2011) (Figure 9). This strongly suggests that these diseases share pathways, and hence calls for pathway analysis in complex diseases. IRF8 expression is crucial in bone metabolism, with IRF8-deficient mice displaying extensive osteoporosis, one of the extra-intestinal manifestations associated with CD (Zhao et al., 2009).
This analysis provides novel insight into the genetic and phenotypic heterogeneity of CD and demonstrates that the approach taken in this part of the project may be a promising way forward for studying other frequent and complex diseases. Following the results obtained from this hypothesis-driven part of the project, the next step was to perform a hypothesis-free GWAS, applying the LDU multi-marker mapping approach to the WTCCC and NIDDK CD datasets, which will be explored in the next chapter.