Transient promoter/reporter gene transfection experiments have demonstrated that constructs containing the Cdx binding elements in each orientation (Figure 4.14) gave comparable levels of CAT activity (except for pCP2-CAT), indicating that the Cdx elements function in an orientation-independent manner, and therefore act as enhancer elements in the colon promoter of the CA1 gene. This finding is in line with a recent report by Levy et al. (1996), who showed that Cdx2 binding sites from the SI gene are able to activate transcription of either the SI promoter or heterologous promoters when placed at a distance from the promoter and in either orientation. However, transient transfection of Hoxa-7 promoter/reporter constructs containing two Cdx binding sites in the reverse orientation has shown that there is no transcriptional activation by Cdx1 (Subramanian et al., 1995).
6.3 Do caudal proteins form heterodimers? Implications for CA1
Protein/protein interaction between homeodomain proteins to form dimers may occur and may involve specific cysteine residues. Three cysteine residues occur in hCdx2, none of which is within the homeodomain (Figure 5.7). Two of these cysteine residues are unique to the Cdx2 proteins: one lies in the conserved hexapeptide region (Figure 5.6; Table 5.1) and the other towards the amino terminus of each protein (Figure 5.6). Cdx2 has been shown to bind as a homodimer to the Cdx binding elements in the human SI promoter under reducing conditions (Suh et al., 1994). This contrasts with the HoxB5 protein, which contains a cysteine residue within its homeodomain but which functions in co-operative DNA binding only when the protein is oxidised (Galang et al., 1993).
A recent report implies that Cdx1, Cdx2 and Cdx4 are able to form dimers (Taylor & Traber, 1996). It seems reasonable to suggest that, like many other transcription factor families, the different Cdx proteins may interact with one another to form heterodimers, thus increasing the flexibility of gene regulation. This idea is of particular significance for the regulation of intestine-specific genes since both Cdx1 and Cdx2 are expressed throughout the intestine (James & Kazenwadel, 1991; James et al., 1994). The different levels of expression of Cdx1 and Cdx2 along the cephalocaudal axis (James & Kazenwadel, 1991; James et al., 1994) could possibly lead to the active protein complex being made up of varying proportions of heterodimer along the intestinal axis. This might be important for the differentiation process and in gene regulation if Cdx1 and Cdx2 differ in their affinities for DNA elements and in their efficiency of gene transactivation. Given that the expression profile of Cdx1 in the adult colon is rather similar to that of Cdx2 and CA1, the possibility that Cdx1 might also be involved in the regulation of CA1 in the colon epithelium needs to be addressed. In addition, the prospect that Cdx1 and Cdx2 could form functional heterodimers to combinatorially modulate CA1 gene expression needs to be explored.
The observation that different members of the same transcription factor family can bind to DNA as homodimers or heterodimers is well documented. For example, the fos and jun leucine zipper proteins bind to the TPA (12-O-tetradecanoylphorbol-13-acetate)-responsive element with a higher affinity as a heterodimer complex than the jun protein as a homodimer complex (Kouzarides & Ziff, 1988). These data show that differences in the stability of dimerisation exist and, in addition, it is likely that the binding specificity of the homodimer or heterodimer complex differs. The presence in a cell of a number of distinct complexes that are able to bind to a particular DNA sequence element suggests that competition for binding also plays a role in regulating gene expression. For example, Maf family members also form homodimers and heterodimers with each other to activate transcription. Interestingly, the small Mafs function as repressors of α- and β-globin gene transcription when bound as homodimers (Kataoka et al., 1995) but as activators when bound as heterodimers with the p45 NF-E2 erythroid-specific factor (Igarashi et al., 1994). Another family of nuclear proteins with a different DNA binding/dimerisation motif is the basic helix-loop-helix (bHLH) family. The motif was initially described in two proteins, E12 and E47 (Murre et al., 1989). Both of these proteins bind specifically to a DNA sequence (κE2) within the enhancer of the immunoglobulin kappa chain gene, but with different affinities, such that E47 has a greater affinity for κE2 and is capable of binding to this as a homodimer. These observations emphasise the significance of protein/protein interactions in the regulation of gene expression as well as the importance of understanding whether different Cdx proteins interact with one another by binding to similar DNA elements to regulate gene expression in different ways and at different times.
6.4 CP2-mediated transcriptional repression
In in vitro studies the CA1 CP2 sequence leads to a total loss of gene activity when attached in its natural orientation to a heterologous promoter. This apparent repressor activity may have no biological significance, since there is no evidence to suggest that this sequence is functional in the natural gene in vivo, nor in artificial constructs in which the CP2 sequence is placed in its natural position upstream of the CP1 sequence (Figures 4.14 and 4.19). The cis-acting sequence responsible for repression must be orientation dependent since repression is not seen when CP2 is transfected in its reverse orientation (see sections 4.2.1.1 and 4.2.1.2). The position of the Cdx element in the pCP2-CAT construct differs from that in the natural CA1 gene. The element in pCP2-CAT is 175bp upstream from the SV40 TATA box, whereas in the natural CA1 gene it is 140bp upstream from the CA1 TATA motif. It is conceivable that the different arrangement may affect the transcriptional activation.
An AT-rich sequence is present (5'-TATAAAA-3'; Figure 3.18) 29bp upstream from the Cdx motif in CP2. This element is identical to the TATA motif in the CA1 gene and might be involved in the repression of transcription. It is possible that in the artificial pCP2-CAT construct this element competes with the natural TATA motif for the binding of the basal transcription apparatus. However, the sequence downstream of this possible TATA box shows little similarity to the general form, Py2CAPy5, which would indicate the presence of a transcription initiation site. Thus, while this motif might bind transcription factors, it may not supplant the correct TATA box in the initiation of transcription.
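The comparison with the general form Py2CAPy5 can be illustrated with a short pattern search. The following is only a minimal sketch: the function name `find_cap_signals` and the test sequence are hypothetical and are not taken from the CA1 or SV40 sequences, and an exact-match search is an assumption made for illustration (the text speaks of similarity to the general form, which a strict match does not capture).

```python
import re

# Py2CAPy5 written as a regular expression; Py = pyrimidine (C or T).
# The lookahead allows overlapping matches to be reported.
CAP_SIGNAL = re.compile(r"(?=([CT]{2}CA[CT]{5}))")


def find_cap_signals(sequence):
    """Return 0-based start positions of exact Py2CAPy5 matches.

    Only the strand supplied is searched; exact matching is an
    illustrative simplification, not a scoring method from the thesis.
    """
    return [m.start() for m in CAP_SIGNAL.finditer(sequence.upper())]


# Hypothetical sequence for illustration only (not a real promoter).
example = "GGTTCACCTCCGG"
print(find_cap_signals(example))
```

A region downstream of a candidate TATA box that returns no positions would be consistent with the absence of a recognisable initiation site, although a weaker, partial match could still be present.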
6.5 DNase I hypersensitive sites exist upstream of the CA1 colon promoter
The finding that Cdx2 is expressed in the small and large intestine, while CA1 expression is specific to the large intestine, implies that negative regulatory factors must be involved in the suppression of CA1 transcription in the small intestine. These factors remain to be identified but clues to the position of additional binding elements are suggested by the presence of DNase I hypersensitive sites (DHS) in more distal promoter sequences. Apart from DHS 6c, which was the starting point of this investigation, other DHS sites have been identified within the large first intron of the CA1 gene and present targets for further investigation. DHS 5c is colon-specific (Figure 1.5) and situated some 8kb upstream from exon 1 and 26kb downstream of exon 1a. Since DHS 5c is found solely in colon epithelial cells it may mark regulatory sequences involved in defining the colon-specific regulation of CA1.
Two other sites, DHS 1ec and DHS 2ec, are situated approximately 27.5kb and 26.5kb upstream of exon 1, respectively, and 7kb and 8kb downstream of exon 1a (see Figure 1.5). The role of sequences in the region of DHS 1ec and DHS 2ec is interesting since these sites are found in both the colon and erythroid promoters when CA1 is actively transcribed but do not occur in non-CA1 expressing cells. It is probable that a complex form of gene regulation exists for CA1, one aspect of which involves interactions which impose the mutually exclusive function of the two tissue-specific promoters. Factors involved in this regulation may be either ubiquitous or tissue-specific and may act either positively or negatively.
There is some evidence to suggest that one or more members of the GATA family of transcription factors might be regulating the CA1 gene, either in a positive or a negative fashion. The GATA family of transcription factors has a highly conserved DNA-binding domain consisting of two zinc fingers and a basic region (Martin & Orkin, 1990). Six members of this gene family have been identified. GATA-1, GATA-2 and GATA-3 are expressed in an overlapping pattern in the haematopoietic cell lineages while GATA-4, GATA-5 and GATA-6 are expressed primarily in the heart and gut (Simon, 1995; Laverriere et al., 1994). Laverriere and co-workers (1994) have found that the expression of GATA-4, -5 and -6 in the chicken appears in the colon during embryogenesis; however, this expression is not maintained during adult life. The finding that GATA-6 is expressed in the heart and small intestine of adult mice has been confirmed by Morrisey et al. (1996), but this study did not include an analysis of colon expression.
Several potential binding elements for the GATA family of transcription factors occur in the CA1 colon promoter, notably in the CP1 (-78), CP2 (-179) and CP5 (-417 and -470) regions. The CP1 and CP2 GATA sites are fairly well conserved between man and mouse, each differing by only one nucleotide (Figure 3.18). Each of these CA1 promoter fragments binds an erythroid-specific factor (CAB1) with a very similar mobility in EMSAs (Figures 3.2 and 3.3). While this factor has not been identified it is tempting to suggest that GATA may play a role in the regulation of the CA1 gene via the colon promoter. In vertebrates the GATA motif, (A/T)GATA(A/G), is recognised by the GATA family of zinc finger proteins and these sites have been found in the promoters of a large number of erythroid and non-erythroid genes. One scenario might be that GATA-4, -5 and -6, which are expressed in the small intestine, may interact with the CA1 colon promoter to repress transcriptional activation in the small intestine. Interaction with GATA factors might also provide an explanation for the suppression of colon-specific expression of CA1 in erythroid cells, where GATA-1, -2 and -3 would carry out a similar repressive function.
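As an illustration of how candidate sites matching the (A/T)GATA(A/G) motif might be located in a promoter sequence, the following minimal sketch scans both strands with regular expressions. The function name `find_gata_sites` and the test sequence are hypothetical and do not correspond to the CA1 promoter; a match indicates only sequence similarity to the consensus, not factor binding.

```python
import re

# Consensus quoted in the text: (A/T)GATA(A/G).
GATA_FORWARD = re.compile(r"(?=([AT]GATA[AG]))")
# Reverse complement of the consensus, (C/T)TATC(A/T), so that sites
# on the opposite strand are found by scanning the same string.
GATA_REVERSE = re.compile(r"(?=([CT]TATC[AT]))")


def find_gata_sites(sequence):
    """Return sorted (start, strand, matched_text) tuples.

    start is the 0-based position on the supplied strand; strand is
    "+" for the consensus itself and "-" for its reverse complement.
    """
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in GATA_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in GATA_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical sequence for illustration only (not a real promoter).
example = "CCAGATAAGGCTTATCTCC"
for start, strand, site in find_gata_sites(example):
    print(start, strand, site)
```

Positions reported in this way are relative to the start of the supplied string, so they would need to be converted to coordinates relative to the transcription start site before being compared with the positions quoted above.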
GATA binding sites also occur in the promoters of the SI (-71) and LPH (-61 and -96) genes and it is interesting to note that the relative distance of these sites from the transcription start sites is similar to that found for CA1 (Figure 6.1). GATA factors have recently been a major focus of attention in the study of gene regulation in the intestine and it is now known that these factors play a role in the regulation of the SI and LPH genes in the small intestine. Work on the SI promoter has shown that the -71 GATA site is involved in the transcriptional activation of this gene and that mutation of this element leads to the loss of gene activation (Silberg & Traber, 1997). This site has been shown to bind GATA-4 in EMSAs, although in transfection studies none of GATA-4, -5 or -6 is able to transactivate gene expression via this element. In contrast, GATA-6 binds to and transactivates expression of the LPH gene via the -61 GATA element and mutation of this element results in reduced transcriptional activation (FitzGerald et al., 1997).
The GATA elements are also able to function as negative cis-acting elements involved in repressing transcription. For example, GATA-1 has been found to suppress transcription of the human epsilon-globin gene in adult erythroid cells (Raich et al., 1995), and the rat serine dehydratase gene contains GATA-like sequences which function to suppress transcription of this gene in fetal hepatocytes (Noda et al., 1994). The importance of GATA elements in the promoters of intestinal genes is highlighted by the finding that in Caenorhabditis elegans such elements are required for the activation of intestine-specific genes, such as the ges-1 gene (Stroeher et al., 1994). The erythroid-like transcription factor (ELT-1) is expressed in the intestine of C. elegans and has been shown to activate transcription of a reporter gene via GATA binding sites (Shim et al., 1995) and therefore is likely to be involved in intestine-specific regulation via these GATA sites.
Figure 6.1 Model of the carbonic anhydrase 1 (CA1), sucrase-isomaltase (SI) and lactase-phlorizin hydrolase (LPH) promoters and their DNA binding elements adjacent to the transcription start sites. The DNA regulatory elements and their cognate DNA binding proteins are shown. Arrows indicate the transcription start site and numbers the position of the DNA elements relative to the start site.
[Figure 6.1 diagram. Legible labels: CA1: -179 GATA, -168 Cdx, -114 Cdx, -78 GATA, TATA, +25 HNF-1, +77 GATA. SI: -175 HNF-1, -88 HNF-1, -71 GATA, -27 TATA. LPH: -85 HNF-1, -61 GATA.]
The promoters of the CA1, SI and LPH genes also share binding elements for the hepatocyte nuclear factor 1 (HNF-1) transcription factor. HNF-1 was first thought to be a liver-specific transcription factor but is now known to be expressed in the intestine (Mendel & Crabtree, 1991). Two HNF-1 factors, designated HNF-1α and HNF-1β, are examples of DNA-binding proteins that contain both a divergent homeodomain and sequence motifs similar to the POU proteins (Mendel & Crabtree, 1991). The HNF-1 sites in the CA1 and SI promoters are fairly well conserved and differ by one or two nucleotides between man and mouse, and the site in the LPH promoter differs by one nucleotide between man and pig. The CA1 colon promoter contains an HNF-1 site approximately 52 nucleotides downstream of the TATA box (+25; Figure 6.1). Whether or not this sequence binds HNF-1α or HNF-1β remains to be determined. Interestingly, the SI gene promoter contains two HNF-1 elements (Figure 6.1), designated SIF2 (-73 to -88) and SIF3 (-156 to -175). Both HNF-1α and HNF-1β bind to the SI promoter but with different affinities and only HNF-1α activates transcription through these elements (Wu et al., 1994). The conservation of these promoter elements between species and their positions close to the TATA boxes indicates their importance for transcription. An HNF-1 binding site similar to the SIF3 site in the SI promoter has been described in the third intron of the apoB gene. This HNF-1 element is situated approximately 400bp downstream from the B SIF1 Cdx binding site and the binding of HNF-1α to this element also represses transcription of apoB in Caco-2 intestinal cells (Lee et al., 1996).
6.6 Cdx2 binds to TATA motifs
Transient transfection experiments have shown that Cdx2 is capable of binding to the SV40 gene promoter, resulting in the transactivation of the CAT reporter gene (see sections 4.2.1.1 and 4.2.1.2; Figures 4.15 and 4.19). The SV40 promoter sequence contains a Cdx2 binding site encompassing the SV40 TATA box and EMSA competition assays have demonstrated that an oligonucleotide encompassing this site is capable of competing with the CA1 CP017 Cdx2 motif for the binding of Cdx2 (Figure 4.23).
The Cdx2 protein also binds to the TATA box of the calbindin-D9k (CaBP9k) gene, which is expressed at high levels in the proximal small intestine (Lambert et al., 1996). Cotransfection of a Cdx2 expression vector into cells which do not support cell-type specific transcription of the CaBP9k gene results in marked repression of basal transcription from a minimal CaBP9k promoter containing a TATA box which fits the Cdx2 binding site consensus sequence. In contrast, cotransfection of the Cdx2 expression vector into intestinal cells results in an increase in gene activation above the basal levels of transcription. These findings have led to the idea that Cdx2 might interact with other factors specific to intestinal cells to regulate expression via the TATA box (Lambert et al., 1996). The mechanism by which this is achieved has not yet been established and this opens up another interesting area for investigation. There are precedents for the direct involvement of the TATA box in tissue-specific gene expression which might provide suitable models. For example, the β-globin gene is regulated by the erythroid-specific protein GATA-1, which interacts with the TATA element in the β-globin promoter and a 3' enhancer sequence (approximately 1.9kb downstream of the transcription start site; Fong & Emerson, 1992). Although the exact mechanism of transcriptional activation is not known it is thought that GATA-1 binds to the TATA box and forms a complex with itself or other proteins bound to the enhancer sequence through DNA loop formation (Fong & Emerson, 1992). This structure is then recognised by TFIID (TATA-binding protein; TBP) in combination with adaptor proteins and GATA-1 is displaced from the TATA box to allow transcription to take place. The TBP and GATA-1 proteins do not form a complex together but function separately through the same DNA binding site and can displace each other, depending on their relative concentrations. Therefore, the tissue-specific regulation of the β-globin gene requires enhancer-promoter interaction which is mediated in part by GATA-1 bound to the TATA box.
Another example might be the regulation of the growth hormone gene by the pituitary-specific transcription factor, GHF-1, where a cell type-specific promoter element of 15bp, encompassing the TATA box of the growth hormone gene, has been shown to be important in determining tissue-specific transcription (MacCormick et al., 1991).
The finding that these genes possess a specialised TATA box capable of interacting with other cis-acting sequences determining tissue specificity is intriguing since EMSA studies have shown that the TATA box of the CA1 gene is capable of binding Cdx2 (Figure 4.24). Whether this binding might be involved in the tissue-specific expression of CA1 needs to be clarified but it is possible that sequences marked by the colon-specific hypersensitive site DHS 5c (Figure 1.5) may interact with this TATA box sequence in a similar manner.
6.7 Recent advances in Cdx
6.7.1 The cloning of human CDX2
In this thesis the cloning and sequencing of the human CDX2 (hCDX2) cDNA has been described. It has very recently emerged that the hCDX2 cDNA has also been cloned and sequenced independently by Mallo and colleagues (Mallo et al., 1997). This sequence has only 70bp of 5' UTR (94% similar to the hCDX2 sequence shown in Appendix 1), while the human cDNA described in this thesis has an extra 234bp of 5'-untranslated sequence plus 56bp of 5'-flanking sequence. Nucleotide sequence extending 83bp