Recently, the U.S. Human Genome Project announced revised five-year research goals, which include the development of efficient methods for the identification of genes and for the placement of known genes on physical maps or sequenced stretches of DNA (Collins and Galas, 1993). The observations made in Chapters 3, 5 and 6 of this thesis suggest that splice sites may not be suitable targets for the identification of coding regions using short oligos, since similar sequences seem to occur in non-coding regions. It may be well worth trying DOP-PCR using other consensus sites associated with coding regions. Other methods for isolating coding regions are probably now more appropriate. Exon trapping systems are now commercially available, although many of the reservations expressed in Chapter 1 still apply. Another method, which involves isolating human transcripts from human/rodent somatic cell hybrids, can scan entire chromosomes or microdissected chromosome fragments, but generally only constitutively expressed genes are isolated (Liu et al, 1989; Corbo et al, 1990). Direct screening of cDNA libraries with YACs enables fragments several hundred kb in length to be scanned at once (Wallace et al, 1990; Elvin et al, 1990). Recent advances have been made in producing normalised cDNA libraries, which enrich for low copy cDNAs and deplete high copy cDNAs such as those encoding housekeeping genes (Patanjali et al, 1991). Such libraries can be pooled from many tissue sources at many different stages of development in order to create a library representing all possible transcribed sequences. However, because of the complexity of the probes, the method produces a high background signal, resulting in low reproducibility. Another approach has used immobilized YACs to enrich for cDNAs (Lovett et al, 1991; Parimoo et al, 1991). This strategy has more recently been modified: the human genomic DNA is digested, linker/primers are added for PCR amplification, and the DNA is biotinylated; it is then hybridized with cDNAs in solution (after preblocking repeat sequences), and the hybridized duplexes are captured using streptavidin-coated magnetic beads (Tagle et al, 1992; Morgan et al, 1992; Korn et al, 1992; Tagle et al, 1993a). This technique was used to screen a pool of 6 YACs spanning the Huntington’s Disease candidate region (Tagle et al, 1993b). Low abundance cDNAs from this region were enriched several thousand fold. One drawback is that either a full-length library must be screened with the resulting small inserts in order to obtain full-length transcripts, or the captured cDNAs must be eluted, preserved by cloning, and the resulting clones sequenced.
The magnetic bead cDNA capture method constitutes a very powerful technique and, if used with a pool of cDNA libraries representing all possible transcripts and with effective blocking of all repeat regions, could be used to identify all coding sequences within a given genomic region. Since the method is PCR based, it could also be extended to larger regions such as microdissected chromosomes or whole chromosomes. The approach is preferable to the Expressed Sequence Tag approach of partially sequencing cDNAs from a library (for instance a brain library, as carried out by Adams et al, 1992) and then mapping the sequences. The magnetic bead method could be used for scanning brain cDNA libraries with each YAC in turn across chromosomal areas of interest.
8.5 THE HUMAN GENOME PROJECT SEQUENCING STRATEGIES
As mentioned above, the U.S. Human Genome Project has recently revised its five-year research goals. These include: (1) The development of efficient approaches to sequencing one-to-several Mb regions of DNA of high biological interest. (2) The development of technology for high throughput sequencing, focusing on systems which integrate all steps from template preparation to data analysis. (3) The development of sequencing capacity to allow sequencing at a collective rate of 50 Mb per year (Collins and Galas, 1993). Ideas for new sequencing technologies for large volume sequencing of the human genome are discussed in Chapter 7. Sequencing from octamers, which has been demonstrated here as feasible using a PCR based approach, would be very useful for bulk sequencing projects, since a library of all possible octamers (65,536) could easily be made. However, a superior technology has now been developed whereby a library of hexamers (4096) is capable of sequencing any template (Kieleczawa et al, 1992). Correct sequencing requires the selection of three or four hexamers that will align on the template adjacent to each other. Addition of single stranded binding protein at the correct concentration prevents priming from a single hexamer or from just two adjacent hexamers. Thus priming should be completely specific to the selected site. On an uncharacterized cosmid clone the procedure would be performed in a random manner, using pools of hexamers, until a proportion of the cosmid had been sequenced (20-25%, Studier, 1989), and then directed sequencing would be used to fill in the gaps. The method has already been adapted for use with "dye-deoxy" sequencing (Hou and Smith, 1993), compatible with the Applied Biosystems automated system. It seems that a computerised mass sequencing strategy using such approaches cannot be far off.
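The library sizes quoted above follow from the four-letter DNA alphabet: there are 4^8 = 65,536 possible octamers and 4^6 = 4096 possible hexamers. The sketch below, which is only an illustration and not the procedure of Kieleczawa et al (1992), reproduces that arithmetic and shows how a chosen priming site might be split into three adjacent hexamers. The function names and the example template are hypothetical, and the effect of single stranded binding protein is not modelled.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def library_size(k: int) -> int:
    """Number of distinct oligos of length k over the four bases."""
    return len(BASES) ** k


def all_kmers(k: int):
    """Yield every possible k-mer, e.g. all 4096 hexamers for k = 6."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def adjacent_hexamers(template: str, start: int, count: int = 3, k: int = 6):
    """Return `count` contiguous k-mers that anneal side by side on `template`.

    `template` is the single strand to be sequenced, written 5'->3'.
    The primers are the reverse complement of the chosen site, cut into
    k-length modules (hypothetical helper, for illustration only).
    """
    site = template[start:start + count * k]
    if len(site) < count * k:
        raise ValueError("site runs past the end of the template")
    primer = reverse_complement(site)  # full-length primer, 5'->3'
    return [primer[i:i + k] for i in range(0, len(primer), k)]


if __name__ == "__main__":
    print(library_size(8))  # 65536 octamers
    print(library_size(6))  # 4096 hexamers
    # Made-up 18-base site; prints ['GTACGT', 'AAACCC', 'GGGTTT']
    print(adjacent_hexamers("AAACCCGGGTTTACGTAC", 0))
```

Each hexamer returned can be looked up in a pre-made library of all 4096 hexamers, which is what makes a fixed library reusable for any template.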
A similar project has been developed in parallel to that of Kieleczawa et al (1992), involving libraries of hexamers or pentamers and stacking these "modular" primers along the template in the same way, but without the use of single stranded binding protein (Kotler et al, 1993). Another approach to large volume sequencing, known as the "Janus Strategy", has been developed and involves subcloning DNA into Janus, an M13 vector that allows the sequencing of both strands from a single stranded template (Burtland et al, 1993). Adoption of these or similar techniques should bring the cost of a genome sequencing project down below $0.50 per base, in accordance with one of the original U.S. Human Genome Project short term goals.
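To give a sense of scale for the figures quoted in this section, the following back-of-the-envelope sketch combines the 50 Mb per year target rate with the $0.50 per base cost ceiling. The genome size of 3 x 10^9 bp is an assumption introduced here (an approximate haploid human genome size) and is not a figure from the text; the calculation also assumes single-pass coverage with no redundancy, so it should be read as an order-of-magnitude illustration only.

```python
# Assumption: approximate haploid human genome size; not stated in the text.
GENOME_SIZE_BP = 3_000_000_000
# Collective target rate of 50 Mb per year (Collins and Galas, 1993).
RATE_BP_PER_YEAR = 50_000_000
# Cost ceiling per base quoted in the text.
COST_PER_BASE_USD = 0.50


def years_to_sequence(genome_bp: int, rate_bp_per_year: int) -> float:
    """Years needed to read every base once at a constant rate."""
    return genome_bp / rate_bp_per_year


def cost_ceiling(genome_bp: int, cost_per_base: float) -> float:
    """Total cost if every base is read once at the given cost per base."""
    return genome_bp * cost_per_base


if __name__ == "__main__":
    print(years_to_sequence(GENOME_SIZE_BP, RATE_BP_PER_YEAR))  # 60.0
    print(cost_ceiling(GENOME_SIZE_BP, COST_PER_BASE_USD))      # 1500000000.0
```

Under these assumptions a single pass would take 60 years at the target rate, which illustrates why further gains in throughput, and not only in cost per base, are needed.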
SUMMARY
Testing the hypotheses of this thesis has demonstrated the following:
1. Short GC-rich primers could be used for PCR amplification, and when human genomic DNA was used as template the products, when cloned and sequenced, were enriched 66-fold for non-CpG depleted sequences. However, the sequences were not long enough to prove that they had originated from CpG islands, no evidence of the methylation status of the source DNA was provided, and no evidence was shown to suggest that the PCR product was derived from human DNA rather than from contamination by prokaryotic genomes. More powerful methods of isolating CpG islands have since been developed (Cross et al, 1994).
2. Short GC primers, when used less stringently, could not distinguish between the CpG islands of cloned genes and the non-CpG depleted vector genomes. Conventional methods for identifying CpG islands within clones using rare cutter restriction analysis would be appropriate.
3. GC-rich octamers were capable of priming PCR sequencing accurately in only 3 out of 8 instances. Ligation of primer/linkers to rare cutter sites prior to PCR sequencing could be a more specific method of sequencing directly into CpG islands within clones, although only limited evidence was shown to support this.
4. Rare-cutter sites can be targeted by hybridization with 8-mer oligos capable of discriminating between 7/8 and 8/8 complementarity. This method has been used to screen large numbers of clones (Estivill and Williamson, 1987; Melmer and Buchwald, 1990).
high rate of false positive hybridization. Results using PCR with such oligos were inappropriate for analysis.
6. Amplification using DOP-PCR for targeting splice sites also had a low success rate, partly due to the high occurrence of splice site-like sequences within the clones. Methods such as exon trapping or screening normalized cDNA libraries with clones would be preferred for detecting coding regions.
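The distinction drawn in point 4 between 8/8 and 7/8 complementarity can be made concrete with a small in silico sketch. It scans one strand of a sequence for windows that match an 8-base target perfectly or with exactly one mismatch. This is only a sequence-level illustration under stated assumptions: the function names and the test sequence are hypothetical, the NotI recognition sequence GCGGCCGC is used purely as an example of a rare-cutter site, and real hybridization behaviour (stringency, base composition, mismatch position) is not modelled.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_sites(sequence: str, target: str):
    """Return (perfect, one_off) lists of 0-based window start positions.

    `perfect` holds windows identical to `target` (8/8 for an octamer);
    `one_off` holds windows with exactly one mismatch (7/8 for an octamer).
    Only the given strand is scanned, which is sufficient for a
    palindromic site such as the example below.
    """
    k = len(target)
    perfect, one_off = [], []
    for i in range(len(sequence) - k + 1):
        d = mismatches(sequence[i:i + k], target)
        if d == 0:
            perfect.append(i)
        elif d == 1:
            one_off.append(i)
    return perfect, one_off


if __name__ == "__main__":
    NOTI_SITE = "GCGGCCGC"  # example rare-cutter site (illustrative choice)
    clone = "TTGCGGCCGCAAGCGGCCGAAA"  # made-up test sequence
    perfect, one_off = classify_sites(clone, NOTI_SITE)
    print("8/8 matches at:", perfect)
    print("7/8 matches at:", one_off)
```

A probe that cannot discriminate the two classes would report every position in both lists as a hit, which is one way to picture the false positives that arise when stringency is too low.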
APPENDIX A: M13 CLONES OF PCR USING SHORT GC-OLIGOS ON HUMAN GENOMIC