The “positional candidate” approach combines genetic linkage information with knowledge of the map positions of genes. A gene automatically becomes a candidate for a disorder if it lies within the critical interval for that disease. Candidates whose functional features relate them to the pathophysiology of the disease are favoured and are screened first for mutations in affected individuals. Functional information is advantageous but not always required, and no gene in the interval should be overlooked. Identifying positional candidates is increasingly a matter of searching computer databases for lists of genes that map to specific regions. This approach has become feasible, and more popular, with the construction of a “transcript map” of the human genome.
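The database search described above can be sketched in a few lines of Python. This is a minimal illustration, not the interface of any real mapping database: the `MappedGene` record, its megabase coordinates, the annotation keywords, and the `positional_candidates` function are all hypothetical names introduced here. Every gene overlapping the critical interval is returned, with functionally favoured genes sorted to the front rather than the others being discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedGene:
    """Hypothetical record for one mapped gene or EST (illustrative only)."""
    name: str
    chromosome: str
    start_mb: float  # assumed map position in megabases
    end_mb: float
    annotations: frozenset = frozenset()  # e.g. tissue or function keywords


def positional_candidates(genes, chromosome, interval_start_mb,
                          interval_end_mb, keywords=()):
    """Return all genes overlapping the critical interval.

    Genes with at least one annotation matching `keywords` come first
    (favoured candidates); the rest follow, so no gene is overlooked.
    """
    wanted = {k.lower() for k in keywords}

    def overlaps(gene):
        return (gene.chromosome == chromosome
                and gene.start_mb <= interval_end_mb
                and gene.end_mb >= interval_start_mb)

    def is_favoured(gene):
        return bool(wanted & {a.lower() for a in gene.annotations})

    hits = [g for g in genes if overlaps(g)]
    # False sorts before True, so favoured genes (not is_favoured == False) lead.
    return sorted(hits, key=lambda g: (not is_favoured(g), g.start_mb))
```

In practice the gene list would come from a mapped EST resource rather than being built by hand, and the ranking step would reflect whatever is known of the disease pathophysiology.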
At the initial discussions concerning the Human Genome Project, Professor Sydney Brenner proposed that large-scale sequencing of cDNAs should take precedence over genomic sequencing. This idea was taken up in the form of “Expressed Sequence Tags” (ESTs), which are partial sequences generated from randomly selected cDNA clones. Sequences from the 5' end or from random internal regions can provide information on the protein coding region. Those derived from the 3' region are preferable for selecting STSs to be used in mapping, as they are less conserved between species and less likely to be interrupted by an intron (Wilcox et al. 1991).
The first full-scale generation of ESTs was undertaken by Venter and colleagues, who sequenced over 8,000 cDNA clones from human brain libraries over several years (Adams et al. 1991, 1992a, 1993a and 1993b). They also generated 174,472 new ESTs from 300 libraries constructed from a range of human tissues at various stages of development (Adams et al. 1995). Their preliminary studies were aimed at sequencing random-primed or 5' regions from directional libraries to maximise the number of ESTs that contain protein coding sequences. A further 18,698 ESTs were created by the Genexpress Centre (Généthon; Houlgatte et al. 1995), who assigned 2,733 of these to specific chromosomes by PCR on human-rodent somatic cell hybrids. The Washington University-Merck EST project, in collaboration with the IMAGE consortium (Integrated Molecular Analysis of Genomes and Their Expression; Lennon et al. 1996), was the third large-scale attempt at EST production. More than 300,000 ESTs derived from both 5' and 3' ends of cDNAs were generated, mostly from normalised libraries from a range of tissues (Hillier et al. 1996).
In these studies, sequences were analysed to produce an “electronic expression profile” of the mRNA distribution. The nucleotide sequences of these ESTs and their amino acid translations were compared to GenBank entries from a broad array of organisms and tissues via BLAST similarity searches (Altschul et al. 1990). Most previously characterised genes were represented by ESTs. Many ESTs were tentatively identified as new members of gene families, or as human homologues of genes from other organisms, because they were similar but not identical to known genes. A large percentage of ESTs were novel and were subsequently analysed with the coding region prediction program GRAIL to estimate the probability that the sequence encoded a protein. The expression levels of certain transcripts could also be estimated from the degree of EST redundancy, since mRNA copy numbers are reflected in the composition of cDNA libraries unless normalised or subtracted libraries are used. Together, these efforts provided an estimate of the diversity and activity of expressed sequences in different tissues by cataloguing them according to their assumed roles in cellular biology. Most accumulated sequences are gathered in the dbEST division of the GenBank database (Boguski et al. 1993) and are publicly accessible. Although these studies have provided an important insight into the expression profile of tissues at various developmental stages, only a limited number of these ESTs were mapped to chromosomal regions.
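The use of EST redundancy as a rough measure of expression can be illustrated with a short Python sketch. It assumes that each EST has already been assigned to a gene (for example by a similarity search) and that the library is neither normalised nor subtracted. The function name `expression_profile` and the `(library, gene)` input format are assumptions made for this illustration and are not taken from the cited studies.

```python
from collections import Counter, defaultdict


def expression_profile(est_hits):
    """Summarise EST redundancy per library.

    `est_hits` is an iterable of (library, gene) pairs, one pair per EST
    that was matched to a gene. Returns {library: {gene: fraction}}, where
    the fraction is that gene's share of the library's matched ESTs.
    """
    counts = defaultdict(Counter)
    for library, gene in est_hits:
        counts[library][gene] += 1

    profile = {}
    for library, gene_counts in counts.items():
        total = sum(gene_counts.values())
        profile[library] = {gene: n / total
                            for gene, n in gene_counts.items()}
    return profile
```

Fractions rather than raw counts are reported so that libraries sequenced to different depths can be compared. With small EST numbers such estimates are noisy and should be read as indicative only.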
Many small-scale projects have mapped a small number of ESTs through hybrid panels or onto genomic clones (e.g. Polymeropoulos et al. 1993; Berry et al. 1995). Hudson and colleagues reported the results of an international effort, sponsored by the Human Genome Organisation (HUGO), which mapped >16,000 ESTs on radiation hybrid panels and ~1,000 onto YACs (Schuler et al. 1996). These were placed relative to a common framework map anchored by ~1,000 Généthon polymorphic markers. This positioned ESTs to at least the same resolution as the critical intervals of diseases (0.5-5 Mb). As well as providing positional candidates that are readily accessible by database searching, this effort has provided a large source of STSs to increase the density of the physical map and reinforce its clone continuity.
Once ESTs or genes that reside in a region of interest are identified, redundant clones can be used to assemble contigs of overlapping sequences that yield the full-length gene transcript. Several groups have clustered the deposited ESTs and gene sequences into a non-redundant set of unique sequences, which serves as a standard for comparison for each transcript. These sets include the TIGR Human cDNA collection and Tentative Human Consensus Sequences (THCs; Adams et al. 1995), the UniGene set (Schuler et al. 1996) and the Merck Gene Index (Aaronson et al. 1996).
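The idea of assembling overlapping ESTs into a longer transcript sequence can be shown with a toy greedy merge. This sketch is not the method used by TIGR, UniGene or the Merck Gene Index. It assumes error-free reads on the same strand and joins them only on exact suffix-prefix overlaps; the function names and the `min_overlap` threshold are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` equal to a prefix of `right` (0 if too short)."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=5):
    """Repeatedly merge the pair of sequences with the longest exact overlap.

    Returns the list of contigs left when no pair overlaps by at least
    `min_overlap` bases. Reads wholly contained in another read, sequencing
    errors and reverse complements are deliberately not handled.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Example call with three made-up overlapping fragments.
example_contigs = greedy_assemble(["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA"])
```

Real EST clustering must tolerate sequencing errors, chimeric clones and alternative transcripts, so production tools rely on scored alignments rather than exact string matches.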
The positional candidate gene approach has become more successful in the last two years with increasing information on the locations of diseases and genes, and many disease-causing genes have been identified in this way. For example, by searching the human EST databases using sequences from known members of the human ATP-binding cassette (ABC) family of genes, some of which are known to be involved in disease (e.g. CFTR in cystic fibrosis), Allikmets and colleagues (1996) identified 21 new members. The map locations and patterns of expression of these genes were determined so that they could serve as positional candidates for diseases mapping to those regions. One of the EST-associated genes, known to map to 1p22-p21, was soon found to be pathologically implicated in patients with Stargardt’s macular dystrophy and age-related macular degeneration, which also map to this locus (Allikmets et al. 1997). In another example, the retinoschisis disease gene had been linked to Xp22.3-p22.1. Sauer et al. (1997) identified mutations in a novel gene which had been identified as a mapped EST in the retinoschisis critical region and which was expressed exclusively in the retina.