Total no. DDRT-PCR sequences used to search genome databases 10 1 1 12 34
Table 3.9 Summary of sequence database searches for human testis-specific DDRT cDNA fragments.
3.13 Analysis of human DDRT cDNAs matching to nucleotide database sequences
3.13.1 DDRT cDNAs that matched to the same database entry
In three cases, separate DDRT cDNAs have identified the same sequences in the databases: 2.2(VA) and 2.3(VA) (section 3.14.1); 3.1(VC), 3.2(VC) and 3.5(VC) (section 3.14.5); and 4VA2 and 4VA3. Within each set, the cDNAs possess the same nucleotide sequence. Each of these sets of fragments was cut from the same DDRT-PCR gel, and the fragments within a set differed in size on the gel by only a small number of nucleotides.
This could either be because the 10-mer primers have annealed to the same cDNA at several places, or because the Taq polymerase enzyme has added an extra dATP nucleotide onto the end of the sequence during the PCR reaction resulting in the same cDNA amplifying at slightly different sizes and therefore electrophoresing to different levels within the gel.
19VC1 and 11VGc do not have the same nucleotide sequence, and were identified using different primer combinations (VC anchor, OP-19 and VG anchor, OP-11), but both matched to different portions of the human mitochondrial genome (see figures 3.27 and 3.28). The sequence of 11VGc shows that it has a poly(A) stretch within it (see Appendix, section A1.1.2), which may well have formed a priming site for the VG anchor primer, independent of the oligo 10-mer primer, during the radioactive PCR reaction.
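The internal priming explanation above depends on locating a poly(A) stretch inside a cDNA sequence. A minimal sketch of such a check follows. The function name `find_poly_a_runs`, the minimum run length of 8 and the example fragment are all assumptions made for illustration; the fragment is hypothetical and is not the 11VGc sequence given in the Appendix.

```python
import re


def find_poly_a_runs(sequence, min_length=8):
    """Return (start, end, length) for each run of A residues of at least min_length.

    Coordinates are 1-based and inclusive. min_length=8 is an arbitrary
    illustrative threshold, not a value taken from the text.
    """
    pattern = re.compile("A{%d,}" % min_length)
    runs = []
    for match in pattern.finditer(sequence.upper()):
        runs.append((match.start() + 1, match.end(), match.end() - match.start()))
    return runs


# Hypothetical fragment with an internal stretch of 12 A residues.
example = "GCTTACCGT" + "A" * 12 + "CGGATTCAGT"
print(find_poly_a_runs(example))
```

A run reported well inside the fragment, rather than at its 3' end, would be consistent with the anchor primer having annealed internally.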
Query = 19VC1, 175 bases
Database: organelles
>EM:MIHSXX V00662 H.sapiens mitochondrial genome
Length = 16,569
Minus Strand HSPs:
Score = 865 (239.0 bits), Expect = 1.3e-65, P = 1.3e-65
Identities = 173/173 (100%), Positives = 173/173 (100%), Strand = Minus/Plus

Query:  173 TCATAGCCGAATACACAAACATTATTATAATAAACACCCTCACCACTACAATCTTCCTAG 114
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3977 TCATAGCCGAATACACAAACATTATTATAATAAACACCCTCACCACTACAATCTTCCTAG 4036

Query:  113 GAACAACATATGACGCACTCTCCCCTGAACTCTACACAACATATTTTGTCACCAAGACCC 54
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4037 GAACAACATATGACGCACTCTCCCCTGAACTCTACACAACATATTTTGTCACCAAGACCC 4096

Query:   53 TACTTCTAACCTCCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGC 1
            |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4097 TACTTCTAACCTCCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGC 4149
Figure 3.27 BLAST results for DDRT cDNA 19VC1 from Genbank database; 19VC1 sequenced with pTAg 5' primer from LigATor clone.
Query = 11VGc, 178 bases
Database: organelles
>EM:MIHSXX V00662 H.sapiens mitochondrial genome
Length = 16,569
Plus Strand HSPs:
Score = 521 (144.0 bits), Expect = 5.6e-37, P = 5.6e-37
Identities = 105/106 (99%), Positives = 105/106 (99%), Strand = Plus/Plus

Query:   73 CCCATTCTATACCAACACCTATTCTGATTTTTCGGTCACCCTGAAGTTTATATTCTTATC 132
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 6585 CCCATTCTATACCAACACCTATTCTGATTTTTCGGTCACCCTGAAGTTTATATTCTTATC 6644

Query:  133 CTACCAGGCTTCAGAATAATCTCCCATATTGTAACTTACTACTCCG 178
            |||||||||||| |||||||||||||||||||||||||||||||||
Sbjct: 6645 CTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCG 6690
Figure 3.28 BLAST results for DDRT cDNA 11VGc from Genbank database; 11VGc sequenced with pTAg 5' primer from LigATor clone.
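The "Identities" figure reported in these BLAST results can be checked directly from the aligned blocks. The sketch below recounts the matches for the 11VGc alignment in Figure 3.28, using the two query blocks and the subject block that differs from them. The function and variable names are illustrative assumptions and are not part of BLAST itself.

```python
def alignment_identity(query, subject):
    """Count identical positions in two equal-length, ungapped aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned blocks must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return matches, len(query)


# Aligned blocks transcribed from Figure 3.28 (query positions 73-132 and 133-178).
query_block_1 = "CCCATTCTATACCAACACCTATTCTGATTTTTCGGTCACCCTGAAGTTTATATTCTTATC"
sbjct_block_1 = query_block_1  # the first block is identical in query and subject
query_block_2 = "CTACCAGGCTTCAGAATAATCTCCCATATTGTAACTTACTACTCCG"
sbjct_block_2 = "CTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCG"

matches, aligned = alignment_identity(
    query_block_1 + query_block_2, sbjct_block_1 + sbjct_block_2
)
percent = 100.0 * matches / aligned

# Query coordinate of each mismatch within the second block (which starts at 133).
mismatches = [
    133 + i
    for i, (q, s) in enumerate(zip(query_block_2, sbjct_block_2))
    if q != s
]

print("Identities = %d/%d (%d%%)" % (matches, aligned, round(percent)))
print("Mismatch at query position(s):", mismatches)
```

The single A/G difference at query position 145 accounts for the 105/106 identity reported in the figure.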
To find out which genes these mitochondrial sequences represented, the sequences of 19VC1 and 11VGc were used to search the SwissProt protein database using BLASTX (Gish and States, 1993; Altschul et al., 1990), available through the World Wide Web at http://www.genome.ad.jp/SIT/BLAST.html. Table 3.10 shows the results of these database searches.
DDRT-PCR cDNA | Databases searched | p < 1e-05 | Identity (Similarity) | Reading frame | Homologies | Genbank ID
19VC1 | SwissProt | 1.2e-03 | 53/57 (92%) | -2 | NADH-ubiquinone oxidoreductase chain 1 | NU1M_HUMAN
11VGc | SwissProt | 2.9e-18 | 34/39 (87%) | +1 | cytochrome c oxidase polypeptide I | COX1_HUMAN

Table 3.10 Protein database search results for DDRT cDNAs 19VC1 and 11VGc.
Table 3.10 shows that DDRT-PCR cDNAs 19VC1 and 11VGc are in fact derived from separate genes within the mitochondrial genome.
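The reading frames listed in Table 3.10 (-2 and +1) arise because BLASTX translates the nucleotide query in all six frames before comparing it with protein sequences. The following is a minimal sketch of that translation step, not the BLASTX implementation. It assumes the standard genetic code, whereas human mitochondrial genes use a variant code, and it assumes that negative frames are read from the reverse complement of the query. All names in the block are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq, frame):
    """Translate seq in one of the six frames (+1, +2, +3, -1, -2, -3).

    Codons containing characters other than A, C, G, T are shown as 'X'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    template = seq.upper() if frame > 0 else reverse_complement(seq)
    offset = abs(frame) - 1
    codons = (template[i:i + 3] for i in range(offset, len(template) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return a dict mapping each frame to its translation."""
    return {frame: translate_frame(seq, frame) for frame in (1, 2, 3, -1, -2, -3)}


# Short hypothetical sequence, used only to show the output format.
for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCC").items():
    print("%+d %s" % (frame, protein))
```

A match in a negative frame, as for 19VC1, indicates that the cDNA was sequenced from the strand complementary to the coding strand.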
3.13.2 DDRT cDNAs that matched to Alu/L1 repeats
Alus are polymorphic short interspersed elements (SINEs) that are present in primate genomes. They constitute about 5% of the human genome and, if present within a gene, are usually found within introns or 3' untranslated regions (3' UTRs). Some Alu elements are still transcriptionally active and are retrotransposed within the genome. Quite often, Alu elements are mistakenly included within coding sequences in the genome databases (for example, the mistaken insertion or deletion of nucleotides whilst reading sequencing gels may result in the inclusion of an Alu element within an open reading frame), and so any matches to database sequences that contain these elements should be carefully checked (Claverie and Makalowski, 1994).
DDRT cDNA 11VGb has been excluded from further analysis for the present time, as this cDNA matches a LINE 1 repeat present in the database. 1.4(VA), 11VGa and 4.8(VC), however, did not match specifically to repetitive sequences, but to cDNAs present in the database that are known to contain repetitive elements.
The program RepeatMasker (Smit, A.F.A., Green, P. unpub., vers. 19/9/97, available through the HGMP) was used to search these sequences, and all three were found to consist entirely of repetitive sequences. The homologies to cDNAs in the database are therefore probably a result of these sequences matching the repeats within these cDNAs. 114/117 bp of 1.4(VA) matched to a MER2 repeat; 93/100 bp of 11VGa matched to a LINE 1 repeat; and 87/101 bp of 4.8(VC) matched to a retroviral LTR.
All four of these DDRT cDNAs have been excluded from further analysis, although they may nevertheless be derived from transcripts which are genuinely testis-specific.
3.13.3 DDRT cDNAs that matched to cDNAs/ESTs in the databases
cDNA clones present in Genbank often represent anonymous sequences, for which the only information known is the tissue used to produce the cDNA library from which they were derived. Many of the cDNA clones or ESTs present within the databases, however, are available to researchers royalty-free; these include cDNAs and ESTs identified by large-scale sequencing projects, for example the Washington-Merck EST project or the I.M.A.G.E. Consortium. This will be a useful resource for the further study of DDRT cDNAs that were homologous to database cDNAs/ESTs, as the database sequences are usually longer and therefore better suited for use as probes on Northern blots or cDNA library filters.
3.13.3.1 Searching the Unigene database with DDRT cDNAs
It is possible to gain extra information for the DDRT cDNAs that match to ESTs in the database. The ESTs they match can be used to search the Washington-Merck Unigene EST database (http://www.ncbi.nlm.nih.gov/Schuler/Unigene). Within this database, ESTs that demonstrate enough homology to form overlaps are clustered into groups of sequences. The mapping information for the group is therefore composite, as it is assumed that these overlapping ESTs all represent the same gene. Thus, if mapping or expression information is known for a single EST, that information also applies to the other ESTs in the cluster. However, there are some problems caused by chimaeric ESTs present within the database; Wolfe et al. (1997) report that 20% of EST clusters in Unigene suffer from this problem.
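The clustering logic described above, in which overlapping ESTs are grouped and information known for one member is applied to the whole group, can be sketched with a union-find structure. This is an illustration of the idea only and is not the method used to build Unigene. The EST names, the overlap pairs and the function names are hypothetical.

```python
def cluster_ests(overlaps):
    """Group ESTs into clusters, given pairs of ESTs that overlap."""
    parent = {}

    def find(est):
        # Follow parent links to the cluster representative, shortening the path.
        parent.setdefault(est, est)
        while parent[est] != est:
            parent[est] = parent[parent[est]]
            est = parent[est]
        return est

    for est_a, est_b in overlaps:
        root_a, root_b = find(est_a), find(est_b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for est in list(parent):
        clusters.setdefault(find(est), set()).add(est)
    return list(clusters.values())


def propagate_annotation(clusters, known):
    """Give every EST the pooled annotations known for any member of its cluster."""
    annotation = {}
    for cluster in clusters:
        labels = sorted({known[est] for est in cluster if est in known})
        for est in cluster:
            annotation[est] = labels
    return annotation


# Hypothetical overlaps: EST_A, EST_B and EST_C chain together; EST_D and EST_E pair up.
overlaps = [("EST_A", "EST_B"), ("EST_B", "EST_C"), ("EST_D", "EST_E")]
clusters = cluster_ests(overlaps)
annotation = propagate_annotation(clusters, {"EST_A": "testis", "EST_D": "brain"})
print(annotation["EST_C"])
```

The same sketch shows why chimaeric ESTs are a problem: a single chimaeric EST overlapping both EST_C and EST_D would merge the two clusters, and the "testis" and "brain" annotations would then be pooled across unrelated genes.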
By searching the Unigene database, further expression information was gained for many of the DDRT cDNAs (table 3.11). Those DDRT cDNAs not appearing in the table did not match any EST clusters within the Unigene database.
As well as providing some useful additional information on possible gene homologues of DDRT cDNAs, searching the Unigene database may also provide information as to where an EST cluster maps in the genome. The map locations shown in table 3.11 are from the Science 96 transcript map.
DDRT-PCR cDNA | Unigene ID | Total no. of ESTs in cluster | Best Swiss-Prot match (Genbank ID and protein name) | Best mRNA/gene match (Genbank ID and gene name) | Expression information (cDNA sources) of ESTs | Mapping information (Science transcript map)
4.2(VA) | Hs.75970 | 37 | P12805 G10 protein (X. laevis) | U11861 Human G10 homolog (edg-2) mRNA | Brain, foreskin, heart, lung, muscle, ovary, parathyroid, prostate, skin, testis, thyroid, uterus, whole embryo |
6.4(VA) | Hs.99821 | 39 | None | None | Bone, brain, colon, ear, foreskin, kidney, liver, lung, parathyroid, placenta, skin, testis, tonsil, embryo | Chromosome 10 D10S549-D10S561
19.3(VA) | Hs.37099 | 104 | P36425 Sperm surface protein Sp17 (rabbit) | Z48570 Human Sp17 gene | Adipose, adrenal gland, aorta, blood, brain, breast, CNS, eye, foreskin, heart, liver, lung, muscle, ovary, prostate, testis, tonsil, uterus, whole embryo | Chromosome 14 D14S81-D14S265
3.1(VC)/3.2(VC)/3.5(VC) | Hs.3844 | 32 | P25800 Human rhombotin-1 | U24576 Human breast tumour autoantigen mRNA | Adipose, brain, breast, colon, ear, heart, lung, pancreas, thyroid, uterus, whole embryo | Chromosome 1 D1S207-D1S2865
4.1(VC) | Hs.14839 | 39 | P46279 DNA-directed RNA polymerase II | U52427 Human RNA polymerase II