WT1 gene sequence in equines - The Wilms tumour gene (WT1)


To determine whether the rRNA operons were present, each draft genome was uploaded into RNAmmer. The 5S rRNA genes were detected in all of the strains and two fragmented 23S rRNA genes were detected in strain D1, but not in the other two strains (refer to the CD6 for the RNAmmer output containing the 5S rRNA gene sequences). The 16S rRNA genes were not found in any of the three genomes.

The fragmented genome sequences meant that the rRNA operons were also fragmented. When the identified 5S rRNA genes were located on the draft annotated genomes, using Geneious (v. 7.0.4), there was no full-length 23S rRNA gene located on the same contigs. Small fragments of the 16S rRNA genes were found alongside some of the fragmented 23S rRNA genes. This was concerning given that rRNA operons are fundamental to a bacterial genome and are traditionally used for studying phylogenies. This section describes how the consensus rRNA gene sequences were determined.

5S rRNA genes

Using RNAmmer, it was determined that each of the three dairy strains, A1, P3 and D1, had 10 copies of the 5S rRNA gene, all of which were 116 bp in length (refer to the CD6 for the RNAmmer output containing the 5S rRNA gene sequences). Geneious (v. 7.0.4) was used to compare the sequences of the ten 5S rRNA gene copies and to calculate the variant frequencies (Table 4.9 and Figure 4.5).

Table 4.9. Variant frequency of polymorphisms in the 5S rRNA gene sequences of the G. stearothermophilus dairy strains.

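| Nucleotide position | Change | Variant frequency (%) A1 | Variant frequency (%) P3 | Variant frequency (%) D1 |
|---|---|---|---|---|
| 6 | G/A | 90/10 | 90/10 | 90/10 |
| 13 | C/T | - | - | 90/10 |
| 88 | C/T | 90/10 | 90/10 | - |
| 95 | A/G | 90/10 | 90/10 | 90/10 |
| 107 | C/T | 80/20 | 80/20 | 90/10 |
| 108 | T/C | 90/10 | 90/10 | 90/10 |
| 116 | A/G/T | 70/20/10 | 70/20/10 | 70/20/10 |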

Figure 4.5. Alignment of the 5S rRNA copies. (A) A1 and (B) D1.
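The variant frequencies reported for the 5S rRNA gene copies were calculated in Geneious. As a rough illustration of the same idea, the sketch below counts the bases in each column of a set of aligned gene copies and reports the percentage of copies carrying each base at polymorphic positions. The function name `variant_frequencies` and the toy sequences are hypothetical and are not taken from the strains; this is not the Geneious algorithm.

```python
from collections import Counter


def variant_frequencies(copies):
    """Return {position: {base: percent}} for every polymorphic column.

    `copies` is a list of equal-length, already aligned sequences
    (one per gene copy). Positions are 1-based.
    """
    if len({len(seq) for seq in copies}) != 1:
        raise ValueError("all copies must have the same aligned length")
    result = {}
    for pos, column in enumerate(zip(*copies), start=1):
        counts = Counter(column)
        if len(counts) > 1:  # more than one base observed: polymorphic
            result[pos] = {
                base: 100 * n / len(copies)
                for base, n in counts.most_common()
            }
    return result


# Hypothetical toy data: ten 8-nt "copies", one of which differs at position 6.
toy_copies = ["ACGTAGCA"] * 9 + ["ACGTAACA"]
print(variant_frequencies(toy_copies))  # {6: {'G': 90.0, 'A': 10.0}}
```

With ten copies per genome, each copy contributes 10% to a column, which is why the frequencies in Table 4.9 are multiples of 10.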

In D1, one of the 5S rRNA genes is either a lone copy or part of a split operon. The genes surrounding this 5S rRNA gene are the same as those surrounding the 5S rRNA gene from a split operon in the genome of G. kaustophilus HTA426 (Figure 4.6). In this case the 5S rRNA gene is separated from the 23S rRNA gene by 398 853 bp.

Figure 4.6. Genetic context of a 5S rRNA gene from strain D1, which sits by itself, compared with the split rRNA operon found in G. kaustophilus HTA426. The rRNA operon is coloured in red, tRNAs in

The three dairy strains had a 5S rRNA gene diversity of 5.2% (Table 4.10). This is similar to that of G. kaustophilus HTA426 (4.9%).

Table 4.10. Diversity of the 5S rRNA genes.

| Strain | Copy number | Sequence length | Percentage diversity (%) |
|---|---|---|---|
| G. stearothermophilus A1 | 10 | 116 bp | 5.2 |
| G. stearothermophilus P3 | 10 | 116 bp | 5.2 |
| G. stearothermophilus D1 | 10 | 116 bp | 5.2 |
| Geobacillus sp. C56-T3 (a) | 9 | 117 bp | 8.6 |
| G. kaustophilus HTA426 (a) | 9 | 116 bp | 4.9 |
| G. thermodenitrificans NG80-2 (a) | 10 | 116 bp | 5.5 |
| B. subtilis 168 (a) | 10 | 118 bp | 6.9 |

(a) Data taken from Pei et al. (2012)

16S and 23S rRNA genes

No full-length copy of the 16S or the 23S rRNA genes could be found in the final assemblies for strains A1 and P3, nor was a full-length copy of a 16S rRNA gene found for strain D1. The draft assemblies were therefore also uploaded into RNAmmer to determine whether a full-length copy of either gene could be found. One full-length 16S rRNA gene and one full-length 23S rRNA gene were found in each genome using the highest k-mer value of 249, but no 5S rRNA genes were detected. When the k-mer value was decreased, no full-length copies of the 16S and 23S rRNA genes were detected, but the 5S rRNA genes were detected. Given that at least 8 copies of the 16S and 23S rRNA genes were still missing, a mapping approach was undertaken to determine a consensus sequence for both the 16S and 23S rRNA genes. The original sequence reads were mapped against the full-length sequences of the two ribosomal genes detected using RNAmmer, as described in Section 4.2.3. The consensus 16S and 23S rRNA gene sequences are available on the CD7.

To determine the diversity of the 16S and 23S rRNA genes, and to give an indication of how many rRNA operons were present in each of the newly sequenced strains, the variant frequency of each SNP, insertion and deletion was calculated (Tables 4.11 and 4.12). The variant frequency gives an indication of how many ribosomal operons would contain a given SNP in a given gene. For example, the SNP at position 462 in the 16S rRNA gene of D1 (highlighted in grey in Table 4.11) had a variant frequency of 88.9/10.7%, which would probably equate to 9 out of 10 copies of this gene having nucleotide C and 1 out of 10 copies having nucleotide T.
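The arithmetic behind this interpretation can be sketched as follows. The helper name `copies_per_allele` is hypothetical, and the calculation assumes that reads map evenly across all operon copies, so that each allele frequency is proportional to the number of copies carrying that allele.

```python
def copies_per_allele(frequencies_percent, copy_number):
    """Convert allele frequencies (%) into approximate gene-copy counts.

    Assumes every operon copy contributes equally to read coverage.
    """
    return [round(f / 100 * copy_number) for f in frequencies_percent]


# Position 462 of the D1 16S rRNA gene (C/T, 88.9/10.7%), assuming 10 operons.
print(copies_per_allele([88.9, 10.7], 10))  # [9, 1]

# The same frequencies under an assumed copy number of 9.
print(copies_per_allele([88.9, 10.7], 9))   # [8, 1]
```

Both assumed copy numbers give whole-number splits that sum to the copy number, so a single SNP of this kind cannot by itself distinguish 9 operons from 10.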

Table 4.11. Variant frequency of polymorphisms in the 16S rRNA gene sequences of the G. stearothermophilus dairy strains.

| Nt position | Change | Polymorphism type | Variant frequency (%) A1 | Variant frequency (%) P3 | Variant frequency (%) D1 | Coverage A1 | Coverage P3 | Coverage D1 |
|---|---|---|---|---|---|---|---|---|
| 36 | C/T | SNP | 89.1/10.4 | 91.1/8.7 | - | 4036 | 2167 | |
| 67 | G/A | SNP | 74.2/25.5 | 76.3/23.4 | 76.4/23.5 | 4211 | 2705 | 3731 |
| 69 | T/C | SNP | 74.1/25.5 | 76.3/23.4 | 76.3/23.5 | 4225 | 2727 | 3758 |
| 70 | T/G/C | SNP | 74.1/25.7/<1 | 76.5/23.4/<1 | 57.6/23.8/18.6 | 4227 | 2742 | 3774 |
| 71 | G/A | SNP | 74.3/25.3 | 76.5/23.2 | 76.3/23.6 | 4225 | 2755 | 3790 |
| 72 | G/A | SNP | 90.0/9.8 | 85.4/14.3 | - | 4236 | 2773 | - |
| 73 | G/A | SNP | - | - | 81.7/18. | - | - | 3822 |
| 80 | Plus T | Insertion | 23.5 | 22.3 | - | 4245 | 2850 | - |
| 80 | C/T | SNP | - | - | 58.9/40.9 | - | - | 3908 |
| 81 | T/C | SNP | - | - | 58.8/41.0 | - | - | 3918 |
| 83 | G/T | SNP | - | - | 76.1/23.7 | - | - | 3952 |
| 84 | Minus A | Deletion | 23.3 | 21.9 | - | 4264 | 2921 | - |
| 84 | A/G | SNP | - | - | 76.0/23.7 | - | - | 3982 |
| 87 | C/T | SNP | 74.4/25.0 | 76.5/23.1 | 59.3/40.4 | 4265 | 2962 | 4020 |
| 183 | A/G | SNP | 88.0/11.2 | 89.8/9.6 | - | 4596 | 4256 | - |
| 196 | C/T | SNP | 75.4/24.1 | 74.3/25. | - | 4461 | 4229 | - |
| 451 | A/G | SNP | - | - | 88.4/11.0 | - | - | 5185 |
| 462 | C/T | SNP | 87.5/11.9 | 86.0/13.5 | 88.9/10.7 | 4046 | 3769 | 5216 |
| 482 | G/A | SNP | 87.1/12.1 | 85.9/13.6 | 88.5/11.2 | 4113 | 3800 | 5359 |
| 751 | C/T | SNP | 76.9/22.5 | 78.6/20.4 | 89.7/9.5 | 5316 | 4976 | 6502 |
| 1175 | A/G | SNP | 88.5/10.9 | 87.5/11.8 | - | 5379 | 5148 | - |
| 1428 | C/T | SNP | 87.6/11.8 | 88.0/11.7 | - | 3553 | 3389 | - |
| 1459 | G/T | SNP | 80.7/19.1 | 77.3/22.3 | 67.8/31.9 | 2942 | 2794 | 3144 |
| 1460 | C/T | SNP | 80.2/19.3 | 77.3/22.4 | 67.4/32.2 | 2915 | 2767 | 3093 |
| 1462 | A/C | SNP | 80.4/19.2 | 77.4/22.2 | 67.4/32.2 | 2872 | 2733 | 3042 |

Table 4.12. Variant frequency of polymorphisms in the 23S rRNA gene sequences of the G. stearothermophilus dairy strains.

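| Nt position | Change | Polymorphism type | Variant frequency (%) A1 | Variant frequency (%) P3 | Variant frequency (%) D1 | Coverage A1 | Coverage P3 | Coverage D1 |
|---|---|---|---|---|---|---|---|---|
| 154 | G/T | SNP | - | - | 90.6/9.1 | - | - | 5577 |
| 455 | A/T | SNP | - | - | 82.7/10.2 | - | - | 8050 |
| 639 | G/A | SNP | - | - | 83.1/16.7 | | | 8286 |
| 690 | C/T | SNP | 58.7/40.5 | 56.6/42.9 | - | 7069 | 6786 | - |
| 892 | G/A | SNP | - | - | 81.7/18.1 | - | - | 8850 |
| 995 | G/A | SNP | 78.2/21.5 | 78.4/21.2 | - | 7524 | 6958 | - |
| 1085 | T/C | SNP | 78.2/21.5 | 77.1/22.5 | - | 6977 | 6754 | - |
| 1460 | G/A | SNP | 89.1/10.3 | 88.0/11.7 | - | 5548 | 5261 | - |
| 2157 | G/A | SNP | 87.4/12.1 | 86.7/12.6 | - | 5134 | 4843 | - |
| 2237 | T/G | SNP | - | - | 89.0/10.8 | - | - | 5858 |
| 2239 | G/T | SNP | - | - | 83.0/16.5 | - | - | 5846 |
| 2654 | C/T | SNP | - | - | 88.5/10.9 | - | - | 6673 |
| 2674 | G/A | SNP | 88.3/11.1 | 88.0/11.4 | 87.1/12.4 | 5522 | 5260 | 6715 |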

The number of polymorphisms was used to calculate the diversity of both the 16S and 23S rRNA genes (Tables 4.13 and 4.14). The diversity of the 16S rRNA gene was high (1.16 to 1.23%), but was similar to that of G. kaustophilus HTA426. The diversity of the 23S rRNA gene was much lower (0.21 to 0.27%). As with the variant frequencies, A1 and P3 had the same diversity, whereas the diversity of D1 was slightly lower for the 16S rRNA gene and slightly higher for the 23S rRNA gene. The rRNA operon copy number could be 9 or 10, based on both the variant frequency data and the number of 5S rRNA genes. It is possible that one of the 5S rRNA genes is an orphan gene. Orphan genes have been found in Bacilli, e.g. Bacillus megaterium, B. clausii and B. halodurans. However, in these cases the diversity of the 5S rRNA genes was > 3% (Pei et al., 2012).
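The text does not spell out the diversity formula. The sketch below assumes that percentage diversity is the number of polymorphic positions divided by the gene length, an assumption that reproduces the 23S rRNA values in Table 4.14 and the 5.2% reported for the 5S rRNA genes. The function name `percentage_diversity` is hypothetical.

```python
def percentage_diversity(n_polymorphic_sites, sequence_length):
    """Percentage of positions in a gene that are polymorphic among its copies."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100 * n_polymorphic_sites / sequence_length


# 23S rRNA gene (2926 nt): 6 polymorphisms in A1 and P3, 8 in D1 (Table 4.14).
print(round(percentage_diversity(6, 2926), 2))  # 0.21
print(round(percentage_diversity(8, 2926), 2))  # 0.27

# 5S rRNA gene (116 bp): 6 polymorphic positions per strain (Table 4.9).
print(round(percentage_diversity(6, 116), 1))   # 5.2
```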

Table 4.13. Diversity of the 16S rRNA genes.

| Strain | Copy number | Sequence length (bp) | Percentage diversity (%) |
|---|---|---|---|
| G. stearothermophilus A1 | 9 or 10 (a) | 1547 | 1.23 |
| G. stearothermophilus P3 | 9 or 10 (a) | 1547 | 1.23 |
| G. stearothermophilus D1 | 9 or 10 (a) | 1547 | 1.16 |
| G. kaustophilus HTA426 | 9 | 1553 | 0.77 (b) |
| G. thermodenitrificans NG80-2 | 10 | 1551 | 1.22 (b) |
| Mean diversity (c) | | | 0.55 (b) |

(a) Estimated copy number
(b) Data taken from Pei et al. (2010)
(c) Mean diversity of all of the bacteria analyzed by Pei et al. (2010)

Table 4.14. Diversity of the 23S rRNA genes.

| Strain | Sequence length (bp) | Number of polymorphisms | Percentage diversity (%) |
|---|---|---|---|
| G. stearothermophilus A1 | 2926 | 6 | 0.21 |
| G. stearothermophilus P3 | 2926 | 6 | 0.21 |
| G. stearothermophilus D1 | 2926 | 8 | 0.27 |
| G. kaustophilus HTA426 | | | 0.41 (a) |
| Bacillus species | | | 0.17–0.92 (a) |
| Mean diversity (b) | | | 0.40 (a) |

(a) Data taken from Pei et al. (2009)
(b) Mean diversity of all of the bacteria analyzed by Pei et al. (2009)

4.3.4 Gene prediction and annotation

Both RAST and Prokka were used for gene prediction and annotation. RAST is easier to use because it is web based (Aziz et al., 2008), whereas Prokka is run from the command line. Prokka is faster than RAST at annotating a single genome (approximately 10 min compared with overnight) and has the advantage of generating multiple file types (e.g. GBK and SQN files) (Seemann, 2014). The Prokka-generated files were used for further analyses, partly because Prokka was faster and generated multiple file types, but also because its annotations generally appeared to be more reliable. These annotations were used in Chapter 6 as one way of identifying putative genes involved in biofilm formation and sporulation.

Two examples of the annotations produced by the two methods for strain P3 are described below. The first example (Figure 4.7) shows a region of the genome that encodes a kinase and a modulator that may be involved in biofilm formation; the annotations are described in Table 4.15. The RAST annotation gave the correct description, but the protein names (EpsC and EpsD) were incorrect. In addition, there appear to be no studies describing an EpsX protein. The annotations were verified by carrying out a BLASTX search against B. subtilis 168.

Figure 4.7. Gene organisation of a region of the genome for G. stearothermophilus strain P3 containing putative biofilm genes. A. RAST annotation. B. Prokka annotation. Colours represent the encoded function of each gene as annotated by RAST or Prokka. Grey represents a hypothetical protein.

Table 4.15. Comparison of annotation descriptions between RAST and Prokka for a region of the genome of strain P3 containing putative biofilm genes.

| Position on genome | RAST description | Prokka description | BLASTX top hit (B. subtilis 168 as reference) | E-value |
|---|---|---|---|---|
| 1,082,894-1,083,637 | Tyrosine-protein kinase transmembrane modulator EpsC CDS | Capsular polysaccharide type 8 biosynthesis protein Cap8A | ywqC (BSU36260); epsA (BSU34370) | 7e-84; 8e-52 |
| 1,083,627-1,084,325 | Tyrosine-protein kinase EpsD | Tyrosine-protein kinase YwqD | epsB (BSU34360); ptkA (previously known as ywqD) (BSU36250) | 1e-77; 1e-74 |
| 1,084,384-1,085,169 | EPSX protein | Hypothetical protein | No hit | |
| 1,086,142-1,085,201 | Cell envelope-associated transcriptional attenuator LytR-CpsA-Psr, subfamily F2 | Transcriptional regulator | | |

The second example, comparing the two different methods of annotation, shows a region of the genome that contains a CRISPR array (Figure 4.8). The RAST pipeline did not pick up the repeat region; instead it identified regions within the CRISPR array as hypothetical proteins (Table 4.16).

Figure 4.8. Gene organisation of a region of the genome for G. stearothermophilus strain P3 containing a CRISPR array. A. RAST annotation. B. Prokka annotation. Colours represent the encoded function of each gene as annotated by RAST or Prokka as follows: grey, hypothetical protein; light green, flagellar assembly protein; black, repeat region; light blue, CRISPR-associated genes; dark blue, cas genes; and dark green, member of the PD-(D/E)XK nuclease superfamily.

Table 4.16. Comparison of annotation descriptions between RAST and Prokka for a region of the genome of strain P3 containing a CRISPR array.

| Position | RAST description | Prokka description |
|---|---|---|
| 1-1221 | Hypothetical protein | Hypothetical protein |
| 1289-1654 | Hypothetical protein | Flagellar assembly protein H |
| 1851-4347 | --- | Repeat region |
| 2004-2141 | Hypothetical protein | --- |
| 2352-2465 | Hypothetical protein | --- |
| 2890-3009 | Hypothetical protein | --- |
| 3268-3534 | Hypothetical protein | --- |
| 3531-3659 | Hypothetical protein | --- |
| 3868-4104 | Hypothetical protein | --- |
| 4527-6242 | CRISPR-associated protein, TM1802 family | CRISPR-associated protein, TM1802 |
| 6244-7203 | CRISPR-associated protein, TM1801 family | Hypothetical protein |
| 7216-7959 | CRISPR-associated protein, TM1800 family | CRISPR-associated protein Cas5 |
| 7961-10291 | CRISPR-associated helicase Cas3 | CRISPR-associated nuclease/helicase Cas3 |
| 10301-10810 | CRISPR-associated RecB family exonuclease Cas4a | PD-(D/E)XK nuclease superfamily |
| 10813-11814 | CRISPR-associated protein Cas1 | CRISPR-associated endonuclease Cas1 |
| 11825-12088 | CRISPR-associated protein Cas2 | CRISPR-associated endoribonuclease Cas2 |
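One way to flag this kind of disagreement between annotation pipelines is to check which features called by one tool fall entirely inside a region that the other tool annotates differently. The sketch below applies that check to the coordinates in Table 4.16. The helper `features_within` and the variable names are hypothetical, and the check is an illustration rather than part of either pipeline.

```python
# Coordinates transcribed from Table 4.16 (strain P3, CRISPR region).
PROKKA_REPEAT_REGION = (1851, 4347)
RAST_HYPOTHETICAL_PROTEINS = [
    (1, 1221), (1289, 1654), (2004, 2141), (2352, 2465),
    (2890, 3009), (3268, 3534), (3531, 3659), (3868, 4104),
]


def features_within(region, features):
    """Return the features that lie entirely inside `region`.

    Coordinates are inclusive; reversed (end < start) pairs are normalised
    so that features on the complementary strand are handled too.
    """
    region_start, region_end = sorted(region)
    inside = []
    for feature in features:
        start, end = sorted(feature)
        if region_start <= start and end <= region_end:
            inside.append(feature)
    return inside


inside = features_within(PROKKA_REPEAT_REGION, RAST_HYPOTHETICAL_PROTEINS)
print(len(inside))  # 6
print(inside[0])    # (2004, 2141)
```

Six of the eight RAST hypothetical proteins in this region lie inside the repeat region reported by Prokka, which matches the rows of Table 4.16 that have no Prokka gene call.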

In document: Análisis de la expresión del gen wt1 (tumor de wilms) en melanoma equino (pages 37-45)