Further information about the insert DNA contained in each cosmid was obtained from restriction digests with different enzymes. All eight cosmids identified in Section 4.4.2 displayed some common fragments when digested with BamHI or NcoI (Figure 4.6).

Cosmids F13-1 and F6-6 appeared almost identical according to the size of the restriction fragments generated. Cosmids B4-1, G22-5 and E18-8 also had strikingly similar band patterns. Cosmids B4-1 and F13-1 were chosen for sequencing. Both contained all planosporicin biosynthetic genes identified so far (Section 4.5.1), and none of these genes were present at the ends of the cosmid insert (Section 4.5.2). Despite these similarities, the two cosmids gave different restriction digest patterns. The total size of each cosmid was estimated by summing the sizes of all the digest fragments. Both cosmids were approximately 40 kb in size, giving an estimated insert size of 32 kb, likely large enough to contain the entire biosynthetic gene cluster. For the rest of this thesis, cosmid B4-1 is referred to as pIJ12321 and cosmid F13-1 as pIJ12322.
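The size estimate described above is simple arithmetic, and a minimal sketch is given here for illustration. The fragment sizes are hypothetical values chosen to sum to 40 kb, not the bands observed in Figure 4.6, and the 8 kb vector size is an assumption inferred from the difference between the 40 kb cosmid and the 32 kb insert quoted in the text.

```python
# Hypothetical fragment sizes (kb) as they might be read off a gel.
# These are NOT the measured bands from Figure 4.6.
fragments_kb = [12.0, 9.5, 8.0, 6.0, 3.0, 1.5]

# Assumed vector backbone size, implied by 40 kb (cosmid) - 32 kb (insert).
VECTOR_KB = 8.0


def estimate_insert_kb(fragments, vector_kb):
    """Sum the digest fragments to estimate the cosmid size,
    then subtract the vector backbone to estimate the insert size."""
    total = sum(fragments)
    return total, total - vector_kb


total_kb, insert_kb = estimate_insert_kb(fragments_kb, VECTOR_KB)
print(f"cosmid ~{total_kb:.0f} kb, insert ~{insert_kb:.0f} kb")
```

An estimate of this kind is only approximate: fragments of similar size that co-migrate as a single band are counted once, so the total tends to be underestimated.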

4.6 Cosmid sequencing and annotation

4.6.1 Sequencing

P. alba cosmids B4-1 (pIJ12321) and F13-1 (pIJ12322) were sequenced using Sanger sequencing at the DNA sequencing facility at the University of Cambridge. The insert DNA in pIJ12321 was 37760 bp, while pIJ12322 contained 37916 bp. Although very similar in size, the segment of cloned P. alba gDNA differed slightly between the two cosmids.

pIJ12321 has the 1830 bp of sequence used to identify the clone located centrally within the insert DNA, whereas in pIJ12322 this sequence is located towards the right hand border of the insert DNA with the vector backbone. ClustalW2 was used to align the insert gDNA from pIJ12321 and pIJ12322 (Chenna et al. 2003). This identified one discrepancy, an additional cytosine residue in the pIJ12322 sequence. This potential frameshift mutation would have destroyed a stop codon present in the pIJ12321 sequence.

Correspondence with the Cambridge Sequencing facility confirmed that this additional cytosine was an error during the assembly of the pIJ12322 sequence in Consed (Gordon et al. 1998).
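The consequence of such a single-base discrepancy can be illustrated with a toy example. The sketch below uses two short invented sequences, not the cosmid sequences, and a simple position-by-position comparison rather than a ClustalW2 alignment. It shows how one additional cytosine shifts the reading frame so that a downstream stop codon is no longer read in frame.

```python
STOPS = {"TAA", "TAG", "TGA"}


def first_stop(seq, frame=0):
    """Return the index of the first in-frame stop codon, or None if absent."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None


def first_difference(a, b):
    """Return the first index at which two sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Invented sequences for illustration only.
reference = "ATGGCCGACTGAGGC"      # ATG GCC GAC TGA GGC (stop codon in frame)
variant = "ATGGCCCGACTGAGGC"       # one extra C: ATG GCC CGA CTG AGG C

print(first_difference(reference, variant))  # position of the discrepancy
print(first_stop(reference))                 # stop codon found in frame
print(first_stop(variant))                   # None: the stop is out of frame
```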

Figure 4.6 : Restriction digest patterns of different P. alba cosmids.

A; Digest with BamHI. B; Digest with NcoI. Size estimation is given by Hyperladder I (Bioline) with band size in kb indicated next to the gel.

4.6.2 Annotation

The full insert DNA of pIJ12321 was annotated using Artemis software (Rutherford et al. 2000). Instrumental to the correct assignment of ORFs was analysis of the GC content.

The high-GC content of actinomycete genomes skews the usage of codons. The triplet code allows most freedom at the third codon position. Consequently the first, second and third nucleotide positions of codons show distinct differences in GC content, changing from intermediate to low to high GC content (Bibb et al. 1984). The GC-frameplot tool within the Artemis program tracks the GC content at each position within a moving window of a stipulated number of codons. Analysis of the GC content at each position reveals both where ORFs may be situated and their directionality (Figure 4.7). The start site of each ORF was adjusted to an ATG or GTG with a potential ribosome-binding site (RBS) nearby. The sequence GGAGG was used as the ideal RBS, located approximately 5 to 15 bp upstream of the start codon (Kieser et al. 2000). Thus the final annotation uses the start codon with a putative RBS upstream which creates the largest ORF.
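The principle behind a GC frame plot can be sketched in a few lines of code. The example below is an illustration of the idea only and is not the Artemis implementation; the function names, the window size and the toy sequence are all assumptions made for this sketch, and only the forward strand is considered.

```python
def gc_by_codon_position(seq, window_codons=8, step_codons=1):
    """For each window of codons (read from frame 0), return the GC
    fraction at codon positions 1, 2 and 3 as a tuple."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    results = []
    for start in range(0, n_codons - window_codons + 1, step_codons):
        counts = [0, 0, 0]
        for c in range(start, start + window_codons):
            codon = seq[3 * c:3 * c + 3]
            for pos in range(3):
                if codon[pos] in "GC":
                    counts[pos] += 1
        results.append(tuple(n / window_codons for n in counts))
    return results


def likely_frame(seq):
    """Return the offset (0, 1 or 2) at which the third codon position
    is most GC-rich, i.e. the most plausible forward reading frame."""
    seq = seq.upper()

    def third_gc(offset):
        thirds = seq[offset + 2::3]
        return sum(base in "GC" for base in thirds) / len(thirds)

    return max(range(3), key=third_gc)


# Invented high-GC "coding" sequence: eight codons repeated three times.
toy = "GCCGACCTGACCGAGATCGGCAAG" * 3

print(gc_by_codon_position(toy)[0])  # intermediate, low, high GC
print(likely_frame(toy))
```

In the toy sequence the first, second and third codon positions show intermediate, low and high GC content respectively, which is the pattern used to recognise coding regions and their frame. Genes on the reverse strand would require the same analysis on the reverse complement.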

The protein sequences from putative ORFs were compared to the NCBI protein database using the BLASTP search program (Altschul et al. 1990). Proteins with a high percentage similarity to those encoded by the planosporicin gene cluster were used to assign a putative function to each ORF.

It is interesting to note that a number of genes appear to be translationally coupled, with overlapping start and stop codons. Translational coupling is common in gene clusters as a means of co-ordinating expression, particularly where the stoichiometric ratio of the encoded proteins is essential for optimal functionality. Elsewhere in the cluster, there are larger gaps between ORFs. These give scope for regulation of transcription through RNA secondary structure. Hairpin loops in mRNA may act as transcription terminators. These alter the stoichiometry of protein expression; coding regions before the stem loop are transcribed at a higher rate than those after it. This may be crucial when an enzyme and its substrate are expressed from the same operon. One enzyme can process many substrate molecules, so the stoichiometric ratio needs to be adjusted accordingly to optimise efficiency.
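Candidate translationally coupled genes and larger intergenic gaps can be flagged directly from ORF coordinates. The following is a minimal sketch under stated assumptions: the ORF names and coordinates are hypothetical and do not correspond to the annotated cluster, and all ORFs are assumed to lie on the same strand.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    name: str
    start: int  # 1-based, first base of the start codon
    end: int    # 1-based, last base of the stop codon


def intergenic_gaps(orfs):
    """Yield (upstream, downstream, gap) for adjacent ORFs on one strand.
    A negative gap means the downstream start codon overlaps the
    upstream ORF, e.g. -4 for an ATGA overlap and -1 for TGATG."""
    ordered = sorted(orfs, key=lambda o: o.start)
    for up, down in zip(ordered, ordered[1:]):
        yield up.name, down.name, down.start - up.end - 1


# Hypothetical coordinates for illustration only.
orfs = [
    Orf("orfA", 1, 300),
    Orf("orfB", 297, 900),    # start overlaps the orfA stop codon (ATGA)
    Orf("orfC", 1100, 1500),  # separated from orfB by a larger gap
]

for up, down, gap in intergenic_gaps(orfs):
    label = "possible translational coupling" if gap < 0 else "intergenic gap"
    print(f"{up} -> {down}: {gap} bp ({label})")
```

An overlap alone does not demonstrate coupling; it only identifies gene pairs worth examining further, just as a large gap only indicates where a terminator or promoter might be sought.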


Figure 4.7 : Cosmid B4-1 viewed in Artemis (Sanger). GC frameplot is displayed (window size 400) above the annotation of putative ORFs.

4.7 P. sp. DSM 14920 cosmids

4.7.1 PCR on P. sp. DSM 14920 cosmids

The NAICONS consortium constructed a second Planomonospora cosmid library with gDNA from P. sp. DSM 14920. In this strain, planosporicin is referred to as NAI-97.

NAICONS identified four partially overlapping cosmids that appeared to contain planosporicin biosynthetic genes. In this work, PCR using cosmids 4B8 and 9A7 as templates amplified all three lan genes (lanA, lanB and lanE), while cosmid 4H8 appeared to lack lanE and cosmid 7C2 appeared to lack lanB (Figure 4.8).

The sequence and annotation of pIJ12321, described in detail in Chapter 5, depicts the organisation of the psp cluster in P. alba. Genome scanning of P. sp. DSM 14920 identified the planosporicin prepropeptide and analysis of a few kilobases upstream and downstream revealed a gene organisation very similar to that of P. alba (M. Sosio, personal communication). To investigate if sequences flanking the biosynthetic gene cluster in each strain show synteny, primers were designed to amplify across the predicted left and right borders of the psp cluster. These primers amplified a fragment of the same size whether P. alba or P. sp. DSM 14920 DNA was used as a template (Figure 4.9). Thus gene order appears to be conserved between the psp clusters from P. alba and P. sp. DSM 14920.

Figure 4.8 : PCR analysis to determine which P. sp. DSM 14920 cosmids were likely to contain the entire planosporicin gene cluster.

Cosmids 7C2, 4H8, 4B8 and 9A7 were used as templates to amplify lanA (1289FlanA and 1289RlanA; 230 bp), lanB (3088F2 and 3088R2; 303 bp) and lanE (FlanEF and RlanEF; 221 bp) from the psp gene cluster. PCR products were run on a 1 % agarose gel by electrophoresis. The ladder is Hyperladder I (Bioline) with band sizes annotated in bp.

4.7.2 Restriction digest on P. sp. DSM 14920 cosmids

Interestingly, the restriction digest patterns of the P. alba cosmids appeared quite different from those of the P. sp. DSM 14920 cosmids. Figure 4.10 depicts the band pattern created by a BamHI or NcoI digest. Several restriction fragments are common across all eight P. alba cosmids. The majority of these are absent from the digest pattern of the P. sp. DSM 14920 cosmids. This implies that although the amino acid sequences of the encoded proteins are conserved, there is considerable variation at the nucleotide level in the planosporicin biosynthetic gene clusters from the two Planomonospora species.
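The reasoning that a conserved protein sequence is compatible with a different restriction pattern can be illustrated with a toy in silico digest. The two sequences below are invented and encode the same peptide with different synonymous codons; they are not taken from either strain. The digest treats the DNA as linear, whereas a cosmid is circular, and only the BamHI recognition site (GGATCC, cut after the first G) is considered.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA sequence read from its first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of the recognition site, cut_offset bases into the site."""
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Invented sequences: same peptide, different synonymous codons.
seq_a = "ATGGCGGGATCCCTGACC"  # contains a BamHI site (GGATCC)
seq_b = "ATGGCCGGCAGCCTCACG"  # no BamHI site

print(translate(seq_a), translate(seq_b))
print(digest(seq_a, "GGATCC", 1))
print(digest(seq_b, "GGATCC", 1))
```

Both sequences translate to the same peptide, yet only one is cut, so the fragment patterns differ even though the encoded protein is unchanged.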

Figure 4.9 : PCR analysis to check the similarity of P. sp. DSM 14920 cosmid 4B8 to P. alba cosmid B4-1, both encoding the planosporicin gene cluster.

Cosmid DNA was used as a template to amplify regions across the borders of the planosporicin gene cluster. LongAmp polymerase (NEB) was used to amplify two regions. ES1F and ES1R amplify 1073 bp across the left-hand border. ES2F and ES2R amplify 1883 bp across the right-hand border. PCR products were run on a 1 % agarose gel by electrophoresis. The ladder is Hyperladder I (Bioline) with band sizes annotated in kb.


Figure 4.10 : Restriction digest patterns of eight P. alba cosmids and four P. sp. DSM 14920 cosmids with two different enzymes.

A; Digest with BamHI. B; Digest with NcoI. Restriction fragments were run on a 1 % agarose gel by electrophoresis. Size estimation is given by Hyperladder I (Bioline) with band size in kb indicated next to the gel.

4.7.3 End-sequencing of P. sp. DSM 14920 cosmids

The P. sp. DSM 14920 cosmid library was made with a derivative of the SuperCosI vector used for the P. alba cosmid library. Consequently the same primers, end_F and end_R, were used to sequence into the ends of the insert DNA. Cosmid 7C2 yielded end-sequence data for which a BLASTN search of the P. alba 454 database gave contig01289 as the top hit. The presence of lanA and lanB at the end of this insert implied that part of the lantibiotic biosynthetic gene cluster would be missing from the cosmid. Cosmids 4B8 and 9A7 gave end-sequences for which a BLASTP search of the NCBI protein database revealed similarity to Streptosporangium roseum genes for primary metabolism, implying the entire cluster may be present within these cosmids. For the rest of this work, cosmid 4B8 is referred to as pIJ12325 and cosmid 9A7 as pIJ12326.

4.8 Discussion

This Chapter describes how genome scanning through 454 sequencing created contigs that enabled BLAST-type searches for genes similar to those commonly found in lantibiotic gene clusters. This information allowed the amplification of probes which were hybridised against a P. alba cosmid library. From the selection of positive hits, a combination of PCR, restriction digests and end-sequencing was used to deduce which cosmids were likely to contain the entire cluster, and these were then sent for sequencing. The next Chapter continues along this theme with the bioinformatic characterisation of the cluster and a full discussion of the likely functions of individual genes.

Further work has fully justified the methodology chosen to identify the planosporicin gene cluster. An activity-based screen could have been performed by integrating the P. alba cosmid library into a heterologous host. However, choice of host would have been difficult, as there would have been no guarantee that the cluster would have been expressed at a level that allowed planosporicin detection. Likewise, use of heterologous expression to identify which of the cosmids that gave a positive hit after hybridisation with a radioactive probe contained the complete cluster would likely have proven a futile exercise, as Chapter 6 details how attempts to heterologously express the gene cluster proved slow and cumbersome.

Chapter 3 described how the predicted planosporicin propeptide sequence based on the 2007 structure of planosporicin had a number of errors (Castiglione et al. 2007). 454 sequence data revealed the C terminal Pro and Gly residues were actually located at positions 9 and 10 respectively. This was further confirmed in this Chapter through the amplification and sequencing of the lanA prepropeptide from P. alba as a probe to identify positive clones from the cosmid library.

A retrospective investigation into the methods used to identify the planosporicin biosynthetic gene cluster enables the conclusion that this genome scanning approach was appropriate. The differences in the precursor peptide compared to the published structure (Chapter 3) and lack of heterologous expression in Streptomyces (Chapter 6) would most likely have confounded other methods.

4.9 Summary

454 genome scanning revealed the lanA gene for planosporicin within the P. alba genome

A cosmid library was generated from P. alba gDNA

Three probes for the planosporicin gene cluster were amplified using 454 sequence data

Eight out of 3072 clones hybridised to all three probes

Cosmids B4-1 (pIJ12321) and F13-1 (pIJ12322) were sequenced

pIJ12321 contained the entire 15.3 kb cluster centrally located within 37 kb of P. alba gDNA

Two cosmids from P. sp. DSM 14920 appear likely to contain the planosporicin gene cluster


Chapter 5 : The planosporicin