
The A. nidulans genome assembly (version CADRE 2.5) was downloaded from the Ensembl Genomes FTP server (ftp://ftp.ensemblgenomes.org/pub/) and used to assemble a Bowtie-compatible FASTA index (Chapter 2.4.2).


To ensure the relative quality of each fragment library was consistent between samples, a preliminary test of mapping quality was performed with Bowtie alone. Reads were trimmed at the 3′ end to a final length of 35 bp, as the error rate in SOLiD reads increases rapidly with each additional base position beyond 35 bp from the 5′ end (Applied Biosystems, 2008). Stringent mapping of the trimmed reads against the A. nidulans genome was performed in Bowtie, allowing 0 mismatches per read in order to best assess sequencing quality for each sample. Non-uniquely mapping reads were assigned randomly to a single position to prevent total read mapping statistics from being artificially inflated by reads mapping to multiple positions. Using these settings, all libraries mapped to the A. nidulans genome at a low rate (4.5-10.8% of reads per strand), but demonstrated an excellent balance of reads mapping to each strand. The percentage of reads mapped was also reasonably consistent between samples, with only the 72 hour nitrogen starvation library showing significantly increased mapping compared to other samples (Table 5.2).

Table 5.2. Read quality assessment mapping. Read distribution between forward and reverse strands was even for each sample and mapping figures showed a high level of consistency between samples, with only the 72 hour nitrogen starvation sample showing a significantly higher level of mapping.

Library condition              Number of reads   Mapped to forward strand   Mapped to reverse strand
Minimal medium + nitrate       60,003,026        2,781,874 (4.6%)           2,810,440 (4.7%)
Complete medium                60,257,046        3,523,898 (5.8%)           3,501,683 (5.8%)
Minimal medium + ammonium      46,929,150        2,122,804 (4.5%)           2,142,050 (4.6%)
4 hour nitrogen starvation     71,009,476        4,098,809 (5.8%)           4,095,745 (5.8%)
72 hour nitrogen starvation    65,995,033        6,990,113 (10.6%)          7,145,774 (10.8%)
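The percentages and strand balance reported in Table 5.2 follow directly from the read counts. The short sketch below recomputes them. The counts are copied from the table, while the function name `mapping_summary` and the "forward share" measure are illustrative assumptions rather than part of the original analysis.

```python
# Read counts copied from Table 5.2: (total reads, forward-mapped, reverse-mapped).
LIBRARIES = {
    "Minimal medium + nitrate": (60_003_026, 2_781_874, 2_810_440),
    "Complete medium": (60_257_046, 3_523_898, 3_501_683),
    "Minimal medium + ammonium": (46_929_150, 2_122_804, 2_142_050),
    "4 hour nitrogen starvation": (71_009_476, 4_098_809, 4_095_745),
    "72 hour nitrogen starvation": (65_995_033, 6_990_113, 7_145_774),
}


def mapping_summary(total, forward, reverse):
    """Return (% mapped forward, % mapped reverse, forward share of mapped reads)."""
    pct_forward = 100.0 * forward / total
    pct_reverse = 100.0 * reverse / total
    # A value close to 0.5 indicates reads are balanced between the two strands.
    forward_share = forward / (forward + reverse)
    return pct_forward, pct_reverse, forward_share


for name, counts in LIBRARIES.items():
    fwd, rev, share = mapping_summary(*counts)
    print(f"{name}: {fwd:.1f}% forward, {rev:.1f}% reverse, forward share {share:.3f}")
```

A forward share close to 0.5 for every library corresponds to the even strand distribution described in the table caption.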


Full mapping of reads from newly sequenced samples was performed with Tophat in conjunction with Bowtie using basic settings (Methods 2.4.3). No reference genes were supplied to Tophat for this mapping. Had gene models been provided, Tophat would have extracted the transcript sequences and assembled an artificial transcriptome, then used Bowtie to map reads preferentially to this construct before mapping to the rest of the genome. This would have led to a strong mapping bias at the loci of current gene models, which our previous data had shown to be incomplete and of poor quality. By mapping without a gene model annotation we aimed to remove this bias and produce a more accurate representation of the A. nidulans transcriptome.

Resulting BAM files were indexed and sorted using SAMtools to facilitate visualisation of mapped reads in the Broad Institute’s Integrative Genomics Viewer (IGV) software (Robinson et al., 2011; Thorvaldsdottir et al., 2012). IGV is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets, capable of displaying both reads which map across splice junctions and separate tracks showing the junctions predicted by Tophat (Robinson et al., 2011; Thorvaldsdottir et al., 2012).

Visual analysis of the mapped data in IGV displayed a large number of Tophat-defined splice junctions which corresponded to those in the annotation, but also many more which spanned extremely long genetic regions, often traversing several genes (Fig. 5.1). One of the ways Tophat identifies putative splice junctions is by splitting reads which partially align to the genome into two segments, mapping the partial alignment and the rest of the read independently. If the second segment is mapped downstream of the first, Tophat identifies the gap between them as a splice junction (Trapnell et al., 2009). The high frequency and obvious error of these extremely long junctions would have led to difficulties and inaccuracy in data analysis. This led to a reassessment of mapping criteria to reduce the occurrence of this phenomenon, and subsequent remapping of each sample to obtain more accurate junction predictions.
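The split-read logic described above can be illustrated with a deliberately simplified sketch. This is not Tophat's implementation: the `SegmentHit` type, the `infer_junction` function and the coordinates are hypothetical, and real junction discovery also considers strand, splice-site motifs and alignment quality.

```python
from typing import NamedTuple, Optional, Tuple


class SegmentHit(NamedTuple):
    """Genomic placement of one read segment (hypothetical, 0-based, end-exclusive)."""
    start: int
    end: int


def infer_junction(first: SegmentHit, second: SegmentHit) -> Optional[Tuple[int, int]]:
    """Return the gap between two segments as a putative junction.

    A junction is only reported when the second segment maps downstream of the
    first with a gap in between; otherwise None is returned.
    """
    if second.start <= first.end:
        return None
    return (first.end, second.start)


# A plausible short intron: the two segments land 80 bp apart.
short = infer_junction(SegmentHit(1_000, 1_020), SegmentHit(1_100, 1_115))

# The same read with its second segment placed at a distant locus: the gap
# is still reported as a junction, now spanning 24,980 bp.
distant = infer_junction(SegmentHit(1_000, 1_020), SegmentHit(26_000, 26_015))

print(short, distant)
```

The second case shows how a short segment that happens to match a distant locus yields a junction spanning several genes, as seen in Fig. 5.1.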

Figure 5.1. Tophat-based mapping of extremely long splice junctions displayed in IGV. While the majority of splice junctions predicted by Tophat fit the annotation, a large number of junctions (shown as horizontal red lines in the bottom track) were predicted to span several genes (genes are shown in blue in the gene annotation track). This was due to segments from split reads mapping at distant loci and Tophat defining the gap in between as a splice junction. Instances of this read splitting are shown in the mapped reads track: blocks of red (forward strand) and blue (reverse strand) indicate reads aligned to the genome, while the horizontal lines indicate gaps between split reads.

5.4. Assessment of Tophat junction mapping

To address the issue of excessively long junction finding, it was necessary to limit the maximum intron length when using Tophat to perform read alignment. The default maximum intron length in Tophat is 500,000 bp; when searching for junctions ab initio, Tophat ignores donor/acceptor pairs farther apart than this. This is many times larger than any known intron in the A. nidulans genome and explained why Tophat reported junctions which spanned unrealistic distances and crossed multiple genes.

As a test of this hypothesis, Tophat mapping was performed with a maximum intron length of 21,000 bp, the size of the largest gene in the current annotation. Being many times smaller than the default of 500,000 bp, this greatly reduced the number of extremely long introns found by Tophat. However, a number of extremely long introns persisted.
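The effect of the maximum intron length can be sketched as a simple length filter over candidate junctions. This is an illustration only: Tophat applies the limit during its junction search (through its `-I/--max-intron-length` option) rather than as a filter afterwards, and the candidate coordinates and the `filter_junctions` helper below are hypothetical.

```python
def filter_junctions(junctions, max_intron_length):
    """Keep (start, end) junctions whose span does not exceed max_intron_length."""
    return [(start, end) for (start, end) in junctions
            if end - start <= max_intron_length]


# Hypothetical candidate junctions (start, end) in bp, with spans of
# 82, 55, 3,571, 18,500 and 270,000 bp respectively.
candidates = [
    (1_020, 1_102),
    (5_400, 5_455),
    (9_000, 12_571),
    (20_000, 38_500),
    (40_000, 310_000),
]

for limit in (500_000, 21_000, 5_000):
    kept = filter_junctions(candidates, limit)
    print(f"max intron length {limit}: {len(kept)} junctions kept")
```

In this toy set, a limit of 21,000 bp removes the 270,000 bp junction but still admits the 18,500 bp one, mirroring the observation that some extremely long introns persisted at that setting.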

To determine an appropriate limit for intron length in A. nidulans, currently annotated intron lengths were used as a starting point. Software was written to extract a list of all intron lengths from the current A. nidulans gene annotation. While the software output contained the length of every annotated intron in the A. nidulans genome, further processing was required to extract meaningful data. To identify the range of intron sizes, the software was updated to sort the values in the intron length array by size. This produced an output file of intron lengths listed from shortest to longest, allowing rapid identification of the range of intron sizes in A. nidulans and of the maximum intron size found (Chapter 2.5.11).
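A minimal sketch of this kind of extraction is given below. It is not the software described in Chapter 2.5.11: it assumes exon records have already been read from the annotation as (transcript ID, start, end) tuples with 1-based inclusive coordinates, and the function name and example records are hypothetical.

```python
from collections import defaultdict


def intron_lengths(exons):
    """Return all intron lengths, sorted from shortest to longest.

    exons: iterable of (transcript_id, start, end) with 1-based inclusive
    coordinates. An intron is the gap between consecutive exons of the
    same transcript.
    """
    by_transcript = defaultdict(list)
    for transcript_id, start, end in exons:
        by_transcript[transcript_id].append((start, end))

    lengths = []
    for coords in by_transcript.values():
        coords.sort()  # order exons by genomic position
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1
            if gap > 0:  # skip abutting or overlapping exon records
                lengths.append(gap)
    return sorted(lengths)


# Hypothetical exon records for two transcripts.
example_exons = [
    ("t1", 1, 100), ("t1", 183, 300), ("t1", 351, 400),
    ("t2", 1000, 1200), ("t2", 1260, 1500),
]
lengths = intron_lengths(example_exons)
print(lengths[0], lengths[-1])  # shortest and longest intron
```

Because the list is sorted, the range of intron sizes can be read from its first and last elements.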

To gain an overview of all intron lengths, graphical representations of the resulting dataset were produced in R (R Development Core Team, 2008) (Fig. 5.2). The box and whisker plot produced indicated an extremely low interquartile range with a large number of outliers. To provide an alternative view of the data, a histogram was also created in R (Fig. 5.3).

Figure 5.2. Box and whisker plot of intron lengths in A. nidulans. The lengths of all annotated introns are plotted, with the five-number summary indicated by the box and whiskers. The box marks the lower quartile, median and upper quartile, while the whiskers mark the lowest and highest values not classed as statistical outliers. Circles represent lengths determined to be statistical outliers. The majority of intron lengths fall within the boxed region at around 40-100 bp, with a median of 82 bp. However, a substantial number of outliers are observed up to approximately 1200 bp in length, beyond which very few introns are observed.

[Axis label: Intron length (bp)]

Figure 5.3. Histogram of intron lengths in A. nidulans. The number of introns of each length is shown, indicating that the vast majority fall within the 40-100 bp length range, as was suggested by Fig. 5.2.
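The quantities summarised by a box and whisker plot can also be computed numerically. The sketch below is written in Python rather than R and uses a small hypothetical dataset. It assumes the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles as outliers, and its quartile method may differ slightly from the hinges that R's boxplot uses.

```python
import statistics


def box_plot_summary(lengths):
    """Return quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(lengths)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "whisker_low": inside[0],    # lowest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],  # highest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical intron lengths (bp): mostly short, with one long outlier.
toy_lengths = [45, 50, 55, 60, 70, 82, 90, 95, 100, 110, 1200]
summary = box_plot_summary(toy_lengths)
print(summary)
```

With a tightly clustered bulk of values, the interquartile range is small, so even moderately long introns fall outside the whiskers and are drawn as outliers.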


Annotated intron length in A. nidulans was shown to range between 2 and 3571 bp. However, from Fig. 5.2 and Fig. 5.3 it was apparent that nearly all annotated introns are < 1200 bp, with the dataset containing a small number of larger outliers. The largest intron was more than double the length of any other at 3571 bp, raising questions about the validity of this annotation. To identify the gene containing this intron, the software was updated to record the ID of each gene as it was processed and to report the ID of the gene containing the largest intron. The finished version of this software is described further in Chapter 2.5.11.
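Tracking which gene holds the largest intron only requires carrying the gene ID alongside each length. The sketch below is an assumption about how this might be done, not the software from Chapter 2.5.11. The AN4390.4 entry uses the length reported in the text, while the other gene IDs and lengths are hypothetical.

```python
def largest_intron(introns):
    """Return (gene_id, length) for the longest intron seen.

    introns: iterable of (gene_id, intron_length) pairs. Returns (None, -1)
    when the input is empty.
    """
    best_gene, best_length = None, -1
    for gene_id, length in introns:
        if length > best_length:
            best_gene, best_length = gene_id, length
    return best_gene, best_length


# AN4390.4 and 3571 bp come from the text; the other records are hypothetical.
records = [
    ("geneA", 82),
    ("geneB", 1200),
    ("AN4390.4", 3571),
    ("geneC", 57),
]
gene, length = largest_intron(records)
print(gene, length)
```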

The modified software identified the largest intron as being annotated in gene AN4390.4 (CADANIAG00006067). While our RNA-seq data showed no evidence of an intron at this locus (Fig. 5.4), it is still possible that introns of this length exist in A. nidulans. Lowering the maximum intron length below this point could have prevented real introns from being found by Tophat, so the maximum intron length had to exceed 3571 bp. A limit of 5,000 bp was selected, as it appeared to give excellent mapping results on visual inspection in IGV while being large enough not to exclude any real junctions of greater than average length. Maximum intron lengths below this value were also tested; however, these caused no appreciable improvement in mapping quality and only a minor decrease in the number of junctions found (32 fewer were found with a maximum intron length of 4,000 bp).


Figure 5.4. RNA-seq and junction data for gene AN4390.4 (CADANIAG00006067). Tophat-aligned total RNA-seq reads are displayed in IGV, including read coverage (top track) and individual reads for the forward (blue) and reverse (red) strands (middle track). The bottom track shows the annotated gene in this region (blue) and the splice junctions predicted by Tophat (red). This confirmed the presence of 2 small introns; however, there was no evidence for the existence of the large 3571 bp intron at this locus.


More recent annotations of the A. nidulans genome which were released subsequent to this analysis no longer contain an intron at this locus. Studies into the structure of AN4390 have shown the gene to start at the exon displayed before the first confirmed junction in Fig. 5.4 (de Groot et al., 2009).