

When a single RNA sample is labelled with both Cy3 and Cy5 and the two preparations are hybridised together on the same array, the expected expression ratio for every element is 1 (a log(2) expression ratio of 0). The extent to which the observed log(2) expression ratios differ from 0 therefore gives a measure of the error introduced by experimental sources of variation. This was determined for mRNA purified from the B-cell line Ramos (Figure 3.11A). Of the elements deemed to be present by the Axon software, 95.6% have a log(2) ratio between -0.5 and 0.5 (a ratio of 0.707 to 1.414). Assuming this error follows a normal distribution (Wolfinger et al., 2001), experimental noise therefore becomes significant (at the 95% confidence level) outside this range. The 4.4% of elements that have significant expression ratios by this definition tend to have low signal to background ratios in both channels (Figure 3.11B). Filtering the data to remove elements with a signal to background ratio below 2 in the Cy5 channel and 1.5 in the Cy3 channel removes 74% of this experimental noise. This improves the results such that 98.9% of all log(2) ratios are now between -0.5 and 0.5. The filtering cut-offs for the two channels are set differently to take into account the lower signal to background ratio in the Cy3 channel.
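The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the five example elements are hypothetical, and only the cut-offs (2 for Cy5, 1.5 for Cy3) and the log(2) limit of 0.5 come from the text.

```python
def log2_ratio_bounds(limit=0.5):
    """Convert a symmetric log(2) limit into the equivalent linear ratios."""
    return 2 ** -limit, 2 ** limit


def passes_filter(sb_cy5, sb_cy3, min_cy5=2.0, min_cy3=1.5):
    """Keep an element only if both channels reach their cut-off."""
    return sb_cy5 >= min_cy5 and sb_cy3 >= min_cy3


def fraction_within(log_ratios, limit=0.5):
    """Fraction of log(2) ratios lying between -limit and +limit."""
    inside = sum(1 for r in log_ratios if -limit <= r <= limit)
    return inside / len(log_ratios)


# Hypothetical elements: (log(2) ratio, Cy5 S/B ratio, Cy3 S/B ratio).
elements = [
    (0.10, 8.0, 5.0),
    (-0.20, 4.0, 3.0),
    (0.90, 1.4, 1.2),   # weak in both channels, removed
    (0.05, 3.0, 1.6),
    (-0.70, 2.5, 1.1),  # weak in Cy3, removed
]

kept = [e for e in elements if passes_filter(e[1], e[2])]
low, high = log2_ratio_bounds()

print(round(low, 3), round(high, 3))               # 0.707 1.414
print(fraction_within([e[0] for e in elements]))   # 0.6
print(fraction_within([e[0] for e in kept]))       # 1.0
```

In this toy data the two elements outside the log(2) limits are exactly the ones with weak signals, which mirrors the pattern reported for the Ramos array without reproducing its numbers.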

Experimental replicates were performed to assess the contribution of sources of variation to the measured expression patterns (Table 3.4). A number of these were performed using the PEL cell line BC-3. The cell line was grown from frozen stocks on 3 separate occasions (I, II and III), and on each occasion RNA was purified, labelled with Cy5 and analysed on the array with the Cy3-labelled reference RNA. RNA sample III was also analysed on two additional occasions (III rpt and III rpt II). Four different batches of reference RNA were used in these analyses to control for the effect of batch-to-batch variation in the reference. BC-3 I was hybridised with reference batch 2, BC-3 II with batch 4, BC-3 III with batch 5 and BC-3 III rpt and III rpt II with batch 6.

Sample       Experiment         Type of replicate           Reference batch

BC-3         BC-3 I                                         2
             BC-3 II            Biological                  4
             BC-3 III           Biological                  5
             BC-3 III rpt       Technical and biological    6
             BC-3 III rpt II    Technical and biological    6
BCBL-1       BCBL-1                                         6
             BCBL-1 rpt         Technical                   6
             BCBL-1 II          Biological                  8
HBL-6        HBL-6                                          2
             HBL-6 rpt          Technical                   2
JSC-1        JSC-1                                          2
             JSC-1 rpt          Technical                   2
RPMI-8226    RPMI-8226                                      4
             RPMI-8226 rpt      Technical                   4

The data from the three BC-3 technical replicates were used to assess the relationship between signal to background ratio and experimental variation, by the method of Ross et al. (2002). The background-subtracted signal intensity for each channel was binned into 10-gene sets with increasing minimum signal intensities and plotted against the mean of the standard deviation of the triplicates for each set of binned genes. As was found with the Ramos array data, weak signals tend to contain more experimental noise (Figure 3.11C). The effect is particularly noticeable for the Cy5 channel, which shows an increase in variation (SD>0.3) in elements with a signal to background ratio below 2. The SD for elements with stronger signals tends towards 0.2, as was also found by Ross et al. (2002). The filtering criteria used to reduce the experimental noise in the Ramos array therefore also remove the most variable data between technical replicates.
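The ordering and windowing step can be illustrated as follows. This is a minimal sketch rather than the procedure of Ross et al. (2002) itself: it assumes the sample standard deviation, a sliding window of 10 elements (as in the legend of Figure 3.11C), and synthetic data in which the spread of the triplicates is set to 1/signal purely for demonstration.

```python
from statistics import mean, stdev


def moving_average(values, window=10):
    """Mean of each run of `window` consecutive values."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]


def sd_versus_signal(signal, replicates, window=10):
    """Order elements by signal, then smooth signal and triplicate SD.

    signal     -- one signal to background ratio per element
    replicates -- one tuple of replicate log(2) ratios per element
    """
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    sorted_signal = [signal[i] for i in order]
    sorted_sd = [stdev(replicates[i]) for i in order]  # sample SD (assumption)
    return (moving_average(sorted_signal, window),
            moving_average(sorted_sd, window))


# Synthetic example: 20 elements whose replicate spread shrinks as
# the signal grows. These values are invented for illustration only.
signal = [float(i) for i in range(1, 21)]
replicates = [(-1 / s, 0.0, 1 / s) for s in signal]

signal_ma, sd_ma = sd_versus_signal(signal, replicates)
for s, d in zip(signal_ma, sd_ma):
    print(f"{s:5.1f}  {d:.3f}")
```

Plotting the second list against the first gives a curve of the kind shown in Figure 3.11C, with the highest variation at the weak-signal end.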

The effect of the filtering criteria on positive and negative control data (section 3.2.2.2) was assessed. Filtering removes all data for all negative control elements on all arrays examined, whereas the majority of the low copy-number control data is maintained (Figure 3.11D). Filtering therefore ensures that data that do not represent genuine gene expression are removed.

Figure 3.11. Defining criteria for data filtering (opposite).

A. Histogram of log(2) expression ratios generated from Ramos mRNA labelled with Cy3 and Cy5 and hybridised to the same array. The expected ratio for each element is 0 so the measured expression ratio is equal to the error. The fitted normal distribution is shown in red. The dashed vertical lines border the region containing 95% of the data. These therefore represent the 95% confidence limits and all data beyond these limits are significantly different from a log(2) ratio of 0 (p<0.05).

B. Scatter plot showing signal to background ratios for the Cy3 and Cy5 channels of the data shown in (A). Data in pink correspond to those elements that lie beyond the 95% confidence limits shown in (A). These data have low signal to background ratios in both channels. 74% of these elements are removed by filtering for signal to background ratios below 1.5 and 2 in the Cy3 and Cy5 channels respectively. The grey box indicates data removed by these filtering criteria. Note that Cy5 signals are stronger than their Cy3 counterparts.

C. Standard deviation of triplicate BC-3 technical replicates plotted against mean signal to background ratio. The data are ordered by increasing signal to background ratio and both variables are shown as a moving average with a window size of 10. Trend lines (red) are fitted with a power function. The grey boxes indicate the data removed by the filtering criteria shown in (B). Weak signals in the Cy5 channel show greater variation. The SD of stronger signals tends to 0.2 in both channels.

D. Mean, SD and SE (standard error) of the signal to background ratio for the control elements named beneath in each of the two channels (n=40 arrays). The dashed red lines show the filtering cut-offs for each channel. Data below these lines are removed by the filtering criteria. The majority of the positive low copy-number control data (spiked TMV RNA at 6 copies/cell) are maintained after filtering whereas all negative control data are removed. Note that Cy5 signals are stronger than their Cy3 counterparts.

[Figure 3.11, panels A to D. Axis labels recovered from the plots: Observations; SD (log(2) median of ratios); Signal to background ratio; Cy5 signal to background ratio.]

The signal to background ratio filtering criteria were applied to the entire dataset using ArrayAnalyser, a spreadsheet created in Excel (section 2.6.5). On average, filtering removes 8.3% of array elements (SD=3.0, n=40) in addition to the 42.8% that were flagged as not present. To confirm that the filtering criteria do reduce experimental variation, their effect on the correlation between the BC-3 replicates was examined. For all possible pair-wise comparisons, the filtered data are more strongly correlated, and hence less variable, than the unfiltered data (Figure 3.12). The lower the correlation between the unfiltered data, the greater the improvement after filtering. This demonstrates that high variability is associated with elements with low signal to background ratios. To confirm that this is not merely an effect of removing data, an equal number of randomly chosen genes was removed from the data instead. This did not increase the correlation coefficient, demonstrating that the filtering process selectively removes experimental variation.
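The comparison between targeted filtering and random removal can be sketched as below. This is an illustration only, not the ArrayAnalyser spreadsheet: the Pearson function is written out by hand, a single signal to background value per element stands in for the two-channel criteria, and the eight elements are invented so that the weak-signal elements disagree between replicates.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log(2) ratios for two replicates, plus one S/B value
# per element (a simplification of the per-channel cut-offs).
rep_a = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.5, -0.5, 1.5]
rep_b = [-2.0, -1.0, 0.0, 1.0, 2.0, -0.5, 0.5, -1.5]
sb = [9.0, 7.0, 6.0, 8.0, 5.0, 1.2, 1.1, 1.3]

keep = [i for i, s in enumerate(sb) if s >= 2.0]
r_before = pearson(rep_a, rep_b)
r_after = pearson([rep_a[i] for i in keep], [rep_b[i] for i in keep])

# Control: drop the same number of elements, chosen at random.
rng = random.Random(0)
random_keep = sorted(rng.sample(range(len(sb)), len(keep)))
r_random = pearson([rep_a[i] for i in random_keep],
                   [rep_b[i] for i in random_keep])

print(f"before filtering: {r_before:.2f}")
print(f"after filtering:  {r_after:.2f}")
print(f"random removal:   {r_random:.2f}")
```

With real data the random control would normally be repeated many times and averaged, since a single random draw can by chance remove several noisy elements.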

The correlation between filtered log(2) expression ratios for all replicate experiments was calculated (Table 3.5). The average correlations for all technical and biological replicates are 0.96 (n=7) and 0.90 (n=9) respectively. The stronger correlation between technical replicates is to be expected considering the reduced experimental variation.

Technical replicates                      Correlation coefficient

HBL-6            HBL-6 rpt                0.97
JSC-1            JSC-1 rpt                0.95
BCBL-1           BCBL-1 rpt               0.97
RPMI-8226        RPMI-8226 rpt            0.96
BC-3 III         BC-3 III rpt             0.96
BC-3 III         BC-3 III rpt II          0.94
BC-3 III rpt     BC-3 III rpt II          0.98

Biological replicates

BC-3 I           BC-3 II                  0.84
BC-3 I           BC-3 III                 0.89
BC-3 I           BC-3 III rpt             0.92
BC-3 I           BC-3 III rpt II          0.91
BC-3 II          BC-3 III                 0.89
BC-3 II          BC-3 III rpt             0.92
BC-3 II          BC-3 III rpt II          0.92
BCBL-1           BCBL-1 II                0.89
BCBL-1 rpt       BCBL-1 II                0.92

Table 3.5. Pearson correlation coefficients between filtered data from experimental replicates.
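As a quick arithmetic check, the averages quoted in the text (0.96 for technical and 0.90 for biological replicates) can be recomputed from the coefficients in Table 3.5. The variable names are illustrative; the values are copied from the table.

```python
from statistics import mean

# Correlation coefficients as listed in Table 3.5.
technical = [0.97, 0.95, 0.97, 0.96, 0.96, 0.94, 0.98]
biological = [0.84, 0.89, 0.92, 0.91, 0.89, 0.92, 0.92, 0.89, 0.92]

print(f"technical:  {mean(technical):.2f} (n={len(technical)})")    # 0.96 (n=7)
print(f"biological: {mean(biological):.2f} (n={len(biological)})")  # 0.90 (n=9)
```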

[Figure 3.12, panels A and B. Legend: Filtering (Before, After, Random); replicate type (Biological, Technical). Axis label: Log(2) median of ratios.]

Figure 3.12. Filtering the data by signal to background ratio increases the correlation between experimental replicates.

A. Comparison of correlations between BC-3 experimental replicates before and after filtering for signal to background ratio. The filtering criteria are shown in Figure 3.11. An equal number of data were also filtered from each array at random as a control. In all cases, elements flagged as not found are removed. Experiments I, II and III represent BC-3 cell samples grown on 3 separate occasions (biological replicates). III, III rpt and III rpt II represent the RNA from III being labelled and hybridised on three separate occasions (technical replicates). As expected, technical replicates show less variance than biological replicates.

B. Scatter plots showing the correlation between filtered log(2) median of ratios for the 5 BC-3 experimental replicates. Data were filtered to remove all elements flagged as not found or with signal to background ratios below the set limits (Figure 3.11). The histograms show the distribution of log(2) ratios for each array.