
Given the presence of multi-membership genes and the hypothesis established in the previous section, correlation analysis is an appealing approach for studying the behaviour of such genes further. Each gene is likely to make a characteristic, and possibly relatively steady, contribution to the activity of a pathway. We can therefore examine that contribution as a percentage of the total activity of the pathway.

3: Central hypothesis


We estimate the correlation between expression values, expressed as percentages, for all single-membership genes in a pathway and for all multi-membership genes in the same pathway (Pavlidis, Payne & Swift 2008). Single-membership genes contribute only to the activity of the pathway in question and should therefore exhibit more consistent expression. Observing higher correlation for these genes would provide evidence supporting our rationale.

As previously discussed, multi-membership genes differ from single-membership genes because they can participate in any combination of the pathways they belong to at any particular time. The intensity values of multi-membership genes, as extracted from a microarray slide, therefore represent a net effect. The biological system may need to activate certain pathways and consequently increase the production of a protein that is part of their network. At the same time, it may need to deactivate other pathways in which the same protein participates. The resulting balance may affect the expression observed on the microarray. When each pathway is examined in isolation from the rest, this can lead to less consistent readings for those proteins of a biochemical pathway that are encoded by multi-membership genes.

For example, consider a pathway consisting of genes A, B and C that contribute, say, 20%, 50% and 30% respectively to the overall pathway activity. We can add the log2 ratios for genes A, B and C to estimate the total activity of the pathway. We can then examine the percentage contribution of each gene across various microarray experiments. Ideally, we would obtain values close to the above percentages in every experiment where the pathway is activated. The obtained values should be more consistent for single-membership genes than for multi-membership genes.
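The percentage calculation in this example can be sketched as a small Python function. This is an illustration under stated assumptions, not the original analysis code. The function name `contributions` and the toy log2 ratios are hypothetical, chosen so that the shares reproduce the 20%, 50% and 30% of the example. Log2 ratios can be negative, so shares computed this way read naturally as percentages only when the pathway's genes change in the same direction. The sketch simply rejects a zero sum.

```python
def contributions(log2_ratios):
    """Return each gene's share of the summed pathway signal in one experiment.

    `log2_ratios` maps a gene name to its log2 ratio (hypothetical input format).
    """
    total = sum(log2_ratios.values())
    if total == 0:
        raise ValueError("summed pathway signal is zero; shares are undefined")
    return {gene: value / total for gene, value in log2_ratios.items()}


# Hypothetical log2 ratios for genes A, B and C in a single experiment.
experiment = {"A": 0.5, "B": 1.25, "C": 0.75}
for gene, share in contributions(experiment).items():
    print(f"{gene}: {share:.0%}")
```

Applying the same function to every experiment in which the pathway is active gives, for each gene, a series of shares whose consistency can then be compared between the two gene groups.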

To examine this, we initially identified 19 experiments (GSM99081 to 83, GSM99108 to 112, and GSM99171 and GSM99172) on Escherichia coli from the microarray data available at the Gene Expression Omnibus (GEO), platform GPL3503, that contain a large number of expressed Urea Cycle genes (01/09/2008). The KEGG Urea Cycle pathway is a good candidate for our analytical approach because it consists of 16 single-membership and 12 multi-membership genes. These numbers are large enough to allow a meaningful comparison. For each experiment, we divide the intensities by their sum, separately for the group of single-membership genes and the group of multi-membership genes. This gives a measure of the contribution of each gene to the behaviour of the pathway.

We then compare the correlations between the resulting contribution values of the 12 multi-membership genes and of the 16 single-membership genes across the 19 experiments. In both cases we obtain a set of 171 correlation values, one for each pair of experiments. A two-sample t-test reveals that the two sets are significantly different, with a p-value of 1.3×10⁻¹². The correlation values are also higher for single-membership genes: 86.5% of them exceed the threshold of significant correlation at p = 1%. By contrast, only 41.5% of the values for multi-membership genes exceed the same threshold. These findings agree with the assumption that the expression of a multi-membership gene is the net effect of its contributions to its constituent pathways. Single-membership genes apparently show more consistent behaviour because they contribute only to the functionality of the KEGG Urea Cycle pathway.

Because KEGG is constantly updated, it now (01/02/2011) places the Urea Cycle within a larger pathway termed Arginine and Proline metabolism. This pathway also contains genes that were identified and added to KEGG subsequently. It consists of 43 genes in total, 21 of which are unique members of the pathway, while 22 are multi-membership genes. We analysed the correlation of expression values for these new gene subsets using GEO platform GPL3503, which comprises 140 experiments on Escherichia coli.
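The split into unique (single-membership) and multi-membership genes can be sketched as follows. This is a minimal illustration, assuming that pathway membership is available as a mapping from pathway name to a set of gene identifiers. The function name and the toy pathways and genes are hypothetical and are not actual KEGG content. The counts obtained depend on which collection of pathways is included in the mapping.

```python
from collections import Counter


def split_by_membership(pathways, target):
    """Split the genes of `target` into single- and multi-membership sets.

    `pathways` maps a pathway name to the set of genes it contains. A gene is
    single-membership if `target` is the only pathway that lists it.
    """
    counts = Counter(gene for genes in pathways.values() for gene in genes)
    single = {gene for gene in pathways[target] if counts[gene] == 1}
    multi = set(pathways[target]) - single
    return single, multi


# Hypothetical toy mapping, not real KEGG membership.
pathways = {
    "target": {"g1", "g2", "g3", "g4"},
    "other1": {"g3", "g5"},
    "other2": {"g4", "g6"},
}
single, multi = split_by_membership(pathways, "target")
print(sorted(single), sorted(multi))
```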

We observed that the correlations of expression between each pair of the 140 experiments were higher for single-membership genes than for multi-membership genes. A two-sample t-test revealed that the correlation values for these two subsets of genes in the KEGG Arginine and Proline metabolism pathway were significantly different, with a p-value of 2.4×10⁻¹².

Figure 3.20 shows the correlation values for single- and multi-membership genes for all pairs of the 140 experiments. Interestingly, correlations for multi-membership genes tend towards significant negative values. This agrees with the observation that these genes show contradictory behaviour, which we may ascribe to their contribution to pathways with opposing activity under certain experimental conditions. In particular, only about 6% of the correlations for single-membership genes fall below -0.5, compared with close to 13% for multi-membership genes.

Figure 3.20 Correlation between expression values. The values correspond to Escherichia coli gene couples in the KEGG Arginine and Proline metabolism pathway in a subset of 140 experiments (GPL3503 from GEO).
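The tail comparison quoted above (about 6% versus close to 13% of correlations below -0.5) amounts to counting the share of values under a cutoff in each group. The sketch below shows that count. The function name and the toy correlation values are hypothetical, not the values behind Figure 3.20.

```python
def tail_fractions(groups, cutoff=-0.5):
    """For each named group, the share of correlations strictly below `cutoff`."""
    return {name: sum(r < cutoff for r in rs) / len(rs) for name, rs in groups.items()}


# Hypothetical toy correlation values, not the data behind Figure 3.20.
groups = {
    "single": [0.8, 0.6, 0.3, -0.7],
    "multi": [0.4, -0.6, -0.9, 0.1],
}
for name, share in tail_fractions(groups).items():
    print(f"{name}: {share:.0%} below -0.5")
```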

We performed a similar analysis of the KEGG Oxidative phosphorylation pathway. It was based on experiments in which at least 50% of the genes in the pathway show differential expression at a threshold of one standard deviation of the intensity values. This criterion is satisfied by 28 experiments, allowing for 378 comparisons. The mean correlation was 0.30 for single-membership genes but only 0.07 for multi-membership genes, a value more than four times lower. A two-sample t-test showed that the correlation values for all experiments were significantly different, with a p-value of 1.3×10⁻⁵. The correlation was significant at p = 0.01 in about 12% of the cases for the first group of genes, but in only about 4% of the cases for the latter group.
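The experiment-selection criterion can be sketched as follows. The text does not specify how "differential expression for a threshold of one standard deviation" is computed, so this sketch adopts one possible reading, which is an assumption. A pathway gene counts as differentially expressed on an array if its intensity deviates from the mean of all intensities on that array by more than one standard deviation. An experiment is kept if at least 50% of the pathway genes pass. The function names and the input layout are hypothetical.

```python
import statistics


def passes_filter(array_values, pathway_values, k=1.0, min_fraction=0.5):
    """True if enough pathway genes deviate from the array mean by more than k SDs.

    `array_values` holds every gene's intensity on one array (assumed layout);
    `pathway_values` holds the intensities of the pathway's genes on that array.
    """
    mu = statistics.fmean(array_values)
    sd = statistics.stdev(array_values)
    hits = sum(abs(v - mu) > k * sd for v in pathway_values)
    return hits / len(pathway_values) >= min_fraction


def select_experiments(experiments, **kwargs):
    """Indices of experiments that pass; each item is (array_values, pathway_values)."""
    return [i for i, (arr, path) in enumerate(experiments)
            if passes_filter(arr, path, **kwargs)]


# Hypothetical toy arrays, not GEO data.
arr = [0.0, 0.0, 0.0, 0.0, 10.0, -10.0]
print(select_experiments([(arr, [10.0, -10.0, 0.0, 0.0]), (arr, [10.0, 0.0, 0.0, 0.0])]))
```

The selected experiments would then be compared pairwise, as in the Urea Cycle analysis. With 28 experiments this gives 378 pairs.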

We performed the same comparative analysis for all 140 experiments without applying a threshold on the number of expressed genes in the pathway. As expected, we obtained lower correlation values, but the pattern was the same. Single-membership genes showed an average correlation of 0.050, compared with only 0.001 for multi-membership genes. In this case, a two-sample t-test revealed that the 9730 correlation values for each group are significantly different, with a p-value of 1.6×10⁻⁸.
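The numbers of correlation values quoted in this section are consistent with one correlation per unordered pair of experiments:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{19}{2} = \frac{19 \cdot 18}{2} = 171, \quad
\binom{28}{2} = \frac{28 \cdot 27}{2} = 378, \quad
\binom{140}{2} = \frac{140 \cdot 139}{2} = 9730
```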