

To investigate how genes within a cluster are functionally related, and how clustering helps distinguish different functional groups, we apply Gene Ontology (GO) enrichment analysis, introduced in Section 1.2.2, to our clustering outcome. In the process, GO terms that are likely to be over-represented in each of the clusters are identified. These GO terms are of interest because they represent the most common functions that the genes in a cluster share.

The probability that a given GO term is over-represented in a gene cluster can be calculated using the hypergeometric distribution [131]. The process proceeds as follows. First, for each cluster, all unique GO terms associated with the genes in the cluster are identified. Then, for each term, two statistics are needed: the number of genes in the cluster annotated to the term, and the number of all known genes annotated to the term. With this information, the hypergeometric distribution can be applied to identify GO terms that are associated with more genes in a cluster than expected by chance. The resulting p-value is the probability of observing at least that many annotated genes in the cluster by chance alone. Suppose j genes are annotated to a function out of a total of G genes in the genome; the p-value of observing h or more genes annotated to this function in a cluster of size b is given by [131]

$$ p[O \geq h] = 1 - \sum_{i=0}^{h-1} \frac{\binom{b}{i}\binom{G-b}{j-i}}{\binom{G}{j}}, \qquad (2.27) $$

where O is the number of genes in the cluster annotated with the function. The lower the p-value, the less plausible the null hypothesis that the term appears in the cluster by chance. In this way, the over-represented terms are found for each cluster.
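The calculation in Eq. (2.27) and the per-cluster procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used to produce the tables: the function names `enrichment_p_value` and `rank_terms`, the gene identifiers, and the toy annotation map are all hypothetical. Exact rational arithmetic is used here so that subtracting the lower tail from 1 does not lose precision for very small p-values.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(h, b, j, G):
    """P[O >= h] as in Eq. (2.27): upper tail of the hypergeometric distribution.

    h: number of genes in the cluster annotated to the term
    b: cluster size
    j: number of genes in the genome annotated to the term
    G: total number of genes in the genome
    """
    total = comb(G, j)
    lower_tail = Fraction(0)
    # i runs over 0 .. h-1, capped at the largest feasible count min(b, j)
    for i in range(min(h, b + 1, j + 1)):
        lower_tail += Fraction(comb(b, i) * comb(G - b, j - i), total)
    return float(1 - lower_tail)


def rank_terms(cluster_genes, term_to_genes, G, top=3):
    """Return up to `top` (p-value, term, gene count) tuples, lowest p-value first."""
    cluster = set(cluster_genes)
    scored = []
    for term, genes in term_to_genes.items():
        h = len(cluster & set(genes))
        if h == 0:
            continue  # term is not associated with any gene in this cluster
        p = enrichment_p_value(h, len(cluster), len(genes), G)
        scored.append((p, term, h))
    return sorted(scored)[:top]


# Hypothetical toy data: a genome of G = 10 genes and two made-up terms.
toy_annotations = {
    "term_A": {"g1", "g2", "g3", "g4"},
    "term_B": {"g9", "g10"},
}
toy_cluster = {"g1", "g2", "g9"}

ranked = rank_terms(toy_cluster, toy_annotations, G=10)
for p, term, h in ranked:
    print(f"{term}: h={h}, p={p:.4f}")
```

For the toy cluster, `term_A` has h = 2, b = 3, j = 4 and G = 10, so Eq. (2.27) gives 1 - (35 + 105)/210 = 1/3. Sorting by p-value and keeping the first few terms mirrors how the tables below list the lowest p-values for each cluster.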

We analyse the functional categories that are statistically over-represented in the clusters obtained by the proposed algorithms (PMDE clusters) and by SplineCluster (SC clusters). For simplicity, we provide the enrichment analysis results in Table 2.2 and Table 2.3 based on the Biological Process Ontology. As indicated by the lowest p-values in each cluster, all PMDE clusters have a statistically significant set of cell cycle related terms (all lowest p < 10^{-5}), while for SC only six out of eight clusters have such significance. We observed that, from the remaining two SC clusters of poorer quality (p = 6.35 × 10^{-3} and 2.51 × 10^{-4}), some genes involved in DNA replication (SLD2, POL12, CDC45, etc. [126]) were combined into PMDE cluster 5, resulting in a tight cluster with a significant functional over-representation of DNA strand elongation (p = 5.04 × 10^{-9}) and other functions in DNA replication. Such a high-quality cluster is essential for predicting the unknown functions of genes such as YHR151C and YNL058C within the cluster.

Table 2.2: Over-represented GO terms by the proposed PMDE algorithm for the Y5 data set

Cluster  GO ID       GO term                                         p-value   Gene count
1        GO:0006118  electron transport                              1.06E-06  5
1        GO:0006119  oxidative phosphorylation                       5.82E-06  5
1        GO:0042775  ATP synthesis coupled electron transport        1.13E-05  4
2        GO:0006974  response to DNA damage stimulus                 1.09E-06  12
2        GO:0045005  maintenance of fidelity during DNA replication  2.56E-06  5
2        GO:0000135  septin checkpoint                               3.37E-06  3
3        GO:0006268  DNA unwinding during replication                3.31E-09  5
3        GO:0032392  DNA geometric change                            3.49E-08  5
3        GO:0006270  DNA replication initiation                      5.54E-07  5
4        GO:0005975  carbohydrate metabolic process                  7.61E-06  8
4        GO:0006101  citrate metabolic process                       0.000164  2
4        GO:0006091  generation of precursor metabolites and energy  0.000185  7
5        GO:0022616  DNA strand elongation                           5.04E-09  8
5        GO:0051276  chromosome organization and biogenesis          1.73E-08  26
5        GO:0009719  response to endogenous stimulus                 1.79E-08  17
6        GO:0007020  microtubule nucleation                          1.05E-08  6
6        GO:0007017  microtubule-based process                       2.92E-08  9
6        GO:0007059  chromosome segregation                          1.09E-07  9
7        GO:0000070  mitotic sister chromatid segregation            3.84E-05  5
7        GO:0007001  chromosome organization and biogenesis          4.69E-05  13
7        GO:0016481  negative regulation of transcription            5.08E-05  7
8        GO:0000910  cytokinesis                                     2.14E-06  7
8        GO:0000278  mitotic cell cycle                              1.22E-05  9

Table 2.3: Over-represented GO terms by the SplineCluster algorithm for the Y5 data set

Cluster  GO ID       GO term                                         p-value   Gene count
1        GO:0006268  DNA unwinding during replication                7.38E-05  3
1        GO:0006267  pre-replicative complex formation               9.54E-05  3
1        GO:0050790  regulation of catalytic activity                0.000178  4
2        GO:0006260  DNA replication                                 9.51E-08  10
2        GO:0006310  DNA recombination                               9.44E-07  9
2        GO:0006974  response to DNA damage stimulus                 9.14E-06  11
3        GO:0022402  cell cycle process                              1.63E-06  16
3        GO:0000278  mitotic cell cycle                              3.14E-05  11
3        GO:0000074  regulation of progression through cell cycle    3.55E-05  9
4        GO:0022616  DNA strand elongation                           1.59E-10  9
4        GO:0006273  lagging strand elongation                       5.73E-09  7
4        GO:0006261  DNA-dependent DNA replication                   1.35E-07  9
5        GO:0007165  signal transduction                             0.006354  4
5        GO:0007154  cell communication                              0.010349  4
5        GO:0030541  plasmid partitioning                            0.011825  1
6        GO:0009262  deoxyribonucleotide metabolic process           0.000251  2
6        GO:0006259  DNA metabolic process                           0.000476  7
6        GO:0006334  nucleosome assembly                             0.000587  2
7        GO:0007017  microtubule-based process                       9.30E-06  5
7        GO:0007020  microtubule nucleation                          4.25E-05  3
7        GO:0009225  nucleotide-sugar metabolic process              9.01E-05  2
8        GO:0007120  axial bud site selection                        1.14E-06  5
8        GO:0000819  sister chromatid segregation                    1.66E-05  6

In addition, good agreement was found between known biological functions and the gene clusters found by the proposed algorithm. Many clusters in the PMDE partition are significantly enriched with distinctive cell cycle relevant functions, indicating a good separation of functional clusters. For example, cluster 5 has an over-representation of DNA strand elongation (p < 10^{-8}), and cluster 6 is enriched with microtubule nucleation and chromosome segregation (p < 10^{-7}), which are crucial to chromosome division. Consistent with their biological functions, two clusters involving genes expressed in M and earlier phases reveal patterns of slightly different peak time: cluster 3 contains an over-representation of genes involved in DNA unwinding during replication (p < 10^{-8}) and DNA geometric change (p < 10^{-7}), and cluster 8 is enriched with cytokinesis, which is known to occur after replication and segregation of cellular components. The two gene clusters are both biologically meaningful and statistically sound.
