\[
\mathit{funSim} = \frac{1}{3} \cdot \left[ \left( \frac{sim_{BP}}{\max(sim_{BP})} \right)^{2} + \left( \frac{sim_{MF}}{\max(sim_{MF})} \right)^{2} + \left( \frac{sim_{CC}}{\max(sim_{CC})} \right)^{2} \right] \tag{2.17}
\]

where max(sim) is the maximum possible similarity for an aspect. The advantage of using funSim is that by squaring the contribution made by each GO aspect, this contribution is made even stronger for high scores and even weaker for low scores.
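As a rough illustration of Equation 2.17, the following minimal Python sketch computes the score from three per-aspect similarities. The function name `fun_sim`, its parameter names and the default maxima of 1.0 are assumptions made for this example, as is reading the prefactor as 1/3; none of them is taken from a published implementation.

```python
def fun_sim(sim_bp, sim_mf, sim_cc, max_bp=1.0, max_mf=1.0, max_cc=1.0):
    """Sketch of Eq. 2.17: mean of the squared, max-normalised aspect scores.

    The maxima default to 1.0, which is an assumption that only holds for
    similarity measures bounded by 1 (hypothetical defaults).
    """
    terms = [(sim_bp, max_bp), (sim_mf, max_mf), (sim_cc, max_cc)]
    return sum((score / maximum) ** 2 for score, maximum in terms) / 3.0


# Squaring keeps high scores comparatively high and pushes low scores
# towards zero: the gap between the two results is wider than 0.9 vs 0.1.
high = fun_sim(0.9, 0.9, 0.9)
low = fun_sim(0.1, 0.1, 0.1)
```

With these toy inputs the high-scoring pair stays close to its normalised score, while the low-scoring pair shrinks by an order of magnitude, which is the effect described above.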

2.4 Evaluating semantic and functional similarity

In this thesis, a distinction is made between works that evaluate semantic and functional similarity approaches and works that make use of these approaches to answer a biological question. The former will be discussed in this section, while the latter will be covered in Section 2.5. In particular, not all semantic and functional similarity approaches described in this chapter were rigorously evaluated against other forms of biological similarity, so not all of them will be covered here. Only the examples of evaluation most relevant to the work presented in this thesis will be discussed.

One key issue with the evaluation of semantic similarity approaches is that there is no benchmark for measuring functional similarity. There are a number of other kinds of biological similarity against which functional similarity can be evaluated, although each has both advantages and disadvantages. One of the most commonly used is sequence similarity. Indeed, it has been demonstrated that in many cases sequence similarity also implies functional similarity. There are, however, a non-negligible number of examples of convergent evolution, where functional homologues (i.e. gene products with the same or highly similar function) have little or no sequence similarity, as well as examples of divergent evolution, where sequence homologues (i.e. gene products with high sequence similarity, e.g. from gene duplications) have little or no functional similarity.

Another popular evaluation approach is to use gene expression similarity as a benchmark for functional similarity. As with sequence similarity, similarity in gene expression implies functional similarity in many cases, but equally there are situations where functionally similar gene products have expression profiles that bear no resemblance to each other.

The same problem applies to almost all approaches for evaluating functional similarity, with the possible exception of human judgement. This approach, however, has the double disadvantage of lacking objectivity, as it requires expert knowledge, and of being limited to small datasets, as manually evaluating hundreds or even thousands of gene product pairs is not practical. All other approaches applied to date reflect functional similarity up to a point, but all generate a significant number of false positives and false negatives. This is why there is a need for functional similarity in the first place. If another form of biological similarity perfectly matched functional similarity, the latter would not be of interest.

A further difficulty in functional similarity evaluation is that it is very hard to make comparisons across studies. Information content-based measures in particular are dependent on the corpus used to calculate the results. While non-IC measures do not suffer from this drawback, they are still susceptible to changes in the ontology, so two studies using exactly the same dataset and parameters may differ in their results if based on two different GO releases. For this reason, it is essential to state exact details of all parameters used, including GO release, included or excluded evidence codes, dataset, measures and any other study-specific variables.
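The corpus dependence of information content can be made concrete with a small sketch. It uses the standard definition IC(c) = -log p(c), where p(c) is the fraction of corpus annotations that fall on term c or any of its descendants. The toy ontology, the term names and both corpora below are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy ontology: each term maps to its parents ("is a" edges only).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(corpus):
    """Compute IC(c) = -log p(c) for every term that occurs in the corpus.

    corpus is a list of terms, one entry per direct annotation. Each
    annotation is also counted for all ancestors of the annotated term.
    """
    counts = {term: 0 for term in PARENTS}
    for term in corpus:
        for ancestor in ancestors(term):
            counts[ancestor] += 1
    total = counts["root"]
    return {t: -math.log(n / total) for t, n in counts.items() if n > 0}


# Two made-up corpora annotated with the same ontology.
corpus_a = ["dna_binding"] * 8 + ["catalysis"] * 2
corpus_b = ["dna_binding"] * 1 + ["catalysis"] * 9
ic_a = information_content(corpus_a)
ic_b = information_content(corpus_b)
```

The same term, `dna_binding`, is common and therefore uninformative in the first corpus but rare and highly informative in the second, so any IC-based similarity built on it would differ between the two.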

In their assessment, Lord et al. [2003a] validated the semantic similarity measures of Resnik, Lin and Jiang, with AVG functional similarity, against sequence similarity scores. They found a very significant degree of correlation between the two types of similarity, particularly for the molecular function aspect. They used the SWISS-PROT-Human database for the estimation of concept frequencies. GO’s three aspects were considered individually and all ontological edges between concepts were treated as “IS A” links. The authors found that none of the three methods significantly outperformed the others. There were differences in individual performances: for the molecular function (MF) aspect, for example, the Resnik approach obtained the highest correlation with sequence similarity but performed worst for the other two ontological aspects, while the Jiang approach scored the lowest correlation for MF. It was also found that the different aspects of GO are largely independent of each other.

Other evaluation studies using sequence similarity as a benchmark include Pesquita et al. [2008] and Mistry and Pavlidis [2008]. Both included the original measures studied by Lord, but Pesquita et al. added MAX, BMA and GraSM with BMA functional similarity, as well as the graph-based approaches simGIC and simUI, while Mistry and Pavlidis also added MAX functional similarity, and kappa, cosine, weighted cosine, Term Overlap (TO) and NTO similarity measures. Corpora were UniProt [The UniProt Consortium, 2008] and NCBI Gene [Pruitt et al., 2006] for mouse genes, respectively. Pesquita’s study found that simGIC performed best overall, with Resnik’s measure the best of the IC-based measures. Mistry’s work found the best correlation between sequence and functional similarity for TO and Resnik with MAX. The study also included a comparison of different measures against each other, in which TO and Resnik/MAX also had the highest correlation with each other.
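The AVG, MAX and BMA strategies named in these studies differ only in how they collapse the matrix of term-to-term similarities between two gene products into one score. The sketch below shows one common formulation of each. The function names and the toy matrix are assumptions made for this example, and BMA in particular has several variants in the literature.

```python
def sim_avg(matrix):
    """AVG: mean over all term pairs."""
    values = [v for row in matrix for v in row]
    return sum(values) / len(values)


def sim_max(matrix):
    """MAX: the single best-scoring term pair."""
    return max(v for row in matrix for v in row)


def sim_bma(matrix):
    """BMA (best-match average), one common variant: average the best match
    of each row and of each column, then take the mean of the two averages."""
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2


# Hypothetical term-to-term similarities: rows are the terms of gene product A,
# columns are the terms of gene product B.
toy = [
    [1.0, 0.2],
    [0.2, 0.4],
]
scores = {"AVG": sim_avg(toy), "MAX": sim_max(toy), "BMA": sim_bma(toy)}
```

On this matrix AVG is pulled down by the two unrelated term pairs, MAX ignores everything except the single perfect match, and BMA falls between the two.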

Wang et al. [2004] investigated the relationship between semantic similarity and gene expression using the same approaches as Lord et al. The data for this work was derived from microarray experiments, with the Saccharomyces Genome Database [Cherry et al., 1998] used as the corpus to estimate term frequencies. Functional similarity values were averaged across five expression correlation intervals. Significant correlation was found between GO-based similarity and gene expression for all three approaches and for all three aspects of the GO, but as for Lord, none of the approaches outperformed the others.

Similar experiments, using the same semantic similarity approaches but MAX functional similarity and also considering correlation with gene expression, were carried out by Sevilla et al. [2005]. This group used data from mouse gene expression experiments. Unlike any of the previous groups, they concluded that the Resnik approach significantly outperformed both Lin and Jiang. This interpretation was given with the justification that the latter two are relative measures, which may give misleading results if the gene product annotations are too general and do not exploit the full depth of knowledge available in the GO.

In both of these, as well as any subsequent gene expression-based studies, expression correlation values were averaged across intervals of semantic similarity. Only Sevilla et al. [2005] commented on the pair-by-pair results, which were found to show poor correlation.
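Why interval averages and pair-by-pair results can disagree is easy to reproduce on synthetic data. In the sketch below, every value is made up for illustration and none comes from the cited studies: expression correlation follows semantic similarity only weakly, with a large alternating disturbance on top. The helper names `pearson` and `interval_means` are likewise hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def interval_means(pairs, n_bins):
    """Group (semantic_sim, expr_corr) pairs into equal-width similarity
    intervals and return the mean of both values for each non-empty interval.
    Assumes semantic similarity lies in [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for sim, expr in pairs:
        index = min(int(sim * n_bins), n_bins - 1)
        bins[index].append((sim, expr))
    means = []
    for members in bins:
        if members:
            mean_sim = sum(s for s, _ in members) / len(members)
            mean_expr = sum(e for _, e in members) / len(members)
            means.append((mean_sim, mean_expr))
    return means


# Synthetic pairs: a weak linear trend plus a large alternating disturbance.
pairs = [
    (i / 99, 0.2 * (i / 99) + (0.5 if i % 2 == 0 else -0.5))
    for i in range(100)
]
pairwise_r = pearson(*zip(*pairs))
binned_r = pearson(*zip(*interval_means(pairs, 5)))
```

Averaging within each interval cancels the disturbance, so `binned_r` comes out close to 1 while `pairwise_r` stays low. A strong correlation between interval averages therefore says little about how well individual gene product pairs agree.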

Couto et al. [2005] investigated the relationship between semantic similarity and protein families, using UniProt as corpus and the same semantic similarity measures as Lord, but BMA functional similarity, as well as adding their own GraSM ancestor choice. They found a good degree of correlation, with Jiang’s method performing strongest in their measurements, and Lin’s approach mostly outperforming Resnik. They also found that GraSM outperformed the single ancestor approach.

Schlicker et al. [2006] included both sequence similarity and family similarity in their evaluation and concluded that their simRel measure outperforms Resnik and Lin.

Other forms of similarity used to date include protein-protein interactions and clustering with human judgement. The latter was used by Wang et al. [2007] to validate their hybrid semantic similarity approach. They used a set of manually curated pathways from the SGD and based on each pathway, stipulated which protein