2.2 Similarity between GO terms

As their name implies, edge-based semantic similarity approaches quantify the similarity between two GO terms based on the edges in the graph path from one term to the other. They can be subdivided further into approaches that consider the distance between two terms and approaches that consider the path shared by two terms. Most edge-based semantic similarity approaches applied in the GO use either the terms’ shared path to evaluate similarity or a combination of shared path and distance, except for the approach by Jakonienė et al. [2006], which uses only distances.

3 In the original paper, precision is defined as p. This was changed here to avoid confusion with the probability of occurrence p defined in Equation 2.2.

4 N was used in the original paper but replaced here in order to avoid confusion with N, the total number of terms in the corpus, in Equation 2.2.

Rada et al. [1989] proposed the first distance-based semantic similarity measure. This simplest form of edge-based similarity counts the edges of the path between two terms. If there is more than one path, the average of the paths is taken. This kind of approach is not appropriate for use in the GO as it assumes that all edges carry the same weight, i.e. represent the same difference in meaning. This is not the case in the GO where some edges connect terms that have a very similar meaning whereas others are far more loosely related. The relationship “Golgi apparatus” (GO:0005794) IS A “intracellular membrane-bounded organelle” (GO:0043231) is intuitively closer than the relationship “membrane” (GO:0016020) IS A “cell part” (GO:0044464), even though the two pairs of concepts are linked by the same type of edge.
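The edge-counting idea described above can be illustrated with a minimal sketch. The toy hierarchy, the term names and the function names are hypothetical, and the sketch assumes that "paths" means simple paths in the graph with edge direction ignored; it is not taken from Rada et al. [1989].

```python
from statistics import mean

# Toy IS A hierarchy (child -> parents); the term names are illustrative only.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
}


def undirected_neighbours(parents):
    """Build an undirected adjacency map from the child -> parents map."""
    nbrs = {term: set() for term in parents}
    for child, ps in parents.items():
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
    return nbrs


def path_lengths(parents, start, goal):
    """Return the edge counts of all simple paths between start and goal."""
    nbrs = undirected_neighbours(parents)
    lengths = []

    def walk(node, visited, edges):
        if node == goal:
            lengths.append(edges)
            return
        for nxt in nbrs[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, edges + 1)

    walk(start, {start}, 0)
    return lengths


def edge_count_distance(parents, t1, t2):
    """Average number of edges over all simple paths between t1 and t2."""
    return mean(path_lengths(parents, t1, t2))
```

In this toy graph, "c" and "d" are linked by one path of 2 edges (via "a") and one of 4 edges (via "root" and "b"), so the averaged distance is 3. Every edge contributes exactly 1, whatever the two terms it connects mean, which is the uniform-weight assumption criticised above.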

In addition, edge-counting approaches require an evenly distributed hierarchy, which is also not the case in the GO. As the GO evolves with current research trends, some areas are far deeper (longer paths from root to leaf nodes) than others even though leaf nodes with shorter paths to the root can be equally specific in their meaning. Maximum root to leaf distance varies from 2 to 15 edges in the BP ontology and from 2 to 12 edges in the CC and MF ontologies. For example, in the 2011-03 release of GO, the leaf term “nuclear outer membrane organization” (GO:0071764) has a maximum depth of 5, while the leaf term “nuclear inner membrane organization” (GO:0071765) has a maximum depth of 9.

The first edge-based approach used in the context of the GO was developed by Cheng et al. [2004]. They used the shared path, counting the edges, from the LCA of two terms to the root of the ontology. They addressed the issue of increasing specificity for deeper terms by weighting each edge with a weighting factor based on the edge’s depth, as well as addressing the varying levels of depth of different parts of the ontology by defining a normalising factor based on the local depth of the ontology. The similarity between two terms was then defined as the sum of the weighted edges of the longest path between their common ancestor and the root, multiplied by the normalisation factor.

Yu et al. [2005] used two edge-based measures, referred to as “taxonomy similarity”, in their work on gene function prediction, namely PK-TS proposed by Pekar and Staab [2002] and SB-TS proposed by the authors themselves and inspired by PK-TS. The former of these two measures was originally developed in a linguistics context and calculates the similarity between two terms c1 and c2 by dividing the shortest distance between their common ancestor c and the root by the sum of the distances between c1 and c, c2 and c, and c and the root, again using the shortest path in each case. In their interpretation of this approach, Yu et al. [2005] changed the distances used to the longest path. They also proposed their own approach, which does not take into account a common ancestor, but divides the distance of c1 to the root by the distance of c2 to the root, if c1 is above c2 in the hierarchy, or vice versa if c2 is above c1. If c1 and c2 are not part of the same branch of the ontology, their similarity is set to 0.
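The two taxonomy similarity measures can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the toy hierarchy and all function names are hypothetical, shortest paths are used throughout (Yu et al. [2005] used longest paths in their variant of PK-TS), the common ancestor c is taken to be the one closest to both terms, and the value returned when a denominator is zero is a choice made for this sketch only.

```python
from collections import deque

# Toy IS A hierarchy (child -> parents); the term names are illustrative only.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["c"],
    "e": ["b"],
}


def up_distances(parents, term):
    """Shortest edge count from term to itself and to each of its ancestors."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for p in parents[node]:
            if p not in dist:
                dist[p] = dist[node] + 1
                queue.append(p)
    return dist


def pk_ts(parents, c1, c2, root="root"):
    """PK-TS style similarity: d(c, root) / (d(c1, c) + d(c2, c) + d(c, root))."""
    d1, d2 = up_distances(parents, c1), up_distances(parents, c2)
    common = set(d1) & set(d2)
    # Assumption: c is the common ancestor with the smallest summed distance.
    c = min(common, key=lambda t: (d1[t] + d2[t], t))
    to_root = up_distances(parents, c)[root]
    denom = d1[c] + d2[c] + to_root
    # Assumption: the 0/0 case (both terms are the root) is scored as 1.0.
    return to_root / denom if denom else 1.0


def sb_ts(parents, c1, c2, root="root"):
    """SB-TS style similarity: depth of the upper term / depth of the lower term."""
    d1, d2 = up_distances(parents, c1), up_distances(parents, c2)
    if c1 in d2:        # c1 is c2 itself or one of its ancestors
        upper, lower = d1[root], d2[root]
    elif c2 in d1:      # c2 is an ancestor of c1
        upper, lower = d2[root], d1[root]
    else:               # the terms are not on the same branch
        return 0.0
    # Assumption: the 0/0 case (both terms are the root) is scored as 1.0.
    return upper / lower if lower else 1.0
```

In the toy hierarchy, pk_ts(PARENTS, "c", "d") gives 2/3, because the common ancestor "c" lies 2 edges below the root and 1 edge above "d". sb_ts(PARENTS, "a", "d") gives 1/3, and any pair on different branches, such as "d" and "e", scores 0 under SB-TS.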

“Relative Specificity Similarity” (RSS) is a multi-component semantic similarity approach by Wu et al. [2006], considering the distance between the common ancestor of two terms and the root, the distances between the terms and their leaf node descendants and the distance between the terms and their common ancestor. The RSS approach could ostensibly be classed as hybrid, rather than an edge-based approach, as it claims to incorporate the node-based approach by Wu et al. [2005], mentioned in Section 2.2.1. However, where the original node-based approach considers the maximum number of terms between the LCA of two terms and the root, the LCA to root distance component of RSS, called α, subtracts 1 from the number of terms, which equates to counting the maximum number of edges between the LCA and the root. RSS has two further components, β and γ. β represents the largest shortest path (counting edges) between term c1 and all its descendant leaf nodes and term c2 and all its descendant leaf nodes. γ is the sum of the distances of each query term to the LCA, which is effectively the shortest distance between the two terms c1 and c2. The three components are then combined into the RSS formula, which also includes the maximum distance from the ontology root to the deepest leaf node.
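The three components can be computed on a toy hierarchy as sketched below. The hierarchy and function names are hypothetical. The sketch assumes that the LCA is the common ancestor with the longest path to the root and that β is the larger of the two terms' shortest distances to a descendant leaf. The final combination into the RSS score is deliberately left out, since the formula is not reproduced in this section.

```python
from functools import lru_cache

# Toy IS A hierarchy (child -> parents); the term names are illustrative only.
PARENTS = {
    "root": (),
    "a": ("root",),
    "b": ("root",),
    "c": ("a",),
    "d": ("a", "b"),
    "e": ("c",),
    "f": ("e",),
}
# Inverted map (parent -> children), needed for distances down to leaf nodes.
CHILDREN = {t: tuple(c for c, ps in PARENTS.items() if t in ps) for t in PARENTS}


@lru_cache(maxsize=None)
def longest_to_root(term):
    """Maximum number of edges between term and the root."""
    ps = PARENTS[term]
    return 0 if not ps else 1 + max(longest_to_root(p) for p in ps)


@lru_cache(maxsize=None)
def shortest_to_leaf(term):
    """Minimum number of edges between term and any descendant leaf."""
    cs = CHILDREN[term]
    return 0 if not cs else 1 + min(shortest_to_leaf(c) for c in cs)


def up_distances(term):
    """Shortest edge count from term to itself and to each of its ancestors."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        nxt = []
        for node in frontier:
            for p in PARENTS[node]:
                if p not in dist:
                    dist[p] = dist[node] + 1
                    nxt.append(p)
        frontier = nxt
    return dist


def rss_components(c1, c2):
    """Return (alpha, beta, gamma) for the two query terms."""
    d1, d2 = up_distances(c1), up_distances(c2)
    common = set(d1) & set(d2)
    # Assumption: the LCA is the common ancestor furthest from the root.
    lca = max(common, key=lambda t: (longest_to_root(t), t))
    alpha = longest_to_root(lca)                              # LCA to root
    beta = max(shortest_to_leaf(c1), shortest_to_leaf(c2))    # terms to leaves
    gamma = d1[lca] + d2[lca]                                 # terms to LCA
    return alpha, beta, gamma
```

For the terms "c" and "d" in this toy hierarchy, the LCA is "a", giving α = 1, β = 2 (from "c" down to the leaf "f") and γ = 2.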

Jakonienė et al. [2006] proposed a measure based on the number of edges between two terms. They defined three types of paths: u, the number of “IS A” edges needed to go up in the hierarchy, d, the number of “IS A” edges needed to go down, and o, the number of edges of other types. Each of the three path lengths is weighted by dividing it by its respective weighting factor p_pathtype. The three weighted values are then combined in an exponential function.

For their CDGMiner tool, Yuan and Zhou [2008] defined a semantic similarity measure called go2go, which defines the semantic similarity between two GO terms as the multiplicative inverse of 1 plus the shortest path between the two terms. In their first paper, the authors do not specify whether the distance between two terms is obtained by counting nodes or edges. Yuan et al. [2010] then add that the distance between two directly connected terms is 1, suggesting that edges rather than nodes are counted. This assumption is also supported by the fact that 1 is added to the shortest path, as a path of n edges connects n + 1 nodes. While the purpose of both papers is the identification of disease genes from functional information and the calculation of semantic similarity between GO terms is the same on both occasions, the remainder of the overall approaches detailed in the two papers present a number of differences and result in the implementation of two distinct tools, rather than the second paper presenting an optimisation of the first approach.
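The go2go definition given above, 1 / (1 + shortest path), can be sketched as follows. The edge list and function names are hypothetical. The sketch assumes that edges are counted, that edge direction is ignored when searching for the shortest path, and that unconnected terms score 0; the last two points are not stated in the text above.

```python
from collections import deque

# Toy ontology as (child, parent) edges; the term names are illustrative only.
EDGES = [("a", "root"), ("b", "root"), ("c", "a"), ("d", "b")]


def shortest_path_edges(edges, start, goal):
    """Breadth-first search for the minimum number of edges between two terms."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in nbrs.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # no path between the two terms


def go2go(edges, t1, t2):
    """Similarity as the multiplicative inverse of 1 plus the shortest path."""
    d = shortest_path_edges(edges, t1, t2)
    # Assumption: terms with no connecting path are given a similarity of 0.
    return 0.0 if d is None else 1.0 / (1 + d)
```

Under these assumptions, a term compared with itself scores 1.0, two directly connected terms score 0.5, and "c" and "d" in the toy graph, which are 4 edges apart, score 0.2.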