Initial studies of relatedness focused on measuring similarity between words. This led to the use of taxonomies and conceptual networks, where the length of the path between two terms served as an initial model of similarity [Rada et al., 1989]. Two terms are considered semantically similar if a short path (i.e. one with few intermediate concepts) exists between them. As described earlier, similarity is the inverse of semantic distance, which is obtained by path exploration: a pair of semantically similar words (i.e. one with a high score) is separated by a short path. This path is formed by subsumption relations, such as hyponymy (e.g. big cat-lion) and hypernymy (e.g. lion-animal). When two terms refer to the same concept (i.e. synonymy), the path length is 0. However, the measure of Rada et al. [1989] used a domain-specific taxonomy, as there was no general taxonomy to which the technique could be applied. The construction of WordNet [Fellbaum, 1998] overcame this by providing a general taxonomy of concepts. Owing to its characteristics, WordNet became the standard general taxonomy and a benchmark for measuring semantic similarity and, in some cases, for extending this to semantic relatedness.
WordNet-based measures can be subdivided according to the features employed for measuring semantic similarity. The first subset comprises measures that use the hierarchical paths provided by WordNet, for instance hypernyms and hyponyms. Also in this subset, the measure proposed by Hirst and St-Onge [1998] takes into consideration not only WordNet’s taxonomy, but also the lexical relations available in WordNet. The second subset contains measures that, in addition to paths, are modelled on a notion taken from information theory, called information content, which is derived from the probability of finding a certain term within a set of documents or corpus. We describe these in some detail below.
WordNet taxonomy-based measures. This group contains the measures of Wu and Palmer [1994] (wup), Leacock et al. [1998] (lch), Lesk [1986] and its adaptation to WordNet [Banerjee and Pedersen, 2002] (lesk) and Hirst and St-Onge [1998] (hso). The first three measures account for semantic similarity, while the last one is used to calculate semantic relatedness between words.
The first two measures employ a concept known as the least common subsumer (lcs), which is the most specific concept that subsumes both concepts of interest, i.e. their first shared ancestor in the taxonomy. For instance, the concepts lion and tiger (as animals) are both direct children of big cat, which is thus their least common subsumer.
Wu and Palmer (wup). Wu and Palmer [1994] introduced a measure of conceptual similarity that takes into consideration the depth of the hierarchy involved and scales it to the elements involved, including the lcs. The formula employed is shown in Equation 2.1, where path(x, y) represents the length of the shortest hierarchical path between concepts x and y and depth(lcs(x, y)) represents the depth of the lcs of the pair x, y in the taxonomy.
\[
\mathrm{sim}_{wup}(c_1, c_2) = \frac{2 \times \mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{path}(c_1, \mathrm{lcs}(c_1, c_2)) + \mathrm{path}(c_2, \mathrm{lcs}(c_1, c_2)) + 2 \times \mathrm{depth}(\mathrm{lcs}(c_1, c_2))} \tag{2.1}
\]

Leacock and Chodorow (lch). This measure [Leacock et al., 1998] normalises the path length by the maximum depth of the taxonomy, in this case WordNet. Like wup, it measures concept similarity, as given by Equation 2.2, where the maximum is taken over every concept c in WordNet and thus represents the maximum depth of the taxonomy. In WordNet, this maximum depth is approximately 18 for nouns.
\[
\mathrm{sim}_{lch}(c_1, c_2) = -\log \frac{\mathrm{path}(c_1, c_2)}{2 \times \max_{c \in \mathrm{WordNet}} \mathrm{depth}(c)} \tag{2.2}
\]
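Equations 2.1 and 2.2 can be illustrated with a minimal sketch on a toy taxonomy. The taxonomy below is hypothetical (it is not extracted from WordNet), depth is counted in nodes so that the root has depth 1, and the lch path length is counted in nodes (edges + 1) so that identical concepts do not produce log(0). These conventions are assumptions made for the example, not details stated in the text.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "animal": None,
    "feline": "animal",
    "big cat": "feline",
    "lion": "big cat",
    "tiger": "big cat",
    "canine": "animal",
    "dog": "canine",
}

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def depth(c):
    """Depth counted in nodes, so the root has depth 1 (assumed convention)."""
    return len(ancestors(c))

def lcs(c1, c2):
    """Least common subsumer: the deepest concept that subsumes both inputs."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):  # walks upwards, so the first hit is the deepest
        if a in seen:
            return a
    raise ValueError("no common subsumer")

def path(c1, c2):
    """Number of edges on the shortest hierarchical path, going through the lcs."""
    s = lcs(c1, c2)
    return (depth(c1) - depth(s)) + (depth(c2) - depth(s))

def sim_wup(c1, c2):
    """Equation 2.1."""
    s = lcs(c1, c2)
    d = 2 * depth(s)
    return d / (path(c1, s) + path(c2, s) + d)

def sim_lch(c1, c2):
    """Equation 2.2, with the path counted in nodes (assumed convention)."""
    max_depth = max(depth(c) for c in PARENT)
    return -math.log((path(c1, c2) + 1) / (2 * max_depth))

print(lcs("lion", "tiger"))      # big cat
print(sim_wup("lion", "tiger"))  # 0.75
print(sim_wup("lion", "dog") < sim_wup("lion", "tiger"))
print(sim_lch("lion", "dog") < sim_lch("lion", "tiger"))
```

In this toy setting, lion and tiger share a deep lcs (big cat) and so score higher than lion and dog, whose only common subsumer is the root.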
Adapted Lesk algorithm (lesk). One of the early approaches to word sense disambiguation was proposed by Lesk [1986]. Rather than focusing on taxonomies, Lesk’s measure employs the definitions accompanying a concept (i.e. glosses in WordNet). Under this approach, the best sense of a word is the one whose definition shares the most words with the definition of the concept it is compared against [Lesk, 1986]. While the original algorithm extracted definitions from conventional dictionaries, e.g. the Oxford Advanced Learner’s Dictionary, Banerjee and Pedersen [2002] adapted the algorithm to use WordNet glosses. This adapted algorithm is shown in Equation 2.3, where G is the set of overlaps between the glosses of c1 and c2, og(c1, c2) is one such overlap (a sequence of words shared by both glosses) and length gives its size in words. See Table 2.1 for an example using the glosses of lion and tiger (both as animals). Banerjee and Pedersen [2002] considered not only direct gloss-to-gloss comparisons between synsets, but also the glosses of synsets related via the taxonomy or other lexical relations, such as holonymy and meronymy.

\[
\mathrm{sim}_{lesk}(c_1, c_2) = \sum_{g \in G} \mathrm{length}(o_g(c_1, c_2))^2 \tag{2.3}
\]

| Word form | Gloss |
| --- | --- |
| lion | large gregarious predatory feline of Africa and India having a tawny coat with a shaggy mane in the male |
| tiger | large feline of forests in most of Asia having a tawny coat with black stripes; endangered |
| Overlaps (lemmatised) | 4 (large, feline of, have a tawny coat with, in) |
| simlesk(lion, tiger) | 1 + 4 + 25 + 1 = 31 |

Table 2.1: An example of the calculations of gloss overlap for the lesk measure, using the glosses for lion and tiger (as animals).
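The gloss-overlap score of Equation 2.3 can be sketched as follows. This is a simplified illustration, not the implementation of Banerjee and Pedersen [2002]: it assumes the glosses have already been tokenised and lemmatised by hand, compares only the two glosses themselves (no related synsets), applies no stop-word filtering, and finds overlaps greedily, longest first. The function names are hypothetical.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest run of
    consecutive tokens shared by a and b. None entries never match."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best

def sim_lesk(gloss1, gloss2):
    """Sum of squared overlap lengths (Equation 2.3) for two token lists."""
    a, b = list(gloss1), list(gloss2)
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        # Blank out the matched run so that it cannot be counted again.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n

# Glosses from Table 2.1, lower-cased and lemmatised by hand (an assumption).
lion = ("large gregarious predatory feline of africa and india "
        "have a tawny coat with a shaggy mane in the male").split()
tiger = ("large feline of forest in most of asia "
         "have a tawny coat with black stripe endangered").split()

print(sim_lesk(lion, tiger))  # 31, i.e. 25 + 4 + 1 + 1
```

Squaring the overlap length rewards long shared phrases: the five-word overlap contributes 25, far more than five separate one-word overlaps would.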
Hirst and St-Onge (hso). The measure proposed by Hirst and St-Onge [1998] is the only WordNet-based measure to account for relatedness instead of just similarity, as it distinguishes three levels of relation strength between two words and uses more than just hierarchical relations between synsets. Two concepts hold an extra-strong relation if they are synonyms; that is, they can be found within the same synset. Extra-strong relations are scored with a large value, generally double the value for strong relations. Otherwise, two concepts hold a strong relation if a horizontal link (i.e. a lexical relationship such as holonymy, meronymy or antonymy) connects them. Finally, two concepts hold a regular relation if an allowable path exists between them, scored using the formula described in Equation 2.4, where C and k are two constants that limit the length of the longest path between concepts. These constants keep the computation of paths tractable, as greater values of C imply the calculation of more possible paths between concepts. Also in the equation, turns(c1, c2) is a boolean value that indicates whether a turn (a change from traversing hypernyms to hyponyms or vice versa) occurs in the path.
Paths between concepts are considered available only if they have at most one turn and one horizontal link between them [Hirst and St-Onge, 1998]. Because this measure considers more than just hierarchical paths between concepts, it is considered to measure semantic relatedness [Budanitsky and Hirst, 2006].
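The scoring of a regular relation can be sketched as below. The sketch assumes the weight takes the form usually cited for this measure, C − path length − k × turns, with C = 8 and k = 1 as default values. Both the form and the constants are assumptions made for illustration, as is the choice to score non-allowable paths as 0. The function name is hypothetical.

```python
def regular_relation_weight(path_length, turns, horizontal_links, C=8, k=1):
    """Weight of a regular relation between two concepts.

    Assumed form: C - path_length - k * turns.
    A path is allowable only with at most one turn and at most one
    horizontal link; other paths are scored 0 here (an assumption).
    """
    if turns > 1 or horizontal_links > 1:
        return 0
    return max(0, C - path_length - k * turns)

# A short path without a change of direction scores highest.
print(regular_relation_weight(path_length=2, turns=0, horizontal_links=0))
# The same path with one turn is penalised by k.
print(regular_relation_weight(path_length=2, turns=1, horizontal_links=0))
# Two turns make the path non-allowable.
print(regular_relation_weight(path_length=3, turns=2, horizontal_links=0))
```

Under these assumptions, C acts as an upper bound on the length of any path that can still receive a positive score, which is why larger values of C require exploring more paths.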
One limitation of taxonomy-based measures is their heavy dependence on the completeness and correctness of the taxonomy employed to measure similarity [Sánchez et al., 2009]. Although WordNet features other relationships between synsets, such as holonymy and meronymy, these are either not considered by the measures described above or, as in the case of hso, are insufficient. For instance, the experimental setting of Budanitsky and Hirst [2006] showed that these measures correlate well with human judgements on the similarity dataset proposed by Rubenstein and Goodenough [1965] (which is described later in this section). However, other datasets (e.g. WordSim-353 or the datasets constructed later in this chapter) show that these measures are insufficient when tested on relatedness.
WordNet and Information Content-based measures. It has been suggested that the number of relationships available between two concepts cannot be measured, and that the types of relationship cannot be defined by a single, general frame such as WordNet [Budanitsky and Hirst, 2006; Grieser, 2011]. Moreover, hidden associations represent around 60% of the relationships that humans consider when determining semantic relatedness [Morris and Hirst, 2006]. This makes the explicit inclusion of other semantic relationships between concepts a very hard problem for ontological representation. However, some of these relationships can be found in plain text. This is connected to the syntagmatic perspective discussed above: the co-occurrence of words across several documents can be used to infer relatedness between the terms involved.
Word co-occurrence has also been used in measures of semantic similarity. Such measures combine the ontological structure of WordNet with the information content of two concepts, taking into consideration the probability of a term occurring in a controlled corpus, such as the Brown Corpus. They can therefore be considered hybrids, as they combine corpus information with taxonomical features extracted from WordNet. These measures, along with those using Wikipedia and the Web (described later), are commonly referred to as measures of distributional similarity, as they assign values using word distribution statistics [Agirre et al., 2009]. Examples of these measures include Resnik [1995], Jiang and Conrath [1997], and Lin [1998].
Resnik (res). Resnik [1995] proposed a measure based on the probability of finding the desired concepts in a document, extended to a whole corpus. Rather than using the two concepts directly, this measure considers the occurrence of their least common subsumer. This was motivated by the idea that two words are similar to “the extent to which they share information in common” [Resnik, 1995], and this commonality can be found via the least common subsumer between them. He proposed Equation 2.5, where p(lcs(c1, c2)) represents the probability of finding the least common subsumer in a document.
\[
\mathrm{sim}_{res}(c_1, c_2) = -\log p(\mathrm{lcs}(c_1, c_2)) \tag{2.5}
\]
Under this definition, every occurrence of a child concept, e.g. Lion, also counts as an occurrence of its parent, Big cat, thus p(Big cat) ≥ p(Lion). This probability is therefore calculated as
\[
p(c) = \frac{\sum_{w \in W(c)} \mathrm{count}(w)}{N} \tag{2.6}
\]
where W(c) is the set of terms (nouns) subsumed by c and N is the total number of terms (in particular, nouns) in the corpus.
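Equations 2.5 and 2.6 can be illustrated with a minimal sketch. The taxonomy and the noun counts below are hypothetical, each word is assumed to map to exactly one concept, and W(c) is taken to contain c itself together with everything below it.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts.
PARENT = {
    "animal": None, "feline": "animal", "big cat": "feline",
    "lion": "big cat", "tiger": "big cat",
    "canine": "animal", "dog": "canine",
}
COUNT = {"animal": 10, "feline": 2, "big cat": 1,
         "lion": 4, "tiger": 3, "canine": 2, "dog": 8}
N = sum(COUNT.values())  # total number of nouns in the toy corpus

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def lcs(c1, c2):
    """Least common subsumer: the deepest concept that subsumes both inputs."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("no common subsumer")

def p(c):
    """Equation 2.6: occurrences of c and of every concept it subsumes, over N."""
    return sum(COUNT[w] for w in PARENT if c in ancestors(w)) / N

def sim_res(c1, c2):
    """Equation 2.5: information content of the least common subsumer."""
    return -math.log(p(lcs(c1, c2)))

print(p("big cat") >= p("lion"))  # a parent is at least as probable as its child
print(sim_res("lion", "tiger"))   # lcs is big cat, which is fairly informative
print(sim_res("lion", "dog"))     # lcs is the root, so p = 1 and the score is 0
```

Because the root subsumes every noun, its probability is 1 and it carries no information; pairs whose only common subsumer is the root therefore score 0.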
Jiang and Conrath (jcn). A noticeable disadvantage of the measure proposed by Resnik is that all pairs of terms with the same least common subsumer obtain the same score. For this reason, Jiang and Conrath [1997] reanalysed the calculation of probabilities so that the probabilities of the individual concepts are also taken into account. They proposed Equation 2.7 to determine similarity between words.
\[
\mathrm{sim}_{jcn}(c_1, c_2) = \frac{1}{2 \log p(\mathrm{lcs}(c_1, c_2)) - (\log p(c_1) + \log p(c_2))} \tag{2.7}
\]

Lin (lin). Lin [1998] defined a measure that could be applied regardless of the knowledge representation employed. He stated that the similarity between two terms is measured by the ratio between the amount of information required to state their commonality and the information required to describe them entirely [Lin, 1998]. He adapted this to WordNet’s taxonomy, as follows:
\[
\mathrm{sim}_{lin}(c_1, c_2) = \frac{2 \times \log p(\mathrm{lcs}(c_1, c_2))}{\log p(c_1) + \log p(c_2)} \tag{2.8}
\]

In a comparison made by Budanitsky and Hirst [2001; 2006], measures that combined hierarchical and corpus-based properties, such as those described above, outperformed measures that only considered WordNet with respect to the datasets of Rubenstein and Goodenough [1965] and Miller and Charles [1991]. More recently, Cramer et al. [2012] compared semantic relatedness measures for the task of lexical chaining, including WordNet-based measures. The authors reported that measures limited by both coverage and taxonomy do not correlate as highly as distributional measures, such as the ones described in the following sections.
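The jcn and lin measures of Equations 2.7 and 2.8 can be compared on a small example. The sketch below uses a hypothetical toy taxonomy with made-up noun counts, and assumes each word maps to a single concept. The handling of zero denominators (infinite jcn similarity for identical concepts, lin similarity of 1 when both concepts are the root) is a convention chosen for the example, not one stated in the text.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts.
PARENT = {
    "animal": None, "feline": "animal", "big cat": "feline",
    "lion": "big cat", "tiger": "big cat",
    "canine": "animal", "dog": "canine",
}
COUNT = {"animal": 10, "feline": 2, "big cat": 1,
         "lion": 4, "tiger": 3, "canine": 2, "dog": 8}
N = sum(COUNT.values())

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def lcs(c1, c2):
    """Least common subsumer: the deepest concept that subsumes both inputs."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("no common subsumer")

def log_p(c):
    """log of p(c), counting c and every concept it subsumes."""
    total = sum(COUNT[w] for w in PARENT if c in ancestors(w))
    return math.log(total / N)

def sim_jcn(c1, c2):
    """Equation 2.7; identical concepts give a zero denominator (assumed inf)."""
    denom = 2 * log_p(lcs(c1, c2)) - (log_p(c1) + log_p(c2))
    return math.inf if denom == 0 else 1 / denom

def sim_lin(c1, c2):
    """Equation 2.8; the root compared with itself is scored 1 (assumed)."""
    denom = log_p(c1) + log_p(c2)
    return 1.0 if denom == 0 else 2 * log_p(lcs(c1, c2)) / denom

for pair in [("lion", "tiger"), ("lion", "dog"), ("lion", "lion")]:
    print(pair, sim_jcn(*pair), sim_lin(*pair))
```

Unlike the res score, both measures depend on the probabilities of the two concepts themselves, so two pairs with the same least common subsumer can receive different scores. The lin score is also bounded, reaching 1 for identical concepts and 0 when the only common subsumer is the root.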