In its simplest form, a compatibility function $\mathrm{comp}(\cdot, \cdot, \cdot)$ of an antecedent candidate noun $a_i$ and the verb $v_j$ governing a pronoun in the grammatical slot $gf_k$ is their first-order co-occurrence count:

$$\mathrm{comp}(v_j, a_i, gf_k) = |v_j, a_i, gf_k| \tag{6.2}$$
Selecting the most suitable antecedent from a set of candidates simply involves determining the antecedent candidate $a_i$ with the highest first-order co-occurrence count with the verb governing the pronoun among all candidates $a$:

$$\mathit{ante} = \arg\max_{a_i \in a} \mathrm{comp}(v_j, a_i, gf_k) \tag{6.3}$$
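The count-based selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the names `counts`, `comp_count` and `select_antecedent` are hypothetical, and the counts for Fuchs and Stimme are invented toy values.

```python
from collections import Counter

# Toy first-order co-occurrence counts keyed by (verb, noun, grammatical slot).
# The values for Fuchs and Stimme are invented for illustration only.
counts = Counter({
    ("bellen", "Hund", "subj"): 261,
    ("bellen", "Fuchs", "subj"): 5,
    ("bellen", "Stimme", "subj"): 7,
})


def comp_count(verb, noun, slot, counts=counts):
    """Raw first-order co-occurrence count; 0 for unseen triples."""
    return counts[(verb, noun, slot)]


def select_antecedent(verb, slot, candidates, counts=counts):
    """Return the candidate with the highest count for the governing verb."""
    return max(candidates, key=lambda noun: comp_count(verb, noun, slot, counts))


# The pronoun is the subject of "bellen"; pick among three candidate nouns.
best = select_antecedent("bellen", "subj", ["Etage", "Fuchs", "Hund"])
```

A `Counter` is used here so that unseen triples silently count as zero, which matches the treatment of unseen verb-argument combinations in the text.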
Chapter 6. Semantics for pronoun resolution 139
However, the raw count does not measure the association between the verb and its argument with respect to the overall occurrence distribution of each. That is, we might see a verb-argument tuple with a high count and conclude that the two are strongly associated, although the high count may simply stem from the high overall frequency of both argument and verb. Thus, the raw count of a verb-argument tuple needs to be normalized in relation to the individual, overall occurrence counts of verb and argument.
There are association measures that perform such (point-wise) normalizations, e.g. the Dice coefficient, TF-IDF, the Jaccard coefficient, and Pointwise Mutual Information (PMI), among others.⁵ In the realm of vector space models, the Positive Pointwise Mutual Information measure (PPMI) is a popular choice (Turney et al., 2010). The PMI of two words x and y is defined as the logarithm of their joint probability divided by the product of their marginal probabilities. It thus takes into account both the co-occurrences of the two words and their individual occurrences.
$$\mathrm{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)} \tag{6.4}$$
PPMI replaces all negative values in a word co-occurrence matrix with zeros and has been shown to outperform PMI and other association measures on several tasks (Bullinaria and Levy, 2007).
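As a small sketch of how PMI and PPMI behave, the following Python computes both from raw counts. The function names, the toy counts, and the use of the natural logarithm are assumptions made for this example; the text does not fix the base of the logarithm.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI from raw counts: log of p(x, y) / (p(x) * p(y)).

    Assumes count_xy > 0; PMI is not defined for unseen pairs.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))


def ppmi(count_xy, count_x, count_y, total):
    """Positive PMI: negative PMI values are replaced with zero."""
    return max(0.0, pmi(count_xy, count_x, count_y, total))


# A pair that co-occurs more often than chance predicts has positive PMI ...
strong = pmi(count_xy=50, count_x=100, count_y=100, total=10_000)
# ... while a seen but loosely associated pair has negative PMI,
# which PPMI clips to zero.
weak = pmi(count_xy=1, count_x=1_000, count_y=1_000, total=10_000)
weak_clipped = ppmi(count_xy=1, count_x=1_000, count_y=1_000, total=10_000)
```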
The main issue for our purposes is that PMI can take negative values for seen but loosely associated word pairs. One option would be to use PPMI and replace all negative values with zero. However, we want to reward seen verb-argument pairs that have a negative PMI value over unseen pairs. Unseen pairs are not represented by PMI, so we would have to assign a dummy value, i.e. zero, to these verb-argument combinations. Zeroing out negative values and also assigning zero to unseen pairs would obscure whether a zero-valued pair had a negative PMI or was not seen at all. Thus, we would not be able to favor an antecedent candidate with a negative PMI over a candidate that has never been seen as an argument of the verb governing the pronoun. Therefore, we aim for a compatibility measure with a range of 0 < comp(·, ·, ·) ≤ 1 for all seen pairs and zero for unseen pairs.
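The ambiguity described above can be made concrete with a short Python sketch. The helper `ppmi_or_zero`, its dummy value of zero for unseen pairs, the toy counts, and the natural logarithm are all assumptions made for illustration.

```python
import math


def ppmi_or_zero(count_xy, count_x, count_y, total):
    """PPMI with a dummy value of zero for unseen pairs (hypothetical helper)."""
    if count_xy == 0:
        return 0.0  # unseen pair: PMI is not defined, so assign the dummy value
    value = math.log((count_xy * total) / (count_x * count_y))
    return max(0.0, value)  # seen pair: clip negative PMI to zero


# A seen but loosely associated pair and a never-seen pair (invented counts).
seen_but_weak = ppmi_or_zero(1, 1_000, 1_000, 10_000)
never_seen = ppmi_or_zero(0, 1_000, 1_000, 10_000)

# Both pairs receive the same score, so a ranking cannot prefer the seen one.
indistinguishable = seen_but_weak == never_seen
```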
All the association measures mentioned above share the characteristic that they normalize the first-order co-occurrence count of the word pair by the individual occurrence counts of the words. We aim for a simple association measure and a simple scaling procedure along this line. To do so, we model selectional preferences in a bidirectional manner. We model how strong the verb’s selection of the argument is and how strong
the argument’s selection of the verb is given a specific grammatical relation. Note that these two associations are asymmetrical, since a verb and an argument typically have a different overall count of first-order co-occurrences.
We formulate this bidirectional association in relation to conditional probabilities, i.e. the likelihood of seeing the argument $a_i$ given the verb $v_j$ and the grammatical slot $gf_k$, and the likelihood of seeing the verb given the argument and the grammatical slot, but do not actually model a probability distribution. To derive a compatibility score, we simply take the arithmetic mean of the bidirectional scores:
$$\mathrm{comp}(a_i, v_j, gf_k) = \frac{1}{2} \cdot \bigl( \mathrm{score}(a_i \mid v_j, gf_k) + \mathrm{score}(v_j \mid a_i, gf_k) \bigr) \tag{6.5}$$
where we calculate the scores by:
$$\mathrm{score}(a_i \mid v_j, gf_k) = \frac{|(a_i, v_j, gf_k)|}{|\max(a, v_j, gf_k)|} \qquad \mathrm{score}(v_j \mid a_i, gf_k) = \frac{|(a_i, v_j, gf_k)|}{|\max(a_i, v, gf_k)|} \tag{6.6}$$
That is, we normalize the first-order co-occurrence count $|(a_i, v_j, gf_k)|$ by dividing it by the highest count, e.g. $|\max(a, v_j, gf_k)|$, instead of by the sum of all counts (which would yield a probability distribution). The same replacement is common in, e.g., TF-IDF calculation, where the in-document term frequency of each word is divided by the count of the most frequent word instead of the overall word count. Doing so decreases the denominator and thus yields higher scores overall.
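The difference between dividing by the sum and dividing by the maximum can be sketched as follows. The function names and the toy counts for Stimme and Fuchs are assumptions made for this illustration.

```python
def normalize_by_sum(counts):
    """Divide each count by the total; the results sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def normalize_by_max(counts):
    """Divide each count by the highest count; the top item scores exactly 1."""
    highest = max(counts.values())
    return {key: n / highest for key, n in counts.items()}


# Toy counts of subjects seen with a single verb (partly invented values).
subject_counts = {"Hund": 261, "Stimme": 7, "Fuchs": 5}

by_sum = normalize_by_sum(subject_counts)
by_max = normalize_by_max(subject_counts)
```

Because the maximum can never exceed the sum, every max-normalized score is at least as high as its sum-normalized counterpart, and the scores no longer form a probability distribution.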
Consider the following example. We want to score the triple (Hund, bellen, subj), i.e. the compatibility between Hund (dog) and bellen (bark) given the grammatical relation subject. Figure 6.1 shows an excerpt of the co-occurrence graph that depicts first-order co-occurrence for Hund and bellen given the grammatical relation subject. The nodes in the graph denote words, i.e. nouns and verbs, and edges signify grammatical relations of seen co-occurrences and their counts.
Figure 6.1: Excerpt of the co-occurrence graph, showing first-order co-occurrence subject relations for the noun Hund and the verb bellen. Numbers on edges denote absolute counts.

The first-order co-occurrence count is $|(\text{Hund}, \text{bellen}, \text{subj})| = 261$. The maximal count of Hund as a subject is $|\max(\text{Hund}, v, \text{subj})| = 440$ (for $v$ = kommen (to come)), and the count of the most frequent subject of bellen is $|\max(a, \text{bellen}, \text{subj})| = 261$ (i.e. $a$ = Hund). We then get a compatibility score of $\frac{1}{2} \cdot \left(\frac{261}{261} + \frac{261}{440}\right) \approx 0.8$.⁶
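The worked example can be reproduced with a minimal Python sketch of the bidirectional score. The function `comp` and the dictionary layout are assumptions made for illustration; only the two counts for Hund reported in the example are used.

```python
def comp(noun, verb, slot, counts):
    """Mean of the two max-normalized scores; 0.0 for unseen triples."""
    n = counts.get((noun, verb, slot), 0)
    if n == 0:
        return 0.0
    # Highest count of any noun seen with this verb in this slot.
    max_for_verb = max(
        c for (_, v, s), c in counts.items() if v == verb and s == slot
    )
    # Highest count of any verb seen with this noun in this slot.
    max_for_noun = max(
        c for (a, _, s), c in counts.items() if a == noun and s == slot
    )
    return 0.5 * (n / max_for_verb + n / max_for_noun)


# Counts keyed by (noun, verb, grammatical slot), taken from the example.
counts = {
    ("Hund", "bellen", "subj"): 261,
    ("Hund", "kommen", "subj"): 440,
}

score = comp("Hund", "bellen", "subj", counts)    # approximately 0.8
unseen = comp("Etage", "bellen", "subj", counts)  # 0.0 for an unseen triple
```

The linear scan over all counts keeps the sketch short; with a large co-occurrence graph one would presumably precompute the maxima per verb and per noun instead.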
This compatibility score has the advantage over (P)PMI that it ranges from 0 to 1, i.e. low counts for seen combinations are still above 0 and the maximum score is 1 (for a hypothetical combination of a noun and a verb that exclusively occur together), and we can assign zero to unseen verb-argument combinations.
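The stated range can be checked directly from equations (6.5) and (6.6). In the following short derivation, $n$, $M_v$ and $M_a$ are shorthand introduced here for the triple count and the two maxima; they are not notation from the original definitions.

```latex
% Shorthand (introduced for this derivation only):
%   n   = |(a_i, v_j, gf_k)|      count of the seen triple, n >= 1
%   M_v = |\max(a, v_j, gf_k)|    highest count of any noun with v_j in gf_k
%   M_a = |\max(a_i, v, gf_k)|    highest count of any verb with a_i in gf_k
\begin{align*}
  1 \le n \le M_v \quad &\Rightarrow \quad 0 < \frac{n}{M_v} \le 1, \\
  1 \le n \le M_a \quad &\Rightarrow \quad 0 < \frac{n}{M_a} \le 1, \\
  \mathrm{comp}(a_i, v_j, gf_k)
    = \frac{1}{2}\left(\frac{n}{M_v} + \frac{n}{M_a}\right)
    \quad &\Rightarrow \quad 0 < \mathrm{comp}(a_i, v_j, gf_k) \le 1 .
\end{align*}
```

Equality with 1 holds exactly when $n = M_v = M_a$, i.e. when the triple is itself the most frequent combination for both the verb and the noun; a noun and a verb that occur exclusively together are one such case.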