German pronouns can refer to inanimate entities, unlike in English, for example, where gendered pronouns (he, she, his, her) refer to animate entities. This introduces an additional layer of ambiguity and enlarges the set of candidates that need to be considered for German pronouns.
Chapter 5. Empirical validation of our entity-mention model 101
In Tuggener and Klenner (2014), we introduced two features to account for this ambiguity: a feature conjunction of animacy and gender, and the named entity class feature. Here, we slightly modify the animacy feature by adding number to the conjunction. As the conjunction yields sparse instantiations, Table 5.11 shows the weights only for those instantiations that occur at least 50 times for personal pronouns.
Weight  Animacy  Gender  Number
1.95    ANIM     FEM     SG
1.50    ANIM     MASC    SG
1.19    ANIM     *       PL
0.97    INANIM   *       *
0.87    ANIM     *       SG
0.80    INANIM   *       PL
0.43    INANIM   FEM     SG
0.28    INANIM   MASC    SG
Table 5.11: Weights for the most frequent instantiations of the feature conjunction on animacy, gender, and number for antecedents of personal pronouns. '*' indicates underspecified values.
The weight distribution shows that animate candidates with singular number and specified gender ([ANIM, FEM, SG]: 1.95, [ANIM, MASC, SG]: 1.50) are more likely to be pronominalized than their inanimate counterparts ([INANIM, FEM, SG]: 0.43, [INANIM, MASC, SG]: 0.28). This weight distribution can be seen as a reflection of the topics of the underlying discourse in the training data. Newspaper articles often report on person entities, such as political figures, which are therefore likely to be pronominalized. We exploit this bias by featurizing it, thereby addressing the animacy ambiguity of German pronouns.
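The lookup of a conjunction weight can be illustrated with a minimal sketch. The weights are copied from Table 5.11; the function name `conjunction_weight`, the use of `None` for underspecified values, and the neutral default of 1.0 for instantiations not listed in the table are assumptions made for illustration only, not part of the described system.

```python
from typing import Optional

# Weights copied from Table 5.11; "*" marks an underspecified value.
CONJUNCTION_WEIGHTS = {
    ("ANIM", "FEM", "SG"): 1.95,
    ("ANIM", "MASC", "SG"): 1.50,
    ("ANIM", "*", "PL"): 1.19,
    ("INANIM", "*", "*"): 0.97,
    ("ANIM", "*", "SG"): 0.87,
    ("INANIM", "*", "PL"): 0.80,
    ("INANIM", "FEM", "SG"): 0.43,
    ("INANIM", "MASC", "SG"): 0.28,
}


def conjunction_weight(
    animacy: str,
    gender: Optional[str],
    number: Optional[str],
    default: float = 1.0,  # assumption: unlisted instantiations are neutral
) -> float:
    """Return the weight of one animacy/gender/number instantiation."""
    # Map missing (None) gender or number to the "*" placeholder of the table.
    key = (animacy, gender or "*", number or "*")
    return CONJUNCTION_WEIGHTS.get(key, default)
```

For instance, `conjunction_weight("ANIM", "FEM", "SG")` retrieves the highest weight in the table, whereas an inanimate feminine singular candidate is looked up as `("INANIM", "FEM", "SG")` and receives a much lower one.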
The second feature introduced in Tuggener and Klenner (2014) is the named entity class of the antecedent candidate. We add this feature without any modification, but only count and apply it when the candidate is actually a named entity. Table 5.12 shows named entity class weights for all pronouns.
                  PPER  PPOSAT  PRELS  PDS   PRELAT
PERSON            1.84  2.93    0.90   1.24  1.09
ORGANIZATION      0.80  1.09    0.83   1.00  1.14
GEO-POL. ENTITY   0.45  0.43    0.58   0.30  0.69
LOCATION          0.15  0.17    0.56   0.65  0.07
OTHER             0.48  0.52    0.71   0.12  0.07
Table 5.12: Weights for named entity classes (rows) of antecedent candidates per pronoun type (columns).
The table indicates that personal and possessive pronouns tend to bind to person entities (PER), which corresponds to the weights of the animacy features. Except for organizations (ORG), the other named entity types (GPE = geopolitical entity, LOC = location, OTH = other) receive low weights, meaning that the named entity class feature lowers the overall weight of such candidates.
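The restriction that the named entity class feature is only applied when the candidate is actually a named entity can be sketched as follows. The weights are copied from Table 5.12; the function name `ne_class_weight`, the dictionary layout, and the neutral value of 1.0 returned for candidates that are not named entities are illustrative assumptions.

```python
from typing import Optional

PRONOUN_TYPES = ("PPER", "PPOSAT", "PRELS", "PDS", "PRELAT")

# Rows of Table 5.12, in the column order given by PRONOUN_TYPES.
_ROWS = {
    "PERSON": (1.84, 2.93, 0.90, 1.24, 1.09),
    "ORGANIZATION": (0.80, 1.09, 0.83, 1.00, 1.14),
    "GEO-POL. ENTITY": (0.45, 0.43, 0.58, 0.30, 0.69),
    "LOCATION": (0.15, 0.17, 0.56, 0.65, 0.07),
    "OTHER": (0.48, 0.52, 0.71, 0.12, 0.07),
}

# Nested lookup: NE class -> pronoun type -> weight.
NE_CLASS_WEIGHTS = {
    ne_class: dict(zip(PRONOUN_TYPES, row)) for ne_class, row in _ROWS.items()
}


def ne_class_weight(pronoun_type: str, ne_class: Optional[str]) -> float:
    """Weight of the candidate's NE class for the given pronoun type."""
    if ne_class is None:
        # Assumption: the feature does not fire for non-named entities,
        # modelled here as a neutral factor of 1.0.
        return 1.0
    return NE_CLASS_WEIGHTS[ne_class][pronoun_type]
```

Under these assumptions, a possessive pronoun (`PPOSAT`) with a `PERSON` candidate receives the largest weight in the table, while a common-noun candidate (`ne_class=None`) is left unaffected by this feature.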
While working on the development set, we found that calculating prior weights for number and gender features also helps performance. Therefore, we add number and gender of the antecedent candidates as single features to the set.
Finally, we add the preposition of antecedent candidates in prepositional phrases (PPs) as a feature. Candidates in PPs generally receive a low weight. However, we observed that different prepositions affect salience to different degrees. Table 5.13 shows the five highest-weighted and the five lowest-weighted prepositions for personal and possessive pronouns.
Personal pronouns        Possessive pronouns
Weight  Preposition      Weight  Preposition
0.69    für              0.60    gegenüber
0.58    neben            0.52    zwischen
0.46    gegen            0.47    neben
0.42    von              0.44    für
0.33    bei              0.38    gegen
0.08    aus              0.01    trotz
0.04    nach             0.01    während
0.04    in               0.01    ohne
0.01    während          0.00    seit
0.00    seit             0.00    vor
Table 5.13: Top and lowest five weights for prepositions of antecedent candidates for personal pronouns (left) and possessive pronouns (right).
The tables show that all preposition weights are below 1, which diminishes the candidates' overall weight products. However, there is a large difference between the highest and the lowest weights. Among the lowest weights, we see that “während” (during) and “seit” (since), two prepositions heading PPs that denote periods of time, are shared by the personal and possessive pronouns.
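The effect of a preposition weight on a candidate's weight product can be illustrated with a small sketch. The preposition weights are those for personal pronouns in Table 5.13; the function name `candidate_score`, the way the other feature weights are passed in, and the neutral value of 1.0 for prepositions not listed in the table are assumptions, since the table only reports the five highest and five lowest weights.

```python
import math
from typing import Iterable, Optional

# Preposition weights for personal pronouns, copied from Table 5.13 (left).
PREP_WEIGHTS_PPER = {
    "für": 0.69, "neben": 0.58, "gegen": 0.46, "von": 0.42, "bei": 0.33,
    "aus": 0.08, "nach": 0.04, "in": 0.04, "während": 0.01, "seit": 0.00,
}


def candidate_score(
    other_weights: Iterable[float],
    preposition: Optional[str] = None,
) -> float:
    """Multiply a candidate's feature weights, including the PP feature."""
    weights = list(other_weights)
    if preposition is not None:
        # Assumption: prepositions missing from the table act as neutral (1.0).
        weights.append(PREP_WEIGHTS_PPER.get(preposition, 1.0))
    return math.prod(weights)


# Example: the same two feature weights (taken from Tables 5.11 and 5.12),
# once without a preposition and once inside a PP headed by "für" or "seit".
base = candidate_score([1.95, 1.84])
in_fuer_pp = candidate_score([1.95, 1.84], "für")
in_seit_pp = candidate_score([1.95, 1.84], "seit")
```

Because every preposition weight is below 1, `in_fuer_pp` is smaller than `base`, and a weight of 0.00 as for “seit” removes the candidate from consideration entirely under a purely multiplicative combination.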
5.3.3.5 Features made available by the entity-mention model
The last two additions to the extended feature set are enabled by the incremental architecture of the entity-mention model. The Discourse status feature indicates whether a candidate is already in a coreference chain (discourse-old) or stems from the buffer list of non-anaphoric markables (discourse-new); its value is therefore binary. The Entity age feature measures how “old” the discourse-old entities are that appear as antecedent candidates of pronouns. That is, the feature only triggers when the antecedent candidate is part of a coreference chain. The value of the Entity age feature is calculated by subtracting the sentence number of the first mention of the candidate's coreference chain from the sentence number of the first markable in the document. The intuition behind these features is that entities introduced early in the discourse (e.g. in headlines) are likely to appear frequently throughout the discourse and are, therefore, likely to be pronominalized (Mitkov, 1998; Uryupina, 2007, inter alia). Furthermore, Strube and Hahn (1999) showed that, in the Centering framework, determining the salience of antecedent candidate entities based on information status (hearer-new vs. hearer-old) instead of grammatical functions improved pronoun resolution in their evaluation.
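A minimal sketch of how the two features could be computed is given below. It follows the wording of the definitions above literally; the class `Candidate`, its field names, and the choice to return `None` when the Entity age feature does not trigger are hypothetical and serve illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical antecedent candidate; field names are illustrative."""
    sentence: int
    # Sentence number of the first mention of the candidate's coreference
    # chain, or None if the candidate comes from the buffer list.
    chain_first_sentence: Optional[int] = None


def discourse_status(candidate: Candidate) -> str:
    """Binary feature: is the candidate already part of a coreference chain?"""
    if candidate.chain_first_sentence is None:
        return "discourse-new"
    return "discourse-old"


def entity_age(candidate: Candidate, first_markable_sentence: int) -> Optional[int]:
    """Entity age, computed literally as described in the text.

    Returns None for discourse-new candidates, for which the feature
    does not trigger (an assumption about how "not triggered" is encoded).
    """
    if candidate.chain_first_sentence is None:
        return None
    return first_markable_sentence - candidate.chain_first_sentence
```

Under this literal reading, an entity whose chain starts in the same sentence as the first markable of the document (e.g. in the headline) receives the value 0, and the value moves further away from 0 the later the entity was introduced.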
In Klenner and Tuggener (2010), we reported that other features derived from the entity-mention model, such as the length of the chain that a discourse-old candidate belongs to, had an ambivalent impact on performance. During our experiments on the development set, we found that only the two features described above increased performance overall.