Chapter 5. Empirical validation of our entity-mention model 97

The next family of features is derived from the entity grid representation of discourse (Barzilay and Lapata, 2008), which models local coherence in texts. The entity grid representation itself is based on Centering theory (Grosz et al., 1995), which models local coherence by tracking the sequences of grammatical functions with which entities successively occur in coherent discourse. Table 5.8 depicts the example grid and text given by Barzilay and Lapata (2008), p. 6. Note that the inventory of grammatical functions is reduced in the entity grid representation (S=subject, O=direct object, X=all other, -=entity not present).

1  [The Justice Department]S is conducting an [anti-trust trial]O against [Microsoft Corp.]X with [evidence]X that [the company]S is increasingly attempting to crush [competitors]O.

2  [Microsoft]O is accused of trying to forcefully buy into [markets]X where [its own products]S are not competitive enough to unseat [established brands]O.

3  [The case]S revolves around [evidence]O of [Microsoft]S aggressively pressuring [Netscape]O into merging [browser software]O.

   Department  Trial  Microsoft  Evidence  Competitors  Markets  Products  Brands  Case  Netscape  Software
1  S           O      S          X         O            -        -         -       -     -         -
2  -           -      O          -         -            X        S         O       -     -         -
3  -           -      S          O         -            -        -         -       S     O         O

Table 5.8: Entity grid representation of entity occurrences in discourse.

The basic assumption of the entity grid model is that these sequences (i.e. the columns of the grid in Table 5.8) exhibit certain regularities that correlate with coherence. These regularities can be learned from coherent texts and then used to score the coherence of other texts, such as summaries.
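The following minimal sketch illustrates how the grid in Table 5.8 can be built and how role transitions can be counted over its columns. It is an illustration only, not the implementation of Barzilay and Lapata (2008); the function names `build_grid` and `transition_counts` are hypothetical, and the input assumes that an entity mentioned twice in one sentence has already been reduced to its highest-ranked role (as for Microsoft in sentence 1).

```python
from collections import Counter

# Entity roles per sentence, taken from the example in Table 5.8
# (S = subject, O = direct object, X = all other).
SENTENCES = [
    {"Department": "S", "Trial": "O", "Microsoft": "S",
     "Evidence": "X", "Competitors": "O"},
    {"Microsoft": "O", "Markets": "X", "Products": "S", "Brands": "O"},
    {"Case": "S", "Evidence": "O", "Microsoft": "S",
     "Netscape": "O", "Software": "O"},
]


def build_grid(sentences):
    """Return {entity: [role per sentence]}, with '-' for absent entities."""
    entities = []
    for sentence in sentences:
        for entity in sentence:
            if entity not in entities:
                entities.append(entity)  # keep order of first mention
    return {e: [s.get(e, "-") for s in sentences] for e in entities}


def transition_counts(grid, n=2):
    """Count role n-grams over all columns (entities) of the grid."""
    counts = Counter()
    for roles in grid.values():
        for i in range(len(roles) - n + 1):
            counts[tuple(roles[i:i + n])] += 1
    return counts


grid = build_grid(SENTENCES)
counts = transition_counts(grid)
print(grid["Evidence"])   # column of the "Evidence" entity
print(counts[("S", "O")])  # how often a subject becomes an object
```

Relative frequencies of such transitions, estimated on coherent texts, are the kind of regularity the model relies on.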

The entity grid can also be interpreted as a representation of the coreference partition of a document, since it depicts all occurrences of the entities in the document (singletons and coreferent ones). Based on this view, we create feature conjunctions containing the grammatical roles of the antecedent candidates and the pronoun. That is, we model how likely the different transitions of grammatical functions from the antecedent candidates to the pronouns are. Since we resolve pronouns in their local context and do not track all individual mentions of coreferent entities, we create bigram sequences, i.e. we do not query the grammatical functions of antecedent entities that have occurred multiple times before the pronoun. However, we do not simplify the grammatical roles as Barzilay and Lapata (2008) do, but use the roles as encountered in the data.

We make two additions to the feature conjunction (Sequence gram. funct.). First, we add the sentence distance between antecedent candidate and pronoun. The example grid in Table 5.8 shows that certain transitions only occur at a certain sentence distance, e.g. we only see the transition [X,O] for the “Evidence” entity with an intervening sentence, i.e. the original transition is [X,-,O]. Since the coreference partition, unlike the entity grid, does not track the non-occurrence of entities, this information is lost. In other words, if we learned transitions from the coreference partitions in the training data without relating them to sentence distance, we would extract the pattern [X,O] for the “Evidence” entity, although the pattern occurs here with an intervening sentence. Therefore, we add sentence distance to the conjunction of the grammatical roles. In the case of the “Evidence” entity, the feature conjunction is then [2,X,O].
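The effect of adding sentence distance can be sketched as follows. This is a simplified illustration under stated assumptions: each entity is represented as a list of (sentence number, role) pairs with one mention per sentence, bigrams are formed between consecutive mentions, and the function name `role_bigrams` is hypothetical.

```python
def role_bigrams(mentions, with_distance=True):
    """Build role bigrams from consecutive mentions of one entity.

    mentions: list of (sentence_number, role) tuples in textual order.
    """
    features = []
    for (prev_sent, prev_role), (cur_sent, cur_role) in zip(mentions, mentions[1:]):
        if with_distance:
            # Prefix the transition with the sentence distance.
            features.append((cur_sent - prev_sent, prev_role, cur_role))
        else:
            # Plain transition: the intervening gap is lost.
            features.append((prev_role, cur_role))
    return features


# Mentions of two entities from Table 5.8 as they would appear in a
# coreference partition (absent sentences are simply not listed).
evidence = [(1, "X"), (3, "O")]
microsoft = [(1, "S"), (2, "O"), (3, "S")]

print(role_bigrams(evidence, with_distance=False))
print(role_bigrams(evidence))
print(role_bigrams(microsoft))
```

Without the distance, the “Evidence” bigram is indistinguishable from a transition between adjacent sentences; with it, the gap that the grid encodes as “-” is preserved.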


Second, we add the PoS tag of the antecedent candidate to the feature conjunction. The PoS tag of an entity mention reflects the givenness of the entity. Ariel's (1988) theory of antecedent accessibility captures this notion by relating the surface form of an entity mention to how familiar the entity is at a given point in the discourse. This yields a hierarchy of surface forms, which can be coarsely mapped to PoS tags (e.g. Martschat and Strube, 2014): named entities, common nouns, and pronouns.15 The PoS tag can be thought of as relativizing the other features in the conjunction. For example, a transition of grammatical roles at a high sentence distance might be more likely to indicate coreference if the antecedent candidate is a named entity rather than a pronoun.

We experimented with different representations of familiarity (e.g. definiteness and coreferential status of the antecedent) and distance (sentences, markables) and found that the conjunction of sentence distance, PoS tag, and the transition of grammatical roles performs best. The final form of the feature conjunction for the “Evidence” entity from the above example would thus be [2,X,O,NN] when viewed from sentence three: it is two sentences away, it occurred with the grammatical role PN (X), it is now mentioned as an object (O), and its last occurrence was a nominal mention (NN). Table 5.9 lists the five highest-weighted features derived from the conjunctions for personal pronouns. As the feature conjunction yields many sparse features, we only list those that are seen at least 50 times during training. We see that the feature also captures parallelism of grammatical roles, a feature in the standard set, but calculates weights for parallelism of specific roles and sentence distances.
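A minimal sketch of the full conjunction and of the frequency cutoff described above might look as follows. The `Mention` type and the function names are hypothetical, and the sketch does not show how weights are learned; it only shows how the conjunction is instantiated and how instantiations seen fewer than 50 times could be filtered out.

```python
from collections import Counter
from typing import NamedTuple


class Mention(NamedTuple):
    sentence: int  # sentence number of the mention
    role: str      # grammatical role, e.g. "SUBJ", "OBJA", "PN"
    pos: str       # PoS tag, e.g. "NN", "NE", "PPER"


def conjunction(antecedent, anaphor):
    """Sentence distance, role transition, and PoS tag of the antecedent."""
    distance = anaphor.sentence - antecedent.sentence
    return (distance, antecedent.role, anaphor.role, antecedent.pos)


def frequent_features(pairs, min_count=50):
    """Keep only conjunctions instantiated at least min_count times."""
    counts = Counter(conjunction(a, p) for a, p in pairs)
    return {feat: n for feat, n in counts.items() if n >= min_count}


# The "Evidence" entity from Table 5.8, viewed from sentence three,
# using the reduced role inventory of the grid (X, O).
evidence_feature = conjunction(Mention(1, "X", "NN"), Mention(3, "O", "NN"))
print(evidence_feature)
```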

Weight  SD  GF A  GF P  PoS A
3.92    0   SUBJ  OBJD  PPER
3.91    0   SUBJ  SUBJ  PPER
3.87    0   OBJA  OBJA  PPER
3.87    0   PN    SUBJ  PPER
3.86    0   SUBJ  SUBJ  PPOSAT

Table 5.9: Top weighted instantiations of the feature conjunction of sentence distance (SD), grammatical role transition from antecedent (GF A) to pronoun (GF P), and PoS tag of the antecedent (PoS A) for personal pronouns.

15 For example, if a discourse refers to Barack Obama, it uses a named entity mention if Obama has not been mentioned recently. If Obama has been mentioned very recently, a pronoun can be used for reference. It could use “the president” if the entity is not completely out of focus, etc.

For possessive pronouns, we model two specific configurations. Since the grammatical role of possessive pronouns is DET (determiner), we cannot further differentiate the weight calculation based on the syntactic context of possessive pronouns. Therefore, we additionally calculate a weight for the grammatical transition from the antecedent candidate to the syntactic head of the possessive pronoun (Sequence gram. funct. PPOSAT head1). Furthermore, we calculate weights for the specific cases where the head of the possessive pronoun is governed by the same verb as the antecedent candidate (Sequence gram. funct. PPOSAT head2). For example, consider the following segment from the test set:

(5) Landesvorsitzende Wedemeier: Ein Buchungsfehler. Im Januar hat die Arbeiterwohlfahrt_1 Bremen ihren_1 langjährigen Geschäftsführer Hans Taake fristlos entlassen [...]

Regional chairwoman Wedemeier: An accounting error. In January, the Worker Welfare Association_1 Bremen has laid off its_1 long-term CEO Hans Taake without notice [...]

For the possessive pronoun ihren, we create the feature for the two compatible antecedent candidates Landesvorsitzende Wedemeier and Arbeiterwohlfahrt Bremen. The feature for the former is [1,ROOT,OBJA,NE], since the candidate is in the previous sentence, its grammatical role is ROOT, the role of the possessive pronoun’s head is direct object (OBJA), and the candidate is a named entity. For the second candidate, Arbeiterwohlfahrt Bremen, we also instantiate the feature, i.e. [0,SUBJ,OBJA,NN]. Additionally, we create this feature without the sentence distance, i.e. [SUBJ,OBJA,NN], since the Arbeiterwohlfahrt Bremen candidate is governed by the same verb as the possessive pronoun’s head langjährigen Geschäftsführer Hans Taake, i.e. entlassen. In doing so, we intend to capture specific transitions from an antecedent to the possessive pronoun’s head, given that both are governed by the same verb. Table 5.10 lists weights for the most frequent instantiations of the feature.
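The two configurations for possessive pronouns can be sketched for example (5) as follows. This is an illustration under assumptions, not the actual feature extractor: the `Node` type, the function `pposat_features`, the feature labels, and the token index used to identify the governing verb are all hypothetical.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    sentence: int            # sentence number
    role: str                # grammatical role, e.g. "SUBJ", "OBJA", "ROOT"
    pos: str                 # PoS tag, e.g. "NN", "NE"
    governor: Optional[int]  # token index of the governing verb, if any


def pposat_features(candidate, pronoun_head):
    """Features for one antecedent candidate of a possessive pronoun."""
    # head1: transition from the candidate to the head of the pronoun,
    # conjoined with sentence distance and the candidate's PoS tag.
    features = [("head1",
                 pronoun_head.sentence - candidate.sentence,
                 candidate.role, pronoun_head.role, candidate.pos)]
    # head2: same transition without distance, only if candidate and
    # pronoun head are governed by the same verb in the same sentence.
    same_verb = (candidate.sentence == pronoun_head.sentence
                 and candidate.governor is not None
                 and candidate.governor == pronoun_head.governor)
    if same_verb:
        features.append(("head2", candidate.role,
                         pronoun_head.role, candidate.pos))
    return features


ENTLASSEN = 12  # assumed token index of the verb "entlassen"
wedemeier = Node(sentence=0, role="ROOT", pos="NE", governor=None)
awo_bremen = Node(sentence=1, role="SUBJ", pos="NN", governor=ENTLASSEN)
head_of_ihren = Node(sentence=1, role="OBJA", pos="NN", governor=ENTLASSEN)

print(pposat_features(wedemeier, head_of_ihren))
print(pposat_features(awo_bremen, head_of_ihren))
```

Only the second candidate triggers the distance-free feature, because only it shares the governing verb with the head of the possessive pronoun.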

Weight  GF A  GF P  PoS A
7.97    SUBJ  OBJA  PPER
7.86    SUBJ  PN    PPER
7.69    SUBJ  OBJA  NN
6.07    SUBJ  PN    NN
0.54    PN    PN    NN

Table 5.10: Weights for the most frequent instantiations of the feature conjunction of grammatical function transition from antecedent (GF A) to pronoun (GF P) and PoS tag of antecedent (PoS A) for possessive pronouns, specifically for cases where the possessive pronoun’s head is governed by the same verb as the antecedent candidate.

The table shows that the transition evoked by the second candidate in our example (Arbeiterwohlfahrt Bremen) is very frequent and receives a high weight. The weight suggests that when possessive pronouns are determiners of direct objects, the likelihood of these pronouns referring to the subject of the verb governing the direct objects is high. By contrast, the likelihood of possessive pronouns whose heads are in prepositional noun phrases to refer to antecedents in prepositional noun phrases governed by the same verb as the pronouns’ head is rather low (last row).