We are aware of three other German corpora that feature coreference annotation. The Potsdam Commentary Corpus (Stede, 2004) provides an annotated data set comprising 176 articles from the newspaper ‘Märkische Allgemeine Zeitung’. Besides coreference, the corpus features annotation of discourse structure, as well as connectives
Chapter 5. Empirical validation of our entity-mention model 76
and their arguments. The coreference annotation is available in a CoNLL-style format. We evaluate our models on this corpus in section 5.4.4.
The DIRNDL corpus (Björkelund et al., 2014) contains 618 documents gathered from German radio broadcasts and is also coreferentially annotated and available in a CoNLL format. However, possessive and relative pronouns are not annotated with coreference, and we only counted 218 instances of annotated personal pronouns (excluding es), which is a low count compared to the TüBa-D/Z corpus.
Finally, the Heidelberg Text Corpus (HTC) explored in Strube et al. (2002) and Kouchnir (2004) consists of 242 short texts about sights, historic events, and persons in Heidelberg. Unfortunately, the corpus is only available in the MMAX format (Müller and Strube, 2001).⁵ Without the CoNLL format, the official coreference scorer and our ARCS scorer are not applicable and thus we are not able to evaluate our system on this corpus.
5.1.2 Markable extraction
In this section, we describe how we extract the markables, i.e. the noun phrases and pronouns that we consider for coreference resolution. Recall that we view coreference resolution through the lens of downstream applications in Computational Linguistics and NLP. From this perspective, we identify those types of markables whose resolution we deem useful for such subsequent applications.
5.1.2.1 Part-of-speech-based identification of markables
We define the types of markables we aim to extract by their part-of-speech (PoS) tags. The TüBa-D/Z employs the Stuttgart/Tübingen Tag Set (STTS; Schiller et al., 1999) for the part-of-speech annotation of tokens. From the STTS, we select the subset shown in table 5.1, whose tags feature coreference annotation in the corpora investigated. The TüBa-D/Z contains coreference annotation for additional pronouns, but these feature few instances in the data compared to the ones listed in table 5.1. Additionally, related work on German pronoun resolution has mainly focused on personal and possessive pronouns (Strube et al., 2002, Schiehlen, 2004, Hinrichs et al., 2005, Wunsch, 2010, inter alia).
⁵ Initial efforts to convert the MMAX format to CoNLL proved cumbersome and were not fruitful due to idiosyncrasies in the MMAX format. Due to time constraints, we were unfortunately not able to pursue the effort further.
PoS      Description               Example (DE)      Example (EN)
Nouns
NN       Common noun               Anwalt            attorney
NE       Named entity              Danzig            Danzig
Pronouns
PPER     Personal pronoun          sie               she / they
PPOSAT   Possessive pronoun        ihr               her / their
PRELS    Relative pronoun          Leute, die        people who
PRELAT   Attributing rel. pronoun  Menschen, deren   people whose
PDS      Demonstrative pronoun     dies              this
Table 5.1: PoS tags of head tokens of NPs and pronouns considered for coreference resolution in our system.
Regarding reflexive pronouns (PRF: sich, mich, dich, euch), the TüBa-D/Z coreference annotation guideline (Naumann, 2007) employs two categories. The first category consists of uses of the reflexive pronoun sich as part of an inherently reflexive verb, e.g. sich ereignen (to happen, literally *to happen itself). The reflexive in these cases is bound to the verb, and the verb cannot be used correctly without it. Here, the reflexive pronoun arguably does not refer to an antecedent and is therefore not annotated as anaphoric. The second category covers the use of reflexives in combination with verbs which do not depend on them, e.g. sich waschen (to wash oneself) vs. etwas waschen (to wash something). These cases of sich are annotated as anaphoric.
The TüBa-D/Z 9.1 contains 10689 instances of the reflexive sich, of which 9284 are annotated as non-anaphoric, i.e. as part of an inherently reflexive verb. On the one hand, this yields a baseline accuracy of 86.86% for deciding not to resolve sich. On the other hand, when the reflexive is annotated as anaphoric, it refers to the grammatical subject of the governing verb in virtually all cases. That is, there is little or no ambiguity involved and, if desired, sich can safely be resolved to that subject. Here, again, we take the view of higher-level NLP applications and argue that the annotation guideline is too fine-grained. Therefore, we exclude reflexives from our approach. Note, however, that the common coreference evaluation metrics are agnostic to mention PoS types, so excluding reflexives lowers the upper bound for Recall in our experiments.
Apart from excluding reflexives and certain rare pronouns, we make an additional and arguably big cut: the exclusion of the notorious (because potentially pleonastic) 3rd person pronoun es (it). Related work on English and German pronouns has introduced several approaches to determine the (non-)anaphoricity of it and es, ranging from simple filters (lists of verbs which subcategorize a pleonastic it as the grammatical subject, e.g. it rains (Lappin and Leass, 1994)) to elaborate machine learning approaches (Bergsma et al., 2008b).
The TüBa-D/Z 9.1 contains 8891 instances of es, of which only 706 (7.94%) are anaphoric. This yields a baseline accuracy of 92.06% for not resolving es.⁶ Furthermore, our initial approaches to address the problem were disappointing. We found that identifying pleonastic uses of es based on heuristics achieves relatively solid Precision, accompanied by poor Recall for the anaphoric uses of es. That is, identifying anaphoric uses of es proved to be more difficult, mainly because of the large class imbalance in the training data. Thus, our initial attempts to find reliable indicators for the anaphoric use of es proved futile.
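The two baseline figures quoted for sich and es follow directly from the reported corpus counts. The snippet below is only a small sanity check of that arithmetic; the function name `majority_baseline` is ours and not part of the original system.

```python
def majority_baseline(majority_count, total):
    """Accuracy (in percent) of always predicting the majority class."""
    return 100.0 * majority_count / total


# Counts reported for the TüBa-D/Z 9.1 in the text.
sich_total, sich_non_anaphoric = 10689, 9284
es_total, es_anaphoric = 8891, 706

# For sich, the non-anaphoric count is given directly.
sich_baseline = majority_baseline(sich_non_anaphoric, sich_total)
# For es, the majority class is everything that is not anaphoric.
es_baseline = majority_baseline(es_total - es_anaphoric, es_total)

print(f"never resolve sich: {sich_baseline:.2f}%")
print(f"never resolve es:   {es_baseline:.2f}%")
```

Such a majority-class baseline is what any anaphoricity classifier for sich or es would have to beat, which illustrates why the class imbalance makes the problem hard.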
Given the reasoning above, we decided not to resolve es in our experiments, like Klenner and Ailloud (2008). For comparison, Wunsch (2010, p. 151) stated that he relied on the gold annotation to decide whether to resolve es, i.e. he excluded the anaphoricity detection problem. Schiehlen (2004) also relied on the gold standard for anaphoricity detection, but did not restrict the anaphoricity check to es: he applied it to all third person pronouns.
Besides the restrictions listed above, we process all NPs and pronouns of the PoS subset listed in table 5.1. More specifically, we consider all encountered pronouns to be anaphoric, and if we find compatible antecedent candidates (which meet several filter criteria), we resolve them.
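The resulting selection can be sketched as a simple PoS filter. This is a minimal illustration, not the actual implementation: the tag sets are taken from table 5.1, while the function name `is_markable_head` and the (token, PoS) input format are assumptions made for the example.

```python
# STTS tags from table 5.1.
NOUN_TAGS = {"NN", "NE"}
PRONOUN_TAGS = {"PPER", "PPOSAT", "PRELS", "PRELAT", "PDS"}
MARKABLE_TAGS = NOUN_TAGS | PRONOUN_TAGS


def is_markable_head(token, pos):
    """Return True if a (token, PoS) pair may head a markable."""
    if pos not in MARKABLE_TAGS:
        return False  # covers reflexives (PRF) and all other tags
    # "es" is tagged PPER but is deliberately not resolved.
    if pos == "PPER" and token.lower() == "es":
        return False
    return True


# Hypothetical tagged input, for illustration only.
tagged = [("Sie", "PPER"), ("wäscht", "VVFIN"), ("sich", "PRF"),
          ("und", "KON"), ("es", "PPER"), ("regnet", "VVFIN")]
heads = [tok for tok, pos in tagged if is_markable_head(tok, pos)]
print(heads)
```

In this toy input, only the personal pronoun Sie passes the filter; the reflexive sich and the pronoun es are skipped.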
5.1.2.2 Identification of markable boundaries
As outlined in section 1.5.3, correctly identifying markable boundaries (i.e. the span of tokens that denote an NP) is of crucial importance, since evaluation only deems system mentions to be resolved correctly if their boundaries perfectly align with those of the gold mentions. An easy way of ensuring correct mention boundaries would be to only extract gold mentions from the data. However, we aim to develop an end-to-end coreference resolution approach which can be applied to raw text where gold mention boundaries are not known, which is a more realistic setting.
We apply the following heuristics to identify markable boundaries. Traversing the CoNLL-style format of the TüBa-D/Z (e.g. table 1.1), we look at the PoS tag of each token. If we encounter a pronoun PoS tag from the set listed in table 5.1, we assume that the pronoun features a single-word markable extension. That is, we extract the pronoun as a markable denoted by a single token.
⁶ By comparison, in the state-of-the-art corpus for coreference resolution for English, OntoNotes (Pradhan et al., 2013), 771 out of 1318 (58.5%) of the occurrences of the pronoun it are anaphoric.
For nouns and named entity tokens, we find the maximal NP projection by gathering all tokens that (recursively) link to the noun or name token, as indicated by the dependency parse. We then cut relative clauses, punctuation, etc. at the end of the projection with heuristics developed over the development set.
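The boundary heuristics can be sketched as follows. This is an illustration under stated assumptions, not the original code: the token dictionary format and the function name `markable_span` are hypothetical, and the two trimming rules (cut at the first relative pronoun after the head, strip trailing punctuation) are simplified stand-ins for the heuristics tuned on the development set.

```python
from collections import defaultdict

PRONOUN_TAGS = {"PPER", "PPOSAT", "PRELS", "PRELAT", "PDS"}
REL_TAGS = {"PRELS", "PRELAT"}


def markable_span(tokens, head_id):
    """Return inclusive (start, end) token IDs of the markable headed by head_id.

    tokens maps a contiguous, 1-based token ID to (form, pos, dep_head_id);
    a dep_head_id of 0 marks the root.
    """
    if tokens[head_id][1] in PRONOUN_TAGS:
        return head_id, head_id  # pronouns are single-token markables

    children = defaultdict(list)
    for tid, (_, _, dep_head) in tokens.items():
        children[dep_head].append(tid)

    # Collect the head and everything that (recursively) depends on it.
    members, stack = set(), [head_id]
    while stack:
        tid = stack.pop()
        if tid not in members:
            members.add(tid)
            stack.extend(children[tid])
    start, end = min(members), max(members)

    # Assumed rule 1: cut the span before a relative pronoun after the head.
    for tid in range(head_id + 1, end + 1):
        if tokens[tid][1] in REL_TAGS:
            end = tid - 1
            break
    # Assumed rule 2: strip trailing punctuation (STTS tags starting with "$").
    while end > head_id and tokens[end][1].startswith("$"):
        end -= 1
    return start, end


# "die Leute, die dort wohnen, schlafen" (hypothetical parse)
sentence = {
    1: ("die", "ART", 2), 2: ("Leute", "NN", 8), 3: (",", "$,", 6),
    4: ("die", "PRELS", 6), 5: ("dort", "ADV", 6), 6: ("wohnen", "VVFIN", 2),
    7: (",", "$,", 6), 8: ("schlafen", "VVFIN", 0),
}
print(markable_span(sentence, 2))  # span of the NP headed by "Leute"
print(markable_span(sentence, 4))  # span of the relative pronoun
```

In the toy sentence, the full projection of Leute covers tokens 1 to 7; the trimming rules reduce it to die Leute, while the relative pronoun is extracted as a markable of its own.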
To cope with mention boundary problems, we check whether all gold mentions are represented as markables once the end of a document is reached. If a gold mention lacks a corresponding markable, we traverse the markables and heuristically determine⁷ the closest matching one and adjust its boundaries to those of the gold mention.
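A possible reading of this alignment step is sketched below. The text does not spell out how the closest matching markable is determined, so the closeness criterion used here (largest token overlap, ties broken by the smallest boundary distance) is purely an assumption, as are the function names.

```python
def overlap(a, b):
    """Number of tokens shared by two inclusive (start, end) spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def align_to_gold(markables, gold_mentions):
    """Snap the closest markable onto each gold mention without exact match."""
    aligned = list(markables)
    gold = set(gold_mentions)
    for g in gold_mentions:
        if g in aligned:
            continue  # already represented with correct boundaries
        # Only consider markables that overlap g and do not already
        # coincide with some other gold mention.
        candidates = [i for i, m in enumerate(aligned)
                      if m not in gold and overlap(m, g) > 0]
        if not candidates:
            continue
        best = max(
            candidates,
            key=lambda i: (overlap(aligned[i], g),
                           -(abs(aligned[i][0] - g[0]) + abs(aligned[i][1] - g[1]))),
        )
        aligned[best] = g  # adopt the gold boundaries
    return aligned


system = [(1, 2), (5, 9), (12, 12)]
gold = [(1, 2), (5, 7), (12, 12)]
print(align_to_gold(system, gold))
```

Here the over-long system span (5, 9) is adjusted to the gold span (5, 7), while markables that already match a gold mention are left untouched.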
By comparison, Hinrichs et al. (2005) allowed system mentions to contain the gold mentions and vice versa in order to be counted as correct in their evaluation. Wunsch (2010, p. 123) counted system antecedents as correct if they featured the same head token as the corresponding gold antecedent. Versley (2010, p. 213) collected all projections of an NP’s head and counted system antecedents as correct if any of the corresponding projections denoted the gold antecedent.
Once markable spans are identified, we create feature vectors that describe the markables. Table 5.2 shows the initial vector representation we create for the possessive pronoun and its antecedent in our running example, i.e. [ihren] and [Arbeiterwohlfahrt] (abbreviated as AWO). This representation, with several modifications and additions, constitutes the input for our coreference models.
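To make the difference between these matching criteria concrete, the following sketch contrasts strict boundary matching with two of the more lenient variants. The function names are ours, and the functions only paraphrase the criteria as described above; they are not taken from the cited systems.

```python
def exact_match(system, gold):
    """Strict criterion: boundaries must align perfectly."""
    return system == gold


def containment_match(system, gold):
    """Lenient criterion in the spirit of Hinrichs et al. (2005):
    one span may contain the other."""
    system_in_gold = gold[0] <= system[0] and system[1] <= gold[1]
    gold_in_system = system[0] <= gold[0] and gold[1] <= system[1]
    return system_in_gold or gold_in_system


def head_match(system_head, gold_head):
    """Criterion in the spirit of Wunsch (2010): same head token."""
    return system_head == gold_head


# Hypothetical spans as inclusive (start, end) token IDs; both share head token 2.
system_span, gold_span = (1, 4), (1, 2)
print(exact_match(system_span, gold_span))
print(containment_match(system_span, gold_span))
print(head_match(2, 2))
```

A system mention with an over-long span is rejected by the strict criterion but accepted by both lenient ones, which is why results obtained under different criteria are not directly comparable.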
markable ID  sent. number  start token  end token  PoS     person  gender  number  gram. role  animacy  gov. token ID  gov. verb  NE type  connector  lemma
11           4             4            6          NN      3       FEM     SG      SUBJ        -        13             entlassen  ORG      -          AWO
12           4             7            7          PPOSAT  3       *       *       DET         *        13             entlassen  -        -          ihr
Table 5.2: Example of markable instances represented as feature vectors. ’*’ denotes underspecified feature values.
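One way to hold such a feature vector in code is a small record type. The sketch below mirrors the columns of table 5.2; the class name, field names, and the helper `underspecified` are illustrative assumptions, and the values '*' and '-' are stored verbatim as they appear in the table.

```python
from dataclasses import dataclass, asdict

UNDERSPECIFIED = "*"


@dataclass
class Markable:
    markable_id: int
    sent_number: int
    start_token: int
    end_token: int
    pos: str
    person: int
    gender: str
    number: str
    gram_role: str
    animacy: str
    gov_token_id: int
    gov_verb: str
    ne_type: str
    connector: str
    lemma: str


def underspecified(markable):
    """Names of all features whose value is underspecified ('*')."""
    return [name for name, value in asdict(markable).items()
            if value == UNDERSPECIFIED]


# The two rows of table 5.2.
awo = Markable(11, 4, 4, 6, "NN", 3, "FEM", "SG", "SUBJ", "-",
               13, "entlassen", "ORG", "-", "AWO")
ihr = Markable(12, 4, 7, 7, "PPOSAT", 3, "*", "*", "DET", "*",
               13, "entlassen", "-", "-", "ihr")

print(underspecified(ihr))
print(awo.gov_token_id == ihr.gov_token_id)
```

The possessive pronoun leaves gender, number, and animacy open, whereas both markables share the same governing token.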
All features shown in table 5.2 are directly extracted from the TüBa-D/Z data, except the animacy feature and the gender of person entities. To determine the gender of person entities, e.g. “Bill Clinton”, we use a list of first names, divided into male and female names, which was gathered from the Internet. We determine the animacy of common noun markables by a lookup in GermaNet (Hamp and Feldweg, 1997), a German wordnet. We deem a markable animate if its head noun is a hyponym of the synset Mensch (human).
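The two lookups can be sketched as follows. To keep the example self-contained, a tiny hand-made hypernym map stands in for GermaNet and two small name sets stand in for the first-name list; all entries, the function names, and the gender label MASC are assumptions for illustration and do not reflect the actual resources or their APIs.

```python
# Toy stand-in for GermaNet: word -> direct hypernyms (hypothetical entries).
HYPERNYMS = {
    "Anwalt": ["Jurist"],
    "Jurist": ["Mensch"],
    "Stadt": ["Ort"],
}
# Toy stand-ins for the first-name list.
MALE_NAMES = {"Bill", "Hans"}
FEMALE_NAMES = {"Hillary", "Anna"}


def is_animate(head_noun):
    """True if head_noun is a (transitive) hyponym of 'Mensch'."""
    seen, stack = set(), [head_noun]
    while stack:
        word = stack.pop()
        if word in seen:
            continue
        seen.add(word)
        for hypernym in HYPERNYMS.get(word, []):
            if hypernym == "Mensch":
                return True
            stack.append(hypernym)
    return False


def person_gender(name):
    """Look up the gender of a person entity by its first name."""
    parts = name.split()
    if parts and parts[0] in MALE_NAMES:
        return "MASC"
    if parts and parts[0] in FEMALE_NAMES:
        return "FEM"
    return "*"  # underspecified, as in table 5.2


print(is_animate("Anwalt"), is_animate("Stadt"))
print(person_gender("Bill Clinton"))
```

Names not covered by the list fall back to the underspecified value rather than a guess.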
Having outlined the data and our markable extraction procedure, we next discuss the implementation of the coreference models.
7