5 Implementation of the Main Design Tasks
5.2 Dimensional Decision-Making via Production Rules
Named entity recognition (NER) provides information that is particularly relevant for NP parsing, as the entities suggest a likely syntactic structure. For example, knowing that Air Force is an entity tells us that Air Force contract is a left-branching NP. The evaluation of the annotation tool’s suggestion feature in Section 3.5 also demonstrated the usefulness of NER-based features. The BBN SIFT parser (Miller et al., 1998) is an existing system that makes use of NER information. There is also more recent work combining NER and parsing in the biomedical field. Lewin (2007) experiments with detecting base-NPs using NER information, while Buyko, Tomanek, and Hahn (2007) use a Conditional Random Field (CRF) to identify coordinate structure in biological named entities.
We draw NE tags from the BBN Pronoun Coreference and Entity Type Corpus (Weischedel and Brunstein, 2005), which annotates 28 different entity types. These include the standard person, location and organisation classes, as well as person descriptions (generally occupations), NORP (National, Other, Religious or Political groups), and works of art. Some classes also have finer-grained subtypes, although we use only the coarse tags in our experiments.
We have implemented a number of novel NER features in the C&C parser, most of which are generalisations of existing features that use head words and/or their POS tags. The original model used 489,196 features and this figure rises to 540,898 when our NER features are included. This is an increase of 51,702 or 11%.
Lexical
Our first addition is a lexical feature describing the NE tag of each token in the sentence together with its lexical category. The same feature already exists in the model for the word and its POS tag. This new feature, and all others that we describe here, are not active when the NE tag(s) are O, as there is no NER information from tokens that are not entities.
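As a rough illustration of the lexical feature, the following minimal sketch pairs each token's NE tag with its lexical category and skips tokens tagged O. The `Token` type, the feature string format and the lexical categories in the example are assumptions made for illustration; they are not taken from the C&C parser.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str       # POS tag
    ne: str        # coarse NE tag; "O" marks a non-entity token
    category: str  # CCG lexical category (illustrative values below)


def lexical_ne_features(tokens: List[Token]) -> List[str]:
    """Pair each token's NE tag with its lexical category, skipping O tags."""
    return [f"{t.ne} + {t.category}" for t in tokens if t.ne != "O"]


# Hypothetical analysis of "Air Force contract"; the categories are placeholders.
sentence = [
    Token("Air", "NNP", "ORG", "N/N"),
    Token("Force", "NNP", "ORG", "N"),
    Token("contract", "NN", "O", "N"),
]
features = lexical_ne_features(sentence)
```

Only the two ORG tokens contribute a feature here; contract is tagged O and so produces none.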
Local Tree
The next group of features is based on the local tree (a parent and two child nodes) formed by every grammar rule application. We add a feature where the rule being applied is combined with
114 Chapter 6: Parsing with CCG
the head of the parent node’s NE tag. For example, when joining two constituents¹: ⟨five, CD, CARD, N/N⟩ and ⟨builders, NNS, PER DESC, N⟩, the feature would be:
N → N/N N + PER DESC
as the head of the constituent is builders. This feature is based on the following pre-existing features that use the head word and its POS tag instead of the NE tag:
N → N/N N + builders
N → N/N N + NNS
The local tree feature type also combines the grammar rule with the child nodes. There are already features in the model describing each combination of the children’s head words and POS tags, which we extend to include combinations with the NE tags. Using the same example as above, the pre-existing features would be:
N → N/N N + five + builders
N → N/N N + five + NNS
N → N/N N + CD + builders
N → N/N N + CD + NNS
to which we add five new features:
N → N/N N + five + PER DESC
N → N/N N + CARD + builders
N → N/N N + CD + PER DESC
N → N/N N + CARD + NNS
N → N/N N + CARD + PER DESC
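The way the child features extend the existing ones can be sketched as a cross product over three views of each child's head (word, POS tag, NE tag). This is a minimal sketch under assumptions: the `Node` type, function names and string format are hypothetical, and the deactivation of features for O tags is left out for brevity.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    head: str      # head word
    pos: str       # POS tag of the head
    ne: str        # NE tag of the head
    category: str  # category of the node


def parent_ne_feature(rule: str, parent: Node) -> str:
    """Combine the rule with the NE tag of the parent's head."""
    return f"{rule} + {parent.ne}"


def child_features(rule: str, left: Node, right: Node) -> Tuple[List[str], List[str]]:
    """Return (pre-existing, NE-based) child features for one rule application."""
    # Each view is (value, uses_ne_tag).
    left_views = [(left.head, False), (left.pos, False), (left.ne, True)]
    right_views = [(right.head, False), (right.pos, False), (right.ne, True)]
    existing, new = [], []
    for (l, l_ne), (r, r_ne) in product(left_views, right_views):
        feature = f"{rule} + {l} + {r}"
        (new if l_ne or r_ne else existing).append(feature)
    return existing, new


RULE = "N → N/N N"
five = Node("five", "CD", "CARD", "N/N")
builders = Node("builders", "NNS", "PER DESC", "N")
existing, new = child_features(RULE, five, builders)
```

Of the nine combinations, the four that use only words and POS tags correspond to the pre-existing features, and the five that involve at least one NE tag are the additions.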
Entity Spanning
Another feature group is based on the NE spanning categories. We identify constituents that dominate tokens that all have the same NE tag, as these nodes will not cause a “crossing bracket” with the named entity. For example, the constituent Force contract, in the NP Air Force contract, spans two different NE tags, and should be penalised by the model. Air Force, on the other hand, only spans ORG tags, and should be preferred accordingly. We also take into account whether the constituent spans the entire named entity. Combining these nodes with others of different NE tags should not be penalised by the model, as the NE must combine with the rest of the sentence at some point. This feature group has no equivalent in the pre-existing C&C model.
¹ These 4-tuples are the node’s head and the head’s POS tag, NE tag and lexical category.
There are two NE spanning features for each grammar rule instantiation: one for the parent node and a second for the child nodes. These features also describe whether or not the nodes span entire entities. There are two possibilities for the parent node (it either does or does not) and four possibilities for the child nodes, depending on whether neither, the left, the right or both nodes span the entire NE.
Consider the example NP: Royal Air Force contract. When Air and Force combine, the parent feature would be:
N → N/N N + PARTIAL + ORG
and when Royal and Air Force subsequently combine, the parent feature would be:
N → N/N N + ENTIRE + ORG
We expect that both of these features would be assigned positive weights by the model, with the latter given the greater magnitude of the two.
On the other hand, if Air Force and contract were incorrectly joined, then the parent feature would be:
N → N/N N + PARTIAL + X
where X indicates that disparate NE tags have been combined. Thus, no X constituents can span an entire entity. The presence of X means that this latter feature should have a negative weight, resulting in an accordingly lower probability for the constituent. Note that an X would also be in the parent feature when Royal Air Force and contract are combined, even though this is the correct structure. This will be moderated by the child features as described below.
For the child features, joining Royal and Air Force would result in:
N → N/N N + NEITHER + ORG + ORG
as neither child spans the entire entity. This feature should still receive a positive weight. When Royal Air Force and contract form the entire NP node, then (assuming that there are more O tags to the right) the child feature will be:
N → N/N N + LEFT + ORG + O
The left child does span the entire entity in this case, and so this feature should be given a sufficiently high weight to overcome the negative weight that may be assigned by the parent feature with its X. Incorrect constituents will still have low probability, such as Air combined with Force contract, which produces the following feature:
N → N/N N + NEITHER + ORG + X
because neither node spans an entire entity and X is present.
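The spanning features for the Royal Air Force contract example can be reproduced with a small sketch. It rests on several assumptions that are not stated in the text: an entity is approximated as a maximal run of identical NE tags (which also treats a run of O tags like an entity), constituents are given as token offsets, and the names `RIGHT` and `BOTH` for the remaining two child cases are invented here for illustration.

```python
from typing import List, Tuple


def span_label(tags: List[str], start: int, end: int) -> Tuple[str, bool]:
    """Return (NE label, spans_entire_entity) for the tokens tags[start:end].

    Assumption: an entity is a maximal run of identical tags.
    """
    if len(set(tags[start:end])) > 1:
        return "X", False  # disparate tags never span an entire entity
    tag = tags[start]
    at_left_edge = start == 0 or tags[start - 1] != tag
    at_right_edge = end == len(tags) or tags[end] != tag
    return tag, at_left_edge and at_right_edge


def parent_feature(rule: str, tags: List[str], start: int, end: int) -> str:
    label, entire = span_label(tags, start, end)
    return f"{rule} + {'ENTIRE' if entire else 'PARTIAL'} + {label}"


def child_feature(rule: str, tags: List[str], start: int, split: int, end: int) -> str:
    left_label, left_entire = span_label(tags, start, split)
    right_label, right_entire = span_label(tags, split, end)
    which = {
        (False, False): "NEITHER",
        (True, False): "LEFT",
        (False, True): "RIGHT",  # hypothetical name
        (True, True): "BOTH",    # hypothetical name
    }[(left_entire, right_entire)]
    return f"{rule} + {which} + {left_label} + {right_label}"


RULE = "N → N/N N"
# "Royal Air Force contract was signed": the trailing O tokens are made up
# so that contract alone does not cover the whole run of O tags.
tags = ["ORG", "ORG", "ORG", "O", "O", "O"]

air_force = parent_feature(RULE, tags, 1, 3)
royal_air_force = parent_feature(RULE, tags, 0, 3)
air_force_contract = parent_feature(RULE, tags, 1, 4)
royal__air_force = child_feature(RULE, tags, 0, 1, 3)
raf__contract = child_feature(RULE, tags, 0, 3, 4)
air__force_contract = child_feature(RULE, tags, 1, 2, 4)
```

Under these assumptions the six values match the parent and child features given for this example, including the X label on the incorrect bracketings.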
Entity Spanning without Rules
It may be argued that the spanning feature group does not need to describe the grammar rule being applied, as only the NE tags are relevant. We experimented with this by adding back-off features, namely the spanning categories without rules. These features work in the same way as the previous feature group, only without conditioning on the grammar rule. For example, the parent feature for Air Force would be:
ENTIRE + ORG
and the child feature combining Air Force and contract would be:
LEFT + ORG + O
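One simple way to picture the back-off is to represent a feature as a tuple whose first element is the grammar rule, so that the back-off is the same tuple with the rule dropped. The representation and the name `with_backoff` are assumptions for illustration only.

```python
from typing import Tuple

Feature = Tuple[str, ...]


def with_backoff(rule: str, *parts: str) -> Tuple[Feature, Feature]:
    """Return the rule-conditioned feature and its rule-free back-off."""
    return (rule, *parts), tuple(parts)


full, backoff = with_backoff("N → N/N N", "LEFT", "ORG", "O")
```

Because the back-off ignores the rule, it is shared by every rule application with the same spanning pattern.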
Distance
The final feature group is based on the distance features that the parser currently uses. They give a measure of the distance between the heads of the constituents being combined. Individual features exist that count the number of words, verbs and punctuation marks in the intervening tokens. These features already include the head word of the combined constituent and generalise to its POS tag. We add another generalisation to the NER tag.
For example, if Air were combining with Force contract, then there would be one word between the heads (Force), and the following word distance features would be created:
N → N/N N + contract + 1
N → N/N N + NN + 1
Our additional NER-based feature would be:
N → N/N N + O + 1
The same is done analogously for the verb and punctuation distance features.
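The distance features can be sketched as follows. This is a minimal sketch under assumptions: verbs are identified by a POS tag beginning with VB, punctuation by a small hand-picked set of POS tags, the word count includes every intervening token, and the raw counts are used without any binning. None of these details are confirmed by the text, and all names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    ne: str


PUNCT_POS = {",", ".", ":", "``", "''"}  # assumed punctuation POS tags


def distance_features(
    rule: str, tokens: List[Token], left_head: int, right_head: int, parent_head: int
) -> Dict[str, List[str]]:
    """Build word, verb and punctuation distance features for one rule application.

    left_head and right_head are the token indices of the children's heads;
    parent_head is the index of the head of the combined constituent.
    """
    between = tokens[left_head + 1:right_head]
    counts = {
        "words": len(between),
        "verbs": sum(t.pos.startswith("VB") for t in between),
        "punct": sum(t.pos in PUNCT_POS for t in between),
    }
    head = tokens[parent_head]
    # Three views of the head: word, POS tag, and the added NE tag.
    return {
        kind: [f"{rule} + {view} + {n}" for view in (head.word, head.pos, head.ne)]
        for kind, n in counts.items()
    }


tokens = [Token("Air", "NNP", "ORG"), Token("Force", "NNP", "ORG"), Token("contract", "NN", "O")]
# Air (index 0) combining with Force contract, whose head is contract (index 2).
feats = distance_features("N → N/N N", tokens, 0, 2, 2)
```

Here `feats["words"]` holds the two existing generalisations (word and POS tag) followed by the NE-based one, and the verb and punctuation entries follow the same pattern with a count of zero.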