For the current task, we organize nouns into two semantic classes, one with a high and one with a low tendency to encode causation. In particular, we identify two classes of noun phrases, named Cnp and ¬Cnp, where the class Cnp (¬Cnp) contains the noun phrases with a high (low) tendency to encode causation. For example, the class Cnp consists of the noun phrases which normally represent events, conditions, states, phenomena, or processes and thus have a high tendency to encode causation. Similarly, the class ¬Cnp consists of the noun phrases that normally do not encode causation unless a metonymy is associated with them, e.g., any noun phrase representing a location (see examples (6), (7) and (8)).
We leverage the annotations of the FrameNet corpus to identify the two above-mentioned classes of noun phrases. For this purpose, we consider only those annotations in which the labeled element contains no verb and at least one noun. These annotations roughly represent instances of noun phrases. We manually examined the inventory of frame elements acquired from these annotations and assigned these frame elements to the classes Cnp and ¬Cnp, following the definitions of the two classes stated above in this section. For example, the frame element “Place” is assigned to the class ¬Cnp because it has the least tendency to encode causation unless a metonymy is associated with it. Table 5.3 shows some examples of the assignments of frame elements to the classes Cnp and ¬Cnp. Using FrameNet’s annotations, we acquired 936 distinct frame elements. We assigned 355 (524) of these frame elements to the class Cnp (¬Cnp). For the remaining 57 frame elements, we were not certain which of the two classes they belong to. We refer the reader to Appendix C for the full list of frame elements with their assignments to the classes Cnp and ¬Cnp, together with the 57 frame elements with no assignment. We use these assignments of frame elements to apply the labels Cnp and ¬Cnp to the annotations of the FrameNet corpus. Using the above method, we acquired 52,706 Cnp and 94,841 ¬Cnp instances. We use the term FNETnp to refer to the corpus of these instances. We employ the FNETnp corpus to build a supervised classifier to predict the labels Cnp and ¬Cnp. On getting input
This information is then provided to our model to make better predictions for the current task of identifying causality. The following are two examples of the classes Cnp and ¬Cnp from the FNETnp corpus:
9. Each year, we help thousands of people who face tremendous obstacles (Cnp).
10. And there are a lot of people who face these challenges every day of their lives (¬Cnp).
In FrameNet, the frame element “Issue” is assigned to the expression “tremendous obstacles” in example (9), and the frame element “Frequency” is assigned to the expression “every day of their lives” in example (10). Using Table 5.3, we assign the classes Cnp and ¬Cnp to examples (9) and (10), respectively.
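The labeling procedure described above can be illustrated with a minimal sketch. It is not the original implementation: the function names are hypothetical, the mapping contains only the three frame elements mentioned in the text (the full assignments are in Table 5.3 and Appendix C), and Penn Treebank part-of-speech tags are an assumption made for the verb and noun checks.

```python
from typing import List, Optional

# Excerpt of the frame-element assignments; only these three entries are
# taken from the text. The full inventory is in Table 5.3 / Appendix C.
FRAME_ELEMENT_CLASS = {
    "Issue": "Cnp",
    "Place": "¬Cnp",
    "Frequency": "¬Cnp",
}


def is_noun_phrase_like(pos_tags: List[str]) -> bool:
    """Keep an annotation only if it has no verb and at least one noun.

    Assumption: Penn Treebank tags, where verbs start with "VB" and
    nouns start with "NN".
    """
    has_verb = any(tag.startswith("VB") for tag in pos_tags)
    has_noun = any(tag.startswith("NN") for tag in pos_tags)
    return has_noun and not has_verb


def label_annotation(frame_element: str, pos_tags: List[str]) -> Optional[str]:
    """Return "Cnp", "¬Cnp", or None when the annotation is not usable."""
    if not is_noun_phrase_like(pos_tags):
        return None  # annotation does not look like a noun phrase
    # Frame elements without an assignment yield None as well.
    return FRAME_ELEMENT_CLASS.get(frame_element)


# Example (9): "tremendous obstacles" annotated with "Issue".
print(label_annotation("Issue", ["JJ", "NNS"]))
# Example (10): "every day of their lives" annotated with "Frequency".
print(label_annotation("Frequency", ["DT", "NN", "IN", "PRP$", "NNS"]))
```

Returning None for unassigned frame elements mirrors the treatment of the 57 frame elements left without a class.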
Using the instances of the FNETnp corpus, we build a supervised classifier for the classes Cnp and ¬Cnp. In addition to the above-mentioned corpus obtained from FrameNet, we also employed WordNet to extract more training instances of the classes Cnp and ¬Cnp. For this purpose, we followed an approach similar to that of Girju and Moldovan (2002) and adopted the WordNet senses shown in Table 5.4. For example, following Table 5.4, we assigned the label ¬Cnp to any noun concept all of whose senses in WordNet lie in the semantic hierarchy rooted at the sense {time period, period of time, period}. Note that with this approach we consider only a relatively unambiguous noun concept, i.e., one with all its senses lying in the hierarchy {time period, period of time, period}. Following this scheme, we extracted the instances of WordNet noun concepts from the English Gigaword corpus and assigned the labels Cnp and ¬Cnp to these instances using the assignments of senses from Table 5.4. Girju and Moldovan (2002) used a similar scheme to rank noun phrases according to their tendency to encode causation. In contrast, we use WordNet’s senses to increase the size of the training corpus FNETnp and, in addition, build an automatic classifier on this corpus to predict the semantic classes of noun phrases. We use the term FNET-WNETnp to refer to the training corpus with instances of noun phrases acquired using FrameNet and WordNet. The FNET-WNETnp training corpus contains 280,212 instances of noun phrases (50% belonging to each of the Cnp and ¬Cnp classes). After removing the instances of the FNETnp corpus from the FNET-WNETnp corpus, we are left with 87,400 Cnp and 45,265 ¬Cnp instances of noun phrases. We use the term WNETnp to refer to the training corpus with these instances. We evaluate our model by providing it with the semantic classes of noun phrases acquired using both training corpora, i.e., WNETnp and FNET-WNETnp. Note that the WNETnp training corpus contains instances of relatively unambiguous noun phrases, whereas the FNET-WNETnp training corpus contains instances of both ambiguous and unambiguous noun phrases.
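The WordNet-based labeling rule (a noun receives a label only when all of its senses fall under one hierarchy of Table 5.4) can be sketched as follows. This is an illustration under stated assumptions rather than the original code: the hypernym links and sense inventory below are toy data invented for the example, not real WordNet content, each synset of Table 5.4 is abbreviated to its first lemma, and all names are hypothetical.

```python
# Root senses per class, abbreviated from Table 5.4 (first lemma of each synset).
CNP_ROOTS = {"act", "phenomenon", "state", "psychological feature",
             "event", "causal agent"}
NOT_CNP_ROOTS = {"time period", "measure", "group", "organization",
                 "time unit", "clock time"}

# Hypothetical toy hypernym links (child sense -> parent sense).
TOY_HYPERNYM = {
    "week": "time period",
    "earthquake": "geological phenomenon",
    "geological phenomenon": "phenomenon",
    "bank.institution": "organization",
    "bank.slope": "geological formation",
}

# Hypothetical sense inventory: noun -> all of its senses.
TOY_SENSES = {
    "week": ["week"],
    "earthquake": ["earthquake"],
    "bank": ["bank.institution", "bank.slope"],
}


def ancestors(sense):
    """Return the sense together with every sense above it in the toy hierarchy."""
    chain = [sense]
    while chain[-1] in TOY_HYPERNYM:
        chain.append(TOY_HYPERNYM[chain[-1]])
    return set(chain)


def all_senses_under_one_root(chains, roots):
    """True if some single root dominates every sense of the noun."""
    return any(all(root in chain for chain in chains) for root in roots)


def label_noun(noun):
    """Return "Cnp", "¬Cnp", or None for ambiguous or unknown nouns."""
    chains = [ancestors(s) for s in TOY_SENSES.get(noun, [])]
    if not chains:
        return None
    if all_senses_under_one_root(chains, CNP_ROOTS):
        return "Cnp"
    if all_senses_under_one_root(chains, NOT_CNP_ROOTS):
        return "¬Cnp"
    return None  # senses spread over several hierarchies: skip the noun


for noun in ("week", "earthquake", "bank"):
    print(noun, label_noun(noun))
```

In this toy data "bank" receives no label because only one of its two senses lies under a listed hierarchy, which is the sense in which the WordNet-derived instances are relatively unambiguous.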
Label | WordNet Senses of Nouns
Cnp   | {act, deed, human action, human activity}, {phenomenon}, {state}, {psychological feature}, {event}, {causal agent, cause, causal agency}
¬Cnp  | {time period, period of time, period}, {measure, quantity, amount}, {group, grouping}, {organization, organisation}, {time unit, unit of time}, {clock time, time}
Table 5.4: The assignments of WordNet’s senses of nouns to the classes Cnp and ¬Cnp.
In order to build the supervised classifier, we employ the following list of features:
first two (three) (four) letters of the head noun of the noun phrase, last two (three) (four) letters of the head noun of the noun phrase.
• Syntactic Features: part-of-speech tags of all words of the noun phrase and of the head noun of the noun phrase.
• Semantic Features: the most frequent sense of the head noun of the noun phrase.
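The features listed above can be sketched as a small extraction function. This is a minimal illustration, not the original feature extractor: the function and feature names are hypothetical, the head noun is assumed to be the last noun-tagged token (a simplifying heuristic), Penn Treebank tags are assumed, and the frequent sense is passed in by the caller rather than looked up in WordNet.

```python
def extract_features(tokens, pos_tags, frequent_sense=None):
    """Build a feature dictionary for one noun phrase.

    Assumptions: the head noun is the last token whose tag starts with
    "NN" (falling back to the last token), and `frequent_sense` is a
    sense identifier supplied by the caller.
    """
    if not tokens or len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must be non-empty and aligned")

    noun_positions = [i for i, tag in enumerate(pos_tags) if tag.startswith("NN")]
    head_index = noun_positions[-1] if noun_positions else len(tokens) - 1
    head = tokens[head_index].lower()

    features = {}
    # First and last two, three and four letters of the head noun.
    for n in (2, 3, 4):
        features[f"prefix{n}"] = head[:n]
        features[f"suffix{n}"] = head[-n:]
    # Syntactic features: tags of all words and of the head noun.
    features["pos_sequence"] = " ".join(pos_tags)
    features["head_pos"] = pos_tags[head_index]
    # Semantic feature: frequent sense of the head noun, if available.
    features["head_sense"] = frequent_sense
    return features


print(extract_features(["tremendous", "obstacles"], ["JJ", "NNS"]))
```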
We employ both NB and MaxEnt supervised classification algorithms to predict the semantic classes of noun phrases (i.e., the classes Cnp and ¬Cnp).
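To make the classification step concrete, the following is a small Naive Bayes sketch over categorical features with add-one smoothing. It is only an illustration: the class name and the training examples are invented for this example, it does not reproduce the original experimental setup, and the MaxEnt classifier is not shown.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (name, value) -> count
        self.values = defaultdict(set)              # name -> observed values

    def fit(self, feature_dicts, labels):
        for feats, label in zip(feature_dicts, labels):
            self.label_counts[label] += 1
            for name, value in feats.items():
                self.feature_counts[label][(name, value)] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in feats.items():
                seen = self.feature_counts[label][(name, value)]
                # One extra slot reserves probability mass for unseen values.
                vocab = len(self.values[name]) + 1
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, not drawn from FNETnp or WNETnp.
train_x = [
    {"suffix3": "les", "head_pos": "NNS"},
    {"suffix3": "ion", "head_pos": "NN"},
    {"suffix3": "day", "head_pos": "NN"},
    {"suffix3": "ear", "head_pos": "NN"},
]
train_y = ["Cnp", "Cnp", "¬Cnp", "¬Cnp"]

model = TinyNaiveBayes().fit(train_x, train_y)
print(model.predict({"suffix3": "ion", "head_pos": "NN"}))
```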
In addition to using the supervised classifier for the classes Cnp and ¬Cnp, we also apply a named entity recognizer [Finkel et al. 2005] to identify seven types of named entities (i.e., LOCATION, PERSON, ORGANIZATION, DATE, TIME, MONEY, PERCENT). Given a verb-noun phrase instance, if the noun phrase is identified as a named entity, then we assume that it belongs to the class ¬Cnp unless a metonymy is associated with it. In the next section, we introduce our method to determine the association of metonymies with noun phrases.
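The named-entity rule can be expressed as a short decision function. This is a hedged sketch: the function name and arguments are hypothetical, the named-entity type and the metonymy flag are assumed to be supplied by upstream components, and falling back to the classifier's label when a metonymy is present is an assumption of this example, since the text specifies only the ¬Cnp case.

```python
NAMED_ENTITY_TYPES = {
    "LOCATION", "PERSON", "ORGANIZATION", "DATE", "TIME", "MONEY", "PERCENT",
}


def semantic_class(classifier_label, entity_type=None, has_metonymy=False):
    """Combine the classifier's label with the named-entity rule.

    A noun phrase recognized as a named entity is treated as ¬Cnp unless
    a metonymy is associated with it. Assumption: in every other case the
    supervised classifier's label is returned unchanged.
    """
    if entity_type in NAMED_ENTITY_TYPES and not has_metonymy:
        return "¬Cnp"
    return classifier_label


print(semantic_class("Cnp", entity_type="LOCATION"))
print(semantic_class("Cnp", entity_type="LOCATION", has_metonymy=True))
```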