I use five types of features in the classification experiments in Section 8.6: bag-of-word features, clueset features, clue synset features, DSSE word features, and DSSE wordnet features. The bag-of-word features are straightforward. For a given attribution level, bag-of-words is just the words in the text for that attribution level. I describe each of the remaining types of features below.
8.5.1 Clueset Features
There are four clueset features defined for each attitude type that is classified. The value of a clueset feature is the number of instances of clues from the clueset that appear within the text of the attribution level. The cluesets are defined based on reliability class, which comes from the subjectivity lexicon, and attitude class, which comes either from one of the expression-level classifiers (Section 8.4) or from the lexicon.
For sentiment recognition, the clueset features are the following:
strongsubj:sentiment-yes
strongsubj:sentiment-no
weaksubj:sentiment-yes
weaksubj:sentiment-no
The reliability class (strongsubj or weaksubj) for a clue instance comes from the clue’s entry in the lexicon. Whether a clue instance is sentiment-yes or sentiment-no comes either from the lexicon or from the neutral-polar expression classifier. If information from the lexicon is used, clues with a prior polarity of neutral are sentiment-no; all others are sentiment-yes. If the neutral-polar expression classifier is used as the source of the sentiment information, then only those clue instances identified as polar by the expression-level classifier are categorized as sentiment-yes.
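The counting described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the `ClueInstance` record, its field names, and the function `sentiment_clueset_features` are hypothetical, and locating clue instances in the text is assumed to have been done already.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClueInstance:
    # Hypothetical record for one clue occurrence in an attribution level.
    reliability: str        # "strongsubj" or "weaksubj", from the lexicon
    prior_polarity: str     # "positive", "negative", "both", or "neutral", from the lexicon
    classified_polar: Optional[bool] = None  # neutral-polar classifier output, if available


def sentiment_clueset_features(instances, use_classifier=False):
    """Count clue instances in each of the four sentiment cluesets."""
    # Start every clueset at zero so that all four features are always present.
    counts = Counter({
        f"{rel}:sentiment-{yn}": 0
        for rel in ("strongsubj", "weaksubj")
        for yn in ("yes", "no")
    })
    for inst in instances:
        if use_classifier:
            # Only instances the classifier marked as polar are sentiment-yes.
            is_sentiment = bool(inst.classified_polar)
        else:
            # Lexicon source: neutral prior polarity means sentiment-no.
            is_sentiment = inst.prior_polarity != "neutral"
        counts[f"{inst.reliability}:sentiment-{'yes' if is_sentiment else 'no'}"] += 1
    return dict(counts)
```

The reliability class is read from the lexicon in both modes; only the source of the sentiment-yes/sentiment-no decision changes with `use_classifier`.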
For positive-sentiment classification, the clueset features are:
strongsubj:pos-sentiment-yes
strongsubj:pos-sentiment-no
weaksubj:pos-sentiment-yes
weaksubj:pos-sentiment-no
The reliability class for a clue instance, as always, comes from the clue’s entry in the lexicon. Whether a clue instance is pos-sentiment-yes or pos-sentiment-no comes either from the lexicon or from the expression-level polarity classifier. If information from the lexicon is used, then a clue instance is pos-sentiment-yes if it has a positive or both prior polarity; otherwise, it is pos-sentiment-no. If the expression-level polarity classifier is used as the source of polarity information, then a clue instance is pos-sentiment-yes only if it is classified as positive or both by the expression-level classifier. The clueset features for negative-sentiment classification are defined in a similar way.
For arguing recognition, the clueset features are:
strongsubj:arguing-yes
strongsubj:arguing-no
weaksubj:arguing-yes
weaksubj:arguing-no
Both the reliability class for a clue instance and whether a clue instance is arguing-yes or arguing-no are taken from the lexicon. Clues listed in the lexicon with a positive or negative prior arguing polarity are arguing-yes, and all others are arguing-no.
Although there is no expression-level arguing classifier for disambiguating the contextual arguing polarity of each clue instance, a possible way to improve the quality of the clueset features for arguing is to constrain the set of clue instances used to only those identified as subjective by the subjective-expression classifier. This is the approach used to disambiguate the clue instances for the various arguing classification experiments.
For positive-arguing classification, the clueset features are:
strongsubj:pos-arguing-yes
strongsubj:pos-arguing-no
weaksubj:pos-arguing-yes
weaksubj:pos-arguing-no
The clueset features are similar for negative arguing. As with the more general arguing classification, reliability class information is obtained from the lexicon, although the set of clue instances considered is constrained by the subjective-expression classifier as described in the paragraph above. Information about a clue instance’s arguing polarity is also obtained from the lexicon, but it is combined with information about negation terms in the surrounding context.
Negation for arguing is not always the same as negation for sentiment. The expression-level polarity classifier looks in the preceding four words for a negation term that is not part of an intensifying phrase. Examples of phrases that intensify rather than negate are “not only” and “nothing if not.” For some types of arguing clues, the negation term comes before the clue, as in “not true” and “I don’t believe.” However, to negate modals such as “should” and “must,” which are often good arguing clues, the negation term follows the clue: “should not” or “must not.”
To incorporate negation when determining the arguing polarity for a clue instance, I do the following. If there is a negation term in the four words preceding the clue instance or in the two words following the instance, and if the negation term is not part of an intensifying phrase, then I assume the instance is being negated. If the instance is being negated and it has a positive arguing polarity in the lexicon, I count the instance as a negative-arguing clue. Similarly, if the instance is being negated and it has a negative arguing polarity in the lexicon, I count the instance as a positive-arguing clue.
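The negation rule above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: the contents of `NEGATION_TERMS` are an assumed, partial list, the intensifying phrases are only the two examples given in the text, tokens are assumed to be lowercased, and all function names are hypothetical.

```python
NEGATION_TERMS = {"not", "no", "never", "n't"}  # assumed, illustrative list
INTENSIFIER_PHRASES = [("not", "only"), ("nothing", "if", "not")]  # examples from the text


def _in_intensifier(tokens, idx):
    """Return True if the token at idx lies inside an intensifying phrase."""
    for phrase in INTENSIFIER_PHRASES:
        n = len(phrase)
        # Try every start position at which the phrase would cover idx.
        for start in range(max(0, idx - n + 1), idx + 1):
            if tuple(tokens[start:start + n]) == phrase:
                return True
    return False


def is_negated(tokens, clue_start, clue_end, before=4, after=2):
    """Look for a negation term in the four tokens before or the two tokens
    after the clue span [clue_start, clue_end)."""
    window = list(range(max(0, clue_start - before), clue_start))
    window += list(range(clue_end, min(len(tokens), clue_end + after)))
    return any(
        tokens[i] in NEGATION_TERMS and not _in_intensifier(tokens, i)
        for i in window
    )


def contextual_arguing_polarity(tokens, clue_start, clue_end, lexicon_polarity):
    """Flip positive/negative arguing polarity when the clue instance is negated."""
    if lexicon_polarity in ("positive", "negative") and is_negated(tokens, clue_start, clue_end):
        return "negative" if lexicon_polarity == "positive" else "positive"
    return lexicon_polarity
```

For example, with the tokens of "we should not go" and the clue "should" (span 1 to 2) listed as positive arguing, the following "not" flips the instance to negative arguing, whereas in "not only should we go" the "not" belongs to an intensifying phrase and the polarity is left unchanged.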
8.5.2 Clue Synset Features
A synset is a set of synonymous words and phrases. It is also the basic unit of organization in the WordNet (Fellbaum, 1998) lexical database. Example 8.2 below is a synset from WordNet with its gloss.
(8.2) good, right, ripe – (most suitable or right for a particular purpose; “a good time to plant tomatoes”; “the right time to act”; “the time is ripe for great sociological changes”)
The motivation for the clue synset features is that there may be useful groupings of clues, beyond those defined in the subjectivity lexicon, that a learning algorithm could exploit for attitude classification. WordNet synsets provide one way of grouping clues.
To define the clue synset features, the first step was to extract the synsets for every clue from WordNet 2.0 and add this information to the lexicon for each clue. Every synset in WordNet has a unique identifier; these identifiers are what was added to the lexicon. Then, the clue synset feature for a given attribution level is the union of the synsets of all the subjective clue instances that are found in the attribution level. The subjective clue instances are determined based on the output of the subjective-expression classifier.
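A minimal sketch of the union described above, under stated assumptions: the `CLUE_SYNSETS` mapping stands in for the lexicon after synset identifiers have been added, its identifiers are made-up placeholders rather than real WordNet 2.0 identifiers, and the `is_subjective` flags stand in for the output of the subjective-expression classifier.

```python
# Hypothetical lexicon fragment: clue -> set of synset identifiers.
# The identifiers are placeholders, not real WordNet 2.0 identifiers.
CLUE_SYNSETS = {
    "good": {"syn-001", "syn-002"},
    "right": {"syn-002", "syn-003"},
    "terrible": {"syn-004"},
}


def clue_synset_feature(clue_instances, is_subjective):
    """Union of synset identifiers over the clue instances judged subjective.

    clue_instances: clue strings found in the attribution level.
    is_subjective:  parallel booleans, one per clue instance.
    """
    feature = set()
    for clue, subjective in zip(clue_instances, is_subjective):
        if subjective:
            feature |= CLUE_SYNSETS.get(clue, set())
    return feature
```

Because the feature is a set union, a synset shared by several clues (here the placeholder `syn-002`) contributes only once, regardless of how many subjective instances map to it.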
8.5.3 DSSE Features
The motivation for the DSSE features is that DSSEs, being at the root of the attribution level, might be particularly important when it comes to recognizing attitude type. There are two types of features based on DSSEs: the DSSE word features and the DSSE wordnet features.
The DSSE word feature for an attribution level is just the set of words in the DSSE phrase. If there is no DSSE phrase because the DSSE is implicit, then the value for this feature is a special implicit token.
There are two DSSE wordnet features: DSSE synsets and DSSE hypernyms. The DSSE synsets feature is the union of the WordNet synsets for all the words in the DSSE phrase, with the exception of the words in the following stoplist:
is, am, are, be, been, will, had, has, have, having, do, does
Hypernymy is one type of semantic relation defined between synsets in WordNet. The hypernyms for a word are thus the synsets that are parents of the synsets to which the word belongs. A hypernym may be the direct parent synset or an ancestor synset further up the tree. Example 8.3 below gives all the hypernyms for the noun synset: good, goodness (moral excellence or admirableness).
(8.3)
• morality (concern with the distinction between good and evil or right and wrong; right or good conduct)
• quality (an essential and distinguishing attribute of something or someone)
• attribute (an abstraction belonging to or characteristic of an entity)
• abstraction, abstract entity (a general concept formed by extracting common features from specific examples)
• entity (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
The DSSE hypernyms feature is the union of all the WordNet hypernyms for all the words in the DSSE phrase, with the exception of the words in the above stoplist. If there is no DSSE phrase because the DSSE is implicit, then the value for both the DSSE synsets and DSSE hypernyms features is a special implicit token.
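The two DSSE wordnet features can be sketched as follows. This is an illustration only: the toy dictionaries stand in for WordNet (their synset names are invented labels modeled on Example 8.3), the name `IMPLICIT_TOKEN` and its value are hypothetical, and the DSSE phrase is assumed to arrive as a list of lowercased words.

```python
IMPLICIT_TOKEN = "<IMPLICIT>"  # hypothetical name for the special implicit token
STOPLIST = {"is", "am", "are", "be", "been", "will", "had", "has",
            "have", "having", "do", "does"}

# Toy stand-in for WordNet; synset labels and links are illustrative only.
WORD_TO_SYNSETS = {"goodness": {"good.n"}, "quality": {"quality.n"}}
DIRECT_HYPERNYMS = {
    "good.n": {"morality.n"},
    "morality.n": {"quality.n"},
    "quality.n": {"attribute.n"},
    "attribute.n": {"abstraction.n"},
    "abstraction.n": {"entity.n"},
}


def all_hypernyms(synset):
    """Collect every ancestor of a synset by following parent links upward."""
    seen, stack = set(), [synset]
    while stack:
        for parent in DIRECT_HYPERNYMS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def dsse_wordnet_features(dsse_words):
    """Return (DSSE synsets, DSSE hypernyms) for the words of a DSSE phrase."""
    if not dsse_words:
        # Implicit DSSE: both features take the special implicit token.
        return {IMPLICIT_TOKEN}, {IMPLICIT_TOKEN}
    synsets = set()
    for word in dsse_words:
        if word not in STOPLIST:
            synsets |= WORD_TO_SYNSETS.get(word, set())
    hypernyms = set()
    for synset in synsets:
        hypernyms |= all_hypernyms(synset)
    return synsets, hypernyms
```

The hypernyms are collected transitively, so both direct parents and ancestors further up the tree are included, mirroring the chain listed in Example 8.3.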