2.3. Structurally informed sentiment analysis
This section presents related work in sentiment analysis that integrates structural information into the processing. Automatic sentiment analysis covers a broad range of tasks and approaches, including sentiment polarity classification, target identification, opinion holder identification, sentiment summarization, and emotion classification. Many different methods have been applied, and the integration of context information has been approached in many different ways.
From the various possible types of context (e.g., sentence structure, discourse information, domain knowledge), this section focuses on structural linguistic context in the form of parse tree information, as this is the type of structural context we address in this thesis. This section is not an exhaustive survey of works that use parse tree information. We concentrate mainly on polarity classification and mention some of the more prominent approaches where sentence structure plays an important role. For more information, please refer to the survey by Liu (2015).
2.3.1. Structure in rule-based approaches
Rule-based approaches to sentiment analysis are usually based on polarity clues from a sentiment dictionary (see Section 2.2.3). The sentiment of a text snippet is calculated by counting the polarity clues with their polarity: if there are more positive words, the overall sentiment is positive; if there are more negative words, the sentiment is negative. This term counting or lexicon-based approach was first introduced by Pang et al. (2002) and has been widely used as a baseline in sentiment analysis since. Approaches that include structural information use it either to calculate contextual polarity for every sentiment expression found or to calculate sentence polarity by propagating the polarity from the leaves to the root through a parse tree.
Moilanen and Pulman (2007) present a propagation approach to calculate sentence polarity with the help of manually defined compositional patterns on top of a constituency parse. They use a sentiment dictionary where each entry has one of the tags non-polar/neutral, negative, positive, or polarity reverser. Patterns are used to propagate sentiment through the parse tree, so that each phrase is assigned a sentiment polarity. The patterns depend on POS tags and include rules for polarity reversers.
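The term counting baseline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the toy lexicon, the whitespace tokenization, and the decision to return "neutral" on a tie are all hypothetical choices.

```python
# Hypothetical toy lexicon; a real system would load a sentiment dictionary.
POLARITY_CLUES = {
    "good": 1, "great": 1, "nice": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def term_counting_polarity(text, lexicon=POLARITY_CLUES):
    """Classify a text snippet by counting positive and negative clues."""
    # Naive tokenization (assumption): lowercase, split on whitespace,
    # strip surrounding punctuation.
    tokens = [tok.strip(".,!?;:") for tok in text.lower().split()]
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # tie handling is an assumption of this sketch


print(term_counting_polarity("A great camera with a nice lens."))
```

Because every clue is counted in isolation, such a baseline cannot account for negation or other contextual effects, which is what the structural approaches below try to address.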
Shaikh et al. (2007) present detailed rules to determine sentiment on sentence level that work on top of what they call a “semantic parse”, i.e., the output of a dependency parser that yields triplets of subject-verb-object according to each semantic verb frame of the input sentence. Calculation rules take into account POS tags, polarity information, negation and intensifiers/diminishers.
One approach to calculate contextual polarity for individual sentiment expressions is presented by Choi and Cardie (2008). They use hand-written rules motivated by compositional semantics that work on syntactic patterns on top of a constituency parse, e.g., the pattern “VP NP” for “destroyed the terrorism”. A rule specifies how the parts of the pattern are to be combined in terms of their polarities. In the above example, “destroyed” is a negator, so it flips the negative polarity of “terrorism” and the phrase receives positive sentiment. The proposed rules deal with negation and conflicting polarities.
Similarly, Liu and Seneff (2009) present an approach that uses what they call a “parse-and-paraphrase” paradigm. All sentences are parsed by a lexicalized context-free grammar, which results in what they call “frames that encode semantic dependencies”. These parses are used to extract adjective-noun pairs, which are then assigned a polarity and used for the final task, sentiment polarity classification of phrases.
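The compositional idea shared by these rule-based approaches can be illustrated with a small sketch. It is a simplification under stated assumptions and not the rule set of any cited work: the toy lexicon, the tuple-based tree encoding, and the rule that the last polar child determines the polarity of a phrase are all hypothetical.

```python
# Hypothetical toy lexicon with the tags positive, negative and reverser.
LEXICON = {
    "destroyed": "reverser",
    "not": "reverser",
    "terrorism": "negative",
    "good": "positive",
}
FLIP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}


def propagate(node):
    """Return the polarity tag of a constituency-tree node.

    A node is either a word (str) or a tuple (label, child, child, ...).
    """
    if isinstance(node, str):
        return LEXICON.get(node.lower(), "neutral")
    _label, *children = node
    tags = [propagate(child) for child in children]
    if len(tags) == 1:
        # Pre-terminals and unary nodes pass their child's tag upwards,
        # so a reverser stays visible to the parent phrase.
        return tags[0]
    polarity = "neutral"
    for tag in tags:
        if tag in ("positive", "negative"):
            polarity = tag  # toy rule (assumption): the last polar child wins
    if tags.count("reverser") % 2 == 1:
        polarity = FLIP[polarity]  # an odd number of reversers flips polarity
    return polarity


tree = ("VP", ("VBD", "destroyed"), ("NP", ("DT", "the"), ("NN", "terrorism")))
print(propagate(tree))  # positive
```

The noun phrase is assigned negative polarity from “terrorism”, and the reverser “destroyed” flips it at the verb phrase, mirroring the “VP NP” example above.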
An approach that is not purely rule-based, but makes use of manually defined patterns is proposed by Kanayama et al. (2004). They view sentiment analysis as translation from text to “sentiment units” consisting of polarity and target. They use an existing machine translation system that works on complete dependency parses and performs top-down pattern matches on the tree structures. They replace the translation patterns with sentiment patterns and the bilingual lexicon with a sentiment polarity lexicon.
2.3.2. Structural features for machine learning
Machine learning for polarity classification can be approached the same way as other text classification tasks. Commonly used features include unigrams, bigrams, as well as polarity clues. In this subsection we look at how parse tree information has been included into feature sets. Approaches that are directly based on sentence structure are discussed in the following subsection.
Parse-tree information can be broken down into the individual relation between two words in a sentence. As one of the first to address the task of polarity classification with machine learning, Dave et al. (2003) attempt to determine the polarity of product reviews. They experiment with a wide range of features: WordNet, negation, POS tags, and parse features based on a dependency parser. Their parse features are triples of a word, the dependency relation and the head word, e.g., “nice(A):subj:camera(N)”. They report that including these features hurts their baseline of using unigrams only.
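To make the feature format concrete, the following sketch builds triples in the quoted “word(POS):relation:head(POS)” form. The toy parse is written by hand to mirror the quoted example and does not come from a parser. The token layout and the function name are assumptions made for illustration.

```python
from collections import Counter

# Hand-written toy parse (assumption). Each token is a tuple of
# (word, coarse POS tag, relation to head, index of head or None).
PARSE = [
    ("camera", "N", None, None),
    ("nice", "A", "subj", 0),
]


def triple_features(parse):
    """Count word(POS):relation:head(POS) triples for all attached tokens."""
    feats = Counter()
    for word, pos, rel, head in parse:
        if head is None:
            continue  # tokens without a head yield no triple
        head_word, head_pos, _, _ = parse[head]
        feats[f"{word}({pos}):{rel}:{head_word}({head_pos})"] += 1
    return feats


print(triple_features(PARSE))
```

Such counts would typically be added to a unigram feature vector. Since each triple is fully lexicalized, the resulting features are sparse.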
feedback items. His features include POS trigrams, constituency parse tree information in the form of patterns, grammatical roles, and “logical form” features such as transitivity of a predicate or tense information. In contrast to Dave et al. (2003), his results indicate that linguistic features are beneficial for classification.
Ng et al. (2006) incorporate dependency information into a polarity classifier for movie reviews. In contrast to Dave et al. (2003), they include only specific relations, namely all subject-verb, verb-object and adjective-noun tuples found in the parse tree, e.g., “(like, movie)”. Using these dependency-based features improves over unigrams, but not over higher-level n-grams (bigrams and trigrams). In their investigation they hypothesize that their stemming of the words may be problematic, as it does not allow them to distinguish between “he likes the movie” and “I like the movie”.
Based on the mixed results from previous work, Joshi and Penstein-Rosé (2009) investigate the role of dependency relations as features in more detail, although for the task of identifying opinionated sentences. They convert dependency relations into triples of relation-head-dependent and experiment with replacing the lemma of either the head, the dependent or both by the POS tag, e.g., “amod-NN-great”. They report the best results when replacing the head word, which they argue yields more generalizable patterns, as most of the time the sentiment words are modifiers and targets are heads.
As an alternative to general parse tree relations, features tailored to the specific task can be used. One prominent example is presented by Wilson et al. (2005), who perform subjectivity and polarity classification of individual sentiment expressions from news data. Wilson et al. (2009) investigate the effect of the different feature groups in more detail. They find negation to be the most important feature for polarity classification, but this feature is based on window context in their implementation. Next in importance are dependency-based features that model whether a subjectivity clue modifies the current word or the current word modifies a subjectivity clue. Less important are their other dependency features, which indicate whether a subject, copula or passive relation is found on the path from the current word to the root.
Besides individual relations, complete paths through the tree can be used as features, which is especially important if polarity classification is addressed in combination with the identification of target or opinion holder. J. Kessler and Nicolov (2009) work on the task of linking already identified sentiment expressions and their targets. They present an approach with features based on syntactic paths through the dependency tree which outperforms manually defined patterns.
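The back-off variants described above can be written down directly. The sketch below only illustrates the string format of the triples. The function name, the dictionary keys, and the example words are assumptions, while the “amod-NN-great” output follows the example quoted in the text.

```python
def backoff_triples(rel, head, head_pos, dep, dep_pos):
    """Return the lexicalized triple and its three POS back-off variants."""
    return {
        "lexical": f"{rel}-{head}-{dep}",
        "head_backoff": f"{rel}-{head_pos}-{dep}",      # head replaced by POS
        "dep_backoff": f"{rel}-{head}-{dep_pos}",       # dependent replaced by POS
        "full_backoff": f"{rel}-{head_pos}-{dep_pos}",  # both replaced by POS
    }


variants = backoff_triples("amod", "camera", "NN", "great", "JJ")
print(variants["head_backoff"])  # amod-NN-great
```

Replacing the head keeps the sentiment-bearing modifier (“great”) while abstracting over the target, so the same feature fires for “great camera” and “great movie”.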
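A path feature of this kind can be sketched as follows. The example is a hypothetical illustration and not the feature set of any cited work: the head-array encoding, the hand-written toy parse, and the “up:”/“down:” notation are assumptions.

```python
def path_to_root(index, heads):
    """Return the token indices from `index` up to the root (inclusive)."""
    path = [index]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def dependency_path(src, dst, heads, rels):
    """Return the relation path from token `src` to token `dst`.

    The path climbs from `src` to the lowest common ancestor and then
    descends to `dst`; each step is labeled with the relation of the
    lower token on that edge.
    """
    up = path_to_root(src, heads)
    down = path_to_root(dst, heads)
    on_down = set(down)
    # First node on the way up from src that also dominates dst.
    lca = next(node for node in up if node in on_down)
    up_part = [f"up:{rels[n]}" for n in up[:up.index(lca)]]
    down_part = [f"down:{rels[n]}" for n in reversed(down[:down.index(lca)])]
    return "/".join(up_part + down_part)


# Hand-written toy parse of "The camera takes great pictures" (assumption).
HEADS = [1, 2, None, 4, 2]
RELS = ["det", "nsubj", "root", "amod", "dobj"]

# Path from the sentiment expression "great" (3) to the target "pictures" (4).
print(dependency_path(3, 4, HEADS, RELS))  # up:amod
```

The resulting string can be used as a single categorical feature for a candidate pair of sentiment expression and target.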
More recently, Sayeed et al. (2012) extract triples of sentiment expression, target, and holder in a probabilistic framework. Their features are based on dependency parses and include POS tags, dependency relations and features of parent and child nodes. They conclude that using more linguistic features increases the stability of the results.
An extensive analysis of different features used for the identification of polarity, holder and target is presented by Johansson and Moschitti (2013). Besides the path between the expressions in the dependency tree, they use the output of semantic role labeling and add features for predicate and argument labels. Their detailed analysis shows that features derived from grammatical and semantic role structure can be used to improve all three sentiment tasks that they are working on.
2.3.3. Machine learning guided by syntax trees
Feature extraction on syntactic structures is difficult and not all information can be modeled this way. So instead of adding features based on parse-tree information, syntax information can be used to guide a machine learning approach, similar to the way rule-based systems propagate sentiment information through the parse tree.
One way of including complex syntax information is the use of tree kernels which efficiently compare tree structures without explicitly extracting features. Tu et al. (2012) work on the task of document-level polarity classification of movie reviews with tree kernels. They focus on selecting the most relevant substructures to keep the feature space small and propose to include only structures around polarity clues with their direct head and the dependents. They investigate different types of kernels and report the best results with dependency trees, lexical information and their filter.
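To illustrate how a tree kernel compares structures without explicit feature extraction, the sketch below implements a generic subset-tree kernel in the style commonly attributed to Collins and Duffy. It is not the kernel or the substructure filter used by Tu et al. (2012). The tuple-based tree encoding, the `decay` parameter, and the example trees are assumptions, and the naive recursion omits the dynamic programming that practical implementations use for efficiency.

```python
def internal_nodes(tree):
    """Yield every internal node of a tree given as (label, child, ...)."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def production(node):
    """Return the node label with the labels (or words) of its children."""
    children = tuple(c if isinstance(c, str) else (c[0],) for c in node[1:])
    return (node[0], children)


def common_fragments(n1, n2, decay=1.0):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # words contribute no further fragments
            result *= 1.0 + common_fragments(c1, c2, decay)
    return result


def tree_kernel(t1, t2, decay=1.0):
    """Sum the shared fragment counts over all pairs of internal nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )


T1 = ("NP", ("JJ", "great"), ("NN", "camera"))
T2 = ("NP", ("JJ", "great"), ("NN", "lens"))
print(tree_kernel(T1, T2))  # 3.0
```

The two phrases share three fragments: the bare rule NP → JJ NN, the same rule with “great” attached, and the pre-terminal JJ → great. The kernel value can be passed to a kernel-based learner such as a support vector machine in place of an explicit feature vector.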
An example of a more direct modeling of syntactic structure is the work of Nakagawa et al. (2010), who use conditional random fields and dependency trees for polarity classification at the sentence level. They introduce hidden random variables that represent the polarity for every node in the dependency tree. The random variables are connected if the corresponding nodes in the parse tree have a dependency relation, so the polarity is propagated through the tree. Their features include the polarities of the subtrees, prior polarities from a sentiment dictionary, polarity reversal and POS tags. They report significant improvements over a term counting baseline.
An approach that has received much attention recently is the work of Socher et al. (2013) who present recursive neural tensor networks for sentence polarity classification. In this deep-learning framework, words are represented as real-valued vectors of a fixed dimensionality. The polarity of phrases is calculated based on the structure given by a constituency tree. Their work also presents the Stanford Sentiment Treebank, a parsed version of the movie review data set (Pang and Lee, 2005) where every phrase of the con-