From an NLP perspective, even though solving the sentiment analysis problem involves solving many natural language problems, such as co-reference resolution and negation handling, we often do not need to thoroughly understand the context to determine the sentiment [58]. Approaches that address the general task of sentiment analysis can be divided into two broad categories: unsupervised and supervised. These approaches are explained further below.
Unsupervised Methods for Sentiment Analysis
Unsupervised methods for sentiment analysis are mainly lexicon-based. Lexicon-based methods [34, 59, 60, 4] rely on sentiment-related words that can be obtained using different approaches. Sentiment lexicons are collections of words that are indicative of sentiment, either positive or negative. Although sentiment words and phrases are important for sentiment analysis, relying only on them is far from sufficient. A positive or negative sentiment word can have opposite orientations in different domains of analysis. Additionally, a sentence containing sentiment words may not express any sentiment, and many sentences without sentiment words can bear sentiment.
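The limitations described above can be illustrated with a minimal sketch of a lexicon-based scorer. The two word lists and the function names are hypothetical and chosen only for illustration; they are not taken from any of the cited lexicons.

```python
# Toy lexicon (hypothetical entries); real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "best", "friendly"}
NEGATIVE = {"bad", "poor", "worst", "noisy"}


def lexicon_score(sentence: str) -> int:
    """Return the number of positive lexicon hits minus negative hits."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def lexicon_label(sentence: str) -> str:
    """Map the lexicon score to a coarse polarity label."""
    score = lexicon_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Contains a sentiment word but expresses no opinion.
question = lexicon_label("Can you recommend a good camera?")
# Expresses a complaint without using any lexicon word.
complaint = lexicon_label("This phone drains its battery in two hours.")
```

Under these assumptions, the question is labelled positive only because it contains "good", while the complaint is labelled neutral because none of its words appear in the lexicon.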
Supervised Methods for Sentiment Analysis
Feature-based supervised methods such as maximum entropy classification and support vector machines have been used for sentiment classification [32]. The performance of these models depends on the features that are used to represent text. Even though term frequency features such as tf-idf have traditionally been important in many NLP tasks, in sentiment analysis better performance can be achieved using term presence rather than term frequency [32]. Higher-order n-grams have been shown not to be more effective than uni-grams in sentiment detection tasks [32]. However, in some domains, product-review sentiment classification can benefit from bi-grams and tri-grams [36]. Part-of-speech (POS) information is commonly used in sentiment analysis and opinion mining, because POS tagging can be considered a simple form of word sense disambiguation [61]. Incorporating syntactic relations has also been investigated for sentiment classification. Such linguistic features seem particularly relevant for short pieces of text [62]. Parsing the text can also help in modelling negation, intensifiers, and diminishers [63].
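The difference between frequency and presence features, and between uni-gram and higher-order n-gram features, can be sketched as follows. This is a minimal illustration that assumes whitespace tokenisation; the function names are hypothetical and the code is not the feature extraction used in the cited work.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_features(text, max_n=1):
    """Count every n-gram up to length max_n (term frequency)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return dict(counts)


def presence_features(text, max_n=1):
    """Record only whether each n-gram occurs (binary presence)."""
    return {gram: 1 for gram in frequency_features(text, max_n)}


freq = frequency_features("good good food")
pres = presence_features("good good food")
bigram = frequency_features("not good", max_n=2)
```

Here `freq` counts "good" twice while `pres` records it once, and setting `max_n=2` adds the bi-gram "not good", which is the kind of feature that can capture negation.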
Recently, deep learning models have been applied to sentiment identification. Such models do not depend on engineering domain-specific or task-specific features. For instance, recursive neural networks have been used to hierarchically compose word embeddings based on syntactic parse trees. The resulting vectors are then used to identify the sentiments of phrases and sentences [64]. Bidirectional LSTMs have also been used for sentiment classification [65], outperforming recursive neural networks that are based on syntactic parse trees.
In the following sections, we look at approaches that are used for the tasks of target-dependent sentiment analysis and aspect-based sentiment analysis.
2.3.4.1 Target-Dependent Sentiment Analysis
Several approaches have been used to address the task of target-dependent sentiment analysis. Rule-based target-dependent features are used together with traditional target-independent sentiment features in [1]. Some approaches utilise the syntactic tree of a sentence. For instance, a recursive neural network is used in [41] to pass sentiment signals from sentiment-related words to specific targets on a dependency tree. However, data generated on social media blogs and Twitter are not necessarily grammatically correct and can be challenging to parse [66, 67]. More recently, syntax-independent features and models have been used for this task. For instance, word embeddings are used to generate features from the left-hand and right-hand context of each entity [2]. Different neural network architectures, such as Convolutional Neural Networks (CNN) and variations of Recurrent Neural Networks (RNN) [42], have also been applied to this task.
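The idea of building syntax-independent features from the left-hand and right-hand context of a target can be sketched as follows. This is a minimal illustration that assumes mean pooling over toy two-dimensional embeddings; the pooling choice, the embedding values and the function names are assumptions and may differ from what is used in [2].

```python
def mean_vector(vectors, dim):
    """Average a list of vectors; return a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def target_context_features(tokens, target_index, embeddings, dim):
    """Concatenate the pooled left context and pooled right context of a target."""
    left = [embeddings[t] for t in tokens[:target_index] if t in embeddings]
    right = [embeddings[t] for t in tokens[target_index + 1:] if t in embeddings]
    return mean_vector(left, dim) + mean_vector(right, dim)


# Toy embeddings (hypothetical values); words without an entry are skipped.
toy_embeddings = {"love": [1.0, 0.0], "hate": [-1.0, 0.0], "but": [0.0, 0.0]}
sentence = ["i", "love", "camden", "but", "hate", "brixton"]

camden_features = target_context_features(sentence, 2, toy_embeddings, dim=2)
brixton_features = target_context_features(sentence, 5, toy_embeddings, dim=2)
```

The same sentence yields different feature vectors for the two targets, which is what allows a classifier to assign each target its own sentiment without parsing the sentence.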
2.3.4.2 Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis is usually divided into two sub-tasks [54]: aspect detection and sentiment identification. In the aspect detection task, the goal is to identify the presence of an aspect in a sentence. Aspects are concept names from a given domain ontology and do not necessarily occur as terms in a sentence. This task sometimes involves extraction of the aspect target expression, which is a span of text naming a particular aspect of the target entity. Sentiment polarity identification assigns a sentiment polarity label (e.g. positive, negative, neutral) to a given aspect of the entity. For example, in the sentence “The pizza is the best if you like thin-crusted pizza”, the extracted information should be: food (aspect), pizza (aspect target expression) and positive (sentiment).
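The output of the two sub-tasks for the pizza example can be represented as a small record. The class and field names below are hypothetical and serve only to make the structure of the extracted information explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AspectOpinion:
    """One extracted opinion: an aspect, its polarity, and an optional text span."""
    aspect: str                              # concept name from the domain ontology
    polarity: str                            # e.g. "positive", "negative", "neutral"
    target_expression: Optional[str] = None  # None when the aspect is only implied


# "The pizza is the best if you like thin-crusted pizza"
pizza_opinion = AspectOpinion(aspect="food", polarity="positive",
                              target_expression="pizza")
```

Making `target_expression` optional reflects that an aspect does not necessarily occur as a term in the sentence.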
Separate Tasks
To solve the task of aspect-based sentiment analysis, much of the existing work treats aspect detection and sentiment detection as two separate tasks. Aspect category detection is often formulated as a classification task in a supervised setting. In this framework, text features are defined over the unit of text and are fed into a classifier such as logistic regression or an SVM [52, 68]. Convolutional neural networks have also been used for aspect category classification [69], achieving strong performance on the SemEval shared task.
Sentiment polarity is then identified for each detected aspect category. Aspect sentiment identification has been addressed both with and without supervision. In a supervised setting, a classifier is often trained using a set of carefully designed features [68, 70]. Neural networks, especially variants of RNNs and LSTMs, have also been used for the sentiment detection task [71, 72, 73]. These models achieve results comparable to those of feature-based models, which require considerable effort to define the features.
Conditional random fields (CRFs) [74] have been very successful for extracting aspect target expressions [75].⁵ However, the success of CRFs depends heavily on the use of an appropriate feature set, which often requires a lot of engineering effort for each task at hand. Unsupervised methods are also popular for this sub-task [76, 58, 77].
Joint Approaches
Joint models have been proposed for detecting aspects and their polarities [78, 79, 3, 80, 81]. In [78], hierarchical sequential learning is applied using CRFs to jointly extract aspect term boundaries, opinion polarity, and intensity. Multi-grained LDA has been used [82] to jointly identify topics, sentiment, and the evidence that supports aspect ratings. Hierarchical deep learning models that leverage the parse tree of a sentence have also been used [3] to extract aspects and their sentiment.
2.3.4.3 Our Work
The task of targeted aspect-based sentiment analysis proposed in this thesis is very similar to the task of aspect-based sentiment analysis. However, in addition to identifying the relevant sentiment for each aspect, we also need to identify the target entity that the aspect and sentiment are expressed for. Therefore, the existing methods for aspect-based sentiment analysis are not sufficient for this task. To solve it, we propose a joint approach in which the target location, aspects, and the polarity towards each aspect are identified in a single step.
⁵ Aspect target expression extraction is an intermediary task that helps in identifying the accurate sentiment of an aspect.
Traditionally, a classifier was trained using representations of sentences based on extensive feature engineering, which resulted in strong performance on different sub-tasks of aspect-based sentiment analysis [83, 84]. Recurrent neural networks (RNNs), and specifically Long Short-Term Memory networks (LSTMs) [85], have become increasingly popular, resulting in state-of-the-art performance in many NLP tasks [86, 87, 88, 71]. Variations of LSTMs and RNNs have also been used for sentiment classification in aspect-based sentiment analysis [73, 72], achieving performance comparable to that of the traditional bag of n-grams representations without the need for extensive feature engineering.
Motivated by these successes, we propose discriminative models based on representations that are obtained using sequential models such as LSTMs. We compare their results with those obtained using discriminative models based on the traditional bag of n-grams representations. These traditional representations can either be sparse and based on generic pre-defined syntactic and semantic features (e.g. uni-grams, bi-grams, POS), or dense and based on linear compositions of the embeddings of the words in the unit of text.
Neural models such as LSTMs often need a large number of training examples to learn good representations. Instead of adding data through expensive human annotation, we investigate data augmentation. Using data augmentation, we can generate training samples with more lexical and syntactic variation than the samples in the training set. This can lead to models that generalise better to unseen data. Data generation and augmentation have been used in machine learning [89] to inject prior knowledge and to improve the performance of prediction models. In NLP, data augmentation has been used to generate positive [90] and negative examples [91].
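As a generic illustration of how augmentation can add lexical variation while preserving labels, the sketch below replaces a target mention with alternative names. This is an assumed, simplified strategy shown only for illustration; it is not a description of the augmentation procedure investigated in this thesis, and the function name and location names are hypothetical.

```python
def augment_by_target_swap(tokens, target, alternatives):
    """Return copies of a tokenised sentence with `target` replaced by each alternative.

    The sentiment label of the original sentence is assumed to carry over unchanged.
    """
    return [
        [alt if tok == target else tok for tok in tokens]
        for alt in alternatives
        if alt != target
    ]


original = ["camden", "is", "lively"]
augmented = augment_by_target_swap(original, "camden", ["camden", "brixton", "soho"])
```

Each generated sentence differs from the original only in the target name, so a model trained on the augmented set is discouraged from associating a sentiment with a particular name.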
Adding more sophisticated features to the traditional bag of n-grams representations or designing more sophisticated sequential neural networks can also be considered for improving the results. However, adding more features and employing a more sophisticated architecture are both orthogonal to data augmentation and can be combined with it. Here, we focus on data augmentation because we find it an intuitive way of incorporating domain knowledge into the representations and the models.
Part I
Opinion Aggregation for Neighbourhoods
Predicting Population Demographics
In this chapter, we investigate whether the discussions on QA platforms about neighbourhoods reflect the demographic attributes of their population. Examples of demographic attributes are deprivation levels, the percentage of the population that is Muslim, and the percentage of the population that is of White ethnicity. The values of these attributes are reported in census statistics. The focus of this chapter is investigating the following hypothesis, specifically using the discussions from the Yahoo! Answers QA platform:
Hypothesis 1: The language used in QA discussions about neighbourhoods reflects the demographic attributes of their population taken from census records.
To investigate the above hypothesis, in the next section, we raise appropriate research questions.
3.1 Research Questions
In this chapter, we investigate whether there are correlations between the language used in discussions on the Yahoo! Answers QA platform and the demographic attributes of neighbourhoods. We also investigate the extent to which Yahoo! Answers discussions can be used to predict such attributes. To provide baselines, we also apply our methods to data from Twitter.
Q1: Are there strong and significant correlations between the language used in Yahoo! Answers discussions and the demographic attributes of neighbourhoods?
Q2: How well can features based on text from Yahoo! Answers discussions predict demographic attributes of neighbourhoods?
Q3: What are the limitations of using Yahoo! Answers data in predicting demographic attributes of neighbourhoods?
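To make Q1 concrete, the strength of a linear relationship between how often a term is used in discussions about each neighbourhood and a demographic attribute can be measured with a correlation coefficient. The sketch below assumes Pearson correlation and uses made-up numbers; the correlation measure and significance testing actually used are defined later in the chapter, and no significance test is shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one value per neighbourhood.
term_frequency = [1.0, 2.0, 3.0, 4.0]       # normalised frequency of one term
attribute_value = [10.0, 20.0, 30.0, 40.0]  # e.g. a census percentage

r = pearson(term_frequency, attribute_value)
```

With these toy values the two sequences are perfectly linearly related, so `r` is 1.0 up to floating-point error; real term and attribute pairs would fall somewhere between -1 and 1.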
In the following, we describe the technical background for the methods used in this chapter. The reader can skip Section 3.2 if already familiar with linear regression, non-linear regression using basis functions, and Gaussian process regression. We define our approach in Section 3.3. This includes the scope of the problem, the entities of our models, and the methods we use for correlation and prediction. This is followed by a description of our dataset, experimental setup and results. Finally, we discuss our findings and answer the above questions.