4. RESULTS
4.5 Comparative examination of the different analyses
Word Sense Disambiguation (WSD) is too large a topic within sentiment classification to be covered in a single section of this work. It is an active research area in its own right and would require a full project of its own. This thesis therefore does not attempt to cover WSD.
This section focuses instead on part-of-speech (POS) tagging, a basic form of WSD. In the following experiments, part-of-speech information was incorporated into the feature set. POS tagging has been widely used in sentiment classification, especially to distinguish word classes such as adverbs, adjectives, and verbs, which are considered to be carriers of emotion in text; adjectives in particular are regarded this way.
In several instances in the field, POS information has been added in different forms to the features fed to the classifier. For example, part-of-speech information has been appended to text in order to generate language patterns using association rule mining [54].
Implementation The documents in the dataset were tagged using Oliver Mason's QTag, as in the lexical approach. The part-of-speech of each word was appended to the word, and these POS-appended words were added to the feature set as additional features. For example, given a word like 'House', which could be a noun or a verb, the noun form would be tagged as 'House NN', where 'NN' is the part-of-speech tag assigned by QTag. 'House NN' was treated as a new feature and added to the feature set alongside 'House', not as a replacement for it.
Two experiments were run on this augmented feature set. In the first experiment, the feature set consisting of the plain word features together with the POS-appended features was fed to the classifier. In the second experiment, only the new features, that is, the words with POS appended, were used.
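The two feature-set variants described above can be illustrated with a minimal sketch. It assumes the tagger output is already available as (word, tag) pairs; the function name `build_feature_sets` and the example tags are hypothetical and are not taken from QTag or from the original implementation.

```python
from typing import List, Tuple

def build_feature_sets(tagged: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (combined, pos_only) feature lists for one tagged document.

    `tagged` is assumed to hold (word, POS tag) pairs produced by a tagger.
    """
    # Plain word features, as used in the earlier experiments.
    words = [word for word, _ in tagged]
    # New features: each word with its POS tag appended, e.g. "House NN".
    pos_only = [f"{word} {tag}" for word, tag in tagged]
    # First experiment: plain words plus the POS-appended features.
    combined = words + pos_only
    # Second experiment: the POS-appended features on their own.
    return combined, pos_only

# Illustrative input only; the tags here are assumptions, not QTag output.
tagged_doc = [("House", "NN"), ("prices", "NNS"), ("rose", "VBD")]
combined, pos_only = build_feature_sets(tagged_doc)
```

Here `combined` corresponds to the first experiment and `pos_only` to the second; in both cases the original word is kept distinct from its tagged form.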
In the former case, a precision of 83%, a recall of 86% and an accuracy of 84% were obtained. In the latter case, a precision of 83%, a recall of 85% and an accuracy of 84% were obtained. Accuracy was therefore the same for the two variants, with only a one-point difference in recall. We hypothesize that this may be due to several factors, such as the fact that we did not perform any feature selection, so there may have been too much noise in the classification process.
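For reference, the three reported measures can be computed from the counts of a binary confusion matrix, taking the positive class as the class of interest. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not the counts behind the figures reported here.

```python
def precision_recall_accuracy(tp: int, fp: int, tn: int, fn: int):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all documents classified correctly.
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy

# Hypothetical counts, for illustration only.
p, r, a = precision_recall_accuracy(tp=8, fp=2, tn=7, fn=3)
```

Because accuracy depends on both classes while precision and recall depend only on the positive predictions and positive documents, two runs can share the same accuracy while differing slightly in recall.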
5.8 Validation
The next series of experiments was concerned with applying the classification approach to other domains. One of the key challenges of sentiment analysis, as previously established, is cross-domain classification. It is well known in the field that classifiers trained on one domain, or that work well on one domain, tend not to perform as well on other domains.
The cross-domain validation carried out at this stage had two main aims. The first was to test the hypothesis that a classifier that works well on one domain would not perform equally well on another, because of the different lexical characteristics of the words used in different domains. The second was to assess how the approach would fare with unbalanced datasets, given that the Cornell movie review dataset was a balanced set of 1000 positive and 1000 negative documents.
Implementation We used the Multi-Domain Sentiment Dataset (MDS) used in [12]. From this dataset, which is made up of product reviews from Amazon, we ran experiments on the Beauty, Computer and Video Games, and Video datasets. The Video dataset is made up of movie reviews, and so allowed us to assess how well the machine learning classifier adapted to movie reviews from another source, with a structure different from that of the Cornell reviews.
We preprocessed the data by extracting the review text, tokenizing the reviews, and removing stop words. The normalized frequency of each word was then computed and fed to the classifier as features.
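The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the tokenizer is a simple regular expression, the stop-word list is a small illustrative set, and "normalized frequency" is assumed to mean a word's count divided by the number of tokens remaining in the review after stop-word removal.

```python
import re
from collections import Counter
from typing import Dict, Set

# Illustrative stop-word list (an assumption; the actual list is not specified).
STOP_WORDS: Set[str] = {"the", "a", "an", "is", "it", "and", "of", "to", "this"}

def normalized_frequencies(text: str, stop_words: Set[str] = STOP_WORDS) -> Dict[str, float]:
    """Tokenize a review, drop stop words, and return word -> relative frequency."""
    # Lowercase and split into alphabetic tokens (apostrophes kept inside words).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Remove stop words.
    kept = [t for t in tokens if t not in stop_words]
    counts = Counter(kept)
    total = sum(counts.values())
    # Normalize each count by the number of remaining tokens (assumed definition).
    return {word: count / total for word, count in counts.items()} if total else {}

features = normalized_frequencies("This cream is great and the scent is great")
```

With this definition the feature values for a review sum to one, which keeps long and short reviews on a comparable scale.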
The results obtained are shown in Table 5.6.
Table 5.6: Other Datasets Test
Product                    Precision   Recall   Accuracy
Beauty                     82%         93%      82%
Computer and Video Games   85%         98%      87%
Video                      76%         90%      81%
Judging by the classification accuracy obtained, the classifier fared well on the other reviews, attaining accuracy close to that obtained on the movie reviews. It can be argued that this is due to the similarity of the language used, especially in the Computer and Video Games domain and the Video domain.
We must clarify that we did not train the classifier on one domain and then use it to classify documents from another domain, nor did we create a pool of features extracted from all domains, as is often done in cross-domain sentiment classification.