Given the scope of this thesis, this subsection introduces a traditional model for opinion mining and describes how researchers have used it to mine opinions, to mine targets (topics), and to bridge the two.
Although researchers have chosen different aspects and approaches to present their understanding in their surveys and findings, a typical and popular model is the one presented by Munezero et al. [24]. In this model, a sentiment is composed of a sentiment holder, an emotional disposition, and an object.
It is not clear who originated this model. It can be traced back to the 2004 paper by Kim and Hovy [13], who tried to identify those elements and combine them, and who described an opinion as a quadruple [Topic, Holder, Claim, Sentiment]. Whether the model should be credited to Kim and Hovy, who did not name it, is arguable. Alternatively, researchers might prefer to regard it as a common structure of English sentences.
An example of an opinion that conforms to this model is shown in Figure 3.
Figure 3 Example of an opinion with an opinion holder
However, review sentences frequently contain no opinion holder. This is a very common scenario in mobile app user reviews, as in the example depicted in Figure 4.
Figure 4 Example of an opinion without an opinion holder
Figure 4 describes precisely the task that the Topic-Opinion Extractor component of this prototype can currently perform: extracting pairs of topic words and opinion words.
Many researchers followed a line of thought similar to that of Munezero et al. [24]. A large body of work focused on finding the correct emotional dispositions or on refining opinion vocabularies [25]–[32].
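The model discussed above can be pictured as a small record type in which the holder is optional. The following is a minimal sketch and not the prototype's actual data structure: the class name `Opinion`, its field names, and both example sentences are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One extracted opinion; all names here are illustrative assumptions."""
    topic: str                     # the object (target) the opinion is about
    opinion_word: str              # the word carrying the emotional disposition
    holder: Optional[str] = None   # frequently absent in app reviews

    def has_holder(self) -> bool:
        return self.holder is not None


# Invented sentence with an explicit holder: "My sister loves the camera."
with_holder = Opinion(topic="camera", opinion_word="loves", holder="My sister")

# Invented review without a holder: "Great interface."
without_holder = Opinion(topic="interface", opinion_word="Great")

print(with_holder.has_holder(), without_holder.has_holder())
```

Making the holder optional keeps a single type for both cases, so a topic-opinion pair is simply an opinion whose holder is left unset.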
Table 2 Research of mining or refining opinions
Year | Paper | Methods of mining opinions
1997 | Predicting the semantic orientation of adjectives [25] | Corpora based, classification of adjectives’ orientation
2000 | Learning Subjective Adjectives from Corpora [26] | Corpora based, clustering adjective words according to distributional similarity
2003 | Measuring Praise and Criticism: Inference of Semantic Orientation from Association [27] | Corpora based, inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words
2003 | Learning extraction patterns for subjective expressions [28] | Corpora based, bootstrapping process fed with syntactic patterns
2005 | Automatic Detection of Opinion Bearing Words and Sentences [29] | Lexicon based, from opinion words mining to sentences subjectivity classification
2006 | Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses [30] | Lexicon based, analyse the WordNet through a three-step (pass) method against 1904 adjective seeds, to form a sentiment strength distribution of WordNet terms
2009 | Subjectivity recognition on word senses via semi-supervised mincuts [31] | Lexicon based, supplement WordNet entries with information on the subjectivity of its word senses through a semi-supervised approach
2012 | Automatic detection of political opinions in tweets [32] | NLP techniques applied in their GATE tool
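The association-based idea listed for [27] in Table 2 can be illustrated with a small worked example: a word leans positive when it is more strongly associated with positive paradigm words than with negative ones. The sketch below assumes pointwise mutual information (PMI) as the association measure and uses invented co-occurrence counts. The function names, the seed words, the smoothing constant, and all counts are assumptions for illustration and do not reproduce the paper's data or exact estimator.

```python
import math

# Invented counts over a hypothetical corpus of N text windows.
N = 10_000
word_count = {"good": 500, "bad": 400, "smooth": 120, "laggy": 80}
pair_count = {
    ("smooth", "good"): 30, ("smooth", "bad"): 2,
    ("laggy", "good"): 1, ("laggy", "bad"): 20,
}

POSITIVE_SEEDS = ["good"]  # assumed positive paradigm words
NEGATIVE_SEEDS = ["bad"]   # assumed negative paradigm words


def pmi(word: str, seed: str, smoothing: float = 0.01) -> float:
    """PMI(word, seed) = log2( p(word, seed) / (p(word) * p(seed)) )."""
    joint = (pair_count.get((word, seed), 0) + smoothing) / N
    independent = (word_count[word] / N) * (word_count[seed] / N)
    return math.log2(joint / independent)


def semantic_orientation(word: str) -> float:
    """Association with positive seeds minus association with negative seeds."""
    positive = sum(pmi(word, seed) for seed in POSITIVE_SEEDS)
    negative = sum(pmi(word, seed) for seed in NEGATIVE_SEEDS)
    return positive - negative


for w in ("smooth", "laggy"):
    print(w, round(semantic_orientation(w), 2))
```

With these invented counts, "smooth" receives a positive score and "laggy" a negative one; the sign gives the orientation and the magnitude gives a rough strength.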
Other researchers contributed more to identifying the objects [33]–[38].
Table 3 Research contributed to mining opinion targets (topics)
Year | Paper | Methods of identifying topics
2008 | Modeling online reviews with multi-grain topic models [33] | Based on extensions to standard topic modelling methods such as LDA and PLSA to induce multi-grain topics; not only extract aspects, but also cluster them into coherent topics
2008 | Topic identification for fine-grained opinion analysis [34] | Proposed an algorithm for opinion topic identification, but this paper only focuses on the first half: clustering opinions that share same topics, which was achieved by adapting a standard machine learning-based approach to noun phrase co-reference resolution (the next step will be labelling the clusters with the name of the topic)
2011 | Opinion word expansion and target extraction through double propagation [35] | Double propagation; Rule-based; Minipar was used in this paper. (At the time of writing this thesis, the Minipar homepage shows 404 Not Found messages.)
2012 | System and method for automatically summarizing fine-grained opinions in digital text [36] | This patent continues the research in their 2008 paper above [34]. For the topic identification, they use predefined topic lists to train topic models classifiers with supervisory labels. If a sentence contains an opinion expression, the classifiers determine which of the topic categories are represented in the sentence.
2017 | RubE: Rule-based methods for extracting product features from online consumer reviews [37] | Rule-based unsupervised methods that extract both subjective and objective features. Subjective features: by extending double propagation with indirect dependency and comparative construction. Objective features: by incorporating part–whole relation and review-specific patterns
2017 | A two-fold rule-based model for aspect extraction [38] | 1) Using sequential patterns-based rules to extract explicit aspects that are associated with regular opinions; 2) Improving aspect extraction accuracy with a frequency-based approach along with normalized Google distance; 3) Extracting aspects that are associated with domain dependent opinions.
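The double propagation idea listed for [35] in Table 3 alternates between two steps: known opinion words reveal new targets, and known targets reveal new opinion words, until nothing new is found. The sketch below is a heavily simplified illustration under stated assumptions. Dependency edges are hand-written tuples instead of parser output, only two relation types (`amod`, `conj`) are handled, and the function name `double_propagation` is hypothetical. The rule set in the paper is richer than this.

```python
from typing import Iterable, Set, Tuple

# (head, relation, dependent) triples; invented stand-ins for parser output.
Edge = Tuple[str, str, str]

EDGES = [
    ("screen", "amod", "great"),      # "great screen"
    ("screen", "amod", "bright"),     # "bright screen"
    ("battery", "amod", "bright"),    # contrived, to show propagation
    ("battery", "amod", "durable"),   # "durable battery"
    ("durable", "conj", "reliable"),  # "durable and reliable"
]


def double_propagation(edges: Iterable[Edge], seeds: Set[str]):
    """Grow the opinion-word and target sets until neither changes."""
    edges = list(edges)
    opinions, targets = set(seeds), set()
    changed = True
    while changed:
        changed = False
        for head, rel, dep in edges:
            if rel == "amod":
                # Known opinion word modifies a noun: the noun is a target.
                if dep in opinions and head not in targets:
                    targets.add(head)
                    changed = True
                # Known target is modified by a word: it is an opinion word.
                if head in targets and dep not in opinions:
                    opinions.add(dep)
                    changed = True
            elif rel == "conj":
                # A word conjoined with an opinion word is an opinion word.
                if head in opinions and dep not in opinions:
                    opinions.add(dep)
                    changed = True
                if dep in opinions and head not in opinions:
                    opinions.add(head)
                    changed = True
    return opinions, targets


opinions, targets = double_propagation(EDGES, seeds={"great"})
print(sorted(opinions), sorted(targets))
```

Starting from the single seed "great", the loop reaches "screen", then "bright", then "battery", and so on, which is why only a small seed lexicon is needed in this style of method.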
Moreover, some researchers contributed to the identification of sentiment holders in sentiment analysis [13], [39]–[44].
Table 4 Research contributed to identifying opinion holders
Year | Paper | Methods of identifying opinion holders
2004 | Automatic extraction of opinion propositions and their holders [39] | Bethard et al. labelled holders when they labelled propositional opinions. They labelled all agents of opinion-propositions as holders, but counted implicit holders that were “speakers” or un-lexicalized as no holders.
2004 | Determining the sentiment of opinions [13] | Kim and Hovy used a named entity tagger to identify potential opinion holders. Only “Person” and “Organization” that are closest to the topic phrase are considered.
2005 | Identifying sources of opinions with conditional random fields and extraction patterns [40] | An automatic supervised learner is created to extract opinion sources. The opinion sources that Choi et al. defined are much broader than the opinion holder concept above. Choi et al. consider “authority”, “location”, and “proper name” as opinion sources. In their work, one opinion source is identified in one sentence.
2006 | Identifying and analysing judgment opinions [41] | Kim and Hovy define a holder as one of the elements of a judgement opinion. They use Charniak parser to interpret sentence tree structures. Based on the paths on the trees, the type of the holder candidates, and the distance between the holder and the opinion expression, their learner is able to identify multiple holders for multiple opinions in one sentence.
2006 | Extracting opinions, opinion holders, and topics expressed in online news media text [42] | (1) Identify opinions; (2) Label semantic roles related to the opinions; (3) Find holders and topics of opinions among the identified semantic roles (manually built a mapping table to map elements to holder or topic)
2010 | Convolution kernels for opinion holder extraction [43] | Different convolution kernels are investigated for opinion holder extraction. The performance is the best if all kernels are combined. Stanford Parser is used.
2014 | Frame-based detection of opinion holders and topics: a model and a tool [44] | A semantic representation opinion model of sentences is proposed in this paper. Opinion holders are annotated to the “agent” that is associated with four types of “opinion trigger verbs”.
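The holder heuristic described for Kim and Hovy [13] in Table 4 can be sketched as a nearest-entity search. The example assumes that named entities have already been tagged and are supplied as (text, label, token index) tuples. The tuple layout, the label strings, the function name `nearest_holder`, and the example sentence are assumptions for illustration and are not the paper's implementation.

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, str, int]  # (text, label, token index); layout is assumed

HOLDER_LABELS = {"PERSON", "ORGANIZATION"}


def nearest_holder(entities: List[Entity], topic_index: int) -> Optional[str]:
    """Return the Person/Organization entity closest to the topic, if any."""
    candidates = [e for e in entities if e[1] in HOLDER_LABELS]
    if not candidates:
        return None  # no explicit holder in the sentence
    return min(candidates, key=lambda e: abs(e[2] - topic_index))[0]


# Invented sentence: "Alice from Berlin thinks the update is terrible."
entities = [("Alice", "PERSON", 0), ("Berlin", "LOCATION", 2)]
topic_index = 5  # token position of "update"

print(nearest_holder(entities, topic_index))
```

Here "Berlin" is closer to the topic than "Alice", but it is skipped because only the two holder labels are considered. A sentence with no such entity yields `None`, which corresponds to the holder-less reviews common in app stores.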
Lastly, some researchers bridged topics and opinions together [45], [46].
Table 5 Research of bridging topics and opinions together
Year | Paper | Methods of association
2003 | Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques [45] | 1) a topic specific feature term extraction, 2) sentiment extraction, and 3) (subject, sentiment) association by relationship analysis
2007 | Extracting appraisal expressions [46] | An attitude type taxonomy and two domain-dependent target type taxonomies are predefined. Stanford Parser is used for dependency representation. A ranked list of linkage specifications of the paths in the dependency trees is hand-constructed. When linking each attitude to a target, the link between them is compared with the ranked list and prioritized.
Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques [45]:
Yi et al. first extract candidate feature terms using a set of part-of-speech (POS) patterns, then select among them using two algorithms: a mixture language model and likelihood ratio [45].
For sentiment phrases, they use a syntactic parser to identify the subject, object, adjective, and prepositional phrases of a sentence. Among those elements, they treat all sentiment adjectives as sentiments. Feature terms identified earlier that contain sentiment words are also treated as sentiments.
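The first step, candidate extraction by POS patterns, can be sketched as a scan over tagged tokens. This is an illustration only: the three tag patterns below are placeholder assumptions and are not necessarily those used in [45], the tokens are tagged by hand, and the function name `candidate_terms` is hypothetical. The selection step (mixture language model and likelihood ratio) is not shown.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag), assumed to come from any POS tagger

# Placeholder tag patterns, ordered so that longer matches are tried first.
PATTERNS = [("JJ", "NN", "NN"), ("NN", "NN"), ("JJ", "NN"), ("NN",)]


def candidate_terms(tagged: List[Token]) -> List[str]:
    """Scan left to right and emit the first matching pattern at each position."""
    terms: List[str] = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                terms.append(" ".join(word for word, _ in window))
                i += len(pattern)  # continue after the matched span
                break
        else:
            i += 1  # no pattern starts at this token
    return terms


# Hand-tagged, invented sentence.
tagged = [("the", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
          ("short", "JJ"), ("but", "CC"), ("the", "DT"), ("sharp", "JJ"),
          ("screen", "NN"), ("helps", "VBZ")]

print(candidate_terms(tagged))
```

The candidates produced this way are deliberately over-generated; a statistical selection step is what would separate genuine feature terms from incidental noun phrases.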
For the feature / target and sentiment association, they use a sentiment pattern database to determine a sentiment phrase’s target and polarity, if the target exists in the sentence. For sentences that do not match the sentiment pattern database, they assign the association based on a group of B-expressions (for incomplete fragments containing sentiment) if possible.
This is a typical early work in which researchers identify sentiments and targets/topics separately and then bridge them together.
Because Yi et al.’s datasets were selected so that each sentence contains a subject term, and because those datasets are 15 years old, their reported performance cannot serve as a baseline for comparison with the evaluation datasets used for the prototype in this thesis. In the evaluation in this thesis, the reviews are randomly sampled without any restriction or selection process.
Extracting appraisal expressions [46]:
Bloom, Garg and Argamon constructed an attitude type taxonomy and two domain-dependent target type taxonomies. Both are in tree structures and have vocabularies as leaves under each category.
They then use the Stanford Dependency Parser for dependency representation. They hand-constructed a ranked list of linkage specifications, which are paths in the dependency trees that connect attitudes and targets. Priorities are manually assigned to the paths so that only the highest-priority specification is used when more than one path is found. After attitude and target candidates are found, the links between them are compared with the ranked list and prioritized in order to link each attitude to a target.
It is worth noting that if no linkage is found at all, they assign the default target category to the attitude. This situation occurs frequently in the user review data used in this thesis. This thesis adopts a similar approach: it assigns such opinions to the current mobile app.
Bloom, Garg and Argamon used the Stanford Dependency Parser in their work. Although they adopted an approach similar to that of other researchers at the time, and used Stanford dependency trees mainly as the means of linking previously extracted attitudes and targets, their work was a significant sign that the Stanford Dependency Parser had begun to contribute considerably to research.
It is arguable whether these tasks could have been made easier with more help from the Stanford Dependency Parser [47]. However, the Stanford Parser might not yet have reached sound robustness or gained enough popularity at the time this paper was written, even though its first release was in 2002 [48].
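The ranked-list linking with a default fallback can be sketched as follows. This is a minimal illustration under stated assumptions: the dependency paths, their ranking, the placeholder `DEFAULT_TARGET`, and the function name `link_attitude` are all invented here and are not taken from [46] or from the prototype.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]  # sequence of dependency relations; contents are invented

# Hand-ranked linkage specifications, highest priority first.
RANKED_PATHS: List[Path] = [
    ("amod",),          # attitude directly modifies the target
    ("nsubj",),         # target is the subject of the attitude
    ("dobj", "nsubj"),  # a longer, lower-priority path
]

DEFAULT_TARGET = "<current app>"  # fallback used when nothing links


def link_attitude(paths_to_targets: Dict[str, Path]) -> str:
    """Pick the target whose path ranks highest; fall back to the default."""
    best: Optional[Tuple[int, str]] = None
    for target, path in paths_to_targets.items():
        if path in RANKED_PATHS:
            rank = RANKED_PATHS.index(path)
            if best is None or rank < best[0]:
                best = (rank, target)
    return best[1] if best is not None else DEFAULT_TARGET


# Two candidate targets with different paths to the same attitude.
print(link_attitude({"camera": ("nsubj",), "phone": ("dobj", "nsubj")}))

# No candidate matches any specification, so the default target is used.
print(link_attitude({"update": ("prep",)}))
```

The fallback branch is the part that matters most for short app reviews, where an opinion word often appears without any explicit target in the sentence.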