In this implementation, like the Request Elicitor, the Topic-Opinion Extractor component works mainly on explicit opinions. Implicit opinions, that is, opinions in sentences that contain no explicit opinion words, are generally not extracted. When people narrate a story without using any explicit opinion words, it is difficult for a computer program to mine the opinion behind it. Such sentences nevertheless occur regularly in user reviews, and this is reflected in the Results and evaluation chapter: when the reasons for the false positive and false negative results were analysed in section 4.4, “no obvious opinion term” was one of the reasons in the samples from all three datasets.
However, one exception has been made for the concept of the mobile app itself. Two forms of implicit reference to the current mobile app are handled in this component:
(1) Orphan opinions that have no associated subject
(2) Opinions associated with subject terms that imply the current mobile app
Orphan opinions are handled in version two of this component. How they are handled is described in the previous subsection 3.5.5, which introduces the algorithm of version two.
The current mobile app is frequently mentioned as a topic in user reviews and is associated with opinions. When users express opinions about the ‘app’, they mostly mean the current mobile app.
Users also express opinions about the current mobile app implicitly, by mentioning topics that imply it. This can be seen in terms such as 'content', 'thought', 'detailed', 'work', 'tool', 'designed', 'motivation', 'product', 'program' and some other words. Even the pronoun “it” refers to the same concept most of the time.
Therefore, both versions of this component handle these “app”-related implied terms.
This is achieved through the keywords list. When implied subject terms are mentioned, they are classified into the “App” category. The definition of the “App” category in the keywords list is given below:
['App', 'app', 'application', 'content', 'database', 'design', 'designed', 'detailed', 'game', 'handles', 'help', 'helped', 'improvements', 'informative', 'it', 'job', 'motivation', 'motivator', 'product', 'program', 'rating', 'see', 'seeing', 'stuff', 'thing', 'thought', 'tool', 'use', 'way', 'work', 'works']
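A minimal sketch of how a keywords list of this kind might map a subject term to the “App” category is given here. Only the list contents come from the definition above; the names KEYWORDS and classify_subject and the fallback label "Other" are hypothetical and are not taken from the component's code.

```python
# Hypothetical sketch: look up a subject term in a keywords list.
# Only the "App" term list is taken from the text; the names are illustrative.
KEYWORDS = {
    "App": [
        'App', 'app', 'application', 'content', 'database', 'design',
        'designed', 'detailed', 'game', 'handles', 'help', 'helped',
        'improvements', 'informative', 'it', 'job', 'motivation',
        'motivator', 'product', 'program', 'rating', 'see', 'seeing',
        'stuff', 'thing', 'thought', 'tool', 'use', 'way', 'work', 'works',
    ],
}


def classify_subject(term, keywords=KEYWORDS, default="Other"):
    """Return the first category whose term list contains `term`."""
    for category, terms in keywords.items():
        if term in terms:
            return category
    # Terms not found in any list fall back to an assumed default label.
    return default
```

Under these assumptions, classify_subject("it") and classify_subject("tool") both return "App", while a term outside the list, such as "battery", falls back to "Other".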
This decision is supported by the high frequencies of such terms in the user reviews. The high-frequency trend holds in the results of all three datasets. The table below compares the frequencies of the term “app”, of “app”-related terms, and of all other terms in the “pairsTable” for the three datasets.
Table 24 Frequencies of "app" and related topics across three datasets
                                                       Dataset 1    Dataset 2    Dataset 3
(1) Frequency of term “app”                            8262         31452        7186
(2) Average frequency of “app” and its related terms   776.2333     2774.4483    1542.7857
(3) Average frequency of other terms                   13.4702      20.6133      12.8182
The data in Table 24 were obtained from the results of the version one code (C.9) of this component. Version one writes to the “pairsTable” without the test for popular opinions, so it is suitable for the initial gathering and analysis of subject and opinion data that helped to produce better performance in version two.
The SQL statements that produce the data in Table 24 are listed below:
(1) Frequency of term “app”:
Select subject, count(*) from pairsTable where subject = 'app' group by subject order by 2 desc;
(2) Average frequency of “app” and its related terms:
SELECT AVG(a.pcount) FROM (Select subject, count(*) as pcount from pairsTable p where subject in ('app', 'application', 'content', 'database', 'design', 'designed', 'detailed', 'game', 'handles', 'help', 'helped', 'improvements', 'informative', 'it', 'job', 'motivation', 'motivator', 'product', 'program', 'rating', 'see', 'seeing', 'stuff', 'thing', 'thought', 'tool', 'use', 'way', 'work', 'works') group by p.subject order by 2 desc) a;
(3) Average frequency of other terms:
SELECT AVG(a.pcount) FROM (Select subject, count(*) as pcount from pairsTable p where subject not in ('app', 'application', 'content', 'database', 'design', 'designed', 'detailed', 'game', 'handles', 'help', 'helped', 'improvements', 'informative', 'it', 'job', 'motivation', 'motivator', 'product', 'program', 'rating', 'see', 'seeing', 'stuff', 'thing', 'thought', 'tool', 'use', 'way', 'work', 'works') group by p.subject order by 2 desc) a;
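The three statements above can be tried out on a small in-memory database. The sketch below is an illustration only: it assumes a “pairsTable” with a `subject` column and a hypothetical `opinion` column, uses a shortened term list and invented toy rows, and does not reproduce the figures in Table 24.

```python
import sqlite3

# Shortened, illustrative subset of the "App" terms (assumption).
APP_TERMS = ['app', 'application', 'it', 'tool']


def term_frequencies(conn, app_terms):
    """Return (count of 'app', avg count of app terms, avg count of other terms)."""
    placeholders = ", ".join("?" for _ in app_terms)
    app_count = conn.execute(
        "SELECT COUNT(*) FROM pairsTable WHERE subject = 'app'"
    ).fetchone()[0]
    avg_related = conn.execute(
        "SELECT AVG(a.pcount) FROM ("
        "SELECT subject, COUNT(*) AS pcount FROM pairsTable "
        f"WHERE subject IN ({placeholders}) GROUP BY subject) a",
        app_terms,
    ).fetchone()[0]
    avg_other = conn.execute(
        "SELECT AVG(a.pcount) FROM ("
        "SELECT subject, COUNT(*) AS pcount FROM pairsTable "
        f"WHERE subject NOT IN ({placeholders}) GROUP BY subject) a",
        app_terms,
    ).fetchone()[0]
    return app_count, avg_related, avg_other


# Toy data, invented for this sketch; the opinion column is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairsTable (subject TEXT, opinion TEXT)")
conn.executemany(
    "INSERT INTO pairsTable VALUES (?, ?)",
    [
        ("app", "great"), ("app", "good"), ("app", "useful"),
        ("it", "nice"), ("tool", "handy"),
        ("battery", "poor"), ("screen", "small"), ("screen", "bright"),
    ],
)
result = term_frequencies(conn, APP_TERMS)
```

The term list is passed as bound parameters rather than pasted into the SQL text, which keeps the two averaging queries in step with one list. Terms with no rows in the table, such as 'application' in the toy data, form no group and therefore do not lower the average.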
It can be seen from the comparisons in Table 24 that “app” and its related terms are mentioned much more frequently by users than other terms in the reviews. It may also be worth noting that the frequencies of the term “game” are 12, 328, and 14909 in the three datasets respectively.
Implicit opinions are in general harder to mine than explicit opinions. This is not only because implicit opinions follow more complicated rules, or no rules at all, but also because negation words are harder to handle. Some attempts at mining certain implicit opinions were made in the experimentation stage of this prototype, but they failed because of awkward handling of negations. Negation words are much easier to handle with the help of the Stanford dependency parser. This has been reported in the previous subsection 3.5.3, and the evaluation is reported in section 4.5 in the next chapter.
For future researchers who aim to deal with implicit opinions in user reviews, combining the approach with dependency trees is recommended, since the Stanford dependency parser has relatively mature handling of negation words.
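To illustrate why dependency relations make negation easier to handle, the sketch below works on hand-written dependency triples in the form (relation, governor, dependent). It assumes that negation is marked by a `neg` relation, as in the Stanford typed dependencies; the triples, the function names and the pairing rule are hypothetical and are not the algorithm used in this prototype.

```python
# Hypothetical sketch: flag negated opinion words using dependency triples.
# Each triple is (relation, governor, dependent), written by hand here
# rather than produced by a parser.

def negated_words(dependencies):
    """Collect governors that carry a 'neg' (negation modifier) dependent."""
    return {gov for rel, gov, dep in dependencies if rel == "neg"}


def extract_pairs(dependencies):
    """Return (subject, opinion_word, is_negated) for each nsubj relation."""
    negated = negated_words(dependencies)
    pairs = []
    for rel, gov, dep in dependencies:
        if rel == "nsubj":
            pairs.append((dep, gov, gov in negated))
    return pairs


# Hand-written triples for the sentence "The app does not work".
deps = [
    ("det", "app", "The"),
    ("nsubj", "work", "app"),
    ("aux", "work", "does"),
    ("neg", "work", "not"),
]
pairs = extract_pairs(deps)
```

Here the negation is attached directly to the word it modifies, so no window of surrounding words has to be searched: pairs contains the single entry ("app", "work", True).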