SenticNet, freely available at http://cs.stir.ac.uk/~eca/sentics, currently contains more than 5,700 polarity concepts (nearly 40% of the Open Mind corpus). It is easy to interface SenticNet with any kind of opinion mining application and, especially when used within Open Mind software (where concepts correspond fully), it is a precise polarity detection tool. In particular, after deconstructing text into concepts (more details in chapter 5), SenticNet can be used to associate polarity values with these and, hence, to infer the overall polarity of a clause, sentence, paragraph or document by averaging such values. SenticNet's capacity for detecting opinion polarity was compared with SentiWordNet's over a collection of 2,000 patient opinions, of which 57% are labelled as negative, 32% as positive and the rest as neutral. After extracting concepts from each opinion, the corresponding polarity values were looked up in SentiWordNet and SenticNet and compared with the dataset labels, in order to compute recall and precision rates as evaluation metrics. Results showed SenticNet to be much more accurate than SentiWordNet. The former, in particular, can identify positive opinions with
against 46.5%), for a total F-measure value of 67.1% versus 49.8%. In SenticNet 2.0, currently under development, the whole Open Mind corpus is being labelled with polarity values and a list of mood and sentic values is being associated with each common sense concept, in order to provide the public with a comprehensive semantic resource for easily extracting affective information from natural language text.
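The averaging step and the evaluation metrics described above can be illustrated with a minimal sketch. The lexicon entries, the neutrality threshold and all function names below are assumptions made for illustration; they are not taken from SenticNet or from the evaluation reported here.

```python
from statistics import mean

# Hypothetical concept-to-polarity lookup; the values are illustrative only
# and are NOT actual SenticNet entries.
TOY_POLARITY = {
    "good service": 0.7,
    "long wait": -0.6,
    "friendly nurse": 0.8,
}

def overall_polarity(concepts, lexicon):
    """Average the polarity of the concepts found in the lexicon.

    Returns None when no concept is covered by the lexicon.
    """
    scores = [lexicon[c] for c in concepts if c in lexicon]
    return mean(scores) if scores else None

def label(polarity, threshold=0.1):
    """Map an averaged polarity to a class; the threshold is an assumption."""
    if polarity is None or abs(polarity) <= threshold:
        return "neutral"
    return "positive" if polarity > 0 else "negative"

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, `overall_polarity(["good service", "friendly nurse"], TOY_POLARITY)` averages two positive values, and `label` would then classify the opinion as positive. Unknown concepts are simply skipped, which is one possible design choice among several.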
4.5 Conclusions
This chapter has shown how applying dimensionality reduction techniques to the matrix representation of AffectNet yields a vector space of affective common sense knowledge, which can be configured according to the desired trade-off between precision and efficiency and to the problem being tackled (section 4.1). So far, TSVD appears to be a good method for generalising the information contained in AffectNet, but it is very expensive in both computing time and storage, as it requires costly arithmetic operations such as division and square root in the computation of rotation parameters. This is a significant issue because AffectNet keeps growing, in parallel with the continuously extended versions of ConceptNet. To address it, alternative multi-dimensionality reduction techniques, e.g., independent component analysis (ICA) and random projections, are currently being explored.
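To give a concrete sense of why random projections are attractive as a cheaper alternative, the following is a minimal sketch of a Gaussian random projection in plain Python. It is not the method explored in this work; the function names, the seed and the toy dimensions are assumptions chosen for illustration.

```python
import math
import random

def make_projection(d, k, seed=0):
    """Build a k x d matrix of Gaussian entries scaled by 1/sqrt(k).

    Unlike TSVD, no decomposition of the data is needed: the matrix is
    drawn once, independently of the concepts being projected.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, vector):
    """Map a d-dimensional vector to k dimensions (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Toy usage: reduce a 6-dimensional concept vector to 3 dimensions.
R = make_projection(d=6, k=3, seed=42)
reduced = project(R, [1.0, 0.0, 2.0, 0.0, 0.0, 1.0])
```

Because the projection is a fixed linear map, new concepts can be added without recomputing anything, which is the property that matters when the underlying knowledge base keeps growing. The price is that distances are preserved only approximately.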
The rest of the chapter illustrated a new PAM-based clustering method for organising and categorising such a vector space (section 4.2) and how the combined application of dimensionality reduction and graph mining techniques can be exploited to emulate conscious and unconscious reasoning processes (section 4.3). In addition, this chapter showed how the developed methods are employed to design a publicly available semantic resource for opinion mining (section 4.4). In order to assess how the developed reasoning techniques can be effectively exploited for tackling real-world problems, the next chapter will explore multiple ways to combine such techniques for the design of intelligent applications in fields such as Social Web, HCI, and e-health.
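As a reminder of what a PAM-style procedure does, the sketch below implements a plain greedy swap search over medoids. It is a textbook-style simplification and not the clustering method of section 4.2; the naive initialisation, the function names and the toy points are assumptions.

```python
import math

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist=math.dist):
    """Greedy swap search: replace a medoid with a non-medoid whenever
    the swap lowers the total cost, until no swap helps."""
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for o in range(len(points)):
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best:
                    medoids, best, improved = candidate, cost, True
    return sorted(medoids), best

# Toy usage: two well-separated groups of 2-D points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_medoids, toy_cost = pam(toy_points, k=2)
```

Medoids are always actual data points, so each cluster is represented by a real concept rather than by an averaged centroid, which is what makes this family of methods convenient for categorising a space of named concepts.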
Chapter 5
Sentic Knowledge Base Exploitation
Knowing is not enough; we must apply. Willing is not enough; we must do.
Johann Wolfgang von Goethe
The amount of data available on the Web is growing exponentially. These data, however, are mainly in an unstructured format and, hence, neither machine-processable nor machine-interpretable. What is called collective intelligence today is actually just collected intelligence, as the value of user contributions lies simply in their being collected together and aggregated into community or domain-specific sites. True collective intelligence can emerge if the data collected from all those people are aggregated and recombined to create new knowledge and new ways of learning that individual humans cannot achieve by themselves [199]. So far, online information retrieval has mainly relied on keyword-based algorithms, which have proved to have important limitations, e.g., the inability to recognise topical authority that humans recognise effortlessly without the explicit words being in the content. In order to let machines better understand natural language and, hence, conveniently analyse and aggregate opinions and sentiments over the Web, we need to provide them with both adequately broad common sense knowledge bases and reasoning methods to handle these efficiently. This chapter describes how the knowledge bases, and the reasoning tools built on top of them, are exploited for the design of an intelligent opinion mining engine (section 5.1) and, hence, for the development of applications in fields such as Social Web (section 5.2), HCI (section 5.3), and e-health (section 5.4). Finally, the chapter ends with some concluding remarks (section 5.5).
5.1 Opinion Mining Engine: A Semantics and Sentics Extraction Tool
In order to effectively mine and analyse opinions and sentiments, it is necessary to bridge the gap between unstructured natural language data and structured machine-processable data. To this end, an intelligent software engine has been proposed by Cambria et al. [145] that aims to extract the semantics and sentics, that is, the cognitive and affective information, associated with natural language text, so that the opinions and sentiments it contains can be more easily aggregated and interpreted. The engine exploits graph mining and multi-dimensionality reduction techniques on Isanette and AffectNet respectively, and it is based on the Hourglass model (Fig. 5.1). Several other affect recognition and sentiment analysis systems [200, 201, 202, 203, 204, 205, 206] are based on different emotion categorisation models, which generally comprise a relatively small set of categories (Table 5.1). The Hourglass of Emotions, by contrast, allows the opinion mining engine to classify affective information both in a categorical way (according to a larger number of emotion categories) and in a dimensional format (which facilitates comparison and aggregation). The engine consists of four main components: a pre-processing module, which performs a first skim of the opinion (subsection 5.1.1), a semantic parser, whose aim is to extract concepts from the opinionated text (subsection 5.1.2), the Isanette module, for inferring the semantics associated with the given concepts (subsection 5.1.3), and the AffectiveSpace module, for the extraction of sentics (subsection 5.1.4). Finally, this section illustrates an output example of the engine, given a short natural language sentence as input (subsection 5.1.5), and provides a thorough evaluation of the system (subsection 5.1.6).
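The flow of data through the four components can be pictured with a toy sketch. Everything in it is hypothetical: the dictionaries merely stand in for Isanette and AffectiveSpace, the parser is a naive phrase matcher, and the sentic values (keyed by the four Hourglass dimensions) are invented for illustration. It shows only how the stages chain together, not how the actual modules work.

```python
import re

# Hypothetical stand-ins for the Isanette and AffectiveSpace lookups.
TOY_SEMANTICS = {"birthday party": ["event"], "nice gift": ["object"]}
TOY_SENTICS = {
    "birthday party": {"pleasantness": 0.6, "attention": 0.3,
                       "sensitivity": 0.0, "aptitude": 0.4},
    "nice gift": {"pleasantness": 0.7, "attention": 0.1,
                  "sensitivity": 0.0, "aptitude": 0.5},
}

def preprocess(text):
    """Stage 1: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def parse_concepts(text, known_concepts):
    """Stage 2: naive parser that matches known phrases on word boundaries."""
    padded = f" {text} "
    return [c for c in known_concepts if f" {c} " in padded]

def analyse(text):
    """Chain the stages: pre-processing, parsing, semantics, sentics."""
    clean = preprocess(text)
    known = sorted(set(TOY_SEMANTICS) | set(TOY_SENTICS))
    concepts = parse_concepts(clean, known)
    return {
        "concepts": concepts,
        "semantics": {c: TOY_SEMANTICS.get(c, []) for c in concepts},  # stage 3
        "sentics": {c: TOY_SENTICS.get(c) for c in concepts},          # stage 4
    }

result = analyse("I went to a Birthday Party, and got a nice gift!")
```

Keeping the semantic and affective lookups as separate stages mirrors the separation between the Isanette and AffectiveSpace modules: either resource could be replaced without touching the parser.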