Sentiment lexicons are typically generated independently of any target application. They therefore tend to reflect general sentiment knowledge, which makes them useful across diverse applications (i.e. general purpose). However, a lexicon's utility is reduced when the target application domain or genre deviates from this general sentiment knowledge.
The concept: Domain. Unfortunately, the concept ‘domain’ does not have a single, unambiguous definition across linguistics and sociolinguistics. From a purely linguistic perspective, a domain has been defined as a genre attribute that describes the broad subject field that an instantiation of a certain genre deals with (Lee, 2001). A genre is defined as a category assigned to a text based on external, non-linguistic criteria such as intended audience, purpose and activity type (Lee, 2001), as well as textual structure, form of argumentation and level of formality (Crystal, 2011). Based on this definition, for example, a text from the genre NEWSPAPER ARTICLE may belong to the domain of SCIENCE. Other domains may include ART, FINANCE, RELIGION, POLITICS, SPORTS and TECHNOLOGY. In sociolinguistics, however, a domain is viewed as a social setting that is likely to influence the use of language, such as FAMILY, FRIENDSHIP, RELIGION, EDUCATION and EMPLOYMENT (Fishman, 1972). Some categories in this latter definition correspond to what would be called a genre under the former definition (e.g. FRIENDSHIP and EMPLOYMENT). In fact, for socio-psychological analysis, social contexts such as INTIMATE, INFORMAL, FORMAL and INTERGROUP are identified as domains (Fishman, 1972).
Both notions of a domain have been used in sentiment analysis. For instance, it is common to refer to collections of documents grouped by subject of discussion as domains (e.g. HOTELS, SPORTS, BOOKS and ELECTRONICS) (Du et al., 2010, Yoshida et al., 2011). It is also common to refer to the social setting in which documents are generated as a domain (e.g. TWITTER as a domain) (Kaur and Kumar, 2015, Kiritchenko and Mohammad, 2016, Reitan et al., 2015). In this thesis, we use the concept of a domain in a broader sense that encompasses both definitions. Specifically, we use the concept to refer to any collection of documents that share certain characteristics that may influence the expression of sentiment. For instance, TWITTER, with its informal setting, brief communication and the general public as target audience, forms a domain; so does MYSPACE, with its highly informal communication between friends.
Differing Vocabulary and Polarities. The deviation between a lexicon and a target genre can be in terms of vocabulary coverage, whereby the lexicon supplies insufficient sentiment-bearing terms for the target genre. This is particularly the case with social media genres, where non-standard vocabulary is widely used to express sentiment. A potential remedy to the coverage problem is to generate a domain-specific lexicon. However, existing lexicon generation methods tend to result in lexicons with poor coverage for social media. For instance, the method of Hatzivassiloglou and McKeown (1997) produced a lexicon based on the proximity of terms to adjectives, constrained by the occurrence of certain conjunctions. This is too restrictive for informal social media content. Subsequent work improved term coverage by relaxing the conjunction constraints and by using a larger corpus (the web) to measure term co-occurrence (within a text window) with known seed terms (Turney, 2002). Nevertheless, coverage is still limited by the fact that the co-occurrence has to be with seed terms that are themselves infrequent. To improve coverage further, the concept of double propagation was introduced (Guang et al., 2009). Here, co-occurrence with product/service aspects was used to identify sentiment-bearing terms and vice versa. The process runs iteratively until no further sentiment-bearing term or aspect can be found. This method was meant for the domain of product/service reviews, where aspect mentions in sentiment expressions are common. Other methods employed supervised strategies whereby a lexicon is generated from sentiment-labelled data (Mohammad et al., 2013, Pang et al., 2002). The need for labelled data limits the utility of these supervised strategies.
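The iterative character of double propagation can be illustrated with a minimal sketch. It is not the published algorithm: it substitutes plain sentence-level co-occurrence for whatever linking rules the original method applies, and it assumes that the input has already been tokenised and part-of-speech tagged. The function name `double_propagation` and the tag labels `ADJ` and `NOUN` are hypothetical choices made for illustration.

```python
def double_propagation(sentences, seed_sentiment_terms):
    """Grow sentiment terms and aspects from each other until nothing changes.

    sentences: list of sentences, each a list of (token, pos) pairs, where
    pos is "ADJ", "NOUN" or any other tag (assumed, pre-tagged input).
    Simplifying assumption: adjectives are candidate sentiment terms and
    nouns are candidate aspects; a shared sentence counts as a link.
    """
    sentiment_terms = set(seed_sentiment_terms)
    aspects = set()
    changed = True
    while changed:  # stop once a full pass adds no new term or aspect
        changed = False
        for sentence in sentences:
            adjectives = {tok for tok, pos in sentence if pos == "ADJ"}
            nouns = {tok for tok, pos in sentence if pos == "NOUN"}
            # A known sentiment term in the sentence: its nouns become aspects.
            if adjectives & sentiment_terms:
                new_aspects = nouns - aspects
                if new_aspects:
                    aspects |= new_aspects
                    changed = True
            # A known aspect in the sentence: its adjectives become sentiment terms.
            if nouns & aspects:
                new_terms = adjectives - sentiment_terms
                if new_terms:
                    sentiment_terms |= new_terms
                    changed = True
    return sentiment_terms, aspects
```

Starting from a single seed such as ‘great’, a sentence pairing ‘great’ with ‘screen’ yields the aspect ‘screen’, which in turn lets a sentence pairing ‘blurry’ with ‘screen’ contribute ‘blurry’ as a new sentiment-bearing term. The sketch only discovers terms; it does not assign polarities to them.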
The use of a domain-specific lexicon alone for sentiment analysis is also problematic. Although test instances are expected to be similar in composition to the domain text, a test instance may contain terms that never appear in the domain corpus but are available from a general-purpose lexicon.
The deviation of a target domain from a general-purpose lexicon can also be in terms of the sentiment polarities of terms. The sentiment-bearing property of terms is known to be domain-dependent: the same term can have different sentiment semantics in different domains. For example, the adjective ‘unpredictable’ may indicate negative sentiment in a car review, as in “unpredictable steering”, but positive sentiment in a movie review, as in “unpredictable plot” (Liu, 2012). Indeed, a comparison of sentiment analysis systems across different domains reveals that factors such as dataset size and domain/genre can significantly affect performance (Andreevskaia and Bergler, 2008). The difference in polarities between a sentiment lexicon and a target domain has been addressed with techniques that produce domain-adapted lexicons. Choi and Cardie (2009) investigated the adaptation of a general-purpose lexicon to a domain-specific one. Their approach adapts the term polarities of a general-purpose lexicon by utilizing expression-level polarities from the domain. The polarity relationships between the terms and the expressions were modelled as a set of constraints that are solved using integer linear programming. This work relied on sentiment-labelled data to obtain the expression-level polarities. It was also limited to term polarity reversal (from one sentiment class to another) and could not adjust polarity intensity within the same class. In a similar work, a domain-specific lexicon was adapted to another domain using the information bottleneck framework (Du et al., 2010). Here, the algorithm also assumes as input a set of in-domain sentiment-labelled documents. In another work, an approach was proposed to identify the most effective lexicon, from among several lexicons, for sentiment analysis in a target domain (Ohana et al., 2012). This approach employs the case-based reasoning methodology and extracts document statistics and writing-style features with which to represent the documents (cases). The solutions to a case are the lexicons that provide correct classification of the case document as checked against human judgment. Thus, given a domain containing new cases (documents), sentiment classification is performed by reusing the lexicons of the stored cases most similar to the documents in the given domain. Note that this approach does not attempt to adapt a lexicon to a target domain.
With social media domains, the idea of distant supervision can be leveraged to generate domain-specific lexicons that capture evolving vocabulary. For instance, two Twitter-specific sentiment lexicons have been generated from tweets labelled according to the occurrence of certain emoticons and hashtags, respectively (Kiritchenko et al., 2014, Mohammad et al., 2013), using the point-wise mutual information (PMI) approach (Turney, 2002). These lexicons are highly domain-specific and could miss general sentiment-bearing terms that are absent from the tweets’ vocabulary, a limitation that can be addressed by a lexicon expansion strategy.
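A minimal sketch of this kind of distantly supervised lexicon generation is given below. It labels each tweet by the emoticons it contains and scores each term as PMI(term, positive) minus PMI(term, negative), which simplifies to a log-ratio of class-conditional frequencies. The emoticon sets, the whitespace tokenisation and the add-one smoothing are assumptions made for illustration and are not taken from the cited works.

```python
import math
from collections import Counter

# Assumed marker sets, for illustration only.
POSITIVE_MARKERS = {":)", ":-)"}
NEGATIVE_MARKERS = {":(", ":-("}
MARKERS = POSITIVE_MARKERS | NEGATIVE_MARKERS


def distant_label(tokens):
    """Return "pos" or "neg" from emoticons, or None if absent or conflicting."""
    has_pos = any(t in POSITIVE_MARKERS for t in tokens)
    has_neg = any(t in NEGATIVE_MARKERS for t in tokens)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def pmi_lexicon(tweets, smoothing=1.0):
    """Score each term as PMI(term, pos) - PMI(term, neg), base 2."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tweet in tweets:
        tokens = tweet.lower().split()  # naive tokenisation (assumption)
        label = distant_label(tokens)
        if label is None:
            continue  # tweet gives no usable distant label
        counts[label].update(t for t in tokens if t not in MARKERS)
    vocab = set(counts["pos"]) | set(counts["neg"])
    # Smoothed totals, so unseen (term, class) pairs do not produce log(0).
    n_pos = sum(counts["pos"].values()) + smoothing * len(vocab)
    n_neg = sum(counts["neg"].values()) + smoothing * len(vocab)
    lexicon = {}
    for term in vocab:
        f_pos = counts["pos"][term] + smoothing
        f_neg = counts["neg"][term] + smoothing
        # The marginal p(term) cancels in the difference of the two PMI values.
        lexicon[term] = math.log2((f_pos * n_neg) / (f_neg * n_pos))
    return lexicon
```

A positive score indicates that a term is associated with positively labelled tweets and a negative score with negatively labelled ones; terms that never occur in the labelled tweets receive no entry, which is the coverage limitation noted above.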
A lexicon expansion strategy begins with a standard lexicon whose polarities are propagated to domain-specific terms. This is similar to a lexicon generation strategy, except that lexicon generation begins with a very small set of seed terms known to have a high and stable sentiment connotation across domains. In Zhou et al. (2014), a standard lexicon was expanded with terms from an emoticon-labelled Twitter dataset. Here, similar to Mohammad et al. (2013) and Kiritchenko et al. (2014), a Twitter-specific lexicon was generated using the PMI approach (Turney, 2002). However, unlike in Mohammad et al. (2013) and Kiritchenko et al. (2014), a negated co-occurrence of a term with a sentiment class was counted as a co-occurrence of the term with the opposite sentiment class. For example, “I don’t like their online service :(” would be counted as a co-occurrence of ‘like’ and ‘:)’. In another lexicon expansion strategy, emoticon-labelled datasets were used to identify a suitable feature set with which to represent a set of seed terms, formed from the union of several general-purpose lexicons, for supervised sentiment classification of unknown terms (Bravo-Marquez et al., 2015). The datasets were time-sorted and a time series was created for each term in the datasets’ vocabulary. The feature set was then extracted from the location-based and dispersion properties of the time series. A classifier learned from this representation was then used to classify every unknown term in the vocabulary as positive, negative or neutral. Although a lexicon expansion strategy such as those of Zhou et al. (2014) and Bravo-Marquez et al. (2015) is able to capture domain-specific terms, it is unable to adapt the polarities of existing terms in the initial lexicon to domain-specific semantics.
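The negation-aware counting described above for Zhou et al. (2014) can be sketched as follows. The negator list and the fixed negation scope (the single token following a negator) are assumptions made for illustration; the cited work may determine negation scope differently. The function name `count_with_negation` is likewise hypothetical.

```python
from collections import Counter

# Illustrative negator list (assumption, not from the cited work).
NEGATORS = {"not", "no", "never", "don't", "dont", "can't", "cannot", "won't"}
OPPOSITE = {"pos": "neg", "neg": "pos"}


def count_with_negation(labelled_tweets, scope=1):
    """Count (term, class) co-occurrences, crediting negated terms to the opposite class.

    labelled_tweets: iterable of (tokens, label) pairs with label "pos" or "neg".
    scope: number of tokens after a negator treated as negated (assumed rule).
    """
    counts = Counter()
    for tokens, label in labelled_tweets:
        remaining = 0  # tokens still inside the current negation scope
        for token in tokens:
            token = token.lower()
            if token in NEGATORS:
                remaining = scope
                continue  # the negator itself is not counted
            if remaining > 0:
                counts[(token, OPPOSITE[label])] += 1
                remaining -= 1
            else:
                counts[(token, label)] += 1
    return counts
```

Applied to the tokens of “I don't like their online service” with the label "neg" (from the ‘:(’ emoticon), the sketch credits ‘like’ to the positive class and the remaining terms to the negative class. The resulting counts could then feed a PMI computation in place of the plain co-occurrence counts.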
With distant supervision, a domain-specific lexicon can be generated for social media domains. Combining such a lexicon with a general-purpose lexicon provides domain adaptation while retaining the additional vocabulary available from the general-purpose lexicon (Muhammad et al., 2014, 2013b).
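One simple way to combine the two lexicons is sketched below. The combination rule is an assumption made purely for illustration and is not the method of the cited works: a term found in both lexicons receives a weighted interpolation of its two scores, and a term found in only one lexicon keeps that score. The function name `combine_lexicons` and the parameter `domain_weight` are hypothetical.

```python
def combine_lexicons(domain_lexicon, general_lexicon, domain_weight=0.5):
    """Merge two term-to-score dictionaries (illustrative rule, see lead-in).

    domain_weight: weight in [0, 1] given to the domain score when a term
    appears in both lexicons; the general score receives 1 - domain_weight.
    """
    if not 0.0 <= domain_weight <= 1.0:
        raise ValueError("domain_weight must lie in [0, 1]")
    combined = dict(general_lexicon)  # start from general-purpose coverage
    for term, domain_score in domain_lexicon.items():
        if term in general_lexicon:
            # Adapt the existing polarity towards the domain-specific one.
            combined[term] = (domain_weight * domain_score
                              + (1.0 - domain_weight) * general_lexicon[term])
        else:
            # Domain-only vocabulary is added unchanged.
            combined[term] = domain_score
    return combined
```

With a `domain_weight` above 0.5, a term such as ‘unpredictable’ that is negative in the general-purpose lexicon but positive in the domain lexicon ends up positive in the combined lexicon, while general terms missing from the domain data remain available.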