Subproceso de Evaluación - Dirección de Planificación

Many papers define opinion classification as the classification of opinion text into one of two classes, positive and negative [24, 20, 32, 19, 31]. This holds in many cases: the most fundamental reaction of anyone reviewing a product, whether that product is a camera, a movie, a food item or even an election candidate, is that you either like it or you don’t. However, this is a very generalised point of view; it is what we say when we talk about a product with friends or when we are in a real hurry. Written reviews are more elaborate, and this is driven by the need for information. You would not want to buy a camera without knowing the pros and cons of each of its aspects (lens, viewfinder, body etc.), and you would still go to a restaurant that serves bad desserts but has amazing fish. Consumers’ tastes differ: some would not mind a mind-boggling action flick with mediocre acting, and some would prefer a mediocre plot with awesome acting. So there is certainly a need for elaborate information, but why do we have all this information? The answer is “features”.

Every product has features. A camera has a lens, viewfinder, body etc.; a movie has actors, acting, plot and direction; an election candidate has his or her own set of policies. In the best case all the features get a positive review, and in the worst case all of them get a negative review. But there are also cases where some features get positive reviews while others get negative ones. In a review data set taken from Amazon, where a star rating of 1 is termed negative and 5 is termed positive, 1180 of 3791 reviews were rated with 2 or 4 stars. That is, approximately 31% of the reviews were neither positive nor negative. A document-level opinion classifier that only separates positive reviews from negative ones therefore starts off with a 31% error rate before seeing any training data. This is a major problem, and it arises from sentences like “The camera is fine but the bland brown colour is not eye catching.” This sentence is taken from an Amazon review of a camera which has a star rating of 4.
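The 31% figure follows directly from the two counts quoted above. A quick check in Python, using only those two numbers (the variable names are illustrative, and the "accuracy ceiling" is simply the complement of that share under the reading given in the text):

```python
# Counts quoted in the text for the Amazon review data set.
total_reviews = 3791
mid_rated_reviews = 1180  # reviews rated with 2 or 4 stars

# Share of reviews that are neither positive (5 stars) nor negative (1 star).
mid_share = mid_rated_reviews / total_reviews

# Under that reading, a two-class classifier cannot do better than this.
accuracy_ceiling = 1 - mid_share

print(f"neither positive nor negative: {mid_share:.1%}")
print(f"two-class accuracy ceiling:    {accuracy_ceiling:.1%}")
```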

There is a need to move away from binary classification of opinion towards a more elaborate, graded classification. Two directions can be taken. One is to model opinion analysis as multi-class classification, where each class represents the magnitude of the sentiment in the document. For example, there could be 5 classes, each representing a star rating in reviews, where 1 is the most negative and 5 is the most positive. The other direction is to model opinion analysis as opinion on features instead of opinion on the whole product. After reading a review of a camera, the system would output a result under three categories {Feature : Opinion : Sentence}, where “Feature” is the specific feature of the product, “Opinion” is the opinion expressed about that feature, either positive or negative, and “Sentence” is the sentence in the review showing the opinion. An example output for a camera review could be:

Feature      Opinion    Sentence
lens         Positive   the lens in the camera is spectacular
viewfinder   Negative   you can barely look through the viewfinder
body         Positive   the body is light and sturdy
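The {Feature : Opinion : Sentence} output described above can be sketched as a small record type. This is only an illustration of the shape of the result; the names `FeatureOpinion` and `format_rows` are hypothetical and do not come from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FeatureOpinion:
    """One row of the {Feature : Opinion : Sentence} output (hypothetical name)."""
    feature: str
    opinion: str   # "Positive" or "Negative"
    sentence: str


def format_rows(rows: List[FeatureOpinion]) -> str:
    """Render rows as an aligned plain-text table with a header line."""
    header = ("Feature", "Opinion", "Sentence")
    cells = [header] + [(r.feature, r.opinion, r.sentence) for r in rows]
    # The first two columns are as wide as their longest cell.
    w0 = max(len(c[0]) for c in cells)
    w1 = max(len(c[1]) for c in cells)
    return "\n".join(f"{c[0]:<{w0}}  {c[1]:<{w1}}  {c[2]}" for c in cells)


# The three example rows from the text.
camera_review = [
    FeatureOpinion("lens", "Positive", "the lens in the camera is spectacular"),
    FeatureOpinion("viewfinder", "Negative", "you can barely look through the viewfinder"),
    FeatureOpinion("body", "Positive", "the body is light and sturdy"),
]

print(format_rows(camera_review))
```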

In this report we implement both directions. The major contributions of this study are:

• We extend opinion classification from a two-class model to a multi-class model. We show a method of classifying opinionated text into multiple classes, namely 4 classes termed 1 star, 2 star, 4 star and 5 star. We left out the star value 3 because these reviews are more neutral in nature and lack distinction from 2-star and 4-star reviews. We propose a method based on mixed linear programming in which classification is based on the weights attained by each feature sentence. This is more fine-grained than blindly counting the number of positive and negative sentences. For example, the review text “The lens is very good. The viewfinder is bad.” contains 1 positive sentence and 1 negative sentence, so counting positive and negative sentences leads nowhere. We instead propose a system which weighs the opinion and classifies the text accordingly. In the above review, since the positive opinion is more stressed than the negative one (i.e. the sentiment intensifier “very” is used), the product has a higher chance of getting 4 stars, which is plausible.

• We also propose a fine-grained opinion analysis model in which an opinion text or review is split into sentences reviewing different features, and an opinion (positive or negative) is assigned to each such sentence. The concept of fine-grained analysis of opinion text is well established in the opinion analysis domain. We propose a method of identifying such sentences in the review by using targets (opinion features), opinionated words and grammatical constructs. The targets are extracted from an annotated review. The opinionated words are extracted using the lexicon extraction method described in Chapter 2. The method successfully extracts opinionated sentences on a particular feature from the review text. The opinion is assigned to the extracted sentence according to the weighted lexicon and the compositionality shown by the opinionated text.
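The difference between counting and weighing opinion sentences can be illustrated with a toy sketch. It is not the mixed linear programming method proposed in this report: the lexicon, its weights, the intensifier factor, the target list and all function names below are assumptions made up for the example, and sentence splitting and target matching are deliberately naive.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical weighted lexicon; the weights are invented for illustration.
LEXICON: Dict[str, float] = {"good": 1.0, "bad": -1.0}
# Hypothetical intensifiers and the factor applied to the next opinion word.
INTENSIFIERS: Dict[str, float] = {"very": 1.5}
# Targets (opinion features), assumed to come from an annotated review.
TARGETS: List[str] = ["lens", "viewfinder"]


def split_sentences(review: str) -> List[str]:
    """Naive split on sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def sentence_weight(sentence: str) -> float:
    """Sum lexicon weights, scaled by an immediately preceding intensifier."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            scale = INTENSIFIERS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            total += scale * LEXICON[tok]
    return total


def feature_opinions(review: str) -> List[Tuple[str, float, str]]:
    """Return (feature, weight, sentence) for each sentence naming a target."""
    rows = []
    for sentence in split_sentences(review):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for target in TARGETS:
            if target in words:
                rows.append((target, sentence_weight(sentence), sentence))
    return rows


review = "The lens is very good. The viewfinder is bad."
rows = feature_opinions(review)

# Counting sentences gives a tie: one positive, one negative.
count_score = sum(1 if w > 0 else -1 for _, w, _ in rows if w != 0)
# Weighing them does not: +1.5 for the lens, -1.0 for the viewfinder.
weighted_score = sum(w for _, w, _ in rows)

print(rows)
print("count-based score:", count_score)
print("weighted score:", weighted_score)
```

With these made-up weights the count-based score is a tie while the weighted score is positive, which matches the intuition that the review leans towards 4 stars. How the report's method maps sentence weights to star classes is not covered by this sketch.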

3.2 Motivation for using Mixed Linear Program-
