

Vol. 2, Nos. 1–2 (2008) 1–135 © 2008 B. Pang and L. Lee DOI: 10.1561/1500000001

Opinion Mining and Sentiment Analysis

Bo Pang¹ and Lillian Lee²

¹ Yahoo! Research, 701 First Avenue, Sunnyvale, CA 94089, USA, [email protected]

² Computer Science Department, Cornell University, Ithaca, NY 14853, USA, [email protected]

Abstract

An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.

This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.


1 Introduction

Romance should never begin with sentiment. It should begin with science and end with a settlement.

— Oscar Wilde, An Ideal Husband

1.1 The Demand for Information on Opinions and Sentiment

“What other people think” has always been an important piece of information for most of us during the decision-making process. Long before awareness of the World Wide Web became widespread, many of us asked our friends to recommend an auto mechanic or to explain who they were planning to vote for in local elections, requested reference letters regarding job applicants from colleagues, or consulted Consumer Reports to decide what dishwasher to buy. But the Internet and the Web have now (among other things) made it possible to find out about the opinions and experiences of those in the vast pool of people that are neither our personal acquaintances nor well-known professional critics; that is, people we have never heard of. And conversely, more and more people are making their opinions available to strangers via the Internet.


Indeed, according to two surveys of more than 2000 American adults each [63, 127],

• 81% of Internet users (or 60% of Americans) have done online research on a product at least once;

• 20% (15% of all Americans) do so on a typical day;

• among readers of online reviews of restaurants, hotels, and various services (e.g., travel agencies or doctors), between 73% and 87% report that reviews had a significant influence on their purchase;¹

• consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (the variance stems from what type of item or service is considered);

• 32% have provided a rating on a product, service, or person via an online ratings system, and 30% (including 18% of online senior citizens) have posted an online comment or review regarding a product or service.²

We hasten to point out that consumption of goods and services is not the only motivation behind people’s seeking out or expressing opinions online. A need for political information is another important factor. For example, in a survey of over 2500 American adults, Rainie and Horrigan [248] studied the 31% of Americans (over 60 million people) who were 2006 campaign internet users, defined as those who gathered information about the 2006 elections online and exchanged views via email. Of these,

• 28% said that a major reason for these online activities was to get perspectives from within their community, and 34% said that a major reason was to get perspectives from outside their community;

• 27% had looked online for the endorsements or ratings of external organizations;

¹ Section 6.1 discusses quantitative analyses of actual economic impact, as opposed to consumer perception.

² Interestingly, Hitlin and Rainie [123] report that “Individuals who have rated something online are also more skeptical of the information that is available on the Web.”


• 28% said that most of the sites they use share their point of view, but 29% said that most of the sites they use challenge their point of view, indicating that many people are not simply looking for validations of their pre-existing opinions; and

• 8% posted their own political commentary online.

The user hunger for and reliance upon online advice and recommendations that the data above reveals is merely one reason behind the surge of interest in new systems that deal directly with opinions as a first-class object. But Horrigan [127] reports that while a majority of American internet users report positive experiences during online product research, 58% also report that online information was missing, impossible to find, confusing, and/or overwhelming. Thus, there is a clear need to aid consumers of products and of information by building better information-access systems than are currently in existence.

The interest that individual users show in online opinions about products and services, and the potential influence such opinions wield, is something that vendors of these items are paying more and more attention to [124]. The following excerpt from a whitepaper is illustrative of the envisioned possibilities, or at the least the rhetoric surrounding the possibilities:

With the explosion of Web 2.0 platforms such as blogs, discussion forums, peer-to-peer networks, and various other types of social media . . . consumers have at their disposal a soapbox of unprecedented reach and power by which to share their brand experiences and opinions, positive or negative, regarding any product or service.

As major companies are increasingly coming to realize, these consumer voices can wield enormous influence in shaping the opinions of other consumers and, ultimately, their brand loyalties, their purchase decisions, and their own brand advocacy. . . . Companies can respond to the consumer insights they generate through social media monitoring and analysis by modifying their marketing messages, brand positioning, product development, and other activities accordingly.

— Zabin and Jeﬀeries [327]

But industry analysts note that the leveraging of new media for the purpose of tracking product image requires new technologies; here is a representative snippet describing their concerns:

Marketers have always needed to monitor media for information related to their brands — whether it’s for public relations activities, fraud violations,³ or competitive intelligence. But fragmenting media and changing consumer behavior have crippled traditional monitoring methods. Technorati estimates that 75,000 new blogs are created daily, along with 1.2 million new posts each day, many discussing consumer opinions on products and services. Tactics [of the traditional sort] such as clipping services, ﬁeld agents, and ad hoc research simply can’t keep pace.

— Kim [154]

Thus, aside from individuals, an additional audience for systems capable of automatically analyzing consumer sentiment, as expressed in no small part in online venues, consists of companies anxious to understand how their products and services are perceived.

1.2 What Might be Involved? An Example Examination of the Construction of an Opinion/Review Search Engine

Creating systems that can process subjective information effectively requires overcoming a number of novel challenges. To illustrate some of these challenges, let us consider the concrete example of what building an opinion- or review-search application could involve. As we have discussed, such an application would fill an important and prevalent information need, whether one restricts attention to blog search [213] or considers the more general types of search that have been described above.

³ Presumably, the author means “the detection or prevention of fraud violations,” as opposed to the commission thereof.

The development of a complete review- or opinion-search application might involve attacking each of the following problems.

(1) If the application is integrated into a general-purpose search engine, then one would need to determine whether the user is in fact looking for subjective material. This may or may not be a difficult problem in and of itself: perhaps queries of this type will tend to contain indicator terms like “review,” “reviews,” or “opinions,” or perhaps the application would provide a “checkbox” to the user so that he or she could indicate directly that reviews are what is desired. In general, however, query classification is a difficult problem; indeed, it was the subject of the 2005 KDD Cup challenge [185].

(2) Besides the still-open problem of determining which documents are topically relevant to an opinion-oriented query, an additional challenge we face in our new setting is simultaneously or subsequently determining which documents or portions of documents contain review-like or opinionated material. Sometimes this is relatively easy, as in texts fetched from review-aggregation sites in which review-oriented information is presented in relatively stereotyped format: examples include Epinions.com and Amazon.com. However, blogs also notoriously contain quite a bit of subjective content and thus are another obvious place to look (and are more relevant than shopping sites for queries that concern politics, people, or other non-products), but the desired material within blogs can vary quite widely in content, style, presentation, and even level of grammaticality.

(3) Once one has target documents in hand, one is still faced with the problem of identifying the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary. Again, while some sites make this kind of extraction easier (for instance, user reviews posted to Yahoo! Movies must specify grades for pre-defined sets of characteristics of films), more free-form text can be much harder for computers to analyze, and indeed can pose additional challenges; for example, if quotations are included in a newspaper article, care must be taken to attribute the views expressed in each quotation to the correct entity.

(4) Finally, the system needs to present the sentiment information it has garnered in some reasonable summary fashion. This can involve some or all of the following actions:

(a) Aggregation of “votes” that may be registered on different scales (e.g., one reviewer uses a star system, but another uses letter grades).

(b) Selective highlighting of some opinions.

(c) Representation of points of disagreement and points of consensus.

(d) Identification of communities of opinion holders.

(e) Accounting for different levels of authority among opinion holders.

Note that it might be more appropriate to produce a visualization of sentiment data rather than a textual summary of it, whereas textual summaries are what is usually created in standard topic-based multi-document summarization.
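As a rough illustration of action (a) above, the following sketch maps star ratings and letter grades onto a common [0, 1] scale before averaging them. The linear mappings, the five-letter grade scale, and the function names are all assumptions made for this example; the survey does not prescribe any particular normalization.

```python
from statistics import mean

# Hypothetical letter-grade scale, lowest to highest (an assumption).
LETTER_GRADES = ["F", "D", "C", "B", "A"]


def normalize_stars(stars, min_stars=1, max_stars=5):
    """Map a star rating linearly onto [0, 1]."""
    return (stars - min_stars) / (max_stars - min_stars)


def normalize_letter(grade):
    """Map a letter grade onto [0, 1], ignoring any +/- modifier."""
    base = grade.strip().upper()[0]
    return LETTER_GRADES.index(base) / (len(LETTER_GRADES) - 1)


def aggregate(votes):
    """Average (scale, value) votes after putting them on a common scale."""
    normalized = []
    for scale, value in votes:
        if scale == "stars":
            normalized.append(normalize_stars(value))
        elif scale == "letter":
            normalized.append(normalize_letter(value))
        else:
            raise ValueError(f"unknown scale: {scale}")
    return mean(normalized)


votes = [("stars", 4), ("letter", "B+"), ("stars", 2)]
print(round(aggregate(votes), 3))  # 0.583
```

A linear mapping is only one option; reviewers who use different scales may also use them with different degrees of leniency, which a simple rescaling of this kind does not capture.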

1.3 Our Charge and Approach

Challenges (2), (3), and (4) in the above list are very active areas of research, and the bulk of this survey is devoted to reviewing work in these three sub-fields. However, due to space limitations and the focus of the journal series in which this survey appears, we do not and cannot aim to be completely comprehensive.

In particular, when we began to write this survey, we were directly charged to focus on information-access applications, as opposed to work of more purely linguistic interest. We stress that the importance of work in the latter vein is absolutely not in question.


Given our mandate, the reader will not be surprised that we describe the applications that sentiment-analysis systems can facilitate and review many kinds of approaches to a variety of opinion-oriented classification problems. We have also chosen to attempt to draw attention to single- and multi-document summarization of evaluative text, especially since interesting considerations regarding graphical visualization arise. Finally, we move beyond just the technical issues, devoting significant attention to the broader implications that the development of opinion-oriented information-access services has: we look at questions of privacy, manipulation, and whether or not reviews can have measurable economic impact.

1.4 Early History

Although the area of sentiment analysis and opinion mining has recently enjoyed a huge burst of research activity, there has been a steady undercurrent of interest for quite a while. One could count early projects on beliefs as forerunners of the area [48, 317]. Later work focused mostly on interpretation of metaphor, narrative, point of view, affect, evidentiality in text, and related areas [121, 133, 149, 262, 306, 310, 311, 312, 313].

The year 2001 or so seems to mark the beginning of widespread awareness of the research problems and opportunities that sentiment analysis and opinion mining raise [51, 66, 69, 79, 192, 215, 221, 235, 291, 296, 298, 305, 326], and subsequently there have been literally hundreds of papers published on the subject.

Factors behind this “land rush” include:

• the rise of machine learning methods in natural language processing and information retrieval;

• the availability of datasets for machine learning algorithms to be trained on, due to the blossoming of the World Wide Web and, specifically, the development of review-aggregation websites; and, of course,

• realization of the fascinating intellectual challenges and commercial and intelligence applications that the area offers.


1.5 A Note on Terminology: Opinion Mining, Sentiment Analysis, Subjectivity, and All that

‘The beginning of wisdom is the definition of terms,’ wrote Socrates. The aphorism is highly applicable when it comes to the world of social media monitoring and analysis, where any semblance of universal agreement on terminology is altogether lacking.

Today, vendors, practitioners, and the media alike call this still-nascent arena everything from ‘brand monitoring,’ ‘buzz monitoring’ and ‘online anthropology,’ to ‘market influence analytics,’ ‘conversation mining’ and ‘online consumer intelligence’. . . . In the end, the term ‘social media monitoring and analysis’ is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report.

— Zabin and Jeﬀeries [327]

The above quotation highlights the problems that have arisen in trying to name a new area. The quotation is particularly apt in the context of this survey because the ﬁeld of “social media monitoring and analysis” (or however one chooses to refer to it) is precisely one that the body of work we review is very relevant to. And indeed, there has been to date no uniform terminology established for the relatively young ﬁeld we discuss in this survey. In this section, we simply mention some of the terms that are currently in vogue, and attempt to indicate what these terms tend to mean in research papers that the interested reader may encounter.

The body of work we review is that which deals with the computational treatment of (in alphabetical order) opinion, sentiment, and subjectivity in text. Such work has come to be known as opinion mining, sentiment analysis, and/or subjectivity analysis. The phrases review mining and appraisal extraction have been used, too, and there are some connections to affective computing, where the goals include enabling computers to recognize and express emotions [239]. This proliferation of terms reflects differences in the connotations that these terms carry, both in their original general-discourse usages⁴ and in the usages that have evolved in the technical literature of several communities.

In 1994, Wiebe [311], influenced by the writings of the literary theorist Banfield [26], centered the idea of subjectivity around that of private states, defined by Quirk et al. [245] as states that are not open to objective observation or verification. Opinions, evaluations, emotions, and speculations all fall into this category; but a canonical example of research typically described as a type of subjectivity analysis is the recognition of opinion-oriented language in order to distinguish it from objective language. While there has been some research self-identified as subjectivity analysis on the particular application area of determining the value judgments (e.g., “four stars” or “C+”) expressed in the evaluative opinions that are found, this application has not tended to be a major focus of such work.

The term opinion mining appears in a paper by Dave et al. [69] that was published in the proceedings of the 2003 WWW conference; the publication venue may explain the popularity of the term within communities strongly associated with Web search or information retrieval. According to Dave et al. [69], the ideal opinion-mining tool would “process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good).” Much of the subsequent research self-identified as opinion mining fits this description in its emphasis on extracting and analyzing judgments on various aspects of given items. However, the term has recently also been interpreted more broadly to include many different types of analysis of evaluative text [190].

⁴ To see that the distinctions in common usage can be subtle, consider how interrelated the following set of definitions given in Merriam-Webster’s Online Dictionary are:

Synonyms: opinion, view, belief, conviction, persuasion, sentiment mean a judgment one holds as true.

• Opinion implies a conclusion thought out yet open to dispute: “each expert seemed to have a different opinion.”

• View suggests a subjective opinion: “very assertive in stating his views.”

• Belief implies often deliberate acceptance and intellectual assent: “a firm belief in her party’s platform.”

• Conviction applies to a firmly and seriously held belief: “the conviction that animal life is as sacred as human.”

• Persuasion suggests a belief grounded on assurance (as by evidence) of its truth: “was of the persuasion that everything changes.”

• Sentiment suggests a settled opinion reflective of one’s feelings: “her feminist sentiments are well-known.”

The history of the phrase sentiment analysis parallels that of “opinion mining” in certain respects. The term “sentiment” used in reference to the automatic analysis of evaluative text and tracking of the predictive judgments therein appears in 2001 papers by Das and Chen [66] and Tong [296], due to these authors’ interest in analyzing market sentiment. It subsequently occurred within 2002 papers by Turney [298] and Pang et al. [235], which were published in the proceedings of the annual meeting of the Association for Computational Linguistics (ACL) and the annual conference on Empirical Methods in Natural Language Processing (EMNLP). Moreover, Nasukawa and Yi [221] entitled their 2003 paper, “Sentiment analysis: Capturing favorability using natural language processing”, and a paper in the same year by Yi et al. [323] was named “Sentiment Analyzer: Extracting sentiments about a given topic using natural language processing techniques.” These events together may explain the popularity of “sentiment analysis” among communities self-identified as focused on NLP. A sizeable number of papers mentioning “sentiment analysis” focus on the specific application of classifying reviews as to their polarity (either positive or negative), a fact that appears to have caused some authors to suggest that the phrase refers specifically to this narrowly defined task. However, nowadays many construe the term more broadly to mean the computational treatment of opinion, sentiment, and subjectivity in text.

Thus, when broad interpretations are applied, “sentiment analysis” and “opinion mining” denote the same field of study (which itself can be considered a sub-area of subjectivity analysis). We have attempted to use these terms more or less interchangeably in this survey. This is in no small part because we view the field as representing a unified body of work, and would thus like to encourage researchers in the area to share terminology regardless of the publication venues at which their papers might appear.


2 Applications

Sentiment without action is the ruin of the soul.

— Edward Abbey

We used one application of opinion mining and sentiment analysis as a motivating example in the Introduction, namely, web search targeted toward reviews. But other applications abound. In this section, we seek to enumerate some of the possibilities.

It is important to mention that because of all the possible applications, there are a good number of companies, large and small, that have opinion mining and sentiment analysis as part of their mission. However, we have elected not to mention these companies individually because the industrial landscape tends to change quite rapidly, so that lists of companies risk falling out of date rather quickly.

2.1 Applications to Review-Related Websites

Clearly, the same capabilities that a review-oriented search engine would have could also serve very well as the basis for the creation and automated upkeep of review- and opinion-aggregation websites. That is, as an alternative to sites like Epinions that solicit feedback and reviews, one could imagine sites that proactively gather such information. Topics need not be restricted to product reviews, but could include opinions about candidates running for office, political issues, and so forth.

The technologies we discuss also have applications to more traditional review-solicitation sites. Summarizing user reviews is an important problem. One could also imagine that errors in user ratings could be fixed: there are cases where users have clearly accidentally selected a low rating when their review indicates a positive evaluation [47]. Moreover, as discussed later in this survey (see Section 5.2.4, for example), there is some evidence that user ratings can be biased or otherwise in need of correction, and automated classifiers could provide such updates.

2.2 Applications as a Sub-Component Technology

Sentiment-analysis and opinion-mining systems also have an important potential role as enabling technologies for other systems.

One possibility is as an augmentation to recommendation systems [292, 293], since it might behoove such a system not to recommend items that receive a lot of negative feedback.

Detection of “flames” (overly heated or antagonistic language) in email or other types of communication [276] is another possible use of subjectivity detection and classification.

In online systems that display ads as sidebars, it is helpful to detect webpages that contain sensitive content inappropriate for ad placement [137]; for more sophisticated systems, it could be useful to bring up product ads when relevant positive sentiments are detected and, perhaps more importantly, to nix the ads when relevant negative statements are discovered.

It has also been argued that information extraction can be improved by discarding information found in subjective sentences [256].

Question answering is another area where sentiment analysis can prove useful [189, 274, 284]. For example, opinion-oriented questions may require different treatment. Alternatively, Lita et al. [189] suggest that for definitional questions, providing an answer that includes more information about how an entity is viewed may better inform the user.


Summarization may also benefit from accounting for multiple viewpoints [265].

Additionally, there are potential connections to citation analysis, where, for example, one might wish to determine whether an author is citing a piece of work as supporting evidence or as research that he or she dismisses [238]. Similarly, one effort seeks to use semantic orientation to track literary reputation [287].

In general, the computational treatment of affect has been motivated in part by the desire to improve human–computer interaction [188, 192, 295].

2.3 Applications in Business and Government Intelligence

The field of opinion mining and sentiment analysis is well-suited to various types of intelligence applications. Indeed, business intelligence seems to be one of the main factors behind corporate interest in the field.

Consider, for instance, the following scenario (the text of which also appears in Lee [181]). A major computer manufacturer, disappointed with unexpectedly low sales, finds itself confronted with the question: “Why aren’t consumers buying our laptop?” While concrete data such as the laptop’s weight or the price of a competitor’s model are obviously relevant, answering this question requires focusing more on people’s personal views of such objective characteristics. Moreover, subjective judgments regarding intangible qualities (e.g., “the design is tacky” or “customer service was condescending”) or even misperceptions (e.g., “updated device drivers are not available” when such device drivers do in fact exist) must be taken into account as well.

Sentiment-analysis technologies for extracting opinions from unstructured human-authored documents would be excellent tools for handling many business-intelligence tasks related to the one just described. Continuing with our example scenario: it would be difficult to try to directly survey laptop purchasers who have not bought the company’s product. Rather, we could employ a system that (a) finds reviews or other expressions of opinion on the Web (newsgroups, individual blogs, and aggregation sites such as Epinions are likely to be productive sources) and then (b) creates condensed versions of individual reviews or a digest of overall consensus points. This would save an analyst from having to read potentially dozens or even hundreds of versions of the same complaints. Note that Internet sources can vary wildly in form, tenor, and even grammaticality; this fact underscores the need for robust techniques even when only one language (e.g., English) is considered.

Besides reputation management and public relations, one might perhaps hope that by tracking public viewpoints, one could perform trend prediction in sales or other relevant data [214]. (See Section 6, Broader Implications, for more discussion of potential economic impact.)

Government intelligence is another application that has been considered. For example, it has been suggested that one could monitor sources for increases in hostile or negative communications [1].

2.4 Applications Across Different Domains

One exciting turn of events has been the confluence of interest in opinions and sentiment within computer science with interest in opinions and sentiment in other fields.

As is well known, opinions matter a great deal in politics. Some work has focused on understanding what voters are thinking [83, 110, 126, 178, 219], whereas other projects have as a long-term goal the clarification of politicians’ positions, such as what public figures support or oppose, to enhance the quality of information that voters have access to [27, 111, 294].

Sentiment analysis has specifically been proposed as a key enabling technology in eRulemaking, allowing the automatic analysis of the opinions that people submit about pending policy or government-regulation proposals [50, 175, 271].

On a related note, there has been investigation into opinion mining in weblogs devoted to legal matters, sometimes known as “blawgs” [64].

Interactions with sociology promise to be extremely fruitful. For instance, the issue of how ideas and innovations diffuse [258] involves the question of who is positively or negatively disposed toward whom, and hence who would be more or less receptive to new information transmission from a given source. To take just one other example: structural balance theory is centrally concerned with the polarity of “ties” between people [54] and how this relates to group cohesion. These ideas have begun to be applied to online media analysis [58, 144].


3 General Challenges

3.1 Contrasts with Standard Fact-Based Textual Analysis

The increasing interest in opinion mining and sentiment analysis is partly due to its potential applications, which we have just discussed. Equally important are the new intellectual challenges that the field presents to the research community. So what makes the treatment of evaluative text different from “classic” text mining and fact-based analysis?

Take text categorization, for example. Traditionally, text categorization seeks to classify documents by topic. There can be many possible categories, the definitions of which might be user- and application-dependent; and for a given task, we might be dealing with as few as two classes (binary classification) or as many as thousands of classes (e.g., classifying documents with respect to a complex taxonomy). In contrast, with sentiment classification (see Section 4.1 for more details on precise definitions), we often have relatively few classes (e.g., “positive” or “3 stars”) that generalize across many domains and users.

In addition, while the different classes in topic-based categorization can be completely unrelated, the sentiment labels that are widely considered in previous work typically represent opposing (if the task is binary classification) or ordinal/numerical categories (if classification is according to a multi-point scale). In fact, the regression-like nature of strength of feeling, degree of positivity, and so on seems rather unique to sentiment categorization (although one could argue that the same phenomenon exists with respect to topic-based relevance).
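To make the ordinal point concrete, the following sketch compares two hypothetical sets of star-rating predictions that are equally often exactly right but differ in how far off their mistakes are. The ratings and the choice of mean absolute error as the distance-aware measure are assumptions made for illustration, not results or methods taken from the work surveyed here.

```python
def accuracy(gold, predicted):
    """Fraction of exact matches; treats the labels as unrelated classes."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_absolute_error(gold, predicted):
    """Average distance between labels; uses their ordinal structure."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical star ratings on a 1-5 scale.
gold = [1, 3, 5, 4]
near_misses = [2, 3, 4, 4]  # wrong by one star when wrong
far_misses = [5, 3, 1, 4]   # wrong by four stars when wrong

print(accuracy(gold, near_misses), accuracy(gold, far_misses))  # 0.5 0.5
print(mean_absolute_error(gold, near_misses))                   # 0.5
print(mean_absolute_error(gold, far_misses))                    # 2.0
```

Plain accuracy cannot tell the two systems apart, whereas a measure that respects the ordering of the labels can; with unrelated topic classes there is no comparable notion of a near miss.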

There are also many characteristics of answers to opinion-oriented questions that differ from those for fact-based questions [284]. As a result, opinion-oriented information extraction, as a way to approach opinion-oriented question answering, naturally differs from traditional information extraction (IE) [49]. Interestingly, in a manner that is similar to the situation for the classes in sentiment-based classification, the templates for opinion-oriented IE also often generalize well across different domains, since we are interested in roughly the same set of fields for each opinion expression (e.g., holder, type, strength) regardless of the topic. In contrast, traditional IE templates can differ greatly from one domain to another — the typical template for recording information relevant to a natural disaster is very different from a typical template for storing bibliographic information.
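The contrast between templates can be sketched as data structures. In the example below, the opinion template uses the three fields named above (holder, type, strength), while the disaster template is a hypothetical fact-based counterpart; the class names, the disaster fields, and the example values are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class OpinionFrame:
    """One opinion expression; the same fields apply whatever the topic."""
    holder: str        # who holds the opinion
    opinion_type: str  # e.g., "positive" or "negative" (label set assumed)
    strength: str      # e.g., "low" or "high" (label set assumed)


@dataclass
class DisasterRecord:
    """A hypothetical fact-based template that fits one domain only."""
    event: str
    location: str
    casualties: int


# The same opinion template serves two unrelated topics.
laptop = OpinionFrame(holder="reviewer", opinion_type="negative", strength="high")
election = OpinionFrame(holder="voter", opinion_type="positive", strength="low")

opinion_fields = [f.name for f in fields(OpinionFrame)]
disaster_fields = [f.name for f in fields(DisasterRecord)]
print(opinion_fields)   # ['holder', 'opinion_type', 'strength']
print(disaster_fields)  # ['event', 'location', 'casualties']
```

A bibliographic template would need yet another set of fields (authors, title, venue, and so on), whereas the opinion template would remain unchanged.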

These distinctions might make our problems appear deceptively simpler than their counterparts in fact-based analysis, but this is far from the truth. In the next section, we sample a few examples to show what makes these problems difficult compared to traditional fact-based text analysis.

3.2 Factors that Make Opinion Mining Difficult

Let us begin with a sentiment polarity text-classification example. Suppose we wish to classify an opinionated text as either positive or negative, according to the overall sentiment expressed by the author within it. Is this a difficult task?

To answer this question, first consider the following example, consisting of only one sentence (by Mark Twain): “Jane Austen’s books madden me so that I can’t conceal my frenzy from the reader.” Just as the topic of this text segment can be identified by the phrase “Jane Austen,” the presence of words like “madden” and “frenzy” suggests negative sentiment. So one might think this is an easy task, and hypothesize that the polarity of opinions can generally be identified by a set of keywords.

But the results of an early study by Pang et al. [235] on movie reviews suggest that coming up with the right set of keywords might be less trivial than one might initially think. The purpose of Pang et al.’s pilot study was to better understand the difficulty of the document-level sentiment-polarity classification problem. Two human subjects were asked to pick keywords that they would consider to be good indicators of positive and negative sentiment. As shown in Figure 3.1, the use of the subjects’ lists of keywords achieves about 60% accuracy when employed within a straightforward classification policy. In contrast, word lists of the same size but chosen based on examination of the corpus’ statistics achieve almost 70% accuracy, even though some of the terms, such as “still,” might not look that intuitive at first.

| Source | Proposed word lists | Accuracy (%) | Ties (%) |
|---|---|---|---|
| Human 1 | positive: dazzling, brilliant, phenomenal, excellent, fantastic; negative: suck, terrible, awful, unwatchable, hideous | 58 | 75 |
| Human 2 | positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting; negative: bad, cliched, sucks, boring, stupid, slow | 64 | 39 |
| Statistics-based | positive: love, wonderful, best, great, superb, still, beautiful; negative: bad, worst, stupid, waste, boring, ?, ! | 69 | 16 |

Fig. 3.1 Sentiment classification using keyword lists created by human subjects (“Human 1” and “Human 2”), with corresponding results using keywords selected via examination of simple statistics of the test data (“Statistics-based”). Adapted from Figures 1 and 2 in Pang et al. [235].

However, the fact that it may be non-trivial for humans to come up with the best set of keywords does not in itself imply that the problem is harder than topic-based categorization. While the feature “still” might not be likely for any human to propose from introspection, given training data, its correlation with the positive class can be discovered via a data-driven approach, and its utility (at least in the movie review domain) does make sense in retrospect. Indeed, applying machine learning techniques based on unigram models can achieve over 80% in accuracy [235], which is much better than the performance based on hand-picked keywords reported above. However, this level of accuracy is not quite on par with the performance one would expect in typical topic-based binary classification.
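The keyword-list experiment summarized in Fig. 3.1 can be sketched as follows. The exact “straightforward classification policy” of Pang et al. [235] is not spelled out here, so this sketch assumes the simplest reading: count the positive and negative keywords in a document, pick the class with more hits, and report a tie when the counts are equal. The function name and the tokenization are our own assumptions.

```python
import re

# Word lists copied from the "Human 1" row of Fig. 3.1.
POSITIVE = {"dazzling", "brilliant", "phenomenal", "excellent", "fantastic"}
NEGATIVE = {"suck", "terrible", "awful", "unwatchable", "hideous"}


def keyword_polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Return 'positive', 'negative', or 'tie' by comparing keyword counts."""
    # Keep words plus "?" and "!", since the statistics-based list uses both.
    tokens = re.findall(r"[a-z']+|[?!]", text.lower())
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "tie"  # also covers documents containing no listed keyword


print(keyword_polarity("A brilliant cast wasted on an awful, hideous script."))
print(keyword_polarity("I liked it well enough."))
```

Under this assumed policy, any document that contains none of the listed words is a tie, which is one plausible reason a short list of rare words would tie often.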

Why does this problem appear harder than the traditional task when the two classes we are considering here are so different from each other? Our discussion of algorithms for classification and extraction (Section 4) will provide a more in-depth answer to this question, but the following are a few examples (from among the many we know) showing that the upper bound on problem difficulty, from the viewpoint of machines, is very high. Note that not all of the issues these examples raise have been fully addressed in the existing body of work in this area.

Compared to topic, sentiment can often be expressed in a more subtle manner, making it difficult to identify from any single term of a sentence or document considered in isolation. Consider the following examples:

• “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.” (review by Luca Turin and Tania Sanchez of the Givenchy perfume Amarige, in Perfumes: The Guide, Viking 2008.) No ostensibly negative words occur.

• “She runs the gamut of emotions from A to B.” (Dorothy Parker, speaking about Katharine Hepburn.) No ostensibly negative words occur.

In fact, the example that opens this section, which was taken from the following quote from Mark Twain, is also followed by a sentence with no ostensibly negative words:

Jane Austen’s books madden me so that I can’t conceal my frenzy from the reader. Everytime I read ‘Pride and Prejudice’ I want to dig her up and beat her over the skull with her own shin-bone.

A related observation is that although the second sentence indicates an extremely strong opinion, it is difficult to associate the presence of this strong opinion with specific keywords or phrases in this sentence.

Indeed, subjectivity detection can be a difficult task in itself. Consider the following quote from Charlotte Brontë, in a letter to George Lewes:

You say I must familiarise my mind with the fact that “Miss Austen is not a poetess, has no ‘sentiment’ ” (you scornfully enclose the word in inverted commas), “has no eloquence, none of the ravishing enthusiasm of poetry”; and then you add, I must “learn to acknowledge her as one of the greatest artists, of the greatest painters of human character, and one of the writers with the nicest sense of means to an end that ever lived.”

Note the fine line between facts and opinions: while “Miss Austen is not a poetess” can be considered to be a fact, “none of the ravishing enthusiasm of poetry” should probably be considered as an opinion, even though the two phrases (arguably) convey similar information.¹ Thus, not only can we not easily identify simple keywords for subjectivity, but we also find that patterns like “the fact that” do not necessarily guarantee the objective truth of what follows them, and bigrams like “no sentiment” apparently do not guarantee the absence of opinions, either. We can also get a glimpse of how opinion-oriented information extraction can be difficult. For instance, it is non-trivial to recognize opinion holders. In the example quoted above, the opinion is not that of the author, but the opinion of “You,” which refers to George Lewes in this particular letter. Also, observe that given the context (“you scornfully enclose the word in inverted commas,” together with the reported endorsement of Austen as a great artist), it is clear that “has no sentiment” is not meant to be a show-stopping criticism of Austen from Lewes, and Brontë’s disagreement with him on this subject is also subtly revealed.

1 One can challenge our analysis of the “poetess” clause, as an anonymous reviewer indeed did; that disagreement perhaps supports our greater point about the difficulties that can sometimes present themselves.

Different researchers express different opinions about whether distinguishing between subjective and objective language is difficult for humans in the general case. For example, Kim and Hovy [159] note that in a pilot study sponsored by NIST, “human annotators often disagreed on whether a belief statement was or was not an opinion.” However, other researchers have found inter-annotator agreement rates in various types of subjectivity-classification tasks to be satisfactory [45, 273, 274, 309]; a summary provided by one of the anonymous referees is that “[although] there is variation from study to study, on average, about 85% of annotations are not marked as uncertain by either annotator, and for these cases, inter-coder agreement is very high (kappa values over 80).” As in other settings, more careful definitions of the distinctions to be made tend to lead to better agreement rates.

In any event, the points we are exploring in the Brontë quote may be made more clear by replacing “Jane Austen is not a poetess” with something like “Jane Austen does not write poetry for a living, but is also no poet in the broader sense.”

In general, sentiment and subjectivity are quite context-sensitive, and, at a coarser granularity, quite domain dependent (in spite of the fact that the general notion of positive and negative opinions is fairly consistent across different domains). Note that although domain dependency is in part a consequence of changes in vocabulary, even the exact same expression can indicate different sentiment in different domains. For example, “go read the book” most likely indicates positive sentiment for book reviews, but negative sentiment for movie reviews. (This example was furnished to us by Bob Bland.) We will discuss topic-sentiment interaction in more detail in Section 4.4.

It does not take a seasoned writer or a professional journalist to produce texts that are difficult for machines to analyze. The writings of Web users can be just as challenging, if not as subtle, in their own way; see Figure 3.2 for an example. In the case of Figure 3.2, it should be pointed out that it might be more useful to learn to recognize the quality of a review (see Section 5.2 for more detailed discussions on that subject). Still, it is interesting to observe the importance of modeling discourse structure. While the overall topic of a document should be what the majority of the content is focusing on regardless of the order in which potentially different subjects are presented, for opinions, the order in which different opinions are presented can result in a completely opposite overall sentiment polarity.

Fig. 3.2 Example of movie reviews produced by web users: a (slightly reformatted) screenshot of user reviews for The Nightmare Before Christmas.

In fact, somewhat in contrast with topic-based text categorization, order effects can completely overwhelm frequency effects. Consider the following excerpt, again from a movie review:

This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.

As indicated by the (inserted) emphasis, words that are positive in orientation dominate this excerpt,² and yet the overall sentiment is negative because of the crucial last sentence; whereas in traditional text classification, if a document mentions “cars” relatively frequently, then the document is most likely at least somewhat related to cars.

Order dependence also manifests itself at more fine-grained levels of analysis: “A is better than B” conveys the exact opposite opinion from “B is better than A.”³ In general, modeling sequential information and discourse structure seems more crucial in sentiment analysis (further discussion appears in Section 4.7).
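A small sketch shows why order matters for the comparative example: a plain bag-of-words representation, which discards word order, cannot tell the two opposite opinions apart, while even adjacent word pairs can. The whitespace tokenization and the function names are simplifying assumptions for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to unordered term counts (whitespace tokenization assumed)."""
    return Counter(text.lower().split())


def bigrams(text):
    """Count adjacent word pairs, a simple way to keep some order information."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


first, second = "A is better than B", "B is better than A"

# True: as unordered bags, the two opposite opinions look identical.
print(bag_of_words(first) == bag_of_words(second))

# False: the pairs ("a", "is") and ("than", "b") occur only in the first text.
print(bigrams(first) == bigrams(second))
```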

As noted earlier, not all of the issues we have just discussed have been fully addressed in the literature. This is perhaps part of the charm of this emerging area. In the following sections, we aim to give an overview of a selection of past heroic efforts to address some of these issues, and march through the positives and the negatives, charged with unbiased feeling, armed with hard facts.

Fasten your seat belts. It’s going to be a bumpy night!

— Bette Davis, All About Eve, screenplay by Joseph Mankiewicz

2 One could argue about whether in the context of movie reviews the word “Stallone” has a semantic orientation.

3 Note that this is not unique to opinion expressions; “A killed B” and “B killed A” also convey different factual information.

4

Classification and Extraction

“The Bucket List,” which was written by Justin Zackham and directed by Rob Reiner, seems to have been created by applying algorithms to sentiment.

— David Denby movie review, The New Yorker, January 7, 2007

A fundamental technology in many current opinion-mining and sentiment-analysis applications is classification (note that in this survey, we generally construe the term “classification” broadly, so that it encompasses regression and ranking). The reason that classification is so important is that many problems of interest can be formulated as applying classification/regression/ranking to given textual units; examples include making a decision for a particular phrase or document (“how positive is it?”), ordering a set of texts (“rank these reviews by how positive they are”), giving a single label to an entire document collection (“where on the scale between liberal and conservative do the writings of this author lie?”), and categorizing the relationship between two entities based on textual evidence (“does A approve of B’s actions?”). This section is centered on approaches to these kinds of problems.

Part One (p. 24ff.) covers fundamental background. Specifically, Section 4.1 provides a discussion of key concepts involved in common formulations of classification problems in sentiment analysis and opinion mining. Features that have been explored for sentiment analysis tasks are discussed in Section 4.2.

Part Two (p. 37ff.) is devoted to an in-depth discussion of different types of approaches to classification, regression, and ranking problems. The beginning of Part Two should be consulted for a detailed outline, but it is appropriate here to indicate how we cover extraction, since it plays a key role in many sentiment-oriented applications and so some readers may be particularly interested in it.

First, extraction problems (e.g., retrieving opinions on various features of a laptop) are often solved by casting many sub-problems as classification problems (e.g., given a text span, determine whether it expresses any opinion at all). Therefore, rather than have a separate section devoted completely to the entirety of the extraction task, we have integrated discussion of extraction-oriented classification sub-problems into the appropriate places in our discussion of different types of approaches to classification in general (Sections 4.3–4.8). Section 4.9 covers those remaining aspects of extraction that can be thought of as distinct from classification.
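The idea of casting an extraction sub-problem as classification can be sketched as a two-step pipeline: split a document into candidate spans, then run a binary “does this span express any opinion?” classifier on each one. In this sketch the classifier is a deliberately crude cue-word stand-in (the cue list, function names, and sentence splitter are all hypothetical); in practice it would be replaced by a trained model of the kind discussed in Part Two.

```python
import re

# Hypothetical cue list standing in for a trained subjectivity classifier.
OPINION_CUES = {"great", "terrible", "love", "hate", "disappointing", "excellent"}


def is_opinionated(span):
    """Placeholder binary classifier: does this span express any opinion?"""
    words = set(re.findall(r"[a-z]+", span.lower()))
    return bool(words & OPINION_CUES)


def extract_opinion_spans(document):
    """Split into sentence-like spans, then keep those the classifier accepts."""
    spans = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [span for span in spans if is_opinionated(span)]


review = "The laptop weighs 2 kg. The screen is great. I hate the keyboard."
print(extract_opinion_spans(review))
```

Because the extraction step only calls `is_opinionated`, any better classifier can be swapped in without changing the surrounding pipeline.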

Second, extraction is often a means to the further goal of providing effective summaries of the extracted information to users. Details on how to combine information mined from multiple subjective text segments into a suitable summary can be found in Section 5.

Part One: Fundamentals

4.1 Problem Formulations and Key Concepts

Motivated by different real-world applications, researchers have considered a wide range of problems over a variety of different types of corpora. We now examine the key concepts involved in these problems. This discussion also serves as a loose grouping of the major problems, where each group consists of problems that are suitable for similar treatment as learning tasks.

4.1.1 Sentiment Polarity and Degrees of Positivity

One set of problems shares the following general character: given an opinionated piece of text, wherein it is assumed that the overall opinion in it is about one single issue or item, classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities. A large portion of work in sentiment-related classification/regression/ranking falls within this category. Eguchi and Lavrenko [84] point out that the polarity or positivity labels so assigned may be used simply for summarizing the content of opinionated text units on a topic, whether they be positive or negative, or for only retrieving items of a given sentiment orientation (say, positive).

The binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion is called sentiment polarity classification or polarity classification. Although this binary decision task has also been termed sentiment classification in the literature, as mentioned above, in this survey we will use “sentiment classification” to refer broadly to binary categorization, multi-class categorization, regression, and/or ranking.

Much work on sentiment polarity classification has been conducted in the context of reviews (e.g., “thumbs up” or “thumbs down” for movie reviews). While in this context “positive” and “negative” opinions are often evaluative (e.g., “like” vs. “dislike”), there are other problems where the interpretation of “positive” and “negative” is subtly different. One example is determining whether a political speech is in support of or opposition to the issue under debate [27, 294]; a related task is classifying predictive opinions in election forums into “likely to win” and “unlikely to win” [160]. Since these problems are all concerned with two opposing subjective classes, as machine learning tasks they are often amenable to similar techniques. Note that a number of other aspects of politically oriented text, such as whether liberal or conservative views are expressed, have been explored; since the labels used in those problems can usually be considered properties of a set of documents representing authors’ attitudes over multiple issues rather than positive or negative sentiment with respect to a single issue, we discuss them under a different heading further below (“viewpoints and perspectives,” Section 4.1.4).

The input to a sentiment classifier is not necessarily always strictly opinionated. Classifying a news article into good or bad news has been considered a sentiment classification task in the literature [168]. But a piece of news can be good or bad news without being subjective (i.e., without being expressive of the private states of the author): for instance, “the stock price rose” is objective information that is generally considered to be good news in appropriate contexts. It is not our main intent to provide a clean-cut definition for what should be considered “sentiment polarity classification” problems,¹ but it is perhaps useful to point out that (a) in determining the sentiment polarity of opinionated texts where the authors do explicitly express their sentiment through statements like “this laptop is great,” (arguably) objective information such as “long battery life”² is often used to help determine the overall sentiment; (b) the task of determining whether a piece of objective information is good or bad is still not quite the same as classifying it into one of several topic-based classes, and hence inherits the challenges involved in sentiment analysis; and (c) as we will discuss in more detail later, the distinction between subjective and objective information can be subtle. Is “long battery life” objective? Also consider the difference between “the battery lasts 2 hours” vs. “the battery only lasts 2 hours.”

1 While it is of utter importance that the problem itself should be well-defined, it is of less, if any, importance to decide which tasks should be labeled as “polarity classification” problems.

2 Whether this should be considered as an objective statement may be up for debate: one can imagine another reviewer retorting, “you call that long battery life?”

Related categories. An alternative way of summarizing reviews is to extract information on why the reviewers liked or disliked the product. Kim and Hovy [158] note that such “pro and con” expressions can differ from positive and negative opinion expressions, although the two concepts, opinion (“I think this laptop is terrific”) and reason for opinion (“This laptop only costs $399”), are for the purposes of analyzing evaluative text strongly related. In addition to potentially forming the basis for the production of more informative sentiment-oriented summaries, identifying pro and con reasons can potentially be used to help decide the helpfulness of individual reviews: evaluative judgments that are supported by reasons are likely to be more trustworthy.

Another type of categorization related to degrees of positivity is considered by Niu et al. [226], who seek to determine the polarity of outcomes (improvement vs. death, say) described in medical texts.

Additional problems related to the determination of degree of positivity surround the analysis of comparative sentences [139]. The main idea is that sentences such as “The new model is more expensive than the old one” or “I prefer the new model to the old model” are important sources of information regarding the author’s evaluations.

Rating inference (ordinal regression). The more general problem of rating inference, where one must determine the author’s evaluation with respect to a multi-point scale (e.g., one to five “stars” for a review) can be viewed simply as a multi-class text categorization problem. Predicting degree of positivity provides more fine-grained rating information; at the same time, it is an interesting learning problem in itself.

But in contrast to many topic-based multi-class classification problems, sentiment-related multi-class classification can also be naturally formulated as a regression problem because ratings are ordinal. It can be argued to constitute a special type of (ordinal) regression problem because the semantics of each class may not simply directly correspond to a point on a scale. More specifically, each class may have its own distinct vocabulary. For instance, if we are classifying an author’s evaluation into one of the positive, neutral, and negative classes, an overall neutral opinion could be a mixture of positive and negative language, or it could be identified with signature words such as “mediocre.” This presents us with interesting opportunities to explore the relationships between classes.
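One practical consequence of ratings being ordinal can be illustrated with two ways of scoring predicted star ratings. Exact-match accuracy treats the classes as unrelated, as in topic-based categorization, while a distance-based score respects their order. The use of mean absolute error here is our own illustrative choice, not a metric prescribed by this survey, and the ratings are invented.

```python
def accuracy(gold, predicted):
    """Fraction of exact matches; every wrong star count is penalized alike."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_absolute_error(gold, predicted):
    """Average distance in stars; respects the ordering of the classes."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


gold = [5, 4, 1, 2]          # invented gold star ratings
near_misses = [4, 5, 2, 1]   # always off by exactly one star
far_misses = [1, 1, 5, 5]    # off by three or four stars

# Both prediction sets score 0.0 accuracy, so accuracy cannot separate them.
print(accuracy(gold, near_misses), accuracy(gold, far_misses))

# The distance-based score does: 1.0 versus 3.5 stars of average error.
print(mean_absolute_error(gold, near_misses), mean_absolute_error(gold, far_misses))
```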

Note the difference between rating inference and predicting strength of opinion (discussed in Section 4.1.2); for instance, it is possible to feel quite strongly (high on the “strength” scale) that something is mediocre (middling on the “evaluation” scale).

Also, note that the label “neutral” is sometimes used as a label for the objective class (“lack of opinion”) in the literature. In this survey, we use neutral only in the aforementioned sense of a sentiment that lies between positive and negative.

Interestingly, Cabral and Hortaçsu [47] observe that neutral comments in feedback systems are not necessarily perceived by users as lying at the exact mid-point between positive and negative comments; rather, “the information contained in a neutral rating is perceived by users to be much closer to negative feedback than positive.” On the other hand, they also note that in their data, “sellers were less likely to retaliate against neutral comments, as opposed to negatives: . . . a buyer leaving a negative comment has a 40% chance of being hit back, while a buyer leaving a neutral comment only has a 10% chance of being retaliated upon by the seller.”

Agreement. The opposing nature of polarity classes also gives rise to exploration of agreement detection, e.g., given a pair of texts, deciding whether they should receive the same or differing sentiment-related labels based on the relationship between the elements of the pair. This is often not defined as a standalone problem but considered as a subtask whose result is used to improve the labeling of the opinions held by the entities involved [272, 294]. A different type of agreement task has also been considered in the context of perspectives, where, for example, a label of “conservative” tends to indicate agreement with particular positions on a wide variety of issues.

4.1.2 Subjectivity Detection and Opinion Identification

Work in polarity classification often assumes the incoming documents to be opinionated. For many applications, though, we may need to decide whether a given document contains subjective information or not, or identify which portions of the document are subjective. Indeed, this problem was the focus of the 2006 Blog track at TREC [227].

At least one opinion-tracking system rates subjectivity and sentiment separately [108]. Mihalcea et al. [209] summarize the evidence of several projects on subsentential analysis [12, 90, 289, 319] as follows: “the problem of distinguishing subjective versus objective instances has often proved to be more difficult than subsequent polarity classification, so improvements in subjectivity classification promise to positively impact sentiment classification.”

Early work by Hatzivassiloglou and Wiebe [120] examined the effects of adjective orientation and gradability on sentence subjectivity. The goal was to tell whether a given sentence is subjective or not judging from the adjectives appearing in that sentence. A number of projects address sentence-level or sub-sentence-level subjectivity detection in different domains [33, 156, 232, 255, 308, 315, 319, 326]. Wiebe et al. [316] present a comprehensive survey of subjectivity recognition using different clues and features.

Wilson et al. [320] address the problem of determining clause-level opinion strength (e.g., “how mad are you?”). Note that the problem of determining opinion strength is different from rating inference. Classifying a piece of text as expressing a neutral opinion (giving it a mid-point score) for rating inference does not equal classifying that piece of text as objective (lack of opinion): one can have a strong opinion that something is “mediocre” or “so-so.”

Recent work also considers relations between word sense disambiguation and subjectivity [307].

Subjectivity detection or ranking at the document level can be thought of as having its roots in studies in genre classification (see Section 4.1.5 for more detail). For instance, Yu and Hatzivassiloglou [326] achieve high accuracy (97%) with a Naive Bayes classifier on a particular corpus consisting of Wall Street Journal articles, where the task is to distinguish articles under News and Business (facts) from articles under Editorial and Letter to the Editor (opinions). (This task was suggested earlier by Wiebe et al. [315], and a similar corpus was explored in previous work [308, 316].) Work in this direction is not limited to the binary distinction between subjective and objective labels. Recent work includes the research by participants in the 2006 TREC Blog track [227] and others [69, 97, 222, 223, 234, 279, 316, 326].
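For readers unfamiliar with the kind of classifier mentioned above, the following is a minimal sketch of a textbook multinomial Naive Bayes classifier with add-one smoothing, applied to a facts-versus-opinions task. It is not a reconstruction of the system of Yu and Hatzivassiloglou [326]: their features, smoothing, and data are not reproduced here, and the four training sentences are invented toy examples.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (label, text) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in labeled_docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest add-one-smoothed log posterior."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data; not drawn from the Wall Street Journal corpus.
training = [
    ("fact", "shares rose two percent on tuesday"),
    ("fact", "the company reported quarterly earnings"),
    ("opinion", "we believe the policy is misguided"),
    ("opinion", "the board should reconsider this foolish plan"),
]
model = train_naive_bayes(training)
print(classify("we believe the plan is foolish", model))
```

With such a tiny training set the result only illustrates the mechanics; it says nothing about the accuracy achievable on real corpora.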

4.1.3 Joint Topic–Sentiment Analysis

One simplifying assumption sometimes made by work on document-level sentiment classification is that each document under consideration is focused on the subject matter we are interested in. This is in part because one can often assume that the document set was created by first collecting only on-topic documents (e.g., by first running a topic-based query through a standard search engine). However, it is possible that there are interactions between topic and opinion that make it desirable to consider the two simultaneously; for example, Riloff et al. [256] find that “topic-based text filtering and subjectivity filtering are complementary” in the context of experiments in information extraction.

Also, even a relevant opinion-bearing document may contain off-topic passages that the user may not be interested in, and so one may wish to discard such passages.

Another interesting case is when a document contains material on multiple subjects that may be of interest to the user. In such a setting, it is useful to identify the topics and separate the opinions associated with each of them. Two examples of the types of documents for which this kind of analysis is appropriate are (1) comparative studies of related products, and (2) texts that discuss various features, aspects, or attributes.³

4.1.4 Viewpoints and Perspectives

Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject. For instance, Grefenstette et al. [112] experimented with determining the political orientation of websites essentially by classifying the concatenation of all the documents found on that site. We group this type of work under the heading of “viewpoints and perspectives,” and include under this rubric work on classifying texts as liberal, conservative, libertarian, etc. [219], placing texts along an ideological scale [178, 202], or representing Israeli versus Palestinian viewpoints [186, 187].

Although binary or n-ary classification may be used, here, the classes typically correspond not to opinions on a single, narrowly defined topic, but to a collection of bundled attitudes and beliefs. This could potentially enable different approaches from polarity classification. On the other hand, if we treat the set of documents as a meta-document, and the different issues being discussed as meta-features, then this problem still shares some common ground with polarity classification or its multi-class, regression, and ranking variants. Indeed, some of the approaches explored in the literature for these two problems individually could very well be adapted to work for either one of them.

3 When the context is clear, we often use the term “feature” to refer to “feature, aspect, or attribute” in this survey.

The other point of departure from the polarity classification problem is that the labels being considered are more about attitudes that do not naturally correspond with degree of positivity. While assigning simple labels remains a classification problem, if we move farther away and aim at serving more expressive and open-ended opinions to the user, we need to solve extraction problems. For instance, one may be interested in obtaining descriptions of opinions of a greater complexity than simple labels drawn from a very small set, i.e., one might be seeking something more like “achieving world peace is difficult” than like “mildly positive.” In fact, much of the prior work on perspectives and viewpoints seeks to extract more perspective-related information (e.g., opinion holders). The motivation was to enable multi-perspective question answering, where the user could ask questions such as “what is Miss America’s perspective on world peace?” rather than a fact-based question (e.g., “who is the new Miss America?”). Naturally, such work is often framed in the context of extraction problems, the particular characteristics of which are covered in Section 4.9.

4.1.5 Other Non-Factual Information in Text

Researchers have considered various affect types, such as the six “universal” emotions [86]: anger, disgust, fear, happiness, sadness, and surprise [192, 9, 285]. An interesting application is in human–computer interaction: if a system determines that a user is upset or annoyed, for instance, it could switch to a different mode of interaction [188].

Other related areas of research include computational approaches for humor recognition and generation [210]. Many interesting affectual aspects of text like “happiness” or “mood” are also being explored in the context of informal text resources such as weblogs [224]. Potential applications include monitoring levels of hateful or violent rhetoric, perhaps in multilingual settings [1].

In addition to classiﬁcation based on aﬀect and emotion, another related area of research that addresses non-topic-based categorization is that of determining the genre of texts [97, 98, 150, 153, 182, 277].

Since subjective genres, such as “editorial,” are often one of the possible categories, such work can be viewed as closely related to subjectivity detection. Indeed, this relation has been observed in work focused on learning subjective language [316].

There has also been research that concentrates on classifying documents according to their source or source style, with statistically detected stylistic variation [38] serving as an important cue. Authorship identification is perhaps the most salient example; Mosteller and Wallace’s [216] classic Bayesian study of the authorship of the Federalist Papers is one well-known instance. Argamon-Engelson et al. [18] consider the related problem of identifying not the particular author of a text, but its publisher (e.g., the New York Times vs. The Daily News); the work of Kessler et al. [153] on determining a document’s “brow” (e.g., high-brow vs. “popular,” or low-brow) has similar goals. Several recent workshops have been dedicated to style analysis in text [15, 16, 17]. Determining stylistic characteristics can be useful in multi-faceted search [10].

Another problem that has been considered in intelligence and security settings is the detection of deceptive language [46, 117, 329].

4.2 Features

Converting a piece of text into a feature vector or other representation that makes its most salient and important features available is an important part of data-driven approaches to text processing. There is an extensive body of work that addresses feature selection for machine learning approaches in general, as well as for learning approaches tailored to the specific problems of classic text categorization and information extraction [101, 263]. A comprehensive discussion of such work is beyond the scope of this survey. In this section, we focus on findings in feature engineering that are specific to sentiment analysis.

4.2.1 Term Presence vs. Frequency

It is traditional in information retrieval to represent a piece of text as a feature vector wherein the entries correspond to individual terms. One influential finding in the sentiment-analysis area is as follows. Term frequencies have traditionally been important in standard IR, as the popularity of tf-idf weighting shows; but in contrast, Pang et al. [235] obtained better performance using presence rather than frequency. That is, binary-valued feature vectors in which the entries merely indicate whether a term occurs (value 1) or not (value 0) formed a more effective basis for review polarity classification than did real-valued feature vectors in which entry values increase with the occurrence frequency of the corresponding term. This finding may be indicative of an interesting difference between typical topic-based text categorization and polarity classification: While a topic is more likely to be emphasized by frequent occurrences of certain keywords, overall sentiment may not usually be highlighted through repeated use of the same terms. (We discussed this point previously in Section 3.2 on factors that make opinion mining difficult.)
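
The distinction between the two representations can be made concrete with a minimal sketch. The tokenizer, the function names, and the two toy reviews below are illustrative assumptions, not the setup used in [235]; the point is only that a repeated term raises its frequency entry but leaves its presence entry at 1.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def build_vocab(docs):
    # Sorted vocabulary so that vector positions are deterministic.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def frequency_vector(text, vocab):
    # Entry values grow with the number of occurrences of each term.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def presence_vector(text, vocab):
    # Entry is 1 if the term occurs at all, 0 otherwise.
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in vocab]


# Hypothetical toy reviews.
docs = ["great great great plot", "dull plot"]
vocab = build_vocab(docs)
freq = frequency_vector(docs[0], vocab)
pres = presence_vector(docs[0], vocab)
print(vocab, freq, pres)
```

Either vector could then be passed to a standard classifier; only the feature values differ, not the vocabulary or the learning algorithm.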

On a related note, hapax legomena, or words that appear a single time in a given corpus, have been found to be high-precision indicators of subjectivity [316]. Yang et al. [322] look at rare terms that are not listed in a pre-existing dictionary, on the premise that novel versions of words, such as “bugfested,” might correlate with emphasis and hence subjectivity in blogs.
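
As a rough illustration of the definition above, hapax legomena can be found by counting token occurrences over the whole corpus and keeping those seen exactly once. The whitespace tokenization and the toy corpus are assumptions made for this sketch, and it does not reproduce the procedures of [316] or [322].

```python
from collections import Counter


def hapax_legomena(docs):
    # Count every token across the whole corpus, not per document.
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # Keep the terms that occur exactly once.
    return sorted(term for term, n in counts.items() if n == 1)


# Hypothetical two-document corpus.
corpus = ["the plot was bugfested", "the plot was fine"]
hapaxes = hapax_legomena(corpus)
print(hapaxes)
```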

4.2.2 Term-based Features Beyond Term Unigrams

Position information finds its way into features from time to time. The position of a token within a textual unit (e.g., in the middle vs. near the end of a document) can potentially have important effects on how much that token affects the overall sentiment or subjectivity status of the enclosing textual unit. Thus, position information is sometimes encoded into the feature vectors that are employed [158, 235].
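
One simple way to encode position, shown here only as a hypothetical sketch, is to append a coarse position bucket to each token so that the same word near the start and near the end of a document becomes two distinct features. The bucket count, the `token@bucket` naming scheme, and the example sentence are assumptions, not the specific encodings used in [158, 235].

```python
def position_tagged_tokens(text, n_buckets=3):
    # Split the document into n_buckets equal-sized regions by token index
    # and tag each token with the index of the region it falls in.
    tokens = text.lower().split()
    tagged = []
    for i, tok in enumerate(tokens):
        bucket = i * n_buckets // len(tokens)
        tagged.append(f"{tok}@{bucket}")
    return tagged


# Hypothetical example: bucket 0 is the beginning, bucket 2 the end.
features = position_tagged_tokens("slow start but superb ending overall")
print(features)
```

The tagged tokens can then be treated like ordinary terms when building presence or frequency vectors.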

Whether higher-order n-grams are useful features appears to be a matter of some debate. For example, Pang et al. [235] report that unigrams outperform bigrams when classifying movie reviews by sentiment