
Summarization

5.2 Multi-Document Opinion-Oriented Summarization

5.2.2 Textual Summaries

opinions — are often conflicting, which is a bit different from the usual situation in topic-based summarization, where typically one does not assume that there are conflicting sets of facts in the document set (although there are exceptions [301, 302]).

Summary created via a “true natural-language-generation” approach:

Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.

Summary created by a modified sentence-extraction system:

Bottom line, well made camera, easy to use, very flexible and powerful features to include the ability to use external flash and lense/filters choices. It has a beautiful design, lots of features, very easy to use, very configurable and customizable, and the battery duration is amazing! Great colors, pictures, and white balance. The camera is a dream to operate in automode, but also gives tremendous flexibility in aperture priority, shutter priority, and manual modes. I’d highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.

Fig. 5.3 Sample automatically generated summaries. Adapted from Figure 2 of Carenini et al. [52].

latter approach, sentence extraction methods have also been used to create summaries for opinion-oriented queries or topics [265, 266].

While we are not aware of the following technique being used in standard topic-based summarization, we see no reason why it is not applicable to that setting, at least in principle. Ku et al. [170] (short version available as Ku et al. [169]) propose the following simple scheme to create a textual summary of a set of documents known in advance to be on the same topic. Sentences considered to be representative of the topic are collected, and the polarity of each such sentence is computed based on what sentiment-bearing words it contains, with negation taken into account. Then, to create a summary of the positive documents, the system simply returns the headline of the document with the most positive on-topic sentences, and similarly for the negative documents. The authors show the following examples for the positive and the negative summary, respectively:

Positive: “Chinese Scientists Suggest Proper Legislation for Clone Technology.”

Negative: “UK Government Stops Funding for Sheep Cloning Team.”

The cleverness of this method is that headlines are, by construction, good summaries (at least of the article they are drawn from), so that fluency and informativeness, although perhaps not appropriateness, are guaranteed.
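The headline-selection scheme described above can be sketched roughly as follows. This is a minimal illustration and not the implementation of Ku et al.: the tiny lexicon, the negator list, the rule that a negator flips only the word directly after it, the `Document` class, and the sample documents are all assumptions made for the example. The on-topic sentences are assumed to have been selected beforehand.

```python
import re
from dataclasses import dataclass

# Toy lexicon and negator list: illustrative assumptions only.
POSITIVE = {"good", "proper", "promising", "benefit"}
NEGATIVE = {"bad", "ban", "risk", "dangerous"}
NEGATORS = {"not", "no", "never"}


@dataclass
class Document:
    headline: str
    on_topic_sentences: list  # assumed pre-selected as representative of the topic


def sentence_polarity(sentence):
    """Sum word polarities, flipping a word's sign when a negator directly precedes it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def headline_summary(documents, sign):
    """Return the headline of the document with the most sentences of the given sign.

    sign is +1 for the positive summary and -1 for the negative one.
    Returns None when no document has any sentence of that sign.
    """
    def count(doc):
        return sum(1 for s in doc.on_topic_sentences if sentence_polarity(s) * sign > 0)

    best = max(documents, key=count)
    return best.headline if count(best) > 0 else None


# Hypothetical documents, invented for this sketch.
docs = [
    Document("Headline A", ["Proper rules are good.", "The research is promising."]),
    Document("Headline B", ["The ban is bad.", "Cloning is not good.", "There is a risk."]),
]
positive_summary = headline_summary(docs, +1)
negative_summary = headline_summary(docs, -1)
```

Because the output is an existing headline, the sketch involves no text generation at all; all of the work lies in counting polar sentences per document.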

Another perhaps unconventional type of multi-document “summary” is the selection of a few documents of interest from the corpus for presentation to the user. In this vein, Kawai et al. [151] have developed a news portal site called “Fair News Reader” that attempts to determine the affect characteristics of articles the user has been reading so far (e.g., “happiness” or “fear”) and then recommends articles that are on the same topic but have opposite affect characteristics. One could imagine extending this concept to a news portal that presented to the user opinions opposing his or her pre-conceived ones (Phoebe Sengers, personal communication). On a related note, Liu [190] mentions that one might desire a summarization system to present a “representative sample” of opinions, so that both positive and negative points of view are covered, rather than just the dominant sentiment. As of the time of this writing, Amazon presents the most helpful favorable review side-by-side with the most helpful critical review if one clicks on the “[x] customer reviews” link next to the stars indicator. Additionally, one could interpret the opinion-leader identification work of Song et al. [275] as suggesting that blog posts written by opinion leaders could serve as an alternative type of representative sample.
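One simple way to realize a two-sided “representative sample” is to pick the most helpful review from each side, as in the sketch below. The `Review` fields and the star thresholds that define “favorable” and “critical” are assumptions made for illustration; they are not taken from Liu [190] and are not a description of how Amazon actually selects reviews.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    stars: int          # rating on an assumed 1-5 scale
    helpful_votes: int  # number of readers who marked the review helpful


def representative_pair(reviews, favorable_min=4, critical_max=2):
    """Return (most helpful favorable review, most helpful critical review).

    The thresholds are hypothetical defaults. Either element is None
    when no review falls on that side.
    """
    favorable = [r for r in reviews if r.stars >= favorable_min]
    critical = [r for r in reviews if r.stars <= critical_max]

    def most_helpful(group):
        return max(group, key=lambda r: r.helpful_votes) if group else None

    return most_helpful(favorable), most_helpful(critical)


# Hypothetical reviews, invented for this sketch.
reviews = [
    Review("Great pictures.", 5, 12),
    Review("Love the battery.", 4, 30),
    Review("Viewfinder is poor.", 2, 8),
    Review("Broke in a week.", 1, 21),
    Review("It is fine.", 3, 50),
]
best_favorable, best_critical = representative_pair(reviews)
```

Note that the three-star review is excluded from both sides even though it has the most votes, which reflects the goal of covering both points of view rather than surfacing the single most popular review.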

Summarizing online discussions and blogs is an area of related work [131, 300, 330]. The focus of such work is not on summarizing the opinions per se, although Zhou and Hovy [330] note that one may want to vary the emphasis on the opinions expressed versus the facts expressed.

5.2.2.2 Textual Summarization Without Topic-based Summarization Techniques

Other work in the area of textual multi-document sentiment summarization departs from topic-based work. The main reason seems to be that redundancy elimination is much less of a concern: users may wish to look at many individual opinions regardless of whether these individual opinions express the same overall sentiment, and these users may not particularly care whether the textual overview they peruse is coherent. Thus, in several cases, textual “summaries” are generated simply by listing some or all opinionated sentences. These are often grouped by feature (sub-topic) and/or polarity, perhaps with some ranking heuristic such as feature importance applied [129, 170, 324, 332].
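A listing-style summary of this kind can be sketched as follows. The input is assumed to be a collection of opinionated sentences already labeled with a feature and a polarity. Using the number of mentions as the measure of feature importance is an assumption of this sketch and is not the specific heuristic of any of the cited systems.

```python
from collections import defaultdict


def list_summary(opinions):
    """Group opinionated sentences by feature and polarity, most-mentioned feature first.

    opinions: iterable of (feature, polarity, sentence), with polarity "+" or "-".
    Returns a list of (feature, positive_sentences, negative_sentences).
    """
    grouped = defaultdict(lambda: {"+": [], "-": []})
    for feature, polarity, sentence in opinions:
        grouped[feature][polarity].append(sentence)

    def mentions(feature):
        return len(grouped[feature]["+"]) + len(grouped[feature]["-"])

    # Rank by mention count (assumed proxy for importance); break ties alphabetically.
    ranked = sorted(grouped, key=lambda f: (-mentions(f), f))
    return [(f, grouped[f]["+"], grouped[f]["-"]) for f in ranked]


# Hypothetical labeled sentences, invented for this sketch.
opinions = [
    ("battery", "+", "The battery duration is amazing!"),
    ("lens", "-", "The lens is poor."),
    ("battery", "-", "Battery died fast."),
    ("zoom", "+", "Great zoom."),
]
summary = list_summary(opinions)
for feature, positives, negatives in summary:
    print(feature, "| +", positives, "| -", negatives)
```

No redundancy elimination is attempted: sentences that repeat the same sentiment about a feature are all kept, in line with the observation above that users may want to see many individual opinions.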