[Ideas and Development trait (Section 2.1)]
The entire piece has a strong sense of balance. Key ideas stand out.
[Organization trait]
Not all sentences in an article convey information in the same manner. Sentences in the opening paragraphs are general and give an overview of the topic. The details on the topic come later. Finally, the end of the article provides some abstraction, and here the content is again often general. The points above, taken from the definitions of the Ideas and Development and Organization traits of the Six Traits model, are based on this switch between overview and detailed information in the text.
Consider the sentences in Table 5.1, taken from a news article.
a) The novel, a story of Scottish low-life narrated largely in Glaswegian dialect, is unlikely to prove a popular choice with booksellers who have damned all six books shortlisted for the prize as boring, elitist and—worst of all—unsaleable.
...
b) The Booker prize has, in its 26-year history, always provoked controversy.
Sentence (a) describes the unpopular features of the books chosen for the Booker prize and also talks more specifically about one of the selected books. Sentence (b) is the last sentence of the article and summarizes the negative sentiment by noting that controversy surrounding the prize is longstanding and arises almost every year. The level of detail is markedly different in the two sentences. Sentence (b) only gives the topic. Presented by itself, it would make a reader wonder why the author makes such a statement. In other words, sentence (b) needs substantiation from other parts of the text. Sentence (a), on the other hand, does not create such expectations. It has details and specific information on the topic. In this work, we call sentences like (a) specific and sentences like (b) general.
It is intuitive and noticeable that texts have a mix of such general and specific information.
Studies on academic writing [155] have identified an hourglass-like structure in academic articles, where the introduction and conclusion present general content and the experimental sections in between contain a lot of detail. Large-scale annotations carried out for discourse relations also indicate that sentences have different specificity levels. For example, in the Penn Discourse Treebank (PDTB) corpus [128], the Instantiation and Restatement relations appear to be relevant to this phenomenon. The definitions of these relations from the PDTB manual are given below. Arg1 and Arg2 refer to the two text spans that are connected by the relation.
• Instantiation: Arg1 evokes a set and Arg2 describes it in further detail. It may be a set of events, reasons or a generic set of events, behaviors and attitudes. The relation involves a function which extracts the set of events from the semantics of Arg1, and Arg2 describes one element in the extracted set.
• Restatement: The semantics of Arg2 restates that of Arg1. The subtypes “specification”, “generalization”, and “equivalence” further specify the ways in which Arg2 restates Arg1. In the case of specification, Arg2 describes the situation in Arg1 in more detail.
These definitions indicate that one sentence may be written to be more general than another. The general sentence creates the need for more specific details, which the subsequent sentence fulfils in the case of Instantiation and Restatement-Specification relations. Some example Instantiation and Specification relations between adjacent sentences are shown in Table 5.2.
Apart from the PDTB, other discourse frameworks such as Rhetorical Structure Theory (RST) [97] and Segmented Discourse Representation Theory (SDRT) [3] also note that sentences involved in certain discourse relations have varying degrees of specificity. We discuss the specificity differences reported in the RST and SDRT theories in further detail in the related work section (Section 5.8).
Given these observed regularities in the occurrence of general and specific information in texts, we hypothesize that specificity patterns will be useful for predicting the quality of an article. The content quality rubrics in Section 2.1 point out that the presentation of details has a significant influence on quality. Too much general or too much specific content could make an article difficult to read. Similarly, the placement of general and specific content influences the organization quality of an article. When general content is presented without particular details, the article could appear ambiguous; on the other hand, specific information without appropriate topic statements and summaries would leave the reader without a high-level understanding of the article. This chapter presents a metric for content quality and organization quality based on the idea of specificity.
To this end, we develop a supervised classifier to identify general versus specific sentences and use its predictions to analyze text quality. Our classifier is trained on sentences from news articles. Based on the specificity differences noted in the PDTB Instantiation and Specification discourse relations, we create proxy examples of general and specific sentences from these relations and use this data as a training corpus. We also obtain manual annotations from people for the general-specific distinction and test how the classifier trained on proxy examples performs on these direct annotations of specificity. Sections 5.2 to 5.4 provide details about the corpora and the classification approach. The classifier achieves an accuracy of 75% in identifying general and specific sentences. We calculate a measure of the specificity of a text based on the classifier’s predictions (described in Section 5.5).
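The construction of proxy examples and a text-level score can be illustrated with a small sketch. This is not the classifier or the measure developed in this chapter: the `Relation` record, the function names, and the example probabilities are hypothetical. The sketch assumes that the Arg1 of an Instantiation or Specification relation is taken as a general example and the Arg2 as a specific one, and it aggregates sentence-level predictions by a plain average, which may differ from the measure defined in Section 5.5.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for one discourse relation between adjacent sentences."""
    sense: str  # e.g. "Instantiation" or "Specification"
    arg1: str
    arg2: str


# Senses assumed to signal a general -> specific move (assumption for this sketch).
PROXY_SENSES = {"Instantiation", "Specification"}


def proxy_examples(relations: List[Relation]) -> List[Tuple[str, str]]:
    """Label Arg1 as 'general' and Arg2 as 'specific' for the selected senses."""
    examples = []
    for rel in relations:
        if rel.sense in PROXY_SENSES:
            examples.append((rel.arg1, "general"))
            examples.append((rel.arg2, "specific"))
    return examples


def text_specificity(specific_probs: List[float]) -> float:
    """Average the per-sentence probabilities of the 'specific' class."""
    if not specific_probs:
        raise ValueError("need at least one sentence-level prediction")
    return sum(specific_probs) / len(specific_probs)


relations = [
    Relation(
        "Instantiation",
        "Despite recent declines in yields, investors continue to pour cash into money funds.",
        "Assets of the 400 taxable funds grew by $1.5 billion during the last week.",
    ),
    Relation("Contrast", "First sentence.", "Second sentence."),  # ignored
]
examples = proxy_examples(relations)

# Made-up classifier outputs for a three-sentence text.
score = text_specificity([0.25, 0.75, 0.5])
```

Only the Instantiation relation contributes training examples here; relations with other senses are skipped because they carry no assumed general-specific ordering.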
Instantiations
1. The 40-year-old Mr. Murakami is a publishing sensation in Japan. A more recent novel, “Norwegian Wood” (every Japanese under 40 seems to be fluent in Beatles lyrics), has sold more than four million copies since Kodansha published it in 1987.
2. Sales figures of the test-prep materials aren’t known, but their reach into schools is significant. In Arizona, California, Florida, Louisiana, Maryland, New Jersey, South Carolina and Texas, educators say they are common classroom tools.
3. Despite recent declines in yields, investors continue to pour cash into money funds. Assets of the 400 taxable funds grew by $1.5 billion during the last week, to $352.7 billion.
Specifications
4. By most measures, the nation’s industrial sector is now growing very slowly—if at all. Factory payrolls fell in September.
5. Mrs. Hills said that the U.S. is still concerned about ‘disturbing developments in Turkey and continuing slow progress in Malaysia.’ She didn’t elaborate, although earlier U.S. trade reports have complained of videocassette piracy in Malaysia and disregard for U.S. pharmaceutical patents in Turkey.
6. Alan Spoon, recently named Newsweek president said Newsweek’s ad rates would increase 5% in January. A full, four-color page in Newsweek will cost $100,980.
Table 5.2: Example Instantiation and Specification relations from the PDTB. The Arg1 of each relation is shown in italics.
assessment in two genres—summarization and science journalism.
Since summaries are condensed versions of the source articles, they cannot contain all the details from the source. Some content must be made more general than it appears in the source. Text specificity could therefore have direct relevance for the task of summarization, and we expected that the degree and placement of general and specific information could have a noticeable impact on text quality in this genre.
In fact, several studies in the summarization field have noted specificity differences in summaries. Jing and McKeown (2000) [69] manually analyzed human-written summaries in combination with their source documents. They pointed out that people convert some source sentences into more general content for the summaries. The opposite transformation also occurs: some sentences become more specific than in the source. However, it is not known how often these transformations occur or whether they affect the quality of summaries. Summarization evaluation has traditionally concerned itself with assessing content quality solely on the basis of how much important information the summary provides. Aspects such as how the information is conveyed have received little, if any, attention.
More recently, Haghighi and Vanderwende [61] built a topic-model-based summarization system that could select content based both on a general content distribution and on distributions of content for specific subtopics. They report that using the general distribution yielded summaries with better content than using the specific topics. The approach was later improved by Mason and Charniak [101], who modified the model’s objective function to directly implement the idea that general content should be preferred. Given an input set containing multiple documents, their objective function favors content that appears across multiple input documents and penalizes content that is specific to individual documents in the input. However, the relationship between content specificity and summary quality has not so far been studied directly, across several systems and from the point of view of how people summarize articles.
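The preference for content shared across the input documents can be illustrated with a toy score. This is not the objective function of Mason and Charniak [101], whose topic model is not reproduced here. It is a simplified sketch that assumes documents and sentences are given as lists of lowercased words, and it scores a sentence by the average fraction of input documents that contain each of its words. The function names and the example data are hypothetical.

```python
from typing import List


def doc_fraction(word: str, docs: List[List[str]]) -> float:
    """Fraction of input documents that contain the word at least once."""
    return sum(word in doc for doc in docs) / len(docs)


def shared_content_score(sentence: List[str], docs: List[List[str]]) -> float:
    """Average document fraction over the words of a candidate sentence.

    Words found in many documents raise the score; words confined to a
    single document pull it down.
    """
    if not sentence or not docs:
        raise ValueError("need a non-empty sentence and at least one document")
    return sum(doc_fraction(w, docs) for w in sentence) / len(sentence)


# Four made-up input documents on the same topic.
docs = [
    ["booker", "prize", "novel"],
    ["booker", "prize", "controversy"],
    ["booker", "judges"],
    ["booker", "sales"],
]

general_score = shared_content_score(["booker", "prize"], docs)
specific_score = shared_content_score(["novel", "judges"], docs)
```

In this example the candidate built from words common to the input set scores higher than the one built from words that each occur in a single document, which is the qualitative behavior described above.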
Similarly, we expect the general-specific nature of content to be relevant for research writing. As we pointed out earlier, conference articles have been observed to be structured like an hourglass with regard to their general-specific nature. While research papers are written for an expert audience and have such a structure, we believe that text specificity could be even more relevant for analyzing science journalism articles. The audience for science news consists of non-experts, and proper substantiation and topic statements are necessary to guide a reader through difficult concepts. Therefore, we also evaluate text quality for science journalism articles using the specificity metric. We have not used specificity features for the academic writing genre, since our training data has been chosen exclusively from news articles.