4 Applications of Implicit Networks
The existing work on descriptive sentence extraction is fairly limited, so we give a brief overview of related areas and the works that are conceptually the most similar. Notably, the task is also related to knowledge extraction for question answering, as discussed in Chapter 2.3. We begin by discussing the basics of summarization, as well as question answering for geographic locations, which relates to our evaluation.
Document summarization
The summarization of entire documents (or sets of documents) has been well researched in the past, and several recent surveys exist on this matter [121, 142]. Graph-based approaches to coherence and composition are particularly successful. They were pioneered by contributions such as LexRank [52] and TextRank [130], which use graph centrality to identify relevant components of a summary. These methods have been continuously improved, and more recent additions include novel techniques such as topic signatures [5], vector embeddings [129], or word associations relative to a background corpus [80]. Overall, graph-based document summarization tends to focus on sentences and representations of topics as nodes. In contrast, by using implicit networks, we put the focus on entities instead of topics. As a result, we can retrieve and explore entity information from the network representation of large document collections on a scale for which traditional multi-document summarization approaches are ill-suited due to their runtime complexities.
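To make the notion of graph centrality for summarization concrete, the following is a minimal sketch in the style of TextRank, not the implementation of any of the cited works. It assumes a simple word-overlap similarity between sentences and a fixed number of power iterations. The function names (tokenize, similarity, rank_sentences) and the parameter defaults are illustrative assumptions.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word overlap between two token sets, normalized by their log sizes."""
    # Token sets with fewer than two words would give a zero denominator.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank on a sentence similarity graph.

    Assumption: a fixed iteration count is used instead of a convergence test.
    """
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric edge weights; a sentence has no edge to itself.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


if __name__ == "__main__":
    example = [
        "The cat sat on the mat.",
        "The cat lay on the mat.",
        "Stock prices fell sharply today.",
    ]
    for sentence, score in zip(example, rank_sentences(example)):
        print(f"{score:.3f}  {sentence}")
```

In this sketch, sentences that share vocabulary with many other sentences receive high centrality scores and would be selected for the summary first. Note that the nodes are sentences, which is exactly the design choice that the entity-centric view of implicit networks replaces.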
Summarization and question answering for geographic IR
Since we focus on the relations between locations for part of our evaluation, some summarization and question answering (QA) approaches with a focus on geographic information can also be considered related. Text summarization and question answering for geographic concepts were first addressed extensively in GeoCLEF [74] as part of the Cross Language Evaluation Forum (CLEF). Among the main tasks, the focus was put on NLP-based geographic search. Thus, it established a hybrid approach between text summarization and QA, although neither the extraction of relationships between locations (or geographic entities in general) nor location summarization is a particular focus. Similarly, a recent work by Chen et al. [42] also focuses on geographic QA but does not address location summarization or relations between locations. In addition to geographic QA, the more recent NTCIR GeoTime track [72, 73] included a temporal dimension for determining answers to NLP-based geographic search queries. Different aspects of summaries for geographic IR have been analyzed and studied by Perea-Ortega et al. [147], who focus on sentences in a document containing geographic entities, but do not specifically study location summaries or extracted location relations. Thus, while there has been substantial research into geographic question answering, there is no previous work on the extraction of descriptive sentences for complex non-hierarchical relations between locations.
4.3 Entity-centric Summarization
Descriptive sentence extraction
Finally, there are a number of more general approaches that focus on extracting sentences or tags as descriptions. Probably the most popular works on enriching location information are based on a method proposed by Rattenbury and Naaman [152], in which image tags in Flickr are used to derive semantically rich geospatial image and location descriptions. Similarly, Tardi et al. [198] outline an approach for identifying the characteristics of locations from tags that are associated with photographs. However, these frameworks neither utilize large text collections to further improve location descriptions, nor do they investigate descriptive relations between entities.
In contrast, some works in other domains focus on similar, entity-centric tasks and sentence extraction. For opinion summarization, Kim et al. consider the extraction and ranking of sentences in opinionated texts based on their usefulness for helping the reader understand the reasons behind the sentiments that are expressed in a document [108]. Biadsy et al. extract summaries targeted at the creation of person biographies by focusing on sentences that are biographical in nature [25]. They distinguish between sentences that are biographical in style and those that are not. Since we intend to describe entities in general, not just persons, such a focus would be too narrow in scope. Amitay and Paris summarize websites from external descriptions by looking at sentences that contain hyperlinks to these websites [11]. This is conceptually similar to our description of entities, but they utilize patterns that are specific to hyperlink anchors, which differ from entity mentions. The above approaches are focused on a specific type of entity or concept (sentiments, persons, or links). In contrast, we consider a broader approach that is useful for entity-centric explanation in general.
An approach that is compatible with general entities is given by Blanco and Zaragoza for the extraction of support sentences. To this end, they identify descriptive sentences from a document collection that describe the entities contained in a textual query [27]. A downside of their solution is the reliance on textual queries as input. Furthermore, they exclude sentences from the output that do not contain any of the input entities. Since this is likely to occur whenever multiple entities are used as input, their algorithms are incompatible with sets of query entities.
To follow a more general approach, we use the entity-centric representation of implicit networks that enables us to rank sentences for multiple input entities or terms, and to extract descriptive sentences for the relations between these entities.