In the previous sections we first enumerated common tasks involved in natural language processing applications and then discussed computational narrative, its applications, and the different models used. In this section we delve into the intersection of these efforts, present the state of the art in information extraction from narrative, and discuss the most common architecture used in related implementations.

2.6.1 Automatically Extracting Narrative Information

Automatically extracting narrative information can be seen as a specialized case of information extraction. With a few recent exceptions, automatically extracting narrative information from unannotated text has received little attention. Character identification, which is related to named entity recognition, is a crucial step in extracting and identifying narrative information. Goyal et al.’s AESOP system16 explored how to extract characters and their affect states from textual narrative in order to produce plot units145 for a subset of Aesop fables. The system uses both domain-specific assumptions (e.g., only two characters per fable) and external knowledge (word lists and hypernym relations in WordNet) in its character identification stage. Chambers and Jurafsky15 proposed using unsupervised induction to learn what they called “narrative event chains” from raw newswire text. In order to learn Schankian script-like information about the narrative world, they use unsupervised learning to detect the event structures as well as the roles of their participants without pre-defined frames, roles, or tagged corpora. Regneri et al.146 worked on the specific task of identifying matching participants in given scripts in natural language text using semantic (WordNet) and structural similarities. Schank and Abelson147 use scripts as a formalism, drawing from cognitive science, in an attempt at natural language understanding. Calix et al.13 proposed an approach for detecting characters in spoken stories. Based on features in the transcribed textual content using common sense information and speech patterns (e.g., pitch), their system detects characters through supervised learning techniques. Bamman et al.56 extract characters, their attributes, and the actions in which they participate, and propose an unsupervised approach to identify latent character archetypes or personas by clustering their stereotypical actions and attributes. More recently, Li et al.10;148 proposed a combination of NLP techniques and crowdsourcing to acquire narrative and procedural information about a specific situation.

Figure 2.13: Typical NLP pipeline, especially prevalent in applications related to information extraction and text understanding. Stages shown include: Text, Tokenization, Sentence Segmentation, POS Tagging, Parsing, NER/IE, Coref. Resolution, Word Sense Disambig., Verb arg. ext. / SRL, Knowledge Augmentation, and Computational Model.

Some of the aforementioned work focuses on capturing and identifying interactions and character relationships in the narratives using formalisms such as verb frames and logic clauses based on verb tuples8;15;17–19. A specific case of narrative information extraction is the information about interactions encoded or implicit in dialogue. In this context, Elson and McKeown worked on the problem of quoted speech attribution149 from text8. In their work, they defined syntactic categories for quoted speech instances and then derived rules to assign the speaker for each of them. O’Keefe et al.150 treated the problem of quoted speech attribution as a sequence labeling task. They were able to remove the use of gold-standard information and achieve similar performance on newswire text. More recently, Muzny et al.151 also tackled the problem and proposed a supervised method using a sieve approach. Their method greatly improved on the previous state of the art. In our work, we extend their approach by automatically learning the syntactic categories and the rules to identify the speaker and intended listener for each instance of quoted speech. Other work on extracting information from text includes the use of regular expressions and specialized structures such as Tregex44, which uses a set of hand-authored extraction patterns. Riloff and Philipp45 developed an IE system that was able to learn extraction patterns from examples.
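To make the idea of syntactic categories and speaker-assignment rules concrete, the following is a minimal sketch using hand-authored regular expressions. It is not the method of any of the systems cited above: the category labels, the list of speech verbs, and the function name `attribute_quotes` are all hypothetical, and the sketch assumes straight double quotes and single-word capitalized speaker names.

```python
import re

# Assumption: a tiny, hand-picked list of speech verbs.
SPEECH_VERB = r"(?:said|asked|replied|shouted)"
# Assumption: a speaker is a single capitalized word.
NAME = r"([A-Z][a-z]+)"

# Each rule: (hypothetical category label, side of the quote to inspect, pattern).
RULES = [
    ("QUOTE-VERB-SPEAKER", "after", re.compile(rf"^\s*{SPEECH_VERB}\s+{NAME}")),
    ("QUOTE-SPEAKER-VERB", "after", re.compile(rf"^\s*{NAME}\s+{SPEECH_VERB}\b")),
    ("SPEAKER-VERB-QUOTE", "before", re.compile(rf"{NAME}\s+{SPEECH_VERB},?\s*$")),
]


def attribute_quotes(text):
    """Return (quote, speaker, category) for each quoted span in text.

    Speaker and category are None when no rule applies.
    """
    results = []
    for match in re.finditer(r'"([^"]+)"', text):
        before, after = text[:match.start()], text[match.end():]
        speaker, category = None, None
        # Rules are tried in order; the first one that fires wins.
        for label, side, pattern in RULES:
            hit = pattern.search(after if side == "after" else before)
            if hit:
                speaker, category = hit.group(1), label
                break
        results.append((match.group(1), speaker, category))
    return results


if __name__ == "__main__":
    story = ('"Where is the cheese?" asked Fox. '
             'Crow replied, "It is mine." "Please sing," Fox said.')
    for quote, speaker, category in attribute_quotes(story):
        print(f"{speaker!s:5} {category!s:20} {quote}")
```

Ordering the rules as a list means earlier, more specific patterns take precedence, which loosely mirrors the idea of applying rules per category; a learned approach would replace the hand-written `RULES` table.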

NLP pipelines are typically embedded in larger systems used for information extraction and text understanding applications. In such applications, the linguistic tasks are usually complemented with augmentation steps that combine and augment the extracted information with common sense or domain-specific knowledge in order to enable further processing and inference. For example, the WordsEye system11 by Coyne et al.58;130 is a text-to-scene system that creates 3D scenes from natural language descriptions. The system implements an NLP pipeline similar to the one presented in Figure 2.13, which is combined with a lexical database of common sense knowledge (WordNet, FrameNet, and a custom scenario-based lexical knowledge resource) to extract a symbolic representation of the scene. The system converts dependency structures into semantic nodes and roles representing spatial relationships and visual attributes. The system relies on a large database of 3D models and poses for entities and actions. The extracted structure is mapped to the database and finally rendered into an image. Similarly, CarSim by Johansson et al.9 is a system that automatically converts narratives in the traffic domain into animated 3D scenes. It is intended to be a tool for visualizing traffic situations from text reports in natural language. It also implements an NLP pipeline similar to the one presented in Figure 2.13 and extracts entities, events, relations, and environment attributes separately. It then infers implicit information using a spatio-temporal planning and inference module that produces a full geometric description of the symbolic representation extracted from the text. Time descriptions and the output of the planning module are used to compute trajectories and generate an animation. Figure 2.14 shows four frames from an animated sequence generated from a text report in natural language by CarSim.

Figure 2.14: Frames from an animated sequence generated from a text report in natural language by CarSim9.
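The pipelined architecture with a knowledge augmentation step can be illustrated with a minimal sketch. This is not the implementation of WordsEye or CarSim: the stage functions, the order of the stages, and the toy lexicon `TOY_HYPERNYMS` (standing in for a resource such as WordNet) are assumptions made purely for illustration.

```python
import re


def tokenize(doc):
    """Split the raw text into word and punctuation tokens."""
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def segment_sentences(doc):
    """Group tokens into sentences, closing a sentence at ., ! or ?."""
    sentences, current = [], []
    for token in doc["tokens"]:
        current.append(token)
        if token in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence without final punctuation
        sentences.append(current)
    doc["sentences"] = sentences
    return doc


# Hypothetical toy lexicon; a real system would query an external resource.
TOY_HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "tree": "plant"}


def augment_with_knowledge(doc):
    """Attach a coarse type to every token found in the toy lexicon."""
    doc["entities"] = [
        {"word": token.lower(), "type": TOY_HYPERNYMS[token.lower()]}
        for sentence in doc["sentences"]
        for token in sentence
        if token.lower() in TOY_HYPERNYMS
    ]
    return doc


# Each stage reads the annotations of earlier stages and adds its own.
PIPELINE = [tokenize, segment_sentences, augment_with_knowledge]


def run_pipeline(text, stages=PIPELINE):
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline("A car hit a tree. The truck stopped.")
    print(len(result["sentences"]), "sentences")
    print(result["entities"])
```

Because every stage consumes the output of the previous one, a mistake made early (for instance, a wrong sentence boundary) is inherited by all later stages, which is the error propagation concern raised for pipelined architectures.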

2.6.2 Evaluation of NLP Pipelines

The pipelined architectures described so far have been used extensively in NLP and information extraction applications. These systems usually integrate several natural language preprocessing modules (e.g., Stanford CoreNLP35). Basic modules used in NLP pipelines have been studied extensively, often within the context of shared task competitions (e.g., Stanford’s coreference resolution system at the CoNLL coreference resolution shared task29). These studies are typically conducted on a single module isolated from a pipeline, since no standard methodology exists to evaluate how error propagates in pipelined architectures. Margaretha & DeVault152 tackle the issue of automated evaluation of pipeline architectures in natural language dialogue systems using a Wizard-of-Oz approach and simulations of the pipeline process. In related work, Punyakanok et al.50 combine different systems as modules of a single pipeline and study the quality of the information contributed by two different NLP systems for the task of semantic role labeling. When evaluating the performance of specific tasks or modules, most approaches agree on using a ground truth dataset and common counting metrics such as precision and recall153. In some tasks, such as coreference resolution, how to evaluate accuracy is still contentious. Various alternative metrics have been proposed which weight different features of the coreference problem48. Punyakanok et al.50 also discuss using alternative task-specific evaluation instruments on top of counting metrics.
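As a worked illustration of the counting metrics mentioned above, the sketch below computes precision and recall of a module's output against a ground truth set. The function name `precision_recall_f1` and the example data are hypothetical; the F1 score is included as an addition here because it is commonly reported as the harmonic mean of the two.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted items against a ground truth set of items.

    precision = correct predictions / all predictions
    recall    = correct predictions / all ground truth items
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical character identification output for a short fable.
    gold_characters = {"Fox", "Crow", "Cheese"}
    predicted_characters = {"Fox", "Crow", "Tree", "Forest"}
    p, r, f = precision_recall_f1(predicted_characters, gold_characters)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this example two of the four predictions are correct, and two of the three ground truth characters are found, so precision is 2/4 and recall is 2/3.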