Chapter 5 reviewed the activities and developments of research work in QA. For a given question, a participating system is required to provide one answer for a factoid question or a list of distinct answers for a list question. An eligible response for submission includes an answer string and the identifier of an answer-bearing document. When all submissions are complete, human assessors recruited by TREC judge the correctness of each answer to the question based on the content of the document.
Each year, after system evaluation is complete, TREC releases a collection of gold-standard answers for the question set, including answer patterns: regular expressions for matching answer strings, derived from the correct answers in all responses from participating systems. Answer patterns are used to match answer-bearing passages in relevant documents. This method also matches passages that contain the answer but not enough contextual evidence to answer the question; the aggregation of crowdsourced judgements is used to distinguish those false-positive passages.
Each answer pattern consists of a question id, an answer regexp and a list of document ids; the format is shown in Table 6.1. Question 3.3, “In what country was the Hale Bopp comet visible on its last return?”, has correct answers including Australia, China, Panama and United States, each of which maps to an answer pattern in Table 6.1. The answer United States is correctly found in several documents, with identifiers such as XIE19960217.0069 and XIE19960105.0039. Table 6.2 shows examples of passages from relevant documents matched by answer patterns. The first passage supports its answer “China” to question 3.3, while the second does not.
Ques. id   Regexp                        Document ID List
3.3        Australia                     XIE19960321.0254
3.3        (Chinese|China)               XIE19960405.0124 XIE19970319.0243 . . .
3.3        Panama                        XIE19970318.0242
3.3        (United States|America|US)    XIE19960311.0115 XIE19960409.0120 XIE19960217.0069 XIE19960105.0039 . . .
Table 6.1. Example of Answer Patterns.
Judgement      Passage for annotation
Supportive     Hale–Bopp, a newly-discovered extraordinarily large comet in the solar system, has been recently observed for the first time in China.
Unsupportive   At 11.39 p.m. local time, “only a very thin rim will be seen over the orange face” of the moon, a color that will become brighter in the shadow of the earth, the Panama Canal Commission said.
Table 6.2. Support judgement of passages matched with answer patterns.
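As a minimal sketch of how answer patterns select candidate passages, the Python snippet below applies the regexps of Table 6.1 to the two passages of Table 6.2. The function name `matching_patterns`, the dictionary layout and the use of a case-sensitive `re.search` are assumptions made for illustration, not the tooling used by TREC or in this work.

```python
import re

# Answer regexps for question 3.3, copied from Table 6.1.
ANSWER_PATTERNS = {
    "3.3": ["Australia", "(Chinese|China)", "Panama", "(United States|America|US)"],
}


def matching_patterns(question_id, passage):
    """Return the answer regexps of a question that occur in the passage.

    Assumption: a plain, case-sensitive regular-expression search.
    """
    regexps = ANSWER_PATTERNS.get(question_id, [])
    return [r for r in regexps if re.search(r, passage)]


# Passages from Table 6.2, written with ASCII punctuation.
supportive = (
    "Hale-Bopp, a newly-discovered extraordinarily large comet in the solar "
    "system, has been recently observed for the first time in China."
)
unsupportive = (
    'At 11.39 p.m. local time, "only a very thin rim will be seen over the '
    'orange face" of the moon, a color that will become brighter in the '
    "shadow of the earth, the Panama Canal Commission said."
)

print(matching_patterns("3.3", supportive))
print(matching_patterns("3.3", unsupportive))
```

Both passages are matched by a pattern, although only the first one supports its answer. Pattern matching alone therefore cannot separate supportive from unsupportive passages, which is why a human support judgement is collected for every matched passage.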
Tellex et al. [93] quantitatively evaluated the effectiveness of several passage retrieval algorithms for QA, using the answer patterns in strict and lenient judgements of passage relevance. Strict scoring requires that a passage match the answer pattern and come from a supporting document, while lenient scoring requires only the pattern match. These scoring schemes generalize document retrieval evaluation metrics to passage retrieval; however, more fine-grained supporting passages are needed for further analysis and evaluation. To this end, we focus on creating a corpus of supporting passages for list QA, which is much harder than factoid QA and has not been thoroughly studied.
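The difference between the two judgements can be sketched in a few lines of Python. The function names and the document identifier `DOC-HYPOTHETICAL` are illustrative assumptions; the regexp and the supporting document id come from the Panama row of Table 6.1. This is a simplified reading of the two criteria, not the evaluation code of Tellex et al.

```python
import re


def lenient_match(passage, regexp):
    """Lenient judgement: the passage only has to match the answer regexp."""
    return re.search(regexp, passage) is not None


def strict_match(passage, doc_id, regexp, supporting_docs):
    """Strict judgement: the passage matches the regexp AND its source
    document is listed as a supporting document for that pattern."""
    return lenient_match(passage, regexp) and doc_id in supporting_docs


# Pattern and supporting document taken from Table 6.1.
regexp = "Panama"
supporting_docs = {"XIE19970318.0242"}

passage = "the Panama Canal Commission said."

# Same passage text, attributed to a listed and to an unlisted document.
print(strict_match(passage, "XIE19970318.0242", regexp, supporting_docs))
print(strict_match(passage, "DOC-HYPOTHETICAL", regexp, supporting_docs))
print(lenient_match(passage, regexp))
```

Neither judgement inspects whether the passage text itself justifies the answer: a passage from a listed document can pass the strict test and still be unsupportive, which motivates passage-level annotation.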
6.3. Experiment Design
Currently there is no dataset of question-supporting texts for the TREC list question task. The purpose of our work is to contribute to the development of QA systems by providing a new corpus of pairs, each consisting of a question and a passage that contains an answer to the question and supports it. Applications of IR, IE and NLP techniques in QA will benefit from this fine-grained annotated dataset.
For factoid question answering, Kaisser et al. [47] constructed a corpus of supporting sentences for factoid questions by running annotation tasks via AMT and postprocessing the MTurkers’ results with validation by specialists. In contrast to expensive and time-consuming relevance judgement by a few assessors [100], AMT offers a web-based solution for quickly and cheaply annotating compact supporting excerpts in the relevant documents. Our work aims not only to construct a corpus of supporting passages for the list question task, but also to investigate and compare various automatic learning methods for selecting true annotations. Such learning methods can substantially improve the quality of annotations from AMT and save the cost and time of the post-processing required in Kaisser et al. [47], which makes it feasible to use crowdsourced annotation for large-scale and successive linguistic data annotation.
Following the workflow introduced in Section 6.2, we conduct the data collection in the following steps:
1. Data Generation. We first use passage boundary tags to break each document into several passages. We then select the passages matched by answer patterns as candidates, which are presented to MTurkers together with the question.
2. HIT Design. The principle of HIT design is to organize and present the annotation data so that annotators can easily understand the requirements and carry out the annotation. We therefore keep the task description succinct and set up a straightforward user interface. We adjust the qualification requirements and the number of MTurkers through several dry runs to reach the best preliminary results.
To improve over the majority voting of multiple annotations, various methods are explored to enhance the quality of annotations.
Details are introduced in the following sections and chapter.
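The data generation step and the majority-voting baseline mentioned above can be sketched as follows. This is an illustration under stated assumptions: the passage boundary tag is assumed to be `<P>...</P>`, the toy document and the label strings `supportive` and `unsupportive` are hypothetical, and all function names are invented for this sketch rather than taken from the actual pipeline.

```python
import re
from collections import Counter

# Assumption: passages are delimited by <P>...</P> tags in the source documents.
PASSAGE_TAG = re.compile(r"<P>(.*?)</P>", re.DOTALL)


def split_passages(document):
    """Break a document into passages and normalise internal whitespace."""
    return [" ".join(text.split()) for text in PASSAGE_TAG.findall(document)]


def select_candidates(passages, regexps):
    """Keep the passages matched by at least one answer regexp."""
    return [p for p in passages if any(re.search(r, p) for r in regexps)]


def majority_vote(labels):
    """Return the most frequent label, or None when empty or tied."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no majority label
    return ranked[0][0]


# Hypothetical toy document with two tagged passages.
document = (
    "<P>Hale-Bopp has been recently observed for the\n"
    "first time in China.</P>\n"
    "<P>The weather was clear that night.</P>"
)

passages = split_passages(document)
candidates = select_candidates(passages, ["(Chinese|China)"])
print(candidates)

# Hypothetical judgements from three annotators for the single candidate.
print(majority_vote(["supportive", "supportive", "unsupportive"]))
```

Returning `None` on a tie is one possible design choice; it makes undecided passages explicit so that they can be handled separately, for example by collecting further judgements.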