

We cast the two extraction tasks as a multi-granularity categorization task with two levels, one at the sentence level and the other at the word level:

Key Sentence Classification: We use a five-class scheme, as listed in Table 3.4. The first three classes map to PICO elements: patient → P, intervention → I/C, and result → O. A fourth class, study design, indicates to users the strength of a study's evidence. A fifth class, research goal, helps users determine whether a study is likely to provide useful information for the clinical questions they have in mind.

Keyword Classification: We use six classes for words, as listed in Table 3.5. The first four cover the SCORAP elements of patient demographics (as described in Table 3.1): sex → S, condition → CO/P, race → R and age → A. The last two are introduced to extract the names of interventions and study designs.

Table 3.5: Classes for words.

| Name | Definition | Example |
|---|---|---|
| Sex | The sex of the patients. | male, female |
| Age | The age (group) of the patients. | 54-year-old, children |
| Race | The race of the patients. | Chinese, Indian, Caucasian |
| Condition | The condition of the patients, usually a disease name. | COPD, asthma |
| Intervention | The name of the procedure applied to the patients. | intramuscular inactivated vaccine |
| Study Design | The name of the design of the study. | cohort study, RCT |
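The two label sets, along with their correspondence to PICO and SCORAP, can be written down as plain lookup tables. This is only an illustration. The enum and dictionary names are hypothetical, and `None` marks a class that has no PICO or SCORAP counterpart.

```python
from enum import Enum
from typing import Dict, Optional


class SentenceClass(Enum):
    """Five classes for key sentence classification (Table 3.4)."""
    PATIENT = "patient"
    INTERVENTION = "intervention"
    RESULT = "result"
    STUDY_DESIGN = "study design"
    RESEARCH_GOAL = "research goal"


class WordClass(Enum):
    """Six classes for keyword classification (Table 3.5)."""
    SEX = "sex"
    AGE = "age"
    RACE = "race"
    CONDITION = "condition"
    INTERVENTION = "intervention"
    STUDY_DESIGN = "study design"


# PICO element covered by each sentence class (None: no PICO counterpart).
SENTENCE_TO_PICO: Dict[SentenceClass, Optional[str]] = {
    SentenceClass.PATIENT: "P",
    SentenceClass.INTERVENTION: "I/C",
    SentenceClass.RESULT: "O",
    SentenceClass.STUDY_DESIGN: None,
    SentenceClass.RESEARCH_GOAL: None,
}

# SCORAP element covered by each word class (None: no SCORAP counterpart).
WORD_TO_SCORAP: Dict[WordClass, Optional[str]] = {
    WordClass.SEX: "S",
    WordClass.CONDITION: "CO/P",
    WordClass.RACE: "R",
    WordClass.AGE: "A",
    WordClass.INTERVENTION: None,
    WordClass.STUDY_DESIGN: None,
}

# The patient-demographic word classes are exactly those with a SCORAP code.
DEMOGRAPHIC_WORD_CLASSES = frozenset(c for c, code in WORD_TO_SCORAP.items() if code)
```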

These two classifications can be described by the nodes and edges in the segment layer of our correlation graph. To do so, we instantiate segments as sentences and sub-segments as the words in those sentences, as shown in Figure 3.3. If we consider only the correlations within each level, the types of sentences and words can be determined by their features (i.e., observable characteristics) and by the types of their neighbours.

This translates into our baseline model, the independent model, shown at the top left corner of Figure 3.4. In this model, the two classifications are performed independently of each other. The words in the same sentence are categorized together: the type of a word is determined by its features and by the types of the other words in the same sentence. In contrast, sentences are categorized individually, based only on their own features. We decided not to consider the types of neighbouring sentences for two reasons. First, doing so would require many more sentences to be annotated and used for training, which would significantly increase the time and effort needed. Second, it would greatly increase the complexity of one of the models introduced later. Therefore, the correlation between the type of a sentence and the types of its neighbours is omitted from all our models.
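In the independent model, each sentence is labelled on its own, while the words of a sentence are labelled jointly. Below is a minimal sketch of the decoding step. It assumes that the word-to-word dependency is a linear chain, as in a linear-chain CRF, and that per-label scores have already been computed from weighted features. Training is omitted. The null label `"O"` for words outside all six classes is our own assumption and is not part of the class scheme above.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emission: Sequence[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> Tuple[List[str], float]:
    """Best word-label sequence for one sentence under a linear-chain model.

    emission[i][y]     -- score of word i taking label y, computed from its features
    transition[(a, b)] -- score of label a directly followed by label b (missing = 0)
    """
    if not emission:
        return [], 0.0
    best = {y: emission[0][y] for y in labels}  # best prefix score ending in label y
    backptrs: List[Dict[str, str]] = []
    for scores in emission[1:]:
        prev, best, ptr = best, {}, {}
        for y in labels:
            # Best predecessor label for y, given the previous prefix scores.
            a = max(labels, key=lambda p: prev[p] + transition.get((p, y), 0.0))
            best[y] = prev[a] + transition.get((a, y), 0.0) + scores[y]
            ptr[y] = a
        backptrs.append(ptr)
    last = max(labels, key=lambda y: best[y])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers from the last word
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


def classify_sentence(sentence_scores: Dict[str, float]) -> str:
    """Sentences are labelled individually: neighbouring sentence labels are ignored."""
    return max(sentence_scores, key=lambda c: sentence_scores[c])


labels = ["O", "AGE", "CONDITION"]
emission = [
    {"O": 0.1, "AGE": 2.0, "CONDITION": 0.0},  # "54-year-old"
    {"O": 1.0, "AGE": 0.0, "CONDITION": 0.0},  # "with"
    {"O": 0.2, "AGE": 0.0, "CONDITION": 1.5},  # "asthma"
]
print(viterbi(emission, {}, labels))  # (['AGE', 'O', 'CONDITION'], 4.5)
```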

Figure 3.3: Correlations exploited for Resource Categorization on nominal facets.

Nevertheless, as discussed earlier, a suitable technique should address both levels of classification, since they are equally important for extracting key information. For this purpose, the correlation between the two classifications, represented by the edge between sentence type and word type in Figure 3.3, needs to be exploited. This correlation can be observed through a closer inspection of our classes:

Take, as an example, the patient class from key sentence classification and the sex, age, race and condition classes from keyword classification. These two sets of classes are correlated. If a sentence is classified as a patient sentence, its words are more likely to represent the patients' sex, age, race and condition. Likewise, if words in a sentence have been categorized into the sex, age, race or condition classes, the sentence is likely to be a patient sentence. Similar correlations exist between the study design and intervention sentence classes and the corresponding keyword classes.

A straightforward approach to exploiting this correlation is to perform the classifications in sequence, so that the results of the earlier classification can be incorporated into the later one. This gives rise to the two pipelined models we propose, shown at the top right and bottom left corners of Figure 3.4. In the sentence-first model, key sentence classification is performed first. The resulting sentence labels are then added as evidence for keyword classification, that is, as additional features in its feature vectors. In the word-first model, the process runs in the opposite direction. Keyword classification is performed first, and the resulting word labels are added to the feature vectors of key sentence classification.
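The two pipelines differ only in which predicted labels are appended to which feature vectors. Here is a minimal sketch that assumes sparse feature vectors stored as dictionaries. Several details are illustrative choices of ours rather than details given in the text: the feature-name prefixes, the use of label counts in the word-first direction, and the null label `"O"`.

```python
from typing import Dict, List

Features = Dict[str, float]  # sparse feature vector: feature name -> value


def sentence_first_features(word_feats: List[Features], sentence_label: str) -> List[Features]:
    """Sentence-first: add the predicted sentence label to every word's features."""
    return [{**f, f"sent_label={sentence_label}": 1.0} for f in word_feats]


def word_first_features(sent_feats: Features, word_labels: List[str], null_label: str = "O") -> Features:
    """Word-first: add counts of the predicted word labels to the sentence's features."""
    out = dict(sent_feats)  # copy so the original vector is left unchanged
    for y in word_labels:
        if y != null_label:
            key = f"word_label={y}"
            out[key] = out.get(key, 0.0) + 1.0
    return out


# Stage-1 predictions feed stage 2 as extra evidence, in either direction.
words = [{"w=children": 1.0}, {"w=with": 1.0}, {"w=asthma": 1.0}]
print(sentence_first_features(words, "patient")[2])
# {'w=asthma': 1.0, 'sent_label=patient': 1.0}
print(word_first_features({"length": 3.0}, ["AGE", "O", "CONDITION"]))
# {'length': 3.0, 'word_label=AGE': 1.0, 'word_label=CONDITION': 1.0}
```

Because the second stage only ever sees the first stage's final labels, nothing flows back the other way. This is the limitation that the joint model addresses.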

Figure 3.4: Four models for multi-granularity Resource Categorization of two levels.

While these two models can incorporate information from the earlier classification into the later one, the earlier classification cannot benefit from the later one. Consequently, performance can improve on one level but not on both. To overcome this problem, we investigate a fourth, joint model, shown at the bottom right corner of Figure 3.4. It is essentially the unrolled version of Figure 3.3 without the looping edge at the sentence type node. In this model, the two levels of classification are informed of each other's results via joint inference, so the sentence labels can influence the prediction of the word labels and vice versa. While this is often advantageous for performance, the joint model significantly increases both the complexity of the classifier and the training time. Had the looping edge been included, the model would have been prohibitively expensive to train, since a single instance would have had to contain all the sentences and words of an article.
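The following sketch makes the mutual influence concrete by decoding one sentence jointly and exactly. It assumes a single sentence-label node connected to every word-label node by a compatibility score, plus a linear chain over the words. The scores are taken as given, and all names are hypothetical. The joint model here uses the junction tree algorithm from GRMM. This sketch instead swaps in a simpler procedure that is also exact for this particular structure. It enumerates the few sentence labels and, for each one, runs Viterbi over the words. Fixing the sentence label turns the compatibility scores into extra per-word scores and leaves an ordinary chain.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def best_chain(emission: Sequence[Dict[str, float]], transition: Dict[Pair, float],
               labels: Sequence[str]) -> Tuple[List[str], float]:
    """Plain Viterbi over a chain of word labels; returns (best path, its score)."""
    if not emission:
        return [], 0.0
    best = {y: emission[0][y] for y in labels}
    backptrs: List[Dict[str, str]] = []
    for scores in emission[1:]:
        prev, best, ptr = best, {}, {}
        for y in labels:
            a = max(labels, key=lambda p: prev[p] + transition.get((p, y), 0.0))
            best[y] = prev[a] + transition.get((a, y), 0.0) + scores[y]
            ptr[y] = a
        backptrs.append(ptr)
    last = max(labels, key=lambda y: best[y])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


def joint_map(sentence_scores: Dict[str, float], emission: Sequence[Dict[str, float]],
              transition: Dict[Pair, float], compat: Dict[Pair, float],
              word_labels: Sequence[str]) -> Tuple[str, List[str], float]:
    """Exact MAP over (sentence label s, word labels y_1..y_n), maximizing
    sentence_scores[s] + sum_i emission[i][y_i]
      + sum_i transition[(y_{i-1}, y_i)] + sum_i compat[(s, y_i)]."""
    best: Tuple[str, List[str], float] = ("", [], float("-inf"))
    for s, s_score in sentence_scores.items():
        # Fixing s folds the sentence-word compatibility into the per-word scores.
        shifted = [{y: e[y] + compat.get((s, y), 0.0) for y in word_labels} for e in emission]
        path, chain_score = best_chain(shifted, transition, word_labels)
        if s_score + chain_score > best[2]:
            best = (s, path, s_score + chain_score)
    return best


# On their own, the sentence leans to "result" and both words lean to "O";
# the compatibility between "patient" and AGE flips both decisions.
s, path, _ = joint_map(
    {"patient": 0.4, "result": 0.5},
    [{"O": 0.5, "AGE": 0.4}, {"O": 1.0, "AGE": -1.0}],  # "children", "improved"
    {},
    {("patient", "AGE"): 1.0, ("result", "AGE"): -1.0},
    ["O", "AGE"],
)
print(s, path)  # patient ['AGE', 'O']
```

Enumeration is only practical because there are few sentence classes. The junction tree algorithm reaches the same exact answer on this tree-structured graph without relying on that.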

In addition, our inspection shows that the sentence classes are not mutually exclusive. As shown in Figure 3.2 and the intervention sentence example in Table 3.4, a sentence may contain more than one type of information. We compare two common approaches to this soft (multi-label) classification. The first is to train a multi-class classifier on super classes, each of which is a combination of the existing classes, and to classify each sentence into one of these super classes.

Table 3.6: Features for key sentence classification.

| Group | Examples |
|---|---|
| Token | n-grams (sequences of n words, where 1 ≤ n ≤ 3) of the sentence. |
| Sentence | Length of the sentence and its position in the paragraph and in the article. |
| Named Entity | Whether the sentence contains a person name, location name or organization name. |
| MeSH | Whether the sentence contains MeSH terms, and their categories among the 16 top-level categories of the MeSH tree. |
| Lexica | Whether the sentence contains a word that appears in the age/sex/race wordlists. These wordlists contain common words found in the corpus that indicate age, sex and race, respectively. |

A sentence then belongs to all the classes that make up its super class. For example, if we consider only the patient and intervention classes, a multi-class classifier can be trained on three super classes: patient, intervention, and patient & intervention. Sentences classified into these super classes are considered to belong to the patient class, the intervention class, and both the patient and intervention classes, respectively. The second approach is to train a one-against-all classifier for each class. A sentence belongs to a class whenever the corresponding classifier reports positive.
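The two approaches differ mainly in how the final decision is read off. This minimal sketch assumes that each classifier already outputs real-valued scores, and it omits training. The function names and the zero decision threshold are illustrative assumptions. `super_classes` enumerates every non-empty combination of base classes, which shows how quickly the first approach grows: five base classes already give 2^5 - 1 = 31 super classes. In practice, one might keep only the combinations that actually occur in the training data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

SuperClass = FrozenSet[str]


def super_classes(classes: Sequence[str]) -> List[SuperClass]:
    """Every non-empty combination of base classes; each one is a super class."""
    return [frozenset(c) for k in range(1, len(classes) + 1) for c in combinations(classes, k)]


def classify_with_super_classes(scores: Dict[SuperClass, float]) -> SuperClass:
    """Approach 1: one multi-class decision; the sentence gets every member class."""
    return max(scores, key=lambda sc: scores[sc])


def classify_one_against_all(scores: Dict[str, float], threshold: float = 0.0) -> FrozenSet[str]:
    """Approach 2: one binary classifier per class; keep each class whose score exceeds the threshold."""
    return frozenset(c for c, s in scores.items() if s > threshold)


print(len(super_classes(["patient", "intervention"])))  # 3
print(len(super_classes(["patient", "intervention", "result", "study design", "research goal"])))  # 31
```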

Combining the four models with these two approaches gives eight candidate models in total.

We implement these models using Conditional Random Fields (CRFs). CRFs are a state-of-the-art model for information extraction. In addition, their structure can be defined arbitrarily, so instances from different classification problems can be learned within the same model. This property allows us to build the joint model. For accuracy, the joint model uses an exact inference algorithm, namely the junction tree algorithm from the GRMM package². The other models use the MALLET package³.

The feature sets for key sentence classification and keyword classification are listed in Tables 3.6 and 3.7, respectively. Both feature sets consist of generic text classification features, such as n-grams and named entity information, as well as domain-specific features, such as MeSH terms and class-specific lexica.
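To illustrate the Table 3.6 feature groups, the sketch below turns one tokenized sentence into a sparse feature dictionary. It simplifies under stated assumptions. The named-entity types and MeSH top-level categories are passed in as if produced by external tools, since no tagger or MeSH lookup is implemented. Positions are given as indices, and all feature names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def sentence_features(
    tokens: List[str],
    pos_in_paragraph: int,
    pos_in_article: int,
    lexica: Dict[str, Set[str]],
    entity_types: Iterable[str] = (),
    mesh_categories: Iterable[str] = (),
) -> Dict[str, float]:
    """Sparse features for key sentence classification, one group per row of Table 3.6."""
    feats: Dict[str, float] = {}
    words = [t.lower() for t in tokens]
    # Token: word n-grams with 1 <= n <= 3 (counts).
    for n in range(1, 4):
        for i in range(len(words) - n + 1):
            key = "ngram=" + " ".join(words[i:i + n])
            feats[key] = feats.get(key, 0.0) + 1.0
    # Sentence: length and position in the paragraph and in the article.
    feats["length"] = float(len(tokens))
    feats["pos_in_paragraph"] = float(pos_in_paragraph)
    feats["pos_in_article"] = float(pos_in_article)
    # Named entity: entity types found in the sentence (assumed precomputed, e.g. PERSON).
    for t in set(entity_types):
        feats["has_entity=" + t] = 1.0
    # MeSH: presence of MeSH terms and their top-level categories (assumed precomputed).
    cats = set(mesh_categories)
    if cats:
        feats["has_mesh_term"] = 1.0
    for c in cats:
        feats["mesh_category=" + c] = 1.0
    # Lexica: does any word appear in the age/sex/race wordlists?
    for name, wordlist in lexica.items():
        if any(w in wordlist for w in words):
            feats["in_lexicon=" + name] = 1.0
    return feats
```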

² http://mallet.cs.umass.edu/grmm/

Table 3.7: Features for keyword classification.

| Group | Examples |
|---|---|
| Token | The word itself, its stem and its part-of-speech tag. |
| Phrase | Position of the word in the phrase, and the head noun of the phrase if it is a noun phrase. |
| Named Entity | Whether the word is part of a person name, location name or organization name in the sentence. |
| MeSH | Whether the word is part of a MeSH term, and the categories of that term among the 16 top-level categories of the MeSH tree. |
| Lexica | Whether the word appears in the age/race/sex wordlists. The wordlists are the ones used in key sentence classification. |

As shown in our previous work [Zhao et al., 2010], all the listed features contribute positively to the two classifications. For example, token features are crucial to key sentence classification, since removing them can lead to a significant drop in performance. MeSH and lexica features, in turn, play important roles in keyword classification by covering the vocabulary of the classes.
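The keyword features in Table 3.7 can be sketched in the same way for a single word. Part-of-speech tags, phrase chunks, entity types and MeSH categories are assumed to come from external tools and are passed in. The one-line suffix stripper only stands in for a real stemmer such as Porter's, and every function and feature name here is hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def crude_stem(word: str) -> str:
    """Stand-in for a real stemmer: strips a plural -s only (e.g. patients -> patient)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def word_features(
    word: str,
    pos_tag: str,
    index_in_phrase: int,
    head_noun: Optional[str],
    lexica: Dict[str, Set[str]],
    entity_type: Optional[str] = None,
    mesh_categories: Sequence[str] = (),
) -> Dict[str, float]:
    """Sparse features for keyword classification, one group per row of Table 3.7."""
    w = word.lower()
    feats = {
        # Token: the word itself, its stem and its part-of-speech tag.
        "word=" + w: 1.0,
        "stem=" + crude_stem(w): 1.0,
        "pos=" + pos_tag: 1.0,
        # Phrase: position of the word within its phrase.
        f"phrase_index={index_in_phrase}": 1.0,
    }
    if head_noun is not None:  # passed only when the phrase is a noun phrase
        feats["head_noun=" + head_noun.lower()] = 1.0
    # Named entity: the kind of name the word belongs to, if any.
    if entity_type is not None:
        feats["entity=" + entity_type] = 1.0
    # MeSH: membership in a MeSH term and that term's top-level categories.
    for c in mesh_categories:
        feats["in_mesh_term"] = 1.0
        feats["mesh_category=" + c] = 1.0
    # Lexica: the same age/sex/race wordlists as for sentences.
    for name, wordlist in lexica.items():
        if w in wordlist:
            feats["in_lexicon=" + name] = 1.0
    return feats
```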
