Escenarios actuales - El Colegio de la Frontera Sur

[Hamp and Feldweg, 1997] are presenting GermaNet which is a lexical-semantic net for the German language. GermaNet is comparable to WordNet [Miller et al., 1990] which is a lexical-semantic net for English. These lexical-semantic nets are comparable to ontologies by the presentation of particular relations between the words of the certain language. A relation between words in GermaNet is defining semantical information concerning the words. Exemplary relations are synonymy, antonym, metonymy, hyponymy, and so on. These lexicons can directly be helpful for the analysis of tokens in case of Named Entity Recognition, for instance. If a token which has to be classified is part of a hyponomy-relation with a country, the token will probably be a city or a territory. This makes WordNets a good resource for helpful features in Information Extraction.

DWDS also internally uses a lexical net called LexikoNet [Geyken and Schrader, 2006]. LexikoNet only contains nouns. 75.000 nouns are put into 1.200 concepts containing, for instance, instruments, sports teams, human groups, and so on.

[Henrich and Hinrichs, 2010] developed an editing framework for GermaNet called GernEdiT which provides possibilities for the easy and intuitive manipulation of Ger- maNet data. The framework works on a relational dataset containing the GermaNet data. Unfortunately, it cannot be directly used for other lexical nets because the data of those nets has to be transferred to a database first and afterwards the framework would have to be adjusted to handle the language specific structures of that particular lexcial net.

4.3 Summary

We presented the usage of linguistic resources for Information Extraction. In contrast to orthographic or semantic features the use of morphological features require exhaus-

5_{http://www.stratosphere.eu/} 6_{http://www.ukp.tu-darmstadt.de/}

4.3. SUMMARY

tive preparatory work in order to generate the lexicons or to construct the analyzing approach. If that preparatory work finally is done, the morphological features will de- liver useful and rich semantic knowledge.

We presented annotated corpora and lexicons based on linguistic knowledge in particular. Annotated corpora can be used as a gold-standard for supervised learning methods for the creation of parsers, PoS-taggers or NER-systems. These corpora are called tree- banks, and one well-known treebank is the Penn Treebank we presented in detail in this chapter. Linguistic lexicons contain linguistic knowledge for a huge amount of words which are either accessible via an index or based on ad-hoc morphological analysis. We present the German online lexicon DWDS as an example. The specialty of this lexicon is that due to the grammar rules of German it is more complex than a lexicon for English. For some analysis it is not useful to store the information in the lexicon but to do the analysis ad-hoc during runtime.

We have shown in addition that certain languages like German require particular analyzing systems which sometimes are not available for those languages.

Chapter 5

Relation Extraction

In this chapter we will define the task of Relation Extraction as it is used in this work. After a formal and textual definition of Relation Extraction in Section 5.1 we will show the typical feature space which is used for Relation Extraction in Section 5.2. Two ap- plication domains in which Relation Extraction is applied are shown in Section 5.3. We present the broad area of kernels which are used for Relation Extraction in Section 5.4. The specialized tree kernels which are used to access tree-structured attribute values are presented in Section 5.5. In contrast to using different tree kernels it is possible to convert the tree-structures used by them. Methods based on such manipulation of trees are presented in Section 5.6. The datasets we use for Relation Extraction are presented in Section 8.2.1. An intuitive approach for the visualization of (extracted) relations is presented in Section 8.2.3. Section 8.2.2 contains the descriptions and the results of the experiments we made for Relation Extraction. Finally, we subsume this chapter in Section 5.7.

5.1 Definition

The task of Relation Extraction is a subsequent step of named entity recognition. If the task of finding the entities is sufficiently done, one can look for relations between these entities. The scientific field of finding relations between entities has become popular since the ACE-tasks. Especially the relation detection and classification (RDC) task in 2004 [Doddington et al., 2004] has gained much attraction in the scientific community. Following the definition presented in the ACE RDC task [LDC, 2004a] we specify a relation as a valid combination of two entities being mentioned in the same sentence and being related to each other. Relations can be symmetric or asymmetric. In [Had et al., 2009] we defined the schema of i relations in a sentence s as follows:

Definition 19. Relation candidates Riin a sentences: Ri(Sentence s) =< T ypem∈ relationtypes,

where entitiessis the set of entities contained in the current sentence, and relationtypes is the set of possible relations in the corpus.

To build pairs of entities the entities in a sentence or document first of all have to be detected. Making errors in this first step will cause errors in the following steps, too. During the creation of entity pairs one has to take care of the possible approaches to create these pairs. A trivial approach would be to create all pairs which will result in a sparse amount of correct relations in comparison to the incorrect (negative) ones. Therefore, one should create only those pairs which are defined. A relation-type that is only defined to contain arguments of type Person, for instance, should not be used in a creation process of a relation candidate containing other entities than Person. This proceeding would create a smaller number of negative relation candidates. In addition, although some combinations are defined by the relation-types, they never appear in the training set. It is possible to create only those relation-types which are present in the training set to still lower the number of negative relation-candidates.

After creating all possible pairs of entities for Relation Extraction, machine learning techniques can be used to create models for distinguishing between correctly and in- correctly related entities. Multiple machine learning techniques could be used for this task, but the techniques used in most publications are SVMs.

5.2 Feature space

The selection of the feature space for Relation Extraction is crucial because the usage of a great amount of features is possible. In Section 5.1 it became clear that first-of-all the original text – containing the marked entity mentions – has to be converted into examples. These examples have to be enriched by several features which are slightly different from the features we presented for NER in Section 3.2. The examples are extracted from a sentence containing several marked entity mentions. Each example contains information about the two entities it was created for. The other entity mentions contained in the sentence can be used as features, too. In addition to the information offered by the entity mentions, the sentence itself offers many pieces of information which can and should be used as features for future processing. The relative location of the used informational units – words or entities – is very useful. An entity mention which is located between the two entities building the relation, for instance, might offer different informational contents than an entity mention located outside of the part of the sentence bounded by the two relation-entities. Most publications concerning Relation Extractionare using similar feature sets. Nevertheless, some features are only used in particular publications. We will present a list of features which are used in different publications.

[Zhao and Grishman, 2005], for instance, used the features presented in Table 5.1. A token in this context consists of three features concerning a particular word: word, pos and base. word is the particular word itself. pos is the part of speech tag of the information available via word, and base is the base form of word. Additionally,

5.2. FEATURE SPACE

Name Description

arg1 information about the first entity arg2 information about the second entity

seq token vector of the sequence of words between first and second entity link token vector extracted from seq by utilizing the parse tree of the sen-

tence

path the dependency path connecting first and second entity of the dependency parse tree

Table 5.1: Features used for Relation Extraction by [Zhao and Grishman, 2005]

dependencies, which are extracted of the dependency tree (see Section 2.1.4), can be represented in token features by storing the vector of all dependency arcs (arc) in the token for a particular word. An arc contains the word w, the word dw which is con- nected by a dependency arc of a dependency parse tree, the role label label and the direction e of the dependency arc. The pieces of information about the entities contained in arg1and arg2are the (dependency-)token, type, subtype and mtype of the entity. Although [Zhao and Grishman, 2005] use the dependency tree structure of the corresponding sentence for the construction of the feature set, all the features still are flatones.

[Bunescu and Mooney, 2005] also used the dependency of the particular sentences to extract flat features. They took the shortest dependency path between the two particular entities of the relation candidate and built a feature-vector out of it. The nodes of the dependency path which are particular words of the sentence are represented as a small feature vector containing the word itself and additional available information concerning the word like the part-of-speech tag and the named entity class. The dependencies (edges of the dependency tree) are represented as a binary feature indicating if a particular word xi is depending from word xi+1 or if word xi+1 is depending from word xi.

[Bunescu and Mooney, 2006] presented an approach completely waiving the structural information from trees. Using a generalized version of [Lodhi et al., 2002] they used the sequences of words before, between and after the relevant entities of the relation candidate for constructing the feature vector.

[Zhou et al., 2005] published a great set of flat features. They extracted several features concerning the words of the entity mentions building the relation candidate. As an example they extracted the first word before the first entity mention, they extracted the first word after the second entity mention, and so on. In sum they extracted 14 features concerning that word level. Additionally, they extracted the information already encoded in the entity mentions (called entity features). These are the entity types, the combination of the entity types and the mention levels and the combination of the mention levels. Some overlap features are extracted which contain the number of words and

entity mentions between the two entity mentions building the relation candidate. The overlap features also contain features which indicate if the first entity mention contains the second entity mention or vice-versa. In addition, combinations of these overlap features and the entity features are built to generate more effective features. Some features were generated out of base phrase chunking. The dependency and the constituent parse trees were used to generate path features for the relation candidate. Finally, semantic resources like WordNet (see Section 4.2.6) were used to generate features indicating if the entities contain known locations or if the entities indicate a social relationship. [Zhang et al., 2006] were one of the first who used techniques directly processing features containing tree structures. In contrast to the formerly presented approaches which only converted the structured features into flat features for future processing, the approach of [Zhang et al., 2006] directly handles the structured features. Addition- ally, they show how to combine flat features with structured features in order to profit from both types of features.

5.3 Applications

In this section we will show two exemplary applications which benefit from Relation Extraction. The first example is text understanding which aims at the automatic extraction of relevant facts from documents in order to semantically understand the document without having to completely read it. The second example is biological knowledge acquisition. Biological knowledge acquisition aims at extracting new pieces of information out of biological documents. These pieces of information especially are interrelations – between genes, for instance – which are available in text documents but not yet published in databases for much easier access. The Learning Language in Logicchallenge 2005 (LLL05) [N´edellec, 2005] covered exactly that topic.

5.3.1 Opinion Mining

Opinion mining is a field of research which gained much interest over the last years. The task is defined as the automatic extraction of opinions for particular entities. The task is very interesting for many organizations because it can be used to extract opinions or sentiments for certain products sold or produced by companies out of the WWW. The WWW and especially social networks or blogging platforms represent an actual view on the mood and the opinions of the community. The usage of opinion mining techniques therefore can be helpful to gain a current feedback concerning particular products. This feedback can be helpful on the one hand to gain insights of the needs of customers and on the other hand to get information about the drawbacks of a new product which has been just rolled out.

[Jiang et al., 2010] used Relation Extraction as a technique for opinion mining. First of all, they extracted two types of entities: target features and sentiment words. The target featuresare entities for which the opinions or sentiments should be extracted. Target featurescould be products, for example. The sentiment words are words or tokens

5.4. KERNEL USAGE

which are expressing some kind of sentiment. After the extraction of the particular entities the authors are constructing pairs of entities which only contain one target feature and one sentiment word. Finally, machine learning approaches can be used to classify the pairs in order to get to know if a relation is present between the particular entities or if it is not. In addition, if there is a relation it will be classified. Figure 5.1 shows an exemplary sentence containing two target features and two sentiment words. Each target feature is used in a relation candidate with each sentiment word. It becomes obvious that some of the pairs are not related. Especially the tree-structured features which are used by the methods presented in this chapter are helpful in order to extract the correct relations.

Thecomputerisgreatbut thehard driveisdisgusting.

Figure 5.1: A sentence containing two target features (blue) and two sentiment words (red).

In document El Colegio de la Frontera Sur (página 65-0)