In this section we describe the datasets used in our experiments. First we describe the dataset used in the intrinsic evaluation of domain models in Section 3.6.2.1 and then we discuss the datasets used for evaluating expertise topic extraction in Section 3.6.2.2.

3.6.2.1 Dataset for domain modelling

The proposed method for building a domain model is analysed through an intrinsic evaluation in the Computer Science domain. A dataset was gathered for evaluating the extraction of general terms from domain-specific corpora. Our choice of domains was conditioned by the availability of domain experts. The annotation of words for a domain model was done in two steps to reduce the complexity of the task. In the first step one expert analysed an extended list of terms with the purpose of selecting words for a domain model. In the second step several experts analysed a subset of these terms to measure agreement.

3. DOMAIN ADAPTIVE EXPERTISE TOPIC EXTRACTION THROUGH DOMAIN MODELLING

In the first step the expert is asked to analyse the nouns used in a taxonomy of Computer Science subjects, the ACM Computing Classification System. The expert is provided with the list of nouns and their frequency in the taxonomy and is required to identify nouns that refer to generic concepts, that is, concepts that are not specific to a particular subfield of the domain. A set of 80 nouns is selected in this manner, including system, information, and software. The complete list of words from this manually constructed domain model of Computer Science is presented in Table 3.2.

algorithm       distributed     interaction     optimisation    software
analysis        engineering     interface       pattern         solution
approach        environment     interpretation  prediction      specification
approximation   estimation      language        probability     standard
architecture    evaluation      machine         problem         standardisation
automation      execution       management      procedure       statistic
classification  feature         measurement     process         strategy
computation     framework       mechanism       processing      structure
computer        generation      method          processor       study
control         graphic         methodology     program         synthesis
data            hardware        metric          programming     system
database        hierarchy       modelling       protocol        technique
definition      implementation  model           reasoning       technology
design          information     module          representation  theory
development     integrated      multiparadigm   science         tool
device          integration     network         service         workbench

Table 3.2: A manually constructed domain model for Computer Science
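The frequency list handed to the expert in the first step can be approximated with a simple token count over taxonomy labels. The sketch below is only an illustration under stated assumptions: the `LABELS` sample and the `STOPWORDS` list are hypothetical, and no part-of-speech tagging or lemmatisation is applied, so it counts all non-stopword tokens rather than nouns proper.

```python
from collections import Counter
import re

# Hypothetical sample of taxonomy labels; the real input would be the
# category names of the ACM Computing Classification System.
LABELS = [
    "Information Systems",
    "Software Engineering",
    "Operating Systems",
    "Information Storage and Retrieval",
]

# Tiny illustrative stopword list (an assumption, not taken from the text).
STOPWORDS = {"and", "of", "the", "for", "in"}


def candidate_frequencies(labels):
    """Count how often each lower-cased, non-stopword token occurs across labels."""
    counts = Counter()
    for label in labels:
        for token in re.findall(r"[a-z]+", label.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


freqs = candidate_frequencies(LABELS)
for word, count in freqs.most_common(3):
    print(word, count)
```

A real pipeline would restrict the candidates to nouns and map plural forms such as "systems" to their singular before presenting the list to the annotator.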

3.6 Experimental setup

This gold standard is built by one annotator because the task requires analysing and filtering several hundred words. Inter-annotator agreement is computed by analysing a subset of the selected words through a survey that involved 27 participants. More details about the instructions provided to the participants and their anonymised answers can be found in Appendix 1. A quarter of the words identified by the domain expert are randomly combined with the same number of randomly selected words from the rejected list, which are used as negative examples. The participants are given an alphabetically sorted list of words that contains the positive and negative examples presented in Table 3.3. We used the Fleiss kappa statistic to calculate the inter-rater agreement. Kappa is 0.34, which lies in the fair agreement range.
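For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Fleiss kappa computation. It is not the code used for the reported value; the function name `fleiss_kappa` and the toy vote counts are assumptions made for illustration, and the real survey answers are those referred to in Appendix 1.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same number
    of raters (at least two)."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item must be rated by the same number of raters")
    n_categories = len(ratings[0])

    # Proportion of all assignments that fall into each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]

    p_observed = sum(p_item) / n_items      # mean observed agreement
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance
    # Note: undefined (division by zero) if all votes fall in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up counts for four words and 27 raters: [accepted, rejected].
toy_votes = [[27, 0], [20, 7], [9, 18], [3, 24]]
print(round(fleiss_kappa(toy_votes), 3))
```

In the survey setting each row would correspond to one word from the survey list and the two columns to the number of participants who accepted or rejected it.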

A qualitative analysis of the answers shows that 80% of the words from our gold standard domain model are selected by at least half of the participants. The positive examples that received less than half of the votes are approximation, execution, generation, and probability. Even so, these words received a minimum of 9 votes each. A possible reason why these words are rejected by the majority of the participants is that they are arguably too general for the given domain. Another conclusion is that, although the gold standard list is overall considered to be correct by a large number of participants, the list is not exhaustive: two words that were initially rejected by the first annotator (i.e., concept and user) are considered to be correct by a majority of experts.
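The share of words selected by at least half of the participants can be computed as sketched below. The vote counts are made up for illustration and the helper name `majority_share` is an assumption; only the threshold (half of the participants) comes from the text.

```python
def majority_share(votes, n_participants):
    """Return the share of words chosen by at least half of the participants,
    together with the words that fell below that threshold."""
    threshold = n_participants / 2
    below = sorted(word for word, count in votes.items() if count < threshold)
    share = (len(votes) - len(below)) / len(votes)
    return share, below


# Made-up vote counts for five positive examples and 27 participants.
toy_votes_per_word = {
    "algorithm": 25,
    "data": 26,
    "execution": 9,
    "protocol": 20,
    "theory": 18,
}
share, below = majority_share(toy_votes_per_word, n_participants=27)
print(f"{share:.0%} selected by at least half; below threshold: {below}")
```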

Positives                       Negatives
algorithm       machine         agent               key
analysis        optimisation    circuit-switching   mathematics
approximation   probability     concept             moment
data            processing      connection          multimedia
definition      protocol        debuggers           sorting
execution       service         depth               speech
feature         solution        directory           supplier
generation      standard        framebuffer         text
integration     technology      frames              user
language        theory          grayscale           word

Table 3.3: Positive and negative examples used in the domain modelling survey
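The construction of the survey list (a random quarter of the accepted words, an equal number of randomly chosen rejected words, presented in alphabetical order) can be sketched as follows. The function name, the fixed seed, and the short input lists are assumptions for illustration; the actual sampling procedure may have differed in its details.

```python
import random


def build_survey_list(accepted, rejected, fraction=0.25, seed=0):
    """Sample a fraction of the accepted words and the same number of rejected
    words, then return the alphabetically sorted mix plus both samples."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = round(len(accepted) * fraction)
    if k > len(rejected):
        raise ValueError("not enough rejected words to match the positive sample")
    positives = rng.sample(sorted(accepted), k)
    negatives = rng.sample(sorted(rejected), k)
    return sorted(positives + negatives), set(positives), set(negatives)


# Shortened, illustrative inputs; the full lists hold 80 accepted words
# and several hundred rejected ones.
accepted_words = ["algorithm", "analysis", "data", "feature",
                  "language", "protocol", "service", "theory"]
rejected_words = ["agent", "depth", "frames", "key", "speech", "word"]

survey, positives, negatives = build_survey_list(accepted_words, rejected_words)
print(survey)
```

Sorting the mixed list hides which words come from the gold standard, so participants judge each word without knowing the first annotator's decision.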

3.6.2.2 Datasets for expertise topic extraction

In this section we describe the datasets used to evaluate our approach for expertise topic extraction. We make use of a dataset provided by the organisers of the Semantic Evaluation (SemEval) workshop for the task of Automatic Keyphrase Extraction from Scientific Articles. We also rely on three domain-specific datasets to evaluate how well our approach performs across domains.

SemEval 2010 collection

The SemEval 2010 competition included a task targeting the Automatic Keyphrase Extraction from Scientific Articles [KMKB10]. Given a set of scientific articles, participants are required to assign keyphrases extracted from text to each document. We participated in this task with an unsupervised approach for keyphrase extraction that considers not only a general description of a term when selecting candidate keyphrases but also context information [BB10a].

The SemEval task organisers provided two sets of scientific articles, a set of 144 documents for training and a set of 100 documents for testing. The collection consists of ACM publications from four subdomains (i.e., C.2.4 Distributed Systems, H.3.3 Information Search and Retrieval, I.2.6 Learning and J.4 Social and Behavioral Sciences). The average length of the articles is between 6 and 8 pages including tables and pictures. Three sets of answers are provided: author-assigned keyphrases, reader-assigned keyphrases and combined keyphrases (combination of the first two sets). The participants are asked to assign exactly 15 keyphrases per document. All reader-assigned keyphrases are extracted from the papers, whereas some of the author-assigned keyphrases do not occur explicitly in the text. Several alternations of genitive keyphrases are accepted, for example policy of school, school policy, and school’s policy are all considered to be correct. If an alternation changes the semantics, it is not included in the answer set.
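The treatment of genitive alternations can be illustrated with a small matching sketch. It is not the official scoring script: the helper names `genitive_variants` and `matches_at_k` are hypothetical, the apostrophe is written in plain ASCII, and the sketch only counts matches among the top 15 predictions without computing any particular evaluation measure.

```python
def genitive_variants(head, modifier):
    """Return the three surface forms treated as equivalent for a head/modifier pair."""
    return {
        f"{head} of {modifier}",
        f"{modifier} {head}",
        f"{modifier}'s {head}",
    }


def matches_at_k(predicted, gold_variant_sets, k=15):
    """Count gold keyphrases matched by the top-k predictions.

    Each gold keyphrase is a set of accepted surface forms and is credited
    at most once, even if several of its variants are predicted."""
    top = {phrase.lower() for phrase in predicted[:k]}
    return sum(1 for variants in gold_variant_sets if top & variants)


gold = [
    genitive_variants("policy", "school"),
    {"teacher training"},
]
predicted = ["school policy", "school's policy", "budget"]
print(matches_at_k(predicted, gold))
```

Representing every gold keyphrase as a set of accepted forms keeps the alternation logic separate from the counting, so alternations that change the meaning can simply be left out of the set.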

The SemEval dataset has several drawbacks that make it of limited use for our evaluation of domain modelling. The dataset contains a relatively small number of documents from a small number of domains. While it allows us to evaluate the performance of our system for keyphrase extraction, it does not allow us to analyse whether we are able to model multiple domains. For this reason we also consider several other domain-specific datasets, described in the next section.

Domain-specific collections

Domain independence is an important requirement for expertise topic extraction, as no assumption can be made about the application area of the expertise mining system. The system should achieve acceptable results on academic content from various scientific areas as well as enterprise documents. For this purpose we consider three corpora