In this section we describe the datasets used in our experiments. First, we describe the dataset used in the intrinsic evaluation of domain models in Section 3.6.2.1, and then we discuss the datasets used for evaluating expertise topic extraction in Section 3.6.2.2.
3.6.2.1 Dataset for domain modelling
The proposed method for building a domain model is analysed through an intrinsic evaluation in the Computer Science domain. A dataset was gathered for evaluating the extraction of general terms from domain-specific corpora. Our choice of domain was conditioned by the availability of domain experts. To reduce the complexity of the task, the words for the domain model were annotated in two steps. In the first step, one expert analysed an extended list of terms and selected the words that belong in the domain model. In the second step, several experts analysed a subset of these terms so that inter-annotator agreement could be measured.
3. DOMAIN ADAPTIVE EXPERTISE TOPIC EXTRACTION THROUGH DOMAIN MODELLING
In the first step, the expert is asked to analyse the nouns used in a taxonomy of Computer Science subjects, the ACM Computing Classification System. The expert is given the list of nouns together with their frequencies in the taxonomy and is asked to identify nouns that refer to generic concepts, i.e., concepts that are not specific to a particular subfield of the domain. In this manner, a set of 80 nouns is selected, including system, information, and software. Table 3.2 presents the complete list of words in this manually constructed domain model of Computer Science.
algorithm distributed interaction optimisation software
analysis engineering interface pattern solution
approach environment interpretation prediction specification
approximation estimation language probability standard
architecture evaluation machine problem standardisation
automation execution management procedure statistic
classification feature measurement process strategy
computation framework mechanism processing structure
computer generation method processor study
control graphic methodology program synthesis
data hardware metric programming system
database hierarchy modelling protocol technique
definition implementation model reasoning technology
design information module representation theory
development integrated multiparadigm science tool
device integration network service workbench
Table 3.2: A manually constructed domain model for Computer Science
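The first annotation step relies on a frequency list of the nouns that occur in the taxonomy. The Python sketch below shows one way such a list could be produced; it is not the procedure used to build Table 3.2. It assumes the category labels are available as plain strings and replaces proper noun detection with a hypothetical stop-word filter (`STOP_WORDS`), so every remaining token is treated as a candidate noun. The sample labels are illustrative only.

```python
import re
from collections import Counter

# Hypothetical stand-in for noun detection: drop a few function words and
# treat every remaining lowercase token as a candidate noun.
STOP_WORDS = {"and", "of", "for", "in", "the", "to", "on", "with", "its"}


def candidate_term_frequencies(category_labels):
    """Count how often each candidate word occurs across the taxonomy labels."""
    counts = Counter()
    for label in category_labels:
        for token in re.findall(r"[a-z]+", label.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts


# Illustrative labels only, not the full ACM Computing Classification System.
labels = [
    "Software engineering",
    "Information systems",
    "Computer systems organization",
    "Software and its engineering",
    "Information storage and retrieval",
]

# Show the most frequent candidates first, as an annotator might see them.
for word, freq in candidate_term_frequencies(labels).most_common():
    print(f"{word}\t{freq}")
```

A more realistic pipeline would add part-of-speech tagging and lemmatisation, so that, for example, systems and system are counted as the same noun.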
3.6 Experimental setup
This gold standard is built by a single annotator because the task requires analysing and filtering several hundred words. Inter-annotator agreement is computed on a subset of the selected words through a survey involving 27 participants. More details about the instructions given to the participants, together with their anonymised answers, can be found in Appendix 1. A quarter of the words identified by the domain expert (20 of the 80) are randomly combined with the same number of words randomly selected from the rejected list, which serve as negative examples. The participants are given an alphabetically sorted list containing these positive and negative examples, which are presented in Table 3.3. We used Fleiss' kappa to measure inter-rater agreement. Kappa is 0.34, which lies in the fair agreement range (0.21 to 0.40 on the Landis and Koch scale).
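The construction of the survey list can be sketched in Python as below. The function `build_survey_list` and its parameters are hypothetical; the sketch only mirrors the procedure described above: sample a quarter of the accepted words, sample the same number of rejected words, and merge both into one alphabetically sorted list.

```python
import random


def build_survey_list(gold_terms, rejected_terms, fraction=0.25, seed=None):
    """Return an alphabetically sorted survey list plus the sets of positive
    and negative examples it contains.

    A `fraction` of the gold-standard terms is sampled as positives and the
    same number of rejected terms is sampled as negatives.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    n = int(len(gold_terms) * fraction)
    # random.sample needs a sequence, so sort the inputs first; this also makes
    # the result depend only on the seed, not on the input order.
    positives = rng.sample(sorted(gold_terms), n)
    negatives = rng.sample(sorted(rejected_terms), n)
    survey = sorted(positives + negatives)
    return survey, set(positives), set(negatives)


# Tiny made-up example; the real inputs would be the 80 accepted nouns and the
# nouns the annotator rejected.
survey, pos, neg = build_survey_list(
    ["algorithm", "data", "software", "system"],
    ["agent", "key", "speech", "text"],
    seed=1,
)
print(survey)
```

With the 80-word model of Table 3.2 and the default `fraction=0.25`, the function returns 20 positives and 20 negatives, the sizes used in Table 3.3.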
A qualitative analysis of the answers shows that 80% of the positive examples, i.e., the words taken from our gold-standard domain model, are selected by at least half of the participants. The positive examples that received fewer than half of the votes are approximation, execution, generation, and probability, although each of them still received at least 9 votes. One possible reason these words are rejected by most participants is that they are arguably too general for the given domain. Another conclusion is that, although a large number of participants consider the gold-standard list correct overall, the list is not exhaustive: two words initially rejected by the first annotator, namely concept and user, are considered correct by a majority of the experts.
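The agreement figures above can be computed with a short Python sketch. It treats the survey as a binary rating task (each participant either selects a word or not), computes Fleiss' kappa from per-word vote counts, and reports the share of words selected by at least half of the raters. The per-word vote counts are not reproduced in this chapter, so the sketch is run on toy numbers rather than on the survey data; the function names are hypothetical.

```python
def fleiss_kappa_binary(votes, n_raters):
    """Fleiss' kappa for a select / not-select survey.

    votes[i] is the number of raters (out of n_raters) who selected item i;
    every item is assumed to be rated by all n_raters.
    """
    n_items = len(votes)
    total = n_items * n_raters
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if sum(votes) in (0, total):
        raise ValueError("kappa is undefined when only one category is used")
    # Observed agreement per item: agreeing rater pairs / all rater pairs.
    p_items = [
        (v * v + (n_raters - v) ** 2 - n_raters) / (n_raters * (n_raters - 1))
        for v in votes
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category.
    p_sel = sum(votes) / total
    p_e = p_sel ** 2 + (1 - p_sel) ** 2
    return (p_bar - p_e) / (1 - p_e)


def share_with_majority(votes, n_raters):
    """Fraction of items selected by at least half of the raters."""
    return sum(1 for v in votes if 2 * v >= n_raters) / len(votes)


# Toy data: 3 raters, 4 items (not the survey data).
toy_votes = [3, 0, 2, 1]
print(round(fleiss_kappa_binary(toy_votes, 3), 3))
print(share_with_majority(toy_votes, 3))
```

For the survey described here, `votes` would hold one count per word (40 words) with `n_raters=27`; since half of 27 is 13.5, a word needs at least 14 votes to be selected by at least half of the participants.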
Positives                     Negatives
algorithm      machine        agent              key
analysis       optimisation   circuit-switching  mathematics
approximation  probability    concept            moment
data           processing     connection         multimedia
definition     protocol       debuggers          sorting
execution      service        depth              speech
feature        solution       directory          supplier
generation     standard       framebuffer        text
integration    technology     frames             user
language       theory         grayscale          word
Table 3.3: Positive and negative examples used in the domain modelling survey
3.6.2.2 Datasets for expertise topic extraction
In this section we describe the datasets used to evaluate our approach for expertise topic extraction. We make use of a dataset provided by the organisers of the Semantic Evaluation (SemEval) workshop for the task of Automatic Keyphrase Extraction from Scientific Articles. We also rely on three domain-specific datasets to evaluate how well our approach performs across domains.
SemEval 2010 collection
The SemEval 2010 competition included a task targeting Automatic Keyphrase Extraction from Scientific Articles [KMKB10]. Given a set of scientific articles, participants are required to assign to each document keyphrases extracted from its text. We participated in this task with an unsupervised approach to keyphrase extraction that considers not only a general description of a term when selecting candidate keyphrases but also contextual information [BB10a].
The SemEval task organisers provided two sets of scientific articles: a set of 144 documents for training and a set of 100 documents for testing. The collection consists of ACM publications from four subdomains, namely C.2.4 Distributed Systems, H.3.3 Information Search and Retrieval, I.2.6 Learning, and J.4 Social and Behavioral Sciences. The average length of the articles is between 6 and 8 pages, including tables and figures. Three sets of answers are provided: author-assigned keyphrases, reader-assigned keyphrases, and combined keyphrases (the combination of the first two sets). Participants are asked to assign exactly 15 keyphrases per document. All reader-assigned keyphrases are extracted from the papers, whereas some of the author-assigned keyphrases do not occur explicitly in the text. Several alternations of genitive keyphrases are accepted; for example, policy of school, school policy, and school’s policy are all considered correct. If an alternation changes the meaning, it is not included in the answer set.
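The handling of genitive alternations can be illustrated with the Python sketch below. It is a simplification under stated assumptions, not the official SemEval scorer: it only recognises the patterns X of Y, Y's X, and two-word Y X phrases, and it generates all three variants blindly, whereas the organisers excluded alternations that change the meaning. The helpers `genitive_variants` and `matched_answers` are hypothetical; `k=15` follows the requirement of exactly 15 keyphrases per document.

```python
import re


def genitive_variants(phrase):
    """Return the accepted surface forms of a simple genitive keyphrase.

    'policy of school', "school's policy" and 'school policy' all map to the
    same three variants. Phrases that do not fit these patterns are returned
    unchanged (as a one-element set).
    """
    # Normalise case, whitespace and the typographic apostrophe.
    phrase = " ".join(phrase.lower().replace("\u2019", "'").split())
    if m := re.fullmatch(r"(.+) of (.+)", phrase):
        head, modifier = m.group(1), m.group(2)
    elif m := re.fullmatch(r"(.+)'s (.+)", phrase):
        modifier, head = m.group(1), m.group(2)
    elif len(phrase.split()) == 2:
        modifier, head = phrase.split()
    else:
        return {phrase}
    return {f"{head} of {modifier}", f"{modifier} {head}", f"{modifier}'s {head}"}


def matched_answers(ranked_keyphrases, answers, k=15):
    """Gold answers matched by the top-k extracted keyphrases, counting
    genitive alternations as matches."""
    extracted = set()
    for kp in ranked_keyphrases[:k]:
        extracted |= genitive_variants(kp)
    return {a for a in answers if genitive_variants(a) & extracted}


print(matched_answers(["school policy", "grid computing"],
                      ["policy of school", "peer to peer"]))
# prints {'policy of school'}
```

Matching is done on whole variant sets, so each gold answer is counted at most once even if several of its alternations appear among the extracted keyphrases.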
The SemEval dataset has several drawbacks that limit its usefulness for evaluating domain modelling. It contains a relatively small number of documents from a small number of domains. While it allows us to evaluate the performance of our system for keyphrase extraction, it does not allow us to analyse whether our approach can model multiple domains. For this reason, we also consider several other domain-specific datasets, described in the next section.
Domain-specific collections
Domain independence is an important requirement for expertise topic extraction, as no assumption can be made about the application area of the expertise mining system. The system should achieve acceptable results on academic content from various scientific areas as well as enterprise documents. For this purpose we consider three corpora