In this chapter, we addressed RQ1 and RQ2 through an empirical analysis of the evolution of biomedical annotations and its relation to KOS changes. For this, we used a set of documents annotated with GATE and NCBO Annotator using 13 different versions of two well-known biomedical KOS (ICD-9-CM and MeSH). We observed a correlation between KOS changes and annotation changes. We then grouped the annotation changes according to the type of information modified and the way it was modified. We obtained five different cases of changes (see Section 2.3) and verified how the annotations evolved during the KOS evolution. In a second step, we analysed different annotation models in order to verify whether they could represent (or whether we could infer from their elements) all the criteria required to classify the annotation changes. As a result of this step, we proposed an extended annotation model designed to support the evaluation and maintenance of annotations, which is used in the maintenance method described in the next chapter.
Chapter 3
Direct maintenance of semantic annotations
Contents
3.1 Existing approaches for maintaining semantic annotations valid over time
3.2 Proposed approach for adapting semantic annotations
3.3 Experimental assessment
3.4 Results
3.5 Discussion
3.6 Conclusion

In order to answer RQ3 (How can we automatically maintain the validity of semantic annotations without re-annotating all document content when a KOS evolves?), discussed in chapter 1, we designed an automatic, rule-based approach for keeping semantic annotations valid over time when the underlying KOS evolves. The rules build on the findings from chapter 2, which highlighted different aspects to take into account for the maintenance of semantic annotations.
Besides the set of rules, we used two other methods to improve the quality of our maintenance method, detailed in section 3.2. The first relies on the use of background knowledge (BK) [Pruski et al., 2016], while the second one exploits semantic change patterns (SCP) [Dos Reis et al., 2015a].
This chapter is organised as follows. In section 3.1, we discuss the related work on semantic annotation maintenance processes and highlight possible improvements. Section 3.2 presents our method to overcome the limitations of existing approaches. In sections 3.3 and 3.4, we describe the experiments and results, respectively. Finally, we discuss the results and highlight our contribution in section 3.5.
3.1 Existing approaches for maintaining semantic annotations valid over time
One possible solution to cope with the evolution of annotations is the re-annotation of documents [Tissaoui et al., 2011]. However, [Funk et al., 2014] point out that concept recognition systems perform differently from one ontology to another and do not perform equally well on natural language texts. Furthermore, validating automatically generated annotations is a laborious and time-consuming task for domain experts [Doğan et al., 2014]. Therefore, it is vital to propose advanced methods and tools able to automatically maintain semantic annotations impacted by KOS evolution and/or changes in the annotated data or documents.
In the literature, three families of approaches dealing with annotation maintenance can be found. The first addresses the problem of automatic detection of inconsistent annotations [Eilbeck et al., 2009, Qin and Atluri, 2009, Köpke and Eder, 2011, Zavalina et al., 2015]. This is mainly done by the combined identification of concepts that have changed from one KOS version to the next and the set of annotations associated with them. However, mechanisms to support the correction of impacted annotations are not proposed.
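The detection step shared by this first family can be illustrated with a minimal sketch. It assumes that each KOS version is reduced to a mapping from concept identifier to label and that an annotation simply links a document to a concept. The names `Annotation`, `changed_concepts` and `impacted_annotations`, as well as the sample identifiers, are hypothetical and do not come from any of the cited systems.

```python
from typing import Dict, List, NamedTuple, Set


class Annotation(NamedTuple):
    """Hypothetical minimal annotation: a document linked to a KOS concept."""
    doc_id: str
    concept_id: str


def changed_concepts(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Return concepts of the old version that were removed or relabelled."""
    return {cid for cid, label in old.items() if new.get(cid) != label}


def impacted_annotations(
    annotations: List[Annotation], old: Dict[str, str], new: Dict[str, str]
) -> List[Annotation]:
    """Flag annotations whose concept changed between the two versions."""
    changed = changed_concepts(old, new)
    return [a for a in annotations if a.concept_id in changed]


if __name__ == "__main__":
    # Illustrative data only (assumed, not taken from ICD-9-CM or MeSH).
    old_version = {"C1": "influenza", "C2": "asthma", "C3": "fever"}
    new_version = {"C1": "influenza", "C2": "bronchial asthma"}
    annotations = [
        Annotation("d1", "C1"),
        Annotation("d1", "C2"),
        Annotation("d2", "C3"),
    ]
    for annotation in impacted_annotations(annotations, old_version, new_version):
        print(annotation)
```

The sketch only flags potentially inconsistent annotations. Like the approaches of this family, it offers no mechanism for correcting them.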
The second family of approaches focuses on the automatic detection and manual correction of invalid annotations by using standalone applications [Maynard et al., 2007, Auer and Herre, 2007, Burger et al., 2010, Park et al., 2011]. These approaches only consider basic ontology changes, e.g., the deletion and addition of concepts in KOS. However, more complex changes are also important and need to be considered. Moreover, relying on human intervention to perform the maintenance is hardly feasible in the medical domain, given the huge number of annotations to be adapted.
Lastly, the most advanced works implement automatic detection and correction of annotations [Luong and Dieng-Kuntz, 2006, Abgaz, 2013, Frost and Moore, 2014]. These approaches are based on different techniques, each of which is briefly described in this chapter.
[Luong and Dieng-Kuntz, 2006] developed the CoSWEM framework to investigate the evolution of annotations and to maintain them using a rule-based approach to detect and correct basic inconsistencies, such as deletion. This approach converts ontologies to RDF(S) files and detects annotations affected by the evolution of the ontologies, as well as potentially inconsistent annotations using CORESE. This work focuses on expressive and small-sized ontologies and can hardly be applied to large biomedical ones, because the implemented reasoning techniques require the power of description logics (not always used in biomedical controlled terminologies) to decide on the validity of the annotations.
[Abgaz, 2013] developed methods to facilitate the evolution of ontology-driven content management systems (OCMS). The evolution is done by analysing the impacts of change operations and selecting an optimal evolution strategy before the changes are permanently implemented. The proposed strategies, i) no-action, ii) cascade, iii) attach-to-parent, and iv) N-level cascading, were based on reasoning techniques and mostly deal with removal of concepts/meta-data as described below:
1. No-action strategy: This states that a given change operation, e.g. adding a concept, is implemented without adding consequential or corrective changes. For instance, even after the addition of a new class in the ontology, e.g. avian influenza, the documents referring to it and annotated with the class influenza will not be adapted.
2. Cascade strategy: This is the opposite of the no-action strategy. In this case, the changes will be propagated throughout the class and annotations. However, this strategy only deals with cases of removal, by propagating the deletion to all dependent entities.
3. Attach-to-parent strategy: This states that when a certain entity is deleted, its dependent entities are linked to the parent whenever it appears.
4. N-level cascading: This is a specific type of the cascade strategy. It is applied to ontology classes located within a distance of N from the target class. For example, if N is set to 2, N-level cascading applies the changes up to two hierarchy levels away.
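The four strategies listed above can be contrasted with a minimal sketch restricted to the deletion of a concept, the case they mostly address. It is an illustration under stated assumptions, not the implementation of [Abgaz, 2013]: the hierarchy is assumed to be a tree stored as a child-to-parent mapping, "dependent entities" are read as the sub-concepts of the deleted concept, and the names `descendants`, `concepts_to_delete` and `attach_to_parent` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Assumed representation: concept -> parent concept (None for the root).
Hierarchy = Dict[str, Optional[str]]


def descendants(
    parent_of: Hierarchy, target: str, max_depth: Optional[int] = None
) -> Set[str]:
    """Concepts below `target`, optionally limited to `max_depth` levels."""
    children: Dict[str, List[str]] = {}
    for concept, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(concept)
    found: Set[str] = set()
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend further than the requested N levels
        for child in children.get(node, []):
            found.add(child)
            queue.append((child, depth + 1))
    return found


def concepts_to_delete(
    parent_of: Hierarchy, target: str, strategy: str, n: int = 0
) -> Set[str]:
    """Concepts removed when `target` is deleted under the given strategy."""
    if strategy in ("no-action", "attach-to-parent"):
        return {target}  # the deletion is not propagated to other concepts
    if strategy == "cascade":
        return {target} | descendants(parent_of, target)
    if strategy == "n-level":
        return {target} | descendants(parent_of, target, max_depth=n)
    raise ValueError(f"unknown strategy: {strategy}")


def attach_to_parent(parent_of: Hierarchy, target: str) -> Hierarchy:
    """Delete `target` and re-link its children to the parent of `target`."""
    grandparent = parent_of[target]
    return {
        concept: (grandparent if parent == target else parent)
        for concept, parent in parent_of.items()
        if concept != target
    }


if __name__ == "__main__":
    # Illustrative hierarchy only (assumed, not taken from a real KOS).
    kos = {
        "disease": None,
        "infection": "disease",
        "influenza": "infection",
        "avian influenza": "influenza",
        "H5N1": "avian influenza",
    }
    print(concepts_to_delete(kos, "infection", "n-level", n=2))
    print(attach_to_parent(kos, "infection"))
```

In this sketch, annotations pointing to a concept returned by `concepts_to_delete` would become invalid, whereas `attach_to_parent` keeps the sub-concepts reachable by linking them one level higher.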
[Frost and Moore, 2014] propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. The proposed method uses entropy minimization over variable clusters (EMVC). It filters the annotations for each gene set to remove inconsistent annotations. The results show that EMVC can filter 92% and 67% of the inconsistent annotations from MSigDB C4 v4.0 cancer modules using leukemia data and MSigDB C2 v1.0 using p53 data, respectively. This method is able to improve the annotations but does not produce good results for improving incomplete gene sets or identifying new gene sets. It is very sensitive to several algorithm parameters, in particular the clustering method, and it can be computationally expensive. Furthermore, the authors point out that EMVC only works in the gene set domain, meaning other domains cannot take advantage of this approach.
The literature review highlights that there is no annotation maintenance/adaptation framework able to cope with the specificities of the medical domain, e.g., the size of the KOS and the number of annotations. Therefore, in this chapter we present a method to automatically maintain semantic annotations when the underlying KOS evolves. The method discussed in the upcoming sections relates to the direct maintenance of semantic annotations, i.e., the first use case discussed in chapter 1. For this purpose, we proposed a set of rules based on a rigorous analysis of the evolution and adaptation of a set of annotations over a ten-year period, described in chapter 2 [Cardoso et al., 2016].