though each SWB had a distinct implementation and used different aspects of the SW technology. A new evaluation framework for SWBs was designed and tested on 3 intervention Semantic Web Browsers, with participants recruited from the intervention systems’ real-world target audiences. The control plat- forms were live, real-world systems with substantial numbers of existing users. Using this evaluation framework, all of the initial hypotheses were successfully confirmed or contradicted (Table5.6).
Overall, the framework successfully elicited a range of feedback on 3 distinct Semantic Web technolo- gies. It was found that, although potentially easier to elicit feedback via online questionnaires, observing respondents in a workshop setting provides an excellent opportunity to gather both quantitative and qualitative data from larger numbers of users.
The evaluation showed that users tended to prefer the system (GoPubMed) that had the most mature interface, but were able to use the semantic features of all systems regardless of the interface or types of semantic links presented. The evaluation feedback will contribute directly to future versions of each Semantic Web Browser and there will be further analysis of the weblogs to determine the specific types of semantic links that were or were not used.
Chapter 6
Summary and Future Work
6.1
Open problem 1 revisited
Open problem 1: Word sense disambiguation (WSD) is required for the accurate analysis of text in many applications. Since 2004, the most active domain-specific application area for WSD seems to be bioinformatics (Liu et al., 2004; Schuemie et al., 2005; Edmonds and Agirre, 2006). Classical approaches to WSD use co-occurring words or terms. However, most treat on- tologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms.
We have addressed this problem in Chapter 3, where we used co-occurrences (Term Cooc, Sections 3.2, 3.3.1), document clustering (see Section3.2), the ontology structure (Inferred Cooc, Section3.3.1) and semantic similarity between terms (Closest Sense, Section3.3.1), as well as metadata like the year of publication, journal and abstract title (MetaData, Section3.3.1) in order to perform disambiguation of terms in abstracts of biomedical publications. We furthermore made available a corpus of 2600 documents divided into three datasets of varying quality and quantity that can be used as benchmarks for disambiguation.
The comparison of the methods shows that metadata and training data of high quality are key points for increasing the performance of disambiguation, with up to 96% accuracy (MetaData method, trained on high quality/low quantity dataset). However, the production of high quality training data is a tedious and time-consuming process. When such training data are not available, the co–occurrence of ontology/taxonomy terms can be used for disambiguation with high accuracy. The hierarchical structure of the ontology can also improve the accuracy, especially when the ontology is consistently modelled. In Section 3.3 we have showed that a ‘is a’ hierachy like the Gene Ontology gives higher disambiguation accuracy compared to a ‘narrower than’ hierarchy such as the Medical Subject Headings.
For disambiguation one has to balance between achieving high accuracy and producing training data of sufficient quality and quantity. The MetaData method gave the best results but it required high quality training data, which were hard to produce. The Term Cooc and Closest Sense methods gave lower accuracy than the MetaData. However, they are semi-automated, requiring no manual intervention for training.
Future work on disambiguation
Future work can include several aspects ranging from the use of negative co–occurrences, disambiguation in full-text articles, to a combination of the three methods (Term Cooc, MetaData, Closest Sense) and a decision based on a confidence score for each of the approaches.
While performing disambiguation with the Term Cooc and Closest Sense methods, all terms found in the abstract apart from the term in question were considered as true with respect to the ontology. However, they could as well be ambiguous terms and therefore insert an error into the disambiguation process. In the future, we want to take such ambiguous terms into account as well.
A possible extension could be to correctly identify if a sense occurs that is not included in the ontology and possibly add it. This can potentially be done by setting a threshold. In the Closest Sense approach, from all distances below that threshold, one would be clearly shortest. If not, then this would be the new sense. For the Term Cooc and MetaData methods this could be done by training each method on each sense and if the sense found would be below the threshold, this would indicate a new one.
Another interesting aspect can be the automatic identification of an ambiguous term. So far, the ambiguous terms tested were empirically identified. A more thorough and automated identification pipeline employing WordNet, noun phrase statistics and expert input could be set up.
It would also be interesting to see how the accuracy would change once the disambiguation would be performed in the full text of articles. Co–occurrences could also be computed based on the full-text instead of the abstracts of articles. The number of terms occurring in a document could also be considered in the disambiguation pipeline. We have noticed that in most of the cases where the ambiguous term had one of the false senses, it usually co–occurred with only a few other terms (or in a lot of cases it was the only term in the document).
WSD use cases
In Chapter 5 we demonstrated use cases of word sense disambiguation in ontology-based text-mining and described a user-centred evaluation framework developed to evaluate Semantic Web Browsers. As presented in Section 5.1, the GoPubMed infrastructure can be used with any ontology to search for specific scientific literature. An example of such a search was the mouse-anatomy-specific document retrieval presented in Section5.2, where genes, tissues, and developmental stages of the mouse embryo contained many ambiguities. We additionally described a user-centred evaluation framework developed to evaluate Semantic Web Browsers in Section 5.3, where we mainly focused on the user satisfaction about GoPubMed.
6.2
Open problem 2 revisited
Open problem 2: Which are the common obstacles during the design of an ontology to be used for text mining? Can automatic term recognition (ATR) methods assist the ontology generation process?
In Chapter 4 we presented the experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) that was afterwards used for text-mining. We manually created an ontology for lipoprotein metabolism with 846 terms in total, we derived design principles and systematically evaluated four methods for Automated Term Recognition (ATR).
We have shown that automated predictions of up to 1000 terms generate in the order of 40-50% useful terms. Considering only the top 50 terms generated, the results improve up to 89% average precision for terms that make sense to be included into the terminology (LMO + domain expert1).
Based on the results of the comparison between the manually built terminology and the terminologies extracted from the automatic term recognition methods, we have shown that ATR methods can provide lists of useful domain-specific terms. In this way, ATR methods can aid and speed up the ontology
1Some terms were not included in the manually created terminology because the domain expert missed them. However,
these made sense to be included into the LMO, therefore the LMO + domain expert set contains manual terminology together with automated terminology that was domain related
design process, but the terminologies produced cannot yet replace the manually created terminologies, nor construct ontologies without any expert intervention.
Composite terms - such as the Gene Ontology term ‘hydrolase activity, acting on ester bonds’ or the LMO term ‘receptor-mediated extra-hepatic cellular uptake’ - which do not appear literally in text seem to be a key point for the further improvement of the results.
The terms that were absent from the automatically generated terminologies were grouped into five cat- egories of ranging difficulty: rarely occurring terms (‘test person’, ‘experimentee’), rarely occurring vari- ants of terms (‘insuline resistant’, ‘slo syndrome’), very long terms (‘receptor-mediated extra-hepatic cel- lular uptake’, ‘predominance of large low-density lipoprotein particles’), combinations of terms/variants (‘elevated plasma-tg level’) and, finally, terms that should normally be easily found (‘type-II diabetic’, ‘diabetes type I’, ‘apolipoprotein-c’).
Terms from the latter category were terms that appear often in PubMed and should normally be identified, but were probably absent from the document set used to automatically generate the terminol- ogy. The document selection seems to be another key point for producing terminologies that can cover a whole domain. Much attention needs to be put at the first step in order to collect documents that are specific enough and include detailed terminology, but also general enough in order to include basic terms of interest.
Remaining open problems contain the selection of suitable corpora for term recognition as well as generation of composite terms (such as GO term ‘hydrolase activity, acting on ester bonds’) from basic ones.
Appendix A
Word Sense Disambiguation
collected corpora
The WSD collected corpora used in the experiments can be found under:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2663782/bin/1471-2105-10-28-S1.txt
The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given).
Appendix B
User-centered Evaluation of
Semantic Browsers
B.1
Methods
To prove or disprove the hypotheses H1-6, the following questions were considered:
Mobility and travel within the system
• O1: time taken for users to find information or perform tasks
• O2: pathway taken to find information or perform tasks
• O3: use of semantic links compared with non-semantic links
a) Do users use semantic links?
b) Which semantic links are they using - tree, semantic relationships, etc.? c) What percentages of links are non-semantic and semantic?
User attitude and satisfaction
• O4: user satisfaction with the ease of use of the system
• O5: user attitudes to the availability of semantic links and ranking
• O6: user understanding of the SWB:
a) Does the user think it helps him/her find information or complete tasks?
b) Does the user understand how to use the SWB to find such information or complete such tasks?
Sample populations
TableB.1shows the target population and the control platforms and intervention SWBs each population used. The initial aim was to recruit 10 users per intervention SWB. Group A1 included mainly infectious disease clinicians and group A2 molecular biologists.
Population A1 – infectious disease practitioners A2 – molecular biologists
Control NeLI PubMed
Intervention COHSE-NeLI GoPubMed/GoGene
COHSE-NeLI GoPubMed/GoGene