
The evaluation of GoPubMed took place as a workshop with 20 students from the Masters Program in Molecular Bioengineering at the BIOTEC, TU Dresden. The students were first given a short introduction to the functionality of GoPubMed and GoGene. They then filled in a pre-questionnaire about their scientific background and the way they search for scientific literature. Next, they were given a series of tasks to perform using PubMed, GoPubMed and GoGene (see the subsection Tasks – Main Test). The workshop finished with the students answering a post-questionnaire comparing the unmodified system (PubMed) to the modified one (GoPubMed).

Pre-questionnaire

Based on the completed pre-questionnaires, 45% of the participants held their main university degree in Biology, 25% in Engineering and 30% in other fields. 65% had under 5 years of experience in this field and 35% had 5 years or more. Only 6.25% of the participants use PubMed daily, 31.25% more than once a week but less often than daily, 12.5% once a week, and 50% more than once a month but less often than once a week. On average, they rated the usefulness of PubMed at 68.75%. Asked what they search for in PubMed, the participants chose from the following answers (more than one answer could be selected):

75% I look for specific articles

75% I search for reviews for an overview

19% I look for papers in a specific journal

50% I look for papers of specific authors

62.5% I look for papers for specific diseases, genes, etc.

31% I look for the most recent papers

Concerning the use of other search engines for their research, the participants have chosen between the following answers (being able to choose more than one):

94% PubMed [21]

31% Google Scholar [22]

100% Google or other web search

6% Scopus [23]

19% Other specialist

[21] See http://www.ncbi.nlm.nih.gov/pubmed/

[22] See http://scholar.google.com/

Tasks – Main Test

The participants were given one set of questions to answer using PubMed and a set of similar questions to answer using GoPubMed. The questions were designed so as not to favour either system (PubMed vs. GoPubMed) and to avoid, as far as possible, bias in favour of GoPubMed. The participants were told not to spend more than 8 minutes on each question. They were divided into two groups: one group answered half of the questions with PubMed and the rest with GoPubMed, and the other group did the reverse. The two groups of questions were the following:

Group A

1. Which particular diseases are associated most often with HIV?

2. What kinds of diseases are also related to HIV?

3. Which techniques of treatment are used to help HIV patients?

4. Who are the top authors for Antiretroviral Therapy?

5. Where was this research done by those authors?

6. Which are leading centres for liver transplantation?

7. Which are leading scientists for liver transplantation?

8. Is the research on leukaemia decreasing?

9. Which proteins are related to Alzheimer’s disease?

10. What is the role of MMS2 in cancer?

Group B

11. How does BARD1 regulate BRCA1 activity?

12. Which are the different types of Paget’s disease (where does it locate)?

13. How can Paget’s disease be diagnosed?

14. Is there a treatment/therapy for Paget’s disease?

15. How rare/prevalent is this disease?

16. Which sex and age groups are the most affected?

17. Which are the leading 3 countries doing research on Paget’s disease?

18. Is there anybody in Brazil doing research on Paget’s disease?
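The counterbalanced set-up described above (each participant uses one system for one question group and the other system for the remaining group) can be sketched as follows. This is only an illustration: the participant identifiers, the alternating assignment and the mapping of question Groups A and B to the two halves are assumptions, since the text does not state how participants were allocated.

```python
from typing import Dict, List

# Question numbers per group, as listed above.
QUESTION_GROUPS = {"A": list(range(1, 11)), "B": list(range(11, 19))}


def assign_crossover(participants: List[str]) -> Dict[str, Dict[str, str]]:
    """Alternate participants between the two arms of a crossover design.

    Arm 1 answers Group A with PubMed and Group B with GoPubMed;
    arm 2 does the reverse. (Hypothetical allocation rule.)
    """
    plan = {}
    for index, participant in enumerate(participants):
        if index % 2 == 0:
            plan[participant] = {"A": "PubMed", "B": "GoPubMed"}
        else:
            plan[participant] = {"A": "GoPubMed", "B": "PubMed"}
    return plan


# Hypothetical identifiers for the 20 workshop participants.
plan = assign_crossover([f"P{n:02d}" for n in range(1, 21)])
print(plan["P01"], plan["P02"])
```

With such an allocation every participant uses both systems, and each question group is answered with each system by half of the participants.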

| No. | Statement | PubMed | GoPubMed |
|---|---|---|---|
| 1 | I would like to use this system frequently | 5.1 | 8 |
| 2 | The system was easy to use | 4.9 | 7.8 |
| 3 | The system was too complex | 5.3 | 4.3 |
| 4 | The user interface was easy to understand | 6.8 | 7.7 |
| 5 | The system responded fast | 7.3 | 6.3 |
| 6 | The system provided enough help information and examples | 3.5 | 6.8 |
| 7 | Finding the answers to the tasks with the system was easy | 3.8 | 7.6 |
| 8 | Finding the answers to the tasks with the system was fast | 3.8 | 7 |
| 9 | A lot of information the system found was irrelevant to the tasks | 7.5 | 4 |
| 10 | Most of the information returned by the system was relevant | 3.7 | 7 |
| 11 | The amount of relevant information I found was less than expected | 4.9 | 3.4 |
| 12 | The amount of relevant information I found was same as expected | 4.3 | 5 |
| 13 | The amount of relevant information I found was more than expected | 5 | 5.6 |
| 14 | The modifications were mostly relevant to me | – | 6.8 |

Tab. 5.5: Post-questionnaire on GoPubMed vs. PubMed. The numbers for PubMed and GoPubMed are averages for agreement on a 1 to 10 scale (1 strongly disagree, 10 strongly agree).
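To make the comparison in Tab. 5.5 easier to read, the per-item difference between the two averages can be computed directly. The following is a minimal sketch that uses only the numbers from the table; the variable and function names are illustrative, and item 14 is left out because it has no PubMed score.

```python
# Averages from Tab. 5.5 as (item, PubMed, GoPubMed); item 14 is omitted
# because it was only asked about GoPubMed.
RATINGS = [
    (1, 5.1, 8.0), (2, 4.9, 7.8), (3, 5.3, 4.3), (4, 6.8, 7.7),
    (5, 7.3, 6.3), (6, 3.5, 6.8), (7, 3.8, 7.6), (8, 3.8, 7.0),
    (9, 7.5, 4.0), (10, 3.7, 7.0), (11, 4.9, 3.4), (12, 4.3, 5.0),
    (13, 5.0, 5.6),
]


def rating_differences(rows):
    """Return {item: GoPubMed average minus PubMed average}, rounded to 0.1."""
    return {item: round(gopubmed - pubmed, 1) for item, pubmed, gopubmed in rows}


differences = rating_differences(RATINGS)

# Print items from the largest gain for GoPubMed to the largest loss.
for item, delta in sorted(differences.items(), key=lambda kv: kv[1], reverse=True):
    print(f"item {item:>2}: {delta:+.1f}")
```

Items 3, 9 and 11 are negatively worded, so a negative difference on those items favours GoPubMed. Apart from those, item 5 (response speed) is the only statement on which the PubMed average is higher.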

Post-questionnaire

After completing the tasks, the participants were given a post-questionnaire and asked to rate each statement on a 1 to 10 scale (1 strongly disagree, 10 strongly agree). The results are shown in Table 5.5.

The post-questionnaire also included the following open questions, giving the participants the freedom to add comments and suggestions:

15. Did you find the highlighting of ontology terms helpful?

16. Did you get an overview over your search results from the tree on the left?

17. Did you manage to navigate efficiently through the tree?

18. Did you find any papers you would probably have missed with PubMed?

19. What do you like/dislike about using the tree to explore your search results?

Most of the participants' comments concerned the appearance of the browser; for example, some could not find an obvious link to the original paper site because the PMID link, shown in light grey, was hard to locate. Others asked for additional functionality, such as information on how often an article has been cited or read. All of the participants were very positive towards GoPubMed and GoGene, but still had concerns, since 94% of them had been using PubMed and were used to its simple interface.

5.3.3 Conclusion

For GoPubMed a number of the hypotheses formulated at the start of the evaluation were confirmed, especially regarding ease of use. For the other two Semantic Web Browsers, COHSE-NeLI and CORESE-NeLI, most of the hypotheses were contradicted. Table 5.6 shows how user feedback from each system agreed or disagreed with the hypotheses.

The evaluation study demonstrated that the evaluation framework is suitable for eliciting user perceptions of SWBs. The results have allowed us to answer our initial hypotheses fully for each SWB even though each SWB had a distinct implementation and used different aspects of the SW technology. A new evaluation framework for SWBs was designed and tested on 3 intervention Semantic Web Browsers, with participants recruited from the intervention systems’ real-world target audiences. The control platforms were live, real-world systems with substantial numbers of existing users. Using this evaluation framework, all of the initial hypotheses were successfully confirmed or contradicted (Table 5.6).

| Hypothesis | COHSE | CORESE | GoPubMed |
|---|---|---|---|
| H1 The SWB reduces the time taken for users to find information or perform tasks. | No | Yes | No |
| H2 The SWB shortens the pathway taken to find information or perform tasks. | No (targets not found) | No (targets found by few users) | PubMed data not available for comparison |
| H3 Where semantic links are available, users will always follow them instead of nonsemantic links. | No | Yes | No |
| H4 Users find the SWB easier to use than the control platform. | Yes and No | No | Yes |
| H5 Where semantic links and ranking are available, users prefer them to nonsemantic links and ranking. | Yes | Yes | Yes |
| H6 Use of the SWB is intuitive: a) Users think the SWB helps them to find information or complete tasks. | No | No | Yes |
| H6 Use of the SWB is intuitive: b) Users intuitively understand how to use the SWB to find such information or complete tasks. | No | No | Yes |

Tab. 5.6: Confirmation or contradiction of original hypotheses.

Overall, the framework successfully elicited a range of feedback on 3 distinct Semantic Web technologies. It was found that, although it is potentially easier to elicit feedback via online questionnaires, observing respondents in a workshop setting provides an excellent opportunity to gather both quantitative and qualitative data from larger numbers of users.

The evaluation showed that users tended to prefer the system (GoPubMed) that had the most mature interface, but were able to use the semantic features of all systems regardless of the interface or types of semantic links presented. The evaluation feedback will contribute directly to future versions of each Semantic Web Browser and there will be further analysis of the weblogs to determine the specific types of semantic links that were or were not used.

Chapter 6

Summary and Future Work

6.1 Open problem 1 revisited

Open problem 1: Word sense disambiguation (WSD) is required for the accurate analysis of text in many applications. Since 2004, the most active domain-specific application area for WSD seems to be bioinformatics (Liu et al., 2004; Schuemie et al., 2005; Edmonds and Agirre, 2006). Classical approaches to WSD use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms.

We have addressed this problem in Chapter 3, where we used co-occurrences (Term Cooc, Sections 3.2, 3.3.1), document clustering (see Section 3.2), the ontology structure (Inferred Cooc, Section 3.3.1) and semantic similarity between terms (Closest Sense, Section 3.3.1), as well as metadata like the year of publication, journal and abstract title (MetaData, Section 3.3.1) in order to perform disambiguation of terms in abstracts of biomedical publications. We furthermore made available a corpus of 2600 documents divided into three datasets of varying quality and quantity that can be used as benchmarks for disambiguation.

The comparison of the methods shows that metadata and high-quality training data are key to increasing disambiguation performance, with up to 96% accuracy (MetaData method, trained on the high quality/low quantity dataset). However, producing high-quality training data is a tedious and time-consuming process. When such training data are not available, the co-occurrence of ontology/taxonomy terms can be used for disambiguation with high accuracy. The hierarchical structure of the ontology can also improve the accuracy, especially when the ontology is consistently modelled. In Section 3.3 we have shown that an ‘is a’ hierarchy like the Gene Ontology gives higher disambiguation accuracy than a ‘narrower than’ hierarchy such as the Medical Subject Headings.

For disambiguation one has to strike a balance between achieving high accuracy and producing training data of sufficient quality and quantity. The MetaData method gave the best results, but it required high-quality training data, which were hard to produce. The Term Cooc and Closest Sense methods gave lower accuracy than the MetaData method. However, they are semi-automated, requiring no manual intervention for training.
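The general idea of selecting a sense from co-occurring terms can be illustrated with a small sketch. This is not the Term Cooc implementation from Chapter 3; it is a generic count-based scorer with add-one smoothing, and the ambiguous term, the sense labels and the toy training data are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def train(examples: Iterable[Tuple[str, List[str]]]) -> Dict[str, Counter]:
    """Count, for each sense, how often each term co-occurred with it."""
    counts: Dict[str, Counter] = {}
    for sense, terms in examples:
        counts.setdefault(sense, Counter()).update(terms)
    return counts


def disambiguate(context: List[str], counts: Dict[str, Counter]) -> Tuple[str, Dict[str, float]]:
    """Pick the sense whose co-occurrence profile best explains the context.

    The score is a sum of smoothed log relative frequencies, so terms never
    seen with a sense lower its score instead of ruling it out.
    """
    vocabulary = {term for counter in counts.values() for term in counter}
    scores = {}
    for sense, counter in counts.items():
        total = sum(counter.values())
        scores[sense] = sum(
            math.log((counter[term] + 1) / (total + len(vocabulary)))
            for term in context
        )
    return max(scores, key=scores.get), scores


# Hypothetical training data for the ambiguous term "development".
model = train([
    ("biological process", ["embryo", "morphogenesis", "cell differentiation"]),
    ("method development", ["software", "algorithm", "assay"]),
])
best_sense, sense_scores = disambiguate(["embryo", "morphogenesis"], model)
print(best_sense)
```

Because the counts come directly from annotated co-occurrences, a scorer of this kind needs no manually curated features, which mirrors the trade-off between accuracy and training effort discussed above.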

Future work on disambiguation

Future work can cover several aspects: the use of negative co-occurrences, disambiguation in full-text articles, and a combination of the three methods (Term Cooc, MetaData, Closest Sense) in which the final decision is based on a confidence score for each approach.
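One way such a combination could work is a confidence-weighted vote. The sketch below is an assumption about how the decision might be made, not a method described in this work; the confidence values and sense labels are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Each method reports (predicted sense, confidence in [0, 1]).
Prediction = Tuple[str, float]


def combine_by_confidence(predictions: Dict[str, Prediction]) -> Tuple[str, Dict[str, float]]:
    """Sum the confidences per sense and return the sense with the highest total."""
    totals: Dict[str, float] = defaultdict(float)
    for _method, (sense, confidence) in predictions.items():
        totals[sense] += confidence
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Hypothetical outputs of the three methods for one ambiguous term.
winner, totals = combine_by_confidence({
    "Term Cooc": ("ontology sense", 0.6),
    "MetaData": ("other sense", 0.9),
    "Closest Sense": ("ontology sense", 0.5),
})
print(winner, totals)
```

In this example two moderately confident methods outvote a single highly confident one; taking only the single most confident prediction would be an equally simple alternative rule.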

When performing disambiguation with the Term Cooc and Closest Sense methods, all terms found in the abstract apart from the term in question were assumed to be correct with respect to the ontology. However, these terms could themselves be ambiguous and therefore introduce errors into the disambiguation process. In the future, we want to take such ambiguous terms into account as well.

A possible extension would be to identify when a term occurs in a sense that is not included in the ontology, and possibly to add that sense. This could be done by setting a threshold. In the Closest Sense approach, one of the distances below that threshold should be clearly the shortest; if none is, the occurrence would be treated as a new sense. For the Term Cooc and MetaData methods, each method could be trained on each sense, and a score below the threshold for the sense found would indicate a new sense.
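The threshold rule for the Closest Sense approach can be sketched as follows. The text does not define what "clearly shortest" means, so the `margin` parameter below is an assumption, as are the function name, the distance values and the sense labels.

```python
from typing import Dict, Optional


def closest_sense_or_new(
    distances: Dict[str, float], threshold: float, margin: float
) -> Optional[str]:
    """Return the closest known sense, or None to signal a possible new sense.

    A known sense is accepted only if its distance is below `threshold` and it
    is shorter than the next candidate by at least `margin` (assumed criterion
    for "clearly shortest").
    """
    candidates = sorted(
        (distance, sense) for sense, distance in distances.items() if distance < threshold
    )
    if not candidates:
        return None  # nothing is close enough: treat as a new sense
    if len(candidates) == 1 or candidates[1][0] - candidates[0][0] >= margin:
        return candidates[0][1]
    return None  # no clear winner among the close senses


# Hypothetical semantic distances from the context to each known sense.
clear = closest_sense_or_new({"sense 1": 1.0, "sense 2": 4.0}, threshold=5.0, margin=1.0)
far = closest_sense_or_new({"sense 1": 6.0, "sense 2": 7.0}, threshold=5.0, margin=1.0)
tie = closest_sense_or_new({"sense 1": 2.0, "sense 2": 2.2}, threshold=5.0, margin=1.0)
print(clear, far, tie)
```

Both the threshold and the margin would have to be tuned on annotated data; the values above are placeholders.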

Another interesting aspect can be the automatic identification of an ambiguous term. So far, the ambiguous terms tested were empirically identified. A more thorough and automated identification pipeline employing WordNet, noun phrase statistics and expert input could be set up.

It would also be interesting to see how the accuracy would change if the disambiguation were performed on the full text of articles. Co-occurrences could also be computed from the full text instead of the abstracts of articles. The number of terms occurring in a document could also be considered in the disambiguation pipeline. We have noticed that, in most of the cases where the ambiguous term had one of the false senses, it co-occurred with only a few other terms (and in many cases it was the only term in the document).

WSD use cases

In Chapter 5 we demonstrated use cases of word sense disambiguation in ontology-based text-mining and described a user-centred evaluation framework developed to evaluate Semantic Web Browsers. As presented in Section 5.1, the GoPubMed infrastructure can be used with any ontology to search for specific scientific literature. An example of such a search was the mouse-anatomy-specific document retrieval presented in Section 5.2, where genes, tissues, and developmental stages of the mouse embryo contained many ambiguities. In Section 5.3 we described the user-centred evaluation framework itself, focusing mainly on user satisfaction with GoPubMed.
