3.2 Sprint Backlog
3.2.1 First Sprint
An ontology is a formal, explicit specification of a shared conceptualization. In simple words, it is a model of organized knowledge in a given domain (e.g. fisheries). Ontologies consist of components called "concepts, attributes, relations and instances". An ontology differs considerably from a taxonomy or a thesaurus. A taxonomy is a hierarchical tree structure that models a domain from the abstract to the specific. A thesaurus, on the other hand, is a structured vocabulary that defines each term through three major types of relationships: hierarchical (as in a taxonomy), associative and equivalence. An ontology is the most formal of the three models, because it defines the meaning of concepts by modeling constraints that restrict the number of possible interpretations. These three schemes therefore differ mainly in their degree of precision. Table 18 compares their features to help you understand them.
Retrieval of Information for OA Resources
Interoperability and Retrieval

Table 18: Comparison of Taxonomy-Thesaurus-Ontology
Features | Taxonomy | Thesaurus | Ontology
Background | Natural Sciences and Universe of Subjects | Library and Information Science | Metaphysics, AI and NLP, Knowledge modeling
Modeling standard | None | ISO-2788 (equivalent standards are BS 5723, BS 6723, Z39.19) | No official standard yet
Notational standard | Graphical tree and mixed-base notation | BT, NT, RT, UF, USE, etc. | RDF Schema and OWL
Relationships | Basically hierarchical, but all types of relationships are modelled | Untyped hierarchical, associative and equivalence | Typed hierarchical and associative
Properties device | None | Scope Notes (SN) | Domain and Range (in RDF Schema)
Application | Classification, Navigation, Search | Classification, Navigation, Search | Classification, Navigation, Search, Visualization and Automated reasoning
Popular tools for creation | Mind manager | MultiTES | Protégé
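The table notes that an ontology, unlike a taxonomy or a thesaurus, uses typed relationships whose domain and range are declared. The Python sketch below illustrates how such constraints narrow the possible interpretations of a statement. Everything in it is hypothetical: the fisheries classes, the instances, the `caughtIn` property and the `is_a` and `check` helpers. It also treats domain and range as validation rules, which is how a closed-world checker might use them; under RDF Schema semantics proper, domain and range are used to infer types rather than to reject statements.

```python
# Hypothetical fisheries mini-ontology (illustration only).
# Class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Tuna": "Fish",
    "Fish": "Animal",
    "Lake": "WaterBody",
    "Ocean": "WaterBody",
}

# Instances and the class each one belongs to.
INSTANCE_OF = {
    "YellowfinTuna": "Tuna",
    "IndianOcean": "Ocean",
    "Trawler42": "Vessel",
}

# Typed relation with its constraints: property -> (domain, range).
PROPERTIES = {
    "caughtIn": ("Fish", "WaterBody"),
}


def is_a(cls: str, target: str) -> bool:
    """Return True if cls is target or a (transitive) subclass of it."""
    current = cls
    while current is not None:
        if current == target:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def check(subject: str, prop: str, obj: str) -> bool:
    """Accept a statement only if subject and object satisfy the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_a(INSTANCE_OF[subject], domain) and is_a(INSTANCE_OF[obj], range_)


print(check("YellowfinTuna", "caughtIn", "IndianOcean"))  # True
print(check("Trawler42", "caughtIn", "IndianOcean"))      # False: a vessel is not a fish
```

A taxonomy could hold only the `SUBCLASS_OF` tree, and a thesaurus would record that two terms are related without saying how; it is the typed property together with its constraints that lets software reject the second statement.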
Thesauri are structured according to an international standard (ISO-2788), so these schemes can be transferred to ontologies by applying an ontology representation language (such as RDF Schema). In a Semantic Web environment, we need an element that can unequivocally describe the meaning of a concept or word for a software agent. Ontologies perform this role. In practice, the desired words, concepts or terms are marked with a tag that refers to the ontology, and a software agent that comes across the tag can consult the ontology for the meaning of the term. The Semantic Web extends the present form of the Web by giving meaning and context to information-bearing objects, allowing people and software agents to share and process data more effectively. Ontologies improve the effectiveness and consistency of resource description and thereby support more sophisticated functionality in information representation and retrieval (IRR). Standards such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) provide structures and methods for descriptions, definitions and relations within a given domain.
In the OA domain, some content retrieval systems support ontology-driven retrieval of knowledge objects. For example, sciencewise.info, an experimental OA retrieval system (presently covering the Physics, Life Sciences, Humanities and Information Technologies disciplines), provides an ontology-driven search interface. A search query is automatically linked to the available domain ontology, and the user can navigate from one node to another. It also gives users links to open content (preprints/postprints).
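The transfer of a thesaurus into an ontology representation language, described above, can be sketched in Python. This sketch swaps in the W3C SKOS vocabulary, a widely used RDF vocabulary for thesauri, rather than modeling terms directly as RDF Schema classes. The base URI, the sample fisheries terms and the function names are hypothetical. It writes triples as plain N-Triples text so that no RDF library is needed, and it does not escape quotation marks inside labels.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace

# Map thesaurus relationship codes to SKOS properties.
RELATION_MAP = {
    "BT": SKOS + "broader",   # broader term
    "NT": SKOS + "narrower",  # narrower term
    "RT": SKOS + "related",   # related (associative) term
}


def concept_uri(term: str) -> str:
    """Build a URI for a preferred term (spaces become underscores, the rest percent-encoded)."""
    return BASE + quote(term.replace(" ", "_"))


def thesaurus_to_ntriples(records: dict[str, dict[str, list[str]]]) -> list[str]:
    """Turn {preferred term: {relation code: [terms]}} into N-Triples lines."""
    triples = []
    for term, relations in records.items():
        subject = f"<{concept_uri(term)}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f'{subject} <{SKOS}prefLabel> "{term}"@en .')
        for code, targets in relations.items():
            for target in targets:
                if code == "UF":
                    # A non-preferred synonym becomes an alternative label, not a concept.
                    triples.append(f'{subject} <{SKOS}altLabel> "{target}"@en .')
                elif code in RELATION_MAP:
                    triples.append(f"{subject} <{RELATION_MAP[code]}> <{concept_uri(target)}> .")
    return triples


sample = {
    "Fisheries": {"NT": ["Marine fisheries"], "RT": ["Aquaculture"], "UF": ["Fishing industry"]},
    "Marine fisheries": {"BT": ["Fisheries"]},
}

for line in thesaurus_to_ntriples(sample):
    print(line)
```

Mapping BT, NT and RT to `skos:broader`, `skos:narrower` and `skos:related`, and UF to `skos:altLabel`, is one common convention; a fuller conversion would also handle USE references, scope notes (`skos:scopeNote`) and literal escaping.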
Figure 39: Ontology-driven Retrieval in sciencewise.info
For example, a search on LHCb in sciencewise.info shows the position of the query term in the domain ontology (including its relationships with other concepts) and provides links to available open access journal papers related to LHCb (Figures 39 and 40). It is also a participative retrieval architecture that gives users scope to define a new concept or to edit an existing concept in the domain ontology.
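Ontology-driven retrieval of this kind (locating a query term among related concepts, linking it to papers, and letting the user move from node to node) can be sketched with a small in-memory graph. This is not how sciencewise.info is implemented; the triples, relation names, paper identifiers and helper functions below are hypothetical, and LHCb is used only because it is the example in the text.

```python
from collections import deque

# Hypothetical fragment of a physics domain ontology: (subject, relation, object).
TRIPLES = [
    ("LHCb", "experimentAt", "Large Hadron Collider"),
    ("ATLAS", "experimentAt", "Large Hadron Collider"),
    ("LHCb", "studies", "CP violation"),
    ("Large Hadron Collider", "isA", "Particle accelerator"),
]

# Hypothetical links from concepts to open access papers (placeholder identifiers).
DOCUMENTS = {
    "LHCb": ["oa-paper-1", "oa-paper-2"],
    "CP violation": ["oa-paper-3"],
}


def neighbours(concept):
    """Return (relation, concept) pairs linked to `concept`, following edges both ways."""
    outgoing = [(rel, obj) for subj, rel, obj in TRIPLES if subj == concept]
    incoming = [("inverse of " + rel, subj) for subj, rel, obj in TRIPLES if obj == concept]
    return outgoing + incoming


def locate(query):
    """Show where a query term sits in the ontology and which papers are attached to it."""
    return {
        "concept": query,
        "relations": neighbours(query),
        "documents": DOCUMENTS.get(query, []),
    }


def path(start, goal):
    """Breadth-first search for the shortest chain of nodes from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for _, nxt in neighbours(route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None


def add_relation(subject, relation, obj):
    """Participative editing: a user contributes a new concept or relation."""
    TRIPLES.append((subject, relation, obj))


print(locate("LHCb"))
print(path("LHCb", "ATLAS"))  # ['LHCb', 'Large Hadron Collider', 'ATLAS']
```

Following edges in both directions is what lets a user browse from LHCb up to the collider and back down to a sibling experiment; `add_relation` stands in for the participative editing the text describes.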
Figure 40: Linking of Query Term in Domain Ontology in sciencewise.info

3.5.4 Statistical and Other Tools
You learned in Unit 2 of this Module that usage data and statistics are considered a value-added feature of any OA retrieval system. Many repository software platforms are implementing statistics add-ons that use the usage data stored in the retrieval engine. For example, the statistics add-on in the DSpace platform gathers, processes and presents usage data, content-related data and administrative statistics, using Apache Solr (the text retrieval engine used in DSpace version 4.0) as the underlying application layer for harvesting a vast array of usage data. Some of the statistical datasets displayed by DSpace are the top ten countries and cities from which visits originate, the total number of visits for communities, collections and items, search history, workflow-related statistics and item download statistics (Figure 41).
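The kind of aggregation behind such reports can be sketched in a few lines of Python. This is not DSpace code: DSpace computes its statistics through queries against Solr, whereas this hypothetical example simply counts in-memory usage records with `collections.Counter`. The record fields (`country`, `city`, `action`, `item`) and the sample values are assumptions for illustration.

```python
from collections import Counter

# Hypothetical usage records, one per event, as a repository might log them.
EVENTS = [
    {"country": "India", "city": "New Delhi", "action": "view", "item": "item-1"},
    {"country": "India", "city": "Kolkata", "action": "download", "item": "item-1"},
    {"country": "Germany", "city": "Berlin", "action": "download", "item": "item-2"},
    {"country": "India", "city": "New Delhi", "action": "download", "item": "item-1"},
    {"country": "United States", "city": "Boston", "action": "view", "item": "item-2"},
]


def top_n(events, field, n=10):
    """Most frequent values of `field` (e.g. the top ten countries or cities of visits)."""
    return Counter(event[field] for event in events).most_common(n)


def downloads_per_item(events):
    """Count download events per item."""
    return Counter(event["item"] for event in events if event["action"] == "download")


print(top_n(EVENTS, "country"))          # [('India', 3), ('Germany', 1), ('United States', 1)]
print(top_n(EVENTS, "city", n=1))        # [('New Delhi', 2)]
print(dict(downloads_per_item(EVENTS)))  # {'item-1': 2, 'item-2': 1}
```

A search engine such as Solr performs the same counting server-side (as facets over indexed fields), which scales far better than looping over records in application code.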
Other associated services that support OA retrieval systems are Web 2.0 tools for achieving an interactive, collaborative and participative architecture in content retrieval. These include RSS feeds, content rating, folksonomy, review submission and social networking tools. Shafi, Gul and Shah (2012) measured the use of Web 2.0 tools and services in the 1,977 OA repositories listed in OpenDOAR. Their findings show that RSS is the most popular Web 2.0 application in OA retrieval (possibly because RSS, as an automatic alerting service for updated content, is a very useful support tool in OA retrieval), followed by social bookmarking (again for scholarly reasons). Other useful Web 2.0 tools are social networking tools (Twitter, Facebook and YouTube) and collaborative tools (such as blogs, Flickr and podcasting). Of the 1,412 accessible repositories (out of the 1,977 listed), 57 percent (804 repositories) had applied Web 2.0 tools and the remaining 43 percent (608 repositories) had not yet done so. A country-wise distribution of Web 2.0 use in OA repositories shows that US-based OA retrieval systems ranked first, followed by those in the UK and Germany respectively. Interestingly, the use of Web 2.0 tools in Asian OA retrieval systems is increasing (Taiwan 83.33%, India 60% and Japan 41.56%) in comparison with European and American OA retrieval systems.
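RSS works as an alerting service because a client can poll a repository's feed and show only what is new since its last visit. The Python sketch below does this with the standard library. The feed content, the repository URL (under the reserved example.org domain) and the `new_items` helper are hypothetical; a real client would fetch the feed over HTTP and remember the time of its last check.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed of newly deposited repository items.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example repository: recent submissions</title>
    <item>
      <title>Paper A</title>
      <link>https://repository.example.org/item/1</link>
      <pubDate>Mon, 03 Mar 2014 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Paper B</title>
      <link>https://repository.example.org/item/2</link>
      <pubDate>Fri, 28 Feb 2014 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, since):
    """Return (title, link) for every item published after `since`."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > since:
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


last_check = datetime(2014, 3, 1, tzinfo=timezone.utc)
for title, link in new_items(FEED, last_check):
    print(f"New: {title} <{link}>")  # prints only Paper A
```

Comparing timezone-aware datetimes avoids errors when a feed's dates and the stored last-check time come from different time zones.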
CHECK YOUR PROGRESS
Notes: a) Write your answers in the space given below.
b) Compare your answers with those given at the end of this unit.
7) Discuss the use of controlled vocabulary in OA retrieval systems.
……….……….………... ……….……….………... ……….……….………...
8) What is an ontology? Discuss how it helps to improve the retrieval of OA resources.
……….……….………... ……….……….………... ……….……….………...
3.6 RETRIEVAL OF SPECIALIZED OPEN CONTENTS
Michael Lesk (1995), in his seminal paper, compared the development of Information Retrieval with the seven ages of man as described by Shakespeare in As You Like It (Act 2, Scene 7, lines 143-166). Lesk predicted many possible achievements of IR in the first decade of the 21st century. These are: i) Resource Description Framework (RDF) and XML supported Web-enabled IR; ii) Centralized/federated search services through harvesting; iii) Influence of the Semantic Web and Web 2.0 on Information Representation and Retrieval (IRR); iv) Mature multimedia IR systems with information mashup support; v) Integration of digital libraries with online learning environments; vi) Sophisticated multilingual IR with Unicode support; vii) Interactive and collaborative IRR; and viii) Application of ontology in IRR. Many of these predictions are still at the research stage, but multimedia IR and multilingual IR are now quite mature. This section covers the major aspects of these two kinds of retrieval system.