
UNIVERSITY OF IBADAN LIBRARY


The invention of word processing and time-sharing systems in the 1970s produced a large volume of text in machine-readable form, which gave rise to full-text retrieval systems. As research progressed, the probabilistic information retrieval model emerged. It involves measuring the frequency of words in relevant and irrelevant documents and using term frequency measures to adjust the weights of words. These term weighting techniques improved the performance of IR systems over the simple word matching that was prevalent.

In the 1980s, the use of online IR expanded with the availability of full text instead of just abstracts and indexing, and online retrieval spread into use by non-specialists. There was increasing interest in new retrieval methods, such as sense disambiguation using machine-readable dictionaries and computational linguistics, and in statistical retrieval.

In the 1990s, things seemed to progress well: more text was available online, with full-text search algorithms for retrieval. The Internet put IR to the test; everyone could access the Internet, provide information freely, and classify their own information.

Information retrieval received a boost, especially in the USA, where it was suggested that computer networks could bring information close to students, and digital libraries were developed. This is the origin of distributed information retrieval.

The fulfilment of Bush's prediction came in the 2000s: many books became available online, and some ordinary questions could be answered by consulting online materials instead of printed materials. Research focused on multimedia retrieval, i.e. retrieval of images, sound and video, which led to more serious image recognition and sound recognition research that has been more promising than computational linguistics. New and improved retrieval systems were developed for multimedia information retrieval.

In the 2010s, the fulfilment of Bush's prediction is being exploited: much material has been converted to machine-readable form, although the conversion is not complete. Multimedia information is available and easy to deal with. Research now focuses on improving IR systems and learning new ways to use them.

2.9.2 Overview of Information Retrieval (IR)

Information retrieval deals with the recovery of documents from a document collection in response to a user information need expressed in the form of a query. Information retrieval starts when a user issues a query, i.e. a formal statement of the information need; the IR system evaluates the query against the information collection and provides the user with a set of data that answers the query (van Rijsbergen, 1997). According to Hiemstra (2000), an information retrieval system is a software program that stores and manages information on documents; the system assists users in finding the information they need and does not explicitly return information or answer questions. In IR, however, a query may match several objects with different degrees of relevance, so the set of results obtained is ranked according to degree of relevance to the query.

There is no perfect retrieval system that retrieves all of the relevant documents and no irrelevant ones; measuring the degree of relevance of documents therefore forms a vital part of IR. It is of little use to retrieve a large amount of information that is not relevant, and it is equally undesirable to leave relevant information unretrieved. The whole process of information retrieval is summarised by William (2007) as:

"A full-text search engine takes a user's query $q$, consisting of discrete terms $\{t_1, \ldots, t_{|q|}\}$; evaluates it against a document collection $D$, consisting of documents $\{d_1, \ldots, d_N\}$; and answers it with a ranked list of documents $\{a_1, \ldots, a_r\}$, $a_i \in D$, ordered in decreasing estimated relevance to the query $q$."

• Query is the user's request to the computer in an attempt to communicate the information need; it is made up of distinct terms.

• Document is an item of whatever unit we have decided to build a retrieval system over. Documents are data objects, usually textual, though they may contain other types of data such as images, sound, video or mixed-media records.

• Document collection is the group of documents over which retrieval is performed.

• Information need is the topic about which the user desires to know more.

• Ranked list of documents consists of the members of the document collection that are found to be relevant to the user's query when a specific ranking algorithm has been applied.
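The definitions above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the three-document collection, the whitespace tokenizer and the scoring rule (counting occurrences of query terms) are all hypothetical stand-ins chosen for brevity, not the ranking algorithm of any system discussed in this chapter.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def rank(query, collection):
    """Return document ids ordered by decreasing estimated relevance.

    The score is the number of occurrences of query terms in a document,
    a stand-in for whatever ranking algorithm a real system would apply.
    Documents that share no term with the query are left out of the list.
    """
    terms = set(tokenize(query))  # the query as a set of distinct terms
    scores = {}
    for doc_id, text in collection.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical document collection D.
D = {
    "d1": "information retrieval ranks documents",
    "d2": "retrieval of images and sound",
    "d3": "cooking with fresh herbs",
}

print(rank("information retrieval", D))
```

Here the query is the user's attempt to express the information need, `D` is the document collection, and the returned list is the ranked list of documents, each of which is a member of `D`.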

A document is relevant if the user perceives it as containing information of value with respect to their personal information need; in other words, if it satisfies that need. Information retrieval systems are evaluated on two criteria: effectiveness and efficiency. An effective system returns results containing more relevant documents, and an efficient system incurs lower costs while finding result documents. According to Craswell (2000), the cost of a search includes several factors, such as computation or storage resources expended at the client or server, network resources such as bandwidth expended in their communication, and monetary charges for network usage or per-search fees.
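Effectiveness in the sense described above is commonly quantified with precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved). These two measures are not named in the text and are introduced here only as an assumption about how "more relevant documents" might be measured; the document ids and relevance judgements below are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: ids of the documents the system returned.
    relevant:  ids of the documents judged relevant to the information need.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgements.
retrieved = ["d1", "d2", "d5", "d7"]
relevant = ["d1", "d2", "d3"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision corresponds to retrieving a lot of information that is not relevant, and low recall corresponds to leaving relevant information unretrieved.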

Most computer-based retrieval systems store only a representation of the document (or query). This representation is necessary because it speeds up the retrieval process.

Information retrieval systems must support three basic processes: (i) representing each document (indexing), (ii) representing the user's information need in a form suitable for a computer to use (query formulation), and (iii) matching or comparing the two representations. The matching process usually results in a ranked list of documents, with the most relevant documents towards the top of the list.
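The three processes can be sketched as three small functions. This is a minimal illustration, assuming an inverted index as the document representation, a set of distinct terms as the query representation, and summed term frequencies as the matching score; the function names and the sample collection are hypothetical and do not come from the works cited above.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_index(collection):
    """(i) Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in collection.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def formulate_query(information_need):
    """(ii) Query formulation: reduce the request to distinct terms."""
    return set(tokenize(information_need))


def match(query_terms, index):
    """(iii) Matching: sum term frequencies per document and rank."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Most relevant first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical document collection.
collection = {
    "d1": "digital libraries store digital documents",
    "d2": "libraries lend printed books",
    "d3": "image and sound retrieval",
}

index = build_index(collection)
print(match(formulate_query("digital libraries"), index))
```

Only the index is consulted at query time, which reflects the point that a system stores a representation of each document rather than scanning the full text for every query.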