
This section presents the motivation for using non-learning (statistical) IR approaches, rather than Evolutionary and Machine Learning (EML) techniques, to obtain a TWS at the early stage of establishing an IR system. Evolutionary computation approaches have been applied to evolve term weights or to evolve a TWS, as discussed in Section 4.1. The relevance judgements are the set of queries for the test collection together with their corresponding relevant documents from the collection. The objective functions of learning IR approaches use the relevance judgements to assess the quality of the evolved solutions. However, as mentioned earlier, both real and test IR collections are only partially judged, because it is not feasible to have a fully judged collection at the beginning of any IR system (Qin et al., 2010). Consequently, EML techniques for evolving TWS are limited, because the training queries and their corresponding relevant documents do not cover the whole term space of the collection.

In research on evolving TWS and term weights, the EML techniques should be trained using queries and corresponding relevant documents that contain the whole term space (the index terms) of the collection (see Section 4.1.1). The IR system should then be tested with a test dataset different from the one used in the learning process (the training dataset). To the best of our knowledge, works applying evolutionary computation to IR systems use the same training set from the learning stage to test the candidate solution that represents the documents. Index terms that do not exist in relevant documents are also given random weights. These index terms cannot be judged by the fitness function, because they exist neither in the relevant documents nor in the query set. Thus, the random weights created for them in the evolutionary learning process are not really applicable for measuring relevance to any query. Moreover, a term-weighting function evolved using EML depends on the collection and on the relevance judgement values that were used in the evolving process. Thus, the evolved term-weighting function cannot be generalised as the optimal solution for all textual collections. One remedy would be to apply the EML technique to every textual collection. However, a term-weighting function (TWS) is needed before any evolving can take place, in order to establish the IR system and gather the user relevance judgement values. Thus, the ideal solution is to optimise the document representation in terms of its term weights, rather than evolving the TWS itself, after establishing the IR system and gathering the relevance judgement values. Even so, most of the documents in the test collection have no relevance values with which to optimise them using EML techniques at the beginning of an IR system. This issue affects not only evolving TWS and term weights but also the use of probabilistic and language models to establish IR systems (see Section 2.2.4). This is because probabilistic and language models rely on the distribution of relevance judgements between the relevant documents and their corresponding queries to derive their TWS. The problem with evolving TWS and term weights is therefore likely to arise in any test collection created at the beginning of establishing an IR system.

Table 3.3: Test Collections General Basic Characteristics.

ID         Description                               No. of Docs.  No. of Queries
Cranfield  Aeronautical engineering abstracts               1,400             225
Ohsumed    Clinically-Oriented MEDLINE subset             348,566             105
NPL        Electrical Engineering abstracts                11,429              93
CACM       Computer Science ACM abstracts                   3,200              52
CISI       Information Science abstracts                    1,460              76
Medline    Biomedicine abstracts                            1,033              30
FBIS       Foreign Broadcast Information Service          130,471             172
LATIMES    Los Angeles Times                              131,896             230
FT         Financial Times Limited                        210,158             230

Table 3.3 lists the test collections (see Section 3.1) used in our analysis, which have also been used in most of the literature on evolving TWS and term weights (Cordon et al., 2003; Cummins, 2008). Each test collection has three main components: a set of documents, a set of queries and the relevance judgement file. These collections and their relevance judgements were created using different approaches, including sampling (the Cranfield paradigm), extraction from a real IR system and the pooling paradigm (Soboroff, 2007; Hersh et al., 1994). FBIS, FT and LATIMES are the most recent test collections used in evolving local and global term weights in IR (Cummins, 2008). They are the collections contained in TREC Disks 4 and 5, with the Robust relevance judgement values discussed in Section 3.1.2. A number of additional characteristics of the test collections have not been taken into consideration when evolving TWS or term weights. Table 3.4 gives the values of these additional characteristics, which are defined as follows.

collection.

NoDR is the number of duplicate occurrences of relevant documents between queries in the query set.

NoInDC is the total number of index terms that exist in the whole test collection.

NoInDr is the number of unique index terms that exist in the set of relevant documents.

NoInNR is the number of index terms that were not covered by the relevance judgements and is given by the difference NoInDC − NoInDr. This is the number of index terms that receive random weights in the document representations without being tested by the objective function.
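The three index-term counts defined above can be illustrated with a minimal Python sketch. The function name term_coverage, the dictionary layout and the toy documents and judgements are all assumptions made for illustration; they are not the data structures or the data used in this work.

```python
def term_coverage(doc_terms, qrels):
    """Compute NoInDC, NoInDr and NoInNR for a collection.

    doc_terms: dict mapping a document ID to its index terms (hypothetical layout).
    qrels: dict mapping a query ID to the set of document IDs judged relevant.
    """
    # NoInDC: unique index terms over the whole collection.
    all_terms = set()
    for terms in doc_terms.values():
        all_terms.update(terms)

    # Documents judged relevant to at least one query.
    relevant_docs = set()
    for docs in qrels.values():
        relevant_docs.update(docs)

    # NoInDr: unique index terms that occur in the relevant documents.
    relevant_terms = set()
    for doc_id in relevant_docs:
        relevant_terms.update(doc_terms.get(doc_id, ()))

    return {
        "NoInDC": len(all_terms),
        "NoInDr": len(relevant_terms),
        # NoInNR = NoInDC - NoInDr: terms never seen by the objective function.
        "NoInNR": len(all_terms) - len(relevant_terms),
    }


# Toy data (invented): three documents, two queries, d3 is never judged relevant.
toy_docs = {
    "d1": ["wing", "flow"],
    "d2": ["flow", "shock"],
    "d3": ["engine", "fuel"],
}
toy_qrels = {"q1": {"d1"}, "q2": {"d1", "d2"}}

stats = term_coverage(toy_docs, toy_qrels)
print(stats)
```

In this toy case the terms of d3 never appear in a relevant document, so any weights an evolutionary process assigned to them could not influence the fitness value.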

Table 3.4: Characteristics of Test Collections to Consider When Evolving TWS.

ID          NoUR    NoDR     NoInDC   NoInDr   NoInNR
Cranfield    924     914      5,222    4,236      986
Ohsumed    4,660     177    227,616   22,760  204,856
NPL        1,735     348      7,697    3,536    4,161
CACM         555     241      7,154    3,189    3,965
CISI       1,162   1,952      6,643    5,709      934
Medline      696       0      8,702    6,907    1,795
FBIS       4,506  42,873    177,065   41,272  135,793
LATIMES    4,683     497    211,909   56,255  155,654
FT         5,658  55,819    287,876   45,564  242,312

Table 3.4 shows the characteristics of collections that were created with a pooling technique, such as FBIS, FT and LATIMES, and with the Cranfield (sampling) technique, such as Cranfield and Ohsumed. According to this table, in most of the collections (all except Cranfield, CISI and Medline) the majority of index terms were not covered by the relevance judgements. Thus, the majority of index terms in these collections existed neither in the queries nor in their corresponding relevant/irrelevant documents. As discussed above, this is an issue for evolved TWS, because the training queries and their corresponding relevant documents do not cover the whole term space of the collection. Hence, there is a strong argument for using non-learning IR approaches instead of learning ones at the start of building an IR system. The relevance feedback can then be gathered and used to improve the system with a partially learned model, as described in Chapter 6.
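The coverage argument can be checked directly from the NoInDC and NoInDr columns of Table 3.4. The following Python sketch is illustrative only: the names table_3_4 and uncovered_fraction are assumptions, while the numbers are copied from the table.

```python
# (NoInDC, NoInDr) per collection, copied from Table 3.4.
table_3_4 = {
    "Cranfield": (5222, 4236),
    "Ohsumed": (227616, 22760),
    "NPL": (7697, 3536),
    "CACM": (7154, 3189),
    "CISI": (6643, 5709),
    "Medline": (8702, 6907),
    "FBIS": (177065, 41272),
    "LATIMES": (211909, 56255),
    "FT": (287876, 45564),
}


def uncovered_fraction(no_in_dc, no_in_dr):
    """Share of index terms not covered by relevance judgements (NoInNR / NoInDC)."""
    return (no_in_dc - no_in_dr) / no_in_dc


fractions = {
    name: uncovered_fraction(no_in_dc, no_in_dr)
    for name, (no_in_dc, no_in_dr) in table_3_4.items()
}

# Collections in which more than half of the index terms are uncovered.
majority_uncovered = sorted(name for name, frac in fractions.items() if frac > 0.5)

for name, frac in fractions.items():
    print(f"{name:<10} {frac:.1%}")
print(majority_uncovered)
```

Under this calculation the uncovered share exceeds one half for the larger collections and for NPL and CACM, while Cranfield, CISI and Medline stay below one quarter.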
