

Chapter 6 User Evaluations of the Recommender System 108

A content-based method suggests recommendations based on the contents of a user's previously top rated documents. Therefore, before it can start recommending, the method needs to learn which documents the user considers valuable. In our case, this is why the system collects at least three positively rated predetermined URLs in the user-profiling stage (see Figure 6.1). The top rated document of the three is chosen as the initial basis for the method, since it is the one the user currently likes best. The method thus recommends web documents based on their similarity to the top rated predetermined URL. In the recommending and browsing stage (see Figure 6.1), if subsequent Web pages with even higher ratings are uncovered, then these become the basis for content-based recommending. If more than one page has the same highest rating value, all of them are used.⁷
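The selection rule described above (take the highest rating seen so far and keep every page that shares it) can be sketched in a few lines of Python. The function name, the example URLs and the numeric rating values are hypothetical; the rating scale of the actual system is not assumed here.

```python
def select_basis_pages(ratings):
    """Return (top_rating, pages): the highest rating and all pages that share it.

    `ratings` maps a page URL to the rating the user gave it (hypothetical scale).
    """
    if not ratings:
        raise ValueError("at least one rated page is required")
    top_rating = max(ratings.values())
    # Keep every page tied at the top rating; sort only to get a stable order.
    pages = sorted(url for url, r in ratings.items() if r == top_rating)
    return top_rating, pages


# Hypothetical ratings collected in the user-profiling stage.
profile = {"http://example.org/a": 3, "http://example.org/b": 5, "http://example.org/c": 5}
top_rating, basis = select_basis_pages(profile)

# A later page with an even higher rating replaces the previous basis.
profile["http://example.org/d"] = 6
new_rating, new_basis = select_basis_pages(profile)
```

Recomputing the basis from the full rating history after each new rating is the simplest design; an incremental update would give the same result.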

To be more precise, let $P_1, P_2, \cdots, P_{N_c}$ be the $N_c$ ($N_c \leq 3$) different previously rated Web pages that share the top rating value $R_v$, and let $P_P$ be a potential Web document to be recommended. The internal quality of this potential Web page is computed from its similarity to the $N_c$ top rated pages:

$$\mathit{INQ}_{con}(P_P) = \frac{1}{N_c} \sum_{i=1}^{N_c} \mathit{Similarity}(P_P, P_i) \cdot R_v \tag{6.2}$$
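As a minimal sketch of Equation 6.2, the following Python function averages the rating-weighted similarity of a candidate page to the top rated pages. The function name `inq_con`, the way the similarity measure is passed in as a parameter, and the toy similarity values and rating are assumptions made purely for illustration.

```python
def inq_con(candidate, top_pages, top_rating, similarity):
    """Internal quality of `candidate` following Equation 6.2.

    `top_pages` are the pages sharing the top rating `top_rating`;
    `similarity(a, b)` is any measure returning a value in [0, 1].
    """
    if not top_pages:
        raise ValueError("at least one top rated page is required")
    total = sum(similarity(candidate, page) * top_rating for page in top_pages)
    return total / len(top_pages)


# Toy similarity values (hypothetical) looked up from a table.
toy_similarity = {("cand", "p1"): 0.5, ("cand", "p2"): 0.25}

def lookup(a, b):
    return toy_similarity[(a, b)]

# Two top rated pages with rating 4: (0.5 * 4 + 0.25 * 4) / 2 = 1.5
score = inq_con("cand", ["p1", "p2"], 4, lookup)
```

Because $R_v$ is the same for all top rated pages, it could equally be factored out of the sum; the sketch keeps it inside to mirror the equation term by term.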

In Equation 6.2, the subscript of the function differentiates it from the other two methods introduced in Sections 6.3.3 and 6.3.4. The similarity measure is formally defined at the end of this subsection. In short, the content-based method compares the source recommendation web pages with the previously top rated pages and recommends those with high similarity values. To compute the similarity value, we extract the fifteen keywords with the highest term frequency (TF) from each document.⁸ The more keywords are extracted, the more accurately they represent a document. However, extracting a large number of keywords requires considerable computation and reduces the efficiency of recommending. In practice, we find that the fifteen most frequently occurring keywords cover the meaning of a document in most cases. Thus, a source web page is represented as a fifteen-dimensional term vector (for reasons of computational simplicity we do not use more keywords). The similarity of two web pages is then computed with the standard technique of taking the cosine of the angle between the two vectors, which, since TFs are non-negative, yields a value between 0 and 1.0, where 0 indicates not strongly related and 1.0 indicates very strongly related [Salton, 1989]. The fifteen most frequently occurring words are stored in a Content-Based Recommendation Table (see Table 6.1 for part of the actual table we use). Likewise, the predetermined URLs are prepared with their fifteen most frequently occurring terms and TFs in a Predetermined URL Table of a similar style (see Table 6.2 for part of the actual table we use). As stated in Section 6.2, the contents of the Predetermined URL Table do not overlap with those of the Content-Based Recommendation Table, nor with those of the source recommendation tables used by the two subsequent recommendation methods (see Sections 6.3.3 and 6.3.4).

⁷ We seek to use a common technique for each of the three typical kinds of recommendation method (i.e. content-based, collaborative and demographic) described in Chapter 2. We do not attempt to perfect each method, because building ideal information filtering methods is not the main concern of this work.

⁸ To extract the most frequently occurring keywords from a web document, a lookup table is used to filter out unimportant words that carry no meaning in our context and need to be ignored (such as “a”, “the”, “in”, “that” and “and”). This lookup table is constructed according to Middleton's work [Middleton, 2003]. In addition, a stemming technique, also taken from Middleton's work, is used to match different forms of the same word. For example, “negotiation”, “negotiations”, “negotiating” and “negotiated” are all reduced to “negotiat” and deemed the same word.

Table 6.1: The Content-Based Recommendation Table

Table 6.2: The Predetermined URL Table
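As a rough illustration of how a table record could be built and compared, the following Python sketch extracts the most frequent stems of a text together with their TFs and computes the cosine of two term vectors. The stop list, the suffix-stripping rule and all function names are assumptions made for this example; they are not the lookup table or the word-matching technique of [Middleton, 2003]. Aligning the two vectors over the union of their keywords is likewise only one possible reading of the comparison.

```python
import math
import re
from collections import Counter

# Assumed stop list for illustration only; the actual lookup table follows [Middleton, 2003].
STOP_WORDS = {"a", "the", "in", "that", "and"}
# Assumed suffixes for a crude stemmer, longest first; not the technique used in the source.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_keywords(text, n=15):
    """Return a dict of the n most frequent stems and their term frequencies."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [stem(w) for w in words if w not in STOP_WORDS]
    return dict(Counter(stems).most_common(n))


def cosine_similarity(tf_x, tf_y):
    """Cosine of two TF vectors aligned over the union of their keywords."""
    keys = set(tf_x) | set(tf_y)
    dot = sum(tf_x.get(k, 0) * tf_y.get(k, 0) for k in keys)
    norm_x = math.sqrt(sum(v * v for v in tf_x.values()))
    norm_y = math.sqrt(sum(v * v for v in tf_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_x * norm_y)


record = top_keywords("negotiation negotiations negotiating negotiated")
```

With this toy stemmer, the four word forms in the usage line collapse into the single keyword `negotiat` with a TF of 4. A keyword that appears in only one of the two documents contributes zero to the dot product, so documents with no keywords in common receive a similarity of 0.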

From the Content-Based Recommendation Table, we can see that each potential Web document is represented by one record of the table. Each record contains a fifteen-dimensional vector determined by the fifteen keywords, and the value of each dimension is the TF of the corresponding keyword. Within a given record, $k_i$ denotes the $i$th most frequently occurring keyword in the document and $w_i$ denotes the number of times it occurs (i.e. its TF). With this representation, we can formally define the similarity measure of two Web documents. Assume that $P_x$ and $P_y$ are two different documents that share the same fifteen keywords $(k_1, k_2, \cdots, k_{15})$. The two documents can then be represented by two vectors in the same fifteen-dimensional Euclidean space, $\mathbf{x} = (w_1, w_2, \cdots, w_{15})$ and $\mathbf{y} = (w'_1, w'_2, \cdots, w'_{15})$, where $w_i$ and $w'_i$ are the TFs of keyword $k_i$ in $P_x$ and $P_y$ respectively. The similarity of $P_x$ and $P_y$ is defined as the cosine of their two term vectors [Salton, 1989]:

$$\mathit{Similarity}(P_x, P_y) = \cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{x}| \cdot |\mathbf{y}|} = \frac{\sum_{i=1}^{15} (w_i \cdot w'_i)}{\sqrt{\sum_{i=1}^{15} (w_i)^2} \cdot \sqrt{\sum_{i=1}^{15} (w'_i)^2}},$$