
Content-based filtering, also called information filtering, is based on analysis and comparison of the content of items. Commonly, a user profile is constructed from the content of the items the user has rated and their respective ratings. The profile is then used to predict ratings of other items, and items that match a user's profile well are recommended to that user. Because this approach is mainly concerned with the analysis and matching of content representations, it is similar to information retrieval. Belkin and Croft (1992) compared and contrasted information retrieval and information filtering. According to them, whereas information retrieval aims to meet short-term information-seeking goals by retrieving items that are relevant to queries, information filtering focuses on removing items that are irrelevant to the long-term user interests represented in user profiles.
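The profile-building and matching steps described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited literature: it assumes a bag-of-words content representation, a profile formed by summing term counts weighted by the user's ratings, and cosine similarity as the matching function. The function names (`term_vector`, `build_profile`, `rank_candidates`) are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one item (a deliberately crude representation)."""
    return Counter(text.lower().split())


def build_profile(rated_items):
    """Sum the term vectors of rated items, weighting each by the user's rating."""
    profile = Counter()
    for text, rating in rated_items:
        for term, count in term_vector(text).items():
            profile[term] += rating * count
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(profile, candidates):
    """Return (score, text) pairs for unseen items, best match first."""
    scored = [(cosine(profile, term_vector(text)), text) for text in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    rated = [("jazz piano trio", 5), ("heavy metal guitar", 1)]
    unseen = ["jazz guitar quartet", "piano jazz live", "metal drums"]
    for score, text in rank_candidates(build_profile(rated), unseen):
        print(f"{score:.3f}  {text}")
```

Because the score depends only on term overlap with items the user has already rated, the sketch also exhibits the limitation discussed next: it can only surface items that resemble what the user has seen before.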

Content-based filtering has its weaknesses. Since it requires items to be parsed, it works well only with text-based items or items with textual metadata assigned. Content-based techniques provide recommendations based on the degree of matching, which says little about the quality of the items. Therefore, as the number of items in a given topic or category grows, the effectiveness of the filtering can diminish. In addition, there is little room for serendipitous discovery of relevant items, because the system recommends only items that are similar to those already rated by the user (Shardanand & Maes, 1995; Balabanovic & Shoham, 1997; Claypool et al., 1999).

2.2.3.1.2 Collaborative filtering

The term collaborative filtering was coined by Goldberg et al. (1992), emphasizing the social aspect of this approach: the sharing of collective group knowledge. The basic idea is that the system can leverage other people’s opinions to provide recommendations to users who have similar preferences. Typically, according to an often-cited definition, “people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients” (Resnick & Varian, 1997, p.56). By relying on people’s judgments, either in the form of explicit ratings or implicitly drawn from user behaviors, this approach tries to overcome some of the above-mentioned limitations of content-based filtering methods. From the algorithmic point of view, the emphasis shifts from computing item similarities to matching users with similar preferences. User preferences are usually expressed as item ratings (or equivalent measures), and users who have given similar ratings to items in common are treated as ‘neighbors.’ Various matching algorithms have been proposed to identify a set of similar users based on correlation coefficients or other similarity measures (Terveen & Hill, 2001). A prediction of what items a user might like or dislike is made based on the ratings or the behaviors of their neighbors. The fundamental assumption is that people’s preferences for items are not random and that there are persistent patterns in their choices. In other words, people like items similar to those they liked before, and thus people who made similar choices in the past would probably agree on new items (Shardanand & Maes, 1995). In fact, an empirical study using the GroupLens system, which first introduced the concept of the ‘k-nearest’ neighbor group, supports the assumption (Konstan et al., 1997). The result showed that “correlation between ratings and predictions is dramatically higher for personalized predictions [based on nearest neighbors] than for all-user average ratings” (p.81), and that there were systematic differences in user preferences even within a certain newsgroup where people with relatively close interests gathered (Konstan et al., 1997).
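The neighbor-matching and prediction steps can be illustrated with a short sketch. It assumes Pearson's r computed over co-rated items as the similarity measure, keeps only positively correlated users as neighbors, and predicts with a plain similarity-weighted average of the neighbors' ratings. These choices, and the names `pearson`, `nearest_neighbors`, and `predict`, are assumptions made for illustration; the systems cited here differ in their details.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson's r over the items both users rated; 0.0 when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def nearest_neighbors(ratings, user, k):
    """Up to k users with the highest positive correlation to `user`."""
    sims = [(pearson(ratings[user], ratings[other]), other)
            for other in ratings if other != user]
    positive = [pair for pair in sims if pair[0] > 0]
    return sorted(positive, reverse=True)[:k]


def predict(ratings, user, item, k=2):
    """Similarity-weighted average rating of `item` among the nearest neighbors
    who rated it; None when no such neighbor exists."""
    candidates = {u: r for u, r in ratings.items() if u == user or item in r}
    neighbors = nearest_neighbors(candidates, user, k)
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        return None
    weighted = sum(sim * ratings[other][item] for sim, other in neighbors)
    return weighted / total_weight


if __name__ == "__main__":
    ratings = {
        "ann": {"a": 5, "b": 3, "c": 1},
        "bob": {"a": 4, "b": 3, "c": 2, "d": 5},
        "cat": {"a": 1, "b": 3, "c": 5, "d": 1},
        "dan": {"a": 5, "b": 4, "c": 1, "d": 4},
    }
    print(nearest_neighbors(ratings, "ann", 2))
    print(predict(ratings, "ann", "d"))
```

In the toy data, "cat" rates items in the opposite order to "ann" and is therefore excluded from the neighborhood, so the prediction for item "d" is driven by "bob" and "dan" alone.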

Since collaborative filtering is based on the similarity of users, user modeling is the main issue. There are basically two ways to model users: explicitly soliciting ratings or opinions from users, or implicitly deriving user preferences from behavioral or activity data. One of the earliest implementations of collaborative filtering, a system named Tapestry, is based on the explicit opinions of people as they filter email messages (Goldberg et al., 1992). The GroupLens system, developed for filtering Usenet news articles, asks users to rate articles and uses their ratings to form ‘k-nearest’ neighbor groups based on Pearson’s r correlation coefficients. A predicted value of a new article for a user is then calculated as the weighted average of the ratings of their k-nearest neighbors (Konstan et al., 1997). Explicit modeling methods, however, do not scale well, because users are generally reluctant to provide ratings and it is hard to obtain a sufficient number of ratings for building accurate user profiles, especially in large and heterogeneous systems (Aggarwal et al., 1999). Because users are unwilling to put time and effort into rating items, researchers turned to other data sources as surrogates for ratings. For example, the time a user spends reading a document could be an implicit indicator of their interest in that document. Past purchase history is a widely used rating surrogate in e-commerce implementations. For discovering patterns or trends in large data sources, various data mining and machine learning techniques have been introduced (Perugini et al., 2004). PHOAKS (People Helping One Another Know Stuff) is a well-known example of using data mining methods for implicit user modeling. PHOAKS examines Usenet postings to find uniform resource locators (URLs) within the messages. The inclusion of a URL is interpreted as an implicit affirmation of interest in, or ‘endorsement’ of, the Web site (Terveen et al., 1997). Siteseer uses personal bookmark folders to model users. A user’s bookmark folders are compared with folders belonging to other users to compute the overlaps (set intersection) among the bookmarks and to find qualified recommenders in the context of each folder (Rucker and Polanco, 1997).
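The bookmark-overlap idea attributed to Siteseer can be sketched in a few lines. The source states only that folder overlaps are computed by set intersection to find qualified recommenders per folder. The minimum-overlap threshold, the overlap-weighted voting, and the names `folder_overlap` and `recommend_for_folder` are hypothetical additions for illustration, not a description of the actual system.

```python
def folder_overlap(folder_a, folder_b):
    """Number of bookmarked URLs two folders share (set intersection)."""
    return len(set(folder_a) & set(folder_b))


def recommend_for_folder(target_folder, other_users_folders, min_overlap=2):
    """Suggest URLs from other users' folders that overlap enough with the target.

    `other_users_folders` maps a user name to a list of folders, each a list
    of URLs. `min_overlap` is an assumed cut-off for a 'qualified' folder.
    """
    target = set(target_folder)
    votes = {}
    for folders in other_users_folders.values():
        for folder in folders:
            overlap = folder_overlap(target, folder)
            if overlap < min_overlap:
                continue  # this folder is not a qualified recommender
            for url in set(folder) - target:
                votes[url] = votes.get(url, 0) + overlap
    # Highest vote first; ties broken alphabetically for a stable order.
    return sorted(votes, key=lambda url: (-votes[url], url))


if __name__ == "__main__":
    mine = ["u1", "u2", "u3"]
    others = {
        "x": [["u1", "u2", "u4"], ["u9"]],
        "y": [["u2", "u3", "u4", "u5"]],
        "z": [["u3", "u7"]],
    }
    print(recommend_for_folder(mine, others))
```

No ratings are requested from anyone here: the act of filing a URL in a folder serves as the implicit preference signal.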

Even though the collaborative filtering approach has achieved considerable success and constitutes the core technology of many recommender systems, it has well-recognized limitations (Sarwar et al., 2000). The sparsity of ratings (or rating surrogates) always poses a challenge to making accurate recommendations. In a typical e-commerce system, for example, both the number of items and the number of users are very large, while the number of transactions is relatively small. This makes the user-item matrix sparse and, as a result, in a great number of cases the similarity or correlation between two users is zero or too small to be reliable. Many attempts have been made to alleviate the sparsity problem in research prototype systems, but developing a scalable technique that can deal with inherently sparse data continues to be an issue (Huang et al., 2004). The so-called ‘cold start’ problem is another common problem. It refers to the situation where the system has no data on which to base recommendations. When a new user first enters the system, it is not possible to make any reliable recommendation, since there is no data on their preferences. Similarly, when a new item is added, the system is not able to recommend it until a sufficient number of users have rated it (Adomavicius et al., 2005; Schein et al., 2002).
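A small sketch makes the sparsity problem concrete. It assumes transactions are given as (user, item) pairs and reports two quantities: the density of the user-item matrix, and the fraction of user pairs that share no item at all, for which no overlap-based similarity can be computed. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def density(transactions, n_users, n_items):
    """Fraction of user-item cells that hold a rating or purchase."""
    return len(set(transactions)) / (n_users * n_items)


def pairs_without_overlap(transactions):
    """Fraction of user pairs with no item in common.

    Only users with at least one transaction are counted; users with none
    (the cold-start case) cannot be compared with anyone at all.
    """
    items_by_user = {}
    for user, item in transactions:
        items_by_user.setdefault(user, set()).add(item)
    pairs = list(combinations(sorted(items_by_user), 2))
    if not pairs:
        return 0.0
    empty = sum(1 for a, b in pairs
                if not (items_by_user[a] & items_by_user[b]))
    return empty / len(pairs)


if __name__ == "__main__":
    transactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"),
                    ("u3", "i3"), ("u4", "i4")]
    print(density(transactions, n_users=4, n_items=5))
    print(pairs_without_overlap(transactions))
```

Even in this tiny example, only one of the six user pairs shares an item, so five pairs have no basis for a similarity estimate.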

2.2.3.1.3 Hybrid approach

features/attributes of items and transactions/interactions between items and users. Typically, a content-based filtering method relies solely on feature/attribute data, while a pure collaborative filtering method makes recommendations based on transaction data without considering item features (Huang et al., 2002). Hybrid approaches combine content-based filtering and collaborative filtering in an effort to utilize both types of information and thus bring the advantages of the two approaches together. The Fab system (Balabanovic & Shoham, 1997) is a representative example. In this system, each user profile is built from the content of the items the user has rated. User similarities are then calculated from the affinity of their profiles, which in effect is the similarity of the ‘content’ of the items associated with each user profile. An item is recommended to a user either when the content of the item is similar enough to the user’s profile or when the item is highly rated by similar users. By using content information, the system can produce better results, especially in those situations where a collaborative filtering method is known to be ineffective, while still taking advantage of collective group knowledge whenever possible. Specifically, when an item has not been rated by many users, or when a user does not have enough items in common with other users, the system can still make recommendations for the item or the user in question based on content analysis. Many other attempts have been made to combine the two filtering methods at different stages of the recommendation process, with varying degrees of computational sophistication (Basu et al., 1998; Sarwar et al., 1998; Claypool et al., 1999; Condliff et al., 1999; Popescul et al., 2001).
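The either/or recommendation rule described for Fab can be expressed as a simple decision function. This is a sketch of the rule as stated in the text, not of Fab's implementation: it assumes that a content similarity score and the ratings given by similar users have already been computed, and both thresholds are hypothetical values chosen for illustration.

```python
def should_recommend(content_similarity, neighbor_ratings,
                     content_threshold=0.5, rating_threshold=4.0):
    """Recommend when the item matches the user's profile closely enough
    OR when similar users rated it highly on average.

    content_similarity: precomputed profile-to-item similarity in [0, 1].
    neighbor_ratings: ratings of the item by similar users (may be empty).
    Both thresholds are assumed values, not taken from any cited system.
    """
    if content_similarity >= content_threshold:
        return True
    if neighbor_ratings:
        average = sum(neighbor_ratings) / len(neighbor_ratings)
        return average >= rating_threshold
    return False


if __name__ == "__main__":
    # A new item nobody has rated can still be recommended on content alone.
    print(should_recommend(0.7, []))
    # A poor content match can still be recommended on neighbors' ratings.
    print(should_recommend(0.1, [5, 4, 5]))
    # Neither signal is strong enough.
    print(should_recommend(0.1, [2, 3]))
```

The first call shows how the content branch covers the case where collaborative filtering has nothing to work with, while the second shows the collaborative branch surfacing an item that content matching alone would miss.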
