Improving Collaborative Filtering Based Recommender Systems Using Pareto Dominance

Ph.D. Thesis. Improving Collaborative Filtering Based Recommender Systems Using Pareto Dominance. Author: José Luis Sánchez Sánchez. Director: Jesús Bobadilla Sancho.

Index

Abstract
1. Hypothesis and motivation
   1.1. Introduction
   1.2. Motivation
2. State of the art
   2.1. Introduction
      2.1.1. Fundamentals
      2.1.2. Filtering approaches
      2.1.3. Traditional issues with recommender systems
      2.1.4. Social recommender systems
      2.1.5. Future trends
   2.2. The k nearest neighbors recommendation algorithm
   2.3. Evaluation of recommender systems and frameworks
      2.3.1. Quality of predictions: MAE-accuracy and coverage
      2.3.2. Quality of the set of recommendations: precision, recall and F1
      2.3.3. Quality of the list of recommendations: rank measures
      2.3.4. Novelty and diversity
   2.4. Similarity measures
   2.5. Cold-start
   2.6. Location-aware recommender systems
   2.7. Recommending to groups of users
   2.8. Explaining collaborative filtering recommendations
   2.9. Social information
   2.10. Content-based filtering
   2.11. Bio-inspired approaches
   2.12. Summary
3. Formalization
   3.1. The concept of dominance
   3.2. Introduction
   3.3. Selecting the neighbor candidates (non-dominated users)
   3.4. Finding the k-neighbors
   3.5. Items Recommendation
   3.6. Running example
4. Experiments
   4.1. Quality measures
   4.2. Experiments proposed
5. Results
6. Conclusions and future works
References

Abstract

Recommender systems are a type of solution to the information overload problem suffered by users of websites on which they can rate certain items. The Collaborative Filtering Recommender System is considered to be the most successful approach, as it makes its recommendations based on the votes of users similar to an active user. Nevertheless, the traditional collaborative filtering method selects insufficiently representative users as neighbors of each active user. This means that the recommendations made a posteriori are not precise enough. The method proposed in this thesis performs a pre-filtering process, by using Pareto dominance, which eliminates the less representative users from the k-neighbor selection process and keeps the most promising ones. The results from the experiments performed on Movielens and Netflix show a significant improvement in all the quality measures studied on applying the proposed method.

1. Hypothesis and motivation

The hypothesis of this thesis is that, in the recommender systems context, it is possible to improve the results of the k nearest neighbors collaborative filtering algorithm. This thesis suggests performing a pre-filtering process, by using Pareto dominance, that eliminates the less representative users from the k-neighbor selection process and keeps the most promising ones.

1.1. Introduction

At present, Recommender Systems (RS) are broadly used to implement Web 2.0 services based on Collaborative Filtering (CF). CF RS make predictions about the preferences of each user based on the preferences of a set of “similar” users. This way, a trip to the Canary Islands could be recommended to an individual who has rated different destinations in the Caribbean very highly, based on the positive ratings given to the “Canary Islands” holiday destination by a significant number of individuals who also rated destinations in the Caribbean very highly. There are a large number of applications based on RS (Jinghua 2007, Baraglia 2004, and Fesenmaier 2002), some of which are centered on the movie recommendation area (Konstan 2004, Antonopoulus 2006, Li 2005). The quality of the results offered by a RS greatly depends on the quality of the results provided by its CF phase; i.e. it is essential to be capable of adequately selecting the group of users most similar to a given individual. The following types of filtering are usually used by RS:

• Content-based filtering: The recommendations are based on the users’ past choices (e.g. recommendation of a new programming book to a user who purchased various books on this subject the previous year).
• Demographic filtering: The recommendations are based on the information provided by users considered similar according to demographic parameters, such as age, gender, nationality, etc.
• Collaborative filtering: The recommendations for each user (active user) are obtained in line with the preferences of other users who have rated the products (items) in a similar way to the active user.
• Hybrid filtering: The recommendations are made by combining the previous filtering types; in particular, Content-based filtering / CF and Demographic filtering / CF are used.

Among the three types of basic filtering (Content-based filtering, Demographic filtering, CF), CF is the one that usually provides the best results. CF calculates the recommendations based on the information of the votes that all users have cast as regards their preferences on the items (e.g. in a film RS, the ratings given by each user to each of the films they have voted for). When the CF is solely based on the information stored in the array of votes it is called memory-based CF. A variety of CF exists which obtains information from sources additional to the array of votes, such as the social relations between users or the contents of posts in blogs; in these cases (memory-based+additional information) the additional information is used to improve the quality of the recommendations, but its use is only applicable to the subset of RS where that type of additional information exists.

All scientific progress in the area of memory-based CF has the virtue of being applicable in all types of CF-based RS (pure CF, Hybrid filtering, memory-based+additional information). The main objective of CF is to provide a user, who we will call the active user, with a series of suggestions about items which may be of interest to him. Its operation is based on a very simple idea [Adomavicius and Tuzhilin 2005]: if two users have similar preferences, it is highly probable that one of them will like items they are not aware of to the same extent as the other user. Therefore, the purpose of CF is to search for those users who are most similar to the active user and then analyze their votes to see which items may interest the active user. This process can be summarized in the following three steps [Al-Shamri and Bharadwaj 2008; Antonopoulus and Salter 2006; Barragans-Martinez et al. 2010]:

1. To find the k users most similar to the active user (k-neighbors of the active user). This phase is the one which most determines the quality of the recommendations made. The method proposed in the thesis provides a novel approach to obtain a suitable set of similar neighbors for each active user.
2. To predict the vote that the active user would cast on items they have not yet rated, by observing the votes of their k-neighbors. When trying to predict an item’s value, one must be aware that there will normally be a significant number of neighbors who have not voted for the item, and therefore mechanisms must be defined that enable the k-neighbors’ votes to be combined satisfactorily.
3. To find the most suitable N items to be recommended (due to their high rating, novelty, etc.).
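The three steps above can be sketched as follows. This is a minimal illustration, not the method proposed in the thesis: the function names, the dictionary-based rating store, the choice of an MSD-based similarity and the similarity-weighted average used as aggregation are all assumptions made for the example.

```python
def msd_similarity(a, b, min_r=1, max_r=5):
    """1 - mean squared difference over co-rated items (ratings scaled to [0, 1]).

    Returns None when the two users have no item in common.
    """
    common = set(a) & set(b)
    if not common:
        return None
    span = max_r - min_r
    msd = sum(((a[i] - b[i]) / span) ** 2 for i in common) / len(common)
    return 1.0 - msd


def k_neighbors(ratings, active, k):
    """Step 1: the k users most similar to the active user, as (user, similarity)."""
    sims = []
    for user, votes in ratings.items():
        if user == active:
            continue
        s = msd_similarity(ratings[active], votes)
        if s is not None:
            sims.append((s, user))
    sims.sort(key=lambda t: (-t[0], t[1]))
    return [(user, s) for s, user in sims[:k]]


def predict(ratings, neighbors, item):
    """Step 2: similarity-weighted average of the neighbors who rated the item."""
    votes = [(s, ratings[u][item]) for u, s in neighbors if item in ratings[u]]
    total = sum(s for s, _ in votes)
    if total == 0:
        return None  # no neighbor rated the item (or all weights are zero)
    return sum(s * r for s, r in votes) / total


def recommend(ratings, active, k, n):
    """Step 3: the N items with the highest predicted vote, unrated by the active user."""
    neighbors = k_neighbors(ratings, active, k)
    rated = set(ratings[active])
    candidates = {i for u, _ in neighbors for i in ratings[u]} - rated
    preds = [(predict(ratings, neighbors, i), i) for i in candidates]
    preds = [(p, i) for p, i in preds if p is not None]
    preds.sort(key=lambda t: (-t[0], t[1]))
    return [(i, p) for p, i in preds[:n]]
```

In this sketch a neighbor who has not voted for an item simply does not take part in its prediction, which is one simple way of handling the missing votes mentioned in step 2; other aggregation mechanisms are possible.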

1.2. Motivation

Traditional CF research approaches are based on directly improving CF metrics and similarity measures. Currently there is a lack of research on preprocessing methods that help those metrics work better. Preprocessing frameworks have the advantage of being potentially useful when applied to any existing similarity measure. This thesis proposes a Pareto dominance based preprocessing method suitable for use before any similarity measure designed for CF.

One of the main problems faced by CF is the high degree of sparsity [Papagelis et al. 2005] in the vote databases, due to the fact that users generally only vote for a small percentage of the items available in the system. This means that when we want to calculate the similarity between each pair of users we must do so by only considering the items that both users have rated. Traditional similarity metrics [Adomavicius and Tuzhilin 2005; Campos et al. 2010], which include Pearson Correlation, Cosine and Mean Squared Difference (MSD), have been taken from statistics and are not well suited to the field of RS, where there is great sparsity in the data and a very small set of permitted values in the votes. Traditional metrics display a marked tendency to show high similarity between users based on the similarity of their votes on a very small set of items; by way of example, these metrics could assign maximum similarity to two users who have voted for hundreds of items, but have only voted for three of them in common.

Using the k nearest neighbors algorithm, it is usual to find active users with a significant number of inadequate neighbors (neighbors who have little information in common with their active users). Our hypothesis is that it is possible to improve the quality of the recommendations of a CF RS if we use the Pareto dominance concept, which eliminates the less representative users from the k-neighbors selection process and keeps the most promising ones.

Figure 1 shows the characteristics of the k-neighbors (k = 150) in the Movielens database and the positive impact of discarding neighbors with a small number of items in common. In this experiment, the following similarity metrics have been used: Pearson correlation (COR), Cosine (COS) and Mean Squared Difference (MSD).

Figure 1. Characteristics of the 150-neighbors in Movielens for different similarity metrics: Pearson correlation (COR), Cosine (COS) and Mean Squared Difference (MSD). A) Percentage of the items rated by the active user that were used in the similarity metric calculation using traditional CF. B) Tendency of the MAE when the percentage of common items between each active user and their neighbors is increased. C) Tendency of the coverage when the percentage of common items between each active user and their neighbors is increased.

Graph 1A shows the percentage of items rated by the active user which took part in the calculation of the similarity metric (with each of the neighbors) using the traditional CF. As we can see, the traditional CF generally calculates the active user's neighbors using a very low percentage of the active user's ratings. Graph 1B displays the decreasing tendency of the MAE (improvement) when the neighbors are calculated with a higher percentage of items in common with each active user. Graph 1C displays the increasing tendency of the coverage when the neighbors are calculated with a higher percentage of items in common with each active user (the coverage values are low because they correspond to the coverage of a single neighbor). These experiments were repeated with the Netflix database and very similar results were obtained. The proposed method tries to solve this problem using a novel approach based on Pareto dominance to exclude unpromising users from the k-neighbors selection phase.
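The problem described in this section can be reproduced with a few lines of code. The sketch below computes the Pearson correlation over co-rated items only, together with the percentage of the active user's items that take part in that calculation (the quantity plotted in Graph 1A). The final filter uses a fixed overlap threshold purely as an illustrative stand-in: it is not the Pareto dominance pre-filter formalized later in the thesis, and all names and parameters are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation restricted to the items rated by both users.

    Returns None when it is undefined (fewer than two common items or zero variance).
    """
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return None
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


def overlap_ratio(active_votes, other_votes):
    """Fraction of the active user's rated items that the other user also rated."""
    if not active_votes:
        return 0.0
    return len(set(active_votes) & set(other_votes)) / len(active_votes)


def rank_by_similarity(ratings, active):
    """Traditional ranking: (user, similarity, overlap) sorted by similarity only."""
    rows = []
    for user, votes in ratings.items():
        if user == active:
            continue
        s = pearson(ratings[active], votes)
        if s is not None:
            rows.append((user, s, overlap_ratio(ratings[active], votes)))
    rows.sort(key=lambda t: (-t[1], t[0]))
    return rows


def candidates_with_overlap(ratings, active, min_overlap):
    """Illustrative pre-filter: keep only users sharing enough items with the active user."""
    return sorted(
        user
        for user, votes in ratings.items()
        if user != active and overlap_ratio(ratings[active], votes) >= min_overlap
    )
```

With an active user who rated 100 items, a user sharing only three identically rated items obtains the maximum correlation of 1.0 while covering 3% of the active user's votes; a user sharing 50 items with slightly different ratings is ranked lower by the traditional procedure, yet is the only one retained by the overlap filter.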

2. State of the art

2.1. Introduction

An initial study was performed to determine the most representative topics and terms in the recommender systems (RS) and collaborative filtering (CF) fields. First, 300 RS papers were selected from journals, with a higher priority for current and for often-cited articles. The most significant terms were then extracted from the 300 papers. We gave the most emphasis to keywords, less emphasis to titles and, finally, the least emphasis to abstracts. Common words, such as articles, prepositions and general-use words, were eliminated, and from the remaining pool we selected 300 terms representative of the RS field. From a matrix of articles x words, wherein we stored the importance of each word in each article, we generated a tree of relationships between the words. Figure 2 shows the most significant section of the graph (due to space constraints, the entire tree is not shown, but it is provided as additional material in Fig1AdditionalData.png). Short distances between words indicate higher similarities; warm colors indicate a greater reliability for the relationships. The size of the nodes indicates the importance of the words as a function of the parameters Nk, Nt, Na (number of significant words in the keywords, title and abstract) and Nwk, Nwt, Nwa (number of times that the word w appears in the keywords, title and abstract). The equation used to determine the importance of each word w is as follows:

Fig. 2. Words represented in the recommender systems research field. Short distances indicate higher similarities, and a warm color indicates greater reliability. The size of the nodes is proportional to the importance of the words.

We used the information in Figure 2 to identify the most relevant aspects of RS, which are represented by the most significant words in the graph and the related terms. The articles referenced herein were chosen based on the following criteria: a) the relevance of the subject according to the importance of the words in Figure 2; b) historical contribution (a significant fraction of the classic reference articles are included); c) the number of times the article is cited; d) articles published in journals with an impact factor were preferred over conferences and workshops; and e) recent articles were preferred over articles published many years ago. Figure 3 shows a temporal distribution for the referenced papers.

Fig. 3. Temporal distribution for the referenced papers.

2.1.1. Fundamentals

CF provides personalized recommendations to users based on a set of information collected in the RS. The underlying idea is that proper recommendations can be generated for each user according to the preferences of other users similar to him. To determine similarities between users, the following information can be used: the users’ explicit ratings; implicit data, such as the number of times a video is viewed; social information (e.g., friends, followers, followed and trusted); demographic information, such as age or nationality; and geographic information.

Recently, RS implementation on the Internet has increased, which has facilitated its use in diverse areas [Park et al. 2012]. The most common research papers are focused on movie recommendation studies [Carrer-Neto et al. 2012; Winoto and Tang 2010]; however, a great volume of literature for RS is centered on different topics, such as music [Lee et al. 2010; Nanopoulos et al. 2010; Tan et al. 2011], television [Yu et al. 2006; Barragans-Martínez et al. 2010], books [Núñez-Valdéz et al. 2012; González-Crespo et al. 2011], documents [Serrano-Guerrero et al. 2011; Porcel et al. 2009; Porcel and Herrera-Viedma 2010; Porcel et al. 2012], e-learning [Zaiane 2002; Bobadilla et al. 2009], e-commerce [Huang et al. 2007; Castro-Sanchez et al. 2011], applications in markets [Costa-Montenegro et al. 2012] and web search [McNally et al. 2011], among others.

RS collect information on the preferences of their users for a set of items (e.g., movies, songs, books, jokes, gadgets, applications, websites, travel destinations and e-learning material). The information can be acquired explicitly (typically by collecting users’ ratings) or implicitly [Lee et al. 2010; Choi et al. 2012; Nuñez-Valdez et al. 2012] (typically by monitoring users’ behavior, such as songs heard, applications downloaded, web sites visited and books read).

Research in RS requires using a representative set of public databases to facilitate investigations on the techniques, methods and algorithms developed by researchers in the field. Through these databases, the scientific community can replicate experiments to validate and improve their techniques. Table 1 lists the current public databases referenced most often in the literature. Last.Fm and Delicious incorporate implicit ratings and social information; their data were generated from the versions released in the HetRec 2011 data sets, hosted by the GroupLens research group.

Without social information

Database        Ratings       Users    Items    Range        Tags    Tag assignments   Friend relations   Item type
MovieLens 1M    1 million     6040     3592     {1,…,5}      N/A     N/A               N/A                movies
MovieLens 10M   10 million    71567    10681    {1,…,5}      N/A     N/A               N/A                movies
Netflix         100 million   480189   17770    {1,…,5}      N/A     N/A               N/A                movies
Jester          4.1 million   73421    100      [-10, 10]    N/A     N/A               N/A                jokes
EachMovie       2.8 million   72916    1628     [0,1]        N/A     N/A               N/A                movies
Bookcrossing    1.1 million   278858   271379   {1,…,10}     N/A     N/A               N/A                books

With social information (hosted by the GroupLens)

Database        Ratings       Users    Items    Range        Tags    Tag assignments   Friend relations   Item type
ML              855598        2113     10153    {1,…,5}      13222   47957             N/A                movies
Last.Fm         92834         1892     17632    implicit     11946   186479            25434              music
Delicious       104833        1867     69226    implicit     53388   437593            15328              URLs

Table 1. Most often used memory-based recommender systems public databases.

The process for generating an RS recommendation is based on a combination of the following considerations:

• The type of data available in its database (e.g., votes, user registration information, features and content for items that can be ranked, social relationships among users and location-aware information).
• The filtering algorithm used (e.g., demographic, content-based, collaborative, social-based, context-aware and hybrid).
• The model chosen (e.g., based on direct use of data: “memory-based,” or a model generated using such data: “model-based”).
• The techniques employed: probabilistic approaches, Bayesian networks, nearest neighbors algorithm; bio-inspired algorithms such as neural networks and genetic algorithms; fuzzy models, singular value decomposition techniques to reduce sparsity levels, etc.
• The sparsity level of the database and the desired scalability.
• The performance of the system (time and memory consumption).
• The objective sought (e.g., predictions and top N recommendations).
• The desired quality of the results (e.g., novelty, coverage and precision).

The graph in Figure 4 shows the most significant traditional methods, techniques and algorithms for the recommendation process as well as their relationships and groupings. The remainder of this section briefly introduces the meaning and use of the elements in Figure 4 as well as other methods, concepts and types of information currently used in RS. Finally, an updated figure is provided with new trends and current information sources for modern RS. Different sections of this thesis provide more detail on the most important aspects involved in the recommendation process.

Fig. 4. Traditional models of recommendations and their relationships.

2.1.2. Filtering approaches

The internal functions for RS are characterized by the filtering algorithm. The most widely used classification divides the filtering algorithms into [Adomavicius and Tuzhilin 2005; Candillier et al. 2007; Schafer et al. 2007]: a) collaborative filtering, b) demographic filtering, c) content-based filtering and d) hybrid filtering.

Content-based filtering [Lang 1995; Antonopoulus and Salter 2006; Meteren and Someren 2000] makes recommendations based on user choices made in the past (e.g. in a web-based e-commerce RS, if the user purchased some fiction films in the past, the RS will probably recommend a recent fiction film that he has not yet purchased on this website). Content-based filtering also generates recommendations using the content of the objects intended for recommendation; therefore, certain content can be analyzed, such as text, images and sound. From this analysis, a similarity can be established between objects as the basis for recommending items similar to items that a user has bought, visited, heard, viewed and ranked positively.

Content-based filtering has recently become more important due to the surge in social networks. RS show a clear trend to allow users to introduce content [Arazy et al. 2009; Perugini et al. 2004], such as comments, critiques, ratings, opinions and labels, as well as to establish social relationship links (e.g., followed, followers, like user and dislike user). This additional information increases the accuracy of predictions and recommendations, which has generated a variety of research articles: Kim et al. [2011], Zheng and Li [2011] and Carrer-Neto et al. [2012].

Two challenging problems for content-based filtering are limited content analysis and overspecialization [Adomavicius and Tuzhilin 2005]. The first problem arises from the difficulty in automatically extracting reliable information from various kinds of content (e.g., images, video, audio and text), which can greatly reduce the quality of recommendations. The second problem (overspecialization) refers to the phenomenon in which users only receive recommendations for items that are very similar to items they liked or preferred; therefore, the users do not receive recommendations for items that they might like but do not know (e.g., when a user only receives recommendations about fiction films). Recommendations can be evaluated for novelty [Bobadilla et al. 2011a; Hurley and Zhang 2011].

Demographic filtering [Pazzani 1999; Krulwich 1997; Porcel et al. 2012] is justified on the principle that individuals with certain common personal attributes (sex, age, country, etc.) will also have common preferences.

Collaborative Filtering [Adomavicius and Tuzhilin 2005; Herlocker et al. 2004; Herlocker et al. 1999; Candillier et al. 2007; Su and Khoshgoftaar 2009] allows users to give ratings about a set of elements (e.g. videos, songs, films, etc. in a CF based website) in such a way that when enough information is stored on the system, we can make recommendations to each user based on information provided by those users we consider to have the most in common with them. CF is an interesting open research field [Xie et al. 2007; Bobadilla et al. 2012a; Bobadilla et al. 2011a]. As noted earlier, user ratings can also be implicitly acquired (e.g., number of times a song is heard, information consulted and access to a resource).

The most widely used algorithm for collaborative filtering is the k Nearest Neighbors (kNN) [Adomavicius and Tuzhilin 2005; Schafer et al. 2007; Bobadilla et al. 2011a]. In the user to user version, kNN executes the following three tasks to generate recommendations for an active user: 1) determine the k neighbors (neighborhood) of the active user a; 2) implement an aggregation approach with the ratings of the neighborhood for items not rated by a; and 3) take the predictions obtained in step 2 and select the top N recommendations.

In the item to item version [Sarwar et al. 2001; Gao et al. 2011] of the kNN algorithm, the following three tasks are executed: 1) determine the q neighbor items of each item in the database; 2) for each item i not rated by the active user a, calculate its prediction p_{a,i} based on the ratings given by a to the q neighbors of i; and 3) select the top N recommendations for the active user (typically the N highest predictions for a). Step 1) can be executed periodically, which makes recommendation faster than in the user to user version. This means that recommendations are computed with outdated ratings. However, the ratings of the items do not change much in short periods of time, so this does not cause a significant impact on accuracy.

The item to item and user to user versions of the kNN algorithm can be combined [Qin et al. 2011] to take advantage of the positive aspects of each approach. These approaches are typically fused by processing the similarity between objects.

CF based on the kNN algorithm is conceptually simple, with a straightforward implementation; it also generally produces good-quality predictions and recommendations. However, due to the high level of sparsity [Luo et al. 2012; Bobadilla and Serradilla 2009] in RS databases, similarity measures often encounter processing problems (typically from insufficient mutual votes for a comparison of users and items) and cold start situations (users and items with a low number of ratings) [Schein et al. 2002; Heung-Nam et al. 2011; Bobadilla et al. 2012c; Leung et al. 2008].

Another major problem for the kNN algorithm is its low scalability [Luo et al. 2012]. As the databases (such as Netflix) increase in size (hundreds of thousands of users, tens of thousands of items, and hundreds of millions of ratings), the process for generating a neighborhood for an active user becomes too slow; the similarity measure must be computed as many times as there are users registered in the database. The item to item version of the kNN algorithm significantly reduces the scalability problem [Sarwar et al. 2001]. To this end, neighbors are calculated for each item; their top n similarity values are stored, and for a period of time, predictions and recommendations are generated using the stored information. Although the stored information does not include the ratings cast since the last processing/storage step, outdated information is less harmful for items than for users.

A recurrent theme in CF research is the design of metrics that calculate accurately and precisely the existing similarity between users (or items). Traditionally, a series of statistical metrics have been used [Adomavicius and Tuzhilin 2005; Candillier et al. 2007], such as the Pearson correlation, cosine, constrained Pearson correlation and mean squared differences. Recently, metrics have been designed to fit the constraints and peculiarities of RS [Bobadilla et al. 2010; Bobadilla et al. 2012b]. The relevance (significance) concept was introduced to afford more importance to more relevant users and items [Bobadilla et al. 2012a; Wang et al. 2008]. Additionally, a group of metrics was specifically designed to function adequately in cold-start situations [Ahn 2008; Bobadilla et al. 2012c].

Hybrid filtering [Burke 2002; Porcel et al. 2012] commonly uses a combination of CF with demographic filtering [Vozalis and Margaritis 2007] or CF with content-based filtering [Barragans-Martinez et al. 2010; Choi et al. 2012] to exploit the merits of each of these techniques. Hybrid filtering is usually based on bio-inspired or probabilistic methods such as genetic algorithms [Gao and Li 2008; Ho et al. 2007], fuzzy genetic [Al-Shamri and Bharadwaj 2008], neural networks [Lee and Woo 2002; Christakou and Stafylopatis 2005; Ren et al. 2008], Bayesian networks [Campos et al. 2010], clustering [Shinde and Kulkami 2012] and latent features [Saranya and Atsuhiro 2009], among others.
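The item to item version of kNN described earlier in this subsection, in which the neighbor table is computed once and then reused for a period of time, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the cosine similarity restricted to co-rating users, and the similarity-weighted average are choices made for the example rather than details taken from the cited works.

```python
from math import sqrt


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, votes in ratings.items():
        for item, r in votes.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a, b):
    """Cosine similarity restricted to the users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(a[u] ** 2 for u in common)) * sqrt(sum(b[u] ** 2 for u in common))
    return num / den if den else 0.0


def build_item_neighbors(ratings, q):
    """Step 1 (can be run periodically): the q most similar items of each item."""
    items = item_vectors(ratings)
    table = {}
    for i, vi in items.items():
        sims = [(cosine(vi, vj), j) for j, vj in items.items() if j != i]
        sims = [(s, j) for s, j in sims if s > 0]
        sims.sort(key=lambda t: (-t[0], t[1]))
        table[i] = [(j, s) for s, j in sims[:q]]
    return table


def predict(user_votes, item_neighbors):
    """Step 2: weighted average of the user's own ratings on the item's neighbors."""
    votes = [(s, user_votes[j]) for j, s in item_neighbors if j in user_votes]
    total = sum(s for s, _ in votes)
    if total == 0:
        return None
    return sum(s * r for s, r in votes) / total


def recommend(ratings, table, active, n):
    """Step 3: the N unrated items with the highest prediction for the active user."""
    votes = ratings[active]
    preds = []
    for item, neighbors in table.items():
        if item in votes:
            continue
        p = predict(votes, neighbors)
        if p is not None:
            preds.append((p, item))
    preds.sort(key=lambda t: (-t[0], t[1]))
    return [(item, p) for p, item in preds[:n]]
```

Only `build_item_neighbors` scans the whole database; `recommend` works from the stored table and the active user's own votes, which is why this version scales better at the cost of using slightly outdated similarities.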

2.1.3. Traditional issues with recommender systems

A widely accepted taxonomy divides recommendation methods into memory-based and model-based categories. Memory-based methods can be defined as methods that a) act only on the matrix of user ratings for items, and b) use any rating generated before the recommendation process (i.e., their results are always up to date). Memory-based methods [Adomavicius and Tuzhilin 2005; Candillier et al. 2007; Kong et al. 2005; Symeonidis et al. 2009] act directly on the rating matrix that contains the ratings of all users who have expressed their preferences on the collaborative service. Memory-based methods usually use similarity metrics to obtain the distance between two users, or two items, based on each of their ratings.

To reduce the problems caused by high levels of sparsity in RS databases, certain studies have used dimensionality reduction techniques [Sarwar et al. 2000b]. The reduction methods are based on Matrix Factorization [Koren et al. 2009; Luo et al. 2012]. Matrix factorization is especially adequate for processing large RS databases and providing scalable approaches [Takács et al. 2009]. The model-based technique Latent Semantic Index (LSI) and the reduction method Singular Value Decomposition (SVD) are typically combined [Vozalis and Margaritis 2007; Zhang et al. 2005; Cacheda et al. 2011]. SVD methods provide good prediction results but are computationally very expensive; they can only be deployed in static off-line settings where the known preference information does not change with time.

RS can use clustering techniques to improve the prediction quality and reduce the cold-start problem when applied to hybrid filtering. It is typical to form clusters of items in hybrid RS [Shinde and Kulkami 2012; Yao and Zhang 2009]. A different common approach uses clustering both for items and users (bi-clustering) [Zhu and Gong 2009; George and Meregu 2005]. RS comprising social information have been clustered to improve the following areas: tagging [Shepitsen et al. 2008], explicit social links [Pham et al. 2011] and explicit trust information [Pitsilis et al. 2011; DuBois et al. 2009].

Model-based methods [Adomavicius and Tuzhilin 2005; Su and Khoshgoftaar 2009] use RS information to create a model that generates the recommendations. Herein, we consider a method model-based if new information from any user outdates the model. Among the most widely used models we have Bayesian classifiers [Cho et al. 2007], neural networks [Ingoo et al. 2003], fuzzy systems [Yager 2003], genetic algorithms [Gao and Li 2008; Ho et al. 2007], latent features [Zhong and Li 2010] and matrix factorization [Luo et al. 2012], among others.

Research in the RS field requires quality measures (evaluation metrics) [Gunawardana and Shani 2009] to know the quality of the techniques, methods, and algorithms for prediction and recommendations. Evaluation metrics [Herlocker et al. 2004; Hernández and Gaudioso 2008] and evaluation frameworks [Herlocker et al. 1999; Bobadilla et al. 2011a] facilitate comparisons of several solutions for the same problem and the selection, from different promising lines of research, of those that generate better results.

Evaluation metrics [Antunes et al. 2012] can be classified as [Herlocker et al. 2004; Hernández and Gaudioso 2008]: a) prediction metrics, such as the accuracy metrics Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Normalized Mean Absolute Error (NMAE), and the coverage; b) set recommendation metrics, such as Precision, Recall and Receiver Operating Characteristic (ROC) [Schein et al. 2002]; c) rank recommendation metrics, such as the half-life [Breese et al. 1998] and the discounted cumulative gain [Baltrunas et al. 2010]; and d) diversity metrics, such as the diversity and the novelty of the recommended items [Hurley and Zhang 2011]. The validation process is performed by employing the most common cross validation techniques (random sub-sampling and k-fold cross validation) [Bengio and Grandvalet 2004]; for cold-start situations, due to the limited number of users (or items) involved, the usual method chosen to carry out the experiments is leave-one-out cross validation [Bobadilla et al. 2012c].

Commercial RS compete in the market by offering the best content and quality in recommendations as well as the greatest variety of services. Recommendations to user groups [Jameson and Smyth 2007] facilitate joint recommendations (e.g., for a group of four friends who wish to choose a movie). For CF, four design approaches offer an opportunity for action: 1) acting at the similarity measure stage, 2) acquiring neighbors [Bobadilla et al. 2012d], 3) acquiring predictions [Christensen and Schiaffino 2011], and 4) generating recommendations [Baltrunas et al. 2010]. Research results indicate that the quality of the recommendations does not vary greatly between the different approaches, but the execution time is dramatically reduced the earlier in the process the group is handled (designing a similarity measure for groups is the most efficient solution).

2.1.4. Social recommender systems

The drastic increase in Web 2.0 websites has initiated an increasing tendency to include social information in RS (e.g., friends, followed and followers). This additional information is used by researchers with three primary objectives: a) to improve the quality of predictions and recommendations [Carrer-Neto et al. 2012; Arazy et al. 2009], b) to propose or generate new RS [Li et al. 2012; Siersdorfer and Sergei 2009], and

c) to elucidate the most significant relationships between social information and collaborative processes [Hossain and Fazio 2009; Perugini et al. 2004].

Trust and reputation is an important area of research in RS [O’Donovan and Smyth 2005]; this area is closely related to the social information currently included in RS [Jøsang et al. 2007]. The most common approaches to generating trust and reputation measurements are the following: a) user trust: to calculate the credibility of users through explicit information from the rest of the users [Yuan et al. 2010; Li and Kao 2009] or through implicit information obtained in a social network [Cho et al. 2007; Massa and Avesani 2004]; b) item trust: to calculate the reputation of items through user feedback [Jøsang et al. 2007] or by studying how users work with these items [Cho et al. 2009; Kitisin and Neuman 2009].

In the social RS field, users can introduce labels associated with items. The set of triples <user, item, tag> forms information spaces referred to as folksonomies. Fundamentally, folksonomies are used in the following two ways: 1) to create tag recommendation systems (RS based only on tags) [Marinho and Schmidt-Thieme 2008], and 2) to enrich the recommendation processes using tags [Gedikli and Jannach 2010].

For the RS-generated recommendations to be valuable for users, they must be explained in a simple, compelling and accurate manner. The recommendation explanation field has been investigated from the early developments in RS [Herlocker et al. 2000] until now [Papadimitriou et al. 2012]. Traditionally, the explanation type is divided into the following categories: a) human style (user to user approach), b) item style (item to item approach), c) feature style (item features), and d) hybrid. It also employs the use of

conversational techniques [McSherry 2005] and incorporates geo-social information [Yang et al. 2008].

Context-aware recommender systems [Adomavicius and Tuzhilin 2011; Abbar et al. 2009] focus on additional contextual information, such as time, location, wireless sensor networks [Gavalas and Kenteris 2011], etc. The contextual information can be obtained explicitly, implicitly, using data mining or with a mixture of these methods (hybrid). Currently, mobile applications increasingly use geographic information; this information enables geographic RS, which can be considered location-aware RS. For geographic RS [Oku et al. 2010; Matyas and Schlieder 2009], recommendations are typically generated by considering the geographical position of the user who receives the recommendation.

Privacy is an important issue for RS [Bilge and Polat 2012] because the systems contain information on large numbers of registered users. For privacy preservation in RS, a certain level of uncertainty must be introduced into the predictions [McSherry and Mironov 2009], primarily through tradeoffs between accuracy and privacy [Machanavajjhala et al. 2011]. Furthermore, privacy can be preserved when different RS companies share information (combining their data) [Kaleli and Polat 2012; Zhan 2010]. Privacy becomes more important as RS increasingly incorporate social information.

Because RS are often used in electronic commerce, unscrupulous producers may find it profitable to shill recommender systems by lying to the systems in order to have their products recommended more often than those of their competitors. RS can experience shilling attacks [Lam and Riedl 2004; Chirita et al. 2005], which generate many

positive ratings for a product, while products from competitors receive negative ratings. RS are still highly vulnerable to such attacks [Ray and Mahanti 2009].

2.1.5. Future trends

From the evolution of existing RS and research papers in the field, there is a clear tendency to collect and integrate more and different types of data. This trend is parallel to the evolution of the web, which we can define through the following three primary stages:

1) At the genesis of the web, RS used only the explicit ratings from users as well as their demographic information and the content-based information included by the RS owners.

2) For the web 2.0, in addition to the above information, RS collect and use social information, such as friends, followers, followed, both trusted and untrusted. Simultaneously, users aid in the collaborative inclusion of such information: blogs, tags, comments, photos and videos.

3) For the web 3.0, context-aware information from a variety of devices and sensors will be incorporated with the above information. Currently, geographic information is included, and the expected trend is the gradual incorporation of information such as radio frequency identification (RFID) data, surveillance data, on-line health parameters and food and shopping habits, as well as teleoperation and telepresence.

Additionally, there is a clear trend towards the collection of implicit information instead of the traditional explicit evaluation of items by ratings. Last.Fm is a good example of this situation; the users' ratings are inferred from the number of times they have listened to each song. The same can be applied in a number of everyday situations, such as access to web addresses, use of various public transport systems, food purchased, access to sports facilities and access to learning resources.

Incorporation of implicit information on the daily habits of users allows RS to use a variety of data; these data will be used in future CF processes, which are increasingly useful and accurate. Privacy and security considerations will be increasingly important with the widespread trend in using, with consent, devices and sensors for the Internet of things.

Gradual incorporation of different types of information (e.g., explicit ratings, social relations, user contents, locations, and use trends) has forced RS to use CF hybrid approaches. Once the memory-based, social and location-aware methods and algorithms are consolidated, the evolution of RS demonstrates a clear trend toward combining existing collaborative methods.

The latest research in the CF field has generated only modest improvements for predictions and recommendations from a single type of information (e.g., when the only information used is user ratings, information from social relations, or item content). The results improve further when several CF algorithms are combined with their respective data types. A growing number of publications address hybrid CF approaches that use current databases to simultaneously incorporate memory-based, social and content-based information.

To unify the above concepts, Figure 5 provides an original taxonomy for RS. The taxonomy is organized according to the nature of the data rather than according to the methods and algorithms used. The core of the taxonomy focuses on data classification by three factors: 1) the subject of the data: user or item; 2) mode of production: explicit (i.e., ratings from users for items) or implicit (e.g., number of times a user has heard a song); and 3) information level: memory, content or social context.

Figure 5 shows the recommender methods and algorithms grouped by the level of information in the RS database. Depending on the information type in each RS database, it adopts a hybrid filtering approach. Each hybrid approach will use an appropriate subset of algorithms to consider processing of existing information in a coordinated manner.

Future developments will include different recommendation frameworks that address the most common situations. These frameworks allow RS to incorporate the CF kernel with the most appropriate recommendation methods based on the available information in a simple and straightforward manner.

At higher levels (prediction and recommendation), Figure 5 incorporates current evaluation quality measures, such as those for diversity and novelty. The importance of such measures, and of measures developed in the future, will grow as users demand novel and less predictable recommendations.

The remainder of this article is structured as follows: We begin with ten sections considered relevant to the RS field. These sections address the following subjects: a) basic topics, b) areas where research is currently focused, and c) new issues that may have increasing incidence. The concluding section summarizes the RS history and focuses on the type of data used as well as the development of algorithms and evaluation measures. The conclusions section also indicates seven new areas that we consider likely to be the focus of RS research in the scientific community in the near future.

Fig. 5. Recommender Systems taxonomy.

The ten sections with the selected topics are as follows: explanation and formalization of the k Nearest Neighbors algorithm (section 2.2); evaluation measures of the RS quality (section 2.3); current memory-based similarity measures (section 2.4); the new item, new user and new community cold-start issue (section 2.5); location-aware RS (section 2.6); recommendations to groups of users (section 2.7); explaining recommendations (section 2.8); incorporating social information into RS (section 2.9); content-based filtering techniques (section 2.10); bio-inspired model-based approaches (section 2.11).

2.2. The k nearest neighbors recommendation algorithm

The k Nearest Neighbors (kNN) recommendation algorithm is the reference algorithm for the collaborative filtering recommendation process. Its primary virtues are simplicity and reasonably accurate results; its major pitfalls are low scalability and vulnerability to sparsity in the RS databases. This section provides a general explanation of how this algorithm works as well as a brief formalization of its details.

The kNN algorithm is based on similarity measures. Some of the traditional user to user similarity measures commonly used in RS are: Pearson correlation, cosine, constrained Pearson’s correlation and Spearman rank correlation. Section 2.4 provides further details on the current RS similarity measures. The similarity approaches typically compute the similarity sim(x, y) between two users x and y (user to user) based on both users’ item ratings. The item to item kNN version computes the similarity between two items i and j. Figure 6 shows a case study using the user to user kNN algorithm mechanism.

Fig. 6. User to user kNN algorithm example. k=3. Similarity measure: 1-(Mean Squared Differences).
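The user to user mechanism described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the toy ratings, the function names (`msd_similarity`, `k_neighbors`, `predict_average`, `top_n`), the normalization of rating differences by the rating range and the plain average as aggregation are all choices made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: user -> {item: rating}; a missing key means "not rated".
Ratings = Dict[str, Dict[str, float]]

RATINGS: Ratings = {
    "u":  {"a": 5, "b": 1},
    "v1": {"a": 5, "b": 1, "c": 4},
    "v2": {"a": 4, "b": 2, "c": 5, "d": 2},
    "v3": {"a": 1, "b": 5, "c": 1, "d": 5},
}


def msd_similarity(a: Dict[str, float], b: Dict[str, float],
                   r_min: float, r_max: float) -> Optional[float]:
    """Return 1 - MSD over co-rated items (differences scaled to [0, 1]).

    Returns None when the two users share no rated item.
    """
    common = set(a) & set(b)
    if not common:
        return None
    span = r_max - r_min
    msd = sum(((a[i] - b[i]) / span) ** 2 for i in common) / len(common)
    return 1.0 - msd


def k_neighbors(ratings: Ratings, u: str, k: int,
                r_min: float, r_max: float) -> List[str]:
    """Return the k users most similar to u (ties broken by user id)."""
    scored = []
    for v, r_v in ratings.items():
        if v == u:
            continue
        s = msd_similarity(ratings[u], r_v, r_min, r_max)
        if s is not None:
            scored.append((s, v))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [v for _, v in scored[:k]]


def predict_average(ratings: Ratings, neighbors: List[str],
                    item: str) -> Optional[float]:
    """Average of the neighbors' ratings for item; None if nobody rated it."""
    votes = [ratings[n][item] for n in neighbors if item in ratings[n]]
    return sum(votes) / len(votes) if votes else None


def top_n(ratings: Ratings, u: str, k: int, n: int,
          r_min: float, r_max: float) -> List[str]:
    """Return the n unrated items with the highest predicted value for u."""
    neighbors = k_neighbors(ratings, u, k, r_min, r_max)
    all_items = {i for r in ratings.values() for i in r}
    predictions = {}
    for item in all_items - set(ratings[u]):
        p = predict_average(ratings, neighbors, item)
        if p is not None:
            predictions[item] = p
    return sorted(predictions, key=lambda i: (-predictions[i], i))[:n]
```

With this toy data and k=2, the neighbors of `u` are `v1` and `v2`, and the items are ranked by the average of the ratings those neighbors gave them. Replacing `predict_average` with a similarity-weighted aggregation only changes the prediction step; neighbor selection and ranking stay the same.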

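Let us consider an RS with a database of L users and M items rated in the range {min,...,max}, where the absence of a rating is represented by the symbol •. We define U (the set of users) and I (the set of items). The item recommendation phases for an active user u are as follows: a) obtaining the active user’s k-neighbors; b) predicting the item values; and c) selecting the top-n recommendations.

a) Using the selected similarity measure, sim(x,y), we produce the set of k neighbors $K_u$ for the user u. The k neighbors of u are the k users nearest (most similar) to u according to the results generated by applying sim(x,y), as follows:

\[ K_u \subset U \;\wedge\; \#K_u = k \;\wedge\; u \notin K_u \tag{1} \]

\[ \forall x \in K_u,\ \forall y \in U - K_u,\ y \neq u:\quad sim(u,x) \geq sim(u,y) \tag{2} \]

b) Once the set $K_u$ of k users (neighbors) similar to the active user u has been calculated, in order to obtain the prediction $p_{u,i}$ of item i for user u, one of the following aggregation approaches is often used: the average (3), the weighted sum (4) and the adjusted weighted aggregation (deviation-from-mean) (5). We define $K_{u,i}$ as the set of neighbors of u which have voted item i; $r_{n,i}$ is the rating of user n for item i, $\bar{r}_n$ is the average rating of user n, and $\mu_{u,i} = 1 / \sum_{n \in K_{u,i}} sim(u,n)$ is a normalizing factor. All three predictions are defined only when $K_{u,i} \neq \emptyset$.

\[ p_{u,i} = \frac{1}{\#K_{u,i}} \sum_{n \in K_{u,i}} r_{n,i} \tag{3} \]

\[ p_{u,i} = \mu_{u,i} \sum_{n \in K_{u,i}} sim(u,n)\, r_{n,i} \tag{4} \]

\[ p_{u,i} = \bar{r}_u + \mu_{u,i} \sum_{n \in K_{u,i}} sim(u,n)\,(r_{n,i} - \bar{r}_n) \tag{5} \]

c) To obtain the top-n recommendations, we define $X_u$ as the set of recommendations to user u, and $Z_u$ as the set of n recommendations to user u. The following must be true: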

\[ X_u \subset I \;\wedge\; \forall i \in X_u:\ r_{u,i} = \bullet,\ p_{u,i} \neq \bullet \tag{6} \]

\[ Z_u \subseteq X_u,\quad \#Z_u = n,\quad \forall x \in Z_u,\ \forall y \in X_u - Z_u:\ p_{u,x} \geq p_{u,y} \tag{7} \]

If we want to impose a minimum recommendation value $\theta \in \mathbb{R}$, we add $p_{u,i} \geq \theta$.

2.3. Evaluation of recommender systems and frameworks

Since RS research began, evaluation of predictions and recommendations has become important [Herlocker et al. 2004; Sarwar et al. 2000a]. Because of evaluation measures, RS recommendations have gradually been tested and improved [Cacheda et al. 2011]. A representative set of existing evaluation measures has standard formulations, and a group of open RS public databases has been generated. These two advances have facilitated quality comparisons between newly proposed recommendation methods and previously published methods; thus, research on RS methods and algorithms has progressed continuously.

The most commonly used quality measures are the following [Gunawardana and Shani 2009; Hernández and Gaudioso 2008]: 1) prediction evaluations, 2) evaluations for recommendations as sets, and 3) evaluations for recommendations as ranked lists. Figure 8 (shown in section 2.4, 'Similarity measures') shows results from applying several evaluation measures to a set of representative similarity measures.

Hernández and Gaudioso [2008] propose an evaluation process based on the distinction between interactive and non-interactive subsystems. General publications and reviews also exist which include the most commonly accepted evaluation measures: mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout; Goldberg et al. [2001] focuses

on the aspects not related to the evaluation; Breese et al. [1998] compare the predictive accuracy of various methods in a set of representative problem domains.

The majority of articles discuss attempted improvements to the accuracy of RS results (RMSE, MAE, etc.). It is also common to attempt an improvement in recommendations (precision, recall, ROC, etc.). However, additional objectives should be considered for generating greater user satisfaction [Ziegler et al. 2005], such as topic diversification, coverage, serendipity, etc.

Currently, the field has a growing interest in generating algorithms with diverse and innovative recommendations, even at the expense of accuracy and precision. To evaluate these aspects, various metrics have been proposed to measure recommendation novelty and diversity [Hurley and Zhang 2011; Vargas and Castells 2011].

The frameworks aid in defining and standardizing the methods and algorithms employed by RS as well as the mechanisms to evaluate the quality of the results. Among the most significant papers that propose CF frameworks are the following: Herlocker et al. [1999] evaluate similarity weight, significance weighting, variance weighting, neighborhood selection and rating normalization; Hernández and Gaudioso [2008] propose a framework in which any RS is formed by two different subsystems, one of them to guide the user and the other to provide useful/interesting items; Koutrika et al. [2009] present a framework which introduces levels of abstraction in the CF process, making modifications to the RS more flexible; Antunes et al. [2012] present an evaluation framework assuming that evaluation is an evolving process during the system lifecycle.

The majority of RS evaluation frameworks proposed until now present two deficiencies. The first of these is the lack of formalization. Although the evaluation

metrics are well defined, there are a variety of details in the implementation of the methods which, if they are not specified, can lead to different results in similar experiments. The second deficiency is the absence of standardization of the evaluation measures in aspects such as novelty and trust of the recommendations.

Bobadilla et al. [2011a] provide a complete series of mathematical formalizations based on set theory. The authors provide a set of evaluation measures, which include the quality analysis of the following aspects: predictions, recommendations, novelty and trust.

Presented next is a representative selection of the RS evaluation quality measures most often used in the bibliography. We use the mathematical notation described in the previous section.

2.3.1. Quality of the predictions: Mean Absolute Error-Accuracy and coverage

In order to measure the accuracy of the results of an RS, it is usual to calculate some of the most common prediction error metrics, amongst which the Mean Absolute Error (MAE) and its related metrics (mean squared error, root mean squared error, and normalized mean absolute error) stand out.

Let $O_u = \{ i \in I \mid p_{u,i} \neq \bullet \wedge r_{u,i} \neq \bullet \}$ be the set of items rated by user u having prediction values.

We define the MAE and RMSE of the system as the average of the users' MAE and RMSE, respectively:

\[ MAE = \frac{1}{\#U} \sum_{u \in U} \left( \frac{1}{\#O_u} \sum_{i \in O_u} \left| p_{u,i} - r_{u,i} \right| \right) \tag{8} \]

\[ RMSE = \frac{1}{\#U} \sum_{u \in U} \sqrt{ \frac{1}{\#O_u} \sum_{i \in O_u} \left( p_{u,i} - r_{u,i} \right)^2 } \tag{9} \]
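As a minimal sketch of the two accuracy measures just defined (the system value is the average over users of each user's error on the items that have both a rating and a prediction), the following Python function computes both in one pass. The function name `mae_rmse`, the dictionary layout and the decision to skip users that have no such item are assumptions made for this example.

```python
import math
from typing import Dict, Optional, Tuple

Table = Dict[str, Dict[str, float]]  # user -> {item: value}


def mae_rmse(ratings: Table,
             predictions: Table) -> Tuple[Optional[float], Optional[float]]:
    """Return (MAE, RMSE) of the system as the average of per-user errors.

    Only items with both a rating and a prediction are used. Users with no
    such item are skipped (an assumption of this sketch).
    """
    user_mae, user_rmse = [], []
    for user, rated in ratings.items():
        predicted = predictions.get(user, {})
        errors = [predicted[i] - rated[i] for i in rated if i in predicted]
        if not errors:
            continue
        user_mae.append(sum(abs(e) for e in errors) / len(errors))
        user_rmse.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    if not user_mae:
        return None, None
    return sum(user_mae) / len(user_mae), sum(user_rmse) / len(user_rmse)


# Hypothetical toy data.
RATINGS = {"u1": {"a": 4, "b": 2}, "u2": {"a": 3}}
PREDICTIONS = {"u1": {"a": 5, "b": 2}, "u2": {"a": 1, "c": 4}}
MAE, RMSE = mae_rmse(RATINGS, PREDICTIONS)
```

Averaging per user first, rather than pooling all errors, gives every user the same weight regardless of how many items they rated; a pooled variant would weight heavy raters more.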

The coverage can be defined as the prediction capacity of a metric applied to a specific RS. In short, it calculates the percentage of situations in which at least one k-neighbor of each active user can rate an item that has not yet been rated by that active user. We defined $K_{u,i}$ as the set of neighbors of u which have voted item i.

Let $C_u = \{ i \in I \mid r_{u,i} = \bullet \wedge K_{u,i} \neq \emptyset \}$ and $D_u = \{ i \in I \mid r_{u,i} = \bullet \}$.

We define the coverage of the system as the average of the users' coverage:

\[ coverage = \frac{1}{\#U} \sum_{u \in U} \left( 100 \times \frac{\#C_u}{\#D_u} \right) \tag{10} \]

2.3.2. Quality of the set of recommendations: precision, recall and F1

The confidence of users in a certain RS does not depend directly on the accuracy for the set of possible predictions. User confidence is produced by providing a reduced set of recommendations with which each user agrees. In this section, we define the three most widely used recommendation quality measures: 1) precision, which indicates the proportion of relevant recommended items from the total number of recommended items, 2) recall, which indicates the proportion of relevant recommended items from the number of relevant items, and 3) F1, which is a combination of precision and recall.

First, we redefine equation (6) so that the candidate items have both a rating and a prediction:

\[ X_u \subset I \;\wedge\; \forall i \in X_u:\ r_{u,i} \neq \bullet,\ p_{u,i} \neq \bullet \tag{11} \]

We will represent the precision, recall and F1 evaluation measures for recommendations obtained by making N test recommendations to the user u, taking a relevancy threshold θ. Assuming that all users accept N test recommendations:

\[ precision = \frac{1}{\#U} \sum_{u \in U} \frac{\#\{ i \in Z_u \mid r_{u,i} \geq \theta \}}{N} \tag{12} \]

\[ recall = \frac{1}{\#U} \sum_{u \in U} \frac{\#\{ i \in Z_u \mid r_{u,i} \geq \theta \}}{\#\{ i \in X_u \mid r_{u,i} \geq \theta \}} \tag{13} \]

\[ F1 = \frac{2 \times precision \times recall}{precision + recall} \tag{14} \]

2.3.3. Quality of the list of recommendations: rank measures

When the number N of recommended items is not small, users give greater importance to the first items on the list of recommendations. The mistakes incurred in these items are more serious than those in the last items on the list. The ranking measures consider this situation. Among the ranking measures most often used are the following standard information retrieval measures: a) half-life (15) [Breese et al. 1998], which assumes an exponential decrease in the interest of users as they move away from the recommendations at the top, and b) discounted cumulative gain (16) [Baltrunas et al. 2010], wherein decay is logarithmic.

\[ HL = \frac{1}{\#U} \sum_{u \in U} \sum_{i=1}^{N} \frac{\max(r_{u,p_i} - d,\ 0)}{2^{(i-1)/(\alpha-1)}} \tag{15} \]

\[ DCG^{k} = \frac{1}{\#U} \sum_{u \in U} \left( r_{u,p_1} + \sum_{i=2}^{k} \frac{r_{u,p_i}}{\log_2 i} \right) \tag{16} \]

Here, $p_1, \ldots, p_n$ represents the recommendation list, $r_{u,p_i}$ represents the true rating of the user u for the item $p_i$, k is the rank of the evaluated item, d is the default rating, and α is the number of the item on the list such that there is a 50% chance the user will review that item.

2.3.4. Novelty and diversity

The novelty evaluation measure indicates the degree of difference between the items recommended to and known by the user. The diversity quality measure indicates the degree of differentiation among recommended items.

Currently, novelty and diversity measures do not have a standard; therefore, different authors propose different metrics [Nehring and Puppe 2002; Vargas and Castells 2011]. Certain authors [Hurley and Zhang 2011] have used the following:

\[ diversity_{Z_u} = \frac{1}{\#Z_u (\#Z_u - 1)} \sum_{i \in Z_u} \sum_{j \in Z_u,\ j \neq i} \left[ 1 - sim(i,j) \right] \tag{17} \]

\[ novelty_i = \frac{1}{\#Z_u - 1} \sum_{j \in Z_u,\ j \neq i} \left[ 1 - sim(i,j) \right], \quad i \in Z_u \tag{18} \]

Here, sim(i,j) indicates item to item memory-based CF similarity measures. Figure 7 shows the general mechanism for cross validation used to generate results from the evaluation measures.

Fig. 7. Recommender Systems evaluation process.
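Since no standard formulation exists, the following Python sketch shows only one common way of turning an item to item similarity into the two measures: diversity as the average pairwise dissimilarity 1 - sim(i,j) within the recommended list, and novelty of an item as its average dissimilarity to the other items of that list. Both definitions, the function names and the toy similarity table are assumptions made for illustration, not necessarily the formulation of the cited authors.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Similarity = Callable[[str, str], float]


def diversity(items: List[str], sim: Similarity) -> Optional[float]:
    """Average of 1 - sim(i, j) over all ordered pairs of distinct items."""
    n = len(items)
    if n < 2:
        return None
    total = sum(1.0 - sim(i, j) for i in items for j in items if i != j)
    return total / (n * (n - 1))


def novelty(item: str, items: List[str], sim: Similarity) -> Optional[float]:
    """Average of 1 - sim(item, j) over the other items of the list."""
    others = [j for j in items if j != item]
    if not others:
        return None
    return sum(1.0 - sim(item, j) for j in others) / len(others)


# Hypothetical symmetric item to item similarities in [0, 1].
SIM_TABLE: Dict[FrozenSet[str], float] = {
    frozenset(("a", "b")): 0.5,
    frozenset(("a", "c")): 0.0,
    frozenset(("b", "c")): 1.0,
}


def table_sim(i: str, j: str) -> float:
    """Look up the similarity of an unordered pair of items."""
    return SIM_TABLE[frozenset((i, j))]


RECOMMENDED = ["a", "b", "c"]
LIST_DIVERSITY = diversity(RECOMMENDED, table_sim)
NOVELTY_A = novelty("a", RECOMMENDED, table_sim)
```

Any item to item similarity with values in [0, 1] can be passed as `sim`; the lookup table is used here only to keep the example self-contained.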

2.4. Similarity measures

The metrics or Similarity Measures (SM) determine the similarity between pairs of users (user to user CF) or the similarity between pairs of items (item to item CF). For this purpose, we compare the numerical values of the votes for all the items voted for by both users (user to user) or the numerical values of the votes of all those users who have voted for both items (item to item).

The kNN algorithm has been based essentially on the use of traditional similarity metrics of statistical origin. These metrics require, as the only source of information, the set of votes made by the users on the items (memory-based CF). Among the most commonly used traditional metrics we have: Pearson correlation (CORR), cosine (COS), adjusted cosine (ACOS), constrained correlation (CCORR), mean squared differences (MSD) and Euclidean (EUC) [Candillier et al. 2007; Adomavicius and Tuzhilin 2005].

This section describes and compares a representative group of SM used in the kNN algorithm. The SM discussed include the following variations: a) cold-start and general cases, b) based or not based on models, and c) using trust information or votes only. Table 2 shows a classification of the memory-based CF SM which will be tested in this section.

Table 2. Tested collaborative filtering similarity measures.
Row labels: Not based on models; Traditional (only the votes of both users or both items); Extended to all the votes; Not tailored to cold-start users; Tailored to cold-start users; Model-based.
Column labels: No trust extraction; Trust extraction.
Similarity measures listed: JMSD, CORR, CCORR, COS, ACOS, MSD, EUC; PIP; SING; TRUST; GEN; UERROR; NCS.
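To make the traditional measures concrete, the sketch below computes three of them (CORR, COS and CCORR) for a pair of users over their co-rated items. It is an illustrative sketch only: the function names, the use of the mean over co-rated items in the Pearson correlation and the use of the midpoint of the rating scale in the constrained correlation are common conventions assumed here, not details taken from this section.

```python
import math
from typing import Dict, List, Optional

Votes = Dict[str, float]  # item -> rating for one user


def _common(a: Votes, b: Votes) -> List[str]:
    """Items rated by both users."""
    return sorted(set(a) & set(b))


def _centered_corr(a: Votes, b: Votes, items: List[str],
                   center_a: float, center_b: float) -> Optional[float]:
    """Correlation of the two vote vectors around the given centers."""
    num = sum((a[i] - center_a) * (b[i] - center_b) for i in items)
    den = math.sqrt(sum((a[i] - center_a) ** 2 for i in items)
                    * sum((b[i] - center_b) ** 2 for i in items))
    return num / den if den else None


def pearson(a: Votes, b: Votes) -> Optional[float]:
    """CORR: centers each user on their mean over the co-rated items."""
    items = _common(a, b)
    if len(items) < 2:
        return None
    mean_a = sum(a[i] for i in items) / len(items)
    mean_b = sum(b[i] for i in items) / len(items)
    return _centered_corr(a, b, items, mean_a, mean_b)


def constrained_pearson(a: Votes, b: Votes,
                        midpoint: float) -> Optional[float]:
    """CCORR: centers both users on the midpoint of the rating scale."""
    items = _common(a, b)
    if not items:
        return None
    return _centered_corr(a, b, items, midpoint, midpoint)


def cosine(a: Votes, b: Votes) -> Optional[float]:
    """COS: cosine of the angle between the co-rated vote vectors."""
    items = _common(a, b)
    if not items:
        return None
    return _centered_corr(a, b, items, 0.0, 0.0)


# Hypothetical votes on a 1..5 scale.
X = {"a": 1, "b": 2, "c": 3}
Y = {"a": 2, "b": 4, "c": 6, "d": 1}
Z = {"a": 3, "b": 2, "c": 1}
```

All three measures share the same normalized dot product and differ only in the value around which the votes are centered, which is why a single helper is enough in this sketch.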

A new metric (JMSD) has recently been published which, besides using the numerical information from the votes (via Mean Squared Differences), also uses the non-numerical information provided by the arrangement of these (via Jaccard) [Bobadilla et al. 2010].

A specialization of the memory-based CF SM, which appeared recently [Bobadilla et al. 2012b], uses the information contained in the votes of all users, instead of restricting it to the votes of the two users compared (user to user) or the two items compared (item to item). We will call this SM SING (singularities).

It is also possible to create a model (model-based CF) from the full set of users’ votes in order to later determine the similarity between pairs of users or pairs of items based on the model created. The potential advantages of this approach are an increase in the accuracy obtained, in the performance (execution time) achieved, or in both. The drawback is that the model must be regularly updated in order to consider the most recently entered set of votes. Bobadilla et al. [2011b] provide a metric based on a model generated using genetic algorithms. We will call this SM GEN (genetic-based).

As a result of the increase in web 2.0 websites on the Internet, a set of metrics has appeared which use the new social information available (friends, followed, followers, etc.). Most of these SM are grouped in papers related to trust, reputation and credibility [Ekström et al. 2005; Yuan et al. 2010; Li and Kao 2009], although this situation also occurs in other fields [Bobadilla et al. 2009]. These metrics cannot be considered strictly memory-based CF, as they use additional information which not all RS have. In this sense, each SM proposed is tailored to a specific RS or at most to a very small set of RS which share the same structure in their social information.

There are SM [Jeong et al. 2009; Kwon et al. 2009] which aim to extract information related to trust and reputation by using only the users’ set of votes (memory-based CF).

The advantage is that their use can be generalized to all CF RS; the drawback is that the social information extracted is really poor. We will call TRUST the SM proposed in Jeong et al. [2009].

Figure 8 shows the results from several evaluation measures generated by applying the SM discussed in this section. The results show that the RS-tailored SM outperform the traditional SM from statistics. Processing for the memory-based information and results follows the framework schematic published previously [Bobadilla et al. 2011a].

Fig. 8. Evaluation measures results obtained from current similarity measures; MovieLens database. A) prediction results, B) recommendation results, C) novelty results, D) trust results.

So far, few research papers deal with the cold-start problem through the users’ ratings information: Ahn [2008] presents a heuristic SM named PIP that outperforms the traditional statistical SM (Pearson correlation, cosine, etc.); and Heung-Nam et al. [2011] propose a method (UERROR) that first predicts actual ratings and subsequently identifies prediction errors for each user; taking into account this error

information, some specific “error-reflected” models are designed. Bobadilla et al. [2012c] present a metric based on neural learning (model-based CF) and adapted for new user cold-start situations, called NCS.

Figure 9 shows results from several evaluation measures generated by applying the cold-start SM presented in this section; the following section (‘Cold-start’) discusses research in this field in detail. The results show that the RS-tailored SM outperform the traditional SM from statistics. Since the MovieLens database does not contain cold-start users, we have removed votes from this database in order to obtain them. Specifically, we have randomly removed between 5 and 20 votes from those users who have rated between 20 and 30 items. In this way, the users who are left with between 2 and 20 rated items are regarded as cold-start users.

Fig. 9. Evaluation results obtained from current cold-start similarity measures. A) prediction results, B) recommendation results, C) novelty results, D) trust results.
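The vote-removal procedure described above can be sketched as follows. This is a minimal sketch, assuming ratings are held in a per-user dictionary; the function name `make_cold_start_users`, its parameters and the injected random generator are hypothetical, and the thresholds are the ones quoted in the text.

```python
import random
from typing import Dict, Set, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def make_cold_start_users(ratings: Ratings, rng: random.Random,
                          min_items: int = 20, max_items: int = 30,
                          min_removed: int = 5, max_removed: int = 20,
                          cold_min: int = 2, cold_max: int = 20
                          ) -> Tuple[Ratings, Set[str]]:
    """Return a reduced copy of the ratings and the set of cold-start users.

    Users with min_items..max_items votes lose a random number of votes in
    min_removed..max_removed; those left with cold_min..cold_max votes are
    flagged as cold-start users. Other users are copied unchanged.
    """
    reduced = {user: dict(votes) for user, votes in ratings.items()}
    cold_users: Set[str] = set()
    for user, votes in ratings.items():
        if not (min_items <= len(votes) <= max_items):
            continue
        n_removed = min(rng.randint(min_removed, max_removed), len(votes))
        for item in rng.sample(sorted(votes), n_removed):
            del reduced[user][item]
        if cold_min <= len(reduced[user]) <= cold_max:
            cold_users.add(user)
    return reduced, cold_users


# Hypothetical toy data: one user with 25 votes and one with 50 votes.
TOY = {
    "few": {f"i{n}": 3.0 for n in range(25)},
    "many": {f"i{n}": 4.0 for n in range(50)},
}
REDUCED, COLD = make_cold_start_users(TOY, random.Random(0))
```

Passing the random generator explicitly keeps the sampling reproducible across experiments, and the original ratings are left untouched so the removed votes remain available for evaluation.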

2.5. Cold-start

The cold-start problem [Schafer et al. 2007; Adomavicius and Tuzhilin 2005] occurs when it is not possible to make reliable recommendations due to an initial lack of ratings. We can distinguish three kinds of cold-start problems: new community, new item and new user. The last kind is the most important in RS that are already in operation.

The new community problem [Schein et al. 2002; Lam et al. 2008] refers to the difficulty in obtaining, when starting up an RS, a sufficient amount of data (ratings) to enable reliable recommendations to be made. When there are not enough users in particular and votes in general, it is difficult to retain new users, who come across an RS with contents but no precise recommendations. The most common ways of tackling the problem are to encourage votes to be made via other means or to not make CF-based recommendations until there are enough users and votes.

The new item problem [Park and Tuzhilin 2008; Park and Chu 2009] arises because the new items entered in an RS do not usually have initial votes and, therefore, are not likely to be recommended. In turn, an item that is not recommended goes unnoticed by a large part of the community of users, and as they are unaware of it they do not rate it; in this way, we can enter a vicious circle in which a set of items of the RS is left out of the votes/recommendations process. The new item problem has less of an impact on RS in which the items can be discovered via other means (e.g. movies) than in RS where this is not the case (e.g. e-commerce, blogs, photos, videos, etc.). A common solution to this problem is to have a set of motivated users who are responsible for rating each new item in the system.

The new user problem [Rashid et al. 2008; Ryan and Bridge 2006] represents one of the great difficulties faced by RS in operation. When users register they have not cast any votes and, therefore, they cannot receive any personalized recommendations based on memory-based CF; when the users enter their first ratings they expect the RS to offer them personalized recommendations, but the number of votes entered is usually not yet sufficient to make reliable CF-based recommendations and, therefore, new users may feel that the RS does not offer the service they expected and may stop using it.

The common strategy to tackle the new user problem consists of turning to information additional to the set of votes in order to make recommendations based on the data available for each user. The cold-start problem is often addressed using hybrid approaches (usually CF-content based RS, CF-demographic based RS, CF-social based RS) [Kim et al. 2010; Loh et al. 2009]. Leung et al. [2008] propose a novel content-based hybrid approach that makes use of cross-level association rules to integrate content information about domain items. Kim et al. [2010] use collaborative tagging as an approach to grasp and filter users’ preferences for items, and they explore the advantages of collaborative tagging for data sparseness and cold-start users (they collected the dataset by crawling the collaborative tagging site delicious). Weng et al. [2008] combine the implicit relations between users’ item preferences and the additional taxonomic preferences to make better quality recommendations as well as to alleviate the cold-start problem. Loh et al. [2009] represent users’ profiles with information extracted from their scientific publications. Martinez et al. [2009a] present a hybrid RS which combines a CF algorithm with a knowledge-based one. Chen and He [2009] propose a number of common terms / term frequency (NCT/TF) CF algorithm based on demographic vectors. Saranya and Atsuhiro [2009]

propose a hybrid RS that utilizes latent features extracted from items represented by a multi-attributed record using a probabilistic model. Park et al. [2006] propose a new approach: they use filterbots, surrogate users that rate items based only on user or item attributes.

These research papers base their strategies on the presence of data additional to the actual votes (users’ profiles, users’ tags, users’ publications, etc.). The main problem is that not all RS databases possess this information, or it is not considered sufficiently reliable, complete or representative.

So far, two research papers deal with the cold-start problem through the users’ ratings information: Ahn [2008] presents a heuristic similarity measure named PIP that outperforms the traditional statistical similarity measures (Pearson correlation, cosine, etc.); and Heung-Nam et al. [2011] propose a method that first predicts actual ratings and subsequently identifies prediction errors for each user; taking into account this error information, some specific “error-reflected” models are designed.

2.6. Location-aware recommender systems

Due to the increasing use of mobile devices, location-aware systems are becoming more widespread. These systems show a tendency towards consolidation as web 3.0 services, and this naturally leads to location-aware CF and location-aware RS, which can be called geographic CF and geographic RS.

We introduce a taxonomy for geographic CF RS and focus on the most relevant section of the classification obtained. Table 3 establishes the different possibilities of tackling a geographic RS according to the nature of the ratings made (“rating stage”) and the recommendation process followed (“recommendation stage”). “User” indicates that the

rating and/or recommendation are made without having or using the user’s Geographic Information (GI). Similarly, “Item” indicates that the rating and/or recommendation are made without having or using the item’s GI. In the cases labeled as “Userg” and “Itemg” the GI is used. The cases identified are:

• RS: Traditional RS, where users vote and receive recommendations without using geographical information.

• RS+G: Traditional RS which also contributes the item’s geographical position. These RS cannot be considered geographic RS, as the GI does not play a part in the recommendation process.

• GRS: The group of geographic RS which are most likely to become popular in the near future. In these, ratings are made in a traditional way, whilst recommendations are made by considering the geographical position of the user to whom the recommendation is to be made. A representative example is that of an RS for restaurants; the users rate a restaurant using very diverse concepts, which do not include the distance between the user and the restaurant at the time of voting. However, a user of a geographic RS expects a restaurant to be recommended to them not only because of good ratings from similar users (k-neighbors), but also according to the distance between their current position and that of the restaurant. Other possible examples are RS for cinemas, pubs, supermarkets, cultural activities in a city, language learning centers, gyms and sports clubs, etc.

• GRS+: In this case, users establish ratings on items by weighting the distance between them and the items rated. In this type of geographic RS two possibilities can be established:

1. Hybrid CF/demographic filtering: Each item accepts a maximum of one vote per user, with which the geographical position from which it was issued is associated.

2. Geographic RS where each item accepts more than one vote from each user, depending on the geographical position from which each vote is cast.

             Rating stage          Recommendation stage     User GI
             Item      Itemg       Item      Itemg
User         RS/GRS    ---         RS        RS+G           No
Userg        ---       GRS+        ---       GRS/GRS+       Yes
Item GI      No        Yes         No        Yes

Table 3. Geographic collaborative filtering recommender systems classification.

The hybrid RS in case 1 respond to regional or national geographical approaches, in which recommendations can be established according to a weighting between the similarity of the votes (CF) and their geographical origin. This type of GRS can be considered an extended case of hybrid CF/demographic filtering, in which the GI is given for each vote instead of for each user.

From a theoretical point of view, type 2 GRS+ are the most complete; however, from a practical point of view, they involve a semantic difficulty in the item rating process which makes their use very difficult. Casting votes in a GRS+ requires each user to be capable of rating the items according to the relative distances between the user and the items. By way of example, a user can rate a restaurant from their home differently to how they would rate it from their workplace; and when the distances are very different, the ratings are also likely to be so. The mental process would be something like this: I am 1 km from the restaurant and I rate very positively travelling 1 km to go to that restaurant, which I think is good; but after some time, the same user, now at work 24 km away from the restaurant, could cast a vote indicating

they do not consider it worthwhile to travel 24 km to go to the restaurant even if they think it is good. In summary, GRS+ have the advantage that they accept a wider variety of ratings and that these ratings also capture the relative importance that each user gives to the items according to the distance required to access them. The disadvantage is that it is difficult to involve users in a particularly complex and demanding rating process. This section focuses on the GRS-type geographic CF RS.

At present, there are few publications regarding GI-based RS; this is due, to a great extent, to the lack of public databases that include ratings and geographic positions capable of being combined in an RS. Some of the publications that focus more closely on the field are as follows:

Martinez et al. [2009b] and Biuk-Aghai et al. [2008] are examples of the RS+G group. Schlieder [2007] proposes a novel approach for modeling the collaborative semantics of geographic folksonomies. This approach is based on multi-object tagging, that is, the analysis of tags that users assign to composite objects. The paper is based on the concept of groups of people who share a common geospatial feature data dictionary (including definitions of feature relationships) and a common metadata schema.

Wan-Shiou et al. [2008] can be considered a hybrid content-based/geographic RS. The core of the system is a hybrid content-based/geographic recommendation mechanism that analyzes a customer’s history and position so that vendor information can be ranked according to how well it matches the customer’s preferences.

Matyas and Schlieder [2009] show a collaborative system that we could situate between an RS and a GRS. In this case, the users' ratings are derived from the photos they have downloaded from a Web 2.0 site and the photos they have uploaded to the same site (the photos have GPS coordinates associated with them). After this, a search of k-neighborhoods

based on this data is carried out. The recommendation process does not take into account the user's position.

It is possible to collect travel GPS traces from users and use the database to generate recommendations [Zheng and Xie 2011]. The travel GPS traces can be reinforced with social information based on friends [Zheng et al. 2011]. Both papers can be classified as GRS+.

2.7. Recommending to groups of users

RS that consider groups of users [Jameson and Smyth 2007] are starting to expand and to be used in different areas: tourism [Ardissono et al. 2003], music [Chao et al. 2005], TV [Yu et al. 2006], web [Pazzani and Billsus 2007]. Given the specific characteristics of recommendation to groups, it is appropriate to establish a consensus for different group semantics that formalize the agreements and disagreements among users [Roy et al. 2010].

With the aim of presenting the work carried out to date in a structured way, we provide a classification of recommendation to groups in CF RS. Figure 10 graphically illustrates the four basic levels at which we can act in order to unify the data of the group’s users and so obtain the data of the group as a whole: similarity metric, establishing the neighborhood, prediction phase, and determination of recommended items.

In Figure 10, the individual members of a group are represented on the left, in grey; each grid represents the matrix of votes by the users (horizontal) on the items (vertical). The graph shows the four representative ways of tackling recommendation to groups (one case for each matrix on the left of the figure). The