
The results show that the genetic algorithm converges to useful parameter values, some of which agree with commonly used settings, for example those for MovieLens. Some selected values were unexpected, however: low significance threshold values (in the range [0, 16]) are selected for the bookcrossing, last.fm and Epinions datasets in both experiments. This indicates that dampening the similarity measure between users with a small number of co-rated items is not useful for these three datasets, whereas doing so is beneficial in the MovieLens case. This makes intuitive sense, particularly for the bookcrossing dataset, which is extremely sparse: any evidence, even between users with only a few co-rated items, is better than the common case of having no evidence available to find similar users.
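The dampening effect discussed above can be illustrated with a minimal sketch. It assumes the commonly used linear devaluation, in which a similarity score is scaled by n/threshold when two users share fewer than `threshold` co-rated items; the exact formula used in this work is not restated in this section, and the function name `significance_weight` is hypothetical.

```python
def significance_weight(similarity: float, n_corated: int, threshold: int) -> float:
    """Dampen a similarity score when two users share few co-rated items.

    Assumption: linear devaluation by n_corated / threshold below the
    threshold. A threshold of 0 disables the dampening entirely.
    """
    if threshold <= 0 or n_corated >= threshold:
        return similarity
    return similarity * n_corated / threshold


# With a high threshold, sparse evidence is heavily discounted;
# with a low threshold (as selected for the sparser datasets), it is kept.
heavily_damped = significance_weight(0.8, n_corated=5, threshold=50)
undamped = significance_weight(0.8, n_corated=5, threshold=4)
```

Under this assumption, a low learned threshold means that similarities based on only a handful of co-rated items pass through almost unchanged, which is consistent with the interpretation given for the sparse datasets.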

The results also show that many neighbours are selected (via the N or corrT parameter) to perform prediction. The top-N approach is selected only for the MovieLens dataset, and in both experiments a high value of N is chosen (N = 196 and N = 164). For the other three datasets, correlation thresholding is chosen, with very low corrT values in the range [0.000621, 0.09]. From an efficiency perspective, however, it may not always be possible to include so many neighbours when calculating predictions.
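The two neighbour-selection strategies can be contrasted with a short sketch. The function names are hypothetical, and the use of a strict comparison against corrT is an assumption; the section does not state whether the threshold is inclusive.

```python
def top_n_neighbours(similarities: dict[str, float], n: int) -> list[str]:
    """Return the n users with the highest similarity to the active user."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:n]


def threshold_neighbours(similarities: dict[str, float], corr_t: float) -> list[str]:
    """Return every user whose similarity exceeds corr_t (strict, by assumption)."""
    return [user for user, sim in similarities.items() if sim > corr_t]


# Illustrative similarity scores between one active user and four others.
sims = {"a": 0.9, "b": 0.05, "c": 0.4, "d": -0.2}
chosen_top = top_n_neighbours(sims, n=2)
chosen_thr = threshold_neighbours(sims, corr_t=0.01)
```

With a very low corrT, the thresholding variant admits almost every positively correlated user, which is why both learned settings lead to large neighbourhoods.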

In Experiment 2, cosine similarity was chosen for three of the four datasets, with Pearson correlation similarity chosen for the MovieLens dataset. This suggests that the dominance of Pearson correlation as a similarity function is not always justified.
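To make the difference between the two similarity functions concrete, the following sketch computes both over the items two users have co-rated. Restricting both measures (and the Pearson means) to co-rated items is an assumption made for illustration; the rating dictionaries and function names are hypothetical.

```python
import math


def pearson(u: dict[str, float], v: dict[str, float]) -> float:
    """Pearson correlation over co-rated items (means taken over co-rated items)."""
    common = [item for item in u if item in v]
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = math.sqrt(sum((u[i] - mean_u) ** 2 for i in common)) * math.sqrt(
        sum((v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of the raw rating vectors over co-rated items."""
    common = [item for item in u if item in v]
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(u[i] ** 2 for i in common)) * math.sqrt(
        sum(v[i] ** 2 for i in common)
    )
    return num / den if den else 0.0


# Two users with opposite preferences over the same three items.
user_a = {"x": 1, "y": 2, "z": 3}
user_b = {"x": 3, "y": 2, "z": 1}
```

For `user_a` and `user_b`, Pearson correlation is strongly negative because the ratings move in opposite directions around each user's mean, whereas cosine similarity stays positive because it does not mean-centre the ratings. The two functions can therefore rank the same candidate neighbours quite differently.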

It could be argued that the field of recommendation has, in some instances, moved past a complete reliance on the neighbourhood-based model outlined here, and that the recent focus is on matrix factorization models and on incorporating additional available information, for example content information [154] and social information gleaned from the online social interactions between users (e.g. trust [73]). Whilst this is undoubtedly an avenue of work that can potentially overcome many of the disadvantages associated with a pure collaborative filtering approach, there is still scope, as witnessed by recent literature [197, 70, 131] and the availability of newer datasets, to continue investigating the assumptions and parameter values chosen for the basic collaborative filtering approach. It is from such a perspective that the work outlined here was undertaken.

4.6 Conclusions, Contributions and Future Work

This chapter outlined a genetic algorithm approach used to learn an optimal set of parameters for four datasets in a nearest-neighbour collaborative filtering approach. The space of parameters, their possible values, and the potential combinations of parameters was considered too large and unwieldy for a brute-force analysis. For this reason, a genetic algorithm approach was adopted in which each individual represented a set of values for a chosen set of parameters. The fitness of each individual was calculated by running a collaborative filtering approach on a test set using the parameter values specified in the individual and calculating the mean absolute error (MAE) of the results. Initially, a reduced set of parameters was learned; a second experiment allowed a greater range of options for the similarity functions and prediction functions. Although the approach is computationally expensive, it only needs to be carried out once per dataset. The suitability of the problem to a genetic algorithm approach was also considered.
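The overall loop summarised above can be sketched as follows. This is an illustrative outline only, not the configuration used in the experiments: the elitist selection, uniform crossover and random-reset mutation are assumed operators, all parameters are treated as real-valued for simplicity (the actual parameter set also includes categorical choices such as the similarity function), and every name is hypothetical. The `fitness` callable stands in for a full collaborative filtering run that returns the MAE for a given parameter set.

```python
import random
from typing import Callable, Dict, List, Tuple

Individual = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def mean_absolute_error(predicted: List[float], actual: List[float]) -> float:
    """MAE between predicted and actual ratings (lower is better)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def random_individual(bounds: Bounds, rng: random.Random) -> Individual:
    """Draw one parameter set uniformly within the allowed ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Uniform crossover: each parameter is copied from either parent."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in a}


def mutate(ind: Individual, bounds: Bounds, rng: random.Random, rate: float = 0.2) -> Individual:
    """Random-reset mutation: occasionally redraw a parameter within its range."""
    child = dict(ind)
    for name, (lo, hi) in bounds.items():
        if rng.random() < rate:
            child[name] = rng.uniform(lo, hi)
    return child


def evolve(
    fitness: Callable[[Individual], float],
    bounds: Bounds,
    pop_size: int = 20,
    generations: int = 30,
    seed: int = 0,
) -> Individual:
    """Minimise `fitness` (e.g. MAE) with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [random_individual(bounds, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)  # best (lowest error) first
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), bounds, rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=fitness)
```

In practice the fitness evaluation dominates the cost, since each call corresponds to running collaborative filtering over a test set. This is why the approach is expensive but only needs to be run once per dataset.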

The contribution of this work lies in the use of a genetic algorithm approach to learn the best set of parameter values across the four datasets. The same approach is used in the next chapter, where views of each of the four datasets are considered and parameters are evolved and evaluated for each view.

In addition, the parameters chosen for each dataset will be used in Chapters 6 and 7, which focus on another machine learning approach. That approach concentrates on learning features of the dataset and views which, for any user with these feature values, may be useful in predicting how well a collaborative filtering technique will perform for that user.

Future work could involve looking at additional parameters or adding constraints to the existing parameters: for example, lowering the upper bound allowed for the number of neighbours, N, when using a top-N approach, or raising the lower bound allowed for the threshold value when using a correlation thresholding approach. This would stop the convergence to values that are too large (e.g., N) or too low (e.g., corrT).

Learning Neighbourhood-based Collaborative Filtering Parameters: Dataset Views

5.1 Introduction

This chapter presents further experiments on learning the best set of parameters for the Pearson Correlation Nearest-Neighbour Collaborative Filtering approach. The experiments in this chapter focus on dataset views. Specifically, a genetic algorithm approach is used to find, where possible, the optimal set of parameters for all the previously specified views of the four datasets. The chapter is organised as follows: the motivations for, and an overview of, the work are presented in Section 5.2. An overview of the methodology is given in Section 5.3. Results are split according to the two different sets of views: Section 5.4 presents results, where possible, for the 12 user rating views; Section 5.5 presents results, where possible, for the 12 popular item views. Section 5.6 compares the results for each view across the four datasets. Finally, a discussion of the results (Section 5.7) and conclusions (Section 5.8) are presented.
