The parameters held constant for the first experiment are:
• Pearson correlation is used as the similarity measure (option 1 for sim).
• The means of test users and neighbours are calculated over co-rated items (option 0 or 1 for the predict option, P).
• The rating threshold, that is, the number of ratings a user must have in order to be included as a test user, remains constant for each dataset (10 for the MovieLens, last.fm and Epinions datasets, and 1 for the bookcrossing dataset).
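The two fixed similarity settings and the rating threshold can be illustrated with a minimal Python sketch. It assumes each user's ratings are stored as a dictionary mapping item ids to ratings; this layout and the names `pearson_corated` and `eligible_test_users` are hypothetical, not taken from the experiments. Both means are computed over the co-rated items only, and the number of co-rated items is returned as well because the significance threshold depends on it.

```python
from math import sqrt
from typing import Dict, Hashable, List, Tuple

Ratings = Dict[Hashable, float]  # item id -> rating (assumed layout)


def pearson_corated(a: Ratings, b: Ratings) -> Tuple[float, int]:
    """Pearson correlation of two users over their co-rated items.

    Both means are taken over the co-rated items only. Returns
    (correlation, number of co-rated items); 0.0 when it is undefined.
    """
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0, n
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return (num / den if den else 0.0), n


def eligible_test_users(ratings: Dict[Hashable, Ratings], threshold: int) -> List[Hashable]:
    """Users with at least `threshold` ratings (10 or 1 in the experiments)."""
    return [u for u, r in ratings.items() if len(r) >= threshold]
```

For instance, two users whose ratings on three shared items differ by a constant offset have a correlation of (approximately) 1.0 over those three items.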
Therefore the parameters considered by the genetic algorithm are:
• sigT, the significance threshold value, for dampening similarity scores of users with a small number of co-rated items.
• P, the prediction option used, in this case, whether top-N is used (1) or not (0).
• N, the top-N value when top-N is selected (i.e., when P is 1).
• corrT, the correlation threshold value when correlation thresholding is used (i.e., when P is 0).
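How the four parameters interact when neighbours are chosen can be sketched in Python as below. The exact dampening formula is not given in the text; this sketch assumes a common significance-weighting scheme that scales a similarity by n/sigT when only n < sigT items are co-rated. The function names and the example similarities are hypothetical.

```python
from typing import Dict, List, Optional


def dampen(sim: float, n_corated: int, sig_t: int) -> float:
    """Assumed significance weighting: scale the similarity by
    n_corated / sigT when fewer than sigT items are co-rated."""
    if sig_t <= 0:
        return sim
    return sim * min(n_corated, sig_t) / sig_t


def select_neighbours(
    sims: Dict[str, float],
    p: int,
    n: Optional[int] = None,
    corr_t: Optional[float] = None,
) -> List[str]:
    """sims maps candidate neighbour -> (dampened) similarity.

    p == 1: top-N, keep the n most similar users.
    p == 0: correlation thresholding, keep users with similarity > corr_t.
    """
    if p == 1:
        ranked = sorted(sims, key=sims.get, reverse=True)
        return ranked[:n]
    return [u for u, s in sims.items() if s > corr_t]


# Illustrative values only: u2 correlates well but shares just 3 items.
sims = {"u1": dampen(0.9, 30, 29), "u2": dampen(0.6, 3, 29), "u3": dampen(0.05, 40, 29)}
print(select_neighbours(sims, p=1, n=2))
print(select_neighbours(sims, p=0, corr_t=0.04))
```

With sigT = 29, the strong raw similarity of u2 (0.6) is shrunk to roughly 0.06 because it rests on only three co-rated items, which is the effect the significance threshold is meant to have.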
A population size of 20 is used and results are presented after 12 generations of the genetic algorithm.
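A genetic algorithm of this size can be sketched in Python as follows. The text specifies only the population size, the number of generations, the four genes and the MAE fitness, so everything else here is an assumption: the search bounds, tournament selection, uniform crossover, the mutation rate and elitism. `toy_fitness` is a hypothetical stand-in for running collaborative filtering and returning its MAE.

```python
import random

# Hypothetical search ranges; the text does not state the bounds used.
BOUNDS = {"sigT": (1, 50), "N": (1, 300), "corrT": (0.0, 1.0)}


def random_individual(rng):
    return {
        "sigT": rng.randint(*BOUNDS["sigT"]),
        "P": rng.randint(0, 1),
        "N": rng.randint(*BOUNDS["N"]),
        "corrT": rng.uniform(*BOUNDS["corrT"]),
    }


def mutate(ind, rng, rate=0.2):
    """Resample each gene with probability `rate` (P is flipped)."""
    child = dict(ind)
    for key in child:
        if rng.random() < rate:
            if key == "P":
                child[key] = 1 - child[key]
            elif key == "corrT":
                child[key] = rng.uniform(*BOUNDS["corrT"])
            else:
                child[key] = rng.randint(*BOUNDS[key])
    return child


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def tournament(pop, scores, rng, k=3):
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def run_ga(fitness, pop_size=20, generations=12, seed=0):
    """Minimise fitness (e.g. MAE); return every evaluated (score, individual)
    pair, sorted by score, so the best set across all generations can be read off."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        history.extend(zip(scores, pop))
        children = [pop[scores.index(min(scores))]]  # elitism: keep the best
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return sorted(history, key=lambda pair: pair[0])


def toy_fitness(ind):
    # Hypothetical stand-in for "run collaborative filtering and return MAE".
    return abs(ind["sigT"] - 25) / 50 + (ind["corrT"] if ind["P"] == 0 else ind["N"] / 300)


best_score, best_params = run_ga(toy_fitness)[0]
```

Returning the whole evaluation history, rather than only the final population, matches the idea of reporting the best individuals found across all generations.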
Results for the four datasets are summarised in Table 4.2, which shows the best set of individuals found across all generations, that is, the individuals with the lowest MAE.
Table 4.2: Experiment 1: Learning 4 parameters.
Dataset       sigT  P  N    corrT  GA MAE
MovieLens     29    1  196  n/a    0.685
bookcrossing  16    0  n/a  0.09   6.79
last.fm       11    0  n/a  0.064  0.6735
Epinions      9     0  n/a  0.04   2.8
Where a parameter value is not applicable (e.g., a value for N when P is 0, i.e., top-N is not chosen), 'n/a' is inserted. The final column holds the average fitness score (MAE) of the best set of individuals found by the genetic algorithm. A set is used rather than the single best individual, with values averaged over the set where appropriate, because there are often only small differences across the top set of individuals (e.g., in the threshold value).
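The set-based summary can be sketched in Python as follows. The text says only that values are averaged over the top set "where appropriate", so the details are assumptions: the set size `k`, restricting the averages to individuals that share the most common value of P, rounding the integer parameters, and reporting 'n/a' for the parameter that P makes irrelevant. `summarise_best_set` is a hypothetical name.

```python
from statistics import mean, mode


def summarise_best_set(ranked, k=5):
    """ranked: list of (mae, individual) pairs sorted by ascending MAE.

    Averages each parameter over the k best individuals, skipping the
    parameter that is not applicable for the chosen prediction option P.
    """
    top = ranked[:k]
    p = mode(ind["P"] for _, ind in top)  # assumed: most common prediction option
    same_p = [(s, ind) for s, ind in top if ind["P"] == p]
    return {
        "sigT": round(mean(ind["sigT"] for _, ind in same_p)),
        "P": p,
        "N": round(mean(ind["N"] for _, ind in same_p)) if p == 1 else "n/a",
        "corrT": mean(ind["corrT"] for _, ind in same_p) if p == 0 else "n/a",
        "GA MAE": mean(s for s, _ in same_p),
    }
```

Averaging over a small set smooths out the minor differences between near-equal individuals, such as slightly different threshold values, instead of reporting one of them arbitrarily.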
As Table 4.2 shows, P is 0 for every dataset except MovieLens, meaning that correlation thresholding, rather than top-N, is used to select neighbours. The correlation threshold is very low in all three cases (between 0.04 and 0.09), so almost any user whose Pearson correlation with the test user exceeds this small corrT value is chosen as a neighbour. For the MovieLens dataset, where top-N is selected, the number of neighbours chosen is quite high (196).
For the bookcrossing dataset, the MAE is very high compared with the other datasets. For many test users in this dataset, the data is too sparse to find neighbours similar to the test user. The usual approach in this case is to return the test user's average rating as the prediction, as this is better than returning zero or no rating. From a learning perspective, however, this would be misleading, because some good results could then be due to factors outside those being tested by the genetic algorithm. Therefore, when no neighbours can be found, a prediction of zero is returned. This inflates all MAE scores, and the effect is particularly noticeable in the bookcrossing dataset. Because the genetic algorithm always seeks the lowest score and the runs for each dataset are independent of one another, it does not affect the results if the best MAE for one dataset is much higher than that for another.
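The effect of the zero-prediction fallback on MAE can be shown with a minimal Python sketch. The predictor here is a simplified similarity-weighted average of neighbours' ratings, an assumption rather than the exact prediction formula used in the experiments; `predict` and `mae` are illustrative names.

```python
def predict(neighbours, user_mean, fallback="zero"):
    """neighbours: list of (similarity, rating) pairs from users who rated the item.

    Uses a similarity-weighted average (an assumed, simplified predictor).
    With no usable neighbours, returns 0 (as in the experiments) or,
    with fallback="mean", the test user's average rating.
    """
    weight = sum(abs(s) for s, _ in neighbours)
    if not neighbours or weight == 0:
        return 0.0 if fallback == "zero" else user_mean
    return sum(s * r for s, r in neighbours) / weight


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# A test user with mean rating 7 rated two items 8 and 6; only the first has neighbours.
zero = mae([(predict([(0.5, 8), (0.5, 7)], 7.0), 8), (predict([], 7.0), 6)])
avg = mae([(predict([(0.5, 8), (0.5, 7)], 7.0, "mean"), 8), (predict([], 7.0, "mean"), 6)])
```

In this toy case the zero fallback gives an MAE of 3.25 against 0.75 for the mean fallback: the single item without neighbours dominates the score, which is why sparse datasets such as bookcrossing show much higher MAE values.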
The MAE returned by the genetic algorithm gives an indication of how the parameters performed across a number of different runs. To test the parameters more fully, further collaborative filtering runs were performed and several evaluation metrics were used. In particular, ten runs were carried out for each dataset using the best set of parameters, and MAE, coverage, F1 and precision-at-1 were calculated.
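The additional metrics can be sketched in Python as follows. The text does not define them precisely, so these are common definitions used as assumptions: coverage is the percentage of test items for which a prediction can be made, an item counts as relevant if the test user rated it at or above a hypothetical `like_threshold`, F1 combines precision and recall over the recommended items, and precision-at-1 checks only the top-ranked recommendation.

```python
def coverage(predictions):
    """predictions: dict item -> predicted rating, or None when no neighbours exist."""
    made = sum(1 for p in predictions.values() if p is not None)
    return 100.0 * made / len(predictions)


def relevant_items(user_ratings, like_threshold):
    """Items the test user rated positively (threshold is a hypothetical choice)."""
    return {i for i, r in user_ratings.items() if r >= like_threshold}


def f1(recommended, relevant):
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def precision_at_1(ranked, relevant):
    """1.0 if the highest-ranked recommendation is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0
```

Per-user values would then be averaged over test users and over the ten runs to give dataset-level figures.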
Table 4.3: Experiment 1: Further Evaluation of Best Set of GA Parameters.
Dataset       GA MAE  Avg. MAE  Avg. Coverage  F1      Precision-at-1
MovieLens     0.685   0.721     99.37%         0.6367  1
bookcrossing  6.79    1.46      8.08%          0.0382  1
last.fm       0.6735  0.802     20.38%         0.89    1
Epinions      2.8     0.884     35.35%         0.166   1
The results in Table 4.3 indicate that, for the MovieLens and last.fm datasets, the average MAE over 10 runs is higher than the best found by the genetic algorithm, whereas the opposite is true for the bookcrossing and Epinions datasets, where the average MAE over 10 runs is much lower than the best found by the genetic algorithm. The coverage results show that coverage is low for all but the MovieLens dataset, and particularly low for the bookcrossing dataset (8.08%). The F1 results show a somewhat different trend from the MAE results: the last.fm and MovieLens datasets have good F1 scores, but the bookcrossing and Epinions datasets do not. This is not surprising for the bookcrossing dataset, but it is more surprising for the Epinions dataset. It may be explained by the high GA MAE for Epinions (2.8), which suggests that the solution found is not good for all test users. The precision-at-1 results show that, for all datasets and all test users, the system is correct in the highest-rated item it recommends (i.e., the test user has rated that item positively).