Table 3.5: Average Mean and Average Variance per User
Model              Mean     Variance
Item-based         3.5961   0.3317
User-based 200     3.6136   0.3774
User-based 500     3.5639   0.2673
SVD                3.5610   0.3354
Observed ratings   3.5827   1.0648
she will agree on a different one. In contrast, if users rated two different movies similarly, that is a more reliable basis for concluding that the movies are similar, because similarity between items is more static than similarity between users, and the similarity between two items can probably be established from fewer common ratings than the similarity between two users. This suggests that item similarity is more expressive and reliable than user similarity. User similarity involves more dimensions of taste and depends on the particular items rated, so a single number might not be able to express it, especially if it is based on little information.
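The concern about similarities "based on little information" can be made concrete. The sketch below is one common way to encode it and is not the similarity measure used in this thesis: a cosine similarity between two items computed over the users who rated both, scaled by n / (n + shrink) so that scores supported by only a few co-ratings are pulled towards zero. The dictionary layout and the `shrink` constant are hypothetical choices made for illustration.

```python
import math

def shrunk_item_similarity(ratings, i, j, shrink=10.0):
    """Cosine similarity between items i and j over users who rated both,
    shrunk towards zero when only a few users co-rated them.

    ratings: dict mapping user -> dict mapping item -> rating (hypothetical layout).
    shrink:  hypothetical constant; larger values distrust small overlaps more.
    """
    common = [u for u, r in ratings.items() if i in r and j in r]
    n = len(common)
    if n == 0:
        return 0.0  # no co-ratings: no evidence of similarity
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    cosine = dot / (norm_i * norm_j)  # ratings on a 1-5 scale, so norms are positive
    # Estimates built on few co-ratings are scaled down towards 0.
    return cosine * n / (n + shrink)

# Two items that agree perfectly, once over 2 users and once over 40 users.
small = {"u1": {"a": 4, "b": 4}, "u2": {"a": 2, "b": 2}}
large = {f"u{k}": {"a": 4, "b": 4} for k in range(40)}
```

With the hypothetical `shrink=10`, perfect agreement between two co-rating users scores about 0.17, while the same agreement over 40 users scores 0.8.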
The other factor that can be investigated further is the effect of variance on recommendation. Variance shows the extent to which users agree on an item, which is a good indication of whether an item is easy or difficult to predict. Table 3.5 shows that the mean and variance of the predictions differ across models, and that the predicted variance is generally much lower than the variance of the observed ratings for the same data. This is because each model predicts items with low variance confidently (Figure 3.6 (c)), whereas items with higher variance are predicted closer to their mean, which accounts for the higher error and the increased variance of the predictions for these items. This suggests that personalisation does not work well for items with high variance. This observation may also explain why recommender systems tend to promote popular items, which generally have lower variance (discussed later in Chapter 4).
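Table 3.5 reports the mean and the variance of each user's ratings, averaged over all users. A minimal sketch of that computation follows, together with a hypothetical predictor that pulls every rating a fraction `alpha` of the way towards the user's mean. It illustrates why predictions that regress towards the mean show a much smaller variance than the observed ratings: shrinking by `alpha` scales each user's variance by (1 - alpha)^2 while leaving the mean unchanged. Population variance and the dict-of-lists layout are assumptions; the text does not specify either.

```python
from statistics import mean, pvariance

def average_mean_and_variance(per_user_ratings):
    """Average over users of (mean, population variance) of each user's ratings.

    per_user_ratings: dict mapping user -> non-empty list of ratings
    (observed or predicted).
    """
    means = [mean(r) for r in per_user_ratings.values()]
    variances = [pvariance(r) for r in per_user_ratings.values()]
    return mean(means), mean(variances)

def shrink_towards_mean(ratings, alpha):
    """Hypothetical predictor: move each rating a fraction alpha towards the user's mean."""
    m = mean(ratings)
    return [m + (1 - alpha) * (r - m) for r in ratings]

# Toy observed ratings for two users (hypothetical data).
observed = {"u1": [1, 3, 5, 5], "u2": [2, 4, 4, 2]}
predicted = {u: shrink_towards_mean(r, alpha=0.5) for u, r in observed.items()}

obs_mean, obs_var = average_mean_and_variance(observed)
pred_mean, pred_var = average_mean_and_variance(predicted)
```

Here both averages of the mean equal 3.25, while the average variance drops from 1.875 for the observed ratings to 0.46875 for the shrunk predictions, mirroring the gap between the model rows and the "Observed ratings" row of the table.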
Therefore, the aim is to improve accuracy for three distinct groups: items and users with a small number of ratings, which is referred to as the cold-start problem (Section 2.4.1); items and users with a lower mean rating (the issue identified in Section 3.1); and items and users with high variance, as suggested in the previous paragraph, which is also part of the bigger picture discussed in Section 4.3.2.
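The three groups can be flagged directly from the observed ratings. The sketch below does this for items; the same function applies to users if it is given each user's list of ratings instead. The thresholds `min_count`, `low_mean` and `high_var` are hypothetical placeholders, not values used in this thesis.

```python
from statistics import mean, pvariance

def flag_hard_items(item_ratings, min_count=20, low_mean=3.0, high_var=1.5):
    """Flag each item as cold-start, low-mean and/or high-variance.

    item_ratings: dict mapping item -> list of observed ratings.
    All three thresholds are hypothetical and would need tuning per dataset.
    """
    flags = {}
    for item, ratings in item_ratings.items():
        if not ratings:
            # No ratings at all: cold-start by definition, other statistics undefined.
            flags[item] = {"cold_start": True, "low_mean": False, "high_variance": False}
            continue
        flags[item] = {
            "cold_start": len(ratings) < min_count,
            "low_mean": mean(ratings) < low_mean,
            "high_variance": len(ratings) > 1 and pvariance(ratings) > high_var,
        }
    return flags

# Toy data: a well-liked item, a rarely rated disliked item, a polarising item.
items = {"a": [5] * 25, "b": [1, 2, 1], "c": [1, 5] * 15}
flags = flag_hard_items(items)
```

In this toy example, item "b" is both cold-start and low-mean, while item "c" has many ratings and a middling mean but is flagged for its high variance.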
3.6 Conclusion
We introduced the first example of goal-driven design in this chapter. The approach presented here emphasises the risk of making an incorrect recommendation and identifies items accordingly; the algorithm then aims to optimise its performance on those items. This approach can be fine-tuned further by considering how the items will be presented to the user. Depending on the goal, certain specialised metrics can be favoured. For example, if a user would like just one item recommended, the algorithm is best optimised for MRR, whereas if the user would like several items recommended, it would be better to optimise it for NDCG. In addition, different strategies depending on user needs could be identified, and the risk preference could be switched accordingly, altering the recommendations. The choice of parameters can be tailored to each user's needs, penalising the regions of the rating scale that are more important to predict correctly for that specific user. For example, the mean of all ratings given by a particular user suggests where that user's taste boundary lies, so the boundary can be determined for each user individually. On a global level (across all users), strategies such as concentrating on popular items (defined by their average rating or their frequency of ratings) would help to increase the quality of recommendations for all users, whereas concentrating on items that are hard to predict (e.g. items with a high variance of ratings) would help to reduce the overall error of the system.
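For reference, a minimal sketch of the two metrics mentioned above. MRR rewards placing the first relevant item near the top, which suits single-item recommendation, while NDCG rewards a good ordering of the whole list. We use linear gains and the usual log2 position discount; the function names and data layout are illustrative assumptions, and other NDCG variants (e.g. exponential gains) exist.

```python
import math

def reciprocal_rank(ranked_items, relevant):
    """1 / rank of the first relevant item in the list, or 0 if none appears."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, relevant_sets):
    """MRR: reciprocal rank averaged over users (one ranking per user)."""
    rrs = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(rrs) / len(rrs)

def ndcg_at_k(ranked_items, gains, k):
    """NDCG@k with linear gains and a log2 discount.

    gains: dict mapping item -> graded relevance (e.g. the observed rating).
    """
    def dcg(items):
        # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), ...
        return sum(gains.get(item, 0.0) / math.log2(pos + 2)
                   for pos, item in enumerate(items[:k]))
    ideal = sorted(gains, key=gains.get, reverse=True)  # best possible ordering
    ideal_dcg = dcg(ideal)
    return dcg(ranked_items) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For instance, `mean_reciprocal_rank([["a", "b"], ["c", "d"]], [{"a"}, {"d"}])` averages 1 and 1/2, and `ndcg_at_k` returns 1.0 only when the list is ordered by decreasing gain.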
As discussed above, all of these measures consider only relevant items, but for our purposes it is also important to minimise error on disliked items (rated one or two). Thus, we would like to measure how the algorithm performs at both ends of the rating scale. In both cases, the middle of the range (items rated three) would be considered non-relevant. The two scores could then be combined, giving more weight to the high-rated list than to the low-rated one.
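A minimal sketch of such a two-sided score, assuming binary relevance (ratings of four or five count as liked, one or two as disliked, three as non-relevant on both sides), NDCG@k on each side, and a hypothetical weight `w_high = 0.7` favouring the high-rated list. The low-rated list is obtained by ranking items in ascending order of predicted rating, so a good model places the disliked items at the bottom of its ranking.

```python
import math

def binary_ndcg(ranked, relevant, k):
    """NDCG@k with binary relevance: 1 for items in `relevant`, 0 otherwise."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

def two_sided_score(predicted, observed, k=10, w_high=0.7):
    """Weighted combination of NDCG on liked items and on disliked items.

    predicted, observed: dict mapping item -> rating for a single user.
    w_high: hypothetical weight (> 0.5 favours the high-rated list).
    """
    liked = {i for i, r in observed.items() if r >= 4}
    disliked = {i for i, r in observed.items() if r <= 2}
    # Items rated 3 are in neither set, so they are non-relevant on both sides.
    top = sorted(predicted, key=predicted.get, reverse=True)
    bottom = sorted(predicted, key=predicted.get)
    high = binary_ndcg(top, liked, k)
    low = binary_ndcg(bottom, disliked, k)
    return w_high * high + (1 - w_high) * low

observed = {"a": 5, "b": 3, "c": 1}
good = {"a": 4.8, "b": 3.1, "c": 1.2}  # ranks the liked item first, the disliked item last
bad = {"a": 1.0, "b": 3.0, "c": 5.0}   # reverses the order
```

A model that orders the user's items correctly scores 1.0 on both sides, whereas the reversed model here pushes the liked and disliked items to the wrong ends and scores 0.5.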
It is widely argued that accuracy alone is not sufficient to measure whether a recommender system provides an effective and satisfying experience [HKTR04]. It is also important to note that the data are not homogeneous: in terms of prediction, we can distinguish between easy and difficult items as well as between easy and difficult users.
Here, the direction of errors was considered only for items and was applied uniformly to all users. In a similar fashion, users can differ from each other in their risk preferences, which could be defined for each user so that the direction of errors is modelled per user. This would introduce another layer into the model that considers risk preferences per item first and, above this level, per user, so that the penalty for a particular item would be determined by the risk preference of the user to whom the item will be recommended. Risk preferences could be mined from the user profile, for example from the user's previous rating strategy or from how diverse the items the user has previously rated are.
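A minimal sketch of the layered penalty described above, assuming that over-estimation (predicting a higher rating than the true one) is the costlier direction and that the item-level and user-level factors combine multiplicatively. Both `item_over_penalty` and `user_risk` are hypothetical parameters, and the squared-error form is an illustrative choice; how the factors would be mined from a user profile is left open, as in the text.

```python
def layered_asymmetric_loss(pred, true, item_over_penalty, user_risk):
    """Squared error with an extra, two-level penalty on over-estimation.

    item_over_penalty: hypothetical per-item factor (> 1 makes over-predicting
                       this item costlier than under-predicting it).
    user_risk:         hypothetical per-user multiplier (> 1 for a risk-averse user).
    """
    err = pred - true
    if err > 0:
        # Over-estimation: the user may be shown an item they would dislike,
        # so the item-level penalty is scaled by the user's risk preference.
        weight = item_over_penalty * user_risk
    else:
        # Under-estimation (or exact prediction) keeps the plain squared error.
        weight = 1.0
    return weight * err ** 2

# Same item and same absolute error, seen by a risk-averse and a risk-neutral user.
averse = layered_asymmetric_loss(4.0, 2.0, item_over_penalty=2.0, user_risk=1.5)
neutral = layered_asymmetric_loss(4.0, 2.0, item_over_penalty=2.0, user_risk=1.0)
under = layered_asymmetric_loss(2.0, 4.0, item_over_penalty=2.0, user_risk=1.5)
```

An over-estimate of two stars thus costs 12.0 for the risk-averse user and 8.0 for the risk-neutral one, while the corresponding under-estimate costs 4.0 for either.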
Chapter 4
Optimising Multiple Objectives
In the previous chapter, we focused on some aspects of the main performance measure with respect to a certain user-centred goal. The main focus of this chapter is to investigate this further and to understand how multiple goals and their specialised metrics can be framed. We illustrate a multiple-goal optimisation approach that not only considers the predicted preference scores (e.g. ratings) but also deals with additional operational or resource-related recommendation goals, based on the goal-driven design outlined in Section 1.2. We start the chapter with an example of a goal-driven approach where the objectives are directly connected to the performance of the system. Here we study whether it is feasible to use recommender systems to optimise digital content delivery by predicting which items will be requested and pre-caching them near the target user. This problem is framed as an external, system-centric goal. In the second part of the chapter, this is extended to multiple goals where the objectives might not be directly linked to the general performance of the system. Using this framework, we demonstrate through realistic examples how to extend existing rating prediction algorithms by biasing the recommendations according to other goals, such as the availability, profitability or usefulness of an item. In the last part of the chapter, we set out to make recommendations more diverse, novel and serendipitous at the same time, at a slight cost to accuracy, using an internal optimisation approach. To some extent, these goals might complement each other, so that the combination of goals, measured by specialised metrics, would provide the best user experience.
4.1 Problem Statement
To build a practical recommender system, providing items that fit the target user's taste (recommendation accuracy) is not the only concern. Users' satisfaction also depends on the utility of the recommendations in accomplishing a certain information-seeking task. Additionally, in a practical operational environment there might be other factors that affect the effectiveness of the whole system. For example, many recommendation algorithms use, either explicitly or implicitly, the Root Mean Square Error as the objective function [Kor08, KBV09]; a typical case is the Netflix competition. To reduce the error, the algorithm has to focus on the popular items (in the training phase): popular items account for most of the ratings, and hence for most of the squared error, so concentrating on them minimises the overall error of the system. As a result, the algorithm is more likely to recommend mainstream items, which might already be known to the user. However, recommending these items is
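The bias towards popular items follows from how RMSE aggregates errors: every rating contributes one squared-error term, so an item's share of the total error grows with its number of ratings. The sketch below computes RMSE and each item's share of the squared error on a toy example; the `(item, predicted, observed)` triple layout and the data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def rmse(triples):
    """Root Mean Square Error over (item, predicted, observed) triples."""
    return math.sqrt(sum((p - o) ** 2 for _, p, o in triples) / len(triples))

def squared_error_share(triples):
    """Fraction of the total squared error contributed by each item."""
    per_item = defaultdict(float)
    for item, p, o in triples:
        per_item[item] += (p - o) ** 2
    total = sum(per_item.values())
    return {item: e / total for item, e in per_item.items()}

# Toy data (hypothetical): a popular item with nine ratings and a niche item
# with one rating, each prediction off by exactly one star.
triples = [("popular", 4.0, 3.0)] * 9 + [("niche", 2.0, 1.0)]

baseline = rmse(triples)
shares = squared_error_share(triples)

# Option 1: predict the niche item perfectly. Option 2: halve the popular item's error.
fix_niche = [t if t[0] == "popular" else ("niche", 1.0, 1.0) for t in triples]
halve_popular = [("popular", 3.5, 3.0) if t[0] == "popular" else t for t in triples]
```

The popular item carries 90% of the squared error here, so removing the niche item's error entirely only lowers RMSE from 1.0 to about 0.95, whereas halving the error on the popular item lowers it to about 0.57.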