Depending on the application scenario, features of both the rows and the columns may be available. In a Recommender System, demographic information is often known about the users and the items can be described by a set of features, too. In [ABEV06], the integration of features is proposed by defining a kernel between rows and columns that integrates features. Another way of introducing features is to use them as a prior for the factors, as studied in [AC09].
Here, we present an integration of features into our matrix factorization framework by adding several linear supervised learning models. Following common supervised machine learning notation, the rows of $X^R \in \mathbb{R}^{r \times d_R}$ shall contain the $d_R$-dimensional feature vectors for the rows of $Y$. The rows of $X^C \in \mathbb{R}^{c \times d_C}$ contain the $d_C$ features of the columns of $Y$. Using these two new matrices, the prediction function from Equation (4.1) can be extended to use column features:

\[ F_{i,j} = \langle R_{i*}, C_{j*} \rangle + \langle W^R_{i*}, X^C_{j*} \rangle \tag{4.46} \]

Here, $W^R \in \mathbb{R}^{r \times d_C}$ is a matrix whose rows are the weight vectors for the column features per row. The very same idea can be applied to row features as well:

\[ F_{i,j} = \langle R_{i*}, C_{j*} \rangle + \langle W^R_{i*}, X^C_{j*} \rangle + \langle W^C_{j*}, X^R_{i*} \rangle \tag{4.47} \]

The matrix $W^C \in \mathbb{R}^{c \times d_R}$ encodes the weight vector for each column.
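A minimal sketch of the feature-extended prediction function in plain Python may help make the index conventions concrete. It assumes that W^R holds one weight vector per row of Y (applied to the column features), that W^C holds one weight vector per column of Y (applied to the row features), and that X^R and X^C are indexed by row and column of Y, respectively. The function name `predict` and the toy values are illustrative and not taken from the thesis.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))


def predict(i, j, R, C, W_R, X_C, W_C, X_R):
    """Prediction F[i][j] with row and column features.

    R[i], C[j] : latent factors of row i and column j
    W_R[i]     : weights of row i, applied to the column features X_C[j]
    W_C[j]     : weights of column j, applied to the row features X_R[i]
    """
    return dot(R[i], C[j]) + dot(W_R[i], X_C[j]) + dot(W_C[j], X_R[i])


# Toy problem (assumed values): r = 2 rows, c = 3 columns,
# 2 latent factors, d_R = 1 row feature, d_C = 2 column features.
R = [[1.0, 0.0], [0.5, 2.0]]
C = [[1.0, 1.0], [0.0, 2.0], [3.0, 0.0]]
X_R = [[1.0], [2.0]]                         # r x d_R
X_C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # c x d_C
W_R = [[0.5, 0.5], [1.0, 0.0]]               # r x d_C
W_C = [[0.1], [0.2], [0.3]]                  # c x d_R

f_12 = predict(1, 2, R, C, W_R, X_C, W_C, X_R)
```

Dropping the last term of `predict` yields the variant that uses column features only.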
As with the offsets, the features can be integrated into the algorithmic procedure described above without any changes. To do so, one extends $C$ with the features from $X^C$ and $R$ with the features from $X^R$. These new entries are masked from optimization and regularization so that their values stay fixed.
To learn the parameters, $R$ is extended by $W^R$ and $C$ by $W^C$. The optimization over $R$ and $C$ includes these new entries and thereby yields the parameter vectors $W^R$ and $W^C$. Note that the Frobenius norm decomposes per entry in $R$ and $C$. Thus, the newly introduced parameters are regularized as if an $L_2$ norm were imposed on them. This essentially recovers a regularized risk minimization problem for these weight vectors.
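The augmentation can be checked numerically. The sketch below assumes that the extended factor matrices are formed by simple column-wise concatenation, [R, W^R, X^R] and [C, X^C, W^C]. It verifies that their plain inner product reproduces the feature-extended prediction, and it builds a mask marking which entries stay trainable and regularized. The helper names, the block order, and the toy values are illustrative assumptions, not the implementation used in this thesis.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def hconcat(*blocks):
    """Concatenate matrices (lists of rows) column-wise."""
    n_rows = len(blocks[0])
    return [[x for block in blocks for x in block[k]] for k in range(n_rows)]


def trainable_mask(n_rows, widths, trainable):
    """Per-entry mask: True where an entry is optimized and regularized."""
    row = []
    for width, flag in zip(widths, trainable):
        row.extend([flag] * width)
    return [list(row) for _ in range(n_rows)]


def masked_sq_frobenius(M, mask):
    """Squared Frobenius norm over the unmasked (trainable) entries only."""
    return sum(x * x for row, mrow in zip(M, mask)
               for x, m in zip(row, mrow) if m)


# Toy problem (assumed values): r = 2, c = 3, 2 factors, d_R = 1, d_C = 2.
R = [[1.0, 0.0], [0.5, 2.0]]
C = [[1.0, 1.0], [0.0, 2.0], [3.0, 0.0]]
X_R = [[1.0], [2.0]]
X_C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_R = [[0.5, 0.5], [1.0, 0.0]]
W_C = [[0.1], [0.2], [0.3]]

# Learned blocks are paired with fixed feature blocks at the same positions.
R_ext = hconcat(R, W_R, X_R)   # [latent | learned weights | fixed features]
C_ext = hconcat(C, X_C, W_C)   # [latent | fixed features  | learned weights]

R_mask = trainable_mask(len(R), [2, 2, 1], [True, True, False])
C_mask = trainable_mask(len(C), [2, 2, 1], [True, False, True])


def direct(i, j):
    """Feature-extended prediction written out term by term."""
    return dot(R[i], C[j]) + dot(W_R[i], X_C[j]) + dot(W_C[j], X_R[i])


# The regularizer only touches the trainable entries, i.e. it equals
# ||R||^2 + ||W_R||^2 for the rows and ||C||^2 + ||W_C||^2 for the columns.
reg_R = masked_sq_frobenius(R_ext, R_mask)
reg_C = masked_sq_frobenius(C_ext, C_mask)
```

Since the masked squared norm is a plain sum over entries, the weight vectors receive exactly the squared-norm penalty that a regularized linear model would impose on them.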
4.7 Conclusion
In this chapter, a generalized matrix factorization model and algorithm have been presented. As shown at the beginning of this chapter (Section 4.1 on page 70), the model underlying this approach can be applied both to recommender systems and to Machine Teaching applications. The presented approach generalizes the state of the art in several respects.
Regularizer: While this thesis focuses on the Frobenius norm as the model regularizer, many more choices such as the L1-norm are possible without leaving the framework presented here. The adaptive regularization can be applied to a wide range of regularizers, too.
Loss Functions: The model and algorithm here can be used with a wide range of loss functions known in the supervised machine learning community. Especially notable is the possibility to use non-smooth loss functions due to the use of the bundle method solver. Even more so, the applicability of matrix factorization to per-row loss functions is a major contribution as it facilitates the optimization of ranking losses in recommender systems. To this end, one novel loss function to directly optimize for the NDCG score and a substantially faster version of the ordinal regression loss function have been presented. Per-row losses are also a crucial ingredient in making matrix factorization a viable method for a large class of Machine Teaching systems.
Hybrid Approach: The inclusion of features as described above extends matrix factorization to what is known as a “hybrid recommender system”: one that can use both the collaborative effect of users interacting on the same set of items and explicitly known features of the users and items. This is an important step in addressing the new user problem, which, as we have shown earlier, arises naturally in Machine Teaching.
The graph kernel and the adaptive regularization make it possible to use additional knowledge embodied in the rating matrix or the matrix representation of an artifact in the Machine Teaching setting.
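To illustrate why the NDCG score mentioned under Loss Functions calls for a per-row treatment, the score of a single row can be sketched as follows. The sketch uses a commonly used definition with gain 2^rel - 1 and a logarithmic position discount. The exact variant and truncation used in this thesis are not restated in this section, so the gain function, the truncation parameter `k`, and the function names should be read as assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of a ranked relevance list, cut off at k."""
    return sum((2.0 ** rel - 1.0) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_ratings, predicted_scores, k):
    """NDCG@k of one row: DCG of the predicted order over DCG of the ideal order."""
    # Rank the items of this row by decreasing predicted score.
    order = sorted(range(len(true_ratings)),
                   key=lambda idx: predicted_scores[idx], reverse=True)
    ranked = [true_ratings[idx] for idx in order]
    # The ideal ranking sorts the items by their true ratings.
    ideal_dcg = dcg_at_k(sorted(true_ratings, reverse=True), k)
    if ideal_dcg == 0.0:
        return 0.0  # convention chosen here for rows without relevant items
    return dcg_at_k(ranked, k) / ideal_dcg


# One row with three rated items (assumed toy values).
ratings = [3, 1, 2]
perfect = ndcg_at_k(ratings, [0.9, 0.1, 0.5], k=3)   # same order as the ratings
reversed_order = ndcg_at_k(ratings, [0.1, 0.9, 0.5], k=3)
```

Because the score depends on the ordering of all entries of a row rather than on each entry in isolation, it cannot be written as a sum of independent per-entry losses.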
The approach presented in this chapter provides a solid base for the construction of Machine Teaching systems for detailed feedback, as the prediction of suggestions for a new or changed artifact is fast and, as shall be shown below, very accurate. The wide range of possible loss functions allows the system to be used in a multitude of Machine Teaching scenarios, and the possibility to use features of the artifact or of its structural elements can further increase this performance.
Next steps in this thesis: Before applying the algorithm in a Detailed Feedback Machine Teaching approach in Chapter 6, we first evaluate it in Chapter 5 on well-established Recommender Systems data sets to compare its performance to the state of the art in that field.
5 Evaluation on Recommender Systems Data
Contents
5.1 Evaluation Setup
5.1.1 Evaluation Measures
5.1.2 Evaluation Procedure
5.1.3 Data Sets
5.2 Results and Discussion
5.2.1 Model Extensions
5.2.2 Ranking Losses
5.1 Evaluation Setup
This chapter presents empirical evaluation results of the algorithm discussed so far in the Recommender Systems scenario before moving on to the application of the algorithm to a Detailed Feedback Machine Teaching example in the next chapter. The reason for evaluating on Recommender Systems data is that this field of research is well established:
• Data sets are readily available.
• The evaluation method including the evaluation measure is widely agreed upon in the field.
• Thus, it is possible to assess the performance of an algorithm in comparison to the related work.
None of this is true for the Machine Teaching area. The purpose of the evaluation presented in this chapter therefore is to establish the general performance of the algorithm before depending upon both the algorithm and its performance in subsequent steps.
Similar to the evaluation presented in Chapter 3, the evaluation is discussed in two steps: First, the evaluation method is introduced, including the measures used, the data, and its pre-processing. The remainder of this chapter then presents empirical results and their discussion for the two main questions to be evaluated:
1. Do the model extensions proposed in Section 4.6 add to the performance of the Recommender System?
2. Do the per-row losses improve the ranking performance of the system?