
In this section, we focus on the computational complexity and running time of the proposed and baseline algorithms. We first analyze the complexity of the algorithms, and then report the running time of each algorithm on the Supermarket dataset to compare them in practice.

In the CD-CCA algorithm, we compute the canonical correlation between the two domains, multiply the projection of the source domain (its canonical variates) by the diagonal matrix of canonical correlations, and project the result back to the target space by multiplying it by the components discovered for the target domain. The complexity of calculating CCA using the approach presented in [44] is $O(Nk(3n + 5m + 2mn))$, in which $N$ is the number of iterations for least squares, $k$ is the number of components (equal to or less than the number of items in the source domain), $n$ is the number of datapoints (users), and $m$ is the number of items in the source domain. The complexity of multiplying the $n \times k$ canonical variate matrix of the source domain by the diagonal $k \times k$ matrix of canonical correlations is $O(nk)$. Lastly, projecting the target domain canonical variates back to the original target domain space costs $O(nkp)$, in which $p$ is the number of items in the target domain. Thus, since we have $k < m$ and $k < p$, the complexity of the CD-CCA algorithm is $O(Nk(3n + 5m + 2mn) + nkp)$.
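The three CD-CCA steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: CCA is computed here by a regularized whitening-plus-SVD route rather than the iterative least squares approach of [44], the back-projection uses the transpose of the target components, and the names `fit_cca`, `cd_cca_predict`, and `reg`, as well as the random toy data, are hypothetical.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA on column-centered X (n x m) and Y (n x p).

    Assumption: a textbook whitening + SVD formulation, chosen for brevity.
    Returns source components (m x k), target components (p x k),
    and the k largest canonical correlations.
    """
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Sx, Sy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, rho, Vt = np.linalg.svd(Sx @ Cxy @ Sy, full_matrices=False)
    return Sx @ U[:, :k], Sy @ Vt[:k].T, rho[:k]

def cd_cca_predict(X, Wx, Wy, rho):
    """Estimate the (centered) target matrix from the source matrix."""
    source_variates = X @ Wx        # n x k canonical variates of the source
    scaled = source_variates * rho  # equals multiplying by diag(rho): O(nk)
    return scaled @ Wy.T            # back to the target space, n x p: O(nkp)

# Toy data (hypothetical): n users, m source items, p target items.
rng = np.random.default_rng(0)
n, m, p, k = 200, 8, 6, 3
X = rng.normal(size=(n, m))
Y = X[:, :p] + 0.5 * rng.normal(size=(n, p))  # target correlated with source
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Wx, Wy, rho = fit_cca(X, Y, k)
Y_hat = cd_cca_predict(X, Wx, Wy, rho)
```

The broadcasted product `source_variates * rho` never forms the $k \times k$ diagonal matrix, which is why that step costs only $O(nk)$, while the final dense product accounts for the $O(nkp)$ term.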

In the large-scale CD-LCCA, the complexity of computing canonical correlations includes the iterations of the LING least squares algorithm and the QR-decompositions of the projections of the original source and target matrices into their small-scale versions. LING costs $O(np(N_2 + k_{pc}))$ in each iteration, in which $N_2$ is the number of iterations to compute $Y_r$ in large-scale CCA using gradient descent, and $k_{pc}$ is the number of singular values that are used for calculating $U_1 U_1^T Y$. Each QR-decomposition takes $O(nk^2)$, in which $k$ is the number of components. Eventually, calculating large-scale CCA will cost $O(Nnp(N_2 + k_{pc}) + Nnk^2)$. Since we are using sparse matrices in Matlab, the multiplications in CD-LCCA depend on the number of nonzero elements in the matrices. In the worst case of multiplying dense matrices, the multiplications will cost $O(npk + nk^2)$. Thus, as a whole, CD-LCCA will cost $O(Nnp(N_2 + k_{pc}) + Nnk^2 + npk)$.
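To see how the individual terms of the two complexity expressions scale, they can be evaluated directly. The sketch below only transcribes the expressions stated above with constant factors dropped; the function names and the example sizes are hypothetical, and the resulting counts are rough guides to which term dominates rather than predictions of wall-clock time.

```python
def cd_cca_cost(N, k, n, m, p):
    """Operation count suggested by O(Nk(3n + 5m + 2mn) + nkp)."""
    return N * k * (3 * n + 5 * m + 2 * m * n) + n * k * p

def cd_lcca_cost(N, N2, k_pc, k, n, p):
    """Operation count suggested by O(Nnp(N2 + k_pc) + Nnk^2 + npk).

    This is the dense worst case; with sparse matrices the products
    depend on the number of nonzero elements instead.
    """
    return N * n * p * (N2 + k_pc) + N * n * k ** 2 + n * p * k

# Hypothetical sizes, chosen only to exercise the formulas.
sizes = dict(N=10, k=5, n=1000, m=200, p=300)
dense_cca = cd_cca_cost(**sizes)
dense_lcca = cd_lcca_cost(N=10, N2=5, k_pc=20, k=5, n=1000, p=300)

# Doubling the number of users n doubles every term of the CD-LCCA expression.
doubled = cd_lcca_cost(N=10, N2=5, k_pc=20, k=5, n=2000, p=300)
```

Because every term of the CD-LCCA expression is linear in $n$, the count grows proportionally with the number of users, whereas the CD-CCA expression contains the product $mn$ inside the iterated term.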

Among the baseline algorithms, SVD++ is the fastest. Since it is implemented for sparse matrices, its complexity depends on the number of nonzero elements in the matrix. So, if $|R_u|$ denotes the number of ratings by user $u$, the complexity of SVD++ is $O(\sum_u |R_u|)$.

Figure 24 shows an example of the running time of CD-CCA on different domain pairs in the Yelp dataset. The X axis shows the size of each domain pair based on its numbers of items and users. It is in logarithmic scale and represents the sum of the user-item rating matrix sizes in the source and target domains ($\log_{10}(nm + np)$). The Y axis shows the running time of CD-CCA in seconds, also in logarithmic scale. The figure shows four examples of domain pairs. As the size of the domain pairs grows, the running time of CD-CCA increases accordingly.
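The two size measures used above are simple to compute. The following sketch uses a hypothetical toy rating dictionary and hypothetical domain sizes; it only illustrates that the sum of per-user rating counts is the number of observed (nonzero) entries, and how the X-axis value of Figure 24 is obtained from the numbers of users and items.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4},
    "u3": {"b": 2, "c": 1, "d": 5},
}

# Sum over users of |R_u|, i.e. the number of observed ratings.
observed_ratings = sum(len(items) for items in ratings.values())

def domain_pair_size(n, m, p):
    """log10 of the summed user-item matrix sizes (n x m source, n x p target)."""
    return math.log10(n * m + n * p)

# Hypothetical domain pair: 1000 users, 200 source items, 800 target items.
x_value = domain_pair_size(1000, 200, 800)
```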

Figure 24: CD-CCA running time in four sample domain-pairs of the Yelp dataset (Restaurant to Food, Shopping to Nightlife, Nightlife to Hotels and Travel, Active Life to Beauty and Spa). X axis: sum of user-item matrix sizes in source and target domains (log10). Y axis: time to complete CD-CCA in seconds (log10). Numbers are in logarithmic scale.

To analyze the algorithms’ running time in practice, we report a sample running time on one of the datasets. We ran all of the algorithms on two similar machines: a MacOS machine with 64GB RAM and two 4-core Intel Xeon 2.26GHz CPUs, and a Linux machine (CentOS) with 64GB RAM and two 4-core Intel Xeon 2.40GHz CPUs. For CD-CCA, RMGM, and CMF, we use the Matlab platform, and for CD-SVD and SD-SVD, we use the GraphChi software. The average running time of each algorithm on one domain pair of the Supermarket dataset is listed in Table 15. As we can see, CD-CCA has the shortest running time, and RMGM is very slow compared to the other algorithms. One reason for the fast running time of CD-CCA is that it can be implemented with full matrices in Matlab, so we can avoid loops in its implementation. However, the large-scale implementation of CD-CCA (CD-LCCA) needs to work with the sparse matrix format in Matlab; it thus uses less memory but is slower. Running CD-LCCA in Matlab on one domain pair of the Imhonet dataset took 21210 seconds (close to 6 hours) on average. Running CD-SVD with GraphChi on one domain pair of the same dataset took almost 4 hours on average.
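The reported times are easier to compare as ratios. The short sketch below uses only the values from Table 15 (Supermarket dataset) and the CD-LCCA time quoted above (Imhonet dataset); the variable names are illustrative, and the two datasets should not be compared with each other directly.

```python
# Average running times in seconds on one Supermarket domain pair (Table 15).
running_time_s = {
    "CD-CCA": 36,
    "CD-SVD": 252,
    "SD-SVD": 176.4,
    "RMGM": 11224,
    "CMF": 295.38,
}

# Slowdown of each algorithm relative to CD-CCA.
baseline = running_time_s["CD-CCA"]
slowdown = {name: t / baseline for name, t in running_time_s.items()}

# CD-LCCA on one Imhonet domain pair, converted from seconds to hours.
cd_lcca_hours = 21210 / 3600

for name, ratio in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.1f}x the CD-CCA running time")
print(f"CD-LCCA on Imhonet: {cd_lcca_hours:.2f} hours")
```

On these numbers, CD-SVD takes 7 times as long as CD-CCA and RMGM more than 300 times as long, which is the gap the text describes as "very slow".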

5.6 SUMMARY

In this chapter of the dissertation, we experimented with different cross-domain and single-domain algorithms on three datasets with various characteristics. We studied the feasibility and benefits of cross-domain recommender algorithms, including our proposed algorithms, CD-CCA and CD-LCCA.

Table 15: Average running time of each algorithm on one domain pair in the Supermarket dataset

Algorithm           CD-CCA   CD-SVD   SD-SVD   RMGM    CMF
Running time (s)    36       252      176.4    11224   295.38

We compared the results of the algorithms on each of the datasets and concluded that CD-CCA is the best-performing algorithm in the Yelp and Imhonet datasets, and RMGM is the best-performing one in the Supermarket dataset. On the other hand, RMGM is the worst-performing algorithm in the Yelp dataset. One possible reason for this inconsistency is the characteristics of the datasets. As we have discussed in Section 3.3.2, the RMGM algorithm has problems finding clear clusters of users and items when the ratings of a dataset are highly skewed. In the Yelp dataset, most of the ratings are on the popular items. The high skewness of the ratings in the Yelp dataset and their low skewness in the Supermarket dataset can be one of the reasons for this inconsistency. In general, rating-based recommender systems, such as Imhonet and Yelp, are more prone to be naturally skewed, while in recommender systems based on “implicit feedback” we see more balance in the feedback on items. Also, the nature of the Supermarket dataset, in which we have the whole data on the purchased items, is inherently different from the other two datasets: in the Yelp and Imhonet datasets, we do not have access to the “consumption” data, e.g. we do not know whether a user has gone to a restaurant or not. We only have the rating information of users, and only if they decide to rate the items that they have consumed.

Another reason may be the way we processed the Supermarket dataset for RMGM. As mentioned in Section 4.3, we had to convert the frequency of purchases to a categorical rating for RMGM. Although we lost some of the precision of the data in this pre-processing, the 10-point categorization in the Supermarket dataset provides more flexibility than the 5-point Likert scale of the Yelp dataset.

The third likely reason is the sparsity of the Yelp dataset compared to the Supermarket dataset. As we have seen in Section 4.4, most of the domains in the Supermarket dataset are denser than the Yelp dataset domains. Again, the sparsity problem occurs more often in rating-based datasets than in implicit feedback ones. In explicit rating feedback, the data passes through a second cognitive decision of the users, e.g. deciding whether they would like to rate the items or not, while in implicit feedback datasets we only see the first cognitive decision of users: to consume (purchase) the item or not. We have mentioned in Section 3.3.2 that RMGM performs poorly on very sparse datasets.

Also, we have seen that CMF is one of the best-performing algorithms on the Yelp dataset and the worst-performing one on the Supermarket dataset. In both datasets, CMF has the highest variance of error, and thus the widest confidence intervals. We hypothesize that the reason behind CMF’s inconsistent performance is the ratio between the number of users and the number of (target) items in the two datasets. As we have seen in Section 4.4, in the Yelp dataset most of the domains have a tall user-item matrix (more users than items). However, the user-item matrices in the Supermarket dataset are usually fat (more items than users). Since CMF tries to find a common user factor matrix between the source and target domains, the flexibility of the item factor matrices of these domains decreases. Consequently, items are represented better when there are fewer items to fit in the item factor matrix.
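The constraint that CMF places on the factor matrices can be made explicit with a worked formula. The following is a generic squared-loss form of collective matrix factorization, given as a sketch under stated assumptions rather than the exact objective used in the experiments: regularization terms are omitted, the loss is written over all entries instead of only the observed ones, and the weight $\alpha$ and the symbols $U$, $V_s$, $V_t$ are introduced here for illustration.

```latex
% X: n x m source user-item matrix, Y: n x p target user-item matrix
% U: n x k user factors shared by both domains
% V_s: m x k source item factors, V_t: p x k target item factors
\min_{U,\,V_s,\,V_t} \;
  \alpha \,\bigl\lVert X - U V_s^{\top} \bigr\rVert_F^2
  + (1 - \alpha)\,\bigl\lVert Y - U V_t^{\top} \bigr\rVert_F^2
```

Because the same $U$ appears in both terms, the item factors $V_s$ and $V_t$ cannot each be paired with their own user factors. In a fat domain, $V_t$ has many rows to fit against this shared $U$, which is consistent with the hypothesis stated above.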

Another interesting observation is the correlation among the errors of the algorithms. We can see that in all of the datasets, if there is a significant correlation between the errors of two cross-domain algorithms, this correlation is positive. However, the correlation between the error of SD-SVD and the errors of the other algorithms varies between the datasets. In the Supermarket dataset, SD-SVD’s error has a positive correlation with the error of CD-SVD, and a negative one with the errors of the rest of the cross-domain algorithms. In the Imhonet dataset, there is no significant correlation between the errors of CD-SVD and CD-CCA. In contrast, the correlation of SD-SVD’s error with the errors of all of the cross-domain algorithms in the Yelp dataset is positive. This hints at the effect that the datasets can have on the performance of cross-domain algorithms: in the Supermarket dataset, cross-domain algorithms perform worse where single-domain algorithms perform better; in the Yelp dataset, this relationship is reversed.

We also compared the running time of our proposed algorithms with the baseline algorithms. We concluded that CD-CCA is the fastest algorithm on the average-sized data. SD-SVD and CD-SVD are the next fastest, and CMF is slower than these two algorithms. Among all of the algorithms, RMGM is by far the slowest. On the large-scale dataset, CD-SVD and SD-SVD are faster than CD-LCCA. However, the running time of CD-LCCA is reasonable given the size of the data. On the other hand, CMF and RMGM are very slow on the large-scale dataset, so using these two algorithms on large datasets is not practical. Therefore, although RMGM performed better than CD-CCA in terms of estimated error in one of the datasets, it may not be practical to use it on large datasets because of its running time.

In summary, the goal of this chapter was to answer the first part of our first research question (Q.1.1): to understand whether the benefit gained from cross-domain recommenders is due to the extra data, the better algorithm, or both.

We have seen that cross-domain algorithms mostly perform better than, or similar to, the single-domain algorithm. Of the 158 + 50 + 12 = 220 domain pairs from the three datasets, SD-SVD performed significantly better than all of the cross-domain algorithms in only 8 cases. These 8 domain pairs were all part of the Supermarket purchase dataset. In the rest of the domain pairs, there was at least one cross-domain algorithm that performed significantly better than, or similar to, SD-SVD.

Nevertheless, we have seen that cross-domain recommenders do not always increase the quality of recommendation results. In some cases, the cross-domain recommender algorithms did not improve the results compared to the single-domain algorithm; their results were simply not significantly worse than those of SD-SVD.

Overall, we conclude that cross-domain recommender systems are feasible and can be beneficial in some of the domain pairs and datasets.

Also, we have seen that the benefit of these recommender systems, compared to the single-domain recommender, comes from both the additional data available to them and the approach they use to utilize this additional information. The CD-SVD algorithm, which uses the cross-domain setup and the single-domain approach, has performed significantly better than SD-SVD in some of the domains of all of the datasets. We attribute this behavior to the extra information that CD-SVD had compared to SD-SVD. However, we have seen that in many cases in which SD-SVD performed significantly better than CD-SVD, the other cross-domain algorithms outperformed SD-SVD. In these cases, the additional information alone is not enough to produce better recommendations, but better approaches that use this extra information efficiently result in less error and better recommendations.

In later chapters, we explore the conditions that lead to better performance of cross-domain recommender systems compared to the single-domain ones.
