Sheridan College SOURCE: Sheridan Institutional Repository

Recommender systems are software solutions for finding high-quality, relevant content for a specific type of user, ranging from online shoppers to music listeners to video game players. Traditional recommender systems use user review data to provide recommendations, but we still want recommendations to work well for new users without review data. One current problem in recommendation is poor accuracy when only a small amount of data is available for a user, known as the cold start problem.

In this research, we investigate solutions to the cold start problem in video game recommendations and propose a solution that uses a hybrid neural network and keyword ranking approach. We evaluate this system using precision and recall metrics and compare the results to a traditional recommender system.

INTRODUCTION

Terms and Definitions
Problem Statement
Purpose of the Study
Motivation
Proposed Work
Thesis Statement
Significance of the Study
Organization of the Thesis
Conclusion

Content-based filtering: a recommender system algorithm in which predictions are based on item descriptions and a user preference profile. Traditional recommender systems use data about the behavior of previous users of the system to predict content for future users. Recommender systems are popular in online web applications and e-commerce platforms, but they could be improved, especially in how they handle the cold start problem.

We implement a content-based filtering method in our nearest neighbor algorithm that ranks video games based on their category keywords. An early study of content-based filtering measured a prediction accuracy of 60%, which paled in comparison to the 70% accuracy of collaborative filtering-based recommendation systems. Our research begins with a review of the current literature surrounding recommendation systems and their popular implementation methods.

The review discusses machine learning-based recommendation methods and the details of the cold start problem.

LITERATURE REVIEW

Recommendation Systems
Collaborative Filtering
Matrix Factorization
Content-Based Filtering
Cold-Start Problem
Hybrid Systems
Implementation Methods
Applications of Recommendation Systems
Conclusion

Traditional recommender systems use collaborative filtering approaches [1] to suggest related content to a user; these approaches work by using general user data to make recommendations. A collaborative filtering model predicts recommendations for future users based on data from previous users. Matrix factorization is a popular collaborative filtering algorithm that likewise uses data from previous users to make predictions about future users.
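As a rough illustration of matrix factorization, the following minimal sketch learns user and item factor vectors from a handful of known ratings with stochastic gradient descent. It is not the model used in this research: the toy ratings, the function names (`factorize`, `predict`, `squared_error`), and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def squared_error(ratings, P, Q):
    """Total squared error over the observed (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn k-dimensional user factors P and item factors Q with SGD.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
toy_ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
P, Q = factorize(toy_ratings, n_users=3, n_items=3)
# predict(P, Q, 0, 2) estimates a rating that user 0 never gave.
```

Once trained, the dot product of any user vector and item vector gives an estimated rating, including for user and item pairs that were never observed.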

This is how music recommendations are made on Spotify [35], and this technique is a major inspiration for the collaborative filtering aspect of our model. Joonseok concluded in the study that this is a useful tool to work alongside traditional collaborative filtering approaches. A solution to the cold start problem was proposed by Zi-Ke Zhang et al.

We have seen good performance from both content-based filtering and collaborative filtering recommendation system algorithms. Collaborative filtering systems are powerful because they are better at detecting highly popular content rather than just content that is similar. These systems are limited, however, by the cold start problem mentioned earlier.

Most recommendation algorithms start by finding a set of customers whose purchased and rated items overlap with the user's purchased and rated items (e.g., collaborative filtering and clustering models). Instead of using these systems, Amazon developed an item-to-item collaborative filtering algorithm, one that focuses on finding similar items, not similar customers. Unlike traditional collaborative filtering, which is computationally heavy, Amazon's item-to-item algorithm uses online computation that scales independently of the number of customers and the number of items in the product catalog.
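The item-focused idea can be sketched as follows. This is a simplified illustration, not Amazon's production algorithm: a similar-items table is built ahead of time from co-purchase data, so that recommending for a user only requires looking up the items that user already owns. The data format, the function names (`item_similarities`, `recommend`), and the choice of cosine similarity over sets of buyers are assumptions.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(purchases):
    """Build an item-to-item similarity table.

    purchases: dict mapping customer id -> set of item ids (hypothetical format).
    Similarity is the cosine between the sets of customers who bought each item.
    """
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)
    items = sorted(buyers)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            common = len(buyers[a] & buyers[b])
            if common:
                score = common / sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(sims, owned, k=3):
    """Score unowned items by their summed similarity to the owned items."""
    scores = defaultdict(float)
    for item in owned:
        for other, s in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += s
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]


# Hypothetical toy purchase histories.
toy_purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
}
table = item_similarities(toy_purchases)
suggestions = recommend(table, owned={"A"})
```

Because the expensive pairwise comparison happens when the table is built, the per-user step touches only the entries for items the user owns.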

The neural networks use user-to-user collaborative filtering (as opposed to Amazon's item-to-item filtering algorithm). MovieLens is an item-to-item collaborative filtering recommender system [9], inspired by Amazon's algorithm, that recommends relevant movies to users based on user behavior data.

Figure 2-1: Music Recommendations Via Vectors at Spotify

METHODOLOGY

Data Source
Data Processing and Filtering
Proposed Hybrid Model

Collaborative Filtering Details
Collaborative Filtering Architecture and Training
Content-Based Filtering
Hybrid Collaborative Filtering and Keyword Search Algorithm

Time Complexity Analysis
Software Used

Video games with reviews from fewer than 5 people are filtered out so that each remaining video game's data appears in both our training and validation sets. Specifically, the video game input is an n x m matrix, where n is the number of video games in our data and m is an arbitrarily chosen vector size; in our case, we chose m = 40. We take the 1 x m row of this matrix for each video game and use it as a vector that can be compared with other video game vectors.

The video game vectors are trained in our neural network, which we discuss in the next section. Once each video game is represented as a vector, we can use the relationships between the video game vectors to make recommendations. In the accompanying figure, a red arrow points to one location where various games from the video game series "The Sims" are concentrated close to each other.
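To make the idea of comparing video game vectors concrete, here is a minimal sketch of a nearest neighbor lookup using cosine similarity. The 3-dimensional vectors and the game titles in the toy table are invented for illustration (the vectors described above have 40 dimensions and come from a trained network), and the function names are assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_games(embeddings, query, n=2):
    """Return the n games whose vectors are most similar to the query game."""
    q = embeddings[query]
    ranked = sorted(
        ((cosine(q, vec), title) for title, vec in embeddings.items()
         if title != query),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [title for _, title in ranked[:n]]


# Hypothetical embedding table: one row (vector) per video game.
toy_embeddings = {
    "The Sims 2": [0.9, 0.1, 0.0],
    "The Sims 3": [0.8, 0.2, 0.1],
    "Example Racing Game": [0.0, 0.9, 0.4],
}
closest = nearest_games(toy_embeddings, "The Sims 2", n=1)
```

In this toy table the two "The Sims" rows point in nearly the same direction, so each is the other's nearest neighbor, which mirrors the clustering described above.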

This weight/embedding matrix is what we use to represent each video game as a vector. For the content-based filtering aspect of our algorithm, we use category data for each Amazon video game. For example, in the Amazon dataset, the video game "Grand Theft Auto 3" has the following category keywords: "Video games" and "Retro games and micro consoles".

Content-based filtering uses only item attributes, such as video game title, video game publisher, and year of release, to recommend video games with similar attributes. Given a user vector, the algorithm uses a nearest neighbor search to recommend nearby video game vectors. For each video game in the user's review history, look up its vector in our video game embedding matrix, then calculate the mean vector across all of these video games.

This mean vector represents the average type of video game that the user is interested in. Extract the 5 most common video game category keywords from all video games in the user's review history. Rank each of the n nearest candidate neighbors using a combination of cosine similarity between video game vectors and keyword similarity based on the user's 5 most common category keywords.
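The ranking steps described above can be sketched as follows. The text does not specify how cosine similarity and keyword similarity are combined, so the weighted sum with weight `alpha` and the keyword score (the fraction of the user's top keywords found in a game's categories) are assumptions, as are the function names and the toy data.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hybrid_recommend(history, embeddings, categories,
                     n_candidates=3, top_k=2, alpha=0.5):
    """Rank the nearest candidates by vector and keyword similarity.

    alpha and the keyword overlap score are assumed, not taken from the source.
    """
    # Step 1: the user vector is the mean of the reviewed games' vectors.
    user_vec = mean_vector([embeddings[g] for g in history])

    # Step 2: the user's 5 most common category keywords.
    counts = Counter(kw for g in history for kw in categories.get(g, []))
    top_keywords = {kw for kw, _ in counts.most_common(5)}

    # Step 3: the n nearest unseen games by cosine similarity.
    candidates = sorted(
        (g for g in embeddings if g not in history),
        key=lambda g: (-cosine(user_vec, embeddings[g]), g),
    )[:n_candidates]

    # Step 4: re-rank candidates with the combined score.
    def score(g):
        overlap = 0.0
        if top_keywords:
            shared = top_keywords & set(categories.get(g, []))
            overlap = len(shared) / len(top_keywords)
        return alpha * cosine(user_vec, embeddings[g]) + (1 - alpha) * overlap

    return sorted(candidates, key=lambda g: (-score(g), g))[:top_k]
```

With `alpha=1.0` the function reduces to a pure nearest neighbor ranking; lowering `alpha` gives more weight to candidates that share the user's most frequent category keywords.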

F is the cost of scanning each video game category's keywords against our nearest neighbors to find the top ranked video games.

Figure 3-2: Sample of Filtered Video Game Review Data

RESULTS AND ANALYSIS

Results
Analysis
Literature Comparison
The Cold Start Problem
Limitations
Training Results

It is only in the case of maximum draw that our model has a lower value. In 4-3 we can see that, almost without fail, the precision and recall scores improve when we generate recommendations for users with our hybrid recommendation model. For this set of users, the performance of our recommendations improves when we use our hybrid system, which uses keywords from video game categories.

Precision and recall improved with our hybrid model compared to the basic collaborative filtering model, but we still need to consider how these results compare with other research studies. In 4-4 we look at the precision values for the three different data sets that were tested and compare them to our precision value of 0.015960. Our precision value falls somewhere in the middle of these example values.

In 4-5 we similarly look at the recall values for the three different data sets tested and see how they compare to our model's recall value of 0.158242. Here, our recall value is higher than the values for all three data sets. From this comparison, we can conclude that our research has the potential to be competitive with metrics reported in the current recommender systems literature.
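For reference, precision and recall for a single user's recommendation list can be computed as in the minimal sketch below. The cutoff `k`, the function name, and the toy data are assumptions; the source does not state the list length used in the evaluation.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended: ranked list of item ids produced by the recommender.
    relevant: set of held-out item ids the user actually reviewed.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 10 recommendations, 2 held-out games, 1 hit.
recs = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
held_out = {"b", "z"}
precision, recall = precision_recall_at_k(recs, held_out, k=10)  # (0.1, 0.5)
```

The example shows why recall can be much higher than precision when users have few held-out reviews: a single hit covers a large share of the relevant items but only a small share of the recommendation list.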

A clarification is needed: the term cold start problem is sometimes used to refer to the situation in which a recommender system has a new user with no previous feedback at all. This is not what we refer to in our research; we strictly address the cold start problem for users who have a low number of ratings. Therefore, this model can be useful for solving the cold start problem for users with a low to medium number of previous reviews.

Our research focused on mitigating the cold start problem, specifically for new users with a "small amount" of previous reviews. The problem with investigating the cold start problem using a static data set is that it may be more difficult to validate whether our recommendations were of good quality.

CONCLUSION, FUTURE IMPROVEMENTS, DISCUSSION

Conclusion

Future Improvements

Discussion

"Recommending Video Only at Large Scale." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
"A Meta-Learning Perspective on Cold-Start Recommendations for Items." Advances in Neural Information Processing Systems.
"Deep Neural Networks for YouTube Recommendations." Proceedings of the 10th ACM Conference on Recommender Systems.
"Methods and metrics for cold-start recommendations." Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
"Personalized News Recommendation Based on Click Behavior." Proceedings of the 15th International Conference on Intelligent User Interfaces.
"A Hybrid Content-Based and Item-Based Collaborative Filtering Approach to Recommending TV Shows Improved with Singular Value Decomposition."
"Addressing Cold Start in App Recommendation: Latent User Models Constructed from Twitter Followers." Proceedings of the 36th ACM SIGIR International Conference on Research and Development in Information Retrieval.
"Content-Based Book Recommendations Using Learning for Text Categorization." Proceedings of the Fifth ACM Conference on Digital Libraries.
"Providing Entertainment with Content-Based Filtering and Semantic Reasoning in Intelligent Recommender Systems." IEEE Transactions on Consumer Electronics.
"Personalizing Google News: Scalable Collaborative Filtering on the Web." Proceedings of the 16th International Conference on the World Wide Web.
"Tag-Aware Recommender Systems from Cooperative Filtering Algorithms Fusion." Proceedings of the 2008 ACM Symposium on Applied Computing.
"Filtering social information: algorithms for word-of-mouth automation."
"CityVoyager: an outdoor recommendation system based on users' location history." International Conference on Ubiquitous Intelligence and Computing.
"A place recommendation system in location-based online social networks." Proceedings of the 4th Workshop on Social Networking Systems.