Building a Music Recommendation Engine with Probabilistic Matrix Factorization in PyTorch

Cameron Wolfe · Mar 22 · 15 min read

Recommendation systems are one of the most widespread forms of machine learning in modern society. Whether you are looking for your next show to watch on Netflix or listening to an automated music playlist on Spotify, recommender systems impact almost all aspects of the modern user experience. One of the most common ways to build a recommendation system is with matrix factorization, which finds ways to predict a user's rating for a specific product based on previous ratings and other users' preferences. In this article, I will use a dataset of music ratings to build a music recommendation engine with PyTorch, clarifying the details behind the theory and implementation of a probabilistic matrix factorization model.

Introduction to Recommendation Systems
In this section, I will provide a quick but comprehensive introduction to recommendation systems and matrix factorization before introducing the actual implementation and results for the music recommendation project.

What is a Recommendation System?

To create a recommendation system, we need a dataset that includes users, items, and ratings. Users can be customers buying products from a store, a family streaming a movie from home, or, in our case, a person listening to music. Similarly, items are the products that a user is buying, rating, listening to, etc. Therefore, the dataset for a recommendation system generally has many entries, each of which includes a user-item pair and a value representing the user's rating for that item, such as the dataset seen below:

[Figure 1: Simple Data Set for Recommendation Systems]

User ratings can be collected either explicitly (asking the user for a 5-star rating, asking whether a user likes a product, etc.) or implicitly (examining purchase history, mouse movements, listening history, etc.). Additionally, the values for ratings could be binary (a user buying a product), discrete (rating a product 1 to 5 stars), or almost anything else. In fact, the dataset used for this project represents ratings as the total number of minutes a user has spent listening to an artist (this is an implicit rating!).
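To make this structure concrete, here is a toy example of what such a dataset might look like in Python. The IDs and values are purely illustrative, not taken from the project's actual data:

```python
# Each entry is a (user_id, artist_id, rating) triple. Here the rating
# is implicit: the total number of minutes the user spent listening to
# the artist, mirroring the dataset used for this project.
ratings = [
    (0, 14, 312.5),   # user 0 listened to artist 14 for 312.5 minutes
    (0, 23, 7.0),
    (1, 14, 1050.0),
    (2, 5, 48.5),
]
```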
Each of the entries in the dataset represents a user-item pairing with a known rating, and, from these known user-item pairings, a recommendation system tries to predict user-item ratings that are not yet known. Recommendations may be created by finding items that are similar to those a user likes (item-based), finding similar users and recommending the things they like (user-based), or finding a relationship between user-item interactions as a whole. In this way, the model recommends different items that it believes the user might enjoy but has not yet seen. It's pretty easy to understand the purpose and value of a good recommender system, but here is a nice figure that I believe summarizes recommendation systems nicely:

[Figure 2: Recommender systems aim to find items that a user enjoys, but has not yet seen [1]]

What is Matrix Factorization?

Matrix factorization is a common and effective way to implement a recommendation system. Using the previously mentioned user-item dataset (Figure 1), matrix factorization attempts to learn ways to quantitatively characterize both users and items in a lower-dimensional space (as opposed to looking at every item the user has ever rated), such that these characterizations can be used to predict user behavior and preferences for a known set of possible items. In recent years, matrix factorization has become increasingly popular due to its accuracy and scalability.
Using the dataset of user-item pairings, one can create a matrix in which every row represents a different user, every column represents a different item, and each entry at location (i, j) represents user i's rating for item j. This matrix is illustrated in the following figure:

[Figure 3: A User-Item Recommendation Matrix]

In the above figure, each row contains a single user's ratings for every item in the set of possible items. However, some of these ratings might be empty (represented by "???"). These empty spots represent user-item pairings that are not yet known, which are exactly the pairings the recommendation system is trying to predict. Moreover, this partially filled matrix, referred to as a sparse matrix, can be thought of as the product of two matrices. For example, the above 4x4 matrix could be created by multiplying two 4x4 matrices with each other, a 4x10 matrix with a 10x4 matrix, a 4x2 matrix with a 2x4 matrix, and so on. This process of decomposing the original matrix into a product of two matrices is called matrix factorization.

Matrix Factorization for Recommendation Systems

So, we know that we can decompose a matrix by finding two matrices that can be multiplied together to form our original matrix. But how does matrix factorization actually work in a recommendation system?
When these two separate matrices are created, they each carry separate information about users, items, and the relationships between them. Namely, one of the matrices will store information that characterizes users, while the other will store information about the items. In fact, every row of the user (left) matrix is a vector of size k that quantitatively describes a single user, while every column of the item (right) matrix is a vector of size k that characterizes a single item. The size of these vectors, k, is called the latent dimensionality (or embedding size), and is a hyperparameter that must be tuned in the matrix factorization model: a larger latent dimensionality allows the model to capture more complex relationships and store more information, but can also lead to overfitting. This idea may seem strange at first, but if you examine closely how the matrices are multiplied to create the user-item matrix, it actually makes a lot of sense: the rating of user i for item j is obtained by finding the inner product of user i's vector from the left matrix and item j's vector from the right matrix. Therefore, if these vector embeddings for each user and item are trained to store useful, characteristic information, the inner product can accurately predict a user-item rating based on an item's characteristics and a user's preferences for those characteristics, all of which are stored in the values of the embedding vectors. This concept is illustrated in the following figure:

[Figure 4: Matrix Factorization with User (left) and Item (right) Matrices]

In the above figure, note how each entry of the product matrix is derived. For example, entry (0, 0) of the user-item matrix, representing user 0's rating for item 0, is obtained by taking the inner product of the 0th row of the left matrix (the embedding vector for user 0) and the 0th column of the right matrix (the embedding vector for item 0). This pattern continues for every rating in the user-item matrix.
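As a minimal, self-contained sketch of this idea in PyTorch (the matrices here are random placeholders standing in for trained embeddings):

```python
import torch

# Toy factorization: with latent dimensionality k = 2, a 4x2 user
# matrix times a 2x4 item matrix yields a full 4x4 user-item matrix.
k = 2
user_matrix = torch.randn(4, k)   # each row is one user's embedding vector
item_matrix = torch.randn(k, 4)   # each column is one item's embedding vector

ratings = user_matrix @ item_matrix   # 4x4 matrix of predicted ratings

# Entry (0, 0) is the inner product of user 0's row and item 0's column,
# i.e. the predicted rating of user 0 for item 0.
pred_00 = user_matrix[0] @ item_matrix[:, 0]
assert torch.isclose(ratings[0, 0], pred_00)
```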
Therefore, as previously described, every rating is predicted by taking the inner product of the corresponding user and item embedding vectors, which is the core idea behind how matrix factorization works for recommendation systems. Using common training methods, like stochastic gradient descent or alternating least squares, these two matrices, and the embedding vectors within them, can be trained to yield a product that is very similar to the original user-item matrix, thus creating an accurate recommendation model.

But wait… didn't you say Probabilistic Matrix Factorization?

Yes, I did! Don't worry, probabilistic matrix factorization is very similar to what I have been explaining so far. The key difference between normal matrix factorization and probabilistic matrix factorization is the way in which the resulting matrix is created. For normal matrix factorization, all unknown values (the ratings we are trying to predict) are set to some constant (usually 0) and the decomposed matrices are trained to reproduce the entire matrix, including the unknown values. It is not hard to see why this becomes an issue: the model is forced to predict values that we do not actually know and that could be completely wrong. Additionally, how can we predict unknown user-item ratings when we are already training our model to predict an arbitrary user-defined constant for all of them? To solve this problem, we have probabilistic matrix factorization. Instead of trying to reproduce the entire resulting matrix, probabilistic matrix factorization only attempts to reproduce ratings that are in the training set, or the set of known ratings. Therefore, the model is trained using only known ratings, and unknown ratings can be predicted by taking the inner product of the corresponding user and item embedding vectors.
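To make the difference concrete, here is a minimal sketch (with illustrative names and toy values) of training on known ratings only, so the unknown entries of the user-item matrix never contribute to the loss:

```python
import torch

# Embeddings for 4 users and 4 items with latent dimensionality k = 2.
k = 2
user_emb = torch.randn(4, k, requires_grad=True)
item_emb = torch.randn(4, k, requires_grad=True)

# Toy training set of known (user_id, item_id, rating) triples; unknown
# user-item pairs simply never appear here.
known = [(0, 1, 5.0), (1, 0, 3.0), (2, 3, 1.0)]

loss = torch.tensor(0.0)
for u, i, r in known:
    pred = user_emb[u] @ item_emb[i]   # inner product prediction
    loss = loss + (pred - r) ** 2      # squared error on known ratings only
loss = loss / len(known)

loss.backward()  # gradients flow only from the known ratings
```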
. . .

Probabilistic Matrix Factorization in PyTorch

Now that you understand the basics behind recommender systems and probabilistic matrix factorization, I am going to outline how a model for such a recommender system can be implemented using PyTorch. If you are unfamiliar with PyTorch, it is a robust Python framework often used for deep learning and scientific computing. I encourage anyone who is curious to learn more about PyTorch to check it out, as I use the framework heavily in my research and personal projects.

What are we trying to implement?

As seen in Figure 4, our model for probabilistic matrix factorization is just two matrices. How simple! One of the matrices will represent the embeddings for each user, while the other matrix will contain the embeddings for each item. Each embedding is a vector of k values (where k is the latent dimensionality) that describes a user or item. Using these two matrices, a rating prediction for a user-item pairing can be obtained by taking the inner product of the user's embedding vector and the item's embedding vector, yielding a single value that represents the predicted rating. This process can be observed in the following figure:

[Figure 5: Predicting an unknown user-item rating]
As can be seen above, a single rating is determined by finding the corresponding user and item vectors from the user (left) and item (right) matrices and computing their inner product. In this case, the result shows that the chosen user is a big fan of ice cream. What a surprise! However, it should be noted that each of these matrices is randomly initialized; therefore, in order for these predictions and embeddings to accurately characterize both users and items, they must be trained using a set of known ratings, so that the accuracy of predictions generalizes to ratings that are not yet known.

Such a model can be implemented with relative ease using the Embedding class in PyTorch, which creates a 2-dimensional embedding matrix. Using two of these embeddings, the probabilistic matrix factorization model can be created in a PyTorch module as follows:

[Figure 6: Simple matrix factorization implementation]

The matrix factorization model contains two embedding matrices, which are initialized inside the model's constructor and later trained to accurately predict unknown user-item ratings. In the forward function (used to predict a rating), the model is passed a mini-batch of index values representing the identification numbers (IDs) of different users and items. Using these indices, passed in the "cats" parameter (an Nx2 matrix) as user-item index pairs, the predicted ratings are obtained by indexing the user and item embedding matrices and taking the inner product of the corresponding vectors.
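Since the article's exact code is only shown as a screenshot in Figure 6, here is a minimal sketch of the module it describes. The class and variable names are my own reconstruction, not the author's verbatim implementation:

```python
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    # Sketch of the two-embedding model: one embedding matrix for users
    # and one for items, each row a length-k embedding vector.
    def __init__(self, n_users, n_items, k):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)

    def forward(self, cats):
        # cats: an N x 2 mini-batch of (user ID, item ID) index pairs.
        users, items = cats[:, 0], cats[:, 1]
        u = self.user_emb(users)       # N x k user embedding vectors
        v = self.item_emb(items)       # N x k item embedding vectors
        return (u * v).sum(dim=1)      # N inner products = predicted ratings

# Example usage with made-up sizes: predict ratings for two user-item pairs.
model = MatrixFactorization(n_users=100, n_items=50, k=8)
preds = model(torch.tensor([[0, 3], [7, 12]]))  # tensor of 2 predicted ratings
```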
Adding Bias

The model from Figure 6 is lacking an important feature that is needed for the creation of a powerful recommendation system: the bias.
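The article's bias implementation lies beyond this excerpt, but a common approach, sketched here as an assumption rather than the author's exact method, adds a learned scalar bias per user and per item to the inner product:

```python
import torch
import torch.nn as nn

class MatrixFactorizationWithBias(nn.Module):
    # Hypothetical extension of the Figure 6 model: per-user and per-item
    # scalar biases are added to the inner product, capturing e.g. users
    # who rate everything highly or items that are universally popular.
    def __init__(self, n_users, n_items, k):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)

    def forward(self, cats):
        users, items = cats[:, 0], cats[:, 1]
        dot = (self.user_emb(users) * self.item_emb(items)).sum(dim=1)
        return dot + self.user_bias(users).squeeze(1) \
                   + self.item_bias(items).squeeze(1)
```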