logo资料库

基于社交网络的推荐系统.pdf

第1页 / 共31页
第2页 / 共31页
第3页 / 共31页
第4页 / 共31页
第5页 / 共31页
第6页 / 共31页
第7页 / 共31页
第8页 / 共31页
资料共31页,剩余部分请下载后查看
Computer Science Department
University of California, Los Angeles, CA 90095
1 Introduction
2 Background
3 A Social Network-Based Recommender System
3.1 Immediate Friend Inference
3.1.1 User Preference
3.1.2 Item Acceptance
3.1.3 Influence from Immediate Friends
Table 3.1: An example of predicting user rating from an immediate friend
3.2 Distant Friend Inference
Table 3.2: Pseudo-code for distant friend inference
4 Dataset
4.1 Review Correlations of Immediate Friends
4.2 Rating Correlations of Immediate Friends
5 Experiments
5.1 Comparison Methods
5.2 Prediction Accuracy And Coverage
5.3 Data Sparsity
5.4 Cold-Start
5.5 Role of Distant Friends
6 Semantic Filtering of Social Networks
7 Related Work
8 Conclusions
References
A Social Network-Based Recommender System (SNRS) Jianming He and Wesley W. Chu Computer Science Department University of California, Los Angeles, CA 90095 jmhek@cs.ucla.edu, wwc@cs.ucla.edu Abstract. Social influence plays an important role in product marketing. However, it has rarely been considered in traditional recommender systems. In this paper we present a new paradigm of recommender systems which can utilize information in social networks, in- cluding user preferences, item's general acceptance, and influence from social friends. A probabilistic model is developed to make personalized recommendations from such infor- mation. We extract data from a real online social network, and our analysis of this large da- taset reveals that friends have a tendency to select the same items and give similar ratings. Experimental results on this dataset show that our proposed system not only improves the prediction accuracy of recommender systems but also remedies the data sparsity and cold- start issues inherent in collaborative filtering. Furthermore, we propose to improve the per- formance of our system by applying semantic filtering of social networks, and validate its improvement via a class project experiment. In this experiment we demonstrate how rele- vant friends can be selected for inference based on the semantics of friend relationships and finer-grained user ratings. Such technologies can be deployed by most content providers. 1 Introduction In order to overcome information overload, recommender systems have become a key tool for providing users with personalized recommendations on items such as movies, music, books, news, and web pages. Intrigued by many practical applica- tions, researchers have developed algorithms and systems over the last decade. Some of them have been commercialized by online venders such as Amazon.com, Netflix.com, and IMDb.com. These systems predict user preferences (often represented as numeric ratings) for new items based on the user's past ratings on other items. There are typically two types of algorithms for recommender systems -- content-based methods and collaborative filtering. Content-based methods measure the similarity of the recommended item (target item) to the ones that a target user (i.e., user who receives recommendations) likes or dislikes [25, 22, 30] based on item attributes. On the other hand, collaborative filtering finds users with tastes that are similar to the target user’s based on their past ratings. Collaborative filtering will then make recommendations to the target user based on the opinions of those similar users [3, 5, 27]. Despite all of these efforts, recommender systems still face many challenging problems. First, there are demands for further improvements on the prediction ac-
2 curacy of recommender systems. In October 2006, Netflix announced an open competition with the grand prize of $1,000,000 for the best algorithm that predicts user ratings for films (http://www.netflixprize.com). The improvement in the pre- diction accuracy can increase user satisfaction, which in turn leads to higher prof- its for those e-commerce websites. Second, algorithms for recommender systems suffer from many issues. For example, in order to measure item similarity, con- tent-based methods rely on explicit item descriptions. However, such descriptions may be difficult to obtain for items like ideas or opinions. Collaborative filtering has the data sparsity problem and the cold-start problem [1]. In contrast to the huge number of items in recommender systems, each user normally only rates a few. Therefore, the user/item rating matrix is typically very sparse. It is difficult for recommender systems to accurately measure user similarities from those li- mited number of reviews. A related problem is the cold-start problem. Even for a system that is not particularly sparse, when a user initially joins, the system has none or perhaps only a few reviews from this user. Therefore, the system cannot accurately interpret this user's preference. To tackle those problems, two approaches have been proposed [3, 29, 21, 23]. The first approach is to condense the user/item rating matrix through dimensional- ity reduction techniques such as Singular Value Decomposition (SVD) [3, 29]. By clustering users or items according to their latent structure, unrepresentative users or items can be discarded, and thus the user/item matrix becomes denser. Howev- er, these methods do not significantly improve the performance of recommender systems, and sometimes make the performance even worse. The second approach is to "enrich" the user/item rating matrix by 1) introduc- ing default ratings or implicit user ratings, e.g., the time spent on reading articles [23]; 2) using half-baked rating predictions from content-based methods [21]; or 3) exploiting transitive associations among users through their past transactions and feedback [12]. These methods improve the performance of recommender sys- tems to some extent. In this paper we try to solve these problems from a different perspective. In particular, we propose a new paradigm of recommender systems by utilizing information in social networks, especially that of social influence. Traditional recommender systems do not take into consideration explicit so- cial relations among users, yet the importance of social influence in product mar- keting has long been recognized [32, 35]. Intuitively, when we want to buy a product that is not familiar, we often consult with our friends who have already had experience with the product, since they are those that we can reach for imme- diate advice. When friends recommend a product to us, we also tend to accept the recommendation because their inputs are trustworthy. Many marketing strategies that have leveraged this aspect of human nature have achieved great success. One classic example is the Hotmail's free email service. The marketing strategy of Hotmail is to attach a promotion message at the bottom of every outgoing email: “Get your private, free email at http://www.hotmail.com.” People who receive the email will sign up and then further propagate this promotion message. As a result, the number of Hotmail user accounts grew from zero to 12 million in 18 months on only a $500,000 advertising budget—thereby out-performing many conven-
3 tional marketing strategies [14]. Thus, social influences play a key role when people are making decisions of adopting products. Additionally, the integration of social networks can theoretically improve the performance of current recommender systems. First, in terms of the prediction ac- curacy, the additional information about users and their friends obtained from so- cial networks improves the understanding of user behaviors and ratings. There- fore, we can model and interpret user preferences more precisely, and thus improve the prediction accuracy. Second, with friend information in social net- works, it is no longer necessary to find similar users by measuring their rating si- milarity, because the fact that two people are friends already indicates that they have things in common. Thus, the data sparsity problem can be alleviated. Finally, for the cold-start issue, even if a user has no past reviews, recommender system still can make recommendations to the user based on the preferences of his/her friends if it integrates with social networks. All of these intuitions and observa- tions motivate us to design a new paradigm of recommender systems that can take advantage of information in social networks. The recent emergence of online social networks (OSNs) gives us an opportu- nity to investigate the role of social influence in recommender systems. With the increasing popularity of Web 2.0, many OSNs, such as Myspace.com, Face- book.com, and Linkedin.com have emerged. Members in those networks have their own personalized space where they not only publish their biographies, hob- bies, interests, blogs, etc., but also list their friends. Friends or visitors can visit these personal spaces and leave comments. Note that in this paper we define friends as any two users who are connected by an explicit social link. We define immediate friends as those friends who are just one hop away from each other in a social network graph, and distant friends as friends who are multiple hops away. OSNs provide platforms where people can place themselves on exhibit and main- tain connections with friends. As OSNs continue to gain more popularity, the un- precedented amount of personal information and social relations improve social science research where it was once limited by a lack of data. In our research, we are interested in the role of explicit social relations in re- commender systems, such as how user preferences or ratings are correlated with those of friends, and how to use such correlations to design a better recommender system. In particular, we design an algorithm framework which makes recom- mendations based on user's own preferences, the general acceptance of the target item, and the opinions from social friends. We crawl a real online social network from Yelp.com, and perform extensive analysis on this dataset. Some of the key questions, such as whether or not friends tend to select the same item, and whether or not friends tend to give similar ratings, have been studied in this dataset. We al- so use this dataset to evaluate the performance of our proposed system on the pre- diction accuracy, data sparsity, and cold-start. The experimental results of our sys- tem show significant improvement against traditional collaborative filtering in all of those aspects. For example, the prediction accuracy has improved by 17.8% compared to traditional collaborative filtering. Furthermore, we propose to use the semantics of friend relationships and finer-grained user ratings to improve the prediction accuracy.
4 The remainder of the paper is organized as follows. First, in Section 2 we give a background of traditional collaborative filtering algorithms. Then we formally propose a social network-based recommender system in Section 3. In Section 4 we introduce the dataset that we crawled from Yelp, and present some analytical stu- dies on this dataset. Following that, we evaluate the performance of the proposed system on the Yelp dataset in Section 5. In Section 6 we propose to further im- prove the prediction accuracy of the system by applying semantic filtering of so- cial networks, and validate its improvement via a class experiment. In Section 7 we review related studies, and conclude in Section 8. 2 Background After the pioneering work in the Grouplens project in 1994 [27], collaborative fil- tering (CF) soon became one of the most popular algorithms in recommender sys- tems. Many variations of this algorithm have also been proposed [2, 21, 11, 36, 13]. In this paper we will use the traditional CF as one of the comparison methods. Therefore, the remainder of this section will focus on this algorithm. The assumption of CF is that people who agree in the past tend to agree again in the future. Therefore, CF first finds users with taste similar to the target user's. CF will then make recommendations to the target user by predicting the target us- er's rating to the target item based on the ratings of his/her top-K similar users. User ratings are often represented by discrete values within a certain range, e.g., one to five. A one indicates an extreme dislike to the target item, while a five shows high praise. Let RUI be the rating of the target user U on the target item I. Thus, RUI is estimated as the weighted sum of the votes of similar users as follows. R UI = R U + Z × ( R VI − R V ) , (1) VUw ) ( , ∑ V Ψ∈ where UR and VR represent the average ratings of the target user U and every us- er V in U's neighborhood, Ψ, which consists of the top-K similar users of U. w(U, is a normalizing V) is the weight between users U and V, and Z = ∑V constant to normalize total weight to one. Specifically, w(U, V) can be defined us- ing the Pearson correlation coefficient [27]. 1 VUw ( ) , (2) VUw ( , ) = ( ∑ I R ( UI I ∑ R UI − R U 2 − R U ) R )( VI ∑ ( I − R V ) R VI 2 − R V )
5 where the summations over I are over the common items for which both user U and V have voted. Other variations to this algorithm include different weighting techniques. For example, when two users have less than 50 co-rated items, [11] proposed to insert a significance weighting factor of n/50 to the original weight, where n is the num- ber of co-rated items. As we can see, traditional collaborative filtering and its var- iations do not utilize the semantic friend relations among users in recommender systems; however, this is essential to the buying decisions of users. In the follow- ing sections, we are going to present a new paradigm of recommender systems which improves the performance of traditional recommender systems by using the information in social networks. 3 A Social Network-Based Recommender System Before we introduce the system, let us first show a typical scenario. Angela wants to watch a movie on a weekend. Her favorite movies are dramas. From the Inter- net, she finds two movies particularly interesting, "Revolutionary Road" and "The Curious Case of Benjamin Button." These two movies are all highly rated in the message board at Yahoo Movies. Because she cannot decide which movie to watch, she calls her best friend Linda whom she often hangs out with. Linda has not viewed these two movies either, but she knew that one of her officemates had just watched "Revolutionary Road" and highly recommended it. So Linda sug- gests "Why don't we go to watch Revolutionary Road together?" Angela is cer- tainly willing to take Linda’s recommendation, and therefore has a fun night at the movies with her friend. If we review this scenario, we can see at least three factors that really contribute to the Angela's final decision. The first factor is Angela's own preference for drama movies. If Angela did not like drama movies, she would be less likely to pick something like "Revolutionary Road" to begin with. The second factor is the public reviews on these two movies. If these movies received horrible reviews, Angela would most likely lose interest and stop any further in- vestigation. Finally, it is the recommendation from Angela's friend, Linda, that makes Angela finally choose "Revolutionary Road." Interestingly, Linda's opinion is also influenced by her officemate. If we recall the decisions that we make in our daily life, such as finding restaurants, buying a house, and looking for jobs, many of them are actually influenced by these three factors. Figure 3.1 further illustrates how these three factors impact customers' final buying decisions. Intuitively, a customer's buying decision or rating is decided by both his/her own preference for similar items and his/her knowledge about the characteristics of the target item. A user's preference, such as Angela’s interest in drama movies, is usually reflected from the user’s past ratings to other similar items, e.g. the number of drama movies that Angela previously viewed and the average rating that Angela gave to those movies. Knowledge about the target item can be obtained from public media such as magazines, television, and the Internet. Meanwhile, the feedbacks from friends are another source of knowledge regarding
6 the item, and they are often more trustworthy than advertisements. When a user starts considering the feedbacks from his/her friends, he/she is then influenced by his/her friends. Note that this influence is not limited to that from our immediate friends. Distant friends can also cast their influence indirectly to us; e.g., Angela was influenced by Linda's officemate in the previous scenario. Each one of these three factors has an impact on a user’s final buying decision. If the impact from all of them is positive, it is very likely that the target user will select the item. On the contrary, if any has a negative influence, e.g., very low ratings in other user re- views, the chance that the target user will select the item will decrease. With such an understanding in mind, we are going to propose a social network-based re- commender system (SNRS) in the following subsections. As we mentioned, social influences can come from not only immediate friends but also distant friends. The techniques for handling these types of influences are different. We shall begin with the immediate friend inference, in which we only consider influences from immediate friends. Then, in the distant friend inference, we will describe how we incorporate influences from distant friends via leveraging the immediate friend in- ference. Figure 3.1: The three factors that influence a customer’s buying decision: user preference for similar items, information regarding the target item from the public media, and feed- backs from friends. 3.1 Immediate Friend Inference We introduce the following naming conventions for the variables used in this pa- per. We use capitalize letters to represent variables, and use capitalize and bold
7 letters to represent the corresponding sets. The value for each variable or variable set is represented by the corresponding lowercase letter. Formally, let us consider a social network as a graph G = (U, E) in which U represents nodes (users) and E represents links (social relations). Each user U in U has a set of attributes AU as well as immediate neighbors (friends) N(U) such that if V ∈ N(U), (U, V) ∈ E. The values of attributes AU are represented as aU. More- over, a recommender system contains the records of users’ ratings, which can be represented by a triple relation of T = (U, I, R) in which U is the users in the so- cial network G; I is the set of items (products or services), and each item I in I has a set of attributes A'I. R stands for the ratings such that each RUI in R is user U’s rating on item I. RUI has a numeric value k (e.g. k∈{1, 2,… 5}). Moreover, we de- fine I(U) as the set of items that user U has reviewed, and refer to the set of re- viewers of item I as U(I). The goal of this recommender system is to predict Pr(RUI = k | A’=a'I, A=aU, {RVI = rVI : ∀V∈U(I)∩N(U)}); i.e., the probability dis- tribution of the target user U's rating on the target item I given the attribute values of item I, the attribute values of user U, and the ratings on item I rated by U's im- mediate friends. Once we obtain this distribution, RUI is calculated as the expecta- tion of the distribution. Items with high estimated ratings will be recommended to the target user, and users with high estimated ratings on the target item are the po- tential buyers. In order to estimate Pr(RUI = k | A’=a'I, A=aU, {RVI = rVI : ∀V∈U(I)∩N(U)}), we adopt the naive Bayes assumption which assumes that the influences from item attribute values, user attribute values, and immediate friends' ratings are indepen- dent. Although this assumption simplifies the correlations among these variables, the naive Bayes model has been shown to be quite effective in many applications including textual document classification [16]. By making this assumption, the original conditional probability can be factorized as follows, Pr( R UI = aAaA ' U = = , ' I {, R VI = r VI : ∈∀ V U I )( ∩ N U ( )}) (3) | k 1 Z Pr( R UI = = Pr( = k | aA ' = )' I × Pr( R UI Rk {| VI = r VI : ∈∀ V = R UI U I )( × k | ∩ ) = aA U )}) U ( N First, Pr(RU = k | A’= a'I,) is the conditional probability that the target user U will give a rating k to an item with the same attribute values as item I. This proba- bility represents U's preference for items similar to I. Because this value depends on the attribute values of items rather than an individual item, we drop the sub- script I in RUI for simplification. Second, Pr(RI = k | A = aU) is the probability that the target item I will receive a rating value k from a reviewer whose attribute val- ues are the same as U. This probability reflects the general acceptance of the target item I by users like U. For the same reason, because this value depends on the attribute values of users rather than a specific user, we drop the subscript U in RUI. Finally, Pr(RUI = k | {RVI = rVI : ∀V∈ U(I)∩N(U)}) is the probability that the tar- get user U gives a rating value k to the target item I given the ratings of U's imme- diate friends on item I. This is where we actually take social influences into con-
8 sideration in our system. In addition, Z is a normalizing constant. We shall present the methods to estimate each of the factors in the following subsections. 3.1.1 User Preference As we pointed out, Pr(RU = k | A' = a'I) measures the target user U's preference for the items similar to item I. For example, if we want to know how high Angela will rate "Revolutionary Road," Pr(RU = k | A' = a'I) gives us a hint of how likely it is that Angela will give a rating k to a drama movie which is also casted by Kate Winslet. To estimate this probability, we adopt the naive Bayes assumption again. We assume that the item attributes in A', e.g., category and cast, are independent of each other. Therefore, we have (4) (5) (6) Pr( R U = k | aA ' = )' I = Pr( R U = k Pr( R U = nj ∏ = k ) = × j 1 = AA ' Pr( ,' 1 2 Pr( ,..., RA | ' j U A )' n AA Pr( ' ,' ) × 1 2 AA Pr( ,' ' ,..., 1 k ) = 2 = k ) n | RA ,..., ' U A )' n , A ,'{' = 1 AA ' ,..., A }' n 2 where Pr(A'1, A'2, ..., A'n) can be treated as a normalizing constant, Pr(RU = k) is the prior probability that U gives a rating k, and Pr(A'j | RU = k) is the conditional probability that each item attribute A'j in A' has a value a'j given U rated k; e.g., Pr(movie type = drama | RU = 4) is the probability that a movie will be a type of drama movie, given that U gives a rating 4. The last two probabilities can be esti- mated from counting the review ratings of the target user U. Specifically, Pr( R U = k ) = I ( k 1) + n + , and R = U I U ) ( Pr( A ' j = Ra ' U | j = k ) = I ( A ' j I Ra ' = U R k ( ) U , j = = + k m 1) + , where |I(U)| is the number of reviews of user U's in the training set, |I(RU = k)| is the number of reviews that user U gives a rating value k, and |I (A'j = a'j, RU = k)| is the number of reviews that U gives a rating value k while attribute A'j of the cor- responding target item has a value a'j. Notice that we insert an extra value 1 to the numerators in both equations, and add n, the range of review ratings to the deno- minator in Eq. (5), and m, the range of A'j's values, to the denominator in Eq. (6). This method is also known as Laplace estimate, a well-known technique in esti- mating probabilities [7], especially on a small size of training samples. Because of Laplace estimate, "strong" probabilities, like 0 or 1, from direct probability com- putation can be avoided.
分享到:
收藏