2012最新推荐系统综述.pdf

发布时间：2022-06-13 发布人：admin 分类：说明书资料大小：8.63M 资料格式：pdf 举报版权申诉

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第1页.png

第1页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第2页.png

第2页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第3页.png

第3页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第4页.png

第4页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第5页.png

第5页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第6页.png

第6页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第7页.png

第7页 / 共97页

21821fb9-8244-4339-992e-ed802f3c9eea.pdf-第8页.png

第8页 / 共97页

1 Introduction

2 Real Applications of Recommender Systems

2.1 Netflix Prize

2.2 Major challenges

3 Definitions of Subjects and Problems

3.1 Networks

3.2 Bipartite Networks and Hypergraphs

3.3 Recommender Systems

3.4 Evaluation Metrics for Recommendation

3.4.1 Accuracy Metrics

3.4.2 Rank-weighted Indexes

3.4.3 Diversity and Novelty

3.4.4 Coverage

4 Similarity-based methods

4.1 Algorithms

4.1.1 User similarity

4.1.2 Item similarity

4.1.3 Slope One predictor

4.2 How to define similarity

4.2.1 Rating-based similarity

4.2.2 Structural similarity

4.2.3 Similarity involving external information

5 Dimensionality Reduction Techniques

5.1 Singular Value Decomposition (SVD)

5.2 Bayesian Clustering

5.3 Probabilistic Latent Semantic Analysis (pLSA)

5.4 Latent Dirichlet Allocation (LDA)

6 Diffusion-based methods

6.1 Heat diffusion algorithm (HDiff)

6.2 Multilevel spreading algorithm (MultiS)

6.3 Probabilistic spreading algorithm (ProbS)

6.4 Hybrid spreading-relevant algorithms

7 Social filtering

7.1 Social Influences on Recommendations

7.2 Trust-Aware Recommender Algorithms

7.3 Adaptive Social Recommendation Models

8 Meta approaches

8.1 Tag-aware methods

8.2 Time-aware methods

8.3 Iterative refinement

8.4 Hybrid algorithms

9 Performance evaluation

10 Outlook

References

Recommender Systems Linyuan L¨ua,b,c, Mat´uˇs Medob, Chi Ho Yeungb,d, Yi-Cheng Zhanga,b,c,∗, Zi-Ke Zhanga,b,c, Tao Zhoua,b,c,e aInstitute of Information Economy, Hangzhou Normal University, Hangzhou, 310036, PR China bDepartment of Physics, University of Fribourg, Fribourg, CH-1700, Switzerland cWeb Sciences Center, University of Electronic Science and Technology of China, Chengdu, 610054, PR dThe Nonlinearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United eBeijing Computational Science Research Center, Beijing, 100084, PR China Kingdom China Abstract The ongoing rapid expansion of the Internet greatly increases the necessity of eﬀective recommender systems for ﬁltering the abundant information. Extensive research for rec- ommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theo- retical and practical achievements, uniﬁcation and comparison of diﬀerent approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in the future developments. In addition to algorithms, physical aspects are described to illustrate macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has a great scientiﬁc depth and combines diverse research ﬁelds which makes it of interests for physicists as well as interdisciplinary researchers. Keywords: recommender systems, information ﬁltering, networks 2 1 0 2 b e F 6 ] h p - c o s . s c i s y h p [ 1 v 2 1 1 1 . 2 0 2 1 : v i X r a ∗Corresponding author Email address: yi-cheng.zhang@unifr.ch (Yi-Cheng Zhang) Preprint submitted to Physics Reports February 7, 2012

Contents 1 Introduction 2 Real Applications of Recommender Systems 2.1 Netﬂix Prize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Major challenges 3 Deﬁnitions of Subjects and Problems 3.1 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Bipartite Networks and Hypergraphs . . . . . . . . . . . . . . . . . . . . . 3.3 Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Evaluation Metrics for Recommendation . . . . . . . . . . . . . . . . . . . 3.4.1 Accuracy Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Rank-weighted Indexes . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Diversity and Novelty . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.4 Coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Similarity-based methods 4.1 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 User similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Item similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.2 4.1.3 Slope One predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 How to deﬁne similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Rating-based similarity . . . . . . . . . . . . . . . . . . . . . . . . . Structural similarity . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 4.2.3 Similarity involving external information . . . . . . . . . . . . . . . 5 Dimensionality Reduction Techniques 5.1 Singular Value Decomposition (SVD) . . . . . . . . . . . . . . . . . . . . . 5.2 Bayesian Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Probabilistic Latent Semantic Analysis (pLSA) . . . . . . . . . . . . . . . . 5.4 Latent Dirichlet Allocation (LDA) . . . . . . . . . . . . . . . . . . . . . . . 6 Diﬀusion-based methods 6.1 Heat diﬀusion algorithm (HDiﬀ) . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Multilevel spreading algorithm (MultiS) . . . . . . . . . . . . . . . . . . . 6.3 Probabilistic spreading algorithm (ProbS) . . . . . . . . . . . . . . . . . . 6.4 Hybrid spreading-relevant algorithms . . . . . . . . . . . . . . . . . . . . . 7 Social ﬁltering 7.1 Social Inﬂuences on Recommendations . . . . . . . . . . . . . . . . . . . . 7.2 Trust-Aware Recommender Algorithms . . . . . . . . . . . . . . . . . . . . 7.3 Adaptive Social Recommendation Models . . . . . . . . . . . . . . . . . . . 2 2 3 4 6 8 8 10 13 15 16 18 19 21 21 21 23 24 24 26 26 27 31 33 33 37 39 41 44 44 45 46 49 52 52 54 55

8 Meta approaches 8.1 Tag-aware methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Time-aware methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Iterative reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Hybrid algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Performance evaluation 10 Outlook References 58 58 59 61 63 65 69 72 1

1. Introduction Thanks to computers and computer networks, our society is undergoing rapid transfor- mation in almost all aspects. We buy online, gather information by search engines and live a signiﬁcant part of our social life on the Internet. The fact that many of our actions and interactions are nowadays stored electronically gives researchers the opportunity to study socio-economical and techno-social systems at much better level of detail. Traditional “soft sciences”, such as sociology or economics, have their fast-growing branches relying on the study of these newly available massive data sets [1, 2]. Physicists, with their long experi- ence with data-driven research, have joined this trend and contributed to many ﬁelds such as ﬁnance [3, 4], network theory [5, 6, 7, 8, 9] and social dynamics [10] which are outside their traditional realm. The study of recommender systems and information ﬁltering in general is no exception with the interest of physicists steadily increasing over the past decade. The task of recommender systems is to turn data on users and their preferences into predictions of users’ possible future likes and interests. The study of recommender systems is at crossroads of science and socio-economic life and its huge potential was ﬁrst noticed by web entrepreneurs in the forefront of the information revolution. While being originally a ﬁeld dominated by computer scientists, recommendation calls for contributions from various directions and is now a topic of interest also for mathematicians, physicists, and psychologists. For instance, it is not a coincidence that an approach based on what psychologists know about human behavior scored high in a recent recommendation contest organized by the commercial company Netﬂix [11]. When computing recommendations for a particular user, the very basic approach is to select the objects favored by other users that are similar to the target user. Even this simple approach can be realized in a multitude of ways—this is because the ﬁeld of recommendation lacks general “ﬁrst principles” from which one could deduce the right way to recommend. For example, how best to measure user similarity and assess its uncertainty? How to aggregate divergent opinions from various users? How to handle users for whom little information is available? Should all data be trusted equally or can one detect reckless or intentionally misleading opinions? These and similar issues arise also when methods more sophisticated than those based on user similarity are used. Fortunately, there exist a number of real data sets that can be used to measure and compare performance of individual methods. In consequence, similarly to physics, it is the experiment what decides which recommendation approach is good and which is not. It would be very misleading to think that recommender systems are studied only because suitable data sets are available. While the availability of data is important for empirical evaluation of recommendation methods, the main driving force comes from practice: elec- tronic systems give us too much choice to handle by ourselves. The interest from industry is hardly surprising—an early book on the nascent ﬁeld of recommendation, Net Worth by John Hagel III and Marc Singer [12], clearly pointed out the enormous economic impact of “info-mediaries” who can greatly enhance individual consumers’ information capabilities. Most e-commerce web sites now oﬀer various forms of recommendation—ranging from sim- ply showing the most popular items or suggesting other products by the same producer to 2

complicated data mining techniques. People soon realized that there is no unique best rec- ommendation method. Rather, depending on the context and density of the available data, diﬀerent methods adapting to particular applications are most likely to succeed. Hence there is no panacea, and the best one can do is to understand the underlying premises and recommender mechanisms, then one can tackle many diverse application problems from the real life examples. This is also reﬂected in this review where we do not try to highlight any ultimate approach to recommendation. Instead, we review the basic ideas, methods and tools with particular emphasis on physics-rooted approaches. The motivation for writing this review is multifold. Firstly, while extensive reviews of recommender systems by computer scientists already exists [13, 14, 15], the view of physicists is diﬀerent from that of computer scientists by using more the complex networks approach and adapting various classical physics processes (such as diﬀusion) for information ﬁltering. We thus believe that this review with its structure and emphasis on respective topics can provide a novel point of view. Secondly, the past decade has already seen a growing interest of physicists in recommender systems and we hope that this review can be a useful source for them by describing the state of the art in language which is more familiar to the physics community. Finally, the interdisciplinary approach presented here might provide new insights and solutions for open problems and challenges in the active ﬁeld of information ﬁltering. This review is organized as follows. To better motivate the problem, In Section 2 we begin with a discussion of real applications of recommender systems. Next, in Section 3 we introduce basic concepts—such as complex networks, recommender systems, and metrics for their evaluation—that form a basis for all subsequent exposition. Then we proceed to description of recommendation algorithms where traditional approaches (similarity-based methods in Section 4 and dimensionality reduction techniques in Section 5) are followed by network-based approaches which have their origin in the random walk process well known to all physicists (in Section 6). Methods based on external information, such as social relationships (in Section 7), keywords or time stamps (in Section 8), are also included. We conclude with a brief evaluation of methods’ performance in Section 9 and a discussion on the outlook of the ﬁeld in Section 10. 2. Real Applications of Recommender Systems Thanks to the ever-decreasing costs of data storage and processing, recommender sys- tems gradually spread to most areas of our lives. Sellers carefully watch our purchases to recommend us other goods and enhance their sales, social web sites analyze our contacts to help us connect with new friends and get hooked with the site, and online radio stations remember skipped songs to serve us better in the future (see more examples in Table 1). In general, whenever there is plenty of diverse products and customers are not alike, per- sonalized recommendation may help to deliver the right content to the right person. This is particularly the case for those Internet-based companies that try to make use of the so-called long-tail [16] of goods which are rarely purchased yet due to their multitude they can yield considerable proﬁts (sometimes they are referred to as “worst-sellers”). For ex- 3

ample on Amazon, between 20 to 40 percent of sales is due to products that do not belong to the shop’s 100 000 most saled products [17]. A recommender system may hence have signiﬁcant impact on a company’s revenues: for example, 60% of DVDs rented by Netﬂix are selected based on personalized recommendations.1 As discussed in [18], recommender systems not only help decide which products to oﬀer to an individual customer, they also increase cross-sell by suggesting additional prod- ucts to the customers and improve consumer loyalty because consumers tend to return to the sites that best serve their needs (see [19] for an empirical analysis of the impact of recommendations and consumer feedback on sales at Amazon.com). Since no recommendation method serves best all customers, major sites are usually equipped with several distinct recommendation techniques ranging from simple popularity- based recommendations to sophisticated techniques many of which we shall encounter in the following sections. Further, new companies emerge (see, for example, string.com) which aim at collecting all sorts of user behavior (ranging from pages visited on the web and music listened on a personal player to “liking” or purchasing items) and using it to provide personalized recommendations of diﬀerent goods or services. 2.1. Netﬂix Prize In October 2006, the online DVD rental company Netﬂix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and prac- titioners to develop recommender systems that could beat the accuracy of the company’s recommendation system, Cinematch [20]. Atlhough the released data set represented only a small fraction of the company’s rating data, thanks to its size and quality it fast became a standard in the data mining and machine learning community. The data set contained ratings in the integer scale from 1 to 5 which were accompanied by dates. For each movie, title and year of release were provided. No information about users was given. Submitted predictions were evaluated by their root mean squared error (RMSE) on a qualifying data set containing over 2,817,131 unknown ratings. Out of 20,000 registered teams, 2,000 teams submitted at least one answer set. On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that overperformed the Cinematch’s accuracy by 10%. At the time when the contest was closed, there were two teams that achieved the same precision. The prize was awarded to the team that submitted their results 20 minutes earlier than the other one. (See [11] for a popular account on how the participants struggled with the challenge.) There are several lessons that we have learned in this competition [21]. Firstly, the company gained publicity and a superior recommendation system that is supposed to improve user satisfaction. Secondly, ensemble methods showed their potential of improving accuracy of the predictions.2 Thirdly, we saw that accuracy improvements are increasingly 1As presented by Jon Sanders (Recommendation Systems Engineering, Netﬂix) during the talk “Re- search Challenges in Recommenders” at the 3rd ACM Conference on Recommender Systems (2009). 2The ensemble methods deal with the selection and organization of many individual algorithms to achieve better prediction accuracy. In fact, the winning team, called BellKor’s Pragmatic Chaos, was a 4

Site Amazon Facebook WeFollow MovieLens Nanocrowd Jinni Findory Digg Zite Meehive Netﬂix CDNOW eHarmony Chemistry True.com Perfectmatch CareerBuilder Monster Pandora Muﬁn StumbleUpon web sites What is recommended books/other products friends friends movies movies movies news news news news DVDs CDs/DVDs dates dates dates dates jobs jobs music music Table 1: Popular sites using recommender systems. Besides, there are also some companies de- voting themselves to recommendation techniques, such as Baifendian (www.baifendian.com), Baynote (www.baynote.com), ChoiceStream (www.choicestream.com), Goodrec (www.goodrec.com), and others. 5

demanding when RMSE drops below a certain level. Finally, despite the company’s eﬀort, anonymity of its users was not suﬃciently ensured [25]. As a result, Netﬂix was sued by one of its users and decided to cancel a planned second competition. 2.2. Major challenges Researchers in the ﬁeld of recommender systems face several challenges which pose danger for the use and performance of their algorithms. Here we mention only the major ones: 1. Data sparsity. Since the pool of available items is often exceedingly large (major on- line bookstores oﬀer several millions of books, for example), overlap between two users is often very small or none. Further, even when the average number of evaluations per user/item are high, they are distributed among the users/items very unevenly (usually they follow a power-law distribution [26]) and hence majority of users/items may have expressed/received only a few ratings. Hence, an eﬀective recommender algorithm must take the data sparsity into account [27]. 2. Scalability. While the data is mostly sparse, for major sites it includes millions of users and items. It is therefore essential to consider the computational cost issues and search for recommender algorithms that are either little demanding or easy to parallelize (or both). Another possible solution is based on using incremental versions of the algorithms where, as the data grows, recommendations are not recomputed globally (using the whole data) but incrementally (by slightly adjusting previous recommendations according to the newly arrived data) [28, 29]. This incremental approach is similar to perturbation techniques that are widely used in physics and mathematics [30]. 3. Cold start. When new users enter the system, there is usually insuﬃcient information to produce recommendation for them. The usual solutions of this problem are based on using hybrid recommender techniques (see Section 8.4) combining content and collaborative data [31, 32] and sometimes they are accompanied by asking for some base information (such as age, location and preferred genres) from the users. Another way is to identify individual users in diﬀerent web services. For example, Baifendian developped a technique that could track individual users’ activities in several e- commerce sites, so that for a cold-start user in site A, we could make recommendation according to her records in sites B, C, D, etc. 4. Diversity vs. accuracy. When the task is to recommend items which are likely to be appreciated by a particular user, it is usually most eﬀective to recommend popular and highly rated items. Such recommendation, however, has very little value for the users because popular objects are easy to ﬁnd (often they are even hard to avoid) without a recommender system. A good list of recommended items hence should combined team of BellKor [22], Pragmatic Theory [23] and BigChaos [24] (of course, it was not a simple combination but a sophisticated design), and each of them consists of many individual algorithms. For example, the Pragmatic Theory solution considered 453 individual algorithms. 6

分享到：

赞收藏

资料库

2012最新推荐系统综述.pdf

相关推荐

课程资源

热门标签

最新资料