Implementation of Movie Recommendation System based on Android P....pdf

Published: 2022-05-30; uploaded by: admin; category: manuals; size: 0.44M; format: pdf


http://www.paper.edu.cn

Implementation of Movie Recommendation System based on Android Platform

WANG Xin-Yu
Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract: Many apps provide movie news, and some are good at music recommendation. This paper develops an app on the Android platform for movie recommendation. It studies several recommendation algorithms: user collaborative filtering with threshold neighbourhoods, user collaborative filtering with fixed-size neighbourhoods, item collaborative filtering and content-based filtering. It designs a strategy that mixes these algorithms to improve recommendation effectiveness. The system consists of a server and a client. The server is based on Java and mainly comprises the recommendation part, the database, the Servlet and the controller. The server implements the recommendation strategy and the database design. The Servlet responds to requests from the client, and the controller is the core coordinator of the server. The client, the app, mainly comprises the connection part and the UI. The connection part connects to the server, and the UI presents the recommendation results. Users can rate movies in the app and receive corresponding recommendations.

Key words: Computer science and technology, Android, Java, Recommendation system
CLC number: TP311.1

Author Introduction: Corresponding author: Wang Xinyu (1995-), male, master's student; main research direction: deep learning. E-mail: xinyu.wang@bupt.edu.cn

0 Introduction

This paper develops an app for movie recommendation. Many commercial apps, such as Douban and Maoyan, collect movie news; they provide movie information and sell tickets. Other commercial apps specialise in recommendation, such as NetEase Cloud Music, which recommends music well. However, no app focuses on movie recommendation. This paper therefore develops an Android app that recommends movies. A user rates movies and then receives recommendations based on the rating history. The longer a user uses the app, the more accurate the recommendations become. To date, all of these tasks have been completed: a user can use the app to rate movies and get recommendations.

For recommendation, this paper studies several popular recommendation algorithms. The algorithms used are user collaborative filtering with threshold neighbourhoods, user collaborative filtering with fixed-size neighbourhoods, item collaborative filtering and content-based filtering. To improve recommendation effectiveness, this paper designs a strategy that mixes these algorithms instead of relying on a single one. Compared with using only one algorithm, the strategy yields a 4.6% improvement across all users. This overall improvement is modest; however, for 7% of users the improvement is 116.7%. This paper explains why these algorithms were chosen and how they are mixed, and reports the test results of the strategy.

Recommendation requires a lot of computation, and an Android phone does not have enough computing capacity. Recommendation also needs to collect records from different users, which requires considerable computation and storage space. The system is therefore divided into a server and a client. The server mainly stores data and computes recommendations. The client mainly focuses on the UI.

This paper designs and implements both the server and the client. The server consists of four parts: recommendation, database, Servlet and controller. The recommendation strategy is implemented with the help of a recommendation engine, Apache Mahout. The database contains three tables. The Servlet responds to requests from the client, and this paper designs a transmission protocol for it. The controller is the core coordinator of the server and implements its logic.

The client is an Android app. It mainly focuses on the UI and also handles the connection to the server, which uses the transmission protocol designed in this paper. From the user's point of view, the UI is very important: it must record the user's ratings and present the results. Considerable time was spent improving the appearance of the UI; a prototype was designed and reviewed several times to improve the user experience. Although there may still be some small problems with it, the UI is now acceptable to users.

1 Related Work

1.1 Recommendation algorithm

Content-based filtering [1] provides recommendations based on the behaviour of the user alone. For example, it uses the historical record of which blogs the user reads. If a user commonly reads blogs about Linux, content-based filtering uses this record to recommend similar blogs, such as other blogs about Linux. This algorithm relies only on content, not on the behaviour of other users.

Collaborative filtering [2] gives recommendations based on the behaviour of other users. The recommendations come from an automatic collaboration of multiple users and are filtered to those who exhibit similar behaviour. Suppose the system is a website that recommends blogs. It can group users according to the blogs they subscribe to, and so learn which blogs are popular within each group. Then, for a user in a group, the system can recommend the popular blogs to which that user has not yet subscribed. These relationships can be viewed as a Venn diagram [3]: the similarities define how to group users, and the differences define the recommendations.

User collaborative filtering finds similar users and recommends according to the users in the neighbourhood. Item collaborative filtering finds similar items: if a user likes an item, the algorithm recommends items similar to it.

1.2 Server

Apache Mahout is an open-source Apache project that provides algorithms for collaborative filtering (CF), clustering and classification. Mahout contains implementations of clustering, categorization, CF and evolutionary programming. It provides Java libraries, so Java developers can call these operations directly.

Java is a programming language that is concurrent, class-based and object-oriented.
[4] In Java, everything is an object, so Java can be easily extended. Any machine with a JRE can run Java programs. Java's multithreading makes it able to perform many tasks simultaneously.

MySQL is a relational database management system. A relational database can create relationships between individual elements, which avoids data redundancy and emphasises the relationships between data. [5]

Java Database Connectivity (JDBC) is an API for Java that defines how to access a database. It provides methods for querying and updating data in a database, and is oriented towards relational databases. It is the industry standard for database-independent connectivity between Java and a database. JDBC lets developers exploit Java's "Write Once, Run Anywhere" capabilities.

JavaBeans [6] are classes that encapsulate many objects into a single object. They allow access to properties through getter and setter methods. They aim to provide reusable components for Java that other applications can use. The properties and methods exposed to other applications can be controlled.

JavaScript Object Notation (JSON) is an open-standard format that uses text to transmit data as attribute-value pairs. It is a common format for asynchronous communication between browser and server, and it is easy to parse and generate. JSON is also completely language-independent. [7] JSON is therefore an ideal data-interchange format for this project.

Servlets are used to respond to requests. [8] A Servlet can communicate over any client-server protocol, but the most commonly used is HTTP. To deploy and run a Servlet, a web container is required.

Android is an operating system based on the Linux kernel and designed primarily for touchscreen mobile devices. Android is open source and builds upon parts of many different open-source projects. Developers build Android apps with the Android SDK.

2 Design and Implementation

2.1 System outline design

On the server, the database, Mahout and JDBC are encapsulated as a black box. The controller coordinates the different parts, and the Servlet reacts to requests from the client. On the client, the connection part sends requests to the server and receives its responses. The client's controller coordinates the other two parts and implements some logic functions. The UI presents data to the user. Figure 1 is a package diagram of the server, which shows the relationships among its parts: the controller coordinates the other three.
2.2 Recommendation strategy design

This paper compares different recommendation methods and presents a reasonable recommendation strategy.

The first candidate algorithm is based on the attributes of people. If two persons have the same attribute values, they can be regarded as similar. The attributes include gender, age and so on. Each of the two persons can then receive recommendations derived from the other. This algorithm does not rely on history records, so it has no cold-start problem. However, its accuracy is low, and user information is hard to obtain because users are sensitive about their privacy. So this algorithm is not used in the strategy.

Figure 1: Package diagram of the Server

Similarity calculation is a part of collaborative filtering.

Cosine similarity treats each movie as a coordinate axis, which gives a multidimensional coordinate system. Each user is represented in it as a vector according to the user's rating record, and the cosine of the angle between two vectors is the similarity of the two users. This is the most popular measure nowadays, and it is the one this paper uses. [9]

$$T(x, y) = \frac{x \cdot y}{\|x\|_2 \, \|y\|_2} \tag{1}$$

Euclidean distance likewise treats each movie as a coordinate axis. Each user is represented as a point according to the user's rating record, and the similarity of two users is derived from the Euclidean distance between their points. It has a small problem compared with cosine similarity. Imagine a triangle with points A, B and C. For cosine similarity, the similarity of A and C equals the sum of the similarity of A and B and that of B and C. For Euclidean distance it does not, because cosine similarity treats A, B and C as points on a circle, while Euclidean distance treats them as the vertices of a triangle. The line AC is shorter, but the arc AC is the same. So this paper does not use this measure. [10]

$$T(x, y) = \frac{1}{1 + \sqrt{\sum_i (x_i - y_i)^2}} \tag{2}$$

The Tanimoto coefficient also treats each movie as a coordinate axis, with each user represented as a vector according to the user's rating record. The tangent of the angle between two vectors is taken as the similarity of the two users. The difference from cosine similarity is that the Tanimoto coefficient uses the tangent, whereas cosine similarity uses the cosine.
The rate of change of the cosine is high in the middle and low at both ends, while that of the tangent is moderate at the beginning and high at the end. The Tanimoto coefficient gives weight to every stage, whereas the cosine emphasises only the middle.

$$T(x, y) = \frac{x \cdot y}{\|x\|_2^2 + \|y\|_2^2 - x \cdot y} \tag{3}$$

User collaborative filtering relies on the similarity between different users. The algorithm first calculates the similarity; as mentioned above, this paper uses cosine similarity. It then calculates the neighbourhood of a user A, who needs recommendations, according to the similarity. There are two ways to do so. One is the fixed-size neighbourhood: no matter how far other users are from user A, the method chooses a fixed number of the nearest users as neighbours. This guarantees that user A has neighbours. The other is the threshold-based neighbourhood: the method sets a similarity threshold, and all users that satisfy it are regarded as neighbours. This guarantees that the neighbours have good similarity. Once the neighbourhood is calculated, user A receives recommendations according to it.

Item collaborative filtering relies on the similarity between different items. The algorithm first calculates the similarity, again using cosine similarity. It then calculates the similarity between the movies that user A has rated and the other movies. If user A likes movie 1, the algorithm recommends other movies that are highly similar to movie 1. From user A's point of view, this algorithm only recommends movies similar to those user A has already seen: unless user A has seen a type of movie, the algorithm will not recommend that type. However, if user A has some niche tastes, this algorithm can recommend movies that match them, while user collaborative filtering cannot.
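The similarity and neighbourhood steps of user collaborative filtering can be illustrated with a minimal sketch in plain Java. It is not the Mahout API used in the paper: the class and method names are hypothetical, ratings are assumed to be held in memory as a map from account name to (movie ID, score) pairs, and unrated movies are assumed to count as zero in the cosine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: cosine similarity and the two neighbourhood rules. */
final class NeighbourhoodSketch {

    /** Cosine of the angle between two rating vectors (movie ID -> score); unrated movies count as 0. */
    static double cosine(Map<Long, Double> x, Map<Long, Double> y) {
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double other = y.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normX = norm(x);
        double normY = norm(y);
        if (normX == 0.0 || normY == 0.0) {
            return 0.0; // a user without ratings is treated as similar to nobody
        }
        return dot / (normX * normY);
    }

    private static double norm(Map<Long, Double> v) {
        double sum = 0.0;
        for (double r : v.values()) {
            sum += r * r;
        }
        return Math.sqrt(sum);
    }

    /** Fixed-size rule: the n most similar users, however low their similarity is. */
    static List<String> fixedSize(String target, Map<String, Map<Long, Double>> ratings, int n) {
        List<String> others = candidates(target, ratings);
        Map<Long, Double> t = ratings.get(target);
        others.sort(Comparator.comparingDouble((String u) -> cosine(t, ratings.get(u))).reversed());
        int size = Math.max(0, Math.min(n, others.size()));
        return new ArrayList<>(others.subList(0, size));
    }

    /** Threshold rule: every user whose similarity reaches the threshold; the result may be empty. */
    static List<String> threshold(String target, Map<String, Map<Long, Double>> ratings,
                                  double minSimilarity) {
        List<String> result = new ArrayList<>();
        Map<Long, Double> t = ratings.get(target);
        for (String u : candidates(target, ratings)) {
            if (cosine(t, ratings.get(u)) >= minSimilarity) {
                result.add(u);
            }
        }
        return result;
    }

    /** All users except the target. */
    private static List<String> candidates(String target, Map<String, Map<Long, Double>> ratings) {
        List<String> others = new ArrayList<>(ratings.keySet());
        others.remove(target);
        return others;
    }
}
```

The sketch makes the trade-off visible: `fixedSize` always returns neighbours when other users exist, while `threshold` may return an empty list when no user is similar enough.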
After analysing the different recommendation methods, this paper designs a recommendation strategy that comprises three methods and four algorithms. Again take user A as the example.

The first method uses user collaborative filtering with threshold-based neighbourhoods. User collaborative filtering is accurate, and the threshold ensures that the neighbours have good similarity, so the recommendation results are good.

If the first method does not provide enough recommendations, there are two main reasons. One is that user A has no neighbours with good similarity. The other is that user A has already watched all the movies that the neighbours watched. When either problem occurs, the accuracy of the first method decreases significantly. To solve these two problems, this paper designs the second method, which is used whenever the first method does not provide enough recommendations.

The second method uses user collaborative filtering with fixed-size neighbourhoods and item collaborative filtering. User collaborative filtering with fixed-size neighbourhoods is guaranteed to find neighbours; although their similarity may not be good, this is better than having no neighbours. Item collaborative filtering can recommend movies that match user A's niche tastes, and so supplements user collaborative filtering. These two algorithms solve the two problems of the first method. User collaborative filtering with fixed-size neighbourhoods mainly addresses the lack of neighbours, but also helps a little with the other problem. Item collaborative filtering mainly addresses the niche-taste problem, but also helps a little with the other problem. Because the two problems are uncommon, only a few users benefit from the second method.

One problem remains: cold start. If no one has rated a movie, it will never be recommended. So if the second method does not provide enough recommendations, the third method is used. The third method uses content-based filtering. It does not rely on rating records and recommends movies according to movie type. Its accuracy is low. However, when the first and second methods cannot recommend movies, a low-quality recommendation is better than nothing. Most importantly, the third method solves the cold-start problem.

2.3 Server design and implementation

2.3.1 Recommendation Implementation

For the implementation, this paper uses Apache Mahout, an Apache project. Mahout implements all the algorithms mentioned above, so this paper focuses on implementing the recommendation strategy with Mahout's help. The recommendation part fetches the newest data from the database and then executes the recommendation strategy. The data from the database is reused in all three methods of the strategy, and the similarity is reused by the two user collaborative filtering algorithms with different neighbourhoods. This reuse improves program efficiency.
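The fallback order of the three methods can be sketched as follows. This is a minimal illustration rather than the paper's Mahout-based code: each algorithm is passed in as a function from account name to a ranked list of movie IDs, all names are hypothetical, and two details the text does not specify are assumptions, namely that results from an earlier method are kept when a later one is invoked, and that within the second method the fixed-size results are taken before the item-based ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Illustrative sketch only: the three-method fallback strategy. */
final class MixedStrategySketch {

    /**
     * Returns up to {@code wanted} movie IDs for {@code user}.
     * Each recommender maps an account name to a ranked list of movie IDs.
     */
    static List<Long> recommend(String user, int wanted,
                                Function<String, List<Long>> thresholdUserCf,
                                Function<String, List<Long>> fixedSizeUserCf,
                                Function<String, List<Long>> itemCf,
                                Function<String, List<Long>> contentBased) {
        Set<Long> picked = new LinkedHashSet<>(); // keeps rank order and drops duplicates

        // Method 1: user collaborative filtering with threshold-based neighbourhoods.
        addUpTo(picked, thresholdUserCf.apply(user), wanted);

        // Method 2: runs only when method 1 did not provide enough recommendations.
        if (picked.size() < wanted) {
            addUpTo(picked, fixedSizeUserCf.apply(user), wanted);
            addUpTo(picked, itemCf.apply(user), wanted);
        }

        // Method 3: content-based filtering as the last resort (covers cold start).
        if (picked.size() < wanted) {
            addUpTo(picked, contentBased.apply(user), wanted);
        }
        return new ArrayList<>(picked);
    }

    /** Adds candidates in order until the requested number of movies is reached. */
    private static void addUpTo(Set<Long> picked, List<Long> candidates, int wanted) {
        for (Long id : candidates) {
            if (picked.size() >= wanted) {
                return;
            }
            picked.add(id);
        }
    }
}
```

Because the later methods are only evaluated when the earlier ones fall short, most users are served by the first method alone, which matches the observation that only a few users benefit from the second method.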
After the calculation, the recommendation part returns the results. The controller knows nothing about the recommendation strategy.

2.3.2 Servlet design and implementation

This section describes how the server handles client requests. The main computation of the whole system takes place on the server. This paper designs a transmission protocol and the way the other parts integrate with the connection part. Since the messages are lightweight text, this paper uses HTTP. This part is responsible only for transmission and connection; it performs no logic operations and acts like a message porter.

Consider transmission first. The server handles a request and returns a response. Because a client may send many requests at one time, the server must tell the client which request it is responding to, so the response also carries the function name. The client sends the account name to the server; however, because only one account runs on a client, the server does not need to send the account name back. The server also sends back the result that corresponds to the function. In total, the response contains the function name and the result. All of this information is placed in a single JSON object, which transmits data as text attribute-value pairs, and then sent. Several examples follow.

Login: the client sends the function name with the account name and password. The server checks them and returns the function name and whether the login succeeded.

Get one account's ratings: the client sends the account name and function name. The server returns the function name and an array that contains the IDs of all movies this account has rated, with the corresponding rating scores.

Update rating score: the client sends a movie ID, its rating score, the account name and the function name. The server recognises that the client wants to update a rating, extracts the movie ID and score, and updates the database. It then returns the function name and whether the update succeeded.

Delete one rating record: the client sends a movie ID, the account name and the function name. The server deletes the rating row that corresponds to this movie ID and account ID, then returns the function name and whether the deletion succeeded.

Next, consider how the connection part integrates with the other parts. On the server, each request is automatically handled in a new thread. The Servlet's job is to receive the request, extract the message from the JSON package and call the corresponding controller method with that message. The Servlet is only responsible for transmitting messages and telling the controller which function to execute.
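The dispatch step described above can be sketched in plain Java as follows. The paper does not give field names or function names, so "function", "account", "password", "movieId", "score" and "success", as well as the `Controller` interface, are all hypothetical. JSON decoding and encoding are left out: the sketch assumes the request has already been decoded into a map, and returns the response fields as a map to be encoded. The "get one account's ratings" function is omitted for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: how the Servlet could hand a decoded request to the controller. */
final class DispatchSketch {

    /** Hypothetical stand-in for the controller; a real one would use the database. */
    interface Controller {
        boolean login(String account, String password);
        boolean updateRating(String account, long movieId, double score);
        boolean deleteRating(String account, long movieId);
    }

    /** Takes the decoded request fields and returns the fields of the response. */
    static Map<String, Object> handle(Map<String, Object> request, Controller controller) {
        String function = (String) request.get("function");
        String account = (String) request.get("account");

        Map<String, Object> response = new LinkedHashMap<>();
        // The function name is echoed so the client can match the reply to its request.
        response.put("function", function);

        if (function == null) {
            response.put("success", false);
            return response;
        }
        switch (function) {
            case "login":
                response.put("success",
                        controller.login(account, (String) request.get("password")));
                break;
            case "updateRating":
                response.put("success", controller.updateRating(account,
                        ((Number) request.get("movieId")).longValue(),
                        ((Number) request.get("score")).doubleValue()));
                break;
            case "deleteRating":
                response.put("success", controller.deleteRating(account,
                        ((Number) request.get("movieId")).longValue()));
                break;
            default:
                response.put("success", false); // unknown function name
        }
        return response;
    }
}
```

Keeping `handle` free of any business logic mirrors the "message porter" role: it only extracts fields, calls the matching controller method and wraps the result with the function name.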
The controller only needs to finish the task the Servlet gives it, and does not need to care how the Servlet works.

2.3.3 Database design and implementation

The database stores three main kinds of information: accounts, movies, and the ratings that accounts give to movies. Each account has a unique name and a password. The movie information contains the name, abstract, detail and type. The abstract is a brief introduction to the movie; the detail gives more information about it. The type is an integer that should be read as a binary number, each digit of which corresponds to a specific attribute: 1 means the movie has the attribute, and 0 means it does not. Only the mapping between digits and attributes needs to be defined. In this way, a single integer can represent several attributes. The rating information records which account gives which movie what score.

Following the normal forms of relational databases, this paper creates three tables: a table of accounts, a table of movies and a table of rates, as shown in figure 2. To integrate these three tables, something is needed to connect them, namely an ID. This paper gives every
