http://www.paper.edu.cn

Implementation of Movie Recommendation System based on Android Platform

WANG Xin-Yu
Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract: Many apps provide movie news, and some excel at music recommendation. This paper develops an app on the Android platform for movie recommendation. It studies several recommendation algorithms: user collaborative filtering with threshold neighbourhoods, user collaborative filtering with fixed-size neighbourhoods, item collaborative filtering and content-based filtering. It designs a strategy that mixes these algorithms to improve recommendation effectiveness. The system consists of a server and a client. The server is based on Java and mainly comprises the recommendation part, the database, the Servlet and the controller. The recommendation strategy is implemented on the server, together with the database. The Servlet responds to requests from the client. The controller is the core operator of the server. The client, that is, the app, mainly comprises the connection part and the UI. The connection part connects to the server. The UI presents the recommendation results. Users can use the app to rate movies and get corresponding recommendations.

Key words: Computer science and technology, Android, Java, Recommendation system

Chinese Library Classification number: TP311.1

Author Introduction: Corresponding author: Wang Xinyu (1995-), male, master's student; main research direction: deep learning. E-mail: xinyu.wang@bupt.edu.cn
0 Introduction

This paper develops an app for movie recommendation. Many commercial apps, such as Douban and Maoyan, collect movie news; they provide movie information and sell tickets. Some commercial apps specialise in recommendation, such as NetEase Cloud Music, which provides good music recommendations. However, there is no app focusing on movie recommendation. This paper therefore develops an app that implements movie recommendation on the Android platform. Users rate movies and then receive recommendations based on their rating history. The longer a user uses the app, the more accurate the recommendations become. To date, all of these tasks have been completed: users can use the app to rate movies and get recommendations.

For recommendation, this paper studies several popular recommendation algorithms. The algorithms used are user collaborative filtering with threshold neighbourhoods, user collaborative filtering with fixed-size neighbourhoods, item collaborative filtering and content-based filtering. To improve recommendation effectiveness, this paper designs a strategy that mixes these algorithms, making full use of all of them instead of relying on a single one. Compared with using only one algorithm, the results show that the strategy gives a 4.6% improvement across all users. This overall gain is modest; however, for 7% of users the improvement is 116.7%. This paper explains why these algorithms were chosen and how they are mixed, and reports the test results of the strategy.

Recommendation requires a lot of computation, but an Android mobile phone does not have enough computing capacity.
Recommendation also needs to collect records from many users, which requires considerable computation and storage space. The system is therefore divided into a server and a client. The server mainly stores data and computes recommendations. The client mainly focuses on the UI.

This paper designs and implements both the server and the client. The server consists of four parts: recommendation, database, Servlet and controller. The recommendation strategy is implemented with the help of a recommendation engine, Apache Mahout. The database contains three tables. The Servlet responds to requests from the client, and this paper designs a transmission protocol for it. The controller is the core operator of the server and implements the logic.

The client is an Android app. It mainly focuses on the UI and also handles the connection, which communicates with the server using the transmission protocol designed in this paper. From the user's point of view, the UI is very important: it must record the user's ratings and present the results. Considerable time was spent improving the UI's appearance; a prototype was designed and reviewed several times to improve the user experience. Although there may
still be some small problems with it, the UI is now acceptable to users.

1 Related Work

1.1 Recommendation algorithm

Content-based filtering [1] provides recommendations based on the behaviour of a single user. For example, the algorithm uses the historical record of which blogs the user reads. If a user commonly reads blogs about Linux, content-based filtering uses this record to recommend similar blogs, such as other blogs about Linux. This algorithm relies only on content, not on the behaviour of other users.

Collaborative filtering [2] gives recommendations based on the behaviour of other users. The recommendations result from an automatic collaboration of multiple users and are filtered to those users who exhibit similar behaviour. Suppose the system is a website that recommends blogs. The system can group users according to the blogs they subscribe to, and so learn which blogs are popular within each group. Then, for a user in a group, the system can recommend the group's popular blogs that the user has not yet subscribed to. These relationships can be viewed as a Venn diagram [3]: the similarities define how users are grouped, and the differences define the recommendations.

User collaborative filtering finds similar users and gives recommendations according to the users in the neighbourhood. Item collaborative filtering finds similar items: if a user likes an item, the algorithm recommends items similar to it.

1.2 Server

Apache Mahout is an open-source Apache project that provides algorithms for collaborative filtering, clustering and classification. Mahout contains implementations of clustering, categorization, collaborative filtering (CF) and evolutionary programming. It provides Java libraries so that Java developers can use the corresponding operations.

Java is a programming language that is concurrent, class-based and object-oriented.
[4] In Java almost everything is an object, so Java programs can be easily extended. Any machine with a JRE can run Java programs. Java's multithreading support lets a program perform many tasks concurrently.

MySQL is a relational database management system. A relational database can define relationships between individual elements, which avoids data redundancy and makes the relationships between data explicit. [5]

Java Database Connectivity (JDBC) is an API for Java that defines how to access a database. It provides methods to query and update data in a database, and is
oriented towards relational databases. It is the industry standard for database-independent connectivity between Java and a database. JDBC lets developers exploit Java's "Write Once, Run Anywhere" capabilities.

JavaBeans [6] are classes that encapsulate many objects into a single object. They allow access to properties through getter and setter methods. They aim to provide reusable components for Java that other applications can use, and the properties and methods exposed to another application can be controlled.

JavaScript Object Notation (JSON) is an open-standard format that uses text to transmit data as attribute-value pairs. It is a common format for asynchronous communication between browser and server. It is easy to parse and generate, and it is completely language independent. [7] JSON is therefore an ideal data-interchange format for the project.

Servlets are used to respond to requests. [8] A Servlet can work with any client-server protocol, but the protocol most commonly used is HTTP. To deploy and run a Servlet, a web container must be used.

Android is an operating system based on the Linux kernel and designed primarily for touchscreen mobile devices. Android is open source and builds upon parts of many other open-source projects. Developers can build Android apps with the Android SDK.

2 Design and Implementation

2.1 System outline design

On the server side, the database, Mahout and JDBC are encapsulated as a black box. The controller coordinates the different parts. The Servlet reacts to requests from the client. On the client side, the connection part sends requests to the server and receives its responses. The controller coordinates the other two parts and implements some logic functions. The UI presents data to the user. Figure 1 is a package diagram of the server, which shows the relationships between its parts. The controller coordinates the other three parts.
2.2 Recommendation strategy design

This paper compares different recommendation methods and presents a reasonable recommendation strategy.

The first algorithm considered is based on the attributes of different people, such as gender and age. If two persons have the same attribute values, they are regarded as similar, and each can receive recommendations based on the other. This algorithm does not rely on history records, so there is no cold start problem. However, it has low accuracy. Obtaining information about users is also difficult, since users are
sensitive about their privacy. So this algorithm is not used in the strategy.

Figure 1: Package diagram of the Server

Similarity calculation is a part of collaborative filtering. Cosine similarity treats each movie as a coordinate axis, which gives a multidimensional coordinate system. Every user is represented in this system as a vector built from the user's rating record. The cosine of the angle between two vectors is the similarity of the two users. It is one of the most widely used similarity measures and is the one this paper uses. [9]

$$T(x, y) = \frac{x \cdot y}{\|x\|_2 \, \|y\|_2} \tag{1}$$

Euclidean distance also treats each movie as a coordinate axis. Every user is represented as a point built from the user's rating record, and the similarity of two users is derived from the Euclidean distance between their points: the smaller the distance, the higher the similarity. There is a small problem compared with cosine similarity. Imagine three users represented by points A, B and C. Measured by angle, as cosine similarity does, A, B and C behave like points on a circle, and the separation of A and C equals the sum of the separations of A and B and of B and C. Measured by Euclidean distance, A, B and C are the corners of a triangle, and the line AC is shorter than AB plus BC even though the arc AC is unchanged. For this reason this paper does not use Euclidean distance. [10]

$$T(x, y) = \frac{1}{1 + \sqrt{\sum_i (x_i - y_i)^2}} \tag{2}$$

The Tanimoto coefficient also treats each movie as a coordinate axis and represents every user as a vector built from the user's rating record. The similarity of two users is the dot product of their vectors divided by the sum of their squared norms minus the dot product, which extends the Jaccard coefficient from sets to vectors. The difference from cosine similarity is that cosine similarity depends only on the angle between the two vectors, whereas the Tanimoto coefficient also depends on their lengths, so it responds to differences in rating magnitude as well as in rating pattern.

$$T(x, y) = \frac{x \cdot y}{\|x\|_2^2 + \|y\|_2^2 - x \cdot y} \tag{3}$$

User collaborative filtering relies on the similarity between different users. The algorithm first calculates the similarity; as mentioned above, this paper uses cosine similarity. It then determines the neighbourhood of user A, the user who needs recommendations, according to the similarity. There are two ways to determine the neighbourhood. One is the fixed-size neighbourhood: no matter how far the other users are from user A, the method chooses a fixed number of the nearest users as neighbours. This method ensures that a neighbourhood always exists. The other is the threshold-based neighbourhood: the method sets a similarity threshold, and all users who satisfy it are regarded as neighbours. This method ensures that the neighbours are sufficiently similar. Once the neighbourhood has been determined, recommendations for user A are generated from the neighbours of user A.

Item collaborative filtering relies on the similarity between different items. The algorithm first calculates the similarity, again using cosine similarity. It then calculates the similarity between the movies that user A has rated and the other movies. If user A likes movie 1, the algorithm recommends other movies that are highly similar to movie 1. From user A's point of view, this algorithm only recommends movies similar to those user A has already seen; it will not recommend a type of movie until user A has seen a movie of that type. However, if user A has niche tastes, this algorithm can recommend movies based on those niche favourites, whereas user collaborative filtering cannot.
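The cosine similarity and the two neighbourhood methods described above can be sketched in plain Java as follows. This is a minimal illustration, not the paper's implementation (which relies on Apache Mahout): the class name, the method names and the sample ratings are all hypothetical, and unrated movies are assumed to be stored as 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; all names here are hypothetical. */
final class NeighbourhoodSketch {

    /** Cosine of the angle between two rating vectors (0 = movie not rated). */
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        if (nx == 0 || ny == 0) return 0; // a user without ratings is similar to nobody
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    /** Similarity of every other user to the target user, keyed by user name. */
    static Map<String, Double> similarities(String target, Map<String, double[]> ratings) {
        Map<String, Double> sims = new TreeMap<>();
        for (Map.Entry<String, double[]> e : ratings.entrySet()) {
            if (!e.getKey().equals(target)) {
                sims.put(e.getKey(), cosine(ratings.get(target), e.getValue()));
            }
        }
        return sims;
    }

    /** Fixed-size neighbourhood: the n most similar users, however dissimilar they are. */
    static List<String> fixedSize(Map<String, Double> sims, int n) {
        List<String> users = new ArrayList<>(sims.keySet());
        users.sort(Comparator.comparingDouble((String u) -> sims.get(u)).reversed());
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    /** Threshold neighbourhood: every user whose similarity reaches the threshold. */
    static List<String> threshold(Map<String, Double> sims, double minSimilarity) {
        List<String> users = new ArrayList<>();
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            if (e.getValue() >= minSimilarity) users.add(e.getKey());
        }
        return users;
    }

    /** Made-up ratings of four movies by four users, for demonstration only. */
    static Map<String, double[]> sampleRatings() {
        Map<String, double[]> r = new TreeMap<>();
        r.put("A", new double[] {5, 4, 0, 1});
        r.put("B", new double[] {5, 5, 0, 1});
        r.put("C", new double[] {4, 0, 3, 1});
        r.put("D", new double[] {0, 1, 5, 0});
        return r;
    }

    public static void main(String[] args) {
        Map<String, Double> sims = similarities("A", sampleRatings());
        System.out.println("fixed-size (n=2): " + fixedSize(sims, 2));
        System.out.println("threshold (0.9):  " + threshold(sims, 0.9));
    }
}
```

With the sample data, the threshold method can return an empty list when the threshold is set high enough, while the fixed-size method still returns neighbours. This is the trade-off between guaranteed neighbours and guaranteed similarity described above.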
After analysing the different recommendation methods, this paper designs a recommendation strategy. The strategy contains three methods and four algorithms. User A is again taken as the example.

The first method uses user collaborative filtering with threshold-based neighbourhoods. User collaborative filtering is accurate, and the threshold-based neighbourhood ensures that the neighbours are sufficiently similar, so the recommendation results will be good. If the first method does not provide enough recommendations, there are two main reasons. One is that user A has no neighbours with sufficient similarity. The other is that user A has already watched all the movies that the neighbours have watched. In either case the accuracy of the first method decreases significantly. To solve these two problems, this paper designs the second method, which is used when the first method does not provide enough recommendations.

The second method uses user collaborative filtering with fixed-size neighbourhoods and item
collaborative filtering. User collaborative filtering with fixed-size neighbourhoods is sure to find neighbours; although their similarity may be poor, this is better than having no neighbours. Item collaborative filtering can recommend movies that match user A's niche tastes, so it supplements user collaborative filtering. These two algorithms solve the two problems of the first method. User collaborative filtering with fixed-size neighbourhoods mainly addresses the lack of neighbours, and also helps a little with the other problem. Item collaborative filtering mainly addresses niche tastes, and also helps a little with the other problem. Because the two problems are uncommon, only a few users benefit from the second method.

One problem remains: cold start. If no one has rated a movie, it will never be recommended. So if the second method does not provide enough recommendations, the third method is used. The third method uses content-based filtering. It does not rely on rating records and recommends movies according to movie type. Its accuracy is low. However, when the first and second methods cannot recommend movies, a low-quality recommendation is better than nothing. Most importantly, the third method solves the cold start problem.

2.3 Server design and implementation

2.3.1 Recommendation Implementation

For the implementation, this paper uses Apache Mahout, an Apache project. Mahout has implemented all the algorithms mentioned above, so this paper focuses on implementing the recommendation strategy with the help of Mahout. The recommendation part gets the newest data from the database and then executes the recommendation strategy. The data from the database can be reused in all three methods of the strategy, and the similarity can be reused by the two user collaborative filtering algorithms with different neighbourhoods. This reuse improves program efficiency.
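The three-method strategy described above can be read as a fallback chain, sketched below in plain Java. This is only one possible reading: the paper does not state whether a later method replaces or tops up the earlier results, so the sketch assumes that later methods top up the list. The class name, the method name and the use of `Supplier` to stand in for each method are hypothetical, and how the second method combines its two algorithms is left to its supplier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Illustrative sketch only; all names here are hypothetical. */
final class FallbackStrategySketch {

    /**
     * Runs the methods in order and stops as soon as `wanted` distinct movie IDs
     * have been collected. A later method is evaluated only if the earlier ones
     * did not provide enough recommendations (assumption: results are topped up).
     */
    @SafeVarargs
    static List<Long> recommend(int wanted, Supplier<List<Long>>... methods) {
        Set<Long> picked = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (Supplier<List<Long>> method : methods) {
            if (picked.size() >= wanted) break; // earlier methods were enough
            for (Long movieId : method.get()) {
                if (picked.size() >= wanted) break;
                picked.add(movieId);
            }
        }
        return new ArrayList<>(picked);
    }

    public static void main(String[] args) {
        // Made-up movie IDs standing in for the output of each method.
        List<Long> result = recommend(3,
                () -> Arrays.asList(10L),            // method 1: threshold-based user CF
                () -> Arrays.asList(10L, 20L),       // method 2: fixed-size user CF plus item CF
                () -> Arrays.asList(30L, 40L));      // method 3: content-based filtering
        System.out.println(result);
    }
}
```

Because each method is passed as a supplier, the more expensive fallbacks are not computed when the first method already returns enough movies.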
After the calculation, the recommendation part returns the results. The controller does not need to know anything about the recommendation strategy.

2.3.2 Servlet design and implementation

This section describes how the server deals with requests from the client. The main computation of the whole system takes place on the server. This paper designs a transmission protocol and designs how the other parts are integrated with the connection part. Since the protocol is based on lightweight text messages, this paper uses HTTP. This part is responsible only for transmission and connection; it does not perform logic operations and simply acts as a message porter.

Consider transmission first. The server handles a request and gives a response. The server should tell the client which request it is responding to, since a client may send many
requests at one time, so the server also needs to send the function name. The client sends the account name to the server. However, because only one account runs on a client at a time, the server does not need to send the account name back. The server also sends back the result corresponding to the function. In total, the server responds with the function name and the result. All of this information is encapsulated in a JSON package: this paper puts all the information in one JSON object, as attribute-value pairs, and sends it. Several examples follow.

Login: the client sends the function name with the account name and password. The server checks them and sends back the function name and whether the login succeeded.

Get one account's ratings: the client sends the account name and function name. The server sends back the function name and an array containing the IDs of all the movies this account has rated and the corresponding rating scores.

Update rating score: the client sends a movie ID, its rating score, the account name and the function name. The server recognises that the client wants to update a rating, extracts the movie ID and score, and updates the database. It then sends back the function name and whether the update succeeded.

Delete one rating record: the client sends a movie ID, the account name and the function name. The server deletes the rating row corresponding to this movie ID and account ID, then sends back the function name and whether the deletion succeeded.

Next, consider how the connection part is integrated with the other parts. On the server, each request to the Servlet is automatically handled in its own thread. The job of the Servlet is to receive the request, extract the message from the JSON package and call the corresponding controller method with that message. The Servlet is only responsible for transmitting messages and telling the controller to execute the corresponding function.
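The request and response pattern described above (dispatch on the function name, then echo the function name together with the result) can be sketched in plain Java as follows. This is a simplified illustration, not the paper's Servlet: a `Map` stands in for the JSON object so that no JSON library is needed, the field names `function`, `account`, `movieId`, `score` and `result` are hypothetical, and the handling of an unknown function name is an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch only; field names and class names are hypothetical. */
final class ProtocolSketch {

    /** Handlers keyed by function name; each one stands in for a controller method. */
    private final Map<String, Function<Map<String, Object>, Object>> handlers = new HashMap<>();

    void register(String function, Function<Map<String, Object>, Object> handler) {
        handlers.put(function, handler);
    }

    /**
     * Builds the response for one request: the echoed function name, so the client
     * can match the response to its request, plus the result of the handler.
     */
    Map<String, Object> handle(Map<String, Object> request) {
        String function = (String) request.get("function");
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("function", function);
        Function<Map<String, Object>, Object> handler = handlers.get(function);
        // Assumption: an unknown function name is reported in the result field.
        response.put("result", handler == null ? "unknown function" : handler.apply(request));
        return response;
    }

    public static void main(String[] args) {
        ProtocolSketch servlet = new ProtocolSketch();
        // Made-up login check standing in for the controller's real logic.
        servlet.register("login", m -> "secret".equals(m.get("password")));

        Map<String, Object> request = new HashMap<>();
        request.put("function", "login");
        request.put("account", "alice");
        request.put("password", "secret");
        System.out.println(servlet.handle(request));
    }
}
```

Keeping the dispatch table separate from the handlers mirrors the division of labour in the text: the Servlet side only routes messages, and the controller side decides what each function does.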
The controller only needs to finish the task the Servlet gives it, and does not need to care how the Servlet works.

2.3.3 Database design and implementation

The database needs to store three main kinds of information: accounts, movies, and the ratings that accounts give to movies. The information for each account contains a unique name and a password. The movie information contains the name, abstract, detail and type. The abstract is a brief introduction to the movie. The detail gives more information about the movie. The type is an integer that should be read as a binary number. Each digit of the binary number corresponds to a specific attribute: 1 means the movie has this attribute, while 0 means it does not. Only the mapping between digits and attributes needs to be defined. In this way, a single integer can represent several attributes. The rating information indicates which account gave which movie what score.

Following the normal forms of relational databases, this paper creates three tables: a table of accounts, a table of movies and a table of ratings, as shown in Figure 2. To integrate these three tables, something is needed to connect them, namely an ID. This paper gives every