CN108710620B

CN108710620B - Book recommendation method based on k-nearest neighbor algorithm of user

Info

Publication number: CN108710620B
Application number: CN201810049034.8A
Authority: CN
Inventors: 郝宁宁; 李媛鸣; 王川; 陈梦瑶; 石冰洁; 刘二宝; 祝晓雪; 高婧
Original assignee: Rizhao Gelang E Commerce Co ltd
Current assignee: Rizhao Gelang E Commerce Co ltd
Priority date: 2018-01-18
Filing date: 2018-01-18
Publication date: 2022-05-20
Anticipated expiration: 2038-01-18
Also published as: CN108710620A

Abstract

The invention discloses a book recommendation method and system built on a user-based k-nearest neighbor collaborative filtering algorithm. The recommendation algorithm applied by the invention is a collaborative filtering algorithm, specifically a user-based k-nearest neighbor algorithm, which uses the scores that different readers have given to books to recommend, in a personalized way, books that each reader may be interested in.

Description

Book recommendation method based on k-nearest neighbor algorithm of user

Technical Field

The invention belongs to the technical field of book recommendation, and relates to a book recommendation method and system based on a user-based k-nearest neighbor algorithm.

Background

With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. Often the problem is not a lack of material or information but an excess of it, which leaves people overwhelmed and unsure how to choose. Faced with massive amounts of information, two problems arise: on the one hand, how users can find the content they are really interested in among the overloaded information; on the other hand, how information providers can get their information noticed by interested people rather than having it drowned in a huge volume of other information.

To address the information overload problem, catalogs and search engines emerged. Both establish a match between information and users, who find the information of interest by searching for keywords. However, search engines have limitations. First, the results they provide are usually not personalized: different people searching with the same keyword usually get the same results even though their tastes differ, so search engines cannot accurately filter information for different users. Second, search engines require users to understand their own needs clearly and to be able to express them as keywords; users sometimes have needs they are not aware of, and in that case search engines are ineffective. Although both tools can help users find information of interest faster, neither provides personalized services for different users.

Recommendation systems are another means of matching information to users. Unlike a search engine, a recommendation system does not need the user to enter keywords. It actively mines neighboring users with similar interests from the user's historical behavior records, finds the items those neighbors are interested in, and recommends related goods or information to the user. Because recommendations are made according to the characteristics of each user, personalized requirements can be met: different users are recommended products that suit them, information is presented to users more accurately, and the filtering does not rely on information actively entered by the user.

The core idea of the user-based k-nearest neighbor collaborative filtering algorithm is as follows. Given a user-item scoring matrix, the algorithm finds the other users whose historical scoring records are most similar to those of the current user (called the neighbors of the current user) and uses the neighbors' information to generate recommendations. All items that the current user has not scored but the neighbors have scored are taken as candidate recommended items; the current user's score for each candidate is predicted from the neighbors' scoring records and the similarity between users; finally, the items with the highest predicted scores are recommended to the current user.

Disclosure of Invention

The invention aims to provide a book recommendation method and system based on a user-based k-nearest neighbor algorithm. Using the scores that different users have given to books, the algorithm recommends in a personalized way the books each user may be interested in; the similarity of the objects involved in the system is comprehensively considered, and the recommendation accuracy is improved.

In order to achieve this purpose, the invention adopts the following technical scheme:

A book recommendation method based on a user-based k-nearest neighbor algorithm comprises the following steps:

(1) randomly dividing the users' historical book scoring behavior data into M parts according to a uniform distribution, selecting one part as the test set and taking the remaining M-1 parts as the training set, and establishing a k-nearest neighbor recommendation model based on users and items on the training set;

(2) establishing a user interest model on the training set through the k-nearest neighbor recommendation model to generate a recommended book list, and, in combination with the test set, calculating the accuracy and recall rate of the k-nearest neighbor algorithm when the number k of most similar users is at its initial value;

(3) sequentially updating the value of k, the number of most similar users set in the k-nearest neighbor recommendation model, and calculating the book recommendation list and the accuracy and recall rate of the algorithm under each k value;

(4) adding the accuracy and the recall rate corresponding to each k value to obtain a performance index value of the user-based k-nearest neighbor algorithm, and taking the value of k corresponding to the maximum of these performance index values as the optimal value of the algorithm parameter k for the user;

(5) inputting the optimal value of the parameter k for a given user, and generating a book recommendation list for that user with the k-nearest neighbor recommendation model of step (1).

As a further improvement of the invention, the establishing step of the k-nearest neighbor recommendation model in the step (1) is as follows:

1) processing a training set of user historical book scoring behavior data into a user book scoring matrix R of m x n;

2) calculating the similarity between users by using a Pearson correlation coefficient;

3) for each user, sequencing the similarity of the user and other users in a descending order;

4) generating a candidate recommended book list according to the similarity calculation result and the algorithm parameter k, calculating the predicted score of each book in the candidate list with the prediction score formula, sorting the candidate list by predicted score in descending order, and taking the top-ranked books to form the final book recommendation list, thereby generating a recommended book list based on the user-based k-nearest neighbor algorithm.

As a further improvement of the present invention, the similarity between users in step 2) is calculated as follows:

$$P(u,v) = \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_u \cap I_v} (r_{vi} - \bar{r}_v)^2}}$$

In the above formula, $P(u,v)$ represents the similarity between user u and user v; $I_u$ and $I_v$ represent the sets of books scored by user u and user v, respectively; $r_{ui}$ and $r_{vi}$ represent the scores given to book i by user u and user v, respectively; and $\bar{r}_u$ and $\bar{r}_v$ represent the average book scores of user u and user v, respectively.
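The Pearson similarity between two users can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `pearson_similarity` and the dictionary layout (book id mapped to score) are assumptions, as is the choice to take each user's mean over all of that user's scores while summing only over books both users have scored.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users' book scores.

    ratings_u, ratings_v: dict mapping book id -> score (hypothetical layout).
    Sums run over books scored by both users; each mean is taken over all of
    that user's scores (an assumption, other variants exist).
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # no co-rated books: treat the users as unrelated
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a user with constant scores has no defined correlation
    return num / (den_u * den_v)
```

Returning 0.0 for users with no books in common, or with zero variance on the common books, is a design choice of this sketch; it simply keeps such users out of the top of the similarity ranking.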

As a further improvement of the present invention, the prediction score in step 4) is calculated as follows:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{u' \in M} \mathrm{sim}(u,u')\,(r_{u'i} - \bar{r}_{u'})}{\sum_{u' \in M} \left|\mathrm{sim}(u,u')\right|}$$

In the above formula, $\hat{r}_{ui}$ represents the predicted score of user u for book i; $\bar{r}_u$ and $\bar{r}_{u'}$ are the average book scores of user u and user u'; $\mathrm{sim}(u,u')$ is the similarity between user u and user u'; $r_{u'i}$ represents the score of user u' for book i; and M is the set of users most similar to user u.
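The prediction step can be illustrated with a short Python sketch of a mean-centered, similarity-weighted score. The function name `predict_score` and the tuple layout of `neighbors` are hypothetical, and skipping neighbors who have not scored the book is an assumption of this sketch rather than something the text states.

```python
def predict_score(mean_u, neighbors, book):
    """Predict a user's score for `book` from the neighbors' scores.

    mean_u:    average score of the target user.
    neighbors: list of (similarity, neighbor_mean, neighbor_ratings) tuples,
               where neighbor_ratings maps book id -> score (hypothetical layout).
    Neighbors who have not scored `book` are skipped (an assumption).
    """
    num = 0.0
    den = 0.0
    for sim, mean_v, ratings_v in neighbors:
        if book in ratings_v:
            num += sim * (ratings_v[book] - mean_v)  # deviation from the neighbor's mean
            den += abs(sim)
    if den == 0:
        return mean_u  # no usable neighbor: fall back to the user's own mean
    return mean_u + num / den
```

For example, with a target mean of 3.0 and two neighbors who both scored the book one point above their own means, the predicted score is 4.0 regardless of the similarity weights.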

As a further improvement of the invention, the accuracy and recall rate of the k-nearest neighbor algorithm in step (2) are calculated by the following formulas:

a) Accuracy

$$\mathrm{Precision} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|R(u)\right|}$$

where Precision represents the accuracy of the user-based k-nearest neighbor algorithm, $R(u)$ represents the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, $T(u)$ represents the list of items scored by user u in the test set, and U represents the set of all users;

b) Recall rate

$$\mathrm{Recall} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|T(u)\right|}$$

where Recall represents the recall rate of the user-based k-nearest neighbor algorithm, $R(u)$ represents the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, $T(u)$ represents the list of items scored by user u in the test set, and U represents the set of all users.
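As an illustration of the two metrics, the following Python sketch counts how many recommended books also appear among the books a user actually scored in the test set. The function name `precision_recall` and the dictionary-of-sets layout are assumptions; passing a dictionary with a single user gives that user's own accuracy and recall rate, which is what per-user tuning of k needs.

```python
def precision_recall(recommended, held_out):
    """Accuracy (precision) and recall rate of recommendation lists.

    recommended: dict user -> set of recommended book ids, R(u).
    held_out:    dict user -> set of book ids the user scored in the test set, T(u).
    Both layouts are hypothetical; users missing from held_out count as empty.
    """
    hits = sum(len(books & held_out.get(u, set())) for u, books in recommended.items())
    n_recommended = sum(len(books) for books in recommended.values())
    n_relevant = sum(len(held_out.get(u, set())) for u in recommended)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall
```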

A book recommendation system based on a k-nearest neighbor algorithm of a user comprises a user item information collection layer, a storage layer, a recommendation engine module and an interface layer;

the storage layer is used for storing data used and generated by the system, and comprises basic information of users and books and user behavior information;

the user article information collection layer is connected with the storage layer and is used for inputting and maintaining the basic information and the user behavior information of the user and the book;

the recommendation engine module is connected with the storage layer and used for calculating on the basis of historical behavior data of the articles by the user to generate a recommendation list; constructing a recommendation engine by adopting a k-nearest neighbor recommendation model based on a user and a k-nearest neighbor recommendation model based on an article;

the interface layer is connected with the recommendation engine module and the storage layer and communicates with the front-end display unit: calculated data are transmitted to the front-end display unit for display, and the user's book scores are entered and sent back through the front-end display unit; the interface layer provides the data required by calls from the front-end display unit and passes the user behavior data received from it to the storage layer for storage.

Compared with the prior art, the invention has the following advantages:

The invention exploits the advantages of the user-based k-nearest neighbor algorithm: using users' historical item scoring records, it finds the other users most similar to the current user's scoring history (called the neighbors of the current user) and generates recommendations from the neighbors' information, which gives more accurate recommendation performance. A further advantage is that the core parameter of the k-nearest neighbor algorithm, the number k of most similar users, is optimized. The historical book scoring records of different users are divided into a training set and a test set; different k values are substituted in turn and the accuracy and recall rate of the corresponding k-nearest neighbor algorithm are calculated; the accuracy and recall rate for each k value are added to obtain a performance index value of the user-based k-nearest neighbor algorithm; and the value of k corresponding to the maximum performance index value is selected as the optimal value of the algorithm parameter k for the user. The value of the core parameter k can thus be adjusted dynamically as the user's history changes, which improves the performance of the recommendation algorithm.

Drawings

FIG. 1 is a flow chart for training an optimal parameter k value;

FIG. 2 is a diagram of a user-based k-nearest neighbor algorithm recommendation process;

FIG. 3 is a flow chart of a user-based collaborative filtering algorithm;

FIG. 4 is a system framework diagram;

FIG. 5 is a scatter plot of the optimal parameters of some users;

FIG. 6 is a block diagram of a book recommendation algorithm;

FIG. 7 is a book recommendation module layout view;

FIG. 8 is the final book recommendation list for the user cytun.

Detailed Description

The invention is described in detail below with reference to the attached drawing figures:

As shown in FIG. 1, the book recommendation method based on the user-based k-nearest neighbor algorithm comprises the following steps:

(1) firstly, randomly dividing scoring behavior data of a user historical book into M parts according to uniform distribution, selecting one part as a test set, and taking the rest M-1 parts as a training set. Establishing a k-nearest neighbor recommendation model based on a user on a training set of the historical book scoring behavior data of the user, as shown in fig. 2, wherein the recommendation model is established by the following steps:

1) Processing the training set of users' historical book scoring behavior data into an m × n user-book scoring matrix R, where m is the number of users, n is the number of books, and $r_{ui}$ represents user u's score for book i.

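2) Calculating the similarity between users. The similarity measure is the Pearson correlation coefficient, calculated as follows:

$$P(u,v) = \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_u \cap I_v} (r_{vi} - \bar{r}_v)^2}}$$

where $\bar{r}_u$ and $\bar{r}_v$ represent the average book scores of user u and user v, respectively.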

3) For each user, the similarity of the user and other users is sorted in the descending order.

4) As shown in FIG. 3, a recommended book list is generated by the user-based k-nearest neighbor algorithm. Suppose a recommended book list is to be generated for user u. First, the initial value of the parameter k, the number of most similar users, is set to 20, and the first 20 users in user u's similarity ranking list are taken as the neighbor user set of user u. Then the books scored by these 20 neighbor users but not scored by user u are taken as the candidate recommended book list of user u, and a score is predicted for each book in the candidate list. The formula for the predicted book score is as follows:

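$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{u' \in M} \mathrm{sim}(u,u')\,(r_{u'i} - \bar{r}_{u'})}{\sum_{u' \in M} \left|\mathrm{sim}(u,u')\right|}$$

In the above formula, $\hat{r}_{ui}$ represents the predicted score of user u for book i; $\bar{r}_u$ and $\bar{r}_{u'}$ are the average book scores of user u and user u'; $\mathrm{sim}(u,u')$ is the similarity between user u and user u'; $r_{u'i}$ represents the score of user u' for book i; and M is the set of users most similar to user u.

The candidate recommended books are sorted in descending order of predicted score. The parameter N, the number of books in the book recommendation list, is set to 10, and the first 10 books in the sorted candidate list form the final book recommendation list.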
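Steps 3) and 4) above can be sketched together as one Python function: pick the k most similar users, collect the books they scored that the target user has not, predict a score for each, and keep the top N. This is a minimal sketch under stated assumptions: the function name `recommend_top_n`, the dictionary layouts, and the tie-breaking by book id are all hypothetical, and the similarities are taken as already computed.

```python
def recommend_top_n(target, ratings, similarities, k=20, n=10):
    """Top-N books for `target` from its k most similar users.

    ratings:      dict user -> {book id: score} (hypothetical layout).
    similarities: dict other user -> similarity to `target`, precomputed.
    k and n default to the values used in the description (20 and 10).
    """
    # Neighbor set: the k users with the highest similarity to the target.
    neighbors = sorted(similarities, key=similarities.get, reverse=True)[:k]
    seen = ratings[target]
    mean_t = sum(seen.values()) / len(seen)
    means = {v: sum(ratings[v].values()) / len(ratings[v]) for v in neighbors}

    # Candidates: books scored by a neighbor but not by the target user.
    candidates = {b for v in neighbors for b in ratings[v] if b not in seen}

    scored = []
    for book in candidates:
        num = den = 0.0
        for v in neighbors:
            if book in ratings[v]:
                num += similarities[v] * (ratings[v][book] - means[v])
                den += abs(similarities[v])
        pred = mean_t + num / den if den else mean_t
        scored.append((pred, book))

    # Highest predicted score first; ties broken by book id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [book for _, book in scored[:n]]
```

Calling the function with different values of `k` on the same data is all that the later parameter sweep needs, since the similarity ranking itself does not depend on k.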

(2) Through the four steps above, a user interest model is established on the training set of users' historical book scoring behavior data and a recommended book list is generated. Then, in combination with the test set, the degree of overlap between the predicted behavior and the users' actual behavior on the test set, namely the accuracy and the recall rate of the k-nearest neighbor algorithm, is calculated for the initial value k = 20. The specific calculation formulas are as follows:

a) Accuracy

$$\mathrm{Precision} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|R(u)\right|}$$

In the above formula, Precision represents the accuracy of the user-based k-nearest neighbor algorithm, $R(u)$ represents the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, $T(u)$ represents the list of items scored by user u in the test set, and U represents the set of all users.

b) Recall rate

$$\mathrm{Recall} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|T(u)\right|}$$

In the above formula, Recall represents the recall rate of the user-based k-nearest neighbor algorithm, $R(u)$ represents the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, $T(u)$ represents the list of items scored by user u in the test set, and U represents the set of all users.

(3) The core parameter k set in step 4) of the recommendation model established in step (1) is updated to 30, 40, and 50 in turn, and the book recommendation list is calculated for each k value. Step (2) is then repeated to calculate the accuracy and recall rate of the user-based k-nearest neighbor algorithm for k = 30, k = 40, and k = 50.

(4) Through the above three steps, the accuracy and recall rate of the user-based k-nearest neighbor algorithm are obtained for k = 20, k = 30, k = 40, and k = 50, giving 4 groups of accuracy and recall rate in total. For each of the four k values, the accuracy and the recall rate are added to obtain the performance index value of the user-based k-nearest neighbor algorithm. The 4 performance index values are then sorted in descending order. Finally, the value of k corresponding to the maximum of the 4 performance index values is taken as the optimal value of the algorithm parameter k for the user and is used for test runs on the test set. The purpose of these four steps is to train the optimal algorithm parameter k for each user and to obtain the performance index values, namely the accuracy and the recall rate, of the algorithm at the optimal k.
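The selection rule in step (4) reduces to picking the k whose accuracy plus recall rate is largest. A minimal Python sketch follows; the function name `select_best_k` and the dictionary layout are assumptions, and the numbers in the usage comment are made up for illustration only, not taken from the experiment.

```python
def select_best_k(metrics_by_k):
    """Return the k with the largest performance index (accuracy + recall rate).

    metrics_by_k: dict k -> (accuracy, recall), one entry per candidate k
                  (hypothetical layout). On a tie, the first k in the
                  dictionary's insertion order wins.
    """
    return max(metrics_by_k, key=lambda k: sum(metrics_by_k[k]))


# Illustrative values only (not results from the patent):
# select_best_k({20: (0.10, 0.05), 30: (0.12, 0.06), 40: (0.11, 0.09), 50: (0.10, 0.07)})
# picks 40, because 0.11 + 0.09 is the largest sum.
```

Sorting the four index values, as the description does, gives the same result as taking the maximum directly; the sketch uses the maximum because only the top entry is needed.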

The invention also provides a book recommendation system based on the user-based k-nearest neighbor algorithm. A recommendation system deals with two important kinds of objects, users and items, and its core is a recommendation engine that associates users with items. The main task of the book recommendation system is to provide a book recommendation list to the user. Users, items, and the recommendation engine together form a complete recommendation system. The overall architecture of the system and the model design of these three parts are explained below; the system framework is shown in FIG. 4:

the storage layer is used for storing data used and generated by the system and mainly comprises basic information of users, books and user behavior information.

The user article information collection layer is responsible for entering and maintaining the basic information and user behavior information of the user and the book.

The recommendation engine performs calculations on users' historical behavior data for items and generates a recommendation list. The system adopts a collaborative filtering recommendation algorithm; specifically, the recommendation engine is built on the user-based k-nearest neighbor collaborative filtering algorithm.

The interface layer is responsible for communication between the system and the front-end program. The system runs in the background: calculated data must be transmitted to the front end for display, and users score books and send the scores back through the front end. The interface layer therefore provides the data required by front-end calls and passes the user behavior data received from the front end to the storage layer for storage and later use.

The database uses SQLite to store users' behavior data and the basic data of users and books. The database contains user and book entities, with a many-to-many relationship between users and books and a many-to-many relationship among users. Based on these relationships, the corresponding database tables can be designed:

(a) system object information collection

The system objects comprise user information and basic book information. User information is collected in two parts: the first part is the information filled in when the user registers; the second part is calculated by the system background from the existing user behavior data, the value concerned being the core parameter k of the user-based k-nearest neighbor algorithm.

For a new user there is no user behavior data, so the background process cannot train the optimal algorithm parameter for that user. Because the recommendation simulation experiment concluded that the optimal parameter for most users is 50, the parameter k of a new user is set to 50. The specific attributes of users are shown in Table 1 below:

TABLE 1

The basic book information is obtained with crawler technology through the Douban API, and the specific attributes of books are shown in Table 2 below:

TABLE 2

Collection of user behavior records

The data used by the algorithm in the recommendation system are user behavior records, which in this system are mainly users' scoring records for books. The records are collected when a user searches for a book in the system foreground and scores it; the score data are transmitted to the background database through the interface. The specific content of the user-book score table for user behavior records is shown in Table 3 below:

TABLE 3
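One way the user, book, and user-book score tables could be laid out in SQLite is sketched below with Python's standard `sqlite3` module. Every table and column name here is hypothetical, since the attribute lists of Tables 1 to 3 are not given in the text; the `best_k` default of 50 follows the default described for new users, and the `user_similarity` table is one possible way to hold the many-to-many relationship among users.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id TEXT PRIMARY KEY,
    best_k  INTEGER NOT NULL DEFAULT 50   -- trained parameter k; 50 for new users
);
CREATE TABLE IF NOT EXISTS books (
    book_id TEXT PRIMARY KEY,
    title   TEXT
);
-- Many-to-many relationship between users and books: one row per score.
CREATE TABLE IF NOT EXISTS ratings (
    user_id TEXT NOT NULL REFERENCES users(user_id),
    book_id TEXT NOT NULL REFERENCES books(book_id),
    score   REAL NOT NULL,
    PRIMARY KEY (user_id, book_id)
);
-- Many-to-many relationship among users: pairwise similarity.
CREATE TABLE IF NOT EXISTS user_similarity (
    user_id     TEXT NOT NULL REFERENCES users(user_id),
    neighbor_id TEXT NOT NULL REFERENCES users(user_id),
    similarity  REAL NOT NULL,
    PRIMARY KEY (user_id, neighbor_id)
);
"""


def create_schema(conn):
    """Create the sketch tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)
```

A quick check is to open `sqlite3.connect(":memory:")`, call `create_schema`, insert a user without specifying `best_k`, and read back the default value.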

The recommendation module is mainly used to build the user recommendation model, train the core parameter of the algorithm, and recommend books the user may be interested in. It is the core module of the book recommendation system, and its function is to recommend a book list to the user. The module involves the user-based k-nearest neighbor collaborative filtering algorithm. The design of the recommendation module is shown in FIG. 7:

The system first collects user behavior records and generates a user-item scoring matrix on which the relevant operations are performed to produce recommendations for users. The block diagram of the recommendation algorithm design is shown in FIG. 6:

Example:

A data set containing a total of 129334 user book scoring records is used below, involving 265 books and 1968 users.

(1) Firstly, the data set is randomly divided into 3 parts according to uniform distribution, one part is selected as a test set, and the remaining 2 parts are used as a training set. Tables 4 and 5 below show the training and test sets of the user cytun.

TABLE 4

TABLE 5
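The random split described in step (1) can be sketched as follows, using only Python's standard `random` module. The function name `split_records` and the `seed` parameter are assumptions added for reproducibility. Each record is assigned to one of M parts with equal probability, so the test set holds roughly 1/M of the records rather than an exact share.

```python
import random


def split_records(records, m=3, seed=0):
    """Randomly split scoring records into a training set and a test set.

    Each record falls into one of m parts with equal probability; part 0 is
    used as the test set and the other m - 1 parts as the training set.
    `seed` is a hypothetical parameter that makes the split repeatable.
    """
    rng = random.Random(seed)
    train, test = [], []
    for record in records:
        if rng.randint(0, m - 1) == 0:
            test.append(record)
        else:
            train.append(record)
    return train, test
```

With m = 3, as in this example, about one third of each user's scoring records end up in the test set and two thirds in the training set.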

(2) A user-based k-nearest neighbor recommendation model is established on the training set of the user cytun's historical book scoring behavior data. In combination with cytun's test set, the accuracy and recall rate of the user-based k-nearest neighbor algorithm are calculated for k = 20, k = 30, k = 40, and k = 50, giving 4 groups of accuracy and recall rate. Table 6 below shows the parameter performance for the user cytun.

TABLE 6

The recommendation algorithm applied by the invention is the user-based k-nearest neighbor algorithm, which uses the scores that different readers have given to books to recommend, in a personalized way, books that each reader may be interested in. In a collaborative filtering algorithm, the choice of parameters affects the quality of the recommendations. The invention therefore also provides an off-line experimental method that trains the core parameter k of the user-based k-nearest neighbor algorithm for each user, with the aim of improving the performance of the recommendation algorithm and obtaining the best recommendation results. The invention realizes book recommendation based on a single collaborative filtering algorithm. Finally, a personalized book recommendation system is designed that implements queries of historical book scoring records, queries of book information, and book recommendation, and a recommendation simulation experiment is performed on a data set.

In order to observe the distribution of optimal parameters of existing users in the system more intuitively, FIG. 5 shows a scatter plot of user optimal parameters, with some of the users on the abscissa and the parameter k on the ordinate. The most important function of the k-nearest neighbor module is to generate recommendations. Taking the user cytun as an example, the book recommendation list generated for cytun by the k-nearest neighbor recommendation algorithm based on items is shown in FIG. 8.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent changes and modifications made within the scope of the present invention should be considered as the technical scope of the present invention.

Claims

1. A book recommendation method based on a k-nearest neighbor algorithm of a user is characterized by comprising the following steps:

step 1, randomly dividing scoring behavior data of a user historical book into M parts according to uniform distribution, selecting one part as a test set, taking the rest M-1 parts as a training set, and establishing a k-nearest neighbor recommendation model based on the user on the training set of the scoring behavior data of the user historical book;

step 2, establishing a user interest model on a training set of the scoring behavior data of the user historical books through a k-nearest neighbor recommendation model to generate a recommended book list, and calculating the accuracy and the recall rate of a k-nearest neighbor algorithm when the number k of the most similar users is at an initial value by combining a test set of the scoring behavior data of the user historical books;

step 3, updating the number k value of the most similar users of the set algorithm in the k-nearest neighbor recommendation model in sequence, and calculating book recommendation lists under different k values; under the condition of calculating different k values, the accuracy and the recall rate of a k-nearest neighbor algorithm based on the user are calculated;

step 4, adding the accuracy and the recall rate corresponding to each different k value to obtain a performance index value of the k-nearest neighbor algorithm based on the user; taking the value of a parameter k corresponding to the maximum value of the performance index value of the user-based k-nearest neighbor algorithm as the value of the optimal algorithm parameter k of a certain user in the user-based k-nearest neighbor algorithm;

step 5, inputting an optimal algorithm performance index value of a user in the k-nearest neighbor algorithm based on the user, and generating a book recommendation list based on the k-nearest neighbor algorithm of the user for the user by using the k-nearest neighbor recommendation model in the step 1.

2. The book recommendation method based on user's k-nearest neighbor algorithm as claimed in claim 1, wherein the k-nearest neighbor recommendation model in step 1 is built by the following steps:

step 1.1, processing a training set of user historical book scoring behavior data into a user book scoring matrix R of m x n;

step 1.2, calculating the similarity between users by using a Pearson correlation coefficient;

step 1.3, for each user, sequencing the similarity of the user and other users in a descending order;

step 1.4, generating a candidate recommended book list according to the similarity calculation result and in combination with an algorithm parameter k, calculating the prediction score of each book in the candidate recommended book list by using a calculation formula of the prediction scores, sequencing the candidate recommended book list according to the sequence of the prediction scores from large to small, taking the first books in the candidate recommended book ranking list to form a final book recommendation list, and generating a recommended book list based on a k-nearest neighbor algorithm of a user.

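3. The book recommendation method based on user k-nearest neighbor algorithm as claimed in claim 2, wherein the similarity calculation formula among users in step 1.2 is as follows:

$$P(u,v) = \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_u \cap I_v} (r_{vi} - \bar{r}_v)^2}}$$

where $\bar{r}_u$ and $\bar{r}_v$ represent the average scores of the books by user u and user v, respectively.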

4. The book recommendation method based on user k-nearest neighbor algorithm as claimed in claim 2, wherein the calculation formula of the prediction score in step 1.4 is as follows:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{u' \in N} \mathrm{sim}(u,u')\,(r_{u'i} - \bar{r}_{u'})}{\sum_{u' \in N} \left|\mathrm{sim}(u,u')\right|}$$

in the above formula, $\bar{r}_u$ and $\bar{r}_{u'}$ are the average scores of user u and user u' for the items, $\mathrm{sim}(u,u')$ is the similarity between user u and user u', $r_{u'i}$ represents the score of user u' for item i, and N is the set of neighbors most similar to user u.

5. The book recommendation method based on user k-nearest neighbor algorithm as claimed in claim 1, wherein the accuracy and recall of k-nearest neighbor algorithm in step 2 are calculated as follows:

a) accuracy

$$\mathrm{Precision} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|R(u)\right|}$$

b) recall rate

$$\mathrm{Recall} = \frac{\sum_{u \in U} \left|R(u) \cap T(u)\right|}{\sum_{u \in U} \left|T(u)\right|}$$

in the formula, Recall represents the recall rate of the k-nearest neighbor algorithm based on the user, $R(u)$ represents the book recommendation list generated for user u by the k-nearest neighbor recommendation algorithm based on the user, $T(u)$ represents the list of the items scored by user u, and U represents all users.