CN115618127A - Collaborative filtering algorithm of neural network recommendation system - Google Patents


Info

Publication number
CN115618127A
Authority
CN
China
Prior art keywords
user
users
item
project
scoring
Legal status
Pending
Application number
CN202211273752.6A
Other languages
Chinese (zh)
Inventor
朵琳
龙国虎
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Application filed by Kunming University of Science and Technology
Priority to CN202211273752.6A
Publication of CN115618127A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a collaborative filtering algorithm for a neural network recommendation system. First, a convolutional neural network is used to build a model of users' past behaviors and item attributes, extracting user and item features and fitting ratings through fully connected layers. Next, the idea of association rules is introduced: an item association matrix is generated from the association relationships between items and used to generate a candidate item set. A modified Pearson correlation coefficient is then proposed: a user rating matrix is established and optimized, the rating similarity between users is calculated, users are ranked from high to low similarity, and the top-ranked users are selected as nearest neighbors. The target user's predicted ratings for unrated items are calculated, and the N items with the highest predicted values are recommended to the user. The movie recommendation algorithm based on the convolutional neural network and the recurrent neural network gives good results on the corresponding evaluation metrics.

Description

Collaborative filtering algorithm of neural network recommendation system
Technical Field
The invention relates to the technical field of recommendation systems, in particular to a collaborative filtering algorithm for a neural network recommendation system.
Background
Technological progress makes human life increasingly convenient: people can see the beauty of the world without leaving home, share interesting things they encounter on social platforms, and search the network extensively for content they are interested in. Each person generates an enormous amount of behavioral data, so the information on the network grows explosively. In such a massive data environment, it is difficult for people to obtain meaningful data accurately and quickly. Currently, the solution that appears most effective is the recommendation system, which analyzes the series of behaviors a user generates when visiting web pages. A movie recommendation system, for example, integrates a user's viewing history, browsing information, and movie ratings, and then recommends movies the user may be interested in. A mature movie recommender provides real-time recommendations to online users, which increases user stickiness to the website or application.
Recommendation systems are ubiquitous in today's society, and their recommendation algorithms are used in e-commerce, books, music, social networking, movies, and other fields. Early recommendation simply sorted goods by users' browsing data and recommended the top-ranked ones to every user. Taking movies as an example, a comedy that many users click on will be ranked at the top. For a user who likes suspense and dislikes comedies, recommending that comedy is inappropriate. If this approach were used for online real-time recommendation, many users could be lost because they would not be satisfied, and the product could not survive in the market. Many video and movie websites therefore pay great attention to the quality and efficiency of their recommendation algorithms, and some providers fund competitions to obtain better algorithms. For example, Netflix spent a large amount of money on a video-recommendation competition and applied the methods that performed well in it to the company's movie recommendation. Recommendation methods such as association rules, singular value decomposition (SVD), and collaborative filtering were established in this way; they brought new vitality to the field of movie recommendation systems and achieved very good recommendation results.
However, the traditional collaborative filtering algorithm cannot make recommendations in combination with context and cannot effectively capture changes in users' interests and hobbies, which reduces recommendation accuracy; moreover, under new-user cold start, the item to be predicted may have too few neighbors. A collaborative filtering algorithm for a neural network recommendation system is therefore provided.
Disclosure of Invention
The invention aims to provide a collaborative filtering algorithm for a neural network recommendation system that solves the following problems of the traditional collaborative filtering algorithm: it cannot make recommendations in combination with context, it cannot effectively capture changes in users' interests and hobbies, its recommendation accuracy is reduced as a result, and under new-user cold start the item to be predicted has too few neighbors.
The technical purpose of the invention is realized by the following technical scheme:
A collaborative filtering algorithm for a neural network recommendation system comprises the following steps:
S1, first, a convolutional neural network is used to build a model of the user's previous behaviors and the item attributes, so as to provide more accurate recommendations for the user;
the data set mainly contains two static objects, users and items. User attributes include user ID, gender, occupation, and age; item attributes include item ID, item type, and item name. The data are first encoded numerically, converting discrete variables into continuous-valued vectors. Categorical fields in the data, such as the item category attribute, are represented by strings, and the common way to handle them is to convert them into numerical values with one-hot encoding;
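As a minimal sketch of the encoding step, assuming toy user and item records, categorical attributes can be mapped to integer indexes and one-hot vectors as follows. The field values and the helper names `build_index`, `one_hot`, and `multi_hot` are hypothetical; the multi-hot variant for items with several genres is an added assumption, not something the text specifies.

```python
from typing import Dict, List, Sequence

def build_index(values: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct category value a stable integer index (sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Vector with a single 1 at the position of `value`."""
    vec = [0] * len(index)
    vec[index[value]] = 1
    return vec

def multi_hot(values: Sequence[str], index: Dict[str, int]) -> List[int]:
    """ASSUMPTION: an item with several genres gets a 1 for each of them."""
    vec = [0] * len(index)
    for v in values:
        vec[index[v]] = 1
    return vec

# Toy records (illustrative only, not from the patent's data set).
users = [{"id": 1, "gender": "F", "occupation": "engineer", "age": 25},
         {"id": 2, "gender": "M", "occupation": "student", "age": 18}]
items = [{"id": 10, "genres": ["Comedy", "Drama"]},
         {"id": 11, "genres": ["Thriller"]}]

gender_idx = build_index([u["gender"] for u in users])
genre_idx = build_index([g for it in items for g in it["genres"]])

print(one_hot("M", gender_idx))                  # [0, 1]
print(multi_hot(items[0]["genres"], genre_idx))  # [1, 1, 0]
```

The integer indexes produced by `build_index` are also what an embedding lookup would consume in the next step.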
S2, extracting features from the user and item attributes and fitting the ratings through fully connected layers;
first, the attribute values are converted into numbers, and the converted numbers are used as indexes into an embedding matrix with N rows. Each index is then assigned a fixed-size latent factor vector, usually of size 32, so the embedding matrix has dimension N × 32. Processing the data attributes in this way yields user feature vectors and item feature vectors, from which the rating values are fitted;
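The embedding-and-fitting idea of S2 can be illustrated in plain Python. This is a simplified sketch under stated assumptions, not the CNN of the patent: only the user ID and item ID are embedded (one hypothetical N × 32 table each), the two vectors are concatenated, and a single fully connected output unit is fitted to one rating by one squared-error gradient step. All names, table sizes, and the learning rate are hypothetical.

```python
import random
from typing import List, Tuple

EMBED_DIM = 32  # fixed latent-factor size per index, so each table is N x 32

def make_embedding(num_indices: int, dim: int = EMBED_DIM, seed: int = 0) -> List[List[float]]:
    """Return an N x dim embedding matrix; row k is the latent vector for index k."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_indices)]

# Hypothetical table sizes: 100 encoded user IDs and 50 encoded item IDs.
user_table = make_embedding(100, seed=1)
item_table = make_embedding(50, seed=2)

def features(user_id: int, item_id: int) -> List[float]:
    """Look up and concatenate the user and item feature vectors (length 2 * EMBED_DIM)."""
    return user_table[user_id] + item_table[item_id]

def predict(x: List[float], weights: List[float], bias: float) -> float:
    """A single fully connected output unit that fits the rating."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def sgd_step(x: List[float], rating: float, weights: List[float], bias: float,
             lr: float = 0.05) -> Tuple[List[float], float]:
    """One squared-error gradient step on the output layer; embeddings stay fixed here."""
    err = predict(x, weights, bias) - rating
    new_weights = [w - lr * 2 * err * xi for w, xi in zip(weights, x)]
    return new_weights, bias - lr * 2 * err

x = features(user_id=3, item_id=7)
w, b = [0.0] * len(x), 0.0
loss_before = (predict(x, w, b) - 4.0) ** 2
w, b = sgd_step(x, 4.0, w, b)
loss_after = (predict(x, w, b) - 4.0) ** 2
print(len(x), loss_after < loss_before)  # 64 True
```

In a full model the embeddings would be trained jointly with the dense layers; freezing them here just keeps the gradient step short.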
S3, introducing the idea of association rules, generating an item association matrix from the association relationships between items, and using the item association matrix to generate a candidate item set;
the association degree represents the strength of the relationship between items. The method defines it as the likelihood that a user who browses one item also browses another. It is denoted r, with $r_{ij}$ the association degree of item j to item i, defined as the proportion of users who browsed both item i and item j divided by the proportion of users who browsed item i:
$$r_{ij} = \frac{|N_i \cap N_j|}{|N_i|}$$
where $N_i$ and $N_j$ denote the sets of users who rated item i and item j, respectively (so $|N_i|$ is the number of users who rated item i). From these association degrees an n × n item association matrix G is built, where $r_{ij}$ (i, j = 1, 2, ..., n) is the association degree of item j to item i and all main-diagonal elements are 0. The association degree generally differs by direction, and its magnitude indicates the strength of the relationship between items, so the item association matrix is generally asymmetric, i.e., $r_{ij} \neq r_{ji}$:
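$$G = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$$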
The association degree is introduced into the selection of the candidate item set: the item association matrix replaces the similarity matrix when generating the candidate item set;
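The association degree can be computed directly from the set of raters of each item. A minimal sketch, assuming that the ratio described above reduces to $|N_i \cap N_j| / |N_i|$ and that rater sets are stored as Python sets keyed by hypothetical item IDs; it reproduces the two stated properties, a zero main diagonal and, in general, $r_{ij} \neq r_{ji}$.

```python
from typing import Dict, Set

def association_matrix(raters: Dict[str, Set[int]]) -> Dict[str, Dict[str, float]]:
    """Return r where r[i][j] = |N_i ∩ N_j| / |N_i|, the share of item i's raters
    who also rated item j. Diagonal entries are 0, and r is generally asymmetric."""
    items = sorted(raters)
    r: Dict[str, Dict[str, float]] = {i: {} for i in items}
    for i in items:
        for j in items:
            if i == j or not raters[i]:
                r[i][j] = 0.0  # main diagonal (and items with no raters) set to 0
            else:
                r[i][j] = len(raters[i] & raters[j]) / len(raters[i])
    return r

# Toy data: item -> set of user IDs who rated it (illustrative only).
raters = {"A": {1, 2, 3, 4}, "B": {1, 2}, "C": {3}}
r = association_matrix(raters)
print(r["A"]["B"], r["B"]["A"])  # 0.5 1.0 -> asymmetric
print(r["A"]["A"])               # 0.0
```

To generate candidates from G, the row `r[i]` of each item the user has rated can be sorted in descending order and its top entries taken, in place of a similarity ranking.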
S4, to address the poor accuracy of similarity under sparse data, a modified Pearson correlation coefficient is proposed. A user-rating matrix is established and optimized with a forgetting function; the rating similarity between the target user and every other user is calculated; users are ranked from high to low similarity, and the top-ranked users are selected as the target user's nearest neighbors. The target user's predicted ratings for unrated items are calculated from the neighbors' ratings, and the N items with the highest predicted values are recommended to the user;
when selecting co-rating data, the number of users who rated both items is also considered: the larger the share of co-rating users among the users who rated the two items, the more similar the two items are. For example, suppose item $i_1$ is rated by 10 users, item $i_2$ by 12 users, and item $i_3$ by 50 users. If $i_1$ and $i_2$, and $i_1$ and $i_3$, each have 8 co-rating users, then $i_1$ and $i_2$ are considered more similar. Likewise, $i_1$ and $i_2$ with 8 co-rating users are more similar than they would be with only 5. To capture this, a modified Pearson correlation coefficient is proposed, with the formula:
[Equation image not reproduced: modified Pearson correlation coefficient]
where $U_i$, $U_j$, and $U$ denote the sets of users who rated item i, item j, and both item i and item j, respectively; $|U_i|$, $|U_j|$, and $|U|$ are the numbers of elements in those sets; $r_{ui}$ and $r_{uj}$ are user u's ratings of item i and item j; and $\bar{r}_i$ and $\bar{r}_j$ are the average ratings of item i and item j, respectively;
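Because the modified coefficient itself appears only as an image in the source, the following is a hedged sketch rather than the patent's formula. It assumes the correction multiplies an ordinary Pearson coefficient over the co-rating users U by the overlap weight $|U| / \sqrt{|U_i| \cdot |U_j|}$, one simple weight that reproduces the ordering in the example above. Taking the item means over all raters of each item is also an assumption.

```python
import math
from typing import Dict

def overlap_weight(n_common: int, n_i: int, n_j: int) -> float:
    """ASSUMED correction term |U| / sqrt(|U_i| * |U_j|); grows with the share of co-raters."""
    return n_common / math.sqrt(n_i * n_j) if n_i and n_j else 0.0

def modified_pearson(ri: Dict[int, float], rj: Dict[int, float]) -> float:
    """ri, rj map user ID -> rating for items i and j."""
    common = ri.keys() & rj.keys()  # U: users who rated both items
    if len(common) < 2:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)  # average rating of item i (all its raters)
    mean_j = sum(rj.values()) / len(rj)  # average rating of item j (all its raters)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den_i = math.sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
    den_j = math.sqrt(sum((rj[u] - mean_j) ** 2 for u in common))
    if den_i == 0 or den_j == 0:
        return 0.0
    return overlap_weight(len(common), len(ri), len(rj)) * num / (den_i * den_j)

# The example from the text: 8 co-raters out of 10 and 12 beats 8 out of 10 and 50,
# and 8 co-raters beat 5 for the same pair of items.
print(overlap_weight(8, 10, 12) > overlap_weight(8, 10, 50))  # True
print(overlap_weight(8, 10, 12) > overlap_weight(5, 10, 12))  # True
```

Any weight that increases with $|U|$ and decreases with $|U_i|$ and $|U_j|$ would satisfy the stated example; the geometric-mean form is chosen only for simplicity.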
the algorithm is as follows:
(1) Environment setting
Let user u be a new user of the current system, so u's historical behavior information there is empty; however, u already has historical behavior information in other systems (such as social networking, e-commerce, or video websites), available either through authorization or because the current system has cross-site cookies;
(2) Construction of a user relationship network
A relationship network among users is constructed from their historical information in other systems, and the users are partitioned into communities. When the network is built from users' purchase or rating information, the following similarity is used:
$$\mathrm{sim}(u,v) = \frac{\mathrm{covar}(r_u, r_v)}{\sigma_{r_u}\,\sigma_{r_v}}$$
where $r_u$ and $r_v$ are the rating vectors of users u and v, $\mathrm{covar}(r_u, r_v)$ is the covariance of $r_u$ and $r_v$, and $\sigma_{r_u}$ and $\sigma_{r_v}$ are the standard deviations of the non-zero elements of $r_u$ and $r_v$, respectively [20];
Note that a similarity of 0 means no edge is added between the two users; moreover, the two vectors must share at least three overlapping elements, otherwise the similarity between the two users is taken to be 0;
if a user has friends on a social networking site, i.e., has a circle of friends, these friend relationships can be used directly for community partitioning;
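A minimal sketch of the network-construction rule above, assuming dense rating vectors over a shared item list with 0 meaning "not rated". The text does not say over which elements the covariance is taken; here it is taken over the co-rated positions using each user's mean of non-zero ratings, which is an assumption. Users with fewer than three co-rated items get similarity 0 and no edge, as specified; negative similarities still create edges, since the text only excludes a similarity of 0. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

MIN_OVERLAP = 3  # fewer co-rated items than this -> similarity is taken to be 0

def nonzero_stats(vec: List[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of the non-zero elements of vec."""
    vals = [x for x in vec if x != 0]
    mean = sum(vals) / len(vals)
    return mean, math.sqrt(sum((x - mean) ** 2 for x in vals) / len(vals))

def user_similarity(ru: List[float], rv: List[float]) -> float:
    """covar(r_u, r_v) / (sigma_u * sigma_v); covariance over co-rated positions (assumption)."""
    common = [k for k in range(len(ru)) if ru[k] != 0 and rv[k] != 0]
    if len(common) < MIN_OVERLAP:
        return 0.0
    mu, su = nonzero_stats(ru)
    mv, sv = nonzero_stats(rv)
    if su == 0 or sv == 0:
        return 0.0
    cov = sum((ru[k] - mu) * (rv[k] - mv) for k in common) / len(common)
    return cov / (su * sv)

def build_edges(ratings: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Weighted edges of the user relationship network; similarity 0 means no edge."""
    users = sorted(ratings)
    edges: Dict[Tuple[str, str], float] = {}
    for a, u in enumerate(users):
        for v in users[a + 1:]:
            s = user_similarity(ratings[u], ratings[v])
            if s != 0:
                edges[(u, v)] = s
    return edges

ratings = {"u": [5, 3, 0, 4, 1], "v": [4, 2, 0, 5, 1], "w": [0, 0, 3, 0, 0]}
print(sorted(build_edges(ratings)))  # [('u', 'v')]
```

User w shares no rated items with u or v, so it stays isolated in the network; friend links, where available, could be added to the same edge dictionary.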
(3) Community partitioning
Community partitioning uses principal modularity maximization (PMM). Its basic principle is to maximize the modularity Q, which measures the strength of a community partition of a real network as the difference between the connections within communities and those expected in a random network with the same degree distribution. Q is defined as:
$$Q = \frac{1}{2m}\,\mathrm{Tr}\left(S^{\mathsf{T}} B S\right)$$
where
$$B = A - \frac{d\,d^{\mathsf{T}}}{2m}$$
S is the membership matrix between nodes u and communities C: $S_{uC} = 1$ if node u belongs to community C and 0 otherwise. B is the modularity matrix, A is the adjacency matrix of the node relationships, d is the degree vector of all nodes, and m is the total number of edges in the node relationship network. Q < 0 indicates a very poor partition;
because the relationships among users are multi-dimensional (they include users' ratings of items, friendships between users, and so on), PMM partitions the communities according to these multi-dimensional relationships, and each element of the resulting partition S indicates how strongly user u belongs to community C;
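The modularity Q defined above can be evaluated for a given hard partition with a short routine. This sketch only scores a partition; it does not implement PMM itself, which per the text finds the partition from multi-dimensional user relations and can yield graded membership values. The helper names and the toy graph are hypothetical.

```python
from typing import List, Tuple

def modularity(A: List[List[float]], labels: List[int]) -> float:
    """Q = Tr(S^T B S) / (2m), with B = A - d d^T / (2m) and S the 0/1 membership
    matrix. For a hard partition, Tr(S^T B S) is the sum of B[u][v] over all
    ordered pairs (u, v) assigned to the same community."""
    n = len(A)
    d = [sum(row) for row in A]  # degree vector
    two_m = sum(d)               # 2m for an undirected graph
    total = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:
                total += A[u][v] - d[u] * d[v] / two_m
    return total / two_m

def undirected(n: int, edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Symmetric 0/1 adjacency matrix from an edge list."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    return A

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = undirected(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
split = [0, 0, 0, 1, 1, 1]
print(round(modularity(A, split), 3))                # 0.357 (= 5/14)
print(modularity(A, split) > modularity(A, [0] * 6))  # True
```

Putting every node in one community always gives Q = 0, so a useful partition must score above that baseline.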
(4) Modified neighbor selection strategy
Instead of selecting the k users with the highest similarity from all users, the method selects neighbors only from users in the same community as target user u, and predicts u's rating of item j as:
[Equation image not reproduced: predicted rating of user u for item j]
where $\bar{r}_u$ is the average rating of user u, a is a constant, C is the community in which user v is located, $r_{v,j}$ is user v's rating of item j, and $\bar{r}_v$ is the average rating of user v.
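Because the prediction formula itself is available only as an image, the following sketch substitutes the standard mean-centred k-nearest-neighbour formula and applies the one change the text does state: neighbours are drawn only from the target user's community. The constant a is not modelled, the similarity function is passed in, and the cold-start fallback (averaging the neighbours' ratings of j when u has no ratings yet) is an added assumption. All names and data are hypothetical.

```python
from typing import Callable, Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def mean_rating(ratings: Ratings, x: str) -> float:
    vals = ratings.get(x, {})
    return sum(vals.values()) / len(vals) if vals else 0.0

def predict(u: str, j: str, ratings: Ratings, community: Dict[str, int],
            sim: Callable[[str, str], float], k: int = 20) -> float:
    """Community-restricted, mean-centred k-NN rating prediction (hypothetical sketch)."""
    pool = [v for v in ratings
            if v != u and community.get(v) == community.get(u) and j in ratings[v]]
    neighbours = sorted(pool, key=lambda v: sim(u, v), reverse=True)[:k]
    if not neighbours:
        return mean_rating(ratings, u)
    if not ratings.get(u):
        # Cold start (assumption): u has no ratings here, so average the neighbours' ratings of j.
        return sum(ratings[v][j] for v in neighbours) / len(neighbours)
    num = sum(sim(u, v) * (ratings[v][j] - mean_rating(ratings, v)) for v in neighbours)
    den = sum(abs(sim(u, v)) for v in neighbours)
    return mean_rating(ratings, u) + (num / den if den else 0.0)

ratings = {"u": {"a": 4.0, "b": 2.0}, "v": {"a": 5.0, "c": 4.0},
           "w": {"b": 2.0, "c": 2.0}, "x": {"c": 3.0}}
community = {"u": 0, "v": 0, "w": 1, "x": 0, "new": 0}

def constant_sim(a: str, b: str) -> float:
    return 0.8  # placeholder similarity for the toy example

print(predict("u", "c", ratings, community, constant_sim))    # 2.75 (w is excluded)
print(predict("new", "c", ratings, community, constant_sim))  # 3.5
```

User w rated item c but sits in another community, so it never enters the neighbour pool even if it were highly similar to u.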
In conclusion, the invention has the following beneficial effects:
the algorithm was verified by simulation experiments on the public movie data set MovieLens. A series of comparative experiments shows that the movie recommendation algorithm based on the convolutional neural network and the recurrent neural network studied here achieves good results on the corresponding evaluation metrics.
Drawings
FIG. 1 is a comparison of user community partition based recommendations with conventional collaborative filtering recommendations in the present invention;
FIG. 2 is a comparison of UBCF-IBCF prediction accuracy in the present invention;
FIG. 3 is a comparison of UBCF-IBCF recommendation accuracy in the present invention;
FIG. 4 shows the size of the candidate item set in the present invention;
FIG. 5 shows the proportion of items of interest to the user in the present invention;
FIG. 6 is a partial user-rating table in the present invention;
FIG. 7 is a partial user information table in the present invention;
FIG. 8 is an item table in the present invention;
FIG. 9 is a comparison of the conventional and improved Pearson correlation coefficients in the present invention;
FIG. 10 is a comparison of vector cosine similarity and modified vector cosine similarity in the present invention;
FIG. 11 is a comparison of the conventional and improved adjusted cosine similarity in the present invention;
FIG. 12 is a comparison of the conventional and modified Pearson correlation in the present invention;
FIG. 13 is a convolutional neural network model framework in the present invention.
Detailed Description
This embodiment provides a collaborative filtering algorithm for a neural network recommendation system, which comprises the following steps:
S1, as shown in FIG. 13, a convolutional neural network is first used to build a model of the user's previous behaviors and the item attributes, so as to provide more accurate recommendations for the user;
the data set mainly contains two static objects, users and items. User attributes include user ID, gender, occupation, and age; item attributes include item ID, item type, and item name. The movie year is a new feature attribute extracted from the movie name, and the items may also be books, music, and the like. The data are first encoded numerically, converting discrete variables into continuous-valued vectors. Categorical fields in the data, such as the item category attribute, are represented by strings, and the common way to handle them is to convert them into numerical values with one-hot encoding;
S2, extracting features from the user and item attributes and fitting the ratings through fully connected layers;
first, the attribute values are converted into numbers, and the converted numbers are used as indexes into an embedding matrix with N rows. Each index is then assigned a fixed-size latent factor vector, usually of size 32, so the embedding matrix has dimension N × 32. Processing the data attributes in this way yields user feature vectors and item feature vectors, from which the rating values are fitted;
S3, introducing the idea of association rules, generating an item association matrix from the association relationships between items, and using the item association matrix to generate a candidate item set;
the association degree represents the strength of the relationship between items. The method defines it as the likelihood that a user who browses one item also browses another. It is denoted r, with $r_{ij}$ the association degree of item j to item i, defined as the proportion of users who browsed both item i and item j divided by the proportion of users who browsed item i:
$$r_{ij} = \frac{|N_i \cap N_j|}{|N_i|}$$
where $N_i$ and $N_j$ denote the sets of users who rated item i and item j, respectively (so $|N_i|$ is the number of users who rated item i). From these association degrees an n × n item association matrix G is built, where $r_{ij}$ (i, j = 1, 2, ..., n) is the association degree of item j to item i and all main-diagonal elements are 0. The association degree generally differs by direction, and its magnitude indicates the strength of the relationship between items, so the item association matrix is generally asymmetric, i.e., $r_{ij} \neq r_{ji}$:
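$$G = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$$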
The association degree is introduced into the selection of the candidate item set: the item association matrix replaces the similarity matrix when generating the candidate item set;
S4, to address the poor accuracy of similarity under sparse data, a modified Pearson correlation coefficient is proposed. A user-rating matrix is established and optimized with a forgetting function; the rating similarity between the target user and every other user is calculated; users are ranked from high to low similarity, and the top-ranked users are selected as the target user's nearest neighbors. The target user's predicted ratings for unrated items are calculated from the neighbors' ratings, and the N items with the highest predicted values are recommended to the user;
when selecting co-rating data, the number of users who rated both items is also considered: the larger the share of co-rating users among the users who rated the two items, the more similar the two items are. For example, suppose item $i_1$ is rated by 10 users, item $i_2$ by 12 users, and item $i_3$ by 50 users. If $i_1$ and $i_2$, and $i_1$ and $i_3$, each have 8 co-rating users, then $i_1$ and $i_2$ are considered more similar. Likewise, $i_1$ and $i_2$ with 8 co-rating users are more similar than they would be with only 5. To capture this, a modified Pearson correlation coefficient is proposed, with the formula:
[Equation image not reproduced: modified Pearson correlation coefficient]
where $U_i$, $U_j$, and $U$ denote the sets of users who rated item i, item j, and both item i and item j, respectively; $|U_i|$, $|U_j|$, and $|U|$ are the numbers of elements in those sets; $r_{ui}$ and $r_{uj}$ are user u's ratings of item i and item j; and $\bar{r}_i$ and $\bar{r}_j$ are the average ratings of item i and item j, respectively;
the algorithm is as follows:
(1) Environment setting
Let user u be a new user of the current system, so u's historical behavior information there is empty; however, u already has historical behavior information in other systems (such as social networking, e-commerce, or video websites), available either through authorization or because the current system has cross-site cookies;
(2) Construction of a user relationship network
A relationship network among users is constructed from their historical information in other systems, and the users are partitioned into communities. When the network is built from users' purchase or rating information, the following similarity is used:
$$\mathrm{sim}(u,v) = \frac{\mathrm{covar}(r_u, r_v)}{\sigma_{r_u}\,\sigma_{r_v}}$$
where $r_u$ and $r_v$ are the rating vectors of users u and v, $\mathrm{covar}(r_u, r_v)$ is the covariance of $r_u$ and $r_v$, and $\sigma_{r_u}$ and $\sigma_{r_v}$ are the standard deviations of the non-zero elements of $r_u$ and $r_v$, respectively [20];
Note that a similarity of 0 means no edge is added between the two users; moreover, the two vectors must share at least three overlapping elements, otherwise the similarity between the two users is taken to be 0;
if a user has friends on a social networking site, i.e., has a circle of friends, these friend relationships can be used directly for community partitioning;
(3) Community partitioning
Community partitioning uses principal modularity maximization (PMM). Its basic principle is to maximize the modularity Q, which measures the strength of a community partition of a real network as the difference between the connections within communities and those expected in a random network with the same degree distribution. Q is defined as:
$$Q = \frac{1}{2m}\,\mathrm{Tr}\left(S^{\mathsf{T}} B S\right)$$
where
$$B = A - \frac{d\,d^{\mathsf{T}}}{2m}$$
S is the membership matrix between nodes u and communities C: $S_{uC} = 1$ if node u belongs to community C and 0 otherwise. B is the modularity matrix, A is the adjacency matrix of the node relationships, d is the degree vector of all nodes, and m is the total number of edges in the node relationship network. Q < 0 indicates a very poor partition;
because the relationships among users are multi-dimensional (they include users' ratings of items, friendships between users, and so on), PMM partitions the communities according to these multi-dimensional relationships, and each element of the resulting partition S indicates how strongly user u belongs to community C;
(4) Modified neighbor selection strategy
Instead of selecting the k users with the highest similarity from all users, the method selects neighbors only from users in the same community as target user u, and predicts u's rating of item j as:
[Equation image not reproduced: predicted rating of user u for item j]
where $\bar{r}_u$ is the average rating of user u, a is a constant, C is the community in which user v is located, $r_{v,j}$ is user v's rating of item j, and $\bar{r}_v$ is the average rating of user v.
Simulation experiment:
The data set for this experiment comes from Imhonet, a multi-domain social website. The data used include users' connections on the site and their ratings of books and movies, and every user has at least one connection. About 48,000 users gave 900,000 ratings to 50,000 movies, and about 13,000 users gave 1.2 million ratings to 14,000 books. Each rating record contains a user ID, an item ID, and the rating value (0 to 10, where 0 means not rated). From this data set, users with at least 20 ratings for both books and movies were selected, producing an experimental data set containing 2,363,394 ratings by 6,089 users of 12,621 movies and 1,138,401 ratings by these users of 17,907 books.
20% of the users in the data set were taken as the test set and the remaining 80% as the training set. The book ratings of all users in the test set were removed to simulate the cold-start problem, so that every test user is treated as a new user of the book domain whose book ratings are to be predicted.
The experiment uses a comparison method. The baseline is the traditional collaborative filtering algorithm, which selects N nearest neighbors for prediction and computes user similarity with the Pearson correlation coefficient. The comparison is made by varying the number of nearest neighbors from 10 to 100 in steps of 10. The evaluation criterion is MAE, which measures prediction accuracy as the deviation between predicted and actual user ratings; the smaller the MAE, the higher the quality of the recommendation algorithm.
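The evaluation protocol above (an 80/20 user split, hiding the test users' book ratings, and scoring predictions by MAE) can be sketched as follows. The function names, the random seed, and the toy data are assumptions for illustration; how predictions are produced is left to whichever method is being compared.

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {book: rating}

def cold_start_split(book_ratings: Ratings, test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[Ratings, Ratings]:
    """Split users 80/20; the test users' book ratings are removed from training
    and kept only as ground truth, so they act as new users in the book domain."""
    users = sorted(book_ratings)
    random.Random(seed).shuffle(users)
    n_test = round(len(users) * test_frac)
    test_users = set(users[:n_test])
    train = {u: r for u, r in book_ratings.items() if u not in test_users}
    hidden = {u: book_ratings[u] for u in test_users}
    return train, hidden

def mae(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs; smaller is better."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

books = {f"user{k}": {"book1": float(k % 10)} for k in range(10)}
train, hidden = cold_start_split(books)
print(len(train), len(hidden))                    # 8 2
print(mae([(3.5, 4.0), (2.0, 1.0), (4.0, 4.0)]))  # 0.5
```

Splitting by user rather than by rating is what makes every test user a genuine cold-start case for the book domain.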
The experimental results are shown in FIG. 1: the line with square markers represents collaborative filtering rating prediction based on the conventional Pearson correlation, and the line with rectangular markers represents rating prediction based on user community partitioning.
Conclusion of the experiment
The experimental results show that rating prediction based on user community partitioning outperforms traditional collaborative filtering. In particular, once the number of nearest neighbors exceeds 40, the recommendation quality of traditional collaborative filtering levels off, while community-based rating prediction continues to improve. The cold-start problem was simulated by deleting users' book ratings from the data set, and the experiments show that as the number of neighbors increases, the community-based nearest-neighbor prediction remains more accurate than the traditional nearest-neighbor prediction; that is, the algorithm can overcome the cold-start problem and improve recommendation quality.
FIG. 2 and FIG. 3 compare the item-based collaborative filtering algorithm (IBCF) and the user-based collaborative filtering algorithm (UBCF) on the MovieLens data set in terms of MAE and Precision, respectively. The data set was split 80%/20% into training and test sets, 5-fold cross-validation was used, and cosine similarity was used to compute the similarity between users and between items. In the MAE comparison, the number of neighbors knear increased from 5 to 50 in steps of 5; in the Precision comparison, the number of recommendations topN increased from 2 to 20 in steps of 2. When generating the candidate item set, the 5 most similar items or users were taken for each rated item or user (candidate neighbor number 5); the user-based algorithm used 30 neighbors and the item-based algorithm used 15 (the values differ because each algorithm reaches its stable best accuracy at these values). The two figures show that with quite sparse data, the item-based algorithm has better prediction accuracy (MAE) than the user-based algorithm and does alleviate the effect of data sparsity to some extent, but its recommendation precision (Precision) is inferior to that of the user-based algorithm. In fact, item-based collaborative filtering outperforms user-based collaborative filtering in most cases, not only when the data are sparse.
In addition, at these settings the average per-user candidate item sets of the two algorithms differ in size: the user-based algorithm produces a larger one. Its size can be reduced by reducing the number of candidate neighbors, and when the average candidate item set sizes of the two algorithms are made almost the same (UBCF with 2 candidate neighbors and IBCF with 5), the recommendation precision of the user-based algorithm is far better than that of the item-based algorithm.
This section analyzes the core flow of the traditional item-based collaborative filtering algorithm: first, a candidate item set is generated; then the active user's ratings of the items in the candidate set are predicted. The size and accuracy of the candidate set affect the final recommendation accuracy. For user u, the item-based algorithm must process every rated item $i \in I_u$: it reads the set of the k nearest neighbors of each rated item, $N_i = \{i_1, i_2, \ldots, i_k\}$, takes the union of all $N_i$, and removes the items already rated in $I_u$ to obtain the candidate item set $C = \left(\bigcup_{i \in I_u} N_i\right) \setminus I_u$.
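A minimal sketch of the candidate-set step: take the union of the k nearest neighbors of every item the user has rated, then drop the items the user has already rated. The precomputed `neighbors` table (each item's neighbors pre-sorted by descending similarity) and the function name are hypothetical.

```python
def candidate_items(rated: set, neighbors: dict, k: int) -> set:
    """Candidate set C = (union of the k nearest neighbors of each rated item) - rated.

    `neighbors[i]` is assumed to list item i's neighbors sorted by
    descending similarity (hypothetical precomputed table).
    """
    candidates = set()
    for item in rated:
        candidates.update(neighbors.get(item, [])[:k])
    return candidates - rated


# Hypothetical neighbor lists, most similar first.
neighbors = {"a": ["b", "c", "d"], "b": ["a", "e", "c"]}
print(sorted(candidate_items({"a", "b"}, neighbors, k=2)))  # ['c', 'e']
```

Because every rated item can contribute up to k new candidates, the set grows quickly with k, which is the effect measured in Fig. 4.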
Fig. 4 shows how the average size of the per-user candidate item set produced by the item-based collaborative filtering algorithm on the MovieLens dataset varies with the candidate-set neighbor number k; cosine similarity is used to compute item-item similarity in this experiment. The horizontal axis is the candidate-set neighbor number k, in steps of 2, and the vertical axis is the average per-user candidate item set size, that is, the number of items in the candidate set. The average candidate set already exceeds 50 items at k = 2, grows sharply as k increases, and exceeds 300 items at k = 20.
Fig. 5 shows, under the same experimental conditions, the proportion of the candidate set made up of items the user is interested in. The horizontal axis is the candidate-set neighbor number k and the vertical axis is the percentage of items of interest. The proportion peaks at about 12.2% at k = 2 and decreases steadily as k increases, falling to about 4.8% at k = 20. Clearly, items the user is interested in make up only a very small fraction of the candidate set produced by item-based collaborative filtering. Because item-based collaborative filtering ignores differences between users, items the user is not interested in easily enter the candidate set, which lowers recommendation precision. Moreover, the oversized candidate set increases the time needed to predict ratings for its items, which directly limits the scalability of the system.
The following two tables compare the MAE obtained with several similarity calculation methods on the MovieLens and Jester datasets, listing for each method the best, worst, and average MAE and the neighbor number at which the best value is reached. The first table shows the results on the MovieLens dataset, where the neighbor number knear increases from 5 to 50 in steps of 5; the second shows the results on the Jester dataset, where knear increases from 2 to 30 in steps of 2. The first table shows that cosine similarity performs very well in terms of both the best and the average value, whereas the Pearson correlation coefficient performs poorly on sparse data: items have few common raters, so the computed similarity is easily too large or too small. The second table shows that the Pearson correlation coefficient improves markedly and outperforms the other conventional methods, because with lower sparsity it obtains enough common raters. In addition, the proposed modified Pearson correlation coefficient performs well on both sparse and denser data, and on sparse data it is far better than the standard Pearson correlation coefficient. As sparsity decreases, the gap between the modified and the standard Pearson correlation coefficient narrows, and both perform very well.
The tables also show that on sparse data the Pearson correlation coefficient converges slowly, reaching its best value only at neighbor number k = 50, and that its convergence improves markedly as sparsity decreases. The modified Pearson correlation coefficient, by contrast, converges well on both sparse and non-sparse data, reaching its best values at k = 15 and k = 10, respectively.
Table 1 Comparison of similarity calculation methods (1)
Figure BDA0003895662740000161
Figure BDA0003895662740000171
Table 2 Comparison of similarity calculation methods (2)
Figure BDA0003895662740000172
This experiment uses a comparative method: the effectiveness of the proposed improved algorithm is verified by comparison with other recommendation algorithms.
The experimental data is the 100K dataset collected from the MovieLens website, which contains 100,000 ratings of 1682 movies by 943 users. Ratings take the values 1, 2, 3, 4, 5, and each user has rated at least 20 movies. The dataset contains m users $U = \{u_1, u_2, u_3, \ldots, u_m\}$ and n movies $M = \{m_1, m_2, m_3, \ldots, m_n\}$, so users' ratings of movies can be represented by an $m \times n$ matrix R, where $r_{ui}$ is the rating of user u for movie i, and $r_{ui} = 0$ if user u has not rated movie i.
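A minimal sketch of building the $m \times n$ rating matrix R from (user, movie, rating) triples, storing 0 for unrated pairs as described. The function names are illustrative, and the 1-based ids are assumed to follow the MovieLens 100K numbering.

```python
def rating_matrix(ratings, m: int, n: int) -> list:
    """Build the m x n matrix R with R[u][i] = r_ui and 0 for unrated pairs.

    `ratings` yields (user_id, movie_id, rating) triples with 1-based ids;
    user u and movie i are stored at row u-1, column i-1.
    """
    R = [[0] * n for _ in range(m)]
    for user, movie, rating in ratings:
        R[user - 1][movie - 1] = rating
    return R


def density(R) -> float:
    """Fraction of non-zero (rated) entries in R."""
    cells = sum(len(row) for row in R)
    return sum(1 for row in R for r in row if r != 0) / cells


R = rating_matrix([(1, 2, 5), (2, 1, 3), (2, 3, 4)], m=2, n=3)
print(R)  # [[0, 5, 0], [3, 0, 4]]
```

For MovieLens 100K the same ratio is 100,000 / (943 × 1682) ≈ 0.063, so roughly 94% of R is empty, which is the sparsity the experiments refer to.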
First, the similarity between users is computed with the Pearson correlation, vector cosine, and modified vector cosine; then it is computed with the improved algorithm proposed here. Similarities are computed from the training set, and predicted ratings are evaluated on the prediction (test) set. Two experiments are performed. In the first, the top-N most similar users are selected as the target user's neighbors, and different numbers of neighbors are tried to verify the feasibility and effectiveness of the proposed algorithm. In the second, the similarity threshold is raised step by step from 0.1 to 0.9 in increments of 0.1, and the prediction accuracy is measured at each value.
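The two neighbor-selection strategies can be sketched as follows: a fixed number of most similar users (first experiment) versus every user whose similarity exceeds a threshold (second experiment). The similarity values and function names are hypothetical.

```python
def top_k_neighbors(sims: dict, k: int) -> list:
    """The k users most similar to the target (Top-N neighbor selection)."""
    return sorted(sims, key=sims.get, reverse=True)[:k]


def threshold_neighbors(sims: dict, threshold: float) -> list:
    """All users whose similarity to the target is above `threshold`."""
    return [v for v, s in sims.items() if s > threshold]


# Hypothetical similarities of other users to the target user.
sims = {"v1": 0.92, "v2": 0.35, "v3": 0.61, "v4": 0.08}
thresholds = [round(0.1 * step, 1) for step in range(1, 10)]  # 0.1 ... 0.9
for t in thresholds:
    print(t, threshold_neighbors(sims, t))
```

Raising the threshold shrinks the neighbor set, trading coverage for neighbors that are more alike.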
Recommendation system performance evaluation criteria
The mean absolute error (MAE) measures the accuracy of predictions in a recommendation system. Let the set of predicted user ratings be $\{t_1, t_2, \ldots, t_n\}$ and the corresponding set of actual ratings be $\{p_1, p_2, \ldots, p_n\}$; then the MAE is:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert t_i - p_i \rvert$$
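The MAE definition translates directly into code. This is a minimal sketch with a hypothetical function name and toy values.

```python
def mae(predicted, actual) -> float:
    """Mean absolute error: (1/n) * sum(|t_i - p_i|)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(predicted, actual)) / len(predicted)


print(mae([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
```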
Recall and precision are two metrics widely used in statistical classification and information retrieval; applied to recommendation systems, they evaluate the quality of the results. Recall is the probability that the recommendation list generated by the system contains the resources the user rated, and it reflects how completely the user's interests are covered. Let $L_i$ be the recommendation list the system generates for user $u_i$ and $T_i$ the set of resources in the test set that user $u_i$ is interested in; recall is then computed as:
Figure BDA0003895662740000191
Precision, as the name suggests, is the proportion of the generated recommendation list made up of resources the user is actually interested in, that is, how accurately user interest is predicted:
Figure BDA0003895662740000192
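The recall and precision formulas survive only as images here, so the following is a hedged sketch rather than a reproduction: it assumes the common convention of summing hits over all users, so precision is $\sum_i |L_i \cap T_i| / \sum_i |L_i|$ and recall is $\sum_i |L_i \cap T_i| / \sum_i |T_i|$. The function name and toy data are hypothetical.

```python
def precision_recall(rec_lists: dict, test_sets: dict):
    """Aggregate precision and recall over all users (assumed convention).

    rec_lists: user -> recommended items (L_i); test_sets: user -> items the
    user is interested in within the test set (T_i).
    """
    hits = n_rec = n_test = 0
    for user, recs in rec_lists.items():
        relevant = test_sets.get(user, set())
        hits += len(set(recs) & relevant)
        n_rec += len(recs)
        n_test += len(relevant)
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_test if n_test else 0.0
    return precision, recall


recs = {"u1": ["a", "b", "c"], "u2": ["d", "e"]}
test_items = {"u1": {"a", "x"}, "u2": {"d", "e", "f", "g"}}
print(precision_recall(recs, test_items))  # (0.6, 0.5)
```

Averaging per-user ratios instead would weight every user equally regardless of list length; which variant the original uses cannot be recovered from the text.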
Experimental data preparation and design
Data preparation: as shown in Figs. 6 and 7, the experimental data is the 100K dataset collected from the MovieLens website, containing 100,000 ratings of 1682 movies by 943 users. Ratings have five levels (1, 2, 3, 4, 5), and each user has rated at least 20 movies. The experiment uses 80% of the dataset as the training set (base set) and the remaining 20% as the prediction set (test set).
As shown in Fig. 8, the user rating table has four fields: user ID (userid), item ID (itemid), rating (rating), and timestamp (timestamp). Timestamps are Unix time, counted in seconds from 1 January 1970 UTC.
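A minimal sketch of loading ratings in this four-field layout (tab-separated, as in the MovieLens 100K `u.data` file) and making the 80%/20% split. The split here is random over individual ratings, which is an assumption since the text does not say whether it is stratified by user; the seed and function names are hypothetical, and the two sample lines are merely illustrative of the format.

```python
import random
from datetime import datetime, timezone


def parse_ratings(lines):
    """Parse tab-separated 'userid itemid rating timestamp' lines."""
    rows = []
    for line in lines:
        user, item, rating, ts = line.strip().split("\t")
        rows.append((int(user), int(item), int(rating), int(ts)))
    return rows


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Random split of individual ratings into a base set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = parse_ratings(["196\t242\t3\t881250949\n", "186\t302\t3\t891717742\n"])
# Unix seconds -> UTC datetime
print(datetime.fromtimestamp(rows[0][3], tz=timezone.utc))  # 1997-12-04 15:55:49+00:00
```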
Design of experiments
Two groups of experiments are performed. In the first group, the Pearson, vector cosine, and modified vector cosine similarity methods are each compared with their forgetting-curve-based counterparts; the effectiveness of the proposed algorithm is verified by varying the number of neighbors, and the results are measured by MAE. In the second group, the Pearson similarity and the forgetting-curve-based similarity are compared while the similarity threshold is varied. As described above, prediction accuracy is measured by the deviation between predicted and actual ratings, and a lower MAE indicates better recommendation quality.
Experiment one
In this group, the methods are compared while the number of nearest neighbors increases from 10 to 100 in steps of 10. Fig. 9 shows the results of the conventional Pearson similarity and the improved similarity algorithm (the solid red line is the conventional algorithm and the dashed blue line the improved algorithm; the same applies below). Fig. 9 shows that the two algorithms have roughly equal MAE at 30 neighbors, but as the number of neighbors increases toward 100 the improved algorithm has the lower MAE, indicating that it performs better.
Fig. 10 shows the results of the conventional vector cosine and the forgetting-curve-based vector cosine similarity algorithm. As the number of neighbors increases, the improved algorithm's MAE stays below that of the conventional vector cosine, so its predictions are more accurate.
Fig. 11 shows the results of the conventional modified vector cosine and the improved forgetting-curve-based similarity algorithm. As the number of neighbors increases from 10 to 100, the improved algorithm's MAE is always below that of the conventional modified vector cosine, which shows that when similarity is computed with the modified vector cosine, the forgetting-curve-based algorithm makes predictions more accurate.
All three experiments confirm the effectiveness of the proposed algorithm: when the similarity between users is computed, the change in users' ratings of resources caused by natural forgetting should be taken into account, because user interest changes over time. Simulating the human forgetting process improves prediction accuracy.
Experiment two
This experiment verifies the effectiveness of the proposed algorithm by varying the similarity threshold, using the Pearson similarity and the forgetting-curve-based similarity. As before, prediction accuracy is measured by the deviation between predicted and actual ratings, and a lower MAE indicates better recommendation quality. The threshold increases from 0.1 to 0.9 in steps of 0.1; at each threshold, only users whose similarity exceeds it are used as neighbors, and a separate experiment is run. As shown in Fig. 12, when the threshold is greater than 0.4 the MAE of the conventional Pearson correlation coefficient is clearly higher than that of the improved forgetting-curve-based algorithm, whereas below 0.4 the conventional method is better. That is, above a threshold of 0.4 the improved algorithm predicts somewhat more accurately.
In summary, the results of the two experiments show that the forgetting-curve-based similarity method generally outperforms the conventional algorithms. Decaying users' ratings in a recommendation system according to the forgetting law described by the Ebbinghaus forgetting curve can noticeably improve prediction accuracy. This also suggests that the laws of human cognition can play a significant role in recommendation systems.
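The source does not give the forgetting function used to decay ratings. As a hedged sketch only, the code below assumes the exponential retention form $R = e^{-t/S}$ that is often used to approximate the Ebbinghaus curve and multiplies each rating by it. The functional form, the memory-strength parameter S (`strength_days`, set to a hypothetical 180 days), and all names are assumptions, not the authors' specification.

```python
import math

SECONDS_PER_DAY = 86_400


def retention(elapsed_days: float, strength_days: float) -> float:
    """Exponential retention R = exp(-t / S), an assumed approximation of the
    Ebbinghaus forgetting curve (the source gives no formula)."""
    return math.exp(-elapsed_days / strength_days)


def decayed_ratings(ratings, now_ts: int, strength_days: float = 180.0) -> dict:
    """Scale each (item, rating, unix_timestamp) by its retention weight.

    strength_days=180 is a hypothetical value chosen only for illustration.
    """
    decayed = {}
    for item, rating, ts in ratings:
        elapsed = max(0.0, (now_ts - ts) / SECONDS_PER_DAY)
        decayed[item] = rating * retention(elapsed, strength_days)
    return decayed


now = 891_717_742
history = [(242, 3, now), (302, 5, now - 180 * SECONDS_PER_DAY)]
print(decayed_ratings(history, now))
```

The decayed ratings could then feed any of the similarity measures compared above (Pearson, vector cosine, modified vector cosine), so that older ratings weigh less.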

Claims (1)

1. A collaborative filtering algorithm of a neural network recommendation system, characterized by comprising the following steps:
S1, first, a convolutional neural network is used to build a model from the user's previous behaviors and the attributes of items, so as to provide more accurate recommendations to the user;
the dataset mainly contains two static objects, users and items: user attributes include user ID, gender, occupation, and age, and item attributes include item ID, item type, and item name; first, the data are encoded numerically, converting discrete variables into continuous-valued vectors; category fields in the data, such as the item category attribute, are represented as strings, and the usual treatment is to convert them into numerical values with one-hot encoding;
S2, features are extracted from the users and the item attributes, and the ratings are fitted through fully connected layers;
first, attribute values are converted into numbers, and each number is used as an index into an embedding matrix with N rows; a latent factor vector of fixed size, usually 32, is assigned to each index, so the embedding matrix has dimensions (N, 32); after the data attributes are processed in this way, the user feature vector and the item feature vector are obtained and used to fit the rating value;
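Steps S1 and S2 describe a convolutional network with fully connected layers. The sketch below covers only the encoding front end: a categorical value is mapped to an integer index, which yields either a one-hot vector or a row of an (N, 32) embedding table, and a plain dot product of a user vector and an item vector stands in for the fully connected fitting layer. All names, the Gaussian initialization, and the dot-product score are illustrative assumptions; in a trained model the embedding rows would be learned.

```python
import random


def build_index(values) -> dict:
    """Map each distinct attribute value to an integer index 0..N-1."""
    return {v: idx for idx, v in enumerate(sorted(set(values)))}


def one_hot(value, index: dict) -> list:
    """One-hot vector for a categorical value."""
    vec = [0] * len(index)
    vec[index[value]] = 1
    return vec


def init_embedding(num_rows: int, dim: int = 32, seed: int = 0) -> list:
    """N x dim table of small random latent factors (untrained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


genders = build_index(["M", "F", "F", "M"])  # {'F': 0, 'M': 1}
gender_table = init_embedding(len(genders))   # shape (N, 32) = (2, 32)
user_vec = gender_table[genders["F"]]         # embedding lookup by index

genres = build_index(["Comedy", "Drama"])
genre_table = init_embedding(len(genres), seed=1)
item_vec = genre_table[genres["Drama"]]

score = dot(user_vec, item_vec)  # placeholder for the fully connected fit
```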
S3, the idea of association rules is introduced: an item association matrix is generated from the association relations between items, and the candidate item set is generated using the item association matrix;
the association degree represents the strength of the relationship between items and is defined as the likelihood that a user who browses one item also browses another; it is denoted r, where $r_{ij}$ is the association degree of item j with respect to item i, defined as the ratio of the proportion of users who browsed both item i and item j to the proportion of users who browsed item i:
Figure FDA0003895662730000021
where $N_i$ and $N_j$ denote the numbers of users who rated item i and item j, respectively; an item association matrix $G = (r_{ij})_{n \times n}$, $i, j = 1, 2, \ldots, n$, is built from the item association degrees, and all of its main diagonal elements are 0; the association degrees between items generally differ, a larger value indicating a stronger relationship, and the matrix is generally asymmetric, i.e., $r_{ij} \neq r_{ji}$;
Figure FDA0003895662730000022
the association degree is introduced into the selection of the candidate item set: the item association matrix replaces the similarity matrix when the candidate item set is generated;
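The association-degree formula survives only as an image. Reading the verbal definition literally, the ratio of "co-browsing proportion" to "item i's browsing proportion" reduces to (users who rated both i and j) / (users who rated i), because the total user count cancels; the sketch below assumes that reading. It builds the asymmetric matrix with a zero diagonal and then generates candidates from association degrees instead of similarities. Function names and toy data are hypothetical.

```python
def association_matrix(item_users: dict) -> dict:
    """r[i][j] = |users(i) & users(j)| / |users(i)|, with r[i][i] = 0.

    item_users maps each item to the set of users who rated (browsed) it.
    """
    r = {}
    for i, users_i in item_users.items():
        r[i] = {}
        for j, users_j in item_users.items():
            if i == j or not users_i:
                r[i][j] = 0.0
            else:
                r[i][j] = len(users_i & users_j) / len(users_i)
    return r


def candidates_by_association(rated: set, r: dict, k: int) -> set:
    """Union of each rated item's k most strongly associated items, minus rated."""
    candidates = set()
    for i in rated:
        related = [j for j, v in r[i].items() if v > 0]
        related.sort(key=lambda j: r[i][j], reverse=True)
        candidates.update(related[:k])
    return candidates - rated


item_users = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {3}}
r = association_matrix(item_users)
print(r["a"]["b"], r["b"]["a"])  # 0.5 1.0 (asymmetric: r_ab != r_ba)
```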
S4, to address the poor accuracy of similarity on sparse data, a modified Pearson correlation coefficient is proposed: a user-rating matrix is built and optimized with a forgetting function; the rating similarity between the target user and every other user is computed; users are ranked by rating similarity from high to low, and the top-ranked users are selected as the target user's nearest neighbors; the target user's predicted ratings for unrated items are computed from the neighbors' ratings, and the N items with the highest predicted ratings are recommended to the user;
when the co-rating data are selected, the number of users who rated both items is also considered: the higher the proportion of co-raters among the raters of each item, the more similar the two items are to a certain extent; for example, if item $i_1$ is rated by 10 users, item $i_2$ by 12 users, and item $i_3$ by 50 users, and both pairs $(i_1, i_2)$ and $(i_1, i_3)$ have 8 co-raters, then $i_1$ and $i_2$ are considered more similar; likewise, 8 co-raters between $i_1$ and $i_2$ indicate greater similarity than 5 co-raters would; for this reason, a modified Pearson correlation coefficient is proposed, with the following formula:
Figure FDA0003895662730000031
where $U_i$, $U_j$, and $U$ denote the sets of users who rated item i, item j, and both item i and item j, respectively; $|U_i|$, $|U_j|$, and $|U|$ denote the numbers of elements in $U_i$, $U_j$, and $U$; $r_{ui}$ and $r_{uj}$ are user u's ratings of item i and item j; and $\bar{r}_i$ and $\bar{r}_j$ are the average ratings of item i and item j, respectively;
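The modified coefficient's formula survives only as an image, so the exact way it uses $|U_i|$, $|U_j|$, and $|U|$ is not recoverable. The sketch below computes the ordinary Pearson correlation over the co-raters U and multiplies it by a hypothetical overlap weight $2|U| / (|U_i| + |U_j|)$, chosen only because it reproduces the qualitative behavior of the worked example (more co-raters relative to each item's raters means more similar). Treat the weight as an assumption, not the patent's formula.

```python
import math


def pearson_items(ri: dict, rj: dict) -> float:
    """Pearson correlation of items i and j over their co-raters U.

    ri, rj map user -> rating; the item means are taken over all raters
    of each item.
    """
    common = ri.keys() & rj.keys()
    if len(common) < 2:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)
    mean_j = sum(rj.values()) / len(rj)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den_i = math.sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
    den_j = math.sqrt(sum((rj[u] - mean_j) ** 2 for u in common))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)


def overlap_weight(n_i: int, n_j: int, n_common: int) -> float:
    """HYPOTHETICAL co-rating weight 2|U| / (|U_i| + |U_j|)."""
    return 2 * n_common / (n_i + n_j)


def modified_pearson(ri: dict, rj: dict) -> float:
    """Pearson correlation scaled by the hypothetical overlap weight."""
    common = ri.keys() & rj.keys()
    return overlap_weight(len(ri), len(rj), len(common)) * pearson_items(ri, rj)


# The example from the text: 8 co-raters for (i1, i2) and for (i1, i3).
print(round(overlap_weight(10, 12, 8), 3), round(overlap_weight(10, 50, 8), 3))  # 0.727 0.267
```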
the algorithm is as follows:
(1) Environment setup
Let user u be a new user of the current system with no historical behavior information, but with historical behavior information in other systems (such as social networking, e-commerce, or video websites), obtained either through authorization or through a cross-site cookie held by the current system;
(2) Construction of a user relationship network
A relationship network among users is constructed from their historical information in other systems, and the users are divided into communities. When the relationship network is constructed from users' purchase or rating information, the following similarity formula is used:
$$\mathrm{sim}(u, v) = \frac{\operatorname{cov}(r_u, r_v)}{\sigma_{r_u}\,\sigma_{r_v}}$$
where $r_u$ and $r_v$ are the rating vectors of users u and v, $\operatorname{cov}(r_u, r_v)$ is the covariance of $r_u$ and $r_v$, and $\sigma_{r_u}$ and $\sigma_{r_v}$ are the standard deviations of the non-zero elements of $r_u$ and $r_v$, respectively [20];
Note that a similarity of 0 means no edge is added between the two users, and the two vectors must share at least three co-rated elements; otherwise the similarity between the two users is taken to be 0;
if a user already has friends on a social networking site, i.e., a friend circle, community division can be performed directly on that friend network;
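A minimal sketch of building the user relationship network. For simplicity it computes means, covariance, and standard deviations over co-rated items only, whereas the text takes standard deviations over all non-zero elements of each vector (an assumption). It enforces the three-co-rated-item minimum, and it keeps only positive similarities as edges; the text only rules out zero, so dropping negative values is also an assumption. Names and toy data are hypothetical.

```python
import math


def user_pearson(ru: dict, rv: dict, min_overlap: int = 3) -> float:
    """Pearson similarity over co-rated items; 0 if fewer than min_overlap."""
    common = ru.keys() & rv.keys()
    if len(common) < min_overlap:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    cov = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    su = math.sqrt(sum((ru[i] - mu) ** 2 for i in common))
    sv = math.sqrt(sum((rv[i] - mv) ** 2 for i in common))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def build_user_graph(ratings: dict, min_overlap: int = 3) -> dict:
    """Weighted adjacency map {u: {v: sim}}; no edge where similarity is 0."""
    users = sorted(ratings)
    graph = {u: {} for u in users}
    for a, u in enumerate(users):
        for v in users[a + 1:]:
            s = user_pearson(ratings[u], ratings[v], min_overlap)
            if s > 0:  # assumption: keep only positive similarities as edges
                graph[u][v] = graph[v][u] = s
    return graph


ratings = {
    "u": {1: 5, 2: 3, 3: 4},
    "v": {1: 4, 2: 2, 3: 3},
    "w": {1: 1, 2: 5},  # only 2 co-rated items with u and v -> no edges
}
print(build_user_graph(ratings))
```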
(3) Community partitioning
Community division uses Principal Modularity Maximization (PMM), whose basic principle is to maximize the modularity Q. Q measures the strength of a community division of a real network as the difference between the connections within communities and those expected in a random network with the same degree distribution, and is defined as follows:
$$Q = \frac{1}{2m}\operatorname{Tr}\left(S^{\mathsf{T}} B S\right)$$
where
$$B = A - \frac{d\,d^{\mathsf{T}}}{2m}$$
S is the membership matrix between nodes u and communities C: $S_{uC} = 1$ if node u belongs to community C and 0 otherwise; B is the modularity matrix, A is the adjacency matrix of the node relationships, d is the degree vector of all nodes, and m is the total number of edges in the relationship network; Q < 0 indicates a very poor division;
because the relationships among users are multi-dimensional, covering users' ratings of items, friendships between users, and so on, PMM performs community division on these multi-dimensional relationships, and each element of the resulting S indicates the degree to which user u belongs to community C;
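A minimal sketch that evaluates the modularity $Q = \operatorname{Tr}(S^{\mathsf{T}} B S) / 2m$ with $B = A - d d^{\mathsf{T}} / 2m$ for a hard partition, where each row of S is one-hot, so the trace reduces to summing $B_{uv}$ over node pairs in the same community. Full PMM works with a continuous S derived from the leading eigenvectors of B and then clusters it; this sketch only scores a given partition. The function name and the toy graph are illustrative.

```python
def modularity(adj, membership) -> float:
    """Q = Tr(S^T B S) / (2m) with B = A - d d^T / (2m), for a hard partition.

    adj: symmetric adjacency matrix (list of lists); membership[u] is the
    community label of node u, i.e. row u of S is one-hot.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    if two_m == 0:
        return 0.0
    q = 0.0
    for u in range(n):
        for v in range(n):
            if membership[u] == membership[v]:
                q += adj[u][v] - degrees[u] * degrees[v] / two_m
    return q / two_m


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
print(round(modularity(adj, [0, 0, 0, 1, 1, 1]), 3))  # 0.357
```

Putting every node in one community gives Q = 0, and a partition that cuts through the triangles gives Q < 0, matching the remark that a negative Q indicates a very poor division.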
(4) Changing the neighbor selection strategy
Instead of selecting the k neighbors with the highest similarity from all users, this method selects neighbors only from users in the same community as the target user u, and predicts the rating of target user u for item j as:
Figure FDA0003895662730000051
where $\bar{r}_u$ is the average rating of user u, a is a constant, C is the community to which user v belongs (the community of the target user u), $r_{v,j}$ is user v's rating of item j, and $\bar{r}_v$ is the average rating of user v.
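The prediction formula survives only as an image, and the role of the constant a cannot be recovered from the text. As a hedged sketch, the code below uses the standard mean-centered weighted average, $\hat{r}_{u,j} = \bar{r}_u + \sum_v \mathrm{sim}(u,v)(r_{v,j} - \bar{r}_v) / \sum_v |\mathrm{sim}(u,v)|$, restricted to positively similar neighbors in the target user's community, and it omits a. This is a named substitute, not the patent's exact formula; names and toy data are hypothetical.

```python
def mean_rating(user_ratings: dict) -> float:
    return sum(user_ratings.values()) / len(user_ratings)


def predict(target, item, ratings, sims, community, k=None) -> float:
    """Mean-centered prediction using only neighbors in the target's community.

    ratings: user -> {item: rating}; sims: user -> {user: similarity};
    community: user -> community label; k optionally caps the neighbor count.
    """
    neighbors = [
        v for v in ratings
        if v != target
        and community.get(v) == community.get(target)
        and item in ratings[v]
        and sims[target].get(v, 0.0) > 0
    ]
    neighbors.sort(key=lambda v: sims[target][v], reverse=True)
    if k is not None:
        neighbors = neighbors[:k]
    base = mean_rating(ratings[target])
    num = sum(sims[target][v] * (ratings[v][item] - mean_rating(ratings[v]))
              for v in neighbors)
    den = sum(abs(sims[target][v]) for v in neighbors)
    return base if den == 0 else base + num / den


ratings = {"u": {"a": 4, "b": 2}, "v": {"a": 5, "b": 3, "c": 5}, "w": {"a": 1, "c": 1}}
sims = {"u": {"v": 0.8, "w": 0.9}}
community = {"u": 0, "v": 0, "w": 1}
print(round(predict("u", "c", ratings, sims, community), 3))  # 3.667
```

Here w is more similar to u than v is, but it is ignored because it sits in a different community, which is exactly the change in neighbor selection that step (4) describes.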

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211273752.6A CN115618127A (en) 2022-10-18 2022-10-18 Collaborative filtering algorithm of neural network recommendation system

Publications (1)

Publication Number Publication Date
CN115618127A true CN115618127A (en) 2023-01-17

Family

ID=84863098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211273752.6A Pending CN115618127A (en) 2022-10-18 2022-10-18 Collaborative filtering algorithm of neural network recommendation system

Country Status (1)

Country Link
CN (1) CN115618127A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574177A (en) * 2024-01-15 2024-02-20 每日互动股份有限公司 Data processing method, device, medium and equipment for user wire expansion
CN117574177B (en) * 2024-01-15 2024-04-19 每日互动股份有限公司 Data processing method, device, medium and equipment for user wire expansion

Similar Documents

Publication Publication Date Title
US11593894B2 (en) Interest recommendation method, computer device, and storage medium
Lee et al. MONERS: A news recommender for the mobile web
CN105069122B (en) A kind of personalized recommendation method and its recommendation apparatus based on user behavior
US20090287687A1 (en) System and method for recommending venues and events of interest to a user
Sachan et al. A survey on recommender systems based on collaborative filtering technique
CN105159910A (en) Information recommendation method and device
Eliyas et al. Recommendation systems: Content-based filtering vs collaborative filtering
Pipergias Analytis et al. The structure of social influence in recommender networks
CN112948625A (en) Film recommendation method based on attribute heterogeneous information network embedding
Truyen et al. Preference networks: Probabilistic models for recommendation systems
Thomas et al. Comparative study of recommender systems
CN115618127A (en) Collaborative filtering algorithm of neural network recommendation system
CN110059257B (en) Project recommendation method based on score correction
Dwivedi et al. An Item-based Collaborative Filtering Approach for Movie Recommendation System
CN110543601B (en) Method and system for recommending context-aware interest points based on intelligent set
Salehi Latent feature based recommender system for learning materials using genetic algorithm
CN108763515B (en) Time-sensitive personalized recommendation method based on probability matrix decomposition
Sani et al. A new strategy in trust-based recommender system using k-means clustering
Zhang et al. A recommender system for cold-start items: a case study in the real estate industry
CN113034231B (en) Multi-supply chain commodity intelligent recommendation system and method based on SaaS cloud service
Regi et al. A survey on recommendation techniques in E-Commerce
Darvishy et al. New attributes for neighborhood-based collaborative filtering in news recommendation
CN104641386A (en) Method and apparatus for obfuscating user demographics
Thendral et al. Clustering based transfer learning in cross domain recommender system
Xiaoyi et al. A hybrid collaborative filtering model with context and folksonomy for social recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination