CN109783734A - Hybrid collaborative filtering recommendation algorithm based on item attributes - Google Patents

Hybrid collaborative filtering recommendation algorithm based on item attributes

Info

Publication number
CN109783734A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910042488.7A
Other languages
Chinese (zh)
Other versions
CN109783734B (en)
Inventor
胡湘
付彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Hunan University
Priority to CN201910042488.7A
Publication of CN109783734A
Application granted
Publication of CN109783734B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hybrid collaborative filtering recommendation algorithm based on item attributes, comprising the following steps. Step 1: generate a user-item rating matrix and an item-attribute matrix from the users' rating information and the item information. Step 2: calculate the item similarity from the item-attribute matrix and the user rating similarity from the user-item rating matrix. Step 3: correct the user rating similarity with the item similarity to obtain the user similarity. Step 4: calculate a common scoring reward factor, an item attribute preference factor and a user confidence factor, and use them to correct the user similarity, giving the final user similarity. Step 5: select the nearest neighbors of the target user according to the final similarity, and predict the target user's rating of each item from the ratings of all users in the nearest-neighbor set. Step 6: recommend the N items with the highest predicted ratings to the target user. The invention is applicable to sparse data and improves recommendation accuracy.

Description

Hybrid collaborative filtering recommendation algorithm based on item attributes
Technical Field
The invention belongs to the field of recommendation systems and relates to information recommendation technology, in particular to a hybrid collaborative filtering recommendation algorithm based on item attributes.
Background
With the development of internet information technology, people have moved from an era of information scarcity into an era of information overload. In this era, a personalized recommendation system applies information filtering technology to recommend information that may interest users, helping them quickly find the information resources or goods they need. In recent years, recommendation systems have been widely used in fields such as e-commerce, music, social networks, and medicine, and have become a research hotspot in both industry and academia.
Among personalized recommendation algorithms, collaborative filtering is one of the most widely used and most successful techniques. Neighborhood-based collaborative filtering is widely applied because it helps users discover new categories of items and has strong recommendation performance. A neighborhood-based collaborative filtering algorithm first finds the group of users similar to a given user, then combines those similar users' ratings of items to predict the given user's ratings, and finally recommends the items with the highest predicted ratings. The user similarity measure is therefore crucial to the final recommendation quality.
Conventional user similarity measures have the following disadvantages: 1. User similarity is usually calculated only from the users' common ratings (ratings that different users gave to the same item); when the user-item rating matrix is sparse, common ratings are rare, and the conventional measures cannot be applied. 2. User similarity measures generally use only rating information and make no effective use of item attribute information. 3. User similarity measures ignore factors that influence user similarity: (1) they do not consider that users who have rated the same items and item attributes are more similar; (2) they do not consider that different users have different preferences for item attributes; (3) they do not consider that users' rating information is not necessarily true and credible.
Because of these defects, collaborative filtering algorithms built on conventional user similarity measures cannot be applied to sparse data and have low recommendation accuracy.
Disclosure of Invention
The technical problem to be solved by the invention is, in view of the defects of the prior art, to provide a hybrid collaborative filtering recommendation algorithm based on item attributes that is applicable to sparse data and improves recommendation accuracy.
The technical scheme provided by the invention is as follows:
A hybrid collaborative filtering recommendation algorithm based on item attributes comprises the following steps:
Step 1: acquire user behavior data and item information data, and generate a user-item rating matrix and an item-attribute matrix from the user rating information and the item information;
Step 2: calculate the item similarity from the item-attribute matrix and the user rating similarity from the user-item rating matrix;
Step 3: correct the user rating similarity with the item similarity to calculate the user similarity;
Step 4: calculate a common scoring reward factor, an item attribute preference factor and a user confidence factor, and correct the user similarity with these factors to obtain the final user similarity;
Step 5: for the target user, select the nearest neighbors (the group of similar users) according to the final user similarity, and predict the target user's rating of each item from the ratings of all users in the nearest-neighbor set;
Step 6: recommend the N items with the highest predicted ratings to the target user.
Further, rating data is usually very sparse, and the items that two users have both rated are sparser still. Conventional similarity measures are restricted to these commonly rated items, yet a user has usually rated more than one item; if all of a user's ratings can be used to calculate user similarity, the range of application of the algorithm is greatly expanded. In addition, if two items are completely dissimilar, the users' similarity may be low even when their ratings of the two items are close; users are highly similar only when they give similar ratings to different items and those items are themselves similar. The invention therefore corrects the user rating similarity with the item similarity to calculate the user similarity. This calculation does not depend on commonly rated items, adapts well to sparse data, and is highly interpretable.
The rating similarity between the rating $r_{ui}$ of user $u$ for item $i$ and the rating $r_{vj}$ of user $v$ for item $j$ is defined as $S_{pss}(r_{ui}, r_{vj})$, calculated as follows:
$$S_{pss}(r_{ui}, r_{vj}) = \mathrm{Proximity}(r_{ui}, r_{vj}) \cdot \mathrm{Significance}(r_{ui}, r_{vj}) \cdot \mathrm{Singularity}(r_{ui}, r_{vj})$$
where $\mathrm{Proximity}(r_{ui}, r_{vj})$ represents the absolute difference of the rating pair $(r_{ui}, r_{vj})$, $\mathrm{Significance}(r_{ui}, r_{vj})$ represents the difference between $r_{ui}$ and $r_{vj}$ and the medians of all ratings given by users $u$ and $v$, and $\mathrm{Singularity}(r_{ui}, r_{vj})$ represents the difference between $r_{ui}$ and $r_{vj}$ and the variances of all ratings received by item $i$ and item $j$. The three terms are calculated as follows:
where $r_{u*med}$ and $r_{v*med}$ represent the medians of all ratings given by user $u$ and user $v$, respectively, and $\mu_i$ and $\mu_j$ represent the variances of all ratings received by item $i$ and item $j$, respectively.
Further, the invention introduces an item-attribute matrix to calculate the similarity between two items. Let the set $A = (A_1, A_2, \ldots, A_x, \ldots, A_k)$ represent the attributes of all items; the attribute features of all items in the recommendation system can then be represented by a matrix $A_{n \times k}$, where $n$ is the total number of items and $k$ is the total number of item attributes. If item $i$ has attribute $A_x$, then $A_{ix} = 1$; otherwise $A_{ix} = 0$. Because item attributes are Boolean values and the Jaccard coefficient measures the similarity of Boolean data, the invention uses the Jaccard formula to calculate item similarity. The similarity of items $i$ and $j$ is defined as $S_{item}(i, j)$, calculated as:
$$S_{item}(i, j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$$
where $S_i$ and $S_j$ represent the attribute sets of item $i$ and item $j$, respectively, and $|\cdot|$ denotes the number of elements in a set.
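The Jaccard item similarity can be illustrated with a minimal Python sketch. The function name, the tiny matrix, and the choice to return 0 for two items without any attribute are illustrative assumptions, not part of the source.

```python
def jaccard_item_similarity(attrs_i, attrs_j):
    """Jaccard similarity of two Boolean attribute rows (0/1 lists of length k)."""
    both = sum(1 for a, b in zip(attrs_i, attrs_j) if a and b)
    either = sum(1 for a, b in zip(attrs_i, attrs_j) if a or b)
    # Assumption: two items with no attributes at all are treated as dissimilar.
    return both / either if either else 0.0


# Hypothetical item-attribute matrix A with n = 3 items and k = 4 attributes.
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
sim_01 = jaccard_item_similarity(A[0], A[1])  # one shared attribute out of three
sim_02 = jaccard_item_similarity(A[0], A[2])  # no shared attribute
```

Items 0 and 1 share one of three distinct attributes, so their similarity is 1/3; items 0 and 2 share none, so their similarity is 0.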
Further, the similarity between user $u$ and user $v$ is defined as $S(u, v)$, calculated as follows:
where $I_u$ and $I_v$ represent the sets of items rated by user $u$ and user $v$, respectively.
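The idea of comparing every rating of one user with every rating of the other, weighted by item similarity, can be sketched as follows. This is only a sketch under stated assumptions: the exact aggregation and normalisation of $S(u, v)$ are not reproduced in this text, so averaging the pairwise products is an assumption, and `stand_in_rating_sim` is a hypothetical placeholder rather than the $S_{pss}$ formula.

```python
def user_similarity(ratings_u, ratings_v, item_sim, rating_sim):
    """Average of rating_sim * item_sim over every rating pair of two users.

    ratings_u, ratings_v: dict item -> rating (the sets I_u and I_v).
    item_sim(i, j): item similarity, e.g. Jaccard on attribute sets.
    rating_sim(r_ui, r_vj): rating-pair similarity (S_pss in the text).
    ASSUMPTION: the pairwise products are averaged; the source formula
    may aggregate or normalise differently.
    """
    if not ratings_u or not ratings_v:
        return 0.0
    total = 0.0
    for i, r_ui in ratings_u.items():
        for j, r_vj in ratings_v.items():
            total += rating_sim(r_ui, r_vj) * item_sim(i, j)
    return total / (len(ratings_u) * len(ratings_v))


# Hypothetical data: the two users have no commonly rated item.
item_attrs = {"a": {"comedy"}, "b": {"comedy", "drama"}, "c": {"horror"}}


def item_sim(i, j):
    s_i, s_j = item_attrs[i], item_attrs[j]
    return len(s_i & s_j) / len(s_i | s_j)


def stand_in_rating_sim(r1, r2, span=4.0):
    # Placeholder for S_pss (NOT the patent's formula): 1 when equal, 0 at max gap.
    return 1.0 - abs(r1 - r2) / span


s_uv = user_similarity({"a": 5.0}, {"b": 5.0, "c": 5.0}, item_sim, stand_in_rating_sim)
```

Even though the two hypothetical users share no rated item, the similarity is non-zero (0.25 here) because items "a" and "b" share an attribute, which is the property the text relies on for sparse data.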
Further, to improve the accuracy of the algorithm, the invention considers the following aspects: users who have rated the same items and the same attributes are more similar, so a common scoring reward factor is introduced to increase the weight of commonly rated items; users have different preferences for item attributes, and users with the same attribute preferences have more similar interests, so an item attribute preference factor is introduced; and users' rating information is not always reliable, so a user confidence factor is introduced.
Conventional similarity measures ignore the fact that the more items or attributes two users have both rated, the more similar the users are. To overcome this problem, the invention uses the commonly rated items and commonly rated attributes to calculate a common scoring reward factor. The common scoring reward factor of user $u$ and user $v$ is defined as $S_{bonus}(u, v)$, calculated as follows:
where $E_u$ and $E_v$ represent the attribute sets of the items rated by user $u$ and user $v$, respectively. It can be seen that the more commonly rated items and commonly rated attributes two users have, the greater their similarity.
In practice, users have different preferences for item attributes, and users with the same attribute preferences are more similar; for example, two users who both like the "art" attribute are more similar. Conventional user preference similarity measures use the user rating matrix to calculate the similarity of users' rating preferences (that is, a tendency toward high or low ratings) and disregard users' preferences for item attributes. Item ratings alone do not reflect a user's true preferences, so the invention introduces item attributes to calculate user preference similarity. To calculate a user's preference for an item attribute, the importance of the attribute to the user is first calculated from the user's historical behavior, and the user's preference for the attribute is then calculated in combination with the user's ratings. In addition, the invention takes into account that a user's interests can change. Many factors influence a user's interests, such as time and environment, and time is one of the most important. The longer ago a user rated an item, the more the user's interest in it has decayed. The invention therefore introduces a time decay factor. For any user $u$ and any attribute $A_x$, the weighted preference degree of user $u$ for attribute $A_x$ is calculated as follows:
where the two quantities in the formula represent, respectively, the importance of attribute $A_x$ to user $u$ and the degree of preference of user $u$ for attribute $A_x$.
The more often user $u$ has rated items with attribute $A_x$, the more important attribute $A_x$ is to user $u$. The invention therefore defines the importance by the following equation:
where the first count is the number of times user $u$ has rated items having attribute $A_x$, $\mathrm{Count}(u_A)$ is the total number of attributes of the items rated by user $u$, and $\mathrm{Count}(u)$ is the total number of items rated by user $u$.
If the ratings that user $u$ gives to items having attribute $A_x$ are much higher than the average of all ratings given by user $u$, and user $u$'s most recent rating of an item having attribute $A_x$ is close to the current time, so that the user's interest in $A_x$ has decayed little, then user $u$'s degree of preference for attribute $A_x$ is large; conversely, the more the interest has decayed, the smaller the preference. The invention therefore defines the degree of preference by the following equation:
where the average is taken over all item ratings given by user $u$; the introduced time decay function takes as its argument $t$, the interval between user $u$'s most recent rating of an item having attribute $A_x$ and the current time, with $t \ge 0$; and $\lambda$ is the decay coefficient.
Finally, the item attribute preference factor $S_{pre}(u, v)$ of two users $u$ and $v$ is calculated with cosine similarity, as follows:
The item attribute preference factor thus takes both the user's rating preference and the user's item attribute preference into account, overcoming the drawback that conventional similarity calculations consider only the user's rating preference.
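The cosine step can be sketched in Python as below. The vectors stand for each user's weighted preference degrees over the $k$ attributes; their values are hypothetical, and returning 0 for an all-zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(pref_u, pref_v):
    """Cosine similarity of two length-k weighted-preference vectors."""
    dot = sum(a * b for a, b in zip(pref_u, pref_v))
    norm_u = math.sqrt(sum(a * a for a in pref_u))
    norm_v = math.sqrt(sum(b * b for b in pref_v))
    # Assumption: a user with an all-zero vector has no measurable preference.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical weighted preferences over k = 3 attributes.
s_pre_same = cosine_similarity([0.6, 0.0, 0.2], [1.2, 0.0, 0.4])  # parallel vectors
s_pre_diff = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal vectors
```

Two users whose preference vectors point in the same direction get a factor of 1 regardless of magnitude, while users who prefer disjoint attributes get 0.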
Conventional recommendation systems assume that user data is trustworthy. However, user feedback collected in real life contains a large amount of unreliable data that can interfere with the final prediction. The reasons for unreliability are as follows: (1) a user may submit random values as feedback; (2) a user may, for some benefit, deliberately rate certain services too high or too low and rate other services too low or too high; (3) a user submits ratings faithfully, but interference from the network environment, equipment and the like pushes the data into an abnormal range. If unreliable data is not handled, it seriously interferes with the final prediction and degrades the recommendation algorithm. To reduce the effect of noisy data, the invention introduces a confidence factor. For any user $u$, the confidence factor is defined as $S_{conf}(u)$, calculated as follows:
where the formula uses the average of all ratings received by item $i$ and $r_{*imed}$, the median of all ratings received by item $i$. As the formula shows, if user $u$'s ratings of items always deviate from the items' average or median ratings, the user's confidence is low. Adding the confidence factor reduces the influence of untrustworthy users and makes the prediction more accurate.
Further, the user similarity is corrected with the common scoring reward factor, the item attribute preference factor and the user confidence factor. The final corrected similarity between user $u$ and user $v$ is defined as $\mathrm{Sim}(u, v)$, calculated as follows:
$$\mathrm{Sim}(u, v) = S_{bonus}(u, v) \cdot S_{pre}(u, v) \cdot S_{conf}(u) \cdot S(u, v)$$
Further, there are two methods for selecting the nearest neighbors:
1) The first method selects the K users most similar to the target user, i.e., the K-nearest-neighbor (KNN) method. The number of neighbors K is set in advance, the similarities are sorted in descending order, and the top K users are the nearest neighbors of the target user;
2) The second method sets a nearest-neighbor similarity threshold. A similarity threshold is set in advance, and the users whose similarity exceeds the threshold are taken as the nearest neighbors of the target user.
In a practical recommendation system, the similarity threshold is difficult to set, so neighbors are usually selected with KNN.
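The two neighbor selection methods can be sketched in Python as follows. The function names and the similarity values are hypothetical; the input is assumed to be a mapping from each candidate user to that user's final similarity with the target user.

```python
def top_k_neighbors(similarities, k):
    """KNN selection: the k users with the highest similarity to the target."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:k]]


def threshold_neighbors(similarities, threshold):
    """Threshold selection: every user whose similarity exceeds the threshold."""
    return [user for user, sim in similarities.items() if sim > threshold]


# Hypothetical final similarities Sim(u, v) of a target user u to four others.
sims = {"v1": 0.82, "v2": 0.15, "v3": 0.64, "v4": 0.40}
knn = top_k_neighbors(sims, k=2)
above = threshold_neighbors(sims, threshold=0.5)
```

With KNN the neighborhood size is fixed at K, whereas the threshold method returns a neighborhood whose size depends on the data, which is one reason the threshold is harder to tune.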
After the above steps, the target user's rating of each item is predicted from the ratings of all users in the nearest-neighbor set, and the N items with the highest predicted ratings are recommended to the user. The rating prediction formula of the invention is defined as follows:
where $P_{ui}$ denotes the predicted rating of target user $u$ for item $i$, and $N_u$ is the nearest-neighbor set of user $u$.
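Prediction and top-N recommendation can be sketched in Python as below. The prediction formula is not reproduced in this text, so the sketch assumes the textbook mean-centred, similarity-weighted neighborhood formula; all names and data are hypothetical.

```python
def user_mean(ratings, user):
    """Average of all ratings given by one user."""
    values = ratings[user].values()
    return sum(values) / len(values)


def predict_rating(target, item, ratings, sim, neighbors):
    """Predict P_ui from the neighbours in N_u who rated the item.

    ASSUMPTION: mean-centred weighted average, not necessarily the
    exact formula used in the source.
    """
    num = den = 0.0
    for v in neighbors:
        if item in ratings[v]:
            num += sim[v] * (ratings[v][item] - user_mean(ratings, v))
            den += abs(sim[v])
    if den == 0.0:
        return None  # no neighbour rated the item: prediction not possible
    return user_mean(ratings, target) + num / den


def recommend_top_n(target, ratings, sim, neighbors, candidates, n):
    """Return the n unrated candidate items with the highest predictions."""
    scored = []
    for item in candidates:
        if item in ratings[target]:
            continue  # skip items the target user has already rated
        p = predict_rating(target, item, ratings, sim, neighbors)
        if p is not None:
            scored.append((p, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:n]]


# Hypothetical ratings and final similarities of target user "u" to "v" and "w".
ratings = {
    "u": {"a": 4.0, "b": 2.0},
    "v": {"a": 5.0, "c": 3.0},
    "w": {"b": 1.0, "c": 5.0, "d": 3.0},
}
sim = {"v": 0.5, "w": 0.5}
p_uc = predict_rating("u", "c", ratings, sim, ["v", "w"])
top = recommend_top_n("u", ratings, sim, ["v", "w"], ["a", "b", "c", "d"], n=1)
```

In this example the predicted rating of item "c" is 3.5 and that of item "d" is 3.0, so "c" is the single recommended item.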
Advantageous effects:
The user similarity measure designed by the invention is not limited to users' common ratings; it also takes into account the influence of non-common ratings (ratings of different items) on user similarity. If two different items are completely dissimilar, the users' similarity can be very low even when their ratings are close; users are similar only when their ratings are similar and the items are similar. The user similarity is therefore corrected with the item similarity. Furthermore, to improve recommendation accuracy, the method is no longer limited to rating data: item attributes are integrated into the algorithm, and three factors influencing user similarity are defined. The first is the common scoring reward factor, which reflects that the more items or attributes users have rated in common, the more similar they are, and which increases the weight of commonly rated items. The second is the item attribute preference factor, which distinguishes different users' preferences for item attributes. The third is the user confidence factor, which reflects that users' rating data is not always credible, reduces the influence of noisy data, and improves the reliability of the algorithm's output. The invention has the following advantages:
1. In the invention, commonly rated items are not a prerequisite for calculating user similarity. The proposed similarity measure does not depend on items rated by both users, makes use of all of a user's ratings, and copes well with data sparsity.
2. The algorithm overcomes the drawback that conventional algorithms calculate similarity from rating information alone. It makes use of item attribute information and introduces three influencing factors (the common scoring reward factor, the item attribute preference factor and the user confidence factor) to distinguish between users, thereby improving the reliability and accuracy of the algorithm.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows how the NMAE and MAE of each recommendation algorithm change with the number of nearest neighbors on the ML-Latest-Small data set, where FIG. 2(a) shows the NMAE and FIG. 2(b) shows the MAE;
FIG. 3 shows how the F1 of each recommendation algorithm changes with the number of nearest neighbors on the ML-Latest-Small data set;
FIG. 4 shows how the NSP and NPP of each recommendation algorithm change with the number of nearest neighbors on the ML-Latest-Small data set, where FIG. 4(a) shows the NSP and FIG. 4(b) shows the NPP;
FIG. 5 shows how the MAE and NMAE of each recommendation algorithm change with the number of nearest neighbors on the ML-1M data set, where FIG. 5(a) shows the NMAE and FIG. 5(b) shows the MAE;
FIG. 6 shows how the F1 of each recommendation algorithm changes with the number of nearest neighbors on the ML-1M data set;
FIG. 7 shows how the NSP and NPP of each recommendation algorithm change with the number of nearest neighbors on the ML-1M data set, where FIG. 7(a) shows the NSP and FIG. 7(b) shows the NPP.
Detailed Description
To describe the invention more specifically, its implementation steps are described below with reference to FIG. 1:
Step 1: obtain user behavior data and first establish a user-item rating matrix $R_{m \times n}$, where $m$ is the total number of users, $n$ is the total number of items, and $r_{ui}$ represents the rating given by user $u$ to item $i$.
User-item rating matrix $R_{m \times n}$
Establish an item-attribute matrix $A_{n \times k}$ from the item information data, where $k$ is the total number of item attributes and $A_{ix}$ indicates whether item $i$ has attribute $x$: if item $i$ has attribute $x$, then $A_{ix} = 1$; otherwise $A_{ix} = 0$.
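Step 1 can be sketched in Python as follows. The record layout `(user, item, rating)`, the use of 0.0 to mark an unrated cell, and all names are assumptions made for illustration.

```python
def build_matrices(rating_records, item_attributes, attribute_names):
    """Build the user-item rating matrix R and the item-attribute matrix A.

    rating_records: iterable of (user, item, rating) tuples.
    item_attributes: dict item -> set of attribute names.
    attribute_names: ordered list of the k attribute names.
    """
    users = sorted({u for u, _, _ in rating_records})
    items = sorted(item_attributes)
    u_idx = {u: row for row, u in enumerate(users)}
    i_idx = {i: col for col, i in enumerate(items)}
    # Assumption: 0.0 marks "not rated"; real ratings are strictly positive.
    R = [[0.0] * len(items) for _ in users]
    for u, i, r in rating_records:
        R[u_idx[u]][i_idx[i]] = float(r)
    # A[i][x] = 1 if item i has attribute x, else 0.
    A = [[1 if a in item_attributes[i] else 0 for a in attribute_names]
         for i in items]
    return users, items, R, A


# Hypothetical input data.
records = [("u1", "i1", 4), ("u1", "i2", 5), ("u2", "i2", 3)]
attrs = {"i1": {"Comedy"}, "i2": {"Comedy", "Drama"}}
users, items, R, A = build_matrices(records, attrs, ["Comedy", "Drama"])
```

For large data sets a sparse representation (for example a dictionary of dictionaries) would be a more practical choice than dense nested lists.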
Step 2: calculate the similarity between items and the users' rating similarity from the item-attribute matrix and the user-item rating matrix, respectively. The rating similarity is defined as:
$$S_{pss}(r_{ui}, r_{vj}) = \mathrm{Proximity}(r_{ui}, r_{vj}) \cdot \mathrm{Significance}(r_{ui}, r_{vj}) \cdot \mathrm{Singularity}(r_{ui}, r_{vj})$$
The three terms are expressed as follows:
where $\mathrm{Proximity}(r_{ui}, r_{vj})$ represents the absolute difference of the rating pair $(r_{ui}, r_{vj})$, $\mathrm{Significance}(r_{ui}, r_{vj})$ represents the difference between $r_{ui}$ and $r_{vj}$ and the medians of all ratings given by users $u$ and $v$, and $\mathrm{Singularity}(r_{ui}, r_{vj})$ represents the difference between $r_{ui}$ and $r_{vj}$ and the variances of all ratings received by item $i$ and item $j$; $r_{u*med}$ and $r_{v*med}$ represent the medians of all ratings given by user $u$ and user $v$, respectively, and $\mu_i$ and $\mu_j$ represent the variances of all ratings received by item $i$ and item $j$, respectively.
Users are similar only if the items are similar and the ratings are similar, so the user similarity is corrected with the item similarity. The similarity of items $i$ and $j$ is defined as $S_{item}(i, j)$, calculated as:
$$S_{item}(i, j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$$
where $S_i$ and $S_j$ represent the attribute sets of item $i$ and item $j$, respectively, $|\cdot|$ denotes the number of elements in a set, $|S_i \cap S_j|$ is the number of attributes that item $i$ and item $j$ both have, and $|S_i \cup S_j|$ is the number of distinct attributes that item $i$ or item $j$ has. The larger the share of attributes the two items have in common, the greater the item similarity.
Step 3: calculate the common scoring reward factor, the item attribute preference factor and the user confidence factor, and correct the user similarity with these three factors.
The common scoring reward factor $S_{bonus}$ is calculated as follows:
where $E_u$ and $E_v$ represent the attribute sets of the items rated by user $u$ and user $v$, respectively. It can be seen that the more commonly rated items and commonly rated attributes two users have, the greater their similarity.
The item attribute preference factor $S_{pre}$ is calculated as follows:
First, for any user $u$ and any attribute $A_x$, the importance of attribute $A_x$ to user $u$ is calculated by the following formula:
where the first count is the number of times user $u$ has rated items having attribute $A_x$, $\mathrm{Count}(u_A)$ is the total number of attributes of the items rated by user $u$, and $\mathrm{Count}(u)$ is the total number of items rated by user $u$;
Next, the degree of preference of user $u$ for attribute $A_x$ is calculated by the following formula:
where $n$ is the total number of items; the average is taken over all item ratings given by user $u$; the introduced time decay function takes as its argument $t$, the interval between user $u$'s most recent rating of an item having attribute $A_x$ and the current time, with $t \ge 0$; $\lambda$ is the decay coefficient; and $A_{ix}$ indicates whether item $i$ has attribute $A_x$: $A_{ix} = 1$ if it does, and $A_{ix} = 0$ otherwise;
Then, the weighted preference degree of user $u$ for attribute $A_x$ is calculated by the following formula:
Then, the item attribute preference factor $S_{pre}(u, v)$ of two users $u$ and $v$ is calculated with cosine similarity, as follows:
where $k$ is the total number of item attributes.
For any user $u$, the confidence factor is defined as $S_{conf}(u)$, calculated as follows:
where the formula uses the average of all ratings received by item $i$ and $r_{*imed}$, the median of all ratings received by item $i$. As the formula shows, if a user's ratings of items always deviate from the items' average or median ratings, the user's confidence is low.
The final user similarity is calculated as follows:
$$\mathrm{Sim}(u, v) = S_{bonus}(u, v) \cdot S_{pre}(u, v) \cdot S_{conf}(u) \cdot S(u, v)$$
Step 4: sort the users in descending order of similarity, select the k users with the highest similarity as the target user's group of similar users, and predict item ratings from the similar users' ratings of the items.
The rating prediction formula of the invention is defined as follows:
where $P_{ui}$ denotes the predicted rating of target user $u$ for item $i$, and $N_u$ is the nearest-neighbor set of user $u$.
Step 5: select the N items with the highest predicted ratings and recommend them to the target user.
Experiment: the invention is further illustrated by the following simulation experiment results.
Simulation experiment data and evaluation indexes:
The invention uses the public data sets ML-Latest-Small and ML-1M for testing and verifying the algorithm. The data sets are described in Table 2:
Data set          Number of users M   Number of items N   Number of ratings R   Sparsity (%)
ML-Latest-Small   706                 8570                100023                1.7
ML-1M             6040                4000                1000209               4.1
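As a quick arithmetic check, the values in the last column of Table 2 agree with the share of filled cells in the user-item matrix, R / (M × N), expressed as a percentage. The short Python sketch below recomputes them; the function name is hypothetical, and reading the column this way is an inference from the numbers rather than something the text states.

```python
def filled_cell_percentage(num_ratings, num_users, num_items):
    """Percentage of user-item cells that contain a rating."""
    return 100.0 * num_ratings / (num_users * num_items)


ml_small = filled_cell_percentage(100023, 706, 8570)
ml_1m = filled_cell_percentage(1000209, 6040, 4000)
```

Rounded to one decimal place these give 1.7 and 4.1, matching the table.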
This example randomly selected eighty percent of the data as the training set, and the remaining twenty percent was used for the experimental test set.
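A random 80/20 split of the rating records can be sketched in Python as follows. The fixed seed, the function name, and the synthetic records are assumptions for illustration; the text does not say how the random selection was implemented.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle the rating records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rating records: (user, item, rating).
records = [("u%d" % (n % 5), "i%d" % n, 1 + n % 5) for n in range(50)]
train, test = train_test_split(records)
```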
This embodiment uses three categories of evaluation indexes to evaluate the validity of the recommendation results.
The first category measures the prediction error, i.e., the difference between the actual rating and the predicted rating. The mean absolute error (MAE) is the most common index for measuring the accuracy of recommendation algorithms and is calculated by comparing predicted values with actual values. The normalized mean absolute error (NMAE) is the MAE divided by the span of the rating scale. MAE and NMAE are obtained from the following formulas:
$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} |r_{ui} - P_{ui}|, \qquad \mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{max} - r_{min}}$$
where $T$ is the set of user-item pairs in the test set for which a prediction is made, $r_{ui}$ is the actual rating, $P_{ui}$ is the predicted rating, and $r_{max}$ and $r_{min}$ represent the maximum and minimum ratings in the user rating table.
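The two error measures can be computed with a few lines of Python. The sample ratings below are hypothetical and assume a rating scale from 1 to 5.

```python
def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(actual, predicted, r_max, r_min):
    """MAE normalised by the span of the rating scale."""
    return mae(actual, predicted) / (r_max - r_min)


# Hypothetical test ratings and predictions on a 1-5 scale.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
err = mae(actual, predicted)
norm_err = nmae(actual, predicted, r_max=5.0, r_min=1.0)
```

Here the absolute errors are 0.5, 0, 1 and 1, so the MAE is 0.625 and the NMAE is 0.625 / 4 = 0.15625.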
The second category measures recommendation precision, typically with precision P, recall C and F1. Precision is the proportion of items in the recommendation list that the user really likes. Recall is the proportion of all items the user really likes that appear in the recommendation list. F1 combines precision and recall. The corresponding formulas are as follows:
$$P = \frac{|R_{u,p} \cap R_{u,a}|}{|R_{u,p}|}, \qquad C = \frac{|R_{u,p} \cap R_{u,a}|}{|R_{u,a}|}, \qquad F1 = \frac{2PC}{P + C}$$
where $R_{u,p}$ and $R_{u,a}$ are the recommended item set and the set of items the user really likes, respectively, and $|R_{u,p} \cap R_{u,a}|$ is the number of items in the recommendation list that the user really likes. F1 is a combined evaluation index of P and C; the larger the F1 value, the better the recommendation.
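For a single user, these three measures can be computed as in the Python sketch below. The item identifiers are hypothetical, and returning 0 when a denominator is empty is an assumption; how the per-user values are aggregated over all users is not stated in the text.

```python
def precision_recall_f1(recommended, liked):
    """Precision P, recall C and F1 for one user's recommendation list."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)
    p = hits / len(recommended) if recommended else 0.0
    c = hits / len(liked) if liked else 0.0
    f1 = 2 * p * c / (p + c) if p + c else 0.0
    return p, c, f1


# Hypothetical recommendation list and set of truly liked items.
p, c, f1 = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
```

Two of the four recommended items are liked (P = 0.5) and two of the three liked items are recommended (C = 2/3), giving F1 = 4/7.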
The last category consists of the number of successful predictions (NSP) and the number of correct predictions (NPP). Because conventional recommendation algorithms rely on commonly rated items, on sparse data a user may have no similar neighbors, in which case no prediction can be made; NSP and NPP are therefore also important indexes. NSP is the number of ratings for which the algorithm produces a prediction, regardless of whether the prediction is accurate. NPP is the number of items in the test set whose actual rating the algorithm predicts correctly, which emphasizes the prediction accuracy of the algorithm.
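Counting NSP and NPP can be sketched in Python as follows. The text does not define precisely when a prediction counts as correct, so the sketch assumes that a prediction is correct when it rounds to the actual rating; a missing prediction is represented by `None`. Both conventions are assumptions.

```python
def nsp_npp(actual, predicted):
    """Count successful (NSP) and correct (NPP) predictions.

    predicted entries are None where no prediction could be made.
    """
    nsp = sum(1 for p in predicted if p is not None)
    # ASSUMPTION: "correct" means the prediction rounds to the actual rating.
    npp = sum(1 for a, p in zip(actual, predicted)
              if p is not None and round(p) == a)
    return nsp, npp


# Hypothetical test ratings; the second one could not be predicted.
nsp, npp = nsp_npp([4, 3, 5, 2], [4.2, None, 3.9, 2.4])
```

In this example three of the four test ratings receive a prediction (NSP = 3) and two of those round to the actual rating (NPP = 2).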
The smaller the MAE and NMAE, the smaller the rating prediction error. F1 reflects the precision of the algorithm: the larger the F1, the better the recommendation. NSP and NPP reflect the success rate and the accuracy of prediction; larger values are better.
To verify the effectiveness of the invention, it is compared experimentally with the following conventional and recently proposed algorithms: PCC, PIP, NHSM, HUS, CFBJ.
In the collaborative filtering algorithm, the nearest neighbor number plays an important role in the recommendation performance, so the embodiment will compare the performance of the algorithm under different nearest neighbor numbers. The results of the comparative experiments are shown in FIGS. 2 to 7.
1) ML-last-Small dataset
Fig. 2 shows the recommendation error for each recommendation algorithm. As can be seen from the figure, the average absolute error of the traditional collaborative filtering algorithms such as PCC and PIP algorithms is higher, but the collaborative filtering algorithm provided by the invention effectively utilizes the information of the non-common score items, thereby greatly reducing the error. Meanwhile, due to the fact that the project attribute information is effectively utilized and various influence factors are considered, the algorithm has better recommendation precision than other algorithms, and the average absolute error is smaller than 0.7.
Figure 3 shows the accuracy of each recommendation algorithm. The recommendation precision of each algorithm is improved along with the increase of the number of the similar neighborhoods. This is because as KNN increases, the user nearest neighbor number reduces the influence of neighbor selection errors. At the same time, it is clear that the algorithm of the present invention is superior to other algorithms. The accuracy of the conventional collaborative filtering algorithm is less than 0.6. In contrast, the new algorithm HUS is relatively high in recommendation accuracy because not only the common item information is used, but also the asymmetry of the similarity is taken into consideration. However, the algorithm only considers the similarity of the user items from the point of view of the scoring probability, and does not effectively consider the influence factors of the similarity, so the accuracy is still lower than that of the algorithm of the invention.
Fig. 4 shows NSP and NPP as functions of the number of nearest neighbors on ML-last-small. As Fig. 4 shows, both NSP and NPP increase with the number of nearest neighbors. The recently proposed algorithms are far superior to the conventional CF algorithms: the highest NSP and NPP of the conventional algorithms are below 13500 and 5000, respectively, while the highest NSP and NPP of the recently proposed algorithms are above 15000 and 6000, respectively. The algorithm of the invention attains the highest NSP and NPP values, close to 20000 and 8000, respectively.
2) ML-1M dataset
To verify the applicability and generality of the algorithm, experiments were also performed on the ML-1M dataset. As Fig. 5 shows, in the best case the MAE of the algorithm of the invention is about 0.73, the MAE of the conventional recommendation algorithms is about 0.85, and the MAE of the recently proposed algorithms is greater than 0.77. The best NMAE of the conventional recommendation algorithms is about 0.22, that of the other algorithms is about 0.19 or more, and that of the invention is about 0.16. As Fig. 6 shows, the accuracy of the invention increases with the number of nearest neighbors, and it is clearly higher than that of the other algorithms, with a best value of 0.68. As Fig. 7 shows, both NSP and NPP improve as the number of nearest neighbors increases, and the recent algorithms are much better than the traditional collaborative filtering algorithms. The maximum NSP and NPP of the algorithm of the invention are 19600 and 8200, respectively, and those of all the other algorithms are lower.
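The error metrics quoted above can be illustrated with a short sketch. It assumes the usual definitions: MAE is the mean of the absolute differences between predicted and actual scores, and NMAE is MAE divided by the width of the rating scale. This passage does not spell out either definition, so the function names `mae` and `nmae` and the `r_min`/`r_max` parameters are illustrative assumptions rather than the patent's own notation.

```python
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and actual scores."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def nmae(predicted: Sequence[float], actual: Sequence[float],
         r_min: float, r_max: float) -> float:
    """MAE normalised by the rating range (assumed definition of NMAE)."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return mae(predicted, actual) / (r_max - r_min)


# Hypothetical predictions and held-out scores on a 1-to-5 scale.
predicted_scores = [4.0, 3.0, 5.0]
actual_scores = [5.0, 3.0, 4.0]
error = mae(predicted_scores, actual_scores)
normalised_error = nmae(predicted_scores, actual_scores, r_min=1.0, r_max=5.0)
```

Normalising by the rating range makes errors comparable across datasets whose rating scales differ.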
The method effectively accounts for the influence of users' non-common scoring information on user similarity, and thereby addresses the problem of data sparsity. The user-similarity influencing factors provided by the invention greatly improve recommendation accuracy. The common scoring reward factor treats users who have evaluated the same items or attributes as more similar, and effectively increases the weight given to common scoring items. The item attribute preference factor distinguishes the attribute preferences of different users by taking into account how differently users prefer items. The confidence factor recognizes that user scoring information can be unreliable; it reduces the influence of noisy data and improves the reliability of the model output. The experimental results show that the method addresses the data sparsity problem of traditional collaborative filtering recommendation algorithms, greatly reduces the recommendation error, and improves recommendation accuracy.
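As a rough illustration of how these factors fit together, the sketch below multiplies the three correction factors with the base user similarity, following the product form given in claim 8, and computes the item attribute preference factor as a cosine similarity of two preference vectors, as claim 6 states. The formulas for the individual factors are not reproduced in this text, so their values appear here as plain hypothetical inputs. The function names and the choice to return 0 for an all-zero preference vector are assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length preference vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: no preference information means no similarity
    return dot / (norm_a * norm_b)


def final_similarity(s_bonus: float, s_pre: float,
                     s_conf: float, s_base: float) -> float:
    """Sim(u, v) = S_bonus(u, v) * S_pre(u, v) * S_conf(u) * S(u, v)."""
    return s_bonus * s_pre * s_conf * s_base


# Hypothetical weighted attribute preferences of users u and v (k = 3 attributes).
pref_u = [0.8, 0.1, 0.4]
pref_v = [0.6, 0.0, 0.5]
s_pre = cosine_similarity(pref_u, pref_v)

# Hypothetical values for the remaining factors and the base similarity.
sim_uv = final_similarity(s_bonus=1.2, s_pre=s_pre, s_conf=0.9, s_base=0.5)
```

Because the factors are multiplied, any one of them can pull a pair's similarity down sharply, and a zero factor removes the pair from consideration.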

Claims (10)

1. A hybrid collaborative filtering recommendation algorithm based on project attributes, comprising the steps of:
step 1: acquiring user behavior data and project information data, and generating a user-project scoring matrix and a project-attribute matrix according to user scoring information and project information;
step 2: respectively calculating the project similarity and the user rating similarity according to the project-attribute matrix and the user-project rating matrix;
step 3: correcting the user scoring similarity by using the project similarity to calculate the user similarity;
step 4: calculating a common scoring reward factor, a project attribute preference factor and a user confidence factor, and correcting the user similarity according to these factors to obtain the final similarity of the users;
step 5: for the target user, selecting the nearest neighbors according to the final similarity, and predicting the score of the target user on each project according to the score information of all users among the nearest neighbors;
step 6: recommending the N projects with the highest predicted scores to the target user.
2. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 1, wherein in step 2, the similarity $S_{pss}(r_{ui}, r_{vj})$ between the score $r_{ui}$ of user u for item i and the score $r_{vj}$ of user v for item j is calculated as:
$$S_{pss}(r_{ui}, r_{vj}) = \mathrm{Proximity}(r_{ui}, r_{vj}) \cdot \mathrm{Significance}(r_{ui}, r_{vj}) \cdot \mathrm{Singularity}(r_{ui}, r_{vj})$$
wherein $\mathrm{Proximity}(r_{ui}, r_{vj})$ represents the absolute difference of the score pair $(r_{ui}, r_{vj})$, $\mathrm{Significance}(r_{ui}, r_{vj})$ represents the difference of $r_{ui}$ and $r_{vj}$ from the medians of all scores given by users u and v, and $\mathrm{Singularity}(r_{ui}, r_{vj})$ represents the difference of $r_{ui}$ and $r_{vj}$ from the variances of all scores obtained for item i and item j; they are calculated as follows:
wherein $r_{u*med}$ and $r_{v*med}$ represent the medians of all scores given by user u and user v, respectively, and $\mu_i$ and $\mu_j$ represent the variances of all scores obtained for item i and item j, respectively.
3. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 2, wherein in step 2, the similarity $S_{item}(i, j)$ of items i and j is calculated as:
wherein $S_i$ and $S_j$ represent the attribute sets of item i and item j, respectively, and $|\cdot|$ represents the number of elements in a set.
4. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 3, wherein in step 3, the similarity $S(u, v)$ between user u and user v is calculated according to the following formula:
wherein $I_u$ and $I_v$ represent the sets of items evaluated by user u and user v, respectively.
5. The hybrid collaborative filtering recommendation algorithm according to claim 4, wherein in step 4, the common scoring reward factor $S_{bonus}(u, v)$ of user u and user v is calculated as:
wherein $E_u$ and $E_v$ represent the attribute sets of the items evaluated by user u and user v, respectively; it can be seen that the more common scoring items and common scoring attributes two users have, the greater the similarity between them.
6. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 5, wherein in step 4, the item attribute preference factor is calculated by:
first, for any user u and any attribute $A_x$, calculating the degree of importance of attribute $A_x$ to user u by the following formula:
wherein $\mathrm{Count}(u_A)$ represents the total number of attributes of the items evaluated by user u, $\mathrm{Count}(u)$ represents the total number of items evaluated by user u, and the remaining count term represents the number of evaluations by user u of items having attribute $A_x$;
then, calculating the degree of preference of user u for attribute $A_x$ by the following formula:
wherein n is the total number of items; the mean term represents the average of all item scores given by user u; the time decay function is introduced, in which t is the interval between the current time and the most recent time at which user u evaluated an item having attribute $A_x$, with $t \ge 0$, and $\lambda$ is a preset decay coefficient; $A_{ix}$ indicates whether item i has attribute $A_x$: if item i has attribute $A_x$, then $A_{ix} = 1$, otherwise $A_{ix} = 0$;
then, calculating the weighted degree of preference of user u for attribute $A_x$ by the following formula:
finally, calculating the item attribute preference factor $S_{pre}(u, v)$ of two users u and v using cosine similarity, by the following formula:
wherein k is the total number of item attributes.
7. The hybrid collaborative filtering recommendation algorithm according to claim 6, wherein in step 4, for any user u, the confidence factor $S_{conf}(u)$ is calculated as:
wherein the mean term represents the average of all scores obtained for item i, and $r_{*imed}$ represents the median of all scores obtained for item i.
8. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 7, wherein in step 4, the final similarity $\mathrm{Sim}(u, v)$ between user u and user v is calculated according to the following formula:
$$\mathrm{Sim}(u, v) = S_{bonus}(u, v) \cdot S_{pre}(u, v) \cdot S_{conf}(u) \cdot S(u, v).$$
9. The project attribute-based hybrid collaborative filtering recommendation algorithm according to any one of claims 1-8, characterized in that in step 5, the nearest neighbors of the target user are selected by either of the following methods:
1) setting a neighbor number K, and selecting the K users with the highest final similarity to the target user as the nearest neighbors of the target user;
2) setting a similarity threshold for the nearest neighbors, and taking the users whose final similarity to the target user is greater than the similarity threshold as the nearest neighbors of the target user.
10. The project attribute-based hybrid collaborative filtering recommendation algorithm according to any one of claims 1-8, characterized in that in step 6, the score of the target user for each project is predicted according to the score information of all the users among the nearest neighbors, by the following formula:
wherein $P_{ui}$ represents the predicted score of the target user u for item i; the mean term represents the average of all scores obtained for item i; $r_{vi}$ represents the score of user v for item i; $\mathrm{Sim}(u, v)$ represents the final similarity between user u and user v; and $N_u$ is the set of nearest neighbors of user u.
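The neighbor selection, score prediction and top-N recommendation described in the claims can be sketched as follows. The two selection functions follow the two methods of claim 9 directly. The prediction formula is not reproduced in this text, so the sketch assumes a common form: the item's mean score plus a similarity-weighted average of the neighbors' deviations from that mean. That form, the helper names, and the dictionary layout of the score data are all assumptions made for illustration, not the patent's own formula.

```python
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: score}


def top_k_neighbors(sims: Dict[str, float], k: int) -> List[str]:
    """Method 1 of claim 9: the K users with the highest final similarity."""
    return sorted(sims, key=lambda v: sims[v], reverse=True)[:k]


def threshold_neighbors(sims: Dict[str, float], threshold: float) -> List[str]:
    """Method 2 of claim 9: users whose final similarity exceeds the threshold."""
    return [v for v, s in sims.items() if s > threshold]


def item_mean(ratings: Ratings, item: str) -> float:
    """Average of all scores obtained for an item."""
    scores = [r[item] for r in ratings.values() if item in r]
    if not scores:
        raise ValueError(f"no scores recorded for item {item!r}")
    return sum(scores) / len(scores)


def predict(item: str, neighbors: List[str], sims: Dict[str, float],
            ratings: Ratings) -> float:
    """Assumed form: item mean plus similarity-weighted neighbor deviations."""
    mean_i = item_mean(ratings, item)
    num = den = 0.0
    for v in neighbors:
        if item in ratings[v]:
            num += sims[v] * (ratings[v][item] - mean_i)
            den += abs(sims[v])
    return mean_i if den == 0.0 else mean_i + num / den


def recommend_top_n(user: str, neighbors: List[str], sims: Dict[str, float],
                    ratings: Ratings, n: int) -> List[str]:
    """Rank items the user has not scored by predicted score; keep the top N."""
    seen = ratings[user]
    candidates = {i for v in neighbors for i in ratings[v] if i not in seen}
    scored = {i: predict(i, neighbors, sims, ratings) for i in candidates}
    return sorted(scored, key=lambda i: scored[i], reverse=True)[:n]


# Hypothetical data: final similarities of target user "u" to two other users.
ratings: Ratings = {
    "u": {"a": 4.0},
    "v": {"a": 5.0, "b": 4.0},
    "w": {"a": 2.0, "b": 2.0, "c": 5.0},
}
sims_to_u = {"v": 0.9, "w": 0.3}
neighbors = top_k_neighbors(sims_to_u, k=2)
recommended = recommend_top_n("u", neighbors, sims_to_u, ratings, n=2)
```

A fixed K always yields the same neighborhood size, while a threshold yields a variable one that may be empty for users with few similar peers; in that case this sketch simply produces no candidates.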
CN201910042488.7A 2019-01-17 2019-01-17 Mixed collaborative filtering recommendation algorithm based on project attributes Active CN109783734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910042488.7A CN109783734B (en) 2019-01-17 2019-01-17 Mixed collaborative filtering recommendation algorithm based on project attributes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910042488.7A CN109783734B (en) 2019-01-17 2019-01-17 Mixed collaborative filtering recommendation algorithm based on project attributes

Publications (2)

Publication Number Publication Date
CN109783734A true CN109783734A (en) 2019-05-21
CN109783734B CN109783734B (en) 2021-03-19

Family

ID=66500858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910042488.7A Active CN109783734B (en) 2019-01-17 2019-01-17 Mixed collaborative filtering recommendation algorithm based on project attributes

Country Status (1)

Country Link
CN (1) CN109783734B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309864A (en) * 2019-06-18 2019-10-08 北京化工大学 A method of the collaborative filtering recommending scheme of fusion local similarity and global similarity
CN110427567A (en) * 2019-07-24 2019-11-08 东北大学 A kind of collaborative filtering recommending method based on user preference Similarity-Weighted
CN111191178A (en) * 2019-12-30 2020-05-22 广州市百果园网络科技有限公司 Information pushing method and device, server and storage medium
CN111625707A (en) * 2020-05-29 2020-09-04 北京字节跳动网络技术有限公司 Recommendation response method, device, medium and equipment
CN111639268A (en) * 2020-06-01 2020-09-08 上海大学 User similarity calculation method
CN112364254A (en) * 2020-10-09 2021-02-12 天津大学 Collaborative filtering recommendation system and method for improving user similarity
CN112380451A (en) * 2020-12-04 2021-02-19 江苏科技大学 Favorite content recommendation method based on big data
CN112685651A (en) * 2021-01-29 2021-04-20 湖南安蓉科技有限公司 Service recommendation method for nearest neighbor search based on multi-target attributes
CN113111251A (en) * 2020-01-10 2021-07-13 阿里巴巴集团控股有限公司 Project recommendation method, device and system
CN113221003A (en) * 2021-05-20 2021-08-06 北京建筑大学 Mixed filtering recommendation method and system based on dual theory
CN115858947A (en) * 2023-02-28 2023-03-28 南京邮电大学 Structure body recommendation method based on user rating
CN116028727A (en) * 2023-03-30 2023-04-28 南京邮电大学 Video recommendation method based on image data processing
CN116452304A (en) * 2023-06-16 2023-07-18 深圳迅销科技股份有限公司 Cross-domain green consumption scene integration and preferential recommendation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077357A (en) * 2014-05-31 2014-10-01 浙江工商大学 User based collaborative filtering hybrid recommendation method
US20170206276A1 (en) * 2016-01-14 2017-07-20 Iddo Gill Large Scale Recommendation Engine Based on User Tastes
CN108876536A (en) * 2018-06-15 2018-11-23 天津大学 Collaborative filtering recommending method based on arest neighbors information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
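Zhang Jia et al.: "Collaborative filtering algorithm based on nearest-neighbor correction for the target user", Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》) *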

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309864A (en) * 2019-06-18 2019-10-08 北京化工大学 A method of the collaborative filtering recommending scheme of fusion local similarity and global similarity
CN110427567A (en) * 2019-07-24 2019-11-08 东北大学 A kind of collaborative filtering recommending method based on user preference Similarity-Weighted
CN111191178A (en) * 2019-12-30 2020-05-22 广州市百果园网络科技有限公司 Information pushing method and device, server and storage medium
CN113111251A (en) * 2020-01-10 2021-07-13 阿里巴巴集团控股有限公司 Project recommendation method, device and system
CN111625707A (en) * 2020-05-29 2020-09-04 北京字节跳动网络技术有限公司 Recommendation response method, device, medium and equipment
CN111625707B (en) * 2020-05-29 2023-04-14 北京字节跳动网络技术有限公司 Recommendation response method, device, medium and equipment
CN111639268A (en) * 2020-06-01 2020-09-08 上海大学 User similarity calculation method
CN111639268B (en) * 2020-06-01 2023-02-17 上海大学 User similarity calculation method
CN112364254A (en) * 2020-10-09 2021-02-12 天津大学 Collaborative filtering recommendation system and method for improving user similarity
CN112380451A (en) * 2020-12-04 2021-02-19 江苏科技大学 Favorite content recommendation method based on big data
CN112685651A (en) * 2021-01-29 2021-04-20 湖南安蓉科技有限公司 Service recommendation method for nearest neighbor search based on multi-target attributes
CN112685651B (en) * 2021-01-29 2021-10-19 湖南安蓉科技有限公司 Service recommendation method for nearest neighbor search based on multi-target attributes
CN113221003A (en) * 2021-05-20 2021-08-06 北京建筑大学 Mixed filtering recommendation method and system based on dual theory
CN113221003B (en) * 2021-05-20 2023-05-02 北京建筑大学 Mixed filtering recommendation method and system based on dual theory
CN115858947A (en) * 2023-02-28 2023-03-28 南京邮电大学 Structure body recommendation method based on user rating
CN116028727A (en) * 2023-03-30 2023-04-28 南京邮电大学 Video recommendation method based on image data processing
CN116028727B (en) * 2023-03-30 2023-08-18 南京邮电大学 Video recommendation method based on image data processing
CN116452304A (en) * 2023-06-16 2023-07-18 深圳迅销科技股份有限公司 Cross-domain green consumption scene integration and preferential recommendation method
CN116452304B (en) * 2023-06-16 2023-09-01 深圳迅销科技股份有限公司 Cross-domain green consumption scene integration and preferential recommendation method

Also Published As

Publication number Publication date
CN109783734B (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN109783734B (en) Mixed collaborative filtering recommendation algorithm based on project attributes
CN105787061B (en) Information-pushing method
Abdellaoui et al. The rich domain of uncertainty: Source functions and their experimental implementation
US20040172267A1 (en) Statistical personalized recommendation system
US20090210246A1 (en) Statistical personalized recommendation system
Zhao et al. How much novelty is relevant? it depends on your curiosity
CN106021298B (en) A kind of collaborative filtering recommending method and system based on asymmetric Weighted Similarity
CN109862431B (en) MCL-HCF algorithm-based television program mixed recommendation method
CN106682114A (en) Personalized recommending method fused with user trust relationships and comment information
CN105354260B (en) The Mobile solution of a kind of mosaic society's network and item characteristic recommends method
KR20110074167A (en) Collaborative filtering recommender system based on similarity measures using the origin moment of difference random variable and method constructing similarity table using rms
CN108446350A (en) A kind of recommendation method based on topic model analysis and user's length interest
CN106846029B (en) Collaborative filtering recommendation algorithm based on genetic algorithm and novel similarity calculation strategy
CN111324807A (en) Collaborative filtering recommendation method based on trust degree
KR101418307B1 (en) Method for obtaining solutions based on interval grey number and entropy for multiple-criteria group decision making problems
Zhu et al. A fuzzy clustering‐based denoising model for evaluating uncertainty in collaborative filtering recommender systems
CN110990713A (en) Collaborative filtering recommendation method based on optimal trust path
CN110287373A (en) Collaborative filtering film recommended method and system based on score in predicting and user characteristics
Sánchez et al. Attribute-based evaluation for recommender systems: incorporating user and item attributes in evaluation metrics
Kwon Improving top-n recommendation techniques using rating variance
CN110543601B (en) Method and system for recommending context-aware interest points based on intelligent set
CN117056761A (en) Customer subdivision method based on X-DBSCAN algorithm
CN108681947B (en) Collaborative filtering recommendation method based on time relevance and coverage of articles
CN110991517A (en) Classification method and system for unbalanced data set in stroke
CN113190763B (en) Information recommendation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant