CN109783734A

CN109783734A - A hybrid collaborative filtering recommendation algorithm based on item attributes

Info

Publication number: CN109783734A
Application number: CN201910042488.7A
Authority: CN
Inventors: 胡湘; 付彬
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2019-01-17
Filing date: 2019-01-17
Publication date: 2019-05-21
Anticipated expiration: 2039-01-17
Also published as: CN109783734B

Abstract

The invention discloses a hybrid collaborative filtering recommendation algorithm based on item attributes, comprising the following steps. Step 1: generate a user-item scoring matrix and an item-attribute matrix from user scoring information and item information. Step 2: calculate the item similarity from the item-attribute matrix and the user score similarity from the user-item scoring matrix. Step 3: correct the user score similarity with the item similarity to calculate the user similarity. Step 4: calculate a common scoring reward factor, an item attribute preference factor and a user confidence factor, and use them to correct the user similarity, obtaining the final user similarity. Step 5: select the nearest neighbors of the target user according to the final similarity, and predict the target user's score for each item from the scoring information of all users among the nearest neighbors. Step 6: recommend the N items with the highest predicted scores to the target user. The invention is applicable to sparse data and can improve recommendation accuracy.

Description

Hybrid collaborative filtering recommendation algorithm based on item attributes

Technical Field

The invention belongs to the field of recommendation systems, relates to information recommendation technology, and in particular relates to a hybrid collaborative filtering recommendation algorithm based on item attributes.

Background

With the development of internet information technology, people have gradually moved from an era of information scarcity into an era of information overload. In this era, a personalized recommendation system applies information filtering technology to recommend information that may interest users, helping them quickly find the information resources or commodities they need. In recent years, recommendation systems have been widely used in fields such as e-commerce, music, social networks, and medicine, and have become a research hotspot in both industry and academia.

Among personalized recommendation algorithms, collaborative filtering is one of the most widely used and most successful techniques. Neighborhood-based collaborative filtering is widely applied because it can help users discover new categories of items and has strong recommendation performance. It first finds a group of users similar to the target user, then combines those similar users' scores on items to predict the target user's scores, and finally recommends the items with the highest predicted scores. It follows that in neighborhood-based collaborative filtering the user similarity measure is crucial to the final recommendation quality.

Conventional user similarity measures have the following disadvantages: 1. User similarity is usually calculated using only the users' common scoring information (scores given by different users to the same item); when the user-item scoring matrix is sparse, common scoring information is rare, so conventional measures cannot be applied. 2. User similarity measures generally use only scoring information and do not make effective use of item attribute information. 3. User similarity measures ignore factors that influence user similarity: (1) they do not consider that users who have evaluated more of the same items and item attributes are more similar; (2) they do not consider that different users have different preferences for item attributes; (3) they do not consider that a user's scoring information is not necessarily true and credible.

Because of these defects, collaborative filtering algorithms built on conventional user similarity measures cannot be applied to sparse data and have low recommendation accuracy.

Disclosure of Invention

The technical problem to be solved by the invention is to address the defects of the prior art by providing a hybrid collaborative filtering recommendation algorithm based on item attributes that is applicable to sparse data and improves recommendation accuracy.

The technical scheme provided by the invention is as follows:

A hybrid collaborative filtering recommendation algorithm based on item attributes comprises the following steps:

Step 1: acquire user behavior data and item information data, and generate a user-item scoring matrix and an item-attribute matrix from the user scoring information and item information;

Step 2: calculate the item similarity from the item-attribute matrix and the user score similarity from the user-item scoring matrix;

Step 3: correct the user score similarity with the item similarity to calculate the user similarity;

Step 4: calculate a common scoring reward factor, an item attribute preference factor and a user confidence factor, and correct the user similarity with these factors to obtain the final user similarity;

Step 5: for the target user, select the nearest neighbors (the similar user group) according to the final user similarity, and predict the target user's score for each item from the scoring information of all users among the nearest neighbors;

Step 6: recommend the N items with the highest predicted scores to the target user.

Further, scoring data is usually very sparse, and the items that two users have both scored are sparser still. Conventional similarity measures are all restricted to these commonly scored items, yet a user has usually scored more than one item; if all of a user's scores could be used effectively to calculate user similarity, the applicability of the algorithm would be greatly expanded. In addition, if two items are completely dissimilar, the users should have low similarity even if their scores on those two items are similar; two users are highly similar only when their scores on different items are similar and the items themselves are similar. The invention therefore corrects the user score similarity with the item similarity to calculate the user similarity. This calculation does not depend on commonly scored items, adapts well to sparse data, and is highly interpretable.

The score similarity between the score r_ui of user u on item i and the score r_vj of user v on item j is defined as S_pss(r_ui, r_vj), calculated as follows:

S_pss(r_ui, r_vj) = Proximity(r_ui, r_vj) · Significance(r_ui, r_vj) · Singularity(r_ui, r_vj)

wherein Proximity(r_ui, r_vj) represents the absolute difference of the score pair (r_ui, r_vj), Significance(r_ui, r_vj) represents how far r_ui and r_vj differ from the medians of all scores given by users u and v, and Singularity(r_ui, r_vj) represents how far r_ui and r_vj differ from the variances of all scores obtained by item i and item j. They are calculated as follows:

wherein r_u*med and r_v*med represent the medians of all scores given by user u and user v respectively, and μ_i and μ_j represent the variances of all scores obtained by item i and item j respectively.

Further, the invention introduces an item-attribute matrix to calculate the similarity between two items. Let the set A = (A_1, A_2, …, A_x, …, A_k) represent the attributes of all items; all item attribute features in the recommendation system can then be represented by a matrix A_n×k, where n is the total number of items and k is the total number of item attributes. If item i has attribute A_x, then A_ix = 1; otherwise A_ix = 0. Since item attributes are identified by Boolean values, and the Jaccard coefficient is a measure of similarity for Boolean data, the invention uses the Jaccard formula to calculate item similarity. The similarity of items i and j is defined as S_item(i, j), calculated as:

S_item(i, j) = |S_i ∩ S_j| / |S_i ∪ S_j|

wherein S_i and S_j represent the attribute sets of item i and item j respectively, and |·| represents the number of elements in a set.
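As a minimal sketch of the Jaccard item similarity described above, the following computes all pairwise similarities from a 0/1 item-attribute matrix with NumPy. The function name `jaccard_item_similarity`, the toy matrix `A_demo`, and the convention that two items with no attributes at all have similarity 0 are assumptions made for illustration; the source does not specify them.

```python
import numpy as np


def jaccard_item_similarity(A: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity between rows of a 0/1 item-attribute matrix."""
    A = A.astype(bool).astype(int)
    intersection = A @ A.T                                   # |S_i ∩ S_j|
    sizes = A.sum(axis=1)                                    # |S_i|
    union = sizes[:, None] + sizes[None, :] - intersection   # |S_i ∪ S_j|
    # Assumption: when both items have no attributes (0/0), the similarity is 0.
    result = np.zeros(union.shape, dtype=float)
    np.divide(intersection, union, out=result, where=union > 0)
    return result


# Hypothetical toy data: 3 items described by 4 Boolean attributes.
A_demo = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
])
S_item_demo = jaccard_item_similarity(A_demo)
```

Here items 0 and 1 share one attribute out of three distinct ones, so their similarity is 1/3, while item 2 shares nothing with either.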

Further, the similarity between the user u and the user v is defined as S (u, v), and the calculation formula is as follows:

wherein I_u and I_v represent the sets of items evaluated by user u and user v respectively.

Further, to improve the accuracy of the algorithm, the invention considers the following aspects. Users who have evaluated the same items and the same attributes are more similar, so a common scoring reward factor is introduced to increase the weight of commonly scored items. Users have different preferences for item attributes, and users with the same attribute preferences have more similar interests, so a user item attribute preference factor is introduced. User information may be unreliable, so a user confidence factor is introduced.

Conventional similarity measures ignore the fact that the more items or attributes two users have evaluated in common, the greater their similarity. To overcome this problem, the invention uses the commonly scored items and commonly scored attributes to calculate a common scoring reward factor. The common scoring reward factor of user u and user v is defined as S_bonus(u, v), calculated as:

wherein E_u and E_v represent the attribute sets of the items evaluated by user u and user v respectively. It can be seen that the more commonly scored items and commonly scored attributes two users have, the greater their similarity.

In fact, users have different preferences for item attributes, and users who share the same attribute preferences are more similar; for example, two users who both like the "art" attribute are more similar. Conventional user preference similarity measures use the user scoring matrix to calculate the similarity of users' scoring tendencies (i.e., a tendency toward high or low scores) and ignore users' preferences for item attributes. Scores on individual items alone do not reflect a user's true preferences, so the invention introduces item attributes to calculate user preference similarity. To calculate a user's preference for an item attribute, the importance of the attribute to the user is first calculated from the user's historical behavior, and the preference is then calculated by combining it with the user's scores. In addition, the invention considers that a user's interests can change. Many factors influence a user's interests, such as time and environment, and time is one of the most important. The longer ago a user scored an item, the more the user's interest in it has decayed. Therefore, the invention introduces a time decay factor. For any user u and any attribute A_x, the weighted preference degree of user u for attribute A_x is defined by the following formula:

wherein one quantity represents the importance of attribute A_x to user u, and the other represents the degree of preference of user u for attribute A_x;

The more times user u has evaluated attribute A_x, the higher the importance of A_x to user u; the invention therefore defines the importance by the following formula:

wherein the formula uses the number of evaluations by user u of items having attribute A_x; Count(u_A) represents the total number of attributes of the items evaluated by user u, and Count(u) represents the total number of items evaluated by user u;

If the scores user u gives to items having attribute A_x are much higher than the average of all item scores given by user u, and user u's most recent evaluation of an item having attribute A_x is close to the current time (i.e., the user's interest in A_x has decayed little; conversely, the longer ago the evaluation, the more the interest has decayed), then the user's degree of preference for attribute A_x is larger. The invention therefore defines the degree of preference by the following formula:

wherein the formula uses the average of all item scores given by user u together with an introduced time decay function, in which t (t ≥ 0) is the interval between the current time and the time of user u's most recent evaluation of an item having attribute A_x, and λ is the decay coefficient.

And finally, calculating item attribute preference factors of the two users u and v by using cosine similarity, wherein the formula is as follows:

The item attribute preference factor therefore takes into account both the scoring preference and the item attribute preference of the user, overcoming the defect that conventional similarity calculation methods consider only the user's scoring preference.
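The final step of the preference factor, the cosine similarity between two users' preference vectors, can be sketched as follows. This is only an illustration: it assumes each user is already represented by a length-k vector of weighted preference degrees (one entry per attribute), whose computation is defined by the formulas of the invention and is not reproduced here. The function name `preference_factor` and the choice to return 0 when either vector is all zeros are assumptions.

```python
import numpy as np


def preference_factor(pref_u, pref_v) -> float:
    """Cosine similarity between two users' attribute-preference vectors."""
    pref_u = np.asarray(pref_u, dtype=float)
    pref_v = np.asarray(pref_v, dtype=float)
    denom = np.linalg.norm(pref_u) * np.linalg.norm(pref_v)
    if denom == 0.0:
        # Assumption: a user with no recorded preferences matches nobody.
        return 0.0
    return float(pref_u @ pref_v / denom)
```

Because cosine similarity depends only on the direction of the vectors, two users whose preferences are proportional (for example `[1, 2, 0]` and `[2, 4, 0]`) obtain a factor of 1 even if one of them has far more ratings.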

Conventional recommendation systems assume that user data is authentic. However, user feedback collected in practice contains a large amount of unreliable data, which can interfere with the final prediction. The reasons for this unreliability include: (1) a user may submit a random value as feedback; (2) a user may, for some benefit, intentionally score certain services too high or too low and score other services too low or too high; (3) a user submits scores faithfully, but interference from the network environment, equipment and the like pushes the data into an abnormal range. If unreliable data is not handled, it seriously interferes with the final prediction and harms the recommendation algorithm. To reduce the effect of noisy data, the invention introduces a confidence factor. For any user u, the confidence factor is defined as S_conf(u), calculated as follows:

wherein the formula uses the average of all scores obtained by item i, and r_*imed represents the median of all scores obtained by item i. As can be seen from the formula, if the scores user u gives to items always deviate from the average or median score of those items, the confidence of user u is low. Adding the confidence factor effectively reduces the influence of untrusted users, making the prediction more accurate.

Further, the user similarity is corrected based on the common scoring reward factor, the item attribute preference factor and the user confidence factor. The final corrected similarity between user u and user v is defined as Sim(u, v), calculated as follows:

Sim(u, v) = S_bonus(u, v) · S_pre(u, v) · S_conf(u) · S(u, v)

Further, there are two methods for selecting the nearest neighbors:

1) the first method is to select K users with the highest similarity to the target user, i.e., a K-nearest neighbor (KNN) method. The method needs to preset neighbor number K, sorts the similarity according to descending order, and then K users at the top of the sorting are the nearest neighbors of the target user;

2) the second method is to set a nearest neighbor similarity threshold. The method is mainly characterized in that a similarity threshold value is preset, and a user with the similarity value larger than the threshold value is found out to be used as the nearest neighbor of a target user.

In an actual recommendation system, because the similarity threshold is difficult to set, the neighbor selection is usually performed by using KNN.
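The two neighbor selection strategies described above can be sketched as follows, assuming the final similarities Sim(u, v) between the target user u and every candidate user v have already been computed and stored in a dictionary. The function names and the example user identifiers are hypothetical.

```python
def top_k_neighbors(sim_to_target: dict, k: int) -> list:
    """KNN selection: the K users with the highest similarity to the target user."""
    ranked = sorted(sim_to_target.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:k]]


def threshold_neighbors(sim_to_target: dict, threshold: float) -> list:
    """Threshold selection: every user whose similarity exceeds the threshold."""
    return [user for user, sim in sim_to_target.items() if sim > threshold]


# Hypothetical similarities between a target user and four other users.
sims_demo = {"v1": 0.82, "v2": 0.15, "v3": 0.64, "v4": 0.40}
knn_demo = top_k_neighbors(sims_demo, k=2)            # ['v1', 'v3']
thr_demo = threshold_neighbors(sims_demo, 0.5)        # ['v1', 'v3']
```

The KNN variant always returns K users when at least K candidates exist, whereas the threshold variant can return an empty list if the threshold is set too high, which illustrates why the threshold is harder to choose in practice.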

After the above steps, the scoring of the target user for each item is predicted according to the scoring information of all the users in the nearest neighbor, and the N items with the highest scoring are recommended to the user. The formula for defining the score prediction of the invention is as follows:

wherein P_ui denotes the predicted score of the target user u for item i, and N_u is the set of nearest neighbors of user u.

Advantageous effects:

The user similarity measure designed by the invention is not limited to the users' common scoring information; it effectively takes into account the influence of non-common scoring information (scores on different items) on user similarity. If two different items are completely dissimilar, the users' similarity is very low even if their scores are similar; users are similar only if both their scores and the items are similar, so the user similarity is corrected by the item similarity. Meanwhile, to improve recommendation accuracy, the method is no longer limited to scoring data: item attributes are integrated into the algorithm, the factors that influence user similarity are considered from several aspects, and three influence factors are defined. The first is the common scoring reward factor, which reflects that the more items or attributes two users have scored in common, the higher their similarity, and which increases the weight of commonly scored items. The second is the item attribute preference factor, which distinguishes the preferences of different users for item attributes. The third is the user confidence factor, which reflects that a user's scoring data may not be credible, thereby reducing the influence of noisy data and improving the reliability of the algorithm's output. The invention has the following advantages:

1. In the invention, commonly scored items are not a necessary condition for calculating user similarity. The proposed similarity measure does not depend on items scored by both users, makes effective use of all of a user's scoring information, and handles data sparsity well.

2. The algorithm overcomes the defect that conventional algorithms use only scoring information to calculate similarity. It makes effective use of item attribute information and introduces three influence factors (a common scoring reward factor, an item attribute preference factor and a user confidence factor) to distinguish differences between users, thereby improving the reliability and accuracy of the algorithm.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 shows how the NMAE and MAE indexes of each recommendation algorithm change with the number of nearest neighbors on the ML-Latest-Small data set, wherein FIG. 2(a) shows the NMAE index and FIG. 2(b) shows the MAE index;

FIG. 3 shows how the F1 index of each recommendation algorithm changes with the number of nearest neighbors on the ML-Latest-Small data set;

FIG. 4 shows how the NSP and NPP indexes of each recommendation algorithm change with the number of nearest neighbors on the ML-Latest-Small data set, wherein FIG. 4(a) shows the NSP index and FIG. 4(b) shows the NPP index;

FIG. 5 shows how the NMAE and MAE indexes of each recommendation algorithm change with the number of nearest neighbors on the ML-1M data set, wherein FIG. 5(a) shows the NMAE index and FIG. 5(b) shows the MAE index;

FIG. 6 shows how the F1 index of each recommendation algorithm changes with the number of nearest neighbors on the ML-1M data set;

FIG. 7 shows how the NSP and NPP indexes of each recommendation algorithm change with the number of nearest neighbors on the ML-1M data set, wherein FIG. 7(a) shows the NSP index and FIG. 7(b) shows the NPP index.

Detailed Description

To describe the invention more specifically, the implementation steps are described below with reference to FIG. 1:

Step 1: obtain user behavior data and first establish the user-item scoring matrix R_m×n, where m is the total number of users, n is the total number of items, and r_ui represents the score given by user u to item i.

User-item scoring matrix R_m×n:

Then establish the item-attribute matrix A_n×k from the item information data, where k is the total number of item attributes and A_ix indicates whether item i has attribute x: if item i has attribute x, then A_ix = 1; otherwise A_ix = 0.
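As an illustration of Step 1, the two matrices could be built from raw records as sketched below. The input formats (a list of `(user, item, score)` triples and a dictionary mapping each item to its attribute indices), the zero-based indices, and the use of 0 to mark an unscored user-item pair are all assumptions made for this sketch.

```python
import numpy as np


def build_scoring_matrix(ratings, m: int, n: int) -> np.ndarray:
    """User-item scoring matrix R (m x n); 0 marks 'not scored' (assumption)."""
    R = np.zeros((m, n), dtype=float)
    for u, i, score in ratings:
        R[u, i] = score
    return R


def build_attribute_matrix(item_attrs: dict, n: int, k: int) -> np.ndarray:
    """Item-attribute matrix A (n x k) with A[i, x] = 1 if item i has attribute x."""
    A = np.zeros((n, k), dtype=int)
    for i, attrs in item_attrs.items():
        for x in attrs:
            A[i, x] = 1
    return A


# Hypothetical toy input: 2 users, 3 items, 4 attributes.
ratings_demo = [(0, 0, 4.0), (0, 2, 2.5), (1, 1, 5.0)]
item_attrs_demo = {0: [0, 1], 1: [1], 2: [3]}
R_demo = build_scoring_matrix(ratings_demo, m=2, n=3)
A_item_demo = build_attribute_matrix(item_attrs_demo, n=3, k=4)
```

For data sets of the size used in the experiments a dense array is workable, but a sparse representation would be a natural alternative since most entries of R are empty.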

Step 2: calculate the similarity between items from the item-attribute matrix and the user score similarity from the user-item scoring matrix. The user score similarity is defined as:

S_pss(r_ui, r_vj) = Proximity(r_ui, r_vj) · Significance(r_ui, r_vj) · Singularity(r_ui, r_vj)

wherein the expressions of the three factors are as follows:

wherein Proximity(r_ui, r_vj) represents the absolute difference of the score pair (r_ui, r_vj), Significance(r_ui, r_vj) represents how far r_ui and r_vj differ from the medians of all scores given by users u and v, and Singularity(r_ui, r_vj) represents how far r_ui and r_vj differ from the variances of all scores obtained by item i and item j; r_u*med and r_v*med represent the medians of all scores given by user u and user v respectively, and μ_i and μ_j represent the variances of all scores obtained by item i and item j respectively.

Users are similar only if the items are similar and the scores are similar, so the user similarity is corrected by the item similarity. The similarity of items i and j is defined as S_item(i, j), calculated as:

S_item(i, j) = |S_i ∩ S_j| / |S_i ∪ S_j|

wherein S_i and S_j represent the attribute sets of item i and item j respectively, |·| represents the number of elements in a set, |S_i ∩ S_j| represents the number of attributes that item i and item j both have, and |S_i ∪ S_j| represents the number of attributes that item i or item j has. It can be seen that the larger the share of attributes the items have in common, the greater the item similarity.

Step 3: calculate the common scoring reward factor, the item attribute preference factor and the user confidence factor, and use these three influence factors to correct the user similarity.

The common scoring reward factor S_bonus is calculated as follows:

wherein E_u and E_v represent the attribute sets of the items evaluated by user u and user v respectively. It can be seen that the more commonly scored items and commonly scored attributes two users have, the greater their similarity.

The item attribute preference factor S_pre is calculated as follows:

First, for any user u and any attribute A_x, the importance of attribute A_x to user u is calculated by the following formula:

The degree of preference of user u for attribute A_x is calculated by the following formula:

wherein n is the total number of items; the formula uses the average of all item scores given by user u together with an introduced time decay function, in which t (t ≥ 0) is the interval between the current time and the time of user u's most recent evaluation of an item having attribute A_x, and λ is the decay coefficient; A_ix indicates whether item i has attribute A_x: if item i has attribute A_x, then A_ix = 1, otherwise A_ix = 0;

Then, the weighted preference degree of user u for attribute A_x is calculated by the following formula:

Finally, cosine similarity is used to calculate the item attribute preference factor S_pre(u, v) of two users u and v, as follows:

wherein k is the total number of item attributes.

For any user u, the confidence factor is defined as S_conf(u), calculated as follows:

wherein the formula uses the average of all scores obtained by item i, and r_*imed represents the median of all scores obtained by item i. As can be seen from the formula, if a user's scores for items always deviate from the average or median score of those items, that user's confidence is low.

The final user similarity is calculated as follows:

Sim(u, v) = S_bonus(u, v) · S_pre(u, v) · S_conf(u) · S(u, v).

Step 4: sort the users by similarity in descending order, select the K users with the highest similarity as the similar user group of the target user, and predict item scores from the similar users' scoring information on the items.

The score prediction formula of the invention is defined as follows:

Step 5: select the N items with the highest predicted scores and recommend them to the target user.
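The final top-N selection can be sketched as below, assuming the predicted scores P_ui for one target user are already available as a dictionary. Excluding items the user has already scored is an assumption of this sketch (the source does not state it), and the function name `recommend_top_n` is hypothetical.

```python
import heapq


def recommend_top_n(predicted_scores: dict, already_scored: set, n: int) -> list:
    """Return the n items with the highest predicted score for one user."""
    # Assumption: items the user has already scored are not recommended again.
    candidates = {item: score for item, score in predicted_scores.items()
                  if item not in already_scored}
    return heapq.nlargest(n, candidates, key=candidates.get)


# Hypothetical predicted scores for one target user.
predicted_demo = {"a": 4.5, "b": 3.0, "c": 4.9, "d": 2.0}
top_demo = recommend_top_n(predicted_demo, already_scored={"c"}, n=2)
```

Using `heapq.nlargest` avoids sorting the full candidate list when N is small compared with the number of items.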

Experiment: the effect of the invention is further illustrated by the following simulation results:

Simulation data and evaluation indexes:

The invention uses the public data sets ML-Latest-Small and ML-1M for testing and verifying the algorithm. The description of each data set is shown in Table 2:

Data set	Number of users M	Number of items N	Number of scores R	Sparsity (%)
ML-Latest-Small	706	8570	100023	1.7
ML-1M	6040	4000	1000209	4.1
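The last column of the table appears to be the share of user-item pairs that actually carry a score, i.e. R / (M · N) expressed as a percentage. This reading is an inference from the numbers rather than something the source states; the short check below, with a hypothetical helper name, reproduces both table values under that assumption.

```python
def filled_share_percent(num_users: int, num_items: int, num_scores: int) -> float:
    """Percentage of user-item pairs that have a score: 100 * R / (M * N)."""
    return 100.0 * num_scores / (num_users * num_items)


ml_small_share = filled_share_percent(706, 8570, 100023)    # about 1.65
ml_1m_share = filled_share_percent(6040, 4000, 1000209)     # about 4.14
```

Rounded to one decimal place these give 1.7 and 4.1, so in both data sets well over 95% of the scoring matrix is empty.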

This example randomly selected eighty percent of the data as the training set, and the remaining twenty percent was used for the experimental test set.
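A random 80/20 split of the score records can be sketched as follows. The function name, the fixed seed used for reproducibility, and the record format are assumptions; the source only states that eighty percent of the data was selected at random for training.

```python
import random


def split_scores(records, test_fraction: float = 0.2, seed: int = 0):
    """Randomly split score records into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (user, item, score) triples.
records_demo = [(u, i, 3.0) for u in range(2) for i in range(5)]
train_demo, test_demo = split_scores(records_demo)
```

With ten records this yields eight training records and two test records, and every record lands in exactly one of the two sets.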

The present embodiment uses three categories of evaluation index to evaluate the validity of the recommendation results.

The first category measures the prediction error, i.e. the difference between the actual score and the predicted score. MAE (mean absolute error) is the most common index for measuring the accuracy of recommendation algorithms and is calculated by comparing predicted values with actual values. NMAE is the normalized version of MAE, obtained by dividing MAE by the span of the scores. MAE and NMAE are obtained from the following formulae:

MAE = (1/|T|) · Σ_{(u,i)∈T} |r_ui − P_ui|

NMAE = MAE / (r_max − r_min)

wherein T is the set of user-item pairs in the test set, r_ui is the actual score and P_ui the predicted score of user u for item i, and r_max and r_min represent the maximum and minimum scores in the user scoring table.
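A minimal sketch of the two error indexes, assuming MAE is the mean of the absolute differences between actual and predicted scores over the test pairs and NMAE divides MAE by the score span r_max − r_min. The function names and the example values are illustrative only.

```python
def mae(actual, predicted) -> float:
    """Mean absolute error between actual and predicted scores."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(actual, predicted, r_min: float, r_max: float) -> float:
    """MAE normalized by the span of the scoring scale."""
    return mae(actual, predicted) / (r_max - r_min)


# Hypothetical test pairs on a 1-to-5 scoring scale.
actual_demo = [4.0, 3.0, 5.0]
predicted_demo = [3.5, 3.0, 4.0]
mae_demo = mae(actual_demo, predicted_demo)              # (0.5 + 0 + 1) / 3
nmae_demo = nmae(actual_demo, predicted_demo, 1.0, 5.0)  # mae_demo / 4
```

Normalizing by the span makes results comparable across data sets whose scoring scales differ.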

The second category measures the recommendation precision, typically using the precision P, the recall C and F1. Precision is the proportion of items in the recommendation list that the user really likes to all items in the recommendation list. Recall is the proportion of items in the recommendation list that the user really likes to all items the user likes. F1 is a combined index of precision and recall. The corresponding formulas are as follows:

P = |R_u,p ∩ R_u,a| / |R_u,p|

C = |R_u,p ∩ R_u,a| / |R_u,a|

F1 = 2 · P · C / (P + C)

wherein R_u,p and R_u,a are, respectively, the recommended item set and the set of items the user really likes, and |R_u,p ∩ R_u,a| represents the number of items in the recommendation list that the user really likes. F1 is a combined evaluation index of P and C; the larger the F1 value, the better the recommendation effect.
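The precision, recall and F1 indexes for a single user can be sketched as below, treating the recommendation list and the liked items as sets and taking F1 as the harmonic mean of P and C. Returning 0 when a denominator is empty, and the function name itself, are assumptions of this sketch; how the per-user values are aggregated over all users is not specified in the source and is not shown.

```python
def precision_recall_f1(recommended, liked):
    """Return (P, C, F1) for one user's recommendation list."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)                   # |R_u,p ∩ R_u,a|
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 recommended items, 3 liked items, 2 in common.
p_demo, c_demo, f1_demo = precision_recall_f1({1, 2, 3, 4}, {2, 4, 5})
```

In this example P = 2/4, C = 2/3 and F1 = 4/7, showing how F1 sits between the two values and is pulled toward the smaller one.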

The last category of evaluation indicators are the Number of Successful Predictions (NSP) and the number of correct predictions (NPP). Since the conventional recommendation algorithm is based on common scoring terms, under the condition of sparse data, similar neighborhoods may not exist, so that prediction cannot be performed, and therefore NSP and NPP are also important indexes. NSP is the number of times the algorithm effectively predicts the score, regardless of whether the prediction is accurate. NPP is the number of items that the algorithm correctly predicts the actual score in the test set, which emphasizes the prediction accuracy of the algorithm.

The smaller the values of MAE and NMAE, the smaller the score prediction error. F1 represents the precision of the algorithm; the larger F1 is, the better the recommendation effect. NSP and NPP indicate success rate and accuracy, and larger values are better.

To verify the effectiveness of the invention, it is compared experimentally with the following algorithms proposed in recent years: PCC, PIP, NHSM, HUS, CFBJ.

In the collaborative filtering algorithm, the nearest neighbor number plays an important role in the recommendation performance, so the embodiment will compare the performance of the algorithm under different nearest neighbor numbers. The results of the comparative experiments are shown in FIGS. 2 to 7.

1) ML-Latest-Small data set

FIG. 2 shows the recommendation error of each recommendation algorithm. As can be seen from the figure, conventional collaborative filtering algorithms such as PCC and PIP have a higher mean absolute error, whereas the collaborative filtering algorithm provided by the invention makes effective use of the information in non-commonly scored items and thereby greatly reduces the error. Moreover, because it makes effective use of item attribute information and considers several influence factors, the algorithm achieves better recommendation precision than the other algorithms, with a mean absolute error below 0.7.

FIG. 3 shows the precision of each recommendation algorithm. The recommendation precision of every algorithm improves as the number of nearest neighbors increases, because a larger number of nearest neighbors reduces the influence of neighbor selection errors. It is also clear that the algorithm of the invention is superior to the other algorithms. The precision of the conventional collaborative filtering algorithms is below 0.6. In contrast, the recent algorithm HUS achieves relatively high recommendation precision because it not only uses common item information but also takes the asymmetry of similarity into account. However, HUS considers the similarity of users and items only from the point of view of scoring probability and does not effectively consider the factors that influence similarity, so its precision is still lower than that of the algorithm of the invention.

FIG. 4 shows NSP and NPP as a function of the number of nearest neighbors on ML-Latest-Small. As is apparent from FIG. 4, both NSP and NPP increase as the number of nearest neighbors increases. The recently proposed algorithms are far superior to the conventional CF algorithms: the highest NSP and NPP of the conventional algorithms are below 13500 and 5000, respectively, while those of the recently proposed algorithms are above 15000 and 6000, respectively. The algorithm provided by the invention has the highest NSP and NPP values, close to 20000 and 8000, respectively.

2) ML-1M data set

To verify the applicability and versatility of the algorithm, the present invention also performed relevant experiments on the ML-1M dataset. As can be seen from fig. 5, in the optimal case, the MAE of the algorithm is about 0.73, the MAE of the conventionally recommended algorithm is about 0.85, and the MAE of the newly proposed algorithm is greater than 0.77. The conventional recommended algorithm has an optimum NMAE of about 0.22, the other algorithms have an optimum NMAE of about 0.19 or more, and the present invention has an optimum NMAE of about 0.16. As shown in fig. 6, the accuracy of the present invention increases as the number of nearest neighbors increases. Meanwhile, the precision of the algorithm is obviously higher than that of other algorithms, and the optimal solution is 0.68. As can be seen from fig. 7, both NSP and NPP improve as the number of nearest neighbors increases. Also, the latest algorithms are much better than the traditional collaborative filtering algorithms. The maximum NSP and the maximum NPP recommended by the algorithm are 19600 and 8200 respectively, and other algorithms are lower than the algorithm of the invention.

The method takes into account the influence of users' non-common scoring information on user similarity, and thereby effectively addresses the problem of data sparsity. The similarity influence factors provided by the invention greatly improve the recommendation accuracy: the common scoring reward factor treats users who have rated the same item or attribute as more similar, and effectively increases the weight of the commonly scored items; the item attribute preference factor distinguishes the attribute preferences of different users by considering their different preferences for the items; the confidence factor recognizes that user scoring information can be unreliable, reduces the influence of noisy data, and improves the reliability of the model output. Experimental results show that the method solves the data sparsity problem of the traditional collaborative filtering recommendation algorithm, greatly reduces the recommendation error, and improves the recommendation accuracy.

Claims

1. A hybrid collaborative filtering recommendation algorithm based on project attributes, comprising the steps of:

step 4: calculating a common score reward factor, a project attribute preference factor and a user confidence factor, and correcting the user similarity according to these factors to obtain the final similarity of the user;

step 5: for the target user, selecting the nearest neighbors according to the final similarity, and predicting the score of the target user on each project according to the score information of all users in the nearest neighbors;

2. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 1, wherein in step 2, the similarity S_pss(r_ui, r_vj) between the rating r_ui of user u for item i and the rating r_vj of user v for item j is calculated as follows:

3. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 2, wherein in step 2, the similarity S_item(i, j) of items i and j is calculated as follows:

4. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 3, wherein in step 3, the similarity S (u, v) between the user u and the user v is calculated according to the following formula:

5. The hybrid collaborative filtering recommendation algorithm according to claim 4, wherein in step 4, the common score reward factor S_bonus(u, v) of user u and user v is calculated as follows:

6. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 5, wherein in step 4, the item attribute preference factor is calculated by:

wherein n is the total number of items; the formula uses the average of all item scores given by user u and an introduced time decay function, in which t (t ≥ 0) is the interval between the current time and the most recent time at which user u rated an item having attribute A_x, and λ is a set attenuation coefficient; A_ix indicates whether item i has attribute A_x: if item i has attribute A_x, then A_ix = 1, otherwise A_ix = 0;

finally, the item attribute preference factor S_pre(u, v) of two users u and v is calculated using cosine similarity as follows:

where k is the total number of item attributes.
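The cosine-similarity step of claim 6 can be illustrated with a minimal Python sketch. It assumes each user's attribute preferences have already been collected into a vector of length k; how those preference values are computed is not shown here. The function name `preference_similarity` and the choice to return 0.0 for an all-zero vector are illustrative assumptions, not part of the claim.

```python
import math

def preference_similarity(pref_u, pref_v):
    """Cosine similarity between two attribute-preference vectors of length k."""
    if len(pref_u) != len(pref_v):
        raise ValueError("preference vectors must have the same length k")
    dot = sum(a * b for a, b in zip(pref_u, pref_v))
    norm_u = math.sqrt(sum(a * a for a in pref_u))
    norm_v = math.sqrt(sum(b * b for b in pref_v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a user with no preference information gets similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)
```

Two users whose preference vectors point in the same direction obtain a value of 1, and users with no attribute preference in common obtain 0.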

7. The hybrid collaborative filtering recommendation algorithm according to claim 6, wherein in step 4, for any user u, the confidence factor S_conf(u) is calculated as follows:

wherein the formula uses the average of all scores obtained for item i, and r_*imed represents the median of all scores obtained for item i.

8. The hybrid collaborative filtering recommendation algorithm based on item attributes according to claim 7, wherein in step 4, the final similarity Sim (u, v) between the user u and the user v is calculated according to the following formula:

Sim(u, v) = S_bonus(u, v) · S_pre(u, v) · S_conf(u) · S(u, v).
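The product in claim 8 can be illustrated with a short Python sketch. The four factors are assumed to be computed elsewhere and passed in as plain numbers; the function and parameter names are illustrative only.

```python
def final_similarity(s_bonus_uv, s_pre_uv, s_conf_u, s_uv):
    """Final similarity Sim(u, v) as the product of the four factors of claim 8.

    s_bonus_uv: common score reward factor S_bonus(u, v)
    s_pre_uv:   item attribute preference factor S_pre(u, v)
    s_conf_u:   confidence factor S_conf(u), which depends on user u only
    s_uv:       user similarity S(u, v) from step 3
    """
    return s_bonus_uv * s_pre_uv * s_conf_u * s_uv
```

Because S_conf(u) depends on user u only, Sim(u, v) and Sim(v, u) need not be equal when the two users have different confidence factors.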

9. The project attribute-based hybrid collaborative filtering recommendation algorithm according to any one of claims 1-8, characterized in that in step 5, the nearest neighbors of the target user are selected by any one of the following methods:

1) setting neighbor number K, and selecting K users with highest final similarity with the target user as the nearest neighbors of the target user;

2) setting a similarity threshold for the nearest neighbors, and selecting the users whose final similarity with the target user is greater than the similarity threshold as the nearest neighbors of the target user.
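The two selection methods of claim 9 can be sketched in Python as follows. The sketch assumes the final similarities between the target user and every candidate user are already available as a dictionary; the function names and this data layout are illustrative assumptions.

```python
import heapq

def top_k_neighbors(sim_to_target, k):
    """Method 1: the K users with the highest final similarity to the target user.

    sim_to_target maps a candidate user id to its final similarity with the target.
    The result is ordered from most to least similar.
    """
    return heapq.nlargest(k, sim_to_target, key=sim_to_target.get)

def threshold_neighbors(sim_to_target, threshold):
    """Method 2: every user whose final similarity is strictly greater than the threshold."""
    return [user for user, sim in sim_to_target.items() if sim > threshold]
```

Method 1 fixes the neighborhood size, while method 2 lets the size vary with how many users are sufficiently similar to the target user.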

10. The project attribute-based hybrid collaborative filtering recommendation algorithm according to any one of claims 1-8, characterized in that in step 6, the score of the target user for each project is predicted according to the score information of all the users in the nearest neighbor, and the formula is as follows:

wherein P_ui represents the predicted rating of the target user u for item i; the formula uses the average of all scores obtained for item i; r_vi represents the score of user v on item i; Sim(u, v) represents the final similarity of user u and user v; and N_u is the set of nearest neighbors of user u.
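The prediction formula of claim 10 does not survive in this text. The Python sketch below is therefore an assumption: it uses a similarity-weighted, mean-centered form that is common in neighborhood-based collaborative filtering and is built from the quantities the claim names (the item average, r_vi, Sim(u, v) and N_u). It is not the claimed formula, and the function name and fallback behavior are illustrative.

```python
def predict_score(item_mean, neighbor_ratings, sims):
    """Predicted rating P_ui under an ASSUMED weighted, mean-centered form.

    item_mean:        average of all scores obtained for item i
    neighbor_ratings: dict mapping each neighbor v in N_u that rated item i to r_vi
    sims:             dict mapping neighbor v to the final similarity Sim(u, v)
    """
    weighted_dev = 0.0
    weight_sum = 0.0
    for v, r_vi in neighbor_ratings.items():
        sim = sims.get(v, 0.0)
        weighted_dev += sim * (r_vi - item_mean)
        weight_sum += abs(sim)
    if weight_sum == 0.0:
        # Assumption: with no usable neighbor, fall back to the item average.
        return item_mean
    return item_mean + weighted_dev / weight_sum
```

For example, with an item average of 3.0, neighbor ratings of 5.0 and 2.0, and final similarities of 0.6 and 0.2, the sketch yields 3.0 + (0.6·2.0 − 0.2·1.0) / 0.8 = 4.25.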