CN110134874A

CN110134874A - Collaborative filtering method for optimizing user similarity

Info

Publication number: CN110134874A
Application number: CN201910312071.8A
Authority: CN
Inventors: 安彦涵 (An Yanhan); 张新鹏 (Zhang Xinpeng); 吴汉舟 (Wu Hanzhou); 余江 (Yu Jiang); 王子驰 (Wang Zichi)
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2019-04-18
Filing date: 2019-04-18
Publication date: 2019-08-16

Abstract

The invention proposes a collaborative filtering method that optimizes user similarity. It improves the accuracy of the recommendation algorithm without increasing server latency. The method works as follows. User rating data are first standardized in a preprocessing step. The method then computes three quantities: the Pearson similarity, an evaluation weight based on the distance between user vectors, and an asymmetric similarity weight. The two weights are used to optimize the Pearson similarity, which improves the recommendation accuracy of traditional collaborative filtering. The method applies to user-item rating datasets.

Description

Collaborative filtering method for optimizing user similarity

Technical field

The invention relates to recommender systems based on collaborative filtering and proposes a collaborative filtering method that optimizes user similarity.

Background

The rapid development and spread of the Internet have made it far easier for users to acquire, share, and disseminate information. At the same time, the sharp growth in the volume of information has lowered how well that information is used. Users find it hard to obtain information that is truly useful to them in a timely manner, a problem known as information overload. One effective way to cope with information overload is a recommender system, which recommends content and products to users according to their needs, interests, and similar information. A search engine answers queries. A recommender system, by contrast, performs personalized computation by studying users' interests and preferences. It discovers their points of interest, guides them toward their own information needs, and delivers information useful to them. A good recommender system provides personalized service. It can also build close connections among different users and lead users to rely on its recommendations.

Recommender systems fall mainly into two types: content-based filtering and collaborative filtering. A content-based recommender system derives the features of the items a user cares about from the user's past browsing or purchase records. It then recommends the new items that best match the user's interest profile. A collaborative filtering recommender system instead computes the similarity between users' historical records. From these similarities it finds other users whose preferences resemble the target user's, and it recommends the items those users liked to the target user.

A content-based recommender system considers only the target user. A collaborative filtering system makes full use of collective intelligence, drawing answers from the behavior and data of a large population, so its recommendations are more personalized. For this reason, collaborative filtering is among the most widely used and effective recommendation algorithms in personalized recommendation systems.

Collaborative filtering recommender systems are further divided into model-based and memory-based systems. Model-based systems mainly use machine learning, data mining, statistics, and similar methods to train on users' historical data. They build a corresponding user model and use it to produce predictions and recommendations; relevant techniques include matrix factorization and latent semantic analysis. Memory-based systems are further divided into user-based and item-based collaborative filtering systems.

Traditional user-based collaborative filtering measures similarity with the Pearson formula, but it has three shortcomings:
- it does not preprocess the dataset;
- it ignores the distance between users' rating vectors;
- it ignores the asymmetry of similarity relationships between users.

These omissions can degrade the recommendation quality of the system. The present invention therefore improves user-based collaborative filtering on these three points to raise recommendation quality.

Summary of the invention

The invention aims to reduce the mean absolute error of the traditional user-based collaborative filtering algorithm and effectively improve recommendation quality. To this end, it provides a collaborative filtering method that optimizes user similarity.

To achieve the above objective, the present invention adopts the following technical solution:

A collaborative filtering method for optimizing user similarity. The method standardizes each user's rating vector and then optimizes the Pearson similarity with two weights: an evaluation weight based on user vector distance and an asymmetric similarity weight. Finally, it predicts user ratings. The specific steps are as follows:

1) Prepare the experimental database: collect the ratings that a number of users give to different items, and build an experimental database;

2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;

3) Compute the user similarity matrix:
- from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight;
- optimize the Pearson similarity with the two weights to obtain the optimized similarity formula;
- compute the similarity between each user and every other user with the optimized formula, and generate the similarity matrix;

4) Predict ratings: determine the target user's neighbor set from the similarities between the target user and other users. Then predict the target user's ratings for unrated items with the rating prediction formula.

Compared with the prior art, the present invention has the following advantage:

The method optimizes user similarity within the recommendation algorithm module of collaborative filtering. This effectively improves the recommendation quality of the recommender system without increasing server latency.

Description of the drawings

Fig. 1 is the flow chart of the method for the present invention.

Detailed description of embodiments

Specific embodiments of the present invention are further described below with reference to the accompanying drawing.

This embodiment is an instance analysis on the MovieLens-100k dataset, downloadable from https://movielens.org/. The dataset contains 100,000 ratings given by 943 users to 1,682 movies. Ratings are integers from 1 to 5, where 1 is the lowest and 5 the highest rating, and each user has rated at least 20 movies. Of the data, 80% serve as the training set and 20% as the test set.
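The data preparation in step 1) can be sketched in Python as follows. The sketch assumes the `u.data` file from the MovieLens-100k archive, whose lines hold a tab-separated user id, item id, rating, and timestamp. The patent states only the 80%/20% ratio, so the random shuffle and the fixed `seed` are assumptions. The helper names `parse_ratings` and `train_test_split` are hypothetical.

```python
import random

def parse_ratings(lines):
    """Parse MovieLens-100k style lines: user_id<TAB>item_id<TAB>rating<TAB>timestamp.

    Returns (user_index, item_index, rating) tuples with 0-based indices.
    """
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, rating, _timestamp = line.split("\t")
        # MovieLens ids start at 1; shift them to 0-based matrix indices
        ratings.append((int(user_id) - 1, int(item_id) - 1, float(rating)))
    return ratings

def train_test_split(ratings, train_fraction=0.8, seed=0):
    """Randomly split rating records into training and test lists.

    The 80/20 ratio follows the embodiment; the shuffle and seed are assumptions.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

For example, `with open("u.data") as f: train, test = train_test_split(parse_ratings(f))` would yield 80,000 training and 20,000 test records for the 100,000-rating file.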

As shown in Fig. 1, the collaborative filtering method for optimizing user similarity first standardizes each user's rating vector. It then optimizes the Pearson similarity with the evaluation weight of user vector distance and the asymmetric similarity weight. Finally, it predicts user ratings. The specific steps are as follows:

1) Prepare the experimental database: collect the ratings that a number of users give to different items, and build an experimental database.

2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors, as follows:

Let the rating vector of the u-th user in the training set be $R_u = (r_{u,1}, r_{u,2}, \dots, r_{u,m})$, where $r_{u,m}$ is user u's rating of item m. As shown in formula (1), $R_u$ is standardized with the Z-score method. Here $z_{u,m}$ is user u's standardized rating of item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$:

$$z_{u,m} = \frac{r_{u,m} - \bar{r}_u}{\sigma_u} \qquad (1)$$

The standardized rating vector of user u is denoted $Z_u = (z_{u,1}, z_{u,2}, \dots, z_{u,m})$; $Z_u$ has mean 0 and standard deviation 1. A user-item rating matrix of size 943 × 1682 is generated, where 943 is the number of users and 1682 is the number of items. $Z_u$ is stored in row u of the user-item rating matrix, and the entries for items that user u has not rated are set to 0.
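A minimal Python sketch of step 2) follows: formula (1) is applied to each user's ratings, and the results are written into a dense user-item matrix with 0 for unrated items. Two choices here are assumptions, not taken from the patent:
- the population standard deviation (NumPy's default `ddof=0`);
- leaving a constant-rating user's row at 0, since $\sigma_u = 0$ would divide by zero.

The function name `standardized_matrix` is hypothetical. It also returns a boolean `mask` of rated entries, because the later steps need the rated-item sets $I_u$ and $S$, which zeros alone cannot distinguish from a standardized rating of exactly 0.

```python
import numpy as np

def standardized_matrix(ratings, n_users, n_items):
    """Build the user-item matrix of Z-scores (formula (1)).

    ratings: iterable of (user_index, item_index, rating) with 0-based indices.
    Returns (z, mask, means, stds); unrated entries of z are 0.
    """
    raw = np.zeros((n_users, n_items))
    mask = np.zeros((n_users, n_items), dtype=bool)  # True where the user rated the item
    for u, i, r in ratings:
        raw[u, i] = r
        mask[u, i] = True

    z = np.zeros((n_users, n_items))
    means = np.zeros(n_users)
    stds = np.zeros(n_users)
    for u in range(n_users):
        rated = raw[u, mask[u]]
        if rated.size == 0:
            continue  # user without ratings: leave the row at 0
        means[u] = rated.mean()
        stds[u] = rated.std()  # population std (ddof=0): an assumption
        if stds[u] > 0:        # constant raters would divide by zero; keep their z at 0
            z[u, mask[u]] = (rated - means[u]) / stds[u]
    return z, mask, means, stds
```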

3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight. Optimize the Pearson similarity with the two weights to obtain the optimized similarity formula. Compute the similarity between each user and every other user with the optimized formula, and finally generate the similarity matrix. Taking any two users u and v in the MovieLens-100k training set as an example, the similarity of user u to user v is computed as follows:

3.1) Compute the Pearson similarity: as shown in formula (2), measure the Pearson similarity $sim_{u,v}$ between user u and user v with the Pearson similarity formula, where S is the set of items rated by both user u and user v:
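Formula (2) is not reproduced in this text. The Python sketch below therefore assumes the standard Pearson correlation restricted to the co-rated set $S$, with means taken over $S$ and applied to the standardized ratings from step 2). Returning 0 when $S$ has fewer than two items, or when one user is constant on $S$, is also an assumption. The function name is hypothetical.

```python
import numpy as np

def pearson_similarity(z_u, z_v, mask_u, mask_v):
    """Pearson correlation of users u and v over their co-rated item set S.

    Assumes the standard Pearson form with means over S (formula (2) is not
    reproduced in the source text).
    """
    s = mask_u & mask_v  # S: items rated by both users
    if s.sum() < 2:
        return 0.0  # correlation is undefined on fewer than two items
    a = z_u[s] - z_u[s].mean()
    b = z_v[s] - z_v[s].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        return 0.0  # one user's ratings are constant on S
    return float((a * b).sum() / denom)
```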

3.2) Compute the evaluation weight of user vector distance: as shown in formula (3), compute the weight $D_{u,v}$ from the distance between $Z_u$ and $Z_v$. Here S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is a threshold on the rating gap for a single item. The larger α is, the closer $D_{u,v}$ is to 1; the smaller α is, the closer $D_{u,v}$ is to 0:
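Formula (3) itself is missing from this text, so the Python sketch below uses a HYPOTHETICAL form built only from the quantities the paragraph names ($S$, $N(S)$, and the per-item threshold α). It defines $D_{u,v}$ as the fraction of co-rated items whose standardized rating gap is below α. This form reproduces the stated limits: it tends to 1 as α grows and equals 0 when α is 0. It is an illustration, not the patent's formula, and the function name is hypothetical.

```python
import numpy as np

def distance_weight(z_u, z_v, mask_u, mask_v, alpha):
    """HYPOTHETICAL evaluation weight D_{u,v} (formula (3) not reproduced).

    Fraction of co-rated items whose standardized rating gap |z_u,i - z_v,i|
    is below the per-item threshold alpha: count / N(S).
    """
    s = mask_u & mask_v  # S: co-rated items
    n_s = int(s.sum())   # N(S)
    if n_s == 0:
        return 0.0
    gaps = np.abs(z_u[s] - z_v[s])
    return float((gaps < alpha).sum() / n_s)
```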

3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{u,v}$ of user u to user v. Here S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, and N(S) and N($I_u$) are the numbers of elements in S and $I_u$:
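Formula (4) is missing from this text. The Python sketch below assumes a HYPOTHETICAL form, $w_{u,v} = N(S)/N(I_u)$: the share of user u's rated items that user v also rated. It uses exactly the quantities the paragraph names. Because the denominator depends only on u, the weight is asymmetric: a user with few ratings can weight a prolific user highly, but not the other way round. The function name is hypothetical.

```python
import numpy as np

def asymmetric_weight(mask_u, mask_v):
    """HYPOTHETICAL asymmetric weight w_{u,v} = N(S) / N(I_u) (formula (4) not reproduced)."""
    n_iu = int(mask_u.sum())  # N(I_u): number of items user u rated
    if n_iu == 0:
        return 0.0
    n_s = int((mask_u & mask_v).sum())  # N(S): items both users rated
    return n_s / n_iu
```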

3.4) Optimized user similarity formula: as shown in formula (5), fusing formulas (2), (3), and (4) gives the optimized similarity $sim'_{u,v}$ of user u to user v:

$$sim'_{u,v} = D_{u,v} \cdot w_{u,v} \cdot sim_{u,v} \qquad (5)$$

3.5) Compute the user similarity matrix: compute the similarity between every pair of users with formula (5), finally obtaining a 943 × 943 user similarity matrix.
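The Python sketch below fills the user similarity matrix with formula (5) for every ordered pair of users. Because formulas (2) to (4) are not reproduced in this text, it inlines assumed forms:
- standard Pearson over $S$;
- a HYPOTHETICAL threshold-count distance weight, the fraction of co-rated items with gap below α;
- a HYPOTHETICAL asymmetric weight $N(S)/N(I_u)$.

Leaving the diagonal and pairs with fewer than two co-rated items at 0 is also an assumption. The double loop is deliberately unvectorized for readability.

```python
import numpy as np

def similarity_matrix(z, mask, alpha):
    """Optimized similarity sim'_{u,v} = D_{u,v} * w_{u,v} * sim_{u,v} (formula (5)).

    z: standardized user-item matrix; mask: boolean rated-entry matrix.
    D and w use hypothetical forms; sim is the standard Pearson over S.
    Row u, column v holds the (asymmetric) similarity of u to v.
    """
    n_users = z.shape[0]
    sim = np.zeros((n_users, n_users))
    for u in range(n_users):
        n_iu = mask[u].sum()  # N(I_u)
        for v in range(n_users):
            if u == v or n_iu == 0:
                continue  # diagonal stays 0: a user is not its own neighbor
            s = mask[u] & mask[v]  # co-rated set S
            n_s = s.sum()          # N(S)
            if n_s < 2:
                continue
            a = z[u, s] - z[u, s].mean()
            b = z[v, s] - z[v, s].mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            if denom == 0:
                continue
            pearson = (a * b).sum() / denom                      # formula (2), assumed standard form
            d = (np.abs(z[u, s] - z[v, s]) < alpha).sum() / n_s  # formula (3), hypothetical form
            w = n_s / n_iu                                       # formula (4), hypothetical form
            sim[u, v] = d * w * pearson
    return sim
```

Note that the result is not symmetric: `sim[u, v]` and `sim[v, u]` differ whenever the two users have rated different numbers of items.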

4) Predict ratings: determine the target user's neighbor set from the similarities between the target user and other users, and predict the target user's ratings for unrated items with the rating prediction formula. This example takes an item a that an arbitrary user u in the training set has not rated, and computes user u's predicted rating for item a as follows:

4.1) Compute the neighbor set:
- in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{1,a}, u_{2,a}, \dots, u_{q,a}\}$, where $u_{q,a}$ is the q-th user who rated item a;
- sort these q users by their similarity to user u in descending order, giving $U'_a = \{u'_{1,a}, u'_{2,a}, \dots, u'_{q,a}\}$;
- select the first k users of the sorted set $U'_a$ as the neighbor set of user u, denoted $U = \{u'_{1,a}, u'_{2,a}, \dots, u'_{k,a}\}$.
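Step 4.1) can be sketched in Python as follows. The raters of item a form $U_a$. Sorting them by similarity in descending order gives $U'_a$, and the first k entries form the neighbor set $U$. The function name and the stable tie-breaking by user index are assumptions. The target user never appears among the raters, because item a is unrated by u.

```python
import numpy as np

def neighbor_set(sim_row, mask, item, k):
    """Top-k neighbors of a user for one item (step 4.1).

    sim_row: the target user's row of the similarity matrix.
    mask: boolean user-item rated matrix.
    Returns the indices of the k raters of `item` with the highest similarity.
    """
    raters = np.flatnonzero(mask[:, item])  # U_a: users who rated the item
    # U'_a: raters sorted by similarity, descending; a stable sort keeps ties in index order
    order = raters[np.argsort(-sim_row[raters], kind="stable")]
    return order[:k]  # U: the first k users
```

With the neighborhood size of 10 used in the embodiment, this would be called with `k=10`.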

4.2) Predict user u's rating of item a: compute the predicted rating $p_{u,a}$ of user u for item a with formula (6). Here U is the neighbor set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$. In addition, $z_{v,a}$ is user v's standardized rating of item a, and $sim'_{u,v}$ is the similarity of user u to user v:
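Formula (6) is not reproduced in this text. The Python sketch below assumes the common Z-score prediction, which uses exactly the symbols listed above:

$$p_{u,a} = \bar{r}_u + \sigma_u \frac{\sum_{v \in U} sim'_{u,v} \, z_{v,a}}{\sum_{v \in U} |sim'_{u,v}|}$$

The weighted average of the neighbors' standardized ratings is mapped back to user u's own rating scale. Falling back to $\bar{r}_u$ when all neighbor similarities are 0 is an assumption, and the function name is hypothetical.

```python
import numpy as np

def predict_rating(mean_u, std_u, neighbor_sims, neighbor_z):
    """ASSUMED form of formula (6), the common Z-score prediction.

    p_{u,a} = mean_u + std_u * sum(sim'_{u,v} * z_{v,a}) / sum(|sim'_{u,v}|),
    summed over the neighbor set U.
    """
    sims = np.asarray(neighbor_sims, dtype=float)
    zs = np.asarray(neighbor_z, dtype=float)
    denom = np.abs(sims).sum()
    if denom == 0:
        return float(mean_u)  # no informative neighbors: fall back to the user's mean
    return float(mean_u + std_u * (sims * zs).sum() / denom)
```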

As shown in formula (7), recommendation accuracy is measured by the mean absolute error (MAE): the smaller the MAE, the smaller the error and the higher the accuracy. Here $p_i$ is the predicted rating for item i, $r_i$ is the user's actual rating of item i in the test set, and n is the number of ratings in the test set:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (7)$$
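Formula (7) amounts to a few lines of Python. The function name is hypothetical; the inputs are the predicted ratings and the actual test-set ratings, in the same order.

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/n) * sum(|p_i - r_i|) over the n test ratings (formula (7))."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)
```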

In this embodiment the neighborhood size is 10. The MAE of the invention is 0.74086, which is 3.06% lower than that of the original Pearson similarity. Computing the user similarity matrix takes 61.3 seconds, and rating prediction takes 7.04 seconds. In practice, the user similarity matrix can be computed offline. When users request online rating predictions from the proposed algorithm, the real-time computation time is almost the same as with the original method. The method therefore does not increase the time users wait online.

Claims

1. A collaborative filtering method for optimizing user similarity. The method standardizes each user's rating vector, optimizes the Pearson similarity with the evaluation weight of user vector distance and the asymmetric similarity weight, and finally predicts user ratings. It is characterized in that the specific steps are as follows:

2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;

3) Compute the user similarity matrix:
- from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight;
- optimize the Pearson similarity with the two weights to obtain the optimized similarity formula;
- compute the similarity between each user and every other user with the optimized formula, and generate the similarity matrix;

4) Predict ratings: determine the target user's neighbor set from the similarities between the target user and other users. Then predict the target user's ratings for unrated items with the rating prediction formula.

2. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that step 2) specifically comprises the following. Let the rating vector of the u-th user in the training set be $R_u = (r_{u,1}, r_{u,2}, \dots, r_{u,m})$, where $r_{u,m}$ denotes user u's rating of item m. As shown in formula (1), $R_u$ is standardized with the Z-score method. Here $z_{u,m}$ is user u's standardized rating of item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$:

$$z_{u,m} = \frac{r_{u,m} - \bar{r}_u}{\sigma_u} \qquad (1)$$

The standardized rating vector of user u is denoted $Z_u = (z_{u,1}, z_{u,2}, \dots, z_{u,m})$; $Z_u$ has mean 0 and standard deviation 1. The user-item rating matrix is then generated: $Z_u$ is stored in row u of the matrix, and the entries for items that user u has not rated are set to 0.

3. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that step 3) takes any two users u and v in the training set as an example and computes the similarity of user u to user v as follows:

3.2) Compute the evaluation weight of user vector distance: as shown in formula (3), compute the weight $D_{u,v}$ from the distance between $Z_u$ and $Z_v$. Here S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is a threshold on the rating gap for a single item. The larger α is, the closer $D_{u,v}$ is to 1; the smaller α is, the closer $D_{u,v}$ is to 0:

3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{u,v}$ of user u to user v. Here S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, and N(S) and N($I_u$) are the numbers of elements in S and $I_u$:

3.4) Optimized user similarity formula: as shown in formula (5), fusing formulas (2), (3), and (4) gives the optimized similarity $sim'_{u,v}$ of user u to user v:

$$sim'_{u,v} = D_{u,v} \cdot w_{u,v} \cdot sim_{u,v} \qquad (5)$$

3.5) Compute the user similarity matrix: compute the similarity between every pair of users with formula (5), finally obtaining the user similarity matrix.

4. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that step 4) takes an item a that an arbitrary user u in the training set has not rated, and computes user u's predicted rating for item a as follows:

4.1) Compute the neighbor set:
- in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{1,a}, u_{2,a}, \dots, u_{q,a}\}$, where $u_{q,a}$ is the q-th user who rated item a;
- sort these q users by their similarity to user u in descending order, giving $U'_a = \{u'_{1,a}, u'_{2,a}, \dots, u'_{q,a}\}$;
- select the first k users of the sorted set $U'_a$ as the neighbor set of user u, denoted $U = \{u'_{1,a}, u'_{2,a}, \dots, u'_{k,a}\}$;

4.2) Predict user u's rating of item a: compute the predicted rating $p_{u,a}$ of user u for item a with formula (6). Here U is the neighbor set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$. In addition, $z_{v,a}$ is user v's standardized rating of item a, and $sim'_{u,v}$ is the similarity of user u to user v:

As shown in formula (7), recommendation accuracy is measured by the mean absolute error (MAE): the smaller the MAE, the smaller the error and the higher the accuracy. Here $p_i$ is the predicted rating for item i, $r_i$ is the user's actual rating of item i in the test set, and n is the number of ratings in the test set:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (7)$$