CN103500228A - Improved similarity measurement method in collaborative filtering recommendation algorithm

Info

Publication number: CN103500228A
Application number: CN201310505323.1A
Authority: CN
Inventors: 赵朋朋; 吴健; 冒九妹; 鲜学丰; 崔志明
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2013-10-23
Filing date: 2013-10-23
Publication date: 2014-01-08

Abstract

An improved similarity measurement method in a collaborative filtering recommendation algorithm includes the following steps: (S1) building a rating matrix R(n×m) of n users in a user set U={U1, U2, ..., Un} for m items in an item set I={I1, I2, ..., Im}, where Ra,i denotes the rating given by user Ua to item Ii, Ua belongs to U and Ii belongs to I; (S2) calculating the similarity sim(Ua, Ub) between a user Ua and a user Ub and the similarity sim(Ii, Ij) between an item Ii and an item Ij, and defining a similarity influence factor ε such that sim'(Ua, Ub) = ε × sim(Ua, Ub) and sim'(Ii, Ij) = ε × sim(Ii, Ij); (S3) taking a parameter λ in the interval [0, 1], and predicting the users' ratings of the items according to λ, ε, the average rating values of the users for the items, the similarity between users and the similarity between items.

Description

Improved similarity measurement method in collaborative filtering recommendation algorithm

Technical Field

The invention relates to collaborative filtering recommendation technology in recommendation system research, and in particular to an improved similarity measurement method in a collaborative filtering recommendation algorithm.

Background

With the rapid spread of the internet and the rapid development of electronic commerce, the information data on the internet is growing sharply, and how to make users quickly and efficiently obtain required information from vast data oceans becomes increasingly urgent. Providing active recommendation services for users is also increasingly being applied to various web portals and e-commerce systems. These systems provide recommendation services to users by collecting their historical information, learning their interests and behavior patterns, and analyzing their behavior characteristics.

Collaborative filtering recommendation technologies are widely applied in the field of recommendation systems and fall mainly into two types: user-based collaborative filtering and item-based collaborative filtering. The basic idea of both is to generate recommendations for a target user from its nearest neighbors, and the final recommendation takes the form of either score prediction or Top-N recommendation. Tapestry was the earliest proposed collaborative filtering recommendation system; it records each user's opinions on the articles they read, and the target user must explicitly name other users whose behavior is similar to their own. GroupLens, Ringo, and Video Recommender are also early collaborative filtering recommendation systems, which recommend news, music, and movies, respectively, based on the opinions of other users.

As e-commerce systems continue to grow in scale, the numbers of users and items increase sharply, so the user-item scoring data become extremely sparse. Under extreme sparsity, traditional similarity measures depend on the number of commonly scored items and are therefore subject to chance; the nearest neighbors computed for a target user or item are inaccurate, which reduces the recommendation quality of the recommendation system.

A collaborative filtering recommendation algorithm predicts users' scores for items mainly through similarity. Similarity can be measured either between users or between items, and the accuracy of the similarity measure directly determines the recommendation quality of the whole recommendation system.

The similarity calculation may be performed between users or between items. Let $\mathrm{sim}(U_a, U_b)$ denote the similarity between user $U_a$ and user $U_b$: first, all items scored by user $U_a$ and user $U_b$ are obtained, and then $\mathrm{sim}(U_a, U_b)$ is calculated by one of several similarity measures. In the same way, let $\mathrm{sim}(I_i, I_j)$ denote the similarity between item $I_i$ and item $I_j$: all users who have scored item $I_i$ and item $I_j$ are obtained, and $\mathrm{sim}(I_i, I_j)$ is calculated from the existing scores.

Common similarity metrics include: cosine similarity, correlation similarity, and modified cosine similarity. In the cosine similarity measure method, a user item score matrix R (n × m) is constructed. If the user does not score an item, then the user is assumed to have a score of 0 for the item. The performance of similarity calculation can be effectively improved by setting the unknown score of the user to 0, but when the number of users and items is very large and the item evaluation data of the user is extremely sparse, the reliability of setting the unknown score to 0 is not high.

In practice, users' preferences for items they have not scored are not necessarily the same. When neither user $U_a$ nor user $U_b$ has scored an item, setting both scores to 0 inflates the computed similarity between $U_a$ and $U_b$, because their true scores for that item would not necessarily be identical and equal to 0. Therefore, when the user scoring data are extremely sparse, setting unknown scores to 0 strongly affects the computed similarity. When one of the two users has scored an item and the other has not, setting the unknown score to 0 makes the computed similarity smaller than its actual value, but when the user scoring data are extremely sparse this effect is small.

Therefore, when the user scoring data are extremely sparse, cosine similarity cannot measure the similarity between users effectively: the computed value overstates the actual similarity. The modified cosine similarity measure has the same problem.
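The effect of zero-filling on cosine similarity can be illustrated with a minimal sketch. The function name `cosine_zero_filled`, the use of `None` for an unscored item, and the sample scores are all illustrative assumptions, not part of the patent. With zero-filling, a single shared item is enough to produce the maximum similarity, while a score given by only one of the users pulls the value down.

```python
import math

def cosine_zero_filled(u, v):
    """Cosine similarity of two score vectors; None (unscored) is treated as 0."""
    a = [0.0 if x is None else float(x) for x in u]
    b = [0.0 if x is None else float(x) for x in v]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a user with no scores gets similarity 0
    return dot / (norm_a * norm_b)

# Hypothetical users who overlap on a single item out of four.
sim_single_overlap = cosine_zero_filled([5, None, None, None], [5, None, None, None])

# Same pair, but the second user has also scored an item the first has not.
sim_one_sided = cosine_zero_filled([5, None, None, None], [5, 4, None, None])

print(sim_single_overlap, sim_one_sided)
```

One agreeing score out of four items is weak evidence, yet the first pair is reported as perfectly similar; this is the kind of chance result the text attributes to sparse data.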

In the correlation similarity measure, let $I_{U_a}$ denote the set of items scored by user $U_a$. To compute the similarity between user $U_a$ and user $U_b$, the intersection of their scored items, $I_{U_a U_b} = I_{U_a} \cap I_{U_b}$, is computed first, and the correlation similarity between $U_a$ and $U_b$ is then calculated over the item set $I_{U_a U_b}$. However, the confidence in a similarity measured this way depends on the size of the intersection $I_{U_a U_b}$: the more commonly scored items there are, the higher the confidence in the measured similarity. When the user scoring data are extremely sparse, the set $I_{U_a U_b}$ of items scored by both users is very small, and even if the scores are very similar across such a small set of items, it cannot be concluded that the similarity between the users is high.

When the two users have scored exactly the same items, that is, $I_{U_a} = I_{U_b}$, measuring the similarity over their intersection gives a result with high confidence. When $I_{U_a} \neq I_{U_b}$, measuring the similarity over the intersection alone overstates it: the users' score deviations on the items outside the intersection are not necessarily the same, yet computing the similarity only over the intersection is equivalent to setting those deviations to be identical and equal to 0, so the computed similarity is higher than the actual value. Therefore, when the user scoring data are extremely sparse, the correlation similarity measure also has shortcomings.
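The dependence on the size of the intersection can be seen in a small sketch of correlation similarity restricted to commonly scored items. The function name `pearson_on_common`, the dictionary representation (item id mapped to score), the choice to take each mean over the common items only, and the sample data are assumptions made for illustration.

```python
import math

def pearson_on_common(scores_a, scores_b):
    """Return (correlation over commonly scored items, number of common items)."""
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < 2:
        return 0.0, len(common)  # assumption: too few items to correlate
    mean_a = sum(scores_a[k] for k in common) / len(common)
    mean_b = sum(scores_b[k] for k in common) / len(common)
    num = sum((scores_a[k] - mean_a) * (scores_b[k] - mean_b) for k in common)
    den = (math.sqrt(sum((scores_a[k] - mean_a) ** 2 for k in common))
           * math.sqrt(sum((scores_b[k] - mean_b) ** 2 for k in common)))
    if den == 0.0:
        return 0.0, len(common)
    return num / den, len(common)

# Hypothetical users: four scored items each, only two of them in common.
user_a = {"I1": 4, "I2": 2, "I3": 5, "I4": 1}
user_b = {"I1": 5, "I2": 3, "I5": 1, "I6": 2}

similarity, overlap = pearson_on_common(user_a, user_b)
print(similarity, overlap)
```

Here the correlation is essentially 1 although it rests on two items only, and the four items outside the intersection play no part in the result.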

In summary, in order to make the similarity value affected by the sparsity as little as possible, the present invention provides a method for improving the similarity metric by using the similarity impact factor.

Disclosure of Invention

The invention provides an improved similarity measurement method in a collaborative filtering recommendation algorithm, which comprises the following steps:

S1, creating a scoring matrix $R(n \times m)$ of the $n$ users in the user set $U = \{U_1, U_2, \ldots, U_n\}$ for the $m$ items in the item set $I = \{I_1, I_2, \ldots, I_m\}$, with $R_{a,i}$ representing the score given by user $U_a$ to item $I_i$, where $U_a \in U$ and $I_i \in I$;

S2, calculating the similarity $\mathrm{sim}(U_a, U_b)$ between users $U_a$ and $U_b$ and the similarity $\mathrm{sim}(I_i, I_j)$ between items $I_i$ and $I_j$, and defining a similarity influence factor $\varepsilon$ such that $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$ and $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$;

S3, taking a parameter $\lambda$ in the interval $[0, 1]$, and predicting the users' scores for the items according to $\lambda$, $\varepsilon$, the average values of the users' scores for the items, the similarity between users, and the similarity between items.

Preferably, in step S2,

$$\mathrm{sim}(U_a, U_b) = \frac{\sum_{k \in I_{U_a U_b}} (R_{a,k} - \overline{R_a})(R_{b,k} - \overline{R_b})}{\sqrt{\sum_{k \in I_{U_a}} (R_{a,k} - \overline{R_a})^2}\,\sqrt{\sum_{k \in I_{U_b}} (R_{b,k} - \overline{R_b})^2}},$$

and

$$\mathrm{sim}(I_i, I_j) = \frac{\sum_{k \in U_{I_i I_j}} (R_{k,i} - \overline{R_i})(R_{k,j} - \overline{R_j})}{\sqrt{\sum_{k \in U_{I_i}} (R_{k,i} - \overline{R_i})^2}\,\sqrt{\sum_{k \in U_{I_j}} (R_{k,j} - \overline{R_j})^2}},$$

where $\overline{R_a}$ and $\overline{R_b}$ respectively represent the average scores given by user $U_a$ and user $U_b$ to the items they have scored.

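Preferably, in step S2, when $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

and when $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

where $I_{U_a U_b}$ is the set of items scored by both user $U_a$ and user $U_b$, and $I_{U_a}$ and $I_{U_b}$ are the sets of items scored by user $U_a$ and user $U_b$, respectively.

Preferably, in step S2, $0 \le \varepsilon \le 1$.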

Preferably, in step S3, the predicted score of user $U_a$ for an unviewed item $I_i$ is

$$R_{a,i} = \lambda \times \left( \overline{R_a} + \frac{\sum_{U_x \in U} \left( \mathrm{sim}'(U_a, U_x) \times (R_{x,i} - \overline{R_x}) \right)}{\sum_{U_x \in U} \mathrm{sim}'(U_a, U_x)} \right) + (1 - \lambda) \times \left( \overline{R_i} + \frac{\sum_{I_y \in I} \left( \mathrm{sim}'(I_i, I_y) \times (R_{a,y} - \overline{R_y}) \right)}{\sum_{I_y \in I} \mathrm{sim}'(I_i, I_y)} \right),$$

where $\overline{R_a}$ and $\overline{R_x}$ respectively represent the averages of the existing scores given by users $U_a$ and $U_x$, and $\overline{R_i}$ and $\overline{R_y}$ respectively represent the averages of the existing user scores for items $I_i$ and $I_y$.

Preferably, when $\lambda = 0$, $R_{a,i}$ is the score predicted from item similarity, and when $\lambda = 1$, $R_{a,i}$ is the score predicted from user similarity.

According to the improved similarity measurement method in the collaborative filtering recommendation algorithm provided by the invention, the similarity between users and the similarity between items are calculated separately, and a similarity influence factor $\varepsilon$ is defined to correct the similarity values of the users and of the items, which improves the way similarity is measured. The users' scores for items are then calculated from the improved similarity, the average values of the users' scores for the items, and the other factors above, so that the measurement error is reduced when the data are extremely sparse and the recommendation quality is improved.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of an improved similarity measure method in a collaborative filtering recommendation algorithm according to a preferred embodiment of the present invention;

FIG. 2 is a diagram illustrating a user-to-item scoring matrix R (n × m) according to a preferred embodiment of the present invention;

FIG. 3 is a flowchart of the collaborative filtering recommendation algorithm score prediction provided by the preferred embodiment of the present invention;

FIG. 4 is a flow chart of constructing a user or item similarity matrix according to the preferred embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Fig. 1 is a flowchart of an improved similarity measurement method in a collaborative filtering recommendation algorithm according to a preferred embodiment of the present invention. As shown in FIG. 1, the improved similarity measure method in the collaborative filtering recommendation algorithm according to the preferred embodiment of the present invention includes steps S1-S3.

Step S1: create a scoring matrix $R(n \times m)$ of the $n$ users in the user set $U = \{U_1, U_2, \ldots, U_n\}$ for the $m$ items in the item set $I = \{I_1, I_2, \ldots, I_m\}$, with $R_{a,i}$ representing the score given by user $U_a$ to item $I_i$, where $U_a \in U$ and $I_i \in I$.

Specifically, FIG. 2 is a schematic diagram of the user-to-item scoring matrix $R(n \times m)$ provided by the preferred embodiment of the present invention. As shown in FIG. 2, the matrix has $n$ rows and $m$ columns, where the $n$ rows represent the $n$ users and the $m$ columns represent the $m$ items. With user set $U$ and item set $I$, the score given by a user $U_a$ to an item $I_i$ (where $U_a \in U$ and $I_i \in I$) is $R_{a,i}$, and this score reflects the interest and preference of user $U_a$ for item $I_i$.
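As a minimal sketch of step S1, the scoring matrix can be held as a list of rows, one per user, with one column per item. The function name `build_scoring_matrix`, the `(user, item, score)` input triples, and the use of `None` for a missing score are illustrative assumptions; the patent does not prescribe a data structure.

```python
def build_scoring_matrix(users, items, scores):
    """Build an n x m matrix R where R[a][i] is the score of users[a] for items[i].

    Entries with no known score are left as None (an assumption of this sketch).
    """
    row_of = {user: a for a, user in enumerate(users)}
    col_of = {item: i for i, item in enumerate(items)}
    R = [[None] * len(items) for _ in users]
    for user, item, score in scores:
        R[row_of[user]][col_of[item]] = score
    return R

# Hypothetical data: 3 users, 4 items, 5 known scores.
users = ["U1", "U2", "U3"]
items = ["I1", "I2", "I3", "I4"]
scores = [("U1", "I1", 5), ("U1", "I3", 3), ("U2", "I1", 4),
          ("U2", "I2", 2), ("U3", "I4", 1)]

R = build_scoring_matrix(users, items, scores)
for row in R:
    print(row)
```

Keeping unknown entries distinct from 0 avoids the zero-filling problem discussed in the background section.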

Step S2: calculate the similarity $\mathrm{sim}(U_a, U_b)$ between users $U_a$ and $U_b$ and the similarity $\mathrm{sim}(I_i, I_j)$ between items $I_i$ and $I_j$, and define a similarity influence factor $\varepsilon$ such that $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$ and $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$.

Specifically, the modified cosine similarity corrects for the fact that different users score on different scales, which the cosine similarity measure ignores. Therefore, this embodiment calculates the similarity $\mathrm{sim}(U_a, U_b)$ between users and the similarity $\mathrm{sim}(I_i, I_j)$ between items with the modified cosine similarity measure.

For example, if the set of items scored by both user $U_a$ and user $U_b$ is denoted $I_{U_a U_b}$, and $I_{U_a}$ and $I_{U_b}$ respectively denote the sets of items scored by user $U_a$ and user $U_b$, then the similarity $\mathrm{sim}(U_a, U_b)$ between user $U_a$ and user $U_b$ is expressed as

$$\mathrm{sim}(U_a, U_b) = \frac{\sum_{k \in I_{U_a U_b}} (R_{a,k} - \overline{R_a})(R_{b,k} - \overline{R_b})}{\sqrt{\sum_{k \in I_{U_a}} (R_{a,k} - \overline{R_a})^2}\,\sqrt{\sum_{k \in I_{U_b}} (R_{b,k} - \overline{R_b})^2}}.$$

Here, $\overline{R_a}$ and $\overline{R_b}$ respectively represent the average scores given by user $U_a$ and user $U_b$ to the items they have scored, and $R_{a,k}$ represents the score given by user $U_a$ to item $I_k$.
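The user similarity formula above can be sketched as follows. As in the formula, the numerator runs over the commonly scored items, each square-root term runs over all items scored by that user, and each mean is taken over the user's own scored items. The function name `user_similarity`, the dictionary representation, the fallback to 0.0 when a denominator vanishes, and the sample scores are assumptions of this sketch.

```python
import math

def user_similarity(scores_a, scores_b):
    """Modified cosine similarity between two users (item id -> score dicts)."""
    if not scores_a or not scores_b:
        return 0.0
    mean_a = sum(scores_a.values()) / len(scores_a)
    mean_b = sum(scores_b.values()) / len(scores_b)
    common = set(scores_a) & set(scores_b)
    # Numerator: only items scored by both users contribute.
    num = sum((scores_a[k] - mean_a) * (scores_b[k] - mean_b) for k in common)
    # Denominator: each factor covers every item scored by that user.
    den_a = math.sqrt(sum((r - mean_a) ** 2 for r in scores_a.values()))
    den_b = math.sqrt(sum((r - mean_b) ** 2 for r in scores_b.values()))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # assumption: a user with constant scores gets similarity 0
    return num / (den_a * den_b)

# Hypothetical users sharing items I1 and I2.
user_a = {"I1": 5, "I2": 3, "I3": 1}
user_b = {"I1": 4, "I2": 2, "I4": 3}

sim_ab = user_similarity(user_a, user_b)
print(sim_ab)
```

Both users have a mean of 3, so the numerator is (5-3)(4-3) + (3-3)(2-3) = 2 and the denominator is √8 × √2 = 4, giving a similarity of about 0.5.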

If the set of users who have scored both item $I_i$ and item $I_j$ is denoted $U_{I_i I_j}$, and $U_{I_i}$ and $U_{I_j}$ respectively denote the sets of users who have scored item $I_i$ and item $I_j$, then the similarity $\mathrm{sim}(I_i, I_j)$ between item $I_i$ and item $I_j$ is expressed as:

$$\mathrm{sim}(I_i, I_j) = \frac{\sum_{k \in U_{I_i I_j}} (R_{k,i} - \overline{R_i})(R_{k,j} - \overline{R_j})}{\sqrt{\sum_{k \in U_{I_i}} (R_{k,i} - \overline{R_i})^2}\,\sqrt{\sum_{k \in U_{I_j}} (R_{k,j} - \overline{R_j})^2}},$$

where $R_{k,i}$ represents the score given to item $I_i$ by user $U_k$, and $\overline{R_i}$ and $\overline{R_j}$ respectively represent the averages of the existing scores for item $I_i$ and item $I_j$.
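The item similarity formula has the same shape as the user formula, applied to columns of the scoring matrix instead of rows. The sketch below assumes the matrix is a list of rows with `None` for unknown scores; the function name `item_similarity`, the zero-denominator fallback, and the sample matrix are illustrative assumptions.

```python
import math

def item_similarity(R, i, j):
    """Modified cosine similarity between columns i and j of R (None = unscored)."""
    col_i = {k: row[i] for k, row in enumerate(R) if row[i] is not None}
    col_j = {k: row[j] for k, row in enumerate(R) if row[j] is not None}
    if not col_i or not col_j:
        return 0.0
    mean_i = sum(col_i.values()) / len(col_i)
    mean_j = sum(col_j.values()) / len(col_j)
    common_users = col_i.keys() & col_j.keys()
    # Numerator: users who scored both items.
    num = sum((col_i[k] - mean_i) * (col_j[k] - mean_j) for k in common_users)
    # Denominator: all users who scored each item.
    den = (math.sqrt(sum((r - mean_i) ** 2 for r in col_i.values()))
           * math.sqrt(sum((r - mean_j) ** 2 for r in col_j.values())))
    return num / den if den else 0.0  # assumption: 0 when undefined

# Hypothetical 3-user x 3-item matrix.
R = [
    [5, 4, None],
    [3, 2, 1],
    [1, None, 5],
]

sim_01 = item_similarity(R, 0, 1)
print(sim_01)
```

Because the computation is symmetric in the two columns, `item_similarity(R, 0, 1)` and `item_similarity(R, 1, 0)` agree, which allows an item similarity matrix to be filled by computing only one triangle.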

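In this embodiment, a similarity influence factor $\varepsilon$ is introduced to correct the conventional similarity measure. When $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

and when $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

Here, $0 \le \varepsilon \le 1$.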

Taking the similarity between users as an example, the values of $\varepsilon$ for the different relationships between $I_{U_a}$ and $I_{U_b}$ are explained below.

When $I_{U_a} = I_{U_b}$, $\varepsilon = 1$: users $U_a$ and $U_b$ have scored exactly the same items, the similarity obtained by the traditional similarity measure fully reflects the similarity between the users, and the corrected user similarity satisfies $\mathrm{sim}'(U_a, U_b) = \mathrm{sim}(U_a, U_b)$.

When $I_{U_a} \cap I_{U_b} = \varnothing$, $\varepsilon = 0$: users $U_a$ and $U_b$ have no scored items in common, the similarity obtained by the traditional similarity measure says nothing about the similarity between the users, and $\mathrm{sim}'(U_a, U_b) = 0$.

When $I_{U_a} \neq I_{U_b}$ and $I_{U_a} \cap I_{U_b} \neq \varnothing$, $0 < \varepsilon < 1$: some of the items scored by users $U_a$ and $U_b$ lie outside the intersection, the influence factor corrects the traditional similarity value according to the proportion of commonly scored items in the users' scored item sets, and the corrected similarity is $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b) < \mathrm{sim}(U_a, U_b)$.
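The three cases can be checked with a small sketch. The expression for ε is not reproduced in this text, so the sketch assumes a Jaccard-style ratio, the number of commonly scored items divided by the number of items scored by either user. That choice is an assumption made only because it reproduces the three cases described above; it is not necessarily the expression used in the patent. The names `influence_factor` and `corrected_similarity` are likewise hypothetical.

```python
def influence_factor(items_a, items_b):
    """Assumed form of epsilon: |intersection| / |union| of the scored item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # assumption: no scored items at all gives epsilon = 0
    return len(a & b) / len(union)

def corrected_similarity(sim, items_a, items_b):
    """sim' = epsilon * sim, with epsilon taken from influence_factor."""
    return influence_factor(items_a, items_b) * sim

# Case 1: identical scored item sets, so epsilon = 1 and sim' = sim.
eps_same = influence_factor({"I1", "I2"}, {"I1", "I2"})
# Case 2: no commonly scored item, so epsilon = 0 and sim' = 0.
eps_disjoint = influence_factor({"I1", "I2"}, {"I3", "I4"})
# Case 3: partial overlap, so 0 < epsilon < 1 and a positive sim is scaled down.
eps_partial = influence_factor({"I1", "I2", "I3"}, {"I2", "I3", "I4"})

print(eps_same, eps_disjoint, eps_partial)
print(corrected_similarity(0.8, {"I1", "I2", "I3"}, {"I2", "I3", "I4"}))
```

In the third case two of the four distinct items are shared, so the assumed factor is 0.5 and a raw similarity of 0.8 is corrected to 0.4.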

Similarly, the similarity between items can be corrected according to the users who have scored the items in common.

Step S3: take a parameter $\lambda$ in the interval $[0, 1]$, and predict the users' scores for the items according to $\lambda$, $\varepsilon$, the average values of the users' scores for the items, the similarity between users, and the similarity between items.

Specifically, this step will predict the user's rating of the item based on the improved similarity metric results, resulting in a corresponding recommendation.

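For example, for an item $I_i$ that user $U_a$ has not viewed, the predicted score of user $U_a$ for item $I_i$ is

$$R_{a,i} = \lambda \times \left( \overline{R_a} + \frac{\sum_{U_x \in U} \left( \mathrm{sim}'(U_a, U_x) \times (R_{x,i} - \overline{R_x}) \right)}{\sum_{U_x \in U} \mathrm{sim}'(U_a, U_x)} \right) + (1 - \lambda) \times \left( \overline{R_i} + \frac{\sum_{I_y \in I} \left( \mathrm{sim}'(I_i, I_y) \times (R_{a,y} - \overline{R_y}) \right)}{\sum_{I_y \in I} \mathrm{sim}'(I_i, I_y)} \right),$$

where $\overline{R_a}$ and $\overline{R_x}$ respectively represent the averages of the existing scores given by users $U_a$ and $U_x$, and $\overline{R_i}$ and $\overline{R_y}$ respectively represent the averages of the existing user scores for items $I_i$ and $I_y$.

In this embodiment, when $\lambda = 0$, $R_{a,i}$ is the score predicted from item similarity, and when $\lambda = 1$, $R_{a,i}$ is the score predicted from user similarity.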
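The prediction of step S3 blends a user-based term and an item-based term with weight λ. The sketch below takes the corrected similarities as already computed and passes each neighbor as a `(sim_prime, score, neighbor_mean)` triple. The function names, the restriction of each sum to neighbors with a known score, the fallback to the plain mean when the similarity sum is zero, and the sample numbers are assumptions of this sketch.

```python
def weighted_deviation(base_mean, neighbors):
    """base_mean + sum(sim' * (score - mean)) / sum(sim') over the given neighbors."""
    neighbors = list(neighbors)
    den = sum(sim for sim, _, _ in neighbors)
    if den == 0:
        return base_mean  # assumption: fall back to the mean when undefined
    num = sum(sim * (score - mean) for sim, score, mean in neighbors)
    return base_mean + num / den

def predict_score(lam, user_mean, user_neighbors, item_mean, item_neighbors):
    """Blend the user-based and item-based predictions with weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    user_part = weighted_deviation(user_mean, user_neighbors)
    item_part = weighted_deviation(item_mean, item_neighbors)
    return lam * user_part + (1.0 - lam) * item_part

# Hypothetical inputs for user U_a and item I_i.
user_mean = 3.0                                        # mean score of U_a
user_neighbors = [(0.5, 4.0, 3.0), (0.25, 2.0, 3.0)]   # other users who scored I_i
item_mean = 4.0                                        # mean score of I_i
item_neighbors = [(0.5, 5.0, 4.0)]                     # other items scored by U_a

user_only = predict_score(1.0, user_mean, user_neighbors, item_mean, item_neighbors)
item_only = predict_score(0.0, user_mean, user_neighbors, item_mean, item_neighbors)
blended = predict_score(0.5, user_mean, user_neighbors, item_mean, item_neighbors)
print(user_only, item_only, blended)
```

With these numbers the user-based term is 3 + 0.25/0.75 ≈ 3.33 and the item-based term is 4 + 0.5/0.5 = 5, so λ = 1 and λ = 0 return those two values and λ = 0.5 returns their average, about 4.17.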

FIG. 3 is a flowchart of score prediction in the collaborative filtering recommendation algorithm provided by the preferred embodiment of the present invention. FIG. 4 is a flowchart of constructing a user or item similarity matrix according to the preferred embodiment of the present invention. FIGS. 3 and 4, read together with FIG. 1, give a fuller picture of the technical solution of the present invention.

In summary, according to the improved similarity measurement method in the collaborative filtering recommendation algorithm provided by the preferred embodiment of the present invention, the similarity between users and the similarity between items are calculated separately, and a similarity influence factor $\varepsilon$ is defined to correct the similarity values of the users and of the items, which improves the way similarity is measured. The users' scores for items are then calculated from the improved similarity, the average values of the users' scores for the items, and the other factors above, so that the measurement error is reduced when the data are extremely sparse and the recommendation quality is improved.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

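1. An improved similarity measurement method in a collaborative filtering recommendation algorithm, characterized by comprising the following steps:

S1, creating a scoring matrix $R(n \times m)$ of the $n$ users in the user set $U = \{U_1, U_2, \ldots, U_n\}$ for the $m$ items in the item set $I = \{I_1, I_2, \ldots, I_m\}$, with $R_{a,i}$ representing the score given by user $U_a$ to item $I_i$, where $U_a \in U$ and $I_i \in I$;

S2, calculating the similarity $\mathrm{sim}(U_a, U_b)$ between users $U_a$ and $U_b$ and the similarity $\mathrm{sim}(I_i, I_j)$ between items $I_i$ and $I_j$, and defining a similarity influence factor $\varepsilon$ such that $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$ and $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$;

S3, taking a parameter $\lambda$ in the interval $[0, 1]$, and predicting the users' scores for the items according to $\lambda$, $\varepsilon$, the average values of the users' scores for the items, the similarity between users, and the similarity between items.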

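2. The method according to claim 1, wherein in step S2,

$$\mathrm{sim}(U_a, U_b) = \frac{\sum_{k \in I_{U_a U_b}} (R_{a,k} - \overline{R_a})(R_{b,k} - \overline{R_b})}{\sqrt{\sum_{k \in I_{U_a}} (R_{a,k} - \overline{R_a})^2}\,\sqrt{\sum_{k \in I_{U_b}} (R_{b,k} - \overline{R_b})^2}},$$

and

$$\mathrm{sim}(I_i, I_j) = \frac{\sum_{k \in U_{I_i I_j}} (R_{k,i} - \overline{R_i})(R_{k,j} - \overline{R_j})}{\sqrt{\sum_{k \in U_{I_i}} (R_{k,i} - \overline{R_i})^2}\,\sqrt{\sum_{k \in U_{I_j}} (R_{k,j} - \overline{R_j})^2}},$$

where $\overline{R_a}$ and $\overline{R_b}$ respectively represent the average scores given by user $U_a$ and user $U_b$ to the items they have scored.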

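3. The method of claim 1, wherein in step S2, when $\mathrm{sim}'(U_a, U_b) = \varepsilon \times \mathrm{sim}(U_a, U_b)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

and when $\mathrm{sim}'(I_i, I_j) = \varepsilon \times \mathrm{sim}(I_i, I_j)$, $\varepsilon$ is given by

[expression for ε missing from the source text]

where $I_{U_a U_b}$ is the set of items scored by both user $U_a$ and user $U_b$, and $I_{U_a}$ and $I_{U_b}$ are the sets of items scored by user $U_a$ and user $U_b$, respectively.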

4. The method of claim 1, wherein $0 \le \varepsilon \le 1$ in step S2.

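5. The method of claim 1, wherein in step S3, the predicted score of user $U_a$ for an unviewed item $I_i$ is

$$R_{a,i} = \lambda \times \left( \overline{R_a} + \frac{\sum_{U_x \in U} \left( \mathrm{sim}'(U_a, U_x) \times (R_{x,i} - \overline{R_x}) \right)}{\sum_{U_x \in U} \mathrm{sim}'(U_a, U_x)} \right) + (1 - \lambda) \times \left( \overline{R_i} + \frac{\sum_{I_y \in I} \left( \mathrm{sim}'(I_i, I_y) \times (R_{a,y} - \overline{R_y}) \right)}{\sum_{I_y \in I} \mathrm{sim}'(I_i, I_y)} \right),$$

where $\overline{R_a}$ and $\overline{R_x}$ respectively represent the averages of the existing scores given by users $U_a$ and $U_x$, and $\overline{R_i}$ and $\overline{R_y}$ respectively represent the averages of the existing user scores for items $I_i$ and $I_y$.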

6. The method of claim 5, wherein when $\lambda = 0$, $R_{a,i}$ is the score predicted from item similarity, and when $\lambda = 1$, $R_{a,i}$ is the score predicted from user similarity.