CN110377841B

CN110377841B - Similarity calculation method and system applied to collaborative filtering method

Info

Publication number: CN110377841B
Application number: CN201910478934.9A
Authority: CN
Inventors: 杨志明
Original assignee: Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Current assignee: Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2022-01-07
Anticipated expiration: 2039-06-04
Also published as: CN110377841A

Abstract

The embodiment of the invention provides a collaborative filtering recommendation method that relies on users' comment information instead of their rating data. Specifically, it improves the similarity calculation step of the conventional collaborative filtering recommendation method. The similarity model is built solely from users' comment information, and at run time the users' comment information is input directly into the model to obtain the similarity result. The similarity used in the collaborative filtering recommendation method can therefore be calculated without any user rating data.

Description

Similarity calculation method and system applied to collaborative filtering method

Technical Field

The invention relates to the technical field of computers, in particular to a similarity calculation method and a similarity calculation system applied to a collaborative filtering recommendation method.

Background

With the rapid development of internet technology, internet service providers offer users personalized recommendations based on user data. Personalized recommendation must present each user with information of interest based on that user's historical preferences and historical data. Collaborative filtering is one method commonly used for this purpose.

Most current collaborative filtering recommendation methods build a recommendation model from users' explicit ratings of items. They then feed a user's rating information into the model and take the model's output as the recommendation information.

The current collaborative filtering recommendation method comprises the following steps:

First step: calculating the similarity between users

There are many ways to calculate the similarity between users. Widely used measures include the Euclidean distance, cosine similarity, the Pearson correlation coefficient and the Jaccard coefficient. All of them except the Jaccard coefficient must be computed from users' ratings of items. The Jaccard coefficient can measure user similarity without ratings because it only considers the items related to each user. Its formula is:

$$\mathrm{Jaccard}(u, v) = \frac{|I_{u,v}|}{|I_u \cup I_v|}, \qquad I_{u,v} = I_u \cap I_v$$

where Jaccard(u, v) represents the similarity between user u and user v; I_u and I_v represent the sets of items related to user u and user v, respectively; and I_{u,v} represents the intersection of the items related to user u and user v.
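A minimal Python sketch of the Jaccard coefficient above follows. It assumes each user's related items are available as a set of item IDs; the function name `jaccard` and the sample IDs are hypothetical. No ratings are involved, only set intersection and union.

```python
def jaccard(items_u: set, items_v: set) -> float:
    """Jaccard(u, v) = |I_u ∩ I_v| / |I_u ∪ I_v|; returns 0.0 when both sets are empty."""
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


# Hypothetical item IDs related to two users (no ratings needed).
I_u = {"a", "b", "c"}
I_v = {"b", "c", "d"}
print(jaccard(I_u, I_v))  # 2 shared items / 4 distinct items -> 0.5
```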

Second step: obtaining the set of K nearest neighbors of the target user

Based on the user similarities calculated in the first step, select the K users most similar to the target user. These form the target user's K-nearest-neighbor set.

Third step: acquiring the potential recommended item set of the target user

The potential recommended item set of the target user is obtained from the target user's K-nearest-neighbor set as follows:
a. Take the union of the items related to all users in the K-nearest-neighbor set.
b. Remove from this union every item already related to the target user.
c. The item set remaining after step b is the potential recommended item set of the target user.

Fourth step: obtaining the recommended item set for the target user
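The second and third steps can be sketched in Python as follows. This is an illustrative sketch, not code from the patent. The helper names `k_nearest_neighbors` and `potential_items` are hypothetical, and similarities are assumed to be precomputed in a dict. Because `heapq.nlargest` behaves like `sorted(..., reverse=True)[:k]`, ties keep the dict's insertion order.

```python
import heapq


def k_nearest_neighbors(similarity: dict, k: int) -> list:
    """Second step: the k users with the largest s_{u,v} to the target user u."""
    return heapq.nlargest(k, similarity, key=similarity.get)


def potential_items(target_items: set, neighbor_item_sets: list) -> set:
    """Third step: (a) union of the neighbors' items, (b) minus the target user's items."""
    union = set().union(*neighbor_item_sets)
    return union - target_items


# Hypothetical similarities to the target user and items related to each user.
sims = {"v1": 0.9, "v2": 0.1, "v3": 0.5}
items = {"v1": {"a", "x"}, "v2": {"y"}, "v3": {"x", "z"}}
knn = k_nearest_neighbors(sims, k=2)
print(knn)  # ['v1', 'v3']
print(sorted(potential_items({"a"}, [items[v] for v in knn])))  # ['x', 'z']
```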

For every item in the potential recommended item set obtained in the third step, calculate the target user's preference using the formula:

$$p_{u,i} = \sum_{v \in U_i \cap U_u} s_{u,v} \, r_{v,i}$$

where p_{u,i} represents the preference of user u for item i; U_i represents the set of users related to item i; U_u represents the K-nearest-neighbor set of user u; s_{u,v} represents the similarity between user u and user v; and r_{v,i} represents user v's rating of item i.
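For contrast with the comment-based approach, the conventional rating-weighted preference of the fourth step can be sketched as below. Ratings are assumed to be stored as a nested dict, and all names are hypothetical. Only neighbors that actually rated item i contribute, which corresponds to the sum over v in U_i ∩ U_u.

```python
def predict_preference(item, knn, similarity, ratings):
    """p_{u,i}: sum of s_{u,v} * r_{v,i} over neighbors v of u who rated item i.

    knn:        the target user's K nearest neighbors (U_u in the text)
    similarity: dict v -> s_{u,v}
    ratings:    dict v -> {item: r_{v,i}}; a user who rated item i is in U_i
    """
    return sum(
        similarity[v] * ratings[v][item]
        for v in knn
        if item in ratings.get(v, {})
    )


knn = ["v1", "v3"]
sims = {"v1": 0.9, "v3": 0.5}
ratings = {"v1": {"x": 4.0}, "v3": {"x": 2.0, "z": 5.0}}
print(predict_preference("x", knn, sims, ratings))  # 0.9*4 + 0.5*2
print(predict_preference("z", knn, sims, ratings))  # 0.5*5
```

This dependence on r_{v,i} is exactly what breaks down when no rating data exist.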

Thus, the entire conventional collaborative filtering process depends on rating data actively provided by users. When such data cannot be obtained, collaborative filtering recommendation cannot be completed. Yet it is increasingly common that the service provider cannot obtain active rating data. For example, a website may offer no explicit rating option for an item and only let users comment on it, like it or add it to their favorites. In such cases the current collaborative filtering recommendation method cannot be used to recommend information to users.

Furthermore, user-user similarity is the calculation basis of the collaborative filtering recommendation method. Apart from the Jaccard similarity coefficient, every way of calculating it requires the users' rating data.

Disclosure of Invention

In view of this, an embodiment of the present invention provides a similarity calculation method applied to a collaborative filtering recommendation method. The method can calculate the similarities used in collaborative filtering without any user rating data.

An embodiment of the invention also provides a similarity calculation system applied to a collaborative filtering recommendation method. The system likewise can calculate the similarities used in collaborative filtering without user rating data.

The embodiment of the invention is realized as follows:

A similarity calculation method applied to a collaborative filtering recommendation method comprises the following steps:

performing user-user or item-item similarity modeling based on the users' comment information;

obtaining the users' comment information and inputting it into the user-user or item-item similarity model to obtain a user-user or item-item similarity result.

The user's comment information includes the set of items related to the user and the acquired characteristic information of the user.

User-user similarity is modeled based on four factors: the users' attention to items, item popularity, the number of non-co-related items and the number of co-related items.

The formula for modeling the similarity between users is as follows:

where us_{u,v} represents the similarity between user u and user v; I_u and I_v represent the sets of items commented on by user u and user v, respectively; I_{u,v} represents the set of items commented on by both user u and user v; α_u > 0 is the user similarity coefficient, set to 1; β_u > 0 is the user Jaccard coefficient, set to 0.5; noc_{u,i} represents the number of comments by user u on item i; noc_{v,i} represents the number of comments by user v on item i; and U_i represents the set of users related to item i.

Item-item similarity is modeled based on four factors: the attention the items receive, the breadth of user interest, the number of non-co-related users and the number of co-related users.

The formula used to model the similarity between items is as follows:

where is_{i,j} represents the similarity between item i and item j;

U_i and U_j represent the sets of users who commented on item i and item j, respectively;

U_{i,j} represents the set of users who commented on both item i and item j;

noc_{u,i} represents the number of comments by user u on item i; noc_{u,j} represents the number of comments by user u on item j;

I_u represents the set of items that user u has commented on;

α_i > 0 is the item similarity coefficient, initially set to 1; β_i > 0 is the item Jaccard coefficient, initially set to 0.5.

α_i and β_i are each updated in real time.

The method further comprises the following steps:

acquiring a set number of nearest-neighbor users based on the user-user similarity results;

acquiring the user's potential recommended item set from the items of those nearest-neighbor users;

inputting the obtained potential recommended item set into a preset recommended item model to obtain the set of items recommended to the user.

The recommended item model is as follows:

where candidateItem_u represents the set of candidate recommended items of target user u, namely the union of I_v over the K nearest neighbors v of user u, minus I_u; I_v represents the set of items related to user v; I_u represents the set of items that user u has commented on;

the acceptance of item i by the set number K of nearest-neighbor users of user u is also represented;

a recommended weight is set for each item in the candidate recommended item set of target user u, and the set number of items with the largest calculated weights are taken as the recommendation result;

the recommended weight of each item is calculated as:

$$p_{u,i} = \mathrm{mus}_{u,i} \cdot \mathrm{recognition}_{u,i} \cdot \mathrm{pml}_{u,i} \cdot \mathrm{ma}_{u,i} \cdot \mathrm{uic}_{u,i} \cdot \mathrm{heat}_i$$

where p_{u,i} represents the preference of user u for item i; uic_{u,i} represents the correlation between user u and item i;

mus_{u,i} represents the maximum user similarity of item i with respect to user u; U_i represents the set of users who commented on item i; s_{u,v} represents the similarity between users u and v;

recognition_{u,i} represents the acceptance of item i by the set number K of nearest-neighbor users of user u; state_{v,i} is a flag indicating whether item i is related to user v;


pml_{u,i} represents the matching level between the tag set of item i and the profile set of user u; F_i represents the attribute set of item i; LUP_u represents the implicit profile set of user u;

ma_{u,i} represents the maximum attention of item i with respect to user u; attention_{v,i} is calculated as:

attention_{u,i} represents the attention of user u to item i, and attention_{v,i} the attention of user v to item i; noc_{u,i} represents the number of comments by user u on item i; k > 0 is the attention coefficient, set to 1;

uic_{u,i} represents the correlation between user u and item i; s_{i,j} represents the similarity between items i and j;

heat_i = noc_i, where heat_i represents the total number of comments received by item i, and noc_i denotes the number of comments on item i.

A similarity calculation system applied to a collaborative filtering recommendation method, based on the above method, comprises a model building module and a processing module, wherein:

the model building module is configured to perform user-user or item-item similarity modeling based on the users' comment information;

the processing module is configured to obtain the users' comment information and input it into the user-user or item-item similarity model to obtain a user-user or item-item similarity result.

As can be seen from the above, the embodiment of the present invention provides a collaborative filtering recommendation method based on users' comment information that does not require their rating data. In particular, it improves the similarity calculation step of the existing collaborative filtering recommendation method. The similarity model is built solely from users' comment information, and in use the users' comment information is input directly to obtain the similarity result. The similarity in the collaborative filtering recommendation method can therefore be calculated without the users' rating data.

Drawings

Fig. 1 is a flowchart of a similarity calculation method applied to a collaborative filtering recommendation method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a similarity calculation system applied to a collaborative filtering recommendation method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the execution process of a collaborative filtering recommendation method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.

As the background art shows, the entire conventional collaborative filtering process requires rating data actively provided by users. When such data are not available, collaborative filtering recommendation cannot be completed. In particular, the conventional method does not specify how user-user similarity should be calculated without the users' rating data.

To overcome these problems, the embodiment of the invention provides a collaborative filtering recommendation method based on users' comment information rather than their rating data. In particular, it improves the similarity calculation step of the existing collaborative filtering recommendation method. The similarity model is built solely from users' comment information, and in use the users' comment information is input directly to obtain the similarity result.

Therefore, the similarity in the collaborative filtering recommendation method can be calculated without the users' rating data.

In the embodiment of the invention, the similarity used in the collaborative filtering recommendation method can be calculated either between users or between items.

In the embodiment of the invention, similarity is calculated from users' comment information rather than their rating data. The comment information includes the set of items related to the user and the acquired characteristic information of the user. The characteristic information of the user may also be called the user's implicit profile, and it may be acquired using Natural Language Processing (NLP) techniques.

In the embodiment of the present invention, the User Profile (UP) is divided into the Explicit User Profile (EUP) and the implicit, or latent, User Profile (LUP: Latent User Profile). In general, a user profile is background information that reflects the user's individual characteristics.

The EUP is information that the user provides directly, such as the user's native place, age, gender and/or taste. It also includes information the user mentions explicitly, such as "I like spicy food" or "I have diabetes". Characteristic information that the user mentions explicitly is extracted from the user's comment information using NLP techniques.

The LUP is profile information that the user never states explicitly. For example, if most of the recipes a user views daily carry a "pregnant woman" tag, it can be inferred that the user or a family member is pregnant, and "pregnant" can be taken as part of the user's LUP.

In the embodiment of the present invention, the LUP is acquired as follows:

1) Acquiring the set of items related to the user:

the form of related items differs across the websites of internet service providers. Depending on the site, the related item set may consist of items the user bought, items in the user's shopping cart, or even items the user browsed.

2) Obtaining the user's implicit user profile:

collect statistics over the information of all these items, such as their attribute information, and take the N1 attributes that occur most frequently as the user's LUP. N1 is an algorithm parameter to be determined from the actual data.
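The two LUP steps can be sketched as a frequency count over item attributes. This is a minimal sketch under stated assumptions:

- Item attributes are assumed to be available as label lists.
- The function name is hypothetical.
- Ties at the N1 cutoff are resolved by `Counter.most_common`, which keeps first-encountered order.

```python
from collections import Counter


def implicit_user_profile(related_items, item_attributes, n1):
    """Return the N1 most frequent attribute labels over the user's related items.

    related_items:   IDs of items related to the user (bought, in cart, browsed, ...)
    item_attributes: dict item ID -> iterable of attribute labels
    n1:              the algorithm parameter N1
    """
    counts = Counter()
    for item in related_items:
        counts.update(item_attributes.get(item, ()))
    return {label for label, _ in counts.most_common(n1)}


# Hypothetical recipe data echoing the "pregnant woman" example above.
attrs = {
    "r1": ["pregnant woman", "soup"],
    "r2": ["pregnant woman", "low sugar"],
    "r3": ["pregnant woman", "soup"],
}
print(implicit_user_profile(["r1", "r2", "r3"], attrs, n1=2))
```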

Fig. 1 is a flowchart of a similarity calculation method applied to a collaborative filtering recommendation method according to an embodiment of the present invention, which includes the following specific steps:

Step 101: performing user-user or item-item similarity modeling based on the users' comment information;

Step 102: obtaining the users' comment information and inputting it into the user-user or item-item similarity model to obtain a user-user or item-item similarity result.

How to model the similarity between users and the similarity between items is described in detail below.

Performing attention modeling

Attention refers to a user's degree of interest in a related item. The number of comments the user made on the item is obtained from the user's comment information, and attention is modeled on that count as follows:

where attention_{u,i} represents the attention of user u to item i; noc_{u,i} represents the number of comments by user u on item i; and k (> 0) is the attention coefficient. The default value of k is 1, and its actual value must be determined from the specific data. The reason is that when k is too small, the effect of the comment count on preference strength is not emphasized enough. When k is too large, the preference strength expressed by a single comment (noc_{u,i} = 1) is severely underweighted.

Performing similarity modeling

Because the similarity results are used in a collaborative filtering recommendation algorithm, a similarity relation between users must be established. Its purpose is to find the set of neighbors whose preferences resemble the target user's, which then drives the collaborative filtering process.

In addition, in the embodiment of the present invention, item-user similarity is also used to weight the target user's candidate recommended item set. Item-user similarity is modeled from the similarity between the target item and each item the target user has commented on, so item-item similarity must also be measured.

In the background art, apart from the Jaccard-based calculation of user similarity, almost every method requires users' ratings of items. The embodiment of the present invention instead models both user-user and item-item similarity more effectively from users' comment information, as described below.

The user-user and item-item similarity calculations share the same structure, each consisting of two parts: a main similarity part and a Jaccard factor part modeled on the Jaccard similarity.

Calculation of similarity between users

User-user similarity (us) is modeled from the users' comment information, specifically the information on the item sets related to the users. This includes details of the items, such as item IDs and the number of comments on each item. The modeling process considers the following four factors:

1) Users' attention to the item: the attention that two users pay to the same item affects their similarity. The more attention two different users both pay to the same item, the more similar they can, to some extent, be considered. The specific form is attention_{u,i} · attention_{v,i}.

2) Popularity of the item: this factor accounts for the effect of item popularity on user similarity, where popularity is measured by the number of users related to the item. The more related users an item has, the more popular it is. Such an item is a favorite of the general public, so it does little to indicate similarity between two particular users. The specific form is:

i.e., us is negatively correlated with |U_i|, where U_i represents the set of users related to item i and k₁ (> 0) is a parameter.

3) Number of non-co-related items: this factor accounts for the effect of the number of items not shared by the two users. Consider all items related to at least one of the two users. The fewer of these items are related to both users (i.e., the more are related to only one of them), the less the users' preferences overlap and the less similar they are. Conversely, the more they overlap, the more similar they are. The specific form is modeled on the Jaccard similarity coefficient, namely |I_{u,v}| / |I_u ∪ I_v|.

4) Number of co-related items: this factor accounts for the effect of the number of items related to both users on their overall similarity. Clearly, two users who share more related items tend to be more similar. This is expressed by summing the per-item user similarity over the co-related items.

The overall model of us is as follows:

Let:

Then:

where us_{u,v} represents the similarity between users u and v; I_u and I_v represent the sets of items commented on by users u and v, respectively; I_{u,v} represents the set of items commented on by both u and v; α_u (> 0) is the user similarity coefficient, with default value 1; and β_u (> 0) is the user Jaccard coefficient, with default value 0.5. The actual values of α_u and β_u must be determined experimentally on real data.

Similarity calculation between articles

Item-item similarity (is) is modeled from the comment information received by the items, including the specific commenting users and the number of items each of those users has commented on. The model captures how this information affects the similarity between items. It takes the same form as the user similarity and considers four factors:

1) Attention received by the items: the attention that the same user pays to two different items affects their similarity. The more attention two different items both receive from the same user, the more similar they can, to some extent, be considered. The specific form is attention_{u,i} · attention_{u,j}.

2) Breadth of user interest: this factor accounts for how broad the interests are of a user who is interested in both items. The more items a user comments on, the broader that user's interests, to some extent. Even if such a user commented on both items, they are just two among many and do not show any particular affinity of the user for them. Such a user therefore contributes little to the similarity of the two items. Conversely, if a user commented on only these two items, that suggests to some extent that the two items are similar in some dimension. The specific form is:

i.e., the item similarity is is negatively correlated with |I_u|, where I_u represents the set of items related to user u and k₂ (> 0) is a parameter.

3) Number of non-co-related users: this factor accounts for the effect of the number of users not shared by the two items. Consider all users related to at least one of the two items. The fewer of these users are related to both items (i.e., the more are related to only one of them), the less the two items' attributes overlap and the less similar the items may be. Conversely, the more they overlap, the more similar they are. The specific form is modeled on the Jaccard similarity coefficient, namely |U_{i,j}| / |U_i ∪ U_j|.

4) Number of co-related users: this factor accounts for the effect of the number of users related to both items on their overall similarity. Clearly, two items that share more related users tend to be more similar. This is expressed by summing the per-user item similarity over the co-related users.

The overall model of is is as follows:

Let:

Then:

where is_{i,j} represents the similarity between items i and j; U_i and U_j represent the sets of users who commented on items i and j, respectively; U_{i,j} represents the set of users who commented on both i and j; α_i (> 0) is the item similarity coefficient, with default value 1; and β_i (> 0) is the item Jaccard coefficient, with default value 0.5. The actual values of α_i and β_i must be determined experimentally on real data.

In the embodiment of the present invention, the user-user similarity model can be applied in the collaborative filtering recommendation method. The remaining steps of that method are performed as described in the background art.

Fig. 2 is a schematic structural diagram of a similarity calculation system applied to a collaborative filtering recommendation method according to an embodiment of the present invention. The system comprises a model building module and a processing module. The model building module performs user-user or item-item similarity modeling based on the users' comment information. The processing module obtains the users' comment information and inputs it into the similarity model to obtain the similarity result.

The user-user similarity results provided by the embodiment of the invention can be applied in a collaborative filtering recommendation method to compute the final recommended item set. Either the recommendation method described in the background art or the recommendation process shown in Fig. 3, described in detail below, may be used.

As shown in fig. 3, the specific steps are executed as follows:

Step 1: acquiring the neighbor set of the target user

The neighbor set of the target user is the set of users who share at least one related item with the target user. It is obtained as follows:

$$\mathrm{neighbor}_u = \Big(\bigcup_{i \in I_u} U_i\Big) \setminus \{u\}$$

where u represents the target user; neighbor_u represents the neighbor set of u; I_u represents the set of items related to u; and U_i represents the set of users related to item i.
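A minimal sketch of step 1 follows. It assumes user-item relations are stored as a dict of item sets (I_u), from which U_i is derived by inversion. It also assumes the target user is excluded from its own neighbor set. Both function names are hypothetical.

```python
def invert(user_items):
    """Build U_i (users related to each item) from I_u (items related to each user)."""
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)
    return item_users


def neighbor_set(target, user_items, item_users):
    """neighbor_u: every user sharing at least one related item with u, excluding u."""
    neighbors = set()
    for item in user_items.get(target, set()):
        neighbors |= item_users.get(item, set())
    neighbors.discard(target)
    return neighbors


I = {"u": {"a", "b"}, "v": {"b", "c"}, "w": {"c"}, "x": {"a"}}
U = invert(I)
print(sorted(neighbor_set("u", I, U)))  # ['v', 'x']
```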

Step 2: calculating the similarity between the target user and every neighbor user

Using the user-user similarity formula, calculate the similarity between each user in neighbor_u and target user u.

Step 3: acquiring the set of K nearest neighbors of the target user

Sort the users in neighbor_u in descending order of the similarities calculated in step 2. Take the first K users as the K-nearest-neighbor set of target user u, denoted by the symbol

where K is an algorithm parameter that must be determined from the specific data set.

Step 4: obtaining the candidate recommended item set of the target user

Based on the K-nearest-neighbor set of target user u, its candidate recommended items are obtained as follows:

where candidateItem_u represents the set of candidate recommended items of target user u, and I_v represents the set of items related to user v.

Step 5: calculating the recommended weights of the candidate items

Initially, all items in candidateItem_u have the same recommended weight, with default value 1. This step assigns each item in candidateItem_u a differentiated weight by reweighting the default weight of 1, as described below.

Step 6: generating the recommended item list of the target user

Sort the items in candidateItem_u in descending order of the recommended weights calculated in step 5. The top N2 items form the recommended item list of the target user.

To recommend items to the target user more effectively, the embodiment of the invention must find the items in candidateItem_u that best match the user's preferences: the stronger the user's preference for an item, the higher its recommended weight. To this end, the embodiment of the present invention designs six factors to measure the user's preference for an item, five basic factors and one extension factor. The five basic factors are maximum user similarity (mus), item acceptance (recognition), portrait matching level (pml), maximum attention (ma) and user-item correlation (uic). The extension factor is item heat (heat).

The concrete modeling scheme of the preference degree is as follows:

$$p_{u,i} = \mathrm{mus}_{u,i} \cdot \mathrm{recognition}_{u,i} \cdot \mathrm{pml}_{u,i} \cdot \mathrm{ma}_{u,i} \cdot \mathrm{uic}_{u,i} \cdot \mathrm{heat}_i \tag{4}$$

where p_{u,i} represents the preference of user u for item i.

The five basic factors and the extension factor are described in detail below.
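Equation (4) and the ranking of step 6 can be sketched as follows. The sketch makes three assumptions:

- The six factor values for each candidate item have already been computed.
- The reweighting in step 5 means multiplying the default weight of 1 by p_{u,i}, so p_{u,i} serves directly as the recommended weight.
- The dict layout and function names are hypothetical.

`math.prod` requires Python 3.8 or later.

```python
import math

# Factor names in the order they appear in equation (4).
FACTORS = ("mus", "recognition", "pml", "ma", "uic", "heat")


def preference(factors: dict) -> float:
    """Equation (4): p_{u,i} as the product of one item's six factor values."""
    return math.prod(factors[name] for name in FACTORS)


def recommend(candidate_factors: dict, n2: int) -> list:
    """Step 6: sort candidateItem_u by p_{u,i}, largest first, and keep the top N2."""
    ranked = sorted(candidate_factors,
                    key=lambda item: preference(candidate_factors[item]),
                    reverse=True)
    return ranked[:n2]


# Hypothetical, precomputed factor values for two candidate items.
cands = {
    "x": {"mus": 0.9, "recognition": 2, "pml": 1, "ma": 0.5, "uic": 0.3, "heat": 10},
    "z": {"mus": 0.5, "recognition": 1, "pml": 2, "ma": 0.8, "uic": 0.6, "heat": 4},
}
print(recommend(cands, n2=1))  # ['x']
```

Because equation (4) is a pure product, any factor equal to 0 drives p_{u,i} to 0 regardless of the other factors. For example, pml_{u,i} is 0 when F_i and LUP_u share no attribute.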

Maximum user similarity

Maximum user similarity (mus) is the largest similarity to the target user among the users related to the recommended item. In other words, the similarity of the most similar such neighbor is used as the weighting factor mus. The reason is that each item in the candidate set may correspond to several neighbor users and hence to several user similarities. mus is calculated as follows:

where mus_{u,i} represents the maximum user similarity of item i with respect to user u; U_i represents the set of users who commented on item i; and s_{u,v} represents the similarity between user u and user v.
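A minimal sketch of mus follows. It assumes, as the where-clause above suggests, that the maximum is taken over the commenting users U_i, with similarities stored in a dict. The function name and the zero default for users without a computed similarity are assumptions of this sketch.

```python
def max_user_similarity(target, item, item_users, similarity):
    """mus_{u,i}: the largest s_{u,v} over the users v in U_i (users who commented on i).

    similarity maps v -> s_{u,v}; users whose similarity to u was never computed
    (for example, users outside neighbor_u) are treated as 0.0 in this sketch.
    """
    return max(
        (similarity.get(v, 0.0) for v in item_users.get(item, set()) if v != target),
        default=0.0,
    )


U = {"x": {"v1", "v3", "w"}}
sims = {"v1": 0.9, "v3": 0.5}
print(max_user_similarity("u", "x", U, sims))  # 0.9
```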

Item acceptance

Each item in candidateItem_u may be related to more than one of the K nearest neighbors: at least one of them, and at most all of them.

Accordingly, the acceptance (recognition) of an item is defined as:

the number of nearest-neighbor users related to the recommended item, i.e., the number of times the item appears while candidateItem_u is being generated. Clearly, an item related to more of the K nearest neighbors should receive a higher recommended weight. recognition is calculated as follows:

where recognition_{u,i} is the acceptance of item i among the K nearest-neighbor users of user u, and state_{v,i} is a status flag indicating whether item i is associated with user v, calculated as follows:

where I_v is the set of items associated with user v.
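Below is a minimal Python sketch of the two quantities. It assumes that recognition_{u,i} is the sum of state_{v,i} over the K nearest neighbors v of user u, and that state_{v,i} is 1 when i ∈ I_v and 0 otherwise. This reading follows the definitions above; all names and data are hypothetical.

```python
def state(item: str, items_of_user: set[str]) -> int:
    """state_{v,i}: 1 if item i is in I_v, otherwise 0."""
    return 1 if item in items_of_user else 0


def recognition(item: str, neighbors: list[str],
                items_by_user: dict[str, set[str]]) -> int:
    """recognition_{u,i}: how many of u's K nearest neighbors are associated with item i."""
    return sum(state(item, items_by_user.get(v, set())) for v in neighbors)


# Hypothetical neighbor list (K = 3) and item sets I_v.
knn = ["v1", "v2", "v3"]
item_sets = {"v1": {"a", "b"}, "v2": {"b"}, "v3": {"c"}}
print(recognition("b", knn, item_sets), recognition("c", knn, item_sets))
```

For any item that actually entered candidateItem_u, the result lies between 1 and K, matching the bounds stated above.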

Portrait matching level

The portrait matching level (pml) is the size of the intersection between the item's attribute set and the latent user portrait (LUP). Items that match the target user's portrait more closely should receive higher recommendation weights; these are the items whose attribute sets share more elements with the latent user portrait. pml is calculated as follows:

where pml_{u,i} is the matching level between the tag set of item i and the portrait set of user u; F_i is the attribute set of item i; and LUP_u is the latent portrait set of user u.
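pml is defined as the size of an intersection, so it maps directly onto a set operation. The Python sketch below assumes that F_i and LUP_u are available as sets of tag strings. The function name and the tags are hypothetical.

```python
def portrait_matching_level(item_attributes: set[str],
                            latent_portrait: set[str]) -> int:
    """pml_{u,i} = |F_i ∩ LUP_u|: tags shared by the item and the user's latent portrait."""
    return len(item_attributes & latent_portrait)


# Hypothetical attribute and portrait tags.
F_i = {"sci-fi", "space", "hardcover"}
LUP_u = {"sci-fi", "space", "comedy"}
print(portrait_matching_level(F_i, LUP_u))
```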

Maximum attention

The maximum attention (ma) is the largest attention paid to the recommended item by the target user's neighbor users. When a user tirelessly comments on an item many times, this shows how interested the user is in it. Such an item better reflects that user's preferences and should therefore receive a higher weight in recommendation. ma is calculated as follows:

where ma_{u,i} is the maximum attention of item i with respect to user u, and attention_{v,i} is given by equation (1).
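Equation (1) for attention_{v,i} lies outside this section. The Python sketch below therefore takes the attention values as precomputed input and shows only the maximization. It assumes the maximum is taken over the target user's nearest neighbors. The names, the dictionary layout and the 0.0 fallback are hypothetical.

```python
def max_attention(item: str, neighbors: list[str],
                  attention: dict[tuple[str, str], float]) -> float:
    """ma_{u,i}: largest attention_{v,i} over the target user's neighbors v.

    attention maps (user, item) to attention_{v,i}; it is assumed to be
    precomputed with equation (1), which this sketch does not reproduce.
    Neighbors without an entry are skipped, and 0.0 is returned when none
    has one (an assumed convention).
    """
    return max(
        (attention[(v, item)] for v in neighbors if (v, item) in attention),
        default=0.0,
    )


# Hypothetical precomputed attention values.
att = {("v1", "a"): 1.2, ("v2", "a"): 2.5, ("v2", "b"): 0.3}
print(max_attention("a", ["v1", "v2", "v3"], att))
```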

User-item correlation

The user-item correlation (uic) models the degree of correlation between the user and the item on the basis of item similarity. It is the average similarity between the target item i and all items associated with the target user u. It is modeled as follows:

where uic_{u,i} is the correlation between user u and item i, and s_{i,j} is the similarity between items i and j.
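Below is a Python sketch of the averaging step. It assumes that item similarities s_{i,j} are precomputed and symmetric (stored under unordered pairs), and that the average runs over every item j in I_u. The function name, the storage layout and the handling of missing pairs are assumptions.

```python
def user_item_correlation(item: str, items_of_user: set[str],
                          item_sim: dict[frozenset[str], float]) -> float:
    """uic_{u,i}: mean of s_{i,j} over the items j in I_u.

    item_sim stores symmetric item similarities keyed by frozenset({i, j}).
    Missing pairs count as 0.0 and an empty I_u yields 0.0; both are
    assumptions of this sketch.
    """
    if not items_of_user:
        return 0.0
    total = sum(item_sim.get(frozenset((item, j)), 0.0) for j in items_of_user)
    return total / len(items_of_user)


# Hypothetical similarities between candidate "x" and the user's items.
s = {frozenset(("x", "a")): 0.6, frozenset(("x", "b")): 0.2}
print(user_item_correlation("x", {"a", "b", "c"}, s))
```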

Expansion factor

Many quantities can reflect the heat (popularity) of an item, such as the number of comments, favorites or clicks it has received. Different systems may choose different heat factors. Here, the total number of comments received by an item is tentatively used:

$$\mathrm{heat}_i = \mathrm{noc}_i \qquad (11)$$

where heat_i is the heat of item i and noc_i is the total number of comments it has received.
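Counting comments per item is a simple tally. The Python sketch below assumes a hypothetical comment log of (user, item) pairs, one per comment, so that noc_i is the sum of noc_{u,i} over all users.

```python
from collections import Counter


def item_heat(comment_log: list[tuple[str, str]]) -> Counter:
    """heat_i = noc_i: total comments per item, summed over all users.

    comment_log is a hypothetical list of (user, item) pairs, one per comment.
    Using clicks or favorites instead only means feeding a different log.
    """
    return Counter(item for _user, item in comment_log)


log = [("u1", "a"), ("u1", "a"), ("u2", "a"), ("u2", "b")]
heat = item_heat(log)
print(heat["a"], heat["b"], heat["c"])
```

`Counter` returns 0 for items never seen. Because p_{u,i} is a product, an item with no comments would therefore receive zero preference under this heat factor.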

In addition, the model in this embodiment provides several parameters so that it can adapt to different data sets:

- the number of latent user portraits N1;
- the attention coefficient k;
- the user and item similarity coefficients α_u and α_i;
- the user and item Jaccard coefficients β_u and β_i;
- the number of nearest-neighbor users K.

The numbers of users, items and item attributes differ between systems, and these parameters let the model fit those differences better.
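The parameters can be grouped into one configuration object. In this Python sketch the field names are hypothetical. The defaults for k, α_u and β_u follow the values stated in the claims (1, 1 and 0.5). N1, K, α_i and β_i have no defaults because this section gives no values for them. The positivity checks are an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelParams:
    """Tunable parameters listed in the description (field names are hypothetical)."""
    n_latent_portraits: int   # N1, number of latent user portraits
    n_neighbors: int          # K, number of nearest-neighbor users
    alpha_i: float            # item similarity coefficient (no value given here)
    beta_i: float             # item Jaccard coefficient (no value given here)
    k: float = 1.0            # attention coefficient, k > 0, set to 1 in claim 4
    alpha_u: float = 1.0      # user similarity coefficient, > 0, set to 1 in claim 1
    beta_u: float = 0.5       # user Jaccard coefficient, set to 0.5 in claim 1

    def __post_init__(self) -> None:
        # Assumed sanity checks; the text only states k > 0 and alpha_u > 0.
        if self.n_latent_portraits < 1 or self.n_neighbors < 1:
            raise ValueError("N1 and K must be positive integers")
        if self.k <= 0 or self.alpha_u <= 0:
            raise ValueError("k and alpha_u must be greater than 0")


# Illustrative values only.
params = ModelParams(n_latent_portraits=20, n_neighbors=30, alpha_i=1.0, beta_i=0.5)
params = replace(params, alpha_i=1.2)  # e.g. a real-time update of alpha_i
```

Claim 2 has α_i and β_i updated in real time. With a frozen dataclass, such an update produces a new object through `dataclasses.replace`, which also re-runs the validation. A mutable class would be an equally valid choice.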

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A similarity calculation method applied to a collaborative filtering recommendation method, characterized by comprising the following steps:

obtaining comment information of users, inputting the comment information into a user-to-user or item-to-item similarity model, and obtaining a similarity result between users or between items;

the comment information of the user includes the set of items associated with the user and the characteristic information of the user;

when the similarity between users is modeled on the basis of the users' attention to items, the popularity of the items, the number of items not associated in common and the number of items associated in common, the following formula is used to model the similarity between users:

where us_{u,v} is the similarity between users u and v; I_u and I_v are the sets of items commented on by users u and v, respectively; I_{u,v} is the set of items commented on by both u and v; α_u > 0 is the user similarity coefficient, set to 1; β_u is the user Jaccard coefficient, set to 0.5; noc_{u,i} is the number of comments by user u on item i; noc_{v,i} is the number of comments by user v on item i; and U_i is the set of users associated with item i;

when the similarity between items is modeled on the basis of the attention paid to the items, the breadth of user interests, the number of users not associated in common and the number of users associated in common, the following formula is used to model the similarity between items:

where is_{i,j} is the similarity between items i and j;

U_i and U_j are the sets of users who have commented on items i and j, respectively;

U_{i,j} is the set of users who have commented on both items i and j;

noc_{u,i} is the number of comments by user u on item i; noc_{u,j} is the number of comments by user u on item j;

I_u is the set of items that user u has commented on;

2. The method of claim 1, wherein α_i and β_i are each updated in real time.

3. The method of claim 1, further comprising:

4. The method of claim 3, wherein the recommended item model is:

setting a recommendation weight for each item comprises calculating:

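$$p_{u,i} = \mathrm{mus}_{u,i} \cdot \mathrm{recognition}_{u,i} \cdot \mathrm{pml}_{u,i} \cdot \mathrm{ma}_{u,i} \cdot \mathrm{uic}_{u,i} \cdot \mathrm{heat}_i$$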

where p_{u,i} is the preference of user u for item i, and uic_{u,i} is the correlation between user u and item i;

mus_{u,i} is the maximum user similarity of item i with respect to user u; U_i is the set of users who have commented on item i; s_{u,v} is the similarity between users u and v;

recognition_{u,i} is the acceptance of item i among the K nearest-neighbor users of user u, K being a set number; state_{v,i} is a status flag indicating whether item i is associated with user v;


pml_{u,i} is the matching level between the tag set of item i and the portrait set of user u; F_i is the attribute set of item i; LUP_u is the latent portrait set of user u;

ma_{u,i} is the maximum attention of item i with respect to user u, and attention_{v,i} is calculated as

where attention_{u,i} is the attention of user u to item i; attention_{v,i} is the attention of user v to item i; noc_{u,i} is the number of comments by user u on item i; and k > 0 is the attention coefficient, set to 1;

uic_{u,i} is the correlation between user u and item i; s_{i,j} is the similarity between items i and j;

noc_i is the number of comments on item i.

5. A similarity calculation system applied to a collaborative filtering recommendation method, based on the method of claim 1, comprising a model building module and a processing module, wherein,