CN110377841B - Similarity calculation method and system applied to collaborative filtering method

Info

Publication number
CN110377841B
Authority
CN
China
Prior art keywords
user
item
similarity
representing
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910478934.9A
Other languages
Chinese (zh)
Other versions
CN110377841A (en
Inventor
杨志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Original Assignee
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd filed Critical Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority to CN201910478934.9A priority Critical patent/CN110377841B/en
Publication of CN110377841A publication Critical patent/CN110377841A/en
Application granted granted Critical
Publication of CN110377841B publication Critical patent/CN110377841B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering

Abstract

The embodiment of the invention provides a collaborative filtering recommendation method that relies on users' comment information and does not require users' scoring data. In particular, it improves the similarity calculation step of the conventional collaborative filtering recommendation method: the similarity model is built only from users' comment information, and in use the users' comment information is input directly to obtain a similarity result. The similarity calculation in the collaborative filtering recommendation method can therefore be carried out without users' scoring data.

Description

Similarity calculation method and system applied to collaborative filtering method
Technical Field
The invention relates to the technical field of computers, in particular to a similarity calculation method and a similarity calculation system applied to a collaborative filtering recommendation method.
Background
With the rapid development of Internet technology, the network side makes personalized recommendations to users according to user data. Personalized recommendation must provide the user with information that interests them, based on the user's historical preferences and historical data. A collaborative filtering recommendation method can be used for this purpose.
Most current collaborative filtering recommendation methods construct a recommendation model based on users' explicit ratings of items, then input a user's rating information into the recommendation model, and finally output recommendation information.
The current collaborative filtering recommendation method comprises the following steps:
First step: calculating the similarity between users
Many methods currently exist for calculating the similarity between users. Widely used ones include the Euclidean distance, cosine similarity, the Pearson correlation coefficient, and the Jaccard coefficient. Most of these must be calculated from users' scores of items. The Jaccard coefficient is the exception: it considers only the items related to each user and can therefore compute user similarity without scores. Its calculation formula is:
$$\mathrm{Jaccard}(u, v) = \frac{|I_{u,v}|}{|I_u \cup I_v|}$$
where Jaccard(u, v) represents the similarity between user u and user v; $I_u$ and $I_v$ represent the sets of items associated with user u and user v, respectively; and $I_{u,v}$ represents the intersection of the items associated with user u and user v.
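As a rough illustration of the Jaccard coefficient described above, the following Python sketch computes it from two users' related-item sets. The function name `jaccard` and the example item IDs are illustrative assumptions and do not come from the patent; returning 0 when both sets are empty is likewise a convention chosen here.

```python
def jaccard(items_u: set, items_v: set) -> float:
    """Jaccard coefficient of two users' related-item sets (illustrative)."""
    union = items_u | items_v
    if not union:
        # Assumption: two users with no related items get similarity 0.
        return 0.0
    return len(items_u & items_v) / len(union)


# Hypothetical item IDs for two users.
I_u = {"a", "b", "c"}
I_v = {"b", "c", "d", "e"}
print(jaccard(I_u, I_v))  # 2 shared items out of 5 distinct items
```

No rating values appear anywhere in the computation, which is why this coefficient can be used when only item associations are known.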
Second step: obtaining the K nearest neighbor user set of the target user
Based on the similarities between users calculated in the first step, the K users with the greatest similarity to the target user are selected.
Third step: obtaining the potential recommended item set of the target user
Based on the K nearest neighbor user set of the target user, the potential recommended item set of the target user is obtained as follows: a. take the union of the related items of all users in the K nearest neighbor user set of the target user; b. delete from the union obtained in a all items related to the target user; c. the item set remaining after step b is the potential recommended item set of the target user.
Fourth step: obtaining the recommended item set for the target user
The preference degree of each item in the potential recommended item set obtained in the third step is calculated with the following formula:
Figure GDA0003337595510000021
where $p_{u,i}$ represents the preference of user u for item i; $U_i$ represents the set of users associated with item i; $U_u$ represents the K nearest neighbor user set of user u; $s_{u,v}$ represents the similarity between user u and user v; and $r_{vi}$ represents user v's rating of item i.
It can be seen that the conventional collaborative filtering recommendation method requires scoring data actively provided by users throughout the whole process; when such scoring data cannot be obtained, collaborative filtering recommendation cannot be completed. However, it is increasingly common that the network side cannot obtain users' active scoring data: for example, it offers no explicit scoring option for an item but only a comment option, or similar options such as like or favorite. In this case the current collaborative filtering recommendation method cannot be used to recommend information to the user.
Furthermore, similarity calculation between users is the calculation basis of the collaborative filtering recommendation method, and apart from the Jaccard similarity coefficient, the other calculation methods all depend on users' scoring data.
Disclosure of Invention
In view of this, the embodiment of the present invention provides a similarity calculation method applied to a collaborative filtering recommendation method, which can realize the calculation of similarities in the collaborative filtering recommendation method on the basis of no user scoring data.
The embodiment of the invention also provides a similarity calculation system applied to the collaborative filtering recommendation method, and the system can realize the calculation of the similarity in the collaborative filtering recommendation method on the basis of not needing the scoring data of the user.
The embodiment of the invention is realized as follows:
a similarity calculation method applied to a collaborative filtering recommendation method comprises the following steps:
based on the comment information of the users, similarity modeling between the users or between the articles is carried out;
and obtaining comment information of the users, and inputting the comment information into a similarity model between the users or between the articles to obtain a similarity result between the users or a similarity result between the articles.
The comment information of the user includes: the item set related to the user and the acquired characteristic information of the user.
The modeling of similarity between users is based on: the user's attention to the item, the item's popularity, the number of non-co-related items, and the number of co-related items.
The formula for modeling the similarity between users is as follows:
Figure GDA0003337595510000022
where $us_{u,v}$ represents the similarity between user u and user v; $I_u$ and $I_v$ represent the sets of items commented on by user u and user v, respectively; $I_{u,v}$ represents the set of items commented on by both user u and user v; $\alpha_u > 0$ is the user similarity coefficient, set to 1; $\beta_u > 0$ is the user Jaccard coefficient, set to 0.5; $noc_{u,i}$ represents the number of comments of user u on item i; $noc_{v,i}$ represents the number of comments of user v on item i; and $U_i$ represents the set of users associated with item i.
The modeling of similarity between items is based on: the attention an item receives, the breadth of user interest, the number of non-co-related users, and the number of co-related users.
The formula adopted for modeling the similarity between the articles is as follows:
Figure GDA0003337595510000031
where $is_{i,j}$ represents the similarity between item i and item j;
$U_i$ and $U_j$ represent the sets of users who have commented on item i and item j, respectively;
$U_{i,j}$ represents the set of users who have commented on both item i and item j;
$noc_{u,i}$ represents the number of comments of user u on item i; $noc_{u,j}$ represents the number of comments of user u on item j;
$I_u$ represents the set of items that user u has commented on;
$\alpha_i > 0$ is the item similarity coefficient, initially set to 1; $\beta_i > 0$ is the item Jaccard coefficient, initially set to 0.5.
$\alpha_i$ and $\beta_i$ are each updated in real time.
The method further comprises the following steps:
acquiring a nearest neighbor user set of the set number of users based on the similarity result between the users;
acquiring a potential recommended item set of the user according to the recommended item set of the nearest neighbor user set of the set number of the users;
and inputting the obtained potential recommended item set of the user into a set recommended item model to obtain an item set recommended for the user.
The recommended article model is as follows:
Figure GDA0003337595510000032
where $candidateItem_u$ represents the candidate recommended item set of target user u; $I_v$ represents the set of items related to user v; $I_u$ represents the set of items that user u has commented on;
Figure GDA0003337595510000033
represents the degree of recognition of item i by the set number K of nearest neighbor users of user u;
setting a recommended weight value for each item in the candidate recommended item set of the target user u, and taking the set number of items with the maximum calculated weight values as recommended item results;
the calculation of setting a recommended weight value for each item includes:
$p_{u,i} = mus_{u,i} \cdot recognition_i \cdot pml_{u,i} \cdot ma_{u,i} \cdot uic_{u,i} \cdot heat_i$
where $p_{u,i}$ represents the preference of user u for item i, and $uic_{u,i}$ represents the correlation between user u and item i;
Figure GDA0003337595510000041
$mus_{u,i}$ represents the maximum user similarity of item i with respect to user u; $U_i$ represents the set of users who have commented on item i; $s_{u,v}$ represents the similarity between users u and v;
Figure GDA0003337595510000042
$recognition_{u,i}$ represents the acceptance of item i by the set number K of nearest neighbor users of user u; $state_{v,i}$ is a status flag indicating whether item i is associated with user v,
Figure GDA0003337595510000043
wherein
Figure GDA0003337595510000044
$pml_{u,i}$ represents the level of matching between the tag set of item i and the portrait set of user u; $F_i$ represents the attribute set of item i; $LUP_u$ represents the implicit portrait set of user u;
Figure GDA0003337595510000045
$ma_{u,i}$ represents the maximum attention of item i relative to user u, where $attention_{v,i}$ is calculated as
Figure GDA0003337595510000046
$attention_{u,i}$ represents the attention of user u to item i; $attention_{v,i}$ represents the attention of user v to item i; $noc_{u,i}$ represents the number of comments of user u on item i; k is the attention coefficient, k > 0, and k is set to 1;
Figure GDA0003337595510000047
where $uic_{u,i}$ represents the correlation between user u and item i; $s_{i,j}$ represents the similarity between items i and j;
$heat_i = noc_i$, where $heat_i$ represents the total number of reviews obtained for item i;
$noc_i$ represents the number of reviews of item i.
A similarity calculation system applied to a collaborative filtering recommendation method, based on the above method, comprises: a model building module and a processing module, wherein,
the model building module is used for carrying out similarity modeling between users or between items based on the comment information of the users;
and the processing module is used for acquiring comment information of the users, inputting the comment information into the similarity model between the users or between the articles, and acquiring a similarity result between the users or a similarity result between the articles.
As can be seen from the above, the embodiment of the present invention provides a collaborative filtering recommendation method that relies on users' comment information and does not require users' scoring data. In particular, it improves the similarity calculation step of the existing collaborative filtering recommendation method: the similarity model is built only from users' comment information, and in use the users' comment information is input directly to obtain a similarity result. The similarity calculation in the collaborative filtering recommendation method can therefore be carried out without users' scoring data.
Drawings
Fig. 1 is a flowchart of a similarity calculation method applied to a collaborative filtering recommendation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a similarity calculation system applied in a collaborative filtering recommendation method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating an execution process of a collaborative filtering recommendation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
It can be seen from the background art that the entire process of the conventional collaborative filtering recommendation method requires the scoring data actively provided by the user to participate, and when the scoring data of the user is not obtained, the collaborative filtering recommendation cannot be completed, wherein the calculation of the similarity between the users in the conventional collaborative filtering recommendation method does not specifically indicate how to perform the calculation based on the scoring data of the user.
To overcome these problems, the embodiment of the invention provides a collaborative filtering recommendation method that relies on users' comment information and does not require users' scoring data. In particular, it improves the similarity calculation step of the existing collaborative filtering recommendation method: the similarity model is built only from users' comment information, and in use the users' comment information is input directly to obtain the similarity result.
Therefore, the calculation of the similarity in the collaborative filtering recommendation method can be realized on the basis of no need of the scoring data of the user.
In the embodiment of the invention, the calculation of the similarity part in the collaborative filtering recommendation method can be performed according to the similarity between users and also according to the similarity between articles.
The calculation of the similarity in the embodiment of the invention is based on user comment information rather than users' scoring data. The user comment information comprises: the item set related to the user and the acquired characteristic information of the user. The characteristic information of the user may also be referred to as an implicit portrait of the user, and may be acquired using natural language processing (NLP) techniques.
In the embodiment of the present invention, the User Profile (UP) can be divided into an Explicit User Profile (EUP) and an implicit, or latent, user profile (LUP: Latent User Profile). Generally, the user portrait refers to background information that reflects the user's individual characteristics. The EUP in the embodiment of the present invention refers to information provided by the user, such as the user's hometown, age, sex, and/or taste, or information explicitly mentioned by the user, such as "I like to eat spicy food" or "I have diabetes". Characteristic information explicitly mentioned by the user is obtained from the user's comment information using NLP techniques. The LUP refers to UP that the user does not explicitly mention. For example, if most of a user's daily recipes carry a "pregnant woman" label, it is presumed that the user or a member of the user's family is pregnant, and the "pregnant" label can be determined as the user's LUP.
In the embodiment of the present invention, the specific manner of acquiring the LUP includes:
1) acquiring a set of items related to a user:
The form of the related items differs between websites provided by the Internet network side. On such websites, the related item set may consist of items the user has bought, items in the user's shopping cart, or even items the user has browsed.
2) Obtaining an implicit user profile of a user
The information of all related items, such as item attribute information, is counted, and the N1 attributes that occur most often are determined as the LUP of the user, where N1 is an algorithm parameter that needs to be determined according to actual data.
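A minimal sketch of the counting step in 2), assuming each item carries a list of attribute tags. The function name `latent_user_profile`, the `item_attributes` mapping, and the sample recipe data are hypothetical and serve only as illustration; the patent does not prescribe a data layout.

```python
from collections import Counter


def latent_user_profile(related_items, item_attributes, n1):
    """Return the n1 most frequent attributes across a user's related items."""
    counts = Counter()
    for item in related_items:
        # Items with no known attributes contribute nothing.
        counts.update(item_attributes.get(item, ()))
    return [attr for attr, _ in counts.most_common(n1)]


# Hypothetical data: recipe IDs mapped to their attribute tags.
item_attributes = {
    "r1": ["pregnancy", "low-salt"],
    "r2": ["pregnancy", "soup"],
    "r3": ["pregnancy", "low-salt"],
}
print(latent_user_profile(["r1", "r2", "r3"], item_attributes, n1=2))
```

How ties between equally frequent attributes should be broken is not stated in the text; this sketch simply follows the order returned by `Counter.most_common`.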
Fig. 1 is a flowchart of a similarity calculation method applied to a collaborative filtering recommendation method according to an embodiment of the present invention, which includes the following specific steps:
Step 101, based on comment information of users, carrying out similarity modeling between users or between items;
Step 102, obtaining comment information of the users, inputting the comment information into the similarity model between the users or between the items, and obtaining a similarity result between the users or a similarity result between the items.
How to model the similarity between users and the similarity between items is described in detail below.
Performing attention modeling
The attention (attention) refers to the attention of a user to a related article, the number of comments of the user to the article is obtained from the comment information of the user, and the attention is modeled based on the number of comments of the user to the article, wherein the specific modeling form is as follows:
Figure GDA0003337595510000061
where $attention_{u,i}$ represents the attention of user u to item i; $noc_{u,i}$ represents the number of comments of user u on item i; k (> 0) is the attention coefficient, with a default value of 1. The actual value needs to be determined according to the specific data, because when k is small the influence of the number of comments on the preference intensity cannot be highlighted, whereas when k is large the preference intensity of a single comment (number of comments equal to 1) is seriously underestimated.
Performing similarity modeling
Because the result obtained by the similarity calculation is applied to the collaborative filtering recommendation algorithm, a similarity relation between users needs to be established, and the purpose is to find a neighbor set with similarity preference with a target user, and further complete the collaborative filtering process.
In addition, in the embodiment of the present invention, similarity between the object and the user is also used to weight the candidate recommended object set of the target user, and the similarity between the object and the user is modeled based on the similarity between the target object and each object that is reviewed by the target user, so that the similarity between the objects needs to be measured.
In the background art, apart from the Jaccard-based similarity calculation between users, almost all other methods require users' scoring data for items. The embodiment of the present invention instead models the similarity between users and the similarity between items based on users' comment information, as described separately below.
The similarity calculation between users and the similarity calculation between items have the same structure, and both consist of two parts: a similarity body part and a Jaccard factor part based on Jaccard similarity modeling.
Calculation of similarity between users
Based on users' comment information, specifically the information on each user's related item set (such as the specific item IDs and the number of comments on each item), the similarity between users (us, user similarity) is modeled. The modeling takes the following four factors into account:
1) User attention to the item: this factor considers that two users' attention to the same item affects the similarity between them. The higher the attention that two different users both pay to the same item, the more similar the two users are, to a certain extent. The specific form is: $attention_{u,i} \cdot attention_{v,i}$
2) Popularity of the item: this factor considers the impact of item popularity on user similarity, where popularity is characterized by the number of users associated with the item. The more related users an item has, the more popular it is; in other words, the item is liked by the general public and therefore cannot highlight the similarity between two users. The specific form is:
Figure GDA0003337595510000071
i.e., us is negatively correlated with $|U_i|$, where $U_i$ represents the set of users associated with item i and $k_1$ (> 0) is a parameter.
3) Number of non-co-related items: this factor considers the effect of the number of non-co-related items on user similarity. Among all items related to at least one of the two users, the fewer items are related to both users (i.e., the more items are related to only one of them), the smaller the overlap of the two users' preferences and the smaller their similarity; conversely, the similarity is larger. The specific form is based on the Jaccard similarity coefficient, namely $|I_{u,v}| / |I_u \cup I_v|$.
4) Number of co-related items: this factor considers the influence of the number of items related to both users on the overall user similarity. Two users with more co-related items can be expected to have higher similarity; this is embodied by summing the single-item user similarities over the co-related items.
The overall modeling form of us is as follows:
Figure GDA0003337595510000072
Let
Figure GDA0003337595510000073
Then:
Figure GDA0003337595510000081
where $us_{u,v}$ represents the similarity between users u and v; $I_u$ and $I_v$ represent the sets of items commented on by users u and v, respectively; $I_{u,v}$ represents the set of items commented on by both users u and v; $\alpha_u$ (> 0) is the user similarity coefficient, with a default value of 1; $\beta_u$ (> 0) is the user Jaccard coefficient, with a default value of 0.5. The actual values of $\alpha_u$ and $\beta_u$ need to be determined from experimental results on actual data.
Similarity calculation between articles
In calculating the similarity between items, the similarity between items (is, item similarity) is modeled based on the comment information the items have received, including the specific commenting users, the number of items commented on by each commenting user, and so on. The modeling form is the same as that of user similarity, and four factors are considered:
1) Attention received by the item: this factor considers how the attention that two different items receive from the same user affects item similarity. The higher the attention that two different items both receive from the same user, the more similar the two items are, to a certain extent. The specific form is $attention_{u,i} \cdot attention_{u,j}$
2) Breadth of user interest: this factor considers how the breadth of interest of a user who is related to both items affects item similarity. The more items a user has commented on, the broader the user's interest is, to a certain extent. If such a user has commented on two particular items, those are only two among the many items the user has commented on, which does not indicate that the user has any particular feeling about them; in other words, that user cannot highlight the similarity between the two items. Conversely, if a user has commented on only those two items, this suggests to some extent that the two items are similar in some dimension. The specific form is:
Figure GDA0003337595510000082
i.e., is (item similarity) is negatively correlated with $|I_u|$, where $I_u$ represents the set of items related to user u and $k_2$ (> 0) is a parameter.
3) Number of non-co-related users: this factor considers the influence of the number of non-co-related users on item similarity. Among all users related to at least one of the two items, the fewer users are related to both items (i.e., the more users are related to only one of them), the smaller the overlap of the two items' attributes and the smaller their similarity; conversely, the similarity is larger. The specific form is based on the Jaccard similarity coefficient, namely $|U_{i,j}| / |U_i \cup U_j|$.
4) Number of co-related users: this factor considers the influence of the number of users related to both items on the overall item similarity. Two items with more co-related users can be expected to have higher similarity; this is embodied by summing the single-user item similarities over the co-related users.
The overall modeling form of is as follows:
Figure GDA0003337595510000083
Figure GDA0003337595510000091
Let
Figure GDA0003337595510000092
Then:
Figure GDA0003337595510000093
where $is_{i,j}$ represents the similarity between items i and j; $U_i$ and $U_j$ represent the sets of users who have commented on items i and j, respectively; $U_{i,j}$ represents the set of users who have commented on both items i and j; $\alpha_i$ (> 0) is the item similarity coefficient, with a default value of 1; $\beta_i$ (> 0) is the item Jaccard coefficient, with a default value of 0.5. The actual values of $\alpha_i$ and $\beta_i$ need to be determined from experimental results on actual data.
In the embodiment of the present invention, the similarity calculation model between the users may be applied to the collaborative filtering recommendation method, and other steps in the collaborative filtering recommendation method are performed by using the steps in the background art.
Fig. 2 is a schematic structural diagram of a similarity calculation system applied in a collaborative filtering recommendation method according to an embodiment of the present invention, including: a model building module and a processing module, wherein,
the model building module is used for carrying out similarity modeling between users or between items based on the comment information of the users;
and the processing module is used for acquiring comment information of the users, inputting the comment information into the similarity model between the users or between the articles, and acquiring a similarity result between the users or a similarity result between the articles.
The similarity result between the users provided by the embodiment of the invention can be applied to a collaborative filtering recommendation method, and a final recommended item set is obtained through calculation. At this time, the recommendation method provided in the background art may be adopted, or the recommendation execution process shown in fig. 3 may be adopted, which will be described in detail below.
As shown in fig. 3, the specific steps are executed as follows:
Step 1, acquiring a neighbor set of a target user
The neighbor set of the target user refers to a set of users having common related items with the target user. The specific acquisition mode is as follows:
Figure GDA0003337595510000094
where u represents the target user; $neighbor_u$ represents the neighbor set of u; $I_u$ represents the set of items associated with u; and $U_i$ represents the set of users associated with item i.
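Step 1 can be sketched in Python as follows, using an inverted index from items to users. The names `neighbor_set`, `user_items`, and `item_users` and the sample data are hypothetical, and removing the target user from the result is an assumption made here rather than something the text states.

```python
def neighbor_set(target, user_items, item_users):
    """Users sharing at least one related item with `target` (illustrative)."""
    neighbors = set()
    for item in user_items.get(target, set()):
        neighbors |= item_users.get(item, set())
    # Assumption: the target user is not counted as their own neighbor.
    neighbors.discard(target)
    return neighbors


user_items = {"u": {"i1", "i2"}, "v": {"i2", "i3"}, "w": {"i3"}, "x": {"i1"}}
# Inverted index: item -> users associated with it.
item_users = {}
for user, items in user_items.items():
    for item in items:
        item_users.setdefault(item, set()).add(user)

print(sorted(neighbor_set("u", user_items, item_users)))
```

Restricting later similarity calculations to this set avoids comparing the target user with users who share no items at all.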
Step 2, calculating the similarity between the target user and all the neighbor users
Using the similarity calculation formula between users, the similarity between each user in $neighbor_u$ and the target user u is calculated.
Step 3, acquiring a set number K nearest neighbor user set of target users
According to the user similarities calculated in step 2, the users in $neighbor_u$ are sorted in descending order of similarity, and the first K users with the greatest similarity are determined as the K nearest neighbor user set of the target user u, which is represented by the symbol
Figure GDA0003337595510000095
Where K is an algorithm parameter that needs to be determined based on a specific data set.
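The selection in step 3 amounts to a top-K query over the similarities from step 2. A minimal Python sketch, assuming the similarities are already available as a dictionary; the name `k_nearest_neighbors` and the sample values are hypothetical.

```python
import heapq


def k_nearest_neighbors(similarities, k):
    """Return the k users with the largest similarity to the target user."""
    # heapq.nlargest orders the pairs by similarity value, largest first.
    top = heapq.nlargest(k, similarities.items(), key=lambda pair: pair[1])
    return [user for user, _ in top]


# Hypothetical similarities between a target user and its neighbors.
sims = {"v": 0.42, "w": 0.10, "x": 0.77, "y": 0.35}
print(k_nearest_neighbors(sims, k=2))
```

If fewer than K neighbors exist, this sketch simply returns all of them.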
Step 4, obtaining a candidate recommended item set of the target user
The candidate recommended items of the target user u are obtained based on the K nearest neighbor user set
Figure GDA0003337595510000101
The acquisition mode is as follows:
Figure GDA0003337595510000102
where $candidateItem_u$ represents the candidate recommended item set of target user u, and $I_v$ represents the set of items associated with user v.
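A sketch of step 4 under the reading given in the background section, namely that the candidate set is the union of the nearest neighbors' items with the target user's own items removed. The function name `candidate_items` and the sample data are hypothetical.

```python
def candidate_items(target, nearest_neighbors, user_items):
    """Union of the neighbors' items minus the target user's own items."""
    pool = set()
    for neighbor in nearest_neighbors:
        pool |= user_items.get(neighbor, set())
    return pool - user_items.get(target, set())


# Hypothetical related-item sets.
user_items = {"u": {"i1", "i2"}, "v": {"i2", "i3"}, "x": {"i1", "i4"}}
print(sorted(candidate_items("u", ["v", "x"], user_items)))
```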
Step 5, calculating the recommended weight of the candidate recommended item
Initially, all items in candidateItem_u have equal recommended weights, with a default value of 1. This step assigns a differentiated weight to each item in candidateItem_u, that is, the default weight of 1 is reweighted; the specific weighting method is described below.
Step 6, generating a recommended article list of the target user
According to the recommended weights calculated in step 5, the items in candidateItem_u are sorted in descending order of weight, and the N2 items with the greatest weight are determined as the recommended item list of the target user.
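The ranking in step 6 can be illustrated with the short Python sketch below. It assumes the recommended weights of step 5 are held in a dictionary from item to weight; the name `recommend` and the sample weights are hypothetical.

```python
def recommend(weights, n2):
    """Return the n2 items with the greatest recommended weight, heaviest first.

    weights: dict mapping candidate item -> recommended weight from step 5
    """
    ranked = sorted(weights.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:n2]]


weights = {"a": 0.3, "b": 1.2, "c": 0.7}
print(recommend(weights, 2))
```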
In order to better recommend items to the target user, the embodiment of the invention needs to find, in candidateItem_u, the items that best match the user's preferences: the higher the user's preference for an item, the higher the recommended weight of that item. To this end, the embodiment of the present invention designs 6 factors to measure the preference of the user for an item, including 5 basic factors and 1 expansion factor. The 5 basic factors are: maximum user similarity (mus), item acceptance (recognition), portrait matching level (pml), maximum attention (ma) and user-item correlation (uic); the 1 expansion factor is the item heat (heat).
The concrete modeling scheme of the preference degree is as follows:
p_{u,i} = mus_{u,i} · recognition_{u,i} · pml_{u,i} · ma_{u,i} · uic_{u,i} · heat_i    (4)
wherein p_{u,i} represents the preference of user u for item i.
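Formula (4) is a plain product of the six factors, which a short Python sketch can make concrete. The dictionary keys and the sample factor values below are hypothetical and serve only to show the arithmetic.

```python
from math import prod

FACTOR_NAMES = ("mus", "recognition", "pml", "ma", "uic", "heat")


def preference(factors):
    """Multiply the six factors of formula (4) for one (user, item) pair.

    factors: dict with one numeric value per name in FACTOR_NAMES
    """
    return prod(factors[name] for name in FACTOR_NAMES)


factors = {"mus": 0.5, "recognition": 2, "pml": 3, "ma": 1.0, "uic": 0.5, "heat": 4}
print(preference(factors))
```

Because the factors are multiplied, a single factor equal to zero makes the whole preference zero, whatever the other factors are.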
The 5 basic factors and the 1 expansion factor are described in detail below.
Maximum user similarity
Maximum user similarity (mus) refers to the maximum of the similarities between the target user and the users associated with the recommended item. The reason is that each item in the candidate item set may correspond to multiple neighbor users, and hence to multiple user similarities; the similarity of the neighbor most similar to the target user is used as the final mus weighting factor. The specific calculation formula of mus is as follows:
Figure GDA0003337595510000103
wherein mus_{u,i} represents the maximum user similarity of item i with respect to user u; U_i represents the set of review users of item i; s_{u,v} represents the similarity between user u and user v.
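A possible Python reading of mus is sketched below. It assumes the maximum is taken over the review users of item i for which a similarity to u is known, and that 0.0 is returned when there is none; both choices, like the names used, are assumptions of this sketch.

```python
def max_user_similarity(u, i, item_users, sim):
    """Return the greatest s_{u,v} over the review users v of item i.

    item_users: dict mapping item -> set of review users (U_i)
    sim: dict mapping (u, v) -> similarity s_{u,v}
    """
    values = [
        sim[(u, v)]
        for v in item_users.get(i, set())
        if v != u and (u, v) in sim
    ]
    return max(values, default=0.0)  # assumption: 0.0 when no similarity is known


item_users = {"a": {"u", "v", "x"}}
sim = {("u", "v"): 0.4, ("u", "x"): 0.7}
print(max_user_similarity("u", "a", item_users, sim))
```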
Item acceptance
For the items in candidateItem_u, each item may be associated with more than one of the K nearest neighbor users: it is associated with at least one user and at most with all users in
Figure GDA0003337595510000111
Accordingly, the acceptance (recognition) of an item refers to the number of users in
Figure GDA0003337595510000112
who are associated with the recommended item, i.e., the number of times the item appears repeatedly while candidateItem_u is generated. Clearly, an item associated with more of the K nearest neighbor users should obtain a higher recommended weight. The specific calculation formula of recognition is as follows:
Figure GDA0003337595510000113
wherein recognition_{u,i} represents the acceptance of item i by the K nearest neighbor users of user u; state_{v,i} is a state flag indicating whether item i is associated with user v, and is calculated as follows:
Figure GDA0003337595510000114
wherein I_v represents the set of items associated with user v.
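The acceptance can be illustrated with the Python sketch below, which assumes that state_{v,i} is 1 when item i belongs to I_v and 0 otherwise, and that recognition_{u,i} sums this flag over the K nearest neighbors of u. The function and variable names are hypothetical.

```python
def state(v, i, user_items):
    """State flag: 1 if item i is associated with user v, else 0."""
    return 1 if i in user_items.get(v, set()) else 0


def recognition(i, knn_users, user_items):
    """Count how many of the K nearest neighbors are associated with item i."""
    return sum(state(v, i, user_items) for v in knn_users)


user_items = {"v": {"a", "b"}, "x": {"b"}, "w": {"c"}}
print(recognition("b", ["v", "x", "w"], user_items))
```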
Portrait matching level
The portrait matching level (pml) refers to the size of the intersection of the attribute set of an item and the latent user portrait (LUP). Clearly, the more closely an item matches the target user's portrait (that is, the larger this intersection), the higher the recommended weight it should receive. The specific calculation formula of pml is as follows:
Figure GDA0003337595510000115
wherein pml_{u,i} represents the level of matching between the tag set of item i and the portrait set of user u; F_i represents the attribute set of item i; LUP_u represents the latent portrait set of user u.
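Read as the size of a set intersection, pml can be sketched in a few lines of Python. The attribute values below are invented for illustration, and the name `portrait_matching_level` is hypothetical.

```python
def portrait_matching_level(item_attributes, latent_portrait):
    """Return |F_i ∩ LUP_u|, the number of attributes shared by item and portrait."""
    return len(set(item_attributes) & set(latent_portrait))


item_attributes = {"sci-fi", "action", "1999"}   # F_i, invented sample
latent_portrait = {"sci-fi", "drama", "action"}  # LUP_u, invented sample
print(portrait_matching_level(item_attributes, latent_portrait))
```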
Maximum degree of attention
Maximum attention (ma) refers to the maximum value of the attention paid to the recommended item by the neighbor users of the target user. When a user comments on an item many times, this shows how much attention the user pays to the item, and such an item in turn better matches the user's preferences; it should therefore receive a higher weight during recommendation. The specific calculation formula of ma is as follows:
Figure GDA0003337595510000116
wherein ma_{u,i} represents the maximum attention of item i relative to user u; for attention_{v,i}, see equation (1).
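The following Python sketch shows one way to take this maximum. It does not reproduce equation (1): the attention values are assumed to be precomputed and stored in a dictionary, and a neighbor without a stored value is treated as having attention 0.0. These choices and all names are assumptions of the sketch.

```python
def max_attention(i, neighbor_users, attention):
    """Return the greatest attention_{v,i} over the given neighbor users v.

    attention: dict mapping (v, i) -> attention_{v,i}, assumed precomputed
    by equation (1), which this sketch does not implement.
    """
    return max(
        (attention.get((v, i), 0.0) for v in neighbor_users),
        default=0.0,
    )


attention = {("v", "a"): 0.3, ("x", "a"): 0.8}
print(max_attention("a", ["v", "x", "w"], attention))
```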
User-item correlation
The correlation (uic) here refers to the degree of correlation between the user and the item, modeled on the basis of item similarity, that is, the average of the similarities between the target item i and all items associated with the target user u. The specific modeling scheme is as follows:
Figure GDA0003337595510000117
wherein uic_{u,i} represents the correlation between user u and item i; s_{i,j} represents the similarity between item i and item j.
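Taking the description literally, uic is the mean of s_{i,j} over the items j associated with u, which the Python sketch below computes. Treating a missing similarity as 0.0 and returning 0.0 for a user without items are assumptions of this sketch, as are the names used.

```python
def user_item_correlation(u, i, user_items, item_sim):
    """Average the similarity between item i and every item associated with u.

    user_items: dict mapping user -> set of associated items (I_u)
    item_sim: dict mapping (i, j) -> item similarity s_{i,j}
    """
    items = user_items.get(u, set())
    if not items:
        return 0.0  # assumption: no associated items means no correlation
    total = sum(item_sim.get((i, j), 0.0) for j in items)
    return total / len(items)


user_items = {"u": {"a", "b"}}
item_sim = {("c", "a"): 0.5, ("c", "b"): 0.25}
print(user_item_correlation("u", "c", user_items, item_sim))
```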
Additional weighting factors
Many factors can reflect the heat (heat) of an item, such as the number of reviews, the number of collections and the number of clicks the item has obtained. Different systems may select different heat factors; here, the total number of reviews obtained by an item is tentatively used.
heat_i = noc_i    (11)
wherein heat_i represents the heat of item i, and noc_i represents the total number of reviews obtained by item i.
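As a rough Python illustration, the heat of every item can be accumulated from per-user review counts. The sketch assumes that noc_i is the sum of noc_{u,i} over all users u, which the description does not state explicitly; the name `item_heat` and the sample counts are hypothetical.

```python
from collections import Counter


def item_heat(review_counts):
    """Return a dict mapping item -> total number of reviews (heat_i = noc_i).

    review_counts: dict mapping (user, item) -> number of reviews noc_{u,i}
    """
    heat = Counter()
    for (_, item), count in review_counts.items():
        heat[item] += count  # assumption: noc_i sums noc_{u,i} over users
    return dict(heat)


review_counts = {("u", "a"): 2, ("v", "a"): 1, ("v", "b"): 4}
print(item_heat(review_counts))
```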
In addition, in order to adapt to different data, the model provided in the embodiment of the present invention sets a number of parameters, namely: the number of latent user portraits N1, the attention coefficient k, the user and item similarity coefficients α_u and α_i, the user and item Jaccard coefficients β_u and β_i, and the number of nearest neighbor users K. The scale of users, the scale of items and the scale of item attributes differ from system to system, and these parameters are set so that the model can better fit such differences.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A similarity calculation method applied to a collaborative filtering recommendation method is characterized by comprising the following steps:
based on the comment information of the users, similarity modeling between the users or between the articles is carried out;
obtaining comment information of users, inputting the comment information into a similarity model between users or between articles, and obtaining a similarity result between the users or a similarity result between the articles;
the comment information of the user includes: the set of items associated with the user and the feature information of the user;
when the similarity modeling between users is based on the user's attention to items, the popularity of the items, the number of non-commonly related items and the number of commonly related items, the formula adopted for the similarity modeling between users is as follows:
Figure FDA0003337595500000011
wherein us_{u,v} represents the similarity between user u and user v; I_u and I_v respectively represent the sets of items commented on by user u and user v; I_{u,v} represents the set of items commented on by both user u and user v; α_u > 0 is the user similarity coefficient, set to 1; β_u > 0 is the user Jaccard coefficient, set to 0.5; noc_{u,i} represents the number of comments of user u on item i; noc_{v,i} represents the number of comments of user v on item i; U_i represents the set of users associated with item i;
when the similarity modeling between items is based on the attention received by the items, the breadth of user interests, the number of non-commonly related users and the number of commonly related users, the formula adopted for the similarity modeling between items is as follows:
Figure FDA0003337595500000012
wherein is_{i,j} represents the similarity between item i and item j;
U_i and U_j respectively represent the sets of commenting users of item i and item j;
U_{i,j} represents the set of users who commented on both item i and item j;
noc_{u,i} represents the number of comments of user u on item i; noc_{u,j} represents the number of comments of user u on item j;
I_u represents the set of items that user u has commented on;
α_i > 0 is the item similarity coefficient, initially set to 1; β_i > 0 is the item Jaccard coefficient, initially set to 0.5.
2. The method of claim 1, wherein α_i and β_i are each updated in real time.
3. The method of claim 1, further comprising:
acquiring, based on the similarity result between the users, a set of a set number of nearest neighbor users of the user;
acquiring a potential recommended item set of the user according to the recommended item sets of the nearest neighbor users in that set;
and inputting the obtained potential recommended item set of the user into a set recommended item model to obtain an item set recommended for the user.
4. The method of claim 3, wherein the recommended item model is:
Figure FDA0003337595500000021
wherein candidateItem_u represents the set of candidate recommended items of the target user u; I_v represents the set of items associated with user v; I_u represents the set of items that user u has commented on;
Figure FDA0003337595500000022
represents the acceptance of item i by the set number K of nearest neighbor users of user u;
setting a recommended weight value for each item in the candidate recommended item set of the target user u, and taking the set number of items with the maximum calculated weight values as recommended item results;
the calculation of setting a recommended weight value for each item includes:
p_{u,i} = mus_{u,i} · recognition_{u,i} · pml_{u,i} · ma_{u,i} · uic_{u,i} · heat_i
wherein p_{u,i} represents the preference of user u for item i; uic_{u,i} represents the correlation between user u and item i;
Figure FDA0003337595500000023
mus_{u,i} represents the maximum user similarity of item i with respect to user u, U_i represents the set of commenting users of item i, and s_{u,v} represents the similarity between users u and v;
Figure FDA0003337595500000024
recognition_{u,i} represents the acceptance of item i by the set number K of nearest neighbor users of user u, and state_{v,i} is a state flag indicating whether item i is associated with user v,
Figure FDA0003337595500000025
wherein
Figure FDA0003337595500000026
pml_{u,i} represents the level of matching between the tag set of item i and the portrait set of user u, F_i represents the attribute set of item i, and LUP_u represents the latent portrait set of user u;
Figure FDA0003337595500000027
ma_{u,i} represents the maximum attention of item i relative to user u, and attention_{v,i} is calculated as
Figure FDA0003337595500000028
attention_{u,i} represents the attention of user u to item i, attention_{v,i} represents the attention of user v to item i, and noc_{u,i} represents the number of comments of user u on item i; k is the attention coefficient, k > 0, and k is set to 1;
Figure FDA0003337595500000029
wherein uic_{u,i} represents the correlation between user u and item i; s_{i,j} represents the similarity between items i and j;
heat_i = noc_i, wherein heat_i represents the total number of reviews obtained for item i;
noc_i represents the number of reviews of item i.
5. A system for calculating similarity applied in a collaborative filtering recommendation method based on the method of claim 1, comprising: a model building module and a processing module, wherein,
the model building module is used for carrying out similarity modeling between users or between items based on the comment information of the users;
and the processing module is used for acquiring comment information of the users, inputting the comment information into the similarity model between the users or between the articles, and acquiring a similarity result between the users or a similarity result between the articles.
CN201910478934.9A 2019-06-04 2019-06-04 Similarity calculation method and system applied to collaborative filtering method Active CN110377841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910478934.9A CN110377841B (en) 2019-06-04 2019-06-04 Similarity calculation method and system applied to collaborative filtering method

Publications (2)

Publication Number Publication Date
CN110377841A CN110377841A (en) 2019-10-25
CN110377841B true CN110377841B (en) 2022-01-07

Family

ID=68249775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910478934.9A Active CN110377841B (en) 2019-06-04 2019-06-04 Similarity calculation method and system applied to collaborative filtering method

Country Status (1)

Country Link
CN (1) CN110377841B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625704A (en) * 2020-05-11 2020-09-04 镇江纵陌阡横信息科技有限公司 Non-personalized recommendation algorithm model based on user intention and data cooperation
CN114969566B (en) * 2022-06-27 2023-03-24 中国测绘科学研究院 Distance-measuring government affair service item collaborative filtering recommendation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573411A (en) * 2018-04-17 2018-09-25 重庆理工大学 Depth sentiment analysis and multi-source based on user comment recommend the mixing of view fusion to recommend method
CN109408734A (en) * 2018-09-28 2019-03-01 嘉兴学院 A kind of collaborative filtering recommending method of fuse information Entropy conformability degree and dynamic trust
CN109783738A (en) * 2019-01-22 2019-05-21 东华大学 A kind of double extreme learning machine mixing collaborative filtering recommending methods based on more similarities

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160138052A (en) * 2014-02-24 2016-12-02 아마존 테크놀로지스, 인크. Method and system for improving size-based product recommendations using aggregated review data

Also Published As

Publication number Publication date
CN110377841A (en) 2019-10-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant