CN109063120A - A clustering-based collaborative filtering recommendation method and device - Google Patents

A clustering-based collaborative filtering recommendation method and device

Info

Publication number
CN109063120A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810863191.2A
Other languages
Chinese (zh)
Other versions
CN109063120B (en)
Inventor
高志鹏
李博
杨杨
王颖
谭清
王茜
肖楷乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a clustering-based collaborative filtering recommendation method and device. The method comprises: obtaining the label genome vectors of the first items; dividing the first items into a first number of clusters based on their label genome vectors; for each target item: when the target item and a second item belong to the same cluster, calculating their correlation coefficient based on a distance of a preset type between the target item and the second item; when the target item and the second item belong to different clusters, calculating their correlation coefficient based on the Pearson correlation coefficient of the target item and the second item; computing a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item; and recommending to the target user the target items whose predicted ratings meet a preset condition. Embodiments of the invention can improve the objectivity of the recommendation ratings.

Description

A clustering-based collaborative filtering recommendation method and device
Technical field
The present invention relates to the technical field of recommendation algorithms, and in particular to a clustering-based collaborative filtering recommendation method and device.
Background
With the rapid development of Internet technology, the Internet provides users with massive amounts of information of all kinds, which enriches and facilitates people's work and life. At the same time, however, finding the information of interest within this mass of information has become time-consuming and laborious for users. Recommendation algorithms were developed for this reason. A recommendation algorithm does not require the user to state an explicit need; instead, it analyzes the user's interests and needs from the user's historical behavior and recommends items that can satisfy them.
Specifically, the collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithms. Its processing steps are as follows:
In the first step, historical records of interactions between users and items are obtained from a preset public dataset. Usually, public datasets can be obtained from websites dedicated to recommender system research. The historical records of user-item interactions include a user-item rating matrix, a user-item consumption matrix, and so on, where the user-item rating matrix contains multiple items and the user ratings each item has received. Note that the ratings an item has received may come from different users. The user-item rating matrix is hereinafter referred to as the rating matrix, and the description below takes the rating matrix as an example.
For ease of description, the user to whom items are recommended is called the target user, and an item recommended to the target user is called a target item; the items in the rating matrix are called first items. It should be understood that the items recommended to the target user should be items the target user has not used, that is, items the target user has not rated. The target items are therefore the first items that the target user has not rated.
In the second step, for each target item: first, the correlation coefficient between the target item and each second item is calculated, where the second items are the first items other than all target items; second, a weighted sum of the target user's ratings of the second items in the rating matrix is computed, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item.
Specifically, the Pearson correlation coefficient can be used as the correlation coefficient between items. It is calculated as shown in formula (1):
$$\rho_{ij} = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}} \qquad (1)$$
In formula (1), the target item is item i and the second item is item j; $\rho_{ij}$ is the Pearson correlation coefficient of item i and item j; $U_{ij} = U_i \cap U_j$ is the set of common users who have rated both item i and item j; u is a user in the common user set $U_{ij}$; $r_{ui}$ is the rating item i received from user u; $\bar{r}_i$ is the mean of the ratings item i has received; $r_{uj}$ is the rating item j received from user u; and $\bar{r}_j$ is the mean of the ratings item j has received.
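A minimal Python sketch of formula (1) follows, assuming each item's ratings are stored as a dictionary from user ID to rating. The function name, the data layout, and the choice to return 0.0 when the coefficient is undefined (no common users, or zero variance among them) are illustrative assumptions rather than part of the described method.

```python
from math import sqrt


def pearson_item_correlation(ratings_i, ratings_j):
    """Pearson correlation of two items over their common users.

    ratings_i, ratings_j: dict mapping user ID -> rating (hypothetical layout).
    """
    common_users = set(ratings_i) & set(ratings_j)  # U_ij = U_i ∩ U_j
    if not common_users:
        return 0.0  # assumption: treat "no common users" as zero correlation

    # Means are taken over all ratings each item has received.
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_j = sum(ratings_j.values()) / len(ratings_j)

    numerator = sum(
        (ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common_users
    )
    norm_i = sqrt(sum((ratings_i[u] - mean_i) ** 2 for u in common_users))
    norm_j = sqrt(sum((ratings_j[u] - mean_j) ** 2 for u in common_users))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: undefined coefficient is reported as zero
    return numerator / (norm_i * norm_j)
```

Two items whose ratings from the same users rise and fall together yield a coefficient close to 1, while items with no users in common yield 0.0 under the assumption above.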
In the third step, the target items whose predicted ratings meet a preset condition are recommended to the target user.
Specifically, the target items can be sorted by the target user's predicted rating of each; a preset number of target items with the highest predicted ratings are then recommended to the target user. Alternatively, the target items whose predicted ratings exceed a rating threshold can be recommended to the target user.
As can be seen, in the collaborative filtering recommendation algorithm above, the correlation coefficient between a target item and the other items depends entirely on the ratings the items have received. Because ratings are given subjectively by users, they can easily compromise the objectivity of the calculated predicted ratings. For example, suppose item A has received only one low rating, given by a user who was in a bad mood, although the quality of item A is actually very good. The predicted rating of item A calculated by the above algorithm would then be low, and its objectivity poor.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a clustering-based collaborative filtering recommendation method and device, so as to improve the objectivity of recommendation ratings. The specific technical solutions are as follows:
An embodiment of the present invention provides a clustering-based collaborative filtering recommendation method, comprising:
obtaining the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
dividing the first items into a preset first number of clusters based on their label genome vectors, using a preset clustering algorithm;
for each target item: when the target item and a second item belong to the same cluster, calculating the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where the target items are the first items that the target user has not rated and the second items are the first items other than all target items; when the target item and the second item belong to different clusters, calculating the correlation coefficient between the target item and the second item based on their Pearson correlation coefficient; and computing a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item;
recommending to the target user the target items whose predicted ratings meet a preset condition.
An embodiment of the present invention also provides a clustering-based collaborative filtering recommendation device, comprising:
a first obtaining module, configured to obtain the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
a division module, configured to divide the first items into a preset first number of clusters based on their label genome vectors, using a preset clustering algorithm;
a first computing module, configured to, for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where the target items are the first items that the target user has not rated and the second items are the first items other than all target items; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on their Pearson correlation coefficient; and compute a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item;
a recommending module, configured to recommend to the target user the target items whose predicted ratings meet a preset condition.
An embodiment of the present invention further provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the clustering-based collaborative filtering recommendation methods described above when executing the program stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute any of the clustering-based collaborative filtering recommendation methods described above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the clustering-based collaborative filtering recommendation methods described above.
In the clustering-based collaborative filtering recommendation method and device provided by the embodiments of the present invention, first, the label genome vectors of the first items, which describe the intrinsic attributes of the first items, are obtained from a preset label genome information matrix. Then, a preset clustering algorithm divides the first items into a preset first number of clusters based on the label genome vectors, so that first items in the same cluster have similar intrinsic attributes. Next, for each target item: when the target item and a second item belong to the same cluster, the correlation coefficient between them is calculated based on a distance of a preset type between their label genome vectors, where the target items are the first items that the target user has not rated and the second items are the first items other than all target items; when the target item and the second item belong to different clusters, the correlation coefficient between them is calculated based on their Pearson correlation coefficient; a weighted sum of the preset target user's ratings of the second items, weighted by the correlation coefficients between the target item and the second items, gives the target user's predicted rating of the target item. Finally, the target items whose predicted ratings meet a preset condition are recommended to the target user.
In this way, after the first items are clustered according to their label genome vectors, the correlation coefficient between first items in the same cluster can be calculated based on the distance of a preset type between their label genome vectors. Because the label genome vector describes the intrinsic attributes of an item and does not change with users' subjective preferences, it is objective. The calculated correlation coefficient is therefore objective as well, and the resulting predicted rating is more objective, which avoids the problem of users' subjective ratings compromising the objectivity of the predicted rating.
Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a clustering-based collaborative filtering recommendation method provided by an embodiment of the present invention;
Fig. 2 is a specific flowchart of step 103 in an embodiment of the present invention;
Fig. 3 is a specific flowchart of sub-step 12 in an embodiment of the present invention;
Fig. 4 is a specific flowchart of sub-step 13 in an embodiment of the present invention;
Fig. 5 is a flowchart of determining the optimal parameter values in an embodiment of the present invention;
Fig. 6 is a specific flowchart of step 503 in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the clustering-based collaborative filtering recommendation device of an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a clustering-based collaborative filtering recommendation method. Referring to Fig. 1, a schematic flowchart of the method, the method may include the following steps:
Step 101: obtain the label genome vectors of the first items from a preset label genome information matrix.
A label genome vector describes the intrinsic attributes of a first item.
In this step, the label genome vectors of the first items can be obtained from the preset label genome information matrix, so that subsequent steps can cluster the first items according to these vectors. The first items are all the items contained in the label genome information matrix, usually multiple items.
Note that the label genome information matrix is derived from user comments on various websites, either by manual collection or by machine learning, and belongs to the contextual information of the items. In practice, it can also be obtained from public datasets on websites dedicated to recommender system research, for example from the public datasets on the MovieLens website. The label genome information matrix contains the label genome vectors of the first items. A label genome vector describes the intrinsic attributes of a first item, specifically the degree of relevance between the first item and each feature or label, expressed as a score between 0 and 1. The larger the value, the higher the weight of the first item on that feature or label, that is, the closer the intrinsic attributes of the first item are to that feature or label.
For example, when the first item is a film and the labels are "horror", "emotion", and "comedy", the label genome vector of the first item might be [0.9, 0.8, 0.1], where 0.9 is the degree of relevance between the first item and the label "horror", 0.8 is the degree of relevance between the first item and the label "emotion", and 0.1 is the degree of relevance between the first item and the label "comedy". The first item is closest to the label "horror" and can be regarded as a horror film.
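The film example can be written out as a short Python sketch. The English label names and the helper `closest_label` are illustrative assumptions; the values are those given in the example.

```python
# Labels and relevance scores from the film example (label names are
# English renderings chosen for illustration).
labels = ["horror", "emotion", "comedy"]
genome_vector = [0.9, 0.8, 0.1]


def closest_label(labels, vector):
    """Return the label with the highest relevance score in the vector."""
    best_index = max(range(len(vector)), key=lambda k: vector[k])
    return labels[best_index]


print(closest_label(labels, genome_vector))
```

The highest score, 0.9, belongs to the first label, which is why the film is regarded as a horror film.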
It should be understood that because the labels or features in a label genome vector are intrinsic attributes of the item, which exist objectively and do not change with users' subjective preferences, the label genome vector is objective.
Step 102: using a preset clustering algorithm, divide the first items into a preset first number of clusters based on their label genome vectors.
In this step, a preset clustering algorithm can be used to divide the first items into a preset first number of clusters based on the label genome vectors. The preset clustering algorithm may be the K-means clustering algorithm, the mean shift clustering algorithm, or the like.
Because the labels or features in a label genome vector are intrinsic attributes of the item and exist objectively, first items in the same cluster have similar objective characteristics. For first items in the same cluster, the correlation coefficient is therefore calculated from the label genome vectors, which makes the calculated correlation coefficient more objective.
In one implementation, the preset clustering algorithm is the K-means clustering algorithm, which divides the first items into K clusters based on their label genome vectors. The specific process is as follows:
In the first step, the label genome information matrix G is input.
In the second step, K cluster centers are randomly initialized, denoted $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$.
Here $\mathbb{R}^n$ denotes the space of vectors of length n, and n is the number of features or labels in the label genome information matrix G.
In the third step, the cluster $c^{(i)}$ to which the label genome vector $g_i$ of first item i belongs is calculated:
$$c^{(i)} := \arg\min_{j} \lVert g_i - \mu_j \rVert^2 \qquad (2)$$
In formula (2), $g_i$ is the label genome vector of first item i, $g_i \in G$; $c^{(i)}$ is the cluster to which $g_i$ belongs; $\mu_j$ is the center of cluster j, $j \in \{1, \ldots, K\}$. Formula (2) defines $c^{(i)}$ as the index j that minimizes $\lVert g_i - \mu_j \rVert^2$.
In the fourth step, for each cluster j, the cluster center $\mu_j$ is recalculated:
$$\mu_j := \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, g_i}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}} \qquad (3)$$
In formula (3), $g_i$ is the label genome vector of first item i, $g_i \in G$; $c^{(i)}$ is the cluster to which $g_i$ belongs; $\mu_j$ is the center of cluster j, $j \in \{1, \ldots, K\}$; m is the total number of first items, $i \in \{1, \ldots, m\}$. Formula (3) defines $\mu_j$ as the mean of the label genome vectors assigned to cluster j.
The third and fourth steps are repeated until convergence. The detailed process of dividing the first items into K clusters with the K-means clustering algorithm based on their label genome vectors can be found in the prior art and is not repeated here.
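The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the centers are initialized by sampling K of the input vectors (one common way to initialize randomly), an empty cluster keeps its previous center, and the names `k_means`, `max_iter`, and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, max_iter=100, seed=0):
    """Cluster label genome vectors; returns (assignment, centers)."""
    rng = random.Random(seed)
    # Second step: random initialization (assumption: sample k data points).
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Third step: assign each vector to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(g, centers[j]))
            for g in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Fourth step: move each center to the mean of its members.
        for j in range(k):
            members = [g for g, c in zip(vectors, assignment) if c == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers
```

For example, calling `k_means(vectors, 2)` on four two-dimensional vectors that form two well-separated pairs places each pair in its own cluster.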
Step 103: for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on their Pearson correlation coefficient; compute a weighted sum of the preset target user's ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item.
The target items are the first items that the target user has not rated; the second items are the first items other than all target items.
Because step 102 has divided the first items into multiple clusters, in this step each target item is processed as follows. Refer to Fig. 2, a specific flowchart of step 103 in an embodiment of the present invention:
Sub-step 11: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item.
The distance of a preset type may be the Euclidean distance, the Manhattan distance, the Mahalanobis distance, or the like, and can be chosen according to the actual situation.
In one implementation, when the target item and the second item belong to the same cluster, the Euclidean distance between the label genome vector of the target item and the label genome vector of the second item is calculated, and the calculated Euclidean distance value is used as the correlation coefficient between the target item and the second item.
Note that methods for calculating the Euclidean distance can be found in the prior art and are not repeated here.
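For reference, the Euclidean distance between two label genome vectors can be computed as in the short Python sketch below. The function name and the length check are illustrative assumptions.

```python
from math import sqrt


def euclidean_distance(g_i, g_j):
    """Euclidean distance between two label genome vectors of equal length."""
    if len(g_i) != len(g_j):
        raise ValueError("label genome vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(g_i, g_j)))
```

Identical vectors give a distance of 0, and the value grows as the relevance scores of the two items diverge.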
Because the labels or features in a label genome vector are intrinsic attributes of the item and exist objectively, for first items in the same cluster the correlation coefficient calculated from the label genome vectors is more objective.
Sub-step 12: when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on their Pearson correlation coefficient.
In one implementation, referring to Fig. 3, a specific flowchart of sub-step 12 in an embodiment of the present invention, sub-step 12 may include:
Sub-step 121: calculate the Pearson correlation coefficient of the target item and the second item.
Specifically, for the detailed process of sub-step 121, refer to formula (1) and the related description in the Background section; it is not repeated here.
Sub-step 122: based on the number of common users of the target item and the second item and the calculated Pearson correlation coefficient, generate the scaled Pearson correlation coefficient of the target item and the second item, and use it as the correlation coefficient between the target item and the second item.
Note that the preset rating matrix, like the label genome information matrix, can also be obtained from public datasets on websites dedicated to recommender system research, for example from the public datasets on the MovieLens website. The preset rating matrix contains multiple items and the user ratings each item has received. However, the preset rating matrix is sparse: a large number of items in it have received few user ratings, so the common users between items are very few. If the Pearson correlation coefficient were used directly as the correlation coefficient between items, the correlation coefficient could not truly reflect the correlation between items, and its accuracy would be poor.
Therefore, the embodiment of the present invention takes the number of common users into account when calculating the correlation coefficient, so that the correlation coefficient is more accurate and truly reflects the correlation between items. Specifically, on the basis of the Pearson correlation coefficient, the embodiment introduces a parameter for the number of common users to obtain a scaled Pearson correlation coefficient, and uses the scaled Pearson correlation coefficient as the correlation coefficient between items.
In a specific implementation, the scaled Pearson correlation coefficient of the target item and the second item can be generated according to formula (4), based on the number of common users of the target item and the second item and the calculated Pearson correlation coefficient;
In formula (4), the target item is item i and the second item is item j; $s_{ij}$ is the scaled Pearson correlation coefficient of item i and item j; $\rho_{ij}$ is the Pearson correlation coefficient of item i and item j; $n_{ij}$ is the number of common users of item i and item j; and $\lambda_1$ is the parameter for the number of common users.
Specifically, on the basis of the Pearson correlation coefficient, the embodiment introduces the parameter $\lambda_1$ for the number of common users, which increases the influence of the number of common users on the correlation coefficient and offsets the adverse effect that very few common users between items have on it. The correlation coefficient calculated by the embodiment therefore reflects the correlation between items more truly and accurately, which makes the calculated predicted rating more objective.
In practice, the parameter $\lambda_1$ can be set to 100.
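Formula (4) is not reproduced in this text. One common way to scale a correlation by the number of common users, consistent with the symbols defined for it, is $s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_1}\,\rho_{ij}$. The Python sketch below assumes that form; it may differ from the exact formula in the patent, and the function and parameter names are hypothetical.

```python
def scaled_pearson(rho, n_common, lambda_1=100.0):
    """Shrink a Pearson coefficient toward zero when few users are shared.

    Assumed form: s_ij = n_ij / (n_ij + lambda_1) * rho_ij.
    rho: Pearson correlation coefficient of the two items.
    n_common: number of users who rated both items (non-negative).
    lambda_1: parameter for the number of common users (100 per the text).
    """
    if n_common < 0 or lambda_1 <= 0:
        raise ValueError("n_common must be >= 0 and lambda_1 must be > 0")
    return (n_common / (n_common + lambda_1)) * rho
```

Under this assumed form, a coefficient backed by 100 common users is halved when $\lambda_1 = 100$, while one backed by only a handful of users is reduced to nearly zero, so sparse evidence contributes little to the prediction.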
Sub-step 13: compute a weighted sum of the preset target user's ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item.
In this step, because the preset rating matrix already contains the target user's ratings of the second items, those ratings can be obtained from the preset rating matrix and combined in a weighted sum with the correlation coefficients calculated in the preceding steps to obtain the target user's predicted rating of the target item.
Because the preceding steps use different calculation methods, according to the clustering result, for the correlation coefficient between items in the same cluster and between items in different clusters, the calculated correlation coefficients are more objective and more accurate. The target user's predicted rating of the target item obtained in this step is therefore also more objective.
In one implementation, according to formula (5), a weighted sum of the preset target user's ratings of the second items, weighted by the correlation coefficients between the target item and the second items, is computed to obtain the target user's predicted rating of the target item;
In formula (5), the target item is item i; the second item is item j; the target user is user u; the left-hand side is user u's predicted rating of item i; $r_{uj}$ is user u's rating of item j; $s_{ij}$ is the scaled Pearson correlation coefficient of item i and item j; $\rho_{ij}$ is the Pearson correlation coefficient of item i and item j; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of item i and the label genome vector of item j; $\kappa_u$ is the set of items, other than item i, that user u has rated; $p_{ij}$ is the adjustment factor; and the cluster set of item i is the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
Specifically, in formula (5), according to the clustering result of step 102, for a second item in the same cluster as the target item, the target user's predicted rating of the target item is calculated from the Euclidean distance between the label genome vectors of the target item and the second item. Because the labels or features in a label genome vector are intrinsic attributes of the item and exist objectively, the correlation coefficient calculated from the label genome vectors is more objective. For a second item in a different cluster from the target item, the scaled Pearson correlation coefficient between the target item and the second item is used, so that the calculated correlation coefficient reflects the correlation between items more truly and accurately. In summary, the predicted rating calculated in this way is more objective and truly reflects the target user's evaluation of the target item.
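Formula (5) is not reproduced in this text, so the following Python sketch shows only one plausible reading of the description: each second item's rating is weighted by the Euclidean distance value when it shares a cluster with the target item and by the scaled Pearson coefficient otherwise. Dividing by the sum of absolute weights, the omission of the adjustment factor $p_{ij}$, and all function and parameter names are assumptions made for illustration.

```python
def predict_rating(target, user_ratings, cluster_of, distance, scaled_corr):
    """Predict the target user's rating of `target` (hypothetical sketch).

    user_ratings: dict item -> rating given by the target user (second items).
    cluster_of: dict item -> cluster ID from the clustering step.
    distance(i, j): Euclidean distance between label genome vectors.
    scaled_corr(i, j): scaled Pearson coefficient between items.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        if item == target:
            continue  # only items other than the target contribute
        if cluster_of[item] == cluster_of[target]:
            weight = distance(target, item)      # same cluster
        else:
            weight = scaled_corr(target, item)   # different clusters
        weighted_sum += weight * rating
        weight_total += abs(weight)
    if weight_total == 0:
        return None  # assumption: no usable neighbors, no prediction
    return weighted_sum / weight_total  # assumption: normalized weighted sum
```

The two weighting functions are passed in as callables so that the sketch stays independent of how the distance and the scaled coefficient are actually computed.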
Step 104: recommend to the target user the target items whose predicted ratings meet a preset condition.
In this step, the target items can be sorted by the target user's predicted ratings calculated in step 103; a preset number of target items with the highest predicted ratings are then recommended to the target user. Alternatively, the target items whose predicted ratings exceed a rating threshold can be recommended to the target user. The choice can be made according to actual needs and is not described further here.
Because the calculated predicted ratings are more objective and truly reflect the target user's evaluation of the target items, the target items with higher predicted ratings that are selected in this way better match the target user's preferences and give the user a better experience.
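The two selection rules in step 104 can be sketched in Python as follows. The function name `recommend` and its parameters `top_n` and `threshold` are hypothetical; the sketch assumes the predicted ratings are held in a dictionary from item to rating.

```python
def recommend(predicted, top_n=None, threshold=None):
    """Select target items by predicted rating.

    predicted: dict item -> predicted rating for the target user.
    top_n: keep at most this many highest-rated items, if given.
    threshold: keep only items whose rating exceeds this value, if given.
    """
    # Sort items from highest to lowest predicted rating.
    ranked = sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [(item, score) for item, score in ranked if score > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [item for item, _ in ranked]
```

Either rule can be used on its own, or both can be combined to return at most `top_n` items above the threshold.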
As it can be seen that in the collaborative filtering recommending method based on cluster that the embodiment of the present invention proposes, it can be according to label Genome vector after carrying out classification processing to the first article, for belonging to the first article of same cluster class, is based on label base Relative coefficient is calculated because of the Euclidean distance value of group vector.Since label genome vector is used to describe the intrinsic category of article Property, do not change with the subjective desire of user, have objectivity so that calculated relative coefficient also have it is objective Property, it predicts that the objectivity of scoring is also stronger obtained from, avoids the occurrence of the subjective scoring due to user and influence prediction scoring Objectivity the problem of.
In an optional embodiment, refer to Fig. 4, which is a specific flowchart of sub-step 13 in the embodiment of the present invention. Sub-step 13 can specifically include:
Sub-step 131: perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items.
Sub-step 132: adjust the result of the weighted summation using a personalization parameter, a user bias adjustment parameter, and an item bias adjustment parameter, to obtain the target user's predicted rating of the target item.
It should be noted that the target user's preset ratings of the second items are obtained from a preset rating matrix, and the ratings in that matrix may be affected by various bias factors. Specifically, the bias factors may include user bias and item bias: user bias means that some users habitually give items higher ratings, and item bias means that some items tend to receive higher ratings because of external factors such as advertising. The ratings in the preset rating matrix may therefore fail to accurately reflect users' preferences and the quality of items.
Therefore, in the embodiment of the present invention, the result of sub-step 131 can be adjusted by the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter, so that the calculated predicted rating of the target item is more accurate and faithfully reflects the user's preferences and the item's quality.
In one implementation, formula (6) can be used to perform the weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, and to adjust the result of the weighted summation using the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter, thereby obtaining the target user's predicted rating of the target item.
In formula (6), the target item is item $i$; the second item is item $j$; the target user is user $u$; $\mu$ is the mean rating in the preset rating matrix; $b_u$ is the user bias adjustment parameter for user $u$; $b_i$ is the item bias adjustment parameter for item $i$; $\alpha_u$ is the personalization parameter for user $u$; $\hat{r}_{ui}$ is user $u$'s predicted rating of item $i$; $r_{uj}$ is user $u$'s rating of item $j$; $s_{ij}$ is the scaled Pearson correlation coefficient of items $i$ and $j$; $\rho_{ij}$ is the Pearson correlation coefficient of items $i$ and $j$; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of item $i$ and the label genome vector of item $j$; $\kappa_u$ is the set of items, other than item $i$, that user $u$ has rated; $p_{ij}$ is an adjustment factor. The formula also refers to the set of items that, with the first items divided into $k$ clusters, belong to the same cluster as item $i$.
Specifically, formula (6) builds on formula (5) by adding the mean rating $\mu$ of the preset rating matrix, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$. In addition, because similar items influence each user to a different degree, a personalization parameter $\alpha_u$ is introduced for each user, so that the calculated predicted rating of the target item is more accurate and faithfully reflects the user's preferences and the item's quality.
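How the bias terms and the personalization parameter adjust the weighted summation can be sketched as follows. The additive form `mu + b_u + b_i + alpha_u * weighted_sum` is an assumption made for illustration, as are the function name and argument layout; the exact expression of formula (6), including the adjustment factor $p_{ij}$, may differ.

```python
def predict_rating(mu, b_u, b_i, alpha_u, ratings_u, weights_i):
    """Assumed form: mu + b_u + b_i + alpha_u * sum_j(w_ij * r_uj).

    ratings_u: dict item j -> rating r_uj given by user u (the set kappa_u).
    weights_i: dict item j -> correlation coefficient between item i and j.
    Items without a known coefficient are ignored.
    """
    weighted_sum = sum(
        weights_i[j] * r_uj for j, r_uj in ratings_u.items() if j in weights_i
    )
    return mu + b_u + b_i + alpha_u * weighted_sum
```

For example, with `mu=3.0`, `b_u=0.5`, `b_i=-0.25`, `alpha_u=0.5`, ratings `{"j1": 4.0, "j2": 2.0}` and weights `{"j1": 0.5, "j2": 0.25}`, the weighted sum is 2.5 and the sketch returns 4.5.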
To obtain a more objective and accurate predicted rating, before formula (6) is used to calculate predicted ratings, the optimal values of the user bias adjustment parameter $b_u$, the item bias adjustment parameter $b_i$, and the personalization parameter $\alpha_u$ in the formula can be determined, yielding an optimized formula (6).
Refer to Fig. 5, which is a flowchart of determining the optimal parameter values in the embodiment of the present invention. As shown in Fig. 5, the steps for determining the optimal values of the user bias adjustment parameter $b_u$, the item bias adjustment parameter $b_i$, and the personalization parameter $\alpha_u$ in the calculation formula of the second predicted rating are as follows:
Step 501: obtain a preset number of sample sets from the preset rating matrix.
Each sample set includes a user, an item, and the rating the user gave the item.
In this step, a preset number of sample sets can be obtained from the preset rating matrix, so that the loss function can be minimized based on them.
Because the rating matrix contains multiple items and the ratings each item has received from users, a user, an item, and the rating that user gave that item can be taken from it and treated together as one sample set. The number of sample sets can be determined according to the actual situation.
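Step 501 can be sketched as follows, assuming (for illustration only) that the rating matrix is held as a nested dictionary in which a missing rating is stored as `None`. The function name and the `seed` parameter are hypothetical.

```python
import random


def sample_sets_from_matrix(rating_matrix, count, seed=0):
    """Draw up to `count` (user, item, rating) triples from the rating matrix.

    rating_matrix: dict user -> dict item -> rating, or None if unrated.
    """
    triples = [
        (user, item, rating)
        for user, row in rating_matrix.items()
        for item, rating in row.items()
        if rating is not None
    ]
    rng = random.Random(seed)
    return rng.sample(triples, min(count, len(triples)))
```

Only entries that actually hold a rating become sample sets, so each sample carries a true rating against which a predicted rating can later be compared.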
Step 502: for each sample set, take the user in the sample set as the target user and the item in the sample set as the target item; calculate the target user's predicted rating of the target item, and take the calculated predicted rating as the predicted rating corresponding to the sample set.
In this step, each of the preset number of sample sets can be processed as follows:
First, take the user in the sample set as the target user and the item in the sample set as the target item.
Specifically, for each of the preset number of sample sets, the user in the sample set is taken as the target user and the item in the sample set as the target item, so that a predicted rating can be calculated from the target user and the target item.
Second, calculate the target user's predicted rating of the target item, and take the calculated predicted rating as the predicted rating corresponding to the sample set.
Specifically, formula (6) is used to calculate the target user's predicted rating of the target item, and the calculated predicted rating is taken as the predicted rating corresponding to the sample set, so that the loss function can be minimized based on the predicted ratings and the true ratings.
It should be noted that the rating already present in a sample set is the user's true rating of the item, whereas the calculated value is the user's predicted rating of the item. The predicted rating is calculated here in order to optimize the parameters: the gap between the true rating and the predicted rating is reduced as far as possible, so that the predicted ratings given by formula (6) come closer to the true ratings. It can be understood that in practical applications, when predicted ratings are calculated in order to recommend target items to a target user, the target items should be items that the target user has not used or bought; therefore, predicted ratings are not calculated again for items the target user has already rated.
Step 503: based on each sample set and the predicted rating corresponding to each sample set, minimize the preset loss function, which is shown in formula (7).
In the preset loss function shown in formula (7), $\kappa$ is the preset number of sample sets; item $i$ is the item in a sample set; user $u$ is the user in a sample set; $b_u$ is the user bias adjustment parameter for user $u$; $b_i$ is the item bias adjustment parameter for item $i$; $\alpha_u$ is the personalization parameter for user $u$; $\hat{r}_{ui}$ is the calculated predicted rating of item $i$ by user $u$; $r_{ui}$ is the rating of item $i$ by user $u$ obtained from the preset rating matrix; $\lambda_2$ is a preset regularization parameter.
In this step, the loss function shown in formula (7) can be minimized according to each sample set and the predicted rating corresponding to each sample set, to obtain the minimized loss function.
In one implementation, refer to Fig. 6, which is a specific flowchart of step 503 in the embodiment of the present invention. Step 503 includes:
Sub-step 5031: for each sample set, set the initial values of the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$, and initialize the step size $\gamma$ and the iteration count.
Specifically, the initial values of the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$ can be set to random(0, 1), 0, and 0, respectively; the initial value of the step size $\gamma$ can be 0.04; and the initial value of the iteration count can be 0.
Sub-step 5032: set the user set $U$ and the item set $I$ to empty sets.
Specifically, the user set $U$ and the item set $I$ are each set to the empty set so that, in the subsequent steps, the users and items of processed sample sets can be added to $U$ and $I$, respectively.
Sub-step 5033: randomly obtain one sample set from the preset number of sample sets $\kappa$.
The sample set includes user $u$, item $i$, the rating $r_{ui}$ of item $i$ by user $u$ obtained from the preset rating matrix, and the calculated predicted rating $\hat{r}_{ui}$ of item $i$ by user $u$.
Specifically, the sample set used in sub-steps 5034 to 5037 is obtained at random from the preset number of sample sets $\kappa$. It includes user $u$, item $i$, the true rating $r_{ui}$ of item $i$ by user $u$ obtained from the preset rating matrix, and the predicted rating $\hat{r}_{ui}$ of item $i$ by user $u$ calculated according to formula (6), so that the loss function value can be calculated from the predicted rating and the true rating.
Sub-step 5034: judge whether the user set $U$ contains the user $u$ of the sample set and the item set $I$ contains the item $i$ of the sample set; if yes, return to sub-step 5033; if no, execute sub-step 5035.
Specifically, this judges whether the sample set selected in sub-step 5033 has already been processed. It can be understood that the user and item of a processed sample set already exist in the user set $U$ and the item set $I$. A sample set that has already been processed is not processed again in the current iteration; it is skipped, and another sample set is randomly selected by executing sub-step 5033.
This is because the number of sample sets in $\kappa$ is large, and a traditional stochastic gradient descent algorithm would incur excessive time complexity. So that the minimization of the loss function can be completed in a reasonable time, the embodiment of the present invention uses an improved stochastic gradient descent algorithm: in each iteration, each sample set in $\kappa$ is processed only once. If the currently selected sample set has already been processed in the current iteration, it is skipped.
In this way, using the improved stochastic gradient descent algorithm significantly reduces the time complexity of minimizing the loss function, so that the minimization can be completed in a reasonable time, and overfitting can also be effectively prevented.
Sub-step 5035: according to the difference $e_{ui}$ between the rating $r_{ui}$ in the sample set and the predicted rating $\hat{r}_{ui}$, update the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$; put the user $u$ of the sample set into the user set $U$, and put the item $i$ of the sample set into the item set $I$.
In this step, when the user set $U$ does not contain the user $u$ of the sample set and the item set $I$ does not contain the item $i$ of the sample set, the sample set can be processed.
Specifically, first calculate the difference $e_{ui}$ between the rating $r_{ui}$ in the sample set and the predicted rating $\hat{r}_{ui}$, as shown in formula (8):
$$e_{ui} = r_{ui} - \hat{r}_{ui} \qquad (8)$$
In formula (8), $\hat{r}_{ui}$ is the calculated predicted rating of item $i$ by user $u$; $r_{ui}$ is the rating of item $i$ by user $u$ obtained from the preset rating matrix; $e_{ui}$ is the difference between the rating $r_{ui}$ and the predicted rating $\hat{r}_{ui}$.
Then, according to formulas (9) to (11) below, update the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$, respectively.
In formula (9), $\gamma$ is the step size; $e_{ui}$ is the difference between the rating $r_{ui}$ and the predicted rating $\hat{r}_{ui}$; $r_{uj}$ is user $u$'s rating of item $j$; $s_{ij}$ is the scaled Pearson correlation coefficient of items $i$ and $j$; $\rho_{ij}$ is the Pearson correlation coefficient of items $i$ and $j$; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of item $i$ and the label genome vector of item $j$; $\alpha_u$ is the personalization parameter for user $u$; $\lambda_2$ is a preset regularization parameter; $\kappa_u$ is the set of items, other than item $i$, that user $u$ has rated.
In a specific implementation, the update formula for the personalization parameter $\alpha_u$ shown in formula (9) is obtained by taking the derivative with respect to $\alpha_u$ on the basis of formula (6). For the detailed procedure, refer to the prior art; it is not repeated here.
$$b_u \leftarrow b_u + \gamma \cdot (e_{ui} - \lambda_2 \cdot b_u) \qquad (10)$$
In formula (10), $b_u$ is the user bias adjustment parameter for user $u$; $\gamma$ is the step size; $e_{ui}$ is the difference between the rating $r_{ui}$ and the predicted rating $\hat{r}_{ui}$; $\lambda_2$ is a preset regularization parameter.
In a specific implementation, the update formula for the user bias adjustment parameter $b_u$ shown in formula (10) is obtained by taking the derivative with respect to $b_u$ on the basis of formula (6). For the detailed procedure, refer to the prior art; it is not repeated here.
$$b_i \leftarrow b_i + \gamma \cdot (e_{ui} - \lambda_2 \cdot b_i) \qquad (11)$$
In formula (11), $b_i$ is the item bias adjustment parameter for item $i$; $\gamma$ is the step size; $e_{ui}$ is the difference between the rating $r_{ui}$ and the predicted rating $\hat{r}_{ui}$; $\lambda_2$ is a preset regularization parameter.
In a specific implementation, the update formula for the item bias adjustment parameter $b_i$ shown in formula (11) is obtained by taking the derivative with respect to $b_i$ on the basis of formula (6). For the detailed procedure, refer to the prior art; it is not repeated here. The left-pointing arrow in formulas (9) to (11) means that the value of the expression on the right of the arrow replaces the value of the parameter on the left of the arrow.
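A single parameter update of sub-step 5035 can be sketched as below. The two bias updates follow the regularized gradient-step shape of formulas (10) and (11). The update for the personalization parameter is an assumption: it presumes the predicted rating depends on $\alpha_u$ only through a product with a neighborhood weighted sum (called `neighborhood_sum` here, a hypothetical name), which may not match formula (9) exactly.

```python
def sgd_update(b_u, b_i, alpha_u, e_ui, neighborhood_sum, gamma, lambda_2):
    """Return the updated (b_u, b_i, alpha_u) for one sample set.

    e_ui:             true rating minus predicted rating.
    neighborhood_sum: weighted summation multiplied by alpha_u in the
                      assumed prediction form (illustrative assumption).
    gamma:            step size; lambda_2: regularization parameter.
    """
    new_b_u = b_u + gamma * (e_ui - lambda_2 * b_u)
    new_b_i = b_i + gamma * (e_ui - lambda_2 * b_i)
    # Assumed shape of the alpha_u update, derived from the assumed form.
    new_alpha_u = alpha_u + gamma * (e_ui * neighborhood_sum - lambda_2 * alpha_u)
    return new_b_u, new_b_i, new_alpha_u
```

A positive error (the prediction was too low) raises both biases, while the regularization term pulls each parameter back toward zero.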
Sub-step 5036: reduce the step size $\gamma$ and return to sub-step 5033.
Sub-step 5037: when all sample sets in the preset number of sample sets $\kappa$ have been traversed, increment the iteration count by one and execute sub-step 5038.
Specifically, when all sample sets in $\kappa$ have been traversed, the current iteration is complete and the iteration count can be incremented by one. A smaller step size is used in the next iteration to ensure convergence and prevent oscillation.
Sub-step 5038: judge whether the iteration count exceeds a preset iteration count threshold; if yes, execute sub-step 5039; if no, execute sub-step 5032.
Specifically, when the iteration count exceeds the preset iteration count threshold, the minimization of the preset loss function is complete. When the iteration count does not exceed the threshold, execution returns to sub-step 5032 and the next iteration proceeds, until the iteration count exceeds the threshold.
Sub-step 5039: complete the minimization of the preset loss function.
Specifically, when the iteration count exceeds the preset iteration count threshold, the minimization of the preset loss function has been completed.
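Sub-steps 5031 to 5039 can be put together in the following sketch. It is an illustration under stated assumptions, not the embodiment's implementation: the prediction uses the assumed form `mu + b_u + b_i + alpha_u * neighborhood_sum(u, i)`, the random selection of sub-step 5033 is realized as one shuffled pass per iteration, and the decay factor, regularization value, and all names are hypothetical.

```python
import random


def minimize_loss(samples, neighborhood_sum, mu, gamma=0.04, decay=0.99,
                  lambda_2=0.02, max_iterations=20, seed=0):
    """Return (alpha, b_user, b_item) after the iteration scheme above.

    samples:          list of (user, item, true_rating) sample sets.
    neighborhood_sum: callable (user, item) -> weighted summation term.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in samples})
    items = sorted({i for _, i, _ in samples})
    # Sub-step 5031: initial values random(0, 1), 0 and 0; iteration count 0.
    alpha = {u: rng.random() for u in users}
    b_user = {u: 0.0 for u in users}
    b_item = {i: 0.0 for i in items}
    iterations = 0
    while True:
        seen_users, seen_items = set(), set()        # sub-step 5032
        order = list(samples)
        rng.shuffle(order)                           # sub-step 5033
        for u, i, r_ui in order:
            if u in seen_users and i in seen_items:  # sub-step 5034: skip
                continue
            s = neighborhood_sum(u, i)
            e_ui = r_ui - (mu + b_user[u] + b_item[i] + alpha[u] * s)
            # Sub-step 5035: update parameters, then record user and item.
            b_user[u] += gamma * (e_ui - lambda_2 * b_user[u])
            b_item[i] += gamma * (e_ui - lambda_2 * b_item[i])
            alpha[u] += gamma * (e_ui * s - lambda_2 * alpha[u])
            seen_users.add(u)
            seen_items.add(i)
            gamma *= decay                           # sub-step 5036
        iterations += 1                              # sub-step 5037
        if iterations > max_iterations:              # sub-step 5038
            return alpha, b_user, b_item             # sub-step 5039
```

Resetting the two sets at the start of every iteration is what lets a sample set be processed again in the next iteration while still being processed at most once within the current one.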
Step 504: from the minimized loss function, determine the optimal values of the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter.
In this step, the optimal values of the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter are determined from the minimized loss function obtained by the minimization, and are substituted into the calculation formula of the second predicted rating shown in formula (6), yielding the optimized calculation formula of the second predicted rating.
It can be seen that, in the embodiment of the present invention, the optimal values of the user bias adjustment parameter $b_u$, the item bias adjustment parameter $b_i$, and the personalization parameter $\alpha_u$ in the calculation formula of the second predicted rating can be obtained by minimizing the preset loss function. The calculation formula of the second predicted rating is thereby optimized, so that the predicted ratings calculated with it are more accurate and objective.
The embodiment of the present invention further provides a cluster-based collaborative filtering recommendation device. Refer to Fig. 7, which is a structural schematic diagram of the cluster-based collaborative filtering recommendation device of the embodiment of the present invention. The device includes:
A first obtaining module 701, configured to obtain the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
A division module 702, configured to divide the first items into a preset first number of clusters using a preset clustering algorithm, based on the label genome vectors of the first items;
A first computing module 703, configured to, for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where a target item is an item among the first items that the target user has not rated, and a second item is an item among the first items other than all target items; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient of the target item and the second item; and perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item;
A recommending module 704, configured to recommend to the target user the target items whose predicted ratings meet a preset condition.
Optionally, the first computing module 703 is specifically configured to:
Calculate the Pearson correlation coefficient of the target item and the second item;
Generate the scaled Pearson correlation coefficient of the target item and the second item based on the number of users common to the target item and the second item and the calculated Pearson correlation coefficient, and take the scaled Pearson correlation coefficient of the target item and the second item as the correlation coefficient between the target item and the second item.
Optionally, the first computing module 703 is specifically configured to:
According to the following formula, generate the scaled Pearson correlation coefficient of the target item and the second item based on the number of users common to the target item and the second item and the calculated Pearson correlation coefficient;
In the formula, the target item is item $i$; the second item is item $j$; $s_{ij}$ is the scaled Pearson correlation coefficient of items $i$ and $j$; $\rho_{ij}$ is the Pearson correlation coefficient of items $i$ and $j$; $n_{ij}$ is the number of users common to items $i$ and $j$; $\lambda_1$ is a parameter for the number of common users.
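The scaling of the Pearson correlation coefficient by the number of common users can be sketched as follows. The shrinkage form `n_ij / (n_ij + lambda_1) * rho_ij` is a form commonly used in neighborhood models and is assumed here for illustration; the function names and the dictionary layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined correlation is treated as no correlation
    return cov / math.sqrt(var_x * var_y)


def scaled_pearson(ratings_i, ratings_j, lambda_1):
    """Assumed shrinkage: n_ij / (n_ij + lambda_1) * rho_ij.

    ratings_i, ratings_j: dict user -> rating for items i and j.
    """
    common = sorted(set(ratings_i) & set(ratings_j))
    n_ij = len(common)
    if n_ij < 2:
        return 0.0  # too few common users to estimate a correlation
    rho_ij = pearson([ratings_i[u] for u in common],
                     [ratings_j[u] for u in common])
    return n_ij / (n_ij + lambda_1) * rho_ij
```

Under this form, a coefficient supported by only a few common users is pulled toward zero, while one supported by many common users stays close to the raw Pearson value.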
Optionally, the distance of the preset type includes the Euclidean distance;
The first computing module 703 is specifically configured to:
According to the following formula, perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item;
In the formula, the target item is item $i$; the second item is item $j$; the target user is user $u$; $\hat{r}_{ui}$ is user $u$'s predicted rating of item $i$; $r_{uj}$ is user $u$'s rating of item $j$; $s_{ij}$ is the scaled Pearson correlation coefficient of items $i$ and $j$; $\rho_{ij}$ is the Pearson correlation coefficient of items $i$ and $j$; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of item $i$ and the label genome vector of item $j$; $\kappa_u$ is the set of items, other than item $i$, that user $u$ has rated; $p_{ij}$ is an adjustment factor. The formula also refers to the set of items that, with the first items divided into $k$ clusters, belong to the same cluster as item $i$.
Optionally, the first computing module 703 is specifically configured to:
Perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items;
Adjust the result of the weighted summation using the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter, to obtain the target user's predicted rating of the target item.
Optionally, the first computing module 703 is specifically configured to:
Using the following formula, perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, and adjust the result of the weighted summation using the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter, to obtain the target user's predicted rating of the target item;
In the formula, the target item is item $i$; the second item is item $j$; the target user is user $u$; $\mu$ is the mean rating in the preset rating matrix; $b_u$ is the user bias adjustment parameter for user $u$; $b_i$ is the item bias adjustment parameter for item $i$; $\alpha_u$ is the personalization parameter for user $u$; $\hat{r}_{ui}$ is user $u$'s predicted rating of item $i$; $r_{uj}$ is user $u$'s rating of item $j$; $s_{ij}$ is the scaled Pearson correlation coefficient of items $i$ and $j$; $\rho_{ij}$ is the Pearson correlation coefficient of items $i$ and $j$; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of item $i$ and the label genome vector of item $j$; $\kappa_u$ is the set of items, other than item $i$, that user $u$ has rated; $p_{ij}$ is an adjustment factor. The formula also refers to the set of items that, with the first items divided into $k$ clusters, belong to the same cluster as item $i$.
Optionally, the device further includes:
A second obtaining module, configured to obtain a preset number of sample sets from the preset rating matrix, where each sample set includes a user, an item, and the rating the user gave the item;
A second computing module, configured to, for each sample set, take the user in the sample set as the target user and the item in the sample set as the target item; calculate the target user's predicted rating of the target item using the calculation formula of the second predicted rating; and take the calculated predicted rating as the predicted rating corresponding to the sample set;
A minimization module, configured to minimize the preset loss function based on each sample set and the predicted rating corresponding to each sample set, to obtain the minimized loss function, the preset loss function being shown in the following formula;
In the preset loss function, $\kappa$ is the preset number of sample sets; item $i$ is the item in each sample set; user $u$ is the user in each sample set; $b_u$ is the user bias adjustment parameter for user $u$; $b_i$ is the item bias adjustment parameter for item $i$; $\alpha_u$ is the personalization parameter for user $u$; $\hat{r}_{ui}$ is the calculated predicted rating of item $i$ by user $u$; $r_{ui}$ is the rating of item $i$ by user $u$ obtained from the preset rating matrix; $\lambda_2$ is a preset regularization parameter;
A determining module, configured to determine, from the minimized loss function, the optimal values of the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter.
Optionally, the minimization module is specifically configured to:
For each sample set, set the initial values of the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$, and initialize the step size $\gamma$ and the iteration count;
Set the user set $U$ and the item set $I$ to empty sets;
Obtain one sample set from the preset number of sample sets $\kappa$, where the sample set includes user $u$, item $i$, the rating $r_{ui}$ of item $i$ by user $u$ obtained from the preset rating matrix, and the calculated predicted rating $\hat{r}_{ui}$ of item $i$ by user $u$;
Judge whether the user set $U$ contains the user $u$ of the sample set and the item set $I$ contains the item $i$ of the sample set;
If yes, return to the step of obtaining one sample set from the preset number of sample sets $\kappa$;
If no, update the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the item bias adjustment parameter $b_i$ according to the difference $e_{ui}$ between the rating $r_{ui}$ in the sample set and the predicted rating $\hat{r}_{ui}$, put the user $u$ of the sample set into the user set $U$, and put the item $i$ of the sample set into the item set $I$;
Reduce the step size $\gamma$, and return to the step of obtaining one sample set from the preset number of sample sets $\kappa$;
When all sample sets in the preset number of sample sets $\kappa$ have been traversed, increment the iteration count by one;
Judge whether the iteration count exceeds the preset iteration count threshold; if yes, complete the minimization of the preset loss function; if no, return to the step of setting the user set $U$ and the item set $I$ to empty sets.
It can be seen that, in the cluster-based collaborative filtering recommendation device proposed by the embodiment of the present invention, after the first items are clustered according to their label genome vectors, the correlation coefficient for first items belonging to the same cluster is calculated from the Euclidean distance between their label genome vectors. Because a label genome vector describes the intrinsic attributes of an item and does not change with users' subjective wishes, it is objective. The correlation coefficient calculated from it is therefore also objective, and so is the resulting predicted rating, which avoids the problem of users' subjective ratings undermining the objectivity of the predicted rating.
The embodiment of the present invention further provides an electronic device. Refer to Fig. 8, which is a structural schematic diagram of the electronic device provided by the embodiment of the present invention. As shown in Fig. 8, the electronic device includes a processor 81, a communication interface 82, a memory 83, and a communication bus 84, where the processor 81, the communication interface 82, and the memory 83 communicate with one another via the communication bus 84;
The memory 83 is configured to store a computer program;
The processor 81 is configured to implement the following steps when executing the program stored in the memory 83:
Obtain the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
Divide the first items into a preset first number of clusters using a preset clustering algorithm, based on the label genome vectors of the first items;
For each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where a target item is an item among the first items that the target user has not rated, and a second item is an item among the first items other than all target items; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient of the target item and the second item; and perform a weighted summation over the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating of the target item;
Recommend to the target user the target items whose predicted ratings meet a preset condition.
The communication bus mentioned for the above electronic device can be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The method provided by the embodiment of the present invention can be applied to an electronic device. Specifically, the electronic device can be a desktop computer, a portable computer, an intelligent mobile terminal, a server, or the like. No limitation is imposed here; any electronic device that can implement the present invention falls within the protection scope of the present invention.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above cluster-based collaborative filtering recommendation method.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the steps of the above cluster-based collaborative filtering recommendation method.
An embodiment of the present invention further provides a computer program which, when run on a computer, causes the computer to execute the steps of the above cluster-based collaborative filtering recommendation method.
The embodiments of the device, the electronic device, the storage medium, the computer program product containing instructions, and the computer program are substantially similar to the method embodiments, so they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the device, the electronic device, the storage medium, the computer program product containing instructions, and the computer program are substantially similar to the method embodiments, so they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A cluster-based collaborative filtering recommendation method, characterized by comprising:
obtaining the label genome vectors of first articles from a preset label genome information matrix, a label genome vector being used to describe the inherent attributes of a first article;
dividing the first articles into a preset first quantity of clusters using a preset clustering algorithm, based on the label genome vectors of the first articles;
for each target item: when the target item and a second article belong to the same cluster, calculating the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article, where a target item is an article among the first articles that the target user has not rated, and a second article is an article among the first articles other than the target items; when the target item and the second article belong to different clusters, calculating the correlation coefficient between the target item and the second article based on the Pearson correlation coefficient of the target item and the second article; computing a weighted sum of the target user's preset ratings of the second articles and the correlation coefficients between the target item and the second articles, to obtain the target user's predicted rating of the target item;
recommending to the target user the target items whose predicted ratings meet a preset condition.
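The cluster-dependent choice of correlation coefficient in claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `correlation`, `cluster_of`, `genome`, and `stub_pearson` are hypothetical, and the mapping from distance to similarity, 1 / (1 + d), is an assumption, since the claim only requires that the coefficient be based on a distance of a preset type.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation(i, j, cluster_of, genome, pearson, distance=euclidean):
    """Choose the source of the correlation coefficient by cluster membership."""
    if cluster_of[i] == cluster_of[j]:
        # Same cluster: derive the coefficient from the vector distance.
        # ASSUMPTION: 1 / (1 + d) is only one way to turn a distance into a similarity.
        return 1.0 / (1.0 + distance(genome[i], genome[j]))
    # Different clusters: use the rating-based Pearson correlation coefficient.
    return pearson(i, j)

# Hypothetical data: three articles, two clusters.
genome = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
cluster_of = {"a": 0, "b": 0, "c": 1}
stub_pearson = lambda i, j: 0.25  # stands in for a precomputed Pearson value

same = correlation("a", "b", cluster_of, genome, stub_pearson)
diff = correlation("a", "c", cluster_of, genome, stub_pearson)
```

Passing the distance function as a parameter keeps the sketch open to other preset distance types.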
2. The method according to claim 1, characterized in that the step of calculating the correlation coefficient between the target item and the second article based on the Pearson correlation coefficient of the target item and the second article comprises:
calculating the Pearson correlation coefficient of the target item and the second article;
generating a scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient, and taking the scaled Pearson correlation coefficient of the target item and the second article as the correlation coefficient between the target item and the second article.
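As an illustration of the first step of claim 2, the Pearson correlation coefficient of two articles can be computed over the users who rated both. The sketch below uses hypothetical names and data; returning 0.0 when the coefficient is undefined (fewer than two common users, or zero variance) is an assumption of this example.

```python
import math

def pearson(ratings_i, ratings_j):
    """Pearson correlation of two articles over their common users.

    ratings_i, ratings_j: dict mapping user id -> rating of each article.
    Returns (rho, n_common); rho is 0.0 when it is undefined (assumption).
    """
    common = sorted(set(ratings_i) & set(ratings_j))
    n = len(common)
    if n < 2:
        return 0.0, n
    xs = [ratings_i[u] for u in common]
    ys = [ratings_j[u] for u in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0, n
    return cov / math.sqrt(var_x * var_y), n

# Hypothetical ratings: u4 rated only the second article, so it is ignored.
rho, n_common = pearson(
    {"u1": 5, "u2": 3, "u3": 1},
    {"u1": 4, "u2": 2, "u3": 0, "u4": 5},
)
```

The function also returns the number of common users, because the scaling step of claim 2 needs that count.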
3. The method according to claim 2, characterized in that the step of generating the scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the Pearson correlation coefficient comprises:
according to the following formula, generating the scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient;
where the target item is article i; the second article is article j; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $n_{ij}$ is the number of common users of article i and article j; and $\lambda_1$ is a parameter for the number of common users.
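The formula referred to in claim 3 is not reproduced in this text. One form consistent with the symbols defined there is the usual shrinkage of a correlation by its support; it is offered here as an assumption, not as the patent's exact expression:

```latex
s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_1}\,\rho_{ij}
```

Under this assumed form, with $\lambda_1 = 100$ and $\rho_{ij} = 0.9$, two common users give $s_{ij} \approx 0.018$, while 400 common users give $s_{ij} = 0.72$, so correlations supported by few common users are pulled toward zero.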
4. The method according to claim 3, characterized in that the distance of the preset type includes the Euclidean distance;
the step of computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, to obtain the target user's predicted rating of the target item, comprises:
according to the following formula, computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, to obtain the target user's predicted rating of the target item;
where the target item is article i; the second article is article j; the target user is user u; $\hat{r}_{ui}$ is the predicted rating of article i by user u; $r_{uj}$ is the preset rating of article j by user u; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of article i and the label genome vector of article j; $\kappa_u$ is the set of articles, other than article i, that user u has rated; $p_{ij}$ is an adjustment factor; and the remaining symbol denotes, with the first articles divided into k clusters, the set of articles belonging to the same cluster as article i.
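The weighted summation of claim 4 can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the code below only shows the general pattern: each known rating of the user is weighted by the correlation coefficient between the rated article and the target item. Normalizing by the total absolute weight is an assumption, the role of the adjustment factor is not modeled, and all names are hypothetical.

```python
def predict_rating(user_ratings, similarity_to_target):
    """Similarity-weighted average of the ratings a user has already given.

    user_ratings: dict article id -> rating given by the user (the rated set).
    similarity_to_target: dict article id -> correlation coefficient with
        the target item (distance-based or scaled Pearson, per cluster).
    ASSUMPTION: the weighted sum is normalized by the total absolute weight.
    Returns None when no rated article has a nonzero weight.
    """
    numerator = 0.0
    denominator = 0.0
    for article, rating in user_ratings.items():
        weight = similarity_to_target.get(article, 0.0)
        numerator += weight * rating
        denominator += abs(weight)
    return numerator / denominator if denominator else None

# Hypothetical example: two rated articles with weights 0.75 and 0.25.
r_hat = predict_rating({"j1": 4.0, "j2": 2.0}, {"j1": 0.75, "j2": 0.25})
```

Here the prediction is (0.75 × 4.0 + 0.25 × 2.0) / 1.0 = 3.5.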
5. The method according to claim 4, characterized in that the step of computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, to obtain the target user's predicted rating of the target item, comprises:
computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles;
adjusting the result of the weighted sum using a personalization parameter, a user bias adjustment parameter, and an article bias adjustment parameter, to obtain the target user's predicted rating of the target item.
6. The method according to claim 5, characterized in that the step of computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, and adjusting the result of the weighted sum using a personalization parameter, a user bias adjustment parameter, and an article bias adjustment parameter, to obtain the target user's predicted rating of the target item, comprises:
using the following formula, computing a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, and adjusting the result of the weighted sum using the personalization parameter, the user bias adjustment parameter, and the article bias adjustment parameter, to obtain the target user's predicted rating of the target item;
where the target item is article i; the second article is article j; the target user is user u; $\mu$ is the mean rating in the preset rating matrix; $b_u$ is the user bias adjustment parameter for user u; $b_i$ is the article bias adjustment parameter for article i; $\alpha_u$ is the personalization parameter for user u; $\hat{r}_{ui}$ is the predicted rating of article i by user u; $r_{uj}$ is the preset rating of article j by user u; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of article i and the label genome vector of article j; $\kappa_u$ is the set of articles, other than article i, that user u has rated; $p_{ij}$ is an adjustment factor; and the remaining symbol denotes, with the first articles divided into k clusters, the set of articles belonging to the same cluster as article i.
7. The method according to claim 6, characterized in that, before the step of using the following formula to compute a weighted sum of the preset target user's ratings of the second articles and the correlation coefficients between the target item and the second articles, the method further comprises:
obtaining a preset quantity of sample sets from a preset rating matrix, each sample set comprising a user, an article, and the rating the user gave the article;
for each sample set, taking the user in the sample set as the target user and the article in the sample set as the target item; calculating the target user's predicted rating of the target item, and taking the calculated predicted rating as the predicted rating corresponding to the sample set;
minimizing a preset loss function based on the sample sets and the predicted rating corresponding to each sample set, to obtain a minimized loss function, the preset loss function being given by the following formula;
in the preset loss function, $\kappa$ is the preset quantity of sample sets; article i is the article in each sample set; user u is the user in each sample set; $b_u$ is the user bias adjustment parameter for user u; $b_i$ is the article bias adjustment parameter for article i; $\alpha_u$ is the personalization parameter for user u; $\hat{r}_{ui}$ is the calculated predicted rating of article i by user u; $r_{ui}$ is the rating of article i by user u obtained from the preset rating matrix; $\lambda_2$ is a preset regularization parameter;
determining, from the minimized loss function, the optimal values of the personalization parameter, the user bias adjustment parameter, and the article bias adjustment parameter.
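The loss function of claim 7 is not reproduced in this text. A regularized squared-error form consistent with the symbols defined there would be the following; the placement of the regularization term inside the sum is an assumption, not the patent's confirmed expression:

```latex
\min_{b_u,\, b_i,\, \alpha_u} \sum_{(u,i) \in \kappa} \left[ \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda_2 \left( b_u^2 + b_i^2 + \alpha_u^2 \right) \right]
```

Under this assumed form, the first term penalizes prediction error on the sampled ratings and the second term, weighted by $\lambda_2$, discourages large values of the three learned parameters.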
8. The method according to claim 7, characterized in that the step of minimizing the preset loss function based on the sample sets and the predicted rating corresponding to each sample set, to obtain a minimized loss function, comprises:
for each sample set, setting initial values of the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the article bias adjustment parameter $b_i$, and initializing the step size $\gamma$ and the iteration count;
setting the user set U and the article set I to empty sets;
obtaining one sample set from the preset quantity of sample sets $\kappa$, the sample set comprising user u, article i, the rating $r_{ui}$ of article i by user u obtained from the preset rating matrix, and the calculated predicted rating $\hat{r}_{ui}$ of article i by user u;
judging whether the user set U contains the user u of the sample set and the article set I contains the article i of the sample set;
if so, returning to the step of obtaining one sample set from the preset quantity of sample sets $\kappa$;
if not, updating the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the article bias adjustment parameter $b_i$ according to the difference $e_{ui}$ between the rating $r_{ui}$ in the sample set and the predicted rating $\hat{r}_{ui}$, putting the user u of the sample set into the user set U, and putting the article i of the sample set into the article set I;
reducing the step size $\gamma$, and returning to the step of obtaining one sample set from the preset quantity of sample sets $\kappa$;
when all sample sets in the preset quantity of sample sets $\kappa$ have been traversed, incrementing the iteration count by one;
judging whether the iteration count exceeds a preset iteration count threshold; if so, completing the minimization of the preset loss function; if not, returning to the step of setting the user set U and the article set I to empty sets.
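The loop structure of claim 8 resembles stochastic gradient descent and can be sketched as follows. Several parts are assumptions rather than the patent's confirmed method: the prediction form `mu + b_u + b_i + alpha_u * w`, where `w` is a precomputed weighted-sum term; the gradient-style update rules; and the multiplicative step decay. The names `fit`, `neighborhood`, `lam`, and `decay` are hypothetical.

```python
def fit(samples, neighborhood, mu, gamma=0.05, lam=0.02, decay=0.99, max_iters=20):
    """Learn b_u, b_i and alpha_u with the loop structure described in claim 8.

    samples: list of (user, article, rating) triples (the sample sets).
    neighborhood: dict (user, article) -> weighted-sum term, assumed precomputed.
    ASSUMPTION: r_hat = mu + b_u + b_i + alpha_u * neighborhood[(u, i)].
    """
    b_user = {u: 0.0 for u, _, _ in samples}
    b_item = {i: 0.0 for _, i, _ in samples}
    alpha = {u: 1.0 for u, _, _ in samples}
    iterations = 0
    while True:
        # Start each pass with empty user and article sets.
        seen_users, seen_items = set(), set()
        for u, i, r_ui in samples:
            # Skip a sample whose user and article were both already seen.
            if u in seen_users and i in seen_items:
                continue
            w = neighborhood.get((u, i), 0.0)
            e_ui = r_ui - (mu + b_user[u] + b_item[i] + alpha[u] * w)
            # ASSUMPTION: regularized gradient-style updates driven by e_ui.
            b_user[u] += gamma * (e_ui - lam * b_user[u])
            b_item[i] += gamma * (e_ui - lam * b_item[i])
            alpha[u] += gamma * (e_ui * w - lam * alpha[u])
            seen_users.add(u)
            seen_items.add(i)
            gamma *= decay  # reduce the step size after each update
        iterations += 1
        if iterations > max_iters:
            return b_user, b_item, alpha

# Hypothetical data: two users rate the same article above and below the mean.
samples = [("u1", "a", 5.0), ("u2", "a", 3.0)]
b_user, b_item, alpha = fit(samples, {}, mu=4.0)
```

In this toy run the user bias of `u1` moves above zero and that of `u2` below zero, which reduces the squared error relative to predicting the mean for both.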
9. A cluster-based collaborative filtering recommendation device, characterized by comprising:
a first obtaining module, configured to obtain the label genome vectors of first articles from a preset label genome information matrix, a label genome vector being used to describe the inherent attributes of a first article;
a division module, configured to divide the first articles into a preset first quantity of clusters using a preset clustering algorithm, based on the label genome vectors of the first articles;
a first computing module, configured to, for each target item: when the target item and a second article belong to the same cluster, calculate the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article, where a target item is an article among the first articles that the target user has not rated, and a second article is an article among the first articles other than the target items; when the target item and the second article belong to different clusters, calculate the correlation coefficient between the target item and the second article based on the Pearson correlation coefficient of the target item and the second article; and compute a weighted sum of the target user's preset ratings of the second articles and the correlation coefficients between the target item and the second articles, to obtain the target user's predicted rating of the target item;
a recommending module, configured to recommend to the target user the target items whose predicted ratings meet a preset condition.
10. The device according to claim 9, characterized in that
the first computing module is specifically configured to:
calculate the Pearson correlation coefficient of the target item and the second article;
generate a scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient, and take the scaled Pearson correlation coefficient of the target item and the second article as the correlation coefficient between the target item and the second article.
CN201810863191.2A 2018-08-01 2018-08-01 Collaborative filtering recommendation method and device based on clustering Active CN109063120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810863191.2A CN109063120B (en) 2018-08-01 2018-08-01 Collaborative filtering recommendation method and device based on clustering

Publications (2)

Publication Number Publication Date
CN109063120A true CN109063120A (en) 2018-12-21
CN109063120B CN109063120B (en) 2021-05-28

Family

ID=64832235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810863191.2A Active CN109063120B (en) 2018-08-01 2018-08-01 Collaborative filtering recommendation method and device based on clustering

Country Status (1)

Country Link
CN (1) CN109063120B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083258A1 (en) * 2007-09-26 2009-03-26 At&T Labs, Inc. Methods and Apparatus for Improved Neighborhood Based Analysis in Ratings Estimation
CN102841929A (en) * 2012-07-19 2012-12-26 南京邮电大学 Recommending method integrating user and project rating and characteristic factors
CN103412948A (en) * 2013-08-27 2013-11-27 北京交通大学 Cluster-based collaborative filtering commodity recommendation method and system
CN104298787A (en) * 2014-11-13 2015-01-21 吴健 Individual recommendation method and device based on fusion strategy
CN106095974A (en) * 2016-06-20 2016-11-09 上海理工大学 Commending system score in predicting based on network structure similarity and proposed algorithm
CN108198045A (en) * 2018-01-30 2018-06-22 东华大学 The design method of mixing commending system based on e-commerce website data mining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
彭玉 等: "基于属性相似性的Item-based协同过滤算法", 《计算机工程与应用》 *
袁利: "基于聚类的协同过滤个性化推荐算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084477A (en) * 2019-03-26 2019-08-02 兰雨晴 A kind of cloud data center operating status appraisal procedure
CN112765458A (en) * 2021-01-07 2021-05-07 同济大学 Mixed recommendation method based on metric decomposition and label self-adaptive weight distribution
CN112765458B (en) * 2021-01-07 2022-10-14 同济大学 Mixed recommendation method based on metric decomposition and label self-adaptive weight distribution
CN112990444A (en) * 2021-05-13 2021-06-18 电子科技大学 Hybrid neural network training method, system, equipment and storage medium
CN112990444B (en) * 2021-05-13 2021-09-24 电子科技大学 Hybrid neural network training method, system, equipment and storage medium
CN117455573A (en) * 2023-10-26 2024-01-26 深圳市维卓数字营销有限公司 Internet data analysis method and system
CN117455573B (en) * 2023-10-26 2024-08-09 深圳市维卓数字营销有限公司 Internet data analysis method and system

Also Published As

Publication number Publication date
CN109063120B (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant