CN109063120A

CN109063120A - Clustering-based collaborative filtering recommendation method and device

Info

Publication number: CN109063120A
Application number: CN201810863191.2A
Authority: CN
Inventors: 高志鹏; 李博; 杨杨; 王颖; 谭清; 王茜; 肖楷乐
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2018-12-21
Anticipated expiration: 2038-08-01
Also published as: CN109063120B

Abstract

The embodiments of the present invention provide a clustering-based collaborative filtering recommendation method and device. The method includes: obtaining the tag genome vectors of first items; dividing the first items into a first number of clusters based on their tag genome vectors; and, for each target item: when the target item and a second item belong to the same cluster, calculating their correlation coefficient based on a preset type of distance between the two items; when they belong to different clusters, calculating their correlation coefficient based on the Pearson correlation coefficient between the target item and the second item; computing a weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and those second items, to obtain the target user's predicted rating for the target item; and recommending to the target user the target items whose predicted ratings meet a preset condition. Applying the embodiments of the present invention can improve the objectivity of recommendation scores.

Description

Clustering-based collaborative filtering recommendation method and device

Technical field

The present invention relates to the technical field of recommendation algorithms, and in particular to a clustering-based collaborative filtering recommendation method and device.

Background art

With the rapid development of Internet technology, the Internet provides users with massive amounts of information of every kind, enriching and facilitating people's work and life. At the same time, however, finding the information a user is interested in among this mass of information has become time-consuming and laborious. Recommendation algorithms arose to address this: instead of requiring the user to state specific needs, a recommendation algorithm analyzes the user's interests and needs from the user's historical behavior and recommends items that can satisfy them.

Specifically, the collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithms. Its processing steps are as follows:

First step: obtain the historical records of user-item interactions from a preset public dataset. Public datasets can usually be obtained from websites dedicated to recommender systems. The interaction records include a user-item rating matrix, a user-item consumption matrix, and so on; the user-item rating matrix contains multiple items and the ratings each item has received from users. Note that the ratings received by an item may come from different users. Below, the user-item rating matrix is referred to simply as the rating matrix and is used as the running example.

For ease of explanation, the user to whom items are to be recommended is called the target user, and an item recommended to the target user is called a target item; the items in the rating matrix are called first items. Items recommended to the target user should be items the target user has not yet used, that is, items the target user has not rated. A target item is therefore a first item that the target user has not rated.
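To make the definitions of target items and second items concrete, the sketch below models the rating matrix as a Python dict mapping each user to the items that user has rated. The data and the helper name `split_items` are hypothetical illustrations, not part of the patent.

```python
# Hypothetical toy rating matrix: user -> {item: rating}.
ratings = {
    "alice": {"m1": 4.0, "m2": 5.0},
    "bob": {"m1": 3.0, "m3": 2.0},
}


def split_items(ratings, all_items, target_user):
    """Return (target_items, second_items) for target_user.

    Target items: first items the target user has not rated.
    Second items: the remaining first items, i.e. those the user has rated.
    """
    rated = set(ratings.get(target_user, {}))
    target_items = sorted(i for i in all_items if i not in rated)
    second_items = sorted(i for i in all_items if i in rated)
    return target_items, second_items


all_items = ["m1", "m2", "m3", "m4"]  # the first items
print(split_items(ratings, all_items, "alice"))  # (['m3', 'm4'], ['m1', 'm2'])
```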

Second step: for each target item, first calculate the correlation coefficient between the target item and each second item, where the second items are the first items other than all of the target items; then compute a weighted sum of the target user's ratings of the second items in the rating matrix, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item.

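Specifically, the Pearson correlation coefficient can be used as the correlation coefficient between items; it is calculated as shown in formula (1):

$$\rho_{ij}=\frac{\sum_{u\in U_{ij}}(r_{ui}-\bar r_i)(r_{uj}-\bar r_j)}{\sqrt{\sum_{u\in U_{ij}}(r_{ui}-\bar r_i)^2}\,\sqrt{\sum_{u\in U_{ij}}(r_{uj}-\bar r_j)^2}} \qquad (1)$$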

In formula (1), the target item is item i and the second item is item j; ρ_ij is the Pearson correlation coefficient of items i and j; U_ij = U_i ∩ U_j is the set of common users who rated both item i and item j; u is a user in U_ij; r_ui is the rating item i received from user u; r̄_i is the mean of the ratings item i received; r_uj is the rating item j received from user u; and r̄_j is the mean of the ratings item j received.
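Formula (1) can be sketched in Python as follows. The dict-of-dicts rating layout and the function name `pearson` are assumptions for illustration. Item means are taken over all ratings each item received, while the sums run over U_ij only; returning 0 when there are no common users or a denominator vanishes is also an assumption, since the text does not cover those cases.

```python
import math


def pearson(ratings, i, j):
    """Pearson correlation rho_ij between items i and j (formula (1)).

    ratings: dict user -> {item: rating}.
    """
    def item_mean(item):
        # Mean of all ratings the item received (r-bar in formula (1)).
        vals = [r[item] for r in ratings.values() if item in r]
        return sum(vals) / len(vals)

    common = [u for u, r in ratings.items() if i in r and j in r]  # U_ij
    if not common:
        return 0.0
    mi, mj = item_mean(i), item_mean(j)
    num = sum((ratings[u][i] - mi) * (ratings[u][j] - mj) for u in common)
    den_i = math.sqrt(sum((ratings[u][i] - mi) ** 2 for u in common))
    den_j = math.sqrt(sum((ratings[u][j] - mj) ** 2 for u in common))
    if den_i == 0 or den_j == 0:
        return 0.0  # correlation undefined; treated as 0 here (assumption)
    return num / (den_i * den_j)


ratings = {"u1": {"a": 5, "b": 4}, "u2": {"a": 3, "b": 2}, "u3": {"a": 1, "b": 1}}
print(round(pearson(ratings, "a", "b"), 3))  # 0.982
```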

Third step: recommend to the target user the target items whose predicted ratings meet a preset condition.

Specifically, the target items can be sorted by the target user's predicted ratings for them, and a preset number of target items with the highest predicted ratings can be recommended to the target user; alternatively, all target items whose predicted ratings exceed a rating threshold can be recommended.

As can be seen, in the above collaborative filtering recommendation algorithm the correlation coefficient between a target item and the other items depends entirely on the ratings the items have received. Ratings are given subjectively by users, which easily compromises the objectivity of the computed predicted ratings. For example, suppose item A has received only a single low rating, given by a user who was in a bad mood, while the actual quality of item A is very good. The predicted rating that the above algorithm computes for item A will then be low, and its objectivity poor.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a clustering-based collaborative filtering recommendation method and device that improve the objectivity of recommendation scores. The specific technical solutions are as follows:

An embodiment of the present invention provides a clustering-based collaborative filtering recommendation method, including:

obtaining the tag genome vectors of first items from a preset tag genome information matrix, where a tag genome vector describes the intrinsic attributes of a first item;

dividing the first items into a preset first number of clusters using a preset clustering algorithm, based on the tag genome vectors of the first items;

for each target item: when the target item and a second item belong to the same cluster, calculating the correlation coefficient between the target item and the second item based on a preset type of distance between the tag genome vector of the target item and the tag genome vector of the second item, where a target item is a first item that the target user has not rated and the second items are the first items other than all of the target items; when the target item and the second item belong to different clusters, calculating their correlation coefficient based on their Pearson correlation coefficient; and computing a weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item; and

recommending to the target user the target items whose predicted ratings meet a preset condition.

An embodiment of the present invention further provides a clustering-based collaborative filtering recommendation device, including:

a first obtaining module, configured to obtain the tag genome vectors of first items from a preset tag genome information matrix, where a tag genome vector describes the intrinsic attributes of a first item;

a division module, configured to divide the first items into a preset first number of clusters using a preset clustering algorithm, based on the tag genome vectors of the first items;

a first computing module, configured to, for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a preset type of distance between the tag genome vector of the target item and the tag genome vector of the second item, where a target item is a first item that the target user has not rated and the second items are the first items other than all of the target items; when the target item and the second item belong to different clusters, calculate their correlation coefficient based on their Pearson correlation coefficient; and compute a weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item; and

a recommending module, configured to recommend to the target user the target items whose predicted ratings meet a preset condition.

An embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement any of the above clustering-based collaborative filtering recommendation methods when executing the program stored in the memory.

An embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute any of the above clustering-based collaborative filtering recommendation methods.

An embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, causes the computer to execute any of the above clustering-based collaborative filtering recommendation methods.

In the clustering-based collaborative filtering recommendation method and device provided by the embodiments of the present invention, the tag genome vectors of the first items are first obtained from a preset tag genome information matrix, where a tag genome vector describes the intrinsic attributes of a first item. A preset clustering algorithm then divides the first items into a preset first number of clusters based on their tag genome vectors, so that first items in the same cluster have similar intrinsic attributes. Next, for each target item: when the target item and a second item belong to the same cluster, their correlation coefficient is calculated from the preset type of distance between their tag genome vectors (a target item is a first item the target user has not rated; the second items are the first items other than all of the target items); when they belong to different clusters, their correlation coefficient is calculated from their Pearson correlation coefficient; and a weighted sum of the preset target user's ratings of the second items and the correlation coefficients between the target item and the second items yields the target user's predicted rating for the target item. Finally, the target items whose predicted ratings meet a preset condition are recommended to the target user.

In this way, after the first items are classified according to their tag genome vectors, the correlation coefficient between first items belonging to the same cluster can be calculated from the preset type of distance between their tag genome vectors. Because a tag genome vector describes the intrinsic attributes of an item and does not change with users' subjective preferences, it is objective; the correlation coefficients computed from it are therefore also objective, and so are the predicted ratings derived from them. This avoids the problem of users' subjective ratings undermining the objectivity of predicted ratings.

Of course, any product or method implementing the present invention need not achieve all of the above advantages at the same time.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a clustering-based collaborative filtering recommendation method provided by an embodiment of the present invention;

Fig. 2 is a detailed flowchart of step 103 in an embodiment of the present invention;

Fig. 3 is a detailed flowchart of sub-step 12 in an embodiment of the present invention;

Fig. 4 is a detailed flowchart of sub-step 13 in an embodiment of the present invention;

Fig. 5 is a flowchart of determining the optimal values of the parameters in an embodiment of the present invention;

Fig. 6 is a detailed flowchart of step 503 in an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a clustering-based collaborative filtering recommendation device according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a clustering-based collaborative filtering recommendation method. Referring to Fig. 1, which is a schematic flowchart of the clustering-based collaborative filtering recommendation method provided by an embodiment of the present invention, the method may include the following steps:

Step 101: obtain the tag genome vectors of the first items from a preset tag genome information matrix.

Here, a tag genome vector describes the intrinsic attributes of a first item.

In this step, the tag genome vectors of the first items can be obtained from the preset tag genome information matrix so that the first items can be classified by their tag genome vectors in a subsequent step. A tag genome vector describes the intrinsic attributes of a first item, and the first items are all the items contained in the tag genome matrix, usually multiple items.

Note that the tag genome information matrix is derived from user comments on various websites, either by manual collection or by machine learning, and constitutes contextual information about the items. In practice it can also be obtained from public datasets on websites dedicated to recommender systems, for example the public datasets on the MovieLens website. The tag genome information matrix contains the tag genome vectors of the first items. A tag genome vector describes the intrinsic attributes of a first item, specifically the degree of relevance between the item and each feature or tag, expressed as a number between 0 and 1. The larger the value, the more strongly the item exhibits that feature or tag, that is, the closer the item's intrinsic attributes are to that feature or tag.

For example, suppose the first item is a film and the tags are "horror", "emotional", and "comedy", and the film's tag genome vector is [0.9, 0.8, 0.1]. Here 0.9 is the degree of relevance between the film and the tag "horror", 0.8 its relevance to "emotional", and 0.1 its relevance to "comedy". The film is therefore closest to the tag "horror" and can be regarded as a horror film.
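As an illustration of step 101 and the film example above, a tag genome information matrix can be held as a mapping from item to tag genome vector. The sketch reads one vector and finds the tag with the highest relevance; the item ID, tag names, and layout are hypothetical.

```python
# Hypothetical tag genome information matrix: one row of relevance scores in
# [0, 1] per item, one column per tag. Values follow the film example above.
TAGS = ["horror", "emotional", "comedy"]
GENOME = {
    "film_1": [0.9, 0.8, 0.1],
}


def tag_genome_vector(item):
    """Step 101: look up an item's tag genome vector."""
    return GENOME[item]


def dominant_tag(item):
    """Return the tag the item is most relevant to (largest component)."""
    vec = tag_genome_vector(item)
    best = max(range(len(TAGS)), key=lambda idx: vec[idx])
    return TAGS[best]


print(dominant_tag("film_1"))  # horror
```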

It should be understood that because the tags or features in a tag genome vector are intrinsic attributes of the item, they exist objectively and do not change with users' subjective preferences; the tag genome vector is therefore objective.

Step 102: using a preset clustering algorithm, divide the first items into a preset first number of clusters based on their tag genome vectors.

In this step, a preset clustering algorithm can be used to divide the first items into the preset first number of clusters based on their tag genome vectors. The preset clustering algorithm may be, for example, the K-means clustering algorithm or the mean-shift clustering algorithm.

Because the tags or features in a tag genome vector are intrinsic attributes of the item and exist objectively, first items assigned to the same cluster have similar objective characteristics. The correlation coefficient between first items in the same cluster can therefore be computed from their tag genome vectors, which makes it more objective.

In one implementation, the preset clustering algorithm is the K-means clustering algorithm, and the first items are divided into k clusters based on their tag genome vectors. The specific process is as follows:

Step 1: input the tag genome information matrix G.

Step 2: randomly initialize k cluster centroids, denoted μ_1, μ_2, …, μ_k ∈ ℝⁿ,

where ℝⁿ denotes the space of vectors of length n, and n is the number of features or tags in the tag genome information matrix G.

Step 3: for each first item i, compute the cluster c⁽ⁱ⁾ to which its tag genome vector g_i belongs:

$$c^{(i)} := \arg\min_{j}\,\lVert g_i-\mu_j\rVert^2 \qquad (2)$$

In formula (2), g_i ∈ G is the tag genome vector of first item i; c⁽ⁱ⁾ is the cluster to which g_i belongs; and μ_j is the centroid of cluster j, j ∈ {1, …, k}. That is, c⁽ⁱ⁾ is the index of the centroid nearest to g_i.

Step 4: for each cluster j, recompute its centroid μ_j:

$$\mu_j := \frac{\sum_{i=1}^{m} 1\{c^{(i)}=j\}\, g_i}{\sum_{i=1}^{m} 1\{c^{(i)}=j\}} \qquad (3)$$

In formula (3), g_i ∈ G is the tag genome vector of first item i; c⁽ⁱ⁾ is the cluster to which g_i belongs; μ_j is the centroid of cluster j, j ∈ {1, …, k}; m is the total number of first items, i ∈ {1, …, m}; and 1{·} is the indicator function. That is, μ_j is the mean of the tag genome vectors currently assigned to cluster j.

Steps 3 and 4 are repeated until convergence. The detailed process of dividing the first items into k clusters with the K-means algorithm based on their tag genome vectors can be found in the prior art and is not repeated here.
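The K-means steps above (random initialization, assignment by formula (2), centroid update by formula (3), repetition until convergence) can be sketched in plain Python. Initializing the centroids by sampling k of the input vectors, treating "convergence" as "assignments stop changing", and keeping a centroid unchanged when its cluster becomes empty are implementation assumptions, not details given in the text.

```python
import math
import random


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain K-means over tag genome vectors (formulas (2) and (3)).

    vectors: list of equal-length lists of floats (the rows g_i of G).
    Returns (assignments, centroids), where assignments[i] is c^(i).
    """
    rng = random.Random(seed)
    # Step 2: initialize k centroids by sampling k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [None] * len(vectors)
    for _ in range(max_iter):
        # Step 3, formula (2): assign each g_i to its nearest centroid.
        # (argmin of the distance equals argmin of the squared distance.)
        new_assign = [
            min(range(k), key=lambda j: math.dist(g, centroids[j]))
            for g in vectors
        ]
        if new_assign == assignments:
            break  # converged: assignments no longer change
        assignments = new_assign
        # Step 4, formula (3): move each centroid to the mean of its members.
        for j in range(k):
            members = [g for g, c in zip(vectors, assignments) if c == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


G = [[0.9, 0.8, 0.1], [0.85, 0.75, 0.15], [0.1, 0.2, 0.9], [0.15, 0.1, 0.95]]
labels, centers = kmeans(G, 2)
print(labels)  # first two items share one cluster, last two the other
```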

Step 103: for each target item: when the target item and a second item belong to the same cluster, calculate their correlation coefficient based on the preset type of distance between the tag genome vector of the target item and that of the second item; when they belong to different clusters, calculate their correlation coefficient based on their Pearson correlation coefficient; then compute a weighted sum of the preset target user's ratings of the second items and the correlation coefficients between the target item and the second items to obtain the target user's predicted rating for the target item.

Here, a target item is a first item that the target user has not rated, and the second items are the first items other than all of the target items.

Because step 102 has divided the first items into multiple clusters, in this step each target item is processed as follows. Referring to Fig. 2, which is a detailed flowchart of step 103 in an embodiment of the present invention:

Sub-step 11: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between them based on the preset type of distance between the tag genome vector of the target item and that of the second item.

The preset type of distance may be the Euclidean distance, the Manhattan distance, the Mahalanobis distance, etc., chosen according to the actual situation.

In one implementation, when the target item and a second item belong to the same cluster, the Euclidean distance between the tag genome vector of the target item and that of the second item is calculated, and the resulting Euclidean distance value is used as the correlation coefficient between the target item and the second item.
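For sub-step 11 with the Euclidean distance, the distances ε_ij only need to be computed for pairs of items that share a cluster. The sketch below assumes that the tag genome vectors and the cluster assignments from step 102 are given as dicts; the function name and layouts are hypothetical.

```python
import math
from itertools import combinations


def same_cluster_distances(genome, cluster_of):
    """Euclidean distance eps_ij for every pair of items sharing a cluster.

    genome:     dict item -> tag genome vector
    cluster_of: dict item -> cluster id (result of step 102)
    Returns a dict keyed by both (i, j) and (j, i).
    """
    eps = {}
    for i, j in combinations(sorted(genome), 2):
        if cluster_of[i] == cluster_of[j]:
            d = math.dist(genome[i], genome[j])
            eps[(i, j)] = eps[(j, i)] = d
    return eps


genome = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
cluster_of = {"a": 0, "b": 0, "c": 1}
print(same_cluster_distances(genome, cluster_of))  # {('a', 'b'): 5.0, ('b', 'a'): 5.0}
```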

Methods for calculating the Euclidean distance are well known in the prior art and are not described here.

Because the tags or features in a tag genome vector are intrinsic, objectively existing attributes of the item, the correlation coefficients computed from tag genome vectors for first items in the same cluster are more objective.

Sub-step 12: when the target item and a second item belong to different clusters, calculate the correlation coefficient between them based on their Pearson correlation coefficient.

In one implementation, referring to Fig. 3, which is a detailed flowchart of sub-step 12 in an embodiment of the present invention, sub-step 12 may include:

Sub-step 121: calculate the Pearson correlation coefficient between the target item and the second item.

For the detailed process of sub-step 121, refer to formula (1) and the related description in the background art; it is not repeated here.

Sub-step 122: based on the number of common users of the target item and the second item and the calculated Pearson correlation coefficient, generate the scaled Pearson correlation coefficient of the target item and the second item, and use it as their correlation coefficient.

Note that the preset rating matrix, like the tag genome information matrix, can be obtained from public datasets on websites dedicated to recommender systems, for example the public datasets on the MovieLens website. The preset rating matrix contains multiple items and the ratings each item has received from users. However, the preset rating matrix is sparse: most items in it have received few ratings, so pairs of items have very few users in common. If the Pearson correlation coefficient were used directly as the correlation coefficient between items, it could not faithfully reflect the correlation between items, and its accuracy would be poor.

The embodiment of the present invention therefore takes the number of common users into account when computing the correlation coefficient, so that the coefficient is more accurate and better reflects the true correlation between items. Specifically, on top of the Pearson correlation coefficient, a common-user-count parameter is introduced to obtain a scaled Pearson correlation coefficient, which is used as the correlation coefficient between items.

In one specific implementation, the scaled Pearson correlation coefficient of the target item and the second item can be generated according to formula (4), based on their number of common users and the calculated Pearson correlation coefficient;

In formula (4), the target item is item i and the second item is item j; s_ij is the scaled Pearson correlation coefficient of items i and j; ρ_ij is the Pearson correlation coefficient of items i and j; n_ij is the number of common users of items i and j; and λ₁ is the common-user-count parameter.

Specifically, by introducing the common-user-count parameter λ₁ on top of the Pearson correlation coefficient, the embodiment of the present invention increases the influence of the number of common users on the correlation coefficient. This compensates for the adverse effect that having very few common users has on the correlation coefficient, so the computed correlation coefficients reflect the true correlation between items more faithfully and accurately, and the resulting predicted ratings are more objective.

In practical applications, the common-user-count parameter λ₁ may be set to 100.
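Formula (4) itself is not reproduced in this text. The sketch below assumes the widely used shrinkage form s_ij = n_ij / (n_ij + λ₁) · ρ_ij, which is consistent with the variables listed for formula (4) and with λ₁ = 100, but it should be read as an assumption rather than the patent's exact formula. It shows why scaling helps with sparse data: a correlation backed by only a couple of common users is pulled strongly toward zero.

```python
def scaled_pearson(rho_ij, n_ij, lam1=100.0):
    """Scaled Pearson coefficient s_ij.

    ASSUMPTION: uses the shrinkage form s_ij = n_ij / (n_ij + lam1) * rho_ij;
    the patent's formula (4) may differ.
    rho_ij: Pearson correlation of items i and j
    n_ij:   number of common users of items i and j
    lam1:   common-user-count parameter (100 in the text)
    """
    return n_ij / (n_ij + lam1) * rho_ij


# Two co-raters with a "perfect" correlation are shrunk to about 0.02 ...
print(scaled_pearson(1.0, 2))
# ... while 300 co-raters keep most of a 0.8 correlation (about 0.6).
print(scaled_pearson(0.8, 300))
```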

Sub-step 13: compute a weighted sum of the preset target user's ratings of the second items and the correlation coefficients between the target item and the second items to obtain the target user's predicted rating for the target item.

In this step, because the preset rating matrix already contains the target user's ratings of the second items, these ratings can be read from it and combined in a weighted sum with the correlation coefficients computed in the preceding sub-steps, yielding the target user's predicted rating for the target item.

Because the correlation coefficients are computed differently, according to the clustering result, for items in the same cluster and for items in different clusters, they are more objective and more accurate; the target user's predicted rating for the target item obtained in this step is therefore also more objective.

In one implementation, according to formula (5), a weighted sum of the preset target user's ratings of the second items and the correlation coefficients between the target item and the second items is computed to obtain the target user's predicted rating for the target item;

In formula (5), the target item is item i, the second item is item j, and the target user is user u; r̂_ui is user u's predicted rating for item i; r_uj is user u's rating of item j; s_ij is the scaled Pearson correlation coefficient of items i and j; ρ_ij is the Pearson correlation coefficient of items i and j; ε_ij is the Euclidean distance between the tag genome vector of item i and that of item j; κ_u is the set of items, other than item i, that user u has rated; p_ij is an adjustment factor; and the formula also uses the set of items that belong to the same cluster as item i when the first items are divided into k clusters.

Specifically, in formula (5), according to the clustering result of step 102, for second items in the same cluster as the target item, the target user's predicted rating is computed from the Euclidean distance between the tag genome vectors of the target item and the second item. Because the tags or features in a tag genome vector are intrinsic, objectively existing attributes of the item, correlation coefficients computed from tag genome vectors are more objective. For second items in a different cluster from the target item, the scaled Pearson correlation coefficient between the target item and the second item is used, so the computed correlation coefficients reflect the correlation between items faithfully and accurately. Taken together, the resulting predicted rating is more objective and can truly reflect the target user's evaluation of the target item.
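Formula (5) is not reproduced in this text either, so the following is only a rough sketch of the idea described above: same-cluster neighbors are weighted by the tag-genome-based value ε_ij, and cross-cluster neighbors by the scaled Pearson coefficient s_ij. How formula (5) uses the adjustment factor p_ij and ρ_ij cannot be recovered here, so the sketch omits them; normalizing by the sum of absolute weights is also an assumption. All names are hypothetical.

```python
def predict_rating(user_ratings, target, cluster_of, eps, s):
    """Rough sketch of a formula (5)-style prediction r_hat_ui.

    user_ratings: dict item j -> r_uj for the items user u has rated (kappa_u)
    target:       the target item i
    cluster_of:   dict item -> cluster id from step 102
    eps:          dict (i, j) -> Euclidean distance of tag genome vectors
    s:            dict (i, j) -> scaled Pearson coefficient s_ij
    Returns None when no neighbor carries a non-zero weight.
    """
    num = 0.0
    den = 0.0
    for j, r_uj in user_ratings.items():
        if j == target:
            continue  # kappa_u excludes item i itself
        if cluster_of[j] == cluster_of[target]:
            # Same cluster: the text uses eps_ij as the correlation value.
            # The adjustment factor p_ij of formula (5) is omitted here.
            w = eps.get((target, j), 0.0)
        else:
            # Different cluster: scaled Pearson coefficient s_ij.
            w = s.get((target, j), 0.0)
        num += w * r_uj
        den += abs(w)
    return num / den if den else None


print(predict_rating({"a": 4.0, "b": 2.0}, "t", {"t": 0, "a": 0, "b": 1},
                     {("t", "a"): 1.0}, {("t", "b"): 0.5}))
```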

Step 104: recommend to the target user the target items whose predicted ratings meet the preset condition.

In this step, the target items can be sorted by the target user's predicted ratings computed in step 103, and a preset number of target items with the highest predicted ratings can be recommended to the target user; alternatively, the target items whose predicted ratings exceed a rating threshold can be recommended. The choice can be made according to actual needs and is not detailed here.
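The two selection rules of step 104 (a preset number of highest-scoring items, or all items above a rating threshold) can be sketched as one helper. The function name and the option of applying both rules together are illustrative assumptions.

```python
def recommend(predictions, top_n=None, threshold=None):
    """Select target items to recommend from {item: predicted rating}.

    top_n:     keep only the top_n highest-rated items (if given)
    threshold: keep only items whose predicted rating exceeds it (if given)
    """
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(item, p) for item, p in ranked if p > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [item for item, _ in ranked]


preds = {"x": 4.5, "y": 3.0, "z": 4.9}
print(recommend(preds, top_n=2))        # ['z', 'x']
print(recommend(preds, threshold=4.0))  # ['z', 'x']
```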

Because the computed predicted ratings are more objective and truly reflect the target user's evaluation of the target items, the target items selected for their high predicted ratings match the target user's preferences better, and the user experience improves.

As can be seen, in the clustering-based collaborative filtering recommendation method proposed by the embodiment of the present invention, after the first items are classified according to their tag genome vectors, the correlation coefficient between first items in the same cluster is computed from the Euclidean distance between their tag genome vectors. Because a tag genome vector describes an item's intrinsic attributes and does not change with users' subjective preferences, it is objective; the computed correlation coefficients are therefore objective, and so are the predicted ratings derived from them. This avoids the problem of users' subjective ratings undermining the objectivity of predicted ratings.

In an optional embodiment, referring to Fig. 4, which is a detailed flowchart of sub-step 13 in an embodiment of the present invention, sub-step 13 may specifically include:

Sub-step 131: compute a weighted sum of the preset target user's ratings of the second items and the correlation coefficients between the target item and the second items.

Sub-step 132: adjust the result of the weighted sum using a personalization parameter, a user bias adjustment parameter, and an item bias adjustment parameter to obtain the target user's predicted rating for the target item.

It should be noted that the target user's preset ratings of the second items are taken from the preset rating matrix, and the ratings in that matrix may be affected by several kinds of bias. These may include user bias and item bias. User bias means that some users habitually give items higher ratings. Item bias means that some items tend to receive higher ratings because of external factors such as advertising. The ratings in the preset rating matrix may therefore not accurately reflect users' preferences or the actual quality of the items.

Therefore, in the embodiments of the present invention, the result of sub-step 131 can be adjusted using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter. This makes the calculated predicted rating of the target user for the target item more accurate, so that it truly reflects the user's preferences and the quality of the item.

In one implementation, formula (6) can be used for two purposes. It computes the weighted sum of the target user's preset ratings of the second items with the correlation coefficients between the target item and the second items. It then adjusts the result using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, giving the target user's predicted rating for the target item.

In formula (6), the target item is item i; the second item is item j; the target user is user u; μ is the mean rating in the preset rating matrix; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the predicted rating of user u for item i; r_uj is the rating of user u for item j; s_ij is the scaled Poisson correlation coefficient of items i and j; ρ_ij is the Poisson correlation coefficient of items i and j; ε_ij is the Euclidean distance between the tag genome vector of item i and that of item j; κ_u is the set of items other than item i that user u has rated; p_ij is an adjustment coefficient; and the same-cluster set in the formula is the set of items that belong to the same cluster as item i when the first items are divided into k clusters.

Specifically, formula (6) builds on formula (5) by adding the mean rating μ of the preset rating matrix, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i. In addition, because different users are influenced by similar items to different degrees, a personalization parameter α_u is introduced for each user. As a result, the calculated predicted rating of the target user for the target item is more accurate and truly reflects the user's preferences and the quality of the item.
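Formula (6) itself is not reproduced in this text. The Python sketch below shows only the structure described above. It assumes that formula (6) has the form r̂_ui = μ + b_u + b_i + α_u · N_ui, where N_ui is the weighted combination from sub-step 131. The text does not state how N_ui normalizes the weights, whether the ratings are centered, or how p_ij and the same-cluster set enter the formula. Mean-centering and division by Σ|w_ij| are therefore assumptions, and `neighborhood_term` and `predict` are hypothetical names.

```python
from typing import Dict

def neighborhood_term(ratings_u: Dict[str, float],
                      weights_i: Dict[str, float], mu: float) -> float:
    """Weighted combination of user u's ratings r_uj (sub-step 131).

    ratings_u: each item j in kappa_u mapped to r_uj.
    weights_i: correlation coefficient between target item i and each item j.
    Centring on mu and dividing by the sum of |weights| are assumptions.
    """
    num = den = 0.0
    for j, r_uj in ratings_u.items():
        w = weights_i.get(j, 0.0)   # no correlation available: no contribution
        num += w * (r_uj - mu)
        den += abs(w)
    return num / den if den > 0 else 0.0

def predict(mu: float, b_u: float, b_i: float, alpha_u: float,
            ratings_u: Dict[str, float], weights_i: Dict[str, float]) -> float:
    """Assumed shape of formula (6): mu + b_u + b_i + alpha_u * neighbour term."""
    return mu + b_u + b_i + alpha_u * neighborhood_term(ratings_u, weights_i, mu)

# Toy values, all hypothetical
r_u = {"j1": 5.0, "j2": 3.0}
w_i = {"j1": 0.8, "j2": 0.2}
print(predict(3.5, 0.2, -0.1, 0.9, r_u, w_i))  # approximately 4.59
```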

To obtain a more objective and accurate predicted rating, the optimal values of the user bias adjustment parameter b_u, the item bias adjustment parameter b_i and the personalization parameter α_u in formula (6) can be determined before the formula is used to calculate predicted ratings. This yields an optimized formula (6).

Referring to Fig. 5, which is a flowchart of determining the optimal parameter values in an embodiment of the present invention, the steps for determining the optimal values of the user bias adjustment parameter b_u, the item bias adjustment parameter b_i and the personalization parameter α_u in the calculation formula of the second predicted rating are as follows:

Step 501: obtain a preset number of samples from the preset rating matrix.

Each sample consists of a user, an item, and the rating that the user gave the item.

In this step, a preset number of samples can be obtained from the preset rating matrix, and the loss function is then minimized over these samples.

The rating matrix contains multiple items and the ratings each item has received from users. A user, an item and that user's rating of the item can therefore be taken from it together as one sample. The number of samples can be determined according to actual conditions.
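A minimal Python sketch of step 501 is shown below. It assumes the rating matrix is a list of rows with `None` marking unrated cells. It collects every observed (user, item, rating) triple and draws the preset number of them at random. The representation and the name `draw_samples` are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[int, int, float]  # (user index u, item index i, rating r_ui)

def draw_samples(rating_matrix: Sequence[Sequence[Optional[float]]],
                 n_samples: int, seed: int = 0) -> List[Sample]:
    """Draw up to n_samples (user, item, rating) triples from the observed entries."""
    observed = [(u, i, r)
                for u, row in enumerate(rating_matrix)
                for i, r in enumerate(row)
                if r is not None]  # None marks an unrated cell (assumed encoding)
    rng = random.Random(seed)
    return rng.sample(observed, min(n_samples, len(observed)))

R = [[5.0, None, 3.0],
     [None, 4.0, None]]
print(draw_samples(R, 2))
```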

Step 502: for each sample, take the user in the sample as the target user and the item in the sample as the target item. Calculate the target user's predicted rating for the target item, and use it as the predicted rating corresponding to the sample.

In this step, each of the preset number of samples can be processed as follows:

Step 1: take the user in the sample as the target user, and the item in the sample as the target item.

Specifically, for each of the preset number of samples, the user in the sample is taken as the target user and the item in the sample as the target item, so that a predicted rating can be calculated from them.

Step 2: calculate the target user's predicted rating for the target item, and use it as the predicted rating corresponding to the sample.

Specifically, formula (6) is used to calculate the target user's predicted rating for the target item, and the result is taken as the predicted rating corresponding to the sample. The loss function can then be minimized based on the predicted ratings and the true ratings.

It should be noted that the rating originally contained in a sample is the user's true rating of the item, whereas the calculated value is the user's predicted rating of the item. Here the predicted rating is calculated only to optimize the parameters. The goal is to reduce the gap between true and predicted ratings as far as possible, so that the predicted ratings calculated by formula (6) come closer to the true ratings. In practical applications, when predicted ratings are calculated in order to recommend target items to a target user, the target items are items the target user has not yet used or purchased and has therefore not rated. Predicted ratings are calculated precisely for those items.

Step 503: based on each sample and its corresponding predicted rating, minimize the preset loss function shown in formula (7).

In the preset loss function of formula (7), κ is the set of the preset number of samples; item i is the item in a sample; user u is the user in a sample; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the calculated predicted rating of user u for item i; r_ui is the rating of user u for item i obtained from the preset rating matrix; and λ₂ is a preset regularization parameter.

In this step, the loss function of formula (7) can be minimized using the samples and their corresponding predicted ratings, yielding the minimized loss function.
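Formula (7) is not reproduced in the text. Its listed symbols (κ, r_ui, r̂_ui, b_u, b_i, α_u, λ₂) and the update rules (10) and (11) are, however, consistent with the usual regularized squared error. The Python sketch below assumes that form: the sum over (u, i) in κ of (r_ui - r̂_ui)² + λ₂(b_u² + b_i² + α_u²). It illustrates that assumption and is not the patent's exact formula.

```python
from typing import Dict, Iterable, Tuple

def regularized_loss(samples: Iterable[Tuple[str, str, float, float]],
                     b_user: Dict[str, float], b_item: Dict[str, float],
                     alpha: Dict[str, float], lam2: float) -> float:
    """Assumed form of loss (7).

    Each sample is (u, i, r_ui, r_hat_ui).  The loss adds the squared error
    (r_ui - r_hat_ui)^2 and an L2 penalty lam2 * (b_u^2 + b_i^2 + alpha_u^2).
    """
    total = 0.0
    for u, i, r_ui, r_hat in samples:
        total += (r_ui - r_hat) ** 2
        total += lam2 * (b_user[u] ** 2 + b_item[i] ** 2 + alpha[u] ** 2)
    return total

print(regularized_loss([("u1", "i1", 4.0, 3.5)],
                       {"u1": 0.1}, {"i1": -0.2}, {"u1": 0.5}, lam2=0.1))  # approximately 0.28
```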

In one implementation, referring to Fig. 6, which is a specific flowchart of step 503 in an embodiment of the present invention, step 503 includes:

Sub-step 5031: for each sample, set initial values of the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i, and initialize the step size γ and the iteration count.

Specifically, the initial values of α_u, b_u and b_i can be set to a random value in (0, 1), 0 and 0 respectively. The initial step size γ can be 0.04, and the iteration count can start at 0.

Sub-step 5032: set the user set U and the item set I to empty sets.

Specifically, U and I are initialized to empty sets so that, as the subsequent sub-steps are executed, the user and the item of each processed sample are added to U and I respectively.

Sub-step 5033: randomly select one sample from the set κ of the preset number of samples.

The sample includes user u, item i, the rating r_ui of user u for item i obtained from the preset rating matrix, and the calculated predicted rating r̂_ui of user u for item i.

Specifically, the sample used in sub-steps 5034 to 5037 is selected at random from κ. It includes user u, item i, the true rating r_ui of user u for item i obtained from the preset rating matrix, and the predicted rating r̂_ui of user u for item i calculated with formula (6). The loss function value can then be calculated from the predicted and true ratings.

Sub-step 5034: determine whether the user set U contains user u of the sample and whether the item set I contains item i of the sample. If both are contained, return to sub-step 5033; otherwise, execute sub-step 5035.

Specifically, this checks whether the sample selected in sub-step 5033 has already been processed, because the user and item of every processed sample have already been added to U and I. A sample that has been processed is not processed again in the current iteration. It is skipped, and another sample is selected at random by returning to sub-step 5033.

The reason is that κ contains a large number of samples, so a traditional stochastic gradient descent algorithm would have excessive time complexity. To minimize the loss function within a reasonable time, the embodiments of the present invention use an improved stochastic gradient descent algorithm. In each iteration, each sample in κ is processed only once, and a selected sample that has already been processed in the current iteration is skipped.

Using the improved stochastic gradient descent algorithm therefore significantly reduces the time complexity of minimizing the loss function. The minimization can be completed within a reasonable time, and overfitting is also effectively prevented.
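The skip rule of sub-steps 5032 to 5034 can be sketched in Python as a generator over one iteration. This sketch assumes that "randomly select a sample until all samples have been traversed" is implemented by visiting the samples in a shuffled order. The name `one_pass` and the tuple layout are hypothetical.

```python
import random
from typing import Iterator, Sequence, Tuple

def one_pass(samples: Sequence[Tuple], rng: random.Random) -> Iterator[Tuple]:
    """Yield the samples that are processed in one iteration (sub-steps 5032-5034, 5037).

    Each sample is a tuple whose first two fields are the user u and the item i.
    A shuffled visiting order is an assumed way of drawing every sample at
    random exactly once per iteration.
    """
    U, I = set(), set()                 # sub-step 5032: empty user and item sets
    order = list(range(len(samples)))
    rng.shuffle(order)
    for idx in order:                   # sub-step 5033: next randomly chosen sample
        u, i = samples[idx][0], samples[idx][1]
        if u in U and i in I:           # sub-step 5034: treated as already processed
            continue
        U.add(u)                        # sub-step 5035 also records u and i
        I.add(i)
        yield samples[idx]              # the caller applies the parameter updates here

samples = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0), ("u2", "i2", 2.0)]
processed = list(one_pass(samples, random.Random(0)))
print(len(processed), "of", len(samples), "samples processed")
```

Under this rule, a sample is skipped whenever its user and its item were both seen earlier in the same pass, even if that exact (u, i) pair was not itself processed. In the four-sample example, two or three samples are processed, depending on the visiting order.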

Sub-step 5035: based on the difference e_ui between the rating r_ui and the predicted rating r̂_ui in the sample, update the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i. Then add user u of the sample to the user set U, and add item i of the sample to the item set I.

In this step, a sample is processed when the user set U does not contain its user u or the item set I does not contain its item i.

Specifically, first calculate the difference e_ui between the rating r_ui in the sample and the predicted rating r̂_ui, as shown in formula (8):

e_ui=r_ui-r̂_ui (8)

In formula (8), r̂_ui is the calculated predicted rating of user u for item i; r_ui is the rating of user u for item i obtained from the preset rating matrix; and e_ui is the difference between the rating r_ui and the predicted rating r̂_ui.

Then α_u, b_u and b_i are updated according to formulas (9) to (11) respectively.

In formula (9), γ is the step size; e_ui is the difference between the rating r_ui and the predicted rating r̂_ui; r_uj is the rating of user u for item j; s_ij is the scaled Poisson correlation coefficient of items i and j; ρ_ij is the Poisson correlation coefficient of items i and j; ε_ij is the Euclidean distance between the tag genome vector of item i and that of item j; α_u is the personalization parameter for user u; λ₂ is the preset regularization parameter; and κ_u is the set of items other than item i that user u has rated.

In a specific implementation, the update formula (9) for the personalization parameter α_u is obtained by differentiating formula (6) with respect to α_u. For the detailed derivation, refer to the prior art; it is not repeated here.

b_u←b_u+γ·(e_ui-λ₂·b_u) (10)

In formula (10), b_u is the user bias adjustment parameter for user u; γ is the step size; e_ui is the difference between the rating r_ui and the predicted rating r̂_ui; and λ₂ is the preset regularization parameter.

In a specific implementation, the update formula (10) for the user bias adjustment parameter b_u is obtained by differentiating formula (6) with respect to b_u. For the detailed derivation, refer to the prior art; it is not repeated here.

b_i←b_i+γ·(e_ui-λ₂·b_i) (11)

In formula (11), b_i is the item bias adjustment parameter for item i; γ is the step size; e_ui is the difference between the rating r_ui and the predicted rating r̂_ui; and λ₂ is the preset regularization parameter.

In a specific implementation, the update formula (11) for the item bias adjustment parameter b_i is obtained by differentiating formula (6) with respect to b_i. For the detailed derivation, refer to the prior art; it is not repeated here. The leftward arrows in formulas (9) to (11) mean that the parameter on the left of the arrow is replaced by the value of the expression on the right.
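The per-sample update of sub-step 5035 can be sketched in Python as shown below. The b_u and b_i rules follow formulas (10) and (11). e_ui is computed as r_ui - r̂_ui, which is the sign under which (10) and (11) push the parameters toward reducing the error. Formula (9) is not reproduced, so the α_u rule is an assumption based on the description above: a gradient step obtained by differentiating formula (6) with respect to α_u. The hypothetical argument `nbr_term` stands in for that derivative.

```python
from typing import Tuple

def sgd_step(r_ui: float, r_hat_ui: float, nbr_term: float,
             b_u: float, b_i: float, alpha_u: float,
             gamma: float, lam2: float) -> Tuple[float, float, float]:
    """Apply one update for a single sample and return the new (b_u, b_i, alpha_u).

    nbr_term stands for the derivative of r_hat_ui with respect to alpha_u.
    The alpha_u rule is an assumed form of formula (9).
    All right-hand sides use the values from before the update.
    """
    e_ui = r_ui - r_hat_ui                                    # error term (formula (8))
    new_b_u = b_u + gamma * (e_ui - lam2 * b_u)               # formula (10)
    new_b_i = b_i + gamma * (e_ui - lam2 * b_i)               # formula (11)
    new_alpha_u = alpha_u + gamma * (e_ui * nbr_term - lam2 * alpha_u)  # assumed (9)
    return new_b_u, new_b_i, new_alpha_u

print(sgd_step(4.0, 3.5, 1.0, 0.0, 0.0, 0.5, gamma=0.04, lam2=0.1))
```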

Sub-step 5036: reduce the step size γ and return to sub-step 5033.

Sub-step 5037: when all samples in κ have been traversed, increment the iteration count by one and execute sub-step 5038.

Specifically, once all samples in κ have been traversed, the current iteration is complete and the iteration count can be incremented by one. A smaller step size is used in the next iteration to ensure convergence and prevent oscillation.

Sub-step 5038: determine whether the iteration count exceeds the preset iteration threshold. If it does, execute sub-step 5039; if not, return to sub-step 5032.

Specifically, if sub-step 5038 finds that the iteration count exceeds the preset threshold, the minimization of the preset loss function is complete. Otherwise, the process returns to sub-step 5032 and performs the next iteration, repeating until the iteration count exceeds the threshold.

Sub-step 5039: complete the minimization of the preset loss function.

Specifically, once the iteration count exceeds the preset threshold, the minimization of the preset loss function has been completed.
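The iteration control of sub-steps 5031 to 5039 is sketched in Python below. The initial values follow sub-step 5031: α_u random, b_u = b_i = 0, γ = 0.04, and an iteration count of 0. `run_pass` is a hypothetical callback standing in for one traversal of κ. The text does not say by how much γ is reduced. This sketch therefore assumes a geometric decay factor applied once per pass, which matches the "smaller step in the next iteration" described for sub-step 5037. A per-sample reduction as in sub-step 5036 could be done inside `run_pass` instead.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple

def minimize(users: Iterable[Hashable], items: Iterable[Hashable],
             run_pass: Callable[[Dict, Dict, Dict, float], None],
             max_iters: int, gamma: float = 0.04, decay: float = 0.9,
             seed: int = 0) -> Tuple[Dict, Dict, Dict, int]:
    """Outer loop of sub-steps 5031-5039.

    run_pass(alpha, b_user, b_item, gamma) must perform one traversal of the
    sample set, updating the three dictionaries in place.  The geometric decay
    of gamma between passes is a hypothetical choice of step reduction.
    """
    rng = random.Random(seed)
    # Sub-step 5031: alpha_u random in [0, 1), b_u = b_i = 0, iteration count = 0
    alpha = {u: rng.random() for u in users}
    b_user = {u: 0.0 for u in alpha}
    b_item = {i: 0.0 for i in items}
    iteration = 0
    while True:
        run_pass(alpha, b_user, b_item, gamma)   # sub-steps 5032-5036
        iteration += 1                           # sub-step 5037
        if iteration > max_iters:                # sub-step 5038
            break                                # sub-step 5039: minimization finished
        gamma *= decay                           # smaller step for the next pass
    return alpha, b_user, b_item, iteration

steps = []
alpha, b_user, b_item, n = minimize(["u1", "u2"], ["i1"],
                                    lambda a, bu, bi, g: steps.append(g),
                                    max_iters=3)
print(n, steps)
```

Because the loop stops only when the count exceeds the threshold, a threshold of 3 results in four passes.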

Step 504: from the minimized loss function, determine the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter.

In this step, the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter are determined from the minimized loss function. These values are substituted into the calculation formula of the second predicted rating shown in formula (6), giving the optimized calculation formula of the second predicted rating.

As it can be seen that in embodiments of the present invention, second can be obtained by carrying out minimum processing to preset loss function Predict that the user in the calculation formula of scoring is biased to adjusting parameter b_u, article be biased to adjusting parameter b_iWith personalizing parameters α_uMost The figure of merit, and then the calculation formula of the second prediction scoring after being optimized, so that according to the meter of the second prediction scoring after optimization It is more accurate and objective to calculate the calculated prediction scoring of formula.

An embodiment of the present invention further provides a clustering-based collaborative filtering recommendation device. Referring to Fig. 7, which is a schematic structural diagram of the clustering-based collaborative filtering recommendation device according to an embodiment of the present invention, the device includes:

A first obtaining module 701, configured to obtain the tag genome vectors of the first items from a preset tag genome information matrix, where a tag genome vector describes the intrinsic attributes of a first item;

A division module 702, configured to divide the first items into a preset first number of clusters based on their tag genome vectors, using a preset clustering algorithm;

A first calculation module 703, configured to perform the following for each target item. A target item is a first item that the target user has not rated, and a second item is a first item other than all the target items. When the target item and a second item belong to the same cluster, the module calculates their correlation coefficient based on a preset type of distance between the tag genome vector of the target item and that of the second item. When the target item and the second item belong to different clusters, it calculates their correlation coefficient based on their Poisson correlation coefficient. It then computes a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item;

A recommendation module 704, configured to recommend to the target user the target items whose predicted ratings meet a preset condition.
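The cluster-dependent choice made by the first calculation module 703 can be sketched in Python as follows. The text does not reproduce how the Euclidean distance ε_ij, together with the adjustment coefficient p_ij, is turned into a correlation. The mapping 1 / (1 + ε_ij) is therefore purely hypothetical, as are the function and argument names. The cross-cluster value is assumed to be precomputed.

```python
import math
from typing import Dict, Sequence, Tuple

def item_correlation(i: str, j: str,
                     cluster_of: Dict[str, int],
                     tag_genome: Dict[str, Sequence[float]],
                     scaled_corr: Dict[Tuple[str, str], float]) -> float:
    """Correlation between target item i and second item j, chosen by cluster membership."""
    if cluster_of[i] == cluster_of[j]:
        # Same cluster: use the Euclidean distance epsilon_ij between tag genome vectors
        eps_ij = math.dist(tag_genome[i], tag_genome[j])
        return 1.0 / (1.0 + eps_ij)   # hypothetical distance-to-similarity mapping
    # Different clusters: use the precomputed rating-based (scaled) correlation
    return scaled_corr[(i, j)]

clusters = {"i": 0, "j": 0, "k": 1}
genome = {"i": [0.0, 0.0], "j": [3.0, 4.0], "k": [1.0, 1.0]}
corr = {("i", "k"): 0.35}
print(item_correlation("i", "j", clusters, genome, corr))  # same cluster: 1 / (1 + 5)
print(item_correlation("i", "k", clusters, genome, corr))  # different clusters: 0.35
```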

Optionally, the first calculation module 703 is specifically configured to:

calculate the Poisson correlation coefficient of the target item and the second item;

generate the scaled Poisson correlation coefficient of the target item and the second item based on the number of their co-users and the calculated Poisson correlation coefficient, and use the scaled Poisson correlation coefficient as the correlation coefficient of the target item and the second item.

Optionally, the first calculation module 703 is specifically configured to:

generate, according to the following formula, the scaled Poisson correlation coefficient of the target item and the second item based on the number of their co-users and the calculated Poisson correlation coefficient.

In the formula, the target item is item i; the second item is item j; s_ij is the scaled Poisson correlation coefficient of items i and j; ρ_ij is the Poisson correlation coefficient of items i and j; n_ij is the number of co-users of items i and j; and λ₁ is a parameter for the number of co-users.

Optionally, the preset type of distance includes the Euclidean distance;

The first calculation module 703 is specifically configured to:

compute, according to the following formula, a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item.

In the formula, the target item is item i; the second item is item j; the target user is user u; r̂_ui is the predicted rating of user u for item i; r_uj is the rating of user u for item j; s_ij is the scaled Poisson correlation coefficient of items i and j; ρ_ij is the Poisson correlation coefficient of items i and j; ε_ij is the Euclidean distance between the tag genome vector of item i and that of item j; κ_u is the set of items other than item i that user u has rated; p_ij is an adjustment coefficient; and the same-cluster set in the formula is the set of items that belong to the same cluster as item i when the first items are divided into k clusters.

Optionally, the first calculation module 703 is specifically configured to:

compute a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items;

adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, to obtain the target user's predicted rating for the target item.

Optionally, the first calculation module 703 is specifically configured to:

use the following formula to compute a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, and to adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, to obtain the target user's predicted rating for the target item.

In the formula, the target item is item i; the second item is item j; the target user is user u; μ is the mean rating in the preset rating matrix; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the predicted rating of user u for item i; r_uj is the rating of user u for item j; s_ij is the scaled Poisson correlation coefficient of items i and j; ρ_ij is the Poisson correlation coefficient of items i and j; ε_ij is the Euclidean distance between the tag genome vector of item i and that of item j; κ_u is the set of items other than item i that user u has rated; p_ij is an adjustment coefficient; and the same-cluster set in the formula is the set of items that belong to the same cluster as item i when the first items are divided into k clusters.

Optionally, the device further includes:

A second obtaining unit, configured to obtain a preset number of samples from the preset rating matrix, each sample including a user, an item and the rating the user gave the item;

A second calculation module, configured to perform the following for each sample: take the user in the sample as the target user and the item in the sample as the target item; calculate the target user's predicted rating for the target item using the calculation formula of the second predicted rating; and use it as the predicted rating corresponding to the sample;

A minimization module, configured to minimize the preset loss function based on each sample and its corresponding predicted rating, obtaining the minimized loss function, where the preset loss function is given by the following formula.

In the preset loss function, κ is the set of the preset number of samples; item i is the item in each sample; user u is the user in each sample; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the calculated predicted rating of user u for item i; r_ui is the rating of user u for item i obtained from the preset rating matrix; and λ₂ is a preset regularization parameter;

A determination module, configured to determine, from the minimized loss function, the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter.

Optionally, the minimization module is specifically configured to:

set, for each sample, initial values of the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i, and initialize the step size γ and the iteration count;

set the user set U and the item set I to empty sets;

select one sample from the set κ of the preset number of samples, the sample including user u, item i, the rating r_ui of user u for item i obtained from the preset rating matrix, and the calculated predicted rating r̂_ui of user u for item i;

determine whether the user set U contains user u of the sample and whether the item set I contains item i of the sample;

if so, return to the step of selecting one sample from κ;

if not, update α_u, b_u and b_i based on the difference e_ui between the rating r_ui and the predicted rating r̂_ui in the sample, add user u of the sample to U, and add item i of the sample to I;

reduce the step size γ, and return to the step of selecting one sample from κ;

when all samples in κ have been traversed, increment the iteration count by one;

determine whether the iteration count exceeds the preset iteration threshold; if so, complete the minimization of the preset loss function; if not, return to the step of setting U and I to empty sets.

As it can be seen that in the collaborative filtering recommending device based on cluster that the embodiment of the present invention proposes, it can be according to label Genome vector after carrying out classification processing to the first article, for belonging to the first article of same cluster class, is based on label base Relative coefficient is calculated because of the Euclidean distance value of group vector.Since label genome vector is used to describe the intrinsic category of article Property, do not change with the subjective desire of user, have objectivity so that calculated relative coefficient also have it is objective Property, it predicts that the objectivity of scoring is also stronger obtained from, avoids the occurrence of the subjective scoring due to user and influence prediction scoring Objectivity the problem of.

An embodiment of the present invention further provides an electronic device. Referring to Fig. 8, which is a schematic structural diagram of the electronic device provided by an embodiment of the present invention, the device includes a processor 81, a communication interface 82, a memory 83 and a communication bus 84. The processor 81, the communication interface 82 and the memory 83 communicate with one another through the communication bus 84.

The memory 83 is configured to store a computer program;

The processor 81 is configured to implement the following steps when executing the program stored in the memory 83:

The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, the bus is represented by only one thick line in the figure, but this does not mean that there is only one bus or only one type of bus.

Communication interface is for the communication between above-mentioned electronic equipment and other equipment.

The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP). It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The method provided by the embodiments of the present invention can be applied to an electronic device. Specifically, the electronic device may be a desktop computer, a portable computer, a smart mobile terminal, a server, etc. It is not limited to these: any electronic device capable of implementing the present invention falls within the scope of protection of the present invention.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the above clustering-based collaborative filtering recommendation method.

An embodiment of the present invention further provides a computer program product containing instructions. When run on a computer, it causes the computer to execute the steps of the above clustering-based collaborative filtering recommendation method.

An embodiment of the present invention further provides a computer program. When run on a computer, it causes the computer to execute the steps of the above clustering-based collaborative filtering recommendation method.

The device, electronic device, storage medium, computer program product containing instructions and computer program embodiments are substantially similar to the method embodiments, so they are described relatively briefly. For relevant details, refer to the description of the method embodiments.

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion. A process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.

The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device, electronic device, storage medium, computer program product containing instructions and computer program embodiments are substantially similar to the method embodiments, so they are described relatively briefly. For relevant details, refer to the description of the method embodiments.

The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A clustering-based collaborative filtering recommendation method, characterized by comprising:

obtaining, from a preset tag genome information matrix, the tag genome vector of each first item, the tag genome vector being used to describe the inherent attributes of the first item;

dividing, using a preset clustering algorithm and based on the tag genome vectors of the first items, the first items into a preset first number of clusters;
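Claim 1 does not name the "preset clustering algorithm". The following is a minimal sketch, assuming k-means (Lloyd's algorithm) with Euclidean distance, of how the first items could be divided into k clusters by their tag genome vectors. The functions `euclidean` and `kmeans` and the dict-of-lists data layout are hypothetical illustrations, not the patent's own implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tag genome vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20, seed=0):
    """Assign each item to one of k clusters with Lloyd's k-means.

    vectors: dict mapping item id -> list[float] (tag genome vector).
    Returns a dict mapping item id -> cluster index in range(k).
    """
    rng = random.Random(seed)
    items = sorted(vectors)
    # Initialize the centroids from k distinct items chosen at random.
    centroids = [list(vectors[i]) for i in rng.sample(items, k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each item goes to its nearest centroid.
        new_assignment = {
            i: min(range(k), key=lambda c: euclidean(vectors[i], centroids[c]))
            for i in items
        }
        if new_assignment == assignment:
            break  # no item changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in items if assignment[i] == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

For two well-separated groups of vectors, `kmeans(vectors, k=2)` gives every item in a group the same label. A production system would more likely rely on a library implementation such as scikit-learn's `KMeans`.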

for each target item: when the target item and a second item belong to the same cluster, calculating the correlation coefficient between the target item and the second item based on the distance of the preset type between the tag genome vector of the target item and the tag genome vector of the second item, where a target item is a first item that the target user has not rated, and a second item is any first item other than the target items; when the target item and the second item belong to different clusters, calculating the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient between them; and computing a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item;
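The per-pair rule of claim 1 can be sketched as below. The claim says only that the same-cluster coefficient is computed "based on" the distance of the preset type; mapping a Euclidean distance d to 1 / (1 + d) is an assumption, chosen so that closer items receive larger weights. The rating-based correlation used for pairs in different clusters is passed in as a callable. `item_correlation` and its parameters are hypothetical names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tag genome vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def item_correlation(i, j, cluster_of, vectors, rating_corr):
    """Correlation between target item i and second item j.

    cluster_of:  dict item -> cluster index (for example, from k-means).
    vectors:     dict item -> tag genome vector.
    rating_corr: callable (i, j) -> rating-based correlation of the two items.
    """
    if cluster_of[i] == cluster_of[j]:
        # Same cluster: similarity from the tag genome distance.
        # The 1 / (1 + d) mapping is an assumed choice, not taken from the claim.
        return 1.0 / (1.0 + euclidean(vectors[i], vectors[j]))
    # Different clusters: use the rating-based correlation instead.
    return rating_corr(i, j)
```

The design lets content-based similarity dominate inside a cluster, where the tag genome vectors are directly comparable, while pairs across clusters fall back on co-rating behavior.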

2. The method according to claim 1, wherein the step of calculating the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient between them comprises:

generating a scaled Pearson correlation coefficient of the target item and the second item based on the number of common users of the two items and the calculated Pearson correlation coefficient, and using the scaled Pearson correlation coefficient as the correlation coefficient between the target item and the second item.

3. The method according to claim 2, wherein the step of generating the scaled Pearson correlation coefficient of the target item and the second item based on the number of their common users and the Pearson correlation coefficient comprises:

generating, according to the following formula, the scaled Pearson correlation coefficient of the target item and the second item from the number of their common users and the calculated Pearson correlation coefficient:

where the target item is item i; the second item is item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; n_ij is the number of common users of item i and item j; and λ₁ is a parameter for the number of common users.
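The formula of claim 3 is not reproduced in this text. The following is a minimal sketch, assuming the common shrinkage form s_ij = n_ij / (n_ij + λ₁) · ρ_ij, in which ρ_ij is the Pearson correlation of the two items' ratings over their n_ij common users and λ₁ controls how strongly pairs with few co-raters are pulled toward zero. The function names and the default value of `lambda1` are illustrative assumptions.

```python
import math


def pearson(ratings_i, ratings_j):
    """Pearson correlation of two items over their common users.

    ratings_i, ratings_j: dict user -> rating. Returns (rho, n_common).
    """
    common = list(ratings_i.keys() & ratings_j.keys())
    n = len(common)
    if n < 2:
        return 0.0, n  # undefined with fewer than two co-raters
    xs = [ratings_i[u] for u in common]
    ys = [ratings_j[u] for u in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0, n  # constant ratings: correlation undefined
    return cov / (sx * sy), n


def scaled_pearson(ratings_i, ratings_j, lambda1=100.0):
    """Shrink rho toward 0 when few users rated both items.

    Assumed form: s_ij = n_ij / (n_ij + lambda1) * rho_ij (lambda1 > 0).
    """
    rho, n = pearson(ratings_i, ratings_j)
    return n / (n + lambda1) * rho
```

With λ₁ = 1 and three common users, a perfect correlation of 1 is scaled to 3 / 4 = 0.75; as n_ij grows, s_ij approaches ρ_ij.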

4. The method according to claim 3, wherein the distance of the preset type includes the Euclidean distance;

the step of computing the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item, comprises:

computing, according to the following formula, the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item:

where the target item is item i; the second item is item j; the target user is user u; the formula yields the predicted rating of user u for item i; r_uj is user u's rating of item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the tag genome vector of item i and the tag genome vector of item j; κ_u is the set of items, other than item i, that user u has rated; p_ij is an adjustment factor; and the remaining set symbol denotes the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
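The weighted sum of claim 4 can be sketched as a neighborhood prediction over κ_u, the items user u has rated other than item i. Because the claim's formula is not reproduced here, two choices are assumptions: the weights are supplied by a callable (for example, a per-pair rule that uses the distance for same-cluster items and the scaled correlation otherwise), and the sum is normalized by the total absolute weight so that the result stays on the rating scale. The adjustment factor p_ij is omitted because its role cannot be recovered from this text; `predict_weighted` is a hypothetical name.

```python
def predict_weighted(user_ratings, target, weight):
    """Neighborhood prediction of a user's rating for `target`.

    user_ratings: dict item -> rating given by the user; the target itself
                  is skipped, so the remaining items play the role of kappa_u.
    weight:       callable (target, j) -> correlation between target and j.
    Normalizing by the sum of absolute weights is an assumption; the claim
    text only specifies a weighted sum.
    """
    num = 0.0
    den = 0.0
    for j, r_uj in user_ratings.items():
        if j == target:
            continue
        w = weight(target, j)
        num += w * r_uj
        den += abs(w)
    return num / den if den else None  # None: no usable neighbors
```

For ratings {a: 4, b: 2} with weights 1.0 and 0.5, the sketch returns (4 + 1) / 1.5 = 10/3.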

5. The method according to claim 4, wherein the step of computing the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item, comprises:

computing the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items;

adjusting the result of the weighted sum using a personalization parameter, a user bias adjustment parameter, and an item bias adjustment parameter, to obtain the target user's predicted rating for the target item.

6. The method according to claim 5, wherein the step of computing the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, and adjusting the result of the weighted sum using the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter to obtain the target user's predicted rating for the target item, comprises:

using the following formula, computing the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, and adjusting the result of the weighted sum using the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter, to obtain the target user's predicted rating for the target item:

where the target item is item i; the second item is item j; the target user is user u; μ is the mean rating in the preset rating matrix; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; the formula yields the predicted rating of user u for item i; r_uj is user u's rating of item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the tag genome vector of item i and the tag genome vector of item j; κ_u is the set of items, other than item i, that user u has rated; p_ij is an adjustment factor; and the remaining set symbol denotes the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
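Claims 5 and 6 adjust the weighted sum with the mean rating μ, a user bias b_u, an item bias b_i, and a personalization parameter α_u. The following is a minimal sketch, assuming the composition r̂_ui = μ + b_u + b_i + α_u · N_ui, where N_ui is a correlation-weighted average of user u's ratings centered on μ. Both the composition and the centering are assumptions, since the claim's formula is not reproduced here, and the function names are hypothetical.

```python
def neighborhood_term(i, ratings_u, weight, mu):
    """Correlation-weighted average of user u's mean-centered ratings.

    Centering on the global mean mu is an assumption, made so that the
    term is on the same scale as the bias parameters.
    """
    num = den = 0.0
    for j, r_uj in ratings_u.items():
        if j == i:
            continue  # kappa_u excludes the target item itself
        w = weight(i, j)
        num += w * (r_uj - mu)
        den += abs(w)
    return num / den if den else 0.0


def predict_biased(u, i, ratings_u, weight, mu, b_user, b_item, alpha):
    """r_hat_ui = mu + b_u + b_i + alpha_u * N_ui (assumed composition).

    b_user, b_item, alpha: dicts of learned parameters; missing keys count as 0.
    """
    n_ui = neighborhood_term(i, ratings_u, weight, mu)
    return mu + b_user.get(u, 0.0) + b_item.get(i, 0.0) + alpha.get(u, 0.0) * n_ui
```

Keeping the neighborhood term separate from the biases means α_u scales only the collaborative part, so a user whose ratings are poorly explained by neighbors can learn a small α_u.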

7. The method according to claim 6, wherein, before the step of computing, using the second predicted-rating formula, the weighted sum of the target user's preset ratings of the second items and the correlation coefficients between the target item and the second items, the method further comprises:

obtaining a preset number of sample sets from the preset rating matrix, each sample set including a user, an item, and the rating given to the item by the user;

for each sample set, taking the user in the sample set as the target user and the item in the sample set as the target item, calculating the target user's predicted rating for the target item, and taking the calculated predicted rating as the predicted rating corresponding to the sample set;

minimizing a preset loss function based on each sample set and the predicted rating corresponding to each sample set, to obtain a minimized loss function, the preset loss function being shown in the following formula:

in the preset loss function, κ is the preset number of sample sets; item i is the item in each sample set; user u is the user in each sample set; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; the formula also uses the calculated predicted rating of user u for item i; r_ui is user u's rating of item i obtained from the preset rating matrix; and λ₂ is a preset regularization parameter;
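The loss function of claim 7 is likewise not reproduced in this text. The following is a minimal sketch, assuming the usual regularized squared error, the sum over κ of (r_ui − r̂_ui)² + λ₂(b_u² + b_i² + α_u²). Penalizing exactly these three parameter groups is inferred from the symbols the claim defines and is an assumption; `regularized_loss` is a hypothetical name.

```python
def regularized_loss(samples, predict, b_user, b_item, alpha, lambda2):
    """Regularized squared error over the sample sets kappa.

    samples: iterable of (u, i, r_ui) taken from the rating matrix.
    predict: callable (u, i) -> predicted rating r_hat_ui.
    Assumed form: sum of (r_ui - r_hat_ui)^2
                  + lambda2 * (b_u^2 + b_i^2 + alpha_u^2).
    """
    total = 0.0
    for u, i, r_ui in samples:
        e_ui = r_ui - predict(u, i)  # prediction error for this sample
        reg = b_user.get(u, 0.0) ** 2 + b_item.get(i, 0.0) ** 2 + alpha.get(u, 0.0) ** 2
        total += e_ui ** 2 + lambda2 * reg
    return total
```

A larger λ₂ keeps the bias and personalization parameters closer to zero, which guards against overfitting users and items with few ratings.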

determining, from the minimized loss function, the optimal values of the personalization parameter, the user bias adjustment parameter, and the item bias adjustment parameter.

8. The method according to claim 7, wherein the step of minimizing the preset loss function based on each sample set and the predicted rating corresponding to each sample set, to obtain the minimized loss function, comprises:

for each sample set, setting initial values of the personalization parameter α_u, the user bias adjustment parameter b_u, and the item bias adjustment parameter b_i, and initializing a step size γ and an iteration count;

setting a user set U and an item set I to empty sets;

obtaining one sample set from the preset number of sample sets κ, the sample set including user u, item i, user u's rating r_ui of item i obtained from the preset rating matrix, and the calculated predicted rating of user u for item i;

determining whether the user set U contains the user u of the sample set and whether the item set I contains the item i of the sample set;

if not, updating the personalization parameter α_u, the user bias adjustment parameter b_u, and the item bias adjustment parameter b_i according to the difference e_ui between the rating r_ui in the sample set and the predicted rating, adding the user u of the sample set to the user set U, and adding the item i of the sample set to the item set I;

reducing the step size γ, and returning to the step of obtaining one sample set from the preset number of sample sets κ;

determining whether the iteration count exceeds a preset iteration threshold; if so, completing the minimization of the preset loss function; if not, returning to the step of setting the user set U and the item set I to empty sets.
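The iteration of claim 8 resembles stochastic gradient descent with a shrinking step size. The following is a minimal sketch under stated assumptions: the "if not" test is read as "the user or the item has not yet been seen in this pass"; γ is multiplied by a decay factor after every sample; and the prediction is recomputed from the current parameters rather than read from the sample set. The neighborhood term n_ui of each sample is taken as precomputed, so that r̂_ui = μ + b_u + b_i + α_u · n_ui. All names and default values are hypothetical.

```python
def fit_sgd(samples, mu, lambda2=0.02, gamma=0.01, decay=0.999, max_iters=20):
    """SGD on the regularized loss, following the step order of claim 8.

    samples: list of (u, i, r_ui, n_ui) tuples, where n_ui is the
             precomputed neighborhood term for user u and item i.
    Returns dicts (b_user, b_item, alpha) of learned parameters.
    """
    b_user, b_item, alpha = {}, {}, {}  # initial values: 0 for every key
    for _ in range(max_iters):
        seen_users, seen_items = set(), set()  # U and I reset each pass
        for u, i, r_ui, n_ui in samples:
            if u not in seen_users or i not in seen_items:
                bu = b_user.get(u, 0.0)
                bi = b_item.get(i, 0.0)
                au = alpha.get(u, 0.0)
                e_ui = r_ui - (mu + bu + bi + au * n_ui)
                # Gradient steps on e_ui^2 + lambda2 * (b_u^2 + b_i^2 + alpha_u^2);
                # the constant factor 2 is absorbed into gamma.
                b_user[u] = bu + gamma * (e_ui - lambda2 * bu)
                b_item[i] = bi + gamma * (e_ui - lambda2 * bi)
                alpha[u] = au + gamma * (e_ui * n_ui - lambda2 * au)
                seen_users.add(u)
                seen_items.add(i)
            gamma *= decay  # reduce the step size before the next sample
    return b_user, b_item, alpha
```

Because U and I are reset at the start of each pass, a sample is skipped within a pass only when both its user and its item have already been updated in that pass.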

9. A clustering-based collaborative filtering recommendation device, characterized by comprising:

a first obtaining module, configured to obtain, from a preset tag genome information matrix, the tag genome vector of each first item, the tag genome vector being used to describe the inherent attributes of the first item;

a division module, configured to divide, using a preset clustering algorithm and based on the tag genome vectors of the first items, the first items into a preset first number of clusters;

a first computing module, configured to, for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on the distance of the preset type between the tag genome vector of the target item and the tag genome vector of the second item, where a target item is a first item that the target user has not rated, and a second item is any first item other than the target items; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient between them; and compute a weighted sum of the target user's preset ratings of the second items, weighted by the correlation coefficients between the target item and the second items, to obtain the target user's predicted rating for the target item;

10. The device according to claim 9, wherein

the first computing module is specifically configured to