CN109063120A - A cluster-based collaborative filtering recommendation method and device - Google Patents
- Publication number
- CN109063120A (CN201810863191.2A)
- Authority
- CN
- China
- Prior art keywords
- article
- user
- target item
- preset
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a cluster-based collaborative filtering recommendation method and device. The method comprises: obtaining label genome vectors of first articles; dividing the first articles into a first quantity of clusters based on their label genome vectors; for each target item: when the target item and a second article belong to the same cluster, calculating a correlation coefficient based on a distance of a preset type between the target item and the second article; when the target item and the second article belong to different clusters, calculating the correlation coefficient based on the Pearson correlation coefficient of the target item and the second article; computing a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item; and recommending to the target user the target items whose predicted scores meet a preset condition. Embodiments of the present invention can improve the objectivity of recommendation scores.
Description
Technical field
The present invention relates to the technical field of recommendation algorithms, and in particular to a cluster-based collaborative filtering recommendation method and device.
Background
With the rapid development of Internet technology, the Internet provides users with massive amounts of information of all kinds, enriching and facilitating people's work and life. At the same time, however, finding the information a user is interested in within this mass of information has become time-consuming and laborious. Recommendation algorithms were developed for this purpose. A recommendation algorithm does not require the user to state a specific demand; instead, it analyzes the user's interests and demands from the user's historical behavior and recommends articles that satisfy them.
Specifically, the collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithms. Its processing steps are as follows:
In the first step, historical records of interactions between users and articles are obtained from a preset public data set. Public data sets can usually be obtained from websites dedicated to recommender-system research. The historical records of user-article interactions include a user-article rating matrix, a user-article consumption matrix and the like, where the user-article rating matrix contains multiple articles and the scores each article has received from users. Note that the scores an article receives may come from different users. Below, the user-article rating matrix is referred to simply as the rating matrix and is used as the running example.
For ease of description, the user to whom articles are recommended is called the target user, and an article recommended to the target user is called a target item; the articles in the rating matrix are called first articles. It should be understood that an article recommended to the target user should be one the target user has not yet used, that is, one for which the target user has not provided a score. A target item is therefore a first article that the target user has not scored.
In the second step, for each target item: first, the correlation coefficient between the target item and each second article is calculated, where the second articles are the first articles other than all target items; second, a weighted sum of the target user's scores for the second articles in the rating matrix is computed, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item.
Specifically, the Pearson correlation coefficient can be used as the correlation coefficient between articles. It is calculated as shown in formula (1):
$$\rho_{ij} = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}} \tag{1}$$
In formula (1), the target item is article i and the second article is article j; ρ_ij is the Pearson correlation coefficient of article i and article j; U_ij = U_i ∩ U_j is the set of common users who have scored both article i and article j; u is a user in the common user set U_ij; r_ui is the score article i received from user u; r̄_i is the mean of the scores article i received; r_uj is the score article j received from user u; r̄_j is the mean of the scores article j received.
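The Pearson correlation over the common users of two articles can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `pearson`, the dictionary layout (user id mapped to score), and the choice to return 0.0 when the coefficient is undefined are all assumptions made for this example.

```python
from math import sqrt

def pearson(ratings_i, ratings_j):
    """Pearson correlation of two articles over their common users.

    ratings_i / ratings_j map a user id to that user's score for
    article i / article j (hypothetical data layout).
    """
    common = set(ratings_i) & set(ratings_j)      # U_ij = U_i ∩ U_j
    if len(common) < 2:
        return 0.0                                # assumption: undefined -> 0.0
    # Mean of all scores each article received, as in the definitions above.
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_j = sum(ratings_j.values()) / len(ratings_j)
    num = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    den_i = sum((ratings_i[u] - mean_i) ** 2 for u in common)
    den_j = sum((ratings_j[u] - mean_j) ** 2 for u in common)
    if den_i == 0 or den_j == 0:
        return 0.0                                # assumption: zero variance -> 0.0
    return num / sqrt(den_i * den_j)
```

Two articles whose scores rise and fall together across the same users yield a coefficient close to 1, and articles with no common users yield 0.0 under the fallback assumed here.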
In the third step, the target items whose predicted scores meet a preset condition are recommended to the target user.
Specifically, the target items can be sorted by the target user's predicted score for each of them, and a preset number of target items with the highest predicted scores recommended to the target user; alternatively, the target items whose predicted scores exceed a score threshold can be recommended to the target user.
As can be seen, in the above collaborative filtering recommendation algorithm the correlation coefficient between the target item and other articles depends entirely on the scores the articles have received. Scores are given subjectively by users, which easily compromises the objectivity of the calculated predicted scores. For example, suppose article A has received only one low score, given by a user who was in a bad mood, although article A is actually of very good quality. The predicted score calculated for article A by the above algorithm is then low, and its objectivity is poor.
Summary of the invention
The purpose of embodiments of the present invention is to provide a cluster-based collaborative filtering recommendation method and device, so as to improve the objectivity of recommendation scores. The specific technical solutions are as follows:
An embodiment of the present invention provides a cluster-based collaborative filtering recommendation method, comprising:
obtaining label genome vectors of first articles from a preset label genome information matrix, where a label genome vector describes the inherent attributes of a first article;
dividing the first articles into a preset first quantity of clusters based on their label genome vectors, using a preset clustering algorithm;
for each target item: when the target item and a second article belong to the same cluster, calculating the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article, where a target item is a first article for which the target user has not provided a score, and the second articles are the first articles other than all target items; when the target item and the second article belong to different clusters, calculating the correlation coefficient between the target item and the second article based on their Pearson correlation coefficient; and computing a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item;
recommending to the target user the target items whose predicted scores meet a preset condition.
An embodiment of the present invention also provides a cluster-based collaborative filtering recommendation device, comprising:
a first obtaining module, configured to obtain label genome vectors of first articles from a preset label genome information matrix, where a label genome vector describes the inherent attributes of a first article;
a division module, configured to divide the first articles into a preset first quantity of clusters based on their label genome vectors, using a preset clustering algorithm;
a first computing module, configured to, for each target item: when the target item and a second article belong to the same cluster, calculate the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article, where a target item is a first article for which the target user has not provided a score, and the second articles are the first articles other than all target items; when the target item and the second article belong to different clusters, calculate the correlation coefficient between the target item and the second article based on their Pearson correlation coefficient; and compute a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item;
a recommending module, configured to recommend to the target user the target items whose predicted scores meet a preset condition.
An embodiment of the present invention further provides an electronic device comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the cluster-based collaborative filtering recommendation methods described above when executing the program stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute any of the cluster-based collaborative filtering recommendation methods described above.
An embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute any of the cluster-based collaborative filtering recommendation methods described above.
In the cluster-based collaborative filtering recommendation method and device provided by embodiments of the present invention, first, the label genome vectors of the first articles are obtained from a preset label genome information matrix, where a label genome vector describes the inherent attributes of a first article. Then, using a preset clustering algorithm, the first articles are divided into a preset first quantity of clusters based on their label genome vectors, so that first articles in the same cluster have similar inherent attributes. Next, for each target item: when the target item and a second article belong to the same cluster, the correlation coefficient between them is calculated based on a distance of a preset type between the label genome vector of the target item and that of the second article, where a target item is a first article for which the target user has not provided a score, and the second articles are the first articles other than all target items; when the target item and the second article belong to different clusters, the correlation coefficient between them is calculated based on their Pearson correlation coefficient; a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, then gives the target user's predicted score for the target item. Finally, the target items whose predicted scores meet a preset condition are recommended to the target user.
In this way, after the first articles are classified according to their label genome vectors, the correlation coefficient between first articles in the same cluster can be calculated based on a distance of a preset type between their label genome vectors. Because a label genome vector describes the inherent attributes of an article, it does not change with users' subjective wishes and is objective. The calculated correlation coefficient is therefore also objective, and so is the resulting predicted score, which avoids the problem of users' subjective scores undermining the objectivity of the predicted scores.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a cluster-based collaborative filtering recommendation method provided by an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step 103 in an embodiment of the present invention;
Fig. 3 is a detailed flowchart of sub-step 12 in an embodiment of the present invention;
Fig. 4 is a detailed flowchart of sub-step 13 in an embodiment of the present invention;
Fig. 5 is a flowchart of determining optimal parameter values in an embodiment of the present invention;
Fig. 6 is a detailed flowchart of step 503 in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the cluster-based collaborative filtering recommendation device of an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a cluster-based collaborative filtering recommendation method. Referring to Fig. 1, which is a schematic flowchart of the method, it may include the following steps:
Step 101: obtain the label genome vectors of the first articles from a preset label genome information matrix.
A label genome vector describes the inherent attributes of a first article.
In this step, the label genome vectors of the first articles can be obtained from the preset label genome information matrix, so that subsequent steps can classify the first articles according to these vectors. The first articles are all the articles contained in the label genome information matrix, usually multiple articles.
It should be noted that the label genome information matrix is derived from user comments on various websites, either by manual collection or by machine learning, and constitutes context information about the articles. In practice, it can also be obtained from public data sets on websites dedicated to recommender-system research, for example from the public data sets on the MovieLens website. The label genome information matrix contains the label genome vectors of the first articles. A label genome vector describes the inherent attributes of a first article, specifically the degree of relevance between the first article and each feature or label, expressed as a score between 0 and 1. The larger the value, the higher the weight of that feature or label for the first article, that is, the closer the inherent attributes of the first article are to that feature or label.
For example, when the first article is a film and the labels are "horror", "emotion" and "comedy", the label genome vector of the first article may be [0.9, 0.8, 0.1], where 0.9 is the degree of relevance between the first article and the label "horror", 0.8 is the degree of relevance between the first article and the label "emotion", and 0.1 is the degree of relevance between the first article and the label "comedy". The first article is then closest to the label "horror" and can be regarded as a horror film.
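The film example above can be written out in a few lines. The helper name `closest_label` and the list-based representation of the vector are assumptions made purely for illustration.

```python
def closest_label(labels, genome_vector):
    """Return the label with the highest relevance score in the vector."""
    if len(labels) != len(genome_vector):
        raise ValueError("labels and genome_vector must have the same length")
    # Index of the largest relevance score (each score lies between 0 and 1).
    best_index = max(range(len(labels)), key=lambda k: genome_vector[k])
    return labels[best_index]

labels = ["horror", "emotion", "comedy"]
genome_vector = [0.9, 0.8, 0.1]   # relevance of one film to each label
print(closest_label(labels, genome_vector))  # horror
```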
It should be understood that, because the labels or features in a label genome vector are inherent attributes of the article and exist objectively, they do not change with users' subjective wishes; the label genome vector is therefore objective.
Step 102: using a preset clustering algorithm, divide the first articles into a preset first quantity of clusters based on their label genome vectors.
In this step, a preset clustering algorithm can be used to divide the first articles into a preset first quantity of clusters based on their label genome vectors. The preset clustering algorithm may be the K-means clustering algorithm, the mean-shift clustering algorithm, or the like.
Because the labels or features in a label genome vector are inherent attributes of the article and exist objectively, first articles in the same cluster have similar objective characteristics. For first articles in the same cluster, the correlation coefficient is therefore calculated from the label genome vectors, which makes the calculated correlation coefficient more objective.
In one implementation, the preset clustering algorithm is the K-means clustering algorithm, and the first articles can be divided into K clusters based on their label genome vectors. The specific process is:
First step: input the label genome information matrix G.
Second step: randomly initialize K cluster center points, denoted μ_1, μ_2, ..., μ_K ∈ R^n,
where R^n denotes the vector space of dimension n, and n is the number of features or labels in the label genome information matrix G.
Third step: for each first article i, calculate the cluster c^(i) to which its genome vector g_i belongs:
$$c^{(i)} := \arg\min_{j} \lVert g_i - \mu_j \rVert^2 \tag{2}$$
In formula (2), g_i is the genome vector of first article i, g_i ∈ G; c^(i) is the cluster to which g_i belongs; μ_j is a cluster center point, j ∈ {1, ..., K}. Formula (2) means that c^(i) is defined as the index j that minimizes ||g_i − μ_j||^2.
Fourth step: for each cluster j, recalculate its cluster center point μ_j:
$$\mu_j := \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, g_i}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}} \tag{3}$$
In formula (3), g_i is the genome vector of first article i, g_i ∈ G; c^(i) is the cluster to which g_i belongs; μ_j is a cluster center point, j ∈ {1, ..., K}; m is the total number of first articles, i ∈ {1, ..., m}; 1{·} equals 1 when its condition holds and 0 otherwise. Formula (3) means that μ_j is defined as the mean of the genome vectors currently assigned to cluster j.
The third and fourth steps are repeated until convergence. For the detailed process of using the K-means clustering algorithm to divide the first articles into K clusters based on their label genome vectors, reference may be made to the prior art; it is not repeated here.
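The four steps above correspond to a standard K-means loop, which might be sketched in pure Python as follows. The function names, the optional `init_centers` argument, and the choice to draw the random initial centers from the input vectors are assumptions of this sketch; the text only states that the centers are initialized randomly.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, init_centers=None, seed=0, max_iter=100):
    """Cluster label genome vectors into k clusters.

    Returns (labels, centers), where labels[i] is the cluster index c(i)
    of vectors[i] and centers[j] is the center point mu_j.
    """
    rng = random.Random(seed)
    # Second step: initial centers (assumption: sampled from the data).
    centers = [list(c) for c in (init_centers or rng.sample(vectors, k))]
    labels = None
    for _ in range(max_iter):
        # Third step: assign each vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(g, centers[j]))
            for g in vectors
        ]
        if new_labels == labels:      # converged: assignments unchanged
            break
        labels = new_labels
        # Fourth step: move each center to the mean of its members.
        for j in range(k):
            members = [g for g, c in zip(vectors, labels) if c == j]
            if members:               # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In practice a library implementation would normally be preferred; the loop is spelled out here only to mirror the third and fourth steps.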
Step 103: for each target item: when the target item and a second article belong to the same cluster, calculate the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article; when the target item and the second article belong to different clusters, calculate the correlation coefficient between the target item and the second article based on their Pearson correlation coefficient; compute a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item.
A target item is a first article for which the target user has not provided a score; the second articles are the first articles other than all target items.
Because step 102 has divided the first articles into multiple clusters, in this step each target item is processed as follows. Referring to Fig. 2, which is a detailed flowchart of step 103 in an embodiment of the present invention:
Sub-step 11: when the target item and the second article belong to the same cluster, calculate the correlation coefficient between the target item and the second article based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second article.
The distance of the preset type may be the Euclidean distance, the Manhattan distance, the Mahalanobis distance or the like, and can be chosen according to the actual situation.
In one implementation, when the target item and the second article belong to the same cluster, the Euclidean distance between the label genome vector of the target item and the label genome vector of the second article is calculated, and the calculated Euclidean distance value is used as the correlation coefficient between the target item and the second article.
It should be noted that the method of calculating the Euclidean distance can be found in the prior art and is not repeated here.
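For reference, the Euclidean distance between two label genome vectors can be computed as in the short sketch below. The function name `euclidean_distance` and the list representation are illustrative assumptions.

```python
from math import sqrt

def euclidean_distance(g_i, g_j):
    """Euclidean distance between two label genome vectors of equal length."""
    if len(g_i) != len(g_j):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(g_i, g_j)))
```

Identical vectors give a distance of 0, and the value grows as the inherent attributes of the two articles diverge.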
Because the labels or features in a label genome vector are inherent attributes of the article and exist objectively, the correlation coefficient calculated from the label genome vectors for first articles in the same cluster is more objective.
Sub-step 12: when the target item and the second article belong to different clusters, calculate the correlation coefficient between the target item and the second article based on their Pearson correlation coefficient.
In one implementation, referring to Fig. 3, which is a detailed flowchart of sub-step 12 in an embodiment of the present invention, sub-step 12 may include:
Sub-step 121: calculate the Pearson correlation coefficient of the target item and the second article.
Specifically, for the detailed process of sub-step 121, reference may be made to formula (1) and the related description in the background section; it is not repeated here.
Sub-step 122: based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient, generate the scaled Pearson correlation coefficient of the target item and the second article, and use it as the correlation coefficient between the target item and the second article.
It should be noted that the preset rating matrix, like the label genome information matrix, can also be obtained from public data sets on websites dedicated to recommender-system research, for example from the public data sets on the MovieLens website. The preset rating matrix contains multiple articles and the scores each article has received from users. However, the preset rating matrix is sparse: a large number of its articles have received few scores from users, so there are very few common users between articles. If the Pearson correlation coefficient were used directly as the correlation coefficient between articles, the correlation coefficient could not truly reflect the correlation between articles, and its accuracy would be poor.
Therefore, embodiments of the present invention take the number of common users into account when calculating the correlation coefficient, so that it is more accurate and truly reflects the correlation between articles. Specifically, on the basis of the Pearson correlation coefficient, a parameter for the number of common users is introduced to obtain a scaled Pearson correlation coefficient, which is used as the correlation coefficient between articles.
In a specific implementation, the scaled Pearson correlation coefficient of the target item and the second article can be generated according to formula (4), based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient:
$$s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_1}\,\rho_{ij} \tag{4}$$
In formula (4), the target item is article i and the second article is article j; s_ij is the scaled Pearson correlation coefficient of article i and article j; ρ_ij is the Pearson correlation coefficient of article i and article j; n_ij is the number of common users of article i and article j; λ_1 is the parameter for the number of common users.
Specifically, on the basis of the Pearson correlation coefficient, embodiments of the present invention introduce the parameter λ_1 for the number of common users, which increases the influence of the number of common users on the correlation coefficient and compensates for the adverse effect that the scarcity of common users between articles has on it. The correlation coefficient calculated by embodiments of the present invention can thus truly reflect the correlation between articles with higher accuracy, which makes the calculated predicted scores more objective.
In practical applications, the value of the parameter λ_1 for the number of common users may be 100.
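The effect of the common-user parameter can be illustrated with a short sketch. It assumes the scaling takes the common shrinkage form s_ij = n_ij / (n_ij + λ1) · ρ_ij; the formula image is not reproduced in this text, so that form, like the function name `scaled_pearson`, is an assumption.

```python
def scaled_pearson(rho_ij, n_ij, lambda_1=100.0):
    """Shrink a Pearson coefficient according to the number of common users.

    Assumed form: s_ij = n_ij / (n_ij + lambda_1) * rho_ij.
    With few common users the coefficient is pulled toward 0; with many
    common users it approaches rho_ij.
    """
    return n_ij / (n_ij + lambda_1) * rho_ij
```

Under this assumption and λ1 = 100, a raw coefficient of 0.8 backed by 100 common users is halved to 0.4, while the same coefficient backed by only 5 common users shrinks to about 0.04.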
Sub-step 13: compute a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, to obtain the target user's predicted score for the target item.
In this step, because the preset rating matrix already contains the target user's scores for the second articles, these preset scores can be obtained from the preset rating matrix and combined, in a weighted sum, with the correlation coefficients calculated in the preceding sub-steps to obtain the target user's predicted score for the target item.
In the preceding sub-steps, different calculation methods are used, according to the classification result, for the correlation coefficient between articles in the same cluster and between articles in different clusters, so the calculated correlation coefficients are more objective and more accurate. The target user's predicted score for the target item obtained in this step is therefore also more objective.
In one implementation, according to formula (5), a weighted sum of the target user's preset scores for the second articles, weighted by the correlation coefficients between the target item and the second articles, is computed to obtain the target user's predicted score for the target item.
In formula (5), the target item is article i; the second article is article j; the target user is user u; r̂_ui is user u's predicted score for article i; r_uj is user u's score for article j; s_ij is the scaled Pearson correlation coefficient of article i and article j; ρ_ij is the Pearson correlation coefficient of article i and article j; ε_ij is the Euclidean distance value between the label genome vector of article i and the label genome vector of article j; κ_u is the set of articles, other than article i, that user u has scored; p_ij is an adjustment factor; and the remaining set in the formula is the set of articles belonging to the same cluster as article i when the first articles are divided into k clusters.
Specifically, in formula (5), according to the classification result of step 102, for second articles in the same cluster as the target item, the target user's predicted score for the target item is calculated from the Euclidean distance value between the label genome vectors of the target item and the second article. Because the labels or features in a label genome vector are inherent attributes of the article and exist objectively, the correlation coefficient calculated from the label genome vectors is more objective. For second articles in a different cluster from the target item, the scaled Pearson correlation coefficient between the target item and the second article is used, so that the calculated correlation coefficient truly reflects the correlation between articles with higher accuracy. In sum, the calculated predicted score is more objective and can truly reflect the target user's evaluation of the target item.
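The cluster-dependent choice of correlation coefficient might be sketched as follows. This is not formula (5) itself: the adjustment factor p_ij is omitted, and dividing by the sum of absolute weights is an assumption of the sketch. The function name `predict_score` and its parameters are hypothetical, and the two similarity functions are passed in as callables so that no particular distance or coefficient is hard-coded.

```python
def predict_score(target, user_scores, cluster_of,
                  same_cluster_sim, cross_cluster_sim):
    """Weighted-sum prediction of the target user's score for `target`.

    user_scores: article -> score already given by the target user.
    cluster_of:  article -> cluster index from the clustering step.
    same_cluster_sim / cross_cluster_sim: callables (i, j) -> weight,
        e.g. a distance-based value and a scaled Pearson coefficient.
    Returns None when no weight is available (assumption of this sketch).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for j, r_uj in user_scores.items():
        if j == target:
            continue
        # Pick the correlation coefficient according to cluster membership.
        if cluster_of[j] == cluster_of[target]:
            w = same_cluster_sim(target, j)
        else:
            w = cross_cluster_sim(target, j)
        weighted_sum += w * r_uj
        weight_total += abs(w)
    if weight_total == 0:
        return None
    return weighted_sum / weight_total   # assumption: normalized weighted sum
```

For instance, with one scored article in the target's cluster and one outside it, the prediction is a blend of the two scores in proportion to the two coefficients.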
Step 104: recommend to the target user the target items whose predicted scores meet a preset condition.
In this step, the target items can be sorted by the target user's predicted scores calculated in step 103, and a preset number of target items with the highest predicted scores recommended to the target user; alternatively, the target items whose predicted scores exceed a score threshold can be recommended to the target user. The choice can be made according to actual needs and is not described further here.
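Both selection rules described above (a preset number of top-scoring items, or all items above a score threshold) can be sketched in one small helper. The function name `recommend` and its parameters `top_n` and `threshold` are illustrative assumptions.

```python
def recommend(predicted, top_n=None, threshold=None):
    """Select target items from a dict of item -> predicted score.

    top_n:     keep at most this many highest-scoring items, if given.
    threshold: keep only items whose score exceeds this value, if given.
    """
    # Sort items by predicted score, highest first.
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(item, score) for item, score in ranked if score > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [item for item, _ in ranked]
```

Calling `recommend({"A": 4.5, "B": 3.0, "C": 4.8}, top_n=2)` returns `["C", "A"]`, and a threshold of 4.6 leaves only `["C"]`.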
Because the calculated predicted scores are more objective and can truly reflect the target user's evaluation of the target items, the target items with higher predicted scores that are selected in this way better match the target user's preferences, and the user experience is better.
As can be seen, in the cluster-based collaborative filtering recommendation method proposed by embodiments of the present invention, after the first articles are classified according to their label genome vectors, the correlation coefficient between first articles in the same cluster is calculated from the Euclidean distance value of their label genome vectors. Because a label genome vector describes the inherent attributes of an article, it does not change with users' subjective wishes and is objective. The calculated correlation coefficient is therefore also objective, and so is the resulting predicted score, which avoids the problem of users' subjective scores undermining the objectivity of the predicted scores.
In an optional implementation, referring to Fig. 4, which is a specific flowchart of sub-step 13 in the embodiment of the present invention, sub-step 13 may specifically include:
Sub-step 131: compute the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items.
Sub-step 132: adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, to obtain the target user's predicted score for the target item.
It should be noted that the target user's preset scores for the second items are obtained from the preset rating matrix, and the scores in the preset rating matrix may be affected by various bias factors. Specifically, the bias factors may include user bias and item bias. User bias means that some users habitually give items higher scores; item bias means that some items tend to receive higher scores because of external factors such as advertising. As a result, the scores in the preset rating matrix may not accurately reflect users' preferences or the quality of the items.
Therefore, in the embodiment of the present invention, the result of sub-step 131 can be adjusted using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, so that the calculated predicted score of the target user for the target item is more accurate and truly reflects the user's preferences and the quality of the item.
In one implementation, formula (6) can be used to compute the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items, and to adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, thereby obtaining the target user's predicted score for the target item.
In formula (6), the target item is item i; the second item is item j; the target user is user u; μ is the mean of the scores in the preset rating matrix; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the predicted score of user u for item i; r_uj is the score of user u for item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the label genome vector of item i and the label genome vector of item j; κ_u is the set of items, other than item i, that user u has scored; p_ij is an adjustment coefficient; and the remaining set symbol in the formula denotes the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
Specifically, formula (6) builds on formula (5) by adding the mean score μ of the preset rating matrix, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i. In addition, because each user is influenced by similar items to a different degree, a personalization parameter α_u is introduced for each user, so that the calculated predicted score of the target user for the target item is more accurate and truly reflects the user's preferences and the quality of the item.
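The two sub-steps (weighted sum, then bias adjustment) can be sketched as below. This is only an illustration under stated assumptions: the exact form of formula (6) is not reproduced in this text, so the additive combination `mu + b_u + b_i + alpha_u * weighted_sum` and the unnormalized weighted sum are assumptions, and the function names are hypothetical.

```python
def weighted_sum(scores_u, coeff_i):
    """Sub-step 131 sketch: sum of r_uj * c_ij over the items j that user u scored.

    scores_u: {item_j: score of user u for item j}
    coeff_i:  {item_j: correlation coefficient between the target item i and item j}
    Assumption: no normalization is applied; the real formula may differ.
    """
    return sum(score * coeff_i[j] for j, score in scores_u.items() if j in coeff_i)

def adjusted_prediction(mu, b_u, b_i, alpha_u, neighborhood_term):
    """Sub-step 132 sketch: adjust the weighted sum with mu, b_u, b_i and alpha_u.

    Assumption: the adjustment is additive and alpha_u scales the neighborhood term.
    """
    return mu + b_u + b_i + alpha_u * neighborhood_term
```

For example, with μ = 3, b_u = 0.5, b_i = −0.2, α_u = 0.1 and a weighted sum of 2.8, this sketch yields a predicted score of 3.58.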
To obtain more objective and accurate predicted scores, the optimal values of the user bias adjustment parameter b_u, the item bias adjustment parameter b_i and the personalization parameter α_u in formula (6) can be determined before formula (6) is used to calculate predicted scores, so as to obtain the optimized formula (6).
Referring to Fig. 5, Fig. 5 is a flowchart of determining the optimal parameter values in the embodiment of the present invention. As shown in Fig. 5, the steps for determining the optimal values of the user bias adjustment parameter b_u, the item bias adjustment parameter b_i and the personalization parameter α_u in the calculation formula of the second predicted score are as follows:
Step 501: obtain a preset number of sample sets from the preset rating matrix.
Each sample set includes a user, an item, and the score that the user gave the item.
In this step, a preset number of sample sets can be obtained from the preset rating matrix, so that the loss function can be minimized based on these sample sets.
Because the rating matrix contains multiple items and the scores each item has received from users, a user, an item and the score the user gave the item can be extracted from it and treated together as one sample set. The number of sample sets can be determined according to the actual situation.
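Extracting sample sets from a rating matrix can be sketched as follows. The dictionary-of-dictionaries layout, the use of `None` for a missing score, and the names `build_sample_sets` and `limit` are assumptions made for this illustration; the text does not specify a storage format.

```python
def build_sample_sets(rating_matrix, limit=None):
    """Collect (user, item, score) triples from a {user: {item: score}} mapping.

    A score of None is treated as "not scored" (an assumption of this sketch).
    limit is a hypothetical stand-in for the "preset quantity" of sample sets.
    """
    samples = []
    for user, row in rating_matrix.items():
        for item, score in row.items():
            if score is None:
                continue  # the user has not scored this item
            samples.append((user, item, score))
    if limit is not None:
        samples = samples[:limit]
    return samples
```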
Step 502: for each sample set, take the user in the sample set as the target user and the item in the sample set as the target item; calculate the target user's predicted score for the target item, and use the calculated predicted score as the predicted score corresponding to the sample set.
In this step, each of the preset number of sample sets can be processed as follows:
First, take the user in the sample set as the target user and the item in the sample set as the target item.
Specifically, for one sample set among the preset number of sample sets, the user in the sample set is taken as the target user and the item in the sample set is taken as the target item, so that a predicted score can be calculated for this target user and target item.
Second, calculate the target user's predicted score for the target item, and use the calculated predicted score as the predicted score corresponding to the sample set.
Specifically, formula (6) is used to calculate the target user's predicted score for the target item, and the calculated predicted score is used as the predicted score corresponding to the sample set, so that the loss function value can be minimized based on the predicted score and the true score.
It should be noted that the score already present in a sample set is the user's true score for the item, whereas the calculated value is the user's predicted score for the item. Here the predicted score is calculated in order to optimize the parameters: the gap between the true score and the predicted score is reduced as far as possible, so that the predicted score calculated by formula (6) approaches the true score. It can be understood that, in practical applications, when target items are recommended to the target user by calculating predicted scores, the target items should be items that the target user has not used or purchased; predicted scores are therefore not calculated again for items that the target user has already scored.
Step 503: minimize the preset loss function based on each sample set and the predicted score corresponding to each sample set; the preset loss function is shown in formula (7).
In the preset loss function shown in formula (7), κ is the preset number of sample sets; item i is the item in a sample set; user u is the user in a sample set; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the calculated predicted score of user u for item i; r_ui is the score of user u for item i obtained from the preset rating matrix; and λ_2 is a preset regularization parameter.
In this step, the loss function shown in formula (7) can be minimized based on each sample set and its corresponding calculated predicted score, to obtain the minimized loss function.
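The image of formula (7) is not reproduced in this text. As a hedged reconstruction only, a regularized squared-error loss of the following form would be consistent with the symbols listed above and with the bias update rules (10) and (11); the exact regularization terms, in particular the one on α_u, are an assumption.

```latex
\min_{b_*,\,\alpha_*} \sum_{(u,i)\in\kappa}
  \left[ \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda_2 \left( b_u^2 + b_i^2 + \alpha_u^2 \right) \right]
```

Under this form, a gradient step on b_u with step size γ gives b_u ← b_u + γ·(e_ui − λ_2·b_u) up to a constant factor, where e_ui = r_ui − r̂_ui.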
In one implementation, referring to Fig. 6, Fig. 6 is a specific flowchart of step 503 in the embodiment of the present invention, which includes:
Sub-step 5031: for each sample set, set the initial values of the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i, and initialize the step size γ and the number of iterations.
Specifically, the initial values of the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i can be set to a random value in (0, 1), 0 and 0, respectively; the initial value of the step size γ can be 0.04; and the initial value of the number of iterations can be 0.
Sub-step 5032: set the user set U and the item set I to empty sets.
Specifically, the user set U and the item set I are each set to an empty set so that, in the subsequent steps, the users and items of the sample sets that have been processed can be added to the user set U and the item set I, respectively.
Sub-step 5033: randomly obtain one sample set from the preset number of sample sets κ.
The sample set includes user u, item i, the score r_ui of user u for item i obtained from the preset rating matrix, and the calculated predicted score r̂_ui of user u for item i.
Specifically, the sample set used in sub-steps 5034 to 5037 is obtained at random from the preset number of sample sets κ. It includes user u, item i, the true score r_ui of user u for item i obtained from the preset rating matrix, and the predicted score r̂_ui of user u for item i calculated according to formula (6), so that the loss function value can be calculated from the predicted score and the true score.
Sub-step 5034: judge whether the user set U contains the user u of the sample set and whether the item set I contains the item i of the sample set; if yes, return to sub-step 5033; if no, execute sub-step 5035.
Specifically, this sub-step judges whether the sample set selected in sub-step 5033 has already been processed. It can be understood that the user and the item of a processed sample set are already present in the user set U and the item set I. A sample set that has already been processed is not processed again in the current iteration; it is skipped directly, and another sample set is selected at random by executing sub-step 5033 again.
The reason is that the preset number of sample sets κ contains a large number of sample sets, so the conventional stochastic gradient descent algorithm would lead to excessive time complexity. To allow the minimization of the loss function to complete within a reasonable time, the embodiment of the present invention uses an improved stochastic gradient descent algorithm for the minimization. That is, each sample set in the preset number of sample sets κ is processed only once in each iteration. If the currently selected sample set has already been processed in the current iteration, it is skipped.
In this way, using the improved stochastic gradient descent algorithm in the minimization of the loss function can significantly reduce the time complexity, so that the minimization can be completed within a reasonable time, while over-fitting is effectively prevented.
Sub-step 5035: according to the difference e_ui between the score r_ui in the sample set and the predicted score r̂_ui, update the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i; put the user u of the sample set into the user set U, and put the item i of the sample set into the item set I.
In this step, the sample set can be processed when the user set U does not contain the user u of the sample set and the item set I does not contain the item i of the sample set.
Specifically, the difference e_ui between the score r_ui in the sample set and the predicted score r̂_ui is calculated first, as shown in formula (8):
e_ui = r_ui − r̂_ui    (8)
In formula (8), r̂_ui is the calculated predicted score of user u for item i; r_ui is the score of user u for item i obtained from the preset rating matrix; and e_ui is the difference between the score r_ui and the predicted score r̂_ui.
Then, the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i are updated according to formulas (9) to (11), respectively.
In formula (9), γ is the step size; e_ui is the difference between the score r_ui and the predicted score r̂_ui; r_uj is the score of user u for item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the label genome vector of item i and the label genome vector of item j; α_u is the personalization parameter for user u; λ_2 is a preset regularization parameter; and κ_u is the set of items, other than item i, that user u has scored.
In specific implementation, the update formula for the personalization parameter α_u shown in formula (9) is obtained by differentiating with respect to α_u on the basis of formula (6). The detailed procedure can be found in the prior art and is not repeated here.
b_u ← b_u + γ·(e_ui − λ_2·b_u)    (10)
In formula (10), b_u is the user bias adjustment parameter for user u; γ is the step size; e_ui is the difference between the score r_ui and the predicted score r̂_ui; and λ_2 is a preset regularization parameter.
In specific implementation, the update formula for the user bias adjustment parameter b_u shown in formula (10) is obtained by differentiating with respect to b_u on the basis of formula (6). The detailed procedure can be found in the prior art and is not repeated here.
b_i ← b_i + γ·(e_ui − λ_2·b_i)    (11)
In formula (11), b_i is the item bias adjustment parameter for item i; γ is the step size; e_ui is the difference between the score r_ui and the predicted score r̂_ui; and λ_2 is a preset regularization parameter.
In specific implementation, the update formula for the item bias adjustment parameter b_i shown in formula (11) is obtained by differentiating with respect to b_i on the basis of formula (6). The detailed procedure can be found in the prior art and is not repeated here. The left-pointing arrow in formulas (9) to (11) means that the value of the parameter on the left of the arrow is replaced with the value of the expression on the right of the arrow.
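The bias updates in formulas (10) and (11) translate directly into code. The sketch below covers only b_u and b_i; the update for α_u in formula (9) is left out because that formula is not reproduced in this text. The function name `update_biases` is hypothetical.

```python
def update_biases(b_u, b_i, r_ui, r_hat_ui, gamma, lam2):
    """Apply one update of formulas (10) and (11) for a single sample set.

    r_ui is the true score, r_hat_ui the predicted score, gamma the step size
    and lam2 the regularization parameter.
    """
    e_ui = r_ui - r_hat_ui                       # prediction error
    new_b_u = b_u + gamma * (e_ui - lam2 * b_u)  # formula (10)
    new_b_i = b_i + gamma * (e_ui - lam2 * b_i)  # formula (11)
    return new_b_u, new_b_i, e_ui
```

With the initial values given above (b_u = b_i = 0, γ = 0.04), a true score of 4 and a predicted score of 3 move both biases to 0.04 in the first update.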
Sub-step 5036: reduce the step size γ and return to sub-step 5033.
Sub-step 5037: when all sample sets in the preset number of sample sets κ have been traversed, increase the number of iterations by one and execute sub-step 5038.
Specifically, when all sample sets in the preset number of sample sets κ have been traversed, the current iteration is complete and the number of iterations can be increased by one. In addition, a smaller step size is used in the next iteration to ensure convergence and prevent oscillation.
Sub-step 5038: judge whether the number of iterations exceeds the preset iteration threshold; if yes, execute sub-step 5039; if no, execute sub-step 5032.
Specifically, when the number of iterations exceeds the preset iteration threshold, the minimization of the preset loss function is completed by executing sub-step 5039; when the number of iterations does not exceed the preset iteration threshold, the process can return to sub-step 5032 and continue with the next iteration, until the number of iterations exceeds the preset iteration threshold.
Sub-step 5039: complete the minimization of the preset loss function.
Specifically, when the number of iterations exceeds the preset iteration threshold, the minimization of the preset loss function has been completed.
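The control flow of sub-steps 5031 to 5039 can be sketched as follows. This is a simplified illustration under several stated assumptions: the predictor is reduced to `mu + b_u + b_i` (the α_u term of formula (6) is omitted because that formula is not reproduced here), random selection is modeled by shuffling the sample sets once per iteration, and the decay factor applied to the step size is a hypothetical choice, since the text only says that the step size is reduced.

```python
import random

def minimize_biases(samples, mu, iter_threshold=20, gamma=0.04,
                    lam2=0.02, decay=0.999, seed=0):
    """Sketch of the skip-based stochastic gradient descent loop.

    samples: list of (user, item, true_score) triples.
    Returns the learned user and item bias dictionaries.
    """
    rng = random.Random(seed)
    b_user = {u: 0.0 for u, _, _ in samples}   # sub-step 5031: initial value 0
    b_item = {i: 0.0 for _, i, _ in samples}   # sub-step 5031: initial value 0
    iteration = 0
    while iteration <= iter_threshold:         # sub-step 5038: stop once exceeded
        seen_users, seen_items = set(), set()  # sub-step 5032: empty sets U and I
        order = list(samples)
        rng.shuffle(order)                     # sub-step 5033: random order (assumption)
        for u, i, r_ui in order:
            # Sub-step 5034: skip when both u and i are already in U and I.
            if u in seen_users and i in seen_items:
                continue
            # Sub-step 5035: error, then updates (10) and (11).
            e_ui = r_ui - (mu + b_user[u] + b_item[i])
            b_user[u] += gamma * (e_ui - lam2 * b_user[u])
            b_item[i] += gamma * (e_ui - lam2 * b_item[i])
            seen_users.add(u)
            seen_items.add(i)
            gamma *= decay                     # sub-step 5036: reduce the step size
        iteration += 1                         # sub-step 5037
    return b_user, b_item
```

Because a sample set is skipped as soon as both its user and its item have been seen in the current iteration, each iteration touches far fewer sample sets than a full pass, which is where the reduction in running time comes from.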
Step 504: determine the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter from the minimized loss function.
In this step, the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter are determined from the minimized loss function obtained after the minimization, and these optimal values are substituted into the calculation formula of the second predicted score shown in formula (6), to obtain the optimized calculation formula of the second predicted score.
It can be seen that, in the embodiment of the present invention, the optimal values of the user bias adjustment parameter b_u, the item bias adjustment parameter b_i and the personalization parameter α_u in the calculation formula of the second predicted score can be obtained by minimizing the preset loss function. The optimized calculation formula of the second predicted score is thereby obtained, so that the predicted scores calculated with it are more accurate and objective.
The embodiment of the present invention further provides a cluster-based collaborative filtering recommendation device. Referring to Fig. 7, Fig. 7 is a schematic structural diagram of the cluster-based collaborative filtering recommendation device of the embodiment of the present invention. The device includes:
a first obtaining module 701, configured to obtain the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
a division module 702, configured to divide the first items into a preset first number of clusters using a preset clustering algorithm, based on the label genome vectors of the first items;
a first computing module 703, configured to, for each target item: when the target item and a second item belong to the same cluster, calculate the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where the target items are the first items that the target user has not scored, and the second items are the first items other than all the target items; when the target item and the second item belong to different clusters, calculate the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient of the target item and the second item; and compute the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted score for the target item;
a recommending module 704, configured to recommend to the target user the target items whose predicted scores meet a preset condition.
Optionally, the first computing module 703 is specifically configured to:
calculate the Pearson correlation coefficient of the target item and the second item;
generate the scaled Pearson correlation coefficient of the target item and the second item based on the number of common users of the target item and the second item and the calculated Pearson correlation coefficient, and use the scaled Pearson correlation coefficient of the target item and the second item as the correlation coefficient between the target item and the second item.
Optionally, the first computing module 703 is specifically configured to:
generate, according to the following formula, the scaled Pearson correlation coefficient of the target item and the second item based on the number of common users of the target item and the second item and the calculated Pearson correlation coefficient;
In the formula, the target item is item i; the second item is item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; n_ij is the number of common users of item i and item j; and λ_1 is a parameter for the number of common users.
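The scaling formula itself is not reproduced in this text. A common way to scale a correlation by the number of common users is the shrinkage form s_ij = n_ij / (n_ij + λ_1) · ρ_ij; the sketch below assumes that form, which is consistent with the symbols listed above but is not confirmed by the source. The function names are hypothetical.

```python
from math import sqrt

def pearson(scores_i, scores_j):
    """Pearson correlation of two items over their common users.

    scores_i, scores_j: {user: score} mappings for item i and item j.
    Returns (rho, n_common); rho is 0.0 when it cannot be computed.
    """
    common = sorted(set(scores_i) & set(scores_j))
    n = len(common)
    if n < 2:
        return 0.0, n
    xi = [scores_i[u] for u in common]
    xj = [scores_j[u] for u in common]
    mean_i, mean_j = sum(xi) / n, sum(xj) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj))
    var_i = sum((a - mean_i) ** 2 for a in xi)
    var_j = sum((b - mean_j) ** 2 for b in xj)
    if var_i == 0 or var_j == 0:
        return 0.0, n
    return cov / sqrt(var_i * var_j), n

def scaled_pearson(scores_i, scores_j, lam1):
    """Assumed shrinkage form: s_ij = n_ij / (n_ij + lam1) * rho_ij, lam1 >= 0."""
    rho, n = pearson(scores_i, scores_j)
    if n == 0:
        return 0.0
    return n / (n + lam1) * rho
```

Under this assumed form, a correlation supported by only a few common users is pulled toward zero, while one supported by many common users stays close to ρ_ij.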
Optionally, the distance of the preset type includes the Euclidean distance;
the first computing module 703 is specifically configured to:
compute, according to the following formula, the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted score for the target item;
In the formula, the target item is item i; the second item is item j; the target user is user u; r̂_ui is the predicted score of user u for item i; r_uj is the score of user u for item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the label genome vector of item i and the label genome vector of item j; κ_u is the set of items, other than item i, that user u has scored; p_ij is an adjustment coefficient; and the remaining set symbol in the formula denotes the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
Optionally, the first computing module 703 is specifically configured to:
compute the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items;
adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, to obtain the target user's predicted score for the target item.
Optionally, the first computing module 703 is specifically configured to:
use the following formula to compute the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items, and to adjust the result of the weighted sum using the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter, thereby obtaining the target user's predicted score for the target item;
In the formula, the target item is item i; the second item is item j; the target user is user u; μ is the mean of the scores in the preset rating matrix; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the predicted score of user u for item i; r_uj is the score of user u for item j; s_ij is the scaled Pearson correlation coefficient of item i and item j; ρ_ij is the Pearson correlation coefficient of item i and item j; ε_ij is the Euclidean distance between the label genome vector of item i and the label genome vector of item j; κ_u is the set of items, other than item i, that user u has scored; p_ij is an adjustment coefficient; and the remaining set symbol in the formula denotes the set of items that belong to the same cluster as item i when the first items are divided into k clusters.
Optionally, the device further includes:
a second obtaining module, configured to obtain a preset number of sample sets from the preset rating matrix, where each sample set includes a user, an item, and the score the user gave the item;
a second computing module, configured to, for each sample set, take the user in the sample set as the target user and the item in the sample set as the target item, calculate the target user's predicted score for the target item using the calculation formula of the second predicted score, and use the calculated predicted score as the predicted score corresponding to the sample set;
a minimization module, configured to minimize the preset loss function based on each sample set and the predicted score corresponding to each sample set, to obtain the minimized loss function, where the preset loss function is shown in the following formula;
In the preset loss function, κ is the preset number of sample sets; item i is the item in each sample set; user u is the user in each sample set; b_u is the user bias adjustment parameter for user u; b_i is the item bias adjustment parameter for item i; α_u is the personalization parameter for user u; r̂_ui is the calculated predicted score of user u for item i; r_ui is the score of user u for item i obtained from the preset rating matrix; and λ_2 is a preset regularization parameter;
a determining module, configured to determine the optimal values of the personalization parameter, the user bias adjustment parameter and the item bias adjustment parameter from the minimized loss function.
Optionally, the minimization module is specifically configured to:
for each sample set, set the initial values of the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i, and initialize the step size γ and the number of iterations;
set the user set U and the item set I to empty sets;
obtain one sample set from the preset number of sample sets κ, where the sample set includes user u, item i, the score r_ui of user u for item i obtained from the preset rating matrix, and the calculated predicted score r̂_ui of user u for item i;
judge whether the user set U contains the user u of the sample set and whether the item set I contains the item i of the sample set;
if yes, return to the step of obtaining one sample set from the preset number of sample sets κ;
if no, update the personalization parameter α_u, the user bias adjustment parameter b_u and the item bias adjustment parameter b_i according to the difference e_ui between the score r_ui in the sample set and the predicted score r̂_ui, put the user u of the sample set into the user set U, and put the item i of the sample set into the item set I;
reduce the step size γ, and return to the step of obtaining one sample set from the preset number of sample sets κ;
when all sample sets in the preset number of sample sets κ have been traversed, increase the number of iterations by one;
judge whether the number of iterations exceeds the preset iteration threshold; if yes, complete the minimization of the preset loss function; if no, return to the step of setting the user set U and the item set I to empty sets.
It can be seen that, in the cluster-based collaborative filtering recommendation device proposed in the embodiment of the present invention, after the first items are clustered according to their label genome vectors, the correlation coefficient between first items that belong to the same cluster is calculated from the Euclidean distance between their label genome vectors. Because a label genome vector describes the intrinsic attributes of an item and does not change with users' subjective preferences, it is objective. The correlation coefficient calculated from it is therefore also objective, and so is the resulting predicted score. This avoids the problem of users' subjective ratings undermining the objectivity of the predicted score.
The embodiment of the present invention further provides an electronic device. Referring to Fig. 8, Fig. 8 is a schematic structural diagram of the electronic device provided by the embodiment of the present invention. As shown in Fig. 8, the electronic device includes a processor 81, a communication interface 82, a memory 83 and a communication bus 84, where the processor 81, the communication interface 82 and the memory 83 communicate with one another through the communication bus 84;
the memory 83 is configured to store a computer program;
the processor 81 is configured to implement the following steps when executing the program stored in the memory 83:
obtaining the label genome vectors of the first items from a preset label genome information matrix, where a label genome vector describes the intrinsic attributes of a first item;
dividing the first items into a preset first number of clusters using a preset clustering algorithm, based on the label genome vectors of the first items;
for each target item: when the target item and a second item belong to the same cluster, calculating the correlation coefficient between the target item and the second item based on a distance of a preset type between the label genome vector of the target item and the label genome vector of the second item, where the target items are the first items that the target user has not scored, and the second items are the first items other than all the target items; when the target item and the second item belong to different clusters, calculating the correlation coefficient between the target item and the second item based on the Pearson correlation coefficient of the target item and the second item; and computing the weighted sum of the target user's preset scores for the second items and the correlation coefficients between the target item and the second items, to obtain the target user's predicted score for the target item;
recommending to the target user the target items whose predicted scores meet a preset condition.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The method provided by the embodiment of the present invention can be applied to an electronic device. Specifically, the electronic device may be a desktop computer, a portable computer, an intelligent mobile terminal, a server, or the like. No limitation is imposed here; any electronic device that can implement the present invention falls within the protection scope of the present invention.
The embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the steps of the above cluster-based collaborative filtering recommendation method.
The embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the steps of the above cluster-based collaborative filtering recommendation method.
The embodiment of the present invention further provides a computer program which, when run on a computer, causes the computer to execute the steps of the above cluster-based collaborative filtering recommendation method.
Because the embodiments of the device, the electronic device, the storage medium, the computer program product containing instructions, and the computer program are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the embodiments of the device, the electronic device, the storage medium, the computer program product containing instructions, and the computer program are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
1. A clustering-based collaborative filtering recommendation method, characterized by comprising:
obtaining the label genome vectors of first articles from a preset label genome information matrix, where a label genome vector describes the inherent attributes of a first article;
dividing the first articles into a preset first quantity of clusters using a preset clustering algorithm, based on the label genome vectors of the first articles;
for each target item: when the target item and a second article belong to the same cluster, calculating the correlation coefficient of the target item and the second article based on a preset type of distance between the label genome vector of the target item and the label genome vector of the second article, where a target item is an article among the first articles for which the target user has not given a score, and a second article is an article among the first articles other than all the target items; when the target item and the second article belong to different clusters, calculating the correlation coefficient of the target item and the second article based on the Pearson correlation coefficient of the target item and the second article; and performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, to obtain the target user's predicted score for the target item;
recommending to the target user the target items whose predicted scores meet a preset condition.
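The clustering, item-partition, and recommendation steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim only requires "a preset clustering algorithm" and "a preset condition", so the plain k-means used here, its deterministic initialization, the score threshold, and the names `kmeans`, `split_items`, `recommend`, and `threshold` are all hypothetical choices made for the example.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Cluster label genome vectors with plain k-means (an assumed choice).

    vectors: dict mapping article id -> list of floats.
    Returns a dict mapping article id -> cluster index in range(k).
    """
    ids = sorted(vectors)
    # Deterministic initialization for the sketch: the first k vectors.
    centroids = [list(vectors[i]) for i in ids[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: attach each article to its nearest centroid.
        for i in ids:
            labels[i] = min(
                range(k), key=lambda c: math.dist(vectors[i], centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in ids if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_items(all_items, user_scores):
    """Target items are unrated by the user; second articles are the rest."""
    targets = [i for i in all_items if i not in user_scores]
    seconds = [j for j in all_items if j in user_scores]
    return targets, seconds


def recommend(predicted, threshold=4.0):
    """Keep target items whose predicted score meets an assumed threshold,
    ordered from highest to lowest predicted score."""
    hits = [(i, s) for i, s in predicted.items() if s >= threshold]
    return [i for i, _ in sorted(hits, key=lambda p: p[1], reverse=True)]
```

In this sketch the cluster labels returned by `kmeans` would later decide which of the two correlation calculations applies to a given pair of articles.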
2. The method according to claim 1, characterized in that the step of calculating the correlation coefficient of the target item and the second article based on the Pearson correlation coefficient of the target item and the second article comprises:
calculating the Pearson correlation coefficient of the target item and the second article;
generating a scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient, and using the scaled Pearson correlation coefficient of the target item and the second article as the correlation coefficient of the target item and the second article.
3. The method according to claim 2, characterized in that the step of generating the scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the Pearson correlation coefficient comprises:
generating, according to the following formula, the scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient;
in the formula, the target item is article i; the second article is article j; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $n_{ij}$ is the number of common users of article i and article j; $\lambda_1$ is a parameter for the number of common users.
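The formula referred to in claim 3 is not reproduced in this text. One form that is consistent with the symbol definitions, and that follows the common practice of shrinking a correlation toward zero when few users rated both articles, is $s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_1}\,\rho_{ij}$; this form is an assumption, not a quotation from the patent. The sketch below computes the correlation coefficient over common users and then applies that assumed scaling. The function names and the default value of `lambda_1` are hypothetical.

```python
import math


def pearson(scores_i, scores_j):
    """Pearson correlation of two articles over the users who scored both.

    scores_i, scores_j: dicts mapping user id -> score.
    Returns (rho_ij, n_ij), where n_ij is the number of common users.
    """
    common = sorted(set(scores_i) & set(scores_j))
    n = len(common)
    if n < 2:
        return 0.0, n  # too few common users to define a correlation
    xi = [scores_i[u] for u in common]
    xj = [scores_j[u] for u in common]
    mean_i, mean_j = sum(xi) / n, sum(xj) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj))
    var_i = sum((a - mean_i) ** 2 for a in xi)
    var_j = sum((b - mean_j) ** 2 for b in xj)
    if var_i == 0 or var_j == 0:
        return 0.0, n  # constant scores carry no correlation signal
    return cov / math.sqrt(var_i * var_j), n


def scaled_pearson(rho_ij, n_ij, lambda_1=100.0):
    """Assumed scaling: s_ij = n_ij / (n_ij + lambda_1) * rho_ij, lambda_1 > 0."""
    return n_ij / (n_ij + lambda_1) * rho_ij
```

With this scaling, a pair of articles supported by only a handful of common users contributes a much smaller coefficient than a pair with the same correlation and many common users.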
4. The method according to claim 3, characterized in that the preset type of distance includes the Euclidean distance;
the step of performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, to obtain the target user's predicted score for the target item, comprises:
performing, according to the following formula, a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, to obtain the target user's predicted score for the target item;
in the formula, the target item is article i; the second article is article j; the target user is user u; $\hat{r}_{ui}$ is user u's predicted score for article i; $r_{uj}$ is user u's preset score for article j; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of article i and the label genome vector of article j; $\kappa_u$ is the set of articles, other than article i, for which user u has given a score; $p_{ij}$ is an adjustment factor; and the remaining set symbol in the formula denotes the set of articles that belong to the same cluster as article i when the first articles are divided into k clusters.
5. The method according to claim 4, characterized in that the step of performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, to obtain the target user's predicted score for the target item, comprises:
performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles;
adjusting the result of the weighted summation using a personalization parameter, a user bias adjustment parameter, and an article bias adjustment parameter, to obtain the target user's predicted score for the target item.
6. The method according to claim 5, characterized in that the step of performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, and adjusting the result of the weighted summation using the personalization parameter, the user bias adjustment parameter, and the article bias adjustment parameter, to obtain the target user's predicted score for the target item, comprises:
using the following formula, performing a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, and adjusting the result of the weighted summation using the personalization parameter, the user bias adjustment parameter, and the article bias adjustment parameter, to obtain the target user's predicted score for the target item;
in the formula, the target item is article i; the second article is article j; the target user is user u; $\mu$ is the mean score in the preset rating matrix; $b_u$ is the user bias adjustment parameter for user u; $b_i$ is the article bias adjustment parameter for article i; $\alpha_u$ is the personalization parameter for user u; $\hat{r}_{ui}$ is user u's predicted score for article i; $r_{uj}$ is user u's preset score for article j; $s_{ij}$ is the scaled Pearson correlation coefficient of article i and article j; $\rho_{ij}$ is the Pearson correlation coefficient of article i and article j; $\varepsilon_{ij}$ is the Euclidean distance between the label genome vector of article i and the label genome vector of article j; $\kappa_u$ is the set of articles, other than article i, for which user u has given a score; $p_{ij}$ is an adjustment factor; and the remaining set symbol in the formula denotes the set of articles that belong to the same cluster as article i when the first articles are divided into k clusters.
7. The method according to claim 6, characterized in that, before the step of using the calculation formula for the predicted score to perform a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, the method further comprises:
obtaining a preset quantity of sample sets from a preset rating matrix, each sample set including a user, an article, and the score given by the user to the article;
for each sample set, taking the user in the sample set as the target user and the article in the sample set as the target item; calculating the target user's predicted score for the target item, and taking the calculated predicted score as the predicted score corresponding to the sample set;
minimizing a preset loss function based on each sample set and the predicted score corresponding to each sample set, to obtain a minimized loss function, the preset loss function being as shown in the following formula;
in the preset loss function, $\kappa$ is the preset quantity of sample sets; article i is the article in each sample set; user u is the user in each sample set; $b_u$ is the user bias adjustment parameter for user u; $b_i$ is the article bias adjustment parameter for article i; $\alpha_u$ is the personalization parameter for user u; $\hat{r}_{ui}$ is the calculated predicted score of user u for article i; $r_{ui}$ is the score of user u for article i obtained from the preset rating matrix; $\lambda_2$ is a preset regularization parameter;
determining, from the minimized loss function, the optimal values of the personalization parameter, the user bias adjustment parameter, and the article bias adjustment parameter.
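The loss function of claim 7 is not reproduced in this text. A regularized squared-error form that uses exactly the symbols defined in the claim would look like the expression below; it is offered only as an assumed reconstruction, and the placement of the regularization term inside the sum is likewise an assumption.

```latex
\min_{b_u,\; b_i,\; \alpha_u} \;
\sum_{(u,i) \in \kappa}
\left[ \left( r_{ui} - \hat{r}_{ui} \right)^{2}
+ \lambda_2 \left( b_u^{2} + b_i^{2} + \alpha_u^{2} \right) \right]
```

Under this reading, the first term penalizes the gap between the score taken from the rating matrix and the predicted score, and $\lambda_2$ controls how strongly the three learned parameters are pulled toward zero.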
8. The method according to claim 7, characterized in that the step of minimizing the preset loss function based on each sample set and the predicted score corresponding to each sample set, to obtain the minimized loss function, comprises:
for each sample set, setting initial values of the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the article bias adjustment parameter $b_i$, and initializing the step size $\gamma$ and the iteration count;
setting the user set U and the article set I to empty sets;
obtaining one sample set from the preset quantity of sample sets $\kappa$, the sample set including user u, article i, the score $r_{ui}$ of user u for article i obtained from the preset rating matrix, and the calculated predicted score $\hat{r}_{ui}$ of user u for article i;
judging whether the user set U contains the user u of the sample set and whether the article set I contains the article i of the sample set;
if so, returning to the step of obtaining one sample set from the preset quantity of sample sets $\kappa$;
if not, updating the personalization parameter $\alpha_u$, the user bias adjustment parameter $b_u$, and the article bias adjustment parameter $b_i$ according to the difference $e_{ui}$ between the score $r_{ui}$ in the sample set and the predicted score $\hat{r}_{ui}$, adding the user u of the sample set to the user set U, and adding the article i of the sample set to the article set I;
reducing the step size $\gamma$, and returning to the step of obtaining one sample set from the preset quantity of sample sets $\kappa$;
when all sample sets in the preset quantity of sample sets $\kappa$ have been traversed, incrementing the iteration count by one;
judging whether the iteration count exceeds a preset iteration-count threshold; if so, completing the minimization of the preset loss function; if not, returning to the step of setting the user set U and the article set I to empty sets.
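The iterative procedure of claim 8 resembles stochastic gradient descent and can be sketched as below. The sketch rests on assumptions that are not stated in the source: the prediction is taken to be `mu + b_u + b_i + alpha_u * n_ui`, where `n_ui` is the precomputed weighted-sum term for the pair; the update rules are the usual gradient steps for a regularized squared error; the prediction is recomputed from the current parameters at each step; a sample is skipped only when both its user and its article were already seen in the current pass; and the step size shrinks by a constant factor `decay`. All names and default values are hypothetical.

```python
def sse(samples, mu, b_u, b_i, alpha):
    """Sum of squared errors over samples of the form (u, i, r_ui, n_ui)."""
    total = 0.0
    for u, i, r_ui, n_ui in samples:
        pred = mu + b_u[u] + b_i[i] + alpha[u] * n_ui  # assumed prediction form
        total += (r_ui - pred) ** 2
    return total


def fit(samples, mu, lambda_2=0.02, gamma=0.05, decay=0.99, max_iterations=50):
    """Learn b_u, b_i and alpha_u following the loop structure of claim 8."""
    # Set initial values of the parameters for every user and article.
    b_u = {u: 0.0 for u, _, _, _ in samples}
    b_i = {i: 0.0 for _, i, _, _ in samples}
    alpha = {u: 1.0 for u, _, _, _ in samples}
    iterations = 0
    while True:
        # Start each pass with empty user and article sets.
        seen_users, seen_items = set(), set()
        for u, i, r_ui, n_ui in samples:
            # Skip the sample if both its user and its article were seen.
            if u in seen_users and i in seen_items:
                continue
            pred = mu + b_u[u] + b_i[i] + alpha[u] * n_ui
            e_ui = r_ui - pred
            # Assumed gradient steps for a regularized squared error.
            b_u[u] += gamma * (e_ui - lambda_2 * b_u[u])
            b_i[i] += gamma * (e_ui - lambda_2 * b_i[i])
            alpha[u] += gamma * (e_ui * n_ui - lambda_2 * alpha[u])
            seen_users.add(u)
            seen_items.add(i)
            gamma *= decay  # reduce the step size after each update
        iterations += 1
        if iterations > max_iterations:  # stop once the threshold is exceeded
            return b_u, b_i, alpha
```

The helper `sse` is included only so that the effect of training can be checked, for example by comparing the error before and after calling `fit` on the same samples.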
9. A clustering-based collaborative filtering recommendation device, characterized by comprising:
a first obtaining module, configured to obtain the label genome vectors of first articles from a preset label genome information matrix, where a label genome vector describes the inherent attributes of a first article;
a division module, configured to divide the first articles into a preset first quantity of clusters using a preset clustering algorithm, based on the label genome vectors of the first articles;
a first calculation module, configured to, for each target item: when the target item and a second article belong to the same cluster, calculate the correlation coefficient of the target item and the second article based on a preset type of distance between the label genome vector of the target item and the label genome vector of the second article, where a target item is an article among the first articles for which the target user has not given a score, and a second article is an article among the first articles other than all the target items; when the target item and the second article belong to different clusters, calculate the correlation coefficient of the target item and the second article based on the Pearson correlation coefficient of the target item and the second article; and perform a weighted summation of the target user's preset scores for the second articles and the correlation coefficients of the target item and the second articles, to obtain the target user's predicted score for the target item;
a recommendation module, configured to recommend to the target user the target items whose predicted scores meet a preset condition.
10. The device according to claim 9, characterized in that
the first calculation module is specifically configured to:
calculate the Pearson correlation coefficient of the target item and the second article;
generate a scaled Pearson correlation coefficient of the target item and the second article based on the number of common users of the target item and the second article and the calculated Pearson correlation coefficient, and use the scaled Pearson correlation coefficient of the target item and the second article as the correlation coefficient of the target item and the second article.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810863191.2A CN109063120B (en) | 2018-08-01 | 2018-08-01 | Collaborative filtering recommendation method and device based on clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109063120A (en) | 2018-12-21 |
CN109063120B (en) | 2021-05-28 |
Family
ID=64832235
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810863191.2A, Active, CN109063120B (en) | 2018-08-01 | 2018-08-01 | Collaborative filtering recommendation method and device based on clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109063120B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090083258A1 (en) * | 2007-09-26 | 2009-03-26 | At&T Labs, Inc. | Methods and Apparatus for Improved Neighborhood Based Analysis in Ratings Estimation |
CN102841929A (en) * | 2012-07-19 | 2012-12-26 | 南京邮电大学 | Recommending method integrating user and project rating and characteristic factors |
CN103412948A (en) * | 2013-08-27 | 2013-11-27 | 北京交通大学 | Cluster-based collaborative filtering commodity recommendation method and system |
CN104298787A (en) * | 2014-11-13 | 2015-01-21 | 吴健 | Individual recommendation method and device based on fusion strategy |
CN106095974A (en) * | 2016-06-20 | 2016-11-09 | 上海理工大学 | Commending system score in predicting based on network structure similarity and proposed algorithm |
CN108198045A (en) * | 2018-01-30 | 2018-06-22 | 东华大学 | The design method of mixing commending system based on e-commerce website data mining |
Non-Patent Citations (2)
Title |
---|
彭玉 等: "基于属性相似性的Item-based协同过滤算法", 《计算机工程与应用》 * |
袁利: "基于聚类的协同过滤个性化推荐算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084477A (en) * | 2019-03-26 | 2019-08-02 | 兰雨晴 | A kind of cloud data center operating status appraisal procedure |
CN112765458A (en) * | 2021-01-07 | 2021-05-07 | 同济大学 | Mixed recommendation method based on metric decomposition and label self-adaptive weight distribution |
CN112765458B (en) * | 2021-01-07 | 2022-10-14 | 同济大学 | Mixed recommendation method based on metric decomposition and label self-adaptive weight distribution |
CN112990444A (en) * | 2021-05-13 | 2021-06-18 | 电子科技大学 | Hybrid neural network training method, system, equipment and storage medium |
CN112990444B (en) * | 2021-05-13 | 2021-09-24 | 电子科技大学 | Hybrid neural network training method, system, equipment and storage medium |
CN117455573A (en) * | 2023-10-26 | 2024-01-26 | 深圳市维卓数字营销有限公司 | Internet data analysis method and system |
CN117455573B (en) * | 2023-10-26 | 2024-08-09 | 深圳市维卓数字营销有限公司 | Internet data analysis method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109063120B (en) | 2021-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |