CN109872006A - A kind of scoring distribution forecasting method and device - Google Patents
- Publication number
- CN109872006A CN201910181625.5A
- Authority
- CN
- China
- Prior art keywords
- scoring
- latent factor
- matrix
- function
- factor matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a scoring distribution prediction method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the value of the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, this scheme takes the correlation between scoring labels into account, so predicting the scoring distribution on this basis can improve prediction accuracy.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a scoring distribution prediction method and device.
Background technique
Under normal conditions, scoring distribution prediction makes it possible to anticipate the scoring distribution of a sample that has not yet been released, and this foresight helps reduce investment risk as far as possible. Take the film industry as an example: it is a global industry worth tens of billions of US dollars, and thousands of new films reach the market every year, but not every film succeeds at the box office. Therefore, both film producers and cinemas need to predict, before a new film is released, how much audiences will like it.
Currently, the scoring distribution of a film can be used to characterize how much audiences like it. Naturally, the LDL (Label Distribution Learning) method can be applied in this field. LDL is a generalization of single-label learning and multi-label learning and is a more general learning framework. In practical applications, LDL yields a distribution over label importance, which can effectively address the label ambiguity problem.
Existing work predicts the scoring distribution using support vector machines. This approach treats the scores as independent and does not consider the correlation between them. However, such correlation is common. For example, in a 1-to-5-point scoring system, a pessimistic user may give 1 point to a film they dislike, whereas an optimistic user may give the same film 2 points or more. Therefore, it is inappropriate to characterize the popularity of a film with a scoring distribution whose scores are treated as strictly separate. As can be seen, the prediction accuracy of existing implementations is not high.
Summary of the invention
The present invention provides a scoring distribution prediction method and device that can improve prediction accuracy.
To achieve the above object, the present invention adopts the following technical solutions:
In one aspect, the present invention provides a scoring distribution prediction method, comprising:
S1: constructing an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
S2: determining the first scoring latent factor matrix of the current iteration;
S3: calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;
S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function;
S5: judging whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and executing S2;
S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
Further, the sign function includes:
Wherein,
The distance function includes:
Wherein, θ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, $\theta_{ik}$ is the value in row i, column k of θ, and $\theta_{jk}$ is the value in row j, column k of θ.
Further, the objective function includes:
Wherein, t(θ) is a function combination and $\lambda_1$ is a preset coefficient.
Further, the function combination includes:
Wherein, x is the sample feature matrix with n rows and m columns, d is the sample scoring matrix with n rows and c columns, n is the number of samples, $d_{ij}$ is the value in row i, column j of d, $x_{ik}$ is the value in row i, column k of x, $\lambda_2$ is a preset coefficient, and $||\theta||_F$ is the F norm of θ.
Further, S1 includes: constructing, according to the true scoring distribution of a sample and the predicted scoring distribution of the sample, a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
constructing, according to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, a distance mapping function for describing the correlation between scores;
constructing a second function according to the first function, a regularization term, and the distance mapping function;
constraining the predicted scoring distribution to satisfy the maximum entropy model, so as to simplify the second function and thereby obtain the objective function.
Further, S3 comprises:
determining the fitting matrix of the current iteration;
determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and using the training set;
calculating the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;
determining the search step size of the current iteration according to the search direction;
calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
Further, calculating the search direction of the current iteration comprises: calculating the search direction of the current iteration using formula one;
determining the search step size of the current iteration comprises: determining a search step size of the current iteration that satisfies formula group two;
calculating the scoring latent factor matrix of the next iteration comprises: calculating the scoring latent factor matrix of the next iteration using formula three;
Formula one includes:
Formula group two includes:
Formula three includes: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$;
Wherein, $p^{(l)}$ is the search direction at the l-th iteration, $B^{(l)}$ is the fitting matrix at the l-th iteration, $B^{(0)}$ is the fitting matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, $a^{(l)}$ is the search step size at the l-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the l-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization.
Further, calculating the fitting matrix of the current iteration comprises: calculating the fitting matrix of the current iteration using formula four;
Formula four includes: $B^{(l+1)} = (I - \rho^{(l)} s^{(l)} (u^{(l)})^T) B^{(l)} (I - \rho^{(l)} u^{(l)} (s^{(l)})^T) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
Wherein, $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$;
Wherein, $B^{(l)}$ is the fitting matrix at the l-th iteration, $B^{(0)}$ is the fitting matrix at initialization, I is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the l-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
Further, S6 includes: substituting the second scoring latent factor matrix and the sample feature matrix of the sample to be predicted into the maximum entropy model, and calculating the sample scoring matrix of the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model includes:
Wherein, θ is the scoring latent factor matrix with c rows and m columns, x is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, $\theta_{jk}$ is the value in row j, column k of θ, $x_{ik}$ is the value in row i, column k of x, $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$, and $p_i$ is the predicted scoring distribution of sample $x_i$.
Further, after S6, the method further comprises: substituting the sample feature matrix of a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample scoring matrix;
using at least one evaluation index to verify the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix of the test set.
In another aspect, the present invention provides a scoring distribution prediction device for executing any of the above scoring distribution prediction methods, comprising:
an objective function construction unit, configured to construct an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
a determination unit, configured to determine the first scoring latent factor matrix of the current iteration;
a first processing unit, configured to calculate the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;
a first derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;
a second processing unit, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and trigger the determination unit;
the scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
The present invention provides a scoring distribution prediction method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the value of the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the present invention takes the correlation between scoring labels into account, so predicting the scoring distribution on this basis can improve prediction accuracy.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a scoring distribution prediction method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another scoring distribution prediction method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a scoring distribution prediction device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a scoring distribution prediction method, which may include the following steps:
Step 101: constructing an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors.
Step 102: determining the first scoring latent factor matrix of the current iteration.
Step 103: calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix.
Step 104: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function.
Step 105: judging whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing step 106; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and executing step 102.
Step 106: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
An embodiment of the present invention provides a scoring distribution prediction method. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the value of the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the embodiment takes the correlation between scoring labels into account, so predicting the scoring distribution on this basis can improve prediction accuracy.
The label distribution learning method provided by the embodiment of the present invention, which considers the correlation between scoring labels, can predict the scoring distribution of a sample. For example, the method can be applied to film scoring distribution prediction to predict the scoring distribution of film samples. Of course, it can also be applied to scoring distribution prediction in other fields, such as novels and TV series.
Taking film score prediction as an example, the score of a new film can be predicted from the features of the new film together with the features and scores of the existing films in a preset training set. This requires obtaining the correlation between features and scores, that is, the feature scoring matrix θ with c rows and m columns, where c is the number of scores and m is the number of features.
Since θ is a latent, unknown matrix, θ can be called the scoring latent factor matrix in the embodiment of the present invention; at the same time, θ also expresses the correlation between any two scoring labels, so it can also be called the distance mapping matrix.
In detail, θ is the scoring latent factor matrix with c rows and m columns and therefore reflects the correspondence between features and scores. When predicting the scoring distribution of a film, the features are film features, for example the director, lead actors, and screenwriter of the film; the scores are film scores, for example the scoring labels 1 point, 2 points, 3 points, 4 points, and 5 points.
Since θ is unknown, a θ can be initialized and used as the scoring latent factor matrix of the 0th iteration to execute steps 102 to 106: the θ of the 1st iteration is calculated from the θ of the 0th iteration, and, based on the θ of the 1st iteration, the first derivative is solved and compared against the termination threshold. If the judgment in step 105 is yes, the θ of the 1st iteration can be regarded as the ideal θ, and the score of the new film can be predicted based on it. If the judgment in step 105 is no, steps 102 to 106 are executed again: the θ of the 2nd iteration is calculated from the θ of the 1st iteration, and the first derivative is solved and compared against the termination threshold based on the θ of the 2nd iteration. The cycle continues until the ideal θ has been computed, after which the score of the new film can be predicted based on it.
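The loop through steps 102 to 106 can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names `iterate_until_converged`, `grad_fn`, and `next_theta_fn`, the `max_iter` safeguard, and the toy quadratic objective in the usage note are all assumptions introduced here.

```python
import numpy as np

def iterate_until_converged(theta0, grad_fn, next_theta_fn, eps=1e-6, max_iter=1000):
    """Generic form of the loop in steps 102 to 106 (hypothetical helper names)."""
    theta = theta0                                # step 102: matrix of the current iteration
    for _ in range(max_iter):                     # max_iter is an added safeguard (assumption)
        theta_new = next_theta_fn(theta)          # step 103: matrix of the next iteration
        grad = grad_fn(theta_new)                 # step 104: first derivative at the new matrix
        if np.linalg.norm(grad, "fro") < eps:     # step 105: F norm against the termination threshold
            return theta_new                      # step 106 would predict with this matrix
        theta = theta_new                         # otherwise the new matrix becomes the current one
    return theta

# Toy usage: minimize 0.5 * ||theta - A||_F^2 with a fixed gradient step of 0.5.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
theta_star = iterate_until_converged(
    np.zeros((2, 2)),
    grad_fn=lambda t: t - A,
    next_theta_fn=lambda t: t - 0.5 * (t - A),
)
```

In the method described here, `next_theta_fn` would be the quasi-Newton update of step 103 and `grad_fn` the first derivative of the objective function on the training set.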
Under normal conditions, the θ at initialization and the preset training set are input first, as the inputs of the objective function. Afterwards, each solution of the first derivative is based on the known data in this training set and on the θ produced by the corresponding iteration.
Since the ideal θ is obtained by continuous optimization based on the objective function and the training set, and the objective function considers the correlation between scoring labels, the new film score predicted from the ideal θ can be regarded as accurate and likely to match the true scores of the new film.
Referring to step 101 above, in order to consider the correlation between scoring labels, the objective function should include the above sign function and distance function, which describe that correlation. In one embodiment of the present invention, the sign function includes:
Wherein,
The distance function includes:
Wherein, θ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, $\theta_{ik}$ is the value in row i, column k of θ, and $\theta_{jk}$ is the value in row j, column k of θ.
Based on the above, the sign function describes the correlation between the scoring latent factor vectors $\theta_i$ and $\theta_j$. Here, cosine($\theta_i$, $\theta_j$) calculates the cosine of the angle between the two scoring latent factor vectors, and the sign function converts this value into one of three states: positive correlation, negative correlation, or no correlation.
Based on the above, the distance function describes the degree of correlation between scoring latent factor vectors.
In the embodiment of the present invention, the sign function and the distance function can be combined to construct the distance mapping function $f(\theta_i, \theta_j)$, which describes the correlation between scoring labels. Through this distance mapping function, the scoring latent factor vectors can be obtained, so that the cosine distance can be used to measure the similarity between two scoring latent factor vectors.
As can be seen, combining the sign function and the distance function lets the objective function, as a scoring distribution learning model, take the correlation between scoring labels into account, so scoring distribution prediction based on this model can improve prediction accuracy.
Based on the above, in one embodiment of the present invention, the objective function includes:
Wherein, t(θ) is a function combination and $\lambda_1$ is a preset coefficient.
In the embodiment of the present invention, the distance mapping function can be constructed as $f(\theta_i, \theta_j) = \mathrm{sgn}(\mathrm{cosine}(\theta_i, \theta_j)) \, \mathrm{Dis}(\theta_i, \theta_j)$, to describe the correlation between scoring labels.
In detail, $f(\theta_i, \theta_j)$ multiplies the sign of the correlation by the degree of correlation, producing a positive correlation distance, a negative correlation distance, or an uncorrelated result, which describes the correlation between scoring labels well. For example, when $f(\theta_a, \theta_b) > 0$, the larger the value, the more likely score level b is to appear when score level a appears. Conversely, when $f(\theta_a, \theta_b) < 0$, the smaller the value, the less likely score level b is to appear when score level a appears.
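The distance mapping function can be illustrated with a short sketch. The formulas for the sign function and the distance function are not reproduced in this text, so two assumptions are made and labeled in the code: cosine is taken as the usual cosine of the angle between two rows of θ, and Dis is taken as the Euclidean distance. The function names are hypothetical.

```python
import numpy as np

def cosine(theta_i, theta_j):
    """Cosine of the angle between two scoring latent factor vectors (assumed standard form)."""
    denom = np.linalg.norm(theta_i) * np.linalg.norm(theta_j)
    if denom == 0.0:
        return 0.0                      # treat a zero vector as uncorrelated (assumption)
    return float(theta_i @ theta_j / denom)

def distance(theta_i, theta_j):
    """Dis(theta_i, theta_j); Euclidean distance is an assumption, the source formula is missing."""
    return float(np.linalg.norm(theta_i - theta_j))

def distance_mapping(theta_i, theta_j):
    """f(theta_i, theta_j) = sgn(cosine(theta_i, theta_j)) * Dis(theta_i, theta_j)."""
    return float(np.sign(cosine(theta_i, theta_j))) * distance(theta_i, theta_j)

a = np.array([1.0, 0.0])
print(distance_mapping(a, np.array([1.0, 1.0])))    # positive correlation distance
print(distance_mapping(a, np.array([-1.0, 0.0])))   # negative correlation distance
print(distance_mapping(a, np.array([0.0, 1.0])))    # uncorrelated
```

The three calls exercise the three states named in the text: the sign of the cosine selects positive, negative, or zero, and the distance supplies the magnitude.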
Based on the above, in one embodiment of the present invention, the function combination includes:
Wherein, x is the sample feature matrix with n rows and m columns, d is the sample scoring matrix with n rows and c columns, n is the number of samples, $d_{ij}$ is the value in row i, column j of d, $x_{ik}$ is the value in row i, column k of x, $\lambda_2$ is a preset coefficient, and $||\theta||_F$ is the F norm of θ.
In detail, introducing the regularization term $||\theta||_F$ into the objective function helps avoid overfitting.
In detail, the objective function can be constructed as follows:
First, let $X = \mathbb{R}^m$ denote the sample space and $Y = \{y_1, y_2, \ldots, y_c\}$ denote the set of scores, where m is the number of features and c is the number of scores. Let $S = \{(x_1, d_1), (x_2, d_2), \ldots, (x_n, d_n)\}$ denote the training set of n samples, where $x_i \in X$ denotes the i-th sample and $d_i = \{d_{i1}, d_{i2}, \ldots, d_{ic}\}$ denotes the scoring distribution of sample $x_i$. $d_{ij}$ denotes the degree to which the j-th score describes the i-th sample, and $\sum_j d_{ij} = 1$ indicates that all the scores together fully describe the sample $x_i$.
Then, let $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$ denote the predicted scoring distribution of sample $x_i$. In this way, the Kullback-Leibler (KL) divergence can be used to calculate the gap between the true distribution and the predicted distribution.
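As an illustration of this gap measure, the sketch below computes the KL divergence between a true scoring matrix d and a predicted scoring matrix p, summed over all samples. The exact formula is not reproduced in this text, so the summation over samples, the function name `kl_gap`, and the small `eps` guard are assumptions.

```python
import numpy as np

def kl_gap(d, p, eps=1e-12):
    """Sum over samples i and scores j of d_ij * ln(d_ij / p_ij) (assumed form)."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = d > 0                                   # terms with d_ij = 0 contribute nothing
    return float(np.sum(d[mask] * np.log(d[mask] / np.maximum(p[mask], eps))))

d = np.array([[0.5, 0.5]])
print(kl_gap(d, d))                                # identical distributions give zero gap
print(kl_gap(d, np.array([[0.9, 0.1]])))           # a mismatched prediction gives a positive gap
```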
Then, the distance mapping function is constructed as described above.
Next, using the maximum entropy model and using $f(\theta_i, \theta_j)$ to express the correlation between score levels, the following function is obtained:
Finally, since $p(y_k|x_i;\theta)$ satisfies the maximum entropy model, the above formula can be simplified to obtain the following formula:
The formula obtained by this simplification can serve as the constructed objective function.
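Because the formulas themselves are missing from this text, the following is only a sketch of how the pieces named above could fit together: a KL term between true and predicted distributions, an F norm regularization term weighted by λ2, and the distance mapping function weighted by λ1. The softmax form of the maximum entropy model, the squared F norm, the Euclidean Dis, the summation over all ordered pairs of scores, and every function name are assumptions.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax, the usual form of a maximum entropy model (assumption)."""
    z = z - z.max(axis=1, keepdims=True)           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def correlation_term(theta):
    """Sum of sgn(cosine(theta_i, theta_j)) * Dis(theta_i, theta_j) over pairs i != j."""
    total = 0.0
    c = theta.shape[0]
    for i in range(c):
        for j in range(c):
            norms = np.linalg.norm(theta[i]) * np.linalg.norm(theta[j])
            if i == j or norms == 0.0:
                continue                           # zero vectors are treated as uncorrelated
            cos = theta[i] @ theta[j] / norms
            total += np.sign(cos) * np.linalg.norm(theta[i] - theta[j])
    return float(total)

def objective(theta, x, d, lam1=0.1, lam2=0.1):
    """Assumed shape of T(theta): KL gap + lam2 * ||theta||_F^2 + lam1 * correlation term."""
    p = softmax_rows(x @ theta.T)                  # (n, c) predicted scoring distributions
    mask = d > 0
    kl = np.sum(d[mask] * np.log(d[mask] / p[mask]))
    reg = lam2 * np.linalg.norm(theta, "fro") ** 2
    return float(kl + reg + lam1 * correlation_term(theta))
```

A call such as `objective(theta, x, d)` takes θ with shape (c, m), x with shape (n, m), and d with shape (n, c), matching the dimensions defined in the text.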
Corresponding to the above construction of the objective function, in one embodiment of the present invention, step 101 includes: constructing, according to the true scoring distribution of a sample and the predicted scoring distribution of the sample, a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
constructing, according to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, a distance mapping function for describing the correlation between scores;
constructing a second function according to the first function, a regularization term, and the distance mapping function;
constraining the predicted scoring distribution to satisfy the maximum entropy model, so as to simplify the second function and thereby obtain the objective function.
Based on the above, in the embodiment of the present invention, the first function can be the above KL divergence formula, the distance mapping function can be the above $f(\theta_i, \theta_j)$, the second function can be the above τ(θ), and the objective function can be the above T(θ).
In detail, referring to steps 102 to 106 above, in order to obtain the scoring latent factor matrix used to predict the scoring distribution, this iterative process is executed repeatedly until the judgment in step 105 is yes at some iteration.
Based on the above, since θ is unknown, a $\theta^{(0)}$ can be initialized and used as the θ of the 0th iteration, and the θ of the 1st iteration, $\theta^{(1)}$, is calculated by executing steps 102 and 103. Suppose that, based on $\theta^{(1)}$, the judgment in step 105 is no; then the θ of the 2nd iteration, $\theta^{(2)}$, is calculated from $\theta^{(1)}$, and the subsequent steps are executed based on $\theta^{(2)}$. The cycle continues until, at the l-th iteration, the judgment in step 105 based on $\theta^{(l)}$ is yes; the scoring distribution can then be predicted according to $\theta^{(l)}$.
For this, it is necessary to specify how the second scoring latent factor matrix of the next iteration is calculated from the first scoring latent factor matrix.
In detail, the L-BFGS algorithm is a quasi-Newton algorithm, developed on the basis of Newton's method, for finding the point at which the first derivative of a function vanishes. In one embodiment of the present invention, the L-BFGS method can be used to solve the objective function:
Wherein, $\Delta = \theta^{(l+1)} - \theta^{(l)}$, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, that is, the first derivative of T(θ) evaluated at $\theta^{(l)}$, and $H(\theta^{(l)})$ is the Hessian matrix at the l-th iteration. Further simplification gives:
Setting a step size and a direction so that the function value decreases stably, we have $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$, where $p^{(l)}$ is the search direction and $a^{(l)}$ is the search step size.
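The two formulas referred to in this passage are not reproduced in this text. Given the symbols that are defined (Δ, the first derivative, and the Hessian matrix), one plausible reading, offered here as an assumption rather than as the patent's own formulas, is the second-order Taylor expansion used by Newton's method and the update obtained by minimizing it over Δ:

```latex
% Assumed reconstruction: second-order expansion of T around \theta^{(l)}
T(\theta^{(l+1)}) \approx T(\theta^{(l)})
  + \nabla T(\theta^{(l)})^{T} \Delta
  + \tfrac{1}{2}\, \Delta^{T} H(\theta^{(l)})\, \Delta,
\qquad \Delta = \theta^{(l+1)} - \theta^{(l)}

% Setting the derivative with respect to \Delta to zero gives the Newton step
\theta^{(l+1)} = \theta^{(l)} - H(\theta^{(l)})^{-1}\, \nabla T(\theta^{(l)})
```

Under this reading, replacing the exact Newton step with a search direction $p^{(l)}$ and a step size $a^{(l)}$ leads directly to the update $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$ stated in the text.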
In detail, the search step size is determined by the following group of formulas:
Wherein, $0 < c_1 < c_2 < 1$.
Specifically, a step size can be initialized, for example to 1, and checked against this group of formulas. If it satisfies them, it can serve as the step size of the 0th iteration; if not, a new step size is calculated according to a preset step size formula and checked again. This repeats until a step size satisfying the group of formulas is obtained, which serves as the step size of the 0th iteration. On the same principle, the step size of every iteration can be determined from this group of formulas.
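The step size procedure just described can be sketched as below. The group of formulas is missing from this text; the sketch assumes it is the pair of Wolfe conditions, which matches the constraint 0 < c1 < c2 < 1, and it assumes that the unspecified "preset step size formula" simply halves the step. The names `satisfies_wolfe` and `search_step` are hypothetical.

```python
import numpy as np

def satisfies_wolfe(f, grad, theta, p, a, c1=1e-4, c2=0.9):
    """Check the (assumed) Wolfe conditions for step size a along direction p."""
    g0 = np.sum(grad(theta) * p)                   # directional derivative at theta
    theta_new = theta + a * p
    sufficient_decrease = f(theta_new) <= f(theta) + c1 * a * g0
    curvature = np.sum(grad(theta_new) * p) >= c2 * g0
    return bool(sufficient_decrease and curvature)

def search_step(f, grad, theta, p, a0=1.0, shrink=0.5, max_tries=50):
    """Start from a0 and shrink until the conditions hold (halving is an assumption)."""
    a = a0
    for _ in range(max_tries):
        if satisfies_wolfe(f, grad, theta, p, a):
            return a
        a *= shrink                                # stand-in for the preset step size formula
    raise RuntimeError("no step size satisfying the conditions was found")

# Toy usage on f(theta) = 0.5 * ||theta||_F^2 with the steepest descent direction.
f = lambda t: 0.5 * np.sum(t * t)
grad = lambda t: t
theta = np.ones((2, 2))
step = search_step(f, grad, theta, -grad(theta), a0=4.0)
```

Pure shrinking is not guaranteed to satisfy the curvature condition in general, which is why the sketch raises an error after `max_tries` instead of looping forever.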
Further, since the Hessian matrix is difficult to determine, the core idea of the L-BFGS method is to compute an approximate matrix that fits the Hessian matrix.
Based on this, in order to illustrate one possible implementation of the iterative optimization of the scoring latent factor matrix, in one embodiment of the present invention, step 103 comprises:
determining the fitting matrix of the current iteration;
determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and using the training set;
calculating the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;
determining the search step size of the current iteration according to the search direction;
calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
In detail, the fitting matrix of the 0th iteration can be the fitting matrix at initialization.
Based on the above, in one embodiment of the present invention, calculating the search direction of the current iteration comprises: calculating the search direction of the current iteration using formula one;
determining the search step size of the current iteration comprises: determining a search step size of the current iteration that satisfies formula group two;
calculating the scoring latent factor matrix of the next iteration comprises: calculating the scoring latent factor matrix of the next iteration using formula three;
Formula one includes:
Formula group two includes:
Formula three includes: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$;
Wherein, $p^{(l)}$ is the search direction at the l-th iteration, $B^{(l)}$ is the fitting matrix at the l-th iteration, $B^{(0)}$ is the fitting matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, $a^{(l)}$ is the search step size at the l-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the l-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization.
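The bodies of formula one and formula group two are not reproduced in this text. The symbol list mentions the fitting matrix, the first derivative, the constants $c_1$ and $c_2$, and the transpose of $p^{(l)}$, which is consistent with the standard quasi-Newton search direction and the Wolfe conditions. The block below states that reading as an assumption, not as the patent's own wording:

```latex
% Formula one (assumed): quasi-Newton search direction
p^{(l)} = -B^{(l)}\, \nabla T(\theta^{(l)})

% Formula group two (assumed): Wolfe conditions on the step size a^{(l)}
T(\theta^{(l)} + a^{(l)} p^{(l)}) \le T(\theta^{(l)})
  + c_1\, a^{(l)}\, (p^{(l)})^{T} \nabla T(\theta^{(l)})

(p^{(l)})^{T} \nabla T(\theta^{(l)} + a^{(l)} p^{(l)})
  \ge c_2\, (p^{(l)})^{T} \nabla T(\theta^{(l)}),
\qquad 0 < c_1 < c_2 < 1
```

The first condition requires a sufficient decrease of the objective function, and the second rules out steps that are too short.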
Based on the above, in an embodiment of the invention, calculating the fit metric at the current iteration comprises: calculating the fit metric at the current iteration using formula four;
Formula four is: $B^{(l+1)} = \left(I - \rho^{(l)} s^{(l)} (u^{(l)})^T\right) B^{(l)} \left(I - \rho^{(l)} u^{(l)} (s^{(l)})^T\right) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
where $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$;
where $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $I$ is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, the first derivative at the $l$-th iteration is the first derivative of the objective function evaluated at $\theta^{(l)}$, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
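Formula four has the same form as the standard BFGS update of an inverse-Hessian approximation, and a short Python sketch may make it concrete. The definitions of $\rho^{(l)}$ and $u^{(l)}$ are not reproduced in this text, so the sketch assumes the usual BFGS choices: $u^{(l)}$ is the change in the first derivative between two iterations and $\rho^{(l)} = 1 / ((u^{(l)})^T s^{(l)})$. The function name `bfgs_update` and the toy quadratic are hypothetical.

```python
import numpy as np

def bfgs_update(B, s, u):
    """Formula four, with rho = 1 / (u @ s) as an assumed definition."""
    rho = 1.0 / (u @ s)                       # requires u @ s > 0
    I = np.eye(len(s))
    left = I - rho * np.outer(s, u)
    right = I - rho * np.outer(u, s)
    return left @ B @ right + rho * np.outer(s, s)

# Hypothetical quadratic objective with gradient A @ theta.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
theta_old = np.array([1.0, -2.0])
theta_new = np.array([-1.0, -0.5])

s = theta_new - theta_old                     # s = theta(l+1) - theta(l)
u = A @ theta_new - A @ theta_old             # assumed: change in first derivative
B_new = bfgs_update(np.eye(2), s, u)
```

Under these assumptions the updated fit metric satisfies the secant condition `B_new @ u == s` up to rounding, which is a quick sanity check for any implementation of formula four.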
In an embodiment of the invention, step 106 includes: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculating the sample rating matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model is:
where θ is the scoring latent factor matrix with c columns and m rows, x is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, $\theta_{jk}$ is the value in row j, column k of θ, $x_{ik}$ is the value in row i, column k of x, $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$, and $p_i$ is the predicted scoring distribution corresponding to sample $x_i$.
Specifically, because both the sample feature matrix corresponding to the sample to be predicted and the scoring latent factor matrix obtained by iterative optimization are known, the sample rating matrix corresponding to the sample to be predicted can be calculated, which gives the scoring distribution prediction result of the sample to be predicted.
From the maximum entropy model above, substituting the sample feature matrix corresponding to the sample to be predicted and the iteratively optimized scoring latent factor matrix into the maximum entropy model yields the degree to which the j-th score describes the i-th sample. Repeating this for every score gives the degree to which each score describes the i-th sample, and hence the scoring distribution prediction result of the sample to be predicted.
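The prediction step can be illustrated with a short Python sketch. The maximum entropy formula itself is not reproduced in this text, so the sketch assumes the exponential (softmax) form commonly used in label distribution learning, in which $p(y_j|x_i;\theta)$ is proportional to $\exp(\sum_k \theta_{jk} x_{ik})$. The function name `predict_distribution` and all numbers are hypothetical, and θ is stored as a c-by-m array so that `theta[j, k]` matches the $\theta_{jk}$ indexing.

```python
import numpy as np

def predict_distribution(theta, x):
    """Return an (n, c) array whose row i is the predicted scoring distribution p_i.

    theta: (c, m) scoring latent factors, x: (n, m) sample features.
    Assumes p(y_j | x_i; theta) = exp(sum_k theta_jk x_ik) / Z_i.
    """
    scores = x @ theta.T                              # (n, c) linear scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)     # normalise each row

# Hypothetical data: c = 3 scores, m = 2 features, n = 2 samples.
theta = np.array([[0.2, -0.1],
                  [0.0,  0.3],
                  [-0.4, 0.5]])
x = np.array([[1.0, 2.0],
              [0.5, -1.0]])

p = predict_distribution(theta, x)
uniform = predict_distribution(np.zeros((3, 2)), x)   # all-zero factors
```

Each row of `p` is non-negative and sums to one, so it can be read directly as the degree to which each score describes the corresponding sample.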
In summary, the embodiment of the invention innovatively constructs a sign function and a distance function in the objective function so as to take the correlation between scoring labels into account, thereby improving score prediction accuracy. The improvement in prediction accuracy can then be verified using the scoring latent factor matrix obtained by iterative optimization.
Specifically, the known samples can be divided into two parts: one part forms the training set and the other forms the test set. The training set is mainly used to obtain the iteratively optimized scoring latent factor matrix through steps 101 to 105 above, based on the features and scores of the known samples it contains. The test set is mainly used to test the iteratively optimized scoring latent factor matrix, based on the features and scores of the known samples it contains.
Based on this, in an embodiment of the invention, after step 106 the method may further include: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample rating matrix;
Using at least one evaluation index, verifying for each index the degree of similarity between the predicted sample rating matrix and the sample rating matrix corresponding to the test set.
Specifically, because the sample feature matrix and the true sample rating matrix corresponding to the test set are known, the sample feature matrix and the iteratively optimized scoring latent factor matrix can be substituted into the maximum entropy model to calculate the predicted sample rating matrix. The predicted sample rating matrix can then be compared with the true sample rating matrix to verify whether the iteratively optimized scoring latent factor matrix is sufficiently accurate.
In an embodiment of the invention, the at least one evaluation index may be one or more of the chi-square distance (Squared χ²), the similarity (Intersection), and the fidelity (Fidelity).
Specifically, these evaluation indexes are described in turn.
The chi-square distance is calculated as follows, where dis denotes distance:
The similarity is calculated as follows, where sim denotes similarity:
The fidelity is calculated as follows, where sim denotes similarity:
In addition, j is the index of a scoring label and c is the total number of scoring labels. $O_j$ denotes the value corresponding to the j-th scoring label in the true scoring label distribution, and $Q_j$ denotes the value corresponding to the j-th scoring label in the predicted scoring label distribution.
The smaller the calculated chi-square distance, the closer the two distributions are and the smaller the gap between them. The larger the calculated similarity, the more similar the two distributions are. The larger the calculated fidelity, the closer the predicted distribution is to the true distribution.
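The three evaluation indexes can be sketched in Python. Their formulas are not reproduced in this text, so the sketch assumes the definitions commonly used in the label distribution learning literature: squared χ² as $\sum_j (O_j - Q_j)^2 / (O_j + Q_j)$, intersection as $\sum_j \min(O_j, Q_j)$, and fidelity as $\sum_j \sqrt{O_j Q_j}$. The function names and example distributions are hypothetical.

```python
import numpy as np

def squared_chi2(O, Q):
    """Assumed definition: sum_j (O_j - Q_j)^2 / (O_j + Q_j). Smaller is better.

    Assumes O_j + Q_j > 0 for every label, which holds for softmax outputs.
    """
    return float(np.sum((O - Q) ** 2 / (O + Q)))

def intersection(O, Q):
    """Assumed definition: sum_j min(O_j, Q_j). Larger is better."""
    return float(np.sum(np.minimum(O, Q)))

def fidelity(O, Q):
    """Assumed definition: sum_j sqrt(O_j * Q_j). Larger is better."""
    return float(np.sum(np.sqrt(O * Q)))

# Hypothetical true (O) and predicted (Q) scoring distributions over c = 2 labels.
O = np.array([0.5, 0.5])
Q = np.array([0.25, 0.75])

dis = squared_chi2(O, Q)
sim_intersection = intersection(O, Q)
sim_fidelity = fidelity(O, Q)
```

With these definitions a perfect prediction gives a distance of 0 and similarities of 1, which matches the direction of the comparisons described above.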
After verification, the experimental results shown in Table 1 were obtained.
Table 1. Experimental results
Method | Squared χ² | Intersection | Fidelity |
---|---|---|---|
LDLLC | 0.0412±0.0213 | 0.8496±0.0238 | 0.9789±0.0107 |
LDSVR | 0.0887±0.0031 | 0.8436±0.0027 | 0.9764±0.0010 |
S-SVR | 0.1040±0.0030 | 0.8277±0.0023 | 0.9722±0.0009 |
M-SVRP | 0.1084±0.0033 | 0.8186±0.0034 | 0.9710±0.0010 |
BGFS-LLD | 0.1176±0.0042 | 0.8186±0.0033 | 0.9683±0.0012 |
IIS-LLD | 0.1195±0.0054 | 0.8172±0.0044 | 0.9676±0.0014 |
AA-kNN | 0.1246±0.0062 | 0.8101±0.0047 | 0.9664±0.0018 |
CPNN | 0.1625±0.0206 | 0.7847±0.0150 | 0.9551±0.0061 |
Here, LDLLC denotes the scoring label distribution learning algorithm described in the embodiment of the invention, which takes the correlation between scoring labels into account.
As Table 1 shows, compared with the other methods, the distribution predicted by this method is closer to the true distribution. Using this method to predict the scoring distribution of a sample therefore achieves a better prediction effect.
As shown in Fig. 2, an embodiment of the invention provides another scoring distribution prediction method, which specifically includes the following steps:
Step 201: According to the true scoring distribution corresponding to a sample and the predicted scoring distribution corresponding to the sample, construct a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution.
Specifically, the first function may be the first function described above.
Step 202: According to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, construct a distance mapping function for describing the correlation between scores.
Specifically, the distance mapping function may be the above-mentioned $f(\theta_i, \theta_j)$.
Step 203: Construct a second function according to the first function, a regularization term, and the distance mapping function.
Specifically, the second function may be the above-mentioned τ(θ).
Step 204: Constrain the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
Specifically, the objective function may be the above-mentioned T(θ).
Step 205: Determine the first scoring latent factor matrix at the current iteration.
Specifically, the scoring latent factor matrix at initialization may be used as the scoring latent factor matrix at the 0th iteration.
Step 206: Determine the fit metric at the current iteration.
Specifically, the fit metric at initialization may be used as the fit metric at the 0th iteration.
Specifically, the fit metric at the current iteration may be calculated using formula four above.
Step 207: Determine the first derivative at the current iteration, where the first derivative at the current iteration is the first derivative of the objective function obtained, based on the training set, by substituting the first scoring latent factor matrix into the objective function.
Step 208: Calculate the search direction at the current iteration according to the fit metric and the first derivative at the current iteration.
Specifically, the search direction at the current iteration may be calculated using formula one above.
Step 209: Determine the search step size at the current iteration according to the search direction.
Specifically, the search step size at the current iteration may be determined using formula group two above.
Step 210: Calculate the second scoring latent factor matrix at the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
Specifically, the scoring latent factor matrix at the next iteration may be calculated using formula three above.
Step 211: Substitute the second scoring latent factor matrix into the objective function and, based on the training set, solve for the first derivative of the objective function.
Step 212: Judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold. If so, execute step 213; otherwise, take the second scoring latent factor matrix as the scoring latent factor matrix at the current iteration and execute step 205.
Step 213: Substitute the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculate the sample rating matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted.
Step 214: Substitute the sample feature matrix corresponding to the preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculate the predicted sample rating matrix.
Step 215: Using at least one evaluation index, verify for each index the degree of similarity between the predicted sample rating matrix and the sample rating matrix corresponding to the test set.
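The iteration in steps 205 to 212 can be sketched end to end in Python. This is a simplified sketch under stated assumptions rather than the patent's procedure: the objective is a hypothetical weighted quadratic instead of T(θ), the step size comes from plain backtracking on a sufficient-decrease test instead of formula group two, and the fit metric update assumes the standard BFGS definitions of $\rho^{(l)}$ and $u^{(l)}$. The names `optimize`, `eps`, `target`, and `weights` are illustrative.

```python
import numpy as np

def optimize(objective, gradient, theta0, eps=1e-6, max_iter=500, c1=1e-4):
    """Iterate until the F norm of the first derivative drops below eps."""
    shape = theta0.shape
    theta = theta0.astype(float).ravel()          # step 205: initial matrix
    B = np.eye(theta.size)                        # step 206: initial fit metric
    g = gradient(theta.reshape(shape)).ravel()    # step 207: first derivative
    for _ in range(max_iter):
        p = -B @ g                                # step 208 (assumed formula one)
        f0 = objective(theta.reshape(shape))
        a = 1.0
        for _ in range(50):                       # step 209: backtracking search
            if objective((theta + a * p).reshape(shape)) <= f0 + c1 * a * (g @ p):
                break
            a *= 0.5
        new_theta = theta + a * p                 # step 210: formula three
        new_g = gradient(new_theta.reshape(shape)).ravel()   # step 211
        s, u = new_theta - theta, new_g - g
        if u @ s > 1e-12:                         # keep B positive definite
            rho = 1.0 / (u @ s)
            I = np.eye(theta.size)
            B = ((I - rho * np.outer(s, u)) @ B @ (I - rho * np.outer(u, s))
                 + rho * np.outer(s, s))          # formula four
        theta, g = new_theta, new_g
        if np.linalg.norm(g) < eps:               # step 212: F-norm test
            break
    return theta.reshape(shape)

# Hypothetical objective over a 3-by-2 matrix of scoring latent factors.
target = np.array([[1.0, -2.0], [0.5, 3.0], [-1.5, 0.0]])
weights = np.array([[1.0, 4.0], [2.0, 0.5], [3.0, 1.0]])
objective = lambda th: 0.5 * np.sum(weights * (th - target) ** 2)
gradient = lambda th: weights * (th - target)

theta_opt = optimize(objective, gradient, np.zeros((3, 2)))
```

Flattening θ to a vector makes the 2-norm of the flattened derivative equal to the F norm of the derivative matrix, so `np.linalg.norm(g)` implements the test in step 212 directly.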
As shown in Fig. 3, an embodiment of the invention provides a scoring distribution prediction device for executing any of the above scoring distribution prediction methods, which may include:
An objective function construction unit 301, configured to construct an objective function, where the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
A determination unit 302, configured to determine the first scoring latent factor matrix at the current iteration;
A first processing unit 303, configured to calculate the second scoring latent factor matrix at the next iteration according to the first scoring latent factor matrix;
A first-derivative solving unit 304, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve for the first derivative of the objective function;
A second processing unit 305, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit 306; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix at the current iteration and trigger the determination unit 302;
The scoring distribution prediction unit 306, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
Because the information exchange between the units of the above device and their execution processes are based on the same concept as the method embodiments of the invention, refer to the description in the method embodiments for details, which are not repeated here.
In addition, an embodiment of the invention provides a readable medium including execution instructions. When a processor of a storage controller executes the execution instructions, the storage controller executes any of the above scoring distribution prediction methods.
In addition, an embodiment of the invention provides a storage controller, including: a processor, a memory, and a bus;
The memory is configured to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller executes any of the above scoring distribution prediction methods.
In conclusion, the embodiments of the invention have at least the following beneficial effects:
1. In the embodiments of the invention, an objective function is constructed that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors. Training set data is input, the scoring latent factor matrix at the current iteration is determined, and the scoring latent factor matrix at the next iteration is calculated from it to obtain a new matrix. The new matrix is substituted into the objective function and the first derivative of the objective function is solved. When the value of the F norm of the first derivative at the new matrix is less than a preset termination threshold, the scoring distribution prediction result of the sample to be predicted is obtained according to the new matrix; otherwise, the new matrix is taken as the scoring latent factor matrix at the current iteration, and the cycle repeats. By constructing the sign function and the distance function in the objective function, the embodiments take the correlation between scoring labels into account, so predicting the scoring distribution on this basis can improve prediction accuracy.
2. In the embodiments of the invention, the key point is that the correlation between scoring labels is considered, and a scoring label distribution learning algorithm that considers this correlation is proposed. The algorithm makes full use of movie feature data. First, it constructs a distance mapping function from movie features to the scoring distribution; this function has two parts, the first being the sign function and the second being the distance function. Second, it designs a new optimization objective function using the maximum entropy model. Finally, it predicts the scoring distribution by solving the new optimization objective function.
3. In the embodiments of the invention, compared with the other methods, the distribution predicted by this method is closer to the true distribution. Using this method to predict the scoring distribution of a sample therefore achieves a better prediction effect.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware controlled by program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above are only preferred embodiments of the invention, which serve merely to illustrate its technical solution and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within the scope of protection of the invention.
Claims (10)
1. A scoring distribution prediction method, characterized by comprising:
S1: constructing an objective function, where the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
S2: determining the first scoring latent factor matrix at the current iteration;
S3: calculating the second scoring latent factor matrix at the next iteration according to the first scoring latent factor matrix;
S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving for the first derivative of the objective function;
S5: judging whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix at the current iteration and executing S2;
S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
2. The method according to claim 1, characterized in that,
The sign function is:
where,
The distance function is:
where θ is the scoring latent factor matrix with c columns and m rows, c is the number of scores, m is the number of features, $\theta_{ik}$ is the value in row i, column k of θ, and $\theta_{jk}$ is the value in row j, column k of θ.
3. The method according to claim 2, characterized in that,
The objective function is:
where t(θ) is a function combination and $\lambda_1$ is a preset coefficient.
4. The method according to claim 3, characterized in that,
The function combination is:
where x is the sample feature matrix with n rows and m columns, d is the sample rating matrix with n rows and c columns, n is the number of samples, $d_{ij}$ is the value in row i, column j of d, $x_{ik}$ is the value in row i, column k of x, $\lambda_2$ is a preset coefficient, and $\|\theta\|_F$ is the F norm of θ.
5. The method according to claim 1, characterized in that,
S1 includes: according to the true scoring distribution corresponding to a sample and the predicted scoring distribution corresponding to the sample, constructing a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
According to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, constructing a distance mapping function for describing the correlation between scores;
Constructing a second function according to the first function, a regularization term, and the distance mapping function;
Constraining the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
6. The method according to claim 1, characterized in that,
S3 includes:
Determining the fit metric at the current iteration;
Determining the first derivative at the current iteration, where the first derivative at the current iteration is the first derivative of the objective function obtained, based on the training set, by substituting the first scoring latent factor matrix into the objective function;
Calculating the search direction at the current iteration according to the fit metric and the first derivative at the current iteration;
Determining the search step size at the current iteration according to the search direction;
Calculating the second scoring latent factor matrix at the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
7. The method according to claim 6, characterized in that,
Calculating the search direction at the current iteration comprises: calculating the search direction at the current iteration using formula one;
Determining the search step size at the current iteration comprises: determining a search step size at the current iteration that satisfies formula group two;
Calculating the scoring latent factor matrix at the next iteration comprises: calculating the scoring latent factor matrix at the next iteration using formula three;
Formula one is:
Formula group two is:
Formula three is: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$;
where $p^{(l)}$ is the search direction at the $l$-th iteration, $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, the first derivative at the $l$-th iteration is the first derivative of the objective function evaluated at $\theta^{(l)}$, $a^{(l)}$ is the search step size at the $l$-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization;
And/or
Calculating the fit metric at the current iteration comprises: calculating the fit metric at the current iteration using formula four;
Formula four is: $B^{(l+1)} = \left(I - \rho^{(l)} s^{(l)} (u^{(l)})^T\right) B^{(l)} \left(I - \rho^{(l)} u^{(l)} (s^{(l)})^T\right) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
where $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$;
where $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $I$ is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, the first derivative at the $l$-th iteration is the first derivative of the objective function evaluated at $\theta^{(l)}$, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
8. The method according to any one of claims 1 to 7, characterized in that,
S6 includes: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculating the sample rating matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model is:
where θ is the scoring latent factor matrix with c columns and m rows, x is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, $\theta_{jk}$ is the value in row j, column k of θ, $x_{ik}$ is the value in row i, column k of x, $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$, and $p_i$ is the predicted scoring distribution corresponding to sample $x_i$.
9. The method according to claim 8, characterized in that,
After S6, the method further comprises: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample rating matrix;
Using at least one evaluation index, verifying for each index the degree of similarity between the predicted sample rating matrix and the sample rating matrix corresponding to the test set.
10. A scoring distribution prediction device for executing the scoring distribution prediction method according to any one of claims 1 to 9, characterized by comprising:
An objective function construction unit, configured to construct an objective function, where the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
A determination unit, configured to determine the first scoring latent factor matrix at the current iteration;
A first processing unit, configured to calculate the second scoring latent factor matrix at the next iteration according to the first scoring latent factor matrix;
A first-derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve for the first derivative of the objective function;
A second processing unit, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix at the current iteration and trigger the determination unit;
The scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910181625.5A CN109872006A (en) | 2019-03-11 | 2019-03-11 | A kind of scoring distribution forecasting method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109872006A true CN109872006A (en) | 2019-06-11 |
Family
ID=66920208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910181625.5A Pending CN109872006A (en) | 2019-03-11 | 2019-03-11 | A kind of scoring distribution forecasting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109872006A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390561A (en) * | 2019-07-04 | 2019-10-29 | 四川金赞科技有限公司 | User-financial product of stochastic gradient descent is accelerated to select tendency ultra rapid predictions method and apparatus based on momentum |
CN110390561B (en) * | 2019-07-04 | 2022-04-29 | 壹融站信息技术(深圳)有限公司 | User-financial product selection tendency high-speed prediction method and device based on momentum acceleration random gradient decline |
CN110928936A (en) * | 2019-10-18 | 2020-03-27 | 平安科技(深圳)有限公司 | Information processing method, device, equipment and storage medium based on reinforcement learning |
CN110928936B (en) * | 2019-10-18 | 2023-06-16 | 平安科技(深圳)有限公司 | Information processing method, device, equipment and storage medium based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190611 |