CN109872006A - Scoring distribution forecasting method and device - Google Patents

A scoring distribution forecasting method and device

Info

Publication number
CN109872006A
Authority
CN
China
Prior art keywords
scoring
latent factor
matrix
function
factor matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910181625.5A
Other languages
Chinese (zh)
Inventor
张恒汝
秦琴
徐媛媛
闵帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University
Priority to CN201910181625.5A
Publication of CN109872006A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a scoring distribution forecasting method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the F norm (Frobenius norm) of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the scheme takes the correlation between scoring labels into account, so predicting scoring distributions on this basis can improve prediction accuracy.

Description

A scoring distribution forecasting method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a scoring distribution forecasting method and device.
Background
In general, scoring distribution prediction makes it possible to anticipate the scoring distribution of a sample before it is released, and this foresight helps reduce investment risk as far as possible. Take the film industry as an example. It is a global industry worth tens of billions of US dollars, and thousands of new films reach the market every year, but not all of them succeed at the box office. Film producers and cinemas therefore both need to predict, before release, how much audiences will like a new film.
At present, the scoring distribution of a film can be used to characterize how much audiences like it. Naturally, the LDL (Label Distribution Learning) method can be applied in this field. LDL generalizes single-label learning and multi-label learning and is a more widely applicable learning framework. In practical applications, LDL yields a distribution over label importance and can therefore handle label ambiguity effectively.
Existing work uses a support vector machine to predict scoring distributions. That approach treats the scores as independent and does not consider the correlation between them. Such correlation is common, however. For example, in a 1-to-5-point scoring system, a pessimistic user may give 1 point to a film they dislike, while an optimistic user may give the same film 2 points or more. Characterizing the popularity of a film with strictly separated scores is therefore inappropriate, and the prediction accuracy of existing implementations is not high.
Summary of the invention
The present invention provides a scoring distribution forecasting method and device that can improve prediction accuracy.
To achieve the above object, the present invention is realized through the following technical solutions:
In one aspect, the present invention provides a scoring distribution forecasting method, comprising:
S1: constructing an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors;
S2: determining the first scoring latent factor matrix of the current iteration;
S3: calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;
S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function;
S5: judging whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and executing S2;
S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
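Further, the sign function is sgn(cosine(θi, θj)), wherein cosine(θi, θj) is the cosine computed between the scoring latent factor vectors θi and θj.
The distance function is Dis(θi, θj), the distance between the scoring latent factor vectors θi and θj.
Wherein, θ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, θik is the value in row i, column k of θ, and θjk is the value in row j, column k of θ.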
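The formula images for the sign function and the distance function are not reproduced in this text. A plausible reading, assuming the usual three-valued sign function, the cosine of the angle between two rows of θ, and a Euclidean distance (the Euclidean form in particular is an assumption, not confirmed by the source), would be:

```latex
\operatorname{sgn}(z) =
\begin{cases}
 1, & z > 0 \\
 0, & z = 0 \\
-1, & z < 0
\end{cases}
\qquad
\operatorname{cosine}(\theta_i,\theta_j) =
\frac{\sum_{k=1}^{m}\theta_{ik}\theta_{jk}}
     {\sqrt{\sum_{k=1}^{m}\theta_{ik}^{2}}\,\sqrt{\sum_{k=1}^{m}\theta_{jk}^{2}}}
\qquad
\operatorname{Dis}(\theta_i,\theta_j) = \sqrt{\sum_{k=1}^{m}\left(\theta_{ik}-\theta_{jk}\right)^{2}}
```

Under this reading, sgn(cosine(θi, θj)) takes the values 1, 0, and -1 for positive correlation, no correlation, and negative correlation respectively.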
Further, the objective function T(θ) is composed of a function combination t(θ) and a correlation term, weighted by λ1, that is built from the sign function and the distance function.
Wherein, t(θ) is the function combination and λ1 is a preset coefficient.
Further, the function combination t(θ) is built from the sample feature matrix x, the sample scoring matrix d, and a regularization term on θ weighted by λ2.
Wherein, x is the sample feature matrix with n rows and m columns, d is the sample scoring matrix with n rows and c columns, n is the number of samples, dij is the value in row i, column j of d, xik is the value in row i, column k of x, λ2 is a preset coefficient, and ||θ||F is the F norm of θ.
Further, S1 comprises: constructing, from the true scoring distribution and the predicted scoring distribution corresponding to each sample, a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
constructing, from the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, a distance mapping function for describing the correlation between scores;
constructing a second function from the first function, a regularization term, and the distance mapping function;
constraining the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
Further, S3 comprises:
determining the fitting matrix of the current iteration;
determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and evaluating it on the training set;
calculating the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;
determining the search step size of the current iteration according to the search direction;
calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
Further, calculating the search direction of the current iteration comprises: calculating the search direction of the current iteration using formula one;
determining the search step size of the current iteration comprises: determining a search step size of the current iteration that satisfies formula group two;
calculating the scoring latent factor matrix of the next iteration comprises: calculating the scoring latent factor matrix of the next iteration using formula three;
Formula one is: $p^{(l)} = -B^{(l)} \nabla T(\theta^{(l)})$
Formula group two constrains the search step size $a^{(l)}$ along $p^{(l)}$ through the constants $c_1$ and $c_2$.
Formula three is: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$
Wherein, $p^{(l)}$ is the search direction at the l-th iteration, $B^{(l)}$ is the fitting matrix at the l-th iteration, $B^{(0)}$ is the fitting matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, $a^{(l)}$ is the search step size at the l-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the l-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization.
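Formula group two is not reproduced in this text. The constants 0 < c1 < c2 < 1 and the transpose of the search direction suggest the Wolfe conditions commonly paired with quasi-Newton methods. The weak form below is an assumption (the source may equally intend the strong form), with θ treated as a flattened vector and ∇T denoting the first derivative of the objective function:

```latex
\begin{aligned}
T\!\left(\theta^{(l)} + a^{(l)} p^{(l)}\right) &\le T\!\left(\theta^{(l)}\right) + c_1\, a^{(l)}\, (p^{(l)})^{T}\, \nabla T\!\left(\theta^{(l)}\right),\\
(p^{(l)})^{T}\, \nabla T\!\left(\theta^{(l)} + a^{(l)} p^{(l)}\right) &\ge c_2\, (p^{(l)})^{T}\, \nabla T\!\left(\theta^{(l)}\right).
\end{aligned}
```

The first inequality requires a sufficient decrease of the objective, and the second rules out step sizes that are too short.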
Further, calculating the fitting matrix of the current iteration comprises: calculating the fitting matrix using formula four;
Formula four is: $B^{(l+1)} = (I - \rho^{(l)} s^{(l)} (u^{(l)})^T)\, B^{(l)}\, (I - \rho^{(l)} u^{(l)} (s^{(l)})^T) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
Wherein, $\rho^{(l)} = 1 / ((u^{(l)})^T s^{(l)})$, $u^{(l)} = \nabla T(\theta^{(l+1)}) - \nabla T(\theta^{(l)})$, and $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$.
Wherein, $B^{(l)}$ is the fitting matrix at the l-th iteration, $B^{(0)}$ is the fitting matrix at initialization, $I$ is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the l-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the l-th iteration, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
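The update of formula four can be sketched in Python with NumPy. This is a minimal illustration, not the patent's implementation: the function name `update_fit_matrix` is hypothetical, θ and the first derivative are assumed to be flattened into vectors, and ρ is assumed to follow the standard BFGS definition 1 / (uᵀs), where s is the change in θ and u is the change in the first derivative.

```python
import numpy as np

def update_fit_matrix(B, s, u):
    """One update of the fitting matrix in the form of formula four.

    B : (k, k) current fitting matrix
    s : (k,)   theta_next - theta_current, flattened (assumption)
    u : (k,)   grad_next - grad_current, flattened (assumed definition)
    """
    rho = 1.0 / float(u @ s)          # assumed standard BFGS scaling
    I = np.eye(len(s))
    left = I - rho * np.outer(s, u)   # I - rho * s * u^T
    right = I - rho * np.outer(u, s)  # I - rho * u * s^T
    return left @ B @ right + rho * np.outer(s, s)
```

With these definitions the updated matrix maps u back to s, which is the secant condition that quasi-Newton updates are designed to satisfy.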
Further, S6 comprises: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculating the sample scoring matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model expresses each predicted probability p(yj|xi;θ) in terms of the entries θjk and xik.
Wherein, θ is the scoring latent factor matrix with c rows and m columns, x is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, θjk is the value in row j, column k of θ, xik is the value in row i, column k of x, pi = {p(y1|xi;θ), p(y2|xi;θ), ..., p(yc|xi;θ)}, and pi is the predicted scoring distribution corresponding to sample xi.
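The prediction step can be sketched as follows. The maximum entropy formula itself is not reproduced in this text, so the sketch assumes the exponential (softmax) form that is standard in label distribution learning, p(yj|xi;θ) proportional to exp(Σk θjk xik). The function name `predict_distribution` is hypothetical.

```python
import numpy as np

def predict_distribution(theta, x):
    """Predicted scoring distributions under an assumed softmax-style model.

    theta : (c, m) scoring latent factor matrix
    x     : (n, m) sample feature matrix
    returns (n, c), each row a distribution over the c scores
    """
    scores = x @ theta.T                                   # entry (i, j) = sum_k theta_jk * x_ik
    scores = scores - scores.max(axis=1, keepdims=True)    # shift for numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)          # normalise each row to sum to 1
```

Each row of the result is non-negative and sums to 1, so it can be read directly as the predicted scoring distribution of one sample.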
Further, after S6, the method further comprises: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model to calculate a predicted sample scoring matrix;
and using at least one evaluation index to verify the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix corresponding to the test set.
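The source does not name the evaluation indexes. As an illustration only, the sketch below assumes two measures often used to compare distributions, the KL divergence (lower is better) and the cosine similarity (higher is better); both function names are hypothetical.

```python
import numpy as np

def kl_divergence(d, p, eps=1e-12):
    """Mean KL divergence between true rows d and predicted rows p."""
    d = np.clip(d, eps, 1.0)   # avoid log(0)
    p = np.clip(p, eps, 1.0)
    return float(np.mean(np.sum(d * np.log(d / p), axis=1)))

def cosine_similarity(d, p):
    """Mean cosine similarity between true rows d and predicted rows p."""
    num = np.sum(d * p, axis=1)
    den = np.linalg.norm(d, axis=1) * np.linalg.norm(p, axis=1)
    return float(np.mean(num / den))
```

Here d would be the sample scoring matrix of the test set and p the predicted sample scoring matrix, both with one row per sample and one column per score.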
In another aspect, the present invention provides a scoring distribution forecasting device for executing any of the above scoring distribution forecasting methods, comprising:
an objective function construction unit, configured to construct an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors;
a determination unit, configured to determine the first scoring latent factor matrix of the current iteration;
a first processing unit, configured to calculate the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;
a first derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;
a second processing unit, configured to judge whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and trigger the determination unit;
the scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
The present invention provides a scoring distribution forecasting method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the present invention takes the correlation between scoring labels into account, so predicting scoring distributions on this basis can improve prediction accuracy.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a scoring distribution forecasting method provided by one embodiment of the present invention;
Fig. 2 is a flow chart of another scoring distribution forecasting method provided by one embodiment of the present invention;
Fig. 3 is a schematic diagram of a scoring distribution forecasting device provided by one embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a scoring distribution forecasting method, which may comprise the following steps:
Step 101: constructing an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors.
Step 102: determining the first scoring latent factor matrix of the current iteration.
Step 103: calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix.
Step 104: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function.
Step 105: judging whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing step 106; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and executing step 102.
Step 106: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
An embodiment of the present invention provides a scoring distribution forecasting method. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and calculating from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving the first derivative of the objective function; when the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the embodiment takes the correlation between scoring labels into account, so predicting scoring distributions on this basis can improve prediction accuracy.
The label distribution learning method provided by the embodiments of the present invention, which takes the correlation between scoring labels into account, can predict the scoring distribution of a sample. For example, the method can be applied to film scoring distribution prediction to predict the scoring distribution of film samples. The method can also be applied to scoring distribution prediction in other fields, such as novels and TV series.
Taking film score prediction as an example, the scores of a new film can be predicted from the features of the new film together with the features and scores of the existing films in a preset training set. This requires the correlation between features and scores, that is, the feature scoring matrix θ with c rows and m columns, wherein c is the number of scores and m is the number of features.
Since θ is a latent, unknown matrix, it is called the scoring latent factor matrix in the embodiments of the present invention. θ also expresses the correlation between two scoring labels, so it is also called the distance mapping matrix.
In detail, θ is the scoring latent factor matrix with c rows and m columns and thus reflects the correspondence between features and scores. For film scoring distribution prediction, the features are film features, such as the director, leading actors, and screenwriter; the scores are film scores, for example the scoring labels 1 point, 2 points, 3 points, 4 points, and 5 points.
Since θ is unknown, a θ can be initialized and used as the scoring latent factor matrix of the 0th iteration to execute steps 102 to 106: θ of the 1st iteration is calculated from θ of the 0th iteration, and, based on θ of the 1st iteration, the first derivative is solved and compared with the termination threshold. If the judgment in step 105 is yes, θ of the 1st iteration is regarded as the ideal θ, and the scores of a new film can be predicted from it. If the judgment in step 105 is no, steps 102 to 106 are executed again: θ of the 2nd iteration is calculated from θ of the 1st iteration, and the first derivative is solved and compared with the termination threshold based on θ of the 2nd iteration. The cycle repeats until the ideal θ is obtained, from which the scores of the new film can be predicted.
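The loop described above can be sketched in Python as follows. The function name and the callables `grad` and `next_theta` are hypothetical placeholders for the first derivative of the objective function and for the update of step 103; the source does not define them in code, and the `max_iter` safeguard is an addition for the sketch.

```python
import numpy as np

def iterate_until_converged(theta0, grad, next_theta, tol=1e-6, max_iter=1000):
    """Repeat steps 102 to 105 until the F norm of the first derivative is below tol."""
    theta = theta0                          # step 102: matrix of the current iteration
    for _ in range(max_iter):
        theta_new = next_theta(theta)       # step 103: matrix of the next iteration
        g = grad(theta_new)                 # step 104: first derivative at the new matrix
        if np.linalg.norm(g) < tol:         # step 105: Frobenius norm for 2-D arrays
            return theta_new                # step 106 predicts with this matrix
        theta = theta_new                   # otherwise the new matrix becomes current
    return theta                            # safeguard: stop after max_iter iterations
```

For example, with the toy objective 0.5·||θ||², `grad=lambda t: t` and `next_theta=lambda t: 0.5 * t` drive θ toward the zero matrix until the norm test passes.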
In general, θ at initialization and the preset training set are supplied first as the inputs of the objective function. After that, the first derivative is solved at each iteration from the known data in the training set and the θ produced by that iteration.
Since the ideal θ is obtained by continuous optimization based on the objective function and the training set, and the objective function takes the correlation between scoring labels into account, the new-film scores predicted from the ideal θ can be considered accurate and likely to match the true scores of the new film.
Referring to step 101 above, to take the correlation between scoring labels into account, the objective function should include the sign function and the distance function described above. In one embodiment of the present invention, the sign function is sgn(cosine(θi, θj)), wherein cosine(θi, θj) is the cosine computed between the scoring latent factor vectors θi and θj.
The distance function is Dis(θi, θj), the distance between the scoring latent factor vectors θi and θj.
Wherein, θ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, θik is the value in row i, column k of θ, and θjk is the value in row j, column k of θ.
Based on the above, the sign function describes the correlation between the scoring latent factor vectors θi and θj. Here cosine(θi, θj) calculates the distance between the two scoring latent factor vectors, and the sign function converts this distance into one of three states: positive correlation, negative correlation, or no correlation.
Based on the above, the distance function describes the degree of correlation between scoring latent factor vectors.
In the embodiments of the present invention, the sign function and the distance function can be combined into a distance mapping function f(θi, θj) that describes the correlation between scoring labels. Through this distance mapping function, the scoring latent factor vectors can be obtained, and the cosine distance can then be used to measure the similarity between two scoring latent factor vectors.
It can be seen that, by combining the sign function and the distance function, the scoring distribution learning model given by the objective function takes the correlation between scoring labels into account, and scoring distribution prediction based on this model can improve prediction accuracy.
Based on the above, in one embodiment of the present invention, the objective function T(θ) is composed of a function combination t(θ) and a correlation term, weighted by λ1, that is built from the sign function and the distance function.
Wherein, t(θ) is the function combination and λ1 is a preset coefficient.
In the embodiments of the present invention, the distance mapping function can be constructed as f(θi, θj) = sgn(cosine(θi, θj)) · Dis(θi, θj) to describe the correlation between scoring labels.
In detail, f(θi, θj) multiplies the correlation by the degree of correlation to obtain a positively correlated distance, a negatively correlated distance, or an uncorrelated result, and thus describes the correlation between scoring labels well. For example, f(θa, θb) > 0 with a larger value indicates that when score level a appears, score level b is more likely to appear. Conversely, f(θa, θb) < 0 with a smaller value indicates that when score level a appears, score level b is less likely to appear.
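A minimal Python sketch of the distance mapping function follows. The product sgn(cosine) · Dis comes from the text; the Euclidean form of Dis and the function name `distance_mapping` are assumptions, since the distance formula itself is not reproduced here.

```python
import numpy as np

def distance_mapping(theta_i, theta_j):
    """f(theta_i, theta_j) = sgn(cosine(theta_i, theta_j)) * Dis(theta_i, theta_j)."""
    denom = np.linalg.norm(theta_i) * np.linalg.norm(theta_j)
    # Cosine between the two scoring latent factor vectors; 0 if either is the zero vector.
    cosine = float(theta_i @ theta_j) / denom if denom > 0 else 0.0
    dist = np.linalg.norm(theta_i - theta_j)   # assumed Euclidean distance
    return float(np.sign(cosine) * dist)
```

Two rows of θ that point in the same direction give a positive value, opposite rows give a negative value, and orthogonal rows give zero, matching the three states described above.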
Based on the above, in one embodiment of the present invention, the function combination t(θ) is built from the sample feature matrix x, the sample scoring matrix d, and a regularization term on θ weighted by λ2.
Wherein, x is the sample feature matrix with n rows and m columns, d is the sample scoring matrix with n rows and c columns, n is the number of samples, dij is the value in row i, column j of d, xik is the value in row i, column k of x, λ2 is a preset coefficient, and ||θ||F is the F norm of θ.
In detail, introducing the regularization term ||θ||F into the objective function helps avoid overfitting.
In detail, the objective function can be constructed as follows:
First, let $X = R^m$ denote the sample space and $Y = \{y_1, y_2, ..., y_c\}$ denote the set of scores, where m is the number of features and c is the number of scores. Let $S = \{(x_1, d_1), (x_2, d_2), ..., (x_n, d_n)\}$ denote the training set, where $x_i \in X$ is the i-th sample, $d_i = \{d_{i1}, d_{i2}, ..., d_{ic}\}$ is the scoring distribution corresponding to sample $x_i$, and there are n samples in total. $d_{ij}$ is the degree to which the j-th score describes the i-th sample. $\sum_{j=1}^{c} d_{ij} = 1$ indicates that all the scores together describe sample $x_i$ completely.
Then, let $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), ..., p(y_c|x_i;\theta)\}$ denote the predicted scoring distribution corresponding to sample $x_i$. The Kullback-Leibler (KL) divergence, $\sum_{j} d_{ij} \ln \frac{d_{ij}}{p(y_j|x_i;\theta)}$, can then be used to calculate the gap between the true distribution and the predicted distribution.
Then, the distance mapping function described above is constructed.
After that, using the maximum entropy model and using f(θi, θj) to express the correlation between score levels, the second function τ(θ) is obtained.
Finally, since p(yk|xi;θ) satisfies the maximum entropy model, the predicted probabilities can be written explicitly in terms of θ and x, and the second function can be simplified accordingly.
The formula obtained from this simplification is the constructed objective function T(θ).
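The simplification step can be sketched as follows, assuming the exponential form of the maximum entropy model that is standard in label distribution learning. The source's own formula images are not reproduced, so the regularization and correlation terms, whose exact weighting is unknown, are left out. Substituting the model into the KL divergence for sample x_i and using the fact that the d_ij sum to 1 over j gives:

```latex
\begin{aligned}
p(y_j \mid x_i;\theta) &= \frac{\exp\left(\sum_{k}\theta_{jk}x_{ik}\right)}{\sum_{j'}\exp\left(\sum_{k}\theta_{j'k}x_{ik}\right)},\\[4pt]
\sum_{j} d_{ij}\ln\frac{d_{ij}}{p(y_j \mid x_i;\theta)}
&= \sum_{j} d_{ij}\ln d_{ij}
 \;-\; \sum_{j} d_{ij}\sum_{k}\theta_{jk}x_{ik}
 \;+\; \ln\sum_{j'}\exp\left(\sum_{k}\theta_{j'k}x_{ik}\right).
\end{aligned}
```

The first term on the right does not depend on θ, so it can be dropped when optimizing; only the last two terms, summed over all samples, would remain in the data-fitting part of the objective.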
Corresponding to the construction process of the objective function above, in one embodiment of the present invention, step 101 comprises: constructing, from the true scoring distribution and the predicted scoring distribution corresponding to each sample, a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
constructing, from the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, a distance mapping function for describing the correlation between scores;
constructing a second function from the first function, a regularization term, and the distance mapping function;
constraining the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
Based on the above, in the embodiments of the present invention, the first function can be the KL divergence described above, the distance mapping function can be f(θi, θj), the second function can be τ(θ), and the objective function can be T(θ).
In detail, referring to steps 102 to 106 above, to obtain the scoring latent factor matrix used for predicting the scoring distribution, the iterative process is executed repeatedly until, at some iteration, the judgment in step 105 is yes.
Based on the above, since θ is unknown, a θ(0) can be initialized and used as θ of the 0th iteration, and θ(1), the θ of the 1st iteration, is calculated starting from step 102. If the judgment in step 105 based on θ(1) is no, θ(2), the θ of the 2nd iteration, is calculated from θ(1), and the subsequent steps are executed based on θ(2). The cycle repeats until, at the l-th iteration, the judgment in step 105 based on θ(l) is yes, and scoring distribution prediction is then carried out according to θ(l).
On this basis, a specific implementation is needed for calculating the second scoring latent factor matrix of the next iteration from the first scoring latent factor matrix.
In detail, the L-BFGS algorithm is a quasi-Newton optimization algorithm developed from Newton's method. In one embodiment of the present invention, the L-BFGS method can be used to solve the objective function by expanding T(θ) to second order around θ(l), wherein Δ = θ(l+1) − θ(l), ∇T(θ(l)) is the first derivative at the l-th iteration, that is, the first derivative of T(θ) obtained after substituting θ(l) into T(θ), and H(θ(l)) is the Hessian matrix at the l-th iteration. Further simplification expresses Δ in terms of the Hessian matrix and the first derivative. Setting a step size and a direction so that the function value decreases steadily gives θ(l+1) = θ(l) + a(l)p(l), wherein p(l) is the search direction and a(l) is the search step size.
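The Newton-style derivation referred to above can be written out as follows. This is the textbook second-order expansion, given as a sketch with θ treated as a flattened vector, ∇T(θ(l)) denoting the first derivative at the l-th iteration, and H(θ(l)) the Hessian matrix; the source's own formula images are not reproduced.

```latex
\begin{aligned}
T\!\left(\theta^{(l+1)}\right) &\approx T\!\left(\theta^{(l)}\right)
 + \nabla T\!\left(\theta^{(l)}\right)^{T}\Delta
 + \tfrac{1}{2}\,\Delta^{T} H\!\left(\theta^{(l)}\right)\Delta,
 \qquad \Delta = \theta^{(l+1)} - \theta^{(l)},\\[4pt]
\Delta &= -\,H\!\left(\theta^{(l)}\right)^{-1}\,\nabla T\!\left(\theta^{(l)}\right).
\end{aligned}
```

The second line follows from setting the derivative of the expansion with respect to Δ to zero. A quasi-Newton method replaces the inverse Hessian with an approximation that is updated at every iteration and scales the resulting direction by a step size.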
In detail, the search step size is determined by a group of conditions on a(l) (formula group two),
wherein 0 < c1 < c2 < 1.
Specifically, a step size can be initialized, for example to 1, and checked against this formula group. If it satisfies the formula group, it is used as the step size of the 0th iteration. If not, a new step size is calculated according to a preset step size calculation formula and checked against the formula group again. This repeats until a step size satisfying the formula group is obtained, which is used as the step size of the 0th iteration. By the same principle, the step size of every iteration can be determined from the formula group.
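The step size search described above can be sketched in Python. Two things are assumed here because the source does not reproduce them: the formula group is taken to be the weak Wolfe conditions, and the "preset step size calculation formula" is taken to be a simple bisection and doubling rule. The function name `wolfe_step` is hypothetical, and θ and p are treated as flattened vectors.

```python
import numpy as np

def wolfe_step(f, grad, theta, p, c1=1e-4, c2=0.9, max_tries=50):
    """Search for a step size a satisfying the (assumed) weak Wolfe conditions."""
    a, lo, hi = 1.0, 0.0, np.inf            # start from an initial step size of 1
    f0 = f(theta)
    g0 = float(grad(theta) @ p)             # directional derivative at theta
    for _ in range(max_tries):
        if f(theta + a * p) > f0 + c1 * a * g0:
            hi = a                          # decrease too small: shrink the step
            a = 0.5 * (lo + hi)
        elif float(grad(theta + a * p) @ p) < c2 * g0:
            lo = a                          # curvature condition fails: grow the step
            a = 2.0 * a if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            return a                        # both conditions hold
    return a                                # fall back to the last trial step
```

The constants c1 and c2 follow the constraint 0 < c1 < c2 < 1 stated above; the specific default values are common choices, not values taken from the source.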
Further, since the Hessian matrix is difficult to determine, the core idea of the L-BFGS method is to calculate an approximation matrix that fits the Hessian matrix.
On this basis, to illustrate one possible implementation of the iterative optimization of the scoring latent factor matrix, in one embodiment of the present invention, step 103 comprises:
determining the fitting matrix of the current iteration;
determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and evaluating it on the training set;
calculating the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;
determining the search step size of the current iteration according to the search direction;
calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
In detail, the fitting matrix of the 0th iteration can be the fitting matrix at initialization.
Based on above content, in an embodiment of the invention, the direction of search of the calculating when previous iteration, packet It includes: calculating the direction of search when previous iteration using formula one;
Step-size in search of the determination when previous iteration, comprising: determine meet the formula group two, work as previous iteration When step-size in search;
Scoring latent factor matrix when the calculating next iteration, comprising: calculate next iteration using formula three When scoring latent factor matrix;
The formula one includes:
The formula group two includes:
The formula three includes: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$
Wherein, $p^{(l)}$ is the search direction at the $l$-th iteration, $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the $l$-th iteration, $a^{(l)}$ is the search step size at the $l$-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization.
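Formula one and formula group two are not reproduced in this text. One plausible reading, consistent with the symbols defined above ($p^{(l)}$, $B^{(l)}$, $a^{(l)}$, $c_1$, $c_2$, $(p^{(l)})^T$), is the standard quasi-Newton search direction together with the Wolfe conditions. This is an assumption rather than the patent's confirmed wording, and $\nabla T(\theta^{(l)})$ is used here to denote the first derivative of the objective function $T(\theta)$ at the $l$-th iteration.

```latex
% Assumed form of formula one: quasi-Newton search direction
p^{(l)} = -B^{(l)} \nabla T(\theta^{(l)})

% Assumed form of formula group two: Wolfe conditions on the step size a^{(l)}
T(\theta^{(l)} + a^{(l)} p^{(l)}) \le T(\theta^{(l)}) + c_1 a^{(l)} (p^{(l)})^{T} \nabla T(\theta^{(l)})

(p^{(l)})^{T} \nabla T(\theta^{(l)} + a^{(l)} p^{(l)}) \ge c_2 (p^{(l)})^{T} \nabla T(\theta^{(l)})
```

Under this reading, the first inequality rejects steps that are too long and the second rejects steps that are too short, which matches the requirement 0 < c1 < c2 < 1.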
Based on the above, in an embodiment of the invention, calculating the fit metric for the current iteration comprises: calculating the fit metric for the current iteration using formula four;
The formula four includes: $B^{(l+1)} = (I - \rho^{(l)} s^{(l)} (u^{(l)})^T) B^{(l)} (I - \rho^{(l)} u^{(l)} (s^{(l)})^T) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
Wherein, $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$
Wherein, $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $I$ is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the $l$-th iteration, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
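The update in formula four can be illustrated with a small sketch. The definitions of ρ and u are not reproduced in this text, so the code assumes the usual BFGS choices: u is the difference between successive first derivatives and ρ = 1 / (uᵀs). The scoring latent factor matrix is treated as a flattened vector, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def outer(a: Vector, b: Vector) -> Matrix:
    """Outer product a b^T."""
    return [[x * y for y in b] for x in a]


def update_fit_metric(B: Matrix, s: Vector, u: Vector) -> Matrix:
    """Apply formula four once, with the assumed rho = 1 / (u^T s)."""
    n = len(s)
    rho = 1.0 / sum(ui * si for ui, si in zip(u, s))  # assumed definition
    su, us, ss = outer(s, u), outer(u, s), outer(s, s)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    left = [[eye[i][j] - rho * su[i][j] for j in range(n)] for i in range(n)]
    right = [[eye[i][j] - rho * us[i][j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, B), right)
    return [[core[i][j] + rho * ss[i][j] for j in range(n)] for i in range(n)]
```

With these assumed definitions, the updated matrix maps u back to s (the secant condition), which is a convenient sanity check for an implementation.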
In an embodiment of the invention, the step 106 comprises: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into a maximum entropy model, and calculating the sample scoring matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model includes:
Wherein, $\theta$ is the scoring latent factor matrix with c rows and m columns, $x$ is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, $\theta_{jk}$ is the value in row j, column k of $\theta$, $x_{ik}$ is the value in row i, column k of $x$, $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$, and $p_i$ is the predicted scoring distribution corresponding to sample $x_i$.
In detail, since the sample feature matrix corresponding to the sample to be predicted is known, and the iteratively optimized scoring latent factor matrix is also known, the sample scoring matrix corresponding to the sample to be predicted can be calculated, thereby obtaining the scoring distribution prediction result of the sample to be predicted.
Based on the above maximum entropy model, after the sample feature matrix corresponding to the sample to be predicted and the iteratively optimized scoring latent factor matrix are substituted into the maximum entropy model, the degree to which the j-th score describes the i-th sample can be obtained, and in turn the degree to which each score describes the i-th sample, thereby obtaining the scoring distribution prediction result of the sample to be predicted.
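The prediction step can be sketched as below. The maximum entropy formula itself is not reproduced in this text, so the sketch assumes the common softmax form, in which p(y_j | x_i; θ) is proportional to exp(Σ_k θ_jk x_ik). The layout of θ (one row per score, one column per feature) follows the index definitions above, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def predict_distribution(theta: Matrix, x_i: Vector) -> Vector:
    """Predicted scoring distribution p_i for one sample (assumed softmax form)."""
    # One linear score per scoring label j: sum_k theta[j][k] * x_i[k].
    scores = [sum(t * v for t, v in zip(row, x_i)) for row in theta]
    shift = max(scores)  # subtracting the max avoids overflow, result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_all(theta: Matrix, x: Matrix) -> Matrix:
    """Predicted sample scoring matrix: one distribution per row of x."""
    return [predict_distribution(theta, x_i) for x_i in x]
```

Each returned row is non-negative and sums to 1, so entry j can be read as the degree to which the j-th score describes the sample.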
In summary, the embodiment of the invention innovatively introduces a sign function and a distance function into the objective function, so as to account for the correlation between scoring labels and thereby improve scoring prediction accuracy. The improvement in prediction accuracy can then be verified using the iteratively optimized scoring latent factor matrix.
In detail, the known samples can be divided into two parts: one part forms the training set and the other forms the test set. The training set is mainly used to obtain the iteratively optimized scoring latent factor matrix through steps 101 to 105 above, based on the features and scores of its known samples. The test set is mainly used to test the resulting iteratively optimized scoring latent factor matrix, based on the features and scores of its known samples.
On this basis, in an embodiment of the invention, after the step 106 the method may further comprise: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample scoring matrix;
Using at least one evaluation index to verify the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix corresponding to the test set.
In detail, since the sample feature matrix and the true sample scoring matrix corresponding to the test set are known, the sample feature matrix and the iteratively optimized scoring latent factor matrix can be substituted into the maximum entropy model to calculate the predicted sample scoring matrix. The degree of similarity between the predicted sample scoring matrix and the true sample scoring matrix can then be compared, to verify whether the iteratively optimized scoring latent factor matrix is sufficiently accurate.
In an embodiment of the invention, the at least one evaluation index can be one or more of the squared chi-square distance (Squared χ²), similarity (Intersection), and fidelity (Fidelity).
In detail, these evaluation indexes are described in turn below.
The calculation formula of the squared chi-square distance is as follows: Wherein, dis denotes distance.
The calculation formula of similarity is as follows: Wherein, sim denotes similarity.
The calculation formula of fidelity is as follows: Wherein, sim denotes similarity.
In addition, j is the index of a scoring label and c is the total number of scoring labels. $O_j$ denotes the value corresponding to the j-th scoring label in the true scoring label distribution, and $Q_j$ denotes the value corresponding to the j-th scoring label in the predicted scoring label distribution.
The smaller the calculated squared chi-square distance, the closer the two distributions are and the smaller the gap between them. The larger the calculated similarity, the more similar the two distributions are. The larger the calculated fidelity, the closer the predicted distribution is to the true distribution.
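The three evaluation indexes can be sketched as follows. Their formulas are not reproduced in this text, so the code assumes the definitions commonly used for comparing label distributions: squared χ² as Σ_j (O_j − Q_j)² / (O_j + Q_j), intersection as Σ_j min(O_j, Q_j), and fidelity as Σ_j sqrt(O_j · Q_j). The function names are hypothetical, and skipping terms with O_j + Q_j = 0 is an added convention to avoid division by zero.

```python
import math
from typing import List

Vector = List[float]


def squared_chi2(o: Vector, q: Vector) -> float:
    """Assumed squared chi-square distance; smaller means closer distributions."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(o, q) if a + b > 0)


def intersection(o: Vector, q: Vector) -> float:
    """Assumed intersection similarity; larger means more similar distributions."""
    return sum(min(a, b) for a, b in zip(o, q))


def fidelity(o: Vector, q: Vector) -> float:
    """Assumed fidelity similarity; larger means prediction is closer to truth."""
    return sum(math.sqrt(a * b) for a, b in zip(o, q))
```

Under these assumed definitions, two identical distributions give a distance of 0 and both similarities of 1, while two distributions with no overlapping mass give a distance of 2 and both similarities of 0.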
Verification yields the experimental results shown in Table 1.
Table 1 Experimental results
Method      Squared χ²       Intersection     Fidelity
LDLLC       0.0412±0.0213    0.8496±0.0238    0.9789±0.0107
LDSVR       0.0887±0.0031    0.8436±0.0027    0.9764±0.0010
S-SVR       0.1040±0.0030    0.8277±0.0023    0.9722±0.0009
M-SVRP      0.1084±0.0033    0.8186±0.0034    0.9710±0.0010
BFGS-LLD    0.1176±0.0042    0.8186±0.0033    0.9683±0.0012
IIS-LLD     0.1195±0.0054    0.8172±0.0044    0.9676±0.0014
AA-kNN      0.1246±0.0062    0.8101±0.0047    0.9664±0.0018
CPNN        0.1625±0.0206    0.7847±0.0150    0.9551±0.0061
Wherein, LDLLC denotes the scoring label distribution learning algorithm described in the embodiment of the invention, which considers the correlation between scoring labels.
As Table 1 shows, compared with the other methods, the distribution predicted by this method is closer to the true distribution. Predicting sample scoring distributions with this method can therefore achieve relatively good prediction results.
As shown in Fig. 2, an embodiment of the invention provides another scoring distribution prediction method, which specifically comprises the following steps:
Step 201: According to the true scoring distribution and the predicted scoring distribution corresponding to a sample, construct a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution.
In detail, first function can be to be above-mentioned
Step 202: According to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, construct a distance mapping function for describing the correlation between scores.
In detail, the distance mapping function can be the above-mentioned $f(\theta_i, \theta_j)$.
Step 203: Construct a second function according to the first function, a regularization term, and the distance mapping function.
In detail, the second function can be the above-mentioned $\tau(\theta)$.
Step 204: Constrain the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
In detail, the objective function can be the above-mentioned $T(\theta)$.
Step 205: Determine the first scoring latent factor matrix for the current iteration.
In detail, the scoring latent factor matrix at initialization can be used as the scoring latent factor matrix for the 0th iteration.
Step 206: Determine the fit metric for the current iteration.
In detail, the fit metric at initialization can be used as the fit metric for the 0th iteration.
In detail, the fit metric for the current iteration can be calculated using formula four above.
Step 207: Determine the first derivative for the current iteration, wherein the first derivative for the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and using the training set.
Step 208: Calculate the search direction for the current iteration according to the fit metric and the first derivative for the current iteration.
In detail, the search direction for the current iteration can be calculated using formula one above.
Step 209: Determine the search step size for the current iteration according to the search direction.
In detail, the search step size for the current iteration can be determined using formula group two above.
Step 210: Calculate the second scoring latent factor matrix for the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
In detail, the scoring latent factor matrix for the next iteration can be calculated using formula three above.
Step 211: Substitute the second scoring latent factor matrix into the objective function and, based on the training set, solve the first derivative of the objective function.
Step 212: Judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold. If so, execute step 213; otherwise, take the second scoring latent factor matrix as the scoring latent factor matrix for the current iteration again, and execute step 205.
Step 213: Substitute the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculate the sample scoring matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted.
Step 214: Substitute the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculate a predicted sample scoring matrix.
Step 215: Use at least one evaluation index to verify the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix corresponding to the test set.
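The iterative part of the flow above (steps 205 to 212) can be sketched as a small quasi-Newton loop. This is only an illustration under stated assumptions: the patent's objective function is not reproduced here, so any differentiable function `f` with gradient `grad` stands in for it; the search direction is assumed to be the negative product of the fit metric and the first derivative; the step-size test is assumed to be the Wolfe conditions; and ρ and u use the usual BFGS definitions. The latent factor matrix is treated as a flattened vector, for which the F norm equals the Euclidean norm. All function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def line_search(f, grad, theta, p, c1=1e-4, c2=0.9, max_trials=50) -> float:
    """Step 209: bisection search under the assumed Wolfe conditions."""
    f0, slope0 = f(theta), dot(grad(theta), p)
    lo, hi, a = 0.0, math.inf, 1.0
    for _ in range(max_trials):
        trial = [t + a * d for t, d in zip(theta, p)]
        if f(trial) > f0 + c1 * a * slope0:        # step too long
            hi = a
            a = 0.5 * (lo + hi)
        elif dot(grad(trial), p) < c2 * slope0:    # step too short
            lo = a
            a = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
        else:
            break
    return a


def update_fit_metric(B: Matrix, s: Vector, u: Vector) -> Matrix:
    """Formula four expanded for symmetric B, with assumed rho = 1 / (u^T s)."""
    rho = 1.0 / dot(u, s)
    Bu = [dot(row, u) for row in B]
    scale = rho * rho * dot(u, Bu) + rho
    n = len(s)
    return [
        [B[i][j] - rho * (s[i] * Bu[j] + Bu[i] * s[j]) + scale * s[i] * s[j]
         for j in range(n)]
        for i in range(n)
    ]


def optimize(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
             theta0: Vector, tol: float = 1e-6, max_iter: int = 100) -> Vector:
    theta = list(theta0)                                   # step 205
    n = len(theta)
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # step 206
    g = grad(theta)                                        # step 207
    for _ in range(max_iter):
        p = [-dot(row, g) for row in B]                    # step 208
        a = line_search(f, grad, theta, p)                 # step 209
        new_theta = [t + a * d for t, d in zip(theta, p)]  # step 210
        new_g = grad(new_theta)                            # step 211
        if math.sqrt(dot(new_g, new_g)) < tol:             # step 212
            return new_theta
        s = [x - y for x, y in zip(new_theta, theta)]
        u = [x - y for x, y in zip(new_g, g)]
        B = update_fit_metric(B, s, u)                     # fit metric for next pass
        theta, g = new_theta, new_g
    return theta
```

On a toy convex objective such as f(v) = (v[0] - 1)**2 + 2 * (v[1] + 3)**2, the loop stops near the minimizer [1, -3]. The matrix returned by `optimize` would then be passed to the prediction and evaluation steps (213 to 215).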
As shown in Fig. 3, an embodiment of the invention provides a scoring distribution prediction device for executing any of the scoring distribution prediction methods described above, which may comprise:
An objective function construction unit 301, configured to construct an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
A determination unit 302, configured to determine the first scoring latent factor matrix for the current iteration;
A first processing unit 303, configured to calculate the second scoring latent factor matrix for the next iteration according to the first scoring latent factor matrix;
A first derivative solving unit 304, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;
A second processing unit 305, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit 306; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix for the current iteration again, and trigger the determination unit 302;
The scoring distribution prediction unit 306, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
Since the information exchange between the units of the above device and their execution processes are based on the same concept as the method embodiments of the invention, please refer to the description of the method embodiments for details, which are not repeated here.
In addition, an embodiment of the invention provides a readable medium comprising execution instructions; when a processor of a storage controller executes the execution instructions, the storage controller performs any of the scoring distribution prediction methods described above.
In addition, an embodiment of the invention provides a storage controller, comprising: a processor, a memory, and a bus;
The memory is configured to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs any of the scoring distribution prediction methods described above.
In conclusion, the embodiments of the invention have at least the following beneficial effects:
1. In the embodiment of the invention, an objective function is constructed that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors. Training set data is input, the scoring latent factor matrix for the current iteration is determined, and the scoring latent factor matrix for the next iteration is calculated from it to obtain a new matrix. The new matrix is substituted into the objective function, and the first derivative of the objective function is solved. When the value of the F norm of the first derivative at the new matrix is less than a preset termination threshold, the scoring distribution prediction result of the sample to be predicted is obtained according to the new matrix; otherwise, the new matrix is taken as the scoring latent factor matrix for the current iteration again, and the cycle repeats. By introducing the sign function and the distance function into the objective function, the embodiment of the invention takes the correlation between scoring labels into account, so prediction accuracy can be improved when the scoring distribution is predicted on this basis.
2. In the embodiment of the invention, the key point is to consider the correlation between scoring labels, and a scoring label distribution learning algorithm that considers scoring label correlation is proposed. The algorithm makes full use of movie feature data. First, it constructs a distance mapping function between movie features and scoring distributions; this function has two parts, the first being the sign function and the second being the distance function. Second, a new optimization objective function is designed using the maximum entropy model. Finally, the scoring distribution is predicted by solving the new optimization objective function.
3. In the embodiment of the invention, compared with the other methods, the distribution predicted by this method is closer to the true distribution. Predicting sample scoring distributions with this method can therefore achieve relatively good prediction results.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The program can be stored in a computer-readable storage medium, and when executed, it performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that the above are only preferred embodiments of the invention, which are merely used to illustrate the technical solution of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is included within the scope of protection of the invention.

Claims (10)

1. A scoring distribution prediction method, characterized by comprising:
S1: constructing an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
S2: determining the first scoring latent factor matrix for the current iteration;
S3: calculating the second scoring latent factor matrix for the next iteration according to the first scoring latent factor matrix;
S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function;
S5: judging whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix for the current iteration again, and executing S2;
S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
2. The method according to claim 1, characterized in that,
The sign function includes:
Wherein,
The distance function includes:
Wherein, $\theta$ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, $\theta_{ik}$ is the value in row i, column k of $\theta$, and $\theta_{jk}$ is the value in row j, column k of $\theta$.
3. The method according to claim 2, characterized in that,
The objective function includes:
Wherein, $t(\theta)$ is the function combination, and $\lambda_1$ is a preset coefficient.
4. The method according to claim 3, characterized in that,
The function combination includes:
Wherein, $x$ is the sample feature matrix with n rows and m columns, $d$ is the sample scoring matrix with n rows and c columns, n is the number of samples, $d_{ij}$ is the value in row i, column j of $d$, $x_{ik}$ is the value in row i, column k of $x$, $\lambda_2$ is a preset coefficient, and $||\theta||_F$ is the F norm of $\theta$.
5. The method according to claim 1, characterized in that,
The S1 comprises: according to the true scoring distribution and the predicted scoring distribution corresponding to a sample, constructing a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;
According to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for calculating the distance between two scoring latent factor vectors, constructing a distance mapping function for describing the correlation between scores;
Constructing a second function according to the first function, a regularization term, and the distance mapping function;
Constraining the predicted scoring distribution to satisfy a maximum entropy model so as to simplify the second function, thereby obtaining the objective function.
6. The method according to claim 1, characterized in that,
The S3 comprises:
Determining the fit metric for the current iteration;
Determining the first derivative for the current iteration, wherein the first derivative for the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and using the training set;
Calculating the search direction for the current iteration according to the fit metric and the first derivative for the current iteration;
Determining the search step size for the current iteration according to the search direction;
Calculating the second scoring latent factor matrix for the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.
7. The method according to claim 6, characterized in that,
Calculating the search direction for the current iteration comprises: calculating the search direction for the current iteration using formula one;
Determining the search step size for the current iteration comprises: determining a search step size for the current iteration that satisfies formula group two;
Calculating the scoring latent factor matrix for the next iteration comprises: calculating the scoring latent factor matrix for the next iteration using formula three;
The formula one includes:
The formula group two includes:
The formula three includes: $\theta^{(l+1)} = \theta^{(l)} + a^{(l)} p^{(l)}$
Wherein, $p^{(l)}$ is the search direction at the $l$-th iteration, $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the $l$-th iteration, $a^{(l)}$ is the search step size at the $l$-th iteration, $0 < c_1 < c_2 < 1$, $(p^{(l)})^T$ is the transpose of $p^{(l)}$, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, and $\theta^{(0)}$ is the scoring latent factor matrix at initialization;
and/or
Calculating the fit metric for the current iteration comprises: calculating the fit metric for the current iteration using formula four;
The formula four includes: $B^{(l+1)} = (I - \rho^{(l)} s^{(l)} (u^{(l)})^T) B^{(l)} (I - \rho^{(l)} u^{(l)} (s^{(l)})^T) + \rho^{(l)} s^{(l)} (s^{(l)})^T$
Wherein, $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$
Wherein, $B^{(l)}$ is the fit metric at the $l$-th iteration, $B^{(0)}$ is the fit metric at initialization, $I$ is the identity matrix, $\theta^{(l)}$ is the scoring latent factor matrix at the $l$-th iteration, $\theta^{(0)}$ is the scoring latent factor matrix at initialization, $\nabla T(\theta^{(l)})$ is the first derivative at the $l$-th iteration, and $(u^{(l)})^T$ is the transpose of $u^{(l)}$.
8. The method according to any one of claims 1 to 7, characterized in that,
The S6 comprises: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into a maximum entropy model, and calculating the sample scoring matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;
The maximum entropy model includes:
Wherein, $\theta$ is the scoring latent factor matrix with c rows and m columns, $x$ is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, $\theta_{jk}$ is the value in row j, column k of $\theta$, $x_{ik}$ is the value in row i, column k of $x$, $p_i = \{p(y_1|x_i;\theta), p(y_2|x_i;\theta), \ldots, p(y_c|x_i;\theta)\}$, and $p_i$ is the predicted scoring distribution corresponding to sample $x_i$.
9. The method according to claim 8, characterized in that,
After the S6, the method further comprises: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample scoring matrix;
Using at least one evaluation index to verify the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix corresponding to the test set.
10. A scoring distribution prediction device for executing the scoring distribution prediction method according to any one of claims 1 to 9, characterized by comprising:
An objective function construction unit, configured to construct an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;
A determination unit, configured to determine the first scoring latent factor matrix for the current iteration;
A first processing unit, configured to calculate the second scoring latent factor matrix for the next iteration according to the first scoring latent factor matrix;
A first derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;
A second processing unit, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix for the current iteration again, and trigger the determination unit;
The scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.
CN201910181625.5A 2019-03-11 2019-03-11 A kind of scoring distribution forecasting method and device Pending CN109872006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910181625.5A CN109872006A (en) 2019-03-11 2019-03-11 A kind of scoring distribution forecasting method and device

Publications (1)

Publication Number Publication Date
CN109872006A true CN109872006A (en) 2019-06-11

Family

ID=66920208

Country Status (1)

Country Link
CN (1) CN109872006A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390561A (en) * 2019-07-04 2019-10-29 四川金赞科技有限公司 User-financial product of stochastic gradient descent is accelerated to select tendency ultra rapid predictions method and apparatus based on momentum
CN110390561B (en) * 2019-07-04 2022-04-29 壹融站信息技术(深圳)有限公司 User-financial product selection tendency high-speed prediction method and device based on momentum acceleration random gradient decline
CN110928936A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Information processing method, device, equipment and storage medium based on reinforcement learning
CN110928936B (en) * 2019-10-18 2023-06-16 平安科技(深圳)有限公司 Information processing method, device, equipment and storage medium based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190611