CN109872006A

CN109872006A - Scoring distribution prediction method and device

Info

Publication number: CN109872006A
Application number: CN201910181625.5A
Authority: CN
Inventors: 张恒汝; 秦琴; 徐媛媛; 闵帆
Original assignee: Southwest Petroleum University
Current assignee: Southwest Petroleum University
Priority date: 2019-03-11
Filing date: 2019-03-11
Publication date: 2019-06-11

Abstract

The present invention provides a scoring distribution prediction method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for computing the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and computing from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving for the first derivative of the objective function; when the F norm (Frobenius norm) of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the scheme takes the correlation between scoring labels into account, and can therefore improve accuracy when a scoring distribution is predicted on this basis.

Description

Scoring distribution prediction method and device

Technical field

The present invention relates to the field of computer technology, and in particular to a scoring distribution prediction method and device.

Background art

In general, scoring distribution prediction makes it possible to anticipate the scoring distribution of a sample before it is released, and this foresight helps reduce investment risk as far as possible. Take the film industry as an example. It is a global industry worth tens of billions of US dollars, and thousands of new films reach the market every year, but not all of them succeed at the box office. Film producers and cinemas alike therefore need to predict, before a new film is released, how much audiences will like it.

At present, the scoring distribution of a film can be used to characterize how much audiences like it, so the LDL (Label Distribution Learning) method applies naturally in this field. LDL generalizes single-label learning and multi-label learning into a more widely applicable learning framework. In practical applications LDL yields the distribution of label importance, which effectively addresses the problem of label ambiguity.

Existing work predicts the scoring distribution with support vector machines. That approach treats the scores as independent of one another and does not consider the correlation between them. Such correlation is nevertheless common. In a 1-to-5-point scoring system, for example, a pessimistic user may give 1 point to a film he or she dislikes, whereas an optimistic user may give the same film 2 points or more. Characterizing the popularity of a film with strictly separated scores is therefore inappropriate, and the prediction accuracy of the existing implementation is not high.

Summary of the invention

The present invention provides a scoring distribution prediction method and device that can improve prediction accuracy.

To achieve the above object, the present invention is implemented through the following technical solutions:

In one aspect, the present invention provides a scoring distribution prediction method, comprising:

S1: constructing an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for computing the distance between two scoring latent factor vectors;

S2: determining the first scoring latent factor matrix of the current iteration;

S3: computing the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;

S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving for the first derivative of the objective function;

S5: judging whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, taking the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and executing S2;

S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.

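Further, the sign function includes:

$$\operatorname{sgn}(\operatorname{cosine}(\theta_i,\theta_j)) = \begin{cases} 1, & \operatorname{cosine}(\theta_i,\theta_j) > 0 \\ 0, & \operatorname{cosine}(\theta_i,\theta_j) = 0 \\ -1, & \operatorname{cosine}(\theta_i,\theta_j) < 0 \end{cases}$$

Wherein,

$$\operatorname{cosine}(\theta_i,\theta_j) = \frac{\sum_{k=1}^{m}\theta_{ik}\theta_{jk}}{\sqrt{\sum_{k=1}^{m}\theta_{ik}^{2}}\,\sqrt{\sum_{k=1}^{m}\theta_{jk}^{2}}}$$

The distance function includes:

Wherein, θ is the scoring latent factor matrix with c rows and m columns, c is the number of scores, m is the number of features, θ_ik is the value in row i, column k of θ, and θ_jk is the value in row j, column k of θ.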

Further, the objective function includes:

Wherein, t(θ) is a function combination and λ₁ is a preset coefficient.

Further, the function combination includes:

Wherein, x is the sample feature matrix with n rows and m columns, d is the sample scoring matrix with n rows and c columns, n is the number of samples, d_ij is the value in row i, column j of d, x_ik is the value in row i, column k of x, λ₂ is a preset coefficient, and ||θ||_F is the F norm of θ.

Further, S1 includes: constructing, according to the true scoring distribution of a sample and the predicted scoring distribution of the sample, a first function for computing the gap between the true scoring distribution and the predicted scoring distribution;

constructing, according to the sign function for describing the correlation between two scoring latent factor vectors and the distance function for computing the distance between two scoring latent factor vectors, a distance mapping function for describing the correlation between scores;

constructing a second function according to the first function, a regular term and the distance mapping function;

constraining the predicted scoring distribution to satisfy a maximum entropy model so as to simplify the second function, thereby obtaining the objective function.

Further, S3 includes:

determining the fitting matrix of the current iteration;

determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function and solving based on the training set;

computing the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;

determining the search step size of the current iteration according to the search direction;

computing the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction and the search step size.

Further, computing the search direction of the current iteration includes: computing the search direction of the current iteration using formula one;

determining the search step size of the current iteration includes: determining a search step size of the current iteration that satisfies formula group two;

computing the scoring latent factor matrix of the next iteration includes: computing the scoring latent factor matrix of the next iteration using formula three;

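Formula one includes:

$$p^{(l)} = -B^{(l)}\,\nabla T(\theta^{(l)})$$

Formula group two includes:

$$T(\theta^{(l)} + a^{(l)}p^{(l)}) \le T(\theta^{(l)}) + c_1\,a^{(l)}\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

$$(p^{(l)})^{T}\nabla T(\theta^{(l)} + a^{(l)}p^{(l)}) \ge c_2\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

Formula three includes:

$$\theta^{(l+1)} = \theta^{(l)} + a^{(l)}p^{(l)}$$

Wherein, p^(l) is the search direction of the l-th iteration, B^(l) is the fitting matrix of the l-th iteration, B⁽⁰⁾ is the fitting matrix at initialization, ∇T(θ^(l)) is the first derivative of the l-th iteration, a^(l) is the search step size of the l-th iteration, 0 < c₁ < c₂ < 1, (p^(l))^T is the transpose of p^(l), θ^(l) is the scoring latent factor matrix of the l-th iteration, and θ⁽⁰⁾ is the scoring latent factor matrix at initialization.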

Further, computing the fitting matrix of the current iteration includes: computing the fitting matrix of the current iteration using formula four;

Formula four includes:

$$B^{(l+1)} = \left(I - \rho^{(l)} s^{(l)} (u^{(l)})^{T}\right) B^{(l)} \left(I - \rho^{(l)} u^{(l)} (s^{(l)})^{T}\right) + \rho^{(l)} s^{(l)} (s^{(l)})^{T}$$

Wherein, $\rho^{(l)} = \dfrac{1}{(u^{(l)})^{T}s^{(l)}}$, $u^{(l)} = \nabla T(\theta^{(l+1)}) - \nabla T(\theta^{(l)})$, $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$;

Wherein, B^(l) is the fitting matrix of the l-th iteration, B⁽⁰⁾ is the fitting matrix at initialization, I is the identity matrix, θ^(l) is the scoring latent factor matrix of the l-th iteration, θ⁽⁰⁾ is the scoring latent factor matrix at initialization, ∇T(θ^(l)) is the first derivative of the l-th iteration, and (u^(l))^T is the transpose of u^(l).

Further, S6 includes: substituting the second scoring latent factor matrix and the sample feature matrix of the sample to be predicted into a maximum entropy model, and computing the sample scoring matrix of the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;

The maximum entropy model includes:

$$p(y_j \mid x_i;\theta) = \frac{\exp\left(\sum_{k=1}^{m}\theta_{jk}x_{ik}\right)}{\sum_{j'=1}^{c}\exp\left(\sum_{k=1}^{m}\theta_{j'k}x_{ik}\right)}$$

Wherein, θ is the scoring latent factor matrix with c rows and m columns, x is the sample feature matrix with n rows and m columns, c is the number of scores, m is the number of features, n is the number of samples, θ_jk is the value in row j, column k of θ, x_ik is the value in row i, column k of x, p_i = {p(y₁|x_i; θ), p(y₂|x_i; θ), ..., p(y_c|x_i; θ)}, and p_i is the predicted scoring distribution of sample x_i.

Further, after S6, the method further comprises: substituting the sample feature matrix of a preset test set and the second scoring latent factor matrix into the maximum entropy model, and computing a predicted sample scoring matrix;

verifying, using at least one evaluation index, the degree of similarity between the predicted sample scoring matrix and the sample scoring matrix of the test set.

In another aspect, the present invention provides a scoring distribution prediction device for executing any of the above scoring distribution prediction methods, comprising:

an objective function construction unit, configured to construct an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for computing the distance between two scoring latent factor vectors;

a determination unit, configured to determine the first scoring latent factor matrix of the current iteration;

a first processing unit, configured to compute the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;

a first derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve for the first derivative of the objective function;

a second processing unit, configured to judge whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold, and if so, to trigger the scoring distribution prediction unit; otherwise, to take the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and trigger the determination unit;

the scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.

The present invention provides a scoring distribution prediction method and device. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for computing the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and computing from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving for the first derivative of the objective function; when the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the present invention takes the correlation between scoring labels into account, and can therefore improve accuracy when a scoring distribution is predicted on this basis.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a scoring distribution prediction method provided by an embodiment of the present invention;

Fig. 2 is a flowchart of another scoring distribution prediction method provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of a scoring distribution prediction device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a scoring distribution prediction method, which may include the following steps:

Step 101: construct an objective function, wherein the objective function includes a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for computing the distance between two scoring latent factor vectors.

Step 102: determine the first scoring latent factor matrix of the current iteration.

Step 103: compute the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix.

Step 104: substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve for the first derivative of the objective function.

Step 105: judge whether the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, execute step 106; otherwise, take the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration and execute step 102.

Step 106: obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.

An embodiment of the present invention provides a scoring distribution prediction method. The method comprises: constructing an objective function that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for computing the distance between two scoring latent factor vectors; inputting training set data, determining the scoring latent factor matrix of the current iteration, and computing from it the scoring latent factor matrix of the next iteration to obtain a new matrix; substituting the new matrix into the objective function and solving for the first derivative of the objective function; when the F norm of the first derivative at the new matrix is less than a preset termination threshold, obtaining the scoring distribution prediction result of the sample to be predicted according to the new matrix; otherwise, taking the new matrix as the scoring latent factor matrix of the current iteration and repeating the cycle. By building the sign function and the distance function into the objective function, the embodiment takes the correlation between scoring labels into account, and can therefore improve accuracy when a scoring distribution is predicted on this basis.

The label distribution learning method provided by the embodiments of the present invention, which takes the correlation between scoring labels into account, can predict the scoring distribution of a sample. For example, the method can be applied to film scoring distribution prediction to predict the scoring distribution of film samples. The method can of course also be applied to scoring distribution prediction in other industries, such as novels and TV series.

Taking film score prediction as an example, the score of a new film can be predicted from the features of the new film together with the features and scores of the existing films in a preset training set. This requires the correlation between features and scores, that is, the feature-scoring matrix θ with c rows and m columns, where c is the number of scores and m is the number of features.

Since θ is a latent, unknown matrix, θ is called the scoring latent factor matrix in the embodiments of the present invention. θ also expresses the correlation between two scoring labels, and is therefore also called the distance mapping matrix.

In detail, θ is the scoring latent factor matrix with c rows and m columns, and therefore reflects the correspondence between features and scores. In film scoring distribution prediction, the features are film features, such as the director, leading actors and screenwriter of the film; the scores are film scores, for example the scoring labels 1 point, 2 points, 3 points, 4 points and 5 points.

Since θ is unknown, a θ can be initialized and taken as the scoring latent factor matrix of the 0th iteration in order to execute steps 102 to 106: θ of the 1st iteration is computed from θ of the 0th iteration, and the first derivative is solved and compared with the termination threshold based on θ of the 1st iteration. If the judgment in step 105 is yes, θ of the 1st iteration is regarded as the ideal θ, and the score of the new film can be predicted from it. If the judgment in step 105 is no, steps 102 to 106 are executed again: θ of the 2nd iteration is computed from θ of the 1st iteration, and the first derivative is solved and compared with the termination threshold based on θ of the 2nd iteration. The cycle continues until the ideal θ has been computed, and the score of the new film can then be predicted from the ideal θ.
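The cycle of steps 102 to 105 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `optimize`, `grad` and `step` are hypothetical, the toy objective 0.5·||θ − target||_F² is not the objective function of the embodiment, and the update is a plain fixed-rate gradient step standing in for the quasi-Newton update.

```python
import numpy as np

def optimize(grad, theta0, step, eps=1e-6, max_iter=1000):
    """Repeat steps 102-105 until the F norm of the first derivative is below eps."""
    theta = np.array(theta0, dtype=float)       # step 102: matrix of the current iteration
    for _ in range(max_iter):
        theta_next = step(theta)                # step 103: matrix of the next iteration
        g = grad(theta_next)                    # step 104: first derivative at the new matrix
        if np.linalg.norm(g, ord="fro") < eps:  # step 105: compare F norm with the threshold
            return theta_next                   # handed on to step 106
        theta = theta_next                      # the new matrix becomes the current one
    return theta

# Toy stand-in objective (assumption): T(theta) = 0.5 * ||theta - target||_F^2
target = np.array([[1.0, -2.0], [0.5, 3.0]])
grad = lambda th: th - target
step = lambda th: th - 0.5 * grad(th)           # fixed-rate gradient step (stand-in update)

theta_star = optimize(grad, np.zeros_like(target), step)
print(theta_star)
```

The termination test is applied to the newly computed matrix, as in step 105, so the matrix that is returned is the one whose derivative norm fell below the threshold.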

In general, θ at initialization and the preset training set are input first, as the inputs of the objective function. After that, each solution of the first derivative is carried out on the known data in the training set and on the θ produced by each iteration.

Since the ideal θ is obtained by continuous optimization based on the objective function and the training set, and the objective function takes the correlation between scoring labels into account, the new-film score predicted from the ideal θ can be regarded as accurate and likely to match the true scoring of the new film.

Referring to step 101 above, in order to take the correlation between scoring labels into account, the objective function should include the above sign function and distance function to describe that correlation. In an embodiment of the present invention, the sign function includes:

$$\operatorname{sgn}(\operatorname{cosine}(\theta_i,\theta_j)) = \begin{cases} 1, & \operatorname{cosine}(\theta_i,\theta_j) > 0 \\ 0, & \operatorname{cosine}(\theta_i,\theta_j) = 0 \\ -1, & \operatorname{cosine}(\theta_i,\theta_j) < 0 \end{cases}$$

Wherein,

$$\operatorname{cosine}(\theta_i,\theta_j) = \frac{\sum_{k=1}^{m}\theta_{ik}\theta_{jk}}{\sqrt{\sum_{k=1}^{m}\theta_{ik}^{2}}\,\sqrt{\sum_{k=1}^{m}\theta_{jk}^{2}}}$$

The distance function includes:

From the above, the sign function describes the correlation between the scoring latent factor vectors θ_i and θ_j: cosine(θ_i, θ_j) computes the cosine of the angle between the two scoring latent factor vectors, and the sign function converts this value into one of three states, namely positive correlation, negative correlation or no correlation.

From the above, the distance function describes the degree of correlation between scoring latent factor vectors.

In the embodiments of the present invention, the sign function and the distance function can be combined to construct the distance mapping function f(θ_i, θ_j), which describes the correlation between scoring labels. Through this distance mapping function the scoring latent factor vectors can be obtained, so that the cosine distance can be used to measure the similarity between two scoring latent factor vectors.

It can be seen that, through the combination of the sign function and the distance function, the objective function, as a scoring distribution learning model, takes the correlation between scoring labels into account, and prediction accuracy can be improved when scoring distribution prediction is carried out based on this model.

Based on the above, in an embodiment of the present invention, the objective function includes:

Wherein, t(θ) is a function combination and λ₁ is a preset coefficient.

In the embodiments of the present invention, the distance mapping function f(θ_i, θ_j) = sgn(cosine(θ_i, θ_j)) · Dis(θ_i, θ_j) can be constructed to describe the correlation between scoring labels.

In detail, f(θ_i, θ_j) multiplies the correlation by the degree of correlation and yields a positively correlated distance, a negatively correlated distance or an uncorrelated result, which describes the correlation between scoring labels well. For example, f(θ_a, θ_b) > 0 with a larger value indicates that when score level a appears, score level b is more likely to appear. Conversely, f(θ_a, θ_b) < 0 with a smaller value indicates that when score level a appears, score level b is less likely to appear.
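The distance mapping function can be illustrated with a short NumPy sketch. It assumes that Dis is the Euclidean distance between two rows of θ (an assumption, since the distance formula itself is not reproduced in this text) and that no row of θ is the zero vector. The name `distance_mapping` is hypothetical.

```python
import numpy as np

def distance_mapping(theta):
    """Return the c x c matrix F with F[i, j] = sgn(cosine(theta_i, theta_j)) * Dis(theta_i, theta_j)."""
    theta = np.asarray(theta, dtype=float)
    norms = np.linalg.norm(theta, axis=1)
    # Cosine between every pair of rows (one row per score).
    cosine = (theta @ theta.T) / np.outer(norms, norms)
    # Assumption: Dis is the Euclidean distance between the two rows.
    diff = theta[:, None, :] - theta[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return np.sign(cosine) * dist

theta = np.array([[1.0, 0.0],    # score 1
                  [2.0, 0.0],    # same direction as score 1
                  [-1.0, 0.0],   # opposite direction to score 1
                  [0.0, 3.0]])   # orthogonal to score 1
F = distance_mapping(theta)
print(F[0])
```

In this toy matrix the second row is positively correlated with the first, the third is negatively correlated and the fourth is uncorrelated, so the first row of `F` contains a positive, a negative and a zero entry respectively.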

Based on the above, in an embodiment of the present invention, the function combination includes:

In detail, introducing the regular term ||θ||_F into the objective function helps avoid overfitting.

In detail, the objective function can be constructed as follows:

First, let X = R^m denote the sample space and Y = {y₁, y₂, ..., y_c} denote the set of scores, where m is the number of features and c is the number of scores. Let S = {(x₁, d₁), (x₂, d₂), ..., (x_n, d_n)} denote the training set of n samples, where x_i ∈ X denotes the i-th sample and d_i = {d_i1, d_i2, ..., d_ic} denotes the scoring distribution of sample x_i. d_ij denotes the degree to which the j-th score describes the i-th sample, and $\sum_{j=1}^{c} d_{ij} = 1$ indicates that all the scores together completely describe a sample x_i.

Next, let p_i = {p(y₁|x_i; θ), p(y₂|x_i; θ), ..., p(y_c|x_i; θ)} denote the predicted scoring distribution of sample x_i. The Kullback-Leibler (KL) divergence, that is, $\sum_{j=1}^{c} d_{ij}\ln\dfrac{d_{ij}}{p(y_j \mid x_i;\theta)}$ for sample x_i, can then be used to compute the gap between the true distribution and the predicted distribution.

Next, the distance mapping function described above is constructed.

After that, using the maximum entropy model and with f(θ_i, θ_j) expressing the correlation between score levels, the following function is obtained:

Finally, since p(y_k|x_i; θ) satisfies the maximum entropy model, the above formula can be simplified to obtain the following formula:

The formula obtained by this simplification can be taken as the constructed objective function.
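As an illustration of the gap between a true and a predicted scoring distribution, the sketch below evaluates the standard KL divergence Σ_j d_j ln(d_j / p_j) per sample. The name `kl_divergence`, the `eps` guard and the convention that terms with d_j = 0 contribute nothing are assumptions of this sketch.

```python
import numpy as np

def kl_divergence(d, p, eps=1e-12):
    """KL divergence sum_j d_j * ln(d_j / p_j), computed along the last axis."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    # Terms with d_j = 0 contribute 0; eps guards against division by zero.
    ratio = np.where(d > 0, d / np.maximum(p, eps), 1.0)
    return np.sum(d * np.log(ratio), axis=-1)

true_dist = np.array([0.5, 0.5])
pred_dist = np.array([0.9, 0.1])
gap = kl_divergence(true_dist, pred_dist)
print(gap)
```

The divergence is zero when the two distributions coincide and grows as the prediction moves away from the true distribution, which is why it can serve as the first term to be minimized.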

Corresponding to the above construction of the objective function, in an embodiment of the present invention, step 101 includes: constructing, according to the true scoring distribution of a sample and the predicted scoring distribution of the sample, a first function for computing the gap between the true scoring distribution and the predicted scoring distribution;

From the above, in the embodiments of the present invention, the first function can be the KL divergence described above, the distance mapping function can be the above f(θ_i, θ_j), the second function can be the above τ(θ), and the objective function can be the above T(θ).

In detail, referring to steps 102 to 106 above, in order to obtain the scoring latent factor matrix used to predict the scoring distribution, this iterative process is executed cyclically until, at some iteration, the judgment in step 105 is yes.

From the above, since θ is unknown, θ⁽⁰⁾ can be initialized and taken as θ of the 0th iteration, and θ of the 1st iteration, θ⁽¹⁾, is computed starting from step 102. Assuming that, based on θ⁽¹⁾, the judgment in step 105 is no, θ of the 2nd iteration, θ⁽²⁾, is computed from θ⁽¹⁾, and the subsequent steps are executed based on θ⁽²⁾. The cycle continues until, at the l-th iteration, the judgment in step 105 based on θ^(l) is yes, and scoring distribution prediction is then carried out with θ of the l-th iteration, θ^(l).

On this basis, a specific implementation is needed for computing the second scoring latent factor matrix of the next iteration from the first scoring latent factor matrix.

In detail, the L-BFGS algorithm is an algorithm, proposed on the basis of Newton's method, for finding the roots of a function. In an embodiment of the present invention, the L-BFGS method can be used to solve the objective function:

$$T(\theta^{(l+1)}) \approx T(\theta^{(l)}) + \nabla T(\theta^{(l)})^{T}\Delta + \frac{1}{2}\Delta^{T}H(\theta^{(l)})\Delta$$

Wherein, Δ = θ^(l+1) − θ^(l), ∇T(θ^(l)) is the first derivative of the l-th iteration, that is, the first derivative of T(θ) obtained after substituting θ^(l) into T(θ), and H(θ^(l)) is the Hessian matrix of the l-th iteration. Further simplification gives:

$$\Delta = -H^{-1}(\theta^{(l)})\,\nabla T(\theta^{(l)})$$

Setting a step size and a direction so that the function value decreases steadily gives θ^(l+1) = θ^(l) + a^(l)p^(l), where p^(l) is the search direction and a^(l) is the search step size.

In detail, the search step size is determined by the following formula group:

$$T(\theta^{(l)} + a^{(l)}p^{(l)}) \le T(\theta^{(l)}) + c_1\,a^{(l)}\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

$$(p^{(l)})^{T}\nabla T(\theta^{(l)} + a^{(l)}p^{(l)}) \ge c_2\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

Wherein, 0 < c₁ < c₂ < 1.

Specifically, a step size can be initialized, for example to 1, and checked against this formula group. If the formula group is satisfied, this step size is taken as the step size of the 0th iteration. If not, a new step size is computed according to a preset step-size computation formula and checked against the formula group again. The cycle continues until a step size that satisfies the formula group is obtained, which is taken as the step size of the 0th iteration. On the same principle, the step size of every iteration can be determined based on this formula group.
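The step-size procedure just described can be sketched as below, assuming the formula group is the pair of (weak) Wolfe conditions with 0 < c₁ < c₂ < 1. The preset step-size computation formula is not given in the text, so this sketch substitutes a doubling and bisection rule. The name `wolfe_step` and the quadratic test objective are hypothetical.

```python
import numpy as np

def wolfe_step(f, grad, theta, p, c1=1e-4, c2=0.9, a=1.0, max_iter=50):
    """Search for a step size a that satisfies both conditions, starting from a = 1."""
    f0 = f(theta)
    slope0 = np.vdot(p, grad(theta))   # p^T grad at the current point (negative for descent)
    lo, hi = 0.0, np.inf
    for _ in range(max_iter):
        new = theta + a * p
        if f(new) > f0 + c1 * a * slope0:            # sufficient decrease fails: step too long
            hi = a
        elif np.vdot(p, grad(new)) < c2 * slope0:    # curvature condition fails: step too short
            lo = a
        else:
            return a                                 # both conditions hold
        # Assumed update rule: double while no upper bound is known, else bisect.
        a = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return a

# Hypothetical test objective: T(theta) = 0.5 * theta^T A theta
A = np.diag([1.0, 10.0])
f = lambda th: 0.5 * th @ A @ th
grad = lambda th: A @ th
theta = np.array([1.0, 1.0])
p = -grad(theta)                       # steepest-descent direction for the demo
a = wolfe_step(f, grad, theta, p)
print(a)
```

`np.vdot` flattens its arguments, so the same function also works when θ and p are c × m matrices rather than vectors.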

Further, since the Hessian matrix is difficult to determine, the core idea of the L-BFGS method is to compute an approximate matrix that fits the Hessian matrix.

On this basis, to illustrate one possible implementation of the iterative optimization of the scoring latent factor matrix, in an embodiment of the present invention, step 103 includes:

determining the fitting matrix of the current iteration;

In detail, the fitting matrix of the 0th iteration can be the fitting matrix at initialization.

Based on the above, in an embodiment of the present invention, computing the search direction of the current iteration includes: computing the search direction of the current iteration using formula one;

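Formula one includes:

$$p^{(l)} = -B^{(l)}\,\nabla T(\theta^{(l)})$$

Formula group two includes:

$$T(\theta^{(l)} + a^{(l)}p^{(l)}) \le T(\theta^{(l)}) + c_1\,a^{(l)}\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

$$(p^{(l)})^{T}\nabla T(\theta^{(l)} + a^{(l)}p^{(l)}) \ge c_2\,(p^{(l)})^{T}\nabla T(\theta^{(l)})$$

Formula three includes:

$$\theta^{(l+1)} = \theta^{(l)} + a^{(l)}p^{(l)}$$

Based on the above, in an embodiment of the present invention, computing the fitting matrix of the current iteration includes: computing the fitting matrix of the current iteration using formula four;

Formula four includes:

$$B^{(l+1)} = \left(I - \rho^{(l)} s^{(l)} (u^{(l)})^{T}\right) B^{(l)} \left(I - \rho^{(l)} u^{(l)} (s^{(l)})^{T}\right) + \rho^{(l)} s^{(l)} (s^{(l)})^{T}$$

Wherein, $\rho^{(l)} = \dfrac{1}{(u^{(l)})^{T}s^{(l)}}$, $u^{(l)} = \nabla T(\theta^{(l+1)}) - \nabla T(\theta^{(l)})$, $s^{(l)} = \theta^{(l+1)} - \theta^{(l)}$;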
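Formula four can be sketched in NumPy as follows. The sketch assumes that ρ = 1 / (uᵀs), that u is the difference between successive first derivatives, and that θ and its derivative are flattened into vectors so that B is a square matrix. The name `update_fitting_matrix` and the sample values are hypothetical.

```python
import numpy as np

def update_fitting_matrix(B, s, u):
    """One update of the fitting matrix B from the step s and the derivative change u."""
    s = np.asarray(s, dtype=float).reshape(-1)   # s = theta_next - theta, flattened
    u = np.asarray(u, dtype=float).reshape(-1)   # u = grad_next - grad, flattened
    rho = 1.0 / np.dot(u, s)                     # assumption: rho = 1 / (u^T s)
    I = np.eye(s.size)
    left = I - rho * np.outer(s, u)
    right = I - rho * np.outer(u, s)
    return left @ B @ right + rho * np.outer(s, s)

B0 = np.eye(3)                                   # fitting matrix at initialization
s = np.array([1.0, 2.0, 3.0])
u = np.array([2.0, 1.0, 4.0])
B1 = update_fitting_matrix(B0, s, u)
print(B1)
```

A useful sanity check on any implementation of this update is that the new matrix maps u back to s and stays symmetric when the previous matrix was symmetric.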

In an embodiment of the present invention, step 106 includes: substituting the second scoring latent factor matrix and the sample feature matrix of the sample to be predicted into the maximum entropy model, and computing the sample scoring matrix of the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;

The maximum entropy model includes:

$$p(y_j \mid x_i;\theta) = \frac{\exp\left(\sum_{k=1}^{m}\theta_{jk}x_{ik}\right)}{\sum_{j'=1}^{c}\exp\left(\sum_{k=1}^{m}\theta_{j'k}x_{ik}\right)}$$

In detail, since the sample feature matrix of the sample to be predicted is known and the scoring latent factor matrix after iterative optimization is known, the sample scoring matrix of the sample to be predicted can be computed, which gives the scoring distribution prediction result of the sample to be predicted.

Based on the above maximum entropy model, after the sample feature matrix of the sample to be predicted and the iteratively optimized scoring latent factor matrix are substituted into the maximum entropy model, the degree to which the j-th score describes the i-th sample is obtained, and hence the degree to which each score describes the i-th sample, which gives the scoring distribution prediction result of the sample to be predicted.
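The prediction step can be sketched as below, assuming the maximum entropy model takes the usual normalized-exponential (softmax) form in which the degree of score j for sample i is proportional to exp(Σ_k θ_jk x_ik). The name `predict_distribution` and the sample values are hypothetical.

```python
import numpy as np

def predict_distribution(theta, x):
    """theta: c x m latent factor matrix, x: n x m features. Returns an n x c scoring matrix."""
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    logits = x @ theta.T                          # logits[i, j] = sum_k theta[j, k] * x[i, k]
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

theta = np.array([[0.2, -0.1, 0.0],
                  [0.0, 0.3, 0.1],
                  [-0.2, 0.1, 0.4]])              # c = 3 scores, m = 3 features
x = np.array([[1.0, 0.5, -1.0],
              [0.0, 2.0, 1.0]])                   # n = 2 samples to be predicted
p = predict_distribution(theta, x)
print(p)
```

Each row of the result is the predicted scoring distribution of one sample, so its entries are non-negative and sum to 1.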

In summary, created symbol function and the distance function in objective function of novelty of the embodiment of the present invention, to examine The correlation between scoring label is considered, to improve score in predicting accuracy.In this way, the scoring that can be gone out based on iteration optimization is latent The verification of prediction accuracy raising is carried out in factor matrix.

In detail, known sample can be divided into two parts, a part corresponds to training set, and a part corresponds to test Collection.Training set is mainly used for feature according to internal known sample, scoring situation, 101~105 is changed through the above steps Scoring latent factor matrix after generation optimization.Checksum set is mainly used for the feature according to internal known sample, scoring situation, to To iteration optimization after scoring latent factor matrix tested.

Based on this, in an embodiment of the invention, after the step 106, this method be may further include: will The preset corresponding sample characteristics matrix of test set and the second scoring latent factor matrix substitute into the maximum entropy model, meter Calculate forecast sample rating matrix；

In detail, due to the corresponding sample characteristics matrix of test set and true sample rating matrix it is known that therefore can incite somebody to action Scoring latent factor matrix after the sample characteristics matrix and iteration optimization substitutes into maximum entropy model, calculates the sample scoring of prediction Matrix.Then, the similarity degree between the sample rating matrix of prediction and true sample rating matrix can be compared, with verification Whether the scoring latent factor matrix after iteration optimization has enough accuracys.

In an embodiment of the invention, the above at least one evaluation index may be any one or more of the chi-square distance (Squared χ²), the similarity (Intersection), and the fidelity (Fidelity).

In detail, these evaluation indexes are described in turn below.

The calculation formula of the chi-square distance is:

$$\mathrm{dis}(O, Q) = \sum_{j=1}^{c} \frac{(O_j - Q_j)^2}{O_j + Q_j}$$

where dis denotes distance.

The calculation formula of the similarity is:

$$\mathrm{sim}(O, Q) = \sum_{j=1}^{c} \min(O_j, Q_j)$$

where sim denotes similarity.

The calculation formula of the fidelity is:

$$\mathrm{sim}(O, Q) = \sum_{j=1}^{c} \sqrt{O_j Q_j}$$

where sim denotes similarity.

In addition, j is the subscript of the scoring label and c is the total number of scoring labels. O_j denotes the value corresponding to the j-th scoring label in the true scoring label distribution, and Q_j denotes the value corresponding to the j-th scoring label in the predicted scoring label distribution.

The smaller the calculated chi-square distance, the closer the two distributions and the smaller the gap between them. The larger the calculated similarity, the more similar the two distributions. The larger the calculated fidelity, the closer the predicted distribution is to the true distribution.
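The three evaluation indexes can be sketched in Python as follows. The sketch assumes the standard definitions of these measures for comparing two distributions (squared χ² distance, histogram intersection, and fidelity); the function names and the example distributions are hypothetical.

```python
import numpy as np

def chi_square_distance(o, q):
    """Assumed form: sum_j (O_j - Q_j)^2 / (O_j + Q_j); smaller means closer."""
    o, q = np.asarray(o, dtype=float), np.asarray(q, dtype=float)
    denom = o + q
    mask = denom > 0                      # skip labels where both values are zero
    return float(np.sum((o[mask] - q[mask]) ** 2 / denom[mask]))

def intersection_similarity(o, q):
    """Assumed form: sum_j min(O_j, Q_j); larger means more similar."""
    o, q = np.asarray(o, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.minimum(o, q)))

def fidelity_similarity(o, q):
    """Assumed form: sum_j sqrt(O_j * Q_j); larger means closer."""
    o, q = np.asarray(o, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(o * q)))

# Hypothetical true and predicted scoring label distributions over c = 2 labels.
true_dist = [0.5, 0.5]
pred_dist = [1.0, 0.0]
dis = chi_square_distance(true_dist, pred_dist)
inter = intersection_similarity(true_dist, pred_dist)
fid = fidelity_similarity(true_dist, pred_dist)
```

When the two distributions are identical, the distance is 0 and both similarities equal 1, which matches the interpretation given above.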

Verification yields the experimental results shown in Table 1.

Table 1 Experimental results

Method	Squared χ²	Intersection	Fidelity
LDLLC	0.0412±0.0213	0.8496±0.0238	0.9789±0.0107
LDSVR	0.0887±0.0031	0.8436±0.0027	0.9764±0.0010
S-SVR	0.1040±0.0030	0.8277±0.0023	0.9722±0.0009
M-SVRP	0.1084±0.0033	0.8186±0.0034	0.9710±0.0010
BFGS-LLD	0.1176±0.0042	0.8186±0.0033	0.9683±0.0012
IIS-LLD	0.1195±0.0054	0.8172±0.0044	0.9676±0.0014
AA-kNN	0.1246±0.0062	0.8101±0.0047	0.9664±0.0018
CPNN	0.1625±0.0206	0.7847±0.0150	0.9551±0.0061

Here, LDLLC denotes the scoring label distribution learning algorithm described in the embodiments of the present invention, which takes scoring label correlation into account.

As can be seen from Table 1, compared with the other methods, the predicted distribution obtained with this method is closer to the true distribution: it attains the smallest chi-square distance (0.0412) and the largest similarity (0.8496) and fidelity (0.9789). Therefore, predicting the sample scoring distribution with this method achieves a relatively good prediction effect.

As shown in Fig. 2, an embodiment of the invention provides another scoring distribution forecasting method, which specifically includes the following steps:

Step 201: According to the true scoring distribution corresponding to a sample and the predicted scoring distribution corresponding to the sample, construct a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution.

In detail, the first function may be the corresponding function described above.

Step 202: According to the sign function for describing the correlation between two scoring latent factor vectors, and according to the distance function for calculating the distance between two scoring latent factor vectors, construct a distance mapping function for describing the correlation between scorings.

In detail, the distance mapping function may be the above-mentioned f(θ_i, θ_j).

Step 203: Construct a second function according to the first function, a regularization term, and the distance mapping function.

In detail, the second function may be the above-mentioned τ(θ).

Step 204: Constrain the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.

In detail, the objective function may be the above-mentioned T(θ).

Step 205: Determine the first scoring latent factor matrix of the current iteration.

In detail, the scoring latent factor matrix at initialization may be used as the scoring latent factor matrix of the 0th iteration.

Step 206: Determine the fitting matrix of the current iteration.

In detail, the fitting matrix at initialization may be used as the fitting matrix of the 0th iteration.

In detail, the fitting matrix of the current iteration may be calculated using formula four above.

Step 207: Determine the first derivative of the current iteration, where the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function, based on the training set.

Step 208: According to the fitting matrix and the first derivative of the current iteration, calculate the search direction of the current iteration.

In detail, the search direction of the current iteration may be calculated using formula one above.

Step 209: According to the search direction, determine the search step size of the current iteration.

In detail, the search step size of the current iteration may be determined using formula group two above.

Step 210: According to the first scoring latent factor matrix, the search direction, and the search step size, calculate the second scoring latent factor matrix of the next iteration.

In detail, the scoring latent factor matrix of the next iteration may be calculated using formula three above.

Step 211: Substitute the second scoring latent factor matrix into the objective function and, based on the training set, solve the first derivative of the objective function.

Step 212: Judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold. If so, execute step 213; otherwise, use the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration again, and execute step 205.

Step 213: Substitute the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculate the sample rating matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted.

Step 214: Substitute the sample feature matrix corresponding to the preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculate the predicted sample rating matrix.

Step 215: Using at least one evaluation index, separately verify the similarity between the predicted sample rating matrix and the sample rating matrix corresponding to the test set.
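The iteration in steps 205 to 212 can be sketched in Python as follows. This is a simplified illustration under several stated assumptions, not the exact procedure of the embodiment: the objective keeps only the maximum-entropy fitting term and the F-norm regularization term and omits the correlation term; the fitting matrix is updated with the standard inverse-form BFGS quasi-Newton rule; and the step size is found by simple backtracking that checks only a sufficient-decrease condition, which is a simplification of a two-constant condition with 0 < c₁ < c₂ < 1. All function names, parameter names, and data are hypothetical.

```python
import numpy as np

def objective_and_gradient(theta, x, d, lam2):
    """Simplified objective (correlation term omitted) and its first derivative."""
    z = x @ theta.T                                   # z[i, j] = sum_k theta[j, k] * x[i, k]
    zmax = z.max(axis=1, keepdims=True)
    e = np.exp(z - zmax)
    log_norm = zmax[:, 0] + np.log(e.sum(axis=1))     # log of the normalising sum per sample
    p = e / e.sum(axis=1, keepdims=True)              # predicted scoring distributions
    value = log_norm.sum() - np.sum(d * z) + lam2 * np.sum(theta ** 2)
    grad = (p - d).T @ x + 2.0 * lam2 * theta         # same shape as theta
    return value, grad

def optimize_theta(x, d, lam2=0.01, tol=1e-5, max_iter=200, c1=1e-4):
    c, m = d.shape[1], x.shape[1]
    theta = np.zeros((c, m))                          # step 205: initial matrix
    B = np.eye(c * m)                                 # step 206: initial fitting matrix (assumed identity)
    value, grad = objective_and_gradient(theta, x, d, lam2)   # step 207
    for _ in range(max_iter):
        if np.linalg.norm(grad) < tol:                # step 212: F norm below threshold
            break
        g = grad.ravel()
        p_dir = -(B @ g)                              # step 208: search direction
        a = 1.0                                       # step 209: backtracking step size
        while True:
            new_theta = theta + a * p_dir.reshape(c, m)        # step 210
            new_value, new_grad = objective_and_gradient(new_theta, x, d, lam2)  # step 211
            if new_value <= value + c1 * a * (g @ p_dir) or a < 1e-12:
                break
            a *= 0.5
        s = (new_theta - theta).ravel()
        u = (new_grad - grad).ravel()
        if u @ s > 1e-12:                             # update only when curvature is positive
            rho = 1.0 / (u @ s)
            V = np.eye(c * m) - rho * np.outer(s, u)
            B = V @ B @ V.T + rho * np.outer(s, s)    # inverse-form BFGS update
        theta, value, grad = new_theta, new_value, new_grad
    return theta

# Hypothetical training data: 30 samples, 3 features, 4 scorings.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 3))
d_train = rng.dirichlet(np.ones(4), size=30)
theta_opt = optimize_theta(x_train, d_train)
```

The returned matrix corresponds to the second scoring latent factor matrix at termination and would then be substituted into the maximum entropy model as in steps 213 and 214. Adding the correlation term would require its derivative as well, which is not attempted here.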

As shown in Fig. 3, an embodiment of the invention provides a scoring distribution forecasting device for executing any of the above scoring distribution forecasting methods, which may include:

An objective function construction unit 301, configured to construct an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;

A determination unit 302, configured to determine the first scoring latent factor matrix of the current iteration;

A first processing unit 303, configured to calculate the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;

A first derivative solving unit 304, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;

A second processing unit 305, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit 306; otherwise, to use the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration again and trigger the determination unit 302;

The scoring distribution prediction unit 306, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.

The information exchange between the units of the above device, their execution processes, and related details are based on the same concept as the method embodiments of the present invention. For details, refer to the description in the method embodiments, which is not repeated here.

In addition, an embodiment of the invention provides a readable medium comprising execution instructions. When a processor of a storage controller executes the execution instructions, the storage controller executes any of the above scoring distribution forecasting methods.

In addition, an embodiment of the invention provides a storage controller, comprising: a processor, a memory, and a bus;

The memory is configured to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller executes any of the above scoring distribution forecasting methods.

In conclusion, the embodiments of the present invention have at least the following beneficial effects:

1. In the embodiments of the present invention, an objective function is constructed that includes a sign function for describing the correlation between two scoring latent factor vectors and a distance function for calculating the distance between two scoring latent factor vectors. Training set data is input, the scoring latent factor matrix of the current iteration is determined, and the scoring latent factor matrix of the next iteration is calculated from it to obtain a new matrix. The new matrix is substituted into the objective function and the first derivative of the objective function is solved. When the value of the F norm of the first derivative at the new matrix is less than the preset termination threshold, the scoring distribution prediction result of the sample to be predicted is obtained according to the new matrix; otherwise, the new matrix is used as the scoring latent factor matrix of the current iteration again, and the cycle repeats. By constructing the sign function and the distance function in the objective function, the embodiments take the correlation between scoring labels into account, so prediction accuracy can be improved when the scoring distribution is predicted on this basis.

2. In the embodiments of the present invention, the key point is to consider the correlation between scoring labels, and a scoring label distribution learning algorithm that considers scoring label correlation is proposed. The algorithm makes full use of movie feature data. First, it constructs a distance mapping function between movie features and scoring distributions; this function has two parts, the first being the sign function and the second being the distance function. Second, it designs a new optimization objective function using the maximum entropy model. Finally, it predicts the scoring distribution by solving the new optimization objective function.

3. In the embodiments of the present invention, compared with other methods, the predicted distribution obtained with this method is closer to the true distribution. Therefore, predicting the sample scoring distribution with this method achieves a relatively good prediction effect.
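The two-part distance mapping function described in point 2 can be sketched as follows. The exact forms of the sign function and the distance function are not reproduced in this passage, so the sketch assumes, purely for illustration, that the sign function is the sign of the Pearson correlation coefficient between two scoring latent factor vectors and that the distance function is the Euclidean distance between them. The function names and example vectors are hypothetical.

```python
import numpy as np

def sign_function(theta_i, theta_j):
    """Assumed form: sign of the Pearson correlation between the two vectors."""
    rho = np.corrcoef(theta_i, theta_j)[0, 1]
    return float(np.sign(rho))

def distance_function(theta_i, theta_j):
    """Assumed form: Euclidean distance between the two vectors."""
    diff = np.asarray(theta_i, dtype=float) - np.asarray(theta_j, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def distance_mapping(theta_i, theta_j):
    """Combine the two parts: the sign carries the direction of the correlation,
    the distance carries how far apart the two scorings are."""
    return sign_function(theta_i, theta_j) * distance_function(theta_i, theta_j)

# Hypothetical latent factor vectors for three scorings (m = 3 features).
theta_a = np.array([1.0, 2.0, 3.0])
theta_b = np.array([2.0, 4.0, 6.0])   # positively correlated with theta_a
theta_c = np.array([3.0, 2.0, 1.0])   # negatively correlated with theta_a
f_ab = distance_mapping(theta_a, theta_b)
f_ac = distance_mapping(theta_a, theta_c)
```

Under these assumptions, positively correlated scorings give a positive value and negatively correlated scorings give a negative value, so the term can distinguish the two cases when it enters the objective function.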

It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that the above are merely preferred embodiments of the present invention, intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. A scoring distribution forecasting method, characterized by comprising:

S1: constructing an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;

S2: determining the first scoring latent factor matrix of the current iteration;

S3: calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;

S4: substituting the second scoring latent factor matrix into the objective function and, based on a preset training set, solving the first derivative of the objective function;

S5: judging whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, executing S6; otherwise, using the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration again, and executing S2;

S6: obtaining the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.

2. The method according to claim 1, characterized in that

The sign function includes:

Wherein,

The distance function includes:

Wherein, θ is the scoring latent factor matrix with c columns and m rows, c is the number of scorings, m is the number of features, θ_ik is the value in row i, column k of θ, and θ_jk is the value in row j, column k of θ.

3. The method according to claim 2, characterized in that

The objective function includes:

Wherein, t(θ) is the function combination and λ₁ is a preset coefficient.

4. The method according to claim 3, characterized in that

The function combination includes:

Wherein, x is the sample feature matrix with n rows and m columns, d is the sample rating matrix with n rows and c columns, n is the number of samples, d_ij is the value in row i, column j of d, x_ik is the value in row i, column k of x, λ₂ is a preset coefficient, and ||θ||_F is the F norm of θ.

5. The method according to claim 1, characterized in that

The S1 includes: according to the true scoring distribution corresponding to a sample and the predicted scoring distribution corresponding to the sample, constructing a first function for calculating the gap between the true scoring distribution and the predicted scoring distribution;

According to the sign function for describing the correlation between two scoring latent factor vectors, and according to the distance function for calculating the distance between two scoring latent factor vectors, constructing a distance mapping function for describing the correlation between scorings;

Constructing a second function according to the first function, a regularization term, and the distance mapping function;

Constraining the predicted scoring distribution to satisfy the maximum entropy model so as to simplify the second function, thereby obtaining the objective function.

6. The method according to claim 1, characterized in that

The S3 comprises:

Determining the fitting matrix of the current iteration;

Determining the first derivative of the current iteration, wherein the first derivative of the current iteration is the first derivative of the objective function obtained by substituting the first scoring latent factor matrix into the objective function, based on the training set;

Calculating the search direction of the current iteration according to the fitting matrix and the first derivative of the current iteration;

Determining the search step size of the current iteration according to the search direction;

Calculating the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix, the search direction, and the search step size.

7. The method according to claim 6, characterized in that

Calculating the search direction of the current iteration comprises: calculating the search direction of the current iteration using formula one;

Determining the search step size of the current iteration comprises: determining the search step size of the current iteration that satisfies formula group two;

Calculating the scoring latent factor matrix of the next iteration comprises: calculating the scoring latent factor matrix of the next iteration using formula three;

The formula one includes:

The formula group two includes:

The formula three includes: θ^(l+1)=θ^(l)+a^(l)p^(l)；

Wherein, p^(l) is the search direction at the l-th iteration, B^(l) is the fitting matrix at the l-th iteration, B⁽⁰⁾ is the fitting matrix at initialization, the first derivative in the formulas is the first derivative at the l-th iteration, a^(l) is the search step size at the l-th iteration, 0 < c₁ < c₂ < 1, (p^(l))^T is the transpose of p^(l), θ^(l) is the scoring latent factor matrix at the l-th iteration, and θ⁽⁰⁾ is the scoring latent factor matrix at initialization;

And/or

Calculating the fitting matrix of the current iteration comprises: calculating the fitting matrix of the current iteration using formula four;

Wherein, s^(l)=θ^(l+1)-θ^(l);

Wherein, B^(l) is the fitting matrix at the l-th iteration, B⁽⁰⁾ is the fitting matrix at initialization, I is the identity matrix, θ^(l) is the scoring latent factor matrix at the l-th iteration, θ⁽⁰⁾ is the scoring latent factor matrix at initialization, the first derivative in the formula is the first derivative at the l-th iteration, and (u^(l))^T is the transpose of u^(l).

8. The method according to any one of claims 1 to 7, characterized in that

The S6 includes: substituting the second scoring latent factor matrix and the sample feature matrix corresponding to the sample to be predicted into the maximum entropy model, and calculating the sample rating matrix corresponding to the sample to be predicted, so as to obtain the scoring distribution prediction result of the sample to be predicted;

The maximum entropy model includes:

$$p(y_j \mid x_i; \theta) = \frac{\exp\left(\sum_{k=1}^{m} \theta_{jk} x_{ik}\right)}{\sum_{j'=1}^{c} \exp\left(\sum_{k=1}^{m} \theta_{j'k} x_{ik}\right)}$$

Wherein, θ is the scoring latent factor matrix with c columns and m rows, x is the sample feature matrix with n rows and m columns, c is the number of scorings, m is the number of features, n is the number of samples, θ_jk is the value in row j, column k of θ, x_ik is the value in row i, column k of x, p_i = {p(y₁|x_i;θ), p(y₂|x_i;θ), ..., p(y_c|x_i;θ)}, and p_i is the predicted scoring distribution corresponding to sample x_i.

9. The method according to claim 8, characterized in that

After the S6, the method further comprises: substituting the sample feature matrix corresponding to a preset test set and the second scoring latent factor matrix into the maximum entropy model, and calculating a predicted sample rating matrix;

Using at least one evaluation index, separately verifying the similarity between the predicted sample rating matrix and the sample rating matrix corresponding to the test set.

10. A scoring distribution forecasting device for executing the scoring distribution forecasting method according to any one of claims 1 to 9, characterized by comprising:

An objective function construction unit, configured to construct an objective function, wherein the objective function includes: a sign function for describing the correlation between two scoring latent factor vectors, and a distance function for calculating the distance between two scoring latent factor vectors;

A determination unit, configured to determine the first scoring latent factor matrix of the current iteration;

A first processing unit, configured to calculate the second scoring latent factor matrix of the next iteration according to the first scoring latent factor matrix;

A first derivative solving unit, configured to substitute the second scoring latent factor matrix into the objective function and, based on a preset training set, solve the first derivative of the objective function;

A second processing unit, configured to judge whether the value of the F norm of the first derivative at the second scoring latent factor matrix is less than a preset termination threshold; if so, to trigger the scoring distribution prediction unit; otherwise, to use the second scoring latent factor matrix as the scoring latent factor matrix of the current iteration again and trigger the determination unit;

The scoring distribution prediction unit, configured to obtain the scoring distribution prediction result of the sample to be predicted according to the second scoring latent factor matrix.