CN107133301A - Probability prediction method and device - Google Patents
Probability prediction method and device
- Publication number
- CN107133301A (application number CN201710289582.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- training sample
- feature
- predicted
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Abstract
The present disclosure relates to a probability prediction method and device. The method includes: determining a score for each training sample according to the features of each training sample and the weight of each feature; sorting the training samples according to their scores; fitting a prediction function according to a prediction condition and the sample values of the sorted training samples; determining a score for an object to be predicted according to its features and the feature weights; and predicting, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted satisfies the prediction condition. The disclosure preserves the ordering relationship between the sample values of the training samples, so the prediction function maintains good predictive ability for all prediction conditions and is not influenced or restricted by any particular condition. Moreover, when the prediction condition changes, the training samples do not need to be re-sorted, which simplifies the training of the prediction function and saves substantial system resources and time.
Description
Technical field
The present disclosure relates to the field of data mining, and in particular to a probability prediction method and device.
Background
Data mining is one step of knowledge discovery in databases (Knowledge Discovery in Databases, KDD). Data mining generally refers to the process of searching, by algorithm, for information hidden in large amounts of data.
As an important tool of data mining, machine learning has made great progress in predicting the probability of occurrence of a single event, and is widely used in all kinds of business scenarios. In the related art, a forecast model is used to predict the probability of occurrence of a specific event. However, when the prediction condition changes, the training samples must be reused to retrain the forecast model in order to predict for the different condition, which adds system burden; moreover, the accuracy of the prediction result is low.
Summary
To overcome the problems in the related art, the present disclosure provides a probability prediction method and device.
According to a first aspect of the embodiments of the present disclosure, there is provided a probability prediction method, including:
determining a score for each training sample according to the features of each training sample and the weight of each feature;
sorting the training samples according to their scores;
fitting a prediction function according to a prediction condition and the sample values of the sorted training samples;
determining a score for an object to be predicted according to the features of the object and the weight of each feature;
predicting, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted satisfies the prediction condition.
For the above method, in a possible implementation, determining a score for each training sample according to the features of each training sample and the weight of each feature includes:
obtaining a training sample set including N training samples, where N is a positive integer;
determining, according to the features of each training sample, the feature vector corresponding to each training sample, where each feature vector includes m features and m is a positive integer;
determining the weight of each feature according to the feature vectors of the training samples;
determining the score of each training sample according to its feature vector and the feature weights.
For the above method, in a possible implementation, determining the weight of each feature according to the feature vectors of the training samples includes:
pairing each training sample in the training sample set with each of the other training samples to form sample pairs;
determining, for each sample pair, the vector difference of the feature vectors of its two training samples;
determining the weight of each feature according to the vector difference of each sample pair.
For the above method, in a possible implementation, determining the weight of each feature according to the vector difference of each sample pair includes:
determining the weight of each feature using Formula 1 and Formula 2:
L(w) = -(1/M) Σ log sigmoid(w^T(x_l - x_r))    (Formula 1)
w_{i+1} = w_i + η (1 - sigmoid(w_i^T(x_l - x_r))) (x_l - x_r)    (Formula 2)
where M represents the number of sample pairs, w represents the weight vector shared by the feature vectors of the training samples, w^T represents the transpose of w, x_l represents the feature vector of the l-th training sample in the training sample set, x_r represents the feature vector of the r-th training sample, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r, w_i represents the result of the i-th iteration, w_{i+1} represents the result of the (i+1)-th iteration, 1 ≤ i ≤ m-1, and η represents the first coefficient.
For the above method, in a possible implementation, determining the score of each training sample according to its feature vector and the feature weights includes:
determining the score s_j of the j-th training sample using Formula 3:
s_j = w^T x_j    (Formula 3)
where w represents the weight vector shared by the feature vectors of the training samples, w = [w_1, w_2, ..., w_m], w_1 ... w_m represent the weights of the 1st to m-th features respectively, w^T represents the transpose of w, x_j represents the feature vector of the j-th training sample, and 1 ≤ j ≤ N.
For the above method, in a possible implementation, fitting a prediction function according to the prediction condition and the sample values of the sorted training samples includes:
determining positive samples according to the prediction condition and the sample values of the sorted training samples, where a positive sample is a training sample whose sample value satisfies the prediction condition;
fitting a positive-sample cumulative function to the determined positive samples;
computing the derivative of the positive-sample cumulative function;
fitting a prediction function according to the derivative and the scores of the training samples.
For the above method, in a possible implementation, fitting a positive-sample cumulative function to the determined positive samples includes:
determining the positive-sample cumulative value C(k) corresponding to the k-th training sample, where C(k) equals the number of positive samples ranked before the k-th training sample, 1 ≤ k ≤ N;
fitting C(1) to C(N) to obtain the positive-sample cumulative function.
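The cumulative values C(1) to C(N) and their fit can be sketched as follows. The sample values, the threshold, and the choice of a cubic polynomial fit via `numpy.polyfit` are illustrative assumptions; the disclosure does not fix a particular fitting family.

```python
import numpy as np

def cumulative_counts(sorted_values, threshold):
    """C(k): number of positive samples (value > threshold) ranked
    strictly before the k-th training sample, for k = 1..N."""
    counts = []
    positives = 0
    for value in sorted_values:
        counts.append(positives)        # positives before sample k
        if value > threshold:
            positives += 1
    return counts

# Sample values already sorted by score (descending); threshold = 1500
sorted_values = [3200, 2000, 1500, 1400, 900]
C = cumulative_counts(sorted_values, threshold=1500)
print(C)  # [0, 1, 2, 2, 2]

# Fit a smooth cumulative function over the ranks 1..N
ranks = np.arange(1, len(C) + 1)
coeffs = np.polyfit(ranks, C, deg=3)
C_fit = np.poly1d(coeffs)               # callable positive-sample cumulative function
```

Differentiating `C_fit` (e.g. with `C_fit.deriv()`) then gives the derivative used in the next step of the fitting procedure.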
According to a second aspect of the embodiments of the present disclosure, there is provided a probability prediction device, including:
a training sample set acquisition module, configured to obtain a training sample set including N training samples, where N is a positive integer;
a feature vector determining module, configured to determine the feature vector corresponding to each training sample, where each feature vector includes m features and m is a positive integer;
a weight determining module, configured to determine the weight of each feature according to the feature vectors of the training samples;
a score determining module, configured to determine the score of each training sample according to its feature vector and the feature weights;
a sorting module, configured to sort the training samples according to their scores;
a positive-sample cumulative function determining module, configured to fit a positive-sample cumulative function according to a prediction threshold and the sample values of the training samples;
a derivative determining module, configured to compute the derivative of the positive-sample cumulative function;
a prediction function determining module, configured to fit a prediction function according to the derivative and the scores of the training samples;
a prediction module, configured to make a prediction for data to be predicted according to the prediction function.
For the above device, in a possible implementation, the weight determining module includes:
a sample-pair forming submodule, configured to pair each training sample in the training sample set with each of the other training samples to form sample pairs, where the sample value of the left sample of each pair is greater than that of the right sample, and all sample pairs are distinct;
a vector difference determining submodule, configured to determine, for each sample pair, the vector difference between the feature vector of the left sample and the feature vector of the right sample;
a weight determining submodule, configured to determine the weight of each feature according to the vector difference of each sample pair.
For the above device, in a possible implementation, the weight determining submodule is configured to:
determine the weight of each feature using Formula 1 and Formula 2:
L(w) = -(1/M) Σ log sigmoid(w^T(x_l - x_r))    (Formula 1)
w_{i+1} = w_i + η (1 - sigmoid(w_i^T(x_l - x_r))) (x_l - x_r)    (Formula 2)
where M represents the number of sample pairs, w represents the weight vector shared by the feature vectors of the training samples, w^T represents the transpose of w, x_l represents the feature vector of the l-th training sample in the training sample set, x_r represents the feature vector of the r-th training sample, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r, w_i represents the result of the i-th iteration, w_{i+1} represents the result of the (i+1)-th iteration, 1 ≤ i ≤ m-1, and η represents the first coefficient.
For the above device, in a possible implementation, the score determining module is configured to:
determine the score s_j of the j-th training sample using Formula 3:
s_j = w^T x_j    (Formula 3)
where w = [w_1, w_2, ..., w_m], w_1 ... w_m represent the weights of the 1st to m-th features respectively, w^T represents the transpose of w, x_j represents the feature vector of the j-th training sample, and 1 ≤ j ≤ N.
For the above device, in a possible implementation, the sorting module is configured to:
sort the training samples in descending order of score.
For the above device, in a possible implementation, the positive-sample cumulative function determining module includes:
a positive-sample cumulative value determining submodule, configured to determine the positive-sample cumulative value C(k) corresponding to the k-th training sample, where C(k) equals the number of training samples ranked before the k-th training sample whose sample values exceed the prediction threshold, 1 ≤ k ≤ N;
a positive-sample cumulative function fitting submodule, configured to fit C(1) to C(N) to obtain the positive-sample cumulative function.
For the above device, in a possible implementation, the prediction module includes:
a score determining submodule, configured to determine the score of the data to be predicted according to its feature vector and the feature weights;
a prediction result determining submodule, configured to determine the prediction result according to the score of the data to be predicted and the prediction function.
According to a third aspect of the embodiments of the present disclosure, there is provided a probability prediction device, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determine a score for each training sample according to the features of each training sample and the weight of each feature;
sort the training samples according to their scores;
fit a prediction function according to a prediction condition and the sample values of the sorted training samples;
determine a score for an object to be predicted according to the features of the object and the weight of each feature;
predict, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted satisfies the prediction condition.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the above method.
The present disclosure determines a score for each training sample according to the features of each training sample and the weight of each feature, sorts the training samples according to their scores, fits a prediction function according to a prediction condition and the sample values of the sorted training samples, determines a score for an object to be predicted according to its features and the feature weights, and predicts, from the score of the object to be predicted and the prediction function, the probability that its object value satisfies the prediction condition. This preserves the ordering relationship between the sample values of the training samples, so the prediction function maintains good predictive ability for all prediction conditions and is not influenced or restricted by any particular condition. Moreover, when the prediction condition changes, the training samples do not need to be re-sorted, which simplifies the training of the prediction function and saves substantial system resources and time.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a probability prediction method according to an exemplary embodiment.
Fig. 2 is a flowchart of step S11 of a probability prediction method according to an example of an exemplary embodiment.
Fig. 3 is a flowchart of step S23 of a probability prediction method according to an example of an exemplary embodiment.
Fig. 4 is a flowchart of step S14 of a probability prediction method according to an example of an exemplary embodiment.
Fig. 5a is a schematic diagram of the positive-sample cumulative function in a probability prediction method according to an example of an exemplary embodiment.
Fig. 5b is a schematic diagram of the derivative of the positive-sample cumulative function in a probability prediction method according to an example of an exemplary embodiment.
Fig. 5c is a schematic diagram of the prediction function in a probability prediction method according to an example of an exemplary embodiment.
Fig. 6 is a block diagram of a probability prediction device according to an exemplary embodiment.
Fig. 7 is a block diagram of a probability prediction device according to an example of an exemplary embodiment.
Fig. 8 is a block diagram of a device 800 for probability prediction according to an exemplary embodiment.
Fig. 9 is a block diagram of a device 1900 for probability prediction according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a prediction method according to an exemplary embodiment. As shown in Fig. 1, the method includes steps S11 to S15.
In step S11, a score is determined for each training sample according to the features of each training sample and the weight of each feature.
As an example of this embodiment, the weight of each feature of the training samples may be determined by a logistic regression model.
As an example of this embodiment, the feature vector of a training sample may be determined from its features, the weight vector may be determined from the weights of the features, and the score of the training sample may be determined as the inner product of its feature vector and the weight vector.
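As a minimal sketch of this inner-product score, with hypothetical weight values (in the method the weights are learned from the training samples, as described below for step S23):

```python
import numpy as np

# Hypothetical feature weights (illustrative values, not from the disclosure)
w = np.array([0.9, -0.4, 0.1, 0.0, 0.3, -0.8])

# Feature vector of training sample a from Table 1 below
x = np.array([1, 0, 1, 0, 0, 0])

score = float(w @ x)   # inner product w^T x
print(score)           # 1.0
```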
In step S12, the training samples are sorted according to their scores.
In a possible implementation, sorting the training samples according to their scores may include: sorting the training samples in descending order of score.
In another possible implementation, sorting the training samples according to their scores may include: sorting the training samples in ascending order of score.
In another possible implementation, sorting the training samples according to their scores may include: normalizing the scores, and sorting the training samples in descending order of the normalized scores.
In another possible implementation, sorting the training samples according to their scores may include: normalizing the scores, and sorting the training samples in ascending order of the normalized scores.
It should be noted that those skilled in the art may choose different ways of normalizing the scores as needed, which is not limited here.
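One of the sorting variants above can be sketched as follows, using min-max normalization followed by a descending sort. The score values are illustrative; the disclosure does not prescribe a particular normalization.

```python
# Hypothetical scores for five training samples
scores = {"a": 1.0, "b": -0.3, "c": 0.2, "d": 1.4, "e": -1.1}

# Min-max normalization maps the scores into [0, 1]
lo, hi = min(scores.values()), max(scores.values())
normalized = {name: (s - lo) / (hi - lo) for name, s in scores.items()}

# Descending order of normalized score
ranked = sorted(normalized, key=normalized.get, reverse=True)
print(ranked)  # ['d', 'a', 'c', 'b', 'e']
```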
In step S13, a prediction function is obtained by fitting according to the prediction condition and the sample values of the sorted training samples.
As an example of this embodiment, the prediction condition may be that the object value of the object to be predicted exceeds a prediction threshold. For example, when predicting the probability that a user's income exceeds 3000, the prediction condition is that the income exceeds 3000, the prediction threshold is 3000, and the object value of the object to be predicted is the income value of the user.
In a possible implementation, the prediction condition may be that the object value of the object to be predicted is below the prediction threshold.
In step S14, a score is determined for the object to be predicted according to its features and the weight of each feature.
As an example of this implementation, the feature vector of the object to be predicted may be determined from its features, the weight vector may be determined from the weights of the features, and the score of the object to be predicted may be computed from its feature vector and the weight vector. The object to be predicted shares the same weight vector as the training samples; in other words, during prediction, the weight of each feature of the object to be predicted may be a known constant.
In step S15, the probability that the object value of the object to be predicted satisfies the prediction condition is predicted according to the score of the object and the prediction function.
As an example of this embodiment, substituting the score of the object to be predicted into the prediction function directly yields the probability that the object value of the object satisfies the prediction condition.
As another example of this embodiment, when the prediction condition changes, a prediction function for the changed condition can be fitted, by the method of step S13 above, from the changed prediction condition and the known ranking of the training samples; then, according to the features of the object to be predicted, the probability that its object value satisfies the changed condition can be determined by the methods of steps S14 and S15.
This embodiment determines a score for each training sample according to the features of each training sample and the weight of each feature, sorts the training samples according to their scores, fits a prediction function according to the prediction condition and the sample values of the sorted training samples, determines a score for the object to be predicted according to its features and the feature weights, and predicts, from that score and the prediction function, the probability that the object value satisfies the prediction condition. This embodiment preserves the ordering relationship between the sample values of the training samples, so the prediction function maintains good predictive ability for all prediction conditions and is not influenced or restricted by any particular condition. Moreover, when the prediction condition changes, the training samples do not need to be re-sorted, which simplifies the training of the prediction function and saves substantial system resources and time.
Fig. 2 is a flowchart of step S11 of a probability prediction method according to an example of an exemplary embodiment. As shown in Fig. 2, step S11 includes steps S21 to S24.
In step S21, a training sample set is obtained; the training sample set includes N training samples, where N is a positive integer.
Here, the training sample set may be a set of samples used to train the model. In the training sample set, the sample value of each training sample is known.
In step S22, the feature vector corresponding to each training sample is determined according to its features, where each feature vector includes m features and m is a positive integer.
In this embodiment, the values of all the features of a training sample may be represented by the feature vector corresponding to that training sample. For example, if a training sample has m features, the training sample may be represented by an m-dimensional feature vector A = (A_1, A_2, ..., A_m).
In step S23, the weight of each feature is determined according to the feature vectors of the training samples.
As an example of this embodiment, the weight of each feature may be determined by a PWLR (Pairwise Logistic Regression) model, that is, a logistic regression model based on paired training samples.
In a possible implementation, the weight of each feature may be determined by an LR (Logistic Regression) model.
It should be noted that those skilled in the art may use different models (such as a Bayesian forecasting model) to determine the weight of each feature as needed, which is not limited here.
In step S24, the score of each training sample is determined according to its feature vector and the feature weights.
As an example of this embodiment, determining the score of each training sample according to its feature vector and the feature weights may include: determining the score s_j of the j-th training sample using Formula 3:
s_j = w^T x_j    (Formula 3)
where w represents the weight vector shared by the feature vectors of the training samples, w = [w_1, w_2, ..., w_m], w_1 ... w_m represent the weights of the 1st to m-th features respectively, w^T represents the transpose of w, x_j represents the feature vector of the j-th training sample, and 1 ≤ j ≤ N.
By computing a score for each training sample, this embodiment effectively preserves the ordering relationship between the sample values of the training samples.
Fig. 3 is a flowchart of step S23 of a probability prediction method according to an example of an exemplary embodiment. As shown in Fig. 3, step S23 includes steps S31 to S33.
In step S31, each training sample in the training sample set is paired with each of the other training samples to form sample pairs.
In a possible implementation, the training samples in the training sample set may be formed into sample pairs by a pairwise training model, where each training sample is paired with each of the other training samples, the sample value of the left sample of each pair is greater than that of the right sample, and all sample pairs are distinct. The pairwise training model focuses on the ordinal relationship between samples: any two training samples in the training sample set form a training sample pair, and the ordering relationship of the two training samples in the pair is determined.
As an example of this implementation, the sample value of the left sample of each pair is greater than that of the right sample.
As another example of this implementation, the sample value of the right sample of each pair is greater than that of the left sample.
In another possible implementation, the training samples in the training sample set may be sorted by a listwise training model.
In an exemplary application scenario, for example, the true income of users is to be predicted. As shown in Table 1, the training sample set includes 5 training samples named a, b, c, d and e, where the sample value of each training sample is the user's true income.
Table 1
| Training sample | Sample value | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Feature 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| a | 2000 | 1 | 0 | 1 | 0 | 0 | 0 |
| b | 1400 | 0 | 1 | 1 | 0 | 1 | 0 |
| c | 1500 | 0 | 1 | 0 | 0 | 1 | 0 |
| d | 3200 | 1 | 0 | 0 | 0 | 1 | 0 |
| e | 900 | 0 | 1 | 1 | 0 | 0 | 1 |
Through the pairwise training model, according to the sample-value ordering of the training samples, training samples a, b, c, d and e can form 10 sample pairs: (a,b), (a,c), (a,e), (b,e), (c,b), (c,e), (d,a), (d,b), (d,c) and (d,e).
In a possible implementation, a sample-pair filtering condition may be set: after the training samples in the training sample set are formed into sample pairs by the pairwise training model, the pairs that satisfy the filtering condition are selected as the pairs used to train the forecast model. For example, when the filtering condition is that the difference between the sample values of the two training samples in a pair exceeds 500, then among the above 10 sample pairs, (a,b), (a,e), (c,e), (d,a), (d,b), (d,c) and (d,e) satisfy the condition, and these pairs are used as the sample pairs for training the forecast model. It should be noted that those skilled in the art may select different sample-pair filtering conditions as needed, which is not limited here.
By filtering for pairs whose sample-value difference exceeds a specified threshold, the influence of pairs with too small a difference on the ranking result can be avoided, so that the sample ordering better reflects the ordering of the training sample values.
In step S32, the vector difference of the feature vectors of the two training samples of each sample pair is determined.
For example, according to Table 1, for the sample pair (a,b), the feature vector of training sample a is x_a = (1,0,1,0,0,0) and the feature vector of training sample b is x_b = (0,1,1,0,1,0), so the vector difference of x_a and x_b is x_{a-b} = (1,-1,0,0,-1,0).
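Steps S31 and S32 can be sketched on the Table 1 data as follows: form (left, right) pairs with the left sample value greater than the right, keep the pairs whose value difference exceeds 500, and compute the feature-vector difference of each kept pair.

```python
# (sample value, feature vector) for the five training samples of Table 1
samples = {
    "a": (2000, [1, 0, 1, 0, 0, 0]),
    "b": (1400, [0, 1, 1, 0, 1, 0]),
    "c": (1500, [0, 1, 0, 0, 1, 0]),
    "d": (3200, [1, 0, 0, 0, 1, 0]),
    "e": ( 900, [0, 1, 1, 0, 0, 1]),
}

# Step S31: ordered pairs with left sample value > right sample value
pairs = [(l, r) for l in samples for r in samples
         if samples[l][0] > samples[r][0]]            # 10 pairs

# Filtering condition: sample-value difference > 500
kept = [(l, r) for l, r in pairs
        if samples[l][0] - samples[r][0] > 500]       # 7 pairs remain

# Step S32: vector difference x_l - x_r for each kept pair
diffs = {(l, r): [a - b for a, b in zip(samples[l][1], samples[r][1])]
         for l, r in kept}
print(diffs[("a", "b")])   # [1, -1, 0, 0, -1, 0]
```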
In step S33, according to each sample to corresponding vector difference, the weight of each feature is determined.
As an example of this embodiment, M sample pairs are formed from the N training samples in the training sample set, and the PWLR model for the M sample pairs is:
y = sigmoid(d_{l-r})    (Equation 4)
where d_{l-r} = s_l - s_r, s_l = w^T x_l, s_r = w^T x_r, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r; x_l denotes the feature vector of training sample l in the training sample set, and x_r denotes the feature vector of training sample r; training sample l and training sample r form one of the M sample pairs; s_l and s_r denote the scores of training samples l and r, and d_{l-r} denotes the difference of their scores; w denotes the weight vector shared by the feature vectors of all training samples, and w^T denotes the transpose of w.
The sigmoid function is a threshold function that maps a variable into the interval [0, 1]:
sigmoid(x) = 1 / (1 + e^(-x))
Determining the weight of each feature according to the vector differences of the sample pairs includes establishing the logarithmic loss function for Equation 4, as shown in Equation 1, where 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r; M denotes the number of sample pairs, w denotes the weight vector shared by the feature vectors of the training samples, w^T denotes the transpose of w, x_l denotes the feature vector of training sample l in the training sample set, and x_r denotes the feature vector of training sample r.
Based on Equation 1, the weight vector w can be obtained by the stochastic gradient descent algorithm, as shown in Equation 2, where w_i denotes the result of the i-th iteration, w_{i+1} denotes the result of the (i+1)-th iteration, 1 ≤ i ≤ m-1, and η denotes the first coefficient.
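Since Equations 1 and 2 themselves are not reproduced in this text, the following sketch assumes the standard pairwise logistic loss log(1 + exp(-w^T(x_l - x_r))) for each pair whose left sample has the larger sample value, minimized by stochastic gradient descent; the learning rate eta plays the role of the first coefficient η.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pwlr(pair_diffs, eta=0.1, epochs=200, seed=0):
    """SGD on the assumed pairwise logistic loss. Each row of
    pair_diffs is x_l - x_r for a pair whose left sample has the
    larger sample value; w is learned so that w.d tends positive."""
    rng = np.random.default_rng(seed)
    w = np.zeros(pair_diffs.shape[1])
    for _ in range(epochs):
        for d in rng.permutation(pair_diffs):
            # gradient of log(1 + exp(-w.d)) w.r.t. w is -(1 - sigmoid(w.d)) * d
            w += eta * (1.0 - sigmoid(w @ d)) * d
    return w

# Toy pairs: feature 0 should receive a positive weight, feature 1 negative.
diffs = np.array([[1.0, -1.0], [1.0, 0.0], [0.0, -1.0]])
w = train_pwlr(diffs)
```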
In this embodiment, the weight of each feature is obtained by calculation, which enhances the predictive ability of the prediction function.
Fig. 4 is a flowchart of step S14 of a probability prediction method according to an example of an exemplary embodiment. As shown in Fig. 4, step S14 includes steps S41 to S44.
In step S41, positive samples are determined according to the prediction condition and the sample values of the sorted training samples, where a positive sample is a training sample whose sample value meets the prediction condition.
In step S42, a positive-sample cumulative function is obtained by fitting the determined positive samples.
As an example of this embodiment, the positive-sample accumulated value C(k) of the k-th training sample is determined, where C(k) equals the number of positive samples ranked at or before the k-th training sample, 1 ≤ k ≤ N; the positive-sample cumulative function is obtained by fitting C(1) to C(N).
It should be noted that those skilled in the art can select a different function model for fitting as needed, provided that the fitted positive-sample cumulative function is continuously differentiable and properly reflects how the positive-sample accumulated value varies with the number of samples.
In step S43, the derivative of the positive-sample cumulative function is calculated.
In step S44, the prediction function is obtained by fitting the derivative to the scores of the training samples.
In an exemplary application scenario, suppose the probability that a user's true income exceeds 9000 is to be predicted. As shown in Table 2, 5 samples are randomly chosen from 1000 true-income samples as training samples, numbered 1 to 5. The sample value of each training sample is the user's true income; each training sample has 10 features, with feature values in the interval [0, 1]; the prediction condition is that the object value of the object to be predicted is greater than 9000.
Table 2
The following results can be calculated according to Equations 1, 2 and 3:
(1) Weight vector:
w = (513.011, 61.2365, -576.9, 305.469, 311.203, -110.902, -130.758, -194.149, -240.42, -203.289)
(2) Scores of training samples 1, 2, 3, 4 and 5:
s1 = -648.2501, s2 = -346.0684, s3 = -17.3523, s4 = 40.6698, s5 = 172.8547.
(3) Sorting training samples 1, 2, 3, 4 and 5 in ascending order of score yields the order: training sample 1, training sample 2, training sample 3, training sample 4, training sample 5.
According to the prediction condition, the positive samples are determined to be training samples 3, 4 and 5.
From the determined positive samples and the sample order in result (3), the positive-sample accumulated value C(k) of each training sample, 1 ≤ k ≤ 5, is:
C(1) = 0, C(2) = 0, C(3) = 1, C(4) = 2, C(5) = 3.
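The accumulated values C(k) follow directly from the sorted order and the positive-sample set:

```python
# Sorted order from result (3) and positive samples from the example.
sorted_samples = [1, 2, 3, 4, 5]   # ascending order of score
positives = {3, 4, 5}              # samples whose value exceeds 9000

C, count = [], 0
for sample in sorted_samples:
    count += sample in positives   # a bool counts as 0 or 1
    C.append(count)                # C(k) after the k-th sample
```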
In this example, the positive-sample cumulative function can be fitted to the discrete points of C(k) as a function of k. A piecewise cubic polynomial E(o) can be used to fit the accumulated values C(k), where o denotes the number of training samples, o ∈ (0, +∞), and E(o) is required to be continuous at the segment boundaries, with continuous first and second derivatives.
The number of segments is adjusted during fitting, and the n-th segment of the optimized segmentation is taken as the positive-sample cumulative function E_n(o), as shown in Equation 5:
E_n(o) = q_n o^3 + t_n o^2 + u_n o + v_n    (Equation 5)
where o denotes the number of training samples, o ∈ (0, +∞), q_n is the second coefficient, t_n the third coefficient, u_n the fourth coefficient and v_n the fifth coefficient.
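As a simplified, single-segment stand-in for the piecewise cubic E_n(o) (the patent's actual fit adjusts the number of segments, which is not shown here), one cubic can be least-squares fitted to the points (k, C(k)) of the worked example:

```python
import numpy as np

k = np.arange(1, 6)                       # sample positions 1..5
C = np.array([0, 0, 1, 2, 3], float)      # accumulated values from the example

# Least-squares fit of q*o**3 + t*o**2 + u*o + v to (k, C(k)),
# a one-segment simplification of Equation 5.
coeffs = np.polyfit(k, C, deg=3)
E = np.poly1d(coeffs)
```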
Fig. 5a is a schematic diagram of the positive-sample cumulative function E_n(o) in a probability prediction method according to an example of an exemplary embodiment.
In this example, the derivative E_n'(o) of the positive-sample cumulative function can be obtained by differentiating E_n(o), as shown in Equation 6:
E_n'(o) = 3q_n o^2 + 2t_n o + u_n    (Equation 6)
Fig. 5b is a schematic diagram of the derivative E_n'(o) of the positive-sample cumulative function in a probability prediction method according to an example of an exemplary embodiment.
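Equation 6 is simply the term-by-term derivative of Equation 5; with hypothetical coefficients for one segment this can be checked numerically:

```python
import numpy as np

# Illustrative (hypothetical) coefficients for one cubic segment E_n(o).
q, t, u, v = 0.05, -0.2, 0.4, -0.25
E = np.poly1d([q, t, u, v])

# Equation 6: the derivative 3*q*o**2 + 2*t*o + u.
E_prime = E.deriv()
```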
In this example, the prediction function can be obtained by fitting Equation 6 to the ranking scores given in result (2) above. A piecewise quadratic polynomial P(s) can be used for the fit, where P(s) is required to be continuous at the segment boundaries with a continuous first derivative, and its segmentation is consistent with that of the positive-sample cumulative function E_n(o).
In one possible implementation, the ranking scores can be normalized and the prediction function fitted to the normalized scores. For example, the scores can be normalized by Equation 7:
s_n' = [s_n - min(s_n)] / [max(s_n) - min(s_n)]    (Equation 7)
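Equation 7 is a min-max normalization; note the intended grouping is (s_n − min(s_n)) divided by (max(s_n) − min(s_n)). Applied to the scores from result (2):

```python
import numpy as np

# Scores of training samples 1-5 from result (2).
s = np.array([-648.2501, -346.0684, -17.3523, 40.6698, 172.8547])

# Equation 7: map the smallest score to 0 and the largest to 1,
# preserving the ordering of the samples.
s_norm = (s - s.min()) / (s.max() - s.min())
```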
It should be noted that those skilled in the art can select a different function model for fitting the prediction function as needed, provided that the fitted prediction function is continuously differentiable and properly reflects the correspondence between the derivative and the scores of the training samples.
The number of segments is adjusted during fitting, and the n-th segment of the optimized segmentation is taken as the prediction function P_n(s), as shown in Equation 8:
P_n(s) = α_n s^2 + β_n s + γ_n    (Equation 8)
where s denotes the score of a training sample, s ∈ (0, +∞), and α_n, β_n and γ_n denote the sixth, seventh and eighth coefficients.
Fig. 5c is a schematic diagram of the prediction function P_n(s) in a probability prediction method according to an example of an exemplary embodiment.
When the prediction condition changes, a prediction function P_{n2}(s) for the changed prediction condition can be obtained by fitting according to the method of step S13 above; the score of the object to be predicted is determined according to its feature vector and the weights of the features; and the probability that the object value of the object to be predicted meets the changed prediction condition is then determined from P_{n2}(s) by the methods of steps S14 and S15 above.
In practical applications, the dimensionality of sample features is generally very high and the number of training samples is huge, so retraining the model on the huge training set for each different prediction condition is costly. This embodiment does not need to re-sort the training samples; it only needs to fit the prediction function from the known sample ordering and the prediction condition, saving a large amount of system resources and time. Moreover, because the prediction function is obtained by fitting based on the ordering of the training samples and the prediction condition, it has better prediction discrimination and stronger accuracy.
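Putting the final step together: once a segment P_n(s) has been fitted, prediction reduces to evaluating it at the object's score. The coefficients below are illustrative only, and the result is clipped to [0, 1] since a fitted polynomial is not guaranteed to stay in that range:

```python
import numpy as np

def predict_probability(P, score):
    """Evaluate a fitted prediction-function segment P_n(s) at the
    object's score and clip the value into [0, 1]."""
    return float(np.clip(P(score), 0.0, 1.0))

# Hypothetical alpha_n, beta_n, gamma_n for Equation 8.
P_n = np.poly1d([-0.5, 1.4, 0.02])

# Probability for an object whose normalized score is 0.9.
p = predict_probability(P_n, 0.9)
```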
Fig. 6 is a block diagram of a prediction apparatus according to an exemplary embodiment. Referring to Fig. 6, the apparatus includes a training sample score determining module 121, a sorting module 122, a prediction function determining module 123, a to-be-predicted object score determining module 124 and a prediction module 125. The training sample score determining module 121 is configured to determine the score of each training sample according to the features of each training sample and the weight of each feature. The sorting module 122 is configured to sort the training samples according to their scores. The prediction function determining module 123 is configured to obtain the prediction function by fitting according to the prediction condition and the sample values of the sorted training samples. The to-be-predicted object score determining module 124 is configured to determine the score of the object to be predicted according to its features and the weight of each feature. The prediction module 125 is configured to predict, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted meets the prediction condition.
Fig. 7 is a block diagram of a prediction apparatus according to an example of an exemplary embodiment. The apparatus can serve the prediction methods shown in Figs. 1 to 5. For convenience of description, only the parts related to this embodiment are illustrated. Components in Fig. 7 with the same labels as in Fig. 6 have the same functions; for brevity, their detailed description is omitted. As shown in Fig. 7:
In one possible implementation, the training sample score determining module 121 includes a training sample acquisition submodule 1211, a feature vector determining submodule 1212, a weight determining submodule 1213 and a training sample score determining submodule 1214. The training sample acquisition submodule 1211 is configured to acquire a training sample set including N training samples, where N is a positive integer. The feature vector determining submodule 1212 is configured to determine the feature vector of each training sample according to its features, where each feature vector includes m features and m is a positive integer. The weight determining submodule 1213 is configured to determine the weight of each feature according to the feature vectors of the training samples. The training sample score determining submodule 1214 is configured to determine the score of each training sample according to its feature vector and the weight of each feature.
In one possible implementation, the weight determining submodule 1213 includes a sample pair determining submodule, a vector difference determining submodule and a weight calculating submodule. The sample pair determining submodule is configured to combine each training sample in the training sample set with each other training sample into sample pairs. The vector difference determining submodule is configured to determine the vector difference of the feature vectors of the two training samples in each sample pair. The weight calculating submodule is configured to determine the weight of each feature according to the vector differences of the sample pairs.
In one possible implementation, the weight calculating submodule is configured to determine the weight of each feature using Equations 1 and 2, where M denotes the number of sample pairs, w denotes the weight vector shared by the feature vectors of the training samples, w^T denotes the transpose of w, x_l denotes the feature vector of the l-th training sample in the training sample set, x_r denotes the feature vector of the r-th training sample in the training sample set, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r, w_i denotes the result of the i-th iteration, w_{i+1} denotes the result of the (i+1)-th iteration, 1 ≤ i ≤ m-1, and η denotes the first coefficient.
In one possible implementation, the training sample score determining submodule 1214 is configured to determine the score s_j of the j-th training sample using Equation 3:
s_j = w^T x_j    (Equation 3)
where w = [w_1, w_2, ..., w_m] denotes the weight vector shared by the feature vectors of the training samples, w_1 ... w_m denote the weights of the 1st to m-th features, w^T denotes the transpose of w, x_j denotes the feature vector of the j-th training sample, and 1 ≤ j ≤ N.
In one possible implementation, the prediction function determining module 123 includes a positive sample determining submodule 1231, a positive-sample cumulative function determining submodule 1232, a derivative determining submodule 1233 and a prediction function determining submodule 1234. The positive sample determining submodule 1231 is configured to determine the positive samples according to the prediction condition and the sample values of the sorted training samples, where a positive sample is a training sample whose sample value meets the prediction condition. The positive-sample cumulative function determining submodule 1232 is configured to obtain the positive-sample cumulative function by fitting the determined positive samples. The derivative determining submodule 1233 is configured to calculate the derivative of the positive-sample cumulative function. The prediction function determining submodule 1234 is configured to obtain the prediction function by fitting the derivative to the scores of the training samples.
In one possible implementation, the positive-sample cumulative function determining submodule 1232 includes a positive-sample accumulated value determining submodule and a positive-sample cumulative function fitting submodule. The positive-sample accumulated value determining submodule is configured to determine the positive-sample accumulated value C(k) of the k-th training sample, where C(k) equals the number of positive samples ranked at or before the k-th training sample, 1 ≤ k ≤ N. The positive-sample cumulative function fitting submodule is configured to obtain the positive-sample cumulative function by fitting C(1) to C(N).
Regarding the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the related method, and will not be elaborated here.
Fig. 8 is a block diagram of a device 800 for prediction according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 8, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components; for example, a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the device 800. Examples of such data include instructions for any application or method operated on the device 800, contact data, phone book data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The power component 806 provides power for the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 800.
The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action, but also the duration and pressure related to the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the device 800 is in an operation mode, such as a call mode, a recording mode or a speech recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, e.g. the display and keypad of the device 800; the sensor component 814 may also detect a change in position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 9 is a block diagram of a device 1900 for prediction according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Fig. 9, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. The applications stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1932 including instructions, where the instructions can be executed by the processing component 1922 of the device 1900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Those skilled in the art will readily conceive of other embodiments of the disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.
It should be appreciated that the disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.
Claims (16)
1. A probability prediction method, characterized by comprising:
determining the score of each training sample according to the features of each training sample and the weight of each feature;
sorting the training samples according to their scores;
obtaining a prediction function by fitting according to a prediction condition and the sample values of the sorted training samples;
determining the score of an object to be predicted according to the features of the object to be predicted and the weight of each feature;
predicting, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted meets the prediction condition.
2. The method according to claim 1, characterized in that determining the score of each training sample according to the features of each training sample and the weight of each feature comprises:
acquiring a training sample set, the training sample set comprising N training samples, where N is a positive integer;
determining the feature vector of each training sample according to its features, where each feature vector comprises m features and m is a positive integer;
determining the weight of each feature according to the feature vectors of the training samples;
determining the score of each training sample according to its feature vector and the weight of each feature.
3. The method according to claim 2, characterized in that determining the weight of each feature according to the feature vectors of the training samples comprises:
combining each training sample in the training sample set with each other training sample into sample pairs;
determining the vector difference of the feature vectors of the two training samples in each sample pair;
determining the weight of each feature according to the vector differences of the sample pairs.
4. The method according to claim 3, characterized in that determining the weight of each feature according to the vector differences of the sample pairs comprises:
determining the weight of each feature using Equations 1 and 2;
where M denotes the number of sample pairs, w denotes the weight vector shared by the feature vectors of the training samples, w^T denotes the transpose of w, x_l denotes the feature vector of the l-th training sample in the training sample set, x_r denotes the feature vector of the r-th training sample in the training sample set, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r, w_i denotes the result of the i-th iteration, w_{i+1} denotes the result of the (i+1)-th iteration, 1 ≤ i ≤ m-1, and η denotes a first coefficient.
5. The method according to claim 2, characterized in that determining the score of each training sample according to its feature vector and the weight of each feature comprises:
determining the score s_j of the j-th training sample using Equation 3:
s_j = w^T x_j    (Equation 3)
where w = [w_1, w_2, ..., w_m] denotes the weight vector shared by the feature vectors of the training samples, w_1 ... w_m denote the weights of the 1st to m-th features, w^T denotes the transpose of w, x_j denotes the feature vector of the j-th training sample, and 1 ≤ j ≤ N.
6. The method according to claim 1, characterized in that obtaining a prediction function by fitting according to the prediction condition and the sample values of the sorted training samples comprises:
determining positive samples according to the prediction condition and the sample values of the sorted training samples, where a positive sample is a training sample whose sample value meets the prediction condition;
obtaining a positive-sample cumulative function by fitting the determined positive samples;
calculating the derivative of the positive-sample cumulative function;
obtaining the prediction function by fitting the derivative to the scores of the training samples.
7. The method according to claim 6, characterized in that obtaining a positive-sample cumulative function by fitting the determined positive samples comprises:
determining the positive-sample accumulated value C(k) of the k-th training sample, where C(k) equals the number of positive samples ranked at or before the k-th training sample, 1 ≤ k ≤ N;
obtaining the positive-sample cumulative function by fitting C(1) to C(N).
8. A probability prediction apparatus, characterized by comprising:
a training sample score determining module, configured to determine the score of each training sample according to the features of each training sample and the weight of each feature;
a sorting module, configured to sort the training samples according to their scores;
a prediction function determining module, configured to obtain a prediction function by fitting according to a prediction condition and the sample values of the sorted training samples;
a to-be-predicted object score determining module, configured to determine the score of an object to be predicted according to the features of the object to be predicted and the weight of each feature;
a prediction module, configured to predict, according to the score of the object to be predicted and the prediction function, the probability that the object value of the object to be predicted meets the prediction condition.
9. The apparatus according to claim 8, characterized in that the training sample score determining module comprises:
a training sample acquisition submodule, configured to acquire a training sample set comprising N training samples, where N is a positive integer;
a feature vector determining submodule, configured to determine the feature vector of each training sample according to its features, where each feature vector comprises m features and m is a positive integer;
a weight determining submodule, configured to determine the weight of each feature according to the feature vectors of the training samples;
a training sample score determining submodule, configured to determine the score of each training sample according to its feature vector and the weight of each feature.
10. The apparatus according to claim 9, characterized in that the weight determining submodule comprises:
a sample-pair determining submodule, configured to form sample pairs by pairing each training sample in the training sample set with each of the other training samples;
a vector-difference determining submodule, configured to determine a vector difference between the feature vectors corresponding to the two training samples of each sample pair; and
a weight calculation submodule, configured to determine the weight of each feature according to the vector difference corresponding to each sample pair.
11. The apparatus according to claim 10, characterized in that the weight calculation submodule is configured to:
determine the weight of each feature using formula 1 and formula 2;
wherein M represents the number of sample pairs, w represents the weight vector shared by the feature vectors of the training samples, w^T represents the transposed vector of w, x_l represents the feature vector corresponding to the l-th training sample in the training sample set, x_r represents the feature vector corresponding to the r-th training sample in the training sample set, 1 ≤ l ≤ N, 1 ≤ r ≤ N, l ≠ r, w_i represents the result of the i-th iteration, w_{i+1} represents the result of the (i+1)-th iteration, 1 ≤ i ≤ m−1, and η represents a first coefficient.
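Formulas 1 and 2 themselves are not reproduced in this text (they appear as images in the published patent). Purely as an illustrative stand-in, one iterative update of the shared weight vector w from a single sample pair, with η as the learning-rate coefficient, could look like the following; the logistic pairwise-ranking loss is an assumption, not the claimed formula:

```python
import math

def pairwise_update(w, x_l, x_r, eta):
    """One illustrative gradient step that pushes w^T x_l above w^T x_r.

    w, x_l, x_r: equal-length lists of floats; eta: learning rate (the
    "first coefficient" of claim 11). The loss log(1 + e^{-margin}) is an
    assumption, not taken from the patent.
    """
    margin = sum(wi * (a - b) for wi, a, b in zip(w, x_l, x_r))
    grad = -1.0 / (1.0 + math.exp(margin))  # derivative of the assumed loss
    return [wi - eta * grad * (a - b) for wi, a, b in zip(w, x_l, x_r)]

# Starting from w = [0, 0], one step raises the weight on the feature
# that separates the pair:
print(pairwise_update([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], 1.0))  # [0.5, 0.0]
```

Iterating such an update over the M sample pairs would yield the per-feature weights w_1 … w_m used by the score determining submodule.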
12. The apparatus according to claim 9, characterized in that the training-sample score determining submodule is configured to:
determine the score s_j of the j-th training sample using formula 3:
s_j = w^T x_j    (formula 3)
wherein w represents the weight vector shared by the feature vectors of the training samples, w = [w_1, w_2, ..., w_m], w_1 ... w_m respectively represent the weights of the 1st to the m-th feature, w^T represents the transposed vector of w, x_j represents the feature vector corresponding to the j-th training sample, and 1 ≤ j ≤ N.
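Formula 3 is simply the dot product of the shared weight vector with a sample's feature vector. A minimal sketch (function and variable names are illustrative):

```python
def sample_score(w, x):
    """s_j = w^T x_j (formula 3): the dot product of the weight vector w
    with the feature vector x of the j-th training sample."""
    return sum(wi * xi for wi, xi in zip(w, x))

print(sample_score([0.5, -1.0, 2.0], [1.0, 2.0, 0.5]))  # 0.5 - 2.0 + 1.0 = -0.5
```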
13. The apparatus according to claim 8, characterized in that the prediction-function determining module comprises:
a positive-sample determining submodule, configured to determine positive samples according to the prediction condition and the sample values of the sorted training samples, wherein a positive sample is a training sample whose sample value satisfies the prediction condition;
a positive-sample cumulative-function determining submodule, configured to obtain a positive-sample cumulative function by fitting according to the determined positive samples;
a derivative determining submodule, configured to calculate a derivative of the positive-sample cumulative function; and
a prediction-function determining submodule, configured to obtain the prediction function by fitting according to the derivative and the score of each training sample.
14. The apparatus according to claim 13, characterized in that the positive-sample cumulative-function determining submodule comprises:
a positive-sample cumulative-value determining submodule, configured to determine a positive-sample cumulative value C(k) corresponding to the k-th training sample, wherein C(k) equals the number of positive samples ranked before the k-th training sample, and 1 ≤ k ≤ N; and
a positive-sample cumulative-function fitting submodule, configured to obtain the positive-sample cumulative function by fitting according to C(1) to C(N).
15. A probability prediction apparatus, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determine a score of each training sample according to features of each training sample and a weight of each feature;
sort the training samples according to the score of each training sample;
obtain a prediction function by fitting according to a prediction condition and the sample values of the sorted training samples;
determine a score of an object to be predicted according to features of the object to be predicted and the weight of each feature; and
predict, according to the score of the object to be predicted and the prediction function, a probability that an object value of the object to be predicted satisfies the prediction condition.
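Taken together, the steps the processor is configured to perform form a pipeline: score, sort, accumulate positives, fit, differentiate, evaluate. The sketch below strings those steps together under stated assumptions that the patent leaves open: ascending sort order, a least-squares straight line as the cumulative-function fit (whose derivative is therefore a constant), and Boolean labels from the prediction condition.

```python
def predict_probability(X, y, predicate, w):
    """Illustrative end-to-end sketch of the claimed pipeline.

    X: list of feature vectors; y: list of sample values;
    predicate: the prediction condition (sample value -> bool);
    w: per-feature weights. Returns the fitted derivative, i.e. the
    predicted probability this minimal sketch assigns to any score.
    """
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    scores = [dot(w, x) for x in X]                         # s_j = w^T x_j
    order = sorted(range(len(X)), key=lambda j: scores[j])  # ascending (assumed)
    flags = [predicate(y[j]) for j in order]
    cum, seen = [], 0                                       # C(k): positives before k
    for f in flags:
        cum.append(seen)
        seen += 1 if f else 0
    s = [scores[j] for j in order]
    n = len(s)
    ms, mc = sum(s) / n, sum(cum) / n
    # Least-squares line C ≈ slope*s + b; the prediction function is its
    # derivative, here the constant slope.
    slope = sum((a - ms) * (c - mc) for a, c in zip(s, cum)) / \
            sum((a - ms) ** 2 for a in s)
    return slope

# Four one-feature samples whose values satisfy the condition only at
# high scores:
p = predict_probability([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1],
                        lambda v: v > 0.5, [1.0])
print(round(p, 3))  # 0.3
```

With a higher-degree fit, the derivative would vary with the score, and the object's own score w^T x would pick out its predicted probability; the linear fit is kept here only for brevity.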
16. A computer-readable storage medium having computer instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710289582.3A CN107133301A (en) | 2017-04-27 | 2017-04-27 | The Forecasting Methodology and device of probability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107133301A true CN107133301A (en) | 2017-09-05 |
Family
ID=59716527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710289582.3A Pending CN107133301A (en) | 2017-04-27 | 2017-04-27 | The Forecasting Methodology and device of probability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107133301A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN108304512A (*) | 2018-01-19 | 2018-07-20 | 北京奇艺世纪科技有限公司 | Video search engine coarse ranking method, device and electronic device |
CN109388782A (*) | 2018-09-29 | 2019-02-26 | 北京小米移动软件有限公司 | Method and device for determining a relation function |
CN113506167A (*) | 2021-07-23 | 2021-10-15 | 北京淇瑀信息科技有限公司 | Ranking-based risk prediction method, device, equipment and medium |
CN113570124A (*) | 2021-07-15 | 2021-10-29 | 上海淇玥信息技术有限公司 | Object assignment method and device, and electronic device |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN108304512A (*) | 2018-01-19 | 2018-07-20 | 北京奇艺世纪科技有限公司 | Video search engine coarse ranking method, device and electronic device |
CN108304512B (*) | 2018-01-19 | 2021-05-25 | 北京奇艺世纪科技有限公司 | Video search engine coarse ranking method and device, and electronic device |
CN109388782A (*) | 2018-09-29 | 2019-02-26 | 北京小米移动软件有限公司 | Method and device for determining a relation function |
CN109388782B (*) | 2018-09-29 | 2023-05-09 | 北京小米移动软件有限公司 | Method and device for determining a relation function |
CN113570124A (*) | 2021-07-15 | 2021-10-29 | 上海淇玥信息技术有限公司 | Object assignment method and device, and electronic device |
CN113506167A (*) | 2021-07-23 | 2021-10-15 | 北京淇瑀信息科技有限公司 | Ranking-based risk prediction method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113743535B | Neural network training method and device, and image processing method and device | |
CN110009090B | Neural network training and image processing method and device | |
CN109389162B | Sample image screening method and device, electronic device and storage medium | |
EP3852044A1 | Method and device for commenting on multimedia resource | |
CN103955481B | Image display method and device | |
CN107491541A | Text classification method and device | |
CN109800325A | Video recommendation method and device, and computer-readable storage medium | |
CN107133301A | The Forecasting Methodology and device of probability | |
CN107798669A | Image defogging method and device, and computer-readable storage medium | |
CN107527053A | Object detection method and device | |
CN107784279A | Target tracking method and device | |
CN109978891A | Image processing method and device, electronic device and storage medium | |
CN107563994A | Image saliency detection method and device | |
CN107527024A | Face attractiveness evaluation method and device | |
CN107230137A | Product information acquisition method and device | |
CN109376771A | Application classification method and device | |
CN107122430A | Search result display method and device | |
US11546663B2 | Video recommendation method and apparatus | |
CN111210844A | Method, device and equipment for determining speech emotion recognition model, and storage medium | |
CN106990989A | Method and device for controlling application installation | |
CN108509406A | Corpus extraction method, device and electronic device | |
CN107729822A | Object recognition method and device | |
CN107135494A | Spam message recognition method and device | |
CN108182002A | Enter key layout method, device, equipment and storage medium | |
CN107729439A | Method, device and system for acquiring multimedia data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||