CN107563542A - Data prediction method and device, and electronic equipment - Google Patents


Info

Publication number
CN107563542A
Authority
CN
China
Prior art keywords
data
sample set
day
linear regression
input
Legal status
Pending
Application number
CN201710650899.5A
Other languages
Chinese (zh)
Inventor
钱瑜
Current Assignee
Advanced New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201710650899.5A
Publication of CN107563542A
Legal status: Pending (current)

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a data prediction method and device, and electronic equipment. The method obtains at least one time factor of a time series; inputs the time factor into a gradient boosting decision tree model; obtains first prediction data output by the gradient boosting decision tree model; inputs the first prediction data into a linear regression model; and obtains second prediction data output by the linear regression model.

Description

Data prediction method and device, and electronic equipment
Technical field
The embodiments of this specification relate to the field of computer technology, and in particular to a data prediction method and device, and electronic equipment.
Background
Data prediction has great practical value, particularly the prediction of future values of a time series. For example, predicting stock price trends in financial markets makes it possible to formulate a reasonable investment strategy that maximizes gains and minimizes losses, and predicting a company's circulating funds makes effective capital management possible. Traditionally, time-series data prediction uses either linear regression alone or GBDT (Gradient Boosting Decision Tree) regression alone.
A more accurate data prediction scheme is therefore needed.
Summary of the invention
The embodiments of this specification provide a data prediction method and device, and electronic equipment.
According to an embodiment of this specification, a data prediction method is provided, the method including:
Obtaining a time factor of a time series;
Inputting the time factor into a gradient boosting decision tree model;
Obtaining first prediction data output by the gradient boosting decision tree model;
Inputting the first prediction data into a linear regression model;
Obtaining second prediction data output by the linear regression model.
According to an embodiment of this specification, a data prediction device is provided, the device including:
An acquiring unit, which obtains a time factor of a time series;
A first input unit, which inputs the time factor into a gradient boosting decision tree model;
A first prediction unit, which obtains first prediction data output by the gradient boosting decision tree model;
A second input unit, which inputs the first prediction data into a linear regression model;
A second prediction unit, which obtains second prediction data output by the linear regression model.
According to an embodiment of this specification, an electronic device is provided, including:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Obtain a time factor of a time series;
Input the time factor into a gradient boosting decision tree model;
Obtain first prediction data output by the gradient boosting decision tree model;
Input the first prediction data into a linear regression model;
Obtain second prediction data output by the linear regression model.
This specification builds on two models, a GBDT regression model and a linear regression model. After the time factor of the time series is obtained, the gradient boosting decision tree model first computes the first prediction data for the time factor; the linear regression model then computes the second prediction data, which is the final prediction. By combining GBDT regression with linear regression, the scheme exploits the ability of linear regression to fit trends, avoiding the difficulty that GBDT regression alone has in fitting trends. It also exploits the ability of GBDT regression to fit nonlinear relationships and handle hierarchical logic (such as logical combinations of conditions), which linear regression alone cannot handle. In this way, the accuracy of data prediction can be improved.
Brief description of the drawings
Fig. 1 is a flowchart of a data prediction method provided by an embodiment of this specification;
Fig. 2 is a hardware structure diagram of the device in which the data prediction device provided by this specification is located;
Fig. 3 is a module diagram of a data prediction device provided by an embodiment of this specification;
Fig. 4 is a schematic diagram of the hardware structure of a server provided by an embodiment of the disclosure;
Fig. 5 is a schematic diagram of a server provided by an embodiment of this specification.
Detailed description
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".
As mentioned above, data prediction has traditionally used either linear regression alone or GBDT (Gradient Boosting Decision Tree) regression alone.
However, predicting data with linear regression alone has drawbacks. Real sample data usually contains a large amount of nonlinearity, so many nonlinear transformations are needed before a complex curve can be fitted. In addition, linear regression cannot solve hierarchical nonlinear problems, such as logical combinations like "AND" and "OR", or distance-based nonlinear problems. For example, when predicting trading volume, suppose there are two time-series features: whether the day is an "active day" and whether it is a "weekend". Normally, trading volume rises on active days and falls on inactive days; it falls on weekends and rises on weekdays. But when a day is both an active day and a weekend (for example, a weekend turned into a working day because of a public-holiday adjustment), trading volume rises even more. Linear regression cannot predict this case, so whenever it occurs, the linear regression prediction deviates substantially.
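The "active day and weekend" example can be checked numerically. The sketch below uses hypothetical trading volumes (not data from this specification) for the four combinations of the two binary features. For a balanced 2x2 design, the least-squares fit of the additive model y = b + w1*a + w2*w equals row mean + column mean - grand mean, so no matrix solver is needed. The residuals stay nonzero because the additive model has no term for the combined effect, whereas a depth-2 decision tree with one leaf per cell would reproduce all four values exactly. The names `volumes` and `additive_least_squares` are illustrative only.

```python
from statistics import mean

# Hypothetical volumes keyed by (is_active_day, is_weekend); the (1, 1) cell
# rises more than the two separate effects would suggest.
volumes = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 8.0, (1, 1): 15.0}


def additive_least_squares(table):
    """Least-squares fit of y = b + w1*a + w2*w on a balanced 2x2 design.

    For this design the OLS fitted value of each cell equals
    row mean + column mean - grand mean.
    """
    grand = mean(table.values())
    row = {a: mean(v for (ka, _), v in table.items() if ka == a) for a in (0, 1)}
    col = {w: mean(v for (_, kw), v in table.items() if kw == w) for w in (0, 1)}
    return {(a, w): row[a] + col[w] - grand for (a, w) in table}


fitted = additive_least_squares(volumes)
residuals = {cell: volumes[cell] - fitted[cell] for cell in volumes}
print(residuals)  # every cell is off by 1.25 in magnitude
```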
Using GBDT regression alone requires stationary sample data in order to fit the trend. Stationary sample data here means that the smaller the degree to which the sample data deviates from the standard deviation, the better. Real sample data is usually non-stationary: because of year-on-year inflation, the sample data grows year by year, so its deviation from the standard deviation also grows year by year. Methods such as differencing or log transformation can compensate for this non-stationary trend, but they only mitigate it and cannot eliminate it completely. The accuracy of GBDT regression predictions is therefore also unsatisfactory.
To provide a more accurate data prediction scheme, refer to the data prediction method embodiment of this specification shown in Fig. 1. As shown in Fig. 1, the method may include the following steps:
Step 110: Obtain a time factor of a time series.
In this specification, a time factor can be expressed as a particular day of a particular year.
Take predicting stock-market trading volume in a financial market as an example. Suppose the current date is July 1, 2017, and tomorrow's (July 2, 2017) trading volume needs to be predicted. The user can input July 2, 2017; correspondingly, the time factor obtained by the server performing the data prediction is July 2, 2017.
It should be noted that the embodiments of this specification can also predict data for multiple time factors at once. For example, to predict data for the coming week, the server can obtain the 7 time factors corresponding to the next 7 days.
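A minimal sketch of how a server might turn "the next 7 days" into time factors. It assumes, for illustration only, that a time factor is encoded as a (year, day-of-year) pair, which matches "a particular day of a particular year" but is not an encoding the specification prescribes; `future_time_factors` is a hypothetical helper.

```python
from datetime import date, timedelta


def future_time_factors(today, days):
    """Return (year, day_of_year) time factors for the `days` days after `today`.

    The (year, day-of-year) encoding is an illustrative assumption.
    """
    factors = []
    for k in range(1, days + 1):
        d = today + timedelta(days=k)
        factors.append((d.year, d.timetuple().tm_yday))
    return factors


# The 7 time factors after July 1, 2017; July 2, 2017 is day 183 of the year.
print(future_time_factors(date(2017, 7, 1), 7))
```

Using the day of the year rather than a calendar date keeps each factor numeric, which is convenient for the tree and regression models described below.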
Step 120: Input the time factor into the gradient boosting decision tree model.
Step 130: Obtain the first prediction data output by the gradient boosting decision tree model.
In this specification, GBDT (Gradient Boosting Decision Tree) is also known as MART (Multiple Additive Regression Tree), GBT (Gradient Boosting Tree), GTB (Gradient Tree Boosting), or GBRT (Gradient Boosting Regression Tree); all of these are referred to as GBDT in this specification. GBDT is an iterative decision tree algorithm composed of multiple decision trees, and the final result is obtained by summing the results of all these trees.
In general, GBDT iteration uses the forward stagewise algorithm, and the weak learners are CART regression trees. In GBDT iteration, suppose the strong learner obtained in the previous iteration is $f_{t-1}(x)$ and its loss is $L(y, f_{t-1}(x))$. The goal of the current iteration is to find a weak learner $h_t(x)$, a CART regression tree, that minimizes the loss of the current round, $L(y, f_t(x)) = L(y, f_{t-1}(x) + h_t(x))$. In other words, the decision tree found in each iteration should make the loss on the samples as small as possible.
In this specification, the gradient boosting decision tree (GBDT) model can be trained as follows:
A1: Obtain a sample set for training; the sample set consists of (time factor, data) pairs.
For example, the following sample set Z:
$$Z = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$$
where $x_i$ denotes the $i$-th time factor and $y_i$ denotes the data of the $i$-th time factor; sample set Z contains n samples.
A2: Train a gradient boosting decision tree model on the sample set using the gradient boosting decision tree regression algorithm.
Typically, the gradient boosting decision tree regression algorithm also requires a maximum number of iterations T and a loss function L.
The maximum number of iterations T can be an empirically chosen value.
The loss function L can be one of the loss functions commonly used in the industry for GBDT regression:
First: squared error, $L(y, f(x)) = (y - f(x))^2$.
Second: absolute loss, $L(y, f(x)) = |y - f(x)|$.
The corresponding negative gradient is:
$$r_i = \operatorname{sign}(y_i - f(x_i))$$
where sign is the sign function.
Third: Huber loss:
$$L(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2, & |y - f(x)| \le \delta \\ \delta\left(|y - f(x)| - \frac{\delta}{2}\right), & |y - f(x)| > \delta \end{cases}$$
where $\delta$ is an empirically chosen threshold.
The corresponding negative gradient is:
$$r_i = \begin{cases} y_i - f(x_i), & |y_i - f(x_i)| \le \delta \\ \delta \operatorname{sign}(y_i - f(x_i)), & |y_i - f(x_i)| > \delta \end{cases}$$
Huber loss is a compromise between squared error and absolute loss: it uses absolute loss for outliers far from the center and squared error for points near the center.
Fourth: quantile loss:
$$L(y, f(x)) = \sum_{y \ge f(x)} \theta\,|y - f(x)| + \sum_{y < f(x)} (1 - \theta)\,|y - f(x)|$$
where $\theta$ is the quantile, which can be an empirically chosen value.
The corresponding negative gradient is:
$$r_i = \begin{cases} \theta, & y_i \ge f(x_i) \\ \theta - 1, & y_i < f(x_i) \end{cases}$$
It should be noted that the above loss functions are merely examples; this specification does not limit the loss function actually used.
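The four losses and their negative gradients can be written as plain per-sample functions. This is a sketch for a single (y, f(x)) pair; the function names are hypothetical, and the subgradient 0 at y = f for the absolute loss is a convention chosen here, not something the specification states.

```python
def sign(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def squared_neg_grad(y, f):
    # L = (y - f)^2, so -dL/df = 2 (y - f); texts using 1/2 (y - f)^2 drop the 2.
    return 2.0 * (y - f)


def absolute_neg_grad(y, f):
    # L = |y - f|, so -dL/df = sign(y - f) (0 chosen at y == f).
    return float(sign(y - f))


def huber_loss(y, f, delta):
    r = abs(y - f)
    return 0.5 * r * r if r <= delta else delta * (r - delta / 2)


def huber_neg_grad(y, f, delta):
    # Residual inside the delta band, clipped to +/- delta outside it.
    r = y - f
    return r if abs(r) <= delta else delta * sign(r)


def quantile_loss(y, f, theta):
    r = abs(y - f)
    return theta * r if y >= f else (1 - theta) * r


def quantile_neg_grad(y, f, theta):
    # theta when under-predicting (y >= f), theta - 1 when over-predicting.
    return theta if y >= f else theta - 1.0
```

The clipping in `huber_neg_grad` is what makes Huber loss robust: a single far-off point contributes at most delta to the gradient that the next tree is fitted to.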
The training process is described below.
First, initialize a weak learner:
$$f_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$
Then perform T rounds of iteration, where T is the maximum number of iterations.
Each iteration t = 1, 2, ..., T proceeds as follows:
B1: Compute the negative gradient for the samples i = 1, 2, ..., n in sample set Z.
The negative gradient can be computed with the following formula:
$$r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$
where $r_{ti}$ is the $i$-th negative gradient, $\partial$ denotes a partial derivative, and $L(y_i, f(x_i))$ is the loss function. That is, each $x_i$ has a corresponding negative gradient, giving the pairs $(x_i, r_{ti})$, i = 1, 2, ..., n.
B2: Fit a CART regression tree to the computed negative gradients.
As described above, a CART regression tree is fitted to the pairs $(x_i, r_{ti})$, i = 1, 2, ..., n. Its leaf node regions are $R_{tj}$, j = 1, 2, ..., J, where J is the number of leaf nodes of the CART regression tree.
B3: Compute the best-fit value for each leaf region of the CART regression tree.
The best-fit values can be computed with the following formula:
$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L(y_i, f_{t-1}(x_i) + c)$$
where $c_{tj}$ is the best-fit value and $R_{tj}$ is a leaf region of the CART regression tree, j = 1, 2, ..., J, with J the number of leaf nodes of the CART regression tree.
B4: Update the strong learner:
$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$
where $I(\cdot)$ is the indicator function.
B5: Use the updated strong learner as the starting point of the next iteration, and repeat until the maximum number of iterations is reached.
Finally, the strong learner obtained in the last iteration, shown below, is taken as the GBDT model:
$$f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$
This completes the construction of the GBDT model.
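Steps B1 to B5 can be sketched compactly under several simplifying assumptions that are not in the specification: a single numeric feature (the time factor encoded as a number), squared-error loss, depth-1 CART trees (stumps), and no shrinkage or learning rate. With squared error the negative gradient is proportional to the residual y - f(x), and the best-fit value of each leaf in B3 is the mean residual in that leaf. `fit_stump` and `train_gbdt` are hypothetical names.

```python
from statistics import mean


def fit_stump(xs, rs):
    """B2/B3: fit a depth-1 CART regression tree to residuals rs.

    Tries every threshold between consecutive distinct x values and keeps the
    split with the lowest squared error. Under squared loss the best-fit value
    c_tj of each leaf region is the mean residual in that region.
    """
    pairs = sorted(zip(xs, rs))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        left = [r for _, r in pairs[:k]]
        right = [r for _, r in pairs[k:]]
        c_left, c_right = mean(left), mean(right)
        sse = sum((r - c_left) ** 2 for r in left) + sum((r - c_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, c_left, c_right)
    if best is None:  # all x equal: a single leaf
        c = mean(rs)
        return lambda x: c
    _, threshold, c_left, c_right = best
    return lambda x: c_left if x <= threshold else c_right


def train_gbdt(xs, ys, T=20):
    """Gradient boosting with squared loss, following steps B1-B5."""
    f0 = mean(ys)  # argmin_c sum (y_i - c)^2 is the mean of y
    trees = []

    def predict(x):
        # f_t(x) = f_0(x) + outputs of all trees fitted so far (B4)
        return f0 + sum(h(x) for h in trees)

    for _ in range(T):
        # B1: y - f is the negative gradient of (y - f)^2 up to a factor 2,
        # which does not matter because B3 re-fits the leaf constants.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))  # B2-B4
    return predict  # final strong learner f_T


model = train_gbdt([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5], T=5)
print(model(2), model(5))
```

A production implementation would typically use deeper trees, a learning rate, and a library such as scikit-learn; the stump version keeps each step visible.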
In this specification, the input sample set can be as follows:
$$Z = \left\{ \begin{array}{l} (x_{11}, y_{11}), (x_{12}, y_{12}), (x_{13}, y_{13}), \ldots, (x_{1n}, y_{1n}) \\ (x_{21}, y_{21}), (x_{22}, y_{22}), (x_{23}, y_{23}), \ldots, (x_{2n}, y_{2n}) \\ \vdots \\ (x_{m1}, y_{m1}), (x_{m2}, y_{m2}), (x_{m3}, y_{m3}), \ldots, (x_{mn}, y_{mn}) \end{array} \right\}$$
where $x_{mn}$ denotes day n of year m (for example, January 2 of year 1 can be written $x_{12}$), and $y_{mn}$ denotes the data for day n of year m.
In a specific embodiment, when the GBDT model is constructed, the input sample set Z also needs to be stationarized to eliminate its mean trend and expansion trend.
The mean trend refers to the year-by-year change in the mean, such as a year-by-year increase, a year-by-year decrease, or alternating increases and decreases. A mean trend in the sample data easily makes the sample data non-stationary, so that GBDT cannot fit the trend.
For example, if the data mean is 1 in year 1, 2 in year 2, and 3 in year 3, the mean trend can be regarded as increasing year by year.
The mean trend of the training sample set can be eliminated as follows:
C1: Use one-variable linear regression on the sample set to fit Formula 1 below, computing the slope and intercept for each year.
$$y_{mn} = x_{mn} w_m + b_m \qquad \text{(Formula 1)}$$
where $x_{mn}$ (the independent variable) is day n of year m in the sample set; $y_{mn}$ (the dependent variable) is the data of day n of year m in the sample set; $w_m$ is the slope for year m; and $b_m$ is the intercept for year m.
C2: Using the slopes and intercepts, eliminate the mean trend of the data in the sample set based on Formula 2 below.
$$Y_{mn} = y_{mn} - (b_m - b_1) - x_{mn} w_m \qquad \text{(Formula 2)}$$
where $Y_{mn}$ is the data of day n of year m after eliminating the mean trend, $y_{mn}$ is the data of day n of year m before eliminating the mean trend, $b_m$ is the intercept for year m, $b_1$ is the intercept for year 1, $x_{mn}$ is day n of year m, and $w_m$ is the slope for year m.
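Formulas 1 and 2 can be sketched with an ordinary least-squares line fitted separately to each year. The sketch assumes, for illustration, that $x_{mn}$ is the day number n within year m; `fit_line` and `remove_mean_trend` are hypothetical names, and the sample values are made up.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope w and intercept b for y = x*w + b (Formula 1)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def remove_mean_trend(years):
    """Apply Formula 2 to each year.

    `years` is a list of (xs, ys) pairs, one per year; index 0 is year 1.
    Returns the detrended values Y_mn for each year.
    """
    fits = [fit_line(xs, ys) for xs, ys in years]
    b1 = fits[0][1]  # intercept of year 1
    detrended = []
    for (xs, ys), (w, b) in zip(years, fits):
        detrended.append([y - (b - b1) - x * w for x, y in zip(xs, ys)])
    return detrended


days = [1, 2, 3, 4]
years = [(days, [10, 12, 14, 16]), (days, [23, 26, 29, 32])]
print(remove_mean_trend(years))  # both years flatten to the year-1 intercept, 8.0
```

Formula 2 subtracts each year's fitted slope term and shifts its intercept to that of year 1, so every year ends up at year 1's level.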
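Using Formula 2, the sample set after eliminating the mean trend can be as follows:
$$Z = \left\{ \begin{array}{l} (x_{11}, Y_{11}), (x_{12}, Y_{12}), (x_{13}, Y_{13}), \ldots, (x_{1n}, Y_{1n}) \\ (x_{21}, Y_{21}), (x_{22}, Y_{22}), (x_{23}, Y_{23}), \ldots, (x_{2n}, Y_{2n}) \\ \vdots \\ (x_{m1}, Y_{m1}), (x_{m2}, Y_{m2}), (x_{m3}, Y_{m3}), \ldots, (x_{mn}, Y_{mn}) \end{array} \right\}$$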
The expansion trend refers to the year-by-year change in the standard deviation, such as a year-by-year increase, a year-by-year decrease, or alternating increases and decreases. An expansion trend in the sample data easily makes the sample data non-stationary, so that GBDT cannot fit the trend.
It should be noted that the expansion trend is eliminated on the basis of the sample set from which the mean trend has already been eliminated.
The expansion trend of the training sample set can be eliminated as follows:
D1: After the mean trend of the data in the sample set has been eliminated, compute the arithmetic mean of each year's data according to Formula 3 below.
The arithmetic mean is also known as the average.
The arithmetic mean is computed with Formula 3:
$$\mu_m = \frac{1}{n} \sum_{i=1}^{n} Y_{mi} \qquad \text{(Formula 3)}$$
where $\mu_m$ is the arithmetic mean for year m, and $Y_{mi}$ is the data of day i of year m after eliminating the mean trend.
Taking the data of year 1 in the sample set as an example, the data of year 1 are $(Y_{11}, Y_{12}, Y_{13}, \ldots, Y_{1n})$.
The arithmetic mean of year 1 is then:
$$\mu_1 = \frac{Y_{11} + Y_{12} + \cdots + Y_{1n}}{n}$$
D2: Compute the standard deviation of each year's data according to Formula 4.
The standard deviation, also called the root-mean-square deviation, reflects the dispersion of a data set.
The standard deviation is computed with Formula 4:
$$\sigma_m = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_{mi} - \mu_m)^2} \qquad \text{(Formula 4)}$$
where $Y_{mi}$ is the data of day i of year m, i = 1, 2, ..., n, $\sigma_m$ is the standard deviation for year m, and $\mu_m$ is the arithmetic mean for year m.
Again taking the data of year 1 in the sample set as an example, the standard deviation of year 1 is:
$$\sigma_1 = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_{1i} - \mu_1)^2}$$
D3: Compute the expansion coefficient of each year according to Formula 5.
The expansion coefficient is computed with Formula 5:
$$P_m = \frac{\sigma_1}{\sigma_m} \qquad \text{(Formula 5)}$$
where $P_m$ is the expansion coefficient for year m, $\sigma_m$ is the standard deviation for year m, and $\sigma_1$ is the standard deviation for year 1.
D4: Using the expansion coefficients, eliminate the expansion trend of the data in the sample set based on Formula 6 below.
$$Y'_{mn} = \mu_m + (Y_{mn} - \mu_m) P_m \qquad \text{(Formula 6)}$$
where $Y'_{mn}$ is the data of day n of year m after eliminating the expansion trend, $\mu_m$ is the arithmetic mean for year m, $Y_{mn}$ is the data of day n of year m before eliminating the expansion trend, and $P_m$ is the expansion coefficient for year m.
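Formulas 3 to 6 rescale each year's spread around its own mean so that every year ends up with year 1's standard deviation. The sketch below assumes the population standard deviation (dividing by n), which is one reading of Formula 4, and uses made-up values; `remove_expansion_trend` is a hypothetical name. A year with zero standard deviation would need special handling, which is omitted here.

```python
from statistics import mean, pstdev


def remove_expansion_trend(years):
    """Formulas 3-6: rescale each year's spread to match year 1.

    `years` holds the mean-detrended values Y_mn, one list per year; index 0
    is year 1. Uses the population standard deviation (1/n), an assumption.
    """
    mus = [mean(Y) for Y in years]                         # Formula 3
    sigmas = [pstdev(Y, mu) for Y, mu in zip(years, mus)]  # Formula 4
    rescaled = []
    for Y, mu, sigma in zip(years, mus, sigmas):
        P = sigmas[0] / sigma                              # Formula 5 (sigma must be > 0)
        rescaled.append([mu + (v - mu) * P for v in Y])    # Formula 6
    return rescaled


years = [[1.0, 3.0, 1.0, 3.0], [0.0, 4.0, 0.0, 4.0]]
print(remove_expansion_trend(years))  # [[1.0, 3.0, 1.0, 3.0], [1.0, 3.0, 1.0, 3.0]]
```

Because Formula 6 scales deviations by $P_m = \sigma_1 / \sigma_m$, the standard deviation of each rescaled year equals $\sigma_1$ while its mean $\mu_m$ is unchanged.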
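Using Formula 6, the sample set after eliminating the expansion trend can be as follows:
$$Z = \left\{ \begin{array}{l} (x_{11}, Y'_{11}), (x_{12}, Y'_{12}), (x_{13}, Y'_{13}), \ldots, (x_{1n}, Y'_{1n}) \\ (x_{21}, Y'_{21}), (x_{22}, Y'_{22}), (x_{23}, Y'_{23}), \ldots, (x_{2n}, Y'_{2n}) \\ \vdots \\ (x_{m1}, Y'_{m1}), (x_{m2}, Y'_{m2}), (x_{m3}, Y'_{m3}), \ldots, (x_{mn}, Y'_{mn}) \end{array} \right\}$$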
Through the above process, the mean trend and expansion trend of the sample data used for GBDT modeling are eliminated, making the sample data stationary. This makes it easier for GBDT to fit the data and find periodic, seasonal, and holiday patterns.
Step 140: Input the first prediction data into the linear regression model.
Step 150: Obtain the second prediction data output by the linear regression model.
The linear regression model is trained as follows:
E1: Input the sample set into the gradient boosting decision tree model.
E2: Obtain the regression sample set output by the gradient boosting decision tree model.
As described above, after the GBDT model is constructed, it can be used to predict data for each $x_{mn}$ in sample set Z, generating the following regression sample set:
$$Z = \left\{ \begin{array}{l} (x_{11}, z_{11}), (x_{12}, z_{12}), (x_{13}, z_{13}), \ldots, (x_{1n}, z_{1n}) \\ (x_{21}, z_{21}), (x_{22}, z_{22}), (x_{23}, z_{23}), \ldots, (x_{2n}, z_{2n}) \\ \vdots \\ (x_{m1}, z_{m1}), (x_{m2}, z_{m2}), (x_{m3}, z_{m3}), \ldots, (x_{mn}, z_{mn}) \end{array} \right\}$$
where $x_{mn}$ denotes day n of year m, and $z_{mn}$ denotes the data for day n of year m predicted by the GBDT model.
E3: Train a linear regression model on the regression sample set using a linear regression algorithm.
Linear regression is a regression analysis that models the relationship between one or more independent variables and a dependent variable using a least-squares function called the linear regression equation. This function is a linear combination of one or more model parameters called regression coefficients.
A regression analysis that includes only one independent variable and one dependent variable, whose relationship can be approximated by a straight line, is called simple (one-variable) linear regression analysis. If the regression analysis includes two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis. In this specification, simple linear regression analysis is used.
A linear regression model can be trained on the regression sample set using a linear regression algorithm. That is, the prediction data obtained by the GBDT model serves as the input of the linear regression.
The linear regression can generally be expressed as Formula 7 below:
$$z_{mn} = x_{mn} w + b \qquad \text{(Formula 7)}$$
where w is the slope and b is the intercept. It should be noted that this linear regression can be a multiple linear regression, unlike the one-variable linear regression shown in Formula 1 above.
Once the optimal w and b are found, construction of the linear regression model is complete.
The resulting linear regression expression is the linear regression model.
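Formula 7 as extracted leaves it unclear which quantity acts as the regressor. The sketch below follows steps 140 and 150, where the first prediction data (the GBDT output z) is the input of the linear regression model, and, as an assumption not stated in the source, uses the observed value y as the target. It shows a single-input ordinary least-squares fit even though the text notes that a multiple linear regression may also be used; `fit_second_stage` is a hypothetical name and the numbers are made up.

```python
from statistics import mean


def fit_second_stage(zs, ys):
    """Least-squares fit of y = z*w + b, with z the GBDT (first-stage) output.

    Returns the fitted slope w, intercept b, and a predictor for new z values.
    """
    mz, my = mean(zs), mean(ys)
    szz = sum((z - mz) ** 2 for z in zs)
    w = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / szz
    b = my - w * mz
    return w, b, (lambda z: z * w + b)


# Hypothetical GBDT outputs z and observed targets y.
w, b, second_stage = fit_second_stage([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(w, b, second_stage(4.0))  # 2.0 1.0 9.0
```

At prediction time, the GBDT output for a new time factor would be passed to `second_stage` to obtain the second prediction data.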
This specification builds on two models, a GBDT regression model and a linear regression model. After the time factor of the time series is obtained, the gradient boosting decision tree model first computes the first prediction data for the time factor; the linear regression model then computes the second prediction data, which is the final prediction.
By combining GBDT regression with linear regression, this specification exploits the ability of linear regression to fit trends, avoiding the difficulty that GBDT regression alone has in fitting trends. It also exploits the ability of GBDT regression to fit nonlinear relationships and handle hierarchical logic, which linear regression alone cannot handle. In this way, the accuracy of data prediction can be improved.
Further, one-variable linear regression is used to fit the trend and obtain slopes and intercepts, and the mean trend of the data is eliminated using these slopes and intercepts. Further, expansion coefficients are obtained by computing standard deviations, and the expansion trend of the data is eliminated using these coefficients. This makes the sample data used for training stationary, so that the resulting model predicts more accurately.
Corresponding to the foregoing data prediction method embodiments, this specification also provides embodiments of a data prediction device. The device embodiments can be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the device, as a logical entity, is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 2 shows a hardware structure diagram of the device in which the data prediction device of this specification is located. In addition to the processor, network interface, memory, and non-volatile memory shown in Fig. 2, the device in which the apparatus is located may generally include other hardware depending on the actual functions of the data prediction, which is not described further here.
Corresponding to the data prediction method embodiment shown in Fig. 1, Fig. 3 shows a module diagram of a data prediction device provided by an embodiment of this specification. The device includes:
An acquiring unit 310, which obtains a time factor of a time series;
A first input unit 320, which inputs the time factor into a gradient boosting decision tree model;
A first prediction unit 330, which obtains first prediction data output by the gradient boosting decision tree model;
A second input unit 340, which inputs the first prediction data into a linear regression model;
A second prediction unit 350, which obtains second prediction data output by the linear regression model.
In an optional embodiment:
The gradient boosting decision tree model is trained by the following subunits:
An acquiring subunit, which obtains a sample set for training, the sample set consisting of (time factor, data) pairs;
A first training subunit, which trains a gradient boosting decision tree model on the sample set using the gradient boosting decision tree regression algorithm;
The linear regression model is trained by the following subunits:
An input subunit, which inputs the sample set into the gradient boosting decision tree model;
An output subunit, which obtains the regression sample set output by the gradient boosting decision tree model;
A second training subunit, which trains a linear regression model on the regression sample set using a linear regression algorithm.
In an optional embodiment:
Ahead of the first training subunit, the device further includes:
An eliminating subunit, which eliminates the mean trend and expansion trend of the data in the sample set.
In an optional embodiment:
The mean trend of the sample set is eliminated by the following subunits:
A first computation subunit, which uses one-variable linear regression to fit the formula $y_{mn} = x_{mn} w_m + b_m$ to the sample set and computes the slope and intercept of each year, where $x_{mn}$ is day n of year m in the sample set, $y_{mn}$ is the data of day n of year m in the sample set, $w_m$ is the slope for year m, and $b_m$ is the intercept for year m;
A second computation subunit, which, using the slopes and intercepts, eliminates the mean trend of the data in the sample set based on the formula $Y_{mn} = y_{mn} - (b_m - b_1) - x_{mn} w_m$, where $Y_{mn}$ is the data of day n of year m after eliminating the mean trend, $y_{mn}$ is the data of day n of year m before eliminating the mean trend, $b_m$ is the intercept for year m, $b_1$ is the intercept for year 1, $x_{mn}$ is day n of year m, and $w_m$ is the slope for year m.
In an optional embodiment:
The expansion trend of the sample set is eliminated by the following subunits:
A third computation subunit, which, after the mean trend of the data in the sample set has been eliminated, computes the arithmetic mean of each year's data according to the formula $\mu_m = \frac{1}{n} \sum_{i=1}^{n} Y_{mi}$, where $\mu_m$ is the arithmetic mean for year m and $Y_{mi}$ is the data of day i of year m after eliminating the mean trend;
A fourth computation subunit, which computes the standard deviation of each year's data according to the formula $\sigma_m = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_{mi} - \mu_m)^2}$, where $Y_{mi}$ is the data of day i of year m, i = 1, 2, ..., n, $\sigma_m$ is the standard deviation for year m, and $\mu_m$ is the arithmetic mean for year m;
A fifth computation subunit, which computes the expansion coefficient of each year according to the formula $P_m = \sigma_1 / \sigma_m$, where $P_m$ is the expansion coefficient for year m, $\sigma_m$ is the standard deviation for year m, and $\sigma_1$ is the standard deviation for year 1;
A sixth computation subunit, which, using the expansion coefficients, eliminates the expansion trend of the data in the sample set based on the formula $Y'_{mn} = \mu_m + (Y_{mn} - \mu_m) P_m$, where $Y'_{mn}$ is the data of day n of year m after eliminating the expansion trend, $\mu_m$ is the arithmetic mean for year m, $Y_{mn}$ is the data of day n of year m before eliminating the expansion trend, and $P_m$ is the expansion coefficient for year m.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail transceiver, game console, tablet computer, wearable device, or a combination of any of these devices.
For details of how the functions and effects of each unit in the above device are implemented, see the implementation of the corresponding steps in the above method; they are not repeated here.
Since the device embodiments basically correspond to the method embodiments, refer to the relevant parts of the method embodiments. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
Figure 3 above describes inner function module and the structural representation of data prediction device, its substantial executive agent Can be a kind of electronic equipment, such as server, Fig. 4 are a kind of hardware knots of server according to an exemplary embodiment The schematic diagram of structure, reference picture 4, the server can include:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Obtain the time factors of a time series;
Input the time factors into a gradient boosting decision tree (GBDT) model;
Obtain first prediction data output by the GBDT model;
Input the first prediction data into a linear regression model;
Obtain second prediction data output by the linear regression model.
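The inference steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not say which time factors are used, so the hypothetical helper `time_factors` simply takes the year, month, day of month, weekday and day of year of a `datetime.date`. The trained GBDT is abstracted as any callable that maps a feature vector to a number, and the linear regression model is reduced to a slope `w` and intercept `b` because its only input is the first prediction.

```python
from datetime import date
from typing import Callable, List, Sequence


def time_factors(d: date) -> List[float]:
    """Hypothetical time factors for one day of the series (assumed, not from the patent)."""
    return [
        float(d.year),
        float(d.month),
        float(d.day),
        float(d.weekday()),            # Monday = 0 ... Sunday = 6
        float(d.timetuple().tm_yday),  # day of the year, 1..366
    ]


def two_stage_predict(
    d: date,
    gbdt: Callable[[Sequence[float]], float],
    w: float,
    b: float,
) -> float:
    """Stage 1: GBDT on the time factors; stage 2: linear model on the GBDT output."""
    first_prediction = gbdt(time_factors(d))
    second_prediction = w * first_prediction + b
    return second_prediction
```

For example, with a stub GBDT that returns ten times the weekday factor, `two_stage_predict(date(2017, 8, 2), lambda f: 10.0 * f[3], 2.0, -0.5)` gives a first prediction of 20.0 (2 August 2017 was a Wednesday, weekday 2) and a second prediction of 39.5.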
Preferably, the GBDT model is trained as follows:
Obtain a sample set for training, the sample set consisting of a number of (time factor, data) pairs;
Train the GBDT model on the sample set using a gradient boosting decision tree regression algorithm;
The linear regression model is trained as follows:
Input the sample set into the GBDT model;
Obtain the regression sample set output by the GBDT model;
Train the linear regression model on the regression sample set using a linear regression algorithm.
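A self-contained sketch of the two training stages follows. It is only an illustration: the boosting model is a deliberately tiny one written from scratch (depth-1 regression stumps fitted to squared-loss residuals), not a production library, and the "regression sample set" is assumed to pair each GBDT output on the training sample set with the corresponding true data value, which is then fitted by ordinary least squares. The names `TinyGBDT`, `fit_stump`, `stump_value`, `fit_line` and `train_two_stage` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if row[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the residuals r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not right:  # a threshold at the maximum value does not split
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    if best is None:  # no feature can split the data: fall back to a constant
        mean = sum(r) / len(r)
        return (0, float("inf"), mean, mean)
    return best


def stump_value(s: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


class TinyGBDT:
    """Gradient boosting with squared loss: each stump fits the current residuals."""

    def __init__(self, n_estimators: int = 50, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X, y) -> "TinyGBDT":
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # The negative gradient of 0.5 * (y - pred)^2 is the residual y - pred.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            s = fit_stump(X, residuals)
            self.stumps.append(s)
            pred = [p + self.learning_rate * stump_value(s, row) for p, row in zip(pred, X)]
        return self

    def predict(self, row: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(stump_value(s, row) for s in self.stumps)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-variable ordinary least squares; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, my
    w = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return w, my - w * mx


def train_two_stage(X, y):
    gbdt = TinyGBDT().fit(X, y)               # stage 1 on (time factor, data) pairs
    first = [gbdt.predict(row) for row in X]  # inputs of the regression sample set
    w, b = fit_line(first, y)                 # stage 2: linear model on GBDT outputs
    return gbdt, w, b
```

Because least squares with an intercept can always fall back to `w = 1, b = 0`, the second stage never increases the training error of the first; whether it helps on unseen data depends on the data.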
Preferably, before the GBDT model is trained on the sample set using the gradient boosting decision tree regression algorithm, the method further includes:
Eliminating the mean trend and the expansion trend of the data in the sample set.
Preferably, the mean trend of the sample set is eliminated as follows:
Fit the formula $y_{mn} = x_{mn} w_m + b_m$ to the sample set by one-variable linear regression to calculate the slope and intercept of each year, where $x_{mn}$ is day $n$ of year $m$ in the sample set, $y_{mn}$ is the data of day $n$ of year $m$ in the sample set, $w_m$ is the slope of year $m$, and $b_m$ is the intercept of year $m$;
According to the slopes and intercepts, eliminate the mean trend of the data in the sample set based on the formula $Y_{mn} = y_{mn} - (b_m - b_1) - x_{mn} w_m$, where $Y_{mn}$ is the data of day $n$ of year $m$ after the mean trend is eliminated, $y_{mn}$ is the data of day $n$ of year $m$ before the mean trend is eliminated, $b_m$ is the intercept of year $m$, $b_1$ is the intercept of the first year, $x_{mn}$ is day $n$ of year $m$, and $w_m$ is the slope of year $m$.
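The two mean-trend steps can be sketched as follows, assuming the sample set is grouped by year and that $x_{mn}$ is simply the day index $n = 1, 2, \ldots$ within year $m$ (the day encoding is an assumption). Note that $Y_{mn} = (y_{mn} - x_{mn} w_m - b_m) + b_1$: each year is replaced by its own least-squares residual shifted by the first year's intercept, and since such residuals average to zero, every detrended year ends up with mean $b_1$. The names `fit_line` and `remove_mean_trend` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-variable least-squares fit of y = x * w + b; returns (w, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # a single day cannot define a slope
        return 0.0, my
    w = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return w, my - w * mx


def remove_mean_trend(years: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Apply Y_mn = y_mn - (b_m - b_1) - x_mn * w_m to every year m.

    `years` maps the year index m to its daily values; the smallest key is
    treated as the first year. x_mn is assumed to be the day index n = 1, 2, ...
    """
    fits = {}
    for m, ys in years.items():
        xs = [float(n) for n in range(1, len(ys) + 1)]
        fits[m] = fit_line(xs, ys)  # (w_m, b_m)
    _, b_1 = fits[min(years)]
    detrended = {}
    for m, ys in years.items():
        w_m, b_m = fits[m]
        detrended[m] = [y - (b_m - b_1) - n * w_m for n, y in enumerate(ys, start=1)]
    return detrended
```

For the toy data `{1: [10.5, 11.0, 11.5, 12.0], 2: [21.0, 22.0, 23.0, 24.0]}` the fits are $w_1 = 0.5, b_1 = 10$ and $w_2 = 1, b_2 = 20$, and every detrended value becomes 10.0.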
Preferably, the expansion trend of the sample set is eliminated as follows:
After the mean trend of the data in the sample set has been eliminated, calculate the arithmetic mean of each year's data according to the formula $\mu_m = \frac{1}{n}\sum_{i=1}^{n} Y_{mi}$, where $\mu_m$ is the arithmetic mean of year $m$ and $Y_{mi}$ is the data of day $i$ of year $m$ after the mean trend is eliminated;
Calculate the standard deviation of each year's data, where $Y_{mi}$ is the data of day $i$ of year $m$ ($i = 1, 2, \ldots, n$), $\sigma_m$ is the standard deviation of year $m$, and $\mu_m$ is the arithmetic mean of year $m$;
Calculate the expansion coefficient of each year from $\sigma_m$ and $\sigma_1$, where $P_m$ is the expansion coefficient of year $m$, $\sigma_m$ is the standard deviation of year $m$, and $\sigma_1$ is the standard deviation of the first year;
According to the expansion coefficients, eliminate the expansion trend of the data in the sample set based on the formula $Y'_{mn} = \mu_m + (Y_{mn} - \mu_m) \cdot P_m$, where $Y'_{mn}$ is the data of day $n$ of year $m$ after the expansion trend is eliminated, $\mu_m$ is the arithmetic mean of year $m$, $Y_{mn}$ is the data of day $n$ of year $m$ before the expansion trend is eliminated, and $P_m$ is the expansion coefficient of year $m$.
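A sketch of the expansion-trend step follows. The standard-deviation and expansion-coefficient formulas are not given explicitly here, so two assumptions are made and flagged in the code: $\sigma_m$ is the population standard deviation (divisor $n$), and $P_m = \sigma_1 / \sigma_m$, which rescales each year's deviations from its mean to the spread of the first year. The function name `remove_expansion_trend` is hypothetical.

```python
import math
from typing import Dict, List


def remove_expansion_trend(years: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Apply Y'_mn = mu_m + (Y_mn - mu_m) * P_m to mean-detrended yearly data."""
    mu = {m: sum(v) / len(v) for m, v in years.items()}
    # Assumption: population standard deviation (divisor n).
    sigma = {
        m: math.sqrt(sum((x - mu[m]) ** 2 for x in v) / len(v))
        for m, v in years.items()
    }
    sigma_1 = sigma[min(years)]  # the smallest key is treated as the first year
    rescaled = {}
    for m, v in years.items():
        # Assumption: P_m = sigma_1 / sigma_m; a flat year is left unchanged.
        p_m = sigma_1 / sigma[m] if sigma[m] > 0 else 1.0
        rescaled[m] = [mu[m] + (x - mu[m]) * p_m for x in v]
    return rescaled
```

For example, the years `{1: [9.0, 11.0], 2: [6.0, 14.0]}` have $\sigma_1 = 1$ and $\sigma_2 = 4$, so $P_2 = 0.25$ and the second year becomes `[9.0, 11.0]`, matching the spread of the first year.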
In the above electronic device embodiment, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or a solid-state drive. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Fig. 5 is a schematic diagram of a server 1000 according to an exemplary embodiment. Referring to Fig. 5, the server 1000 includes a processing component 1022, which further includes one or more processors, and memory resources represented by a memory 1032 for storing instructions executable by the processing component 1022, such as application programs. The application programs stored in the memory 1032 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1022 is configured to execute the instructions to perform all or some of the steps of the data prediction method described above.
The server 1000 may also include a power component 1026 configured to perform power management of the server 1000, a wired or wireless network interface 1050 configured to connect the server 1000 to a network, and an input/output (I/O) interface 1058. The server 1000 may operate based on an operating system stored in the memory 1032, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the electronic device embodiment is substantially similar to the method embodiment and is therefore described briefly; for related parts, refer to the description of the method embodiment.
Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles, including common knowledge or customary technical means in the art not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.

Claims (11)

1. A data prediction method, the method comprising:
Obtaining the time factors of a time series;
Inputting the time factors into a gradient boosting decision tree model;
Obtaining first prediction data output by the gradient boosting decision tree model;
Inputting the first prediction data into a linear regression model;
Obtaining second prediction data output by the linear regression model.
2. The method according to claim 1, wherein the gradient boosting decision tree model is trained as follows:
Obtaining a sample set for training, the sample set consisting of a number of (time factor, data) pairs;
Training the gradient boosting decision tree model on the sample set using a gradient boosting decision tree regression algorithm;
The linear regression model is trained as follows:
Inputting the sample set into the gradient boosting decision tree model;
Obtaining a regression sample set output by the gradient boosting decision tree model;
Training the linear regression model on the regression sample set using a linear regression algorithm.
3. The method according to claim 2, wherein before the gradient boosting decision tree model is trained on the sample set using the gradient boosting decision tree regression algorithm, the method further comprises:
Eliminating the mean trend and the expansion trend of the data in the sample set.
4. The method according to claim 3, wherein the mean trend of the sample set is eliminated as follows:
Fitting the formula $y_{mn} = x_{mn} w_m + b_m$ to the sample set by one-variable linear regression to calculate the slope and intercept of each year, where $x_{mn}$ is day $n$ of year $m$ in the sample set, $y_{mn}$ is the data of day $n$ of year $m$ in the sample set, $w_m$ is the slope of year $m$, and $b_m$ is the intercept of year $m$;
Eliminating the mean trend of the data in the sample set according to the slopes and intercepts, based on the formula $Y_{mn} = y_{mn} - (b_m - b_1) - x_{mn} w_m$, where $Y_{mn}$ is the data of day $n$ of year $m$ after the mean trend is eliminated, $y_{mn}$ is the data of day $n$ of year $m$ before the mean trend is eliminated, $b_m$ is the intercept of year $m$, $b_1$ is the intercept of the first year, $x_{mn}$ is day $n$ of year $m$, and $w_m$ is the slope of year $m$.
5. The method according to claim 4, wherein the expansion trend of the sample set is eliminated as follows:
After the mean trend of the data in the sample set has been eliminated, calculating the arithmetic mean of each year's data according to the formula $\mu_m = \frac{1}{n}\sum_{i=1}^{n} Y_{mi}$, where $\mu_m$ is the arithmetic mean of year $m$ and $Y_{mi}$ is the data of day $i$ of year $m$ after the mean trend is eliminated;
Calculating the standard deviation of each year's data, where $Y_{mi}$ is the data of day $i$ of year $m$ ($i = 1, 2, \ldots, n$), $\sigma_m$ is the standard deviation of year $m$, and $\mu_m$ is the arithmetic mean of year $m$;
Calculating the expansion coefficient of each year from $\sigma_m$ and $\sigma_1$, where $P_m$ is the expansion coefficient of year $m$, $\sigma_m$ is the standard deviation of year $m$, and $\sigma_1$ is the standard deviation of the first year;
Eliminating the expansion trend of the data in the sample set according to the expansion coefficients, based on the formula $Y'_{mn} = \mu_m + (Y_{mn} - \mu_m) \cdot P_m$, where $Y'_{mn}$ is the data of day $n$ of year $m$ after the expansion trend is eliminated, $\mu_m$ is the arithmetic mean of year $m$, $Y_{mn}$ is the data of day $n$ of year $m$ before the expansion trend is eliminated, and $P_m$ is the expansion coefficient of year $m$.
6. A data prediction device, the device comprising:
An acquisition unit, which obtains the time factors of a time series;
A first input unit, which inputs the time factors into a gradient boosting decision tree model;
A first prediction unit, which obtains first prediction data output by the gradient boosting decision tree model;
A second input unit, which inputs the first prediction data into a linear regression model;
A second prediction unit, which obtains second prediction data output by the linear regression model.
7. The device according to claim 6, wherein the gradient boosting decision tree model is trained by the following subunits:
An acquisition subunit, which obtains a sample set for training, the sample set consisting of a number of (time factor, data) pairs;
A first training subunit, which trains the gradient boosting decision tree model on the sample set using a gradient boosting decision tree regression algorithm;
The linear regression model is trained by the following subunits:
An input subunit, which inputs the sample set into the gradient boosting decision tree model;
An output subunit, which obtains a regression sample set output by the gradient boosting decision tree model;
A second training subunit, which trains the linear regression model on the regression sample set using a linear regression algorithm.
8. The device according to claim 7, wherein, before the first training subunit, the device further comprises:
An elimination subunit, which eliminates the mean trend and the expansion trend of the data in the sample set.
9. The device according to claim 8, wherein the mean trend of the sample set is eliminated by the following subunits:
A first computation subunit, which fits the formula $y_{mn} = x_{mn} w_m + b_m$ to the sample set by one-variable linear regression to calculate the slope and intercept of each year, where $x_{mn}$ is day $n$ of year $m$ in the sample set, $y_{mn}$ is the data of day $n$ of year $m$ in the sample set, $w_m$ is the slope of year $m$, and $b_m$ is the intercept of year $m$;
A second computation subunit, which, according to the slopes and intercepts, eliminates the mean trend of the data in the sample set based on the formula $Y_{mn} = y_{mn} - (b_m - b_1) - x_{mn} w_m$, where $Y_{mn}$ is the data of day $n$ of year $m$ after the mean trend is eliminated, $y_{mn}$ is the data of day $n$ of year $m$ before the mean trend is eliminated, $b_m$ is the intercept of year $m$, $b_1$ is the intercept of the first year, $x_{mn}$ is day $n$ of year $m$, and $w_m$ is the slope of year $m$.
10. The device according to claim 9, wherein the expansion trend of the sample set is eliminated by the following subunits:
A third computation subunit, which, after the mean trend of the data in the sample set has been eliminated, calculates the arithmetic mean of each year's data according to the formula $\mu_m = \frac{1}{n}\sum_{i=1}^{n} Y_{mi}$, where $\mu_m$ is the arithmetic mean of year $m$ and $Y_{mi}$ is the data of day $i$ of year $m$ after the mean trend is eliminated;
A fourth computation subunit, which calculates the standard deviation of each year's data, where $Y_{mi}$ is the data of day $i$ of year $m$ ($i = 1, 2, \ldots, n$), $\sigma_m$ is the standard deviation of year $m$, and $\mu_m$ is the arithmetic mean of year $m$;
A fifth computation subunit, which calculates the expansion coefficient of each year from $\sigma_m$ and $\sigma_1$, where $P_m$ is the expansion coefficient of year $m$, $\sigma_m$ is the standard deviation of year $m$, and $\sigma_1$ is the standard deviation of the first year;
A sixth computation subunit, which, according to the expansion coefficients, eliminates the expansion trend of the data in the sample set based on the formula $Y'_{mn} = \mu_m + (Y_{mn} - \mu_m) \cdot P_m$, where $Y'_{mn}$ is the data of day $n$ of year $m$ after the expansion trend is eliminated, $\mu_m$ is the arithmetic mean of year $m$, $Y_{mn}$ is the data of day $n$ of year $m$ before the expansion trend is eliminated, and $P_m$ is the expansion coefficient of year $m$.
11. An electronic device, comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Obtain the time factors of a time series;
Input the time factors into a gradient boosting decision tree model;
Obtain first prediction data output by the gradient boosting decision tree model;
Input the first prediction data into a linear regression model;
Obtain second prediction data output by the linear regression model.
CN201710650899.5A 2017-08-02 2017-08-02 Data prediction method and device and electronic equipment Pending CN107563542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710650899.5A CN107563542A (en) 2017-08-02 2017-08-02 Data prediction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710650899.5A CN107563542A (en) 2017-08-02 2017-08-02 Data prediction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN107563542A true CN107563542A (en) 2018-01-09

Family

ID=60974199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710650899.5A Pending CN107563542A (en) 2017-08-02 2017-08-02 Data prediction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107563542A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427658A (en) * 2018-03-12 2018-08-21 北京奇艺世纪科技有限公司 Data prediction method, device and electronic equipment
CN108597227A (en) * 2018-05-29 2018-09-28 重庆大学 Road traffic flow forecasting method under freeway toll station
CN108696543A (en) * 2018-08-24 2018-10-23 海南大学 Distributed reflection denial-of-service attack detection and defense method based on deep forest
CN108763314A (en) * 2018-04-26 2018-11-06 深圳市腾讯计算机系统有限公司 Interest recommendation method, apparatus, server and storage medium
CN109300046A (en) * 2018-08-01 2019-02-01 平安科技(深圳)有限公司 Electronic device, vehicle insurance survey dispatching method based on road condition factors, and storage medium
CN110245802A (en) * 2019-06-20 2019-09-17 杭州安脉盛智能技术有限公司 Cigarette empty-head rate prediction method and system based on improved gradient boosting decision tree
CN112804943A (en) * 2018-10-03 2021-05-14 株式会社岛津制作所 Method for creating trained model, luminance adjustment method, and image processing apparatus
CN113793507A (en) * 2021-11-16 2021-12-14 湖南工商大学 Available parking space prediction method and device, computer equipment and storage medium
CN115102202A (en) * 2022-07-25 2022-09-23 中国华能集团清洁能源技术研究院有限公司 Energy storage control method based on rolling type real-time electricity price prediction

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427658A (en) * 2018-03-12 2018-08-21 北京奇艺世纪科技有限公司 Data prediction method, device and electronic equipment
CN108763314A (en) * 2018-04-26 2018-11-06 深圳市腾讯计算机系统有限公司 Interest recommendation method, apparatus, server and storage medium
US11593894B2 (en) 2018-04-26 2023-02-28 Tencent Technology (Shenzhen) Company Limited Interest recommendation method, computer device, and storage medium
CN108763314B (en) * 2018-04-26 2021-01-19 深圳市腾讯计算机系统有限公司 Interest recommendation method, device, server and storage medium
CN108597227B (en) * 2018-05-29 2021-05-25 重庆大学 Method for predicting traffic flow of lower lane of highway toll station
CN108597227A (en) * 2018-05-29 2018-09-28 重庆大学 Road traffic flow forecasting method under freeway toll station
CN109300046A (en) * 2018-08-01 2019-02-01 平安科技(深圳)有限公司 Electronic device, vehicle insurance survey dispatching method based on road condition factors, and storage medium
CN108696543A (en) * 2018-08-24 2018-10-23 海南大学 Distributed reflection denial-of-service attack detection and defense method based on deep forest
CN108696543B (en) * 2018-08-24 2021-01-05 海南大学 Distributed reflection denial of service attack detection and defense method based on deep forest
CN112804943A (en) * 2018-10-03 2021-05-14 株式会社岛津制作所 Method for creating trained model, luminance adjustment method, and image processing apparatus
CN112804943B (en) * 2018-10-03 2023-09-26 株式会社岛津制作所 Trained model creation method, brightness adjustment method, and image processing apparatus
CN110245802B (en) * 2019-06-20 2021-08-24 杭州安脉盛智能技术有限公司 Cigarette empty-head rate prediction method and system based on improved gradient boosting decision tree
CN110245802A (en) * 2019-06-20 2019-09-17 杭州安脉盛智能技术有限公司 Cigarette empty-head rate prediction method and system based on improved gradient boosting decision tree
CN113793507A (en) * 2021-11-16 2021-12-14 湖南工商大学 Available parking space prediction method and device, computer equipment and storage medium
CN115102202A (en) * 2022-07-25 2022-09-23 中国华能集团清洁能源技术研究院有限公司 Energy storage control method based on rolling type real-time electricity price prediction
CN115102202B (en) * 2022-07-25 2022-11-29 中国华能集团清洁能源技术研究院有限公司 Energy storage control method based on rolling type real-time electricity price prediction

Similar Documents

Publication Publication Date Title
CN107563542A (en) Data prediction method and device and electronic equipment
Khare et al. A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees
Celik et al. Evaluating and forecasting banking crises through neural network models: An application for Turkish banking sector
CN109902222A (en) Recommendation method and device
Shachmurove Applying artificial neural networks to business, economics and finance
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN108564393A (en) Potential customer scoring method, device and system
Wu et al. Wavelet fuzzy cognitive maps
Odegua Predicting bank loan default with extreme gradient boosting
CN115018656B (en) Risk identification method, and training method, device and equipment of risk identification model
CN115357554A (en) Graph neural network compression method and device, electronic equipment and storage medium
Cao et al. Gamma and vega hedging using deep distributional reinforcement learning
Ngo et al. Does reinforcement learning outperform deep learning and traditional portfolio optimization models in frontier and developed financial markets?
CN112292699A (en) Determining action selection guidelines for an execution device
US20210406932A1 (en) Information processing apparatus, information processing method and program thereof
Modhej et al. Integrating inverse data envelopment analysis and neural network to preserve relative efficiency values
Kalaycı et al. Optimal model description of finance and human factor indices
CN116383708A (en) Transaction account identification method and device
CN112470123A (en) Determining action selection guidelines for an execution device
Egloff et al. Optimal importance sampling for credit portfolios with stochastic approximation
CN114529399A (en) User data processing method, device, computer equipment and storage medium
CN111179070A (en) Loan risk timeliness prediction system and method based on LSTM
Prüser et al. Nonlinearities in macroeconomic tail risk through the lens of big data quantile regressions
Balibek et al. A visual interactive approach for scenario-based stochastic multi-objective problems and an application
US11989777B2 (en) Pairing and grouping user profiles accessed from pre-current systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1248876

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20191209

Address after: Grand Pavilion, Hibiscus Way, 802 West Bay Road, P.O. Box 31119, Grand Cayman, Cayman Islands

Applicant after: Advanced New Technologies Co., Ltd.

Address before: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180109