CN107885967A - A regression model hyperparameter optimization method - Google Patents
A regression model hyperparameter optimization method
- Publication number: CN107885967A
- Application number: CN201710997220.XA
- Authority: CN (China)
- Prior art keywords: parameter, comparability, hyperparameter
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Abstract
The invention discloses a regression model hyperparameter optimization method, comprising: Step 1, training on the full data set, for each candidate hyperparameter p_l, a regression model, yielding L trained candidate models; Step 2, computing the error of each candidate model on each sample (x_i, y_i); Step 3, computing the direction similarity matrix; Step 4, computing the Comparability of each parameter; Step 5, finding the hyperparameter with the minimum Comparability and returning it as the optimal parameter. The parameter optimization procedure provided by the invention requires no manually set auxiliary parameters, so the optimization process is free of subjective interference; the optimization process requires no data splitting, giving it high efficiency and determinism; the main computational parts of the method are mutually independent and can be processed in parallel on big data; and the efficiency of the method is 6-8 times that of the cross-validation method.
Description
Technical field
The invention belongs to the field of machine learning and optimization modeling, and in particular relates to a hyperparameter optimization method for regression models.
Background art
Against the background of rapidly growing data and information, machine learning, as the core driving force of data mining, has become an indispensable link in the knowledge extraction process. Inferring the correspondence between input data and output data from experience is a major problem that machine learning aims to solve. When both the input data and the output data are numeric, such a problem is a regression prediction problem: for example, predicting precipitation from temperature, humidity and other conditions; predicting electric load from a region's population, GDP and resident consumption index; or predicting the next day's opening price of a stock index future from the previous day's closing price, price change and trading volume.
At present, many excellent regression models are used in a variety of practical prediction problems. Common regression models include ridge regression, LASSO (Least Absolute Shrinkage and Selection Operator), support vector regression, and ElasticNet (a mixture of LASSO and ridge regression). These models can in fact be viewed as optimization problems with a regularization term (whose main purpose is to prevent overfitting), and the regularization parameter in that term usually has to be set in advance. If the regularization parameter is set improperly, the model's prediction performance may be very poor. In practice, even a parameter value chosen from experience may give prediction performance that is not bad, but it is not the best either.
For a given practical prediction problem, different models have different prediction performance. Even for the same model, different hyperparameters (such as the regularization parameter mentioned above) can lead to greatly differing prediction accuracy. No single model or fixed parameter achieves the best prediction accuracy on all data. Therefore, when an excellent regression model is used to predict a practical problem, its hyperparameters must be selected or tuned carefully.
Cross-validation (CV) is a common general-purpose parameter selection method: it estimates the predictive ability of the model under each parameter by training on one part of the data and validating on another, and then selects the parameter it considers to have the best predictive ability. However, training and validation require a random split of the data, which introduces uncertainty into the error estimate and non-uniqueness into the parameter selection; in addition, the data splitting, training and validation steps increase the computational cost. When computing power is limited and the application is time-critical (such as real-time short-term traffic flow prediction), this kind of parameter optimization method falls short. How to select the hyperparameters of a regression model efficiently and accurately is therefore an important foundation for the effective use of regression models in practical prediction.
Summary of the invention
The purpose of the invention is to solve the above problems by proposing an efficient and accurate parameter optimization method, thereby improving the predictive ability of regression models. The invention abandons the "split, train and validate" pattern of cross-validation and trains the regression models directly on the original data; it then selects the best parameter using the similarity between the training errors corresponding to each hyperparameter. This both improves efficiency and guarantees the uniqueness of the result.
A regression model hyperparameter optimization method of the invention comprises the following steps:
Step 1: on the full data set {(x_i, y_i), i = 1, 2, ..., n}, train in turn a regression model m(·|p) with hyperparameter p = p_l, obtaining L trained candidate models {m(·|p_l), l = 1, 2, ..., L};
Step 2: obtain the error of each candidate model m(·|p_l) on each sample (x_i, y_i): e_i^(p_l) = m(x_i | p_l) − y_i;
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L};
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv,
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P;
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter.
Set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1; for every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l.
The final p* is the selected optimal parameter.
The advantages of the invention are:
(1) The parameter optimization procedure provided by the invention requires no manually set auxiliary parameters, so the optimization process is free of subjective interference;
(2) The optimization process requires no data splitting, giving it high efficiency and determinism;
(3) The main computational parts of the method are mutually independent and can be processed in parallel on big data;
(4) The efficiency of the method is 6-8 times that of the cross-validation method;
(5) The predictive ability of the parameter selected by the method is close to that of the parameter selected by conventional cross-validation.
Brief description of the drawings
Fig. 1 is a flow chart of the regression model hyperparameter optimization method of the invention.
Fig. 2 compares the optimization time of the invention with that of 10-fold cross-validation (10FCV).
Fig. 3 compares the prediction error of the parameter selected by the invention with that of the parameter selected by 10-fold cross-validation (10FCV).
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and embodiments.
The hyperparameter optimization problem: given a regression data set {(x_i, y_i), i = 1, 2, ..., n} and a regression model m(·|p), the task is to select, for the hyperparameter p of the model, the parameter p* most suitable for this data set from L candidates {p_l, l = 1, 2, ..., L}, i.e. the one for which the regression model m(·|p*) reaches the best prediction accuracy on this data set.
Notation: {(x_i, y_i), i = 1, 2, ..., n} denotes the available regression data set, where x_i and y_i are the input vector and output value of the i-th sample respectively, and n is the sample size of the data set; m(·|p) denotes the regression model, where p is the model hyperparameter, whose range is a set P = {p_l, l = 1, 2, ..., L} of L elements forming an arithmetic or geometric sequence; p* is the optimal hyperparameter, and p* ∈ P; the positive integer k takes values between 1 and l; e_i^(p_l) is the training error of the regression model m(·|p_l) on the i-th sample (x_i, y_i); S denotes the L×L direction similarity matrix, whose elements are s_uv with u, v ∈ {1, 2, ..., L} and p_u, p_v ∈ P; I(·) is the indicator function, returning 1 if its logical argument is true and 0 otherwise; SS(p_l) denotes the Comparability of hyperparameter p_l; and w = min{l, L−l+1} is a constant.
Specifically, the invention is a regression model hyperparameter optimization method whose flow is shown in Fig. 1, comprising the following steps:
The initial problem: for a numeric data set {(x_i, y_i), i = 1, 2, ..., n} to be learned, there is a regression model m(·|p), where p is an undetermined hyperparameter. The candidate arithmetic/geometric parameter set is P = {p_l, l = 1, 2, ..., L}.
Step 1: on the full data set {(x_i, y_i), i = 1, 2, ..., n}, train in turn a regression model m(·|p) with hyperparameter p = p_l, obtaining L trained candidate models {m(·|p_l), l = 1, 2, ..., L}. Here x_i and y_i are the input and output values of the i-th sample respectively, n is the sample size of the data set, p in the regression model m(·|p) is the hyperparameter to be optimized, its candidate parameter set is P = {p_l, l = 1, 2, ..., L}, and L is the number of candidate hyperparameters.
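The training loop of Step 1 can be sketched as follows. This is a minimal illustration only: the patent does not fix a particular regression model, so closed-form ridge regression stands in for m(·|p), and the synthetic data and geometric grid are assumptions.

```python
import numpy as np

# Synthetic stand-in for the data set {(x_i, y_i)}: n samples, d features.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Candidate grid P = {p_l}: a geometric sequence, one of the two grid types
# (arithmetic or geometric) the method assumes.
P = [2.0 ** k for k in range(-5, 6)]  # L = 11 candidates

# Step 1: fit one model per candidate on the FULL data set -- no splitting.
# Ridge closed form: beta_l = (X^T X + p_l I)^(-1) X^T y.
betas = [np.linalg.solve(X.T @ X + p * np.eye(d), X.T @ y) for p in P]
```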
Step 2: obtain the error of each candidate model m(·|p_l) on each sample (x_i, y_i):
e_i^(p_l) = m(x_i | p_l) − y_i    (1)
The predicted value m(x_i | p_l) of the model minus the true output y_i gives the error e_i^(p_l). Given the data, the model and the parameter, the training error is deterministic.
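Equation (1) is a signed residual; stacking it over all candidates gives an L×n error matrix. A sketch with a hypothetical one-parameter linear model m(x|p) = p·x standing in for the trained models:

```python
import numpy as np

# Hypothetical model family m(x | p) = p * x; the true slope is 2.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + 0.05 * rng.normal(size=50)
P = np.array([1.0, 1.5, 2.0, 2.5, 3.0])

# Step 2 / eq. (1): E[l, i] = e_i^(p_l) = m(x_i | p_l) - y_i.
E = P[:, None] * x[None, :] - y[None, :]  # shape (L, n) = (5, 50)
```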
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L}, where the indices u, v ∈ {1, 2, ..., L} and, for any two candidate hyperparameters p_u, p_v ∈ P, the element s_uv in row u and column v of S is computed as
s_uv = (1/n) Σ_{i=1}^{n} I(e_i^(p_u) · e_i^(p_v) > 0)    (2)
where I(·) is the indicator function, returning 1 when the inequality e_i^(p_u) · e_i^(p_v) > 0 holds and 0 otherwise.
Suppose there are L candidate parameters. The direction similarity of any two parameters is the frequency with which their corresponding training errors have the same sign (direction). All the direction similarities form the L×L direction similarity matrix. This matrix is obviously symmetric, with main-diagonal values of 1.
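Equation (2) vectorizes naturally: broadcast the error matrix against itself and average the sign-agreement indicator over samples. A sketch, where the random errors are a placeholder for the Step 2 output:

```python
import numpy as np

# Placeholder (L, n) signed-error matrix standing in for Step 2's output.
rng = np.random.default_rng(2)
E = rng.normal(size=(5, 50))

# Step 3 / eq. (2): s_uv = (1/n) * sum_i I(e_i^(p_u) * e_i^(p_v) > 0).
S = (E[:, None, :] * E[None, :, :] > 0).mean(axis=2)  # shape (L, L)
```

As the description notes, S is symmetric, and for nonzero errors its main diagonal is 1.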
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv    (3)
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P.
That is, the Comparability of the l-th parameter (l = 1, 2, ..., L) is the average of all elements in row k and column 2l−k of the direction similarity matrix (k ≤ l). For example, the Comparability of the 1st parameter is the element in row 1, column 1 of the direction similarity matrix; the Comparability of the 2nd parameter is the average of the elements in row 1, column 3 and row 2, column 2; the Comparability of the 3rd parameter is the average of the elements in row 1, column 5, row 2, column 4 and row 3, column 3; and so on.
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter.
Set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1. For every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l.
The final p* is the optimized hyperparameter value.
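Putting Steps 1 through 5 together, a minimal end-to-end sketch on synthetic data (ridge regression again stands in for the unspecified model m(·|p); all data and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 4
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

P = [2.0 ** k for k in range(-5, 6)]  # candidate grid, L = 11
L = len(P)

# Steps 1-2: train on ALL data, collect signed errors E[l, i] = m(x_i|p_l) - y_i.
E = np.stack([X @ np.linalg.solve(X.T @ X + p * np.eye(d), X.T @ y) - y
              for p in P])

# Step 3: direction similarity matrix, eq. (2).
S = (E[:, None, :] * E[None, :, :] > 0).mean(axis=2)

# Step 4: Comparability of every candidate, eq. (3), 1-based indices.
def comparability(l):
    w = min(l, L - l + 1)
    return sum(S[u - 1, 2 * l - u - 1]
               for u in range(1, l + 1) if 2 * l - u <= L) / w

SS = np.array([comparability(l) for l in range(1, L + 1)])

# Step 5: the candidate with minimum Comparability is the selected parameter
# (equivalent to the sequential minimum search described above).
p_star = P[int(np.argmin(SS))]
```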
The effect of the invention can be further illustrated by the following simulation results.
All regression data come from the UCI public data sets (http://archive.ics.uci.edu/ml/datasets.html): Housing, Energy efficiency, Concrete, MG, Airfoil self-noise, Yacht Hydrodynamics, Geographical Original of Music, Skill Craft Master Table, Combined Cycle Power Plant, and Condition Based Maintenance. The regression model is the support vector regression model, the parameter to be optimized is the kernel scale parameter γ, and the candidate set is {2^-5, 2^-4, ..., 2^5}.
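The experimental candidate set for the kernel scale γ can be generated as below; only the grid itself is specified by the text, the variable name is an assumption.

```python
import numpy as np

# Candidate set for the SVR kernel scale parameter gamma: {2^-5, 2^-4, ..., 2^5}.
gammas = 2.0 ** np.arange(-5, 6)
```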
Fig. 2 compares the parameter optimization times on the above 10 data sets. Since 10FCV takes longer than the present method, the ratio of the former's optimization time to the latter's is reported. Both were repeated 10 times with little deviation, and the figure shows the mean and standard deviation of the time ratios. As seen from the figure, the efficiency of the invention is 6-8 times that of the cross-validation method.
Fig. 3 compares the regression prediction error of the parameter selected by the present method with that of the parameter selected by 10FCV. For ease of comparison, a diagonal has been added to the figure as a reference line. As seen from the figure, on the 10 data sets the two kinds of prediction errors lie near the diagonal, so the prediction errors of the parameters selected by the two methods differ little.
The invention discloses a parameter optimization method for regression prediction models, used to select from numerous candidate parameters the hyperparameter with the best generalization (predictive) ability. The invention applies to ordered grid-type candidate parameter sets, which generally consist of an arithmetic or geometric sequence. The main features of the invention are: the whole parameter optimization procedure requires no manually set parameters, eliminating subjective interference; the optimization process requires no data splitting, which both improves efficiency and avoids the randomness that data splitting brings; the parts accounting for the main computational load are mutually independent and can be processed in parallel on big data; and the parameter selected by the method is close in effect to the parameter selected by conventional cross-validation, at 6-8 times the latter's efficiency.
Claims (1)
1. a kind of regression model hyperparameter optimization method, comprises the following steps:
Step 1: in all data set { (xi,yi), i=1,2 ..., n on train the hyper parameter to be p successivelylRegression model m (
| p), obtain candidate's hyper parameter models that L trains m (| pl), l=1,2 ..., L };Wherein, xi,yiI-th is represented respectively
The input and output value of individual sample, n are the sample size of data set, and the p in regression model m (| p) is hyper parameter to be optimized,
Its candidate parameter set is P={ pl, l=1,2 ..., L }, L is the number of candidate's hyper parameter;
Step 2, obtain each candidate's hyper parameter model m (| pl) in each sample (xi,yi) on error:
e_i^(p_l) = m(x_i | p_l) − y_i    (1)
i.e. the predicted value m(x_i | p_l) of the model minus the true output y_i gives the error e_i^(p_l);
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L}, where the indices u, v ∈ {1, 2, ..., L} and, for any two candidate hyperparameters p_u, p_v ∈ P, the element s_uv in row u and column v of S is computed as
s_uv = (1/n) Σ_{i=1}^{n} I(e_i^(p_u) · e_i^(p_v) > 0)    (2)
where I(·) is the indicator function, returning 1 when the inequality e_i^(p_u) · e_i^(p_v) > 0 holds and 0 otherwise;
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv    (3)
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P;
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter;
set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1; for every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l;
the final p* is the selected optimal parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710997220.XA CN107885967A (en) | 2017-10-24 | 2017-10-24 | A kind of regression model hyperparameter optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107885967A true CN107885967A (en) | 2018-04-06 |
Family
ID=61782117
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508455A (en) * | 2018-10-18 | 2019-03-22 | 山西大学 | A kind of GloVe hyper parameter tuning method |
CN109508455B (en) * | 2018-10-18 | 2021-11-19 | 山西大学 | GloVe super-parameter tuning method |
CN109816116A (en) * | 2019-01-17 | 2019-05-28 | 腾讯科技(深圳)有限公司 | The optimization method and device of hyper parameter in machine learning model |
CN109816116B (en) * | 2019-01-17 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Method and device for optimizing hyper-parameters in machine learning model |
CN110084374A (en) * | 2019-04-24 | 2019-08-02 | 第四范式(北京)技术有限公司 | Construct method, apparatus and prediction technique, device based on the PU model learnt |
CN111723342A (en) * | 2020-06-22 | 2020-09-29 | 杭州电力设备制造有限公司 | Transformer top layer oil temperature prediction method based on elastic network regression model |
CN111723342B (en) * | 2020-06-22 | 2023-11-07 | 杭州电力设备制造有限公司 | Transformer top layer oil temperature prediction method based on elastic network regression model |
CN113053113A (en) * | 2021-03-11 | 2021-06-29 | 湖南交通职业技术学院 | PSO-Welsch-Ridge-based anomaly detection method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180406 |