CN107885967A - A regression model hyperparameter optimization method - Google Patents
A regression model hyperparameter optimization method
- Publication number: CN107885967A
- Application number: CN201710997220.XA
- Authority: CN (China)
- Prior art keywords: parameter, comparability, hyperparameter
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Abstract
The invention discloses a regression model hyperparameter optimization method, comprising: Step 1, training on the full data set, for each candidate hyperparameter p_l, a regression model, yielding L trained candidate models; Step 2, computing the error of each candidate model on each sample (x_i, y_i); Step 3, computing the direction similarity matrix; Step 4, computing the Comparability of each parameter; Step 5, finding the hyperparameter with the minimum Comparability and returning it as the optimal parameter. The parameter optimization procedure provided by the invention requires no manually set auxiliary parameters, so the optimization process is free of subjective interference; the optimization process requires no data splitting, giving it high efficiency and determinism; the main computational parts of the method are mutually independent and can be processed in parallel on big data; and the efficiency of the method is 6-8 times that of the cross-validation method.
Description
Technical field
The invention belongs to the field of machine learning and optimization modeling, and in particular relates to a hyperparameter optimization method for regression models.
Background art
Against the background of rapidly growing data and information, machine learning, as the core driving force of data mining, has become an indispensable link in the knowledge extraction process. Inferring the correspondence between input data and output data from experience is a major problem that machine learning aims to solve. When both the input data and the output data are numeric, such a problem is a regression prediction problem: for example, predicting precipitation from temperature, humidity and other conditions; predicting electric load from a region's population, GDP and resident consumption index; or predicting the next day's opening price of a stock index future from the previous day's closing price, price change and trading volume.
At present, many excellent regression models are used in a variety of practical prediction problems. Common regression models include ridge regression, LASSO (Least Absolute Shrinkage and Selection Operator), support vector regression, and ElasticNet (a mixture of LASSO and ridge regression). These models can in fact be viewed as optimization problems with a regularization term (whose main purpose is to prevent overfitting), and the regularization parameter in that term usually has to be set in advance. If the regularization parameter is set improperly, the model's prediction performance may be very poor. In practice, even a parameter value chosen from experience may give prediction performance that is not bad, but it is not the best either.
For a given practical prediction problem, different models have different prediction performance. Even for the same model, different hyperparameters (such as the regularization parameter mentioned above) can lead to greatly differing prediction accuracy. No single model or fixed parameter achieves the best prediction accuracy on all data. Therefore, when an excellent regression model is used to predict a practical problem, its hyperparameters must be selected or tuned carefully.
Cross-validation (CV) is a common general-purpose parameter selection method: it estimates the predictive ability of the model under each parameter by training on one part of the data and validating on another, and then selects the parameter it considers to have the best predictive ability. However, training and validation require a random split of the data, which introduces uncertainty into the error estimate and non-uniqueness into the parameter selection; in addition, the data splitting, training and validation steps increase the computational cost. When computing power is limited and the application is time-critical (such as real-time short-term traffic flow prediction), this kind of parameter optimization method falls short. How to select the hyperparameters of a regression model efficiently and accurately is therefore an important foundation for the effective use of regression models in practical prediction.
Summary of the invention
The purpose of the invention is to solve the above problems by proposing an efficient and accurate parameter optimization method, thereby improving the predictive ability of regression models. The invention abandons the "split, train and validate" pattern of cross-validation and trains the regression models directly on the original data; it then selects the best parameter using the similarity between the training errors corresponding to each hyperparameter. This both improves efficiency and guarantees the uniqueness of the result.
A regression model hyperparameter optimization method of the invention comprises the following steps:
Step 1: on the full data set {(x_i, y_i), i = 1, 2, ..., n}, train in turn a regression model m(·|p) with hyperparameter p = p_l, obtaining L trained candidate models {m(·|p_l), l = 1, 2, ..., L};
Step 2: obtain the error of each candidate model m(·|p_l) on each sample (x_i, y_i): e_i^(p_l) = m(x_i | p_l) − y_i;
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L};
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv,
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P;
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter.
Set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1; for every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l.
The final p* is the selected optimal parameter.
The advantages of the invention are:
(1) The parameter optimization procedure provided by the invention requires no manually set auxiliary parameters, so the optimization process is free of subjective interference;
(2) The optimization process requires no data splitting, giving it high efficiency and determinism;
(3) The main computational parts of the method are mutually independent and can be processed in parallel on big data;
(4) The efficiency of the method is 6-8 times that of the cross-validation method;
(5) The predictive ability of the parameter selected by the method is close to that of the parameter selected by conventional cross-validation.
Brief description of the drawings
Fig. 1 is a flow chart of the regression model hyperparameter optimization method of the invention.
Fig. 2 compares the optimization time of the invention with that of 10-fold cross-validation (10FCV).
Fig. 3 compares the prediction error of the parameter selected by the invention with that of the parameter selected by 10-fold cross-validation (10FCV).
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and embodiments.
The hyperparameter optimization problem: given a regression data set {(x_i, y_i), i = 1, 2, ..., n} and a regression model m(·|p), the task is to select, for the hyperparameter p of the model, the parameter p* most suitable for this data set from L candidates {p_l, l = 1, 2, ..., L}, i.e. the one for which the regression model m(·|p*) reaches the best prediction accuracy on this data set.
Notation: {(x_i, y_i), i = 1, 2, ..., n} denotes the available regression data set, where x_i and y_i are the input vector and output value of the i-th sample respectively, and n is the sample size of the data set; m(·|p) denotes the regression model, where p is the model hyperparameter, whose range is a set P = {p_l, l = 1, 2, ..., L} of L elements forming an arithmetic or geometric sequence; p* is the optimal hyperparameter, and p* ∈ P; the positive integer k takes values between 1 and l; e_i^(p_l) is the training error of the regression model m(·|p_l) on the i-th sample (x_i, y_i); S denotes the L×L direction similarity matrix, whose elements are s_uv with u, v ∈ {1, 2, ..., L} and p_u, p_v ∈ P; I(·) is the indicator function, returning 1 if its logical argument is true and 0 otherwise; SS(p_l) denotes the Comparability of hyperparameter p_l; and w = min{l, L−l+1} is a constant.
Specifically, the invention is a regression model hyperparameter optimization method whose flow is shown in Fig. 1, comprising the following steps:
The initial problem: for a numeric data set {(x_i, y_i), i = 1, 2, ..., n} to be learned, there is a regression model m(·|p), where p is an undetermined hyperparameter. The candidate arithmetic/geometric parameter set is P = {p_l, l = 1, 2, ..., L}.
Step 1: on the full data set {(x_i, y_i), i = 1, 2, ..., n}, train in turn a regression model m(·|p) with hyperparameter p = p_l, obtaining L trained candidate models {m(·|p_l), l = 1, 2, ..., L}. Here x_i and y_i are the input and output values of the i-th sample respectively, n is the sample size of the data set, p in the regression model m(·|p) is the hyperparameter to be optimized, its candidate parameter set is P = {p_l, l = 1, 2, ..., L}, and L is the number of candidate hyperparameters.
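The training loop of Step 1 can be sketched as follows. This is a minimal illustration only: the patent does not fix a particular regression model, so closed-form ridge regression stands in for m(·|p), and the synthetic data and geometric grid are assumptions.

```python
import numpy as np

# Synthetic stand-in for the data set {(x_i, y_i)}: n samples, d features.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Candidate grid P = {p_l}: a geometric sequence, one of the two grid types
# (arithmetic or geometric) the method assumes.
P = [2.0 ** k for k in range(-5, 6)]  # L = 11 candidates

# Step 1: fit one model per candidate on the FULL data set -- no splitting.
# Ridge closed form: beta_l = (X^T X + p_l I)^(-1) X^T y.
betas = [np.linalg.solve(X.T @ X + p * np.eye(d), X.T @ y) for p in P]
```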
Step 2: obtain the error of each candidate model m(·|p_l) on each sample (x_i, y_i):
e_i^(p_l) = m(x_i | p_l) − y_i    (1)
The predicted value m(x_i | p_l) of the model minus the true output y_i gives the error e_i^(p_l). Given the data, the model and the parameter, the training error is deterministic.
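Equation (1) is a signed residual; stacking it over all candidates gives an L×n error matrix. A sketch with a hypothetical one-parameter linear model m(x|p) = p·x standing in for the trained models:

```python
import numpy as np

# Hypothetical model family m(x | p) = p * x; the true slope is 2.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + 0.05 * rng.normal(size=50)
P = np.array([1.0, 1.5, 2.0, 2.5, 3.0])

# Step 2 / eq. (1): E[l, i] = e_i^(p_l) = m(x_i | p_l) - y_i.
E = P[:, None] * x[None, :] - y[None, :]  # shape (L, n) = (5, 50)
```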
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L}, where the indices u, v ∈ {1, 2, ..., L} and, for any two candidate hyperparameters p_u, p_v ∈ P, the element s_uv in row u and column v of S is computed as
s_uv = (1/n) Σ_{i=1}^{n} I(e_i^(p_u) · e_i^(p_v) > 0)    (2)
where I(·) is the indicator function, returning 1 when the inequality e_i^(p_u) · e_i^(p_v) > 0 holds and 0 otherwise.
Suppose there are L candidate parameters. The direction similarity of any two parameters is the frequency with which their corresponding training errors have the same sign (direction). All the direction similarities form the L×L direction similarity matrix. This matrix is obviously symmetric, with main-diagonal values of 1.
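Equation (2) vectorizes naturally: broadcast the error matrix against itself and average the sign-agreement indicator over samples. A sketch, where the random errors are a placeholder for the Step 2 output:

```python
import numpy as np

# Placeholder (L, n) signed-error matrix standing in for Step 2's output.
rng = np.random.default_rng(2)
E = rng.normal(size=(5, 50))

# Step 3 / eq. (2): s_uv = (1/n) * sum_i I(e_i^(p_u) * e_i^(p_v) > 0).
S = (E[:, None, :] * E[None, :, :] > 0).mean(axis=2)  # shape (L, L)
```

As the description notes, S is symmetric, and for nonzero errors its main diagonal is 1.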
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv    (3)
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P.
That is, the Comparability of the l-th parameter (l = 1, 2, ..., L) is the average of all elements in row k and column 2l−k of the direction similarity matrix (k ≤ l). For example, the Comparability of the 1st parameter is the element in row 1, column 1 of the direction similarity matrix; the Comparability of the 2nd parameter is the average of the elements in row 1, column 3 and row 2, column 2; the Comparability of the 3rd parameter is the average of the elements in row 1, column 5, row 2, column 4 and row 3, column 3; and so on.
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter.
Set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1. For every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l.
The final p* is the optimized hyperparameter value.
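Putting Steps 1 through 5 together, a minimal end-to-end sketch on synthetic data (ridge regression again stands in for the unspecified model m(·|p); all data and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 4
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

P = [2.0 ** k for k in range(-5, 6)]  # candidate grid, L = 11
L = len(P)

# Steps 1-2: train on ALL data, collect signed errors E[l, i] = m(x_i|p_l) - y_i.
E = np.stack([X @ np.linalg.solve(X.T @ X + p * np.eye(d), X.T @ y) - y
              for p in P])

# Step 3: direction similarity matrix, eq. (2).
S = (E[:, None, :] * E[None, :, :] > 0).mean(axis=2)

# Step 4: Comparability of every candidate, eq. (3), 1-based indices.
def comparability(l):
    w = min(l, L - l + 1)
    return sum(S[u - 1, 2 * l - u - 1]
               for u in range(1, l + 1) if 2 * l - u <= L) / w

SS = np.array([comparability(l) for l in range(1, L + 1)])

# Step 5: the candidate with minimum Comparability is the selected parameter
# (equivalent to the sequential minimum search described above).
p_star = P[int(np.argmin(SS))]
```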
The effect of the invention can be further illustrated by the following simulation results.
All regression data come from the UCI public data sets (http://archive.ics.uci.edu/ml/datasets.html): Housing, Energy efficiency, Concrete, MG, Airfoil self-noise, Yacht Hydrodynamics, Geographical Original of Music, Skill Craft Master Table, Combined Cycle Power Plant, and Condition Based Maintenance. The regression model is the support vector regression model, the parameter to be optimized is the kernel scale parameter γ, and the candidate set is {2^-5, 2^-4, ..., 2^5}.
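The experimental candidate set for the kernel scale γ can be generated as below; only the grid itself is specified by the text, the variable name is an assumption.

```python
import numpy as np

# Candidate set for the SVR kernel scale parameter gamma: {2^-5, 2^-4, ..., 2^5}.
gammas = 2.0 ** np.arange(-5, 6)
```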
Fig. 2 compares the parameter optimization times on the above 10 data sets. Since 10FCV takes longer than the present method, the ratio of the former's optimization time to the latter's is reported. Both were repeated 10 times with little deviation, and the figure shows the mean and standard deviation of the time ratios. As seen from the figure, the efficiency of the invention is 6-8 times that of the cross-validation method.
Fig. 3 compares the regression prediction error of the parameter selected by the present method with that of the parameter selected by 10FCV. For ease of comparison, a diagonal has been added to the figure as a reference line. As seen from the figure, on the 10 data sets the two kinds of prediction errors lie near the diagonal, so the prediction errors of the parameters selected by the two methods differ little.
The invention discloses a parameter optimization method for regression prediction models, used to select from numerous candidate parameters the hyperparameter with the best generalization (predictive) ability. The invention applies to ordered grid-type candidate parameter sets, which generally consist of an arithmetic or geometric sequence. The main features of the invention are: the whole parameter optimization procedure requires no manually set parameters, eliminating subjective interference; the optimization process requires no data splitting, which both improves efficiency and avoids the randomness that data splitting brings; the parts accounting for the main computational load are mutually independent and can be processed in parallel on big data; and the parameter selected by the method is close in effect to the parameter selected by conventional cross-validation, at 6-8 times the latter's efficiency.
Claims (1)
1. a kind of regression model hyperparameter optimization method, comprises the following steps:
Step 1: in all data set { (xi,yi), i=1,2 ..., n on train the hyper parameter to be p successivelylRegression model m (
| p), obtain candidate's hyper parameter models that L trains m (| pl), l=1,2 ..., L };Wherein, xi,yiI-th is represented respectively
The input and output value of individual sample, n are the sample size of data set, and the p in regression model m (| p) is hyper parameter to be optimized,
Its candidate parameter set is P={ pl, l=1,2 ..., L }, L is the number of candidate's hyper parameter;
Step 2, obtain each candidate's hyper parameter model m (| pl) in each sample (xi,yi) on error:
e_i^(p_l) = m(x_i | p_l) − y_i    (1)
i.e. the predicted value m(x_i | p_l) of the model minus the true output y_i gives the error e_i^(p_l);
Step 3: compute the direction similarity matrix S = (s_uv)_{L×L}, where the indices u, v ∈ {1, 2, ..., L} and, for any two candidate hyperparameters p_u, p_v ∈ P, the element s_uv in row u and column v of S is computed as
s_uv = (1/n) Σ_{i=1}^{n} I(e_i^(p_u) · e_i^(p_v) > 0)    (2)
where I(·) is the indicator function, returning 1 when the inequality e_i^(p_u) · e_i^(p_v) > 0 holds and 0 otherwise;
Step 4: the Comparability of the l-th parameter is the average of all elements s_{k,2l−k} of the direction similarity matrix with row index 1 ≤ k ≤ l; compute the Comparability SS(p_l) of each parameter as
SS(p_l) = (1/w) Σ_{u+v=2l, u≤v} s_uv    (3)
where w = min{l, L−l+1}, u, v ∈ {1, 2, ..., L}, and p_u, p_v ∈ P;
Step 5: find the hyperparameter with the minimum Comparability and return it as the optimal parameter;
set the optimal parameter p* = p_1 and the minimum Comparability SS* = 1; for every parameter in P, in turn compare its Comparability SS(p_l) with SS*: if SS(p_l) < SS*, update SS* = SS(p_l) and p* = p_l;
the final p* is the selected optimal parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710997220.XA CN107885967A (en) | 2017-10-24 | 2017-10-24 | A kind of regression model hyperparameter optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107885967A true CN107885967A (en) | 2018-04-06 |
Family
ID=61782117
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508455A (en) * | 2018-10-18 | 2019-03-22 | 山西大学 | A kind of GloVe hyper parameter tuning method |
CN109508455B (en) * | 2018-10-18 | 2021-11-19 | 山西大学 | GloVe super-parameter tuning method |
CN109816116A (en) * | 2019-01-17 | 2019-05-28 | 腾讯科技(深圳)有限公司 | The optimization method and device of hyper parameter in machine learning model |
CN109816116B (en) * | 2019-01-17 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Method and device for optimizing hyper-parameters in machine learning model |
CN110084374A (en) * | 2019-04-24 | 2019-08-02 | 第四范式(北京)技术有限公司 | Construct method, apparatus and prediction technique, device based on the PU model learnt |
CN111723342A (en) * | 2020-06-22 | 2020-09-29 | 杭州电力设备制造有限公司 | Transformer top layer oil temperature prediction method based on elastic network regression model |
CN111723342B (en) * | 2020-06-22 | 2023-11-07 | 杭州电力设备制造有限公司 | Transformer top layer oil temperature prediction method based on elastic network regression model |
CN113053113A (en) * | 2021-03-11 | 2021-06-29 | 湖南交通职业技术学院 | PSO-Welsch-Ridge-based anomaly detection method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180406 |