CN109376869A - A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization - Google Patents
A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization
- Publication number
- CN109376869A (application CN201811588608.5A)
- Authority
- CN
- China
- Prior art keywords
- model
- parameter
- module
- pool
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Feedback Control In General (AREA)
Abstract
The present invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, comprising: a Bayesian optimization module, a model parameter pool module, a K-means clustering module, a task scheduling module, and an adaptive model-parallelism determination module. The invention efficiently automates hyperparameter tuning for machine learning under big-data environments and makes full use of multi-machine parallel computing capability, so that big-data machine learning can be better applied in production practice.
Description
Technical field
The present invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, and belongs to the field of computer artificial intelligence.
Background technique
With the development of cloud computing and big-data technology, machine learning has become a hot topic in both academia and industry. However, machine learning involves a large body of theory, and machine learning models contain many hyperparameters, so designing an effective model requires considerable experience. To promote the wide application of machine learning and lower its barrier to entry, automatic machine learning (Automatic Machine Learning, AutoML) has emerged: by automating each step of the machine learning pipeline, it allows even beginners to train and apply machine learning models.
The core of AutoML is automated tuning of machine learning models, that is, automatic hyperparameter selection. Hyperparameter selection is critical in practice: different hyperparameters directly determine the effectiveness of a machine learning application (for example, its prediction accuracy). The hyperparameter selection process of a machine learning model is shown in Figure 1. Since machine learning models usually contain many hyperparameters, the parameter space is huge, and tuning it efficiently is an urgent problem. Commonly used tuning methods include simple methods, represented by manual tuning, Grid search, and Random search, and heuristic methods, represented by Bayesian optimization. Grid search and Random search are illustrated in Figure 2.
Manual tuning is the simplest, and also the most artisanal, tuning method. Facing a machine learning application, an experienced machine learning expert can tune the model manually based on empirical values; an unfamiliar newcomer can resort to manual trial and error (running enough experiments to find a group of parameters with reasonably good model performance). In general, manual tuning is a time-consuming and laborious process.
Grid search is one of the simplest automated tuning methods. Its idea is straightforward: the user defines a range of values for each parameter, parameters are combined at fixed intervals, a model is trained for each combination, and the parameters of the model with the best evaluation score are selected. The combination space of Grid search is usually large; for example, for a logistic regression application with 5 parameters, each taking 10 possible values, the whole combination space contains 10^5 points, and training that many models is very time-consuming. Because the combination space is usually large, Grid search suits only scenarios where model training is very fast, and it can hardly be applied in big-data scenarios.
To address the deficiency of Grid search, researchers proposed Random search. Unlike Grid search, which exhaustively enumerates parameter combinations at fixed intervals, Random search selects parameter combinations at random. The study of Bergstra et al. showed that, in general, Random search performs no worse than Grid search. By selecting combinations randomly, Random search can, to some extent, avoid mutual redundancy between parameter points. Its problem is that if two parameter points are close to each other in space (for example, their Euclidean distance is small), they are mutually redundant, which lowers search efficiency; in high-dimensional spaces (when there are many parameters), it easily gets stuck in a local region.
The above methods all search the parameter space by brute force; their efficiency is low, and they are no longer suitable in big-data environments. Bayesian optimization is a sequential model-based optimization algorithm: it uses information from already-trained models as prior knowledge to guide the generation of the next parameter point, and can reach the best model performance faster. Compared with Grid search and Random search it greatly accelerates the whole tuning process, and it is currently close to the best hyperparameter optimization method for machine learning models.
Classical Bayesian optimization has a deficiency: the optimization process is serial, so it cannot exploit the multi-machine computing capability of big-data environments, and its efficiency remains low. As a result, automated tuning of big-data machine learning is difficult, and big-data environments cannot be handled well. Parallelizing classical Bayesian optimization, so as to cope with big-data environments and make better use of big-data machine learning in production practice, is therefore of great significance to social production.
At present, most research on Bayesian optimization in distributed environments is based on synchronous batches. In general, under the synchronous execution mode (Bulk Synchronous Parallel, BSP), tasks often have to wait for one another, whereas under the asynchronous execution mode (Asynchronous Parallel, ASP) tasks need not wait for one another; the asynchronous mode is therefore more efficient than the synchronous mode. As shown in Figure 3, with three compute nodes, in the synchronous mode on the left the execution of tasks 4, 5, and 6 must wait for tasks 1, 2, and 3 to finish, whereas in the asynchronous mode on the right it need not. In general, for the same number of tasks, the asynchronous mode finishes faster than the synchronous mode.
The asynchronous Bayesian optimization proposed by Kandasamy et al. is one way to parallelize classical Bayesian optimization, but their method uses each compute node to evaluate one model; a single node cannot effectively train a model on big data, nor can the method cope with the scenario in which multiple models converge at the same time.
Because of the low efficiency of Bayesian optimization in big-data environments described above, the availability of automated tuning techniques for machine learning under big-data environments remains low.
Summary of the invention
The technical problem solved by the present invention: to overcome the difficulty of automated machine learning tuning in big-data environments and the deficiencies of the prior art, a hyperparameter optimization system and method based on asynchronous Bayesian optimization is provided, which efficiently automates tuning for machine learning under big-data environments and makes full use of multi-machine parallel computing capability, so that big-data machine learning can be better applied in production practice.
The technical solution of the present invention: a machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, comprising: a Bayesian optimization module, a model parameter pool module, a K-means clustering module, a task scheduling module, and an adaptive model-parallelism determination module;
The Bayesian optimization module implements the Bayesian optimization algorithm and generates candidate parameter points. After receiving a signal from the model parameter pool module or the K-means clustering module, it fetches model parameter points and the corresponding model evaluation metrics from the model parameter pool and updates the Bayesian optimizer. It provides a Get interface for model hyperparameters (the hyperparameters of a machine learning model, such as the learning rate and the regularization coefficient), which the model parameter pool module calls directly. For the scenario in which multiple models converge simultaneously, it provides a GetBatch interface for the K-means clustering module. The GetBatch interface implements the following algorithm: randomly generate L (L > 10000) parameter points; compute the EI value (the acquisition value, the criterion by which Bayesian optimization generates candidate points) of each of the L points; find the l (200 < l < 1000) points with the largest EI values; run gradient descent starting from each of these points; and return the local optima found.
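As a minimal, dependency-light sketch of the GetBatch algorithm described above (L and l are shrunk for illustration, the surrogate model is a stand-in callable returning posterior mean and standard deviation, and the gradient-descent refinement is approximated by a cheap random local search — these are assumptions, not the patent's implementation):

```python
import math
import numpy as np

def expected_improvement(mu, sigma, best):
    """EI acquisition value of points with posterior mean mu / std sigma
    against the incumbent best observed value (maximization)."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best) / sigma
    Phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    phi = np.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return (mu - best) * Phi(z) + sigma * phi

def get_batch(surrogate, best, dim, L=2000, l=20, seed=0):
    """Sketch of GetBatch: sample L random points, keep the top-l by EI,
    locally refine each, and return the l refined points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(L, dim))      # L random parameter points
    mu, sigma = surrogate(X)
    ei = expected_improvement(mu, sigma, best)
    top = X[np.argsort(ei)[-l:]]                  # l points with largest EI
    refined = []
    for x in top:                                 # stand-in for gradient ascent on EI
        cand = np.clip(x + 0.02 * rng.standard_normal((20, dim)), 0.0, 1.0)
        m, s = surrogate(cand)
        refined.append(cand[np.argmax(expected_improvement(m, s, best))])
    return np.array(refined)
```

With a real Gaussian-process surrogate, `surrogate(X)` would return the GP posterior mean and standard deviation at `X`, and the local-search step would be replaced by gradient ascent on the EI surface.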
The model parameter pool module is responsible for managing model parameter points, which includes: obtaining model hyperparameter points, replacing parameter points in the pool, and supplying the parameter points in the pool to the computing cluster. The model parameter pool is implemented mainly as an array in which each model parameter point is abstracted as a parameter-point object, and it provides Push and Pull interfaces for interaction between the computing cluster and the pool. The pool obtains model parameter points from the Bayesian optimization module through the Get interface, and obtains groups of mutually distinct parameter points from the K-means clustering module through the GetBatch interface. Parameter points in the pool can be Pulled by the computing cluster (a Spark cluster), and the pool receives the model evaluation metrics Pushed by the cluster.
The K-means clustering module is mainly used to generate multiple mutually distinct parameter points. It is called by the model parameter pool module and receives a signal to generate k distinct parameter points; it calls the Bayesian optimization module to generate K (typically greater than k) original candidate parameter points; it clusters the candidate points into k classes with K-means and then selects, from each class, the parameter point with the largest acquisition value, thereby producing k mutually distinct parameter points, and returns the result to the model parameter pool module.
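A minimal sketch of this de-duplication step, assuming a small hand-rolled Lloyd's K-means (the patent does not prescribe a particular K-means implementation) and an array of acquisition values aligned with the candidate points:

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Minimal Lloyd's K-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def k_distinct_points(points, ei_values, k):
    """Cluster the candidate points into k classes and return, from each
    class, the point with the largest acquisition (EI) value."""
    labels = kmeans_labels(points, k)
    chosen = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx):                      # skip any cluster that emptied out
            chosen.append(points[idx[np.argmax(ei_values[idx])]])
    return np.array(chosen)
```

Because one representative is taken per cluster, nearby (redundant) candidates collapse into a single returned point, which is exactly the effect the module is after.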
The task scheduling module judges whether a model in the model parameter pool should stop training. Specifically, it includes a model convergence check and an Early Stopping algorithm. The convergence check computes whether the model accuracy has reached a preset threshold; if so, the model has converged, otherwise it has not. Early Stopping first computes the mean E(P) of the model evaluation metric P that historically trained models reached at the current iteration round; if the current model's metric p < E(P) * 0.9, training is stopped, otherwise it continues. The module interacts mainly with the model parameter pool: it judges the state of the model corresponding to each parameter point in the pool and sends signals to the model parameter pool module.
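The two stopping rules can be sketched as follows (the 0.9 slack factor is the one stated above; the function names are illustrative):

```python
def is_converged(accuracy, threshold):
    """Convergence rule: stop once model accuracy reaches the preset threshold."""
    return accuracy >= threshold

def should_early_stop(current_metric, peer_metrics_at_round, slack=0.9):
    """Early Stopping rule: stop a model whose current evaluation metric p
    falls below slack * E(P), where E(P) is the mean metric that previously
    trained models reached at the same iteration round."""
    if not peer_metrics_at_round:
        return False                      # nothing to compare against yet
    mean_p = sum(peer_metrics_at_round) / len(peer_metrics_at_round)
    return current_metric < slack * mean_p
```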
The adaptive model-parallelism determination module adaptively determines the parallelism of models in the computing cluster. It experimentally evaluates the computational efficiency of the model parameter pool at different pool sizes and obtains the pool size with the best computing performance. Specifically, it measures the time taken to execute one round of model iteration at each candidate pool size, normalizes the measurements, and compares them; to avoid the influence of random factors, the experiment is repeated several times, yielding the pool size with the best model execution performance. This module is called mainly by the model parameter pool module to initialize the pool size.
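A sketch of this pool-size search, assuming a caller-supplied `run_one_round(size)` callback that trains `size` models for one round (dividing total time by the number of models trained is one reasonable reading of the normalization step):

```python
import time

def choose_pool_size(run_one_round, candidate_sizes, repeats=3):
    """Time one round of model iteration at each candidate pool size,
    normalize by the number of models trained, average over `repeats`
    runs to damp random noise, and return the best size."""
    best_size, best_cost = None, float("inf")
    for size in candidate_sizes:
        total = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            run_one_round(size)                 # one round with `size` models
            total += time.perf_counter() - start
        per_model = total / (repeats * size)    # normalized per-model cost
        if per_model < best_cost:
            best_size, best_cost = size, per_model
    return best_size
```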
A machine learning hyperparameter optimization method based on asynchronous Bayesian optimization of the present invention comprises the following steps:
(1) execute the adaptive model-parallelism determination module to determine the best model parallelism of the computing cluster, and pass the result to the model parameter pool module;
(2) the model parameter pool module performs initialization, e.g., setting the model parameter pool size (assumed to be n);
(3) the Bayesian optimization module performs initialization, e.g., configuring the hyperparameter search space and the number of Bayesian optimization iterations;
(4) the model parameter pool module calls the K-means clustering module to generate n initial parameter points, which are filled into the model parameter pool;
(5) the computing cluster runs one round of model iteration for the model corresponding to each parameter point in the pool, and sends the model evaluation metrics to the model parameter pool module;
(6) the scheduling module judges, from the parameter points and evaluation metrics in the pool, whether each corresponding model should stop training; if so, it sends a stop-training signal to the model parameter pool module;
(7) when the model parameter pool module receives a model stop signal from the scheduling module, it requests new model parameter points (if one model stopped, it asks the Bayesian optimization module for one parameter point; if several models stopped, it asks the K-means clustering module for several), the computing cluster takes up the new parameter points in the pool, and a new round of model training starts; the above process repeats until the Bayesian optimization stopping threshold (the number of Bayesian optimization iterations) is reached.
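A toy end-to-end simulation of steps (1)-(7) above (random sampling stands in for the Bayesian optimizer, a fixed round count stands in for the scheduler's convergence test, and all names are illustrative):

```python
import random

def async_tuning_loop(objective, sample_point, pool_size=3, budget=20,
                      converge_after=4, seed=0):
    """Toy sketch of the asynchronous loop: a pool of `pool_size` parameter
    points is iterated round by round; whenever a model is judged finished,
    its slot is immediately refilled with a fresh point, with no global
    synchronization barrier."""
    rng = random.Random(seed)
    pool = [{"x": sample_point(rng), "round": 0} for _ in range(pool_size)]
    best_x, best_score, spent = None, float("-inf"), 0
    while spent < budget:
        for slot in pool:
            slot["round"] += 1                    # one round of model iteration
            score = objective(slot["x"])          # model evaluation metric
            if score > best_score:
                best_x, best_score = slot["x"], score
            if slot["round"] >= converge_after:   # scheduler: model finished
                slot["x"], slot["round"] = sample_point(rng), 0
                spent += 1                        # one optimization trial used
    return best_x, best_score
```

The key structural point the sketch shows is that each slot advances and is refilled independently, which is what distinguishes the asynchronous mode from a synchronous batch.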
Compared with the prior art, the advantages of the present invention are: the mainstream hyperparameter optimization methods Grid Search and Random Search are inefficient and require large amounts of computing resources, while the heuristic Bayesian optimization method can only run serially and cannot exploit the multi-machine parallel computing capability of distributed environments; as a result, hyperparameter optimization for machine learning is hard to carry out in big-data environments. The asynchronous Bayesian optimization method proposed by the present invention, through asynchronous model training based on the model parameter pool, achieves asynchronous parallel hyperparameter optimization while retaining the high tuning efficiency of Bayesian optimization itself. The invention can make full use of multi-machine computing capability in distributed environments and makes automated tuning of machine learning feasible in big-data environments, thereby helping people analyze data and mine data value with big-data machine learning in social production practice.
Detailed description of the invention
Fig. 1 is a schematic diagram of the model selection and tuning process;
Fig. 2 is a schematic diagram of Grid search (left) and Random search (right);
Fig. 3 is a schematic diagram of synchronous execution (left) and asynchronous execution (right);
Fig. 4 is the overall framework diagram of the system of the present invention;
Fig. 5 is a schematic diagram of the model parameter pool module of the present invention;
Fig. 6 is a schematic diagram of the K-means clustering module of the present invention;
Fig. 7 is the implementation flowchart of the task scheduling module of the present invention;
Fig. 8 is the implementation flowchart of the adaptive model-parallelism determination module of the present invention.
Specific embodiment
The technical solution of the present invention is shown in Fig. 4 and mainly includes: a Bayesian optimization module, a model parameter pool module, a K-means clustering module, a task scheduling module, and an adaptive model-parallelism determination module. Through the cooperation of these modules, the machine learning hyperparameter optimization method based on Bayesian optimization proposed by the present invention can be realized.
Among the above modules, the Bayesian optimization module:
The Bayesian optimization module is the basic technology of the present invention. It implements the Bayesian optimization method, which models the relationship between the model evaluation metric and the parameter points and can thereby generate more meaningful parameter points. In the present invention, Bayesian optimization is responsible for generating candidate parameter points and receives feedback (parameter points and the corresponding model evaluation metrics) from the model parameter pool module.
Among the above modules, the model parameter pool module:
The model parameter pool is one of the key technologies of the present invention; its structure is shown in Fig. 5. The pool is responsible for managing model parameter points and receives the parameter points generated by the Bayesian optimization module: for a single model parameter point, it receives the point directly from the Bayesian optimization module; for multiple model parameter points, Bayesian optimization first generates groups of candidate points, and the K-means clustering module then produces groups of mutually distinct parameter points.
The model parameter pool module provides Push and Pull interfaces: the computing cluster (a Spark cluster) Pulls a model parameter point and trains on it; after the model converges, the cluster Pushes the model evaluation metric back to the pool. Because model parameters differ and machine learning models have inherent randomness, models usually have different training times; on this basis, efficient asynchronous parallel tuning can be realized.
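A minimal sketch of such a Push/Pull interface (made thread-safe since cluster-side workers would call it concurrently; the class and method names are illustrative, not the patent's code):

```python
import threading

class ModelParameterPool:
    """Minimal sketch of the pool's Push/Pull interface: workers Pull a
    parameter point to train on, and Push back the evaluation metric."""
    def __init__(self, points):
        self._lock = threading.Lock()
        self._free = list(points)          # points awaiting a worker
        self.results = []                  # (point, metric) feedback for the optimizer

    def pull(self):
        """Hand one parameter point to a worker, or None if the pool is empty."""
        with self._lock:
            return self._free.pop() if self._free else None

    def push(self, point, metric):
        """Record a worker's evaluation metric for a finished parameter point."""
        with self._lock:
            self.results.append((point, metric))

    def refill(self, point):
        """Add a freshly generated parameter point back into the pool."""
        with self._lock:
            self._free.append(point)
```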
Among the above modules, the K-means clustering module:
The K-means clustering module is one of the keys of the present invention. Machine learning models, especially models solved by gradient descent such as logistic regression and support vector machines, generally converge within tens of iteration rounds. During training, a compute node runs one round of iteration for the model corresponding to a parameter point in the pool and then checks convergence, so multiple models may converge at the same time. In that case, generating several groups of candidate points directly with Bayesian optimization would produce mutually redundant parameter points, lowering the efficiency of the whole automated tuning process.
This module implements the K-means clustering algorithm: it receives the groups of original candidate parameter points generated by Bayesian optimization, performs K-means clustering, produces k mutually distinct parameter points with relatively large acquisition values (e.g., of the EI function), and fills these points into the model parameter pool module.
A schematic of generating groups of candidate parameter points with K-means clustering is shown in Fig. 6 (the figure shows four parameter points A, B, C, and D; in practice there are more). The horizontal axis represents parameter points and the vertical axis the acquisition value. Suppose three models converge simultaneously. Directly generating three groups of candidate points with Bayesian optimization would likely yield points A, B, and C (whose acquisition values are large), but A and B are mutually redundant (the distance between them, e.g., the Euclidean distance, is small), which would lower the tuning efficiency of Bayesian optimization. By clustering candidates A, B, C, and D with K-means, A and B will, in theory, fall into the same class, from which the point with the larger acquisition value, A, is returned; this yields the mutually distinct points A, C, and D and improves the efficiency of Bayesian optimization.
Among the above modules, the task scheduling module:
The task scheduling module is one of the important modules of the present invention. It judges the convergence of the models in the model parameter pool and consists of two parts: a model convergence check and the Early Stopping technique.
The convergence check judges whether the model accuracy has reached a preset threshold; if so, the model has converged, otherwise it has not.
In some machine learning applications, performance-related information becomes available during training. Especially when training is iterative, a performance curve is available; for example, for models solved by gradient descent, the model accuracy usually rises as training proceeds, and the accuracy is available at the end of each iteration round (epoch). Using the curve of accuracy against training rounds, one can judge whether the model currently being trained could outperform the best model known so far; for a model that cannot, its training can be terminated early and the corresponding computing resources released in time, so that more promising models can be evaluated. The algorithm based on this idea is called the Early Stopping algorithm. Using the Early Stopping technique effectively accelerates the whole automated tuning process.
Among the above modules, the adaptive model-parallelism determination module:
The adaptive model-parallelism determination module is one of the important modules of the present invention. Model parallelism refers to the number of models executed simultaneously in the computing cluster; it directly affects the computing performance of the whole cluster, and setting it either too large or too small degrades that performance. This module determines the model parallelism adaptively.
For machine learning models on the Spark platform, the module works as follows: it experimentally evaluates the computational efficiency of the model parameter pool at different pool sizes, thereby obtaining the pool size with the best computing performance. The concrete steps are: measure the time taken to execute one round of model iteration at each candidate pool size, normalize the times, and compare them; to avoid the influence of random factors, the experiment is repeated several times (e.g., 3 times), yielding the pool size with the best model execution performance. Compared with the time consumed by the whole tuning process, the time spent on this procedure is negligible.
The present invention is described in detail below with reference to specific embodiments and the drawings.
The present example uses Python as the programming language, together with the big-data processing platform Spark and MLlib, the distributed machine learning library based on Spark, and targets the hyperparameter optimization problem of machine learning in big-data environments. The following explanation takes logistic regression, a classical machine learning model commonly used in big-data environments, as an example.
As shown in Fig. 4, the method is implemented by the following steps:
1. Bayesian optimization module
This module implements the Bayesian optimization algorithm. Bayesian optimization needs an initial parameter range, and a parameter space configuration interface is provided as follows:
For the logistic regression model, the main hyperparameters are: maxIter (number of model iteration rounds), regParam (regularization coefficient), and tol (model convergence tolerance). An initial parameter space range must be configured, as follows:
| Parameter name | Meaning | Interface used |
| --- | --- | --- |
| maxIter | Number of model iteration rounds | hp.randint('maxIter', 100) |
| regParam | Regularization coefficient | hp.loguniform('regParam', 1e-3, 1e+2) |
| tol | Convergence tolerance | hp.loguniform('tol', 1e-4, 1e-6) |
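The `hp.*` calls in the table are the Hyperopt library's search-space primitives. A dependency-free sketch of drawing one sample from an equivalent space (note that the `tol` bounds are written here in increasing order, and that Hyperopt's real `hp.loguniform` takes its bounds in log space; this sketch samples on raw bounds for readability):

```python
import math
import random

def sample_logreg_space(rng):
    """Draw one hyperparameter point for the logistic-regression example:
    an integer iteration count plus two log-uniformly distributed reals."""
    def loguniform(lo, hi):
        # sample uniformly in log space, then exponentiate
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {
        "maxIter": rng.randrange(100),        # cf. hp.randint('maxIter', 100)
        "regParam": loguniform(1e-3, 1e2),    # cf. hp.loguniform('regParam', ...)
        "tol": loguniform(1e-6, 1e-4),        # cf. hp.loguniform('tol', ...)
    }
```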
When the model parameter pool needs one group of parameters, the Bayesian optimization module directly generates one group and feeds it back to the model parameter pool module (degenerating to the classical Bayesian optimization algorithm); when the pool needs several groups of parameters (say K), Bayesian optimization generates groups of original candidate parameter points, and the K-means clustering module produces K groups of mutually distinct parameter points and feeds them back to the model parameter pool module.
2. Model parameter pool module
As shown in Fig. 5, this module implements the model parameter pool and is responsible for managing model parameters. For this module, the whole automated tuning process has three phases: an initialization phase, a first phase, and a second phase.
The initialization phase: execute the adaptive pool-size determination module to determine the model parameter pool size (assumed to be k).
The first phase: call the Bayesian optimization module to randomly generate k groups of parameter points and fill them into the model parameter pool; the compute nodes train the models corresponding to the parameter points in the pool; the parameter points and the corresponding model evaluations are fed back to Bayesian optimization, initializing the Gaussian process of the Bayesian optimizer.
The second phase: execute the task scheduling module to judge whether the model corresponding to each parameter point in the pool has converged, count the number m of converged models, and mark them in the pool; execute the task scheduling module to judge, by the Early Stopping algorithm, whether the model corresponding to each parameter point should stop training, count the number n of models that should stop, and mark them in the pool; the compute nodes compute the evaluation metrics of these m + n models on the test data set; record the best model and the corresponding best evaluation metric; feed the m + n finished models back to Bayesian optimization and update the Gaussian process; combining Bayesian optimization with K-means clustering, generate m + n groups of candidate parameter points and fill the pool; the compute nodes perform one round of training update for the models corresponding to the parameter points in the pool; execute the above process in a loop until the specified number of optimization rounds is reached, and return the best model and the best model evaluation metric.
3. K-means clustering module
As shown in Fig. 6, the K-means clustering module implements a K-means algorithm, which mainly clusters the groups of initial parameter points produced by the Bayesian optimization module so as to generate K groups of mutually distinct parameter points.
The K-means-based algorithm for generating multiple candidate parameter points mainly comprises the following steps: randomly generate L (L > 10000) parameter points; compute the EI value (the acquisition value, the criterion by which Bayesian optimization generates candidate points) of each of the L points; find the l (200 < l < 1000) points with the largest EI values; run gradient descent from each of these points to find local optima; perform K-means clustering on the l local optima and return, from each class, the point with the largest EI value.
Clustering goal: if candidate parameter points are close to each other (e.g., their Euclidean distance is small), they form redundancy and lower tuning efficiency; what is wanted are multiple mutually distinct points with relatively large EI values.
Clustering data: take the parameter values and the acquisition value of each parameter point as features, normalize them (to avoid clustering failure caused by inconsistent scales between features), and cluster the sample points into k classes (k being the number of parameter points to be generated).
Clustering result: from each class of the clustering result, select and return the parameter point with the largest acquisition value.
Taking Fig. 6 as an example: the original candidate parameter points include L points A, B, C, and D. Kmeans clustering groups them into three classes; in each class, the parameter point with the largest acquisition-function value is found, yielding A, C, and D. The points A, C, and D are thus mutually distinct while still having large acquisition-function values.
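The candidate-generation procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `generate_candidates`, `toy_ei`, and the inlined Lloyd's k-means are all assumed names, and the gradient-descent refinement step is omitted for brevity.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label for each point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def generate_candidates(ei, dim, k, big_l=10000, small_l=500, seed=0):
    """Generate up to k mutually distinct candidates with large EI values.

    ei: acquisition function mapping an (n, dim) array to n scores.
    Steps follow the text: sample L random points, keep the l highest-EI
    points, cluster them into k classes, and return the best point of each
    class. (The gradient-descent local refinement is omitted here.)
    """
    rng = np.random.default_rng(seed)
    points = rng.uniform(0.0, 1.0, size=(big_l, dim))   # L random points
    scores = ei(points)
    top = np.argsort(scores)[-small_l:]                  # l best points
    points, scores = points[top], scores[top]
    labels = kmeans(points, k)
    # From every cluster, return the point with the largest EI value.
    return np.array([points[labels == j][np.argmax(scores[labels == j])]
                     for j in range(k) if np.any(labels == j)])

# Toy acquisition function (assumed): peaks at the center of the unit square.
toy_ei = lambda x: -((x - 0.5) ** 2).sum(axis=1)
cands = generate_candidates(toy_ei, dim=2, k=4)
```

Because each returned point comes from a different cluster, the candidates are spread apart instead of piling up around a single EI maximum, which is exactly the redundancy problem the clustering objective describes.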
4. Task scheduling module
The task scheduling module implements the Early Stopping algorithm. This module performs convergence judgments on the parameter points in the model parameter pool.
Each time a model in the model parameter pool completes one round of iteration, the task scheduling module performs a convergence judgment. The module mainly decides, according to the convergence precision, whether a model in the model parameter pool should stop training. To effectively accelerate the overall tuning process, the Early Stopping technique uses the performance curve to predict whether a model can still achieve the best model effect, so that the training of models that cannot is terminated in time and the training of the next group of parameter points can begin.
As shown in Fig. 7, the task scheduling module obtains the data in the model parameter pool (model parameter points, model evaluation indices, etc.). It first judges, based on model convergence, whether a model should stop training: the model precision w is computed and compared with the preset threshold W; if the threshold is reached, the model has converged; otherwise it has not. If the judgment result is "not converged", the Early Stopping algorithm is applied: the mean E(P) of the evaluation indices P of previously trained models at the current iteration round is computed, and if the current model evaluation index p < E(P) * 0.9, the training of that model is terminated.
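The two-stage judgment above (convergence threshold first, then the p < E(P) * 0.9 Early Stopping rule) can be sketched as a single function. The name `should_stop` and its parameter names are illustrative, not from the patent.

```python
def should_stop(current_p, history_p, precision=None, threshold=None):
    """Convergence and Early Stopping judgment as described in the text.

    current_p: evaluation index of the model at its current iteration round.
    history_p: evaluation indices of previously trained models at that round.
    precision/threshold: optional convergence check (precision w vs. threshold W).
    Returns "converged", "early_stop", or "continue".
    """
    if precision is not None and threshold is not None and precision >= threshold:
        return "converged"       # model precision reached the preset threshold W
    if history_p:
        mean_p = sum(history_p) / len(history_p)   # E(P)
        if current_p < mean_p * 0.9:               # p < E(P) * 0.9
            return "early_stop"  # unlikely to reach the best model effect
    return "continue"
```

For example, a model scoring 0.5 while previous models averaged 0.85 at the same round falls below 0.85 * 0.9 = 0.765 and is stopped early, freeing the slot for the next group of parameter points.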
5. Adaptive model parallelism determination module
As shown in Fig. 8, this module mainly implements an algorithm for adaptively determining the model parallelism. It is responsible for determining an appropriate size for the model parameter pool.
For a logistic regression model, the concrete steps are: measure the time taken by each model parameter pool size to execute one round of model iteration, then normalize the times and compare their lengths; to avoid the influence of random factors, the experiment is repeated several times (e.g. 3 times), and the model parameter pool size with the best model execution performance is selected. Compared with the time taken by the entire tuning process, the time consumed by the above procedure is negligible.
For example, for a logistic regression model, the concrete steps are: determine an initial range of model parameter pool sizes, e.g. i = 1 ~ e (where e is the maximum configured model parameter pool size, e > 1); set the model parameter pool size to i and run one round of model iteration three times (repeating 3 times to avoid random factors). This process is executed in a loop over all candidate sizes; the results are sorted by elapsed time, and the i with the shortest time is returned as the model parameter pool size.
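The timing procedure can be sketched as follows, assuming a hypothetical `run_one_iteration(size)` callback standing in for one round of model iteration at a given pool size; the function name and signature are illustrative.

```python
import time

def pick_pool_size(run_one_iteration, max_size, repeats=3):
    """Pick the model parameter pool size with the shortest iteration time.

    run_one_iteration(size) executes one round of model iteration with the
    pool set to `size`. Each measurement is repeated `repeats` times (3 by
    default, as in the text) to average out random factors.
    """
    timings = {}
    for size in range(1, max_size + 1):
        elapsed = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            run_one_iteration(size)
            elapsed += time.perf_counter() - start
        timings[size] = elapsed / repeats          # mean time for this size
    return min(timings, key=timings.get)           # size with shortest mean time
```

Since only one iteration round is timed per candidate size (times a handful of repeats), the cost of this calibration is negligible next to the full tuning run, as the text notes.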
The above embodiments are provided only for the purpose of describing the present invention and are not intended to limit its scope. The scope of the invention is defined by the following claims; various equivalent replacements and modifications made without departing from the spirit and principles of the invention shall all fall within the scope of the invention.
Claims (2)
1. A machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, characterized by comprising: a Bayesian optimization module, a model parameter pool module, a Kmeans cluster module, a task scheduling module, and an adaptive model parallelism determination module;
the Bayesian optimization module implements a Bayesian optimization algorithm and generates candidate parameter points; upon receiving a signal from the model parameter pool module or the Kmeans cluster module, it accesses the model parameter points and corresponding model evaluation indices in the model parameter pool and updates the Bayesian optimization; it provides a Get interface to be called directly by the model parameter pool module; for the scenario in which multiple models converge simultaneously, it provides a GetBatch interface to be called by the Kmeans cluster module; the GetBatch interface implements the following algorithm: randomly generate L parameter points, L > 10000; compute the EI value, i.e. the acquisition-function value, of each of the L parameter points; find the l parameter points with the largest EI values, where 200 < l < 1000; execute a gradient descent algorithm starting from each of these parameter points to find local optima;
the model parameter pool module is responsible for managing the model parameter points, which specifically includes: obtaining model hyperparameter points, replacing parameter points in the model parameter pool, and supplying the parameter points in the pool for use by the computing cluster; the model parameter pool is implemented as an array, each model parameter point is abstracted into a parameter point object, and Push and Pull interfaces are provided for interaction between the computing cluster and the model parameter pool; the model parameter pool module calls the Get interface of the Bayesian optimization module to obtain a model parameter point, and obtains multiple groups of distinct parameter points from the Kmeans cluster module through the GetBatch interface; the parameter points in the model parameter pool module are Pulled by the computing cluster, and the module receives the model evaluation indices Pushed by the computing cluster;
the Kmeans cluster module generates multiple distinct parameter points through Kmeans clustering; it is called by the model parameter pool module and receives a signal to generate k distinct parameter points; it calls the Bayesian optimization module to generate multiple original candidate parameter points; the candidate parameter points are clustered into k classes through Kmeans clustering, the parameter point with the largest acquisition-function value is then selected from each of the k classes, and the k distinct parameter points thus generated are returned to the model parameter pool module; K is greater than k;
the task scheduling module judges whether a model in the model parameter pool module should stop training, which specifically includes model convergence and an Early Stopping algorithm; model convergence: compute whether the model precision reaches a preset threshold; if so, the model has converged; otherwise, it has not; the Early Stopping algorithm is implemented as: first compute the mean E(P) of the model evaluation indices P of historically trained models at the current iteration round; if the current model evaluation index p < E(P) * 0.9, stop training; otherwise, continue training; the task scheduling module interacts with the model parameter pool module, judges the state of the model corresponding to each parameter in the model parameter pool module, and sends signals to the model parameter pool module;
the adaptive model parallelism determination module adaptively determines the parallelism of the models in the computing cluster; it experimentally evaluates the computational efficiency of the model parameter pool for different model parameter pool sizes and obtains the model parameter pool size with the best computing performance; this module is called by the model parameter pool module and is used to initialize the model parameter pool size.
2. A machine learning hyperparameter optimization method based on asynchronous Bayesian optimization, characterized by comprising the following steps:
(1) execute the adaptive model parallelism determination module to determine the best model parallelism for the computing cluster, and pass the result to the model parameter pool module;
(2) the model parameter pool module performs initialization, including the model parameter pool size;
(3) the Bayesian optimization module performs initialization, including the hyperparameter space configuration of the Bayesian optimization and the number of Bayesian optimization iteration rounds;
(4) the model parameter pool module calls the Kmeans cluster module to generate n initial parameter points, which are filled into the model parameter pool module;
(5) the computing cluster performs one round of model iteration on the models corresponding to the parameters in the model parameter pool module, and sends the model evaluation indices to the model parameter pool module;
(6) the task scheduling module judges, according to the parameter points and model evaluation indices in the model parameter pool module, whether the model corresponding to a parameter should stop training; if it should stop, a stop-training signal is sent to the model parameter pool module;
(7) if the model parameter pool module receives a model stop signal from the task scheduling module, it requests new model parameter points: if one model has stopped, the Bayesian optimization module is requested to generate one parameter point; if multiple models have stopped, the Kmeans cluster module is requested to generate multiple parameter points; the parameter points in the model parameter pool module are used by the computing cluster to start a new round of model training; the above process repeats until the Bayesian optimization stopping threshold, i.e. the number of Bayesian optimization iteration rounds, is reached.
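The method steps of claim 2 can be sketched as a simplified synchronous driver loop. All callables here are hypothetical stand-ins for the modules of claim 1 (the Bayesian Get interface, the Kmeans GetBatch interface, and the task scheduler); the actual system runs these asynchronously against a computing cluster.

```python
import random

def optimize(train_round, propose_one, propose_batch, should_stop,
             n_init=4, rounds=20):
    """Driver loop sketching claim 2's steps (all callables are stand-ins).

    train_round(point)  -> evaluation index after one round of iteration (step 5).
    propose_one()       -> one new parameter point (step 7, single model stopped).
    propose_batch(k)    -> k new parameter points (steps 4 and 7, several stopped).
    should_stop(p, s)   -> True when training of point p should stop (step 6).
    """
    pool = propose_batch(n_init)                  # step (4): fill the pool
    best_point, best_score = None, float("-inf")
    for _ in range(rounds):                       # Bayesian iteration rounds
        scores = [train_round(p) for p in pool]   # step (5): one round each
        if max(scores) > best_score:
            best_score = max(scores)
            best_point = pool[scores.index(best_score)]
        stopped = [i for i, (p, s) in enumerate(zip(pool, scores))
                   if should_stop(p, s)]
        if len(stopped) == 1:                     # step (7): one model stopped
            pool[stopped[0]] = propose_one()
        elif len(stopped) > 1:                    # several stopped: batch request
            for i, p in zip(stopped, propose_batch(len(stopped))):
                pool[i] = p
    return best_point, best_score

# Toy demonstration with stand-in modules (all hypothetical):
random.seed(0)
best, score = optimize(
    train_round=lambda p: -(p - 0.5) ** 2,                        # toy objective
    propose_one=lambda: random.random(),                          # "Get"
    propose_batch=lambda k: [random.random() for _ in range(k)],  # "GetBatch"
    should_stop=lambda p, s: s < -0.1,                            # scheduler
)
```

The design choice the claims describe — batching proposals through Kmeans only when several slots free up at once — keeps the pool full without serializing on the Bayesian model each time a single model stops.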
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588608.5A CN109376869A (en) | 2018-12-25 | 2018-12-25 | A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method |
PCT/CN2019/091485 WO2020133952A1 (en) | 2018-12-25 | 2019-06-17 | Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588608.5A CN109376869A (en) | 2018-12-25 | 2018-12-25 | A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109376869A true CN109376869A (en) | 2019-02-22 |
Family
ID=65371987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811588608.5A Pending CN109376869A (en) | 2018-12-25 | 2018-12-25 | A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109376869A (en) |
WO (1) | WO2020133952A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334732A (en) * | 2019-05-20 | 2019-10-15 | 北京思路创新科技有限公司 | A kind of Urban Air Pollution Methods and device based on machine learning |
CN110619423A (en) * | 2019-08-06 | 2019-12-27 | 平安科技(深圳)有限公司 | Multitask prediction method and device, electronic equipment and storage medium |
CN110659741A (en) * | 2019-09-03 | 2020-01-07 | 浩鲸云计算科技股份有限公司 | AI model training system and method based on piece-splitting automatic learning |
CN111027709A (en) * | 2019-11-29 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Information recommendation method and device, server and storage medium |
WO2020133952A1 (en) * | 2018-12-25 | 2020-07-02 | 中国科学院软件研究所 | Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method |
JP2020144530A (en) * | 2019-03-05 | 2020-09-10 | 日本電信電話株式会社 | Parameter estimation device, method, and program |
CN111797833A (en) * | 2020-05-21 | 2020-10-20 | 中国科学院软件研究所 | Automatic machine learning method and system oriented to remote sensing semantic segmentation |
CN112261721A (en) * | 2020-10-19 | 2021-01-22 | 南京爱而赢科技有限公司 | Combined beam distribution method based on Bayes parameter-adjusting support vector machine |
CN113305853A (en) * | 2021-07-28 | 2021-08-27 | 季华实验室 | Optimized welding parameter obtaining method and device, electronic equipment and storage medium |
CN113742991A (en) * | 2020-05-30 | 2021-12-03 | 华为技术有限公司 | Model and data joint optimization method and related device |
CN115470910A (en) * | 2022-10-20 | 2022-12-13 | 晞德软件(北京)有限公司 | Automatic parameter adjusting method based on Bayesian optimization and K-center sampling |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI733270B (en) * | 2019-12-11 | 2021-07-11 | 中華電信股份有限公司 | Training device and training method for optimized hyperparameter configuration of machine learning model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015184729A1 (en) * | 2014-06-05 | 2015-12-10 | Tsinghua University | Method and system for hyper-parameter optimization and feature tuning of machine learning algorithms |
CN108062587A (en) * | 2017-12-15 | 2018-05-22 | 清华大学 | The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning |
CN108470210A (en) * | 2018-04-02 | 2018-08-31 | 中科弘云科技(北京)有限公司 | A kind of optimum option method of hyper parameter in deep learning |
CN108573281A (en) * | 2018-04-11 | 2018-09-25 | 中科弘云科技(北京)有限公司 | A kind of tuning improved method of the deep learning hyper parameter based on Bayes's optimization |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105989374B (en) * | 2015-03-03 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Method and equipment for training model on line |
CN108446302A (en) * | 2018-01-29 | 2018-08-24 | 东华大学 | A kind of personalized recommendation system of combination TensorFlow and Spark |
CN109062782B (en) * | 2018-06-27 | 2022-05-31 | 创新先进技术有限公司 | Regression test case selection method, device and equipment |
CN109376869A (en) * | 2018-12-25 | 2019-02-22 | 中国科学院软件研究所 | A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method |
- 2018-12-25 CN CN201811588608.5A patent/CN109376869A/en active Pending
- 2019-06-17 WO PCT/CN2019/091485 patent/WO2020133952A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015184729A1 (en) * | 2014-06-05 | 2015-12-10 | Tsinghua University | Method and system for hyper-parameter optimization and feature tuning of machine learning algorithms |
CN108062587A (en) * | 2017-12-15 | 2018-05-22 | 清华大学 | The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning |
CN108470210A (en) * | 2018-04-02 | 2018-08-31 | 中科弘云科技(北京)有限公司 | A kind of optimum option method of hyper parameter in deep learning |
CN108573281A (en) * | 2018-04-11 | 2018-09-25 | 中科弘云科技(北京)有限公司 | A kind of tuning improved method of the deep learning hyper parameter based on Bayes's optimization |
Non-Patent Citations (2)
Title |
---|
YAO CHENGWEI ET AL.: "An adaptive hyperparameter optimization method for deep generative models", RESEARCH AND EXPLORATION IN LABORATORY * |
YANG BIN ET AL.: "An adaptive hyperparameter method for support vector machine regression", JOURNAL OF GUANGXI NORMAL UNIVERSITY (NATURAL SCIENCE EDITION) * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020133952A1 (en) * | 2018-12-25 | 2020-07-02 | 中国科学院软件研究所 | Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method |
JP7124768B2 (en) | 2019-03-05 | 2022-08-24 | 日本電信電話株式会社 | Parameter estimation device, method and program |
JP2020144530A (en) * | 2019-03-05 | 2020-09-10 | 日本電信電話株式会社 | Parameter estimation device, method, and program |
WO2020179627A1 (en) * | 2019-03-05 | 2020-09-10 | 日本電信電話株式会社 | Parameter estimation device, method and program |
CN110334732A (en) * | 2019-05-20 | 2019-10-15 | 北京思路创新科技有限公司 | A kind of Urban Air Pollution Methods and device based on machine learning |
CN110619423A (en) * | 2019-08-06 | 2019-12-27 | 平安科技(深圳)有限公司 | Multitask prediction method and device, electronic equipment and storage medium |
CN110619423B (en) * | 2019-08-06 | 2023-04-07 | 平安科技(深圳)有限公司 | Multitask prediction method and device, electronic equipment and storage medium |
CN110659741A (en) * | 2019-09-03 | 2020-01-07 | 浩鲸云计算科技股份有限公司 | AI model training system and method based on piece-splitting automatic learning |
CN111027709A (en) * | 2019-11-29 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Information recommendation method and device, server and storage medium |
CN111797833A (en) * | 2020-05-21 | 2020-10-20 | 中国科学院软件研究所 | Automatic machine learning method and system oriented to remote sensing semantic segmentation |
CN113742991A (en) * | 2020-05-30 | 2021-12-03 | 华为技术有限公司 | Model and data joint optimization method and related device |
CN112261721A (en) * | 2020-10-19 | 2021-01-22 | 南京爱而赢科技有限公司 | Combined beam distribution method based on Bayes parameter-adjusting support vector machine |
CN113305853A (en) * | 2021-07-28 | 2021-08-27 | 季华实验室 | Optimized welding parameter obtaining method and device, electronic equipment and storage medium |
CN115470910A (en) * | 2022-10-20 | 2022-12-13 | 晞德软件(北京)有限公司 | Automatic parameter adjusting method based on Bayesian optimization and K-center sampling |
Also Published As
Publication number | Publication date |
---|---|
WO2020133952A1 (en) | 2020-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109376869A (en) | A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method | |
Mei et al. | An efficient feature selection algorithm for evolving job shop scheduling rules with genetic programming | |
CN102567391B (en) | Method and device for building classification forecasting mixed model | |
CN103745273B (en) | Semiconductor fabrication process multi-performance prediction method | |
CN107229693A (en) | The method and system of big data system configuration parameter tuning based on deep learning | |
CN105184368A (en) | Distributed extreme learning machine optimization integrated framework system and method | |
CN105809349B (en) | Dispatching method for step hydropower station group considering incoming water correlation | |
Sun et al. | Research and application of parallel normal cloud mutation shuffled frog leaping algorithm in cascade reservoirs optimal operation | |
CN105930916A (en) | Parallel modular neural network-based byproduct gas real-time prediction method | |
Wei et al. | Research on cloud design resources scheduling based on genetic algorithm | |
Chen et al. | You only search once: A fast automation framework for single-stage dnn/accelerator co-design | |
CN111898867A (en) | Airplane final assembly production line productivity prediction method based on deep neural network | |
Shang et al. | Performance of genetic algorithms with different selection operators for solving short-term optimized reservoir scheduling problem | |
CN116629352A (en) | Hundred million-level parameter optimizing platform | |
CN109409746A (en) | A kind of production scheduling method and device | |
CN116307211A (en) | Wind power digestion capability prediction and optimization method and system | |
Cheng et al. | Swiftnet: Using graph propagation as meta-knowledge to search highly representative neural architectures | |
CN113420508A (en) | Unit combination calculation method based on LSTM | |
CN108492013A (en) | A kind of manufacture system scheduling model validation checking method based on quality control | |
CN116523640A (en) | Financial information management system based on scheduling feedback algorithm | |
CN116881224A (en) | Database parameter tuning method, device, equipment and storage medium | |
Seghir et al. | A new discrete imperialist competitive algorithm for QoS-aware service composition in cloud computing | |
Esfahanizadeh et al. | Stream iterative distributed coded computing for learning applications in heterogeneous systems | |
Zhang et al. | Multi-objective evolutionary for object detection mobile architectures search | |
Li et al. | Parameters optimization of back propagation neural network based on memetic algorithm coupled with genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190222 |