WO2020133952A1 - Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method - Google Patents


Info

Publication number
WO2020133952A1
WO2020133952A1 (PCT/CN2019/091485, CN2019091485W)
Authority
WO
WIPO (PCT)
Prior art keywords
model
module
parameter
model parameter
points
Prior art date
Application number
PCT/CN2019/091485
Other languages
French (fr)
Chinese (zh)
Inventor
刘杰
王建飞
杨诏
叶丹
钟华
Original Assignee
中国科学院软件研究所
Priority date
Filing date
Publication date
Application filed by 中国科学院软件研究所 (Institute of Software, Chinese Academy of Sciences)
Publication of WO2020133952A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, and belongs to the field of computer artificial intelligence.
  • The core of AutoML is automatic tuning of machine learning models, that is, automatic selection of hyperparameters.
  • Hyperparameter selection is very important for machine learning applications: different hyperparameters directly affect how well an application performs in production practice (for example, its prediction accuracy). The hyperparameter selection process of a machine learning model is shown in Figure 1. Because a machine learning model usually contains many hyperparameters and the parameter space is huge, how to tune efficiently is an urgent problem.
  • Commonly used tuning methods include simple methods, represented by manual tuning, Grid search and Random search, and heuristic methods, represented by Bayesian optimization. A schematic diagram of Grid search and Random search is shown in Figure 2.
  • Manual tuning is the simplest, and most artisanal, tuning method. Faced with a machine learning application, one can tune the parameters by hand to determine the model hyperparameters: experienced machine learning experts can tune based on experience, while newcomers can rely on manual trial and error (running enough experiments to find a set of parameters that gives a reasonably good model). In general, manual tuning is time-consuming and labor-intensive.
  • Grid search is one of the simplest automated parameter adjustment methods.
  • The idea behind Grid search is simple and direct: the user defines a value range for each parameter, enumerates parameter combinations at fixed intervals, trains a model for each combination, and then selects the parameters of the model with the best evaluation.
  • The combination space explored by Grid search is large. For example, for a logistic regression application with 5 hyperparameters and 10 possible values per parameter, the full combination space contains 10^5 points, and training that many models is very time-consuming. Because the combination space is usually large, Grid search only suits scenarios where a single model trains very quickly, and it is hard to apply in big data scenarios.
  • To address the shortcomings of Grid search, researchers proposed Random search. Unlike Grid search, which exhaustively enumerates parameter combinations at fixed intervals, Random search picks parameter combinations at random. The work of Bergstra et al. shows that Random search is generally no worse than Grid search, and random selection of combinations avoids mutual redundancy between parameter points to some extent. The remaining problem is that if two sampled parameter points are close to each other (for example, their Euclidean distance is small), they are mutually redundant, which lowers search efficiency; in high-dimensional parameter spaces (many parameters), the search also easily gets stuck in a local region.
  • Bayesian optimization is a sequential model-based optimization algorithm: it uses the information from models already trained as prior knowledge to guide the generation of the next parameter point, so the best model can be reached faster. Compared with Grid search and Random search it greatly accelerates the whole tuning process, and it is currently among the best methods for hyperparameter optimization of machine learning models.
  • The asynchronous Bayesian optimization proposed by Kandasamy et al. is a way of parallelizing classical Bayesian optimization, but in that work each computing node is responsible for evaluating one model, so a single model cannot be trained effectively on big data, and the method cannot handle the case where several models converge at the same time.
  • Bayesian optimization as described above is inefficient in a big data environment, which limits the usability of automated machine learning hyperparameter tuning in such settings.
  • The problem solved by the present invention: aiming at the difficulty of automated machine learning hyperparameter tuning in a big data environment, and to overcome the shortcomings of the existing technology, a hyperparameter optimization system and method based on asynchronous Bayesian optimization is provided.
  • The invention tunes machine learning in a big data environment automatically and efficiently, effectively exploiting the parallel computing capability of multiple machines, so that big data machine learning can be better used in production practice.
  • A machine learning hyperparameter optimization system based on asynchronous Bayesian optimization includes: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
  • The Bayesian optimization module implements the Bayesian optimization algorithm and generates candidate parameter points. It provides a Get interface that the model parameter pool module calls directly to obtain model (machine learning model) hyperparameters (such as the learning rate, regularization coefficient and similar parameters).
  • For the scenario where several models converge at the same time, a GetBatch interface is provided for the Kmeans clustering module to call. The GetBatch interface implements the following algorithm: randomly generate L (L > 10000) parameter points; compute for each of them the EI value (the acquisition, or benefit, function value, the criterion by which Bayesian optimization generates candidate parameter points); find the l (200 < l < 1000) parameter points with the largest EI values; and run gradient descent from each of those points to find local optima.
  • The model parameter pool module is responsible for managing model parameter points, including: obtaining model hyperparameter points, replacing parameter points in the pool, and providing the parameter points in the pool to the computing cluster.
  • The model parameter pool is implemented mainly as an array; model parameter points are abstracted into parameter-point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the pool.
  • The model parameter pool obtains model parameter points from the Bayesian optimization module through the Get interface, and obtains multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface.
  • The parameter points in the pool can be pulled by the computing cluster (a Spark cluster), and the pool receives the model evaluation metrics that the cluster pushes back.
  • The Kmeans clustering module generates multiple mutually distinct parameter points through Kmeans clustering. It is called by the model parameter pool module when a signal requesting k distinct parameter points is received; it then calls the Bayesian optimization module to generate K (usually greater than k) original candidate parameter points, clusters those candidates into k classes with Kmeans, and selects from each class the parameter point with the largest acquisition function value, thereby producing k mutually distinct parameter points that are returned to the model parameter pool module.
  • The task scheduling module decides whether a model in the model parameter pool module should stop training. It has two parts: a model convergence check and an Early Stopping algorithm. The convergence check computes whether the model accuracy has reached a preset threshold; if so the model has converged, otherwise it has not. Early Stopping first computes the mean E(P) of the evaluation metric P of previously trained models at the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, training is stopped, otherwise it continues. The module interacts mainly with the model parameter pool: it judges the state of the model corresponding to each parameter point in the pool and sends signals to the pool.
  • The adaptive model-parallelism determination module adaptively determines the parallelism of the models in the computing cluster.
  • The module evaluates, by experiment, the computational efficiency of the model parameter pool under different pool sizes, so as to obtain the pool size with the best computing performance. Specifically, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the experiment is repeated several times, yielding the pool size with the best model execution performance.
  • This module is mainly used by the model parameter pool to initialize the model parameter pool size.
  • The machine learning hyperparameter optimization method based on asynchronous Bayesian optimization of the present invention includes the following steps:
  • the adaptive model-parallelism determination module is executed to determine the best model parallelism for the computing cluster, and the result is passed to the model parameter pool module;
  • the model parameter pool module performs initialization, such as setting the model parameter pool size (assume n);
  • the Bayesian optimization module performs initialization, such as configuring the Bayesian optimization hyperparameter space and the number of Bayesian optimization iteration rounds;
  • the model parameter pool module calls the Kmeans clustering module to generate the initial n parameter points and fills them into the model parameter pool;
  • the computing cluster runs one round of model iteration for the models corresponding to the parameters in the model parameter pool module and sends the model evaluation metrics to the model parameter pool module;
  • the scheduling module decides, from the parameter points and model evaluation metrics in the model parameter pool module, whether each model should stop training, and if so sends a stop signal to the model parameter pool module;
  • if the model parameter pool module receives a stop signal from the scheduling module, it requests new model parameter points (if one model stops, it asks the Bayesian optimization module for a single point; if several models stop, it asks the Kmeans clustering module for several points); the parameter points in the pool are then used by the computing cluster to start another round of model training, and the process repeats until the Bayesian optimization stop threshold (the number of Bayesian optimization iteration rounds) is reached.
  • The advantages of the present invention over the prior art are as follows. The current mainstream hyperparameter optimization methods, Grid Search and Random Search, are inefficient and require large amounts of computing resources; the heuristic Bayesian optimization method can only be executed serially and cannot effectively exploit the parallel computing capability of multiple machines in a distributed environment. This makes hyperparameter optimization hard to carry out in a big data setting.
  • The asynchronous Bayesian optimization method proposed here achieves asynchronous, parallel hyperparameter optimization through asynchronous model training against the model parameter pool, while retaining the high optimization efficiency of Bayesian optimization itself.
  • The invention can effectively exploit multi-machine computing power in a distributed environment, making automated hyperparameter tuning of machine learning feasible in a big data environment, and thus helping people perform data analysis and extract value from data through big data machine learning in production practice.
  • Figure 1 is a schematic diagram of the model selection and parameter adjustment process
  • Figure 2 is a schematic diagram of Grid search (left) and Random search (right);
  • FIG. 5 is a schematic diagram of a model parameter pool module in the present invention.
  • FIG. 6 is a schematic diagram of the Kmeans clustering module in the present invention.
  • FIG. 8 is a flowchart of the implementation of the adaptive determination model parallelism module in the present invention.
  • The technical solution of the present invention is shown in FIG. 4 and mainly includes: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
  • The Bayesian optimization module:
  • The Bayesian optimization module is the foundation of the present invention. It implements the Bayesian optimization method, which models the relationship between the model evaluation metric and the parameter points and can therefore generate more meaningful parameter points. In the present invention, Bayesian optimization is responsible for generating candidate parameter points and for receiving feedback (parameter points and the corresponding model evaluation metrics) from the model parameter pool.
  • The model parameter pool module:
  • The model parameter pool is one of the key technologies of the present invention; its structure is shown in FIG. 5.
  • The model parameter pool is responsible for managing model parameter points and receives the parameter points generated by the Bayesian optimization module. When a single parameter point is needed, it takes the point generated by the Bayesian optimization module directly; when several parameter points are needed, Bayesian optimization first generates multiple sets of candidate parameter points, and the Kmeans clustering module then turns them into multiple sets of mutually distinct parameter points.
  • The model parameter pool module provides Push and Pull interfaces.
  • The computing cluster (a Spark cluster) can pull model parameter points and then train the corresponding models; after a model converges, the cluster pushes its evaluation metric back to the model parameter pool. Because the parameter points differ and machine learning models are themselves stochastic, the models usually take different amounts of time to train, and this is what makes efficient asynchronous parallel tuning possible.
  • The Kmeans clustering module:
  • The Kmeans clustering module is one of the keys of the present invention.
  • Machine learning models, especially models solved with gradient descent such as logistic regression and support vector machines, usually converge after a few dozen iterations.
  • During training, the computing nodes run one round of iteration on the models corresponding to the parameters in the model parameter pool and then check convergence, so several models can converge at the same time.
  • If Bayesian optimization alone were used to generate several sets of candidate parameter points in that situation, the candidates would be mutually redundant, and the efficiency of the whole automated tuning process would drop.
  • This module therefore implements the Kmeans clustering algorithm: it receives the multiple sets of original candidate parameter points generated by Bayesian optimization, applies Kmeans clustering, produces k mutually distinct parameter points with large acquisition (for example, EI) function values, and fills those points into the model parameter pool module.
  • A schematic of generating multiple sets of candidate parameter points with Kmeans clustering is shown in Figure 6 (the figure shows four parameter points A, B, C and D; in practice there are many more). The horizontal axis represents the parameter points and the vertical axis the acquisition function value.
  • When the candidate parameter points A, B, C and D are clustered, A and B will in theory fall into the same cluster, and only the one with the larger acquisition value (A) is returned; this yields the mutually distinct parameter points A, C and D and improves the efficiency of Bayesian optimization.
  • The task scheduling module:
  • The task scheduling module is one of the important modules of the present invention. It mainly judges the convergence of the models in the model parameter pool.
  • The task scheduling module consists of two parts: a model convergence check and the Early Stopping technique.
  • Model convergence is judged mainly by computing whether the model accuracy has reached a preset threshold: if it has, the model has converged; otherwise it has not.
  • During training of a machine learning model, some performance-related information is already available. In particular, when training is iterative, a performance curve is available: for models solved with gradient descent, the model accuracy typically keeps rising as training proceeds, and the accuracy can be read off at the end of every iteration (epoch).
  • Using the curve of accuracy against the number of training steps, one can judge whether the model currently being trained is likely to beat the best model known so far. For a model that cannot beat the known best model, training can be terminated early and the corresponding computing resources released, so that more promising models can be evaluated.
  • The algorithm based on this idea is called the Early Stopping algorithm. Using Early Stopping effectively accelerates the whole automated tuning process.
  • The adaptive model-parallelism determination module:
  • The adaptive model-parallelism determination module is one of the important modules of the present invention.
  • Model parallelism refers to the number of models executed simultaneously in the computing cluster.
  • Model parallelism directly affects the computing performance of the whole cluster: setting it too high or too low both hurt performance. This module determines the model parallelism adaptively.
  • The module works by experimentally evaluating the computational efficiency of the model parameter pool under different pool sizes, so as to find the pool size with the best computing performance.
  • Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the test is repeated several times (for example, 3 times), and the pool size with the best model execution performance is chosen. Compared with the time taken by the whole tuning process, the time spent on this procedure is negligible.
  • The embodiment of the present invention uses Python as the programming language, together with the big data processing platform Spark and the Spark-based MLlib distributed machine learning library, to address machine learning hyperparameter optimization in a big data environment.
  • The following describes the implementation in detail using logistic regression, a classic machine learning model frequently used in big data environments, as the example.
  • The Bayesian optimization module requires an initial parameter range, and a parameter space configuration interface is provided for this purpose.
  • When the model parameter pool needs one set of parameters, the Bayesian optimization module directly generates a single set and feeds it back to the pool (degenerating to the classic Bayesian optimization algorithm); when the pool needs multiple sets of parameters (say K), Bayesian optimization generates multiple sets of original candidate parameter points, and the Kmeans clustering module turns them into K mutually distinct sets.
  • This module implements the model parameter pool and is responsible for managing the model parameters.
  • Over the whole automated tuning process the module operates in three stages: an initialization stage, a first stage and a second stage.
  • The initialization stage: run the adaptive model-parallelism algorithm to determine the size of the model parameter pool (assume k).
  • The first stage: call the Bayesian optimization module to randomly generate k sets of parameter points and fill them into the pool; have the computing nodes train the models corresponding to those parameters; feed the parameter points and the corresponding model evaluations back to Bayesian optimization, initializing its Gaussian process.
  • The second stage: run the task scheduling module to decide which models corresponding to parameters in the pool have converged, count their number m, and mark them in the pool; run the task scheduling module again to decide, using the Early Stopping algorithm, which models should stop training, count their number n, and mark them in the pool; have the computing nodes evaluate these m+n models on the test data set; record the best model and its evaluation metric; feed the m+n finished models back to Bayesian optimization and update the Gaussian process; combine Bayesian optimization with Kmeans clustering to generate m+n sets of candidate parameter points and refill the pool; have the computing nodes run one round of model training on the models corresponding to the parameters in the pool; and repeat this process until the specified number of optimization rounds is reached, finally returning the best model and its evaluation metric.
  • The Kmeans clustering module implements a Kmeans algorithm that clusters the multiple sets of original parameter points produced by the Bayesian optimization module, thereby generating K sets of mutually distinct parameter points.
  • The Kmeans-based algorithm for generating multiple candidate parameter points proceeds as follows: randomly generate L (L > 10000) parameter points; compute the EI value of each (the acquisition function value by which Bayesian optimization ranks candidate parameter points); keep the l (200 < l < 1000) points with the largest EI values; run gradient descent from each of these points to find local optima; run the Kmeans clustering algorithm on the l local optima; and from each cluster return the parameter point with the largest EI value.
  • Clustering goal: if candidate parameter points lie close to one another (for example, with a small Euclidean distance), they are redundant and lower tuning efficiency; the goal is to obtain multiple distinct parameter points with large EI values.
  • Clustering data: the parameter values of the points and their acquisition function values are taken as features and normalized (to avoid clustering failures caused by inconsistent feature scales), and the sample points are clustered into k classes (k being the number of parameter points to be generated).
  • Clustering result: from each cluster, the parameter point that maximizes the acquisition function value is selected and returned.
  • For example, suppose the original candidate parameter points are A, B, C and D (in practice there are L of them). Kmeans clustering groups A, B, C and D into three classes, and from each class the point with the largest acquisition function value is kept, giving A, C and D: these points are mutually distinct and all have large acquisition function values.
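A minimal Python sketch of this batch-generation procedure is given below. It assumes two callables supplied by the Bayesian optimization module, sample_points and expected_improvement (both names are illustrative), uses scikit-learn's KMeans, and omits the gradient-descent refinement of the l best points; it is a sketch of the idea, not the patent's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_batch(sample_points, expected_improvement, k, L=10000, l=500):
    """Return k mutually distinct parameter points with large EI values.

    sample_points(L)        -> (L, d) array of random parameter points
    expected_improvement(X) -> (n,) array of EI values for the rows of X
    Both callables are assumed to come from the Bayesian optimization module.
    """
    # 1. Randomly generate L parameter points and score them with EI.
    points = sample_points(L)
    ei = expected_improvement(points)

    # 2. Keep the l points with the largest EI values.  (The patent additionally
    #    refines each of them by gradient descent to a local optimum; that
    #    refinement is omitted here for brevity.)
    top_idx = np.argsort(ei)[-l:]
    top_points, top_ei = points[top_idx], ei[top_idx]

    # 3. Use the parameter values plus the EI value as clustering features and
    #    normalize them, so no single feature dominates the distance metric.
    features = np.column_stack([top_points, top_ei])
    features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)

    # 4. Cluster into k groups and return the max-EI point of each group.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    batch = []
    for c in range(k):
        members = np.where(labels == c)[0]
        batch.append(top_points[members[np.argmax(top_ei[members])]])
    return np.array(batch)
```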
  • The task scheduling module implements the Early Stopping algorithm and judges the convergence of the models corresponding to the parameter points in the model parameter pool.
  • After each round of model iteration, the task scheduling module makes a convergence judgment.
  • The module mainly decides, based on the convergence accuracy, whether a model in the pool should stop training. To further speed up the tuning process, the performance curve used by the Early Stopping technique lets it predict whether a model can still reach the best model performance, and terminate in time the training of models that cannot, so that training of the next set of parameter points can start.
  • The task scheduling module obtains the data held in the model parameter pool: the model parameter points, the model evaluation metrics, and so on.
  • To decide whether a model should stop training, it computes the model accuracy w and checks whether it reaches the preset threshold W: if so, the model has converged; otherwise it has not. If the model has not converged, the Early Stopping algorithm is applied: compute the mean E(P) of the evaluation metrics P of models already trained up to the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, its training should be terminated.
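The decision rule can be sketched compactly as below. The names are hypothetical (history maps an iteration round to the evaluation metrics of models already trained through that round); the sketch illustrates the two checks described above rather than reproducing the patent's code.

```python
def should_stop(current_metric, round_idx, history, accuracy_threshold):
    """Return 'converged', 'early_stop' or 'continue' for one model."""
    # Convergence check: the model accuracy w has reached the preset threshold W.
    if current_metric >= accuracy_threshold:
        return "converged"

    # Early Stopping check: compare against the mean metric E(P) of models
    # already trained for the same number of rounds (rule: p < E(P) * 0.9).
    past_metrics = history.get(round_idx, [])
    if past_metrics and current_metric < 0.9 * (sum(past_metrics) / len(past_metrics)):
        return "early_stop"

    return "continue"
```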
  • This module implements the adaptive algorithm that determines the model parallelism and is responsible for choosing a reasonable size for the model parameter pool.
  • Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the test is repeated several times (for example, 3 times), and the pool size with the best model execution performance is selected. Compared with the time taken by the whole tuning process, the time spent on this procedure is negligible.

Abstract

The present invention relates to an asynchronous Bayesian optimization-based machine learning hyperparameter optimization system and method. The system comprises: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module. The present invention performs automated tuning of machine learning efficiently in a big data environment and effectively uses multi-machine parallel computing capability, so that people can better use big data machine learning in production practice.

Description

Machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization
Technical field
The invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, and belongs to the field of computer artificial intelligence.
Background
With the development of cloud computing and big data technology, machine learning has become a hot topic in both academia and industry. However, machine learning involves a great deal of theory, and machine learning models contain many parameters, so rich experience is needed to design an efficient model. To promote wider application of machine learning and effectively lower the barrier to building machine learning applications, automatic machine learning (AutoML) technology has emerged: by providing automation for every stage of machine learning, it allows even beginners to train and apply machine learning models.
The core of AutoML is automatic tuning of machine learning models, that is, automatic selection of hyperparameters. Hyperparameter selection is very important for machine learning applications: different hyperparameters directly affect how well an application performs in production practice (for example, its prediction accuracy). The hyperparameter selection process of a machine learning model is shown in Figure 1. Because a machine learning model usually contains many hyperparameters and the parameter space is huge, how to tune efficiently is an urgent problem. Commonly used tuning methods include simple methods, represented by manual tuning, Grid search and Random search, and heuristic methods, represented by Bayesian optimization. A schematic diagram of Grid search and Random search is shown in Figure 2.
Manual tuning is the simplest, and most artisanal, tuning method. Faced with a machine learning application, one can tune the parameters by hand to determine the model hyperparameters: experienced machine learning experts can tune based on experience, while newcomers can rely on manual trial and error (running enough experiments to find a set of parameters that gives a reasonably good model). In general, manual tuning is time-consuming and labor-intensive.
Grid search is one of the simplest automated tuning methods. Its idea is simple and direct: the user defines a value range for each parameter, enumerates parameter combinations at fixed intervals, trains a model for each combination, and selects the parameters of the model with the best evaluation. The combination space is usually large; for example, for a logistic regression application with 5 hyperparameters and 10 possible values per parameter, the full combination space contains 10^5 points, and training that many models is very time-consuming. Because the combination space is usually large, Grid search only suits scenarios where a single model trains very quickly, and it is hard to apply in big data scenarios.
To address the shortcomings of Grid search, researchers proposed Random search. Unlike Grid search, which exhaustively enumerates parameter combinations at fixed intervals, Random search picks parameter combinations at random. The work of Bergstra et al. shows that Random search is generally no worse than Grid search, and random selection of combinations avoids mutual redundancy between parameter points to some extent. The remaining problem is that if two sampled parameter points are close to each other (for example, their Euclidean distance is small), they are mutually redundant, which lowers search efficiency; in high-dimensional parameter spaces (many parameters), the search also easily gets stuck in a local region.
The methods above all search the parameter space by brute force; their search efficiency is low, and they are no longer suitable in a big data environment. Bayesian optimization is a sequential model-based optimization algorithm: it uses the information from models already trained as prior knowledge to guide the generation of the next parameter point, so the best model can be reached faster. Compared with Grid search and Random search it greatly accelerates the whole tuning process, and it is currently among the best hyperparameter optimization methods for machine learning models.
The shortcoming of classic Bayesian optimization is that the optimization process is serial and cannot effectively exploit multi-machine parallel computing. In a big data environment it therefore remains inefficient, making automated tuning of big data machine learning difficult and ill-suited to such environments. How to parallelize classic Bayesian optimization so that it copes well with big data, and thus make big data machine learning easier to use in production, is of great significance for social production practice.
At present, much of the research on Bayesian optimization in distributed environments is based on synchronous batches. In the synchronous execution mode (Bulk Synchronous Parallel, BSP mode) tasks often have to wait for one another, whereas in the asynchronous execution mode (SSP mode) tasks do not need to wait for one another, so the asynchronous mode (SSP) is more efficient than the synchronous mode (BSP). This is illustrated in Figure 3, which shows three computing nodes: in the synchronous mode on the left, tasks 4, 5 and 6 cannot start until tasks 1, 2 and 3 have finished, while in the asynchronous mode on the right they do not have to wait. For the same number of tasks, the asynchronous mode therefore usually finishes earlier.
The asynchronous Bayesian optimization proposed by Kandasamy et al. is a way of parallelizing classical Bayesian optimization, but in that work each computing node is responsible for evaluating one model, so a single model cannot be trained effectively on big data, and the method cannot handle the case where several models converge at the same time.
Because of problems such as the low efficiency of Bayesian optimization in the big data environment described above, the usability of automated machine learning tuning technology in big data environments is low.
Summary of the invention
The technology of the present invention solves the following problem: aiming at the difficulty of automated machine learning hyperparameter tuning in a big data environment, and to overcome the shortcomings of the existing technology, a hyperparameter optimization system and method based on asynchronous Bayesian optimization is provided, which tunes machine learning models in a big data environment automatically and efficiently, effectively exploits multi-machine parallel computing capability, and performs efficient automated tuning of big data machine learning, so that big data machine learning can be better used in production practice.
Technical solution of the present invention: a machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, comprising a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
The Bayesian optimization module implements the Bayesian optimization algorithm and generates candidate parameter points. It provides a Get interface that the model (machine learning model) parameter (model hyperparameters, such as the learning rate and regularization coefficient) pool module calls directly. For the scenario where several models converge at the same time, a GetBatch interface is provided for the Kmeans clustering module to call. The GetBatch interface implements the following algorithm: randomly generate L (L > 10000) parameter points; compute the EI value of each (the acquisition, or benefit, function value, the criterion by which Bayesian optimization generates candidate parameter points); find the l (200 < l < 1000) points with the largest EI values; and run gradient descent from each of them to find local optima.
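The EI acquisition function itself is not spelled out in the text. For reference, the standard expected-improvement criterion for a Gaussian-process surrogate (maximization case) is

EI(x) = \mathbb{E}[\max(0, f(x) - f(x^{+}))] = (\mu(x) - f(x^{+}))\,\Phi(Z) + \sigma(x)\,\varphi(Z), \qquad Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)},

where \mu(x) and \sigma(x) are the surrogate's posterior mean and standard deviation at x, f(x^{+}) is the best evaluation observed so far, and \Phi and \varphi are the standard normal CDF and PDF (EI is taken as 0 when \sigma(x) = 0). The patent may use a variant of this criterion; the formula above is only the conventional definition.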
The model parameter pool module is responsible for managing model parameter points, including obtaining model hyperparameter points, replacing parameter points in the pool, and providing the parameter points in the pool to the computing cluster. The model parameter pool is implemented mainly as an array; model parameter points are abstracted into parameter-point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the pool. The pool obtains model parameter points from the Bayesian optimization module through the Get interface, and multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface. The parameter points in the pool can be pulled by the computing cluster (a Spark cluster), and the pool receives the model evaluation metrics that the cluster pushes back.
The Kmeans clustering module generates multiple mutually distinct parameter points through Kmeans clustering. It is called by the model parameter pool module and receives a signal requesting k distinct parameter points; it then calls the Bayesian optimization module to generate K (usually greater than k) original candidate parameter points, clusters the candidates into k classes with Kmeans, and selects from each class the parameter point with the largest acquisition function value, thereby producing k mutually distinct parameter points that are returned to the model parameter pool module.
The task scheduling module decides whether a model in the model parameter pool module should stop training. It has two parts: a model convergence check and an Early Stopping algorithm. The convergence check computes whether the model accuracy has reached a preset threshold; if so the model has converged, otherwise it has not. Early Stopping first computes the mean E(P) of the evaluation metric P of previously trained models at the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, training is stopped, otherwise it continues. The module interacts mainly with the model parameter pool: it judges the state of the model corresponding to each parameter point in the pool and sends signals to the pool.
The adaptive model-parallelism determination module adaptively determines the parallelism of the models in the computing cluster. It experimentally evaluates the computational efficiency of the model parameter pool under different pool sizes to obtain the pool size with the best computing performance. Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the experiment is repeated several times, yielding the pool size with the best model execution performance. This module is mainly called by the model parameter pool to initialize the pool size.
The machine learning hyperparameter optimization method based on asynchronous Bayesian optimization of the present invention includes the following steps:
(1) Execute the adaptive model-parallelism determination module to determine the best model parallelism for the computing cluster, and pass the result to the model parameter pool module;
(2) The model parameter pool module performs initialization, such as setting the model parameter pool size (assume n);
(3) The Bayesian optimization module performs initialization, such as configuring the Bayesian optimization hyperparameter space and the number of Bayesian optimization iteration rounds;
(4) The model parameter pool module calls the Kmeans clustering module to generate the initial n parameter points and fills them into the model parameter pool;
(5) The computing cluster runs one round of model iteration for the models corresponding to the parameters in the model parameter pool module and sends the model evaluation metrics to the model parameter pool module;
(6) The scheduling module decides, from the parameter points and model evaluation metrics in the model parameter pool module, whether each model should stop training, and if so sends a stop signal to the model parameter pool module;
(7) If the model parameter pool module receives a stop signal from the scheduling module, it requests new model parameter points (if one model stops, it asks the Bayesian optimization module for a single point; if several models stop, it asks the Kmeans clustering module for several points); the parameter points in the pool are then used by the computing cluster to start another round of model training, and the process repeats until the Bayesian optimization stop threshold (the number of Bayesian optimization iteration rounds) is reached.
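Steps (1) to (7) can be summarized in a pseudocode-level Python sketch. The objects below (pool, bayes_opt, scheduler, cluster) and their methods are hypothetical stand-ins for the modules described above, not an API defined by the patent.

```python
def tune(pool, bayes_opt, scheduler, cluster, max_rounds):
    """Asynchronous tuning loop corresponding to steps (1)-(7); illustrative only."""
    pool.fill(bayes_opt.get_batch(pool.size))                 # steps (2)-(4): fill the pool
    for _ in range(max_rounds):                               # Bayesian optimization budget
        for slot, metric in cluster.run_one_iteration(pool.pull()):   # step (5)
            pool.push(slot, metric)                           # report evaluation metrics
        stopped = [s for s in range(pool.size)
                   if scheduler.should_stop(pool, s)]         # step (6)
        if not stopped:
            continue
        bayes_opt.update([(pool.points[s], pool.metrics[s]) for s in stopped])
        if len(stopped) == 1:
            new_points = [bayes_opt.get()]                    # single point via Get
        else:
            new_points = bayes_opt.get_batch(len(stopped))    # batch via GetBatch + Kmeans
        for slot, point in zip(stopped, new_points):          # step (7): refill the pool
            pool.replace(slot, point)
    return pool  # the best model and metric can be read off the pool afterwards
```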
Compared with the prior art, the advantages of the present invention are as follows. The current mainstream hyperparameter optimization methods, Grid Search and Random Search, are inefficient and require large amounts of computing resources; the heuristic Bayesian optimization method can only be executed serially and cannot effectively exploit the parallel computing capability of multiple machines in a distributed environment. This makes hyperparameter optimization hard to carry out in a big data setting. The asynchronous Bayesian optimization method proposed by the present invention achieves asynchronous, parallel hyperparameter optimization through asynchronous model training against the model parameter pool, while retaining the high optimization efficiency of Bayesian optimization itself. The invention can effectively exploit multi-machine computing power in a distributed environment, making automated hyperparameter tuning of machine learning feasible in a big data environment, and thus helping people perform data analysis and extract value from data through big data machine learning in social production practice.
Brief description of the drawings
Figure 1 is a schematic diagram of the model selection and tuning process;
Figure 2 is a schematic diagram of Grid search (left) and Random search (right);
Figure 3 shows synchronous execution (left) and asynchronous execution (right);
Figure 4 is the overall framework diagram of the system of the present invention;
Figure 5 is a schematic diagram of the model parameter pool module of the present invention;
Figure 6 is a schematic diagram of the Kmeans clustering module of the present invention;
Figure 7 is a flowchart of the task scheduling module of the present invention;
Figure 8 is a flowchart of the adaptive model-parallelism determination module of the present invention.
Detailed description
The technical solution of the present invention is shown in FIG. 4 and mainly comprises a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module. Through the cooperation of these modules, the Bayesian-optimization-based machine learning hyperparameter optimization method proposed by the present invention can be realized.
Among the above modules, the Bayesian optimization module:
The Bayesian optimization module is the foundation of the present invention. It implements the Bayesian optimization method, which models the relationship between the model evaluation metric and the parameter points and can therefore generate more meaningful parameter points. In the present invention, Bayesian optimization is responsible for generating candidate parameter points and for receiving feedback (parameter points and the corresponding model evaluation metrics) from the model parameter pool.
Among the above modules, the model parameter pool module:
The model parameter pool is one of the key technologies of the present invention; the module is illustrated in FIG. 5. The pool is responsible for managing model parameter points and receives the parameter points generated by the Bayesian optimization module. When a single parameter point is needed, it takes the point generated by the Bayesian optimization module directly; when several points are needed, Bayesian optimization first generates multiple sets of candidate parameter points, and the Kmeans clustering module then turns them into multiple sets of mutually distinct parameter points.
The model parameter pool module provides Push and Pull interfaces. The computing cluster (a Spark cluster) can pull model parameter points and then train the corresponding models; after a model converges, the cluster pushes its evaluation metric back to the model parameter pool. Because the parameter points differ and machine learning models are themselves stochastic, the models usually take different amounts of time to train, and this is what makes efficient asynchronous parallel tuning possible.
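A minimal sketch of such a pool is given below. It assumes each model occupies a numbered slot and that concurrent access is guarded by a lock; it illustrates the Push/Pull idea rather than the patent's actual implementation.

```python
import threading

class ModelParameterPool:
    """Illustrative model parameter pool with Push/Pull interfaces (not the patent's code)."""

    def __init__(self, size):
        self.size = size
        self.points = [None] * size          # one parameter-point object per slot
        self.metrics = [None] * size         # latest evaluation metric per slot
        self._lock = threading.Lock()

    def fill(self, parameter_points):
        """Fill the pool with an initial batch of parameter points."""
        with self._lock:
            for i, p in enumerate(parameter_points[: self.size]):
                self.points[i] = p
                self.metrics[i] = None

    def pull(self):
        """Called by the computing cluster to fetch the current parameter points."""
        with self._lock:
            return list(self.points)

    def push(self, slot, metric):
        """Called by the computing cluster to report a model evaluation metric."""
        with self._lock:
            self.metrics[slot] = metric

    def replace(self, slot, new_point):
        """Called when the scheduler stops a model and a new parameter point is supplied."""
        with self._lock:
            self.points[slot] = new_point
            self.metrics[slot] = None
```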
上述模块中,Kmeans聚类模块:Among the above modules, the Kmeans clustering module:
Kmeans聚类模块是本发明的关键之一。机器学习模型,尤其是使用梯度下降进行求解的模型,比如逻辑回归和支持向量机等模型,通常,这些模型经过数十轮迭代就可以收敛,模型训练时,计算节点会对模型参数池中参数对应模型进行一轮迭代,然后判断收敛性,这就存在多个模型同时收敛的情况,这时使用贝叶斯优化产生多组候选参数点将会导致候选参数点相互冗余,从而导致整个机器学习自动化调参效率较低。Kmeans clustering module is one of the keys of the present invention. Machine learning models, especially models that use gradient descent to solve, such as logistic regression and support vector machines. Usually, these models can converge after dozens of iterations. During model training, the calculation node will evaluate the parameters in the model parameter pool. Perform a round of iterations on the corresponding model, and then judge the convergence. This means that multiple models converge at the same time. At this time, using Bayesian optimization to generate multiple sets of candidate parameter points will cause the candidate parameter points to be redundant, resulting in the entire machine. The efficiency of learning automation is low.
本模块主要实现了Kmeans聚类算法,接收贝叶斯优化产生的多组原始候候选参数点,然后进行Kmeans聚类处理,产生k个互异且使得收益函数较大(比如EI函数)的参数点,将上述参数点填充到模型参数池模块中。This module mainly implements the Kmeans clustering algorithm, receives multiple sets of original candidate parameter points generated by Bayesian optimization, and then performs Kmeans clustering processing to generate k mutually different parameters that make the revenue function larger (such as the EI function) Point, fill the above parameter points into the model parameter pool module.
Kmeans聚类产生多组候选参数点示意图如图6所示(图中显示A、B、C、D四个参数点,实际参数点较多),横轴代表参数点,纵轴代表收益函数值,假设三个模型同时收敛,直接使用贝叶斯优化产生三组候选参数点将可能产生参数点A、B、C(对应的收益值较大),但是参数点A和B存在相互冗余,参数点距离较小(比如欧式距离),将会降低贝叶斯优化的调参效率。通过使用Kmeans聚类,对候选参数点A、B、C和D进行聚类,理论上,A和B将会聚到同一类中,返回其中一个收益值较大的参数点A,这样将会产生互异的参数点A、C和D,从而提高贝叶斯优化的效率。The schematic diagram of generating multiple sets of candidate parameter points by Kmeans clustering is shown in Figure 6 (the figure shows four parameter points A, B, C, and D, and there are many actual parameter points). The horizontal axis represents the parameter points, and the vertical axis represents the value of the income function Assuming that the three models converge at the same time, directly using Bayesian optimization to generate three sets of candidate parameter points will likely produce parameter points A, B, and C (corresponding to a larger return value), but parameter points A and B are mutually redundant. If the distance between parameter points is small (such as Euclidean distance), it will reduce the efficiency of Bayesian optimization. By using Kmeans clustering, the candidate parameter points A, B, C, and D are clustered. In theory, A and B will be clustered into the same class, and one of the parameter points A with a larger return value will be returned, which will produce Different parameter points A, C and D, thereby improving the efficiency of Bayesian optimization.
上述模块中,任务调度模块:Among the above modules, the task scheduling module:
任务调度模块是本发明的重要模块之一。本模块主要对模型参数池中的模型进行收敛性判断,任务调度模块包括模型收敛性和Early Stopping技术两部分。The task scheduling module is one of the important modules of the present invention. This module mainly judges the convergence of the model in the model parameter pool. The task scheduling module includes two parts: model convergence and Early Stopping technology.
模型收敛性主要是通过计算模型精度是否达到提前设定的阈值进行判断,如果达到阈值则收敛,否则,未收敛。The model convergence is mainly judged by calculating whether the accuracy of the model reaches the threshold set in advance. If it reaches the threshold, it will converge; otherwise, it will not converge.
一些机器学习应用中,在机器学习模型训练过程中,一些性能相关的信息就是可用的。尤其是当模型训练是迭代的,性能曲线(Performance curve)就是可用的,比如对于使用梯度下降求解的模型,随着训练的进行,往往模型准确率(Accuracy)越来越高,并且每一轮迭代(Epoch)结束的时候,可以得到模型准确率。使用准确率和训练步数之间的曲线,就可以判断当前训练的模型是否有可能比已知最佳模型效果更好,对于不可能获得比已知最佳模型效果更好的模型,可以及时终止模型的训练,释放相应的计算资源,从而更多的去评估有希望的模型。基于上述思想的算法,称为Early Stopping算法。通过使用Early Stopping技术可以有效加速整个机器学习自动化调参过程。In some machine learning applications, some performance-related information is available during the machine learning model training process. Especially when the model training is iterative, the performance curve (Performance curve) is available. For example, for the model using gradient descent solution, as the training progresses, the model accuracy rate (Accuracy) is often higher and higher, and each round At the end of the iteration (Epoch), you can get the model accuracy. Using the curve between the accuracy rate and the number of training steps, you can determine whether the currently trained model is likely to be better than the known best model. For the model that is impossible to obtain a better effect than the known best model, you can timely Terminate the model training and release the corresponding computing resources, so as to evaluate more promising models. The algorithm based on the above idea is called Early Stopping algorithm. Through the use of Early Stopping technology, you can effectively accelerate the entire machine learning automated tuning process.
Among the above modules, the module for adaptively determining the model parallelism:
The module for adaptively determining the model parallelism is one of the important modules of the present invention. Model parallelism refers to the number of models executed simultaneously on the computing cluster and directly affects the computing performance of the whole cluster: setting it either too large or too small degrades that performance. This module determines the model parallelism adaptively.
For machine learning models on the Spark platform, the module works by experimentally evaluating the computational efficiency obtained with different model parameter pool sizes and selecting the pool size that gives the best computing performance. The concrete steps are: measure the time taken to run one round of model iteration for each candidate pool size, normalize the timings, and compare them; to avoid the influence of random factors, repeat the measurement several times (for example, 3 times) and take the pool size with the best execution performance. Compared with the cost of the whole tuning process, the cost of this procedure is negligible.
The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.
The example of the present invention uses Python as the programming language, together with the big-data processing platform Spark and the Spark-based MLlib distributed machine learning library, and targets the machine learning hyperparameter optimization problem in a big-data environment. Logistic regression, a classic machine learning model frequently used in big-data settings, is taken as the concrete example below.
As shown in Figure 4, the present invention is implemented as follows:
1. Bayesian optimization module
This module implements the Bayesian optimization algorithm. Bayesian optimization requires an initial parameter range, and the module provides the following parameter space configuration interface:
[Parameter space configuration interface listing, reproduced as image PCTCN2019091485-appb-000001 in the original publication.]
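The listing referenced above appears only as an image in the published application. Claim 3 recites the interface as Hp.choice(label, options), Hp.randint(label, upper), Hp.uniform(label, low, high) and a log-uniform variant, which closely mirrors Hyperopt-style search-space constructors. The following Python sketch is a hedged reconstruction based on those names; the sampling behaviour shown is an assumption, not the original listing.

    # Hypothetical reconstruction of the parameter space configuration interface.
    # Only the constructor names come from claim 3; the sampling behaviour is an
    # assumption modelled on Hyperopt-style search spaces.
    import math
    import random

    class Hp:
        @staticmethod
        def choice(label, options):
            # pick one value from a discrete set of options
            return (label, lambda: random.choice(options))

        @staticmethod
        def randint(label, upper):
            # integer drawn uniformly from [0, upper)
            return (label, lambda: random.randrange(upper))

        @staticmethod
        def uniform(label, low, high):
            # real value drawn uniformly from [low, high]
            return (label, lambda: random.uniform(low, high))

        @staticmethod
        def loguniform(label, low, high):
            # real value whose logarithm is uniform on [log(low), log(high)]
            return (label, lambda: math.exp(random.uniform(math.log(low), math.log(high))))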
For the logistic regression model, the main hyperparameters are maxIter (the number of training iterations), regParam (the regularization coefficient), tol (the model convergence tolerance), and so on. An initial parameter space range must be configured, as follows:
[Logistic regression parameter space configuration, reproduced as image PCTCN2019091485-appb-000002 in the original publication.]
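The concrete configuration is likewise reproduced only as an image. Reusing the hypothetical Hp sketch above, a plausible search space for the three hyperparameters just named could look as follows; the value ranges are illustrative assumptions, and only the parameter names come from the description.

    # Illustrative search space for Spark MLlib LogisticRegression; the ranges
    # are assumptions, only the hyperparameter names appear in the text above.
    param_space = [
        Hp.randint("maxIter", 200),             # number of training iterations
        Hp.loguniform("regParam", 1e-6, 1e1),   # regularization coefficient
        Hp.loguniform("tol", 1e-8, 1e-2),       # convergence tolerance
    ]

    # Drawing one random configuration from the space:
    sample = {label: draw() for label, draw in param_space}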
When the model parameter pool needs one set of parameters, the Bayesian optimization module directly generates a single set and feeds it back to the pool (degenerating to the classic Bayesian optimization algorithm); when the pool needs multiple sets (say K), Bayesian optimization generates many raw candidate parameter points, and the Kmeans clustering module then produces K mutually distinct parameter points.
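A minimal sketch of this dispatch logic is shown below; the method names get, get_batch and cluster_and_select are illustrative assumptions and do not appear in the original text.

    # Hypothetical dispatch between classic Bayesian optimization (one point)
    # and the Kmeans-deduplicated batch path (K points); all names are assumed.
    def request_parameters(num_needed, bayes_opt, kmeans_module):
        if num_needed == 1:
            # degenerates to the classic, sequential Bayesian optimization case
            return [bayes_opt.get()]
        # several models finished at once: fetch raw candidates and let the
        # Kmeans module return num_needed mutually distinct points
        raw_candidates = bayes_opt.get_batch()
        return kmeans_module.cluster_and_select(raw_candidates, k=num_needed)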
2. Model parameter pool module
As shown in Figure 5, this module implements the model parameter pool and is responsible for managing the model parameters. Over the whole automated tuning process the module goes through three stages: an initialization stage, a first stage, and a second stage.
The initialization stage includes: running the algorithm of the module for adaptively determining the model parameter pool size, thereby fixing the pool size (denoted k).
The first stage includes: calling the Bayesian optimization module to generate k sets of parameter points at random and filling them into the model parameter pool; the computing nodes training the models corresponding to the parameters in the pool; and feeding the parameter points and the corresponding model evaluations back to Bayesian optimization to initialize its Gaussian process.
The second stage includes: running the task scheduling module to judge whether the models corresponding to the parameters in the pool have converged, counting the number m of converged models, and marking them in the pool; running the task scheduling module again to judge, with the Early Stopping algorithm, which models should stop training, counting their number n, and marking them in the pool; the computing nodes computing the evaluation metric of these m+n models on the test data set; recording the best model and its evaluation metric; feeding the m+n finished models back to Bayesian optimization and updating the Gaussian process; combined with Bayesian optimization, using Kmeans clustering to generate m+n candidate parameter points and refilling the pool; and the computing nodes running one more round of training updates for the models in the pool. This process is repeated until the specified number of optimization rounds is reached, and the best model and its evaluation metric are returned. (A compact sketch of this loop is given below.)
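The following sketch compresses the three stages into a single loop. The callables train_one_epoch, evaluate_on_test, is_converged and should_early_stop stand in for the computing cluster and the task scheduling module, and the method names on bayes_opt and kmeans are illustrative; this is one reading of the description above, not the reference implementation.

    # Sketch of the asynchronous tuning loop; every helper name is an assumption.
    def tune(bayes_opt, kmeans, train_one_epoch, evaluate_on_test,
             is_converged, should_early_stop, pool_size, max_rounds):
        # First stage: fill the pool with random points and initialize the
        # Gaussian process with the first batch of evaluations.
        pool = bayes_opt.random_points(pool_size)
        models = [train_one_epoch(p, model=None) for p in pool]
        scores = [evaluate_on_test(m) for m in models]
        bayes_opt.update(pool, scores)

        best_params, best_score = None, float("-inf")
        for _ in range(max_rounds):
            # Second stage: mark converged (m) and early-stopped (n) models.
            finished = [i for i in range(pool_size)
                        if is_converged(models[i])
                        or should_early_stop(models[i], best_score)]
            for i in finished:
                if scores[i] > best_score:
                    best_params, best_score = pool[i], scores[i]
            if finished:
                # Feed the m+n finished models back, then refill their slots
                # with mutually distinct points proposed via Kmeans clustering.
                bayes_opt.update([pool[i] for i in finished],
                                 [scores[i] for i in finished])
                new_points = kmeans.propose(bayes_opt, k=len(finished))
                for i, p in zip(finished, new_points):
                    pool[i], models[i] = p, None
            # Every model in the pool advances by one round of training.
            models = [train_one_epoch(pool[i], models[i]) for i in range(pool_size)]
            scores = [evaluate_on_test(m) for m in models]
        return best_params, best_score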
3. Kmeans clustering module
As shown in Figure 6, the Kmeans clustering module implements a Kmeans algorithm that clusters the multiple sets of raw parameter points produced by the Bayesian optimization module, thereby generating K mutually distinct parameter points.
A Kmeans-based algorithm for generating multiple candidate parameter points mainly consists of the following steps: randomly generate L (L > 10000) parameter points; compute the EI value (the acquisition-function value, i.e. the criterion by which Bayesian optimization proposes candidate points) of each of the L points and keep the l (200 < l < 1000) points with the largest EI values; run a gradient descent from each of these points to find a local optimum; apply the Kmeans clustering algorithm to these l local optima and return, from each cluster, the parameter point with the largest EI value. (A sketch of this procedure is given after the Figure 6 example below.)
Clustering objective: if candidate parameter points are close to each other (for example, their Euclidean distance is small), they are redundant and reduce the tuning efficiency; the goal is to obtain multiple mutually distinct parameter points with large EI values.
Clustering data: the parameter values and the acquisition-function value of each point are used as features and normalized (to avoid the clustering failures caused by inconsistent feature scales), and the sample points are clustered into k classes, where k is the number of parameter points to be generated.
Clustering result: from each cluster, the parameter point with the largest acquisition-function value is selected and returned.
Taking Figure 6 as an example, the raw candidate parameter points include L points: A, B, C, and D. Kmeans clustering groups them into three clusters, {A, B}, {C}, and {D}; taking the point with the largest acquisition-function value from each cluster gives A, C, and D, which are mutually distinct and all have large acquisition-function values.
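A sketch of the candidate-generation procedure described above follows. The EI (acquisition) computation and the gradient-based local refinement depend on the fitted Gaussian process, so they are passed in as callables; all function names and the normalization details are assumptions.

    # Sketch of Kmeans-based generation of k mutually distinct candidate points.
    import numpy as np
    from sklearn.cluster import KMeans

    def generate_k_candidates(sample_points, ei_fn, refine_fn, k, top_l=500):
        # sample_points: (L, d) array of randomly drawn parameter points (L > 10000)
        # ei_fn: maps an (n, d) array to an (n,) array of EI values
        # refine_fn: local gradient search from one point towards a local optimum
        ei = ei_fn(sample_points)
        top_idx = np.argsort(ei)[-top_l:]              # the l points with largest EI
        local_optima = np.array([refine_fn(p) for p in sample_points[top_idx]])
        local_ei = ei_fn(local_optima)

        # Normalize parameter values and EI values so that no single feature
        # dominates the Euclidean distance used by Kmeans.
        features = np.column_stack([local_optima, local_ei])
        features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)

        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

        # From every cluster keep the point with the largest EI value.
        chosen = [int(np.argmax(np.where(labels == c, local_ei, -np.inf)))
                  for c in range(k)]
        return local_optima[chosen]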
4. Task scheduling module
The task scheduling module implements the Early Stopping algorithm and judges the convergence of the parameter points in the model parameter pool.
Each time the models in the model parameter pool complete a round of iteration, the task scheduling module performs a convergence check. It mainly decides, based on the convergence accuracy, whether a model in the pool should stop training. To effectively accelerate the whole tuning process, the performance curve used by the Early Stopping technique makes it possible to judge in advance whether a model can possibly achieve the best model performance; the training of models that cannot is terminated promptly so that training on the next set of parameter points can begin.
As shown in Figure 7, the task scheduling module reads the data in the model parameter pool (model parameter points, model evaluation metrics, and so on). It first checks convergence: it computes the model accuracy w and compares it with the preset threshold W; if the threshold is reached, the model has converged, otherwise it has not. If the model has not converged, the Early Stopping algorithm is applied: compute the mean E(P) of the evaluation metric P of the models already trained at the current iteration round; if the current model's evaluation metric satisfies p < 0.9 * E(P), the training of that model should be terminated.
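A compact sketch of these two checks is given below, with the threshold W and the 0.9 factor taken from the description above; how the metrics are fetched from the pool is not shown, and the function name is an assumption.

    # Convergence check followed by the Early Stopping rule p < 0.9 * E(P).
    def should_stop(accuracy_w, threshold_W, current_metric_p, history_metrics):
        # history_metrics: evaluation metric P of previously trained models at
        # the same iteration round (epoch)
        if accuracy_w >= threshold_W:
            return True, "converged"
        if history_metrics:
            mean_p = sum(history_metrics) / len(history_metrics)
            if current_metric_p < 0.9 * mean_p:
                # unlikely to beat the best known model: stop early
                return True, "early-stopped"
        return False, "keep training"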
5. Module for adaptively determining the model parallelism
As shown in Figure 8, this module implements an adaptive algorithm for determining the model parallelism and is responsible for determining a reasonable size for the model parameter pool.
For the logistic regression model, the concrete steps are: measure the time needed to run one round of model iteration for different model parameter pool sizes, normalize the timings, and compare them; to avoid the influence of random factors, repeat the measurement several times (for example, 3 times) and choose the pool size with the best execution performance. Compared with the cost of the whole tuning process, the cost of this procedure is negligible.
For example, for the logistic regression model the procedure is: choose an initial range of model parameter pool sizes, say i = 1 to e (where e denotes the maximum pool size configuration, e > 1); set the pool size to i and run one round of model iteration, repeating it three times to average out random factors; loop over the whole range in this way, sort the candidates by elapsed time, and return the i with the shortest time as the model parameter pool size. (A brief sketch is given below.)
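In the sketch below, run_one_iteration(pool_size) is an assumed callable that trains every model in a pool of that size for one epoch on the cluster, and dividing by the pool size is one reading of the "time normalization" step in the description.

    # Adaptive choice of the model parameter pool size (model parallelism).
    import time

    def choose_pool_size(run_one_iteration, max_size_e, repeats=3):
        normalized = {}
        for size in range(1, max_size_e + 1):
            elapsed = []
            for _ in range(repeats):                  # repeat to damp random noise
                start = time.time()
                run_one_iteration(size)
                elapsed.append(time.time() - start)
            normalized[size] = (sum(elapsed) / repeats) / size   # time per model
        return min(normalized, key=normalized.get)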
The above embodiments are provided only for the purpose of describing the present invention and are not intended to limit its scope. The scope of the invention is defined by the appended claims. Various equivalent replacements and modifications made without departing from the spirit and principles of the present invention shall all fall within the scope of the present invention.

Claims (9)

1. A machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, characterized in that it comprises:
    a Bayesian optimization module, which generates candidate parameter points according to a Bayesian optimization algorithm, provides a Get interface for the model parameter pool module to call directly and, for the scenario in which multiple models converge simultaneously, provides a GetBatch interface for the Kmeans clustering module to call;
    a model parameter pool module, which obtains model hyperparameter points, replaces parameter points in the model parameter pool, and provides the parameter points in the model parameter pool to the computing cluster for use;
    a Kmeans clustering module, which generates multiple mutually distinct parameter points through Kmeans clustering;
    a task scheduling module, which judges whether the models in the model parameter pool module should stop training;
    a module for adaptively determining the model parallelism, which adaptively determines the parallelism of the models on the computing cluster.
2. The system according to claim 1, wherein the GetBatch interface of the Bayesian optimization module randomly generates L parameter points, computes the EI values of the L parameter points, finds among them the l parameter points with the largest EI values, and runs a gradient descent algorithm from each of these parameter points to find local optima.
3. The system according to claim 1, wherein the Get interface of the Bayesian optimization module comprises Hp.choice(label, options), Hp.randint(label, upper), Hp.uniform(label, low, high), and Hp.logunifom(label, low, high).
4. The system according to claim 1, wherein the model parameter pool of the model parameter pool module is implemented as an array, model parameter points are abstracted into parameter point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the model parameter pool; the model parameter pool module obtains model parameter points from the Bayesian optimization module through the Get interface and obtains multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface; the parameter points in the model parameter pool module are accessed by the computing cluster via Pull, and the module receives the model evaluation metrics pushed by the computing cluster.
5. The system according to claim 1, wherein the Kmeans clustering module is called by the model parameter pool module and receives a signal to generate k mutually distinct parameter points; it calls the Bayesian optimization module to generate K candidate parameter points; the candidate parameter points are grouped into k clusters by Kmeans clustering, and from each cluster the parameter point with the largest acquisition-function value is selected, thereby producing k mutually distinct parameter points, which are returned to the model parameter pool module; wherein K is greater than k.
6. The system according to claim 1, wherein the task scheduling module interacts with the model parameter pool module, judges the state of the models corresponding to the parameters in the model parameter pool module according to the Early Stopping algorithm or the model convergence accuracy, and sends a signal to the model parameter pool module indicating whether training should stop.
7. The system according to claim 1, wherein the module for adaptively determining the model parallelism experimentally evaluates the computational efficiency of the model parameter pool for different pool sizes and obtains the pool size corresponding to the best computing performance; this module is called by the model parameter pool module to initialize the model parameter pool size.
8. A machine learning hyperparameter optimization method based on asynchronous Bayesian optimization, characterized in that it comprises the following steps:
    (1) running the module for adaptively determining the model parallelism to determine the optimal model parallelism of the computing cluster, and passing the result to the model parameter pool module;
    (2) initializing the model parameter pool module, including the model parameter pool size;
    (3) initializing the Bayesian optimization module, including the Bayesian optimization parameter space configuration and the number of Bayesian optimization iteration rounds;
    (4) the model parameter pool module calling the Kmeans clustering module to generate the initial k parameter points and fill them into the model parameter pool module;
    (5) the computing cluster running one round of model iteration on the models corresponding to the parameters in the model parameter pool module and sending the model evaluation metrics to the model parameter pool module, after which the model parameter pool module interacts with the task scheduling module;
    (6) the task scheduling module judging, from the parameter points and model evaluation metrics in the model parameter pool module, whether the models corresponding to the parameters should stop, counting their number, and sending this information to the model parameter pool module;
    (7) the model parameter pool module requesting new model parameter points when it receives a model-stop signal from the task scheduling module: if one model stops, it requests the Bayesian optimization module to generate one parameter point; if multiple models stop, it requests the Kmeans clustering module to generate multiple parameter points; the parameter points in the model parameter pool module are used by the computing cluster to start another round of model training, and the above process is repeated until the Bayesian optimization stop threshold, i.e. the number of Bayesian optimization iteration rounds, is reached, completing the machine learning hyperparameter optimization.
9. The method according to claim 8, wherein the number of parameter points generated by the Kmeans clustering module in step (7) equals the number of models counted by the task scheduling module as needing to stop training.
PCT/CN2019/091485 2018-12-25 2019-06-17 Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method WO2020133952A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811588608.5 2018-12-25
CN201811588608.5A CN109376869A (en) 2018-12-25 A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization

Publications (1)

Publication Number Publication Date
WO2020133952A1 true WO2020133952A1 (en) 2020-07-02

Family

ID=65371987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091485 WO2020133952A1 (en) 2018-12-25 2019-06-17 Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method

Country Status (2)

Country Link
CN (1) CN109376869A (en)
WO (1) WO2020133952A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376869A (en) * 2018-12-25 2019-02-22 中国科学院软件研究所 A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method
JP7124768B2 (en) * 2019-03-05 2022-08-24 日本電信電話株式会社 Parameter estimation device, method and program
CN110334732A (en) * 2019-05-20 2019-10-15 北京思路创新科技有限公司 A kind of Urban Air Pollution Methods and device based on machine learning
CN110619423B (en) * 2019-08-06 2023-04-07 平安科技(深圳)有限公司 Multitask prediction method and device, electronic equipment and storage medium
CN110659741A (en) * 2019-09-03 2020-01-07 浩鲸云计算科技股份有限公司 AI model training system and method based on piece-splitting automatic learning
CN111027709B (en) * 2019-11-29 2021-02-12 腾讯科技(深圳)有限公司 Information recommendation method and device, server and storage medium
CN111797833A (en) * 2020-05-21 2020-10-20 中国科学院软件研究所 Automatic machine learning method and system oriented to remote sensing semantic segmentation
CN113742991A (en) * 2020-05-30 2021-12-03 华为技术有限公司 Model and data joint optimization method and related device
CN112261721B (en) * 2020-10-19 2023-03-31 南京爱而赢科技有限公司 Combined beam distribution method based on Bayes parameter-adjusting support vector machine
CN113305853B (en) * 2021-07-28 2021-10-12 季华实验室 Optimized welding parameter obtaining method and device, electronic equipment and storage medium
CN115470910A (en) * 2022-10-20 2022-12-13 晞德软件(北京)有限公司 Automatic parameter adjusting method based on Bayesian optimization and K-center sampling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200087B (en) * 2014-06-05 2018-10-02 清华大学 For the parameter optimization of machine learning and the method and system of feature tuning
CN108470210A (en) * 2018-04-02 2018-08-31 中科弘云科技(北京)有限公司 A kind of optimum option method of hyper parameter in deep learning
CN108573281A (en) * 2018-04-11 2018-09-25 中科弘云科技(北京)有限公司 A kind of tuning improved method of the deep learning hyper parameter based on Bayes's optimization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989374A (en) * 2015-03-03 2016-10-05 阿里巴巴集团控股有限公司 Online model training method and equipment
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN108446302A (en) * 2018-01-29 2018-08-24 东华大学 A kind of personalized recommendation system of combination TensorFlow and Spark
CN109062782A (en) * 2018-06-27 2018-12-21 阿里巴巴集团控股有限公司 A kind of selection method of regression test case, device and equipment
CN109376869A (en) * 2018-12-25 2019-02-22 中国科学院软件研究所 A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ERIC P XING ET AL: "Strategies and Principles of Distributed Machine Learning on Big Data", ENGINEERING, vol. 2, no. 2, 30 June 2016 (2016-06-30), pages 179 - 195, XP055405966, ISSN: 2095-8099, DOI: 10.1016/J.ENG.2016.02.008 *
KANG, LIANGYI ET AL: "Survey on Parallel and Distributed Optimization Algorithms for Scalable Machine Learning", JOURNAL OF SOFTWARE, vol. 29, no. 1, 31 January 2018 (2018-01-31), pages 109 - 130, XP009521771, ISSN: 1000-9825, DOI: 10.13328/j.cnki.jos.005376 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI733270B (en) * 2019-12-11 2021-07-11 中華電信股份有限公司 Training device and training method for optimized hyperparameter configuration of machine learning model

Also Published As

Publication number Publication date
CN109376869A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
WO2020133952A1 (en) Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method
US20220027746A1 (en) Gradient-based auto-tuning for machine learning and deep learning models
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
CN111026549B (en) Automatic test resource scheduling method for power information communication equipment
WO2023240845A1 (en) Distributed computation method, system and device, and storage medium
CN111274036B (en) Scheduling method of deep learning task based on speed prediction
CN109891438B (en) Numerical quantum experiment method and system
CN113627871B (en) Workflow scheduling method, system and storage medium based on multi-target particle swarm algorithm
CN110727506B (en) SPARK parameter automatic tuning method based on cost model
WO2015066979A1 (en) Machine learning method for mapreduce task resource configuration parameters
CN112651483B (en) Cloud manufacturing service combination optimization method for large-scale multi-batch task collaboration
Liu et al. Predicting of job failure in compute cloud based on online extreme learning machine: a comparative study
CN110825522A (en) Spark parameter self-adaptive optimization method and system
CN112052081A (en) Task scheduling method and device and electronic equipment
CN106408031A (en) Super parameter optimization method of least squares support vector machine
Dorronsoro et al. Combining machine learning and genetic algorithms to solve the independent tasks scheduling problem
Banjongkan et al. A Study of Job Failure Prediction at Job Submit-State and Job Start-State in High-Performance Computing System: Using Decision Tree Algorithms [J]
Fan et al. An evaluation model and benchmark for parallel computing frameworks
Shang et al. Research on the application of artificial intelligence and distributed parallel computing in archives classification
EP4107631A1 (en) System and method for machine learning for system deployments without performance regressions
Yu et al. Accelerating distributed training in heterogeneous clusters via a straggler-aware parameter server
CN113900942B (en) Method for generating simplified test case set of flight control machine-mounted model
CN111652269A (en) Active machine learning method and device based on crowd-sourcing interaction
CN113991752B (en) Quasi-real-time intelligent control method and system for power grid
CN110048886A (en) A kind of efficient cloud configuration selection algorithm of big data analysis task

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19904044; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 061021))
122 Ep: pct application non-entry in european phase (Ref document number: 19904044; Country of ref document: EP; Kind code of ref document: A1)