WO2020133952A1 - Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method - Google Patents


Info

Publication number
WO2020133952A1
WO2020133952A1 (PCT/CN2019/091485, CN2019091485W)
Authority
WO
WIPO (PCT)
Prior art keywords
model
module
parameter
model parameter
points
Prior art date
Application number
PCT/CN2019/091485
Other languages
French (fr)
Chinese (zh)
Inventor
刘杰
王建飞
杨诏
叶丹
钟华
Original Assignee
中国科学院软件研究所
Priority date
Filing date
Publication date
Application filed by 中国科学院软件研究所 (Institute of Software, Chinese Academy of Sciences)
Publication of WO2020133952A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, and belongs to the field of computer artificial intelligence.
  • The core of AutoML is automatic tuning of machine learning models, that is, automatic selection of hyperparameters.
  • Hyperparameter selection is very important for machine learning applications: different hyperparameters directly affect how well an application performs in production practice (for example, its prediction accuracy). The hyperparameter selection process of a machine learning model is shown in Figure 1. Because a machine learning model usually contains many hyperparameters and the parameter space is huge, how to tune efficiently is an urgent problem.
  • Commonly used tuning methods include simple methods, represented by manual tuning, Grid search and Random search, and heuristic methods, represented by Bayesian optimization. A schematic diagram of Grid search and Random search is shown in Figure 2.
  • Manual tuning is the simplest, and most artisanal, tuning method. Faced with a machine learning application, one can tune the parameters by hand to determine the model hyperparameters: experienced machine learning experts can tune based on experience, while newcomers can rely on manual trial and error (running enough experiments to find a set of parameters that gives a reasonably good model). In general, manual tuning is time-consuming and labor-intensive.
  • Grid search is one of the simplest automated parameter adjustment methods.
  • The idea behind Grid search is simple and direct: the user defines a value range for each parameter, enumerates parameter combinations at fixed intervals, trains a model for each combination, and then selects the parameters of the model with the best evaluation.
  • The combination space explored by Grid search is large. For example, for a logistic regression application with 5 hyperparameters and 10 possible values per parameter, the full combination space contains 10^5 points, and training that many models is very time-consuming. Because the combination space is usually large, Grid search only suits scenarios where a single model trains very quickly, and it is hard to apply in big data scenarios.
  • To address the shortcomings of Grid search, researchers proposed Random search. Unlike Grid search, which exhaustively enumerates parameter combinations at fixed intervals, Random search picks parameter combinations at random. The work of Bergstra et al. shows that Random search is generally no worse than Grid search, and random selection of combinations avoids mutual redundancy between parameter points to some extent. The remaining problem is that if two sampled parameter points are close to each other (for example, their Euclidean distance is small), they are mutually redundant, which lowers search efficiency; in high-dimensional parameter spaces (many parameters), the search also easily gets stuck in a local region.
  • Bayesian optimization is a sequential model-based optimization algorithm: it uses the information from models already trained as prior knowledge to guide the generation of the next parameter point, so the best model can be reached faster. Compared with Grid search and Random search it greatly accelerates the whole tuning process, and it is currently among the best methods for hyperparameter optimization of machine learning models.
  • The asynchronous Bayesian optimization proposed by Kandasamy et al. is a way of parallelizing classical Bayesian optimization, but in that work each computing node is responsible for evaluating one model, so a single model cannot be trained effectively on big data, and the method cannot handle the case where several models converge at the same time.
  • Bayesian optimization as described above is inefficient in a big data environment, which limits the usability of automated machine learning hyperparameter tuning in such settings.
  • The problem solved by the present invention: aiming at the difficulty of automated machine learning hyperparameter tuning in a big data environment, and to overcome the shortcomings of the existing technology, a hyperparameter optimization system and method based on asynchronous Bayesian optimization is provided.
  • The invention tunes machine learning in a big data environment automatically and efficiently, effectively exploiting the parallel computing capability of multiple machines, so that big data machine learning can be better used in production practice.
  • A machine learning hyperparameter optimization system based on asynchronous Bayesian optimization includes: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
  • The Bayesian optimization module implements the Bayesian optimization algorithm and generates candidate parameter points. It provides a Get interface that the model parameter pool module calls directly to obtain model (machine learning model) hyperparameters (such as the learning rate, regularization coefficient and similar parameters).
  • For the scenario where several models converge at the same time, a GetBatch interface is provided for the Kmeans clustering module to call. The GetBatch interface implements the following algorithm: randomly generate L (L > 10000) parameter points; compute for each of them the EI value (the acquisition, or benefit, function value, the criterion by which Bayesian optimization generates candidate parameter points); find the l (200 < l < 1000) parameter points with the largest EI values; and run gradient descent from each of those points to find local optima.
  • The model parameter pool module is responsible for managing model parameter points, including: obtaining model hyperparameter points, replacing parameter points in the pool, and providing the parameter points in the pool to the computing cluster.
  • The model parameter pool is implemented mainly as an array; model parameter points are abstracted into parameter-point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the pool.
  • The model parameter pool obtains model parameter points from the Bayesian optimization module through the Get interface, and obtains multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface.
  • The parameter points in the pool can be pulled by the computing cluster (a Spark cluster), and the pool receives the model evaluation metrics that the cluster pushes back.
  • The Kmeans clustering module generates multiple mutually distinct parameter points through Kmeans clustering. It is called by the model parameter pool module when a signal requesting k distinct parameter points is received; it then calls the Bayesian optimization module to generate K (usually greater than k) original candidate parameter points, clusters those candidates into k classes with Kmeans, and selects from each class the parameter point with the largest acquisition function value, thereby producing k mutually distinct parameter points that are returned to the model parameter pool module.
  • The task scheduling module decides whether a model in the model parameter pool module should stop training. It has two parts: a model convergence check and an Early Stopping algorithm. The convergence check computes whether the model accuracy has reached a preset threshold; if so the model has converged, otherwise it has not. Early Stopping first computes the mean E(P) of the evaluation metric P of previously trained models at the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, training is stopped, otherwise it continues. The module interacts mainly with the model parameter pool: it judges the state of the model corresponding to each parameter point in the pool and sends signals to the pool.
  • The adaptive model-parallelism determination module adaptively determines the parallelism of the models in the computing cluster.
  • The module evaluates, by experiment, the computational efficiency of the model parameter pool under different pool sizes, so as to obtain the pool size with the best computing performance. Specifically, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the experiment is repeated several times, yielding the pool size with the best model execution performance.
  • This module is mainly used by the model parameter pool to initialize the model parameter pool size.
  • The machine learning hyperparameter optimization method based on asynchronous Bayesian optimization of the present invention includes the following steps:
  • the adaptive model-parallelism determination module is executed to determine the best model parallelism for the computing cluster, and the result is passed to the model parameter pool module;
  • the model parameter pool module performs initialization, such as setting the model parameter pool size (assume n);
  • the Bayesian optimization module performs initialization, such as configuring the Bayesian optimization hyperparameter space and the number of Bayesian optimization iteration rounds;
  • the model parameter pool module calls the Kmeans clustering module to generate the initial n parameter points and fills them into the model parameter pool;
  • the computing cluster runs one round of model iteration for the models corresponding to the parameters in the model parameter pool module and sends the model evaluation metrics to the model parameter pool module;
  • the scheduling module decides, from the parameter points and model evaluation metrics in the model parameter pool module, whether each model should stop training, and if so sends a stop signal to the model parameter pool module;
  • if the model parameter pool module receives a stop signal from the scheduling module, it requests new model parameter points (if one model stops, it asks the Bayesian optimization module for a single point; if several models stop, it asks the Kmeans clustering module for several points); the parameter points in the pool are then used by the computing cluster to start another round of model training, and the process repeats until the Bayesian optimization stop threshold (the number of Bayesian optimization iteration rounds) is reached.
  • The advantages of the present invention over the prior art are as follows. The current mainstream hyperparameter optimization methods, Grid Search and Random Search, are inefficient and require large amounts of computing resources; the heuristic Bayesian optimization method can only be executed serially and cannot effectively exploit the parallel computing capability of multiple machines in a distributed environment. This makes hyperparameter optimization hard to carry out in a big data setting.
  • The asynchronous Bayesian optimization method proposed here achieves asynchronous, parallel hyperparameter optimization through asynchronous model training against the model parameter pool, while retaining the high optimization efficiency of Bayesian optimization itself.
  • The invention can effectively exploit multi-machine computing power in a distributed environment, making automated hyperparameter tuning of machine learning feasible in a big data environment, and thus helping people perform data analysis and extract value from data through big data machine learning in production practice.
  • Figure 1 is a schematic diagram of the model selection and parameter adjustment process
  • Figure 2 is a schematic diagram of Grid search (left) and Random search (right);
  • FIG. 5 is a schematic diagram of a model parameter pool module in the present invention.
  • FIG. 6 is a schematic diagram of the Kmeans clustering module in the present invention.
  • FIG. 8 is a flowchart of the implementation of the adaptive determination model parallelism module in the present invention.
  • The technical solution of the present invention is shown in FIG. 4 and mainly includes: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
  • The Bayesian optimization module:
  • The Bayesian optimization module is the foundation of the present invention. It implements the Bayesian optimization method, which models the relationship between the model evaluation metric and the parameter points and can therefore generate more meaningful parameter points. In the present invention, Bayesian optimization is responsible for generating candidate parameter points and for receiving feedback (parameter points and the corresponding model evaluation metrics) from the model parameter pool.
  • The model parameter pool module:
  • The model parameter pool is one of the key technologies of the present invention; its structure is shown in FIG. 5.
  • The model parameter pool is responsible for managing model parameter points and receives the parameter points generated by the Bayesian optimization module. When a single parameter point is needed, it takes the point generated by the Bayesian optimization module directly; when several parameter points are needed, Bayesian optimization first generates multiple sets of candidate parameter points, and the Kmeans clustering module then turns them into multiple sets of mutually distinct parameter points.
  • The model parameter pool module provides Push and Pull interfaces.
  • The computing cluster (a Spark cluster) can pull model parameter points and then train the corresponding models; after a model converges, the cluster pushes its evaluation metric back to the model parameter pool. Because the parameter points differ and machine learning models are themselves stochastic, the models usually take different amounts of time to train, and this is what makes efficient asynchronous parallel tuning possible.
  • The Kmeans clustering module:
  • The Kmeans clustering module is one of the keys of the present invention.
  • Machine learning models, especially models solved with gradient descent such as logistic regression and support vector machines, usually converge after a few dozen iterations.
  • During training, the computing nodes run one round of iteration on the models corresponding to the parameters in the model parameter pool and then check convergence, so several models can converge at the same time.
  • If Bayesian optimization alone were used to generate several sets of candidate parameter points in that situation, the candidates would be mutually redundant, and the efficiency of the whole automated tuning process would drop.
  • This module therefore implements the Kmeans clustering algorithm: it receives the multiple sets of original candidate parameter points generated by Bayesian optimization, applies Kmeans clustering, produces k mutually distinct parameter points with large acquisition (for example, EI) function values, and fills those points into the model parameter pool module.
  • A schematic of generating multiple sets of candidate parameter points with Kmeans clustering is shown in Figure 6 (the figure shows four parameter points A, B, C and D; in practice there are many more). The horizontal axis represents the parameter points and the vertical axis the acquisition function value.
  • When the candidate parameter points A, B, C and D are clustered, A and B will in theory fall into the same cluster, and only the one with the larger acquisition value (A) is returned; this yields the mutually distinct parameter points A, C and D and improves the efficiency of Bayesian optimization.
  • The task scheduling module:
  • The task scheduling module is one of the important modules of the present invention. It mainly judges the convergence of the models in the model parameter pool.
  • The task scheduling module consists of two parts: a model convergence check and the Early Stopping technique.
  • Model convergence is judged mainly by computing whether the model accuracy has reached a preset threshold: if it has, the model has converged; otherwise it has not.
  • During training of a machine learning model, some performance-related information is already available. In particular, when training is iterative, a performance curve is available: for models solved with gradient descent, the model accuracy typically keeps rising as training proceeds, and the accuracy can be read off at the end of every iteration (epoch).
  • Using the curve of accuracy against the number of training steps, one can judge whether the model currently being trained is likely to beat the best model known so far. For a model that cannot beat the known best model, training can be terminated early and the corresponding computing resources released, so that more promising models can be evaluated.
  • The algorithm based on this idea is called the Early Stopping algorithm. Using Early Stopping effectively accelerates the whole automated tuning process.
  • The adaptive model-parallelism determination module:
  • The adaptive model-parallelism determination module is one of the important modules of the present invention.
  • Model parallelism refers to the number of models executed simultaneously in the computing cluster.
  • Model parallelism directly affects the computing performance of the whole cluster: setting it too high or too low both hurt performance. This module determines the model parallelism adaptively.
  • The module works by experimentally evaluating the computational efficiency of the model parameter pool under different pool sizes, so as to find the pool size with the best computing performance.
  • Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the test is repeated several times (for example, 3 times), and the pool size with the best model execution performance is chosen. Compared with the time taken by the whole tuning process, the time spent on this procedure is negligible.
  • The embodiment of the present invention uses Python as the programming language, together with the big data processing platform Spark and the Spark-based MLlib distributed machine learning library, to address machine learning hyperparameter optimization in a big data environment.
  • The following describes the implementation in detail using logistic regression, a classic machine learning model frequently used in big data environments, as the example.
  • The Bayesian optimization module requires an initial parameter range, and a parameter space configuration interface is provided for this purpose.
  • When the model parameter pool needs one set of parameters, the Bayesian optimization module directly generates a single set and feeds it back to the pool (degenerating to the classic Bayesian optimization algorithm); when the pool needs multiple sets of parameters (say K), Bayesian optimization generates multiple sets of original candidate parameter points, and the Kmeans clustering module turns them into K mutually distinct sets.
  • This module implements the model parameter pool and is responsible for managing the model parameters.
  • Over the whole automated tuning process the module operates in three stages: an initialization stage, a first stage and a second stage.
  • The initialization stage: run the adaptive model-parallelism algorithm to determine the size of the model parameter pool (assume k).
  • The first stage: call the Bayesian optimization module to randomly generate k sets of parameter points and fill them into the pool; have the computing nodes train the models corresponding to those parameters; feed the parameter points and the corresponding model evaluations back to Bayesian optimization, initializing its Gaussian process.
  • The second stage: run the task scheduling module to decide which models corresponding to parameters in the pool have converged, count their number m, and mark them in the pool; run the task scheduling module again to decide, using the Early Stopping algorithm, which models should stop training, count their number n, and mark them in the pool; have the computing nodes evaluate these m+n models on the test data set; record the best model and its evaluation metric; feed the m+n finished models back to Bayesian optimization and update the Gaussian process; combine Bayesian optimization with Kmeans clustering to generate m+n sets of candidate parameter points and refill the pool; have the computing nodes run one round of model training on the models corresponding to the parameters in the pool; and repeat this process until the specified number of optimization rounds is reached, finally returning the best model and its evaluation metric.
  • The Kmeans clustering module implements a Kmeans algorithm that clusters the multiple sets of original parameter points produced by the Bayesian optimization module, thereby generating K sets of mutually distinct parameter points.
  • The Kmeans-based algorithm for generating multiple candidate parameter points proceeds as follows: randomly generate L (L > 10000) parameter points; compute the EI value of each (the acquisition function value by which Bayesian optimization ranks candidate parameter points); keep the l (200 < l < 1000) points with the largest EI values; run gradient descent from each of these points to find local optima; run the Kmeans clustering algorithm on the l local optima; and from each cluster return the parameter point with the largest EI value.
  • Clustering goal: if candidate parameter points lie close to one another (for example, with a small Euclidean distance), they are redundant and lower tuning efficiency; the goal is to obtain multiple distinct parameter points with large EI values.
  • Clustering data: the parameter values of the points and their acquisition function values are taken as features and normalized (to avoid clustering failures caused by inconsistent feature scales), and the sample points are clustered into k classes (k being the number of parameter points to be generated).
  • Clustering result: from each cluster, the parameter point that maximizes the acquisition function value is selected and returned.
  • For example, suppose the original candidate parameter points are A, B, C and D (in practice there are L of them). Kmeans clustering groups A, B, C and D into three classes, and from each class the point with the largest acquisition function value is kept, giving A, C and D: these points are mutually distinct and all have large acquisition function values.
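A minimal Python sketch of this batch-generation procedure is given below. It assumes two callables supplied by the Bayesian optimization module, sample_points and expected_improvement (both names are illustrative), uses scikit-learn's KMeans, and omits the gradient-descent refinement of the l best points; it is a sketch of the idea, not the patent's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_batch(sample_points, expected_improvement, k, L=10000, l=500):
    """Return k mutually distinct parameter points with large EI values.

    sample_points(L)        -> (L, d) array of random parameter points
    expected_improvement(X) -> (n,) array of EI values for the rows of X
    Both callables are assumed to come from the Bayesian optimization module.
    """
    # 1. Randomly generate L parameter points and score them with EI.
    points = sample_points(L)
    ei = expected_improvement(points)

    # 2. Keep the l points with the largest EI values.  (The patent additionally
    #    refines each of them by gradient descent to a local optimum; that
    #    refinement is omitted here for brevity.)
    top_idx = np.argsort(ei)[-l:]
    top_points, top_ei = points[top_idx], ei[top_idx]

    # 3. Use the parameter values plus the EI value as clustering features and
    #    normalize them, so no single feature dominates the distance metric.
    features = np.column_stack([top_points, top_ei])
    features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)

    # 4. Cluster into k groups and return the max-EI point of each group.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    batch = []
    for c in range(k):
        members = np.where(labels == c)[0]
        batch.append(top_points[members[np.argmax(top_ei[members])]])
    return np.array(batch)
```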
  • The task scheduling module implements the Early Stopping algorithm and judges the convergence of the models corresponding to the parameter points in the model parameter pool.
  • After each round of model iteration, the task scheduling module makes a convergence judgment.
  • The module mainly decides, based on the convergence accuracy, whether a model in the pool should stop training. To further speed up the tuning process, the performance curve used by the Early Stopping technique lets it predict whether a model can still reach the best model performance, and terminate in time the training of models that cannot, so that training of the next set of parameter points can start.
  • The task scheduling module obtains the data held in the model parameter pool: the model parameter points, the model evaluation metrics, and so on.
  • To decide whether a model should stop training, it computes the model accuracy w and checks whether it reaches the preset threshold W: if so, the model has converged; otherwise it has not. If the model has not converged, the Early Stopping algorithm is applied: compute the mean E(P) of the evaluation metrics P of models already trained up to the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, its training should be terminated.
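The decision rule can be sketched compactly as below. The names are hypothetical (history maps an iteration round to the evaluation metrics of models already trained through that round); the sketch illustrates the two checks described above rather than reproducing the patent's code.

```python
def should_stop(current_metric, round_idx, history, accuracy_threshold):
    """Return 'converged', 'early_stop' or 'continue' for one model."""
    # Convergence check: the model accuracy w has reached the preset threshold W.
    if current_metric >= accuracy_threshold:
        return "converged"

    # Early Stopping check: compare against the mean metric E(P) of models
    # already trained for the same number of rounds (rule: p < E(P) * 0.9).
    past_metrics = history.get(round_idx, [])
    if past_metrics and current_metric < 0.9 * (sum(past_metrics) / len(past_metrics)):
        return "early_stop"

    return "continue"
```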
  • This module implements the adaptive algorithm that determines the model parallelism and is responsible for choosing a reasonable size for the model parameter pool.
  • Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the test is repeated several times (for example, 3 times), and the pool size with the best model execution performance is selected. Compared with the time taken by the whole tuning process, the time spent on this procedure is negligible.

Abstract

The present invention relates to an asynchronous Bayesian optimization-based machine learning hyperparameter optimization system and method. The system comprises: a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module. The present invention performs automated tuning of machine learning efficiently in a big data environment and effectively uses multi-machine parallel computing capability, so that people can better use big data machine learning in production practice.

Description

Machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization
Technical field
The invention relates to a machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization, and belongs to the field of computer artificial intelligence.
Background
With the development of cloud computing and big data technology, machine learning has become a hot topic in both academia and industry. However, machine learning involves a great deal of theory, and machine learning models contain many parameters, so rich experience is needed to design an efficient model. To promote wider application of machine learning and effectively lower the barrier to building machine learning applications, automatic machine learning (AutoML) technology has emerged: by providing automation for every stage of machine learning, it allows even beginners to train and apply machine learning models.
The core of AutoML is automatic tuning of machine learning models, that is, automatic selection of hyperparameters. Hyperparameter selection is very important for machine learning applications: different hyperparameters directly affect how well an application performs in production practice (for example, its prediction accuracy). The hyperparameter selection process of a machine learning model is shown in Figure 1. Because a machine learning model usually contains many hyperparameters and the parameter space is huge, how to tune efficiently is an urgent problem. Commonly used tuning methods include simple methods, represented by manual tuning, Grid search and Random search, and heuristic methods, represented by Bayesian optimization. A schematic diagram of Grid search and Random search is shown in Figure 2.
Manual tuning is the simplest, and most artisanal, tuning method. Faced with a machine learning application, one can tune the parameters by hand to determine the model hyperparameters: experienced machine learning experts can tune based on experience, while newcomers can rely on manual trial and error (running enough experiments to find a set of parameters that gives a reasonably good model). In general, manual tuning is time-consuming and labor-intensive.
Grid search is one of the simplest automated tuning methods. Its idea is simple and direct: the user defines a value range for each parameter, enumerates parameter combinations at fixed intervals, trains a model for each combination, and selects the parameters of the model with the best evaluation. The combination space is usually large; for example, for a logistic regression application with 5 hyperparameters and 10 possible values per parameter, the full combination space contains 10^5 points, and training that many models is very time-consuming. Because the combination space is usually large, Grid search only suits scenarios where a single model trains very quickly, and it is hard to apply in big data scenarios.
To address the shortcomings of Grid search, researchers proposed Random search. Unlike Grid search, which exhaustively enumerates parameter combinations at fixed intervals, Random search picks parameter combinations at random. The work of Bergstra et al. shows that Random search is generally no worse than Grid search, and random selection of combinations avoids mutual redundancy between parameter points to some extent. The remaining problem is that if two sampled parameter points are close to each other (for example, their Euclidean distance is small), they are mutually redundant, which lowers search efficiency; in high-dimensional parameter spaces (many parameters), the search also easily gets stuck in a local region.
The methods above all search the parameter space by brute force; their search efficiency is low, and they are no longer suitable in a big data environment. Bayesian optimization is a sequential model-based optimization algorithm: it uses the information from models already trained as prior knowledge to guide the generation of the next parameter point, so the best model can be reached faster. Compared with Grid search and Random search it greatly accelerates the whole tuning process, and it is currently among the best hyperparameter optimization methods for machine learning models.
The shortcoming of classic Bayesian optimization is that the optimization process is serial and cannot effectively exploit multi-machine parallel computing. In a big data environment it therefore remains inefficient, making automated tuning of big data machine learning difficult and ill-suited to such environments. How to parallelize classic Bayesian optimization so that it copes well with big data, and thus make big data machine learning easier to use in production, is of great significance for social production practice.
At present, much of the research on Bayesian optimization in distributed environments is based on synchronous batches. In the synchronous execution mode (Bulk Synchronous Parallel, BSP mode) tasks often have to wait for one another, whereas in the asynchronous execution mode (SSP mode) tasks do not need to wait for one another, so the asynchronous mode (SSP) is more efficient than the synchronous mode (BSP). This is illustrated in Figure 3, which shows three computing nodes: in the synchronous mode on the left, tasks 4, 5 and 6 cannot start until tasks 1, 2 and 3 have finished, while in the asynchronous mode on the right they do not have to wait. For the same number of tasks, the asynchronous mode therefore usually finishes earlier.
The asynchronous Bayesian optimization proposed by Kandasamy et al. is a way of parallelizing classical Bayesian optimization, but in that work each computing node is responsible for evaluating one model, so a single model cannot be trained effectively on big data, and the method cannot handle the case where several models converge at the same time.
Because of problems such as the low efficiency of Bayesian optimization in the big data environment described above, the usability of automated machine learning tuning technology in big data environments is low.
Summary of the invention
The technology of the present invention solves the following problem: aiming at the difficulty of automated machine learning hyperparameter tuning in a big data environment, and to overcome the shortcomings of the existing technology, a hyperparameter optimization system and method based on asynchronous Bayesian optimization is provided, which tunes machine learning models in a big data environment automatically and efficiently, effectively exploits multi-machine parallel computing capability, and performs efficient automated tuning of big data machine learning, so that big data machine learning can be better used in production practice.
Technical solution of the present invention: a machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, comprising a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module.
The Bayesian optimization module implements the Bayesian optimization algorithm and generates candidate parameter points. It provides a Get interface that the model (machine learning model) parameter (model hyperparameters, such as the learning rate and regularization coefficient) pool module calls directly. For the scenario where several models converge at the same time, a GetBatch interface is provided for the Kmeans clustering module to call. The GetBatch interface implements the following algorithm: randomly generate L (L > 10000) parameter points; compute the EI value of each (the acquisition, or benefit, function value, the criterion by which Bayesian optimization generates candidate parameter points); find the l (200 < l < 1000) points with the largest EI values; and run gradient descent from each of them to find local optima.
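The EI acquisition function itself is not spelled out in the text. For reference, the standard expected-improvement criterion for a Gaussian-process surrogate (maximization case) is

EI(x) = \mathbb{E}[\max(0, f(x) - f(x^{+}))] = (\mu(x) - f(x^{+}))\,\Phi(Z) + \sigma(x)\,\varphi(Z), \qquad Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)},

where \mu(x) and \sigma(x) are the surrogate's posterior mean and standard deviation at x, f(x^{+}) is the best evaluation observed so far, and \Phi and \varphi are the standard normal CDF and PDF (EI is taken as 0 when \sigma(x) = 0). The patent may use a variant of this criterion; the formula above is only the conventional definition.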
The model parameter pool module is responsible for managing model parameter points, including obtaining model hyperparameter points, replacing parameter points in the pool, and providing the parameter points in the pool to the computing cluster. The model parameter pool is implemented mainly as an array; model parameter points are abstracted into parameter-point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the pool. The pool obtains model parameter points from the Bayesian optimization module through the Get interface, and multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface. The parameter points in the pool can be pulled by the computing cluster (a Spark cluster), and the pool receives the model evaluation metrics that the cluster pushes back.
The Kmeans clustering module generates multiple mutually distinct parameter points through Kmeans clustering. It is called by the model parameter pool module and receives a signal requesting k distinct parameter points; it then calls the Bayesian optimization module to generate K (usually greater than k) original candidate parameter points, clusters the candidates into k classes with Kmeans, and selects from each class the parameter point with the largest acquisition function value, thereby producing k mutually distinct parameter points that are returned to the model parameter pool module.
The task scheduling module decides whether a model in the model parameter pool module should stop training. It has two parts: a model convergence check and an Early Stopping algorithm. The convergence check computes whether the model accuracy has reached a preset threshold; if so the model has converged, otherwise it has not. Early Stopping first computes the mean E(P) of the evaluation metric P of previously trained models at the current iteration round; if the current model's evaluation metric satisfies p < E(P) * 0.9, training is stopped, otherwise it continues. The module interacts mainly with the model parameter pool: it judges the state of the model corresponding to each parameter point in the pool and sends signals to the pool.
The adaptive model-parallelism determination module adaptively determines the parallelism of the models in the computing cluster. It experimentally evaluates the computational efficiency of the model parameter pool under different pool sizes to obtain the pool size with the best computing performance. Concretely, it measures the time taken to run one round of model iteration for each candidate pool size, normalizes the times and compares them; to avoid the influence of random factors the experiment is repeated several times, yielding the pool size with the best model execution performance. This module is mainly called by the model parameter pool to initialize the pool size.
The machine learning hyperparameter optimization method based on asynchronous Bayesian optimization of the present invention includes the following steps:
(1) Execute the adaptive model-parallelism determination module to determine the best model parallelism for the computing cluster, and pass the result to the model parameter pool module;
(2) The model parameter pool module performs initialization, such as setting the model parameter pool size (assume n);
(3) The Bayesian optimization module performs initialization, such as configuring the Bayesian optimization hyperparameter space and the number of Bayesian optimization iteration rounds;
(4) The model parameter pool module calls the Kmeans clustering module to generate the initial n parameter points and fills them into the model parameter pool;
(5) The computing cluster runs one round of model iteration for the models corresponding to the parameters in the model parameter pool module and sends the model evaluation metrics to the model parameter pool module;
(6) The scheduling module decides, from the parameter points and model evaluation metrics in the model parameter pool module, whether each model should stop training, and if so sends a stop signal to the model parameter pool module;
(7) If the model parameter pool module receives a stop signal from the scheduling module, it requests new model parameter points (if one model stops, it asks the Bayesian optimization module for a single point; if several models stop, it asks the Kmeans clustering module for several points); the parameter points in the pool are then used by the computing cluster to start another round of model training, and the process repeats until the Bayesian optimization stop threshold (the number of Bayesian optimization iteration rounds) is reached.
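Steps (1) to (7) can be summarized in a pseudocode-level Python sketch. The objects below (pool, bayes_opt, scheduler, cluster) and their methods are hypothetical stand-ins for the modules described above, not an API defined by the patent.

```python
def tune(pool, bayes_opt, scheduler, cluster, max_rounds):
    """Asynchronous tuning loop corresponding to steps (1)-(7); illustrative only."""
    pool.fill(bayes_opt.get_batch(pool.size))                 # steps (2)-(4): fill the pool
    for _ in range(max_rounds):                               # Bayesian optimization budget
        for slot, metric in cluster.run_one_iteration(pool.pull()):   # step (5)
            pool.push(slot, metric)                           # report evaluation metrics
        stopped = [s for s in range(pool.size)
                   if scheduler.should_stop(pool, s)]         # step (6)
        if not stopped:
            continue
        bayes_opt.update([(pool.points[s], pool.metrics[s]) for s in stopped])
        if len(stopped) == 1:
            new_points = [bayes_opt.get()]                    # single point via Get
        else:
            new_points = bayes_opt.get_batch(len(stopped))    # batch via GetBatch + Kmeans
        for slot, point in zip(stopped, new_points):          # step (7): refill the pool
            pool.replace(slot, point)
    return pool  # the best model and metric can be read off the pool afterwards
```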
Compared with the prior art, the advantages of the present invention are as follows. The current mainstream hyperparameter optimization methods, Grid Search and Random Search, are inefficient and require large amounts of computing resources; the heuristic Bayesian optimization method can only be executed serially and cannot effectively exploit the parallel computing capability of multiple machines in a distributed environment. This makes hyperparameter optimization hard to carry out in a big data setting. The asynchronous Bayesian optimization method proposed by the present invention achieves asynchronous, parallel hyperparameter optimization through asynchronous model training against the model parameter pool, while retaining the high optimization efficiency of Bayesian optimization itself. The invention can effectively exploit multi-machine computing power in a distributed environment, making automated hyperparameter tuning of machine learning feasible in a big data environment, and thus helping people perform data analysis and extract value from data through big data machine learning in social production practice.
Brief description of the drawings
Figure 1 is a schematic diagram of the model selection and tuning process;
Figure 2 is a schematic diagram of Grid search (left) and Random search (right);
Figure 3 shows synchronous execution (left) and asynchronous execution (right);
Figure 4 is the overall framework diagram of the system of the present invention;
Figure 5 is a schematic diagram of the model parameter pool module of the present invention;
Figure 6 is a schematic diagram of the Kmeans clustering module of the present invention;
Figure 7 is a flowchart of the task scheduling module of the present invention;
Figure 8 is a flowchart of the adaptive model-parallelism determination module of the present invention.
Detailed description
The technical solution of the present invention is shown in FIG. 4 and mainly comprises a Bayesian optimization module, a model parameter pool module, a Kmeans clustering module, a task scheduling module, and an adaptive model-parallelism determination module. Through the cooperation of these modules, the Bayesian-optimization-based machine learning hyperparameter optimization method proposed by the present invention can be realized.
Among the above modules, the Bayesian optimization module:
The Bayesian optimization module is the foundation of the present invention. It implements the Bayesian optimization method, which models the relationship between the model evaluation metric and the parameter points and can therefore generate more meaningful parameter points. In the present invention, Bayesian optimization is responsible for generating candidate parameter points and for receiving feedback (parameter points and the corresponding model evaluation metrics) from the model parameter pool.
Among the above modules, the model parameter pool module:
The model parameter pool is one of the key technologies of the present invention; the module is illustrated in FIG. 5. The pool is responsible for managing model parameter points and receives the parameter points generated by the Bayesian optimization module. When a single parameter point is needed, it takes the point generated by the Bayesian optimization module directly; when several points are needed, Bayesian optimization first generates multiple sets of candidate parameter points, and the Kmeans clustering module then turns them into multiple sets of mutually distinct parameter points.
The model parameter pool module provides Push and Pull interfaces. The computing cluster (a Spark cluster) can pull model parameter points and then train the corresponding models; after a model converges, the cluster pushes its evaluation metric back to the model parameter pool. Because the parameter points differ and machine learning models are themselves stochastic, the models usually take different amounts of time to train, and this is what makes efficient asynchronous parallel tuning possible.
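A minimal sketch of such a pool is given below. It assumes each model occupies a numbered slot and that concurrent access is guarded by a lock; it illustrates the Push/Pull idea rather than the patent's actual implementation.

```python
import threading

class ModelParameterPool:
    """Illustrative model parameter pool with Push/Pull interfaces (not the patent's code)."""

    def __init__(self, size):
        self.size = size
        self.points = [None] * size          # one parameter-point object per slot
        self.metrics = [None] * size         # latest evaluation metric per slot
        self._lock = threading.Lock()

    def fill(self, parameter_points):
        """Fill the pool with an initial batch of parameter points."""
        with self._lock:
            for i, p in enumerate(parameter_points[: self.size]):
                self.points[i] = p
                self.metrics[i] = None

    def pull(self):
        """Called by the computing cluster to fetch the current parameter points."""
        with self._lock:
            return list(self.points)

    def push(self, slot, metric):
        """Called by the computing cluster to report a model evaluation metric."""
        with self._lock:
            self.metrics[slot] = metric

    def replace(self, slot, new_point):
        """Called when the scheduler stops a model and a new parameter point is supplied."""
        with self._lock:
            self.points[slot] = new_point
            self.metrics[slot] = None
```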
上述模块中,Kmeans聚类模块:Among the above modules, the Kmeans clustering module:
Kmeans聚类模块是本发明的关键之一。机器学习模型,尤其是使用梯度下降进行求解的模型,比如逻辑回归和支持向量机等模型,通常,这些模型经过数十轮迭代就可以收敛,模型训练时,计算节点会对模型参数池中参数对应模型进行一轮迭代,然后判断收敛性,这就存在多个模型同时收敛的情况,这时使用贝叶斯优化产生多组候选参数点将会导致候选参数点相互冗余,从而导致整个机器学习自动化调参效率较低。Kmeans clustering module is one of the keys of the present invention. Machine learning models, especially models that use gradient descent to solve, such as logistic regression and support vector machines. Usually, these models can converge after dozens of iterations. During model training, the calculation node will evaluate the parameters in the model parameter pool. Perform a round of iterations on the corresponding model, and then judge the convergence. This means that multiple models converge at the same time. At this time, using Bayesian optimization to generate multiple sets of candidate parameter points will cause the candidate parameter points to be redundant, resulting in the entire machine. The efficiency of learning automation is low.
本模块主要实现了Kmeans聚类算法,接收贝叶斯优化产生的多组原始候候选参数点,然后进行Kmeans聚类处理,产生k个互异且使得收益函数较大(比如EI函数)的参数点,将上述参数点填充到模型参数池模块中。This module mainly implements the Kmeans clustering algorithm, receives multiple sets of original candidate parameter points generated by Bayesian optimization, and then performs Kmeans clustering processing to generate k mutually different parameters that make the revenue function larger (such as the EI function) Point, fill the above parameter points into the model parameter pool module.
Kmeans聚类产生多组候选参数点示意图如图6所示(图中显示A、B、C、D四个参数点,实际参数点较多),横轴代表参数点,纵轴代表收益函数值,假设三个模型同时收敛,直接使用贝叶斯优化产生三组候选参数点将可能产生参数点A、B、C(对应的收益值较大),但是参数点A和B存在相互冗余,参数点距离较小(比如欧式距离),将会降低贝叶斯优化的调参效率。通过使用Kmeans聚类,对候选参数点A、B、C和D进行聚类,理论上,A和B将会聚到同一类中,返回其中一个收益值较大的参数点A,这样将会产生互异的参数点A、C和D,从而提高贝叶斯优化的效率。The schematic diagram of generating multiple sets of candidate parameter points by Kmeans clustering is shown in Figure 6 (the figure shows four parameter points A, B, C, and D, and there are many actual parameter points). The horizontal axis represents the parameter points, and the vertical axis represents the value of the income function Assuming that the three models converge at the same time, directly using Bayesian optimization to generate three sets of candidate parameter points will likely produce parameter points A, B, and C (corresponding to a larger return value), but parameter points A and B are mutually redundant. If the distance between parameter points is small (such as Euclidean distance), it will reduce the efficiency of Bayesian optimization. By using Kmeans clustering, the candidate parameter points A, B, C, and D are clustered. In theory, A and B will be clustered into the same class, and one of the parameter points A with a larger return value will be returned, which will produce Different parameter points A, C and D, thereby improving the efficiency of Bayesian optimization.
上述模块中,任务调度模块:Among the above modules, the task scheduling module:
任务调度模块是本发明的重要模块之一。本模块主要对模型参数池中的模型进行收敛性判断,任务调度模块包括模型收敛性和Early Stopping技术两部分。The task scheduling module is one of the important modules of the present invention. This module mainly judges the convergence of the model in the model parameter pool. The task scheduling module includes two parts: model convergence and Early Stopping technology.
模型收敛性主要是通过计算模型精度是否达到提前设定的阈值进行判断,如果达到阈值则收敛,否则,未收敛。The model convergence is mainly judged by calculating whether the accuracy of the model reaches the threshold set in advance. If it reaches the threshold, it will converge; otherwise, it will not converge.
一些机器学习应用中,在机器学习模型训练过程中,一些性能相关的信息就是可用的。尤其是当模型训练是迭代的,性能曲线(Performance curve)就是可用的,比如对于使用梯度下降求解的模型,随着训练的进行,往往模型准确率(Accuracy)越来越高,并且每一轮迭代(Epoch)结束的时候,可以得到模型准确率。使用准确率和训练步数之间的曲线,就可以判断当前训练的模型是否有可能比已知最佳模型效果更好,对于不可能获得比已知最佳模型效果更好的模型,可以及时终止模型的训练,释放相应的计算资源,从而更多的去评估有希望的模型。基于上述思想的算法,称为Early Stopping算法。通过使用Early Stopping技术可以有效加速整个机器学习自动化调参过程。In some machine learning applications, some performance-related information is available during the machine learning model training process. Especially when the model training is iterative, the performance curve (Performance curve) is available. For example, for the model using gradient descent solution, as the training progresses, the model accuracy rate (Accuracy) is often higher and higher, and each round At the end of the iteration (Epoch), you can get the model accuracy. Using the curve between the accuracy rate and the number of training steps, you can determine whether the currently trained model is likely to be better than the known best model. For the model that is impossible to obtain a better effect than the known best model, you can timely Terminate the model training and release the corresponding computing resources, so as to evaluate more promising models. The algorithm based on the above idea is called Early Stopping algorithm. Through the use of Early Stopping technology, you can effectively accelerate the entire machine learning automated tuning process.
Among the above modules, the module for adaptively determining the model parallelism:
The module for adaptively determining the model parallelism is one of the important modules of the present invention. Model parallelism refers to the number of models executed simultaneously on the computing cluster and directly affects the computing performance of the whole cluster: setting it either too large or too small degrades that performance. This module determines the model parallelism adaptively.
For machine learning models on the Spark platform, the module works by experimentally evaluating the computational efficiency obtained with different model parameter pool sizes and selecting the pool size that gives the best computing performance. The concrete steps are: measure the time taken to run one round of model iteration for each candidate pool size, normalize the timings, and compare them; to avoid the influence of random factors, repeat the measurement several times (for example, 3 times) and take the pool size with the best execution performance. Compared with the cost of the whole tuning process, the cost of this procedure is negligible.
The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.
The example of the present invention uses Python as the programming language, together with the big-data processing platform Spark and the Spark-based MLlib distributed machine learning library, and targets the machine learning hyperparameter optimization problem in a big-data environment. Logistic regression, a classic machine learning model frequently used in big-data settings, is taken as the concrete example below.
As shown in Figure 4, the present invention is implemented as follows:
1. Bayesian optimization module
This module implements the Bayesian optimization algorithm. Bayesian optimization requires an initial parameter range, and the module provides the following parameter space configuration interface:
[Parameter space configuration interface listing, reproduced as image PCTCN2019091485-appb-000001 in the original publication.]
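The listing referenced above appears only as an image in the published application. Claim 3 recites the interface as Hp.choice(label, options), Hp.randint(label, upper), Hp.uniform(label, low, high) and a log-uniform variant, which closely mirrors Hyperopt-style search-space constructors. The following Python sketch is a hedged reconstruction based on those names; the sampling behaviour shown is an assumption, not the original listing.

    # Hypothetical reconstruction of the parameter space configuration interface.
    # Only the constructor names come from claim 3; the sampling behaviour is an
    # assumption modelled on Hyperopt-style search spaces.
    import math
    import random

    class Hp:
        @staticmethod
        def choice(label, options):
            # pick one value from a discrete set of options
            return (label, lambda: random.choice(options))

        @staticmethod
        def randint(label, upper):
            # integer drawn uniformly from [0, upper)
            return (label, lambda: random.randrange(upper))

        @staticmethod
        def uniform(label, low, high):
            # real value drawn uniformly from [low, high]
            return (label, lambda: random.uniform(low, high))

        @staticmethod
        def loguniform(label, low, high):
            # real value whose logarithm is uniform on [log(low), log(high)]
            return (label, lambda: math.exp(random.uniform(math.log(low), math.log(high))))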
For the logistic regression model, the main hyperparameters are maxIter (the number of training iterations), regParam (the regularization coefficient), tol (the model convergence tolerance), and so on. An initial parameter space range must be configured, as follows:
[Logistic regression parameter space configuration, reproduced as image PCTCN2019091485-appb-000002 in the original publication.]
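The concrete configuration is likewise reproduced only as an image. Reusing the hypothetical Hp sketch above, a plausible search space for the three hyperparameters just named could look as follows; the value ranges are illustrative assumptions, and only the parameter names come from the description.

    # Illustrative search space for Spark MLlib LogisticRegression; the ranges
    # are assumptions, only the hyperparameter names appear in the text above.
    param_space = [
        Hp.randint("maxIter", 200),             # number of training iterations
        Hp.loguniform("regParam", 1e-6, 1e1),   # regularization coefficient
        Hp.loguniform("tol", 1e-8, 1e-2),       # convergence tolerance
    ]

    # Drawing one random configuration from the space:
    sample = {label: draw() for label, draw in param_space}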
When the model parameter pool needs one set of parameters, the Bayesian optimization module directly generates a single set and feeds it back to the pool (degenerating to the classic Bayesian optimization algorithm); when the pool needs multiple sets (say K), Bayesian optimization generates many raw candidate parameter points, and the Kmeans clustering module then produces K mutually distinct parameter points.
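A minimal sketch of this dispatch logic is shown below; the method names get, get_batch and cluster_and_select are illustrative assumptions and do not appear in the original text.

    # Hypothetical dispatch between classic Bayesian optimization (one point)
    # and the Kmeans-deduplicated batch path (K points); all names are assumed.
    def request_parameters(num_needed, bayes_opt, kmeans_module):
        if num_needed == 1:
            # degenerates to the classic, sequential Bayesian optimization case
            return [bayes_opt.get()]
        # several models finished at once: fetch raw candidates and let the
        # Kmeans module return num_needed mutually distinct points
        raw_candidates = bayes_opt.get_batch()
        return kmeans_module.cluster_and_select(raw_candidates, k=num_needed)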
2. Model parameter pool module
As shown in Figure 5, this module implements the model parameter pool and is responsible for managing the model parameters. Over the whole automated tuning process the module goes through three stages: an initialization stage, a first stage, and a second stage.
The initialization stage includes: running the algorithm of the module for adaptively determining the model parameter pool size, thereby fixing the pool size (denoted k).
The first stage includes: calling the Bayesian optimization module to generate k sets of parameter points at random and filling them into the model parameter pool; the computing nodes training the models corresponding to the parameters in the pool; and feeding the parameter points and the corresponding model evaluations back to Bayesian optimization to initialize its Gaussian process.
The second stage includes: running the task scheduling module to judge whether the models corresponding to the parameters in the pool have converged, counting the number m of converged models, and marking them in the pool; running the task scheduling module again to judge, with the Early Stopping algorithm, which models should stop training, counting their number n, and marking them in the pool; the computing nodes computing the evaluation metric of these m+n models on the test data set; recording the best model and its evaluation metric; feeding the m+n finished models back to Bayesian optimization and updating the Gaussian process; combined with Bayesian optimization, using Kmeans clustering to generate m+n candidate parameter points and refilling the pool; and the computing nodes running one more round of training updates for the models in the pool. This process is repeated until the specified number of optimization rounds is reached, and the best model and its evaluation metric are returned. (A compact sketch of this loop is given below.)
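The following sketch compresses the three stages into a single loop. The callables train_one_epoch, evaluate_on_test, is_converged and should_early_stop stand in for the computing cluster and the task scheduling module, and the method names on bayes_opt and kmeans are illustrative; this is one reading of the description above, not the reference implementation.

    # Sketch of the asynchronous tuning loop; every helper name is an assumption.
    def tune(bayes_opt, kmeans, train_one_epoch, evaluate_on_test,
             is_converged, should_early_stop, pool_size, max_rounds):
        # First stage: fill the pool with random points and initialize the
        # Gaussian process with the first batch of evaluations.
        pool = bayes_opt.random_points(pool_size)
        models = [train_one_epoch(p, model=None) for p in pool]
        scores = [evaluate_on_test(m) for m in models]
        bayes_opt.update(pool, scores)

        best_params, best_score = None, float("-inf")
        for _ in range(max_rounds):
            # Second stage: mark converged (m) and early-stopped (n) models.
            finished = [i for i in range(pool_size)
                        if is_converged(models[i])
                        or should_early_stop(models[i], best_score)]
            for i in finished:
                if scores[i] > best_score:
                    best_params, best_score = pool[i], scores[i]
            if finished:
                # Feed the m+n finished models back, then refill their slots
                # with mutually distinct points proposed via Kmeans clustering.
                bayes_opt.update([pool[i] for i in finished],
                                 [scores[i] for i in finished])
                new_points = kmeans.propose(bayes_opt, k=len(finished))
                for i, p in zip(finished, new_points):
                    pool[i], models[i] = p, None
            # Every model in the pool advances by one round of training.
            models = [train_one_epoch(pool[i], models[i]) for i in range(pool_size)]
            scores = [evaluate_on_test(m) for m in models]
        return best_params, best_score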
3. Kmeans clustering module
As shown in Figure 6, the Kmeans clustering module implements a Kmeans algorithm that clusters the multiple sets of raw parameter points produced by the Bayesian optimization module, thereby generating K mutually distinct parameter points.
A Kmeans-based algorithm for generating multiple candidate parameter points mainly consists of the following steps: randomly generate L (L > 10000) parameter points; compute the EI value (the acquisition-function value, i.e. the criterion by which Bayesian optimization proposes candidate points) of each of the L points and keep the l (200 < l < 1000) points with the largest EI values; run a gradient descent from each of these points to find a local optimum; apply the Kmeans clustering algorithm to these l local optima and return, from each cluster, the parameter point with the largest EI value. (A sketch of this procedure is given after the Figure 6 example below.)
Clustering objective: if candidate parameter points are close to each other (for example, their Euclidean distance is small), they are redundant and reduce the tuning efficiency; the goal is to obtain multiple mutually distinct parameter points with large EI values.
Clustering data: the parameter values and the acquisition-function value of each point are used as features and normalized (to avoid the clustering failures caused by inconsistent feature scales), and the sample points are clustered into k classes, where k is the number of parameter points to be generated.
Clustering result: from each cluster, the parameter point with the largest acquisition-function value is selected and returned.
Taking Figure 6 as an example, the raw candidate parameter points include L points: A, B, C, and D. Kmeans clustering groups them into three clusters, {A, B}, {C}, and {D}; taking the point with the largest acquisition-function value from each cluster gives A, C, and D, which are mutually distinct and all have large acquisition-function values.
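A sketch of the candidate-generation procedure described above follows. The EI (acquisition) computation and the gradient-based local refinement depend on the fitted Gaussian process, so they are passed in as callables; all function names and the normalization details are assumptions.

    # Sketch of Kmeans-based generation of k mutually distinct candidate points.
    import numpy as np
    from sklearn.cluster import KMeans

    def generate_k_candidates(sample_points, ei_fn, refine_fn, k, top_l=500):
        # sample_points: (L, d) array of randomly drawn parameter points (L > 10000)
        # ei_fn: maps an (n, d) array to an (n,) array of EI values
        # refine_fn: local gradient search from one point towards a local optimum
        ei = ei_fn(sample_points)
        top_idx = np.argsort(ei)[-top_l:]              # the l points with largest EI
        local_optima = np.array([refine_fn(p) for p in sample_points[top_idx]])
        local_ei = ei_fn(local_optima)

        # Normalize parameter values and EI values so that no single feature
        # dominates the Euclidean distance used by Kmeans.
        features = np.column_stack([local_optima, local_ei])
        features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)

        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

        # From every cluster keep the point with the largest EI value.
        chosen = [int(np.argmax(np.where(labels == c, local_ei, -np.inf)))
                  for c in range(k)]
        return local_optima[chosen]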
4. Task scheduling module
The task scheduling module implements the Early Stopping algorithm and judges the convergence of the parameter points in the model parameter pool.
Each time the models in the model parameter pool complete a round of iteration, the task scheduling module performs a convergence check. It mainly decides, based on the convergence accuracy, whether a model in the pool should stop training. To effectively accelerate the whole tuning process, the performance curve used by the Early Stopping technique makes it possible to judge in advance whether a model can possibly achieve the best model performance; the training of models that cannot is terminated promptly so that training on the next set of parameter points can begin.
As shown in Figure 7, the task scheduling module reads the data in the model parameter pool (model parameter points, model evaluation metrics, and so on). It first checks convergence: it computes the model accuracy w and compares it with the preset threshold W; if the threshold is reached, the model has converged, otherwise it has not. If the model has not converged, the Early Stopping algorithm is applied: compute the mean E(P) of the evaluation metric P of the models already trained at the current iteration round; if the current model's evaluation metric satisfies p < 0.9 * E(P), the training of that model should be terminated.
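A compact sketch of these two checks is given below, with the threshold W and the 0.9 factor taken from the description above; how the metrics are fetched from the pool is not shown, and the function name is an assumption.

    # Convergence check followed by the Early Stopping rule p < 0.9 * E(P).
    def should_stop(accuracy_w, threshold_W, current_metric_p, history_metrics):
        # history_metrics: evaluation metric P of previously trained models at
        # the same iteration round (epoch)
        if accuracy_w >= threshold_W:
            return True, "converged"
        if history_metrics:
            mean_p = sum(history_metrics) / len(history_metrics)
            if current_metric_p < 0.9 * mean_p:
                # unlikely to beat the best known model: stop early
                return True, "early-stopped"
        return False, "keep training"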
5. Module for adaptively determining the model parallelism
As shown in Figure 8, this module implements an adaptive algorithm for determining the model parallelism and is responsible for determining a reasonable size for the model parameter pool.
For the logistic regression model, the concrete steps are: measure the time needed to run one round of model iteration for different model parameter pool sizes, normalize the timings, and compare them; to avoid the influence of random factors, repeat the measurement several times (for example, 3 times) and choose the pool size with the best execution performance. Compared with the cost of the whole tuning process, the cost of this procedure is negligible.
For example, for the logistic regression model the procedure is: choose an initial range of model parameter pool sizes, say i = 1 to e (where e denotes the maximum pool size configuration, e > 1); set the pool size to i and run one round of model iteration, repeating it three times to average out random factors; loop over the whole range in this way, sort the candidates by elapsed time, and return the i with the shortest time as the model parameter pool size. (A brief sketch is given below.)
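In the sketch below, run_one_iteration(pool_size) is an assumed callable that trains every model in a pool of that size for one epoch on the cluster, and dividing by the pool size is one reading of the "time normalization" step in the description.

    # Adaptive choice of the model parameter pool size (model parallelism).
    import time

    def choose_pool_size(run_one_iteration, max_size_e, repeats=3):
        normalized = {}
        for size in range(1, max_size_e + 1):
            elapsed = []
            for _ in range(repeats):                  # repeat to damp random noise
                start = time.time()
                run_one_iteration(size)
                elapsed.append(time.time() - start)
            normalized[size] = (sum(elapsed) / repeats) / size   # time per model
        return min(normalized, key=normalized.get)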
The above embodiments are provided only for the purpose of describing the present invention and are not intended to limit its scope. The scope of the invention is defined by the appended claims. Various equivalent replacements and modifications made without departing from the spirit and principles of the present invention shall all fall within the scope of the present invention.

Claims (9)

1. A machine learning hyperparameter optimization system based on asynchronous Bayesian optimization, characterized in that it comprises:
    a Bayesian optimization module, which generates candidate parameter points according to a Bayesian optimization algorithm, provides a Get interface for the model parameter pool module to call directly and, for the scenario in which multiple models converge simultaneously, provides a GetBatch interface for the Kmeans clustering module to call;
    a model parameter pool module, which obtains model hyperparameter points, replaces parameter points in the model parameter pool, and provides the parameter points in the model parameter pool to the computing cluster for use;
    a Kmeans clustering module, which generates multiple mutually distinct parameter points through Kmeans clustering;
    a task scheduling module, which judges whether the models in the model parameter pool module should stop training;
    a module for adaptively determining the model parallelism, which adaptively determines the parallelism of the models on the computing cluster.
2. The system according to claim 1, wherein the GetBatch interface of the Bayesian optimization module randomly generates L parameter points, computes the EI values of the L parameter points, finds among them the l parameter points with the largest EI values, and runs a gradient descent algorithm from each of these parameter points to find local optima.
3. The system according to claim 1, wherein the Get interface of the Bayesian optimization module comprises Hp.choice(label, options), Hp.randint(label, upper), Hp.uniform(label, low, high), and Hp.logunifom(label, low, high).
4. The system according to claim 1, wherein the model parameter pool of the model parameter pool module is implemented as an array, model parameter points are abstracted into parameter point objects, and Push and Pull interfaces are provided for interaction between the computing cluster and the model parameter pool; the model parameter pool module obtains model parameter points from the Bayesian optimization module through the Get interface and obtains multiple sets of mutually distinct parameter points from the Kmeans clustering module through the GetBatch interface; the parameter points in the model parameter pool module are accessed by the computing cluster via Pull, and the module receives the model evaluation metrics pushed by the computing cluster.
5. The system according to claim 1, wherein the Kmeans clustering module is called by the model parameter pool module and receives a signal to generate k mutually distinct parameter points; it calls the Bayesian optimization module to generate K candidate parameter points; the candidate parameter points are grouped into k clusters by Kmeans clustering, and from each cluster the parameter point with the largest acquisition-function value is selected, thereby producing k mutually distinct parameter points, which are returned to the model parameter pool module; wherein K is greater than k.
6. The system according to claim 1, wherein the task scheduling module interacts with the model parameter pool module, judges the state of the models corresponding to the parameters in the model parameter pool module according to the Early Stopping algorithm or the model convergence accuracy, and sends a signal to the model parameter pool module indicating whether training should stop.
7. The system according to claim 1, wherein the module for adaptively determining the model parallelism experimentally evaluates the computational efficiency of the model parameter pool for different pool sizes and obtains the pool size corresponding to the best computing performance; this module is called by the model parameter pool module to initialize the model parameter pool size.
8. A machine learning hyperparameter optimization method based on asynchronous Bayesian optimization, characterized in that it comprises the following steps:
    (1) running the module for adaptively determining the model parallelism to determine the optimal model parallelism of the computing cluster, and passing the result to the model parameter pool module;
    (2) initializing the model parameter pool module, including the model parameter pool size;
    (3) initializing the Bayesian optimization module, including the Bayesian optimization parameter space configuration and the number of Bayesian optimization iteration rounds;
    (4) the model parameter pool module calling the Kmeans clustering module to generate the initial k parameter points and fill them into the model parameter pool module;
    (5) the computing cluster running one round of model iteration on the models corresponding to the parameters in the model parameter pool module and sending the model evaluation metrics to the model parameter pool module, after which the model parameter pool module interacts with the task scheduling module;
    (6) the task scheduling module judging, from the parameter points and model evaluation metrics in the model parameter pool module, whether the models corresponding to the parameters should stop, counting their number, and sending this information to the model parameter pool module;
    (7) the model parameter pool module requesting new model parameter points when it receives a model-stop signal from the task scheduling module: if one model stops, it requests the Bayesian optimization module to generate one parameter point; if multiple models stop, it requests the Kmeans clustering module to generate multiple parameter points; the parameter points in the model parameter pool module are used by the computing cluster to start another round of model training, and the above process is repeated until the Bayesian optimization stop threshold, i.e. the number of Bayesian optimization iteration rounds, is reached, completing the machine learning hyperparameter optimization.
9. The method according to claim 8, wherein the number of parameter points generated by the Kmeans clustering module in step (7) equals the number of models counted by the task scheduling module as needing to stop training.
PCT/CN2019/091485 2018-12-25 2019-06-17 Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method WO2020133952A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811588608.5 2018-12-25
CN201811588608.5A CN109376869A (en) 2018-12-25 A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization

Publications (1)

Publication Number Publication Date
WO2020133952A1 true WO2020133952A1 (en) 2020-07-02

Family

ID=65371987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091485 WO2020133952A1 (en) 2018-12-25 2019-06-17 Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method

Country Status (2)

Country Link
CN (1) CN109376869A (en)
WO (1) WO2020133952A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376869A (en) * 2018-12-25 2019-02-22 中国科学院软件研究所 A kind of super ginseng optimization system of machine learning based on asynchronous Bayes optimization and method
JP7124768B2 (en) * 2019-03-05 2022-08-24 日本電信電話株式会社 Parameter estimation device, method and program
CN110334732A (en) * 2019-05-20 2019-10-15 北京思路创新科技有限公司 A kind of Urban Air Pollution Methods and device based on machine learning
CN110619423B (en) * 2019-08-06 2023-04-07 平安科技(深圳)有限公司 Multitask prediction method and device, electronic equipment and storage medium
CN110659741A (en) * 2019-09-03 2020-01-07 浩鲸云计算科技股份有限公司 AI model training system and method based on piece-splitting automatic learning
CN111027709B (en) * 2019-11-29 2021-02-12 腾讯科技(深圳)有限公司 Information recommendation method and device, server and storage medium
CN111797833A (en) * 2020-05-21 2020-10-20 中国科学院软件研究所 Automatic machine learning method and system oriented to remote sensing semantic segmentation
CN113742991A (en) * 2020-05-30 2021-12-03 华为技术有限公司 Model and data joint optimization method and related device
CN112261721B (en) * 2020-10-19 2023-03-31 南京爱而赢科技有限公司 Combined beam distribution method based on Bayes parameter-adjusting support vector machine
CN113305853B (en) * 2021-07-28 2021-10-12 季华实验室 Optimized welding parameter obtaining method and device, electronic equipment and storage medium
CN115470910A (en) * 2022-10-20 2022-12-13 晞德软件(北京)有限公司 Automatic parameter adjusting method based on Bayesian optimization and K-center sampling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200087B (en) * 2014-06-05 2018-10-02 清华大学 For the parameter optimization of machine learning and the method and system of feature tuning
CN108470210A (en) * 2018-04-02 2018-08-31 中科弘云科技(北京)有限公司 A kind of optimum option method of hyper parameter in deep learning
CN108573281A (en) * 2018-04-11 2018-09-25 中科弘云科技(北京)有限公司 A kind of tuning improved method of the deep learning hyper parameter based on Bayes's optimization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989374A (en) * 2015-03-03 2016-10-05 阿里巴巴集团控股有限公司 Online model training method and equipment
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN108446302A (en) * 2018-01-29 2018-08-24 东华大学 A kind of personalized recommendation system of combination TensorFlow and Spark
CN109062782A (en) * 2018-06-27 2018-12-21 阿里巴巴集团控股有限公司 A kind of selection method of regression test case, device and equipment
CN109376869A (en) * 2018-12-25 2019-02-22 中国科学院软件研究所 A machine learning hyperparameter optimization system and method based on asynchronous Bayesian optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ERIC P XING ET AL: "Strategies and Principles of Distributed Machine Learning on Big Data", ENGINEERING, vol. 2, no. 2, 30 June 2016 (2016-06-30), pages 179 - 195, XP055405966, ISSN: 2095-8099, DOI: 10.1016/J.ENG.2016.02.008 *
KANG, LIANGYI ET AL: "Survey on Parallel and Distributed Optimization Algorithms for Scalable Machine Learning", JOURNAL OF SOFTWARE, vol. 29, no. 1, 31 January 2018 (2018-01-31), pages 109 - 130, XP009521771, ISSN: 1000-9825, DOI: 10.13328/j.cnki.jos.005376 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI733270B (en) * 2019-12-11 2021-07-11 中華電信股份有限公司 Training device and training method for optimized hyperparameter configuration of machine learning model

Also Published As

Publication number Publication date
CN109376869A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
WO2020133952A1 (en) Asynchronous bayesian optimization-based machine learning super-parameter optimization system and method
US20220027746A1 (en) Gradient-based auto-tuning for machine learning and deep learning models
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
CN111026549B (en) Automatic test resource scheduling method for power information communication equipment
WO2023240845A1 (en) Distributed computation method, system and device, and storage medium
CN111274036B (en) Scheduling method of deep learning task based on speed prediction
CN109891438B (en) Numerical quantum experiment method and system
CN113627871B (en) Workflow scheduling method, system and storage medium based on multi-target particle swarm algorithm
CN110727506B (en) SPARK parameter automatic tuning method based on cost model
WO2015066979A1 (en) Machine learning method for mapreduce task resource configuration parameters
CN112651483B (en) Cloud manufacturing service combination optimization method for large-scale multi-batch task collaboration
Liu et al. Predicting of job failure in compute cloud based on online extreme learning machine: a comparative study
CN110825522A (en) Spark parameter self-adaptive optimization method and system
CN112052081A (en) Task scheduling method and device and electronic equipment
CN106408031A (en) Super parameter optimization method of least squares support vector machine
Dorronsoro et al. Combining machine learning and genetic algorithms to solve the independent tasks scheduling problem
Banjongkan et al. A Study of Job Failure Prediction at Job Submit-State and Job Start-State in High-Performance Computing System: Using Decision Tree Algorithms [J]
Fan et al. An evaluation model and benchmark for parallel computing frameworks
Shang et al. Research on the application of artificial intelligence and distributed parallel computing in archives classification
EP4107631A1 (en) System and method for machine learning for system deployments without performance regressions
Yu et al. Accelerating distributed training in heterogeneous clusters via a straggler-aware parameter server
CN113900942B (en) Method for generating simplified test case set of flight control machine-mounted model
CN111652269A (en) Active machine learning method and device based on crowd-sourcing interaction
CN113991752B (en) Quasi-real-time intelligent control method and system for power grid
CN110048886A (en) A kind of efficient cloud configuration selection algorithm of big data analysis task

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19904044; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 061021))
122 Ep: pct application non-entry in european phase (Ref document number: 19904044; Country of ref document: EP; Kind code of ref document: A1)