WO2021135025A1 - Hyperparameter optimization device and method - Google Patents
Hyperparameter optimization device and method
- Publication number
- WO2021135025A1 (application PCT/CN2020/089575, CN2020089575W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hyperparameter
- xgbest
- xpbest
- hyperparameters
- vector
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Definitions
- the present invention relates to artificial intelligence (AI), and particularly relates to a hyperparameter optimization device.
- the invention also relates to an optimization method of hyperparameters.
- Model parameters and model hyperparameters in machine learning differ in function and source. Simply put, model parameters are configuration variables inside the model, and their values can be estimated from data. Specifically, model parameters have the following characteristics: model parameters are required for model prediction; model parameter values define the model's function; model parameters are obtained by estimation or learning from data; model parameters are generally not set manually by practitioners; model parameters are usually saved as part of the learned model; and optimization algorithms, which efficiently search over possible parameter values, are usually used to estimate model parameters.
- Some examples of model parameters include: weights in artificial neural networks, support vectors in support vector machines, coefficients in linear regression or logistic regression.
- Model hyperparameters are configurations external to the model; their values cannot be estimated from data and must be set manually.
- Model hyperparameters are often used in the process of estimating model parameters; they are usually specified directly by practitioners; they can usually be set using heuristic methods; and they are usually adjusted for the given predictive modeling problem. How to obtain the optimal values of model hyperparameters: for a given problem, we cannot know the optimal values in advance, but we can use rules of thumb, copy values used in other problems, or search by trial and error.
- Examples of model hyperparameters include: the learning rate for training a neural network, the C and sigma hyperparameters of a support vector machine, and the k in k-nearest neighbors.
- the artificial intelligence algorithm model also includes hyperparameters.
- Hyperparameters are usually used to define the structure of the model itself.
- the model includes a multi-layer network.
- the nodes of each layer network correspond to a function.
- The function at each node produces an output signal by processing multiple input signals.
- The weights of the input signals are training parameters, which need to be obtained by training with samples.
- The number of network layers in the model needs to be set before training, so it is a hyperparameter; similarly, quantities such as the degree of a polynomial also need to be set before training, so they are hyperparameters as well.
- Different tasks require different hyperparameter settings; when the task changes, the hyperparameter values often need to be changed.
- the learning rate is probably the most important hyperparameter.
- Hyperparameter optimization, or model selection, is the problem of selecting a set of optimal hyperparameters for a learning algorithm.
- The general purpose is to optimize a performance measure of the algorithm on an independent data set; cross-validation is usually used to estimate this generalization performance.
- Hyperparameter optimization contrasts with the actual learning problem, which is also often cast as an optimization problem, but one that optimizes a loss function on the training set. The learning algorithm learns to model/reconstruct its inputs well, while hyperparameter optimization ensures that the model does not overfit its data, for example through adjustments such as regularization.
- the current hyperparameter optimization methods include: grid search, Bayesian optimization, random search, gradient-based optimization, and so on.
- the traditional method of performing hyperparameter optimization is grid search or parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of the learning algorithm.
- A grid search must be guided by some performance metric, usually measured by cross-validation on the training set or by evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded ranges for some parameters, it may be necessary to manually set bounds and a discretization before applying grid search, as in the illustrative sketch below.
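- For illustration only, a minimal grid-search sketch using scikit-learn; the estimator, data set, and parameter ranges below are assumptions for the example, not part of this disclosure:

```python
# Minimal grid-search sketch (illustrative only): an exhaustive sweep over a
# manually specified, discretized subset of the hyperparameter space, guided
# by cross-validation on the training data.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

param_grid = {
    "C": [0.1, 1.0, 10.0],        # bounded and discretized by hand
    "gamma": [1e-4, 1e-3, 1e-2],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```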
- Bayesian optimization builds a statistical model of the function that maps hyperparameter values to the objective evaluated on the validation set. Intuitively, the method assumes that this mapping from hyperparameters to the objective is some smooth but noisy function.
- One purpose of Bayesian optimization is to collect observations so that as few machine learning models as possible need to be trained and evaluated, while revealing as much information as possible about the function, especially the location of its optimum.
- Bayesian optimization relies on assuming a very general prior over functions which, when combined with the observed hyperparameter values and the corresponding outputs, yields a distribution over functions. The method selects hyperparameters to observe (experimental runs) iteratively, trading off exploration (hyperparameters whose results are most uncertain) and exploitation (hyperparameters expected to give good results).
- Bayesian optimization has been shown to obtain better results with fewer evaluations than grid search and random search, because it reasons about the quality of experiments before they are run. Since grid search is an exhaustive and potentially expensive method, several alternatives have been proposed. A minimal sketch of this approach is given below.
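- For illustration only, a minimal Bayesian-optimization sketch, assuming the scikit-optimize library is available; the objective function and search space are placeholders rather than part of this disclosure:

```python
# Minimal Bayesian-optimization sketch (assumes scikit-optimize is installed).
from skopt import gp_minimize
from skopt.space import Integer, Real

def objective(params):
    # Placeholder: in practice, train a model with these hyperparameters and
    # return a validation loss (lower is better).
    learning_rate, num_layers = params
    return (learning_rate - 0.01) ** 2 + 0.001 * num_layers

space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(1, 8, name="num_layers"),
]

# The surrogate model trades off exploration (uncertain regions) against
# exploitation (regions expected to score well) when picking the next trial.
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print(result.x, result.fun)
```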
- Chinese invention patent application CN110110862A discloses a hyperparameter optimization method based on an adaptive model.
- The method is based on an adaptive model and can adapt to the search space and data set size of the model to be optimized; however, it has poor parallelism and requires a large amount of data, so it has certain limitations.
- The technical problem to be solved by the present invention is to provide a hyperparameter optimization device that is suitable for image recognition technology and can automatically optimize the hyperparameters of an image recognition algorithm, so that manpower input is reduced while the algorithm model obtains a better trained model.
- the present invention also discloses a hyperparameter optimization method, which can be applied to image recognition technology.
- The method is fast, efficient, and highly parallel, does not require a large amount of data, and can be applied to cases with a medium amount of data and limited computing resources, which expands its scope of application.
- the present invention adopts the following technical solutions:
- the method for optimizing hyperparameters includes the steps:
- Step 1 Extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
- Step 2 Assign a value to the hyperparameter vector and change the value of the hyperparameter vector.
- Step 3 Evaluate the performance of the algorithm model corresponding to the hyperparameter vectors under various values and form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
- a further improvement is that the algorithm model is an algorithm model corresponding to the task.
- a further improvement is that when the task changes, the hyperparameters of the algorithm model need to be optimized.
- hyperparameter optimization method is suitable for image recognition methods
- algorithm model is an image recognition algorithm model
- The hyperparameters are classified into numerical parameters and option parameters.
- a further improvement is that, in the hyperparameter vector, the numerical parameter is directly expressed in the form of a floating-point number, and the option parameter is converted into a one-hot parameter.
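- For illustration only, a minimal sketch of how such a hyperparameter vector might be assembled; the hyperparameter names and option choices below are hypothetical:

```python
# Illustrative vectorization sketch: numerical hyperparameters become floats,
# option hyperparameters become one-hot sub-vectors.
import numpy as np

def vectorize(hparams, option_choices):
    """hparams: name -> value; option_choices: option name -> allowed values."""
    vec = []
    for name, value in hparams.items():
        if name in option_choices:
            # Option parameter -> one-hot encoding over its allowed values.
            vec.extend(1.0 if value == c else 0.0 for c in option_choices[name])
        else:
            # Numerical parameter -> floating-point number.
            vec.append(float(value))
    return np.array(vec)

choices = {"optimizer": ["sgd", "adam", "rmsprop"]}
x = vectorize({"learning_rate": 0.01, "optimizer": "adam"}, choices)
print(x)  # [0.01, 0.0, 1.0, 0.0]: the learning rate, then the one-hot optimizer
```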
- Steps 2 and 3 are implemented by a particle swarm algorithm, including: initializing a number of the hyperparameter vectors, setting the obtained hyperparameter vectors as Xi and the evaluation value corresponding to Xi as Pi; and iterating each Xi with iteration direction Vi, where the iteration equation of Vi is:
- Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi);
- w, ca, cb are preset parameters
- ra, rb are random numbers from 0 to 1
- Xpbest is the particle's own best historical result
- Xgbest is the best overall historical result
- Vi' is the iterated Vi; after Vi' is calculated, Xi' is obtained by adding Vi' to Xi, and the evaluation value Pi' corresponding to Xi' is then calculated
- a further improvement is that the particle swarm algorithm also includes:
- a further improvement is that the steps to update Xpbest and Xgbest according to Pi’ include:
- a further improvement is that the particle swarm algorithm also includes the realization of:
- If Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration are regenerated randomly with a corresponding probability.
- A further improvement is that if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends; or, the iteration is ended by setting a time. A minimal sketch of the particle swarm update is given below.
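- For illustration only, a minimal sketch of the velocity and position update described by the iteration equation above; the values of w, ca, and cb and the random-number source are assumptions:

```python
# One particle-swarm iteration for a single particle (hyperparameter vector).
import numpy as np

rng = np.random.default_rng(0)

def pso_step(Xi, Vi, Xpbest, Xgbest, w=0.7, ca=1.5, cb=1.5):
    ra, rb = rng.random(), rng.random()   # random numbers in [0, 1]
    Vi_new = Vi * w + ra * ca * (Xpbest - Xi) + rb * cb * (Xgbest - Xi)
    Xi_new = Xi + Vi_new                  # Xi' = Xi + Vi'
    return Xi_new, Vi_new                 # Pi' is then obtained by evaluating Xi'
```

- In use, Xi' would then be evaluated to obtain Pi', which drives the Xpbest/Xgbest updates described in the detailed description.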
- hyper-parameters are parameters whose values are set before starting the learning process, rather than parameter data obtained through training. Under normal circumstances, it is necessary to optimize the hyperparameters and select a set of optimal hyperparameters for the learning machine to improve the performance and effect of learning.
- Hyperparameters define higher-level concepts about the model, such as its complexity or learning capacity. They cannot be learned directly from the data in the standard model training process and need to be defined in advance. They can be decided by setting different values, training different models, and choosing the values that test better.
- Some examples of hyperparameters: the number or depth of trees, the number of latent factors in matrix factorization, the learning rate (in its various forms), the number of hidden layers in a deep neural network, and the number of clusters in k-means clustering.
- the hyperparameter optimization device includes:
- the hyperparameter extraction unit is used to extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
- the hyperparameter vector assignment unit is used to assign a value to the hyperparameter vector and change the value of the hyperparameter vector.
- The hyperparameter vector evaluation unit is used to evaluate the performance of the algorithm model corresponding to the hyperparameter vector under various values and form the corresponding evaluation values, and to select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
- a further improvement is that the algorithm model is an algorithm model corresponding to the task.
- a further improvement is that when the task changes, the hyperparameters of the algorithm model need to be optimized.
- hyperparameter optimization device is suitable for image recognition devices;
- algorithm model is an image recognition algorithm model.
- The hyperparameters are classified into numerical parameters and option parameters.
- a further improvement is that, in the hyperparameter vector, the numerical parameter is directly expressed in the form of floating-point numbers, and the option parameter is converted into a one-hot parameter.
- hyperparameter vector assignment unit and the hyperparameter vector evaluation unit form a particle swarm algorithm module for implementing:
- a number of the hyperparameter vectors are initialized, the hyperparameter vectors obtained are set to Xi, the evaluation value corresponding to Xi is obtained, and the evaluation value corresponding to Xi is set to Pi.
- Each Xi is iterated, with the iteration direction of Xi being Vi; the iteration equation of Vi is: Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi);
- w, ca, cb are preset parameters
- ra, rb are random numbers from 0 to 1
- Xpbest is the particle's own best historical result
- Xgbest is the best overall historical result
- Vi' is the iterated Vi
- a further improvement is that the particle swarm algorithm module also includes the realization of:
- a further improvement is that the steps to update Xpbest and Xgbest according to Pi’ include:
- a further improvement is that the particle swarm algorithm module also includes the realization of:
- If Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration are regenerated randomly with a corresponding probability.
- A further improvement is that if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends;
- An embodiment of the present invention also provides a hyperparameter optimization device, including: at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, implement the method described in the above second aspect.
- an embodiment of the present invention also provides a chip for executing the method in the above-mentioned first aspect.
- the chip includes a processor, which is used to call and run a computer program from the memory, so that the device installed with the chip is used to execute the method in the second aspect described above.
- An embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the method described in the above second aspect is implemented.
- an embodiment of the present invention also provides a computer program product, including computer program instructions, which cause a computer to execute the method in the above second aspect.
- the present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes the hyperparameters to form a hyperparameter vector.
- a hyperparameter vector includes all the hyperparameters of the algorithm model; the vectorized hyperparameter vector is convenient for assignment. After the hyperparameter vector is assigned, the value of the hyperparameter of the algorithm model can be changed. By calculating the performance of the algorithm model corresponding to the various assigned hyperparameter vectors, the evaluation value of the corresponding assigned hyperparameter vector can be obtained. The evaluation value can be compared.
- The present invention can automatically optimize the algorithm hyperparameters, for example by finding the final optimized value of the hyperparameter vector through the particle swarm algorithm, which reduces manpower input and improves the efficiency of hyperparameter optimization.
- The evaluation value is the performance obtained by training the algorithm model and testing it on the test set, so when the best hyperparameters are selected, the trained model obtained at the same time is the best and its performance on the designated test set is the best; therefore, the present invention also enables the algorithm model to obtain a better trained model.
- Since the hyperparameter optimization is performed automatically, the present invention does not require the user to have knowledge and experience of algorithm model optimization, so the range of users of the present invention is expanded.
- The method of the present invention is fast, efficient, and highly parallel, does not require a large amount of data, can be applied to situations with a medium amount of data and limited computing resources, and expands the scope of application.
- Figure 1 is a structural diagram of a hyperparameter optimization device according to an embodiment of the present invention.
- Fig. 2 is a flowchart of a hyperparameter optimization method according to an embodiment of the present invention.
- The inventor of this solution found that, in the prior art, existing hyperparameter training tools generally only support training purely through a preset algorithm model after data input, which often does not work well when facing a new task. To optimize the algorithm model, the user needs knowledge of algorithm model optimization and must implement the optimization through manual design and programming, and the hyperparameters usually need to be adjusted slowly based on experience, so the range of users is narrow.
- Other automatic algorithm model optimization approaches generally use recurrent neural networks (RNN) and similar methods to automatically design network algorithm models; such methods are slow, have poor parallelism, and require a large amount of data, so they are not applicable to situations with a medium amount of data (for example, millions of samples) and limited computing resources.
- FIG. 1 it is a structural diagram of a hyperparameter optimization device of an embodiment of the present invention.
- the hyperparameter optimization device of the embodiment of the present invention can be applied to an image recognition device, including:
- the hyperparameter extraction unit 1 is used for automatically extracting all hyperparameters included in the algorithm model and vectorizing all the hyperparameters to form a hyperparameter vector.
- the algorithm model is an algorithm model corresponding to the task. When the task changes, the hyperparameters of the algorithm model need to be optimized.
- the algorithm model is an image recognition algorithm model.
- The hyperparameters are classified into numerical parameters and option parameters.
- The numerical parameter is directly expressed in the form of a floating-point number, and the option parameter is converted into a one-hot parameter.
- the hyperparameter vector assignment unit 2 is used to automatically assign a value to the hyperparameter vector and automatically change the value of the hyperparameter vector.
- the hyperparameter vector evaluation unit 3 is used to evaluate the performance of the algorithm model corresponding to the hyperparameter vector of various values and form the corresponding evaluation value, and select the value of the hyperparameter vector with the best evaluation value As the final optimized value of the hyperparameter vector.
- the hyperparameter vector assignment unit 2 and the hyperparameter vector evaluation unit 3 form a particle swarm algorithm module, which is used to realize:
- a number of the hyperparameter vectors are initialized, the hyperparameter vectors obtained are set to Xi, the evaluation value corresponding to Xi is obtained, and the evaluation value corresponding to Xi is set to Pi.
- Each Xi is iterated, with the iteration direction of Xi being Vi; the iteration equation of Vi is: Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi);
- w, ca, cb are preset parameters
- ra, rb are random numbers from 0 to 1
- Xpbest is the particle's own best historical result
- Xgbest is the best overall historical result
- Vi' is the iterated Vi
- a further improvement is that the particle swarm algorithm module also includes the realization of:
- Xpbest and Xgbest are updated according to Pi’, and the updated Xpbest and Xgbest are Xpbest’ and Xgbest’.
- The steps of updating Xpbest and Xgbest according to Pi' include: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
- If Pi' is better than Pi, then Xi' is taken as the Xi of the next iteration, Vi' as the Vi of the next round, Xpbest' as the Xpbest of the next round, and Xgbest' as the Xgbest of the next round, and the iteration is repeated; after multiple rounds of iteration, the finally obtained Xgbest is used as the final optimized value of the hyperparameter vector. Or, if Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration are regenerated randomly with a corresponding probability.
- The conditions for ending the iteration include: if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends; or, the iteration is ended by setting a time, for example setting the iteration to run for one night or for another system-set time. An illustrative sketch of these update and restart rules is given below.
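- For illustration only, a minimal sketch of the Xpbest/Xgbest update and the random regeneration described above; here "better" means a higher evaluation value, and the restart probability and coordinate bounds are assumptions:

```python
# Update of the personal best (Xpbest) and global best (Xgbest), plus the
# probabilistic restart of particles that did not improve.
import numpy as np

rng = np.random.default_rng(0)

def update_bests(Xi_new, Pi_new, Xpbest, Ppbest, Xgbest, Pgbest):
    if Ppbest > Pi_new:                      # keep Xpbest if it still wins
        Xpbest_new, Ppbest_new = Xpbest, Ppbest
    else:
        Xpbest_new, Ppbest_new = Xi_new, Pi_new
    if Pgbest > Ppbest_new:                  # keep Xgbest if it still wins
        Xgbest_new, Pgbest_new = Xgbest, Pgbest
    else:
        Xgbest_new, Pgbest_new = Xpbest_new, Ppbest_new
    return Xpbest_new, Ppbest_new, Xgbest_new, Pgbest_new

def maybe_restart(Xi_new, Pi_new, Pi_old, bounds, restart_prob=0.2):
    # If the particle did not improve, regenerate its coordinates at random
    # with a corresponding probability (restart_prob is an assumed value).
    if Pi_new <= Pi_old and rng.random() < restart_prob:
        low, high = bounds
        return rng.uniform(low, high, size=Xi_new.shape)
    return Xi_new
```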
- the embodiment of the present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes the hyperparameters to form a hyperparameter vector.
- A hyperparameter vector includes all the hyperparameters of the algorithm model; the vectorized hyperparameter vector is convenient for assignment, and after the hyperparameter vector is assigned, the values of the hyperparameters of the algorithm model can be changed.
- By calculating the performance of the algorithm model corresponding to the various assigned hyperparameter vectors, the evaluation value of each assigned hyperparameter vector can be obtained, and the evaluation values can be compared.
- the embodiment of the present invention can automatically optimize the algorithm hyperparameter
- the particle swarm algorithm can be used to find the final optimized value of the hyperparameter vector, thereby reducing manpower input and improving the optimization efficiency of hyperparameters.
- The evaluation value is the performance obtained by training the algorithm model and testing it on the test set, so when the best hyperparameters are chosen, the trained model obtained is the best and its performance on the specified test set is the best; therefore, the embodiment of the present invention also enables the algorithm model to obtain a better trained model.
- Since the embodiment of the present invention optimizes the hyperparameters automatically, the user does not need to have knowledge and experience of algorithm model optimization, so the range of users of the embodiment of the present invention is expanded.
- FIG. 2 it is a flowchart of a method for optimizing hyperparameters in an embodiment of the present invention.
- the method for optimizing hyperparameters in an embodiment of the present invention is applicable to the image recognition method and includes the following steps:
- Step 1 Automatically extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
- the algorithm model is an algorithm model corresponding to the task.
- the algorithm model is an image recognition algorithm model.
- The hyperparameters are classified into numerical parameters and option parameters.
- the numerical parameter is directly expressed in the form of a floating point number, and the option parameter is converted into a one-hot parameter.
- Step 2 Automatically assign a value to the hyperparameter vector and automatically change the value of the hyperparameter vector.
- Step 3 Evaluate the performance of the algorithm model corresponding to the hyperparameter vectors under various values and form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
- Steps 2 and 3 are implemented by particle swarm algorithm, including:
- A number of the hyperparameter vectors are initialized; the obtained hyperparameter vectors are set as Xi and the evaluation value corresponding to Xi as Pi. Each Xi is iterated, with the iteration direction of Xi being Vi; the iteration equation of Vi is: Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi);
- w, ca, cb are preset parameters
- ra, rb are random numbers from 0 to 1
- Xpbest is the particle's own best historical result
- Xgbest is the best overall historical result
- Vi' is the iterated Vi
- the particle swarm algorithm further includes:
- Xpbest and Xgbest are updated according to Pi’, and the updated Xpbest and Xgbest are Xpbest’ and Xgbest’.
- the steps of updating Xpbest and Xgbest according to Pi' include:
- If Pi' is better than Pi, then Xi' is taken as the Xi of the next iteration, Vi' as the Vi of the next round, Xpbest' as the Xpbest of the next round, and Xgbest' as the Xgbest of the next round, and the iteration is repeated; after multiple rounds of iteration, the finally obtained Xgbest is used as the final optimized value of the hyperparameter vector. If Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration are regenerated randomly with a corresponding probability.
- The conditions for ending the iteration include: if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends;
- or, the iteration is ended by setting a time, such as setting the iteration to run for one night or for another system-set time. An end-to-end illustrative sketch of this flow is given below.
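- For illustration only, an end-to-end sketch of the flow of Steps 1-3 with the particle swarm iteration and a stall-based stop; the evaluate function is a placeholder standing in for training the image recognition model and measuring its performance on the test set:

```python
# End-to-end illustrative sketch: initialize hyperparameter vectors, iterate
# them with particle-swarm updates, and stop once the global best (Xgbest)
# has not improved for a few consecutive rounds.
import numpy as np

rng = np.random.default_rng(0)

def evaluate(x):
    # Placeholder objective (higher is better); in practice, train the
    # algorithm model with hyperparameters x and return test-set performance.
    return -float(np.sum((x - 0.3) ** 2))

def optimize(dim=3, n_particles=8, w=0.7, ca=1.5, cb=1.5, patience=5):
    X = rng.uniform(0.0, 1.0, size=(n_particles, dim))  # hyperparameter vectors Xi
    V = np.zeros_like(X)                                # iteration directions Vi
    Ppbest = np.array([evaluate(x) for x in X])         # evaluation values Pi
    Xpbest = X.copy()
    g = int(np.argmax(Ppbest))
    Xgbest, Pgbest = X[g].copy(), Ppbest[g]
    stall = 0
    while stall < patience:
        improved = False
        for i in range(n_particles):
            ra, rb = rng.random(), rng.random()
            V[i] = V[i] * w + ra * ca * (Xpbest[i] - X[i]) + rb * cb * (Xgbest - X[i])
            X[i] = X[i] + V[i]
            p_new = evaluate(X[i])
            if p_new > Ppbest[i]:
                Xpbest[i], Ppbest[i] = X[i].copy(), p_new
            if p_new > Pgbest:
                Xgbest, Pgbest = X[i].copy(), p_new
                improved = True
        stall = 0 if improved else stall + 1
    return Xgbest, Pgbest

print(optimize())
```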
- the present invention also provides a hyperparameter optimization device, including:
- at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, enable the method of the second aspect of this embodiment to be implemented.
- This embodiment provides a hyperparameter optimization device, which includes: at least one processor; and a memory coupled with the at least one processor.
- the processor and memory can be set separately or integrated together.
- the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, or registers.
- The processor may be a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), or the like.
- the processor can execute executable instructions stored in the memory to implement the various processes described herein.
- the memory in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- The non-volatile memory can be ROM (Read-Only Memory), PROM (Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), or flash memory.
- the volatile memory may be RAM (Random Access Memory), which is used as an external cache.
- Many forms of RAM are available, for example:
- SRAM (Static RAM, static random access memory)
- DRAM (Dynamic RAM, dynamic random access memory)
- SDRAM (Synchronous DRAM, synchronous dynamic random access memory)
- DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory)
- ESDRAM (Enhanced SDRAM, enhanced synchronous dynamic random access memory)
- SLDRAM (SyncLink DRAM, synchronous link dynamic random access memory)
- DR RAM (Direct Rambus RAM, direct Rambus random access memory)
- The memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- The memory stores the following elements (upgrade packages, executable units, or data structures), or a subset or an extended set of them: an operating system and application programs.
- the operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and process hardware-based tasks.
- Application programs including various application programs, are used to implement various application services.
- a program that implements the method of the embodiment of the present invention may be included in an application program.
- the processor calls a program or instruction stored in the memory, specifically, a program or instruction stored in an application program, and the processor is used to execute the method steps provided in the second aspect.
- an embodiment of the present invention also provides a chip for executing the method in the above second aspect.
- the chip includes a processor, which is used to call and run a computer program from the memory, so that the device installed with the chip is used to execute the method in the second aspect described above.
- the present invention also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method of the second aspect of the present invention are implemented.
- the machine-readable storage medium may include, but is not limited to, various known and unknown types of non-volatile memory.
- an embodiment of the present invention also provides a computer program product, including computer program instructions, and the computer program instructions cause a computer to execute the method in the second aspect described above.
- the disclosed system, device, and method may be implemented in other ways.
- the division of units is only a logical function division, and there may be other division methods in actual implementation.
- multiple units or components can be combined or integrated into another system.
- the coupling between the various units may be direct coupling or indirect coupling.
- the functional units in the embodiments of the present application may be integrated into one processing unit, or may be separate physical existences, and so on.
- The size of the sequence numbers of the processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
- If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a machine-readable storage medium; therefore, the technical solution of the present application may be embodied in the form of a software product.
- The software product may be stored in a machine-readable storage medium and may include a number of instructions for making an electronic device execute all or part of the process of the technical solutions described in the embodiments of the present application.
- the foregoing storage media may include various media capable of storing program codes, such as ROM, RAM, removable disks, hard disks, magnetic disks, or optical disks.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Medical Informatics (AREA)
- Physiology (AREA)
- Feedback Control In General (AREA)
- Stored Programmes (AREA)
Abstract
Description
Claims (26)
- A hyperparameter optimization method, characterized by comprising the steps of: Step 1, extracting all hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector; Step 2, assigning values to the hyperparameter vector and changing the values of the hyperparameter vector; Step 3, evaluating the performance of the algorithm model corresponding to the hyperparameter vector under various values and forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
- The hyperparameter optimization method according to claim 1, characterized in that: the algorithm model is an algorithm model corresponding to a task.
- The hyperparameter optimization method according to claim 2, characterized in that: when the task changes, the hyperparameters of the algorithm model need to be optimized.
- The hyperparameter optimization method according to any one of claims 1-3, characterized in that: the method is applicable to image recognition methods; the algorithm model is an image recognition algorithm model.
- The hyperparameter optimization method according to claim 1, characterized in that: in the hyperparameter vector, the hyperparameters are classified into numerical parameters and option parameters.
- The hyperparameter optimization method according to claim 5, characterized in that: in the hyperparameter vector, the numerical parameters are represented directly as floating-point numbers, and the option parameters are converted into one-hot parameters.
- The hyperparameter optimization method according to claim 1, characterized in that: Steps 2 and 3 are implemented by a particle swarm algorithm, comprising: initializing a number of the hyperparameter vectors, setting the obtained hyperparameter vectors as Xi, obtaining the evaluation value corresponding to Xi, and setting the evaluation value corresponding to Xi as Pi; iterating each Xi, with the iteration direction of Xi being Vi and the iteration equation of Vi being: Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi); where w, ca, cb are preset parameters, ra, rb are random numbers from 0 to 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi; after Vi' is calculated, Xi' is obtained by adding Vi' to Xi; after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is calculated.
- The hyperparameter optimization method according to claim 7, characterized in that the particle swarm algorithm further comprises: updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'; if Pi' is better than Pi, taking Xi' as the Xi of the next iteration, Vi' as the Vi of the next round, Xpbest' as the Xpbest of the next round, and Xgbest' as the Xgbest of the next round, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
- The hyperparameter optimization method according to claim 8, characterized in that the step of updating Xpbest and Xgbest according to Pi' comprises: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
- The hyperparameter optimization method according to claim 9, characterized in that the particle swarm algorithm further comprises: if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration randomly with a corresponding probability.
- The hyperparameter optimization method according to claim 10, characterized in that: if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends; or, the iteration is ended by setting a time.
- A hyperparameter optimization device, characterized by comprising: a hyperparameter extraction unit for extracting all hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector; a hyperparameter vector assignment unit for assigning values to the hyperparameter vector and changing the values of the hyperparameter vector; and a hyperparameter vector evaluation unit for evaluating the performance of the algorithm model corresponding to the hyperparameter vector under various values and forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
- The hyperparameter optimization device according to claim 12, characterized in that: the algorithm model is an algorithm model corresponding to a task.
- The hyperparameter optimization device according to claim 13, characterized in that: when the task changes, the hyperparameters of the algorithm model need to be optimized.
- The hyperparameter optimization device according to any one of claims 12-14, characterized in that: the device is applicable to image recognition devices; the algorithm model is an image recognition algorithm model.
- The hyperparameter optimization device according to claim 12, characterized in that: in the hyperparameter vector, the hyperparameters are classified into numerical parameters and option parameters.
- The hyperparameter optimization device according to claim 16, characterized in that: in the hyperparameter vector, the numerical parameters are represented directly as floating-point numbers, and the option parameters are converted into one-hot parameters.
- The hyperparameter optimization device according to claim 12, characterized in that: the hyperparameter vector assignment unit and the hyperparameter vector evaluation unit form a particle swarm algorithm module for implementing: initializing a number of the hyperparameter vectors, setting the obtained hyperparameter vectors as Xi, obtaining the evaluation value corresponding to Xi, and setting the evaluation value corresponding to Xi as Pi; iterating each Xi, with the iteration direction of Xi being Vi and the iteration equation of Vi being: Vi' = Vi*w + ra*ca*(Xpbest - Xi) + rb*cb*(Xgbest - Xi); where w, ca, cb are preset parameters, ra, rb are random numbers from 0 to 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi; after Vi' is calculated, Xi' is obtained by adding Vi' to Xi; after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is calculated.
- The hyperparameter optimization device according to claim 18, characterized in that the particle swarm algorithm module further implements: updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'; if Pi' is better than Pi, taking Xi' as the Xi of the next iteration, Vi' as the Vi of the next round, Xpbest' as the Xpbest of the next round, and Xgbest' as the Xgbest of the next round, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
- The hyperparameter optimization device according to claim 19, characterized in that the step of updating Xpbest and Xgbest according to Pi' comprises: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
- The hyperparameter optimization device according to claim 20, characterized in that the particle swarm algorithm module further implements: if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration randomly with a corresponding probability.
- The hyperparameter optimization device according to claim 21, characterized in that: if Xgbest is not updated after 1-5 rounds of iteration, the iteration ends; or, the iteration is ended by setting a time.
- A hyperparameter optimization device, characterized by comprising: at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, cause the method according to any one of claims 1 to 11 to be implemented.
- A chip, characterized by comprising: a processor for calling and running a computer program from a memory, so that a device in which the chip is installed executes the method according to any one of claims 1 to 11.
- A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 11 are implemented.
- A computer program product, characterized by comprising computer program instructions which cause a computer to execute the method according to any one of claims 1 to 11.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911389194.8 | 2019-12-30 | ||
CN201911389194.8A CN111160459A (zh) | 2019-12-30 | 2019-12-30 | 超参数的优化装置和方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021135025A1 true WO2021135025A1 (zh) | 2021-07-08 |
Family
ID=70559138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/089575 WO2021135025A1 (zh) | 2019-12-30 | 2020-05-11 | 超参数的优化装置和方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111160459A (zh) |
WO (1) | WO2021135025A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3136299A1 (fr) | 2022-01-04 | 2023-12-08 | Alcom Technologies | Procédé d’optimisation des hyperparamètres d’un modèle d’apprentissage automatique |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113053113A (zh) * | 2021-03-11 | 2021-06-29 | 湖南交通职业技术学院 | 一种基于PSO-Welsch-Ridge的异常检测方法及装置 |
CN113780575B (zh) * | 2021-08-30 | 2024-02-20 | 征图智能科技(江苏)有限公司 | 一种基于渐进式的深度学习模型的视觉分类方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408031A (zh) * | 2016-09-29 | 2017-02-15 | 南京航空航天大学 | 一种最小二乘支持向量机的超参优化方法 |
CN108446741A (zh) * | 2018-03-29 | 2018-08-24 | 中国石油大学(华东) | 机器学习超参数重要性评估方法、系统及存储介质 |
CN110443364A (zh) * | 2019-06-21 | 2019-11-12 | 深圳大学 | 一种深度神经网络多任务超参数优化方法及装置 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201010407A (en) * | 2008-08-19 | 2010-03-01 | Univ Nat Kaohsiung Applied Sci | Color image noise reduction method using particle swarm optimization and cellular neural network |
CN105281615A (zh) * | 2015-11-12 | 2016-01-27 | 广西师范大学 | 一种基于改进粒子群算法优化无刷直流电机模糊控制器的方法 |
CN110399917B (zh) * | 2019-07-24 | 2023-04-18 | 东北大学 | 一种基于超参数优化cnn的图像分类方法 |
-
2019
- 2019-12-30 CN CN201911389194.8A patent/CN111160459A/zh active Pending
-
2020
- 2020-05-11 WO PCT/CN2020/089575 patent/WO2021135025A1/zh active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408031A (zh) * | 2016-09-29 | 2017-02-15 | 南京航空航天大学 | 一种最小二乘支持向量机的超参优化方法 |
CN108446741A (zh) * | 2018-03-29 | 2018-08-24 | 中国石油大学(华东) | 机器学习超参数重要性评估方法、系统及存储介质 |
CN110443364A (zh) * | 2019-06-21 | 2019-11-12 | 深圳大学 | 一种深度神经网络多任务超参数优化方法及装置 |
Non-Patent Citations (1)
Title |
---|
ZHANG XUAN, WANG HONGLI: "LSSVM Based on PSO and Its Applications to Time Series Prediction", CHINA MECHANICAL ENGINEERING, ZHONGGUO JIXIE GONGCHENG ZAZHISHE, WUHAN, CN, vol. 22, no. 21, 1 January 2011 (2011-01-01), CN, pages 2572 - 2576, XP055828244, ISSN: 1004-132X * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3136299A1 (fr) | 2022-01-04 | 2023-12-08 | Alcom Technologies | Procédé d’optimisation des hyperparamètres d’un modèle d’apprentissage automatique |
Also Published As
Publication number | Publication date |
---|---|
CN111160459A (zh) | 2020-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Pointflow: 3d point cloud generation with continuous normalizing flows | |
US12099906B2 (en) | Parallel development and deployment for machine learning models | |
WO2021135025A1 (zh) | 超参数的优化装置和方法 | |
JP6969637B2 (ja) | 因果関係分析方法および電子デバイス | |
JP7478145B2 (ja) | 機械学習モデルの自動生成 | |
US10074054B2 (en) | Systems and methods for Bayesian optimization using non-linear mapping of input | |
CN114207635A (zh) | 使用元建模对机器学习和深度学习模型进行快速准确的超参数优化 | |
US20190005377A1 (en) | Artificial neural network reduction to reduce inference computation time | |
US20200125945A1 (en) | Automated hyper-parameterization for image-based deep model learning | |
JP2005276225A (ja) | テーブルを使用したツリーの学習 | |
US20220036232A1 (en) | Technology for optimizing artificial intelligence pipelines | |
CN105976421B (zh) | 一种渲染程序的在线优化方法 | |
US20240330130A1 (en) | Graph machine learning for case similarity | |
Harde et al. | Design and implementation of ACO feature selection algorithm for data stream mining | |
CN112686299A (zh) | 计算机执行的神经网络模型获取方法及装置 | |
Reese et al. | Predict better with less training data using a QNN | |
Deng et al. | Multi-label image recognition in anime illustration with graph convolutional networks | |
Blagoveshchenskii et al. | Hybrid algorithms for optimization and diagnostics of hydromechanical systems used in food production biotechnology | |
US11928562B2 (en) | Framework for providing improved predictive model | |
US20220405599A1 (en) | Automated design of architectures of artificial neural networks | |
Nguyen et al. | High resolution self-organizing maps | |
US20240256742A1 (en) | MACHINE LEARNING CLASSIFICATION AND REDUCTION OF cad PARTS FOR RAPID DESIGN TO SIMULATION | |
US20240161263A1 (en) | Method for inspecting defects of product by using 2d image information | |
EP4198837A1 (en) | Method and system for global explainability of neural networks | |
US20230195842A1 (en) | Automated feature engineering for predictive modeling using deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20910188 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20910188 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.01.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20910188 Country of ref document: EP Kind code of ref document: A1 |