WO2021135025A1 - Hyperparameter optimization device and method - Google Patents

Hyperparameter optimization device and method

Info

Publication number
WO2021135025A1
Authority
WO
WIPO (PCT)
Prior art keywords
hyperparameter
xgbest
xpbest
hyperparameters
vector
Prior art date
Application number
PCT/CN2020/089575
Other languages
English (en)
French (fr)
Inventor
章子誉
王益县
Original Assignee
上海依图网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海依图网络科技有限公司
Publication of WO2021135025A1 publication Critical patent/WO2021135025A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • the present invention relates to artificial intelligence (AI), and particularly relates to a hyperparameter optimization device.
  • the invention also relates to an optimization method of hyperparameters.
  • Model parameters and model hyperparameters in machine learning differ in their function, their source, and other respects. Simply put, model parameters are configuration variables inside the model whose values can be estimated from data. Specifically, model parameters have the following characteristics: model parameters are required for model prediction; model parameter values define the model's functionality; model parameters are obtained by estimation from data or by learning from data; model parameters are generally not set manually by practitioners; model parameters are usually saved as part of the learned model; optimization algorithms, which perform an efficient search over possible parameter values, are usually used to estimate model parameters.
  • Some examples of model parameters include: weights in artificial neural networks, support vectors in support vector machines, coefficients in linear regression or logistic regression.
  • Model hyperparameters are configurations external to the model; their values cannot be estimated from data and must be set manually.
  • Model hyperparameters are often used in the process of estimating model parameters; they are usually specified directly by practitioners; they can usually be set using heuristics; and they are usually tuned for a given predictive modeling problem. How can the optimal values of model hyperparameters be obtained? For a given problem, the optimal values cannot be known in advance, but they can be found using rules of thumb, by copying values used for other problems, or by trial and error.
  • Examples of model hyperparameters include: the learning rate for training a neural network, the C and sigma hyperparameters of a support vector machine, and the k in k-nearest neighbors.
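  • As an illustrative sketch only (not part of the original disclosure), the distinction can be seen in a simple polynomial fit, where the degree is a hyperparameter chosen before training and the coefficients are model parameters estimated from the data:

    import numpy as np

    # Hyperparameter: chosen by the practitioner before training.
    degree = 3

    # Toy training data.
    x = np.linspace(0.0, 1.0, 20)
    y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)

    # Model parameters: the polynomial coefficients, estimated from the data.
    coefficients = np.polyfit(x, y, deg=degree)
    print("hyperparameter degree =", degree)
    print("learned parameters =", coefficients)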
  • In addition to the training parameters obtained through training, an artificial intelligence algorithm model also includes hyperparameters.
  • Hyperparameters are usually used to define the structure of the model itself.
  • For example, the model may include a multi-layer network in which each node of each layer corresponds to a function; the function processes multiple input signals to form an output signal.
  • The weights of the input signals are training parameters, which must be obtained by training on samples.
  • The number of network layers in the model, however, must be set before training, so it is a hyperparameter; similar quantities, such as the degree of a polynomial, must also be set before training, so they are also hyperparameters.
  • Depending on the actual algorithm model, the hyperparameter settings also differ; when the task changes, the hyperparameter values often need to change as well.
  • the learning rate is probably the most important hyperparameter.
  • Hyperparameter optimization, or model selection, is the problem of selecting a set of optimal hyperparameters for a learning algorithm.
  • The usual goal is to optimize a measure of the algorithm's performance on an independent data set; cross-validation is typically used to estimate this generalization performance.
  • Hyperparameter optimization contrasts with the actual learning problem, which is also often cast as an optimization problem, but one that optimizes a loss function on the training set. The learning algorithm learns parameters that model or reconstruct its inputs well, while hyperparameter optimization ensures that the model does not overfit its data, for example by tuning regularization.
  • the current hyperparameter optimization methods include: grid search, Bayesian optimization, random search, gradient-based optimization, and so on.
  • the traditional method of performing hyperparameter optimization is grid search or parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of the learning algorithm.
  • A grid search algorithm must be guided by some performance metric, usually measured by cross-validation on the training set or by evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for some parameters, it may be necessary to set bounds and discretize manually before applying grid search.
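  • For illustration only, a grid search over such a manually specified, discretized subset might be sketched as follows; the hyperparameter names, ranges, and the placeholder evaluate function are assumptions, not values taken from any cited method:

    import itertools

    # Manually specified, discretized subset of the hyperparameter space.
    grid = {
        "learning_rate": [0.001, 0.01, 0.1],
        "num_layers": [2, 3, 4],
    }

    def evaluate(hyperparams):
        """Placeholder: train the learner with these hyperparameters and
        return a score from cross-validation or a held-out validation set."""
        return -((hyperparams["learning_rate"] - 0.01) ** 2) - 0.01 * hyperparams["num_layers"]

    best_score, best_params = float("-inf"), None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    print(best_params, best_score)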
  • Bayesian optimization builds a statistical model of the function that maps hyperparameter values to the objective evaluated on the validation set. Intuitively, the method assumes that this mapping from hyperparameters to the objective is a smooth but noisy function.
  • One aim of Bayesian optimization is to collect observations so that the machine learning model is evaluated as few times as possible while revealing as much information as possible about this function, in particular the location of its optimum.
  • Bayesian optimization relies on assuming a very general prior over functions which, when combined with the observed hyperparameter values and their corresponding outputs, yields a distribution over functions. The method iteratively selects hyperparameters to observe (experiments to run), trading off exploration (hyperparameters whose outcome is most uncertain) and exploitation (hyperparameters expected to give good results).
  • In practice, Bayesian optimization has been shown to obtain better results with fewer experiments than grid search and random search, because it can reason about the quality of experiments before they are run. Since grid search is an exhaustive and potentially expensive method, several alternatives have been proposed.
  • Chinese invention patent application CN110110862A discloses a hyperparameter optimization method based on an adaptive model.
  • The method is based on an adaptive model and can adapt to the search space and data set size of the model to be optimized, but it parallelizes poorly and requires a large amount of data, so it has certain limitations.
  • The technical problem to be solved by the present invention is to provide a hyperparameter optimization device that is suitable for image recognition technology and can automatically optimize the hyperparameters of an image recognition algorithm, so that manpower input is reduced while the algorithm model yields a better model after training.
  • the present invention also discloses a hyperparameter optimization method, which can be applied to image recognition technology.
  • The method is fast, efficient, and parallelizes well; it does not require a large amount of data and can be applied when the data volume is moderate and computing resources are limited, which expands its scope of application.
  • the present invention adopts the following technical solutions:
  • the method for optimizing hyperparameters includes the steps:
  • Step 1: Extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
  • Step 2: Assign values to the hyperparameter vector and change the value of the hyperparameter vector.
  • Step 3: Evaluate the performance of the algorithm model corresponding to the hyperparameter vectors of various values, form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
  • a further improvement is that the algorithm model is an algorithm model corresponding to the task.
  • a further improvement is that when the task changes, the hyperparameters of the algorithm model need to be optimized.
  • The hyperparameter optimization method is suitable for image recognition methods;
  • the algorithm model is an image recognition algorithm model.
  • In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  • a further improvement is that, in the hyperparameter vector, the numerical parameter is directly expressed in the form of a floating-point number, and the option parameter is converted into a one-hot parameter.
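  • A minimal sketch of this vectorization (the hyperparameter names learning_rate, num_layers, and optimizer are hypothetical examples, not taken from the disclosure):

    def vectorize(learning_rate, num_layers, optimizer,
                  optimizer_choices=("sgd", "adam", "rmsprop")):
        """Encode numeric hyperparameters directly as floats and an option
        hyperparameter as a one-hot sub-vector."""
        one_hot = [1.0 if choice == optimizer else 0.0 for choice in optimizer_choices]
        return [float(learning_rate), float(num_layers)] + one_hot

    # Example: [0.01, 3.0, 0.0, 1.0, 0.0] when optimizer="adam"
    print(vectorize(0.01, 3, "adam"))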
  • Steps 2 and 3 are implemented by a particle swarm algorithm, including: initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value Pi corresponding to Xi; then iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
  • Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
  • where w, ca, cb are preset parameters;
  • ra, rb are random numbers between 0 and 1;
  • Xpbest is the particle's best historical result;
  • Xgbest is the overall best historical result;
  • Vi' is the iterated Vi; after Vi' is computed, Xi' is obtained by adding Vi' to Xi, and the evaluation value Pi' corresponding to Xi' is then computed.
  • A further improvement is that the particle swarm algorithm further includes: updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'; if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
  • A further improvement is that the steps of updating Xpbest and Xgbest according to Pi' include: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  • A further improvement is that the particle swarm algorithm also realizes: if Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
  • A further improvement is that, if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends; alternatively, the iteration is ended after a set time.
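  • The following is a minimal sketch of how the particle swarm iteration described above could be implemented; the particle count, the preset values of w, ca, and cb, the restart probability, and the toy evaluation function are illustrative assumptions rather than values given in the disclosure:

    import random

    def pso(evaluate, dim, num_particles=10, w=0.5, ca=1.5, cb=1.5,
            max_rounds=50, patience=5, restart_prob=0.2):
        """Particle swarm search over a hyperparameter vector of length dim;
        evaluate(x) returns the evaluation value (higher is better)."""
        X = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_particles)]
        V = [[0.0] * dim for _ in range(num_particles)]
        P = [evaluate(x) for x in X]                  # evaluation value Pi of each Xi
        Xpbest = [x[:] for x in X]                    # each particle's best history
        Ppbest = P[:]
        g = max(range(num_particles), key=lambda i: P[i])
        Xgbest, Pgbest = X[g][:], P[g]                # overall best history
        stale = 0
        for _ in range(max_rounds):
            improved = False
            for i in range(num_particles):
                ra, rb = random.random(), random.random()
                v_new = [V[i][d] * w + ra * ca * (Xpbest[i][d] - X[i][d])
                         + rb * cb * (Xgbest[d] - X[i][d]) for d in range(dim)]
                x_new = [X[i][d] + v_new[d] for d in range(dim)]   # Xi' = Xi + Vi'
                p_new = evaluate(x_new)                            # Pi'
                if p_new > Ppbest[i]:                              # update personal best
                    Xpbest[i], Ppbest[i] = x_new[:], p_new
                if p_new > Pgbest:                                 # update global best
                    Xgbest, Pgbest = x_new[:], p_new
                    improved = True
                if p_new > P[i]:       # Pi' better than Pi: carry Xi', Vi' into the next round
                    X[i], V[i], P[i] = x_new, v_new, p_new
                elif random.random() < restart_prob:
                    # Otherwise, regenerate Xi randomly with a corresponding probability.
                    X[i] = [random.uniform(-1.0, 1.0) for _ in range(dim)]
                    V[i] = [0.0] * dim
                    P[i] = evaluate(X[i])
            stale = 0 if improved else stale + 1
            if stale >= patience:      # Xgbest not updated for several rounds: stop
                break
        return Xgbest, Pgbest

    # Toy usage: maximize -sum(x^2); the optimum is the zero vector.
    best_x, best_score = pso(lambda x: -sum(v * v for v in x), dim=3)
    print(best_x, best_score)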
  • As used herein, "hyperparameters" are parameters whose values are set before the learning process begins, rather than parameter data obtained through training. Usually, the hyperparameters need to be optimized, and a set of optimal hyperparameters is selected for the learning machine to improve the performance and effect of learning.
  • Hyperparameters define higher-level concepts about the model, such as its complexity or learning capacity. They cannot be learned directly from the data during the standard model training process and need to be defined in advance; they can be decided by setting different values, training different models, and choosing the values that test better.
  • Some examples of hyperparameters: the number or depth of trees, the number of latent factors in a matrix factorization, the learning rate (in several variants), the number of hidden layers in a deep neural network, and the number of clusters in k-means clustering.
  • the hyperparameter optimization device includes:
  • the hyperparameter extraction unit is used to extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
  • the hyperparameter vector assignment unit is used to assign a value to the hyperparameter vector and change the value of the hyperparameter vector.
  • The hyperparameter vector evaluation unit is used to evaluate the performance of the algorithm model corresponding to the hyperparameter vectors of various values, form the corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
  • a further improvement is that the algorithm model is an algorithm model corresponding to the task.
  • a further improvement is that when the task changes, the hyperparameters of the algorithm model need to be optimized.
  • The hyperparameter optimization device is suitable for image recognition devices;
  • the algorithm model is an image recognition algorithm model.
  • In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  • a further improvement is that, in the hyperparameter vector, the numerical parameter is directly expressed in the form of floating-point numbers, and the option parameter is converted into a one-hot parameter.
  • The hyperparameter vector assignment unit and the hyperparameter vector evaluation unit form a particle swarm algorithm module for implementing the following:
  • a number of the hyperparameter vectors are initialized, each obtained hyperparameter vector is denoted Xi, and the evaluation value corresponding to Xi, denoted Pi, is obtained; each Xi is then iterated, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
  • Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
  • where w, ca, cb are preset parameters;
  • ra, rb are random numbers between 0 and 1;
  • Xpbest is the particle's best historical result;
  • Xgbest is the overall best historical result;
  • Vi' is the iterated Vi; after Vi' is computed, Xi' is obtained by adding Vi' to Xi, and the evaluation value Pi' corresponding to Xi' is then computed.
  • A further improvement is that the particle swarm algorithm module also implements: updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'; if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
  • A further improvement is that the steps of updating Xpbest and Xgbest according to Pi' include: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  • A further improvement is that the particle swarm algorithm module also implements: if Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
  • A further improvement is that, if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends; alternatively, the iteration is ended after a set time.
  • An embodiment of the present invention also provides a hyperparameter optimization device, including: at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions, wherein when the executable instructions are executed by the at least one processor, the method described in any one of the above second aspect is realized.
  • an embodiment of the present invention also provides a chip for executing the method in the above-mentioned first aspect.
  • the chip includes a processor, which is used to call and run a computer program from the memory, so that the device installed with the chip is used to execute the method in the second aspect described above.
  • An embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the method described in the second aspect above is implemented.
  • an embodiment of the present invention also provides a computer program product, including computer program instructions, which cause a computer to execute the method in the above second aspect.
  • the present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes the hyperparameters to form a hyperparameter vector.
  • A hyperparameter vector thus includes all the hyperparameters of the algorithm model; the vectorized hyperparameter vector is convenient to assign values to. After the hyperparameter vector is assigned, the values of the hyperparameters of the algorithm model can be changed. By calculating the performance of the algorithm model corresponding to each assigned hyperparameter vector, the evaluation value of the corresponding assigned hyperparameter vector can be obtained, and the evaluation values can be compared; the assignment with the best evaluation value yields the final optimized value of the hyperparameter vector.
  • Therefore, the present invention can automatically optimize the algorithm hyperparameters, for example by using the particle swarm algorithm to find the final optimized value of the hyperparameter vector, which reduces manpower input and improves the efficiency of hyperparameter optimization.
  • In addition, since the final optimized value is obtained by comparing evaluation values, and the evaluation value is the performance obtained by training the algorithm model and testing it on the test set, selecting the best hyperparameters also makes the resulting trained model the best and its performance on the designated test set the best. Therefore, the present invention also enables the algorithm model to yield a better model after training.
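  • As an illustrative sketch only (the model constructor, data split, and scoring metric are assumptions, not specified by the disclosure), the evaluation value for one assignment of the hyperparameter vector can be produced by training under those hyperparameters and scoring on the held-out test set:

    def evaluation_value(hyperparams, train_set, test_set, build_model):
        """Train the algorithm model under one hyperparameter assignment and
        return its performance on the test set as the evaluation value."""
        model = build_model(hyperparams)   # hypothetical constructor for the task's model
        model.fit(*train_set)              # estimate the model (training) parameters
        return model.score(*test_set)      # performance on the test set, e.g. accuracy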
  • In addition, since the present invention performs hyperparameter optimization automatically, the user does not need knowledge or experience of algorithm model optimization, so the range of users of the present invention is expanded.
  • In addition, the method of the present invention is fast, efficient, and parallelizes well; it does not require a large amount of data, can be applied to situations with a moderate data volume and limited computing resources, and expands the scope of application.
  • Figure 1 is a structural diagram of a hyperparameter optimization device according to an embodiment of the present invention.
  • Fig. 2 is a flowchart of a hyperparameter optimization method according to an embodiment of the present invention.
  • The inventor of this solution found that, in the prior art, existing hyperparameter training tools generally only support training through a preset algorithm model after the data is input, which often performs poorly on a new task. If the algorithm model needs to be optimized, knowledge of algorithm model optimization is required, the optimization must be designed and programmed manually, and the hyperparameters usually need to be tuned slowly based on experience, so the range of users is narrow.
  • Other automatic algorithm model optimization methods on the market generally use recurrent neural networks (RNN) and similar methods to automatically design network algorithm models; this approach is slow, parallelizes poorly, and requires a large amount of data, so it is not applicable when the data volume is moderate (for example, millions of samples) and computing resources are limited.
  • FIG. 1 is a structural diagram of a hyperparameter optimization device according to an embodiment of the present invention.
  • the hyperparameter optimization device of the embodiment of the present invention can be applied to an image recognition device, including:
  • the hyperparameter extraction unit 1 is used for automatically extracting all hyperparameters included in the algorithm model and vectorizing all the hyperparameters to form a hyperparameter vector.
  • the algorithm model is an algorithm model corresponding to the task. When the task changes, the hyperparameters of the algorithm model need to be optimized.
  • the algorithm model is an image recognition algorithm model.
  • In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  • The numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
  • the hyperparameter vector assignment unit 2 is used to automatically assign a value to the hyperparameter vector and automatically change the value of the hyperparameter vector.
  • the hyperparameter vector evaluation unit 3 is used to evaluate the performance of the algorithm model corresponding to the hyperparameter vector of various values and form the corresponding evaluation value, and select the value of the hyperparameter vector with the best evaluation value As the final optimized value of the hyperparameter vector.
  • the hyperparameter vector assignment unit 2 and the hyperparameter vector evaluation unit 3 form a particle swarm algorithm module, which is used to realize:
  • a number of the hyperparameter vectors are initialized, each obtained hyperparameter vector is denoted Xi, and the evaluation value corresponding to Xi, denoted Pi, is obtained; each Xi is then iterated, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
  • Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
  • where w, ca, cb are preset parameters;
  • ra, rb are random numbers between 0 and 1;
  • Xpbest is the particle's best historical result;
  • Xgbest is the overall best historical result;
  • Vi' is the iterated Vi; after Vi' is computed, Xi' is obtained by adding Vi' to Xi, and the evaluation value Pi' corresponding to Xi' is then computed.
  • A further improvement is that the particle swarm algorithm module also implements:
  • Xpbest and Xgbest are updated according to Pi', and the updated Xpbest and Xgbest are Xpbest' and Xgbest'.
  • The steps of updating Xpbest and Xgbest according to Pi' include: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  • If Pi' is better than Pi, Xi' is taken as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iteration is performed again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector. Alternatively, if Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
  • In the embodiment of the present invention, the conditions for ending the iteration include: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends; or the iteration is ended after a set time, for example an overnight run or another system-set time.
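  • A small sketch of the two termination conditions mentioned above (a number of rounds without improvement of Xgbest, or a preset wall-clock budget such as an overnight run); the concrete limits here are illustrative assumptions:

    import time

    def should_stop(rounds_without_improvement, start_time,
                    patience_rounds=3, time_budget_seconds=8 * 3600):
        """Stop when Xgbest has not improved for a few rounds (e.g. 1-5),
        or when the preset time budget is exhausted."""
        if rounds_without_improvement >= patience_rounds:
            return True
        return (time.time() - start_time) >= time_budget_seconds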
  • The embodiment of the present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes the hyperparameters to form a hyperparameter vector.
  • A hyperparameter vector thus includes all the hyperparameters of the algorithm model; the vectorized hyperparameter vector is convenient to assign values to, and after the hyperparameter vector is assigned, the values of the hyperparameters of the algorithm model can be changed.
  • By calculating the performance of the algorithm model corresponding to each assigned hyperparameter vector, the evaluation value of the corresponding assigned hyperparameter vector can be obtained, and the evaluation values can be compared.
  • Therefore, the embodiment of the present invention can automatically optimize the algorithm hyperparameters;
  • for example, the particle swarm algorithm can be used to find the final optimized value of the hyperparameter vector, thereby reducing manpower input and improving the efficiency of hyperparameter optimization.
  • In addition, since the evaluation value is the performance obtained by training the algorithm model and testing it on the test set, selecting the best hyperparameters also makes the resulting trained model the best and its performance on the specified test set the best. Therefore, the embodiment of the present invention also enables the algorithm model to yield a better model after training.
  • In addition, since the embodiment of the present invention automatically optimizes the hyperparameters, the user does not need knowledge or experience of algorithm model optimization, so the range of users of the embodiment of the present invention is expanded.
  • FIG. 2 is a flowchart of a hyperparameter optimization method according to an embodiment of the present invention.
  • the method for optimizing hyperparameters in an embodiment of the present invention is applicable to the image recognition method and includes the following steps:
  • Step 1 Automatically extract all the hyperparameters included in the algorithm model and vectorize all the hyperparameters to form a hyperparameter vector.
  • the algorithm model is an algorithm model corresponding to the task.
  • the algorithm model is an image recognition algorithm model.
  • In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  • The numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
  • Step 2: Automatically assign values to the hyperparameter vector and automatically change the value of the hyperparameter vector.
  • Step 3: Evaluate the performance of the algorithm model corresponding to the hyperparameter vectors of various values, form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
  • Steps 2 and 3 are implemented by a particle swarm algorithm, including:
  • a number of the hyperparameter vectors are initialized, each obtained hyperparameter vector is denoted Xi, and the evaluation value corresponding to Xi, denoted Pi, is obtained; each Xi is then iterated, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
  • Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
  • where w, ca, cb are preset parameters;
  • ra, rb are random numbers between 0 and 1;
  • Xpbest is the particle's best historical result;
  • Xgbest is the overall best historical result;
  • Vi' is the iterated Vi; after Vi' is computed, Xi' is obtained by adding Vi' to Xi, and the evaluation value Pi' corresponding to Xi' is then computed.
  • The particle swarm algorithm further includes:
  • Xpbest and Xgbest are updated according to Pi', and the updated Xpbest and Xgbest are Xpbest' and Xgbest'.
  • The steps of updating Xpbest and Xgbest according to Pi' include: if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest; if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  • If Pi' is better than Pi, Xi' is taken as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iteration is performed again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector. If Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
  • The conditions for ending the iteration include: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
  • or the iteration is ended after a set time, for example an overnight run or another system-set time.
  • the present invention also provides a hyperparameter optimization device, including:
  • at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions which, when executed by the at least one processor, cause the method of the second aspect of this embodiment to be implemented.
  • This embodiment provides a hyperparameter optimization device, which includes: at least one processor; and a memory coupled with the at least one processor.
  • the processor and memory can be set separately or integrated together.
  • the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, or registers.
  • The processor may be a central processing unit (CPU), a graphics processing unit (GPU), or the like.
  • the processor can execute executable instructions stored in the memory to implement the various processes described herein.
  • the memory in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), or flash memory.
  • the volatile memory may be RAM (Random Access Memory), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as SRAM (static RAM), DRAM (dynamic RAM), SDRAM (synchronous DRAM), DDR SDRAM (double data rate SDRAM), ESDRAM (enhanced SDRAM), SLDRAM (Synchlink DRAM), and DRRAM (direct Rambus RAM).
  • The memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
  • In some implementations, the memory stores the following elements: upgrade packages, executable units, or data structures, or a subset or an extended set thereof: an operating system and application programs.
  • the operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and process hardware-based tasks.
  • The application programs include various application programs used to implement various application services.
  • a program that implements the method of the embodiment of the present invention may be included in an application program.
  • the processor calls a program or instruction stored in the memory, specifically, a program or instruction stored in an application program, and the processor is used to execute the method steps provided in the second aspect.
  • an embodiment of the present invention also provides a chip for executing the method in the above second aspect.
  • the chip includes a processor, which is used to call and run a computer program from the memory, so that the device installed with the chip is used to execute the method in the second aspect described above.
  • the present invention also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method of the second aspect of the present invention are implemented.
  • the machine-readable storage medium may include, but is not limited to, various known and unknown types of non-volatile memory.
  • an embodiment of the present invention also provides a computer program product, including computer program instructions, and the computer program instructions cause a computer to execute the method in the second aspect described above.
  • the disclosed system, device, and method may be implemented in other ways.
  • the division of units is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units or components can be combined or integrated into another system.
  • the coupling between the various units may be direct coupling or indirect coupling.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or may be separate physical existences, and so on.
  • The size of the sequence numbers of the processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a machine-readable storage medium. Therefore, the technical solution of the present application may be embodied in the form of a software product.
  • The software product may be stored in a machine-readable storage medium and may include a number of instructions to make an electronic device execute all or part of the processes of the technical solutions described in the embodiments of the present application.
  • the foregoing storage media may include various media capable of storing program codes, such as ROM, RAM, removable disks, hard disks, magnetic disks, or optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Physiology (AREA)
  • Feedback Control In General (AREA)
  • Stored Programmes (AREA)

Abstract

The present invention discloses a hyperparameter optimization method, comprising the steps of: step 1, extracting all the hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector; step 2, assigning values to the hyperparameter vector and changing the value of the hyperparameter vector; step 3, evaluating the performance of the algorithm model corresponding to the hyperparameter vector under its various values, forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector. The present invention also discloses a hyperparameter optimization device. The present invention can automatically optimize algorithm hyperparameters, so that manpower input is reduced while the algorithm model yields a better model after training.

Description

Hyperparameter optimization device and method

Technical Field
The present invention relates to artificial intelligence (AI), and in particular to a hyperparameter optimization device. The present invention also relates to a hyperparameter optimization method.
Background Art
Model parameters and model hyperparameters in machine learning differ in their function, their source, and other respects. Simply put, model parameters are configuration variables inside the model whose values can be estimated from data. Specifically, model parameters have the following characteristics: model parameters are required for model prediction; model parameter values define the model's functionality; model parameters are obtained by estimation from data or by learning from data; model parameters are generally not set manually by practitioners; model parameters are usually saved as part of the learned model; optimization algorithms, which perform an efficient search over possible parameter values, are usually used to estimate model parameters. Some examples of model parameters include: the weights in an artificial neural network, the support vectors in a support vector machine, and the coefficients in linear or logistic regression. Model hyperparameters are configurations external to the model; their values cannot be estimated from data and must be set manually. The specific characteristics of model hyperparameters are: model hyperparameters are often used in the process of estimating model parameters; they are usually specified directly by practitioners; they can usually be set using heuristics; and they are usually tuned for a given predictive modeling problem. How can the optimal values of model hyperparameters be obtained? For a given problem, the optimal values of the model hyperparameters cannot be known in advance, but they can be explored using rules of thumb, copied from values used for other problems, or found by trial and error. Some examples of model hyperparameters include: the learning rate for training a neural network, the C and sigma hyperparameters of a support vector machine, and the k in k-nearest neighbors.
In addition to the training parameters obtained through training, an artificial intelligence algorithm model also includes hyperparameters. Hyperparameters are usually used to define the structure of the model itself. For example, a model may include a multi-layer network in which each node of each layer corresponds to a function, and the function processes multiple input signals to form an output signal; the weights of the multiple input signals are training parameters, which must be obtained by training on samples. The number of network layers in the model, however, must be set before training, so it is a hyperparameter; similarly, quantities of various functions, such as the degree of a polynomial, must also be set before training, so they are also hyperparameters. Depending on the actual algorithm model, the hyperparameter settings also differ, and when the task changes, the hyperparameter values often need to change as well.
The learning rate is probably the most important hyperparameter. Hyperparameter optimization, or model selection, is the problem of selecting a set of optimal hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set. Cross-validation is typically used to estimate this generalization performance. Hyperparameter optimization contrasts with the actual learning problem, which is also often cast as an optimization problem, but one that optimizes a loss function on the training set. The learning algorithm learns parameters that model or reconstruct its inputs well, while hyperparameter optimization ensures that the model does not overfit its data, for example by tuning regularization. Current hyperparameter optimization methods include grid search, Bayesian optimization, random search, gradient-based optimization, and so on. The traditional way of performing hyperparameter optimization is grid search, or parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of the learning algorithm. A grid search algorithm must be guided by some performance metric, usually measured by cross-validation on the training set or by evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for some parameters, it may be necessary to set bounds and discretize manually before applying grid search. Bayesian optimization builds a statistical model of the function that maps hyperparameter values to the objective evaluated on the validation set. Intuitively, the method assumes that this mapping from hyperparameters to the objective is a smooth but noisy function. In Bayesian optimization, one aim is to collect observations so that the machine learning model is evaluated as few times as possible while revealing as much information as possible about this function, in particular the location of its optimum. Bayesian optimization relies on assuming a very general prior over functions which, when combined with the observed hyperparameter values and their corresponding outputs, yields a distribution over functions. The method iteratively selects hyperparameters to observe (experiments to run), trading off exploration (hyperparameters whose outcome is most uncertain) and exploitation (hyperparameters expected to give good results). In practice, Bayesian optimization has been shown to obtain better results with fewer experiments than grid search and random search, because it can reason about the quality of experiments before they are run. Since grid search is an exhaustive and potentially expensive method, several alternatives have been proposed. In particular, it has been found that simply performing a fixed number of random searches over parameter settings is more effective than an exhaustive search in a high-dimensional space. This is because some hyperparameters turn out not to affect the loss significantly, so randomly scattered samples provide more "texture" than an exhaustive search over parameters that ultimately do not affect the loss. For a specific learning algorithm, the gradient with respect to the hyperparameters can be computed, and the hyperparameters can then be optimized by gradient descent. The first uses of these techniques focused on neural networks; since then, these methods have been extended to other models such as support vector machines and logistic regression.
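For illustration only (not part of the disclosure), such a random search over a fixed number of parameter settings can be sketched as follows, with the hyperparameter names and sampling ranges as assumed examples:

    import random

    def random_search(evaluate, num_trials=30):
        """Sample a fixed number of hyperparameter settings at random and keep the best;
        evaluate() is a placeholder for training and validation of the learner."""
        best_score, best_params = float("-inf"), None
        for _ in range(num_trials):
            params = {
                "learning_rate": 10 ** random.uniform(-4, -1),   # log-uniform sample
                "num_layers": random.randint(2, 6),
            }
            score = evaluate(params)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score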
Existing hyperparameter training tools generally only support training through a preset algorithm model after the data is input, which often performs poorly on a new task. If the algorithm model needs to be optimized, knowledge of algorithm model optimization is required, the optimization must be designed and programmed manually, and the hyperparameters usually need to be tuned slowly based on experience, so the range of users is narrow. Other automatic algorithm model optimization methods on the market generally use recurrent neural networks (RNN) and similar methods to automatically design network algorithm models; this approach is slow, parallelizes poorly, and requires a large amount of data, so it is not applicable when the data volume is moderate (for example, millions of samples) and computing resources are limited.
Chinese invention patent application CN110110862A discloses a hyperparameter optimization method based on an adaptive model; the method can adapt to the search space and data set size of the model to be optimized, but it parallelizes poorly and requires a large amount of data, so it has certain limitations.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a hyperparameter optimization device that is suitable for image recognition technology and can automatically optimize the hyperparameters of an image recognition algorithm, so that manpower input is reduced while the algorithm model yields a better model after training. To this end, the present invention also discloses a hyperparameter optimization method suitable for image recognition technology; the method is fast, efficient, and parallelizes well, does not require a large amount of data, can be applied when the data volume is moderate and computing resources are limited, and thus expands the scope of application.
To solve the above technical problem, the present invention adopts the following technical solutions:
In a first aspect, the present invention provides a hyperparameter optimization method, including the steps of:
Step 1: extracting all the hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector.
Step 2: assigning values to the hyperparameter vector and changing the value of the hyperparameter vector.
Step 3: evaluating the performance of the algorithm model corresponding to the hyperparameter vector under its various values, forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
A further improvement is that the algorithm model is the algorithm model corresponding to a task.
A further improvement is that, when the task changes, the hyperparameters of the algorithm model need to be optimized.
A further improvement is that the hyperparameter optimization method is suitable for image recognition methods; the algorithm model is an image recognition algorithm model.
A further improvement is that, in the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
A further improvement is that, in the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
A further improvement is that steps 2 and 3 are implemented by a particle swarm algorithm, including:
initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi;
iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
A further improvement is that the particle swarm algorithm further includes:
updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest';
if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
A further improvement is that the steps of updating Xpbest and Xgbest according to Pi' include:
if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
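For illustration only, the update rule just described can be sketched literally as follows (ties here fall to the existing best, which is an assumption where the text does not specify):

    def update_bests(x_new, p_new, xpbest, ppbest, xgbest, pgbest):
        """Update the personal best (Xpbest) and global best (Xgbest) from Pi'."""
        if ppbest > p_new:                    # Ppbest better than Pi': keep Xpbest
            xpbest_new, ppbest_new = xpbest, ppbest
        else:                                 # otherwise Xpbest' takes Xi'
            xpbest_new, ppbest_new = x_new, p_new
        if pgbest > ppbest_new:               # Pgbest better than Ppbest: keep Xgbest
            xgbest_new, pgbest_new = xgbest, pgbest
        else:                                 # otherwise Xgbest' takes Xpbest'
            xgbest_new, pgbest_new = xpbest_new, ppbest_new
        return xpbest_new, ppbest_new, xgbest_new, pgbest_new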
A further improvement is that the particle swarm algorithm also implements:
if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration round randomly with a corresponding probability.
A further improvement is that, if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
alternatively, the iteration is ended after a set time.
As used herein, "hyperparameter" means: in the context of machine learning, a hyperparameter is a parameter whose value is set before the learning process begins, rather than parameter data obtained through training. Usually, the hyperparameters need to be optimized, and a set of optimal hyperparameters is selected for the learning machine to improve the performance and effect of learning. Hyperparameters define higher-level concepts about the model, such as its complexity or learning capacity. They cannot be learned directly from the data during the standard model training process and need to be defined in advance; they can be decided by setting different values, training different models, and choosing the values that test better. Some examples of hyperparameters: the number or depth of trees, the number of latent factors in a matrix factorization, the learning rate (in several variants), the number of hidden layers in a deep neural network, and the number of clusters in k-means clustering.
In a second aspect, the present invention provides a hyperparameter optimization device, including:
a hyperparameter extraction unit, configured to extract all the hyperparameters included in an algorithm model and vectorize all of the hyperparameters to form a hyperparameter vector;
a hyperparameter vector assignment unit, configured to assign values to the hyperparameter vector and change the value of the hyperparameter vector;
a hyperparameter vector evaluation unit, configured to evaluate the performance of the algorithm model corresponding to the hyperparameter vector under its various values, form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
A further improvement is that the algorithm model is the algorithm model corresponding to a task.
A further improvement is that, when the task changes, the hyperparameters of the algorithm model need to be optimized.
A further improvement is that the hyperparameter optimization device is suitable for image recognition devices; the algorithm model is an image recognition algorithm model.
A further improvement is that, in the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
A further improvement is that, in the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
A further improvement is that the hyperparameter vector assignment unit and the hyperparameter vector evaluation unit form a particle swarm algorithm module, configured to implement:
initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi;
iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
A further improvement is that the particle swarm algorithm module also implements:
updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest';
if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
A further improvement is that the steps of updating Xpbest and Xgbest according to Pi' include:
if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
A further improvement is that the particle swarm algorithm module also implements:
if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration round randomly with a corresponding probability.
A further improvement is that, if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
alternatively, the iteration is ended after a set time.
In a third aspect, an embodiment of the present invention further provides a hyperparameter optimization device, including: at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions which, when executed by the at least one processor, cause the method according to any one of the above second aspect to be implemented.
In a fourth aspect, an embodiment of the present invention further provides a chip for executing the method in the above first aspect. Specifically, the chip includes: a processor, configured to call and run a computer program from a memory, so that a device equipped with the chip executes the method in the above second aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the method according to any one of the above second aspect is implemented.
In a sixth aspect, an embodiment of the present invention further provides a computer program product, including computer program instructions that cause a computer to execute the method in the above second aspect.
The present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes them to form a hyperparameter vector, so that one hyperparameter vector includes all the hyperparameters of the algorithm model. The vectorized hyperparameter vector is convenient to assign values to; after the hyperparameter vector is assigned, the values of the hyperparameters of the algorithm model can be changed. By computing the performance of the algorithm model corresponding to each assigned hyperparameter vector, the evaluation value of the corresponding assigned hyperparameter vector is obtained, and the evaluation values can be compared; by comparing the evaluation values, the assignment of the hyperparameter vector corresponding to the best evaluation value, that is, the final optimized value of the hyperparameter vector, can be obtained. Therefore, the present invention can automatically optimize algorithm hyperparameters, for example by using the particle swarm algorithm to find the final optimized value of the hyperparameter vector, which reduces manpower input and improves the efficiency of hyperparameter optimization.
In addition, since the final optimized value of the hyperparameter vector of the present invention is obtained by comparing evaluation values, and the evaluation value is the performance obtained by training the algorithm model and testing it on a test set, selecting the best hyperparameters also makes the resulting trained model the best and its performance on the designated test set the best. Therefore, the present invention also enables the algorithm model to yield a better model after training.
In addition, compared with existing manual hyperparameter optimization, since the present invention performs hyperparameter optimization automatically, the user does not need knowledge or experience of algorithm model optimization, so the range of users of the present invention is expanded.
In addition, compared with existing manual hyperparameter optimization methods, the method of the present invention is fast, efficient, and parallelizes well, does not require a large amount of data, can be applied when the data volume is moderate and computing resources are limited, and expands the scope of application.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
FIG. 1 is a structural diagram of a hyperparameter optimization device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a hyperparameter optimization method according to an embodiment of the present invention.
Detailed Description of Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
It should be noted that, herein, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
The inventor of this solution found that, in the prior art, existing hyperparameter training tools generally only support training through a preset algorithm model after the data is input, which often performs poorly on a new task. If the algorithm model needs to be optimized, knowledge of algorithm model optimization is required, the optimization must be designed and programmed manually, and the hyperparameters usually need to be tuned slowly based on experience, so the range of users is narrow. Other automatic algorithm model optimization methods on the market generally use recurrent neural networks (RNN) and similar methods to automatically design network algorithm models; this approach is slow, parallelizes poorly, and requires a large amount of data, so it is not applicable when the data volume is moderate (for example, millions of samples) and computing resources are limited. Therefore, there is a need for a hyperparameter optimization device and method that are suitable for image recognition technology with a moderate data volume and limited computing resources, that can automatically optimize algorithm hyperparameters so that manpower input is reduced while the algorithm model yields a better model after training, and that are faster, more efficient, parallelize well, do not require a large amount of data, and expand the scope of application. Embodiments of the present invention provide the following solutions:
As shown in FIG. 1, which is a structural diagram of a hyperparameter optimization device according to an embodiment of the present invention, in a first aspect of this embodiment, the hyperparameter optimization device of the embodiment of the present invention can be applied to an image recognition device and includes:
a hyperparameter extraction unit 1, configured to automatically extract all the hyperparameters included in an algorithm model and vectorize all of the hyperparameters to form a hyperparameter vector.
The algorithm model is the algorithm model corresponding to a task. When the task changes, the hyperparameters of the algorithm model need to be optimized. The algorithm model is an image recognition algorithm model.
In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
In the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
a hyperparameter vector assignment unit 2, configured to automatically assign values to the hyperparameter vector and automatically change the value of the hyperparameter vector.
a hyperparameter vector evaluation unit 3, configured to evaluate the performance of the algorithm model corresponding to the hyperparameter vector under its various values, form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
The hyperparameter vector assignment unit 2 and the hyperparameter vector evaluation unit 3 form a particle swarm algorithm module, configured to implement:
initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi.
Each Xi is iterated, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
A further improvement is that the particle swarm algorithm module also implements:
updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'. The steps of updating Xpbest and Xgbest according to Pi' include:
if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
If Pi' is better than Pi, Xi' is taken as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iteration is performed again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector. Alternatively, if Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
In the embodiment of the present invention, the conditions for ending the iteration include: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends; or the iteration is ended after a set time, for example an overnight run or another system-set time.
The embodiment of the present invention automatically extracts the hyperparameters included in the algorithm model and vectorizes them to form a hyperparameter vector, so that one hyperparameter vector includes all the hyperparameters of the algorithm model. The vectorized hyperparameter vector is convenient to assign values to; after the hyperparameter vector is assigned, the values of the hyperparameters of the algorithm model can be changed. By computing the performance of the algorithm model corresponding to each assigned hyperparameter vector, the evaluation value of the corresponding assigned hyperparameter vector is obtained, and the evaluation values can be compared; by comparing the evaluation values, the assignment of the hyperparameter vector corresponding to the best evaluation value, that is, the final optimized value of the hyperparameter vector, can be obtained. Therefore, the embodiment of the present invention can automatically optimize algorithm hyperparameters, for example by using the particle swarm algorithm to find the final optimized value of the hyperparameter vector, which reduces manpower input and improves the efficiency of hyperparameter optimization.
In addition, since the final optimized value of the hyperparameter vector in the embodiment of the present invention is obtained by comparing evaluation values, and the evaluation value is the performance obtained by training the algorithm model and testing it on a test set, selecting the best hyperparameters also makes the resulting trained model the best and its performance on the designated test set the best. Therefore, the embodiment of the present invention also enables the algorithm model to yield a better model after training.
In addition, compared with existing manual hyperparameter optimization, since the embodiment of the present invention performs hyperparameter optimization automatically, the user does not need knowledge or experience of algorithm model optimization, so the range of users of the embodiment of the present invention is expanded.
As shown in FIG. 2, which is a flowchart of a hyperparameter optimization method according to an embodiment of the present invention, in a second aspect of this embodiment, the hyperparameter optimization method of the embodiment of the present invention can be applied to an image recognition method and includes the following steps:
Step 1: automatically extracting all the hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector.
The algorithm model is the algorithm model corresponding to a task.
When the task changes, the hyperparameters of the algorithm model need to be optimized.
The algorithm model is an image recognition algorithm model.
In the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
In the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
Step 2: automatically assigning values to the hyperparameter vector and automatically changing the value of the hyperparameter vector.
Step 3: evaluating the performance of the algorithm model corresponding to the hyperparameter vector under its various values, forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
Steps 2 and 3 are implemented by a particle swarm algorithm, including:
initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi;
iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
The particle swarm algorithm further includes:
updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest'. Preferably, the steps of updating Xpbest and Xgbest according to Pi' include:
if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
If Pi' is better than Pi, Xi' is taken as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iteration is performed again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector. If Pi' is not improved relative to Pi, the coordinates of Xi for the next iteration round are regenerated randomly with a corresponding probability.
The conditions for ending the iteration include: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
or the iteration is ended after a set time, for example an overnight run or another system-set time.
In a third aspect, the present invention also provides a hyperparameter optimization device, including:
at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions which, when executed by the at least one processor, cause the method of the second aspect of this embodiment to be implemented.
This embodiment provides a hyperparameter optimization device, including: at least one processor; and a memory coupled with the at least one processor. The processor and the memory may be provided separately or integrated together.
For example, the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, registers, or the like. The processor may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The memory may store executable instructions, and the processor may execute the executable instructions stored in the memory to implement the processes described herein.
It can be understood that the memory in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), or flash memory. The volatile memory may be RAM (random access memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as SRAM (static RAM), DRAM (dynamic RAM), SDRAM (synchronous DRAM), DDR SDRAM (double data rate SDRAM), ESDRAM (enhanced SDRAM), SLDRAM (Synchlink DRAM), and DRRAM (direct Rambus RAM). The memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
In some implementations, the memory stores the following elements: upgrade packages, executable units, or data structures, or a subset or extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs include various application programs for implementing various application services. A program implementing the method of the embodiment of the present invention may be included in an application program.
In the embodiment of the present invention, the processor calls a program or instructions stored in the memory, specifically a program or instructions stored in an application program, and the processor is used to execute the method steps provided in the second aspect.
In a fourth aspect, an embodiment of the present invention further provides a chip for executing the method in the above second aspect. Specifically, the chip includes: a processor, configured to call and run a computer program from a memory, so that a device equipped with the chip executes the method in the above second aspect.
Furthermore, in a fifth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the steps of the method of the second aspect of the present invention are implemented.
For example, the machine-readable storage medium may include, but is not limited to, various known and unknown types of non-volatile memory.
In a sixth aspect, an embodiment of the present invention further provides a computer program product, including computer program instructions that cause a computer to execute the method in the above second aspect.
Those skilled in the art will understand that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of this application.
In the embodiments of this application, the disclosed system, device, and method may be implemented in other ways. For example, the division into units is only a division by logical function, and there may be other ways of division in actual implementation; for example, multiple units or components may be combined or integrated into another system. In addition, the coupling between the units may be direct or indirect. In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or may exist as separate physical units, and so on.
It should be understood that, in the various embodiments of this application, the size of the sequence numbers of the processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a machine-readable storage medium. Therefore, the technical solution of this application may be embodied in the form of a software product, which may be stored in a machine-readable storage medium and may include a number of instructions to cause an electronic device to execute all or part of the processes of the technical solutions described in the embodiments of this application. The aforementioned storage medium may include various media capable of storing program code, such as a ROM, a RAM, a removable disk, a hard disk, a magnetic disk, or an optical disk.
The above is only a specific implementation of this application; the present invention has been described in detail through specific embodiments, but these do not constitute a limitation of the present invention, and the protection scope of this application is not limited thereto. Those skilled in the art may make changes or substitutions within the technical scope disclosed in this application, and such changes or substitutions shall be regarded as falling within the protection scope of this application.

Claims (26)

  1. A hyperparameter optimization method, characterized by comprising the steps of:
    step 1, extracting all the hyperparameters included in an algorithm model and vectorizing all of the hyperparameters to form a hyperparameter vector;
    step 2, assigning values to the hyperparameter vector and changing the value of the hyperparameter vector;
    step 3, evaluating the performance of the algorithm model corresponding to the hyperparameter vector under its various values, forming corresponding evaluation values, and selecting the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
  2. The hyperparameter optimization method according to claim 1, characterized in that: the algorithm model is the algorithm model corresponding to a task.
  3. The hyperparameter optimization method according to claim 2, characterized in that: when the task changes, the hyperparameters of the algorithm model need to be optimized.
  4. The hyperparameter optimization method according to any one of claims 1-3, characterized in that: the method is suitable for image recognition methods; the algorithm model is an image recognition algorithm model.
  5. The hyperparameter optimization method according to claim 1, characterized in that: in the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  6. The hyperparameter optimization method according to claim 5, characterized in that: in the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
  7. The hyperparameter optimization method according to claim 1, characterized in that steps 2 and 3 are implemented by a particle swarm algorithm, including:
    initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi;
    iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
    Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
    where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
    after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
    after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
  8. The hyperparameter optimization method according to claim 7, characterized in that the particle swarm algorithm further includes:
    updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest';
    if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
  9. The hyperparameter optimization method according to claim 8, characterized in that the steps of updating Xpbest and Xgbest according to Pi' include:
    if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
    if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  10. The hyperparameter optimization method according to claim 9, characterized in that the particle swarm algorithm also implements:
    if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration round randomly with a corresponding probability.
  11. The hyperparameter optimization method according to claim 10, characterized in that: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
    or, the iteration is ended after a set time.
  12. A hyperparameter optimization device, characterized by comprising:
    a hyperparameter extraction unit, configured to extract all the hyperparameters included in an algorithm model and vectorize all of the hyperparameters to form a hyperparameter vector;
    a hyperparameter vector assignment unit, configured to assign values to the hyperparameter vector and change the value of the hyperparameter vector;
    a hyperparameter vector evaluation unit, configured to evaluate the performance of the algorithm model corresponding to the hyperparameter vector under its various values, form corresponding evaluation values, and select the value of the hyperparameter vector with the best evaluation value as the final optimized value of the hyperparameter vector.
  13. The hyperparameter optimization device according to claim 12, characterized in that: the algorithm model is the algorithm model corresponding to a task.
  14. The hyperparameter optimization device according to claim 13, characterized in that: when the task changes, the hyperparameters of the algorithm model need to be optimized.
  15. The hyperparameter optimization device according to any one of claims 12-14, characterized in that: the device is suitable for image recognition devices; the algorithm model is an image recognition algorithm model.
  16. The hyperparameter optimization device according to claim 12, characterized in that: in the hyperparameter vector, the hyperparameters are classified into numeric parameters and option parameters.
  17. The hyperparameter optimization device according to claim 16, characterized in that: in the hyperparameter vector, the numeric parameters are expressed directly as floating-point numbers, while the option parameters are converted into one-hot parameters.
  18. The hyperparameter optimization device according to claim 12, characterized in that the hyperparameter vector assignment unit and the hyperparameter vector evaluation unit form a particle swarm algorithm module, configured to implement:
    initializing a number of the hyperparameter vectors, denoting each obtained hyperparameter vector by Xi, and obtaining the evaluation value corresponding to Xi, denoted Pi;
    iterating each Xi, with Vi denoting the iteration direction of Xi, the iteration equation of Vi being:
    Vi' = Vi*w + ra*ca*(Xpbest-Xi) + rb*cb*(Xgbest-Xi);
    where w, ca, cb are preset parameters, ra, rb are random numbers between 0 and 1, Xpbest is the particle's best historical result, Xgbest is the overall best historical result, and Vi' is the iterated Vi;
    after Vi' is computed, Xi' is obtained by adding Vi' to Xi;
    after Xi' is obtained, the evaluation value Pi' corresponding to Xi' is computed.
  19. The hyperparameter optimization device according to claim 18, characterized in that the particle swarm algorithm module also implements:
    updating Xpbest and Xgbest according to Pi', the updated Xpbest and Xgbest being Xpbest' and Xgbest';
    if Pi' is better than Pi, taking Xi' as the Xi of the next iteration round, Vi' as the next round's Vi, Xpbest' as the next round's Xpbest, and Xgbest' as the next round's Xgbest, and iterating again; after multiple rounds of iteration, the finally obtained Xgbest is taken as the final optimized value of the hyperparameter vector.
  20. The hyperparameter optimization device according to claim 19, characterized in that the steps of updating Xpbest and Xgbest according to Pi' include:
    if Ppbest is better than Pi', Xpbest' takes Xpbest, otherwise Xpbest' takes Xi', where Ppbest denotes the evaluation value corresponding to Xpbest;
    if Pgbest is better than Ppbest, Xgbest' takes Xgbest, otherwise Xgbest' takes Xpbest', where Pgbest denotes the evaluation value corresponding to Xgbest.
  21. The hyperparameter optimization device according to claim 20, characterized in that the particle swarm algorithm module also implements:
    if Pi' is not improved relative to Pi, regenerating the coordinates of Xi for the next iteration round randomly with a corresponding probability.
  22. The hyperparameter optimization device according to claim 21, characterized in that: if Xgbest has not been updated after 1-5 rounds of iteration, the iteration ends;
    or, the iteration is ended after a set time.
  23. A hyperparameter optimization device, characterized by comprising:
    at least one processor;
    a memory coupled with the at least one processor, the memory storing executable instructions which, when executed by the at least one processor, cause the method according to any one of claims 1 to 11 to be implemented.
  24. A chip, characterized by comprising: a processor, configured to call and run a computer program from a memory, so that a device equipped with the chip executes the method according to any one of claims 1 to 11.
  25. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 11 are implemented.
  26. A computer program product, characterized by comprising computer program instructions that cause a computer to execute the method according to any one of claims 1 to 11.
PCT/CN2020/089575 2019-12-30 2020-05-11 Hyperparameter optimization device and method WO2021135025A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911389194.8 2019-12-30
CN201911389194.8A CN111160459A (zh) 2019-12-30 2019-12-30 Hyperparameter optimization device and method

Publications (1)

Publication Number Publication Date
WO2021135025A1 true WO2021135025A1 (zh) 2021-07-08

Family

ID=70559138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089575 WO2021135025A1 (zh) Hyperparameter optimization device and method

Country Status (2)

Country Link
CN (1) CN111160459A (zh)
WO (1) WO2021135025A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3136299A1 2022-01-04 2023-12-08 Alcom Technologies Method for optimizing the hyperparameters of a machine learning model

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053113A (zh) * 2021-03-11 2021-06-29 湖南交通职业技术学院 Anomaly detection method and device based on PSO-Welsch-Ridge
CN113780575B (zh) * 2021-08-30 2024-02-20 征图智能科技(江苏)有限公司 Visual classification method based on a progressive deep learning model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408031A (zh) * 2016-09-29 2017-02-15 南京航空航天大学 Hyperparameter optimization method for least squares support vector machines
CN108446741A (zh) * 2018-03-29 2018-08-24 中国石油大学(华东) Machine learning hyperparameter importance evaluation method, system and storage medium
CN110443364A (zh) * 2019-06-21 2019-11-12 深圳大学 Multi-task hyperparameter optimization method and device for deep neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201010407A (en) * 2008-08-19 2010-03-01 Univ Nat Kaohsiung Applied Sci Color image noise reduction method using particle swarm optimization and cellular neural network
CN105281615A (zh) * 2015-11-12 2016-01-27 广西师范大学 Method for optimizing a fuzzy controller of a brushless DC motor based on an improved particle swarm algorithm
CN110399917B (zh) * 2019-07-24 2023-04-18 东北大学 Image classification method based on hyperparameter-optimized CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG XUAN, WANG HONGLI: "LSSVM Based on PSO and Its Applications to Time Series Prediction", CHINA MECHANICAL ENGINEERING, ZHONGGUO JIXIE GONGCHENG ZAZHISHE, WUHAN, CN, vol. 22, no. 21, 1 January 2011 (2011-01-01), CN, pages 2572 - 2576, XP055828244, ISSN: 1004-132X *

Also Published As

Publication number Publication date
CN111160459A (zh) 2020-05-15

Similar Documents

Publication Publication Date Title
Yang et al. Pointflow: 3d point cloud generation with continuous normalizing flows
US12099906B2 (en) Parallel development and deployment for machine learning models
WO2021135025A1 (zh) 超参数的优化装置和方法
JP6969637B2 (ja) 因果関係分析方法および電子デバイス
JP7478145B2 (ja) 機械学習モデルの自動生成
US10074054B2 (en) Systems and methods for Bayesian optimization using non-linear mapping of input
CN114207635A (zh) 使用元建模对机器学习和深度学习模型进行快速准确的超参数优化
US20190005377A1 (en) Artificial neural network reduction to reduce inference computation time
US20200125945A1 (en) Automated hyper-parameterization for image-based deep model learning
JP2005276225A (ja) テーブルを使用したツリーの学習
US20220036232A1 (en) Technology for optimizing artificial intelligence pipelines
CN105976421B (zh) 一种渲染程序的在线优化方法
US20240330130A1 (en) Graph machine learning for case similarity
Harde et al. Design and implementation of ACO feature selection algorithm for data stream mining
CN112686299A (zh) 计算机执行的神经网络模型获取方法及装置
Reese et al. Predict better with less training data using a QNN
Deng et al. Multi-label image recognition in anime illustration with graph convolutional networks
Blagoveshchenskii et al. Hybrid algorithms for optimization and diagnostics of hydromechanical systems used in food production biotechnology
US11928562B2 (en) Framework for providing improved predictive model
US20220405599A1 (en) Automated design of architectures of artificial neural networks
Nguyen et al. High resolution self-organizing maps
US20240256742A1 (en) MACHINE LEARNING CLASSIFICATION AND REDUCTION OF cad PARTS FOR RAPID DESIGN TO SIMULATION
US20240161263A1 (en) Method for inspecting defects of product by using 2d image information
EP4198837A1 (en) Method and system for global explainability of neural networks
US20230195842A1 (en) Automated feature engineering for predictive modeling using deep reinforcement learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910188

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910188

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20910188

Country of ref document: EP

Kind code of ref document: A1