CN113989584A - Neural network hyper-parameter tuning method based on orthogonal design - Google Patents

Neural network hyper-parameter tuning method based on orthogonal design

Info

Publication number
CN113989584A
CN113989584A
Authority
CN
China
Prior art keywords
hyper
parameter
value range
level
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111198402.3A
Other languages
Chinese (zh)
Inventor
王钰 (Wang Yu)
杜博 (Du Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University
Priority to CN202111198402.3A
Publication of CN113989584A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the fields of computer vision, image processing, and machine learning, and provides a neural network hyper-parameter tuning method based on orthogonal design, addressing the low classification accuracy and high computational overhead of existing hyper-parameter tuning methods. The method first determines the hyper-parameters and their value ranges, divides each value range into several intervals, and discretizes them to obtain the value levels of each hyper-parameter; it then combines the value levels according to an orthogonal table and traverses all level combinations listed in the table to find the best one. The value ranges are then subdivided around this best combination, and the steps are repeated until the ranges can no longer be subdivided. This resolves the low classification accuracy of traditional hyper-parameter optimization while greatly reducing computational overhead.

Description

Neural network hyper-parameter tuning method based on orthogonal design
Technical Field
The invention relates to the fields of computer vision, image processing, and machine learning, and in particular to a neural network hyper-parameter tuning method based on orthogonal design.
Background
In practical applications, deep neural network models have become reference models in fields such as speech recognition, object detection, drug discovery, and genomics. A deep neural network may have tens to hundreds of layers, each with tens to thousands of neurons, so its number of parameters is enormous. Different hyper-parameter settings yield different network performance, so how to select a good set of hyper-parameters to optimize a neural network, i.e., neural network hyper-parameter optimization, has long been an open research problem. Common hyper-parameters fall into three categories:
(1) network structure, including the connections between neurons, the number of layers, the number of neurons per layer, and the kind of activation function;
(2) optimization parameters, including the optimization method, the learning rate, and the number of samples per mini-batch;
(3) regularization coefficients.
Hyper-parameter tuning means selecting an optimal set of hyper-parameters by minimizing the difference between the validation data and the model's predictions (i.e., model optimization). The hyper-parameter tuning methods in wide use are grid search, random search, and manual tuning. Grid search is an exhaustive search: it cycles through every combination of candidate hyper-parameter levels, and the best-performing combination is the final result, much as one finds the maximum value in an array. Random search does not test every value in a hyper-parameter's range; instead it samples values at random within the search range, and if the sample set is large enough and enough searches are run, random sampling finds the global optimum with high probability. Manual tuning relies on experience and intuition, reaching a good solution only after many rounds of adjustment. A hedged sketch contrasting grid search and random search follows.
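For illustration only (this example is not part of the patent), the sketch below contrasts the two search strategies; the `evaluate` function is a hypothetical stand-in for training the network and scoring it on validation data:

```python
# Hedged sketch: grid search vs. random search over two hypothetical
# hyper-parameters; `evaluate` stands in for training + validation scoring.
import itertools
import random

def evaluate(lr, batch_size):
    # Placeholder objective; in practice this would train the network
    # and return its validation accuracy.
    return -(lr - 0.1) ** 2 - (batch_size - 64) ** 2 / 1e4

# Grid search: exhaustively tries every combination of candidate levels;
# the number of combinations grows exponentially with the dimension.
lr_levels = [0.001, 0.01, 0.1, 1.0]
bs_levels = [16, 32, 64, 128]
best_grid = max(itertools.product(lr_levels, bs_levels),
                key=lambda c: evaluate(*c))

# Random search: samples points from the ranges instead of enumerating.
random.seed(0)
samples = [(random.uniform(0.001, 1.0), random.randint(16, 128))
           for _ in range(8)]
best_random = max(samples, key=lambda c: evaluate(*c))

print("grid best:", best_grid, "random best:", best_random)
```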
Each of these methods has non-negligible drawbacks. In grid search, the number of combinations to try grows exponentially with the number of hyper-parameters, so the required computing resources grow enormously, causing the "curse of dimensionality". Manual tuning requires substantial labor and computational overhead. And although random search is faster than grid search, it still needs a large number of trials to reach the optimum. Against this background, the invention proposes a neural network hyper-parameter tuning method based on orthogonal design.
Disclosure of Invention
Aiming at the low classification accuracy and high computational overhead of existing hyper-parameter tuning methods, the invention integrates the idea of orthogonal design into the random-search tuning method and provides a neural network hyper-parameter tuning method based on orthogonal design.
To this end, the invention adopts the following technical scheme:
The invention provides a neural network hyper-parameter tuning method based on orthogonal design, comprising the following steps:
step 1, determining the hyper-parameters of the neural network to be optimized and their value ranges according to the specific application;
step 2, selecting a suitable orthogonal table according to the number of hyper-parameters determined in step 1;
step 3, dividing the value range of each hyper-parameter determined in step 1 into several intervals and discretizing them to obtain the value levels of each hyper-parameter;
step 4, combining the value levels obtained in step 3 according to the selected orthogonal table and computing the score of each hyper-parameter level combination;
step 5, selecting the hyper-parameter level combination with the highest score and finding the value range corresponding to each hyper-parameter level in that combination;
step 6, repeating steps 3-5 until the value ranges can no longer be divided and discretized, yielding the optimal hyper-parameter combination of the neural network. A minimal code sketch of this iterative procedure is given below.
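The sketch below is one minimal reading of steps 3-6 for a two-level design, combining the orthogonal table with the random-search element described later in the disclosure; the names `tune`, `score`, and `stop_key`, and the choice to sample one candidate level per half-interval, are illustrative assumptions, not the patent's verbatim implementation.

```python
# Hedged sketch of the iterative tuning loop (steps 3-6, two-level design).
import random
import numpy as np

def tune(ranges, table, score, stop_key, min_width=2.0, seed=0):
    """ranges: {name: (lo, hi)}; table: (trials x factors) array of 0/1 levels."""
    rng = random.Random(seed)
    names = list(ranges)
    while ranges[stop_key][1] - ranges[stop_key][0] >= min_width:
        # Step 3: split each range in half and sample one candidate level
        # from each half (the random-search element of the method).
        levels = {n: (rng.uniform(lo, (lo + hi) / 2),
                      rng.uniform((lo + hi) / 2, hi))
                  for n, (lo, hi) in ranges.items()}
        # Step 4: score only the level combinations listed in the orthogonal table.
        scores = [score({n: levels[n][row[j]] for j, n in enumerate(names)})
                  for row in table]
        # Step 5: for each factor, keep the half-range whose level won.
        best = table[int(np.argmax(scores))]
        ranges = {n: ((lo, (lo + hi) / 2) if best[j] == 0
                      else ((lo + hi) / 2, hi))
                  for j, (n, (lo, hi)) in enumerate(ranges.items())}
    return ranges  # step 6: ranges can no longer be usefully divided
```

Here `score` would train the network under one hyper-parameter setting and return its validation accuracy; a hedged sketch of such a function is given after the tuning objective below.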
Further, dividing the value range of each hyper-parameter into several intervals in step 3 specifically means: when a two-level orthogonal design is selected, the value range of each hyper-parameter is divided in two; when a three-level orthogonal design is selected, the value range of each hyper-parameter is divided in three; orthogonal designs with other numbers of levels are partitioned analogously.
An orthogonal design is usually presented in tabular form. Orthogonal designs come in two-level, three-level, four-level, and other variants. By arranging trials systematically, an orthogonal design can reach an optimal result with fewer trials, shorter test time, and lower computational cost. A hedged sketch of one standard way to construct such a table follows.
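As one hedged illustration (the patent does not prescribe a construction), a two-level orthogonal table such as L8(2^7) can be derived from a Sylvester-type Hadamard matrix; in the resulting array, every pair of columns contains each pair of levels equally often, which is the defining balance property of an orthogonal table:

```python
# Hedged sketch: build a two-level orthogonal table L8(2^7) from a
# Sylvester-type Hadamard matrix (an illustration, not the patent's table).
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(8)
# Drop the constant first column and map +1/-1 to levels 0/1:
# 8 trial rows, up to 7 two-level factors.
L8 = (H[:, 1:] < 0).astype(int)
print(L8)
```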
Compared with the prior art, the invention has the following advantages:
1. The method distributes hyper-parameter trials uniformly, ensures that test results are economical and repeatable, and obtains accurate, reliable results in less computation time.
2. The invention incorporates the idea of orthogonal design into the widely used random-search tuning method for selecting and tuning neural network hyper-parameters, overcoming the disorder of traditional random search and greatly reducing computational cost. Applied to an image classification task, it markedly improves classification performance: first, at the same number of trials, the image classification accuracy obtained with the proposed method is significantly better than that of random search; second, the proposed method reaches the same accuracy as random search with fewer trials.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a boxplot comparing 50 repetitions of the method of the invention with random search alone at the same number of repetitions. The first column shows random search (no orthogonal design) with 48 trials per repetition; the second column shows the method of the invention (with orthogonal design) with 48 trials per repetition; the third column, the method of the invention with 40 trials per repetition; the fourth column, the method of the invention with 32 trials per repetition.
Detailed Description
The technical solution in the embodiments of the present invention is described below with reference to the embodiments and the accompanying drawings. Variations and modifications may be made by those skilled in the art without departing from the principle of the invention, and these also fall within the scope of the invention.
Some basic concepts and operations are introduced first.
Data set: let D_n be a data set of n images, i.e., D_n = {z_i : i = 1, ..., n}, where z_i is the i-th image in D_n; D_n is divided into a training set, denoted D^(train), and a validation set, denoted D^(valid).
Learning rate: if the learning rate is too high, the parameters oscillate back and forth around the extremum; if it is too low, optimization is slow.
Simulated annealing: a general-purpose optimization algorithm that escapes local optima with a certain probability, so that the parameters converge toward the global optimum.
Batch: the set of samples used at each iteration is called a batch; the batch size defines how many samples are processed before the internal model parameters are updated.
Regularization: adding a regularization term to the loss function improves the generalization ability of the model.
Hyper-parameter tuning: find the optimal hyper-parameter λ^(*) by minimizing the following objective function:

λ^(*) = argmin_λ L(A_λ(D^(train)), D^(valid))

where L is the loss function, representing the difference between the validation set and the model's predictions, and A_λ denotes the neural network algorithm model trained under hyper-parameters λ.
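Read as code, the objective says: train the model under hyper-parameters λ on D^(train) and score it on D^(valid). The sketch below uses scikit-learn's MLPClassifier as a hypothetical stand-in for the patent's network; the dictionary keys are illustrative names, and maximizing validation accuracy plays the role of minimizing L:

```python
# Hedged sketch of the tuning objective: the score of one hyper-parameter
# setting is the validation accuracy of the model trained under it.
from sklearn.neural_network import MLPClassifier

def score(lmbda, X_train, y_train, X_valid, y_valid):
    model = MLPClassifier(
        hidden_layer_sizes=(int(lmbda["hidden_nodes"]),),  # one hidden layer
        learning_rate_init=lmbda["learning_rate"],
        batch_size=int(lmbda["batch_size"]),
        alpha=lmbda["reg_coef"],   # L2 regularization coefficient
        max_iter=50,
    )
    model.fit(X_train, y_train)           # A_lambda(D^(train))
    return model.score(X_valid, y_valid)  # higher accuracy = smaller loss L
```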
Taking a two-level orthogonal design as an example, as shown in FIG. 1, the flow of this embodiment is as follows:
1. The data set used in this embodiment is the MNIST handwritten digit image data set, comprising 60000 images in total, each 28×28 pixels; the data set is divided into a training set and a validation set at a ratio of 1:5. The network model is a 3-layer fully connected neural network with 10 output nodes corresponding to the handwritten digit class labels (0, 1, ..., 9). A hedged sketch of this data setup follows.
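A minimal sketch of the data setup, assuming the OpenML copy of MNIST and reading the stated 1:5 ratio as validation:training = 1:5 (the patent does not say which set is the larger):

```python
# Hedged sketch: load MNIST and split it into training and validation sets.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:60000] / 255.0, y[:60000]   # 60000 images, 28x28 = 784 pixels each
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=1 / 6, random_state=0)  # 1 part validation : 5 parts training
```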
2. Five hyper-parameters are selected: learning rate, batch size, simulated annealing, number of hidden nodes, and regularization coefficient, with the following value ranges:
Learning rate: 0.001 – 5
Batch size: 20 – 100
Simulated annealing: 100 – 10000
Number of hidden nodes: 16 – 1024
Regularization coefficient: 3.1e-7 – 3.1e-5
3. According to the number of hyper-parameters selected in step 2, a 5-factor, 2-level orthogonal table is selected. [The orthogonal table images of the original publication are not reproduced here; the table lists 8 trial rows over the 5 factors at 2 levels each.] An assumed stand-in for this table, and the trial settings it implies, are sketched below.
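The sketch below uses the first five columns of a standard L8(2^7) array as an assumed stand-in for the patent's table (the actual column assignment may differ) and samples one candidate level from each half of every initial range, matching the random-search element of the method; all printed values are illustrative:

```python
# Hedged sketch: assumed 5-factor, 2-level orthogonal table (first five
# columns of a standard L8(2^7) array) and the 8 trial settings it implies.
import random
import numpy as np

random.seed(0)

L8_5 = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
])

# (lo, mid, hi) of each initial range from step 2; one candidate level is
# sampled from each half of every range.
halves = {
    "learning_rate": (0.001, 2.5005, 5.0),
    "batch_size":    (20, 60, 100),
    "sim_annealing": (100, 5050, 10000),
    "hidden_nodes":  (16, 520, 1024),
    "reg_coef":      (3.1e-7, 1.565e-5, 3.1e-5),
}
levels = {k: (random.uniform(lo, mid), random.uniform(mid, hi))
          for k, (lo, mid, hi) in halves.items()}

for t, row in enumerate(L8_5, start=1):
    setting = {k: levels[k][v] for k, v in zip(halves, row)}
    print(f"trial {t}:", {k: f"{val:.4g}" for k, val in setting.items()})
```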
4. The value range of each of the 5 hyper-parameters from step 2 is divided in two and discretized to obtain the level values of each hyper-parameter; the levels are combined according to the 5-factor, 2-level orthogonal table from step 3, and the score of each hyper-parameter level combination is computed:
Test No.:  1      2      3      4      5      6      7      8
Score:     0.934  0.941  0.849  0.721  0.932  0.939  0.940  0.945
5. The hyper-parameter level combination with the highest score, i.e., trial 8 (0.945), is selected, and the corresponding hyper-parameter values and narrowed value ranges are found:
Hyper-parameter             Value      New range
Learning rate               0.014      0.001 – 2.5
Batch size                  90         60 – 100
Simulated annealing         7673.0     5050 – 10000
Number of hidden nodes      736        520 – 1024
Regularization coefficient  2.246e-6   3.1e-7 – 1.6e-5
6. Steps 4 and 5 are repeated until the length of the batch-size interval is less than 2, at which point the bisection and discretization can no longer be carried out; the test results are then averaged to obtain the result of one repetition.
7. The experiment is repeated 50 times, and the 50 results are compared, as boxplots, with the results of random search alone at the same number of repetitions.
FIG. 2 shows this comparison: boxplots of the 50 results of the method of the invention against random search alone at the same number of repetitions (columns as described above: 48 trials per repetition without orthogonal design, then 48, 40, and 32 trials per repetition with it).
Comparing the first and second columns: at the same number of trials, the boxes do not overlap, so the accuracy obtained with the orthogonal design is significantly better than without it; that is, the method of the invention significantly outperforms random search.
Comparing the first and third columns: with the number of trials reduced by 1/6, the boxes still do not overlap, showing that the method remains significantly more accurate than random search even with 1/6 fewer trials.
Comparing the first and fourth columns: with the number of trials reduced by 1/3, the accuracy obtained with the orthogonal table is not significantly different from that of random search alone.
In summary, at the same number of trials, the image classification accuracy obtained with the proposed orthogonal-design tuning method is significantly better than that of random search; and the proposed method reaches the same accuracy as random search with fewer trials.
This test takes a two-level orthogonal design as an example; a three-level design divides each hyper-parameter's value range in three, discretizes it, and selects a suitable three-level orthogonal table, and designs with other numbers of levels generalize similarly. The invention thus provides a neural network hyper-parameter tuning method with high image classification accuracy and low computational overhead.

Claims (2)

1. A neural network hyper-parameter tuning method based on orthogonal design, characterized by comprising the following steps:
step 1, determining the hyper-parameters of the neural network to be optimized and their value ranges according to the specific application;
step 2, selecting a suitable orthogonal table according to the number of hyper-parameters determined in step 1;
step 3, dividing the value range of each hyper-parameter determined in step 1 into several intervals and discretizing them to obtain the value levels of each hyper-parameter;
step 4, combining the value levels obtained in step 3 according to the selected orthogonal table and computing the score of each hyper-parameter level combination;
step 5, selecting the hyper-parameter level combination with the highest score and finding the value range corresponding to each hyper-parameter level in that combination;
step 6, repeating steps 3-5 until the value ranges can no longer be divided and discretized, yielding the optimal hyper-parameter combination of the neural network.
2. The neural network hyper-parameter tuning method based on orthogonal design according to claim 1, wherein dividing the value range of each hyper-parameter into several intervals in step 3 specifically means: when a two-level orthogonal design is selected, the value range of each hyper-parameter is divided in two; when a three-level orthogonal design is selected, the value range of each hyper-parameter is divided in three; orthogonal designs with other numbers of levels are partitioned analogously.
CN202111198402.3A 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design Pending CN113989584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111198402.3A CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111198402.3A CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Publications (1)

Publication Number Publication Date
CN113989584A (en) 2022-01-28

Family

ID=79738644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111198402.3A Pending CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Country Status (1)

Country Link
CN (1) CN113989584A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992253A (en) * 2023-07-24 2023-11-03 中电金信软件有限公司 Method for determining value of super-parameter in target prediction model associated with target service


Similar Documents

Publication Publication Date Title
CN108846259B (en) Gene classification method and system based on clustering and random forest algorithm
Aydadenta et al. A clustering approach for feature selection in microarray data classification using random forest
CN108694390B (en) Modulation signal classification method for cuckoo search improved wolf optimization support vector machine
CN112232413B (en) High-dimensional data feature selection method based on graph neural network and spectral clustering
CN112926635B (en) Target clustering method based on iterative self-adaptive neighbor propagation algorithm
CN111625576B (en) Score clustering analysis method based on t-SNE
CN107832778B (en) Same target identification method based on spatial comprehensive similarity
CN105046323B (en) Regularization-based RBF network multi-label classification method
CN111325264A (en) Multi-label data classification method based on entropy
CN111564179A (en) Species biology classification method and system based on triple neural network
CN103020979A (en) Image segmentation method based on sparse genetic clustering
CN110555530B (en) Distributed large-scale gene regulation and control network construction method
CN114091650A (en) Searching method and application of deep convolutional neural network architecture
CN113989584A (en) Neural network hyper-parameter tuning method based on orthogonal design
Zhang et al. RUFP: Reinitializing unimportant filters for soft pruning
CN115512772A (en) High-precision single cell clustering method and system based on marker genes and ensemble learning
CN113283573A (en) Automatic search method for optimal structure of convolutional neural network
CN109871379A (en) A kind of online Hash K-NN search method based on data block study
Kim et al. Tweaking deep neural networks
Yamada et al. Weight Features for Predicting Future Model Performance of Deep Neural Networks.
Lee et al. Efficient decoupled neural architecture search by structure and operation sampling
CN115374868A (en) Non-supervision feature selection method based on JS divergence and ADMM algorithm
CN111176865B (en) Peer-to-peer mode parallel processing method and framework based on optimization algorithm
CN113449631A (en) Image classification method and system
CN113111774A (en) Radar signal modulation mode identification method based on active incremental fine adjustment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination