CN113989584A - Neural network hyper-parameter tuning method based on orthogonal design - Google Patents

Neural network hyper-parameter tuning method based on orthogonal design

Info

Publication number
CN113989584A
CN113989584A
Authority
CN
China
Prior art keywords
hyper
parameter
value range
level
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111198402.3A
Other languages
Chinese (zh)
Inventor
王钰 (Wang Yu)
杜博 (Du Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University
Priority to CN202111198402.3A
Publication of CN113989584A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the fields of computer vision, image processing, and machine learning, and provides a neural network hyper-parameter tuning method based on orthogonal design, addressing the low classification accuracy and high computational overhead of existing hyper-parameter tuning methods. The method first determines the hyper-parameters and their value ranges, divides each value range into several intervals, and discretizes them to obtain the value levels of each hyper-parameter; it then combines the value levels according to an orthogonal table and traverses all level combinations listed in the table to find the best one. The value ranges are then subdivided around this best combination, and the steps are repeated until the ranges can no longer be subdivided. This resolves the low classification accuracy of traditional hyper-parameter optimization while greatly reducing computational overhead.

Description

Neural network hyper-parameter tuning method based on orthogonal design
Technical Field
The invention relates to the fields of computer vision, image processing, and machine learning, and in particular to a neural network hyper-parameter tuning method based on orthogonal design.
Background
In practical applications, deep neural network models have become reference models in fields such as speech recognition, object detection, drug discovery, and genomics. A deep neural network may have tens to hundreds of layers, each with tens to thousands of neurons, so its number of parameters is enormous. Different hyper-parameter settings yield different network performance, so how to select a good set of hyper-parameters to optimize a neural network, i.e., neural network hyper-parameter optimization, has long been an open research problem. Common hyper-parameters fall into three categories:
(1) network structure, including the connections between neurons, the number of layers, the number of neurons per layer, and the kind of activation function;
(2) optimization parameters, including the optimization method, the learning rate, and the number of samples per mini-batch;
(3) regularization coefficients.
Hyper-parameter tuning means selecting an optimal set of hyper-parameters by minimizing the difference between the validation data and the model's predictions (i.e., model optimization). The hyper-parameter tuning methods in wide use are grid search, random search, and manual tuning. Grid search is an exhaustive search: it cycles through every combination of candidate hyper-parameter levels, and the best-performing combination is the final result, much as one finds the maximum value in an array. Random search does not test every value in a hyper-parameter's range; instead it samples values at random within the search range, and if the sample set is large enough and enough searches are run, random sampling finds the global optimum with high probability. Manual tuning relies on experience and intuition, reaching a good solution only after many rounds of adjustment. A hedged sketch contrasting grid search and random search follows.
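For illustration only (this example is not part of the patent), the sketch below contrasts the two search strategies; the `evaluate` function is a hypothetical stand-in for training the network and scoring it on validation data:

```python
# Hedged sketch: grid search vs. random search over two hypothetical
# hyper-parameters; `evaluate` stands in for training + validation scoring.
import itertools
import random

def evaluate(lr, batch_size):
    # Placeholder objective; in practice this would train the network
    # and return its validation accuracy.
    return -(lr - 0.1) ** 2 - (batch_size - 64) ** 2 / 1e4

# Grid search: exhaustively tries every combination of candidate levels;
# the number of combinations grows exponentially with the dimension.
lr_levels = [0.001, 0.01, 0.1, 1.0]
bs_levels = [16, 32, 64, 128]
best_grid = max(itertools.product(lr_levels, bs_levels),
                key=lambda c: evaluate(*c))

# Random search: samples points from the ranges instead of enumerating.
random.seed(0)
samples = [(random.uniform(0.001, 1.0), random.randint(16, 128))
           for _ in range(8)]
best_random = max(samples, key=lambda c: evaluate(*c))

print("grid best:", best_grid, "random best:", best_random)
```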
Each of these methods has non-negligible drawbacks. In grid search, the number of combinations to try grows exponentially with the number of hyper-parameters, so the required computing resources grow enormously, causing the "curse of dimensionality". Manual tuning requires substantial labor and computational overhead. And although random search is faster than grid search, it still needs a large number of trials to reach the optimum. Against this background, the invention proposes a neural network hyper-parameter tuning method based on orthogonal design.
Disclosure of Invention
Aiming at the low classification accuracy and high computational overhead of existing hyper-parameter tuning methods, the invention integrates the idea of orthogonal design into the random-search tuning method and provides a neural network hyper-parameter tuning method based on orthogonal design.
To this end, the invention adopts the following technical scheme:
The invention provides a neural network hyper-parameter tuning method based on orthogonal design, comprising the following steps:
step 1, determining the hyper-parameters of the neural network to be optimized and their value ranges according to the specific application;
step 2, selecting a suitable orthogonal table according to the number of hyper-parameters determined in step 1;
step 3, dividing the value range of each hyper-parameter determined in step 1 into several intervals and discretizing them to obtain the value levels of each hyper-parameter;
step 4, combining the value levels obtained in step 3 according to the selected orthogonal table and computing the score of each hyper-parameter level combination;
step 5, selecting the hyper-parameter level combination with the highest score and finding the value range corresponding to each hyper-parameter level in that combination;
step 6, repeating steps 3-5 until the value ranges can no longer be divided and discretized, yielding the optimal hyper-parameter combination of the neural network. A minimal code sketch of this iterative procedure is given below.
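The sketch below is one minimal reading of steps 3-6 for a two-level design, combining the orthogonal table with the random-search element described later in the disclosure; the names `tune`, `score`, and `stop_key`, and the choice to sample one candidate level per half-interval, are illustrative assumptions, not the patent's verbatim implementation.

```python
# Hedged sketch of the iterative tuning loop (steps 3-6, two-level design).
import random
import numpy as np

def tune(ranges, table, score, stop_key, min_width=2.0, seed=0):
    """ranges: {name: (lo, hi)}; table: (trials x factors) array of 0/1 levels."""
    rng = random.Random(seed)
    names = list(ranges)
    while ranges[stop_key][1] - ranges[stop_key][0] >= min_width:
        # Step 3: split each range in half and sample one candidate level
        # from each half (the random-search element of the method).
        levels = {n: (rng.uniform(lo, (lo + hi) / 2),
                      rng.uniform((lo + hi) / 2, hi))
                  for n, (lo, hi) in ranges.items()}
        # Step 4: score only the level combinations listed in the orthogonal table.
        scores = [score({n: levels[n][row[j]] for j, n in enumerate(names)})
                  for row in table]
        # Step 5: for each factor, keep the half-range whose level won.
        best = table[int(np.argmax(scores))]
        ranges = {n: ((lo, (lo + hi) / 2) if best[j] == 0
                      else ((lo + hi) / 2, hi))
                  for j, (n, (lo, hi)) in enumerate(ranges.items())}
    return ranges  # step 6: ranges can no longer be usefully divided
```

Here `score` would train the network under one hyper-parameter setting and return its validation accuracy; a hedged sketch of such a function is given after the tuning objective below.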
Further, dividing the value range of each hyper-parameter into several intervals in step 3 specifically means: when a two-level orthogonal design is selected, the value range of each hyper-parameter is divided in two; when a three-level orthogonal design is selected, the value range of each hyper-parameter is divided in three; orthogonal designs with other numbers of levels are partitioned analogously.
An orthogonal design is usually presented in tabular form. Orthogonal designs come in two-level, three-level, four-level, and other variants. By arranging trials systematically, an orthogonal design can reach an optimal result with fewer trials, shorter test time, and lower computational cost. A hedged sketch of one standard way to construct such a table follows.
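As one hedged illustration (the patent does not prescribe a construction), a two-level orthogonal table such as L8(2^7) can be derived from a Sylvester-type Hadamard matrix; in the resulting array, every pair of columns contains each pair of levels equally often, which is the defining balance property of an orthogonal table:

```python
# Hedged sketch: build a two-level orthogonal table L8(2^7) from a
# Sylvester-type Hadamard matrix (an illustration, not the patent's table).
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(8)
# Drop the constant first column and map +1/-1 to levels 0/1:
# 8 trial rows, up to 7 two-level factors.
L8 = (H[:, 1:] < 0).astype(int)
print(L8)
```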
Compared with the prior art, the invention has the following advantages:
1. The method distributes hyper-parameter trials uniformly, ensures that test results are economical and repeatable, and obtains accurate, reliable results in less computation time.
2. The invention incorporates the idea of orthogonal design into the widely used random-search tuning method for selecting and tuning neural network hyper-parameters, overcoming the disorder of traditional random search and greatly reducing computational cost. Applied to an image classification task, it markedly improves classification performance: first, at the same number of trials, the image classification accuracy obtained with the proposed method is significantly better than that of random search; second, the proposed method reaches the same accuracy as random search with fewer trials.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a boxplot comparing 50 repetitions of the method of the invention with random search alone at the same number of repetitions. The first column shows random search (no orthogonal design) with 48 trials per repetition; the second column shows the method of the invention (with orthogonal design) with 48 trials per repetition; the third column, the method of the invention with 40 trials per repetition; the fourth column, the method of the invention with 32 trials per repetition.
Detailed Description
The technical solution in the embodiments of the present invention is described below with reference to the embodiments and the accompanying drawings. Variations and modifications may be made by those skilled in the art without departing from the principle of the invention, and these also fall within the scope of the invention.
Some basic concepts and operations are introduced first.
Data set: let D_n be a data set of n images, i.e., D_n = {z_i : i = 1, ..., n}, where z_i is the i-th image in D_n; D_n is divided into a training set, denoted D^(train), and a validation set, denoted D^(valid).
Learning rate: if the learning rate is too high, the parameters oscillate back and forth around the extremum; if it is too low, optimization is slow.
Simulated annealing: a general-purpose optimization algorithm that escapes local optima with a certain probability, so that the parameters converge toward the global optimum.
Batch: the set of samples used at each iteration is called a batch; the batch size defines how many samples are processed before the internal model parameters are updated.
Regularization: adding a regularization term to the loss function improves the generalization ability of the model.
Hyper-parameter tuning: find the optimal hyper-parameter λ^(*) by minimizing the following objective function:

λ^(*) = argmin_λ L(A_λ(D^(train)), D^(valid))

where L is the loss function, representing the difference between the validation set and the model's predictions, and A_λ denotes the neural network algorithm model trained under hyper-parameters λ.
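Read as code, the objective says: train the model under hyper-parameters λ on D^(train) and score it on D^(valid). The sketch below uses scikit-learn's MLPClassifier as a hypothetical stand-in for the patent's network; the dictionary keys are illustrative names, and maximizing validation accuracy plays the role of minimizing L:

```python
# Hedged sketch of the tuning objective: the score of one hyper-parameter
# setting is the validation accuracy of the model trained under it.
from sklearn.neural_network import MLPClassifier

def score(lmbda, X_train, y_train, X_valid, y_valid):
    model = MLPClassifier(
        hidden_layer_sizes=(int(lmbda["hidden_nodes"]),),  # one hidden layer
        learning_rate_init=lmbda["learning_rate"],
        batch_size=int(lmbda["batch_size"]),
        alpha=lmbda["reg_coef"],   # L2 regularization coefficient
        max_iter=50,
    )
    model.fit(X_train, y_train)           # A_lambda(D^(train))
    return model.score(X_valid, y_valid)  # higher accuracy = smaller loss L
```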
Taking a two-level orthogonal design as an example, as shown in FIG. 1, the flow of this embodiment is as follows:
1. The data set used in this embodiment is the MNIST handwritten digit image data set, comprising 60000 images in total, each 28×28 pixels; the data set is divided into a training set and a validation set at a ratio of 1:5. The network model is a 3-layer fully connected neural network with 10 output nodes corresponding to the handwritten digit class labels (0, 1, ..., 9). A hedged sketch of this data setup follows.
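A minimal sketch of the data setup, assuming the OpenML copy of MNIST and reading the stated 1:5 ratio as validation:training = 1:5 (the patent does not say which set is the larger):

```python
# Hedged sketch: load MNIST and split it into training and validation sets.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:60000] / 255.0, y[:60000]   # 60000 images, 28x28 = 784 pixels each
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=1 / 6, random_state=0)  # 1 part validation : 5 parts training
```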
2. Five hyper-parameters are selected: learning rate, batch size, simulated annealing, number of hidden nodes, and regularization coefficient, with the following value ranges:
Learning rate: 0.001 – 5
Batch size: 20 – 100
Simulated annealing: 100 – 10000
Number of hidden nodes: 16 – 1024
Regularization coefficient: 3.1e-7 – 3.1e-5
3. According to the number of hyper-parameters selected in step 2, a 5-factor, 2-level orthogonal table is selected. [The orthogonal table images of the original publication are not reproduced here; the table lists 8 trial rows over the 5 factors at 2 levels each.] An assumed stand-in for this table, and the trial settings it implies, are sketched below.
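The sketch below uses the first five columns of a standard L8(2^7) array as an assumed stand-in for the patent's table (the actual column assignment may differ) and samples one candidate level from each half of every initial range, matching the random-search element of the method; all printed values are illustrative:

```python
# Hedged sketch: assumed 5-factor, 2-level orthogonal table (first five
# columns of a standard L8(2^7) array) and the 8 trial settings it implies.
import random
import numpy as np

random.seed(0)

L8_5 = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
])

# (lo, mid, hi) of each initial range from step 2; one candidate level is
# sampled from each half of every range.
halves = {
    "learning_rate": (0.001, 2.5005, 5.0),
    "batch_size":    (20, 60, 100),
    "sim_annealing": (100, 5050, 10000),
    "hidden_nodes":  (16, 520, 1024),
    "reg_coef":      (3.1e-7, 1.565e-5, 3.1e-5),
}
levels = {k: (random.uniform(lo, mid), random.uniform(mid, hi))
          for k, (lo, mid, hi) in halves.items()}

for t, row in enumerate(L8_5, start=1):
    setting = {k: levels[k][v] for k, v in zip(halves, row)}
    print(f"trial {t}:", {k: f"{val:.4g}" for k, val in setting.items()})
```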
4. The value range of each of the 5 hyper-parameters from step 2 is divided in two and discretized to obtain the level values of each hyper-parameter; the levels are combined according to the 5-factor, 2-level orthogonal table from step 3, and the score of each hyper-parameter level combination is computed:
Test No.:  1      2      3      4      5      6      7      8
Score:     0.934  0.941  0.849  0.721  0.932  0.939  0.940  0.945
5. The hyper-parameter level combination with the highest score, i.e., trial 8 (0.945), is selected, and the corresponding hyper-parameter values and narrowed value ranges are found:
Hyper-parameter             Value      New range
Learning rate               0.014      0.001 – 2.5
Batch size                  90         60 – 100
Simulated annealing         7673.0     5050 – 10000
Number of hidden nodes      736        520 – 1024
Regularization coefficient  2.246e-6   3.1e-7 – 1.6e-5
6. Steps 4 and 5 are repeated until the length of the batch-size interval is less than 2, at which point the bisection and discretization can no longer be carried out; the test results are then averaged to obtain the result of one repetition.
7. The experiment is repeated 50 times, and the 50 results are compared, as boxplots, with the results of random search alone at the same number of repetitions.
FIG. 2 shows this comparison: boxplots of the 50 results of the method of the invention against random search alone at the same number of repetitions (columns as described above: 48 trials per repetition without orthogonal design, then 48, 40, and 32 trials per repetition with it).
Comparing the first and second columns: at the same number of trials, the boxes do not overlap, so the accuracy obtained with the orthogonal design is significantly better than without it; that is, the method of the invention significantly outperforms random search.
Comparing the first and third columns: with the number of trials reduced by 1/6, the boxes still do not overlap, showing that the method remains significantly more accurate than random search even with 1/6 fewer trials.
Comparing the first and fourth columns: with the number of trials reduced by 1/3, the accuracy obtained with the orthogonal table is not significantly different from that of random search alone.
In summary, at the same number of trials, the image classification accuracy obtained with the proposed orthogonal-design tuning method is significantly better than that of random search; and the proposed method reaches the same accuracy as random search with fewer trials.
This test takes a two-level orthogonal design as an example; a three-level design divides each hyper-parameter's value range in three, discretizes it, and selects a suitable three-level orthogonal table, and designs with other numbers of levels generalize similarly. The invention thus provides a neural network hyper-parameter tuning method with high image classification accuracy and low computational overhead.

Claims (2)

1. A neural network hyper-parameter tuning method based on orthogonal design, characterized by comprising the following steps:
step 1, determining the hyper-parameters of the neural network to be optimized and their value ranges according to the specific application;
step 2, selecting a suitable orthogonal table according to the number of hyper-parameters determined in step 1;
step 3, dividing the value range of each hyper-parameter determined in step 1 into several intervals and discretizing them to obtain the value levels of each hyper-parameter;
step 4, combining the value levels obtained in step 3 according to the selected orthogonal table and computing the score of each hyper-parameter level combination;
step 5, selecting the hyper-parameter level combination with the highest score and finding the value range corresponding to each hyper-parameter level in that combination;
step 6, repeating steps 3-5 until the value ranges can no longer be divided and discretized, yielding the optimal hyper-parameter combination of the neural network.
2. The neural network hyper-parameter tuning method based on orthogonal design according to claim 1, wherein dividing the value range of each hyper-parameter into several intervals in step 3 specifically means: when a two-level orthogonal design is selected, the value range of each hyper-parameter is divided in two; when a three-level orthogonal design is selected, the value range of each hyper-parameter is divided in three; orthogonal designs with other numbers of levels are partitioned analogously.
CN202111198402.3A 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design Pending CN113989584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111198402.3A CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111198402.3A CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Publications (1)

Publication Number Publication Date
CN113989584A (en) 2022-01-28

Family

ID=79738644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111198402.3A Pending CN113989584A (en) 2021-10-14 2021-10-14 Neural network hyper-parameter tuning method based on orthogonal design

Country Status (1)

Country Link
CN (1) CN113989584A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992253A (en) * 2023-07-24 2023-11-03 中电金信软件有限公司 Method for determining value of super-parameter in target prediction model associated with target service


Similar Documents

Publication Publication Date Title
CN108846259B (en) Gene classification method and system based on clustering and random forest algorithm
Aydadenta et al. A clustering approach for feature selection in microarray data classification using random forest
CN108694390B (en) Modulation signal classification method for cuckoo search improved wolf optimization support vector machine
CN112232413B (en) High-dimensional data feature selection method based on graph neural network and spectral clustering
CN112926635B (en) Target clustering method based on iterative self-adaptive neighbor propagation algorithm
CN111625576B (en) Score clustering analysis method based on t-SNE
CN107832778B (en) Same target identification method based on spatial comprehensive similarity
CN105046323B (en) Regularization-based RBF network multi-label classification method
CN111325264A (en) Multi-label data classification method based on entropy
CN111564179A (en) Species biology classification method and system based on triple neural network
CN103020979A (en) Image segmentation method based on sparse genetic clustering
CN110555530B (en) Distributed large-scale gene regulation and control network construction method
CN114091650A (en) Searching method and application of deep convolutional neural network architecture
CN113989584A (en) Neural network hyper-parameter tuning method based on orthogonal design
Zhang et al. RUFP: Reinitializing unimportant filters for soft pruning
CN115512772A (en) High-precision single cell clustering method and system based on marker genes and ensemble learning
CN113283573A (en) Automatic search method for optimal structure of convolutional neural network
CN109871379A (en) A kind of online Hash K-NN search method based on data block study
Kim et al. Tweaking deep neural networks
Yamada et al. Weight Features for Predicting Future Model Performance of Deep Neural Networks.
Lee et al. Efficient decoupled neural architecture search by structure and operation sampling
CN115374868A (en) Non-supervision feature selection method based on JS divergence and ADMM algorithm
CN111176865B (en) Peer-to-peer mode parallel processing method and framework based on optimization algorithm
CN113449631A (en) Image classification method and system
CN113111774A (en) Radar signal modulation mode identification method based on active incremental fine adjustment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination