WO2023124342A1 - Low-cost automatic neural architecture search method for image classification - Google Patents
Low-cost automatic neural architecture search method for image classification
- Publication number: WO2023124342A1
- Application: PCT/CN2022/123299
- Authority: WO (WIPO (PCT))
- Prior art keywords: reg, individual, network structure, individuals, block
Classifications
- G06F16/55 - Information retrieval of still image data; Clustering; Classification
- G06N3/00 - Computing arrangements based on biological models
- G06N3/04 - Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 - Neural networks; Learning methods
- G06V10/764 - Image or video recognition using classification, e.g. of video objects
- G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82 - Image or video recognition using neural networks
Abstract
A low-cost automatic search method for neural network structures for image classification. A network block based on group convolution is designed, and a scalable network structure is constructed with the block as its basic unit; the controllable parameterization of the block makes the search space of the constructed network structure scalable. Combined with an improved genetic algorithm, a three-stage natural selection strategy better stimulates the exploration and exploitation of the search space. At the same time, the condition number of the training-free indicator NTK is introduced as the individual fitness, so that network structures with high accuracy and few parameters can be found extremely quickly. When solving practical problems, this makes it possible to quickly search out a network structure with superior overall performance using fewer computing resources; for image classification tasks, experiments show that classification with the searched network structure achieves high accuracy.
Description
Low-cost automatic neural architecture search approach for image classification
The present invention relates to a low-cost automatic search method for neural network structures for image classification, and belongs to the technical field of image classification.
Deep learning has made great progress in various computer vision tasks. Hand-designed neural network structures, such as VGGNet, ResNet, Inception and DenseNet, have been one of the important driving forces in the development of deep learning. Although hand-designed neural network structures can achieve excellent classification performance, designing such structures requires specialized domain knowledge that only a few experts possess. At the same time, the repeated tuning experiments required by manual design consume a great deal of time and computing resources. This has prompted a large amount of research on neural architecture search (NAS) in recent years, aimed at the automatic design of neural network structures.
By designing network structures automatically, NAS algorithms can be used even by people unfamiliar with the relevant domain knowledge, which greatly lowers the threshold of network design. The automation of NAS reduces labor and cost, and the network structures found by NAS algorithms can outperform hand-designed ones. However, the search time and computing-resource cost of finding the best network structure with a NAS algorithm are usually high. Most existing NAS algorithms rely mainly on a validation dataset to optimize the network structure, which requires a lot of time and intensive computing resources; for example, NASNet used 500 GPUs and took 4 days to find the best network.
The network structure search problem is usually defined as a single-objective optimization problem, that is, only a single objective is considered at a time rather than several. Most real-world network deployments require not only very high classification performance but also low computational cost, such as fewer network parameters and lower computational complexity. For this reason, some hand-designed network structures that retain high accuracy while reducing computational cost have been developed in recent years, such as MobileNet and MobileNetV2. Meanwhile, several NAS algorithms based on multi-objective optimization have also appeared, aiming to make the searched network structures easier to compute and deploy. For example, NSGA-Net considers the trade-off between the classification accuracy and the computational complexity of the network, and LEMONADE considers both the classification performance and the number of network parameters.
However, these methods still require substantial computing resources and long search times, while many computer vision tasks have timing requirements; for example, image classification in many scenarios must run in real time. How to use fewer computing resources to quickly find a network structure with superior overall performance for real-world problems therefore still needs further study.
Summary of the Invention
To solve the problem that current automatic neural-architecture-search methods for image classification are costly, the present invention provides a low-cost automatic search method for neural network structures for image classification, the method comprising:
Step 1: For the image classification task, determine the main framework of the neural network structure and randomly generate X network structures as the population P, each individual in the population representing a randomly generated network structure. The main framework of the neural network structure comprises a standard convolutional layer, unit num Reg Unit modules and a global average pooling layer; each Reg Unit module comprises block num group-convolution Reg Blocks, and each Reg Unit module contains an SENet module with a probability of 50%, the SENet module simulating the attention mechanism through Squeeze-and-Excitation.
The number unit num of Reg Unit modules, the number block num of group-convolution Reg Blocks, the number of branches group of each group-convolution Reg Block, and the width width of the second convolutional layer in each branch are randomly generated.
Step 2: Set the separation points S1 and S2 of the three stages of the subsequent population evolution and the maximum number of generations Max_gen.
Step 3: Compute the condition number K_N of the NTK of each individual's network structure in the population P as the individual's fitness.
Step 4: The population enters evolution; tournament selection is used to select individuals for mutation operations to generate new network structure individuals, and different indicators are selected according to the stage to which the current generation G belongs to carry out environmental selection and eliminate individuals.
Step 5: After the maximum number of generations Max_gen is reached, select the network structure whose fitness K_N is smallest as the searched neural network structure for the image classification task.
Optionally, the group-convolution Reg Block in each network structure contains group branches, and each branch consists of three convolutional layers and one pooling layer, with the pooling layer in the third position. The first and fourth convolutional layers use 1×1 kernels to adjust the number of feature maps, and the second convolutional layer uses a 3×3 kernel to extract feature maps; all convolutional layers follow the order of convolution operation, ReLU activation function and batch normalization layer. The pooling layer in the third position is used to halve the size of the input data; the input data are image data.
Optionally, in step 4, selecting different indicators according to the stage to which the current generation G belongs to carry out environmental selection and eliminate individuals includes:
in the first and third stages, i.e. when 0 < G ≤ S1 and S2 < G ≤ Max_gen, using the fitness K_N of the individual as the criterion for elimination;
in the second stage, i.e. when S1 < G ≤ S2, using the lifespan of the individual as the criterion for elimination, the lifespan of an individual being the number of generations the individual has experienced.
Optionally, the population evolution process includes:
randomly selecting k individuals from the population; from these k individuals, selecting the t individuals with the best fitness K_N as parent individuals;
the t parent individuals generating t offspring individuals through a set of mutation operators, the offspring being evaluated after generation and added to the existing population;
according to the stage to which the current generation belongs, using the corresponding criterion in environmental selection to eliminate individuals: the t worst individuals under the current criterion are eliminated so that the population size remains unchanged, and the remaining individuals form a new population and enter the next generation of evolution.
Optionally, the t parent individuals generating t offspring individuals through a set of mutation operators, the offspring being evaluated after generation and added to the existing population, includes:
randomly selecting a mutation position pos_ij within the length of the parent individual, pos_ij denoting the position of the j-th Reg Block in the i-th Reg Unit, the position being determined by the order of the Reg Unit in the network structure and the order of the Reg Block within the Reg Unit;
randomly selecting one mutation operator to mutate the parent individual, the mutation operators including an add operator, a remove operator and a change operator:
add operator: add a Reg Block with randomly set parameters at the mutation position pos_ij;
remove operator: remove the Reg Block at the mutation position pos_ij;
change operator: randomly change the parameters of the Reg Block at the mutation position pos_ij.
Optionally, when implementing the add operator, if the length of the parent individual has reached the upper limit, the add operator cannot be applied and only the remove or change operator can be selected; when implementing the remove operator, if the length of the parent individual has reached the lower limit, the remove operator cannot be applied and only the add or change operator can be selected.
The present application also provides an image classification method that uses the neural network structure found by the above search method to classify images.
Optionally, the method includes:
inputting the image to be classified into the neural network structure and extracting its features through the standard convolutional layer;
performing further feature extraction through the unit num Reg Unit modules, where the output of each group-convolution Reg Block in each Reg Unit module is formed by concatenating the output features of its branches and adding a residual connection, and a feature map is obtained through an SENet module with a probability of 50%; the feature maps output by the Reg Units are then flattened into a feature vector by the global average pooling layer; finally, a fully connected layer with a softmax layer is set as the classifier to convert the feature vector into the final classification result.
The beneficial effects of the invention are:
By designing a network block based on group convolution and using this block as the basic unit, a scalable network structure is constructed; the controllable parameterization of the block makes the search space of the constructed network structure scalable. Combined with an improved genetic algorithm, a three-stage natural selection strategy better stimulates the exploration and exploitation of the search space. At the same time, the condition number of the training-free indicator NTK is introduced as the individual fitness, so that network structures with high accuracy and few parameters can be found extremely quickly. When solving practical problems, this makes it possible to quickly search out a network structure with superior overall performance using fewer computing resources; for image classification tasks, experiments show that classification with the searched network structure achieves high accuracy.
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the overall network structure designed in the low-cost neural network structure search method based on the three-stage evolutionary algorithm disclosed in an embodiment of the present invention, together with the proposed new network block Reg Block.
Fig. 2 is a schematic diagram of the selected parameter values of the network structure for the image classification problem found by the search method disclosed in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the flexible encoding strategy disclosed in an embodiment of the present invention.
Fig. 4 compares the parameter counts of the group convolution proposed in the present application and of standard convolution in the prior art, disclosed in an embodiment of the present invention.
Fig. 5A compares the test accuracy of the original network structures and the network structures without the SENet module, disclosed in an embodiment of the present invention.
Fig. 5B compares the parameter counts of the original network structures and the network structures without the SENet module, disclosed in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the negative correlation between K_N and network test accuracy in the LoNAS search space on the CIFAR-10 dataset, disclosed in an embodiment of the present invention.
Fig. 7 is a schematic diagram of the influence of the length of the second stage on test accuracy under the same total evolution length (the number of generations is set to 50).
Fig. 8 is a schematic diagram of the add and remove operators in the evolution process, disclosed in an embodiment of the present invention.
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1:
This embodiment provides a low-cost neural network structure search method based on a three-stage evolutionary algorithm, the method comprising:
Step 1. Given a specific parameter set for the Reg Block, encode the network structure flexibly; at the same time, set the three-stage separation points S1 and S2 and the maximum number of generations Max_gen. The Reg Block contains group convolution and, with a probability of 50%, an SENet module.
The Reg Block contains group branches, each branch consisting of three convolutional layers and one pooling layer, with the pooling layer in the third position. The first and fourth convolutional layers use 1×1 kernels to adjust the number of feature maps, the second convolutional layer uses a 3×3 kernel to extract feature maps, and all convolutional layers follow the order of convolution operation, ReLU activation function and batch normalization layer; the third-layer pooling layer is used to halve the size of the input data.
The output of the Reg Block is formed by concatenating the output features of each branch and adding a residual connection, plus an SENet module with a probability of 50%; the SENet module simulates the attention mechanism through Squeeze-and-Excitation.
Step 2. Initialize a population P containing 50 network structure individuals according to the encoding of step 1.
The main body of each individual's network structure comprises a standard convolutional layer Conv Unit, unit num Reg Units and a global average pooling layer, as shown in Fig. 1(a). The structure of each Reg Block within the Reg Units is shown in Fig. 1(b).
Step 3. Use the CIFAR-10 and CIFAR-100 datasets to compute the condition number K_N of the NTK of each network structure as the individual's fitness.
Step 4. The population enters evolution.
Step 5. Use tournament selection to select individuals for mutation operations to generate new network structure individuals.
Step 6. Select different indicators according to the current generation G to carry out environmental selection and eliminate individuals. Specifically:
in the first and third stages, i.e. when 0 < G ≤ S1 and S2 < G ≤ Max_gen, the fitness K_N of the individual is used as the criterion for elimination;
in the second stage, i.e. when S1 < G ≤ S2, the lifespan of the individual is used as the criterion for elimination, the lifespan being the number of generations the individual has experienced.
Step 7. Return to step 5 until the maximum number of generations is reached.
Experiments on the image classification datasets CIFAR-10 and CIFAR-100 demonstrate that the present invention can find a network structure that balances classification accuracy and parameter count in an extremely short search time while consuming only minimal computing resources.
Embodiment 2:
This embodiment provides a low-cost neural network structure search method based on a three-stage evolutionary algorithm, illustrated with a low-cost neural network structure search for an image classification task as an example; the method comprises:
Step 1. Given a specific parameter set for the Reg Block, encode the network structure flexibly; at the same time, set the three-stage separation points S1 and S2 and the maximum number of generations Max_gen. The Reg Block contains group convolution and, with a probability of 50%, an SENet module.
The Reg Block contains group branches, each branch consisting of three convolutional layers and one pooling layer, with the pooling layer in the third position. The first and fourth convolutional layers use 1×1 kernels to adjust the number of feature maps, the second convolutional layer uses a 3×3 kernel to extract feature maps, and all convolutional layers follow the order of convolution operation, ReLU activation function and batch normalization layer; the third-layer pooling layer is used to halve the size of the input data.
The output of the Reg Block is formed by concatenating the output features of each branch and adding a residual connection, plus an SENet module with a probability of 50%; the SENet module simulates the attention mechanism through Squeeze-and-Excitation.
Traditional standard convolution can achieve good classification performance but requires many parameters, which is unfavorable for designing high-accuracy network structures with few parameters. The present application therefore designs, on the basis of the ResNet Block, a new network block called Reg Block. The Reg Block consists of group convolution and an SENet module, which serve to reduce the number of parameters and to improve classification performance, respectively.
The topology of the Reg Block is shown in Fig. 1(b). In the Reg Block, the input features are divided into a certain number of groups, which decomposes the standard convolution operation into multiple independent convolution branches.
Compared with standard convolution, the advantage of group convolution is that it can greatly reduce the computation and the number of parameters of the network without significantly reducing classification performance. The pooling layer in the third position of the Reg Block is used to halve the size of the input data; the number of such layers cannot be specified arbitrarily and must follow a computational constraint. For example, for an M×M input, the number of pooling layers used to halve the input feature size cannot be greater than ⌊log₂M⌋, otherwise the size of the input data would be reduced to less than 1 and an error would occur. Therefore, in the Reg Blocks only some of the pooling layers can have their stride set to 2 to halve the feature map size, while the stride of the others is set to 1.
The output of the Reg Block is formed by concatenating the output features of each branch and adding a residual connection, plus an SENet module. The SENet module simulates the attention mechanism through Squeeze-and-Excitation, which makes the network structure pay more attention to the most informative parts of the features and thus improves its representation ability.
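For concreteness, the following is a minimal PyTorch-style sketch of a block in the spirit of the Reg Block described above. It is not the patent's reference implementation: the projection shortcut used to match the residual shape, the average-pooling kernel, the SE reduction ratio and the names (ConvReluBN, SEModule, RegBlock) are illustrative assumptions; only the branch layout (1×1 conv, 3×3 conv, pooling, 1×1 conv, each convolution followed by ReLU then batch normalization) and the 50% SENet probability follow the text.

```python
# Minimal sketch of a Reg Block-style module (illustrative assumptions noted above).
import random
import torch
import torch.nn as nn

class ConvReluBN(nn.Module):
    """Convolution followed by ReLU then BatchNorm, the order stated in the text."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.act = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))

class SEModule(nn.Module):
    """Squeeze-and-Excitation attention (the reduction ratio is an assumption)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x).unsqueeze(-1).unsqueeze(-1)

class RegBlock(nn.Module):
    def __init__(self, c_in, c_out, group, width, halve):
        super().__init__()
        assert c_in % group == 0 and c_out % group == 0
        stride = 2 if halve else 1  # only some blocks may halve the input size
        self.branches = nn.ModuleList(nn.Sequential(
            ConvReluBN(c_in // group, width, 1),        # layer 1: 1x1, adjust maps
            ConvReluBN(width, width, 3),                # layer 2: 3x3, extract features
            nn.AvgPool2d(3, stride=stride, padding=1),  # layer 3: pooling
            ConvReluBN(width, c_out // group, 1),       # layer 4: 1x1, adjust maps
        ) for _ in range(group))
        # Projection shortcut so the residual matches the output shape (assumption).
        self.shortcut = nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False)
        # An SENet module is attached with 50% probability, decided at construction.
        self.se = SEModule(c_out) if random.random() < 0.5 else nn.Identity()

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)  # split input into groups
        out = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return self.se(out + self.shortcut(x))

# Example: RegBlock(32, 64, group=4, width=16, halve=True) maps a
# (1, 32, 32, 32) tensor to (1, 64, 16, 16).
```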
To verify the effectiveness of the group convolution and the SENet module contained in the Reg Block designed in the present application, two ablation experiments were conducted on CIFAR-10: the first verifies the effectiveness of group convolution, and the second investigates the effectiveness of the SENet module. Ten individuals were randomly selected from a final population for these two ablation experiments; all of them contain group convolutions and a number of SENet modules.
The first ablation experiment verifies the influence of group convolution on the number of parameters of the network structure. The number of parameters of each individual is first recorded; then, keeping the rest of the topology unchanged, the group convolutions of each individual are converted into standard convolutions and the corresponding number of parameters is recorded. The comparison results are shown in Fig. 4, where black represents group convolution and grey standard convolution. Fig. 4 clearly shows that group convolution has far fewer parameters than standard convolution; each individual containing group convolutions reduces the parameter count by roughly half. Group convolution can therefore effectively reduce the number of parameters in the network structure.
The second ablation experiment verifies the effectiveness of the SENet module with respect to test accuracy and parameter count. For each individual, the test accuracy and parameter count of the individual and of the same individual with all SENet modules removed are obtained over 10 independent trials. The comparison results for test accuracy and parameter count are shown in Fig. 5(a) and Fig. 5(b), respectively; the dashed lines and black bars represent the original network structures, and the solid lines and grey bars the network structures with all SENet modules removed. Fig. 5(a) clearly shows that, compared with the original network structures, the accuracy of the structures without SENet modules drops substantially, indicating that the SENet module improves test accuracy. Fig. 5(b) shows that, relative to the overall parameter count of the network structure, adding SENet modules brings only a small increase in parameters. These results show that the SENet module can significantly improve classification performance at the cost of only a small increase in parameters.
Step 2. Initialize a population P containing 50 network structure individuals according to the encoding of step 1.
As shown in Fig. 1(a), the main body of each of the 50 individuals' network structures comprises a standard convolutional layer Conv Unit, unit num Reg Units and a global average pooling layer.
The standard convolutional layer Conv Unit uses a 3×3 kernel to extract features from the initial input data; for image classification tasks, the initial input data is the image to be classified.
The number unit num of Reg Units is randomly generated, and each Reg Unit consists of block num Reg Blocks. Reg Blocks are generated randomly based on a set of automatically searchable parameters: the number block num of Reg Blocks in each Reg Unit is randomly generated, the number of branches group in each Reg Block is randomly generated, and the width width of the second convolutional layer in each branch is randomly generated.
This yields a randomly initialized population P containing 50 individuals, each representing a randomly generated network structure whose main body comprises a standard convolutional layer Conv Unit, unit num Reg Units and a global average pooling layer.
A global average pooling layer is placed at the end of each individual's network structure to flatten the feature maps output by the Reg Units into a feature vector; finally, a fully connected layer with a softmax layer is set as the classifier to convert the feature vector into the final prediction.
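A sketch of how this flexible, variable-length encoding could be realized follows. The text only states that unit num, block num, group and width are randomly generated, so the concrete value ranges below are assumptions:

```python
# Sketch of the flexible encoding: an individual is a list of Reg Units;
# each Reg Unit is a list of Reg Block parameter dicts.
import random

def random_block():
    return {
        "group": random.choice([2, 4, 8]),     # number of branches (assumed choices)
        "width": random.choice([16, 32, 64]),  # width of the branch's 3x3 conv layer
        "se": random.random() < 0.5,           # SENet module with 50% probability
    }

def random_individual(max_units=4, max_blocks=4):
    # unit num Reg Units, each with block num Reg Blocks, all randomly drawn.
    return [[random_block() for _ in range(random.randint(1, max_blocks))]
            for _ in range(random.randint(1, max_units))]

population = [random_individual() for _ in range(50)]  # the population P
```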
Step 3. Use the CIFAR-10 and CIFAR-100 datasets to compute the condition number K_N of the NTK of each network structure as the individual's fitness.
To accelerate the search process, the present invention introduces the NTK to characterize the trainability of a network structure; higher trainability corresponds to higher classification accuracy of the network architecture. The NTK can characterize the gradient-descent training dynamics of infinitely wide or finite-width deep network architectures. Following W. Chen, X. Gong, and Z. Wang, "Neural architecture search on imagenet in four gpu hours: A theoretically inspired perspective," in International Conference on Learning Representations, 2020, the condition number K_N of the NTK of each network structure is computed using the CIFAR-10 and CIFAR-100 datasets.
Specifically, the eigenvalues λ_k of the NTK are obtained from each set of training images and their labels, and the condition number K_N of the NTK of the network structure is computed from these eigenvalues as

K_N = λ_0 / λ_m,

where λ_0 is the largest and λ_m the smallest of the eigenvalues λ_k.
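A sketch of this training-free fitness evaluation on a finite-width network, in the spirit of Chen et al. (2020): per-sample parameter gradients form an empirical NTK Gram matrix whose extreme eigenvalues give K_N = λ_0/λ_m. Reducing the network output to a scalar by summing the logits and using a single small batch are simplifying assumptions:

```python
# Sketch of computing K_N, the condition number of the empirical NTK.
import torch

def ntk_condition_number(model, images):
    """images: tensor of shape (n, C, H, W); returns K_N = lambda_0 / lambda_m."""
    model.eval()  # use running BN statistics so single-sample passes are well-defined
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in images:                          # one gradient row per training sample
        out = model(x.unsqueeze(0)).sum()     # scalar network output (assumption)
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                   # (n, num_params) Jacobian
    ntk = jac @ jac.T                         # empirical NTK Gram matrix
    eig = torch.linalg.eigvalsh(ntk)          # eigenvalues lambda_k, ascending
    return (eig[-1] / eig[0]).item()          # largest over smallest eigenvalue
```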
The present application randomly generated 200 network structure individuals and tested the correlation between their K_N and their test accuracy; the results are shown in Fig. 6. As can be seen from Fig. 6, K_N is negatively correlated with the accuracy of the network structure.
The present application therefore uses K_N to evaluate the fitness of individuals. During evolution, minimizing K_N helps find network structures with high accuracy, and because K_N requires no training, it directly saves a great deal of search time and computing resources.
The K_N value of each initial individual is computed.
Step 4. The population enters evolution; tournament selection is used to select individuals for mutation operations to generate new network structure individuals, and different indicators are selected according to the current generation G for environmental selection to eliminate individuals.
During evolution, k individuals are first randomly selected from the population. From these k individuals, the t individuals with the best fitness K_N are selected as parent individuals.
These t parents then generate t offspring through a set of mutation operators; the offspring are evaluated after generation and added to the existing population.
Then, according to the stage to which the current generation belongs, the corresponding criterion is used in environmental selection to eliminate individuals: the t worst individuals under the current criterion are eliminated so that the population size remains unchanged, and the remaining individuals form a new population and enter the next generation.
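The selection-and-reproduction step could look as follows; Individual is a helper introduced here for illustration, evaluate_kn and mutate are assumed from the NTK and mutation sketches, and the values of k and t are illustrative:

```python
# Sketch of tournament selection and reproduction.
import random

class Individual:
    def __init__(self, genome, kn):
        self.genome = genome  # list of Reg Units (see the encoding sketch)
        self.kn = kn          # NTK condition number K_N, used as fitness
        self.age = 0          # generations survived, used in stage two

def reproduce(population, k=5, t=2):
    """Pick the t tournament winners (smallest K_N) and mutate each once."""
    tournament = random.sample(population, k)
    parents = sorted(tournament, key=lambda ind: ind.kn)[:t]
    children = []
    for p in parents:
        genome = mutate(p.genome)             # apply a random mutation operator
        children.append(Individual(genome, evaluate_kn(genome)))
    return children
```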
Specifically:
in the first stage (0 < G ≤ S1) and the third stage (S2 < G ≤ Max_gen), the criterion for environmental selection is based on K_N, which helps retain potential optimal solutions and improve the exploitation of the algorithm, respectively; in the second stage (S1 < G ≤ S2), the lifespan of the individual is used as the criterion for environmental selection, which guarantees sufficient exploration.
That is: when 0 < G ≤ S1 or S2 < G ≤ Max_gen, the fitness K_N of the individual is used as the criterion for elimination; when S1 < G ≤ S2, the lifespan of the individual is used as the criterion, the lifespan being the number of generations the individual has experienced.
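The stage-dependent environmental selection then reduces the population back to its original size; this sketch reuses the Individual helper from the previous sketch:

```python
# Sketch of the multi-criteria environmental selection: K_N in stages one and
# three, lifespan (age) in stage two; t individuals are removed per generation.
def environmental_selection(population, g, s1, s2, t=2):
    if s1 < g <= s2:
        # Stage two: eliminate the oldest individuals to keep exploring.
        doomed = sorted(population, key=lambda ind: ind.age, reverse=True)[:t]
    else:
        # Stages one and three: eliminate the individuals with the largest K_N.
        doomed = sorted(population, key=lambda ind: ind.kn, reverse=True)[:t]
    survivors = [ind for ind in population if ind not in doomed]
    for ind in survivors:
        ind.age += 1  # every survivor grows one generation older
    return survivors
```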
Step 5. Return to step 4 until the maximum number of generations is reached, and select the individual with the smallest K_N as the best network structure found.
In traditional evolutionary algorithms, a fixed criterion is usually used for environmental selection throughout the evolution process. Most such criteria directly reflect the performance of the network structure, such as test accuracy or parameter count. With such methods, individuals with better fitness survive environmental selection as the population evolves; in subsequent evolution, however, mutation takes place among these individuals, so most offspring are inherited from this small group. Over time the algorithm focuses only on these few outstanding individuals, which easily leads to local optima and greatly reduces the exploration ability of the algorithm.
To address this, an evolutionary algorithm based on individual lifespan was proposed in (E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 4780–4789.), which uses the lifespan of individuals in the population as the criterion for environmental selection. During evolution, each environmental selection discards the oldest individual, eliminating well-fitted, long-surviving individuals and increasing the probability that other individuals enter the subsequent evolution, so that the algorithm can search more of the space.
However, through in-depth study the inventors found that such lifespan-based evolution may converge unstably. In the early stage of evolution, the lifespans of different individuals in the population are quite similar. If many individuals with good fitness exist in the population at the beginning, then as their lifespans grow these individuals are successively eliminated in the later evolution. Removing these individuals, which are potential optimal solutions in the search space, slows the convergence of the population and thus harms the quality of convergence.
The present invention therefore considers both traditional evolution and lifespan-based evolution and proposes a new evolutionary algorithm with multi-criteria environmental selection. In the first and third stages of evolution, K_N, which is related to the classification performance of the network structure, is used as the criterion for environmental selection, and individuals with smaller K_N are retained at each selection. In the second stage, the lifespan of the individual is used as the criterion, and individuals with shorter lifespans are kept in the population.
In the first stage, this ensures that outstanding individuals in the population enter the later evolution process so that the offspring produced by mutation can inherit from them, improving the overall performance of the population and guaranteeing that it contains enough potential optimal solutions. In the second stage, the population is updated frequently to explore more of the search space and increase the diversity of individuals. Finally, in the third stage, excellent individuals are preserved at every environmental selection, guiding the population to converge to the best solution and helping to ensure the exploitation of the algorithm.
To verify the effectiveness of the three-stage evolution adopted by the present application, this embodiment performs five independent experiments with different second-stage lengths. The maximum number of generations is the same for every experiment, and the classification performance of the final population is recorded. By changing the length of the second stage, the lengths of the first and third stages change accordingly, which helps study the influence of the different stage lengths on the validation accuracy of the final population. The length of the second stage is varied within [0, 30], and Fig. 7 shows the overall accuracy of the different populations. In Fig. 7, each rectangular box represents the overall validation accuracy of one population; the length of a box represents the spread of accuracy among individuals; the dot and the dashed line inside a box represent the mean and the median accuracy; and the whiskers at both ends of a box represent the maximum and minimum accuracy in the population.
When the length of the second stage is set to 0, the evolutionary algorithm degenerates into a traditional evolutionary algorithm with fixed-criterion environmental selection. Fig. 7 clearly shows that, compared with the other three-stage configurations, the traditional evolutionary algorithm has the lowest average validation accuracy. This indicates that the second stage helps explore more of the search space and helps the population converge to network structures with better classification performance. As the length of the second stage increases, the average accuracy of the population first increases and then decreases: an overly long second stage makes the population spend too much of the evolution exploring the search space, so it cannot converge to a better solution in time.
Meanwhile, the third rectangular box and its whiskers are the shortest, indicating that the differences among individuals are the smallest. This shows that a third stage of sufficient length can improve exploitation, helping eliminate individuals with poor fitness and increasing the number of optimal solutions, which in turn improves the stability of the evolutionary algorithm during the search. According to these experimental results, appropriate lengths for the three stages help effectively balance the exploration and exploitation of the algorithm so that the optimal solution can be searched out more reliably.
During evolution, the offspring in the population are all produced by mutation of existing individuals, in order to explore more of the search space and increase the diversity of individuals. In the present application, mutation operators act only within Reg Units; the Conv Unit is not involved in mutation because of its specific function. For mutation, a mutation position pos_ij is first selected randomly within the length of the parent individual; it denotes the position of the j-th Reg Block in the i-th Reg Unit, determined by the order of the Reg Unit in the network structure and the order of the Reg Block within the Reg Unit. A mutation operator is then selected randomly to mutate the parent individual. Based on the block-based network structure, the designed mutation operators are:
- add (add a Reg Block with randomly set parameters);
- remove (remove the Reg Block at the selected position);
- change (randomly change the parameters of the Reg Block at the selected position).
More specifically, in the add operator a Reg Block with random parameters is generated and inserted after position pos_ij; in the remove operator the Reg Block at position pos_ij is deleted directly; and in the change operator a new set of parameters is generated randomly to replace the old parameters of the Reg Block at position pos_ij. Fig. 8 shows examples of the add and remove operators for a better understanding: in Fig. 8(a) a new Reg Block is randomly generated and inserted after Reg Block 11, and in Fig. 8(b) Reg Block 23 is removed from Reg Unit 2.
Note that the length of the original parent individual must be considered when applying the add and remove operators. If the length has reached the upper limit, the add operator cannot be applied and only the other two operators can be selected; likewise, when the length of the original individual has reached the lower limit, the remove operator cannot be applied.
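A sketch of the three mutation operators over the genome from the encoding sketch; the length bounds MAX_LEN and MIN_LEN and the handling of a Reg Unit that becomes empty are assumptions not fixed by the text:

```python
# Sketch of the add / remove / change mutation operators.
# random_block is assumed from the encoding sketch above.
import random

MAX_LEN, MIN_LEN = 16, 1  # assumed bounds on the total number of Reg Blocks

def mutate(genome):
    genome = [unit[:] for unit in genome]      # work on a copy of the parent
    i = random.randrange(len(genome))          # mutation position pos_ij:
    j = random.randrange(len(genome[i]))       # Reg Block j of Reg Unit i
    length = sum(len(unit) for unit in genome)
    ops = ["add", "remove", "change"]
    if length >= MAX_LEN:
        ops.remove("add")                      # at the upper limit: cannot add
    if length <= MIN_LEN:
        ops.remove("remove")                   # at the lower limit: cannot remove
    op = random.choice(ops)
    if op == "add":
        genome[i].insert(j + 1, random_block())  # insert after pos_ij
    elif op == "remove":
        del genome[i][j]
        if not genome[i]:
            del genome[i]                      # drop an emptied Reg Unit (assumption)
    else:
        genome[i][j] = random_block()          # replace the parameters at pos_ij
    return genome
```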
The present application designs a new network block called Reg Block, which combines group convolution and SENet modules to reduce the number of network parameters and to improve classification performance, respectively. Based on the Reg Block, a flexible encoding strategy is proposed to construct the network structure. By designing constraints on the network structure, a limited search space can be constructed in which network structures that balance classification accuracy and parameter count can be discovered.
Beneficial effects of the present application:
The present application evaluates the fitness of each network structure by analyzing the Neural Tangent Kernel (NTK). The NTK can effectively characterize the trainability of a network structure, and the condition number of the NTK (K_N) is strongly correlated with the classification accuracy of the network structure. Because the indicator K_N can be computed without training, search time is greatly reduced and a large amount of computing resources is saved.
The present application proposes a three-stage evolutionary algorithm based on multi-criteria environmental selection. The criteria for environmental selection are based on the condition number of the NTK (K_N) and the lifespan of the individual; a lifespan attribute is associated with each individual, representing the number of generations the individual has experienced. In the early stage of evolution, individuals with high fitness according to K_N are preserved to the next generation, forming a population containing many highly fit individuals. In the second stage, older individuals are eliminated according to their lifespan, so that the population maintains diversity and avoids premature convergence to local optima. In the third stage, K_N is again used as the criterion to retain the best individuals and guarantee the convergence of the population. This three-stage evolutionary algorithm balances exploration and exploitation well during the search. In addition, the method designs simple mutation operators over a set of Reg Blocks to keep the population evolving.
To verify that the search method provided by the present application can find network structures with high accuracy and few parameters in a short time while requiring only a small amount of computing resources, the network structure found by the method of the present application is compared below with existing hand-designed network structures, with structures obtained by semi-automatic search plus manual fine-tuning, and with structures obtained by fully automatic search.
Experiments are conducted on CIFAR-10 and CIFAR-100 against current mainstream algorithms; the results are shown in Table 1. In Table 1:
The columns under CIFAR-10 and CIFAR-100 give the accuracy of image classification with the network structure obtained by each method; the higher the accuracy, the better the classification.
Parameters is the parameter count of the designed network structure; the smaller the parameter count, the better the network structure.
GPU Days is the search time used by each method; 1 GPU Day means one day of running on a single 1080Ti graphics card, and a smaller value means less time. GPUs is the number of graphics cards required; a smaller value means fewer GPU resources. Table 1 shows the comparison results; the results of these algorithms are taken from the data in their respective original papers.
Note that CIFAR-10 and CIFAR-100 are public datasets. The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class; the training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than from another, but together the five training batches contain exactly 5,000 images from each class. The CIFAR-100 dataset has 100 classes with 600 images each, 500 for training and 100 for testing per class. The 100 classes of CIFAR-100 are grouped into 20 superclasses, and each image carries a "fine" label (its class) and a "coarse" label (its superclass). For details see the introduction at https://www.cnblogs.com/cloud-ken/p/8456878.html.
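For reference, the two public datasets can be obtained in the usual way, assuming torchvision is available:

```python
# Standard way to download the CIFAR-10 / CIFAR-100 datasets with torchvision.
import torchvision

cifar10_train = torchvision.datasets.CIFAR10("./data", train=True, download=True)
cifar10_test = torchvision.datasets.CIFAR10("./data", train=False, download=True)
cifar100_train = torchvision.datasets.CIFAR100("./data", train=True, download=True)
cifar100_test = torchvision.datasets.CIFAR100("./data", train=False, download=True)
```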
References for the existing methods above are as follows:
The ResNet-110 method is described in K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
The FractalNet method is described in G. Larsson, M. Maire, and G. Shakhnarovich, "FractalNet: Ultra-deep neural networks without residuals," arXiv preprint arXiv:1605.07648, 2016.
The DenseNet (k=24) and DenseNet-B (k=40) methods are described in G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
The Wide ResNet method is described in S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
The ResNeXt-29 (8x64d) method is described in S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.
The Hierarchical Evolution method is described in H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu, "Hierarchical representations for efficient architecture search," in International Conference on Learning Representations, 2018.
The AmoebaNet-A method is described in E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search," in Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4780–4789, 2019.
The NASNet-A method is described in B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8697–8710, 2018.
The DARTS method is described in H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," in International Conference on Learning Representations, 2018.
The ENAS (macro) and ENAS (micro) methods are described in H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean, "Efficient neural architecture search via parameters sharing," in International Conference on Machine Learning, pages 4095–4104, PMLR, 2018.
The Block-QNN-S method is described in Z. Zhong, J. Yan, W. Wu, J. Shao, and C.-L. Liu, "Practical block-wise neural network architecture generation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2423–2432, 2018.
The TE-NAS method is described in W. Chen, X. Gong, and Z. Wang, "Neural architecture search on imagenet in four gpu hours: A theoretically inspired perspective," in International Conference on Learning Representations, 2020.
The Large-scale Evolution method is described in E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin, "Large-scale evolution of image classifiers," in International Conference on Machine Learning, pages 2902–2911, PMLR, 2017.
The AE-CNN method is described in Y. Sun, B. Xue, M. Zhang, and G. G. Yen, "Completely automated cnn architecture design based on blocks," IEEE Transactions on Neural Networks and Learning Systems, 31(4):1242–1254, 2019.
The CNN-GA method is described in Y. Sun, B. Xue, M. Zhang, G. G. Yen, and J. Lv, "Automatically designing cnn architectures using the genetic algorithm for image classification," IEEE Transactions on Cybernetics, 50(9):3840–3854, 2020.
The NAS method is described in B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," arXiv preprint arXiv:1611.01578, 2016.
The NSGA-Net method is described in Z. Lu, I. Whalen, V. Boddeti, Y. Dhebar, K. Deb, E. Goodman, and W. Banzhaf, "NSGA-Net: neural architecture search using multi-objective genetic algorithm," in Proceedings of the Genetic and Evolutionary Computation Conference, pages 419–427, 2019.
The best network structure found by the method proposed in this invention is denoted EX-Net in Table 1.
Table 1: Comparison between the method of this application and other algorithms on the CIFAR-10 and CIFAR-100 datasets, in terms of test accuracy (%), number of parameters, search cost in GPU Days, and number of GPUs used.
The analysis is as follows:
1) Comparison with hand-designed networks
As Table 1 shows, compared with state-of-the-art hand-designed network structures, the network structure EX-Net found by the method of this application is far superior to FractalNet and Wide ResNet in test accuracy and parameter count on both CIFAR-10 and CIFAR-100. Against DenseNet (k=24), EX-Net achieves better test accuracy on both datasets while using only 6.9% and 15.8% of DenseNet (k=24)'s parameters on CIFAR-10 and CIFAR-100, respectively. EX-Net has slightly more parameters than ResNet-110, but improves the test accuracy on the two datasets substantially, by 3.5% and 8.9%, respectively. Compared with DenseNet-B (k=40) and ResNeXt-29 (8x64d), EX-Net achieves better test accuracy on CIFAR-10; on CIFAR-100 its accuracy is slightly lower than theirs, but its parameter count is only 16.8% and 12.5% of DenseNet-B (k=40)'s and ResNeXt-29 (8x64d)'s, respectively, a large reduction. Compared with ResNeXt-29 (8x64d), EX-Net uses only 1/8 of the GPU resources.
Therefore, compared with state-of-the-art hand-designed network structures, the network structure EX-Net found by the method of this application achieves higher accuracy, while using far fewer parameters than most hand-designed structures.
2) Comparison with semi-automatic NAS algorithms
As Table 1 shows, among the semi-automatic NAS algorithms, the network structure EX-Net found by the method of this application is superior to Hierarchical Evolution, Block-QNN-S, and ENAS (macro) in both test accuracy and parameter count, while greatly reducing the search time cost (by a factor of 16 to 4,500). Compared with NASNet-A, EX-Net is slightly worse in test accuracy, but it has far fewer parameters, searches 100,000 times faster, and consumes only 1/500 of the GPU resources. EX-Net achieves better test accuracy than AmoebaNet-A with fewer parameters; it requires only 0.02 GPU Days, which is 1/157,500 of AmoebaNet-A's search cost and 1/450 of its GPU computing resources. DARTS and ENAS (micro) perform slightly better than EX-Net in accuracy on CIFAR-10, but EX-Net has far fewer parameters and, at the same GPU resource consumption, its search time is 75 and 25 times shorter, respectively. In addition, although EX-Net's accuracy falls short of TE-NAS, both its parameter count and its GPU Days consumption are only half of TE-NAS's.
Therefore, compared with the semi-automatic NAS algorithms, the network structure EX-Net found by the method of this application is competitive in test accuracy and superior in parameter count, and it shows a very large advantage in search time cost and required computing resources.
3) Comparison with fully automatic NAS algorithms
Compared with the fully automatic NAS algorithms, the network structure EX-Net found by the method of this application outperforms Large-scale Evolution and NAS in both accuracy and parameter count. In addition, EX-Net consumes only 0.02 GPU Days, far less than Large-scale Evolution and NAS, and requires 800 times fewer GPU resources than NAS. EX-Net also surpasses AE-CNN in test accuracy and parameter count on both CIFAR-10 and CIFAR-100, with a further improvement in search time cost and GPU resource consumption. Compared with CNN-GA, EX-Net achieves higher test accuracy with fewer parameters on CIFAR-10, and better accuracy on the more complex CIFAR-100 with a parameter count close to CNN-GA's, while its search time is only about 1/1,750 of CNN-GA's. NSGA-Net is slightly more accurate than EX-Net on CIFAR-10 (97.5% vs. 96.83%), but EX-Net uses only 1/13 of NSGA-Net's parameters (1.9M vs. 26.8M) and, with the same computing resources, searches 200 times faster.
Therefore, in the comparison with fully automatic NAS algorithms, the network structure EX-Net found by the method of this application shows a clear advantage on all objectives.
Conclusion
In summary, the network structure EX-Net found by the method of this application exceeds most hand-designed network structures in test accuracy while using fewer parameters. EX-Net also shows a clear advantage over most automatic NAS algorithms in test accuracy and parameter count, while requiring fewer GPU resources and reducing search time by a factor of 200 to 1,120,000. Compared with semi-automatic NAS algorithms, and taking into account the differences in search space and the amount of manual design involved, EX-Net's advantage in test accuracy is not pronounced, but it has far fewer parameters and greatly reduces both search time cost and computing resource consumption.
Some of the steps in the embodiments of the present invention may be implemented in software, and the corresponding software programs may be stored in a readable storage medium such as an optical disc or a hard disk.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
- A low-cost automatic neural network structure search method for image classification, characterized in that the method comprises: Step 1: for an image classification task, determine the main framework of the neural network structure and randomly generate X network structures as the population P, each individual in the population representing one randomly generated network structure; the main framework of the neural network structure comprises one standard convolution layer, unit_num Reg Unit modules, and one global average pooling layer; each Reg Unit module comprises block_num group-convolution Reg Blocks and, with a probability of 50%, an SENet module, which models the attention mechanism through Squeeze-and-Excitation; the number unit_num of Reg Unit modules, the number block_num of group-convolution Reg Blocks, the number group of branches of each Reg Block, and the width width of the second convolution layer in each branch are all generated randomly; Step 2: set the split points S1 and S2 of the three stages of the subsequent population evolution and the maximum number of generations Max_gen; Step 3: compute the condition number K_N of the NTK of each individual's network structure in population P as that individual's fitness; Step 4: the population enters evolution; tournament selection chooses individuals, a mutation operation generates new network structure individuals, and environmental selection eliminates individuals using a criterion chosen according to the stage to which the current generation G belongs; Step 5: after the maximum number of generations Max_gen is reached, select the network structure with the smallest fitness value K_N as the searched neural network structure for the image classification task.
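To make the five steps above concrete, the following is a minimal Python sketch of the search skeleton. The nested-dict encoding, the illustrative values of X, S1, S2 and Max_gen, and the helpers random_individual and ntk_condition_number are hypothetical stand-ins, not the patent's actual implementation; in particular, the real fitness is the NTK condition number K_N of the untrained network, which the placeholder below does not compute.

```python
import random

X = 20              # population size (illustrative)
S1, S2 = 10, 30     # stage split points (illustrative)
MAX_GEN = 40        # maximum number of generations (illustrative)

def random_individual():
    """Randomly draw the structural hyper-parameters named in claim 1."""
    return {
        "units": [{
            "use_senet": random.random() < 0.5,        # SENet attached with 50% probability
            "blocks": [{
                "group": random.choice([2, 4, 8]),     # branch count of a Reg Block
                "width": random.choice([16, 32, 64]),  # width of each branch's 2nd conv layer
            } for _ in range(random.randint(1, 3))],   # block_num per Reg Unit
        } for _ in range(random.randint(2, 4))],       # unit_num
        "age": 0,
    }

def ntk_condition_number(ind):
    """Placeholder: the real method scores an architecture, without any
    training, by the condition number K_N of its neural tangent kernel."""
    return random.uniform(1.0, 100.0)

population = [random_individual() for _ in range(X)]   # step 1
for ind in population:
    ind["fitness"] = ntk_condition_number(ind)         # step 3

for gen in range(1, MAX_GEN + 1):                      # step 4
    # tournament selection, mutation and staged environmental selection
    # go here; see the sketches after the later claims
    for ind in population:
        ind["age"] += 1

best = min(population, key=lambda i: i["fitness"])     # step 5: smallest K_N wins
```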
- The method according to claim 1, characterized in that each group-convolution Reg Block in a network structure contains group branches, and each branch consists of three convolution layers and one pooling layer, the pooling layer being in the third position; the first and fourth layers are convolution layers using 1×1 kernels to adjust the number of feature maps, the second convolution layer uses a 3×3 kernel to extract feature maps, and every convolution layer follows the order of convolution operation, ReLU activation function, and batch normalization layer; the pooling layer in the third position halves the size of the input data, the input data being image data.
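As an illustration of this layout, here is a hedged PyTorch sketch of a single Reg Block branch: a 1×1 convolution, a 3×3 convolution, a pooling layer that halves the spatial size, then another 1×1 convolution, with each convolution followed by ReLU and batch normalization in the order stated in the claim. The channel sizes, the choice of max pooling, and the class names are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

def conv_unit(in_ch, out_ch, kernel):
    # Claim order: convolution -> ReLU activation -> batch normalization.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2, bias=False),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
    )

class RegBranch(nn.Module):
    """One branch of a Reg Block: 1x1 conv, 3x3 conv, pooling, 1x1 conv."""
    def __init__(self, in_ch, width, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            conv_unit(in_ch, width, 1),   # layer 1: 1x1, adjusts feature-map count
            conv_unit(width, width, 3),   # layer 2: 3x3, extracts feature maps
            nn.MaxPool2d(2),              # layer 3: halves the spatial size
            conv_unit(width, out_ch, 1),  # layer 4: 1x1, adjusts feature-map count
        )

    def forward(self, x):
        return self.body(x)

# Quick shape check: a 32x32 CIFAR-style input comes out at 16x16.
y = RegBranch(3, 32, 64)(torch.randn(1, 3, 32, 32))
assert y.shape == (1, 64, 16, 16)
```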
- The method according to claim 3, characterized in that, in step 4, selecting different criteria for environmental selection to eliminate individuals according to the stage to which the current generation G belongs comprises: in the first and third stages, i.e. when 0 < G ≤ S1 and when S2 < G ≤ Max_gen, using an individual's fitness K_N as the criterion for elimination; and in the second stage, i.e. when S1 < G ≤ S2, using an individual's age as the criterion, the age of an individual being the number of generations of evolution it has survived.
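A minimal sketch of this staged criterion, reusing the hypothetical dict encoding from the sketch after claim 1 (individuals carry "fitness" and "age" keys):

```python
def removal_key(ind, gen, s1, s2, max_gen):
    """Sort key under which larger values are eliminated first."""
    if 0 < gen <= s1 or s2 < gen <= max_gen:  # stages one and three
        return ind["fitness"]                  # larger K_N is worse
    if s1 < gen <= s2:                         # stage two
        return ind["age"]                      # older individuals go first
    raise ValueError("generation outside 0 < gen <= max_gen")

def environmental_select(population, t, gen, s1, s2, max_gen):
    # Drop the t worst individuals under the current stage's criterion.
    ranked = sorted(population, key=lambda i: removal_key(i, gen, s1, s2, max_gen))
    return ranked[:len(population) - t]
```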
- The method according to claim 4, characterized in that the population evolution process comprises: randomly selecting k individuals from the population; from these k individuals, selecting the t individuals with the best fitness values K_N as parent individuals; the t parent individuals generating t offspring individuals through a set of mutation operators; the offspring individuals being evaluated after generation and added to the existing population; in environmental selection, eliminating individuals using the criterion corresponding to the stage of the current generation; and eliminating the t worst individuals under the current criterion so that the population size remains unchanged, the remaining individuals forming the new population that enters the next generation of evolution.
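One generation of this process might look as follows. This is a hedged sketch: the mutation operator, the NTK fitness, and the staged environmental selection are passed in as callables (they are sketched, respectively, after the mutation claims below, after claim 1, and after the previous claim), and all of them are hypothetical stand-ins.

```python
import random

def evolve_one_generation(population, k, t, gen, mutate_fn, fitness_fn, select_fn):
    """One generation: k-way tournament, t parents, t offspring, then
    staged environmental selection keeps the population size constant."""
    contenders = random.sample(population, k)        # draw k random individuals
    contenders.sort(key=lambda ind: ind["fitness"])  # smaller K_N is better
    parents = contenders[:t]                         # the t fittest become parents
    offspring = [mutate_fn(p) for p in parents]      # one child per parent
    for child in offspring:
        child["fitness"] = fitness_fn(child)         # evaluate the new individuals
        child["age"] = 0
    return select_fn(population + offspring, t, gen) # cull the t worst for this stage
```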
- The method according to claim 5, characterized in that the t parent individuals generating t offspring individuals through a set of mutation operators, and the offspring individuals being evaluated after generation and added to the existing population, comprises: randomly selecting a mutation position pos_ij within the length of a parent individual, pos_ij denoting the position of the j-th Reg Block in the i-th Reg Unit, the position being determined by the order of the Reg Unit in the network structure and the order of the Reg Block within the Reg Unit; and randomly selecting one mutation operator to perform the mutation of the parent individual, the mutation operators comprising an add operator, a remove operator, and a change operator; add operator: add a Reg Block with randomly set parameters at the mutation position pos_ij; remove operator: remove the Reg Block at the mutation position pos_ij; change operator: randomly change the parameters of the Reg Block at the mutation position pos_ij.
- The method according to claim 6, characterized in that, when applying the add operator, if the length of the parent individual has reached its upper limit, the add operator cannot be applied and only the remove operator or the change operator may be selected; and when applying the remove operator, if the length of the parent individual has reached its lower limit, the remove operator cannot be applied and only the add operator or the change operator may be selected.
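Putting the two claims above together, here is a hedged sketch of the three mutation operators with the length limits guarded. The bounds, the block encoding, and the helper random_block are illustrative assumptions consistent with the earlier sketches, not the patent's actual parameter ranges.

```python
import copy
import random

MIN_BLOCKS, MAX_BLOCKS = 3, 12   # illustrative length bounds per individual

def random_block():
    """A Reg Block with randomly set parameters (group and width)."""
    return {"group": random.choice([2, 4, 8]), "width": random.choice([16, 32, 64])}

def total_blocks(ind):
    return sum(len(u["blocks"]) for u in ind["units"])

def mutate(parent):
    child = copy.deepcopy(parent)
    unit = random.choice(child["units"])       # pick Reg Unit i
    j = random.randrange(len(unit["blocks"]))  # pick position pos_ij within it
    ops = ["add", "remove", "change"]
    if total_blocks(child) >= MAX_BLOCKS:
        ops.remove("add")                      # at the upper limit: no add operator
    if total_blocks(child) <= MIN_BLOCKS or len(unit["blocks"]) == 1:
        ops.remove("remove")                   # at the lower limit: no remove operator
    op = random.choice(ops)
    if op == "add":
        unit["blocks"].insert(j, random_block())  # add a randomly configured Reg Block
    elif op == "remove":
        unit["blocks"].pop(j)                     # remove the Reg Block at pos_ij
    else:
        unit["blocks"][j] = random_block()        # change the Reg Block's parameters
    return child
```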
- An image classification method, characterized in that the method performs image classification using a neural network structure searched out by the method according to any one of claims 1-7.
- The method according to claim 8, characterized in that the method comprises: inputting the image to be classified into the neural network structure, and extracting features of the image through the standard convolution layer; performing further feature extraction through the unit_num Reg Unit modules, the output of each group-convolution Reg Block in each Reg Unit module being formed by concatenating the output features of each branch with a residual connection, the result then passing through the SENet module with a probability of 50% to obtain the feature maps; flattening the feature maps output by the Reg Units into a feature vector through the global average pooling layer; and finally converting the feature vector into the final classification result through a fully connected layer with a softmax layer serving as the classifier.
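As a sketch of this pipeline, the following PyTorch fragment shows a Squeeze-and-Excitation module and the top-level flow (stem convolution, Reg Units, global average pooling, softmax classifier). It omits the per-block branch concatenation and residual connection, and all layer sizes, the reduction ratio, and the class names are illustrative assumptions rather than the patent's implementation.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-Excitation: channel attention via global pooling + two FCs."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool to (N, C)
        return x * w[:, :, None, None]   # excite: rescale each channel

class SearchedNet(nn.Module):
    def __init__(self, units, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)  # standard convolution layer
        self.units = nn.Sequential(*units)          # unit_num Reg Unit modules
        self.head = nn.LazyLinear(num_classes)      # FC classifier (claim: FC + softmax)

    def forward(self, x):
        x = self.units(self.stem(x))
        x = x.mean(dim=(2, 3))                      # global average pooling -> vector
        return torch.softmax(self.head(x), dim=1)   # class probabilities

# Example: units could be built from the RegBranch sketch after claim 2;
# an identity stand-in keeps this sketch runnable on its own.
net = SearchedNet(units=[nn.Identity()], num_classes=10)
probs = net(torch.randn(2, 3, 32, 32))  # -> (2, 10) class probabilities
```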
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111669013.4A (published as CN114299344A, zh) | 2021-12-31 | 2021-12-31 | Low-cost automatic neural network structure search method for image classification
CN202111669013.4 | 2021-12-31 | |
Publications (1)

Publication Number | Publication Date
---|---
WO2023124342A1 (zh) | 2023-07-06
Family
ID=80973023
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2022/123299 (WO2023124342A1, zh) | Low-cost automatic neural network structure search method for image classification | 2021-12-31 | 2022-09-30
Country Status (2)

Country | Link
---|---
CN | CN114299344A (zh)
WO | WO2023124342A1 (zh)
Cited By (1)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN118153633A (zh) | 2023-07-14 | 2024-06-07 | Tianjin University | An improved CNN architecture optimization design method
Families Citing this family (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN114299344A (zh) | 2021-12-31 | 2022-04-08 | Jiangnan University | Low-cost automatic neural network structure search method for image classification
CN114926698B (zh) | 2022-07-19 | 2022-10-14 | Shenzhen South Silicon Valley Semiconductor Co., Ltd. | Image classification method using neural architecture search based on evolutionary game theory
Citations (9)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN105279555A (zh) | 2015-10-28 | 2016-01-27 | Tsinghua University | Adaptive learning neural network implementation method based on an evolutionary algorithm
US20180137404A1 | 2016-11-15 | 2018-05-17 | International Business Machines Corporation | Joint learning of local and global features for entity linking via neural networks
US20190286984A1 | 2018-03-13 | 2019-09-19 | Google LLC | Neural architecture search by proxy
CN111414849A (zh) | 2020-03-19 | 2020-07-14 | Sichuan University | Face recognition method based on evolutionary convolutional neural networks
CN111415009A (zh) | 2020-03-19 | 2020-07-14 | Sichuan University | Genetic-algorithm-based network structure search method for convolutional variational autoencoders
CN111898689A (zh) | 2020-08-05 | 2020-11-06 | Central South University | Image classification method based on neural architecture search
CN112465120A (zh) | 2020-12-08 | 2021-03-09 | Shanghai Ulucu Electronic Technology Co., Ltd. | Fast attention neural architecture search method based on an evolutionary approach
CN112561039A (zh) | 2020-12-26 | 2021-03-26 | Shanghai Ulucu Electronic Technology Co., Ltd. | Improved supernet-based evolutionary neural architecture search method
CN114299344A (zh) | 2021-12-31 | 2022-04-08 | Jiangnan University | Low-cost automatic neural network structure search method for image classification
Application timeline:
- 2021-12-31: Chinese application CN202111669013.4A filed; published as CN114299344A (zh); status: pending
- 2022-09-30: PCT application PCT/CN2022/123299 filed; published as WO2023124342A1 (zh); status: unknown
Also Published As

Publication Number | Publication Date
---|---
CN114299344A (zh) | 2022-04-08
Similar Documents

Publication | Publication Date | Title
---|---|---
WO2023124342A1 (zh) | 2023-07-06 | Low-cost automatic neural network structure search method for image classification
CN110263227B (zh) | | Gang discovery method and system based on graph neural networks
Wang et al. | | Architecture evolution of convolutional neural network using monarch butterfly optimization
CN111882040A (zh) | | Convolutional neural network compression method based on channel-count search
WO2023217290A1 (zh) | | Gene phenotype prediction based on graph neural networks
US20230153577A1 | | Trust-region aware neural network architecture search for knowledge distillation
CN112199536A (zh) | | Fast cross-modal multi-label image classification method and system
CN115661550A (zh) | | Method and apparatus for class-imbalanced classification of graph data based on generative adversarial networks
CN110033089A (zh) | | Method and system for optimizing deep neural network parameters based on estimation-of-distribution algorithms
Loni et al. | | ADONN: adaptive design of optimized deep neural networks for embedded systems
WO2023091428A1 | | Trust-region aware neural network architecture search for knowledge distillation
Zhu et al. | | Saswot: Real-time semantic segmentation architecture search without training
CN114897085A (zh) | | Clustering method and computer device based on closed-subgraph link prediction
CN114329124A (zh) | | Semi-supervised few-shot classification method based on gradient re-optimization
CN111783688B (zh) | | Remote sensing image scene classification method based on convolutional neural networks
Zhong et al. | | Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition
CN116701647A (zh) | | Knowledge graph completion method and apparatus fusing embedding vectors with transfer learning
Wang et al. | | psoResNet: An improved PSO-based residual network search algorithm
CN111553442A (zh) | | Method and system for optimizing the label sequence of classifier chains
Tang et al. | | Training Compact DNNs with ℓ1/2 Regularization
Hao et al. | | Architecture self-attention mechanism: Nonlinear optimization for neural architecture search
Xia et al. | | Efficient synthesis of compact deep neural networks
CN113537325B (zh) | | Deep learning method for image classification based on extracting high- and low-level feature logic
CN114595641A (zh) | | Method and system for solving combinatorial optimization problems
CN115063374A (zh) | | Model training and face image quality scoring methods, electronic device, and storage medium
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22913607; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE