CN112465120A - Fast attention neural network architecture searching method based on evolution method - Google Patents
Fast attention neural network architecture searching method based on evolution method
- Publication number: CN112465120A
- Application number: CN202011424217.7A
- Authority: CN (China)
- Prior art keywords: individual, population, neural network, network architecture, neuron
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045: Combinations of networks
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06N3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
The invention provides a fast attention neural network architecture search method based on an evolutionary method, which comprises the following steps: (1) generating a neural network architecture search space containing an attention mechanism based on a predetermined encoding scheme and a population initialization scheme; (2) using an evolutionary method as the search method and the image classification accuracy on a verification set as the optimization target, simultaneously optimizing the structure of each individual and the weights of the one-shot model through the evolutionary mechanism and back-propagated gradient values; after the search task of the evolutionary method is finished, ranking the individuals in the population and retaining the individual with the largest fitness value as the optimal search result; (3) decoding the individual found by the evolutionary search to generate a neural network architecture, resetting the structure weights, training the neural network architecture on the training data set until convergence, and testing its performance.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a fast attention neural network architecture searching method based on an evolution method.
Background
Deep neural networks have achieved significant success on various computer vision tasks such as object classification, object detection, object segmentation, and object tracking, among which object classification is the basis of the other tasks. The performance of a deep neural network depends to a large extent on its architecture. Therefore, to obtain the best performance from a deep neural network, human experts are usually required to manually tune the model architecture using expert knowledge and the corresponding data set. Architecture tuning, training, and evaluation of a neural network form an iterative process that must be repeated and continuously optimized. This process not only consumes a large amount of labor and time, but also raises the barrier to applying artificial intelligence technology in traditional industries such as medical care, education, and finance. Therefore, methods that automatically generate network models, namely Neural Architecture Search (NAS) techniques, have attracted extensive attention from researchers.
NAS can automatically generate a deep neural network architecture for a given task objective and data set, reducing the labor and time cost of building a neural network architecture by hand. NAS generally consists of three steps: first, a search space, i.e., a set of candidate neural network architectures, is defined; second, a search method is used to explore promising neural network architectures in the search space; third, the explored neural network architectures are evaluated. The goal of NAS is to find a network architecture with excellent performance in a huge search space. The second and third steps therefore form an iterative loop: the evaluation result of a model architecture is usually fed back to the search method to guide it toward more effective neural network architectures. When the iterations are complete, the architecture with the best evaluated performance is taken as the output of the method. The most common way to evaluate an architecture is to train it on the training data set until convergence and then test its performance on a validation set. Each round of training and evaluation of an architecture consumes significant computational resources and time. This computational bottleneck has made it difficult to popularize NAS further. Therefore, improving the efficiency of neural network architecture search and reducing its computational cost is an urgent issue for NAS.
Mainstream NAS search methods currently fall into three categories: reinforcement-learning-based NAS, gradient-based NAS, and evolution-based NAS. Reinforcement-learning-based NAS typically requires building a controller that samples model architectures from the search space. Because the quality of the sampled architectures is determined by the controller, the controller must repeatedly evaluate different architectures and be updated iteratively in order to generate effective architectures. Gradient-based NAS relies on a continuous relaxation of the architecture representation, converts architecture search into an optimization problem over a continuous space, and uses gradient descent to optimize the network architecture and the network parameters simultaneously. The essence of evolution-based NAS is to explore the search space through natural selection. The neural network architectures in the search space are evolved as a population, where each individual is a neural network architecture and the performance of the architecture on the verification set is used as the fitness value of the individual. During evolution, one or more individuals are selected as parents according to their fitness values, and offspring individuals are then generated through crossover and mutation operators. After the offspring individuals have been evaluated, they join the parent population, and the next generation population is generated through environmental selection. This process is repeated until the preset number of generations has been reached. Finally, the individual with the best fitness value is taken as the output of the evolutionary method.
To improve the search efficiency of NAS methods, super-network-based architecture search (one-shot architecture search) has become a focus of attention among researchers. Super-network-based architecture search generally treats the search space as a super-network model (one-shot model), and the neural network architectures contained in the search space are treated as sub-networks of the super-network. The technique typically trains the super-network to generate a weight for each operation, and the sub-networks are then evaluated by sharing the weights of the super-network, which reduces the computational cost of network evaluation. Chinese patent CN110851566 proposes an improved differentiable network architecture search method. The method is a gradient-based NAS technique that optimizes all possible edges and operations in the whole super-network and thereby determines the discretized optimal sub-network. In addition, the method uses a global normalization operation to reduce the influence of local bias in the network, and alleviates the problem that the two-level optimization in traditional differentiable architecture search aggravates weight coupling and the mutual competition between weights in the later stage of the search, which makes the network difficult to train to convergence. Chinese patent CN110569972 proposes a method and an apparatus for constructing the search space of a super-network (one-shot) model, and an electronic device. The method constructs a super-network by stacking unit structures of two types: normal units and down-sampling units. The optimization target of the search method is thus converted from optimizing the overall architecture of the neural network into optimizing the internal structure of these two unit types, which further reduces the computational cost of structure optimization and improves search efficiency.
However, current super-network-based neural network architecture search techniques have some limitations. First, when the super-network is large, training it to convergence takes a great deal of time. Second, because each neural network structure shares the weights of the super-network for performance evaluation, a large bias may be introduced during the evaluation, so the performance of a structure may be underestimated and the performance ranking of structures may be inaccurate, and the performance of the final structure cannot be guaranteed.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a fast attention neural network architecture search method based on an evolutionary method and applying it to computer vision tasks. The method builds a one-shot-model-based search space from the initial population of the evolutionary method, which alleviates the difficulty and excessive time of training an overly large one-shot model. The method uses the evolutionary method as the search method and, during the evolutionary search, simultaneously optimizes the structure of each individual and the weights of the one-shot model through the evolutionary mechanism and back-propagated gradient values, which effectively improves the efficiency of the evolutionary search. The invention encodes a lightweight channel attention mechanism into the search space and adaptively integrates it into the neural network architecture during the search, thereby further improving the performance of the final neural network architecture.
To achieve this purpose, the technical scheme adopted by this application is as follows: a fast attention neural network architecture search method based on an evolutionary method, comprising the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) using an evolutionary method as the search method and the image classification accuracy on a verification set as the optimization target, simultaneously optimizing the structure of each individual and the weights of the one-shot model through the evolutionary mechanism and back-propagated gradient values; after the search task of the evolutionary method is finished, ranking the individuals in the population and retaining the individual with the largest fitness value as the optimal search result;
(3) decoding the individuals obtained by the evolutionary search to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
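As a rough outline only, the three steps above can be arranged as the following Python skeleton; every function here is a placeholder standing in for the corresponding step, not an implementation defined by this application.

```python
# Placeholders standing in for the three steps described above (illustration only).
def build_search_space(encoding_scheme, init_scheme):        # step (1): initial population = search space
    return ["individual-%d" % i for i in range(25)]

def evolutionary_search(population, valid_set):              # step (2): evolve structures + one-shot weights
    return population[0]                                      # stand-in for the fittest individual

def decode(individual):                                       # step (3): individual -> neural architecture
    return {"architecture": individual}

def train_until_convergence(architecture, train_set):         # step (3): reset weights and retrain
    return architecture

def evaluate(architecture, test_set):                          # step (3): report test performance
    return 0.0

population = build_search_space("integer-quadruple encoding", "uniform initialization")
best = evolutionary_search(population, valid_set=None)
final_architecture = train_until_convergence(decode(best), train_set=None)
test_accuracy = evaluate(final_architecture, test_set=None)
```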
In the step (1), the predetermined coding scheme is a one-shot model coding method based on an evolutionary method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the minimum unit in a neural network architecture, and each computing neuron contains two computing nodes; neuron i is encoded as an integer quadruple (in_a, in_b, op_a, op_b), where in_a and in_b indicate the indices of the neurons connected to neuron i, i.e., the outputs of neurons in_a and in_b serve as the two inputs I_a and I_b of neuron i, and op_a and op_b indicate the two computing node types O_a and O_b contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output calculation formula of neuron i is:
H_i = O_a(I_a) + O_b(I_b)
where H_i represents the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computing nodes contained within neuron i; I_a and I_b are processed by O_a and O_b respectively, and the outputs of the two computing nodes, O_a(I_a) and O_b(I_b), are added to form the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the coding structure of a sub-network block is then the concatenation of the codes of its M neurons:
block = (neuron_1, neuron_2, ..., neuron_M)
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the coding structure of a neural network architecture is then the concatenation of the codes of its N sub-network blocks:
network = (block_1, block_2, ..., block_N)
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture; the coding structure of a neural network architecture is called an individual in a population.
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until reaching a predetermined population scale of the initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
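As an illustration only, the following Python sketch shows one way this integer-quadruple encoding and the uniform random initialization could be realized; the names (NODE_TYPES, Neuron, random_individual, init_population) and the concrete candidate node list are assumptions made for the example, not definitions taken from this application.

```python
import random
from dataclasses import dataclass

# Hypothetical set of candidate computing-node types; the quadruple stores indices into this list.
NODE_TYPES = ["identity", "dw_conv_3x3", "dw_conv_5x5", "max_pool_3x3",
              "avg_pool_3x3", "feature_reconstruction_3x3", "feature_reconstruction_5x5"]

@dataclass
class Neuron:
    in_a: int   # index of the first input neuron (or block input)
    in_b: int   # index of the second input neuron (or block input)
    op_a: int   # index into NODE_TYPES for the first computing node
    op_b: int   # index into NODE_TYPES for the second computing node

def random_neuron(position: int) -> Neuron:
    """Sample one neuron quadruple uniformly; inputs may come from any earlier position."""
    return Neuron(in_a=random.randint(0, position),
                  in_b=random.randint(0, position),
                  op_a=random.randrange(len(NODE_TYPES)),
                  op_b=random.randrange(len(NODE_TYPES)))

def random_individual(num_blocks: int, neurons_per_block: int):
    """An individual is N sub-network blocks, each a list of M neuron quadruples."""
    return [[random_neuron(i) for i in range(neurons_per_block)]
            for _ in range(num_blocks)]

def init_population(pop_size: int, num_blocks: int = 4, neurons_per_block: int = 5):
    """Uniformly sample individuals until the predetermined population size is reached."""
    return [random_individual(num_blocks, neurons_per_block) for _ in range(pop_size)]

population = init_population(pop_size=25)
```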
In the step (1), feature reconstruction computing nodes containing the attention mechanism are adaptively integrated into the neural network architecture during the evolutionary search, which improves the expression capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism; the 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expression capability of the neural network; the specific steps are as follows:
(401) the structure of the feature reconstruction computing node comprises: a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module; the process of feature reconstruction is as follows (an illustrative code sketch is given after step (408)):
(402) the 2-dimensional convolutional layer extracts the feature information of the input feature maps and converts the input feature map set into the conversion feature map set u, where H'×W' represents the size of the input feature maps, s represents the number of channels of the input feature map set, H×W represents the size of the converted feature maps, and c represents the number of channels of the conversion feature map set;
(403) the conversion feature map set is input into the global average pooling layer, which extracts the global feature of each feature map; the global average pooling layer converts u into a one-dimensional vector z = {z_1, z_2, ..., z_c} representing the features of the c channels, computed as
z_m = (1 / (H × W)) · Σ_{p=1..H} Σ_{q=1..W} u_m(p, q),  m = 1, 2, ..., c;
(404) the one-dimensional convolutional layer completes the feature mapping between adjacent channels; the feature mapping formula is
F_l = C1D_k(z_l)
where C1D denotes the one-dimensional convolutional layer; k denotes the size of its convolution kernel; z_l denotes the feature of the l-th channel, with z_l ∈ z based on step (403); the feature set after the one-dimensional convolutional layer mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l is the feature of the l-th channel after the mapping and F_l ∈ F;
(405) a sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) where w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature value of the m-th channel, with F_m ∈ F based on step (404);
(407) each channel is given its corresponding weight through the multiplication module; the formula of the multiplication module is
U = u * w
where u is the conversion feature map set based on step (402); * denotes the matrix dot (channel-wise) product; w = {w_1, w_2, ..., w_c} is the channel weight set based on step (405);
(408) U is used as the output of the feature reconstruction convolutional layer.
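For illustration, the PyTorch sketch below shows one way such a feature reconstruction node could be assembled from the five components listed in step (401). The class name FeatureReconstructionNode, the kernel sizes, and the use of nn.Conv1d with adaptive average pooling are assumptions made for this example, not definitions taken from the patent.

```python
import torch
import torch.nn as nn

class FeatureReconstructionNode(nn.Module):
    """Sketch: 2-D conv followed by lightweight channel attention (GAP -> 1-D conv -> sigmoid -> scale)."""

    def __init__(self, in_channels: int, out_channels: int, conv_kernel: int = 3, k: int = 3):
        super().__init__()
        # 2-dimensional convolutional layer: converts the input set (s channels) into u (c channels).
        self.conv2d = nn.Conv2d(in_channels, out_channels, conv_kernel,
                                padding=conv_kernel // 2, bias=False)
        # Global average pooling layer: produces the per-channel vector z.
        self.gap = nn.AdaptiveAvgPool2d(1)
        # 1-dimensional convolutional layer: maps features of adjacent channels (F_l = C1D_k(z_l)).
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        # Sigmoid layer: turns F into the channel weight set w.
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.conv2d(x)                                   # (B, c, H, W)
        z = self.gap(u)                                      # (B, c, 1, 1)
        f = self.conv1d(z.squeeze(-1).transpose(1, 2))       # (B, 1, c): adjacent-channel mapping
        w = self.sigmoid(f).transpose(1, 2).unsqueeze(-1)    # (B, c, 1, 1): channel weights
        return u * w                                         # multiplication module: U = u * w

# Example: reconstruct a 32-channel feature map set from a 16-channel input.
node = FeatureReconstructionNode(in_channels=16, out_channels=32, conv_kernel=3, k=3)
out = node(torch.randn(2, 16, 28, 28))   # -> shape (2, 32, 28, 28)
```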
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS); the structure of each individual and the weights of the one-shot model are optimized through the evolutionary mechanism and back-propagated gradient values, where the individuals in the parent population are trained with a sampling training strategy to generate weights for the neurons of the one-shot model; an offspring population, i.e., a new set of neural network structure topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training; that is, the goal of SIENAS is to adjust the computing nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole; the method specifically comprises the following steps (a code sketch of the overall generation loop is given after this list):
(501) take an image classification data set of labeled image-label pairs and divide it into a training data set D_train, a verification data set D_valid, and a test data set D_test;
(502) initialize the SIENAS strategy parameters, including: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial channel number C of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G; initialize the SGD optimizer parameters, including: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m;
(503) take the training data set D_train and the verification data set D_valid of the image classification data set as the input of SIENAS; D_train contains train_number pictures in total;
(504) let g = 0;
(505) take the g-th generation population as the g-th generation parent population, train all individuals in the parent population using the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all neuron weights in the one-shot model; the steps are as follows:
1) divide D_train into several batches of data (mini-batches) according to batch_size; D_train then contains j batches of data in total, calculated as j = ⌈train_number / batch_size⌉;
2) let f = 1;
3) train the individual on the f-th batch of training data, calculate the loss value using the cross-entropy loss function, and optimize the weights of the neurons in the individual through the back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise, let f = f+1 and return to step 3);
5) stopping training, and storing the weight values of all neurons in the parent population;
(506) after the step (505) is finished, evaluating the fitness values of all individuals in the parent population and recording; the fitness value of the parent individual is the classification accuracy of the individual based on the verification data set;
(507) based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals using the node inheritance strategy and take them as the offspring population Q of the g-th generation; the node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolutionary mechanism, and assigns the weights of the neurons in the parent individuals to the corresponding neurons of the offspring individuals; as shown in Fig. 5, the steps are as follows:
1) selection operation:
randomly select two individuals from the parent population using a tournament selection strategy and retain the one with the higher fitness value; repeat this process until two parent individuals have been selected;
2) crossover operation:
generate a random value k_c;
if k_c ≤ P_c, perform a single-point crossover on the two selected individuals to generate two new individuals, and store the new individuals in the offspring population Q as offspring individuals;
if k_c > P_c, perform no operation on the selected individuals and store them in the offspring population Q as offspring individuals;
3) repeat step 1) and step 2) until the number of individuals in the offspring population Q reaches P_n;
4) mutation operation:
for the offspring population Q of the g-th generation, randomly generate a mutation value k_m for each individual in Q;
if k_m ≤ P_m, perform a swap mutation on the individual to generate a new individual, store the new individual in the offspring population Q, and delete the original individual;
if k_m > P_m, perform no operation on the individual;
5) weight inheritance operation:
based on step 4), each neuron in every offspring individual inherits the weight of the corresponding neuron in its parent individual as its initial weight;
(508) based on step (507), evaluating fitness values of the offspring individuals and recording; the fitness value of the offspring individual is the classification accuracy of the individual on the verification data set;
(509) merge the g-th generation parent population and the offspring population Q; the g-th generation population then contains 2P_n individuals;
(510) based on step (509), rank the individuals according to their fitness values and generate the (g+1)-th generation population through an environmental selection strategy; the population size of the (g+1)-th generation population must be the same as that of the g-th generation population;
(511) check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise, let g = g+1 and return to step (505);
(512) the search ends, and the individual with the highest fitness value in the final generation population is output.
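The per-generation loop of steps (504)-(512) can be summarized by the Python sketch below, placed after the list as noted above. The functions sample_train, fitness, node_inheritance, and environmental_selection are deliberately trivial stand-ins for the operations described in the steps; every name and value here is an assumption for illustration rather than the patented implementation.

```python
import math
import random

# Stubs standing in for the components described above (assumed for this sketch only).
def sample_train(individual, D_train, batch_index):   # step (505): one mini-batch of back-propagation
    pass

def fitness(individual, D_valid):                     # steps (506)/(508): accuracy on the verification set
    return random.random()                            # placeholder value

def node_inheritance(parents, fits, size):            # step (507): selection/crossover/mutation/weights
    return [random.choice(parents) for _ in range(size)]

def environmental_selection(ranked_pairs, size):      # step (510): keep the best `size` individuals
    return [individual for individual, _ in ranked_pairs[:size]]

def sienas_generation_loop(population, D_train, D_valid, G, P_n, batch_size, train_number):
    """Sketch of steps (504)-(512): evolve for G generations and return the best individual."""
    j = math.ceil(train_number / batch_size)            # number of mini-batches, step (505) 1)
    for g in range(G):
        for individual in population:                    # sampling training strategy, step (505)
            for f in range(j):
                sample_train(individual, D_train, f)
        parent_fitness = [fitness(ind, D_valid) for ind in population]           # step (506)
        offspring = node_inheritance(population, parent_fitness, P_n)            # step (507)
        offspring_fitness = [fitness(ind, D_valid) for ind in offspring]         # step (508), no training
        merged = list(zip(population + offspring, parent_fitness + offspring_fitness))   # step (509)
        merged.sort(key=lambda pair: pair[1], reverse=True)                      # step (510): rank
        population = environmental_selection(merged, P_n)
    return max(population, key=lambda ind: fitness(ind, D_valid))                # step (512)

best = sienas_generation_loop(population=list(range(25)), D_train=None, D_valid=None,
                              G=3, P_n=25, batch_size=128, train_number=40000)
```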
Beneficial effects:
according to the method, the search space based on the one-shot model is established through the initial population in the evolution method, and the problems that the one-shot model is too large, so that the training is difficult and the training time is too long are solved. The method takes the evolution method as a search method, optimizes the structure of the individual and the weight of the one-shot model simultaneously through the evolution mechanism and the back propagation gradient value in the process of evolution search, and provides a sampling training strategy for the parent individual and a node inheritance strategy for the offspring individual, so that the parent individual can be trained for multiple times based on different mini-batch of a training set, the offspring individual directly inherits the weight of the parent individual as an initial weight, and the fitness value can be evaluated without additional training. The improvement effectively improves the searching efficiency of the method, reduces the deviation introduced in the structure evaluation process, enables the prediction performance of the model to be closer to the real performance of the model, and improves the reliability of the model ranking. The invention codes the attention mechanism of the light-weight channel into a search space, adaptively integrates the light-weight channel into the neural network architecture through the method, adaptively calibrates the channel response, strengthens the weight of the effective channel, weakens redundant information in the channel, improves the expression capability of the neural network architecture, and further improves the performance of the final neural network architecture. Compared with the existing neural network architecture method based on the evolution method, the method obtains better results in CIFAR10, CIFAR100 and ImageNet image classification tasks.
Drawings
FIG. 1 is a diagram of a neural network architecture in accordance with the present invention.
FIG. 2 is a schematic diagram of a feature reconstruction convolution layer.
Fig. 3 is a schematic diagram of sub-network block encoding and decoding.
Fig. 4 is a schematic flow diagram of SIENAS.
FIG. 5 is a schematic diagram of a node inheritance policy.
FIG. 6 is a neural network architecture searched based on CIFAR10.
FIG. 7 is a neural network architecture optimization process based on the CIFAR10 classification task.
FIG. 8 is a neural network architecture optimization process based on the CIFAR100 classification task.
Detailed Description
The implementation of the present invention is further described below with reference to the accompanying drawings. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents likewise fall within the scope defined by the appended claims.
A fast attention neural network architecture searching method based on an evolution method comprises the following steps:
(1) a neural network architecture search space containing an attention mechanism is generated based on a predetermined encoding scheme and a population initialization scheme.
(2) Using the evolutionary method as the search method and the image classification accuracy on the verification set as the optimization target, the structure of each individual and the weights of the one-shot model are optimized simultaneously through the evolutionary mechanism and back-propagated gradient values. After the search task of the evolutionary method is finished, the individuals in the population are ranked, and the individual with the largest fitness value is retained as the optimal search result.
(3) Decoding the searched individuals to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
The predetermined coding scheme in the step (1) is a one-shot model coding method based on an evolution method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron.
(202) Neurons are the smallest units in a neural network architecture, and each computing neuron contains two computing nodes. Neuron i is encoded as an integer quadruple (in_a, in_b, op_a, op_b), where in_a and in_b indicate the indices of the neurons connected to neuron i, i.e., the outputs of neurons in_a and in_b serve as the two inputs I_a and I_b of neuron i, and op_a and op_b indicate the two computing node types O_a and O_b contained in neuron i, which process the two inputs of neuron i respectively.
(203) The output calculation formula of neuron i is:
H_i = O_a(I_a) + O_b(I_b)
where H_i represents the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computing nodes contained within neuron i; I_a and I_b are processed by O_a and O_b respectively, and the outputs of the two computing nodes, O_a(I_a) and O_b(I_b), are added to form the output H_i of neuron i.
(204) A sub-network block contains M neurons, M being an integer greater than one. The coding structure of a sub-network block is the concatenation of the codes of its M neurons: block = (neuron_1, neuron_2, ..., neuron_M).
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one. The coding structure of a neural network architecture is the concatenation of the codes of its N sub-network blocks: network = (block_1, block_2, ..., block_N).
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture. The coding structure of a neural network architecture is called an individual in a population. Fig. 3 is a diagram illustrating an example of encoding and decoding a sub-network block.
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until a predetermined population size of the initial population is reached. And all individuals in the initial population form a one-shot model and cover the whole search space. That is, each individual is a sub-model of the one-shot model.
In the step (1), feature reconstruction computing nodes containing the attention mechanism are adaptively integrated into the neural network architecture during the evolutionary search, which improves the expression capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism. The 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expression capability of the neural network.
(401) The structure of the feature reconstruction computing node comprises: a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module. The process of feature reconstruction is as follows:
(402) The 2-dimensional convolutional layer extracts the feature information of the input feature maps and converts the input feature map set into the conversion feature map set u, where H'×W' represents the size of the input feature maps, s represents the number of channels of the input feature map set, H×W represents the size of the converted feature maps, and c represents the number of channels of the conversion feature map set.
(403) The conversion feature map set is input into the global average pooling layer, which extracts the global feature of each feature map. The global average pooling layer converts u into a one-dimensional vector z = {z_1, z_2, ..., z_c} representing the features of the c channels, computed as
z_m = (1 / (H × W)) · Σ_{p=1..H} Σ_{q=1..W} u_m(p, q),  m = 1, 2, ..., c.
(404) The one-dimensional convolutional layer completes the feature mapping between adjacent channels. The feature mapping formula is
F_l = C1D_k(z_l)
where C1D denotes the one-dimensional convolutional layer, k denotes the size of its convolution kernel, and z_l denotes the feature of the l-th channel, with z_l ∈ z based on step (403). The feature set after the one-dimensional convolutional layer mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l is the feature of the l-th channel after the mapping and F_l ∈ F.
(405) A sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) where w_m denotes the weight of the m-th channel, w_m ∈ w, and F_m denotes the feature value of the m-th channel, with F_m ∈ F based on step (404).
(407) Each channel is given its corresponding weight through the multiplication module. The formula of the multiplication module is
U = u * w
where u is the conversion feature map set based on step (402), * denotes the matrix dot (channel-wise) product, and w = {w_1, w_2, ..., w_c} is the channel weight set based on step (405).
(408) U is used as the output of the feature reconstruction convolutional layer, as shown in Fig. 2.
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS). The structure of each individual and the weights of the one-shot model are optimized through the evolutionary mechanism and back-propagated gradient values, where the individuals in the parent population are trained with a sampling training strategy to generate weights for the neurons of the one-shot model. An offspring population, i.e., a new set of neural network structure topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training. That is, the goal of SIENAS is to adjust the computing nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole. As shown in Fig. 4, the steps are as follows:
(501) Take an image classification data set of labeled image-label pairs and divide it into a training data set D_train, a verification data set D_valid, and a test data set D_test.
(502) Initialize the SIENAS strategy parameters, including: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial channel number C of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G. Initialize the SGD optimizer parameters, including: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m.
(503) Take the training data set D_train and the verification data set D_valid of the image classification data set as the input of SIENAS. D_train contains train_number pictures in total.
(504) Let g = 0.
(505) Take the g-th generation population as the g-th generation parent population, train all individuals in the parent population using the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all neuron weights in the one-shot model.
The steps are as follows:
1) Divide D_train into several batches of data (mini-batches) according to batch_size; D_train then contains j batches of data in total, calculated as j = ⌈train_number / batch_size⌉.
2) Let f = 1;
3) train the individual on the f-th batch of training data, calculate the loss value using the cross-entropy loss function, and optimize the weights of the neurons in the individual through the back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise, let f = f+1 and return to step 3);
5) stopping training, and storing the weight values of all the neurons in the parent population.
(506) After step (505) is completed, the fitness values of all individuals in the parent population are evaluated and recorded. The fitness value of the parent individual is the individual's classification accuracy based on the validation dataset.
(507) Based on the P_n individuals in the parent population and their fitness values, P_n new individuals are generated using the node inheritance strategy and taken as the offspring population Q of the g-th generation. The node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolutionary mechanism, and assigns the neuron weights of the parent individuals to the corresponding neurons of the offspring individuals. As shown in Fig. 5, the steps are as follows (an illustrative code sketch of these operators is given after this list):
1) Selection operation.
Two individuals are randomly selected from the parent population using a tournament selection strategy, and the individual with the higher fitness value is retained. This process is repeated until two individuals p_1 and p_2 have been selected.
2) Crossover operation.
Generate a random value k_c;
if k_c ≤ P_c, a single-point crossover is performed on the two selected individuals; as shown in Fig. 5, the crossover point falls between the third gene and the fourth gene, so the three genes of p_1 located after the crossover point are exchanged with the three genes of p_2 located after the crossover point, producing two new individuals q_1 and q_2, which are stored in the offspring population Q as offspring individuals;
if k_c > P_c, no operation is performed on the selected individuals, and p_1 and p_2 are stored in the offspring population Q as offspring individuals;
3) Repeat step 1) and step 2) until the number of individuals in the offspring population Q reaches P_n.
4) Mutation operation.
For the offspring population Q of the g-th generation, a mutation value k_m is randomly generated for each individual in Q;
if k_m ≤ P_m, a swap mutation is performed on the individual: as shown in Fig. 5, the third gene and the sixth gene of q_i are selected and their positions are exchanged, producing a new q_i; the new q_i is stored in the offspring population Q and the original q_i is deleted;
if k_m > P_m, no operation is performed on the individual.
5) Weight inheritance operation.
Based on step 4), each neuron in every offspring individual inherits the weight of the corresponding neuron in its parent individual as its initial weight.
(508) Based on step (507), fitness values of the offspring individuals are evaluated and recorded. The fitness value of the offspring individual is the classification accuracy of the individual on the verification data set.
(509) Merge the g-th generation parent population and the offspring population Q; the g-th generation population then contains 2P_n individuals.
(510) Based on step (509), the individuals in the g-th generation population are ranked according to their fitness values, and the (g+1)-th generation population is generated through an environmental selection strategy. The population size of the (g+1)-th generation population must be the same as that of the g-th generation population.
(511) Check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise, let g = g+1 and return to step (505).
(512) The search task ends, and the individual with the highest fitness value in the final generation population is output.
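Purely to illustrate the selection, crossover, mutation, and weight-inheritance operators described above (and in Fig. 5), the sketch below applies them to individuals represented as flat lists of integer genes; the flat-list representation and all function names are assumptions made for this example, not the patented implementation.

```python
import random
from typing import List, Tuple

Genome = List[int]   # an individual flattened into a sequence of integer genes (cf. Fig. 5)

def tournament_select(population: List[Genome], fitness: List[float]) -> Genome:
    """Randomly pick two individuals and keep the one with the higher fitness value."""
    i, j = random.sample(range(len(population)), 2)
    return population[i] if fitness[i] >= fitness[j] else population[j]

def single_point_crossover(p1: Genome, p2: Genome) -> Tuple[Genome, Genome]:
    """Exchange the genes located after a randomly chosen crossover point."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def swap_mutation(q: Genome) -> Genome:
    """Select two genes at random and exchange their positions."""
    q = q[:]
    i, j = random.sample(range(len(q)), 2)
    q[i], q[j] = q[j], q[i]
    return q

def node_inheritance(parents: List[Genome], fitness: List[float],
                     P_n: int, P_c: float = 0.9, P_m: float = 0.1) -> List[Genome]:
    """Steps 1)-5): build P_n offspring; in SIENAS, the matching parent neuron weights
    would additionally be copied into each offspring as initial weights."""
    offspring: List[Genome] = []
    while len(offspring) < P_n:
        p1 = tournament_select(parents, fitness)
        p2 = tournament_select(parents, fitness)
        q1, q2 = (single_point_crossover(p1, p2) if random.random() <= P_c
                  else (p1[:], p2[:]))
        offspring.extend([q1, q2])
    offspring = offspring[:P_n]
    return [swap_mutation(q) if random.random() <= P_m else q for q in offspring]

# Toy usage with 6-gene individuals:
parents = [[random.randrange(7) for _ in range(6)] for _ in range(4)]
fits = [random.random() for _ in parents]
children = node_inheritance(parents, fits, P_n=4)
```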
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation conditions
The method is implemented on the PyTorch deep learning framework, and the main programming language used is Python.
2. Simulation content
The performance of the method is verified on three image classification tasks: CIFAR10, CIFAR100, and ILSVRC2012.
2.1 data set
The CIFAR10 and CIFAR100 data sets each contain 60,000 color images with a resolution of 32 x 32, of which 50,000 are used as the training set and 10,000 as the test set. CIFAR10 contains 10 categories with 6,000 pictures per category (5,000 training pictures, 1,000 test pictures). The simulation randomly selects 1,000 pictures from each category as verification pictures, forming a verification set of 10,000 pictures in total. CIFAR100 contains 100 categories, each with 600 pictures (500 training pictures, 100 test pictures). The simulation experiment randomly selects 100 pictures from each category as verification pictures, forming a verification set of 10,000 pictures in total. ILSVRC2012 is a large-scale data set used for visual object recognition research; the full collection comprises more than 14 million images in roughly 20,000 categories and is divided into a training set, a verification set, and a test set.
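As an illustration of this split, the torchvision snippet below builds the CIFAR10 training/verification partition described above, assuming the 1,000 verification pictures per class are drawn from the 5,000 training pictures of that class; the variable names and the use of Subset are choices made for the example.

```python
import random
from collections import defaultdict
from torch.utils.data import Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
D_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Group training indices by class, then hold out 1,000 pictures per class for verification.
by_class = defaultdict(list)
for idx, label in enumerate(full_train.targets):
    by_class[label].append(idx)

valid_idx, train_idx = [], []
for label, indices in by_class.items():
    random.shuffle(indices)
    valid_idx.extend(indices[:1000])   # 10 classes x 1,000 = 10,000 verification pictures
    train_idx.extend(indices[1000:])   # remaining 4,000 pictures per class used for training

D_train, D_valid = Subset(full_train, train_idx), Subset(full_train, valid_idx)
```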
2.2 Method settings
The number of neurons M contained in a sub-network block is initialized to 5. The candidate computing node types include: identity mapping, depthwise separable convolution with a 3 x 3 kernel (DW3), depthwise separable convolution with a 5 x 5 kernel (DW5), max pooling of size 3 (MAX), average pooling of size 3 (AVG), a feature reconstruction node whose 2-dimensional convolutional layer uses a 3 x 3 kernel (3 x 3 feature reconstruction convolution, FR3), and a feature reconstruction node whose 2-dimensional convolutional layer uses a 5 x 5 kernel (5 x 5 feature reconstruction convolution, FR5).
The number of sub-network block types is initialized to N = 4, comprising: convolution module 1 (convolution block 1), convolution module 2 (convolution block 2), convolution module 3 (convolution block 3), and a reduction module (reduction block). The step size of all convolution modules is set to 1, so the width, height, and depth of the feature maps at the input and output of these modules are unchanged; the convolution modules process the feature information of the neural network at different stages of forward propagation. The step size of all computing nodes in the reduction module is 2; this module reduces the width and height of the input feature maps to half of their original values and doubles the depth.
As shown in fig. 1, the stacking order of the sub-network blocks in a neural network architecture is: the convolution module 1, the reduction module, the convolution module 2, the reduction module and the convolution module 3 are stacked in sequence. The main goal of the SIENAS is to search for the way neurons in each sub-net block connect and the type of compute nodes contained in the neurons.
The SIENAS strategy parameters are initialized as follows: the number of neurons contained in a sub-network block M = 5, the number of individuals in the initial population P_n = 25, the crossover rate P_c = 0.9, the mutation rate P_m = 0.1, the initial channel number C = 32, the batch size batch_size = 128, and the maximum number of generations G = 300. The SGD optimizer parameters are initialized as follows: the initial learning rate lr = 0.1, the weight decay coefficient w = 0.0003, and the momentum coefficient m = 0.9.
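For reference, these settings could be gathered into a configuration object and an SGD optimizer roughly as follows; the dataclass layout is an assumption, and cosine annealing is used here only as a stand-in for the unspecified learning-rate adjustment strategy.

```python
import torch
from dataclasses import dataclass

@dataclass
class SienasConfig:
    M: int = 5               # neurons per sub-network block
    N: int = 4               # sub-network block types
    P_n: int = 25            # population size
    P_c: float = 0.9         # crossover rate
    P_m: float = 0.1         # mutation rate
    C: int = 32              # initial channel number
    batch_size: int = 128
    G: int = 300             # maximum number of generations
    lr: float = 0.1          # initial learning rate
    weight_decay: float = 0.0003
    momentum: float = 0.9

cfg = SienasConfig()
model = torch.nn.Conv2d(3, cfg.C, 3, padding=1)   # stand-in for a decoded individual
optimizer = torch.optim.SGD(model.parameters(), lr=cfg.lr,
                            momentum=cfg.momentum, weight_decay=cfg.weight_decay)
# Learning-rate adjustment strategy (cosine annealing assumed here, not specified by the patent).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cfg.G)
```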
After the iterations of the method are finished, the individual with the best fitness value is output and decoded into the corresponding neural network architecture, SI-EvoNet-S. The network structure parameters are reinitialized, and the neural network architecture is trained on the training data set until convergence; the test data set is then used to test its performance.
3. Simulation results
The optimization processes on CIFAR10 and CIFAR100 are shown in Fig. 7 and Fig. 8 respectively; the highest prediction accuracies obtained during the search are 93.7% and 76.8% respectively. The individual performance evaluation results therefore reflect the actual performance of the individuals, and a more accurate performance ranking can be obtained.
The sub-network blocks of the neural network architecture found by the method on the CIFAR10 data set are shown in Fig. 6. Comparisons with existing hand-designed neural network architectures and existing evolution-based NAS methods are shown in Table 1 and Table 2 respectively; compared with these baselines, the method achieves higher search efficiency and higher classification accuracy.
TABLE 1
TABLE 2
Claims (5)
1. A fast attention neural network architecture searching method based on an evolution method is characterized by comprising the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) using an evolutionary method as the search method and the image classification accuracy on a verification set as the optimization target, simultaneously optimizing the structure of each individual and the weights of the one-shot model through the evolutionary mechanism and back-propagated gradient values; after the search task of the evolutionary method is finished, ranking the individuals in the population and retaining the individual with the largest fitness value as the optimal search result;
(3) decoding the individuals searched by the evolution to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
2. The fast attention neural network architecture searching method based on the evolutionary method as claimed in claim 1, wherein in the step (1), the predetermined coding scheme is a one-shot model coding method based on the evolutionary method; the method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the minimum unit in a neural network architecture, and each computing neuron contains two computing nodes; neuron i is encoded as an integer quadruple (in_a, in_b, op_a, op_b), where in_a and in_b indicate the indices of the neurons connected to neuron i, i.e., the outputs of neurons in_a and in_b serve as the two inputs I_a and I_b of neuron i, and op_a and op_b indicate the two computing node types O_a and O_b contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output calculation formula of neuron i is:
H_i = O_a(I_a) + O_b(I_b)
where H_i represents the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computing nodes contained within neuron i; I_a and I_b are processed by O_a and O_b respectively, and the outputs of the two computing nodes, O_a(I_a) and O_b(I_b), are added to form the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the coding structure of a sub-network block is the concatenation of the codes of its M neurons: block = (neuron_1, neuron_2, ..., neuron_M);
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the coding structure of a neural network architecture is the concatenation of the codes of its N sub-network blocks: network = (block_1, block_2, ..., block_N);
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture; the coding structure of a neural network architecture is called an individual in a population.
3. The fast attention neural network architecture searching method based on evolution method as claimed in claim 1, wherein in the step (1), the population initialization scheme is based on the encoding scheme, and randomly generating individuals by uniform distribution until reaching a predetermined population size of initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
4. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in the step (1), feature reconstruction computing nodes containing an attention mechanism are adaptively integrated into the neural network architecture in the evolution searching process, so as to improve the expression capability of the neural network; the light-weight multi-scale channel attention mechanism consists of a 2-dimensional convolution layer and a light-weight channel attention mechanism; the 2-dimensional convolutional layer can extract feature information of different scales, and the channel attention mechanism is used for reducing redundant information in a channel, recalibrating channel feature response and improving the expression capacity of the neural network;
(401) the structure of the feature reconstruction computing node is as follows: the system comprises a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer and a multiplication module; the process of feature reconstruction is as follows:
(402) the 2-dimensional convolution layer is used for extracting the characteristic information of the input characteristic diagram and collecting the input characteristic diagramConversion into a collection of transformation profiles H × W represents the size of the input feature map, s represents the number of channels of the input feature map set, H × W represents the size of the converted feature map, and c represents the number of channels of the converted feature map set;
(403) inputting the conversion feature map set into a global average pooling layer, and extracting global features of each feature map by using the global average pooling layer; the formula of the global average pooling layer is as follows:
transforming feature atlas sets through a global averaging pooling layerIs converted into a one-dimensional vector z e { z1,z2,......,zc-said one-dimensional vector represents the characteristics of c channels;
(404) the one-dimensional convolutional layer performs feature mapping across adjacent channels; the feature mapping formula is:

F_l = C1D_k(z_l)

where C1D denotes the one-dimensional convolutional layer and k the size of its convolution kernel; z_l denotes the feature of the l-th channel obtained in step (403), z_l ∈ z; the feature set after the one-dimensional convolutional layer mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l is the feature of the l-th channel after the mapping and F_l ∈ F;
(405) a sigmoid activation function forms the weight set w of the c channels, w = {w_1, w_2, ..., w_c}:

w_m = sigmoid(F_m)

(406) in this formula, w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature of the m-th channel obtained in step (404), F_m ∈ F;
(407) the multiplication module assigns the corresponding weight to each channel; the multiplication module is computed as:

U = u * w

where u denotes the transformed feature map set obtained in step (402), * denotes the channel-wise (element-wise) product, and w denotes the channel weight set obtained in step (405), w = {w_1, w_2, ..., w_c};
(408) U is used as the output of the feature reconstruction computing node.
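The feature reconstruction computing node of steps (401)-(408) can be sketched in PyTorch roughly as follows; the 2-D kernel size, padding and default 1-D kernel size are illustrative assumptions, not values fixed by the claim:

```python
import torch
import torch.nn as nn

class FeatureReconstructionNode(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, k: int = 3):
        super().__init__()
        # 2-D convolution: extracts feature information and maps the s input
        # channels to c output channels (spatial size preserved by padding).
        self.conv2d = nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1, bias=False)
        # 1-D convolution over the channel dimension: local cross-channel
        # feature mapping F_l = C1D_k(z_l).
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k,
                                padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.conv2d(x)                        # (B, c, H, W) transformed maps
        z = u.mean(dim=(2, 3))                    # global average pooling -> (B, c)
        f = self.conv1d(z.unsqueeze(1))           # (B, 1, c) channel mapping
        w = self.sigmoid(f).squeeze(1)            # channel weights w_m in (0, 1)
        return u * w.unsqueeze(-1).unsqueeze(-1)  # U = u * w, recalibrated maps
```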
5. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in step (2) the searching method is a fast attention neural network architecture searching method based on evolutionary computation (SIENAS); the structure of the individuals and the weights of the one-shot model are optimized jointly through an evolutionary mechanism and back-propagated gradient values: the individuals in the parent population are trained with a sampling training strategy to generate weights for the neurons in the one-shot model; a child population, i.e. a new set of neural network topology graphs, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training; that is, SIENAS adjusts the computing nodes within the neurons and optimizes the connections of the neurons in each sub-network block so as to optimize the neural network architecture as a whole; the method specifically comprises the following steps:
(501) taking an image classification dataset of scaled image-label pairs and dividing it into a training dataset D_train, a verification dataset D_valid and a test dataset D_test;
(502) initializing the SIENAS strategy parameters, including: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number of channels C of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size and the maximum number of evolution generations G; initializing the SGD optimizer parameters, including: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy and the momentum coefficient m;
(503) taking the training dataset D_train and the verification dataset D_valid of the predetermined dataset as the input of SIENAS; D_train contains train_number images;
(504) let g = 0;
(505) taking the g-th generation population as the g-th generation parent population, training all individuals in the parent population with the sampling training strategy, and optimizing the weights of the neurons in each individual, thereby optimizing all neuron weights in the one-shot model; specifically:
1) dividing D_train into several batches of data (mini-batches) according to batch_size; D_train then contains j batches of data in total, where j = ceil(train_number / batch_size);
2) let f be 1;
3) training the individual based on the f-th batch of training data, calculating the loss value with the cross entropy loss function, and optimizing the weights of the neurons in the individual through back-propagated gradient values;
4) checking whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise, let f = f+1 and return to step 3);
5) stopping training, and storing the weight values of all neurons in the parent population;
(506) after step (505) is finished, evaluating and recording the fitness values of all individuals in the parent population; the fitness value of a parent individual is its classification accuracy on the verification dataset;
(507) based on the P_n individuals in the parent population and their fitness values, generating P_n new individuals as the g-th generation offspring population Q by means of the node inheritance strategy; the node inheritance strategy starts from the neural network topology graphs of the parent individuals, generates new neural network topology graphs through the selection, crossover and mutation operators of the evolutionary mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals (an illustrative sketch of one generation is given after the step list); specifically:
1) performing selection operation;
randomly selecting two individuals from the parent population using a tournament selection strategy and keeping the one with the higher fitness value; this process is repeated until two parent individuals have been selected;
2) performing crossover operation;
generating a random value k_c;
if k_c ≤ P_c, performing a single-point crossover operation on the two selected individuals to generate two new individuals, and saving the new individuals as offspring to the offspring population Q;
if k_c > P_c, performing no operation on the selected individuals and saving them directly as offspring to the offspring population Q;
3) repeating step 1) and step 2) until the number of individuals in the offspring population Q reaches P_n;
4) performing mutation operation;
for the offspring population Q of the g-th generation, randomly generating a value k_m for each individual in Q;
if k_m ≤ P_m, performing a swap mutation operation on the individual to generate a new individual, saving the new individual to the offspring population Q and deleting the original individual;
if k_m > P_m, performing no operation on the individual;
5) carrying out weight inheritance operation;
based on step 4), the neurons in each offspring individual inherit, in order, the weights of the corresponding neurons in the parent individuals as their initial weights;
(508) based on step (507), evaluating and recording the fitness values of the offspring individuals; the fitness value of an offspring individual is its classification accuracy on the verification dataset;
(509) merging the g-th generation parent population and the offspring population Q; the merged g-th generation population contains 2*P_n individuals;
(510) based on step (509), sorting the individuals in the g-th generation population according to their fitness values and generating the (g+1)-th generation population through an environment selection strategy; the population size of the (g+1)-th generation population must be the same as that of the g-th generation population;
(511) checking whether g+1 satisfies the termination condition: if g+1 is greater than or equal to the preset maximum number of evolution generations G, go to step (512); otherwise, let g = g+1 and return to step (505);
(512) finishing the search task and outputting the individual with the highest fitness value in the final generation population.
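One generation of the evolutionary loop in steps (505)-(510) can be sketched as follows; the encoding helpers (Neuron, Block, Individual) are the hypothetical ones from the earlier sketch, and fitness evaluation with weights inherited from the one-shot model is abstracted behind an evaluate() callback:

```python
import copy
import random
from typing import Callable, List, Tuple

def flatten(ind: Individual) -> List[Neuron]:
    return [n for b in ind.blocks for n in b.neurons]

def unflatten(genes: List[Neuron], template: Individual) -> Individual:
    child = copy.deepcopy(template)
    it = iter(genes)
    for b in child.blocks:
        b.neurons = [next(it) for _ in b.neurons]
    return child

def tournament(pop: List[Individual], fitness: List[float]) -> Individual:
    # Select two individuals at random and keep the fitter one.
    i, j = random.sample(range(len(pop)), 2)
    return pop[i] if fitness[i] >= fitness[j] else pop[j]

def single_point_crossover(p1: Individual, p2: Individual) -> Tuple[Individual, Individual]:
    g1, g2 = flatten(p1), flatten(p2)
    cut = random.randrange(1, len(g1))
    return (unflatten(g1[:cut] + g2[cut:], p1),
            unflatten(g2[:cut] + g1[cut:], p2))

def swap_mutation(ind: Individual) -> Individual:
    genes = flatten(ind)
    i, j = random.sample(range(len(genes)), 2)
    genes[i], genes[j] = genes[j], genes[i]
    return unflatten(genes, ind)

def next_generation(pop, fitness, p_c, p_m,
                    evaluate: Callable[[Individual], float]):
    offspring = []
    while len(offspring) < len(pop):
        a, b = tournament(pop, fitness), tournament(pop, fitness)
        if random.random() <= p_c:
            a, b = single_point_crossover(a, b)
        else:
            a, b = copy.deepcopy(a), copy.deepcopy(b)
        offspring.extend([a, b])
    offspring = offspring[:len(pop)]
    offspring = [swap_mutation(x) if random.random() <= p_m else x
                 for x in offspring]
    # Offspring inherit neuron weights from the one-shot model, so evaluate()
    # only measures validation accuracy without any retraining.
    off_fit = [evaluate(x) for x in offspring]
    merged = sorted(zip(pop + offspring, fitness + off_fit),
                    key=lambda t: t[1], reverse=True)   # environment selection
    survivors = merged[:len(pop)]
    return [x for x, _ in survivors], [f for _, f in survivors]
```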
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011424217.7A CN112465120A (en) | 2020-12-08 | 2020-12-08 | Fast attention neural network architecture searching method based on evolution method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112465120A true CN112465120A (en) | 2021-03-09 |
Family
ID=74801640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011424217.7A Pending CN112465120A (en) | 2020-12-08 | 2020-12-08 | Fast attention neural network architecture searching method based on evolution method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112465120A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949842B (en) * | 2021-05-13 | 2021-09-14 | 北京市商汤科技开发有限公司 | Neural network structure searching method, apparatus, computer device and storage medium |
CN112949842A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Neural network structure searching method, apparatus, computer device and storage medium |
CN113743605A (en) * | 2021-06-16 | 2021-12-03 | 温州大学 | Method for searching smoke and fire detection network architecture based on evolution method |
CN113240055A (en) * | 2021-06-18 | 2021-08-10 | 桂林理工大学 | Pigment skin damage image classification method based on macro-operation variant neural architecture search |
CN113240055B (en) * | 2021-06-18 | 2022-06-14 | 桂林理工大学 | Pigment skin damage image classification method based on macro-operation variant neural architecture search |
CN114445674A (en) * | 2021-12-13 | 2022-05-06 | 上海悠络客电子科技股份有限公司 | Target detection model searching method based on multi-scale fusion convolution |
CN114419389A (en) * | 2021-12-14 | 2022-04-29 | 上海悠络客电子科技股份有限公司 | Target detection model construction method based on neural network architecture search |
CN114638982A (en) * | 2021-12-14 | 2022-06-17 | 上海悠络客电子科技股份有限公司 | Online target detection model construction method |
CN114638982B (en) * | 2021-12-14 | 2024-09-17 | 上海悠络客电子科技股份有限公司 | Online target detection model construction method |
WO2023124342A1 (en) * | 2021-12-31 | 2023-07-06 | 江南大学 | Low-cost automatic neural architecture search method for image classification |
CN114863508A (en) * | 2022-03-24 | 2022-08-05 | 华南理工大学 | Expression recognition model generation method, medium and device of adaptive attention mechanism |
CN114863508B (en) * | 2022-03-24 | 2024-08-06 | 华南理工大学 | Expression recognition model generation method, medium and device of self-adaptive attention mechanism |
CN114997360A (en) * | 2022-05-18 | 2022-09-02 | 四川大学 | Evolution parameter optimization method, system and storage medium of neural architecture search algorithm |
CN114997360B (en) * | 2022-05-18 | 2024-01-19 | 四川大学 | Evolution parameter optimization method, system and storage medium of neural architecture search algorithm |
CN114926698A (en) * | 2022-07-19 | 2022-08-19 | 深圳市南方硅谷半导体股份有限公司 | Image classification method for neural network architecture search based on evolutionary game theory |
CN115994575A (en) * | 2023-03-22 | 2023-04-21 | 方心科技股份有限公司 | Power failure diagnosis neural network architecture design method and system |
CN117173037A (en) * | 2023-08-03 | 2023-12-05 | 江南大学 | Neural network structure automatic search method for image noise reduction |
CN117173037B (en) * | 2023-08-03 | 2024-07-09 | 江南大学 | Neural network structure automatic search method for image noise reduction |
CN117611974A (en) * | 2024-01-24 | 2024-02-27 | 湘潭大学 | Image recognition method and system based on searching of multiple group alternative evolutionary neural structures |
CN117611974B (en) * | 2024-01-24 | 2024-04-16 | 湘潭大学 | Image recognition method and system based on searching of multiple group alternative evolutionary neural structures |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112465120A (en) | Fast attention neural network architecture searching method based on evolution method | |
WO2022083624A1 (en) | Model acquisition method, and device | |
CN111553480B (en) | Image data processing method and device, computer readable medium and electronic equipment | |
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
CN111898689B (en) | Image classification method based on neural network architecture search | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN114841257B (en) | Small sample target detection method based on self-supervision comparison constraint | |
CN110310345A (en) | A kind of image generating method generating confrontation network based on hidden cluster of dividing the work automatically | |
WO2022126448A1 (en) | Neural architecture search method and system based on evolutionary learning | |
CN114118369B (en) | Image classification convolutional neural network design method based on group intelligent optimization | |
CN113592060A (en) | Neural network optimization method and device | |
CN113011487B (en) | Open set image classification method based on joint learning and knowledge migration | |
Bakhshi et al. | Fast evolution of CNN architecture for image classification | |
CN114943345A (en) | Federal learning global model training method based on active learning and model compression | |
CN113011091A (en) | Automatic-grouping multi-scale light-weight deep convolution neural network optimization method | |
CN114819091B (en) | Multi-task network model training method and system based on self-adaptive task weight | |
CN117253037A (en) | Semantic segmentation model structure searching method, automatic semantic segmentation method and system | |
CN114241267A (en) | Structural entropy sampling-based multi-target architecture search osteoporosis image identification method | |
CN118114734A (en) | Convolutional neural network optimization method and system based on sparse regularization theory | |
Cai et al. | EST-NAS: An evolutionary strategy with gradient descent for neural architecture search | |
Chai et al. | Correlation analysis-based neural network self-organizing genetic evolutionary algorithm | |
CN115063374A (en) | Model training method, face image quality scoring method, electronic device and storage medium | |
CN114742292A (en) | Knowledge tracking process-oriented two-state co-evolution method for predicting future performance of students | |
Hu et al. | Apenas: An asynchronous parallel evolution based multi-objective neural architecture search | |
CN118170993B (en) | Educational resource recommendation method based on contrast learning and field factor decomposition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||