CN112465120A - Fast attention neural network architecture searching method based on evolution method

Info

Publication number: CN112465120A
Application number: CN202011424217.7A
Applicant (Assignee): SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Inventors: 金耀初, 沈修平
Other languages: Chinese (zh)
Legal status: Pending

Classifications

    • G06N 3/045 Combinations of networks
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/048 Activation functions
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming (computing arrangements based on biological models using genetic models)


Abstract

The invention provides a fast attention neural network architecture search method based on an evolution method, which comprises the following steps: (1) generating a neural network architecture search space containing an attention mechanism based on a predetermined encoding scheme and a population initialization scheme; (2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search; (3) decoding the individual found by the evolutionary search to generate a neural network architecture, resetting its structural weights, training the architecture with the training dataset until convergence, and testing its performance.

Description

Fast attention neural network architecture searching method based on evolution method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a fast attention neural network architecture searching method based on an evolution method.
Background
Deep neural networks have achieved significant success in various computer vision tasks such as object classification, object detection, object segmentation, and object tracking, among which the classification task is the basis of the other tasks. The performance of a deep neural network depends to a large extent on its architecture. Therefore, to obtain the best performance from a deep neural network, human experts are typically required to manually tune the model architecture using expert knowledge and the corresponding dataset. Architecture tuning, training, and evaluation of a neural network form an iterative process that must be repeated and continuously optimized. This process not only consumes a large amount of labor and time, but also raises the threshold for applying artificial-intelligence technology in traditional industries such as healthcare, education, and finance. Therefore, methods that automatically generate network models, namely neural architecture search (NAS) techniques, have attracted extensive attention from researchers.
NAS techniques can automatically generate a deep neural network architecture from a task objective and the corresponding dataset, thereby reducing the labor and time cost of building neural network architectures by hand. NAS is generally divided into three steps: first, define a search space, i.e., a set of neural network architectures; second, use a search method to explore promising neural network architectures in the search space; third, evaluate the explored neural network architectures. The objective of NAS is to find a network architecture with excellent performance in a huge search space. The second and third steps therefore form an iterative process, and the evaluation results of the model architectures are usually fed back to the search method to guide it toward more effective neural network architectures. When the iterations are completed, the architecture with the best evaluated performance is taken as the output of the method. The most common way to evaluate an architecture is to train it on the training dataset until convergence and then test its performance on a validation set. Each round of training and evaluating an architecture consumes significant computational resources and time. This computational bottleneck has made NAS difficult to generalize further. Therefore, how to improve the efficiency of neural architecture search and reduce its computational cost has become an urgent issue for NAS.
Currently, mainstream NAS search methods fall into three main categories: reinforcement-learning-based NAS, gradient-based NAS, and evolution-based NAS. Reinforcement-learning-based NAS typically requires constructing a controller that samples model architectures from the search space. Because the performance of the sampled architectures is determined by the controller, the controller must repeatedly evaluate different architectures and be updated iteratively in order to generate effective model architectures. Gradient-based NAS relies on a continuous relaxation of the architecture representation, converts architecture search into an optimization problem over a continuous space, and uses gradient descent to optimize the network architecture and network parameters simultaneously. The essence of evolution-based NAS is to explore the search space by means of natural selection. The neural network architectures in the search space are evolved as a population, and each individual in the population is a neural network architecture. The performance of an architecture on the verification set is taken as the individual's fitness value. During evolution, one or more individuals are selected as parents according to their fitness values, and offspring individuals are then generated through crossover and mutation operators. After the offspring individuals complete their performance evaluation, they join the parent population, and the next-generation population is generated through environmental selection. This process is repeated until the preset number of generations is reached. Finally, the individual with the best fitness value is taken as the output of the evolution method.
To improve the search efficiency of NAS methods, super-network-based architecture search (one-shot architecture search) has become a focus of researchers' attention. Super-network-based architecture search generally treats the search space as a super-network model (one-shot model), and the neural network architectures contained in the search space are regarded as sub-networks of the super network. The technique generally obtains the weight of each operation by training the super network, and the sub-networks are then evaluated by sharing the super network's weights, thereby reducing the computational cost of network evaluation. Chinese patent CN110851566 proposes an improved differentiable network architecture search method. The method is a gradient-based NAS technique that optimizes all possible edges and operations in the whole super network and thereby determines the discretized optimal sub-network. In addition, the method uses a global normalization operation to reduce the influence of local bias in the network, addressing the problem that the bi-level optimization in traditional differentiable architecture search aggravates weight coupling and the mutual competition between weights in the later stage of search, which in turn makes the network difficult to train to convergence. Chinese patent CN110569972 proposes a method and apparatus for constructing the search space of a super-network (one-shot) model, and an electronic device. The method constructs the super network by stacking unit structures, which are divided into two types: normal units and down-sampling units. The optimization target of the search method is thus converted from optimizing the overall neural network architecture into optimizing the internal structures of these two unit types, which further reduces the computational cost of structure optimization and improves search efficiency.
However, current super-network-based neural architecture search techniques have some limitations. First, when the super network is large, training it to convergence takes a great deal of time. Second, because the neural network structures share the super network's weights for performance evaluation, a large bias may be introduced during structure evaluation; the performance of a structure may therefore be underestimated and the performance ranking of structures may be inaccurate, so the performance of the final structure cannot be guaranteed.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a fast attention neural network architecture search method based on an evolution method and applying it to computer vision tasks. The method establishes the search space of the one-shot model through the initial population of the evolution method, which alleviates the difficulty and excessive time of training caused by an overly large one-shot model. Taking the evolution method as the search method, the method simultaneously optimizes the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values during the evolutionary search, which effectively improves the efficiency of the search. The invention encodes a lightweight channel attention mechanism into the search space and adaptively integrates it into the neural network architecture through the method, thereby further improving the performance of the final neural network architecture.
In order to achieve the purpose, the technical scheme adopted by the application is as follows: a fast attention neural network architecture searching method based on an evolution method comprises the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search;
(3) decoding the individuals obtained by the evolutionary search to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
In the step (1), the predetermined coding scheme is a one-shot model coding method based on an evolutionary method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes; neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i; after the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the encoding of a sub-network block is therefore the concatenation of the quadruples of its M neurons;
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the encoding of a neural network architecture is therefore the concatenation of the encodings of its N sub-network blocks;
(206) the sub-network blocks are stacked and connected in sequence to form a complete neural network architecture; the encoding of a neural network architecture is called an individual of the population.
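As an illustration of this encoding, the following Python sketch builds a neuron quadruple, a sub-network block of M neurons, and a network of N blocks; the helper names, the index convention for the block inputs, and the candidate-operation list are assumptions made for illustration rather than the patent's reference implementation:

```python
import random

# Hypothetical list of candidate computational node types; the embodiment later in
# this document names identity, DW3, DW5, MAX, AVG, FR3 and FR5 as the candidates.
NODE_TYPES = ["identity", "dw_conv3", "dw_conv5", "max_pool3", "avg_pool3", "fr3", "fr5"]


def random_neuron(index):
    """Encode neuron `index` as an integer quadruple (I_a, I_b, O_a, O_b):
    two input indices and two computational node types."""
    i_a = random.randint(0, index - 1)        # index of the first input neuron
    i_b = random.randint(0, index - 1)        # index of the second input neuron
    o_a = random.randrange(len(NODE_TYPES))   # node type applied to the first input
    o_b = random.randrange(len(NODE_TYPES))   # node type applied to the second input
    return (i_a, i_b, o_a, o_b)


def random_block(m):
    """A sub-network block is encoded as the concatenation of M neuron quadruples.
    Indices 0 and 1 are assumed to denote the block's two external inputs."""
    return [random_neuron(i) for i in range(2, m + 2)]


def random_individual(n, m):
    """An individual (one neural network architecture) is N sub-network blocks."""
    return [random_block(m) for _ in range(n)]
```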
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until reaching a predetermined population scale of the initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
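A minimal sketch of this population initialization, reusing the hypothetical encoding helpers from the previous sketch (P_n, N and M are the strategy parameters defined later in step (502)):

```python
def initialize_population(p_n, n, m):
    """Randomly generate P_n individuals by uniform sampling of the encoding;
    together, the initial individuals form the one-shot model covering the search space."""
    return [random_individual(n, m) for _ in range(p_n)]


# Example: an initial population of 25 individuals, 4 block types, 5 neurons per block.
population = initialize_population(p_n=25, n=4, m=5)
```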
In the step (1), a feature reconstruction computing node containing an attention mechanism is adaptively integrated into the neural network architecture during the evolutionary search, thereby improving the expressive capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism; the 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expressive capability of the neural network. The specific steps are as follows:
(401) the feature reconstruction computing node consists of a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module; the feature reconstruction process is as follows:
(402) the 2-dimensional convolutional layer extracts the feature information of the input feature maps, converting the input feature-map set of size H × W with s channels into the transformed feature-map set u of size h × w with c channels, where H × W denotes the size of the input feature maps, s the number of channels of the input feature-map set, h × w the size of the transformed feature maps, and c the number of channels of the transformed feature-map set;
(403) the transformed feature-map set is fed into the global average pooling layer, which extracts the global feature of each feature map; the global average pooling layer is computed as:
z_l = (1 / (h × w)) Σ_{i=1..h} Σ_{j=1..w} u_l(i, j)
yielding a one-dimensional vector z = {z_1, z_2, ..., z_c} that represents the features of the c channels;
(404) the 1-dimensional convolutional layer completes the feature mapping between adjacent channels; the feature mapping is:
F_l = C1D_k(z_l)
where C1D denotes the 1-dimensional convolutional layer; k denotes the kernel size of the 1-dimensional convolution; z_l denotes the feature of the l-th channel and, from step (403), z_l ∈ z; the feature set after the 1-dimensional convolution mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l denotes the feature of the l-th channel after the mapping and F_l ∈ F;
(405) a sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) in the formula, w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature value of the m-th channel and, from step (404), F_m ∈ F;
(407) the multiplication module assigns each channel its corresponding weight; the multiplication module is computed as:
U = u * w
where u denotes the transformed feature-map set of the channels from step (402); * denotes element-wise (channel-wise) multiplication; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c};
(408) U is taken as the output of the feature reconstruction convolutional layer.
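A possible PyTorch sketch of the feature reconstruction computing node described in steps (401) to (408); the class name, kernel sizes, and layer options are illustrative assumptions, not the patent's reference code:

```python
import torch.nn as nn


class FeatureReconstructionConv(nn.Module):
    """Feature reconstruction computing node: a 2-D convolution followed by a
    lightweight channel attention mechanism, as in steps (402) to (408)."""

    def __init__(self, in_channels, out_channels, conv_kernel=3, attn_kernel=3):
        super().__init__()
        # (402) the 2-D convolution converts the s input channels into c output channels
        self.conv2d = nn.Conv2d(in_channels, out_channels,
                                kernel_size=conv_kernel, padding=conv_kernel // 2)
        # (403) global average pooling produces the channel descriptor z
        self.gap = nn.AdaptiveAvgPool2d(1)
        # (404) a 1-D convolution maps the features of adjacent channels: F_l = C1D_k(z_l)
        self.conv1d = nn.Conv1d(1, 1, kernel_size=attn_kernel,
                                padding=attn_kernel // 2, bias=False)
        # (405) a sigmoid turns F into the channel weight set w
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        u = self.conv2d(x)                                  # u: (batch, c, h, w)
        z = self.gap(u)                                     # z: (batch, c, 1, 1)
        f = self.conv1d(z.squeeze(-1).transpose(1, 2))      # F: (batch, 1, c)
        w = self.sigmoid(f).transpose(1, 2).unsqueeze(-1)   # w: (batch, c, 1, 1)
        return u * w                                        # (407) U = u * w
```

Under this reading, the FR3 and FR5 candidate nodes of the embodiment would correspond to conv_kernel = 3 and conv_kernel = 5 respectively; this mapping is an assumption based on their naming.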
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS). The structures of the individuals and the weights of the one-shot model are optimized through the evolution mechanism and back-propagated gradient values: the individuals in the parent population are trained through a sampling training strategy, generating weights for the neurons of the one-shot model; the offspring population, i.e. a new set of neural network topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training. In other words, the goal of SIENAS is to adjust the computational nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole. The steps are as follows (an illustrative sketch of the sampling training strategy is given after this list):
(501) take an image classification dataset of calibrated image-label pairs and divide it into a training dataset D_train, a verification dataset D_valid, and a test dataset D_test;
(502) initialize the SIENAS strategy parameters, comprising: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G; initialize the SGD optimizer parameters, comprising: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m;
(503) take the training dataset D_train and the verification dataset D_valid of the image classification dataset as the inputs of SIENAS; D_train contains train_number pictures in total;
(504) let g = 0;
(505) take the g-th generation population as the g-th generation parent population, train all individuals in the parent population with the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all the neuron weights of the one-shot model; specifically:
1) according to batch_size, divide D_train into several batches of data (mini-batches); D_train then contains j batches of data in total, calculated as:
j = ⌈train_number / batch_size⌉
2) let f = 1;
3) randomly sample an individual from the parent population and train it on the f-th batch of data; calculate the loss value with the cross-entropy loss function, and optimize the weights of the neurons in the individual through back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise let f = f+1 and return to step 3);
5) stop training and save the weights of all neurons in the parent population;
(506) after the step (505) is finished, evaluating the fitness values of all individuals in the parent population and recording; the fitness value of the parent individual is the classification accuracy of the individual based on the verification data set;
(507) based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals using the node inheritance strategy and take them as the offspring population Q of generation g; the node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolution mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals; as shown in Fig. 5, specifically:
1) selection operation:
randomly pick two individuals from the parent population using a tournament selection strategy and keep the one with the higher fitness value; repeat this process until two individuals have been selected;
2) crossover operation:
generate a random value k_c;
if k_c ≤ P_c, perform a single-point crossover on the two selected individuals to generate two new individuals, and save the new individuals to the offspring population Q as offspring individuals;
if k_c > P_c, do not operate on the selected individuals and save them to the offspring population Q as offspring individuals;
3) repeat steps 1) and 2) until the number of individuals in the offspring population Q reaches P_n;
4) mutation operation:
for the offspring population Q of the g-th generation, randomly generate a value k_m for each individual in Q;
if k_m ≤ P_m, perform a swap mutation on the individual to generate a new individual, save the new individual to the offspring population Q, and delete the original individual;
if k_m > P_m, do not operate on the individual;
5) weight inheritance operation:
based on step 4), the neurons of each offspring individual inherit, in order, the weights of the corresponding neurons of their parent individuals as initial weights;
(508) based on step (507), evaluating fitness values of the offspring individuals and recording; the fitness value of the offspring individual is the classification accuracy of the individual on the verification data set;
(509) merge the g-th generation parent population and the offspring population Q; the merged g-th generation population contains 2·P_n individuals;
(510) based on step (509), rank the individuals according to their fitness values and generate the (g+1)-th generation population through an environmental selection strategy; the population size of the (g+1)-th generation must be the same as that of the g-th generation;
(511) check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise let g = g+1 and return to step (505);
(512) finish the search and output the individual with the highest fitness value in the final generation population.
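As a non-authoritative illustration of the sampling training strategy in step (505), the following PyTorch-style sketch trains one randomly sampled parent individual per mini-batch; the `decode` helper (building a trainable sub-model that shares the one-shot model's neuron weights) and the data-loader interface are assumptions:

```python
import random

import torch.nn as nn


def sampling_training(parent_population, train_loader, decode, optimizer):
    """Step (505): for every mini-batch, randomly sample one parent individual,
    train it on that batch, and thereby update the shared neuron weights of the
    one-shot model (the optimizer is assumed to cover all shared weights)."""
    criterion = nn.CrossEntropyLoss()
    for images, labels in train_loader:                # j = ceil(train_number / batch_size) batches
        individual = random.choice(parent_population)  # randomly sample one individual
        model = decode(individual)                     # sub-model sharing one-shot weights (assumed helper)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)        # cross-entropy loss value
        loss.backward()                                # back-propagated gradient values
        optimizer.step()                               # optimise the neuron weights of the individual
```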
Beneficial effects:
The invention establishes the search space of the one-shot model through the initial population of the evolution method, which alleviates the difficulty and excessive time of training caused by an overly large one-shot model. Taking the evolution method as the search method, the invention simultaneously optimizes the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values during the evolutionary search, and provides a sampling training strategy for the parent individuals and a node inheritance strategy for the offspring individuals, so that a parent individual can be trained multiple times on different mini-batches of the training set while an offspring individual directly inherits the weights of its parents as initial weights and its fitness value can be evaluated without additional training. This improvement effectively increases the search efficiency of the method, reduces the bias introduced during structure evaluation, brings the predicted performance of a model closer to its real performance, and improves the reliability of the model ranking. The invention encodes a lightweight channel attention mechanism into the search space and adaptively integrates it into the neural network architecture through the method, adaptively recalibrating the channel responses, strengthening the weights of effective channels, weakening redundant information in the channels, and improving the expressive capability of the neural network architecture, thereby further improving the performance of the final architecture. Compared with existing evolution-based neural architecture search methods, the method obtains better results on the CIFAR10, CIFAR100, and ImageNet image classification tasks.
Drawings
FIG. 1 is a diagram of a neural network architecture in accordance with the present invention.
FIG. 2 is a schematic diagram of a feature reconstruction convolution layer.
Fig. 3 is a schematic diagram of sub-network block encoding and decoding.
Fig. 4 is a schematic flow diagram of SIENAS.
FIG. 5 is a schematic diagram of a node inheritance policy.
FIG. 6 is a neural network architecture searched based on CIFAR10.
FIG. 7 is a neural network architecture optimization process based on the CIFAR10 classification task.
FIG. 8 is a neural network architecture optimization process based on the CIFAR100 classification task.
Detailed Description
The invention is further explained below with reference to the accompanying drawings. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that, after reading the teaching of the present invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the appended claims.
A fast attention neural network architecture searching method based on an evolution method comprises the following steps:
(1) a neural network architecture search space containing an attention mechanism is generated based on a predetermined encoding scheme and a population initialization scheme.
(2) Taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, simultaneously optimize the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values. After the search task of the evolution method is finished, the individuals in the population are ranked, and the individual with the highest fitness value is kept as the best result of the search.
(3) Decoding the searched individuals to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
The predetermined coding scheme in the step (1) is a one-shot model coding method based on an evolution method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron.
(202) Neurons are the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes. Neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively.
(203) The output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i. After the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i.
(204) A sub-network block contains M neurons, M being an integer greater than one. The encoding of a sub-network block is the concatenation of the quadruples of its M neurons.
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one. The encoding of a neural network architecture is the concatenation of the encodings of its N sub-network blocks.
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture. The coding structure of a neural network architecture is called an individual in a population. Fig. 3 is a diagram illustrating an example of encoding and decoding a sub-network block.
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until a predetermined population size of the initial population is reached. And all individuals in the initial population form a one-shot model and cover the whole search space. That is, each individual is a sub-model of the one-shot model.
In the step (1), a feature reconstruction computing node containing an attention mechanism is adaptively integrated into the neural network architecture during the evolutionary search, thereby improving the expressive capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism. The 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expressive capability of the neural network.
(401) The feature reconstruction computing node consists of a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module. The feature reconstruction process is as follows:
(402) The 2-dimensional convolutional layer extracts the feature information of the input feature maps, converting the input feature-map set of size H × W with s channels into the transformed feature-map set u of size h × w with c channels, where H × W denotes the size of the input feature maps, s the number of channels of the input feature-map set, h × w the size of the transformed feature maps, and c the number of channels of the transformed feature-map set.
(403) The transformed feature-map set is fed into the global average pooling layer, which extracts the global feature of each feature map. The global average pooling layer is computed as:
z_l = (1 / (h × w)) Σ_{i=1..h} Σ_{j=1..w} u_l(i, j)
yielding a one-dimensional vector z = {z_1, z_2, ..., z_c} that represents the features of the c channels.
(404) The 1-dimensional convolutional layer completes the feature mapping between adjacent channels. The feature mapping is:
F_l = C1D_k(z_l)
where C1D denotes the 1-dimensional convolutional layer; k denotes the kernel size of the 1-dimensional convolution; z_l denotes the feature of the l-th channel and, from step (403), z_l ∈ z. The feature set after the 1-dimensional convolution mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l denotes the feature of the l-th channel after the mapping and F_l ∈ F.
(405) A sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) In the formula, w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature value of the m-th channel and, from step (404), F_m ∈ F.
(407) The multiplication module assigns each channel its corresponding weight. The multiplication module is computed as:
U = u * w
where u denotes the transformed feature-map set of the channels from step (402); * denotes element-wise (channel-wise) multiplication; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c}.
(408) U is taken as the output of the feature reconstruction convolutional layer, as shown in Fig. 2.
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS). The structures of the individuals and the weights of the one-shot model are optimized through the evolution mechanism and back-propagated gradient values: the individuals in the parent population are trained through a sampling training strategy, generating weights for the neurons of the one-shot model; the offspring population, i.e. a new set of neural network topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training. In other words, the goal of SIENAS is to adjust the computational nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole. As shown in Fig. 4, specifically:
(501) Take an image classification dataset of calibrated image-label pairs and divide it into a training dataset D_train, a verification dataset D_valid, and a test dataset D_test.
(502) Initialize the SIENAS strategy parameters, comprising: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G. Initialize the SGD optimizer parameters, comprising: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m.
(503) Take the training dataset D_train and the verification dataset D_valid of the image classification dataset as the inputs of SIENAS. D_train contains train_number pictures in total.
(504) Let g = 0.
(505) Take the g-th generation population as the g-th generation parent population, train all individuals in the parent population with the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all the neuron weights of the one-shot model.
Specifically:
1) according to batch_size, divide D_train into several batches of data (mini-batches); D_train then contains j batches of data in total, calculated as:
j = ⌈train_number / batch_size⌉
2) let f = 1;
3) randomly sample an individual from the parent population and train it on the f-th batch of data; calculate the loss value with the cross-entropy loss function, and optimize the weights of the neurons in the individual through back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise let f = f+1 and return to step 3);
5) stop training and save the weights of all neurons in the parent population.
(506) After step (505) is completed, the fitness values of all individuals in the parent population are evaluated and recorded. The fitness value of the parent individual is the individual's classification accuracy based on the validation dataset.
(507) Based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals using the node inheritance strategy and take them as the offspring population Q of generation g. The node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolution mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals. As shown in Fig. 5, specifically (an illustrative sketch of these operations is given after this list):
1) Selection operation.
Randomly pick two individuals from the parent population using a tournament selection strategy and keep the one with the higher fitness value. Repeat this process until two individuals p_1 and p_2 are selected.
2) Crossover operation.
Generate a random value k_c.
If k_c ≤ P_c, perform a single-point crossover on the two selected individuals. As shown in Fig. 5, the crossover point lies between the third gene and the fourth gene, so the three genes of p_1 located after the crossover point are exchanged with the three genes of p_2 located after the crossover point, producing two new individuals q_1 and q_2. The new individuals q_1 and q_2 are saved to the offspring population Q as offspring individuals.
If k_c > P_c, no operation is performed on the selected individuals, and p_1 and p_2 are saved to the offspring population Q as offspring individuals.
3) Repeat steps 1) and 2) until the number of individuals in the offspring population Q reaches P_n.
4) Mutation operation.
For the offspring population Q of the g-th generation, randomly generate a value k_m for each individual in Q.
If k_m ≤ P_m, a swap mutation is performed on the individual. As shown in Fig. 5, the third gene and the sixth gene of q_i are selected and their positions are exchanged, producing a new q_i. The new q_i is saved to the offspring population Q, and the original q_i is deleted.
If k_m > P_m, no operation is performed on the individual.
5) Weight inheritance operation.
Based on step 4), the neurons of each offspring individual inherit, in order, the weights of the corresponding neurons of their parent individuals as initial weights.
(508) Based on step (507), fitness values of the offspring individuals are evaluated and recorded. The fitness value of the offspring individual is the classification accuracy of the individual on the verification data set.
(509) Merge the g-th generation parent population and the offspring population Q; the merged g-th generation population contains 2·P_n individuals.
(510) Based on step (509), rank the individuals of the g-th generation population according to their fitness values, and generate the (g+1)-th generation population through an environmental selection strategy. The population size of the (g+1)-th generation must be the same as that of the g-th generation.
(511) Check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise let g = g+1 and return to step (505).
(512) Finish the search task and output the individual with the highest fitness value in the final generation population.
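An illustrative sketch of the selection, crossover, and mutation operations of the node inheritance strategy in step (507); it treats an individual as a flat gene sequence, as drawn in Fig. 5, and the helper names are assumptions (weight inheritance itself is omitted here because it depends on how the one-shot weights are stored):

```python
import random


def tournament_select(population, fitness):
    """Randomly pick two individuals and keep the one with the higher fitness value."""
    a, b = random.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]


def single_point_crossover(p1, p2):
    """Exchange the genes located after a random crossover point (e.g. between
    the third and fourth gene in Fig. 5)."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def swap_mutation(q):
    """Exchange the positions of two randomly chosen genes of an individual."""
    q = list(q)
    i, j = random.sample(range(len(q)), 2)
    q[i], q[j] = q[j], q[i]
    return q


def node_inheritance(parents, fitness, p_n, p_c, p_m):
    """Generate P_n offspring individuals; their neurons would afterwards inherit
    the weights of the corresponding parent neurons as initial weights."""
    offspring = []
    while len(offspring) < p_n:
        p1 = tournament_select(parents, fitness)
        p2 = tournament_select(parents, fitness)
        if random.random() <= p_c:
            q1, q2 = single_point_crossover(p1, p2)
        else:
            q1, q2 = p1, p2
        offspring.extend([q1, q2])
    offspring = offspring[:p_n]
    return [swap_mutation(q) if random.random() <= p_m else q for q in offspring]
```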
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation conditions
The method is developed based on the PyTorch deep learning framework, and the programming language mainly used is Python.
2. Simulation content
The performance of the method is verified on three image classification tasks: CIFAR10, CIFAR100, and ILSVRC2012.
2.1 Dataset
The CIFAR10 and CIFAR100 datasets each contain 60000 color images with a resolution of 32 × 32, of which 50000 pictures are used as the training set and 10000 pictures as the test set. CIFAR10 contains 10 categories with 6000 pictures per category (5000 training pictures, 1000 test pictures). The simulation randomly selects 1000 pictures from each category as verification pictures to form the verification set, which contains 10000 pictures in total. CIFAR100 contains 100 categories, each containing 600 pictures (500 training pictures, 100 test pictures). The simulation experiment randomly selects 100 pictures from each category as verification pictures to form the verification set, which contains 10000 pictures in total. ILSVRC2012 is a large visual dataset used for visual object recognition research; it contains more than 14 million images divided into training, verification, and test sets, covering 20000 categories.
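The per-class verification split described above (1000 pictures per class for CIFAR10, 100 per class for CIFAR100, drawn from the training pictures) could be produced as follows; torchvision is assumed to be available and the helper is illustrative:

```python
import numpy as np
from torchvision import datasets


def split_train_valid(dataset, per_class):
    """Randomly pick `per_class` pictures of every class as verification pictures;
    the remaining training pictures stay in the training set."""
    targets = np.array(dataset.targets)
    valid_idx = []
    for label in np.unique(targets):
        class_idx = np.where(targets == label)[0]
        valid_idx.extend(np.random.choice(class_idx, per_class, replace=False))
    train_idx = np.setdiff1d(np.arange(len(targets)), valid_idx)
    return train_idx, valid_idx


cifar10 = datasets.CIFAR10(root="./data", train=True, download=True)
train_idx, valid_idx = split_train_valid(cifar10, per_class=1000)  # 10000 verification pictures
```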
2.2 Method settings
The number of neurons contained in each sub-network block is initialized to M = 5, and the candidate computational node types include: identity mapping, depthwise separable convolution with a 3 × 3 kernel (DW3), depthwise separable convolution with a 5 × 5 kernel (DW5), max pooling of size 3 (MAX), average pooling of size 3 (AVG), a feature reconstruction node whose 2-D convolutional layer has a 3 × 3 kernel (3 × 3 feature reconstruction convolution, FR3), and a feature reconstruction node whose 2-D convolutional layer has a 5 × 5 kernel (5 × 5 feature reconstruction convolution, FR5).
The number of sub-network block types is initialized to N = 4, comprising: convolution block 1, convolution block 2, convolution block 3, and the reduction block. The stride of all convolution blocks is set to 1, so the width, height, and depth of the feature maps at the input and output of these blocks are unchanged; the convolution blocks process the feature information of the neural network at different stages of forward propagation. The stride of all computational nodes in the reduction block is 2; this block reduces the width and height of the input feature maps to half of their original values and doubles the depth.
As shown in fig. 1, the stacking order of the sub-network blocks in a neural network architecture is: the convolution module 1, the reduction module, the convolution module 2, the reduction module and the convolution module 3 are stacked in sequence. The main goal of the SIENAS is to search for the way neurons in each sub-net block connect and the type of compute nodes contained in the neurons.
Initialize the strategy parameters of SIENAS, comprising: the number of neurons contained in each sub-network block M = 5, the number of individuals in the initial population P_n = 25, the crossover rate P_c = 0.9, the mutation rate P_m = 0.1, the initial number of channels C = 32, the batch size batch_size = 128, and the maximum number of generations G = 300. Initialize the SGD optimizer parameters, comprising: the initial learning rate lr = 0.1, the weight decay coefficient w = 0.0003, and the momentum coefficient m = 0.9.
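For reference, the initialization of this embodiment can be collected in a small configuration dictionary; the dictionary itself is only an illustrative convenience, with the values taken from the settings above:

```python
sienas_config = {
    "num_block_types": 4,       # N: convolution blocks 1-3 and the reduction block
    "neurons_per_block": 5,     # M
    "initial_channels": 32,     # C
    "population_size": 25,      # P_n
    "crossover_rate": 0.9,      # P_c
    "mutation_rate": 0.1,       # P_m
    "batch_size": 128,
    "max_generations": 300,     # G
}

sgd_config = {
    "lr": 0.1,                  # initial learning rate
    "weight_decay": 0.0003,     # weight decay coefficient w
    "momentum": 0.9,            # momentum coefficient m
}
```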
After the iteration of the method is finished, the individual with the best fitness value is output and decoded into the corresponding neural network architecture SI-EvoNet-S. The network structure parameters are reinitialized, and the neural network architecture is trained with the training dataset until convergence. The test dataset is then used to test the performance of the neural network architecture.
3. Simulation results
The optimization processes on CIFAR10 and CIFAR100 are shown in Fig. 7 and Fig. 8, respectively. The highest prediction accuracies obtained during the search are 93.7% and 76.8%, respectively, which indicates that the performance evaluation of an individual reflects its actual performance and yields a more accurate performance ranking.
The sub-network blocks of the neural network architecture searched by the method on the CIFAR10 dataset are shown in Fig. 6. Tables 1 and 2 compare the method with existing hand-designed neural network architectures and with existing evolution-based NAS methods, respectively; compared with these methods, the proposed method achieves higher search efficiency and higher classification accuracy.
TABLE 1 (comparison with manually designed architectures; provided as an image in the original document)
TABLE 2 (comparison with evolution-based NAS methods; provided as an image in the original document)

Claims (5)

1. A fast attention neural network architecture searching method based on an evolution method is characterized by comprising the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search;
(3) decoding the individuals searched by the evolution to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
2. The fast attention neural network architecture searching method based on the evolutionary method as claimed in claim 1, wherein in the step (1), the predetermined coding scheme is a one-shot model coding method based on the evolutionary method; the method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes; neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i; after the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the encoding of a sub-network block is the concatenation of the quadruples of its M neurons;
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the encoding of a neural network architecture is the concatenation of the encodings of its N sub-network blocks;
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture; the coding structure of a neural network architecture is called an individual in a population.
3. The fast attention neural network architecture searching method based on evolution method as claimed in claim 1, wherein in the step (1), the population initialization scheme is based on the encoding scheme, and randomly generating individuals by uniform distribution until reaching a predetermined population size of initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
4. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in the step (1), feature reconstruction computing nodes containing an attention mechanism are adaptively integrated into the neural network architecture in the evolution searching process, so as to improve the expression capability of the neural network; the light-weight multi-scale channel attention mechanism consists of a 2-dimensional convolution layer and a light-weight channel attention mechanism; the 2-dimensional convolutional layer can extract feature information of different scales, and the channel attention mechanism is used for reducing redundant information in a channel, recalibrating channel feature response and improving the expression capacity of the neural network;
(401) the structure of the feature reconstruction computing node is as follows: the system comprises a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer and a multiplication module; the process of feature reconstruction is as follows:
(402) the 2-dimensional convolution layer is used for extracting the characteristic information of the input characteristic diagram and collecting the input characteristic diagram
Figure FDA0002824054440000021
Conversion into a collection of transformation profiles
Figure FDA0002824054440000022
Figure FDA0002824054440000023
H × W represents the size of the input feature map, s represents the number of channels of the input feature map set, H × W represents the size of the converted feature map, and c represents the number of channels of the converted feature map set;
(403) inputting the conversion feature map set into a global average pooling layer, and extracting global features of each feature map by using the global average pooling layer; the formula of the global average pooling layer is as follows:
Figure FDA0002824054440000024
transforming feature atlas sets through a global averaging pooling layer
Figure FDA0002824054440000025
Is converted into a one-dimensional vector z e { z1,z2,......,zc-said one-dimensional vector represents the characteristics of c channels;
(404) utilizing the one-dimensional convolution layer to complete the feature mapping of adjacent channels; the feature mapping formula is as follows:
Fl=C1Dk(zl)
wherein C1D represents a one-dimensional convolutional layer; k represents the size of the one-dimensional convolutional layer convolution kernel; z is a radical oflFeatures representing the 1 st channel, based on step (2), zlE is z; the feature set after the one-dimensional convolutional layer mapping is denoted as F, and F ═ F1,F2,......,Fc};FlRepresents the feature of the 1 st channel after one-dimensional convolutional layer mapping, and Fl∈F;
(405) the sigmoid activation function produces the weight set w of the c channels, w = {w_1, w_2, ..., w_c}:

w_m = σ(F_m)

(406) in the above formula, σ denotes the sigmoid function; w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature of the m-th channel and, from step (404), F_m ∈ F;
(407) the multiplication module assigns the corresponding weight to each channel; the multiplication module is computed as:

U = u * w

where u denotes the transformed feature map set from step (402); * denotes the matrix dot product, i.e., each transformed feature map u_m is multiplied by its channel weight w_m; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c};

(408) U is used as the output of the feature reconstruction computing node.
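The following is a minimal PyTorch-style sketch of the feature reconstruction computing node in steps (401)–(408), assuming an ECA-style lightweight channel attention; the class name FeatureReconstructionNode, the 3×3 spatial kernel and the 1-D kernel size k are illustrative assumptions rather than the exact layers claimed.

```python
import torch
import torch.nn as nn

class FeatureReconstructionNode(nn.Module):
    """2-D convolution followed by lightweight channel attention:
    global average pooling, 1-D convolution over adjacent channels,
    sigmoid weighting, and channel-wise multiplication."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, k: int = 3):
        super().__init__()
        # (402) 2-D convolution extracts multi-scale feature information
        self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size,
                                padding=kernel_size // 2, bias=False)
        # (403) global average pooling compresses each channel to a single value
        self.gap = nn.AdaptiveAvgPool2d(1)
        # (404) 1-D convolution maps features across k adjacent channels
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.conv2d(x)                      # (402) transformed feature maps, shape (B, c, h, w)
        z = self.gap(u)                         # (403) -> (B, c, 1, 1)
        z = z.squeeze(-1).transpose(1, 2)       # reshape to (B, 1, c) for the 1-D convolution
        f = self.conv1d(z)                      # (404) adjacent-channel feature mapping
        w = torch.sigmoid(f)                    # (405) channel weights in (0, 1)
        w = w.transpose(1, 2).unsqueeze(-1)     # back to (B, c, 1, 1)
        return u * w                            # (407)-(408) reweighted output U

if __name__ == "__main__":
    node = FeatureReconstructionNode(in_channels=16, out_channels=32)
    out = node(torch.randn(4, 16, 56, 56))
    print(out.shape)  # torch.Size([4, 32, 56, 56])
```

In this sketch the 1-D convolution slides over the channel dimension, so each channel weight depends only on its k neighbouring channels, which is what keeps the attention mechanism lightweight.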
5. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in the step (2), the searching method is a fast attention neural network architecture searching method (SIENAS) based on evolutionary computation; the structure of each individual and the weights of the one-shot model are optimized jointly through an evolutionary mechanism and back-propagated gradient values: the individuals in the parent population are trained with a sampling training strategy, which generates the weights of the neurons in the one-shot model; an offspring population, i.e., a new group of neural network topology graphs, is generated from the parent population through a node inheritance strategy, and each offspring inherits the weights of the corresponding neurons in its parents as initial weights, so it can be evaluated directly without any training; that is, SIENAS adjusts the computing nodes inside the neurons and optimizes the connections among the neurons in each sub-network block, thereby optimizing the overall neural network architecture; the method specifically comprises the following steps:
(501) prepare an image classification dataset of scaled image-label pairs and divide it into a training dataset D_train, a validation dataset D_valid and a test dataset D_test;
(502) initialize the SIENAS strategy parameters, including: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of a sub-network block, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of evolution generations G_max; initialize the SGD optimizer parameters, including: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy and the momentum coefficient m;
(503) take the training dataset D_train and the validation dataset D_valid of the prepared dataset as the input of SIENAS; D_train contains train_number pictures;
(504) let G be 0;
(505) take the G-th generation population as the G-th generation parent population; train all individuals in the parent population with the sampling training strategy, optimizing the neuron weights in each individual and hence all neuron weights in the one-shot model; the training specifically comprises the following steps:
1) divide D_train into several mini-batches according to batch_size; D_train then contains j batches of data in total, computed as:

j = ⌈train_number / batch_size⌉
2) let f be 1;
3) for the f-th batch of training data, randomly sample one individual from the parent population;
4) train the sampled individual on the f-th batch of training data: compute the loss value with the cross-entropy loss function and optimize the neuron weights in the individual through the back-propagated gradient values;
5) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 6); otherwise let f = f+1 and return to step 3);
6) stop training and store the weights of all neurons in the parent population (a minimal sketch of this sampling training loop is given below);
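A minimal sketch of the sampling training strategy in step (505), assuming PyTorch and a DataLoader that yields the j mini-batches; build_submodel is a hypothetical helper that returns a sub-model whose parameters are views of the shared one-shot weights, and the optimizer is assumed to be an SGD optimizer constructed over those shared weights.

```python
import random
import torch
import torch.nn.functional as F

def sample_and_train_one_epoch(parent_population, build_submodel, train_loader,
                               optimizer, device="cpu"):
    """One pass over D_train: for each mini-batch, sample one parent individual,
    train only that sub-model, and backpropagate into the shared one-shot weights."""
    for images, labels in train_loader:                    # the loader yields the j mini-batches
        individual = random.choice(parent_population)      # step 3): uniform sampling
        submodel = build_submodel(individual).to(device)   # sub-model shares one-shot parameters
        submodel.train()

        optimizer.zero_grad()
        logits = submodel(images.to(device))
        loss = F.cross_entropy(logits, labels.to(device))  # step 4): cross-entropy loss
        loss.backward()                                    # back-propagated gradient values
        optimizer.step()                                   # updates the shared neuron weights
```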
(506) after step (505) is finished, evaluate the fitness values of all individuals in the parent population and record them; the fitness value of a parent individual is its classification accuracy on the validation dataset;
(507) based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals with the node inheritance strategy as the offspring population Q of the G-th generation; the node inheritance strategy generates new neural network topology graphs from the topology graphs of the parent individuals through the selection, crossover and mutation operations of the evolutionary mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals; the strategy specifically comprises the following steps:
1) selection operation;
randomly select two individuals from the parent population with a tournament selection strategy and retain the one with the higher fitness value; repeat this process until two individuals have been selected;
2) crossover operation;
generate a random value k_c;
if k_c ≤ P_c, perform a single-point crossover on the two selected individuals to generate two new individuals, and save the new individuals as offspring into the offspring population Q;
if k_c > P_c, perform no operation on the selected individuals and save them directly as offspring into the offspring population Q;
3) repeat step 1) and step 2) until the number of individuals in the offspring population Q reaches P_n;
4) mutation operation;
for the offspring population Q of the G-th generation, randomly generate a value k_m for each individual in Q;
if k_m ≤ P_m, perform a swap mutation on the individual to generate a new individual, save the new individual into the offspring population Q and delete the original individual;
if k_m > P_m, perform no operation on the individual;
5) weight inheritance operation;
following step 4), each neuron in every offspring individual inherits, in turn, the weight of the corresponding neuron in the parent individuals as its initial weight (a minimal sketch of this node inheritance strategy is given below);
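A minimal sketch of the node inheritance strategy in step (507), assuming each individual is a flat list of integer genes as in the initialization sketch above; the helper names and the exact swap-mutation details are illustrative assumptions.

```python
import random

def tournament_select(population, fitness):
    """Binary tournament: pick two individuals at random, keep the fitter one."""
    a, b = random.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]

def single_point_crossover(parent1, parent2):
    """Exchange gene tails after a random cut point, producing two children."""
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def swap_mutation(individual):
    """Swap the genes at two random positions (the swap mutation of sub-step 4))."""
    mutant = list(individual)
    i, j = random.sample(range(len(mutant)), 2)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

def node_inheritance(parents, fitness, p_c, p_m, population_size):
    offspring = []
    while len(offspring) < population_size:                      # sub-steps 1)-3)
        p1 = tournament_select(parents, fitness)
        p2 = tournament_select(parents, fitness)
        if random.random() <= p_c:
            c1, c2 = single_point_crossover(p1, p2)
        else:
            c1, c2 = list(p1), list(p2)
        offspring.extend([c1, c2])
    offspring = offspring[:population_size]
    # sub-step 4): swap mutation applied with probability p_m per individual
    offspring = [swap_mutation(ind) if random.random() <= p_m else ind for ind in offspring]
    # sub-step 5): weight inheritance happens implicitly because each offspring's
    # genes index into the shared one-shot weights, so no retraining is required.
    return offspring
```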
(508) based on step (507), evaluate the fitness values of the offspring individuals and record them; the fitness value of an offspring individual is its classification accuracy on the validation dataset;
(509) merge the G-th generation parent population and the offspring population Q; the merged G-th generation population contains 2·P_n individuals;
(510) based on step (509), sort the individuals in the G-th generation population according to their fitness values and generate the (G+1)-th generation population through an environmental selection strategy; the population size of the (G+1)-th generation population must be the same as that of the G-th generation population;
(511) check whether G+1 meets the termination condition: if G+1 ≥ G_max (the preset maximum number of evolution generations), go to step (512); otherwise let G = G+1 and return to step (505);
(512) end the search task and output the individual with the highest fitness value in the final population (an illustrative sketch of the overall generational loop, steps (504)–(512), follows below).
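A minimal sketch of the overall generational loop in steps (504)–(512), reusing the hypothetical helpers from the earlier sketches (train_parents for the sampling training, node_inheritance for offspring generation, evaluate for validation accuracy); the elitist merge-and-truncate step stands in for the environmental selection strategy, whose exact form the claim does not spell out.

```python
def evolve(initial_population, evaluate, node_inheritance, train_parents,
           p_c, p_m, max_generations):
    """Elitist generational loop: merge parents and offspring, keep the best P_n."""
    population = list(initial_population)
    n = len(population)
    best_individual, best_fitness = None, float("-inf")
    for _ in range(max_generations):                                  # steps (504)/(511)
        train_parents(population)                                     # step (505): sampling training
        parent_fitness = [evaluate(ind) for ind in population]        # step (506)
        offspring = node_inheritance(population, parent_fitness,
                                     p_c, p_m, n)                     # step (507)
        offspring_fitness = [evaluate(ind) for ind in offspring]      # step (508)
        merged = sorted(zip(population + offspring,
                            parent_fitness + offspring_fitness),
                        key=lambda pair: pair[1], reverse=True)       # steps (509)-(510)
        population = [ind for ind, _ in merged[:n]]                   # environmental selection
        if merged[0][1] > best_fitness:
            best_individual, best_fitness = merged[0]
    return best_individual                                            # step (512)
```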
CN202011424217.7A 2020-12-08 2020-12-08 Fast attention neural network architecture searching method based on evolution method Pending CN112465120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011424217.7A CN112465120A (en) 2020-12-08 2020-12-08 Fast attention neural network architecture searching method based on evolution method

Publications (1)

Publication Number Publication Date
CN112465120A true CN112465120A (en) 2021-03-09

Family

ID=74801640

Country Status (1)

Country Link
CN (1) CN112465120A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949842A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN112949842B (en) * 2021-05-13 2021-09-14 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN113743605A (en) * 2021-06-16 2021-12-03 温州大学 Method for searching smoke and fire detection network architecture based on evolution method
CN113240055A (en) * 2021-06-18 2021-08-10 桂林理工大学 Pigment skin damage image classification method based on macro-operation variant neural architecture search
CN113240055B (en) * 2021-06-18 2022-06-14 桂林理工大学 Pigment skin damage image classification method based on macro-operation variant neural architecture search
CN114445674A (en) * 2021-12-13 2022-05-06 上海悠络客电子科技股份有限公司 Target detection model searching method based on multi-scale fusion convolution
WO2023124342A1 (en) * 2021-12-31 2023-07-06 江南大学 Low-cost automatic neural architecture search method for image classification
CN114997360A (en) * 2022-05-18 2022-09-02 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114997360B (en) * 2022-05-18 2024-01-19 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114926698A (en) * 2022-07-19 2022-08-19 深圳市南方硅谷半导体股份有限公司 Image classification method for neural network architecture search based on evolutionary game theory
CN115994575A (en) * 2023-03-22 2023-04-21 方心科技股份有限公司 Power failure diagnosis neural network architecture design method and system
CN117173037A (en) * 2023-08-03 2023-12-05 江南大学 Neural network structure automatic search method for image noise reduction
CN117611974A (en) * 2024-01-24 2024-02-27 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
CN117611974B (en) * 2024-01-24 2024-04-16 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures

Similar Documents

Publication Publication Date Title
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
WO2022083624A1 (en) Model acquisition method, and device
CN111553480B (en) Image data processing method and device, computer readable medium and electronic equipment
CN111898689B (en) Image classification method based on neural network architecture search
CN109948029A (en) Based on the adaptive depth hashing image searching method of neural network
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
WO2021218470A1 (en) Neural network optimization method and device
WO2022126448A1 (en) Neural architecture search method and system based on evolutionary learning
CN112561039A (en) Improved search method of evolutionary neural network architecture based on hyper-network
CN112784913A (en) miRNA-disease associated prediction method and device based on graph neural network fusion multi-view information
CN113011487B (en) Open set image classification method based on joint learning and knowledge migration
Chen et al. Binarized neural architecture search for efficient object recognition
CN114943345A (en) Federal learning global model training method based on active learning and model compression
CN110033089A (en) Deep neural network parameter optimization method and system based on Distributed fusion algorithm
CN114819091B (en) Multi-task network model training method and system based on self-adaptive task weight
Bakhshi et al. Fast evolution of CNN architecture for image classification
CN116170328A (en) Method and device for predicting bandwidth used for graphic coding
CN114241267A (en) Structural entropy sampling-based multi-target architecture search osteoporosis image identification method
CN113011091A (en) Automatic-grouping multi-scale light-weight deep convolution neural network optimization method
Gao Game-theoretic approaches for generative modeling
Cai et al. EST-NAS: An evolutionary strategy with gradient descent for neural architecture search
CN117034060A (en) AE-RCNN-based flood classification intelligent forecasting method
Chai et al. Correlation Analysis-Based Neural Network Self-Organizing Genetic Evolutionary Algorithm
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination