CN112465120A - Fast attention neural network architecture searching method based on evolution method

Info

Publication number: CN112465120A
Application number: CN202011424217.7A
Applicant (Assignee): SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Inventors: 金耀初, 沈修平
Other languages: Chinese (zh)
Legal status: Pending

Classifications

    • G06N 3/045 Combinations of networks
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/048 Activation functions
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming (computing arrangements based on biological models using genetic models)


Abstract

The invention provides a fast attention neural network architecture search method based on an evolution method, which comprises the following steps: (1) generating a neural network architecture search space containing an attention mechanism based on a predetermined encoding scheme and a population initialization scheme; (2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search; (3) decoding the individual found by the evolutionary search to generate a neural network architecture, resetting its structural weights, training the architecture with the training dataset until convergence, and testing its performance.

Description

Fast attention neural network architecture searching method based on evolution method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a fast attention neural network architecture searching method based on an evolution method.
Background
Deep neural networks have achieved significant success in various computer vision tasks such as object classification, object detection, object segmentation, and object tracking, among which the classification task is the basis of the other tasks. The performance of a deep neural network depends to a large extent on its architecture. Therefore, to obtain the best performance from a deep neural network, human experts are typically required to manually tune the model architecture using expert knowledge and the corresponding dataset. Architecture tuning, training, and evaluation of a neural network form an iterative process that must be repeated and continuously optimized. This process not only consumes a large amount of labor and time, but also raises the threshold for applying artificial-intelligence technology in traditional industries such as healthcare, education, and finance. Therefore, methods that automatically generate network models, namely neural architecture search (NAS) techniques, have attracted extensive attention from researchers.
NAS techniques can automatically generate a deep neural network architecture from a task objective and the corresponding dataset, thereby reducing the labor and time cost of building neural network architectures by hand. NAS is generally divided into three steps: first, define a search space, i.e., a set of neural network architectures; second, use a search method to explore promising neural network architectures in the search space; third, evaluate the explored neural network architectures. The objective of NAS is to find a network architecture with excellent performance in a huge search space. The second and third steps therefore form an iterative process, and the evaluation results of the model architectures are usually fed back to the search method to guide it toward more effective neural network architectures. When the iterations are completed, the architecture with the best evaluated performance is taken as the output of the method. The most common way to evaluate an architecture is to train it on the training dataset until convergence and then test its performance on a validation set. Each round of training and evaluating an architecture consumes significant computational resources and time. This computational bottleneck has made NAS difficult to generalize further. Therefore, how to improve the efficiency of neural architecture search and reduce its computational cost has become an urgent issue for NAS.
Currently, mainstream NAS search methods fall into three main categories: reinforcement-learning-based NAS, gradient-based NAS, and evolution-based NAS. Reinforcement-learning-based NAS typically requires constructing a controller that samples model architectures from the search space. Because the performance of the sampled architectures is determined by the controller, the controller must repeatedly evaluate different architectures and be updated iteratively in order to generate effective model architectures. Gradient-based NAS relies on a continuous relaxation of the architecture representation, converts architecture search into an optimization problem over a continuous space, and uses gradient descent to optimize the network architecture and network parameters simultaneously. The essence of evolution-based NAS is to explore the search space by means of natural selection. The neural network architectures in the search space are evolved as a population, and each individual in the population is a neural network architecture. The performance of an architecture on the verification set is taken as the individual's fitness value. During evolution, one or more individuals are selected as parents according to their fitness values, and offspring individuals are then generated through crossover and mutation operators. After the offspring individuals complete their performance evaluation, they join the parent population, and the next-generation population is generated through environmental selection. This process is repeated until the preset number of generations is reached. Finally, the individual with the best fitness value is taken as the output of the evolution method.
To improve the search efficiency of NAS methods, super-network-based architecture search (one-shot architecture search) has become a focus of researchers' attention. Super-network-based architecture search generally treats the search space as a super-network model (one-shot model), and the neural network architectures contained in the search space are regarded as sub-networks of the super network. The technique generally obtains the weight of each operation by training the super network, and the sub-networks are then evaluated by sharing the super network's weights, thereby reducing the computational cost of network evaluation. Chinese patent CN110851566 proposes an improved differentiable network architecture search method. The method is a gradient-based NAS technique that optimizes all possible edges and operations in the whole super network and thereby determines the discretized optimal sub-network. In addition, the method uses a global normalization operation to reduce the influence of local bias in the network, addressing the problem that the bi-level optimization in traditional differentiable architecture search aggravates weight coupling and the mutual competition between weights in the later stage of search, which in turn makes the network difficult to train to convergence. Chinese patent CN110569972 proposes a method and apparatus for constructing the search space of a super-network (one-shot) model, and an electronic device. The method constructs the super network by stacking unit structures, which are divided into two types: normal units and down-sampling units. The optimization target of the search method is thus converted from optimizing the overall neural network architecture into optimizing the internal structures of these two unit types, which further reduces the computational cost of structure optimization and improves search efficiency.
However, current super-network-based neural architecture search techniques have some limitations. First, when the super network is large, training it to convergence takes a great deal of time. Second, because the neural network structures share the super network's weights for performance evaluation, a large bias may be introduced during structure evaluation; the performance of a structure may therefore be underestimated and the performance ranking of structures may be inaccurate, so the performance of the final structure cannot be guaranteed.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a fast attention neural network architecture search method based on an evolution method and applying it to computer vision tasks. The method establishes the search space of the one-shot model through the initial population of the evolution method, which alleviates the difficulty and excessive time of training caused by an overly large one-shot model. Taking the evolution method as the search method, the method simultaneously optimizes the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values during the evolutionary search, which effectively improves the efficiency of the search. The invention encodes a lightweight channel attention mechanism into the search space and adaptively integrates it into the neural network architecture through the method, thereby further improving the performance of the final neural network architecture.
In order to achieve the purpose, the technical scheme adopted by the application is as follows: a fast attention neural network architecture searching method based on an evolution method comprises the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search;
(3) decoding the individuals obtained by the evolutionary search to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
In the step (1), the predetermined coding scheme is a one-shot model coding method based on an evolutionary method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes; neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i; after the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the encoding of a sub-network block is therefore the concatenation of the quadruples of its M neurons;
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the encoding of a neural network architecture is therefore the concatenation of the encodings of its N sub-network blocks;
(206) the sub-network blocks are stacked and connected in sequence to form a complete neural network architecture; the encoding of a neural network architecture is called an individual of the population.
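As an illustration of this encoding, the following Python sketch builds a neuron quadruple, a sub-network block of M neurons, and a network of N blocks; the helper names, the index convention for the block inputs, and the candidate-operation list are assumptions made for illustration rather than the patent's reference implementation:

```python
import random

# Hypothetical list of candidate computational node types; the embodiment later in
# this document names identity, DW3, DW5, MAX, AVG, FR3 and FR5 as the candidates.
NODE_TYPES = ["identity", "dw_conv3", "dw_conv5", "max_pool3", "avg_pool3", "fr3", "fr5"]


def random_neuron(index):
    """Encode neuron `index` as an integer quadruple (I_a, I_b, O_a, O_b):
    two input indices and two computational node types."""
    i_a = random.randint(0, index - 1)        # index of the first input neuron
    i_b = random.randint(0, index - 1)        # index of the second input neuron
    o_a = random.randrange(len(NODE_TYPES))   # node type applied to the first input
    o_b = random.randrange(len(NODE_TYPES))   # node type applied to the second input
    return (i_a, i_b, o_a, o_b)


def random_block(m):
    """A sub-network block is encoded as the concatenation of M neuron quadruples.
    Indices 0 and 1 are assumed to denote the block's two external inputs."""
    return [random_neuron(i) for i in range(2, m + 2)]


def random_individual(n, m):
    """An individual (one neural network architecture) is N sub-network blocks."""
    return [random_block(m) for _ in range(n)]
```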
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until reaching a predetermined population scale of the initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
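A minimal sketch of this population initialization, reusing the hypothetical encoding helpers from the previous sketch (P_n, N and M are the strategy parameters defined later in step (502)):

```python
def initialize_population(p_n, n, m):
    """Randomly generate P_n individuals by uniform sampling of the encoding;
    together, the initial individuals form the one-shot model covering the search space."""
    return [random_individual(n, m) for _ in range(p_n)]


# Example: an initial population of 25 individuals, 4 block types, 5 neurons per block.
population = initialize_population(p_n=25, n=4, m=5)
```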
In the step (1), a feature reconstruction computing node containing an attention mechanism is adaptively integrated into the neural network architecture during the evolutionary search, thereby improving the expressive capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism; the 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expressive capability of the neural network. The specific steps are as follows:
(401) the feature reconstruction computing node consists of a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module; the feature reconstruction process is as follows:
(402) the 2-dimensional convolutional layer extracts the feature information of the input feature maps, converting the input feature-map set of size H × W with s channels into the transformed feature-map set u of size h × w with c channels, where H × W denotes the size of the input feature maps, s the number of channels of the input feature-map set, h × w the size of the transformed feature maps, and c the number of channels of the transformed feature-map set;
(403) the transformed feature-map set is fed into the global average pooling layer, which extracts the global feature of each feature map; the global average pooling layer is computed as:
z_l = (1 / (h × w)) Σ_{i=1..h} Σ_{j=1..w} u_l(i, j)
yielding a one-dimensional vector z = {z_1, z_2, ..., z_c} that represents the features of the c channels;
(404) the 1-dimensional convolutional layer completes the feature mapping between adjacent channels; the feature mapping is:
F_l = C1D_k(z_l)
where C1D denotes the 1-dimensional convolutional layer; k denotes the kernel size of the 1-dimensional convolution; z_l denotes the feature of the l-th channel and, from step (403), z_l ∈ z; the feature set after the 1-dimensional convolution mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l denotes the feature of the l-th channel after the mapping and F_l ∈ F;
(405) a sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) in the formula, w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature value of the m-th channel and, from step (404), F_m ∈ F;
(407) the multiplication module assigns each channel its corresponding weight; the multiplication module is computed as:
U = u * w
where u denotes the transformed feature-map set of the channels from step (402); * denotes element-wise (channel-wise) multiplication; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c};
(408) U is taken as the output of the feature reconstruction convolutional layer.
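A possible PyTorch sketch of the feature reconstruction computing node described in steps (401) to (408); the class name, kernel sizes, and layer options are illustrative assumptions, not the patent's reference code:

```python
import torch.nn as nn


class FeatureReconstructionConv(nn.Module):
    """Feature reconstruction computing node: a 2-D convolution followed by a
    lightweight channel attention mechanism, as in steps (402) to (408)."""

    def __init__(self, in_channels, out_channels, conv_kernel=3, attn_kernel=3):
        super().__init__()
        # (402) the 2-D convolution converts the s input channels into c output channels
        self.conv2d = nn.Conv2d(in_channels, out_channels,
                                kernel_size=conv_kernel, padding=conv_kernel // 2)
        # (403) global average pooling produces the channel descriptor z
        self.gap = nn.AdaptiveAvgPool2d(1)
        # (404) a 1-D convolution maps the features of adjacent channels: F_l = C1D_k(z_l)
        self.conv1d = nn.Conv1d(1, 1, kernel_size=attn_kernel,
                                padding=attn_kernel // 2, bias=False)
        # (405) a sigmoid turns F into the channel weight set w
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        u = self.conv2d(x)                                  # u: (batch, c, h, w)
        z = self.gap(u)                                     # z: (batch, c, 1, 1)
        f = self.conv1d(z.squeeze(-1).transpose(1, 2))      # F: (batch, 1, c)
        w = self.sigmoid(f).transpose(1, 2).unsqueeze(-1)   # w: (batch, c, 1, 1)
        return u * w                                        # (407) U = u * w
```

Under this reading, the FR3 and FR5 candidate nodes of the embodiment would correspond to conv_kernel = 3 and conv_kernel = 5 respectively; this mapping is an assumption based on their naming.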
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS). The structures of the individuals and the weights of the one-shot model are optimized through the evolution mechanism and back-propagated gradient values: the individuals in the parent population are trained through a sampling training strategy, generating weights for the neurons of the one-shot model; the offspring population, i.e. a new set of neural network topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training. In other words, the goal of SIENAS is to adjust the computational nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole. The steps are as follows (an illustrative sketch of the sampling training strategy is given after this list):
(501) take an image classification dataset of calibrated image-label pairs and divide it into a training dataset D_train, a verification dataset D_valid, and a test dataset D_test;
(502) initialize the SIENAS strategy parameters, comprising: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G; initialize the SGD optimizer parameters, comprising: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m;
(503) take the training dataset D_train and the verification dataset D_valid of the image classification dataset as the inputs of SIENAS; D_train contains train_number pictures in total;
(504) let g = 0;
(505) take the g-th generation population as the g-th generation parent population, train all individuals in the parent population with the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all the neuron weights of the one-shot model; specifically:
1) according to batch_size, divide D_train into several batches of data (mini-batches); D_train then contains j batches of data in total, calculated as:
j = ⌈train_number / batch_size⌉
2) let f = 1;
3) randomly sample an individual from the parent population and train it on the f-th batch of data; calculate the loss value with the cross-entropy loss function, and optimize the weights of the neurons in the individual through back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise let f = f+1 and return to step 3);
5) stop training and save the weights of all neurons in the parent population;
(506) after the step (505) is finished, evaluating the fitness values of all individuals in the parent population and recording; the fitness value of the parent individual is the classification accuracy of the individual based on the verification data set;
(507) based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals using the node inheritance strategy and take them as the offspring population Q of generation g; the node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolution mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals; as shown in Fig. 5, specifically:
1) selection operation:
randomly pick two individuals from the parent population using a tournament selection strategy and keep the one with the higher fitness value; repeat this process until two individuals have been selected;
2) crossover operation:
generate a random value k_c;
if k_c ≤ P_c, perform a single-point crossover on the two selected individuals to generate two new individuals, and save the new individuals to the offspring population Q as offspring individuals;
if k_c > P_c, do not operate on the selected individuals and save them to the offspring population Q as offspring individuals;
3) repeat steps 1) and 2) until the number of individuals in the offspring population Q reaches P_n;
4) mutation operation:
for the offspring population Q of the g-th generation, randomly generate a value k_m for each individual in Q;
if k_m ≤ P_m, perform a swap mutation on the individual to generate a new individual, save the new individual to the offspring population Q, and delete the original individual;
if k_m > P_m, do not operate on the individual;
5) weight inheritance operation:
based on step 4), the neurons of each offspring individual inherit, in order, the weights of the corresponding neurons of their parent individuals as initial weights;
(508) based on step (507), evaluating fitness values of the offspring individuals and recording; the fitness value of the offspring individual is the classification accuracy of the individual on the verification data set;
(509) merge the g-th generation parent population and the offspring population Q; the merged g-th generation population contains 2·P_n individuals;
(510) based on step (509), rank the individuals according to their fitness values and generate the (g+1)-th generation population through an environmental selection strategy; the population size of the (g+1)-th generation must be the same as that of the g-th generation;
(511) check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise let g = g+1 and return to step (505);
(512) finish the search and output the individual with the highest fitness value in the final generation population.
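As a non-authoritative illustration of the sampling training strategy in step (505), the following PyTorch-style sketch trains one randomly sampled parent individual per mini-batch; the `decode` helper (building a trainable sub-model that shares the one-shot model's neuron weights) and the data-loader interface are assumptions:

```python
import random

import torch.nn as nn


def sampling_training(parent_population, train_loader, decode, optimizer):
    """Step (505): for every mini-batch, randomly sample one parent individual,
    train it on that batch, and thereby update the shared neuron weights of the
    one-shot model (the optimizer is assumed to cover all shared weights)."""
    criterion = nn.CrossEntropyLoss()
    for images, labels in train_loader:                # j = ceil(train_number / batch_size) batches
        individual = random.choice(parent_population)  # randomly sample one individual
        model = decode(individual)                     # sub-model sharing one-shot weights (assumed helper)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)        # cross-entropy loss value
        loss.backward()                                # back-propagated gradient values
        optimizer.step()                               # optimise the neuron weights of the individual
```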
Beneficial effects:
The invention establishes the search space of the one-shot model through the initial population of the evolution method, which alleviates the difficulty and excessive time of training caused by an overly large one-shot model. Taking the evolution method as the search method, the invention simultaneously optimizes the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values during the evolutionary search, and provides a sampling training strategy for the parent individuals and a node inheritance strategy for the offspring individuals, so that a parent individual can be trained multiple times on different mini-batches of the training set while an offspring individual directly inherits the weights of its parents as initial weights and its fitness value can be evaluated without additional training. This improvement effectively increases the search efficiency of the method, reduces the bias introduced during structure evaluation, brings the predicted performance of a model closer to its real performance, and improves the reliability of the model ranking. The invention encodes a lightweight channel attention mechanism into the search space and adaptively integrates it into the neural network architecture through the method, adaptively recalibrating the channel responses, strengthening the weights of effective channels, weakening redundant information in the channels, and improving the expressive capability of the neural network architecture, thereby further improving the performance of the final architecture. Compared with existing evolution-based neural architecture search methods, the method obtains better results on the CIFAR10, CIFAR100, and ImageNet image classification tasks.
Drawings
FIG. 1 is a diagram of a neural network architecture in accordance with the present invention.
FIG. 2 is a schematic diagram of a feature reconstruction convolution layer.
Fig. 3 is a schematic diagram of sub-network block encoding and decoding.
Fig. 4 is a schematic flow diagram of SIENAS.
FIG. 5 is a schematic diagram of a node inheritance policy.
FIG. 6 is a neural network architecture searched based on CIFAR10.
FIG. 7 is a neural network architecture optimization process based on the CIFAR10 classification task.
FIG. 8 is a neural network architecture optimization process based on the CIFAR100 classification task.
Detailed Description
The invention is further explained below with reference to the accompanying drawings. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that, after reading the teaching of the present invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the appended claims.
A fast attention neural network architecture searching method based on an evolution method comprises the following steps:
(1) a neural network architecture search space containing an attention mechanism is generated based on a predetermined encoding scheme and a population initialization scheme.
(2) Taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, simultaneously optimize the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values. After the search task of the evolution method is finished, the individuals in the population are ranked, and the individual with the highest fitness value is kept as the best result of the search.
(3) Decoding the searched individuals to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
The predetermined coding scheme in the step (1) is a one-shot model coding method based on an evolution method. The method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron.
(202) Neurons are the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes. Neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively.
(203) The output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i. After the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i.
(204) A sub-network block contains M neurons, M being an integer greater than one. The encoding of a sub-network block is the concatenation of the quadruples of its M neurons.
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one. The encoding of a neural network architecture is the concatenation of the encodings of its N sub-network blocks.
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture. The coding structure of a neural network architecture is called an individual in a population. Fig. 3 is a diagram illustrating an example of encoding and decoding a sub-network block.
In the step (1), the population initialization scheme is to randomly generate individuals through uniform distribution based on the coding scheme until a predetermined population size of the initial population is reached. And all individuals in the initial population form a one-shot model and cover the whole search space. That is, each individual is a sub-model of the one-shot model.
In the step (1), a feature reconstruction computing node containing an attention mechanism is adaptively integrated into the neural network architecture during the evolutionary search, thereby improving the expressive capability of the neural network. The feature reconstruction computing node is a lightweight multi-scale channel attention mechanism consisting of a 2-dimensional convolutional layer and a lightweight channel attention mechanism. The 2-dimensional convolutional layer extracts feature information at different scales, and the channel attention mechanism is used to reduce redundant information in the channels, recalibrate the channel feature responses, and improve the expressive capability of the neural network.
(401) The feature reconstruction computing node consists of a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer, and a multiplication module. The feature reconstruction process is as follows:
(402) The 2-dimensional convolutional layer extracts the feature information of the input feature maps, converting the input feature-map set of size H × W with s channels into the transformed feature-map set u of size h × w with c channels, where H × W denotes the size of the input feature maps, s the number of channels of the input feature-map set, h × w the size of the transformed feature maps, and c the number of channels of the transformed feature-map set.
(403) The transformed feature-map set is fed into the global average pooling layer, which extracts the global feature of each feature map. The global average pooling layer is computed as:
z_l = (1 / (h × w)) Σ_{i=1..h} Σ_{j=1..w} u_l(i, j)
yielding a one-dimensional vector z = {z_1, z_2, ..., z_c} that represents the features of the c channels.
(404) The 1-dimensional convolutional layer completes the feature mapping between adjacent channels. The feature mapping is:
F_l = C1D_k(z_l)
where C1D denotes the 1-dimensional convolutional layer; k denotes the kernel size of the 1-dimensional convolution; z_l denotes the feature of the l-th channel and, from step (403), z_l ∈ z. The feature set after the 1-dimensional convolution mapping is denoted F = {F_1, F_2, ..., F_c}, where F_l denotes the feature of the l-th channel after the mapping and F_l ∈ F.
(405) A sigmoid activation function forms the weight set w = {w_1, w_2, ..., w_c} of the c channels:
w_m = σ_sigmoid(F_m)
(406) In the formula, w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature value of the m-th channel and, from step (404), F_m ∈ F.
(407) The multiplication module assigns each channel its corresponding weight. The multiplication module is computed as:
U = u * w
where u denotes the transformed feature-map set of the channels from step (402); * denotes element-wise (channel-wise) multiplication; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c}.
(408) U is taken as the output of the feature reconstruction convolutional layer, as shown in Fig. 2.
In the step (2), the search method is a fast attention neural network architecture search method based on evolutionary computation (SIENAS). The structures of the individuals and the weights of the one-shot model are optimized through the evolution mechanism and back-propagated gradient values: the individuals in the parent population are trained through a sampling training strategy, generating weights for the neurons of the one-shot model; the offspring population, i.e. a new set of neural network topologies, is generated from the parent population through a node inheritance strategy, inherits the weights of the corresponding neurons in the parent population as initial weights, and is evaluated directly without any training. In other words, the goal of SIENAS is to adjust the computational nodes inside the neurons and optimize the connections of the neurons within each sub-network block, thereby optimizing the neural network architecture as a whole. As shown in Fig. 4, specifically:
(501) Take an image classification dataset of calibrated image-label pairs and divide it into a training dataset D_train, a verification dataset D_valid, and a test dataset D_test.
(502) Initialize the SIENAS strategy parameters, comprising: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of the sub-network blocks, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of generations G. Initialize the SGD optimizer parameters, comprising: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy, and the momentum coefficient m.
(503) Take the training dataset D_train and the verification dataset D_valid of the image classification dataset as the inputs of SIENAS. D_train contains train_number pictures in total.
(504) Let g = 0.
(505) Take the g-th generation population as the g-th generation parent population, train all individuals in the parent population with the sampling training strategy, and optimize the weights of the neurons in each individual, thereby optimizing all the neuron weights of the one-shot model.
Specifically:
1) according to batch_size, divide D_train into several batches of data (mini-batches); D_train then contains j batches of data in total, calculated as:
j = ⌈train_number / batch_size⌉
2) let f = 1;
3) randomly sample an individual from the parent population and train it on the f-th batch of data; calculate the loss value with the cross-entropy loss function, and optimize the weights of the neurons in the individual through back-propagated gradient values;
4) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 5); otherwise let f = f+1 and return to step 3);
5) stop training and save the weights of all neurons in the parent population.
(506) After step (505) is completed, the fitness values of all individuals in the parent population are evaluated and recorded. The fitness value of the parent individual is the individual's classification accuracy based on the validation dataset.
(507) Based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals using the node inheritance strategy and take them as the offspring population Q of generation g. The node inheritance strategy starts from the neural network topologies of the parent individuals, generates new neural network topologies through the selection, crossover, and mutation of the evolution mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals. As shown in Fig. 5, specifically (an illustrative sketch of these operations is given after this list):
1) Selection operation.
Randomly pick two individuals from the parent population using a tournament selection strategy and keep the one with the higher fitness value. Repeat this process until two individuals p_1 and p_2 are selected.
2) Crossover operation.
Generate a random value k_c.
If k_c ≤ P_c, perform a single-point crossover on the two selected individuals. As shown in Fig. 5, the crossover point lies between the third gene and the fourth gene, so the three genes of p_1 located after the crossover point are exchanged with the three genes of p_2 located after the crossover point, producing two new individuals q_1 and q_2. The new individuals q_1 and q_2 are saved to the offspring population Q as offspring individuals.
If k_c > P_c, no operation is performed on the selected individuals, and p_1 and p_2 are saved to the offspring population Q as offspring individuals.
3) Repeat steps 1) and 2) until the number of individuals in the offspring population Q reaches P_n.
4) Mutation operation.
For the offspring population Q of the g-th generation, randomly generate a value k_m for each individual in Q.
If k_m ≤ P_m, a swap mutation is performed on the individual. As shown in Fig. 5, the third gene and the sixth gene of q_i are selected and their positions are exchanged, producing a new q_i. The new q_i is saved to the offspring population Q, and the original q_i is deleted.
If k_m > P_m, no operation is performed on the individual.
5) Weight inheritance operation.
Based on step 4), the neurons of each offspring individual inherit, in order, the weights of the corresponding neurons of their parent individuals as initial weights.
(508) Based on step (507), fitness values of the offspring individuals are evaluated and recorded. The fitness value of the offspring individual is the classification accuracy of the individual on the verification data set.
(509) Merge the g-th generation parent population and the offspring population Q; the merged g-th generation population contains 2·P_n individuals.
(510) Based on step (509), rank the individuals of the g-th generation population according to their fitness values, and generate the (g+1)-th generation population through an environmental selection strategy. The population size of the (g+1)-th generation must be the same as that of the g-th generation.
(511) Check the termination condition: if g+1 is greater than or equal to the preset maximum number of generations G, go to step (512); otherwise let g = g+1 and return to step (505).
(512) Finish the search task and output the individual with the highest fitness value in the final generation population.
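An illustrative sketch of the selection, crossover, and mutation operations of the node inheritance strategy in step (507); it treats an individual as a flat gene sequence, as drawn in Fig. 5, and the helper names are assumptions (weight inheritance itself is omitted here because it depends on how the one-shot weights are stored):

```python
import random


def tournament_select(population, fitness):
    """Randomly pick two individuals and keep the one with the higher fitness value."""
    a, b = random.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]


def single_point_crossover(p1, p2):
    """Exchange the genes located after a random crossover point (e.g. between
    the third and fourth gene in Fig. 5)."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def swap_mutation(q):
    """Exchange the positions of two randomly chosen genes of an individual."""
    q = list(q)
    i, j = random.sample(range(len(q)), 2)
    q[i], q[j] = q[j], q[i]
    return q


def node_inheritance(parents, fitness, p_n, p_c, p_m):
    """Generate P_n offspring individuals; their neurons would afterwards inherit
    the weights of the corresponding parent neurons as initial weights."""
    offspring = []
    while len(offspring) < p_n:
        p1 = tournament_select(parents, fitness)
        p2 = tournament_select(parents, fitness)
        if random.random() <= p_c:
            q1, q2 = single_point_crossover(p1, p2)
        else:
            q1, q2 = p1, p2
        offspring.extend([q1, q2])
    offspring = offspring[:p_n]
    return [swap_mutation(q) if random.random() <= p_m else q for q in offspring]
```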
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation conditions
The method is developed based on the PyTorch deep learning framework, and the programming language mainly used is Python.
2. Simulation content
The performance of the method is verified on three image classification tasks: CIFAR10, CIFAR100, and ILSVRC2012.
2.1 Dataset
The CIFAR10 and CIFAR100 datasets each contain 60000 color images with a resolution of 32 × 32, of which 50000 pictures are used as the training set and 10000 pictures as the test set. CIFAR10 contains 10 categories with 6000 pictures per category (5000 training pictures, 1000 test pictures). The simulation randomly selects 1000 pictures from each category as verification pictures to form the verification set, which contains 10000 pictures in total. CIFAR100 contains 100 categories, each containing 600 pictures (500 training pictures, 100 test pictures). The simulation experiment randomly selects 100 pictures from each category as verification pictures to form the verification set, which contains 10000 pictures in total. ILSVRC2012 is a large visual dataset used for visual object recognition research; it contains more than 14 million images divided into training, verification, and test sets, covering 20000 categories.
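The per-class verification split described above (1000 pictures per class for CIFAR10, 100 per class for CIFAR100, drawn from the training pictures) could be produced as follows; torchvision is assumed to be available and the helper is illustrative:

```python
import numpy as np
from torchvision import datasets


def split_train_valid(dataset, per_class):
    """Randomly pick `per_class` pictures of every class as verification pictures;
    the remaining training pictures stay in the training set."""
    targets = np.array(dataset.targets)
    valid_idx = []
    for label in np.unique(targets):
        class_idx = np.where(targets == label)[0]
        valid_idx.extend(np.random.choice(class_idx, per_class, replace=False))
    train_idx = np.setdiff1d(np.arange(len(targets)), valid_idx)
    return train_idx, valid_idx


cifar10 = datasets.CIFAR10(root="./data", train=True, download=True)
train_idx, valid_idx = split_train_valid(cifar10, per_class=1000)  # 10000 verification pictures
```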
2.2 Method settings
The number of neurons contained in each sub-network block is initialized to M = 5, and the candidate computational node types include: identity mapping, depthwise separable convolution with a 3 × 3 kernel (DW3), depthwise separable convolution with a 5 × 5 kernel (DW5), max pooling of size 3 (MAX), average pooling of size 3 (AVG), a feature reconstruction node whose 2-D convolutional layer has a 3 × 3 kernel (3 × 3 feature reconstruction convolution, FR3), and a feature reconstruction node whose 2-D convolutional layer has a 5 × 5 kernel (5 × 5 feature reconstruction convolution, FR5).
The number of sub-network block types is initialized to N = 4, comprising: convolution block 1, convolution block 2, convolution block 3, and the reduction block. The stride of all convolution blocks is set to 1, so the width, height, and depth of the feature maps at the input and output of these blocks are unchanged; the convolution blocks process the feature information of the neural network at different stages of forward propagation. The stride of all computational nodes in the reduction block is 2; this block reduces the width and height of the input feature maps to half of their original values and doubles the depth.
As shown in fig. 1, the stacking order of the sub-network blocks in a neural network architecture is: the convolution module 1, the reduction module, the convolution module 2, the reduction module and the convolution module 3 are stacked in sequence. The main goal of the SIENAS is to search for the way neurons in each sub-net block connect and the type of compute nodes contained in the neurons.
Initialize the strategy parameters of SIENAS, comprising: the number of neurons contained in each sub-network block M = 5, the number of individuals in the initial population P_n = 25, the crossover rate P_c = 0.9, the mutation rate P_m = 0.1, the initial number of channels C = 32, the batch size batch_size = 128, and the maximum number of generations G = 300. Initialize the SGD optimizer parameters, comprising: the initial learning rate lr = 0.1, the weight decay coefficient w = 0.0003, and the momentum coefficient m = 0.9.
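For reference, the initialization of this embodiment can be collected in a small configuration dictionary; the dictionary itself is only an illustrative convenience, with the values taken from the settings above:

```python
sienas_config = {
    "num_block_types": 4,       # N: convolution blocks 1-3 and the reduction block
    "neurons_per_block": 5,     # M
    "initial_channels": 32,     # C
    "population_size": 25,      # P_n
    "crossover_rate": 0.9,      # P_c
    "mutation_rate": 0.1,       # P_m
    "batch_size": 128,
    "max_generations": 300,     # G
}

sgd_config = {
    "lr": 0.1,                  # initial learning rate
    "weight_decay": 0.0003,     # weight decay coefficient w
    "momentum": 0.9,            # momentum coefficient m
}
```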
After the iteration of the method is finished, the individual with the best fitness value is output and decoded into the corresponding neural network architecture SI-EvoNet-S. The network structure parameters are reinitialized, and the neural network architecture is trained with the training dataset until convergence. The test dataset is then used to test the performance of the neural network architecture.
3. Simulation results
The optimization processes on CIFAR10 and CIFAR100 are shown in Fig. 7 and Fig. 8, respectively. The highest prediction accuracies obtained during the search are 93.7% and 76.8%, respectively, which indicates that the performance evaluation of an individual reflects its actual performance and yields a more accurate performance ranking.
The sub-network blocks of the neural network architecture searched by the method on the CIFAR10 dataset are shown in Fig. 6. Tables 1 and 2 compare the method with existing hand-designed neural network architectures and with existing evolution-based NAS methods, respectively; compared with these methods, the proposed method achieves higher search efficiency and higher classification accuracy.
TABLE 1 (comparison with manually designed architectures; provided as an image in the original document)
TABLE 2 (comparison with evolution-based NAS methods; provided as an image in the original document)

Claims (5)

1. A fast attention neural network architecture searching method based on an evolution method is characterized by comprising the following steps:
(1) generating a neural network architecture search space containing an attention mechanism based on a predetermined coding scheme and a population initialization scheme;
(2) taking the evolution method as the search method and the image classification accuracy on the verification set as the optimization objective, and simultaneously optimizing the structures of the individuals and the weights of the one-shot model through the evolution mechanism and back-propagated gradient values; after the search task of the evolution method is finished, ranking the individuals in the population and keeping the individual with the highest fitness value as the best result of the search;
(3) decoding the individuals searched by the evolution to generate a neural network architecture, resetting the structural weight, training the neural network architecture by using a training data set until convergence, and testing the performance of the neural network architecture.
2. The fast attention neural network architecture searching method based on the evolutionary method as claimed in claim 1, wherein in the step (1), the predetermined coding scheme is a one-shot model coding method based on the evolutionary method; the method specifically comprises the following steps:
(201) the neural network architecture is divided into the following components according to different scales: network, sub-network block, neuron;
(202) the neuron is the smallest unit in a neural network architecture, and each computational neuron contains two computational nodes; neuron i is encoded as an integer quadruple (I_a, I_b, O_a, O_b), where I_a and I_b denote the indices of the neurons connected to neuron i, i.e. the outputs of neurons I_a and I_b serve as the two inputs of neuron i, and O_a and O_b denote the two computational node types contained in neuron i, which process the two inputs of neuron i respectively;
(203) the output of neuron i is calculated as:
H_i = O_a(I_a) + O_b(I_b)
where H_i denotes the output of neuron i; I_a and I_b are the inputs of neuron i; O_a and O_b are the two computational nodes contained in neuron i; after the inputs I_a and I_b are processed by O_a and O_b respectively, the outputs O_a(I_a) and O_b(I_b) of the two computational nodes are added to give the output H_i of neuron i;
(204) a sub-network block contains M neurons, M being an integer greater than one; the encoding of a sub-network block is the concatenation of the quadruples of its M neurons;
(205) N sub-network blocks of different types are stacked to form a neural network architecture, N being an integer greater than one; the encoding of a neural network architecture is the concatenation of the encodings of its N sub-network blocks;
(206) the sub-network blocks are sequentially stacked and connected to form a complete neural network architecture; the coding structure of a neural network architecture is called an individual in a population.
3. The fast attention neural network architecture searching method based on evolution method as claimed in claim 1, wherein in the step (1), the population initialization scheme is based on the encoding scheme, and randomly generating individuals by uniform distribution until reaching a predetermined population size of initial population; all individuals in the initial population form a one-shot model and cover the whole search space; that is, each individual is a sub-model of the one-shot model.
4. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in the step (1), feature reconstruction computing nodes containing an attention mechanism are adaptively integrated into the neural network architecture in the evolution searching process, so as to improve the expression capability of the neural network; the light-weight multi-scale channel attention mechanism consists of a 2-dimensional convolution layer and a light-weight channel attention mechanism; the 2-dimensional convolutional layer can extract feature information of different scales, and the channel attention mechanism is used for reducing redundant information in a channel, recalibrating channel feature response and improving the expression capacity of the neural network;
(401) the structure of the feature reconstruction computing node is as follows: the system comprises a 2-dimensional convolutional layer, a global average pooling layer, a 1-dimensional convolutional layer, a sigmoid layer and a multiplication module; the process of feature reconstruction is as follows:
(402) the 2-dimensional convolution layer is used for extracting the characteristic information of the input characteristic diagram and collecting the input characteristic diagram
Figure FDA0002824054440000021
Conversion into a collection of transformation profiles
Figure FDA0002824054440000022
Figure FDA0002824054440000023
H × W represents the size of the input feature map, s represents the number of channels of the input feature map set, H × W represents the size of the converted feature map, and c represents the number of channels of the converted feature map set;
(403) inputting the conversion feature map set into a global average pooling layer, and extracting global features of each feature map by using the global average pooling layer; the formula of the global average pooling layer is as follows:
Figure FDA0002824054440000024
transforming feature atlas sets through a global averaging pooling layer
Figure FDA0002824054440000025
Is converted into a one-dimensional vector z e { z1,z2,......,zc-said one-dimensional vector represents the characteristics of c channels;
(404) utilizing the one-dimensional convolution layer to complete the feature mapping of adjacent channels; the feature mapping formula is as follows:
Fl=C1Dk(zl)
wherein C1D represents a one-dimensional convolutional layer; k represents the size of the one-dimensional convolutional layer convolution kernel; z is a radical oflFeatures representing the 1 st channel, based on step (2), zlE is z; the feature set after the one-dimensional convolutional layer mapping is denoted as F, and F ═ F1,F2,......,Fc};FlRepresents the feature of the 1 st channel after one-dimensional convolutional layer mapping, and Fl∈F;
(405) the sigmoid activation function produces the weight set w of the c channels, w = {w_1, w_2, ..., w_c}:

w_m = σ(F_m)

(406) in the above formula, σ denotes the sigmoid function; w_m denotes the weight of the m-th channel, w_m ∈ w; F_m denotes the feature of the m-th channel and, from step (404), F_m ∈ F;
(407) the multiplication module assigns the corresponding weight to each channel; the multiplication module is computed as:

U = u * w

where u denotes the transformed feature map set from step (402); * denotes the matrix dot product, i.e., each transformed feature map u_m is multiplied by its channel weight w_m; and w denotes the channel weight set from step (405), w = {w_1, w_2, ..., w_c};

(408) U is used as the output of the feature reconstruction computing node.
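The following is a minimal PyTorch-style sketch of the feature reconstruction computing node in steps (401)–(408), assuming an ECA-style lightweight channel attention; the class name FeatureReconstructionNode, the 3×3 spatial kernel and the 1-D kernel size k are illustrative assumptions rather than the exact layers claimed.

```python
import torch
import torch.nn as nn

class FeatureReconstructionNode(nn.Module):
    """2-D convolution followed by lightweight channel attention:
    global average pooling, 1-D convolution over adjacent channels,
    sigmoid weighting, and channel-wise multiplication."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, k: int = 3):
        super().__init__()
        # (402) 2-D convolution extracts multi-scale feature information
        self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size,
                                padding=kernel_size // 2, bias=False)
        # (403) global average pooling compresses each channel to a single value
        self.gap = nn.AdaptiveAvgPool2d(1)
        # (404) 1-D convolution maps features across k adjacent channels
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.conv2d(x)                      # (402) transformed feature maps, shape (B, c, h, w)
        z = self.gap(u)                         # (403) -> (B, c, 1, 1)
        z = z.squeeze(-1).transpose(1, 2)       # reshape to (B, 1, c) for the 1-D convolution
        f = self.conv1d(z)                      # (404) adjacent-channel feature mapping
        w = torch.sigmoid(f)                    # (405) channel weights in (0, 1)
        w = w.transpose(1, 2).unsqueeze(-1)     # back to (B, c, 1, 1)
        return u * w                            # (407)-(408) reweighted output U

if __name__ == "__main__":
    node = FeatureReconstructionNode(in_channels=16, out_channels=32)
    out = node(torch.randn(4, 16, 56, 56))
    print(out.shape)  # torch.Size([4, 32, 56, 56])
```

In this sketch the 1-D convolution slides over the channel dimension, so each channel weight depends only on its k neighbouring channels, which is what keeps the attention mechanism lightweight.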
5. The fast attention neural network architecture searching method based on the evolution method as claimed in claim 1, wherein in the step (2), the searching method is a fast attention neural network architecture searching method (SIENAS) based on evolutionary computation; the structure of each individual and the weights of the one-shot model are optimized jointly through an evolutionary mechanism and back-propagated gradient values: the individuals in the parent population are trained with a sampling training strategy, which generates the weights of the neurons in the one-shot model; an offspring population, i.e., a new group of neural network topology graphs, is generated from the parent population through a node inheritance strategy, and each offspring inherits the weights of the corresponding neurons in its parents as initial weights, so it can be evaluated directly without any training; that is, SIENAS adjusts the computing nodes inside the neurons and optimizes the connections among the neurons in each sub-network block, thereby optimizing the overall neural network architecture; the method specifically comprises the following steps:
(501) prepare an image classification dataset of scaled image-label pairs and divide it into a training dataset D_train, a validation dataset D_valid and a test dataset D_test;
(502) initialize the SIENAS strategy parameters, including: the number N of sub-network block types contained in the network architecture, the number M of neurons contained in each sub-network block, the initial number C of channels of a sub-network block, the population size P_n of the initial population, the crossover rate P_c, the mutation rate P_m, the batch size batch_size, and the maximum number of evolution generations G_max; initialize the SGD optimizer parameters, including: the initial learning rate r, the weight decay coefficient w, the learning rate adjustment strategy and the momentum coefficient m;
(503) take the training dataset D_train and the validation dataset D_valid of the prepared dataset as the input of SIENAS; D_train contains train_number pictures;
(504) let G be 0;
(505) take the G-th generation population as the G-th generation parent population; train all individuals in the parent population with the sampling training strategy, optimizing the neuron weights in each individual and hence all neuron weights in the one-shot model; the training specifically comprises the following steps:
1) divide D_train into several mini-batches according to batch_size; D_train then contains j batches of data in total, computed as:

j = ⌈train_number / batch_size⌉
2) let f be 1;
3) for the f-th batch of training data, randomly sample one individual from the parent population;
4) train the sampled individual on the f-th batch of training data: compute the loss value with the cross-entropy loss function and optimize the neuron weights in the individual through the back-propagated gradient values;
5) check whether the (f+1)-th batch of data exists: if f+1 > j, go to step 6); otherwise let f = f+1 and return to step 3);
6) stop training and store the weights of all neurons in the parent population (a minimal sketch of this sampling training loop is given below);
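A minimal sketch of the sampling training strategy in step (505), assuming PyTorch and a DataLoader that yields the j mini-batches; build_submodel is a hypothetical helper that returns a sub-model whose parameters are views of the shared one-shot weights, and the optimizer is assumed to be an SGD optimizer constructed over those shared weights.

```python
import random
import torch
import torch.nn.functional as F

def sample_and_train_one_epoch(parent_population, build_submodel, train_loader,
                               optimizer, device="cpu"):
    """One pass over D_train: for each mini-batch, sample one parent individual,
    train only that sub-model, and backpropagate into the shared one-shot weights."""
    for images, labels in train_loader:                    # the loader yields the j mini-batches
        individual = random.choice(parent_population)      # step 3): uniform sampling
        submodel = build_submodel(individual).to(device)   # sub-model shares one-shot parameters
        submodel.train()

        optimizer.zero_grad()
        logits = submodel(images.to(device))
        loss = F.cross_entropy(logits, labels.to(device))  # step 4): cross-entropy loss
        loss.backward()                                    # back-propagated gradient values
        optimizer.step()                                   # updates the shared neuron weights
```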
(506) after step (505) is finished, evaluate the fitness values of all individuals in the parent population and record them; the fitness value of a parent individual is its classification accuracy on the validation dataset;
(507) based on the P_n individuals in the parent population and their fitness values, generate P_n new individuals with the node inheritance strategy as the offspring population Q of the G-th generation; the node inheritance strategy generates new neural network topology graphs from the topology graphs of the parent individuals through the selection, crossover and mutation operations of the evolutionary mechanism, and assigns the weights of the neurons in the parent individuals to the neurons of the corresponding offspring individuals; the strategy specifically comprises the following steps:
1) selection operation;
randomly select two individuals from the parent population with a tournament selection strategy and retain the one with the higher fitness value; repeat this process until two individuals have been selected;
2) crossover operation;
generate a random value k_c;
if k_c ≤ P_c, perform a single-point crossover on the two selected individuals to generate two new individuals, and save the new individuals as offspring into the offspring population Q;
if k_c > P_c, perform no operation on the selected individuals and save them directly as offspring into the offspring population Q;
3) repeat step 1) and step 2) until the number of individuals in the offspring population Q reaches P_n;
4) mutation operation;
for the offspring population Q of the G-th generation, randomly generate a value k_m for each individual in Q;
if k_m ≤ P_m, perform a swap mutation on the individual to generate a new individual, save the new individual into the offspring population Q and delete the original individual;
if k_m > P_m, perform no operation on the individual;
5) weight inheritance operation;
following step 4), each neuron in every offspring individual inherits, in turn, the weight of the corresponding neuron in the parent individuals as its initial weight (a minimal sketch of this node inheritance strategy is given below);
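A minimal sketch of the node inheritance strategy in step (507), assuming each individual is a flat list of integer genes as in the initialization sketch above; the helper names and the exact swap-mutation details are illustrative assumptions.

```python
import random

def tournament_select(population, fitness):
    """Binary tournament: pick two individuals at random, keep the fitter one."""
    a, b = random.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]

def single_point_crossover(parent1, parent2):
    """Exchange gene tails after a random cut point, producing two children."""
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def swap_mutation(individual):
    """Swap the genes at two random positions (the swap mutation of sub-step 4))."""
    mutant = list(individual)
    i, j = random.sample(range(len(mutant)), 2)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

def node_inheritance(parents, fitness, p_c, p_m, population_size):
    offspring = []
    while len(offspring) < population_size:                      # sub-steps 1)-3)
        p1 = tournament_select(parents, fitness)
        p2 = tournament_select(parents, fitness)
        if random.random() <= p_c:
            c1, c2 = single_point_crossover(p1, p2)
        else:
            c1, c2 = list(p1), list(p2)
        offspring.extend([c1, c2])
    offspring = offspring[:population_size]
    # sub-step 4): swap mutation applied with probability p_m per individual
    offspring = [swap_mutation(ind) if random.random() <= p_m else ind for ind in offspring]
    # sub-step 5): weight inheritance happens implicitly because each offspring's
    # genes index into the shared one-shot weights, so no retraining is required.
    return offspring
```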
(508) based on step (507), evaluate the fitness values of the offspring individuals and record them; the fitness value of an offspring individual is its classification accuracy on the validation dataset;
(509) merge the G-th generation parent population and the offspring population Q; the merged G-th generation population contains 2·P_n individuals;
(510) based on step (509), sort the individuals in the G-th generation population according to their fitness values and generate the (G+1)-th generation population through an environmental selection strategy; the population size of the (G+1)-th generation population must be the same as that of the G-th generation population;
(511) check whether G+1 meets the termination condition: if G+1 ≥ G_max (the preset maximum number of evolution generations), go to step (512); otherwise let G = G+1 and return to step (505);
(512) end the search task and output the individual with the highest fitness value in the final population (an illustrative sketch of the overall generational loop, steps (504)–(512), follows below).
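A minimal sketch of the overall generational loop in steps (504)–(512), reusing the hypothetical helpers from the earlier sketches (train_parents for the sampling training, node_inheritance for offspring generation, evaluate for validation accuracy); the elitist merge-and-truncate step stands in for the environmental selection strategy, whose exact form the claim does not spell out.

```python
def evolve(initial_population, evaluate, node_inheritance, train_parents,
           p_c, p_m, max_generations):
    """Elitist generational loop: merge parents and offspring, keep the best P_n."""
    population = list(initial_population)
    n = len(population)
    best_individual, best_fitness = None, float("-inf")
    for _ in range(max_generations):                                  # steps (504)/(511)
        train_parents(population)                                     # step (505): sampling training
        parent_fitness = [evaluate(ind) for ind in population]        # step (506)
        offspring = node_inheritance(population, parent_fitness,
                                     p_c, p_m, n)                     # step (507)
        offspring_fitness = [evaluate(ind) for ind in offspring]      # step (508)
        merged = sorted(zip(population + offspring,
                            parent_fitness + offspring_fitness),
                        key=lambda pair: pair[1], reverse=True)       # steps (509)-(510)
        population = [ind for ind, _ in merged[:n]]                   # environmental selection
        if merged[0][1] > best_fitness:
            best_individual, best_fitness = merged[0]
    return best_individual                                            # step (512)
```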
CN202011424217.7A 2020-12-08 2020-12-08 Fast attention neural network architecture searching method based on evolution method Pending CN112465120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011424217.7A CN112465120A (en) 2020-12-08 2020-12-08 Fast attention neural network architecture searching method based on evolution method

Publications (1)

Publication Number Publication Date
CN112465120A true CN112465120A (en) 2021-03-09

Family

ID=74801640

Country Status (1)

Country Link
CN (1) CN112465120A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949842A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN112949842B (en) * 2021-05-13 2021-09-14 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN113743605A (en) * 2021-06-16 2021-12-03 温州大学 Method for searching smoke and fire detection network architecture based on evolution method
CN113240055A (en) * 2021-06-18 2021-08-10 桂林理工大学 Pigment skin damage image classification method based on macro-operation variant neural architecture search
CN113240055B (en) * 2021-06-18 2022-06-14 桂林理工大学 Pigment skin damage image classification method based on macro-operation variant neural architecture search
CN114445674A (en) * 2021-12-13 2022-05-06 上海悠络客电子科技股份有限公司 Target detection model searching method based on multi-scale fusion convolution
WO2023124342A1 (en) * 2021-12-31 2023-07-06 江南大学 Low-cost automatic neural architecture search method for image classification
CN114997360A (en) * 2022-05-18 2022-09-02 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114997360B (en) * 2022-05-18 2024-01-19 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114926698A (en) * 2022-07-19 2022-08-19 深圳市南方硅谷半导体股份有限公司 Image classification method for neural network architecture search based on evolutionary game theory
CN115994575A (en) * 2023-03-22 2023-04-21 方心科技股份有限公司 Power failure diagnosis neural network architecture design method and system
CN117173037A (en) * 2023-08-03 2023-12-05 江南大学 Neural network structure automatic search method for image noise reduction
CN117611974A (en) * 2024-01-24 2024-02-27 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
CN117611974B (en) * 2024-01-24 2024-04-16 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures

Similar Documents

Publication Publication Date Title
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
WO2022083624A1 (en) Model acquisition method, and device
CN111553480B (en) Image data processing method and device, computer readable medium and electronic equipment
CN111898689B (en) Image classification method based on neural network architecture search
CN109948029A (en) Based on the adaptive depth hashing image searching method of neural network
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
WO2021218470A1 (en) Neural network optimization method and device
WO2022126448A1 (en) Neural architecture search method and system based on evolutionary learning
CN112561039A (en) Improved search method of evolutionary neural network architecture based on hyper-network
CN112784913A (en) miRNA-disease associated prediction method and device based on graph neural network fusion multi-view information
CN113011487B (en) Open set image classification method based on joint learning and knowledge migration
Chen et al. Binarized neural architecture search for efficient object recognition
CN114943345A (en) Federal learning global model training method based on active learning and model compression
CN110033089A (en) Deep neural network parameter optimization method and system based on Distributed fusion algorithm
CN114819091B (en) Multi-task network model training method and system based on self-adaptive task weight
Bakhshi et al. Fast evolution of CNN architecture for image classification
CN116170328A (en) Method and device for predicting bandwidth used for graphic coding
CN114241267A (en) Structural entropy sampling-based multi-target architecture search osteoporosis image identification method
CN113011091A (en) Automatic-grouping multi-scale light-weight deep convolution neural network optimization method
Gao Game-theoretic approaches for generative modeling
Cai et al. EST-NAS: An evolutionary strategy with gradient descent for neural architecture search
CN117034060A (en) AE-RCNN-based flood classification intelligent forecasting method
Chai et al. Correlation Analysis-Based Neural Network Self-Organizing Genetic Evolutionary Algorithm
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination