CN112561039A - Improved search method of evolutionary neural network architecture based on hyper-network - Google Patents
- Publication number: CN112561039A
- Application number: CN202011567363.5A
- Authority: CN (China)
- Prior art keywords: population, neural network, chromosome, node, individuals
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06N3/045: Combinations of networks (G06N3/04 Architecture, e.g. interconnection topology)
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, using electronic means
Abstract
The invention relates to an improved evolutionary neural network architecture search method based on a hyper-network. The method comprises the following steps. Step S1: with the input layer as the first layer, encapsulate five calculation modules. Step S2: binarize the connections between the internal compute nodes of the neural network. Step S3: learn a structure weight for each compute node. Step S4: construct a parent population P using binary tournament selection. Step S5: form an offspring population Q. Step S6: apply mutation to the individuals in the offspring population Q. Step S7: decode each individual in the offspring population Q into its corresponding neural network to obtain structure weights. Step S8: merge the parent population P and the offspring population Q into a population R, select individuals as the initial population of the next generation by environmental selection, and return to step S4 until a preset maximum number of generations is reached. When evolution ends, the individual with the highest fitness value is output as the optimal neural network architecture.
Description
Technical Field
The invention relates to the technical field of image classification model construction, in particular to an improved search method of an evolutionary neural network architecture based on a hyper-network.
Background
An image classification task is an image processing technique that distinguishes objects of different categories based on the feature information reflected in an image. Because many models built for image classification can be transferred to other computer vision fields as feature extraction networks, image classification is a foundational task in computer vision, and the design of image classification models is a focus of researchers' attention. However, manually designing a neural network model requires experienced experts: only through careful study of the distribution and characteristics of the data set, and repeated experiments, can a neural network model with excellent performance be designed. This demands enormous time and labor costs.
Currently, Neural Architecture Search (NAS) algorithms are attracting wide attention from researchers. Such algorithms can automatically design an efficient neural network architecture for a given data set without much expertise. Because NAS algorithms typically require continuous evaluation of the neural network models in the search space, they consume a great deal of computational effort. There are two main methods for improving the search efficiency of NAS algorithms:
the first approach is to construct an End-to-End Performance Predictor. This approach requires a coding method that uniquely maps the neural network architecture into a set of digital decision variables. The coding of the neural network architecture and its performance (e.g., accuracy of classification) are then formed into a data pair that is used as input to a performance predictor, which is trained. After the performance predictor is trained, the performance of the neural network model in the search space can be directly predicted without training the neural network model, and the search efficiency is further improved. However, this approach follows a training-then-prediction approach, requiring the performance predictor to be trained first using a set of training samples. In general, the more samples trained, the better the performance of the predictor. However, collecting more training samples means consuming more computing resources, and thus has a certain impact on search efficiency. Therefore, in practical use, a neural network architecture which is more effective by using an incremental strategy needs to be sampled, and certain calculation cost is needed.
The second method is one-shot Neural Architecture Search based on a super network. This method first trains a super network (one-shot model) that serves as the search space; it then randomly samples a certain number of sub-networks from the super network for performance evaluation and ranks the sub-networks by their performance; finally, the sub-network with the best evaluated performance is taken as the output of the algorithm. Because a sub-network can inherit weight values from the super network and be evaluated without training, this approach can effectively improve the search efficiency of NAS algorithms. However, existing super-network-based architecture search algorithms have certain shortcomings. First, training of the nodes inside the super network is unbalanced, which makes the performance ranking in the sub-network evaluation stage inaccurate, so the algorithm may fail to find the best-performing architecture. Second, when the super network is trained, mutual interference among different sub-networks may make the super-network-based search algorithm unstable, slow the convergence of the super network, or even prevent convergence, resulting in poor performance predictions for the sub-models.
Disclosure of Invention
Aiming at the shortcomings of prior super-network-based architecture search methods, namely unstable performance and slow or failed super-network training convergence, the invention provides an improved super-network-based neural network architecture search method that uses an evolutionary algorithm as the search strategy to automatically generate a neural network architecture based on a super network, so as to improve the classification accuracy of image classification tasks.
In order to solve the technical problems, the invention adopts the technical scheme that:
an improved search method of an evolutionary neural network architecture based on a hyper-network is characterized by comprising the following steps:
Step S1: with the input layer as the first layer, encapsulate five calculation modules; each module encapsulates M compute nodes, and finally a fully connected layer serves as the output layer of the neural network. M is a natural number greater than 1.
Step S2, coding the neural network structure by a mixed coding mode, and binarizing the connection of the internal calculation nodes of the neural network; randomly generating N chromosomes to construct an initial population; and N is a natural number greater than 1.
Step S3: uniformly sample the individuals in the population, train them on the training data, learn a structure weight for each compute node, and evaluate each individual's fitness using the classification accuracy on the verification set as the fitness function.
Step S4: construct a parent population P using a binary tournament selection method.
Step S5: based on a given crossover rate p_c, cross chromosome individuals in the parent population pairwise using a hybrid crossover method to obtain new chromosomes, forming an offspring population Q.
Step S6: based on a given mutation rate p_m, apply mutation to the individuals in the offspring population Q using a hybrid mutation method.
And step S7, decoding each individual in the offspring population Q into a corresponding neural network, obtaining a structure weight value in an inheritance or random initialization mode, and performing fitness evaluation on the individual by adopting the classification precision of the verification set as a fitness function.
Step S8, merging the parent population P and the child population Q into a population R, selecting several individuals as the original population of the next Generation by using an environment selection method, and feeding back to step S4 until reaching a predetermined maximum evolution number (Generation). And after the evolution is finished, outputting the individual with the highest fitness value as an optimal neural network architecture.
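Steps S4 through S8 amount to a standard generational evolutionary loop. The following is a minimal Python sketch of that loop; `evaluate`, `crossover`, and `mutate` are placeholders for the fitness evaluation, hybrid crossover, and hybrid mutation described in the steps above, and all names are illustrative assumptions, not the patent's implementation.

```python
import random

def evolve(init_pop, evaluate, crossover, mutate, generations, pop_size):
    """Outer evolutionary loop (sketch of steps S4-S8)."""
    population = [(ind, evaluate(ind)) for ind in init_pop]
    for _ in range(generations):
        # Step S4: binary tournament selection builds the parent pool P.
        parents = []
        while len(parents) < pop_size:
            a, b = random.sample(population, 2)
            parents.append(max(a, b, key=lambda pair: pair[1])[0])
        # Steps S5-S6: crossover and mutation produce the offspring Q.
        offspring = [mutate(child)
                     for i in range(0, pop_size - 1, 2)
                     for child in crossover(parents[i], parents[i + 1])]
        # Step S7: evaluate Q; step S8: environmental selection on R = P + Q.
        merged = population + [(ind, evaluate(ind)) for ind in offspring]
        population = sorted(merged, key=lambda pair: pair[1],
                            reverse=True)[:pop_size]
    # After evolution, output the fittest individual.
    return max(population, key=lambda pair: pair[1])[0]
```

Because the merged population is truncated by fitness, the best individual found so far is always retained (elitism), which matches the environmental selection of step S8.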
Further, in step S1, the input layer is sequentially composed of a convolutional layer, a ReLU activation function, and a Batch Normalization (BN) layer encapsulation.
Further, in step S1, a compute node is a computation unit in the neural network and can be randomly selected from the operation search space θ. All compute nodes in the first, third, and fifth calculation modules have stride 1; all compute nodes in the second and fourth calculation modules have stride 2.
Further, in step S2, the hybrid coding scheme is a combination of integer and binary. Describing the types of the computing nodes in the neural network architecture and the connection relation between the nodes by using integer coding; and binarizing the connection relation of the computing nodes in the neural network architecture by using binary numbers to describe whether the connection between the two computing nodes is activated or not. The method specifically comprises the following steps:
further, in the above step S21, a compute node is encoded as a quintupleWherein the content of the first and second substances,represents a calculation unit a included in a calculation node i; i is1,I2Indexes of computing units representing connections of computing node I, i.e. computing node I and computing node I1,I2Are connected with each other;I1,I2is a set of integers; j. the design is a square1,J2Representing a compute node I and a compute node I for a set of binary numbers1,I2The four states of the connection mode are specifically: j. the design is a square1=0,J 20 denotes a compute node I and a compute node I1,I2All the connections are in an activated state; then at this point, node I is computed1,I2After the feature maps of the outputs are fused, the fused feature maps are used as the inputs of the computing nodes i. The output δ of the computing node i is:
J1=0,J 21, denotes a compute node I and a compute node I1Is activated, calculates node i and countsCalculation node I2The connection of (2) is closed; then, at this time, the output δ of the computing node i is:
J1=1,J 20 denotes a compute node I and a compute node I1Is closed, computing node I and computing node I2Is activated; then, at this time, the output δ of the computing node i is:
J1=1,J 21, denotes a compute node I and a compute node I1,I2All the connections are in a closed state; i.e. the current compute node i is masked. Then at this point, node I is computed1,I2After the output feature graph is fused, the feature graph does not pass through a computing nodeProcessing is done directly as the output value δ of the compute node i:
δ=I1(xc)+I2(xd)
wherein x isc,xdAre respectively a computing node I1,I2Input of (1)1(xc),I2(xd) Are respectively a computing node I1,I2Output of (2)Representing a computing node I1,I2Output characteristic diagram I1(xc),I2(xd) Fusion, as input to compute node i, by a compute unitAfter processing, as the output of compute node i.
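The four connection states can be sketched as a small Python function. Here `a_i` stands for the node's computation unit and `in1`/`in2` for the outputs I_1(x_c) and I_2(x_d); the names are ours, chosen for illustration, not the patent's.

```python
def node_output(a_i, in1, in2, J1, J2):
    """Output of compute node i under the four (J1, J2) connection states.

    a_i: the node's computation unit (a callable);
    in1, in2: outputs of the connected compute nodes I1, I2.
    """
    if J1 == 0 and J2 == 0:   # both connections active: fuse, then apply a_i
        return a_i(in1 + in2)
    if J1 == 0 and J2 == 1:   # only the connection to I1 is active
        return a_i(in1)
    if J1 == 1 and J2 == 0:   # only the connection to I2 is active
        return a_i(in2)
    return in1 + in2          # node masked: fused inputs pass through unchanged
```

With feature maps replaced by scalars for simplicity, `node_output(lambda x: 2 * x, 3, 4, 0, 0)` fuses 3 and 4 and doubles the result.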
Further, in step S22, a calculation module contains M compute nodes, so the coding structure of a calculation module is the concatenation of the five-tuples of its M compute nodes.
In step S23, a chromosome represents a neural network architecture, and each neural network architecture contains five calculation modules; the coding structure of a neural network architecture is therefore the concatenation of the codes of its five calculation modules.
further, in step S3, the individuals in the population are uniformly sampled, training is performed based on training data, a structure weight is generated for each computing node, and fitness evaluation is performed on the individuals by using the classification accuracy of the verification set as a fitness function. The method specifically comprises the following steps:
further, in step S31, the predetermined training data set is divided into B batches (batch) on average according to the size of the batch data (batch size). And B is a natural number larger than N. In each batch, randomly selecting an individual from the parent population P, decoding the individual into a corresponding neural network, and training until a maximum training batch B is reached.
Further, in step S32, the fitness value of each individual in the parent population is evaluated. The classification accuracy on the verification data set is used as the fitness function:
fitness = G / H
where G is the number of pictures correctly identified by the model and H is the total number of pictures in the verification set.
Further, in the step S4, the binary tournament selection method may include the steps of:
and step S41, randomly selecting two individuals from the original population, reserving the individual with higher fitness value to the parent population P according to the fitness value, and returning the individual with lower fitness value to the original population.
And step S42, repeating step S41 until the number of individuals contained in the parent population P reaches a preset number of individuals K, wherein K is a natural number more than 1.
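Steps S41 and S42 can be sketched in a few lines of Python; `fitness` is assumed here to be a mapping from individual to fitness value, an illustrative choice rather than the patent's data layout.

```python
import random

def binary_tournament(population, fitness, K):
    """Sketch of steps S41-S42: repeatedly draw two individuals at random,
    copy the fitter of the pair into the parent population P, and return
    the loser to the original pool, until P holds K individuals."""
    parents = []
    while len(parents) < K:
        a, b = random.sample(population, 2)        # draw two without replacement
        parents.append(a if fitness[a] >= fitness[b] else b)
    return parents
```

Note that winners are copied, not removed, so the same individual may appear in P several times, which is the usual tournament-selection behavior.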
Further, in step S5, based on the given crossover rate p_c, chromosome individuals in the parent population P are crossed pairwise using a hybrid crossover method to obtain new chromosome individuals. The specific steps are as follows:
step S51, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S52: generate a random number r in the interval [0, 1] and randomly select two individuals p_1 and p_2 from the parent population P; the random number r determines whether the two individuals p_1 and p_2 undergo crossover.
Step S53: if r ≤ p_c, align the left ends of the integer chromosome parts of the two chromosomes and perform single-point crossover, i.e., randomly set one crossover point (at the same position in both integer chromosomes) and exchange the genes after it. Align the left ends of the binary chromosome parts and perform multi-point crossover, i.e., randomly select several crossover points (at the same positions in both binary chromosomes) and exchange the genes at those points. Store the two individuals q_1 and q_2 produced by this hybrid crossover method into the offspring population Q.
Step S54: if r > p_c, store the two individuals p_1 and p_2 selected in step S52 into the offspring population Q unchanged.
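Steps S51 through S54 can be sketched as one function. Each chromosome is represented here as a pair `(int_genes, bin_genes)`; this layout and all names are illustrative assumptions, not the patent's encoding.

```python
import random

def hybrid_crossover(p1, p2, pc):
    """Sketch of steps S51-S54: single-point crossover on the integer part,
    multi-point crossover on the binary part, gated by crossover rate pc."""
    if random.random() > pc:
        return p1, p2                              # step S54: copy parents unchanged
    (i1, b1), (i2, b2) = p1, p2                    # step S51: split the two parts
    # Single-point crossover on the integer chromosomes (same cut point).
    cut = random.randrange(1, len(i1))
    c1_int = i1[:cut] + i2[cut:]
    c2_int = i2[:cut] + i1[cut:]
    # Multi-point crossover on the binary chromosomes (same positions).
    points = random.sample(range(len(b1)), k=max(1, len(b1) // 3))
    c1_bin, c2_bin = list(b1), list(b2)
    for j in points:
        c1_bin[j], c2_bin[j] = b2[j], b1[j]
    return (c1_int, c1_bin), (c2_int, c2_bin)
```

Both crossovers only exchange genes between the two parents, so the multiset of genes across the pair is preserved.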
Further, in step S6, based on the given mutation rate p_m, mutation is applied to the individuals in the offspring population Q using a hybrid mutation method. The specific steps are as follows:
step S61, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S62: for each gene position in each chromosome individual, generate a random number t in the interval [0, 1]; this random number determines whether that gene position of the individual is mutated.
Step S63: if t ≤ p_m, perform a polynomial mutation operation on the integer chromosome part of the chromosome:
a'_i = a_i + δ · (a_i^U - a_i^L)
where a_i denotes the gene at the i-th gene position in the chromosome; a'_i denotes the new gene generated from the gene a_i; u is a random number generated in the interval [0, 1]; a_i^U and a_i^L denote the upper and lower mutation bounds of gene a_i, respectively; and δ is the polynomial perturbation derived from u, i.e., δ = (2u)^{1/(η+1)} - 1 when u < 0.5 and δ = 1 - (2(1 - u))^{1/(η+1)} otherwise, with distribution index η.
Step S64: if t ≤ p_m, perform a flip mutation operation on the binary chromosome part of the chromosome, i.e., randomly select several mutation points in the chromosome and flip the gene at each mutation point, so that a gene of 0 is mutated to 1 and a gene of 1 is mutated to 0.
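Steps S61 through S64 can be sketched as follows, again with chromosomes as `(int_genes, bin_genes)` pairs. The distribution index `eta` and the gene bounds are assumed hyper-parameters; names and layout are illustrative, not the patent's implementation.

```python
import random

def polynomial_mutation(gene, low, high, eta=20.0):
    """One integer gene under polynomial mutation (step S63 sketch);
    eta is the distribution index, an assumed hyper-parameter."""
    u = random.random()
    if u < 0.5:
        delta = (2 * u) ** (1 / (eta + 1)) - 1
    else:
        delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
    # Perturb and clamp to the gene's mutation bounds [low, high].
    return min(high, max(low, round(gene + delta * (high - low))))

def hybrid_mutation(chromosome, pm, low=0, high=9):
    """Sketch of steps S61-S64: per gene position, with probability pm,
    polynomial-mutate integer genes and flip binary genes."""
    int_genes, bin_genes = chromosome              # step S61: split the two parts
    new_int = [polynomial_mutation(g, low, high) if random.random() <= pm else g
               for g in int_genes]
    new_bin = [1 - b if random.random() <= pm else b for b in bin_genes]
    return new_int, new_bin
```

Flipping a binary gene toggles a connection between its active and closed states, while polynomial mutation replaces a compute-node gene, which is how nodes from outside the super network can be introduced.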
In step S7, each individual in the offspring population obtains its structure weight values by inheritance or random initialization. Specifically, for any chromosome individual in the offspring population Q: if a compute node in the individual was obtained through the hybrid crossover method of step S5, its weight is inherited from the corresponding compute node in the parent chromosome individual; if it was produced through the hybrid mutation method of step S6, its weight is generated by random initialization.
In step S8, the parent population P and the child population Q are combined into a population R, and a plurality of individuals are selected as the original population of the next generation by an environment selection method, which specifically comprises the steps of:
and step S81, sorting the individuals in the population R according to the fitness value in the sequence from high fitness value to low fitness value.
And step S82, selecting individuals ranked from No. 1 to No. N in the population R according to the preset population scale N as the next generation population.
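Steps S81 and S82 reduce to a sort-and-truncate. The sketch below assumes `fitness` is a mapping from individual to fitness value; the representation is illustrative.

```python
def environmental_selection(merged, fitness, N):
    """Sketch of steps S81-S82: sort the merged population R by fitness,
    from high to low, and keep the top N as the next generation."""
    return sorted(merged, key=lambda ind: fitness[ind], reverse=True)[:N]
```

Since R contains both parents and offspring, this truncation is elitist: a fit parent can never be displaced by a less fit child.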
Compared with the prior art, the invention has the following beneficial effects:
(1) The super network is encoded with a hybrid encoding scheme: integer coding describes the type of each compute node inside the neural network architecture and the connection relations between nodes, while binary coding binarizes those connection relations. The advantage of this design is that, during population evolution, different parts of a chromosome can be randomly selected for crossover, so global search and local search of the search space are realized simultaneously. Specifically, the single-point crossover operation generates a new neural network architecture by exchanging the internal compute nodes of two individuals, achieving global exploration of the search space. The multi-point crossover operation exchanges only the binarized connection information of the neural network; it changes the flow of the data stream within a single neural network to generate a new architecture, achieving local exploration of the search space.
(2) Based on the hybrid encoding scheme, different parts of a chromosome can be randomly selected for mutation during population evolution; the polynomial mutation method introduces compute nodes that do not belong to the super network, and the weights of these nodes are randomly initialized. The advantage of this design is that it addresses a problem of conventional methods, in which compute-node weights form a deep coupling during super-network training that makes convergence difficult in the later stages. The introduced compute nodes that do not belong to the super network are merged into the super network as the population evolves; because their weights are randomly assigned, the deep coupling in super-network training is reduced, which helps the algorithm jump out of local optima and avoids convergence difficulties in super-network training.
Based on beneficial effects (1) and (2), the proposed method can solve the difficulty of training the super network to convergence; as a result, compared with existing methods, the proposed method can perform neural network architecture search on large-scale data sets (for example, ImageNet).
Drawings
Fig. 1 is an overall architecture of the neural network of the present invention.
FIG. 2 is a flow chart of the algorithm of the present invention.
FIG. 3 is a flow chart of chromosome generation creation according to the present invention.
FIG. 4 is a schematic diagram of the hybrid cross method and hybrid mutation method of the present invention.
FIG. 5 is a process of neural network architecture optimization based on ImageNet classification task according to the present invention.
FIG. 6 is a training process of the neural network architecture searched by the present invention, based on ImageNet classification task.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1 to fig. 3, the present embodiment provides an improved searching method for an evolutionary neural network architecture based on a super network, which mainly includes the following steps:
the method comprises the following steps that firstly, an input layer is used as a first layer, and five calculation modules are packaged; m computing nodes are packaged in each module, and finally, a full connection layer is used as an output layer of the neural network; in this embodiment, each computing module is configured to include 9 computing nodes, that is, M is 9; the input layer is formed by packaging a convolution layer, a ReLU activation function and a Batch Normalization (BN) layer in sequence; the computing nodes are computing units in the neural network and can be randomly selected from the operation search space theta. All the calculation node steps in the first calculation module, the third calculation module and the fifth calculation module are 1; the step length of all the computing nodes in the second computing module and the fourth computing module is 2.
In the second step, the neural network structure is encoded with the hybrid encoding scheme, and the connections of the internal compute nodes of the neural network are binarized; N chromosomes are randomly generated to construct the initial population. In this embodiment, the initial population contains 40 chromosomes, i.e., N = 40.
In this embodiment, a hybrid coding method is adopted to randomly generate an initial population to realize population initialization, each individual in the initial population represents a neural network architecture corresponding to the individual, and a connection method of internal computing nodes is binarized at the same time. Each computing node represents a computing unit of the neural network, and the coding information of the computing unit is shown in table 1. In the process of gene coding, the computing units are randomly coded into the overall neural network architecture to form a chromosome, namely the final neural network architecture is formed.
TABLE 1 coding information of neural network computing units
The specific coding mode is as follows:
the mixed coding mode is a coding mode combining integer and binary number. Describing the types of the computing nodes in the neural network architecture and the connection relation between the nodes by using integer coding; and binarizing the connection relation of the computing nodes in the neural network architecture by using binary numbers to describe whether the connection between the two computing nodes is activated or not. The method specifically comprises the following steps:
in step S21, a compute node is encoded as a five tupleWherein the content of the first and second substances,represents a calculation unit a included in a calculation node i; i is1,I2Indexes of computing units representing connections of computing node I, i.e. computing node I and computing node I1,I2Are connected with each other;I1,I2is a set of integers; j. the design is a square1,J2Representing a compute node I and a compute node I for a set of binary numbers1,I2The four states of the connection mode are specifically:
J1=0,J 20 denotes a compute node I and a compute node I1,I2All the connections are in an activated state; then at this point, node I is computed1,I2After the feature maps of the outputs are fused, the fused feature maps are used as the inputs of the computing nodes i. The output δ of the computing node i is:
J1=0,J 21, denotes a compute node I and a compute node I1Is activated, computing node I and computing node I2The connection of (2) is closed; then, at this time, the output δ of the computing node i is:
J1=1,J 20 denotes a compute node I and a compute node I1Is closed, computing node I and computing node I2Is activated; then, at this time, the output δ of the computing node i is:
J1=1,J 21, denotes a compute node I and a compute node I1,I2All the connections are in a closed state; i.e. the current compute node i is masked. Then at this point, node I is computed1,I2After the output feature graph is fused, the feature graph does not pass through a computing nodeProcessing is done directly as the output value δ of the compute node i:
δ=I1(xc)+I2(xd)
wherein x isc,xdAre respectively a computing node I1,I2Input of (1)1(xc),I2(xd) Are respectively a computing node I1,I2Output of (2)Representing a computing node I1,I2Output characteristic diagram I1(xc),I2(xd) Fusion, as input to compute node i, by a compute unitAfter processing, as the output of compute node i.
In step S22, a calculation module contains M compute nodes, so the coding structure of a calculation module is the concatenation of the five-tuples of its M compute nodes.
In step S23, a chromosome represents a neural network architecture, and each neural network architecture contains five calculation modules; the coding structure of a neural network architecture is therefore the concatenation of the codes of its five calculation modules.
In the third step, the individuals in the population are uniformly sampled and trained on the training data, a structure weight is learned for each compute node, and the classification accuracy on a validation set is used as the fitness function to evaluate individual fitness. The specific steps are as follows:
In step S31, the given training data set is evenly divided into B batches according to the batch size, where B is a natural number greater than N. For each batch, an individual is randomly selected from the parent population P and decoded into the corresponding neural network for training, until the maximum training batch B is reached. In this embodiment, the batch size is set to 256.
In step S32, the fitness value of each individual in the parent population is evaluated, using the classification accuracy on the pictures of the validation data set as the fitness function:

fitness = G / H

where G is the number of pictures correctly identified by the model and H is the total number of pictures in the validation set.
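The fitness function described above is plain classification accuracy and can be written directly:

```python
def fitness(num_correct, num_total):
    """Validation-set classification accuracy used as the fitness function:
    fitness = G / H, where G is the number of correctly classified pictures
    and H is the total number of pictures in the validation set."""
    if num_total == 0:
        raise ValueError("validation set must be non-empty")
    return num_correct / num_total
```

For example, 774 correct predictions out of 1000 validation pictures gives a fitness of 0.774.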
In the fourth step, a parent population P is constructed using binary tournament selection. The specific steps are as follows:

Step S41: two individuals are randomly selected from the original population; the individual with the higher fitness value is kept in the parent population P, and the individual with the lower fitness value is returned to the original population.
In step S42, step S41 is repeated until the number of individuals included in the parent population P reaches a preset number of individuals K, which is 40 in the present embodiment.
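Steps S41 and S42 can be sketched as follows. The `fitness` mapping is an illustrative stand-in for the fitness values computed in step S3.

```python
import random

def binary_tournament(population, fitness, k):
    """Binary tournament selection (steps S41-S42): repeatedly draw two
    distinct individuals at random, copy the fitter one into the parent
    population P (the loser stays in the pool), until P holds k individuals.
    `fitness` is a dict mapping each individual to its fitness value."""
    parents = []
    while len(parents) < k:
        p1, p2 = random.sample(population, 2)
        parents.append(p1 if fitness[p1] >= fitness[p2] else p2)
    return parents
```

Note that the least-fit individual of the population can never win a tournament, so it is never copied into P.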
In the fifth step, based on the given crossover rate pc, chromosome individuals in the parent population are crossed pairwise using a hybrid crossover method to obtain a number of new chromosomes, which form the child population Q. In this embodiment pc = 0.95; the hybrid crossover method is shown in FIG. 4 and proceeds as follows:
step S51, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S52: a random number r is generated in the interval [0, 1], and two individuals p1 and p2 are randomly selected from the parent population P; the random number r determines whether the two individuals p1 and p2 undergo a crossover operation.
Step S53: if r ≤ pc, the integer chromosome parts of the two chromosomes are left-aligned and single-point crossover is performed, i.e. a crossover point is set at random in the two integer chromosomes (at the same position in both) and the genes after the crossover point are exchanged; the binary chromosome parts of the two chromosomes are left-aligned and multi-point crossover is performed, i.e. several crossover points are randomly selected in the two binary chromosomes (at the same positions in both) and the genes at the crossover points are exchanged. The two individuals q1 and q2 produced by this hybrid crossover are stored in the child population Q.

Step S54: if r > pc, the two individuals p1 and p2 selected in step S52 are stored unchanged in the child population Q.
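A minimal sketch of the hybrid crossover of steps S51 to S54 follows. The chromosome representation as a pair `(integer_part, binary_part)` of lists is an assumption made for illustration; the patent's actual encoding is the five-tuple-based structure of step S2.

```python
import random

def hybrid_crossover(parent1, parent2, pc):
    """Hybrid crossover sketch (steps S51-S54). Each chromosome is assumed to
    be a pair (integer_part, binary_part). With probability pc the integer
    parts undergo single-point crossover and the binary parts undergo
    multi-point crossover; otherwise both parents are copied unchanged."""
    (int1, bin1), (int2, bin2) = parent1, parent2
    if random.random() > pc:                      # r > pc: no crossover
        return (list(int1), list(bin1)), (list(int2), list(bin2))
    # single-point crossover on the integer parts (same cut point in both)
    cut = random.randrange(1, len(int1))
    child_int1 = int1[:cut] + int2[cut:]
    child_int2 = int2[:cut] + int1[cut:]
    # multi-point crossover on the binary parts: swap genes at random points
    child_bin1, child_bin2 = list(bin1), list(bin2)
    points = random.sample(range(len(bin1)), k=max(1, len(bin1) // 2))
    for i in points:
        child_bin1[i], child_bin2[i] = child_bin2[i], child_bin1[i]
    return (child_int1, child_bin1), (child_int2, child_bin2)
```

Both crossover operators exchange genes position-wise, so at every gene position the pair of children carries the same pair of genes as the parents did.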
In the sixth step, based on the given mutation rate pm, a hybrid mutation method is applied to the individuals in the child population Q. In this embodiment pm = 0.1; the hybrid mutation method is shown in FIG. 5 and proceeds as follows:
step S61, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S62, randomly generating a random number t corresponding to any chromosome individual in the interval [0, 1] for any gene position in any chromosome individual, and using the random number to determine whether the gene position of the individual is mutated.
Step S63: if t ≤ pm, a polynomial mutation operation is performed on the integer chromosome part of the chromosome, where a_i denotes the gene at the i-th gene position in the chromosome, a'_i denotes the new gene produced from a_i, u is a random number generated in the interval [0, 1], and a_i^U, a_i^L denote the upper and lower mutation bounds of gene a_i, respectively.
Step S64, if t is less than or equal to pmThen, a flip mutation operation is performed on the binary chromosome part of the chromosome, that is, a plurality of mutation points are randomly selected in the chromosome, and the mutation is performed on the gene locus corresponding to each mutation point, wherein the gene locus with 0 is mutated into 1, and the gene locus with 1 is mutated into 0.
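The flip mutation of step S64 can be sketched in a few lines; here each gene position mutates independently with probability pm, which is one common way to realize "randomly selecting several mutation points":

```python
import random

def flip_mutation(binary_genes, pm):
    """Bit-flip mutation on the binary chromosome part (step S64): each gene
    position mutates independently with probability pm, turning 0 into 1 and
    1 into 0. (The integer part uses polynomial mutation instead, step S63.)"""
    return [1 - g if random.random() <= pm else g for g in binary_genes]
```

With pm = 1.0 every bit flips; with pm = 0.1, on average one gene in ten is flipped.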
In the seventh step, each individual in the child population Q is decoded into the corresponding neural network, its structure weights are obtained by inheritance or random initialization, and the classification accuracy on the validation set is used as the fitness function to evaluate individual fitness.
Each individual in the offspring population obtains a structure weight value through inheritance or random initialization, and the method specifically comprises the following steps: for any chromosome individual in the offspring population Q, if any calculation node in the chromosome individual is obtained by the hybrid crossover method of the step S5, inheriting the weight from the corresponding calculation node in the parent generation chromosome individual; if the hybrid mutation method in step S6 is used, the weight of the computing node is generated by random initialization.
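The inherit-or-reinitialize rule above can be sketched as follows. The names `node["origin"]`, `parent_weights`, and `random_init` are illustrative assumptions, not the patent's data structures.

```python
def init_node_weights(node, parent_weights, random_init):
    """Weight assignment for a child individual's compute node (step S7):
    a node produced by the hybrid crossover of step S5 inherits the weights
    of the corresponding node in the parent chromosome; a node produced by
    the hybrid mutation of step S6 gets randomly initialized weights.
    All names here are illustrative stand-ins."""
    if node["origin"] == "crossover":
        return parent_weights[node["id"]]   # inherit from parent chromosome
    return random_init()                    # mutated node: fresh weights
```

Inheriting crossover-derived weights lets offspring reuse the supernet training already invested in their parents, so only mutated nodes start from scratch.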
In the eighth step, the parent population P and the child population Q are merged into a population R, from which a number of individuals are selected by environmental selection as the next generation's original population, and the procedure returns to step S4 until a preset maximum number of evolution generations is reached. After the evolution finishes, the individual with the highest fitness value is output as the optimal neural network architecture.
And step S81, sorting the individuals in the population R according to the fitness value in the sequence from high fitness value to low fitness value.
And step S82, selecting individuals ranked from No. 1 to No. N in the population R according to the preset population scale N as the next generation population.
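Steps S81 and S82 amount to elitist truncation selection and can be sketched as:

```python
def environmental_selection(parents, offspring, fitness, n):
    """Environmental selection of step S8: merge parent population P and
    child population Q into R, sort R by fitness in descending order, and
    keep the top n individuals as the next generation's population.
    `fitness` is a dict mapping each individual to its fitness value."""
    merged = parents + offspring
    merged.sort(key=lambda ind: fitness[ind], reverse=True)
    return merged[:n]
```

Because R contains both P and Q, the best individual found so far always survives into the next generation.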
To verify the advantages of the present invention, the following comparisons were made:
The data set used by the invention is ImageNet, a large visual data set for visual object recognition research. It contains more than 14 million images, divided into a training set, a validation set, and a test set, covering about 20,000 categories.
The algorithm hyper-parameter design used by the invention is as follows:
The initial channel number C is 32 and the maximum evolution generation is 100. The SGD optimizer is initialized with an initial learning rate lr = 0.1, a weight decay coefficient w = 0.0003, and a momentum coefficient m = 0.9.
After the algorithm iterations finish, the individual with the best fitness value is output and decoded into the corresponding neural network architecture, EvoNet. The network's structure parameters are re-initialized and the architecture is trained on the training data set until convergence; its performance is then measured on the test data set.
In the invention, the ImageNet-based optimization process and the final individual's test results are shown in FIGS. 5 and 6, respectively; the invention achieves high classification accuracy during the search, with a top-1 classification accuracy of 77.4%.
The comparison result of the performance of the neural network architecture searched by the method, the existing artificially designed neural network architecture and the neural network architecture searching algorithm is shown in table 2. From table 2, it can be found that the neural network architecture searched by the present invention has better performance than the existing artificially designed neural network architecture and neural network architecture search algorithm.
Table 2 comparison of experimental results
Claims (10)
1. An improved search method of an evolutionary neural network architecture based on a hyper-network is characterized by comprising the following steps:
step S1, packaging five calculation modules by taking the input layer as the first layer; m computing nodes are packaged in each module, and finally, a full connection layer is used as an output layer of the neural network; m is a natural number greater than 1;
step S2, coding the neural network structure by a mixed coding mode, and binarizing the connection of the internal calculation nodes of the neural network; randomly generating N chromosomes to construct an original population; the number of the computing nodes in any chromosome is less than the total number of the computing nodes of a preset chromosome; n is a natural number greater than 1;
step S3, uniformly sampling the individuals in the population, training based on training data, learning a structure weight for each computing node, and performing fitness evaluation on the individuals by adopting the classification precision of a verification set as a fitness function;
step S4, constructing a parent population P by adopting a binary championship selection method;
step S5, based on the given crossover rate pcCarrying out pairwise crossing on chromosome individuals in the parent population by adopting a mixed crossing method to obtain a plurality of new chromosomes to form a child population Q;
step S6, based on the given variation rate pmPerforming mutation operation on individuals in the offspring population Q by adopting a mixed mutation method;
step S7, decoding each individual in the offspring population Q into a corresponding neural network, obtaining a structure weight value in an inheritance or random initialization mode, and adopting the classification precision of a verification set as a fitness function to evaluate the fitness of the individual;
step S8, merging the parent population P and the child population Q into a population R, selecting a plurality of individuals as the original population of the next generation by adopting an environment selection method, and feeding back to the step S4 until a preset maximum evolution generation is reached; and after the evolution is finished, outputting the individual with the highest fitness value as an optimal neural network architecture.
2. The improved searching method for evolutionary neural network architecture based on super network as claimed in claim 1, wherein the input layer is composed of convolutional layer, ReLU activation function and batch normalization layer encapsulation in sequence.
3. The improved search method for a hyper-network-based evolutionary neural network architecture as claimed in claim 1, wherein in step S1, the compute node is a computing unit in the neural network and can be randomly selected from the operation search space θ; the stride of all compute nodes in the first, third, and fifth computing modules is 1; the stride of all compute nodes in the second and fourth computing modules is 2.
4. The improved searching method for neural network architecture based on super network as claimed in claim 1, wherein in step S2, said hybrid coding mode is a coding mode combining integer and binary number; describing the types of the computing nodes in the neural network architecture and the connection relation between the nodes by using integer coding; binarizing the connection relation of the computing nodes in the neural network architecture by using binary numbers to describe whether the connection between the two computing nodes is activated or not; the method specifically comprises the following steps:
in step S21, a compute node is encoded as a five-tuple (a_i, I1, I2, J1, J2), where a_i denotes the computation unit contained in compute node i; I1, I2 are the indexes of the compute nodes to which compute node i is connected, i.e. compute node i is connected to compute nodes I1 and I2; I1, I2 are integers; J1, J2 are binary numbers describing the four possible states of the connections between compute node i and compute nodes I1, I2, specifically: J1 = 0, J2 = 0 denotes that the connections between compute node i and compute nodes I1, I2 are both active; the output feature maps of compute nodes I1 and I2 are then fused and used as the input of compute node i, and the output δ of compute node i is:

δ = a_i(I1(x_c) + I2(x_d))
J1 = 0, J2 = 1 denotes that the connection between compute node i and compute node I1 is active while the connection between compute node i and compute node I2 is closed; the output δ of compute node i is then:

δ = a_i(I1(x_c))

J1 = 1, J2 = 0 denotes that the connection between compute node i and compute node I1 is closed while the connection between compute node i and compute node I2 is active; the output δ of compute node i is then:

δ = a_i(I2(x_d))

J1 = 1, J2 = 1 denotes that the connections between compute node i and compute nodes I1, I2 are both closed, i.e. the current compute node i is masked; the fused output feature maps of compute nodes I1, I2 then bypass the computation unit a_i and serve directly as the output value δ of compute node i:

δ = I1(x_c) + I2(x_d)

where x_c, x_d are the inputs of compute nodes I1, I2 respectively, I1(x_c), I2(x_d) are the outputs of compute nodes I1, I2 respectively, and I1(x_c) + I2(x_d) denotes the fusion of the output feature maps of compute nodes I1, I2, which serves as the input of compute node i and, after processing by the computation unit a_i, as the output of compute node i;
step S22, a computing module contains M compute nodes; the coding of a computing module is therefore the concatenation of the codes of its M compute nodes;

step S23, a chromosome represents a neural network architecture, and each architecture contains five computing modules; the coding of a neural network architecture is therefore the concatenation of the codes of its five computing modules.
5. the improved search method for an evolutionary neural network architecture based on a super network as claimed in claim 1, wherein in step S3, aiming at the individuals in the population, uniform sampling is performed, training is performed based on training data, a structure weight is generated for each computing node, and fitness evaluation is performed on the individuals by using the classification precision of the validation set as a fitness function; the method specifically comprises the following steps:
step S31, equally dividing a predetermined training data set into B batches (batch) according to the size of the given batch size; b is a natural number larger than N; in each batch, randomly selecting an individual from the parent population P, decoding the individual into a corresponding neural network for training until a maximum training batch B is reached;
step S32, evaluating the fitness value fitness of each individual in the parent population; the method comprises the following steps of adopting the classification accuracy of the pictures in the verification data set as a fitness function to evaluate the fitness, wherein the expression is as follows:
wherein G is the number of pictures with correct model identification, and H is the total number of pictures in the verification set.
6. The improved searching method for neural network architecture based on super networks as claimed in claim 1, wherein in step S4, for said binary tournament selection method, the steps are as follows:
step S41, randomly selecting two individuals from the original population, reserving the individual with higher fitness value to the parent population P according to the fitness value, and putting the individual with lower fitness value back to the original population;
and step S42, repeating step S41 until the number of individuals contained in the parent population P reaches a preset number of individuals K, wherein K is a natural number more than 1.
7. The improved search method for a hyper-network-based evolutionary neural network architecture as claimed in claim 1, wherein in step S5, based on a given crossover rate pc, chromosome individuals in the parent population P are crossed pairwise using a hybrid crossover method to obtain a number of new chromosomes, which form the child population Q; the specific steps are as follows:
step S51, splitting the integer part and the binary number part of each chromosome into an integer chromosome part and a binary number chromosome part;
step S52, in the interval [0, 1]]Randomly generating a random number r, and randomly selecting two individuals P from the parent population P1And p2Determining the two individuals p by using the random number r1And p2Whether to perform a crossover operation;
step S53, if r ≤ pc, aligning the left sides of the integer chromosome parts of the two chromosomes to perform single-point crossover, i.e. randomly setting a crossover point in the two integer chromosomes (at the same position in both) and exchanging the genes after the crossover point; aligning the left sides of the binary chromosome parts of the two chromosomes to perform multi-point crossover, i.e. randomly selecting several crossover points in the two binary chromosomes (at the same positions in both) and exchanging the genes at the crossover points; and storing the two individuals q1 and q2 produced by the hybrid crossover method into the child population Q;

step S54, if r > pc, storing the two individuals p1 and p2 selected in step S52 unchanged into the child population Q.
8. The improved searching method for neural network architecture based on super-networks as claimed in claim 1, wherein in step S6, based on a given variation rate pmPerforming mutation operation on individuals in the offspring population Q by using a mixed mutation method; the method comprises the following specific steps:
step S61, splitting the integer part and the binary number part of each chromosome into an integer chromosome part and a binary number chromosome part;
step S62, randomly generating a random number t corresponding to any chromosome individual in the interval [0, 1] aiming at any gene position in any chromosome individual, and determining whether the gene position of the individual is subjected to mutation operation by using the random number;
step S63, if t is less than or equal to pmPerforming a polynomial mutation operation on the integer chromosome portion of the chromosome;
wherein a_i denotes the gene at the i-th gene position in the chromosome, a'_i denotes the new gene produced from gene a_i, u is a random number generated in the interval [0, 1], and a_i^U, a_i^L denote the upper and lower mutation bounds of gene a_i, respectively;
step S64, if t is less than or equal to pmThen, a flip mutation operation is performed on the binary chromosome part of the chromosome, that is, a plurality of mutation points are randomly selected in the chromosome, and the mutation is performed on the gene locus corresponding to each mutation point, wherein the gene locus with 0 is mutated into 1, and the gene locus with 1 is mutated into 0.
9. The improved searching method for the neural network architecture based on the evolution of the super network as claimed in claim 1, wherein in step S7, each individual in the offspring population obtains the structure weight through inheritance or random initialization, specifically: for any chromosome individual in the offspring population Q, if any calculation node in the chromosome individual is obtained by the hybrid crossover method of the step S5, inheriting the weight from the corresponding calculation node in the parent generation chromosome individual; if the hybrid mutation method in step S6 is used, the weight of the computing node is generated by random initialization.
10. The improved searching method for the neural network architecture based on the evolution of the super network as claimed in claim 1, wherein in step S8, the parent population P and the child population Q are combined into a population R, and a plurality of individuals are selected as the original population of the next generation by using the environment selection method, which comprises the following specific steps:
step S81, according to the fitness value, sorting the individuals in the population R according to the sequence of the fitness value from high to low;
and step S82, selecting individuals ranked from No. 1 to No. N in the population R according to the preset population scale N as the next generation population.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011567363.5A CN112561039A (en) | 2020-12-26 | 2020-12-26 | Improved search method of evolutionary neural network architecture based on hyper-network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112561039A true CN112561039A (en) | 2021-03-26 |
Family
ID=75033047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011567363.5A Pending CN112561039A (en) | 2020-12-26 | 2020-12-26 | Improved search method of evolutionary neural network architecture based on hyper-network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112561039A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113128432A (en) * | 2021-04-25 | 2021-07-16 | 四川大学 | Multi-task neural network architecture searching method based on evolutionary computation |
CN113128432B (en) * | 2021-04-25 | 2022-09-06 | 四川大学 | Machine vision multitask neural network architecture searching method based on evolution calculation |
CN113537399A (en) * | 2021-08-11 | 2021-10-22 | 西安电子科技大学 | Polarized SAR image classification method and system of multi-target evolutionary graph convolution neural network |
CN113642730A (en) * | 2021-08-30 | 2021-11-12 | Oppo广东移动通信有限公司 | Convolutional network pruning method and device and electronic equipment |
WO2023124342A1 (en) * | 2021-12-31 | 2023-07-06 | 江南大学 | Low-cost automatic neural architecture search method for image classification |
CN114997360A (en) * | 2022-05-18 | 2022-09-02 | 四川大学 | Evolution parameter optimization method, system and storage medium of neural architecture search algorithm |
CN114997360B (en) * | 2022-05-18 | 2024-01-19 | 四川大学 | Evolution parameter optimization method, system and storage medium of neural architecture search algorithm |
CN114943866A (en) * | 2022-06-17 | 2022-08-26 | 之江实验室 | Image classification method based on evolutionary neural network structure search |
CN114943866B (en) * | 2022-06-17 | 2024-04-02 | 之江实验室 | Image classification method based on evolutionary neural network structure search |
CN115359337A (en) * | 2022-08-23 | 2022-11-18 | 四川大学 | Searching method, system and application of pulse neural network for image recognition |
CN115359337B (en) * | 2022-08-23 | 2023-04-18 | 四川大学 | Searching method, system and application of pulse neural network for image recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112561039A (en) | Improved search method of evolutionary neural network architecture based on hyper-network | |
CN102413029B (en) | Method for partitioning communities in complex dynamic network by virtue of multi-objective local search based on decomposition | |
CN111737535B (en) | Network characterization learning method based on element structure and graph neural network | |
CN111275172B (en) | Feedforward neural network structure searching method based on search space optimization | |
CN112465120A (en) | Fast attention neural network architecture searching method based on evolution method | |
CN110232434A (en) | A kind of neural network framework appraisal procedure based on attributed graph optimization | |
Gao et al. | An improved clonal selection algorithm and its application to traveling salesman problems | |
Wen et al. | Learning ensemble of decision trees through multifactorial genetic programming | |
CN113128432B (en) | Machine vision multitask neural network architecture searching method based on evolution calculation | |
Pawar et al. | Optimized ensembled machine learning model for IRIS plant classification | |
Bedboudi et al. | An heterogeneous population-based genetic algorithm for data clustering | |
Pan et al. | Neural architecture search based on evolutionary algorithms with fitness approximation | |
Chen et al. | A new multiobjective evolutionary algorithm for community detection in dynamic complex networks | |
Broni-Bediako et al. | Evolutionary NAS with gene expression programming of cellular encoding | |
CN114241267A (en) | Structural entropy sampling-based multi-target architecture search osteoporosis image identification method | |
Wei et al. | MOO-DNAS: Efficient neural network design via differentiable architecture search based on multi-objective optimization | |
CN116611504A (en) | Neural architecture searching method based on evolution | |
Parsa et al. | Multi-objective hyperparameter optimization for spiking neural network neuroevolution | |
Hu et al. | Apenas: An asynchronous parallel evolution based multi-objective neural architecture search | |
Chen et al. | MFENAS: multifactorial evolution for neural architecture search | |
CN115620046A (en) | Multi-target neural architecture searching method based on semi-supervised performance predictor | |
Ma et al. | Auto-ORVNet: Orientation-boosted volumetric neural architecture search for 3D shape classification | |
Xue et al. | RARTS: an efficient first-order relaxed architecture search method | |
Zhang et al. | A fast evolutionary knowledge transfer search for multiscale deep neural architecture | |
Dong et al. | Conditionally tractable density estimation using neural networks |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 