CN112561039A - Improved search method of evolutionary neural network architecture based on hyper-network - Google Patents

Improved search method of evolutionary neural network architecture based on hyper-network

Info

Publication number
CN112561039A
CN112561039A (application CN202011567363.5A)
Authority
CN
China
Prior art keywords
population
neural network
chromosome
node
individuals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011567363.5A
Other languages
Chinese (zh)
Inventor
Yaochu Jin (金耀初)
Xiuping Shen (沈修平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Original Assignee
SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD filed Critical SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Priority to CN202011567363.5A priority Critical patent/CN112561039A/en
Publication of CN112561039A publication Critical patent/CN112561039A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an improved search method for an evolutionary neural network architecture based on a hyper-network. The method comprises the following steps. Step S1: with the input layer as the first layer, encapsulate five computation modules. Step S2: binarize the connections between the internal computing nodes of the neural network. Step S3: learn a structure weight for each computing node. Step S4: construct a parent population P using binary tournament selection. Step S5: form an offspring population Q. Step S6: apply mutation operations to the individuals in the offspring population Q. Step S7: decode each individual in the offspring population Q into its corresponding neural network and obtain its structure weights. Step S8: merge the parent population P and the offspring population Q into a population R, select individuals as the original population of the next generation by an environmental selection method, and feed back to step S4 until a preset maximum number of generations is reached. After the evolution finishes, the individual with the highest fitness value is output as the optimal neural network architecture.

Description

Improved search method of evolutionary neural network architecture based on hyper-network
Technical Field
The invention relates to the technical field of image classification model construction, in particular to an improved search method of an evolutionary neural network architecture based on a hyper-network.
Background
An image classification task is an image processing technique that distinguishes objects of different categories based on the feature information reflected in a picture. Because many models built for image classification can be transferred to other computer vision fields as feature extraction networks, image classification is a foundational task in computer vision, and the design of image classification models is a focus of researchers' attention. However, manually designing a neural network model requires experienced experts, who must carefully study the distribution and characteristics of the data set and run repeated experiments before a high-performing model emerges; this consumes a huge amount of time and labor.
Currently, neural architecture search (NAS) algorithms are attracting wide attention from researchers. Such algorithms can automatically design an efficient neural network architecture for a given data set without much expert knowledge. Since NAS algorithms typically must evaluate neural network models in the search space continuously, they demand a great deal of computational effort. To improve the search efficiency of NAS algorithms, there are two main approaches:
the first approach is to construct an End-to-End Performance Predictor. This approach requires a coding method that uniquely maps the neural network architecture into a set of digital decision variables. The coding of the neural network architecture and its performance (e.g., accuracy of classification) are then formed into a data pair that is used as input to a performance predictor, which is trained. After the performance predictor is trained, the performance of the neural network model in the search space can be directly predicted without training the neural network model, and the search efficiency is further improved. However, this approach follows a training-then-prediction approach, requiring the performance predictor to be trained first using a set of training samples. In general, the more samples trained, the better the performance of the predictor. However, collecting more training samples means consuming more computing resources, and thus has a certain impact on search efficiency. Therefore, in practical use, a neural network architecture which is more effective by using an incremental strategy needs to be sampled, and certain calculation cost is needed.
The second approach is one-shot neural architecture search based on a super-network. This method first trains a super-network (one-shot model) as the search space; it then randomly samples a certain number of sub-networks from the super-network for performance evaluation and ranks them by performance; finally, the sub-network with the best evaluated performance is taken as the output of the algorithm. Because a sub-network can inherit its weights from the super-network and be evaluated without training, this effectively improves the search efficiency of NAS algorithms. However, existing super-network-based NAS algorithms have certain defects. First, the training of the nodes inside the super-network is unbalanced, which makes the performance ranking in the sub-network evaluation stage inaccurate, so the algorithm may fail to find the best-performing architecture. Second, during super-network training, mutual interference among different sub-networks can make super-network-based NAS unstable: the super-network converges slowly, or may not converge at all, so the performance predictions for the sub-models are poor.
Disclosure of Invention
Aiming at the defects of the prior-art super-network-based neural architecture search methods, namely unstable performance and slow or even failed convergence of super-network training, the invention provides a super-network-based neural architecture search method that uses an evolutionary algorithm as the search strategy to automatically generate neural network architectures from the super-network, so as to improve the classification accuracy of image classification tasks.
In order to solve the technical problems, the invention adopts the technical scheme that:
an improved search method of an evolutionary neural network architecture based on a hyper-network is characterized by comprising the following steps:
Step S1, with the input layer as the first layer, encapsulate five computation modules; each module encapsulates M computing nodes, and finally a fully connected layer serves as the output layer of the neural network; M is a natural number greater than 1.
Step S2, encode the neural network structure with a hybrid coding scheme and binarize the connections between the network's internal computing nodes; randomly generate N chromosomes to construct the initial population; N is a natural number greater than 1.
Step S3, uniformly sample individuals from the population, train them on the training data so that a structure weight is learned for each computing node, and evaluate each individual's fitness using the classification accuracy on a validation set as the fitness function.
Step S4, construct a parent population P using binary tournament selection.
Step S5, based on a given crossover rate pc, cross chromosome individuals of the parent population pairwise with a hybrid crossover method to obtain new chromosomes that form an offspring population Q.
Step S6, based on a given mutation rate pm, apply a hybrid mutation method to the individuals in the offspring population Q.
Step S7, decode each individual in the offspring population Q into its corresponding neural network, obtain its structure weights by inheritance or random initialization, and evaluate its fitness using the classification accuracy on the validation set as the fitness function.
Step S8, merge the parent population P and the offspring population Q into a population R, select individuals as the original population of the next generation by an environmental selection method, and feed back to step S4 until a preset maximum number of generations is reached. After the evolution finishes, output the individual with the highest fitness value as the optimal neural network architecture.
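For orientation, steps S3 through S8 amount to a standard generational loop. The following Python sketch is an illustrative reading of the method, not part of the original disclosure; the callables select, crossover, mutate, and evaluate stand for the binary tournament selection, hybrid crossover, hybrid mutation, and fitness evaluation detailed below.

```python
from dataclasses import dataclass

@dataclass
class Individual:
    int_part: list        # integer genes: computing-unit codes and connection indexes
    bin_part: list        # binary genes: connection activation bits J
    fitness: float = 0.0  # validation accuracy, filled in by evaluation

def evolve(population, generations, p_c, p_m, n, select, crossover, mutate, evaluate):
    """Sketch of steps S3-S8; the four operators are supplied as callables."""
    for ind in population:
        ind.fitness = evaluate(ind)                        # step S3
    for _ in range(generations):
        parents = [select(population) for _ in range(n)]   # step S4
        offspring = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):   # step S5
            offspring.extend(crossover(p1, p2, p_c))
        offspring = [mutate(q, p_m) for q in offspring]    # step S6
        for q in offspring:
            q.fitness = evaluate(q)                        # step S7
        merged = parents + offspring                       # step S8: merge P and Q
        merged.sort(key=lambda ind: ind.fitness, reverse=True)
        population = merged[:n]                            # environmental selection
    return max(population, key=lambda ind: ind.fitness)
```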
Further, in step S1, the input layer is built, in order, from a convolutional layer, a ReLU activation function, and a batch normalization (BN) layer.
Further, in step S1, a computing node is a computing unit of the neural network and can be selected at random from the operation search space θ. All computing nodes in the first, third, and fifth computation modules have stride 1; all computing nodes in the second and fourth computation modules have stride 2.
Further, in step S2, the hybrid coding scheme combines integer and binary coding. Integer codes describe the types of the computing nodes in the neural network architecture and the connections between nodes; binary numbers binarize those connections to describe whether the connection between two computing nodes is active. Specifically:
further, in the above step S21, a compute node is encoded as a quintuple
Figure BDA0002861999840000031
Wherein the content of the first and second substances,
Figure BDA0002861999840000032
represents a calculation unit a included in a calculation node i; i is1,I2Indexes of computing units representing connections of computing node I, i.e. computing node I and computing node I1,I2Are connected with each other;
Figure BDA0002861999840000033
I1,I2is a set of integers; j. the design is a square1,J2Representing a compute node I and a compute node I for a set of binary numbers1,I2The four states of the connection mode are specifically: j. the design is a square1=0,J 20 denotes a compute node I and a compute node I1,I2All the connections are in an activated state; then at this point, node I is computed1,I2After the feature maps of the outputs are fused, the fused feature maps are used as the inputs of the computing nodes i. The output δ of the computing node i is:
Figure BDA0002861999840000041
J1=0,J 21, denotes a compute node I and a compute node I1Is activated, calculates node i and countsCalculation node I2The connection of (2) is closed; then, at this time, the output δ of the computing node i is:
Figure BDA0002861999840000042
J1=1,J 20 denotes a compute node I and a compute node I1Is closed, computing node I and computing node I2Is activated; then, at this time, the output δ of the computing node i is:
Figure BDA0002861999840000043
J1=1,J 21, denotes a compute node I and a compute node I1,I2All the connections are in a closed state; i.e. the current compute node i is masked. Then at this point, node I is computed1,I2After the output feature graph is fused, the feature graph does not pass through a computing node
Figure BDA0002861999840000044
Processing is done directly as the output value δ of the compute node i:
δ=I1(xc)+I2(xd)
wherein x isc,xdAre respectively a computing node I1,I2Input of (1)1(xc),I2(xd) Are respectively a computing node I1,I2Output of (2)
Figure BDA0002861999840000045
Representing a computing node I1,I2Output characteristic diagram I1(xc),I2(xd) Fusion, as input to compute node i, by a compute unit
Figure BDA0002861999840000046
After processing, as the output of compute node i.
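The four connection states can be made concrete with a minimal Python sketch of a node's forward computation; theta_a, out1, and out2 are illustrative names for the node's computing unit and the feature maps produced by nodes I1 and I2.

```python
def node_output(theta_a, J1, J2, out1, out2):
    """Output delta of a node encoded as (theta_a, I1, I2, J1, J2).

    theta_a is the node's computing unit (a callable such as a convolution);
    out1 and out2 are the output feature maps I1(xc) and I2(xd) of the two
    predecessor nodes; J = 0 marks an active connection, J = 1 a closed one.
    """
    if J1 == 0 and J2 == 0:       # both connections active: fuse, then process
        return theta_a(out1 + out2)
    if J1 == 0 and J2 == 1:       # only the connection to I1 is active
        return theta_a(out1)
    if J1 == 1 and J2 == 0:       # only the connection to I2 is active
        return theta_a(out2)
    return out1 + out2            # both closed: node masked, fusion bypasses theta_a
```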
Further, in step S22, a computation module contains M computing nodes, so the coding structure of one computation module is the concatenation of its M node quintuples:

((θ_a^1, I1, I2, J1, J2), (θ_a^2, I1, I2, J1, J2), …, (θ_a^M, I1, I2, J1, J2))

In step S23, a chromosome is a neural network architecture, and each architecture contains five computation modules; the coding structure of one neural network architecture is accordingly the concatenation of the codes of its five computation modules.
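Read as a data structure, the hybrid encoding might be sketched as follows (an assumed representation, with integer genes a, I1, I2 and binary genes J1, J2 per node):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NodeGene:
    a: int    # integer code of the computing unit chosen from the space theta
    I1: int   # index of the first predecessor node (integer gene)
    I2: int   # index of the second predecessor node (integer gene)
    J1: int   # binary gene: 0 = connection to I1 active, 1 = closed
    J2: int   # binary gene: 0 = connection to I2 active, 1 = closed

Module = List[NodeGene]       # one computation module: M node genes
Chromosome = List[Module]     # one architecture: five computation modules
```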
further, in step S3, the individuals in the population are uniformly sampled, training is performed based on training data, a structure weight is generated for each computing node, and fitness evaluation is performed on the individuals by using the classification accuracy of the verification set as a fitness function. The method specifically comprises the following steps:
further, in step S31, the predetermined training data set is divided into B batches (batch) on average according to the size of the batch data (batch size). And B is a natural number larger than N. In each batch, randomly selecting an individual from the parent population P, decoding the individual into a corresponding neural network, and training until a maximum training batch B is reached.
Further, in step S32, the fitness value of each individual in the parent population is evaluated, using the classification accuracy on the pictures of the validation data set as the fitness function:

fitness = G / H

where G is the number of pictures the model identifies correctly and H is the total number of pictures in the validation set.
Further, in step S4, the binary tournament selection method comprises the following steps:
Step S41, randomly select two individuals from the original population; according to their fitness values, retain the individual with the higher fitness in the parent population P and return the individual with the lower fitness to the original population.
Step S42, repeat step S41 until the number of individuals in the parent population P reaches a preset number K, where K is a natural number greater than 1.
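A minimal sketch of the binary tournament, assuming each individual carries a fitness attribute:

```python
import random

def tournament_select(population):
    """Binary tournament (steps S41-S42): draw two individuals at random and
    keep the fitter one; the loser stays in the pool."""
    a, b = random.sample(population, 2)
    return a if a.fitness >= b.fitness else b
```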
Further, in step S5, based on the given crossover rate pc, chromosome individuals in the parent population P are crossed pairwise with a hybrid crossover method to obtain new chromosome individuals. The specific steps are:
Step S51, split each chromosome into its integer chromosome part and its binary chromosome part.
Step S52, randomly generate a random number r in the interval [0, 1] and randomly select two individuals p1 and p2 from the parent population P; the random number r determines whether p1 and p2 undergo a crossover operation.
Step S53, if r ≤ pc, align the left ends of the integer chromosome parts of the two chromosomes and perform single-point crossover: a crossover point is set at random in the two integer chromosomes, at the same position in both, and the genes after the crossover point are exchanged. Align the left ends of the binary chromosome parts of the two chromosomes and perform multi-point crossover: several crossover points are selected at random in the two binary chromosomes, at the same positions in both, and the genes at the crossover points are exchanged. Store the two individuals q1 and q2 produced by this hybrid crossover method in the offspring population Q.
Step S54, if r > pc, store the two individuals p1 and p2 selected in step S52 in the offspring population Q unchanged.
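A sketch of the hybrid crossover under these rules; the number of multi-point crossover positions (n_points) is an assumption, since the text only says "several":

```python
import copy
import random

def hybrid_crossover(p1, p2, p_c, n_points=3):
    """Hybrid crossover (steps S51-S54) on chromosomes that expose flat
    .int_part and .bin_part gene lists."""
    if random.random() > p_c:                       # step S54: no crossover
        return copy.deepcopy(p1), copy.deepcopy(p2)
    q1, q2 = copy.deepcopy(p1), copy.deepcopy(p2)
    # Single-point crossover on the integer parts (same cut point in both).
    cut = random.randrange(1, len(q1.int_part))
    q1.int_part[cut:], q2.int_part[cut:] = q2.int_part[cut:], q1.int_part[cut:]
    # Multi-point crossover on the binary parts (same positions in both).
    for j in random.sample(range(len(q1.bin_part)), k=n_points):
        q1.bin_part[j], q2.bin_part[j] = q2.bin_part[j], q1.bin_part[j]
    return q1, q2
```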
Further, in step S6, based on the given mutation rate pm, mutation operations are performed on the individuals in the offspring population Q with a hybrid mutation method. The specific steps are:
Step S61, split each chromosome into its integer chromosome part and its binary chromosome part.
Step S62, for each gene position in each chromosome individual, randomly generate a random number t in the interval [0, 1]; this random number determines whether that gene position of the individual is mutated.
Step S63, if t ≤ pm, perform a polynomial mutation operation on the integer chromosome part of the chromosome:

a_i' = a_i + δ · (a_i^max − a_i^min),  with  δ = (2u)^(1/(η+1)) − 1 if u < 0.5,  δ = 1 − (2(1 − u))^(1/(η+1)) if u ≥ 0.5

where a_i denotes the gene at the i-th gene position in the chromosome, a_i' is the new gene produced from a_i, u is a random number generated in the interval [0, 1], a_i^max and a_i^min are the upper and lower bounds of variation of gene a_i, and η is the mutation distribution index.
Step S64, if t ≤ pm, perform a flip mutation operation on the binary chromosome part of the chromosome: several mutation points are selected at random in the chromosome, and the gene at each mutation point is flipped, a 0 becoming 1 and a 1 becoming 0.
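A sketch of the hybrid mutation; the distribution index eta of the polynomial mutation and the rounding of the mutated integer gene back to a valid code are assumptions the text leaves open:

```python
import random

def hybrid_mutation(ind, p_m, bounds, eta=20):
    """Hybrid mutation (steps S61-S64): polynomial mutation on the integer
    part, bit-flip mutation on the binary part. bounds[i] gives the
    (lower, upper) variation bounds of gene i."""
    for i, a in enumerate(ind.int_part):
        if random.random() <= p_m:                 # step S63
            u = random.random()
            lo, hi = bounds[i]
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            a_new = a + delta * (hi - lo)
            ind.int_part[i] = int(round(min(max(a_new, lo), hi)))
    for j in range(len(ind.bin_part)):
        if random.random() <= p_m:                 # step S64
            ind.bin_part[j] = 1 - ind.bin_part[j]  # flip 0 <-> 1
    return ind
```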
In step S7, each individual in the offspring population obtains its structure weights by inheritance or random initialization. Specifically, for any chromosome individual in the offspring population Q: if a computing node in the individual was produced by the hybrid crossover method of step S5, it inherits the weights of the corresponding computing node in the parent chromosome; if it was produced by the hybrid mutation method of step S6, the weights of the computing node are generated by random initialization.
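One way this inherit-or-reinitialise rule could be realised, assuming each node records whether crossover or mutation produced it (the origin and key attributes are invented bookkeeping, not part of the disclosure):

```python
def assign_offspring_weights(child, parent_nodes, random_init):
    """Step S7 sketch: nodes kept or exchanged by crossover inherit the
    weights of the matching parent node; nodes altered by mutation get
    freshly initialised weights."""
    for node in child.nodes:
        if node.origin == "crossover":
            node.weights = parent_nodes[node.key].weights   # inherit
        else:                                               # "mutation"
            node.weights = random_init(node)                # re-initialise
```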
In step S8, the parent population P and the offspring population Q are merged into a population R, and individuals are selected as the original population of the next generation with an environmental selection method. The specific steps are:
Step S81, sort the individuals in population R by fitness value, from high to low.
Step S82, according to the preset population size N, select the individuals ranked 1 through N in population R as the next-generation population.
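The environmental selection of steps S81 and S82 is plain elitist truncation; a sketch:

```python
def environmental_selection(P, Q, n):
    """Steps S81-S82: merge parents and offspring, sort by fitness from high
    to low, and keep the top n individuals as the next generation."""
    R = P + Q
    R.sort(key=lambda ind: ind.fitness, reverse=True)
    return R[:n]
```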
Compared with the prior art, the invention has the following beneficial effects:
(1) The hyper-network is encoded with a hybrid coding scheme: integer codes describe the types of the computing nodes inside the neural network architecture and the connections between nodes, while binary codes binarize those connection relations. The advantage of this design is that, during population evolution, different parts of a chromosome can be selected at random for crossover, so global and local search of the search space proceed simultaneously. Specifically, the single-point crossover operation exchanges internal computing nodes between two individuals to generate new neural network architectures, realizing global exploration of the search space; the multi-point crossover operation only exchanges the binarized connection information, changing the flow of data within a single neural network to generate a new architecture, realizing local exploration of the search space.
(2) Based on the hybrid coding scheme, different parts of a chromosome can also be selected at random for mutation during population evolution; the polynomial mutation method introduces computing nodes that do not belong to the super-network, and the weights of these nodes are randomly initialized. The advantage of this design is that it mitigates the deep coupling between node weights that forms during conventional super-network training and makes convergence difficult in the later stages. The introduced nodes are merged into the super-network as the population evolves; because their weights are assigned at random, the deep coupling in super-network training is reduced, the algorithm is helped to escape local optima, and the difficulty of converging the super-network is avoided.
Building on beneficial effects (1) and (2), the proposed method alleviates the difficulty of training the super-network to convergence and, compared with existing methods, enables neural architecture search on large-scale data sets (for example, ImageNet).
Drawings
Fig. 1 is an overall architecture of the neural network of the present invention.
FIG. 2 is a flow chart of the algorithm of the present invention.
FIG. 3 is a flow chart of chromosome generation in the present invention.
FIG. 4 is a schematic diagram of the hybrid cross method and hybrid mutation method of the present invention.
FIG. 5 shows the neural network architecture optimization process of the present invention on the ImageNet classification task.
FIG. 6 shows the training process, on the ImageNet classification task, of the neural network architecture found by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1 to fig. 3, the present embodiment provides an improved searching method for an evolutionary neural network architecture based on a super network, which mainly includes the following steps:
the method comprises the following steps that firstly, an input layer is used as a first layer, and five calculation modules are packaged; m computing nodes are packaged in each module, and finally, a full connection layer is used as an output layer of the neural network; in this embodiment, each computing module is configured to include 9 computing nodes, that is, M is 9; the input layer is formed by packaging a convolution layer, a ReLU activation function and a Batch Normalization (BN) layer in sequence; the computing nodes are computing units in the neural network and can be randomly selected from the operation search space theta. All the calculation node steps in the first calculation module, the third calculation module and the fifth calculation module are 1; the step length of all the computing nodes in the second computing module and the fourth computing module is 2.
Second, the neural network structure is encoded with the hybrid coding scheme, and the connections between the network's internal computing nodes are binarized; N chromosomes are randomly generated to construct the initial population. In this embodiment the initial population contains 40 chromosomes, i.e., N = 40.
In this embodiment, the initial population is generated at random with the hybrid coding scheme to initialize the population; each individual in the initial population represents a corresponding neural network architecture, and the connections of the internal computing nodes are binarized at the same time. Each computing node represents a computing unit of the neural network; the coding information of the computing units is shown in Table 1. During gene coding, computing units are randomly encoded into the overall neural network architecture to form a chromosome, i.e., the final neural network architecture.
TABLE 1 Coding information of the neural network computing units (the table is reproduced as an image in the original publication)
The specific coding mode is as follows:
the mixed coding mode is a coding mode combining integer and binary number. Describing the types of the computing nodes in the neural network architecture and the connection relation between the nodes by using integer coding; and binarizing the connection relation of the computing nodes in the neural network architecture by using binary numbers to describe whether the connection between the two computing nodes is activated or not. The method specifically comprises the following steps:
in step S21, a compute node is encoded as a five tuple
Figure BDA0002861999840000092
Wherein the content of the first and second substances,
Figure BDA0002861999840000093
represents a calculation unit a included in a calculation node i; i is1,I2Indexes of computing units representing connections of computing node I, i.e. computing node I and computing node I1,I2Are connected with each other;
Figure BDA0002861999840000094
I1,I2is a set of integers; j. the design is a square1,J2Representing a compute node I and a compute node I for a set of binary numbers1,I2The four states of the connection mode are specifically:
J1=0,J 20 denotes a compute node I and a compute node I1,I2All the connections are in an activated state; then at this point, node I is computed1,I2After the feature maps of the outputs are fused, the fused feature maps are used as the inputs of the computing nodes i. The output δ of the computing node i is:
Figure BDA0002861999840000095
J1=0,J 21, denotes a compute node I and a compute node I1Is activated, computing node I and computing node I2The connection of (2) is closed; then, at this time, the output δ of the computing node i is:
Figure BDA0002861999840000096
J1=1,J 20 denotes a compute node I and a compute node I1Is closed, computing node I and computing node I2Is activated; then, at this time, the output δ of the computing node i is:
Figure BDA0002861999840000097
J1=1,J 21, denotes a compute node I and a compute node I1,I2All the connections are in a closed state; i.e. the current compute node i is masked. Then at this point, node I is computed1,I2After the output feature graph is fused, the feature graph does not pass through a computing node
Figure BDA0002861999840000101
Processing is done directly as the output value δ of the compute node i:
δ=I1(xc)+I2(xd)
wherein x isc,xdAre respectively a computing node I1,I2Input of (1)1(xc),I2(xd) Are respectively a computing node I1,I2Output of (2)
Figure BDA0002861999840000102
Representing a computing node I1,I2Output characteristic diagram I1(xc),I2(xd) Fusion, as input to compute node i, by a compute unit
Figure BDA0002861999840000103
After processing, as the output of compute node i.
In step S22, the computing module includes M computing nodes. Then the coding structure of a computing module at this time is:
Figure BDA0002861999840000104
in step S23, the chromosome is a neural network architecture, and each neural network architecture includes five computing modules. Then, at this time, the coding structure of a neural network architecture is:
Figure BDA0002861999840000105
and thirdly, uniformly sampling the individuals in the population, training based on training data, learning a structure weight for each computing node, and evaluating the fitness of the individuals by adopting the classification precision of a verification set as a fitness function. The method specifically comprises the following steps:
in step S31, the predetermined training data set is divided into B batches (batch) on average according to the size of the batch data (batch size). And B is a natural number larger than N. In each batch, randomly selecting an individual from the parent population P, decoding the individual into a corresponding neural network, and training until a maximum training batch B is reached. Based on this, in the present embodiment, the batch size is set to 256.
In step S32, the fitness value of each individual in the parent population is evaluated, using the classification accuracy on the pictures of the validation data set as the fitness function:

fitness = G / H

where G is the number of pictures the model identifies correctly and H is the total number of pictures in the validation set.
Fourth, a parent population P is constructed using binary tournament selection. Specifically:
Step S41, randomly select two individuals from the original population; retain the individual with the higher fitness value in the parent population P and return the individual with the lower fitness to the original population.
Step S42, repeat step S41 until the number of individuals in the parent population P reaches the preset number K; in this embodiment K = 40.
Fifth, based on the given crossover rate pc, chromosome individuals in the parent population are crossed pairwise with the hybrid crossover method to obtain new chromosomes, which form the offspring population Q. In this embodiment pc = 0.95. The hybrid crossover method, shown in FIG. 4, proceeds as follows:
step S51, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S52, randomly generate a random number r in the interval [0, 1] and randomly select two individuals p1 and p2 from the parent population P; the random number r determines whether p1 and p2 undergo a crossover operation.
Step S53, if r ≤ pc, align the left ends of the integer chromosome parts of the two chromosomes and perform single-point crossover: a crossover point is set at random in the two integer chromosomes, at the same position in both, and the genes after the crossover point are exchanged. Align the left ends of the binary chromosome parts of the two chromosomes and perform multi-point crossover: several crossover points are selected at random in the two binary chromosomes, at the same positions in both, and the genes at the crossover points are exchanged. Store the two individuals q1 and q2 produced by this hybrid crossover method in the offspring population Q.
Step S54, if r > pc, store the two individuals p1 and p2 selected in step S52 in the offspring population Q unchanged.
Sixth, based on the given mutation rate pm, mutation operations are performed on the individuals in the offspring population Q with the hybrid mutation method. In this embodiment pm = 0.1. The hybrid mutation method, shown in FIG. 4, proceeds as follows:
step S61, the integer part and the binary part of each chromosome are split into an integer chromosome part and a binary chromosome part.
Step S62, for each gene position in each chromosome individual, randomly generate a random number t in the interval [0, 1]; this random number determines whether that gene position of the individual is mutated.
Step S63, if t ≤ pm, perform a polynomial mutation operation on the integer chromosome part of the chromosome:

a_i' = a_i + δ · (a_i^max − a_i^min),  with  δ = (2u)^(1/(η+1)) − 1 if u < 0.5,  δ = 1 − (2(1 − u))^(1/(η+1)) if u ≥ 0.5

where a_i denotes the gene at the i-th gene position in the chromosome, a_i' is the new gene produced from a_i, u is a random number generated in the interval [0, 1], a_i^max and a_i^min are the upper and lower bounds of variation of gene a_i, and η is the mutation distribution index.
Step S64, if t ≤ pm, perform a flip mutation operation on the binary chromosome part of the chromosome: several mutation points are selected at random in the chromosome, and the gene at each mutation point is flipped, a 0 becoming 1 and a 1 becoming 0.
Seventh, each individual in the offspring population Q is decoded into its corresponding neural network, its structure weights are obtained by inheritance or random initialization, and its fitness is evaluated using the classification accuracy on the validation set as the fitness function.
Each individual in the offspring population obtains its structure weights by inheritance or random initialization as follows: for any chromosome individual in the offspring population Q, if a computing node in the individual was produced by the hybrid crossover method of step S5, it inherits the weights of the corresponding computing node in the parent chromosome; if it was produced by the hybrid mutation method of step S6, the weights of the computing node are generated by random initialization.
Eighth, the parent population P and the offspring population Q are merged into a population R, and individuals are selected as the original population of the next generation with the environmental selection method, feeding back to step S4 until the preset maximum number of generations is reached. After the evolution finishes, the individual with the highest fitness value is output as the optimal neural network architecture.
Step S81, sort the individuals in population R by fitness value, from high to low.
Step S82, according to the preset population size N, select the individuals ranked 1 through N in population R as the next-generation population.
To verify the advantages of the present invention, the following comparisons were made:
the dataset used by the invention is ImageNet. ImageNet is a large visual data set used for visual object recognition studies. The image classification method comprises more than 1400 million images, and is divided into a training set, a verification set and a test set, and the training set, the verification set and the test set comprise 20000 categories.
The hyper-parameters of the algorithm are designed as follows:
the initial channel number C = 32 and the maximum number of generations Generation = 100. The SGD optimizer is initialized with an initial learning rate lr = 0.1, a weight decay coefficient w = 0.0003, and a momentum coefficient m = 0.9.
After the algorithm iterations finish, the individual with the best fitness value is output and decoded into the corresponding neural network architecture, EvoNet. The network's structure parameters are re-initialized, and the architecture is trained on the training data set until convergence; the test data set is then used to test the performance of the architecture.
The optimization process and the final individual's test process on ImageNet are shown in FIGS. 5 and 6 respectively; the invention attains high predicted classification accuracy during the search, with a top-1 classification accuracy of 77.4%.
Table 2 compares the performance of the neural network architecture found by the invention with existing manually designed architectures and neural architecture search algorithms. From Table 2 it can be seen that the architecture found by the invention outperforms both.
Table 2 Comparison of experimental results (the table is reproduced as an image in the original publication)

Claims (10)

1. An improved search method of an evolutionary neural network architecture based on a hyper-network is characterized by comprising the following steps:
step S1, packaging five calculation modules by taking the input layer as the first layer; m computing nodes are packaged in each module, and finally, a full connection layer is used as an output layer of the neural network; m is a natural number greater than 1;
step S2, coding the neural network structure by a mixed coding mode, and binarizing the connection of the internal calculation nodes of the neural network; randomly generating N chromosomes to construct an original population; the number of the computing nodes in any chromosome is less than the total number of the computing nodes of a preset chromosome; n is a natural number greater than 1;
step S3, uniformly sampling the individuals in the population, training based on training data, learning a structure weight for each computing node, and performing fitness evaluation on the individuals by adopting the classification precision of a verification set as a fitness function;
step S4, constructing a parent population P by adopting a binary tournament selection method;
step S5, based on a given crossover rate pc, crossing chromosome individuals of the parent population pairwise with a hybrid crossover method to obtain new chromosomes that form an offspring population Q;
step S6, based on a given mutation rate pm, performing mutation operations on individuals in the offspring population Q with a hybrid mutation method;
step S7, decoding each individual in the offspring population Q into a corresponding neural network, obtaining a structure weight value in an inheritance or random initialization mode, and adopting the classification precision of a verification set as a fitness function to evaluate the fitness of the individual;
step S8, merging the parent population P and the child population Q into a population R, selecting a plurality of individuals as the original population of the next generation by adopting an environment selection method, and feeding back to the step S4 until a preset maximum evolution generation is reached; and after the evolution is finished, outputting the individual with the highest fitness value as an optimal neural network architecture.
2. The improved search method of an evolutionary neural network architecture based on a hyper-network as claimed in claim 1, wherein the input layer is built, in order, from a convolutional layer, a ReLU activation function, and a batch normalization layer.
3. The improved search method of an evolutionary neural network architecture based on a hyper-network as claimed in claim 1, wherein in step S1 the computing node is a computing unit in the neural network and can be selected at random from the operation search space θ; all computing nodes in the first, third, and fifth computation modules have stride 1; all computing nodes in the second and fourth computation modules have stride 2.
4. The improved searching method for neural network architecture based on super network as claimed in claim 1, wherein in step S2, said hybrid coding mode is a coding mode combining integer and binary number; describing the types of the computing nodes in the neural network architecture and the connection relation between the nodes by using integer coding; binarizing the connection relation of the computing nodes in the neural network architecture by using binary numbers to describe whether the connection between the two computing nodes is activated or not; the method specifically comprises the following steps:
in step S21, a computing node is encoded as a quintuple (θ_a^i, I1, I2, J1, J2), wherein θ_a^i denotes the computing unit a contained in computing node i, with θ_a^i ∈ θ; I1 and I2 are integers giving the indexes of the computing nodes connected to node i, i.e. node i is connected to nodes I1 and I2; J1 and J2 are binary numbers describing the four possible states of the two connections, specifically: J1 = 0, J2 = 0 denotes that the connections of node i to both node I1 and node I2 are active; the output feature maps of nodes I1 and I2 are then fused and used as the input of node i, and the output δ of node i is:

δ = θ_a^i(I1(xc) + I2(xd))

J1 = 0, J2 = 1 denotes that the connection of node i to node I1 is active and the connection to node I2 is closed; the output δ of node i is then:

δ = θ_a^i(I1(xc))

J1 = 1, J2 = 0 denotes that the connection of node i to node I1 is closed and the connection to node I2 is active; the output δ of node i is then:

δ = θ_a^i(I2(xd))

J1 = 1, J2 = 1 denotes that the connections of node i to both node I1 and node I2 are closed, i.e. the current node i is masked; the fused output feature maps of nodes I1 and I2 then bypass the computing unit θ_a^i and serve directly as the output value δ of node i:

δ = I1(xc) + I2(xd)

wherein xc and xd are the inputs of nodes I1 and I2 respectively, I1(xc) and I2(xd) are their outputs, and I1(xc) + I2(xd) denotes the fusion of the two output feature maps, which serves as the input of node i and, after processing by the computing unit θ_a^i, becomes the output of node i;
step S22, a computation module contains M computing nodes, and the coding structure of one computation module is the concatenation of its M node quintuples;
step S23, a chromosome is a neural network architecture, each neural network architecture contains five computation modules, and the coding structure of one neural network architecture is the concatenation of the codes of its five computation modules.
5. the improved search method for an evolutionary neural network architecture based on a super network as claimed in claim 1, wherein in step S3, aiming at the individuals in the population, uniform sampling is performed, training is performed based on training data, a structure weight is generated for each computing node, and fitness evaluation is performed on the individuals by using the classification precision of the validation set as a fitness function; the method specifically comprises the following steps:
step S31, equally dividing a predetermined training data set into B batches (batch) according to the size of the given batch size; b is a natural number larger than N; in each batch, randomly selecting an individual from the parent population P, decoding the individual into a corresponding neural network for training until a maximum training batch B is reached;
step S32, evaluating the fitness value fitness of each individual in the parent population; the method comprises the following steps of adopting the classification accuracy of the pictures in the verification data set as a fitness function to evaluate the fitness, wherein the expression is as follows:
Figure FDA0002861999830000035
wherein G is the number of pictures with correct model identification, and H is the total number of pictures in the verification set.
6. The improved searching method for neural network architecture based on super networks as claimed in claim 1, wherein in step S4, for said binary tournament selection method, the steps are as follows:
step S41, randomly selecting two individuals from the original population, reserving the individual with higher fitness value to the parent population P according to the fitness value, and putting the individual with lower fitness value back to the original population;
and step S42, repeating step S41 until the number of individuals contained in the parent population P reaches a preset number of individuals K, wherein K is a natural number more than 1.
7. The improved search method of an evolutionary neural network architecture based on a hyper-network as claimed in claim 1, wherein in step S5, based on a given crossover rate pc, chromosome individuals in the parent population P are crossed pairwise with a hybrid crossover method to obtain new chromosome individuals; the steps are as follows:
step S51, split each chromosome into its integer chromosome part and its binary chromosome part;
step S52, randomly generate a random number r in the interval [0, 1] and randomly select two individuals p1 and p2 from the parent population P; the random number r determines whether p1 and p2 undergo a crossover operation;
step S53, if r ≤ pc, align the left ends of the integer chromosome parts of the two chromosomes and perform single-point crossover, i.e. a crossover point is set at random in the two integer chromosomes, at the same position in both, and the genes after the crossover point are exchanged; align the left ends of the binary chromosome parts of the two chromosomes and perform multi-point crossover, i.e. several crossover points are selected at random in the two binary chromosomes, at the same positions in both, and the genes at the crossover points are exchanged; store the two individuals q1 and q2 produced by the hybrid crossover method in the offspring population Q;
step S54, if r > pc, store the two individuals p1 and p2 selected in step S52 in the offspring population Q.
8. The improved search method of an evolutionary neural network architecture based on a hyper-network as claimed in claim 1, wherein in step S6, based on a given mutation rate pm, mutation operations are performed on individuals in the offspring population Q with a hybrid mutation method; the specific steps are:
step S61, split each chromosome into its integer chromosome part and its binary chromosome part;
step S62, for each gene position in each chromosome individual, randomly generate a random number t in the interval [0, 1]; this random number determines whether that gene position of the individual is mutated;
step S63, if t ≤ pm, perform a polynomial mutation operation on the integer chromosome part of the chromosome:

a_i' = a_i + δ · (a_i^max − a_i^min),  with  δ = (2u)^(1/(η+1)) − 1 if u < 0.5,  δ = 1 − (2(1 − u))^(1/(η+1)) if u ≥ 0.5

wherein a_i denotes the gene at the i-th gene position in the chromosome, a_i' is the new gene produced from a_i, u is a random number generated in the interval [0, 1], a_i^max and a_i^min are the upper and lower bounds of variation of gene a_i, and η is the mutation distribution index;
step S64, if t ≤ pm, perform a flip mutation operation on the binary chromosome part of the chromosome, i.e. several mutation points are selected at random in the chromosome and the gene at each mutation point is flipped, a 0 becoming 1 and a 1 becoming 0.
9. The improved searching method for the neural network architecture based on the evolution of the super network as claimed in claim 1, wherein in step S7, each individual in the offspring population obtains the structure weight through inheritance or random initialization, specifically: for any chromosome individual in the offspring population Q, if any calculation node in the chromosome individual is obtained by the hybrid crossover method of the step S5, inheriting the weight from the corresponding calculation node in the parent generation chromosome individual; if the hybrid mutation method in step S6 is used, the weight of the computing node is generated by random initialization.
10. The improved searching method for the neural network architecture based on the evolution of the super network as claimed in claim 1, wherein in step S8, the parent population P and the child population Q are combined into a population R, and a plurality of individuals are selected as the original population of the next generation by using the environment selection method, which comprises the following specific steps:
step S81, sort the individuals in population R by fitness value, from high to low;
step S82, according to the preset population size N, select the individuals ranked 1 through N in population R as the next-generation population.
CN202011567363.5A 2020-12-26 2020-12-26 Improved search method of evolutionary neural network architecture based on hyper-network Pending CN112561039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011567363.5A CN112561039A (en) 2020-12-26 2020-12-26 Improved search method of evolutionary neural network architecture based on hyper-network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011567363.5A CN112561039A (en) 2020-12-26 2020-12-26 Improved search method of evolutionary neural network architecture based on hyper-network

Publications (1)

Publication Number Publication Date
CN112561039A (en) 2021-03-26

Family

ID=75033047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011567363.5A Pending CN112561039A (en) 2020-12-26 2020-12-26 Improved search method of evolutionary neural network architecture based on hyper-network

Country Status (1)

Country Link
CN (1) CN112561039A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128432A (en) * 2021-04-25 2021-07-16 四川大学 Multi-task neural network architecture searching method based on evolutionary computation
CN113128432B (en) * 2021-04-25 2022-09-06 四川大学 Machine vision multitask neural network architecture searching method based on evolution calculation
CN113537399A (en) * 2021-08-11 2021-10-22 西安电子科技大学 Polarized SAR image classification method and system of multi-target evolutionary graph convolution neural network
CN113642730A (en) * 2021-08-30 2021-11-12 Oppo广东移动通信有限公司 Convolutional network pruning method and device and electronic equipment
WO2023124342A1 (en) * 2021-12-31 2023-07-06 江南大学 Low-cost automatic neural architecture search method for image classification
CN114997360A (en) * 2022-05-18 2022-09-02 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114997360B (en) * 2022-05-18 2024-01-19 四川大学 Evolution parameter optimization method, system and storage medium of neural architecture search algorithm
CN114943866A (en) * 2022-06-17 2022-08-26 之江实验室 Image classification method based on evolutionary neural network structure search
CN114943866B (en) * 2022-06-17 2024-04-02 之江实验室 Image classification method based on evolutionary neural network structure search
CN115359337A (en) * 2022-08-23 2022-11-18 四川大学 Searching method, system and application of pulse neural network for image recognition
CN115359337B (en) * 2022-08-23 2023-04-18 四川大学 Searching method, system and application of pulse neural network for image recognition

Similar Documents

Publication Publication Date Title
CN112561039A (en) Improved search method of evolutionary neural network architecture based on hyper-network
CN102413029B (en) Method for partitioning communities in complex dynamic network by virtue of multi-objective local search based on decomposition
CN111737535B (en) Network characterization learning method based on element structure and graph neural network
CN111275172B (en) Feedforward neural network structure searching method based on search space optimization
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
CN110232434A (en) A kind of neural network framework appraisal procedure based on attributed graph optimization
Gao et al. An improved clonal selection algorithm and its application to traveling salesman problems
Wen et al. Learning ensemble of decision trees through multifactorial genetic programming
CN113128432B (en) Machine vision multitask neural network architecture searching method based on evolution calculation
Pawar et al. Optimized ensembled machine learning model for IRIS plant classification
Bedboudi et al. An heterogeneous population-based genetic algorithm for data clustering
Pan et al. Neural architecture search based on evolutionary algorithms with fitness approximation
Chen et al. A new multiobjective evolutionary algorithm for community detection in dynamic complex networks
Broni-Bediako et al. Evolutionary NAS with gene expression programming of cellular encoding
CN114241267A (en) Structural entropy sampling-based multi-target architecture search osteoporosis image identification method
Wei et al. MOO-DNAS: Efficient neural network design via differentiable architecture search based on multi-objective optimization
CN116611504A (en) Neural architecture searching method based on evolution
Parsa et al. Multi-objective hyperparameter optimization for spiking neural network neuroevolution
Hu et al. Apenas: An asynchronous parallel evolution based multi-objective neural architecture search
Chen et al. MFENAS: multifactorial evolution for neural architecture search
CN115620046A (en) Multi-target neural architecture searching method based on semi-supervised performance predictor
Ma et al. Auto-ORVNet: Orientation-boosted volumetric neural architecture search for 3D shape classification
Xue et al. RARTS: an efficient first-order relaxed architecture search method
Zhang et al. A fast evolutionary knowledge transfer search for multiscale deep neural architecture
Dong et al. Conditionally tractable density estimation using neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination