CN108334949B - Image classifier construction method based on optimized deep convolutional neural network structure fast evolution - Google Patents

Image classifier construction method based on optimized deep convolutional neural network structure fast evolution Download PDF

Info

Publication number
CN108334949B
Authority
CN
China
Prior art keywords
cnn
classifier
chromosome
solution
chromosomes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810141306.7A
Other languages
Chinese (zh)
Other versions
CN108334949A (en)
Inventor
陈晋音
林翔
熊晖
俞山青
宣琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810141306.7A priority Critical patent/CN108334949B/en
Publication of CN108334949A publication Critical patent/CN108334949A/en
Application granted granted Critical
Publication of CN108334949B publication Critical patent/CN108334949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

A fast evolution method for optimizing a deep convolutional neural network structure comprises the following steps: 1) a nonlinear CNN network structure is constructed effectively with a GNP-based evolutionary algorithm, and the hyper-parameters of the CNN structure are mutated to find the optimal CNN hyper-parameter combination; 2) during evolution, a multi-objective network structure evaluation method is designed that takes the classification accuracy and the complexity of the classifier as simultaneous optimization targets, so as to effectively generate CNN classifiers with high classification accuracy and a simple structure; 3) an incremental training method is proposed in which each child CNN structure is trained on the basis of the previous generation's CNN structure. The invention reduces the number of times models must be trained and lowers the time complexity of the algorithm.

Description

Image classifier construction method based on optimized deep convolutional neural network structure fast evolution
Technical Field
The invention belongs to the field of image classification, and relates to a method for constructing an image classifier based on the rapid evolution of an optimized deep convolutional neural network structure.
Background
With the rapid development of science and technology, the era of big data has arrived. Deep learning, which takes the deep neural network (DNN) as its model, has achieved remarkable results in many key areas of machine intelligence, such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure; it can effectively extract hidden-layer features of an image and classify images accurately, and in recent years it has been widely applied in the field of image recognition. In 1998, LeCun et al. proposed the LeNet-5 convolutional network structure, regarded as a milestone in the history of deep learning. LeNet-5 could recognize handwritten digit images of 32 x 32 pixels, but because of its simple structure and the lack of computing power at the time, it did not achieve significant results in image classification. In 2012, Alex Krizhevsky et al. proposed the deep learning algorithm AlexNet, which used deep learning to greatly reduce the error rate of ImageNet image classification and brought deep learning widespread attention. Network frameworks such as ResNet, DenseNet, and GoogLeNet were proposed subsequently; by adding corresponding modules, these algorithms greatly extend the depth of CNNs and further improve the accuracy of deep learning on image classification.
Evolutionary computation, a classical parameter-optimization method, was used to optimize neural network structures very early on. The earliest neural networks optimized their weight parameters not with gradient descent but with evolutionary computation. Evolutionary computation mimics natural selection: parts of existing neural networks are crossed over, mutated, and recombined to obtain better-adapted offspring networks. Evolutionary methods for optimizing network weights include the CMA-ES, SANE, and ESP algorithms. CMA-ES is a continuous-optimization technique that captures the interactions between weights and works well for weight optimization. SANE and ESP evolve parts of the network architecture and combine them into a fully functional network.
Evolutionary algorithms are now increasingly used to optimize the structure and hyper-parameters of neural networks. Masanori et al. proposed the CGP-CNN algorithm: they automatically construct CNN architectures for image classification tasks based on Cartesian Genetic Programming (CGP), using powerful modules as CGP node functions to evaluate the network structure. Fernando et al. evolved a compositional model in which a CPPN outputs the weights of a neural network via an autoencoder, and then integrated the trained weights back into the CPPN genome using Lamarckian adaptation. Dufourq et al. proposed the Evolutionary DEep Networks (EDEN) algorithm, which can effectively evolve a CNN classifier with good classification accuracy and a relatively simple structure; more importantly, it completes the whole evolution process in only 6-24 hours on a single GPU, greatly improving the efficiency of evolutionary computation. Audrey et al. proposed SES-CNN, which uses sexual reproduction to accelerate evolution: by combining two parent networks it synthesizes more diverse and more generalized offspring networks with more compact feature representations. Lorenzo et al. applied Particle Swarm Optimization (PSO) to hyper-parameter selection in CNNs and designed a PSO-based parallel computation method to shorten the running time of the evolutionary algorithm, achieving load balancing and concurrent task execution. Miikkulainen et al. proposed the CoDeepNEAT algorithm based on NEAT neuroevolution, in which blueprints are assembled from modules and an optimal network structure is found by reusing modules. Shafiee et al. introduced a probabilistic model into the optimization process, representing the genetic code and environmental conditions through probability distributions.
Zoph et al. combined reinforcement learning with a recurrent neural network to obtain good architectures, training 800 networks on 800 GPUs to reach an optimal solution. Real et al. used a neuroevolution approach, running a parallel system on 250 computers to optimize CNN classifiers for the image classification problem.
Disclosure of Invention
In order to overcome the defects of existing evolutionary CNN structure algorithms, namely high time complexity and a single evaluation index for the CNN model, the invention provides an image classifier construction method based on fast evolution of an optimized deep convolutional neural network structure, with low time complexity and reasonable evaluation indexes. A nonlinear CNN network structure is constructed effectively with a GNP-based evolutionary algorithm, and the hyper-parameters of the CNN structure are mutated to find the optimal CNN hyper-parameter combination. During evolution, the algorithm applies a multi-objective network structure evaluation method, which effectively simplifies the network structure while achieving a better classification effect. Finally, the algorithm introduces the concept of incremental training, in which each child CNN structure is trained on the basis of the previous generation's CNN structure.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a fast evolution method of optimizing a deep convolutional neural network structure, the evolution method comprising the steps of:
1) CNN optimization method based on GNP
GNP is used as the base algorithm of the evolution process, which comprises the following operations: population initialization, selection of superior individuals, crossover, and mutation. These proceed as follows:
1.1) in population initialization, a network structure is used to represent the evolving population; each network structure is expressed as a Phenotype and a Genotype; in the Phenotype, graphs of different shapes represent different CNN modules and different paths represent different initialized chromosomes; during initialization the structure of every chromosome is generated randomly; the Genotype shows the specific encoding of each chromosome, and the hyper-parameters in the CNN modules are also encoded;
1.2) after population initialization is completed, the obtained CNN structures are trained with the training data, the classification effect of the classifiers is tested, and the better-performing classifiers are selected for crossover and mutation; based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structure and hyper-parameters of the chromosomes; the process is as follows:
1.2.1) crossover obtains a new chromosome by exchanging partial structures of two chromosomes; to perform crossover, two chromosomes are selected as parents by tournament selection; after selection, a crossover point is chosen at random in each of the two chromosomes, and the crossover is realized by modifying the connection paths of the two parent chromosomes at the crossover points in the original population network structure graph;
1.2.2) mutation constructs a new chromosome by mutating a chromosome's hyper-parameters and network structure; a parent chromosome is first chosen by tournament selection, after which two mutation strategies are applied to the current chromosome: structural mutation, which changes the depth of the CNN classifier and evolves CNN structures that extract image features effectively, and hyper-parameter mutation, which searches for the optimal parameter combination of each module;
1.3) during evolution, the number of offspring generated per generation is controlled by the population's crossover and mutation probabilities; in each generation, the child CNN structures obtained by crossover and mutation are first trained into image classifiers; the child classifiers are then pooled with the parent classifiers, a multi-objective evaluation based on each classifier's structural complexity and test accuracy is applied, and the better classifiers are selected into the next round of CNN structure evolution;
2) multi-target network evaluation and optimization method thereof
The classification accuracy and structural complexity of the classifier are taken as optimization targets, and each classifier is evaluated with a multi-objective optimization method, so that the evolutionary computation can finally generate an optimal CNN classifier suited to practical application.
Further, in step 2), during the evolutionary computation a density-estimation method is used for reference to compute the fitness function value of each solution of the Pareto optimal solution set on the PF curve, thereby determining a specific optimization index for each Pareto solution;
for any solution x_i of the MOP problem, two indicators are defined: i_rank and i_distance. i_rank indicates the domination level of the solution; a smaller i_rank means a higher domination level and a better corresponding solution. i_distance indicates the crowding distance at this point; a larger i_distance means better coverage around the current point and a corresponding solution closer to the optimum. For two solution vectors with different rank values, the solution with the lower rank value is selected as the better one; if the rank values of two solutions are equal, the solution with the larger distance value is considered the better choice.
Still further, the evolution method further comprises the following steps:
3) incremental training method
CNNs with similar structures often have similar inter-layer weights because they extract image features in similar ways; therefore, when training offspring, the trained inter-layer weights of the parent CNN are used as the initial weight values of the offspring CNN, and the offspring CNN is trained on the basis of the parent's weight parameters.
The invention has the beneficial effects that: aiming at the high time complexity and single CNN evaluation index of existing evolutionary CNN structure algorithms, a fast evolutionary algorithm for optimizing the CNN structure (GNP_FEL) is proposed. The algorithm constructs a nonlinear CNN network structure effectively with a GNP-based evolutionary algorithm and mutates the hyper-parameters of the CNN structure to find the optimal CNN hyper-parameter combination; during evolution, it applies a multi-objective network structure evaluation method that simplifies the network structure while achieving a better classification effect; finally, it introduces the concept of incremental training, in which each child CNN structure is trained on the basis of the previous generation's CNN structure.
Drawings
Fig. 1 is a flow diagram of a fast evolutionary method (GNP _ FEL) to optimize a deep convolutional neural network structure.
Fig. 2 is a schematic diagram of a population initialization process.
FIG. 3 is a schematic diagram of a chromosome crossing process.
FIG. 4 is a schematic diagram of a chromosome mutation process.
Fig. 5 is a schematic diagram of a PF curve and a target vector.
FIG. 6 is the curve of epoch_i as a function of δ_i.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 6, a fast evolution method for optimizing a deep convolutional neural network structure, the evolution method comprising the steps of:
1) CNN optimization method based on GNP
Genetic Network Programming (GNP) was first proposed by K. Unlike GA and GP, GNP uses a network of decision nodes and execution nodes to represent a chromosome, which makes structural changes to chromosomes more flexible and also allows efficient search of the parameter space, speeding up the convergence of the genetic algorithm. GNP is used as the base algorithm of the evolution process, for which corresponding population initialization, crossover, and mutation strategies are designed; the aim is to optimize the CNN's network structure and hyper-parameters during evolution and finally obtain a high-performance CNN classifier. The process is as follows:
1.1) In population initialization, drawing on the networking idea of the GNP algorithm, a network structure is used to represent the evolving population. A network structure is expressed as a Phenotype and a Genotype. In the Phenotype, graphs of different shapes represent different CNN modules, which correspond to components of existing classical network structures: a common convolution module, a single-layer convolution module, a DenseNet module, a ResNet module, a pooling module, and a fully-connected module. Different paths represent different initialized chromosomes; each starts from the START node, ends at the OUTPUT node, and is built from several different CNN modules. During initialization, the structure of every chromosome is generated randomly. The Genotype shows the specific encoding of each chromosome. Taking Chromosome 1 as an example, 1_1, 2_1, and so on number the modules that form the chromosome; each number corresponds one-to-one to a module in the Phenotype, so this encoding effectively stores the chromosome's Phenotype structure. Furthermore, we also encode the hyper-parameters inside these modules so that they can be optimized during evolution.
Compared with GA and GP, this construction adopts a random-walk strategy and creates chromosomes nonlinearly, module by module, which guarantees the structural diversity of the initialized chromosomes and increases the chance of evolving the optimal CNN structure.
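The random-walk initialization described above can be sketched as follows. This is an illustrative Python sketch: the module vocabulary mirrors the six module types named in the Phenotype, but the hyper-parameter ranges and population size are assumptions, not the patent's exact values.

```python
import random

# Module types from the Phenotype description; hyper-parameter ranges are assumed.
MODULES = ["conv", "single_conv", "densenet", "resnet", "pool", "fc"]

def random_module():
    """Create one module gene: a module type plus randomly chosen hyper-parameters."""
    return {
        "type": random.choice(MODULES),
        "filter_size": random.choice([1, 3, 5]),
        "channels": random.choice([16, 32, 64, 128]),
        "activation": random.choice(["relu", "tanh", "elu"]),
    }

def init_chromosome(min_len=5, max_len=12):
    """Random-walk construction: append a random number of modules within
    [min_len, max_len] between the START and OUTPUT nodes."""
    length = random.randint(min_len, max_len)
    return [random_module() for _ in range(length)]

def init_population(pop_size=20):
    """Genotype of the population: a list of randomly built chromosomes."""
    return [init_chromosome() for _ in range(pop_size)]

population = init_population()
```

Each chromosome is stored as an ordered list of module genes, directly mirroring the numbered-module Genotype encoding described above.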
1.2) After population initialization is completed, the obtained CNN structures are trained with the training data, the classification effect of the classifiers is tested, and the better-performing classifiers are selected for crossover and mutation. During crossover and mutation, the structure and hyper-parameters of the original chromosomes are changed in order to obtain CNN networks with better classification effect; this is the network structure evolution process of the CNN. Based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structure and hyper-parameters of the chromosomes.
1.3) Crossover is an operation that obtains a new chromosome by exchanging partial structures of two chromosomes. In the evolution process of this embodiment, crossover is mainly used to evolve the CNN structure. To perform crossover, two chromosomes are first selected as crossover objects. In this embodiment, tournament selection is used: two chromosomes are selected as the parents of the crossover, denoted parent1 and parent2. After selection, a crossover point is chosen at random in each chromosome, denoted position1 and position2, and in the original population network structure diagram the crossover is realized by modifying the connection paths of parent1 and parent2 at the crossover points.
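A minimal sketch of this selection-plus-crossover step, with chromosomes reduced to abstract module lists; the tournament size k=3 is an assumption, as the patent does not state one.

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: sample k chromosome indices, keep the fittest.
    The tournament size k=3 is an assumed value."""
    contenders = random.sample(range(len(population)), k)
    return max(contenders, key=lambda i: fitness[i])

def crossover(parent1, parent2):
    """Exchange the module sequences of two parents after randomly chosen
    cut points position1 and position2, yielding two children."""
    position1 = random.randrange(1, len(parent1))
    position2 = random.randrange(1, len(parent2))
    child1 = parent1[:position1] + parent2[position2:]
    child2 = parent2[:position2] + parent1[position1:]
    return child1, child2

# Toy chromosomes: each is just a list of module labels.
p1 = ["conv", "pool", "resnet", "fc"]
p2 = ["densenet", "conv", "pool", "conv", "fc"]
c1, c2 = crossover(p1, p2)
```

Note that the two children together contain exactly the genes of the two parents, rearranged at the cut points, which is what "modifying the connection paths at the crossover points" amounts to in list form.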
1.4) Mutation constructs a new chromosome by mutating a chromosome's hyper-parameters and network structure. Mutation likewise first selects a parent chromosome by tournament selection. After the parent is selected, two mutation strategies are applied to the current chromosome: structural mutation and hyper-parameter mutation. Structural mutation adds, changes, or deletes modules on the basis of the original chromosome structure; it can change the depth of the CNN classifier and evolves CNN structures that extract image features effectively. Hyper-parameter mutation operates on a chromosome's modules; a module contains several hyper-parameters, such as filter size, channel depth, and activation function, and hyper-parameter mutation aims to find the optimal parameter combination of each module.
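The two mutation strategies can be sketched as below. The module representation and hyper-parameter ranges are illustrative assumptions carried over from the initialization sketch; the depth bounds [5, 12] match the CIFAR-10 setting described later.

```python
import random

def random_module():
    """Illustrative module gene; module types and value ranges are assumptions."""
    return {"type": random.choice(["conv", "densenet", "resnet", "pool", "fc"]),
            "filter_size": random.choice([1, 3, 5]),
            "channels": random.choice([16, 32, 64, 128]),
            "activation": random.choice(["relu", "tanh", "elu"])}

def structural_mutation(chromosome, min_len=5, max_len=12):
    """Add, replace, or delete one module, i.e. change the classifier depth
    while keeping it within the allowed length range."""
    child = [dict(m) for m in chromosome]
    op = random.choice(["add", "replace", "delete"])
    if op == "add" and len(child) < max_len:
        child.insert(random.randrange(len(child) + 1), random_module())
    elif op == "delete" and len(child) > min_len:
        child.pop(random.randrange(len(child)))
    else:
        child[random.randrange(len(child))] = random_module()
    return child

def hyperparameter_mutation(chromosome):
    """Re-sample the hyper-parameters of one randomly chosen module,
    keeping its type (and hence the network depth) unchanged."""
    child = [dict(m) for m in chromosome]
    module = child[random.randrange(len(child))]
    module["filter_size"] = random.choice([1, 3, 5])
    module["channels"] = random.choice([16, 32, 64, 128])
    module["activation"] = random.choice(["relu", "tanh", "elu"])
    return child
```

Structural mutation changes depth; hyper-parameter mutation leaves the module sequence intact and only perturbs one module's settings.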
During evolution, the number of offspring generated per generation is controlled by the population's crossover and mutation probabilities. In each generation, the child CNN structures obtained by crossover and mutation are first trained into image classifiers; the child classifiers are then pooled with the parent classifiers, a multi-objective evaluation based on each classifier's structural complexity and test accuracy is applied, and the better classifiers are selected into the next round of CNN structure evolution.
2) Multi-target network evaluation and optimization method thereof
The method takes the classification accuracy and the structural complexity of the classifier as optimization targets and evaluates each classifier with a multi-objective optimization method, so that the evolutionary computation can finally generate an optimal CNN classifier suited to practical applications.
The multi-objective optimization problem (MOP) can be described by the following equation:
F(x) = (f_1(x), f_2(x), ..., f_m(x))^T,  s.t. x ∈ Ω  (1)
where Ω represents the feasible space of x and F(x) its value in the m-dimensional objective space. In general, the objectives of an MOP conflict with one another, meaning that no single point of the feasible space minimizes all objectives at once. The goal of the multi-objective optimization method is to find the set of optimal Pareto solutions.
Several important definitions regarding multi-objective optimization are given below.
Definition 1 (Pareto dominance): let x_A, x_B ∈ Ω be two feasible solutions of the multi-objective optimization problem. x_A is said to Pareto-dominate x_B if and only if
∀i ∈ {1, 2, ..., m}: f_i(x_A) ≤ f_i(x_B), and ∃j ∈ {1, 2, ..., m}: f_j(x_A) < f_j(x_B)  (2)
denoted x_A ≻ x_B, read as x_A dominates x_B.
Definition 2 (Pareto optimal solution): a solution x* ∈ Ω is called a Pareto optimal solution (or non-dominated solution) if and only if
¬∃x ∈ Ω: x ≻ x*  (3)
Definition 3 (Pareto optimal solution set): the Pareto optimal solution set is the set of all Pareto optimal solutions, defined as
P* = {x* ∈ Ω | ¬∃x ∈ Ω: x ≻ x*}  (4)
definition 4(Pareto frontier): pareto optimal solution set P*The curved surface formed by the target vectors corresponding to all the Pareto optimal solutions in the Pareto is called a Pareto front surface PF*
PF*={F(x)=(f1(x*),f2(x*),...,fm(x*))T|x*∈P*} (5)
In MOP applications, the PF is the curve or surface formed by the set of optimal Pareto solutions, and the corresponding Pareto optimal solutions can be found by a multi-objective optimization algorithm. Once the PF curve is determined, the decision maker selects one solution from the Pareto solution set as the output. Since an MOP usually involves conflicting objectives, the knee point of the PF curve is often output as the optimal solution: compared with other points, the knee point balances the two objectives best and shows good performance in many applications.
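The patent does not specify which knee detector it uses; one common heuristic for a two-objective front, shown below as an assumed sketch, picks the point farthest from the straight line joining the two extreme solutions of the front.

```python
def knee_point(front):
    """Knee of a 2-objective PF: the point with the largest (unnormalized)
    distance from the line through the two extreme points of the front.
    This is one common heuristic, not necessarily the patent's method."""
    (x1, y1), (x2, y2) = min(front), max(front)   # extremes by first objective
    def line_dist(p):
        x0, y0 = p
        # numerator of the point-to-line distance formula; the constant
        # denominator is irrelevant for the argmax
        return abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    return max(front, key=line_dist)
```

On a convex front such as [(0, 10), (1, 4), (2, 2), (4, 1), (10, 0)], this returns the middle point (2, 2), which best balances the two objectives.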
During evolutionary computation, every solution is assigned a fitness value; the fitness value determines the quality of the solution and also guides the selection probabilities in tournament selection.
For any solution x_i of the MOP problem, Kalyanmoy et al. define two indicators: i_rank and i_distance. i_rank indicates the domination level of the solution; a smaller i_rank means a higher domination level and a better corresponding solution. i_distance indicates the crowding distance at this point; a larger i_distance means better coverage around the current point and a corresponding solution closer to the optimum. Based on these two indicators, we define the fitness ordering used during evolution:
for any two CNN classifiers x_i and x_j,
fitness_i > fitness_j  if (i_rank < j_rank) or ((i_rank = j_rank) and (i_distance > j_distance))  (6)
That is, of two solution vectors with different rank values, we prefer the solution with the lower rank value as the better one; if the rank values of the two solutions are the same, the solution with the larger distance value is considered the better choice.
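The rank/crowding-distance ordering of equation (6) can be sketched as follows, with each classifier reduced to its objective vector (error, complexity), both to be minimized; this follows the standard non-dominated-sorting and crowding-distance definitions rather than any patent-specific variant.

```python
def dominates(a, b):
    """a Pareto-dominates b; a and b are objective vectors to minimize."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(objs):
    """i_rank: 0 for the non-dominated front, then peel off fronts iteratively."""
    remaining = set(range(len(objs)))
    rank, level = {}, 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            rank[i] = level
        remaining -= front
        level += 1
    return [rank[i] for i in range(len(objs))]

def crowding_distances(objs):
    """i_distance: per objective, the normalized gap between each point's
    neighbors; boundary points get infinity so they are always kept."""
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(order, order[1:], order[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist

def better(i, j, rank, dist):
    """Equation (6): lower rank wins; ties are broken by larger crowding distance."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and dist[i] > dist[j])
```

In a full implementation the crowding distance would be computed within each front separately; the global version here is a simplification for brevity.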
Further, the optimization method further comprises the following steps:
3) incremental training method
The two preceding sections described the GNP-based evolution method and the multi-objective optimization method used during evolution. Combining the two can evolve CNN classifiers with better performance. However, the evolutionary algorithm must train and test every newly generated CNN structure before its fitness value can be computed. To reduce the time complexity of the evolutionary algorithm, we propose an incremental training method in this section.
In the evolutionary algorithm, every offspring chromosome is obtained by crossover between, or mutation of, parents. Crossover partially combines two parent chromosomes, while mutation fine-tunes one parent chromosome's structure. CNNs with similar structures tend to have similar inter-layer weights because they extract image features in a similar manner. Hence, in the training of offspring, the trained inter-layer weights of the parent CNN can serve as the initial weight values of the offspring CNN, so that the offspring CNN is trained on the basis of the parent's weight parameters, reducing the time complexity of the algorithm.
For an offspring chromosome C_i obtained by a crossover operation, its structure consists mainly of two parts: one inherited from parent1, denoted P_{i_1}, and the other inherited from parent2, denoted P_{i_2}, i.e.
C_i = P_{i_1} + P_{i_2}  (7)
If it is obtained by a mutation operation, its structure again consists mainly of two parts: one inherited from the parent, denoted P_i, and one produced by the mutation itself, denoted M_i, i.e.
C_i = P_i + M_i  (8)
Definition 5 (degree of structural change): for an offspring chromosome C_i, the degree of structural change of the child relative to its parent chromosomes is defined as
δ_i = min(p_{i_1}, p_{i_2}) / (p_{i_1} + p_{i_2}) for crossover offspring, and δ_i = m_i / (p_i + m_i) for mutation offspring  (9)
where p_{i_1} is the number of weight parameters contained in P_{i_1}, p_{i_2} the number contained in P_{i_2}, p_i the number contained in P_i, and m_i the number contained in M_i. As equation (9) shows, for child chromosomes generated by crossover, the degree of structural change is related to the ratio of the weight parameters inherited from the two parents; in child chromosomes generated by mutation, the more hyper-parameters are mutated, the larger the structural change of the offspring; and for the first batch of chromosomes created at initialization, the degree of structural change is 1.
Definition 6 (weight initialization): for chromosomes obtained through population initialization, all weight parameters of the corresponding CNN network are set to random numbers with mean 0 and variance 1 when the network is built; for offspring chromosomes obtained from parents by crossover or mutation, the parts of the structure inherited from a parent take the parent's trained parameters as initial values, while the weight parameters of newly generated parts are set to random numbers with mean 0 and variance 1.
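The weight-inheritance rule of Definition 6 can be sketched per layer as below. The layer-id/shape dictionary representation is an illustrative assumption; in a real framework this would be a state-dict copy between parent and child models.

```python
import numpy as np

def inherit_weights(child_layers, parent_weights):
    """Per-layer weight inheritance: a child layer reuses the trained parent
    weights when its id and shape match a parent layer; any new or reshaped
    layer is initialized from N(0, 1) as in Definition 6."""
    rng = np.random.default_rng(0)
    child_weights = {}
    for layer_id, shape in child_layers.items():
        w = parent_weights.get(layer_id)
        if w is not None and w.shape == shape:
            child_weights[layer_id] = w.copy()                    # inherited part
        else:
            child_weights[layer_id] = rng.normal(0.0, 1.0, shape)  # new part
    return child_weights
```

The child then starts training from these weights instead of from scratch, which is what makes the training in section 3) "incremental".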
Definition 7 (offspring training batch): for offspring chromosome i, the training batch the chromosome requires is computed from its degree of structural change:
epoch_i = min_epoch + (max_epoch − min_epoch) · log_2(1 + δ_i)  (10)
where min_epoch represents the minimum training batch of a classifier and max_epoch the maximum. Equation (10) maps the training batch of a child to a value between min_epoch and max_epoch, and epoch_i is a function of δ_i that varies logarithmically. In practical applications, even a small structural change of the CNN classifier can strongly affect the weight parameters of the other layers; using a function with logarithmic character increases the sensitivity of incremental learning to small changes of the CNN structure, so that the trained offspring classifiers achieve a better classification effect.
After the epoch count of each offspring has been determined, the offspring CNN is trained on the data for that number of epochs to obtain the offspring CNN classifier.
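A minimal sketch of the epoch computation, assuming the logarithmic mapping described for equation (10); the log base and the default epoch bounds here are assumptions, since the patent's exact curve is only described qualitatively.

```python
import math

def child_epochs(delta, min_epoch=5, max_epoch=50):
    """Map the structural-change degree delta in (0, 1] to a training budget.
    log2(1 + delta) is an assumed logarithmic form that approaches min_epoch
    as delta -> 0 and reaches max_epoch at delta = 1, matching the behaviour
    described above; min_epoch/max_epoch defaults are illustrative."""
    return round(min_epoch + (max_epoch - min_epoch) * math.log2(1.0 + delta))
```

A wholly inherited structure (small δ) thus gets only a few refinement epochs, while a freshly created chromosome (δ = 1) gets the full training budget.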
4) Examples of the applications
As an important branch of deep learning, convolutional neural networks are applied to image recognition, natural language processing, object detection, and other areas. The fast evolutionary method for optimizing the deep convolutional neural network structure is a technical improvement oriented to the field of image recognition.
The image recognition problem aims to process, analyze, and understand picture content by means of computer programs, so that the computer can automatically recognize objects of various different patterns from pictures. Taking the CIFAR-10 dataset as an example, this section explains how the fast evolutionary method for optimizing the deep convolutional neural network structure is applied to improving an image classifier.
The CIFAR-10 dataset collects 60000 color pictures of 10 different classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The picture size in CIFAR-10 is fixed at 32 x 32 pixels, and each picture contains only one kind of entity.
To efficiently evolve a CNN classifier with high classification accuracy on CIFAR-10 and a simple structure, the GNP-FEL algorithm proceeds as follows:
① Creating initial chromosomes.
Different CNN modules are spliced by a random walk strategy to create initial chromosomes of different lengths and different module combinations. During initialization, we first need to set a range, denoted [m, n], for the length of each chromosome during evolution; that is, each chromosome contains at least m and at most n modules. Limiting the chromosome length prevents the CNN model from becoming excessively complicated and improves the efficiency of the evolutionary computation. If a chromosome's structure is too simple or too complex, the PF curve obtained during multi-objective optimization is usually too long, which increases the exploration of non-optimal solution space, raises the complexity of the evolutionary algorithm, and weakens its guidance. For the CIFAR-10 dataset, the module length range of each chromosome is set to [5, 12], and the numbers of chromosomes with different module lengths are kept as equal as possible, to balance the exploration of the solution space during evolution.
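As a sketch, initializing a population whose chromosomes are module sequences with lengths spread evenly over [5, 12] could look like the following (the module names and the uniform random choice are illustrative assumptions, not the patent's exact random-walk strategy):

```python
import random

# Hypothetical CNN module vocabulary; in the patent each module also
# carries encoded hyper-parameters (the Genotype).
MODULES = ["conv3x3", "conv5x5", "maxpool", "avgpool", "dense"]

def init_population(pop_size, min_len=5, max_len=12, seed=0):
    """Create pop_size chromosomes whose module counts lie in
    [min_len, max_len], cycling through the lengths so that each
    length appears roughly equally often, balancing the exploration
    of the solution space as described above."""
    rng = random.Random(seed)
    span = max_len - min_len + 1
    lengths = [min_len + i % span for i in range(pop_size)]
    return [[rng.choice(MODULES) for _ in range(n)] for n in lengths]
```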
After the initial CNN chromosome population is created, the CNN structure corresponding to each chromosome is trained to become an image classifier for CIFAR-10, and the effect of each CNN classifier is evaluated, using the multi-objective evaluation method proposed by Kalyanmoy et al., according to its classification accuracy on the test set and its number of weight parameters.
② Crossover and mutation to create offspring chromosomes.
The purpose of crossover and mutation is to evolve CNN classifiers with better classification performance on the CIFAR-10 dataset. In this application, CNN classifiers that perform well in the current population are selected as parent chromosomes for crossover and mutation by a tournament ("competitive bidding") selection method. The specific crossover and mutation methods are consistent with the corresponding content of section 1). In the experiments, the crossover probability is set to 0.2 and the mutation probability to 0.1; the chromosome structure of each offspring generated by crossover and mutation is preserved, and the crossover and mutation operations stop once the total number of offspring chromosomes equals that of the parents. The initial weight parameters of each offspring chromosome's structure are not inherited from the parents but are initialized randomly. After the weight parameters are initialized, the chromosomes are trained with the incremental training method of section 3), which accelerates the training process and reduces the time complexity of the evolutionary computation.
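The crossover/mutation loop described above can be sketched as follows; tournament ("competitive bidding") selection, a single crossover point per parent, and a point mutation are assumed, since the patent does not pin these details down at code level:

```python
import random

def tournament(population, fitness, k=3, rng=random):
    """Competitive-bidding (tournament) selection: sample k chromosomes
    and return the one with the best (lowest) fitness value."""
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: fitness[i])]

def make_offspring(population, fitness, p_cross=0.2, p_mut=0.1, rng=random):
    """Generate as many children as there are parents, applying
    crossover with probability 0.2 and mutation with probability 0.1,
    the values used in the CIFAR-10 experiment above."""
    children = []
    while len(children) < len(population):
        child = list(tournament(population, fitness, rng=rng))
        if rng.random() < p_cross:
            other = tournament(population, fitness, rng=rng)
            i = rng.randrange(1, len(child))
            j = rng.randrange(1, len(other))
            child = child[:i] + list(other[j:])  # exchange partial structures
        if rng.random() < p_mut:
            # point mutation: swap one module (hypothetical module names)
            child[rng.randrange(len(child))] = rng.choice(["conv3x3", "maxpool"])
        children.append(child)
    return children
```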
③ Selecting high-performance offspring with the multi-objective optimization method.
When the offspring chromosomes have each been trained into image classifiers for CIFAR-10, the parent and offspring chromosomes are merged, and the chromosomes with higher performance are selected from them for subsequent evolution. The algorithm aims to evolve a CNN classifier with high classification accuracy and low structural complexity. PF curves of the solution vectors corresponding to the chromosomes can be drawn from the accuracy and weight-parameter count of each CNN classifier in the current chromosome population; combined with the multi-objective evaluation method proposed by Kalyanmoy et al., all CNN classifiers can be ranked from high to low performance, and after ranking, the high-performance classifiers are selected for subsequent evolution.
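The dominance test underlying this ranking can be sketched as follows; both objectives (error rate and weight-parameter count) are minimized, and this is the standard Pareto definition rather than anything patent-specific:

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly
    better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Non-dominated subset of the merged parent + offspring population:
    the solutions that lie on the PF curve."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

For example, with the CIFAR-10 results reported below, the solution vectors (0.1550, 438342) and (0.1170, 613206) are mutually non-dominated, while any solution worse on both objectives would be filtered out.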
④ Outputting the optimal CNN classifier.
Steps ② and ③ are repeated until the number of evolution iterations meets the stopping condition, and the knee point on the final-generation PF curve is output as the optimal solution.
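The patent does not spell out how the knee point is located; one common heuristic, assumed here, normalizes both objectives and picks the PF point farthest from the chord joining the two extreme solutions:

```python
def knee_point(front):
    """front: list of (error_rate, n_params) points on a 2-D PF curve.
    Returns the point with the largest offset from the line through
    the two extreme points, after normalizing each objective to
    [0, 1] (an assumed heuristic, not the patent's exact definition)."""
    front = sorted(front)
    (x0, y0), (x1, y1) = front[0], front[-1]

    def norm(p):
        return ((p[0] - x0) / ((x1 - x0) or 1.0),
                (p[1] - y1) / ((y0 - y1) or 1.0))

    def offset(p):
        # after normalization the extremes are (0, 1) and (1, 0), so the
        # chord is x + y = 1; |px + py - 1| is proportional to the distance
        px, py = norm(p)
        return abs(px + py - 1.0)

    return max(front, key=offset)
```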
Experiments and result analysis: to validate the algorithm, we tested GNP-FEL on the CIFAR-10, Fashion-MNIST and SVHN datasets. The experiments fall into three parts. The first part compares the evolutionary results of the GNP-FEL algorithm and the GNP-EL algorithm (GNP-FEL without the incremental training method) and analyzes the PF curves and optimal CNN structures produced by the two algorithms. The second part measures the running time of the GNP-EL and GNP-FEL algorithms on the three datasets, establishing the effectiveness of incremental training. The third part compares the optimal CNN classifier produced by GNP-FEL with CNN classifiers produced by other algorithms, showing the characteristics and advantages of the algorithm.
4.1) For the CIFAR-10 dataset, the crossover probability is set to 0.2, the mutation probability to 0.1, and the number of evolution iterations to 50. When each CNN classifier is trained, the learning rate is set to 0.001 and the learning-rate decay coefficient to 0.95; max_epoch is 50 and min_epoch is 25. In the GNP-EL algorithm, the optimal solution O1 has an error rate of 0.1550 and a CNN model with 438342 parameters; compared with the initial CNN classifier, the optimal solution reduces the error rate by about 0.05 and the parameter count by nearly half. In the GNP-FEL algorithm, the optimal solution O2 has an error rate of 0.1170 and a CNN model with 613206 parameters; compared with the initial CNN classifier, it reduces the error rate by about 0.08 and likewise the parameter count by nearly half.
Further comparing the optimal solutions of the two algorithms, the error rate of O1 is 0.038 higher than that of O2, while O2 has 174864 more parameters than O1, so there is some difference between them. This difference stems from the randomness of the evolutionary algorithm. Overall, however, O1 and O2 do not differ greatly in error rate or weight-parameter count, and the performance of the corresponding CNN classifiers is also fairly close; they can be regarded as two sub-optimal solutions near the global optimum. This shows that, for the CIFAR-10 dataset, the evolutionary effects of the GNP-EL and GNP-FEL algorithms are equivalent, and the whole evolutionary algorithm gradually converges toward the optimal solution during evolution.
For the Fashion-MNIST and SVHN datasets, the number of generations is set to 40, and the other parameters are consistent with the values used for CIFAR-10.
On the Fashion-MNIST dataset, the optimal solution O1 of the GNP-EL algorithm has an error rate of 0.0776 and a CNN model with 133474 parameters; compared with the initial CNN classifier, the optimal solution reduces the error rate by about 0.008 and the parameter count by more than half. In the GNP-FEL algorithm, the optimal solution O2 has an error rate of 0.0806 and 147126 parameters; compared with the initial CNN classifier, it reduces the error rate by about 0.006 and the parameter count by nearly two thirds. The error rate of O1 is 0.003 lower than that of O2, and the parameter counts of their CNN models are essentially equal, showing that the performance of O1 and O2 is very close.
On the SVHN dataset, the optimal solution O1 of the GNP-EL algorithm has an error rate of 0.0662 and a CNN model with 182610 parameters; compared with the initial CNN classifier, the error rate of the optimal solution is reduced by about 0.015 and the parameter count by about 50000. In the GNP-FEL algorithm, the optimal solution O2 has an error rate of 0.0719 and 264441 parameters; compared with the initial CNN classifier, the optimal solution reduces the error rate by about 0.070 and likewise the parameter count by about 50000.
4.2) FIG. 7 shows the time both algorithms needed to generate each generation of CNN classifiers during evolution on the CIFAR-10 dataset. As can be seen from the figure, the GNP-FEL algorithm needs, on average, only 0.6 times the time of the GNP-EL algorithm to generate one generation of CNN classifiers. FIGS. 8 and 9 show the running times of the GNP-EL and GNP-FEL algorithms on the Fashion-MNIST and SVHN datasets. From the averaged curves in the figures, the GNP-FEL algorithm runs in less than half the time of the GNP-EL algorithm on both datasets. Combining the above analysis, we can conclude that using incremental learning in the evolutionary algorithm effectively reduces the time complexity of the algorithm while keeping the output optimal solution stable.
4.3) Table 1 shows the results of several algorithms on the CIFAR-10 dataset. NAS (Neural Architecture Search) is a model constructed by a reinforcement-learning method. VGG and ReNet are manually designed CNN architectures. CGP-CNN and EDEN are two recent evolutionary algorithms for optimizing CNN structures.
Model    | Error rate (%) | Parameters (×10^5) | Run time  | GPUs
NAS      | 3.65           | 374                | -         | 800
CGP-CNN  | 6.75           | 15.2               | 15.2 days | 2
VGG      | 7.94           | 152                | -         | -
ReNet    | 12.35          | -                  | -         | -
EDEN     | 25.50          | 1.73               | 1 day     | 1
GNP-EL   | 15.50          | 4.38               | 9.8 days  | 1
GNP-FEL  | 11.70          | 6.13               | 5.8 days  | 1

TABLE 1
As can be seen from Table 1, although NAS and VGG achieve good error rates, both models have very complex structures, require a large number of weight parameters to be trained, and occupy considerable computing resources. CGP-CNN evolved a CNN classifier with high performance in both error rate and weight-parameter count, but completing its evolutionary process on two GPUs took 15.2 days. The CNN classifier produced by EDEN has few weight parameters but the highest error rate among these algorithms. Although the GNP-EL and GNP-FEL algorithms proposed in this embodiment do not reach the best values for error rate or parameter count, the optimal CNN structures they evolve strike a good balance between the two indices of classification error rate and model weight-parameter count. Moreover, in this embodiment one run of the GNP-EL algorithm on a single GPU takes about 9.8 days, while one run of the GNP-FEL algorithm takes about 5.8 days, a large improvement over CGP-CNN.

Claims (2)

1. An image classifier construction method based on optimized deep convolutional neural network structure fast evolution is characterized by comprising the following steps:
a CIFAR-10 dataset collects 60000 color pictures of 10 different classes; the size of the pictures in CIFAR-10 is fixed, each picture contains only one kind of entity, and the images are 32 × 32 pixels, forming the image training data from which a CNN classifier for CIFAR-10 is efficiently evolved;
the evolution method comprises the following steps:
1) CNN optimization method based on GNP
Using GNP as the base algorithm of the evolutionary process, which comprises the following operations: population initialization, selection of excellent individuals, crossover operation and mutation operation, performed as follows:
1.1) in the initialization of the population, a network structure is used to represent the evolving population, and each network structure is represented in the form of Phenotype and Genotype; in the Phenotype, shapes of different forms represent different CNN modules and different paths represent different initial chromosomes; during initialization the structures of all chromosomes are randomly generated; the Genotype shows the specific coding of each chromosome, encoding the hyper-parameters in the CNN modules;
1.2) after the initialization of the population is completed, training the obtained CNN structures with the image training data so that each chromosome forms an image classifier, testing the classification effect of the classifiers, and selecting the better-performing classifiers for crossover and mutation; based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structures and hyper-parameters of the chromosomes; the process is as follows:
1.2.1) crossover is an operation that obtains new chromosomes by exchanging partial structures of two chromosomes; to perform the crossover operation, two chromosomes are selected as crossover objects by a competitive bidding (tournament) selection method: two chromosomes are selected as parent chromosomes of the crossover process, then crossover points are randomly selected in each of the two chromosomes, and the connecting paths of the two parent chromosomes at the crossover points are modified in the original population network structure diagram to realize the crossover operation of the chromosomes;
1.2.2) mutation constructs a new chromosome by mutating the hyper-parameters and network structure of a chromosome; first a parent chromosome is selected by the competitive bidding (tournament) selection method, and after the parent chromosome is selected, two mutation strategies are designed for the current chromosome: structural mutation and hyper-parameter mutation, wherein structural mutation changes the depth of the CNN classifier to evolve a CNN structure that effectively extracts image features, and hyper-parameter mutation searches for the optimal parameter combination of each module;
1.3) in the evolution process, the number of offspring generated in each generation is controlled by setting the crossover probability and mutation probability of the population; in any generation of evolution, the offspring CNN structures obtained by crossover and mutation are first trained to become image classifiers,
wherein the child CNN structure produced by the crossover operation comprises a part inherited from parent1, denoted as Pi_1, and a part inherited from parent2, denoted as Pi_2;
the child CNN structure produced by the mutation operation comprises a part inherited from parent1, denoted as Pi, and a part resulting from mutation, denoted as Mi;
The structural change degree of the child CNN structure relative to the parent structure is:
(Equation (1), rendered in the original as image FDA0002884655810000021: the structural change degree δi, computed from the weight-parameter counts below)
wherein pi_1 represents the number of weight parameters contained in Pi_1, pi_2 the number contained in Pi_2, p the number contained in Pi, and m the number contained in Mi;
initializing the weight, setting all weight parameters as random numbers with a mean value of 0 and a variance of 1;
the offspring training batch: calculating the training batch of the structure of the child CNN through the structure change degree, wherein the concrete formula is as follows:
(Equation (2), rendered in the original as image FDA0002884655810000031: epochi as a logarithmic function of δi)
where min_epoch represents the minimum training batch count for a classifier and max_epoch the maximum; equation (2) projects the training batches of offspring to a value between min_epoch and max_epoch, and epochi is a function of δi that varies logarithmically; after the epoch of each offspring is determined, the offspring CNN is trained for that many batches of data to obtain the offspring CNN classifier;
combining the offspring CNN classifiers and the parent CNN classifiers, applying a multi-objective evaluation method based on the structural complexity and test accuracy of each CNN classifier, and selecting the better image classifiers to enter the next round of CNN structure evolution;
2) multi-target network evaluation and optimization method thereof
Taking the classification accuracy and structural complexity of the CNN classifier as optimization targets, the CNN classifiers are evaluated with a multi-objective optimization method, so that the evolutionary computation can finally produce an optimal CNN classifier suited to practical application.
2. The method for constructing the image classifier based on the optimized fast evolution of the deep convolutional neural network structure as claimed in claim 1, characterized in that:
in the step 2), during the evolutionary computation, the fitness function values of the Pareto optimal solution set on the PF curve are computed with reference to a density estimation method, thereby determining the specific optimization index of each Pareto solution;
for any one solution xi of the MOP problem, two indices are defined for that solution: i_rank and i_distance, wherein i_rank represents the domination rank of the solution, and a smaller i_rank means a higher domination rank and a better corresponding solution; i_distance represents the crowding distance of the point, and a larger i_distance means the current point covers a larger region and the corresponding solution is closer to the optimal solution; of two solution vectors with different rank values, the one with the lower rank value is selected as the better solution, and if the two solutions have the same rank value, the one with the larger distance value is considered more suitable as the optimal solution.
CN201810141306.7A 2018-02-11 2018-02-11 Image classifier construction method based on optimized deep convolutional neural network structure fast evolution Active CN108334949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810141306.7A CN108334949B (en) 2018-02-11 2018-02-11 Image classifier construction method based on optimized deep convolutional neural network structure fast evolution

Publications (2)

Publication Number Publication Date
CN108334949A CN108334949A (en) 2018-07-27
CN108334949B true CN108334949B (en) 2021-04-13

Family

ID=62929347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810141306.7A Active CN108334949B (en) 2018-02-11 2018-02-11 Image classifier construction method based on optimized deep convolutional neural network structure fast evolution

Country Status (1)

Country Link
CN (1) CN108334949B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105029B (en) * 2018-10-29 2024-04-16 北京地平线机器人技术研发有限公司 Neural network generation method, generation device and electronic equipment
CN111222902B (en) * 2018-11-27 2024-02-09 阿里巴巴集团控股有限公司 Advertisement putting method, device, system, computing equipment and storage medium
CN109783857A (en) * 2018-12-12 2019-05-21 珠海博雅科技有限公司 A kind of quick charge pump design method and device
CN109726761B (en) * 2018-12-29 2023-03-31 青岛海洋科学与技术国家实验室发展中心 CNN evolution method, CNN-based AUV cluster working method, CNN evolution device and CNN-based AUV cluster working device and storage medium
CN109784497B (en) * 2019-01-15 2020-12-25 探智立方(北京)科技有限公司 AI model automatic generation method based on computational graph evolution
CN110135498A (en) * 2019-05-17 2019-08-16 电子科技大学 Image identification method based on deep evolution neural network
CN110210609A (en) * 2019-06-12 2019-09-06 北京百度网讯科技有限公司 Model training method, device and terminal based on the search of neural frame
US11928583B2 (en) * 2019-07-08 2024-03-12 International Business Machines Corporation Adaptation of deep learning models to resource constrained edge devices
CN110399917B (en) * 2019-07-24 2023-04-18 东北大学 Image classification method based on hyper-parameter optimization CNN
CN110852435A (en) * 2019-10-12 2020-02-28 沈阳航空航天大学 Neural evolution calculation model
CN112884118A (en) * 2019-11-30 2021-06-01 华为技术有限公司 Neural network searching method, device and equipment
CN111260077A (en) * 2020-01-14 2020-06-09 支付宝(杭州)信息技术有限公司 Method and device for determining hyper-parameters of business processing model
CN111415009B (en) * 2020-03-19 2021-02-09 四川大学 Convolutional variational self-encoder network structure searching method based on genetic algorithm
CN112036512B (en) * 2020-11-03 2021-03-26 浙江大学 Image classification neural network architecture searching method and device based on network clipping
US20220198260A1 (en) * 2020-12-22 2022-06-23 International Business Machines Corporation Multi-level multi-objective automated machine learning
CN112668473B (en) * 2020-12-28 2022-04-08 东南大学 Vehicle state accurate sensing method based on multi-feature deep fusion neural network
CN113743605A (en) * 2021-06-16 2021-12-03 温州大学 Method for searching smoke and fire detection network architecture based on evolution method
CN114461535B (en) * 2022-04-14 2022-07-12 山东建筑大学 Parallel mutation operator-oriented obstinate variant test data generation method and system
CN114912589B (en) * 2022-07-18 2022-10-04 中船重工(武汉)凌久高科有限公司 Image identification method based on full-connection neural network optimization
CN115309043B (en) * 2022-07-25 2024-10-15 中国科学院光电技术研究所 Active disturbance rejection control method for photoelectric tracking system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103971162A (en) * 2014-04-04 2014-08-06 华南理工大学 Method for improving BP (back propagation) neutral network and based on genetic algorithm
CN105279555A (en) * 2015-10-28 2016-01-27 清华大学 Self-adaptive learning neural network implementation method based on evolutionary algorithm
CN107609601A (en) * 2017-09-28 2018-01-19 北京计算机技术及应用研究所 A kind of ship seakeeping method based on multilayer convolutional neural networks

Non-Patent Citations (3)

Title
Intelligent Arrhythmia Detection using Genetic Algorithm and Emphatic SVM (ESVM);Jalal A. Nasiri 等;《2009 Third UKSim European Symposium on Computer Modeling and Simulation》;20061231;1-7 *
NSGA-II基于非支配排序的多目标优化算法(中文翻译);朱勇90 上传;《百度文库》;20160622;1-4 *
改进交叉操作的遗传算法在神经网络优化中的应用;张迅等;《工业控制计算机》;20121231;48-51 *

Also Published As

Publication number Publication date
CN108334949A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108334949B (en) Image classifier construction method based on optimized deep convolutional neural network structure fast evolution
Baymurzina et al. A review of neural architecture search
CN111898689B (en) Image classification method based on neural network architecture search
WO2018161468A1 (en) Global optimization, searching and machine learning method based on lamarck acquired genetic principle
US8489526B2 (en) Controlling quarantining and biasing in cataclysms for optimization simulations
Ünal et al. Evolutionary design of neural network architectures: a review of three decades of research
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
Wen et al. Learning ensemble of decision trees through multifactorial genetic programming
CN114118369B (en) Image classification convolutional neural network design method based on group intelligent optimization
Elhani et al. Optimizing convolutional neural networks architecture using a modified particle swarm optimization for image classification
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
Bai et al. A joint multiobjective optimization of feature selection and classifier design for high-dimensional data classification
CN117253037A (en) Semantic segmentation model structure searching method, automatic semantic segmentation method and system
CN115908909A (en) Evolutionary neural architecture searching method and system based on Bayes convolutional neural network
CN114241267A (en) Structural entropy sampling-based multi-target architecture search osteoporosis image identification method
McGhie et al. Gpcnn: evolving convolutional neural networks using genetic programming
CN114863508B (en) Expression recognition model generation method, medium and device of self-adaptive attention mechanism
Babatunde et al. Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture
CN115661546A (en) Multi-objective optimization classification method based on feature selection and classifier joint design
Reiling Convolutional neural network optimization using genetic algorithms
Liu et al. Evolving hyperparameters for training deep neural networks against adversarial attacks
CN113704570A (en) Large-scale complex network community detection method based on self-supervision learning type evolution
Arifando et al. Hybrid Genetic Algorithm & Learning Vector Quantization for Classification of Social Assistance Recipients
Yagnik et al. Optimizing Activity Recognition in Video Using Evolutionary Computation.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant