CN108334949B - Image classifier construction method based on optimized deep convolutional neural network structure fast evolution
- Publication number: CN108334949B
- Application number: CN201810141306.7A
- Authority
- CN
- China
- Prior art keywords
- cnn
- classifier
- chromosome
- solution
- chromosomes
- Prior art date: 2018-02-11
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
A fast evolution method for optimizing a deep convolutional neural network structure comprises the following steps: 1) a nonlinear CNN network structure is constructed effectively with a GNP-based evolutionary algorithm, and the hyper-parameters of the CNN structure are mutated to find the optimal CNN hyper-parameter combination; 2) during evolution, a multi-objective network structure evaluation method takes the classification accuracy and the complexity of a classifier as simultaneous optimization targets, aiming to generate CNN classifiers with high classification accuracy and a simple structure; 3) an incremental training method is provided in which each child CNN structure is trained on the basis of the previous-generation CNN structure. The invention reduces the number of model training runs and the time complexity of the algorithm.
Description
Technical Field
The invention belongs to the field of image classification, and relates to a method for constructing an image classifier based on the rapid evolution of an optimized deep convolutional neural network structure.
Background
With the rapid development of science and technology, the big-data era has arrived. Deep learning, with the deep neural network (DNN) as its model, has achieved remarkable results in many key fields of machine intelligence, such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure that can effectively extract hidden-layer features of an image and classify it accurately, and it has been widely applied to image recognition in recent years. In 1998, LeCun et al. proposed the LeNet-5 convolutional network structure, considered a milestone in the history of deep learning. LeNet-5 could recognize handwritten digit images of 32 × 32 pixels, but because of its relatively simple structure and the limited computing power of the time, it did not achieve significant results in broader image classification. In 2012, Alex Krizhevsky et al. presented the deep learning algorithm AlexNet, which greatly reduced the error rate of ImageNet image classification and brought deep learning widespread attention. Network frameworks such as ResNet, DenseNet, and GoogLeNet were proposed in succession afterwards; by adding corresponding modules, these algorithms greatly extend the depth of CNNs and further improve the accuracy of deep learning on image classification.
Evolutionary computation, a classical method of optimizing parameters, was used to optimize neural network structures very early. The earliest evolved neural networks did not use gradient descent; instead they optimized the weight parameters with evolutionary computation. Evolutionary computation imitates natural selection: parts of existing neural networks are crossed, mutated, and recombined to obtain better-adapted offspring. Evolutionary methods for optimizing network weights include the CMA-ES, SANE, and ESP algorithms. CMA-ES is a continuous-optimization technique that captures the interactions between weights and works well for weight optimization. SANE and ESP evolve parts of the network architecture and combine them into a fully functional network.
Evolutionary algorithms are now used more and more to optimize the structure and hyper-parameters of neural networks. Masanori et al. propose the CGP-CNN algorithm: they automatically construct CNN architectures for image classification tasks based on Cartesian genetic programming (CGP), using powerful modules as the node functions of CGP to evaluate the network structure. Fernando et al. evolved a compositional pattern-producing network (CPPN) whose output generates the weights of a neural network, and then integrated the trained weights back into the CPPN genome through Lamarckian adaptation. Dufourq et al. propose the Evolutionary DEep Networks (EDEN) algorithm, which can effectively evolve a CNN classifier with decent classification accuracy and a relatively simple structure; more importantly, the whole evolution process needs only 6-24 hours on a single GPU, greatly improving the efficiency of evolutionary computation. Audrey et al. propose SES-CNN, which accelerates evolution with a sexual-reproduction method, combining two parent networks so as to synthesize more diverse and more generalized offspring with more compact feature representations. Lorenzo et al. apply particle swarm optimization (PSO) to hyper-parameter selection in CNNs and design a PSO-based parallel computation method to shorten the running time of the evolutionary algorithm, achieving load balancing and concurrent execution. Miikkulainen et al. propose the CoDeepNEAT algorithm based on the NEAT neuroevolution technique, in which blueprints are assembled from modules and the optimal network structure is found by reusing modules. Shafiee et al. introduce a probabilistic model into the optimization process, representing the genetic code and environmental conditions through probability distributions. Zoph et al. combine reinforcement learning with a recurrent neural network to obtain a good architecture, training 800 networks on 800 GPUs to reach the optimal solution. Real et al. use a neuroevolution approach, optimizing CNN classifiers for image classification on a parallel system running on 250 computers.
Disclosure of Invention
In order to overcome the high time complexity and the single evaluation index of the CNN model in existing evolutionary CNN structure algorithms, the invention provides an image classifier construction method based on the fast evolution of an optimized deep convolutional neural network structure, which has low time complexity and reasonable evaluation indices. A nonlinear CNN network structure is constructed effectively with a GNP-based evolutionary algorithm, and the hyper-parameters of the CNN structure are mutated to find the optimal CNN hyper-parameter combination. During evolution the algorithm uses a multi-objective network structure evaluation method that effectively simplifies the network structure while achieving a better classification effect. Finally, the algorithm introduces the concept of incremental training: each child CNN structure is trained on the basis of the previous-generation CNN structure.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a fast evolution method of optimizing a deep convolutional neural network structure, the evolution method comprising the steps of:
1) CNN optimization method based on GNP
GNP is used as the base algorithm of the evolution process, which comprises the following operations: population initialization, selection of superior individuals, crossover, and mutation, carried out as follows:
1.1) in population initialization, a network structure is used to represent the evolving population; a network structure is expressed in the form of a Phenotype and a Genotype; in the Phenotype, shapes of different kinds represent different CNN modules and different paths represent different initial chromosomes; during initialization the structures of all chromosomes are generated randomly; the Genotype records the concrete encoding of each chromosome, and the hyper-parameters in the CNN modules are encoded as well;
1.2) after population initialization is completed, the obtained CNN structures are trained with training data, the classification effect of the classifiers is tested, and the better-performing classifiers are selected for crossover and mutation; based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structures and hyper-parameters of the chromosomes; the process is as follows:
1.2.1) crossover obtains a new chromosome by exchanging partial structures of two chromosomes; to perform a crossover, two chromosomes are selected as crossover objects by tournament selection and become the parent chromosomes; after selection, a cross point is chosen at random in each of the two chromosomes, and in the original population network structure graph the crossover is realized by modifying the connection paths of the two parent chromosomes at the cross points;
1.2.2) mutation constructs a new chromosome by mutating the hyper-parameters and the network structure of a chromosome; first a parent chromosome is selected by tournament selection, and after the parent is selected two mutation strategies are applied to the current chromosome: structural mutation and hyper-parameter mutation, where structural mutation changes the depth of the CNN classifier in order to evolve a CNN structure that extracts image features effectively, and hyper-parameter mutation searches for the optimal parameter combination of each module;
1.3) during evolution, the number of children generated in each generation is controlled by setting the crossover probability and the mutation probability of the population; in any generation, the child CNN structures obtained through crossover and mutation are first trained into image classifiers, the child classifiers are then merged with the parent classifiers, a multi-objective evaluation method is set up according to the structural complexity and test accuracy of each classifier, and the better classifiers are selected into the next round of CNN structure evolution;
2) multi-objective network evaluation and its optimization method
The classification accuracy and the structural complexity of the classifier are taken as optimization targets, and each classifier is evaluated with a multi-objective optimization method, so that evolutionary computation can finally generate an optimal CNN classifier suited to practical applications.
Further, in step 2), during evolutionary computation the fitness function values of the Pareto optimal solution set on the PF curve are computed by drawing on a density estimation method, thereby determining the concrete optimization indices of each Pareto solution;
for any solution $x_i$ of the MOP problem, two indicators are defined: $i_{rank}$ and $i_{distance}$; $i_{rank}$ indicates the dominance level of the solution, and a smaller $i_{rank}$ means a higher dominance level and a better corresponding solution; $i_{distance}$ indicates the crowding distance of the point, and a larger $i_{distance}$ means the current point covers a larger region and the corresponding solution is closer to the optimal solution; for two solution vectors with different rank values, the solution with the lower rank value is selected as the optimal solution, and if the rank values of the two solutions are equal, the solution with the larger distance value is considered more suitable as the optimal solution.
Still further, the evolution method further comprises the following steps:
3) incremental training method
CNNs with similar structures often have similar inter-layer weights because they extract image features in similar ways; therefore, during offspring training, the trained inter-layer weights of the parent CNN are used as the initial weight values of the offspring CNN, and the offspring CNN is trained on the basis of the parent's weight parameters.
The invention has the following beneficial effects: aiming at the high time complexity and the single evaluation index of the CNN model in existing evolutionary CNN structure algorithms, a fast evolutionary algorithm for optimizing the CNN structure (GNP_FEL) is provided. The algorithm constructs a nonlinear CNN network structure effectively with a GNP-based evolutionary algorithm and mutates the hyper-parameters of the CNN structure to find the optimal CNN hyper-parameter combination; during evolution it uses a multi-objective network structure evaluation method that effectively simplifies the network structure while achieving a better classification effect; finally it introduces the concept of incremental training, in which each child CNN structure is trained on the basis of the previous-generation CNN structure.
Drawings
Fig. 1 is a flow diagram of the fast evolutionary method (GNP_FEL) for optimizing a deep convolutional neural network structure.
Fig. 2 is a schematic diagram of a population initialization process.
FIG. 3 is a schematic diagram of a chromosome crossing process.
FIG. 4 is a schematic diagram of a chromosome mutation process.
Fig. 5 is a schematic diagram of a PF curve and a target vector.
FIG. 6 is the curve of $epoch_i$ as a function of $\delta_i$.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 6, a fast evolution method for optimizing a deep convolutional neural network structure, the evolution method comprising the steps of:
1) CNN optimization method based on GNP
Genetic Network Programming (GNP) differs from GA and GP in that it uses a network of decision nodes and execution nodes to represent a chromosome; this makes structural changes of chromosomes more flexible and allows the parameter space to be searched efficiently, speeding up the convergence of the genetic algorithm. GNP is used as the base algorithm of the evolution process, and corresponding population initialization, crossover, and mutation strategies are designed for it; the goal is to optimize the network structure and the hyper-parameters of the CNN during evolution and finally obtain a high-performance CNN classifier. The process is as follows:
1.1) in population initialization, drawing on the networking idea of the GNP algorithm, a network structure is used to represent the evolving population. A network structure is expressed as a Phenotype and a Genotype. In the Phenotype, shapes of different kinds represent different CNN modules, which are components of existing classical network structures: a common convolution module, a single-layer convolution module, a DenseNet module, a ResNet module, a pooling module, and a fully connected module. Different paths represent different initial chromosomes, which start at the START node and end at the OUTPUT node; each chromosome is built from several different CNN modules. During initialization the structures of all chromosomes are generated randomly. The Genotype shows the concrete encoding of each chromosome. Taking Chromosome 1 as an example, numbers such as 1_1 and 2_1 identify the modules that make up the chromosome; each number corresponds one-to-one to a module in the Phenotype, so this encoding effectively stores the chromosome's Phenotype structure. Further, the hyper-parameters in these modules are encoded as well, so that they can be optimized during evolution.
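To make the Genotype encoding concrete, the following minimal Python sketch builds a random chromosome as a list of module genes. The module types, the numbering style, and the length bounds [5,12] come from the text; the data layout, the hyper-parameter value ranges, and the population size of 20 are illustrative assumptions.

```python
import random

# module types taken from the Phenotype description above
MODULE_TYPES = ["conv", "single_conv", "densenet", "resnet", "pool", "fc"]

def random_module(number):
    """One gene: a CNN module plus its encoded hyper-parameters.
    The concrete value ranges here are illustrative assumptions."""
    return {
        "number": number,                          # e.g. "1_1", "2_1" in the Genotype
        "type": random.choice(MODULE_TYPES),
        "filter_size": random.choice([3, 5, 7]),
        "channels": random.choice([16, 32, 64, 128]),
        "activation": random.choice(["relu", "tanh"]),
    }

def init_chromosome(min_len=5, max_len=12):
    """Random-walk construction from START to OUTPUT: a random number of
    randomly chosen modules within the prescribed length range."""
    length = random.randint(min_len, max_len)
    return [random_module(f"{i + 1}_1") for i in range(length)]

population = [init_chromosome() for _ in range(20)]   # assumed population size
```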
Compared with GA and GP, this construction method adopts a random-walk strategy and creates chromosomes nonlinearly in units of modules, which guarantees the structural diversity of the initial chromosomes and increases the chance of evolving the optimal CNN structure.
1.2) after population initialization is completed, the obtained CNN structures are trained with training data, the classification effect of the classifiers is tested, and the better-performing classifiers are selected for crossover and mutation. During crossover and mutation, the structure and hyper-parameters of the original chromosomes are changed in order to obtain CNN networks with a better classification effect; this is the network structure evolution process of the CNN. Based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structures and hyper-parameters of the chromosomes.
1.3) crossover is an operation that obtains a new chromosome by exchanging partial structures of two chromosomes. In the evolution process of this embodiment, the crossover operation mainly realizes the evolution of the CNN structure. To perform a crossover, two chromosomes are first selected as crossover objects; this embodiment uses tournament selection, through which two chromosomes are chosen as the parent chromosomes and denoted parent1 and parent2. After selection, a cross point is chosen at random in each chromosome, denoted position1 and position2, and in the original population network structure graph the crossover is realized by modifying the connection paths of parent1 and parent2 at the cross points.
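Over the list encoding sketched earlier, tournament selection and the cross-point exchange can be realized as below. The tail swap is one plausible reading of "modifying the connection paths at the cross points" for list-shaped chromosomes, and the tournament size k is an assumption.

```python
import copy
import random

def tournament(population, fitness, k=3):
    """Tournament selection: return the index of the fittest of k random picks."""
    contenders = random.sample(range(len(population)), k)
    return max(contenders, key=lambda i: fitness[i])

def crossover(parent1, parent2):
    """Exchange the partial structures behind two random cross points."""
    position1 = random.randrange(1, len(parent1))
    position2 = random.randrange(1, len(parent2))
    child1 = copy.deepcopy(parent1[:position1] + parent2[position2:])
    child2 = copy.deepcopy(parent2[:position2] + parent1[position1:])
    return child1, child2
```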
1.4) mutation constructs a new chromosome by mutating the hyper-parameters and the network structure of a chromosome. Mutation likewise first selects a parent chromosome through tournament selection. After the parent is selected, two mutation strategies are applied to the current chromosome: structural mutation and hyper-parameter mutation. Structural mutation adds, changes, or deletes modules of the original chromosome structure; it can change the depth of the CNN classifier and evolve CNN structures that extract image features effectively. Hyper-parameter mutation operates on a chromosome's modules: a module contains several hyper-parameters, such as filter size, channel depth, and activation function, and hyper-parameter mutation aims to find the optimal parameter combination of each module.
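Both mutation strategies can be sketched over the same list encoding; the even split between structural and hyper-parameter mutation is an assumption, and random_module is the helper from the initialization sketch above.

```python
import copy
import random

def mutate(parent, p_structural=0.5):
    """Apply structural mutation (add / change / delete a module, altering the
    classifier depth) or hyper-parameter mutation (re-sample one parameter)."""
    child = copy.deepcopy(parent)
    if random.random() < p_structural:
        op = random.choice(["add", "change", "delete"])
        pos = random.randrange(len(child))
        if op == "add":
            child.insert(pos, random_module(f"new_{pos}"))
        elif op == "change":
            child[pos] = random_module(child[pos]["number"])
        elif len(child) > 1:                 # keep at least one module
            del child[pos]
    else:
        module = random.choice(child)
        key = random.choice(["filter_size", "channels", "activation"])
        module[key] = random_module("tmp")[key]
    return child
```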
During evolution, the number of children generated in each generation is controlled by setting the crossover probability and the mutation probability of the population. In any generation, the child CNN structures obtained through crossover and mutation are first trained into image classifiers; the child classifiers are then merged with the parent classifiers, a multi-objective evaluation method is set up according to the structural complexity and test accuracy of each classifier, and the better classifiers are selected into the next round of CNN structure evolution.
2) Multi-objective network evaluation and its optimization method
The method takes the classification accuracy and the structural complexity of the classifier as optimization targets and evaluates each classifier with a multi-objective optimization method, so that evolutionary computation can finally generate an optimal CNN classifier suited to practical applications.
The multi-objective optimization problem (MOP) can be described by the following equation:

$$F(x) = (f_1(x), f_2(x), \ldots, f_m(x))^T \quad \text{s.t.}\; x \in \Omega \qquad (1)$$

where $\Omega$ is the feasible space of $x$ and $F(x)$ maps it into an $m$-dimensional objective space. In general the objectives of a MOP conflict with one another, which means that no single point of the feasible space minimizes all objectives at once. The goal of a multi-objective optimization method is therefore to find the set of optimal Pareto solutions.
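For this patent the two objectives are concrete: $f_1$ is the classification error rate and $f_2$ the number of weight parameters. The PyTorch sketch below shows one way such an objective vector might be computed; the function name and the evaluation loop are illustrative, not part of the patent.

```python
import torch

def objective_vector(model, test_loader, device="cpu"):
    """F(x) = (f1, f2): error rate on the test set and parameter count."""
    model.eval()
    wrong, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images.to(device)).argmax(dim=1)
            wrong += (preds != labels.to(device)).sum().item()
            total += labels.size(0)
    f1 = wrong / total                                  # classification error rate
    f2 = sum(p.numel() for p in model.parameters())     # structural complexity
    return f1, f2
```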
Several important definitions regarding multi-objective optimization are given below.
Definition 1 (Pareto dominance): let $x_A, x_B \in \Omega$ be two feasible solutions of the multi-objective optimization problem. Compared with $x_B$, $x_A$ is Pareto dominant if and only if

$$\forall i \in \{1,\ldots,m\}:\; f_i(x_A) \le f_i(x_B) \;\wedge\; \exists j \in \{1,\ldots,m\}:\; f_j(x_A) < f_j(x_B) \qquad (2)$$

written $x_A \succ x_B$; we also say $x_A$ dominates $x_B$.

Definition 2 (Pareto optimal solution): a solution $x^* \in \Omega$ is called a Pareto optimal solution (or non-dominated solution) if and only if the following condition is met:

$$\nexists\, x \in \Omega:\; x \succ x^* \qquad (3)$$

Definition 3 (Pareto optimal solution set): the Pareto optimal solution set is the set of all Pareto optimal solutions, defined as follows:

$$P^* = \{x^* \in \Omega \mid \nexists\, x \in \Omega,\; x \succ x^*\} \qquad (4)$$

Definition 4 (Pareto front): the surface formed by the objective vectors corresponding to all Pareto optimal solutions in $P^*$ is called the Pareto front $PF^*$:

$$PF^* = \{F(x^*) = (f_1(x^*), f_2(x^*), \ldots, f_m(x^*))^T \mid x^* \in P^*\} \qquad (5)$$
In MOP applications, the PF is a curve or surface formed by the set of optimal Pareto solutions, and the corresponding Pareto optimal solutions can be found with a multi-objective optimization algorithm. Once the PF curve is determined, the decision maker selects one solution from the Pareto solution set as the output. Because a MOP usually involves conflicting objectives, the knee point of the PF curve is often output as the optimal solution: the knee point balances the objectives better than the other points and shows good performance in many applications.
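The text does not spell out how the knee point is located; the sketch below uses one common heuristic, the point of a normalized two-objective front farthest from the line through the two extremes, purely as an assumed illustration.

```python
import numpy as np

def knee_point(front):
    """Return the index of a knee of a 2-D Pareto front (minimization)."""
    f = np.asarray(front, dtype=float)
    f = (f - f.min(axis=0)) / (np.ptp(f, axis=0) + 1e-12)  # normalize objectives
    a, b = f[f[:, 0].argmin()], f[f[:, 0].argmax()]        # extreme points
    ab = b - a
    d = np.abs(ab[0] * (f[:, 1] - a[1]) - ab[1] * (f[:, 0] - a[0]))
    d /= np.hypot(ab[0], ab[1]) + 1e-12                    # distance to the extreme line
    return int(d.argmax())

# (error rate, parameter count) pairs on a hypothetical front
print(knee_point([(0.12, 6.1e5), (0.15, 4.4e5), (0.25, 1.7e5)]))   # -> 1
```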
2.2) during evolutionary computation every solution carries a fitness value; the quality of a solution is determined by the size of its fitness value, which also guides the selection probability in tournament selection.
For any solution $x_i$ of the MOP problem, Kalyanmoy et al. define two indicators: $i_{rank}$ and $i_{distance}$. $i_{rank}$ indicates the dominance level of the solution; the smaller $i_{rank}$, the higher the dominance level and the better the corresponding solution. $i_{distance}$ indicates the crowding distance of the point; the larger $i_{distance}$, the larger the region covered by the current point and the closer the corresponding solution is to the optimum. From these two indicators we define the fitness ordering used during evolution:

for any two CNN classifiers $x_i$ and $x_j$,

$$fitness_i > fitness_j \iff (i_{rank} < j_{rank}) \;\lor\; ((i_{rank} = j_{rank}) \wedge (i_{distance} > j_{distance})) \qquad (6)$$

That is, for two solution vectors with different rank values we prefer the solution with the lower rank value as the optimal one; if the rank values of the two solutions are equal, the solution with the larger distance value is considered more suitable as the optimal solution.
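Equation (6) translates directly into a comparison operator; together with a Pareto-dominance test it forms the core of the selection step. A minimal sketch:

```python
def dominates(fa, fb):
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def fitter(i, j, rank, distance):
    """Crowded comparison of equation (6): lower rank wins; on equal rank,
    the larger crowding distance wins."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and distance[i] > distance[j])
```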
Further, the optimization method further comprises the following steps:
3) incremental training method
The two preceding sections describe the GNP-based evolutionary method and the multi-objective optimization method used during evolution. Combining the two methods can evolve a CNN classifier with good performance. However, the evolutionary algorithm must train and test every newly generated CNN structure before its fitness value can be computed. To reduce the time complexity of the evolutionary algorithm, we propose an incremental learning method in this section.
In the evolutionary algorithm, all offspring chromosomes are obtained through parental crossover or mutation. Crossover is a partial combination of two parent chromosomes, and mutation is a fine-tuning of the parent chromosome's structure. CNNs with similar structures tend to have similar inter-layer weights because they extract image features in similar ways. During offspring training, the trained inter-layer weights of the parent CNN can therefore be used as the initial weight values of the offspring CNN, so that the offspring CNN is trained on the basis of the parent's weight parameters, which reduces the time complexity of the algorithm.
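In a PyTorch setting (torch.nn.Module models), the weight hand-over can be sketched as a shape-matched state-dict copy; matching inherited modules by layer name and tensor shape is an assumption about how correspondence is established, not a detail given in the text.

```python
def inherit_weights(child_model, parent_model):
    """Copy trained parent weights into the child wherever layer names and
    tensor shapes still match; all other child weights keep their random init."""
    child_state = child_model.state_dict()
    parent_state = parent_model.state_dict()
    inherited = {name: t for name, t in parent_state.items()
                 if name in child_state and child_state[name].shape == t.shape}
    child_state.update(inherited)
    child_model.load_state_dict(child_state)
    return len(inherited)      # number of inherited tensors
```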
For an offspring chromosome $C_i$ obtained by a crossover operation, its structure consists mainly of two parts: one inherited from parent1, denoted $P_{i\_1}$, and the other inherited from parent2, denoted $P_{i\_2}$, i.e.

$$C_i = P_{i\_1} + P_{i\_2} \qquad (7)$$

If it is obtained by a mutation operation, its structure consists mainly of two parts: one inherited from the parent, denoted $P_i$, and one produced by the mutation itself, denoted $M_i$, i.e.

$$C_i = P_i + M_i \qquad (8)$$
Definition 5 (degree of structural change): for an offspring chromosome $C_i$, the degree of structural change of the child relative to its parent chromosome is defined as $\delta_i$ by equation (9), where $p_{i\_1}$ denotes the number of weight parameters contained in $P_{i\_1}$, $p_{i\_2}$ the number contained in $P_{i\_2}$, $p$ the number contained in $P_i$, and $m$ the number contained in $M_i$. Equation (9) shows that for child chromosomes generated by crossover, the degree of structural change is related to the ratio of the weight parameters inherited from the two parents; in child chromosomes generated by mutation, the more hyper-parameters are mutated, the more the child's structure changes; and for the first created batch of chromosomes, the degree of structural change is 1.
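Equation (9) appears only as an image in the source, so its exact form is not reproduced here; the helper below merely illustrates the stated properties ($\delta_i \in (0,1]$, $\delta_i = 1$ for brand-new chromosomes, larger when more of the child is new) under one assumed proportional form.

```python
def structural_change(p_inherited, p_new):
    """Assumed stand-in for delta_i: share of the child's weight parameters
    that were not inherited unchanged (1.0 for freshly created chromosomes)."""
    total = p_inherited + p_new
    return p_new / total if total else 1.0

delta_mut = structural_change(p_inherited=400_000, p_new=50_000)   # mutation: p vs m
delta_init = structural_change(p_inherited=0, p_new=600_000)       # first generation -> 1.0
```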
Definition 6 (weight initialization): for chromosomes obtained by population initialization, all weight parameters are set to random numbers with mean 0 and variance 1 when the corresponding CNN network is built; for offspring chromosomes obtained from parents through crossover or mutation, the part of the structure inherited from a parent takes the parent's trained parameters as initial values, while the weight parameters of the newly generated part are set to random numbers with mean 0 and variance 1.
Definition 7 (offspring training batch): for offspring chromosome $i$, the training batch it requires is computed from its degree of structural change, according to equation (10), where min_epoch denotes the minimum training batch of a classifier and max_epoch the maximum training batch. Equation (10) projects the training batch of a child to a value between min_epoch and max_epoch, and $epoch_i$ varies logarithmically with $\delta_i$. In practical applications even a small structural change of a CNN classifier can strongly affect the weight parameters of the other layers; using a function of logarithmic character increases the sensitivity of incremental learning to small structural changes of the CNN, so that the trained offspring classifiers achieve a better classification effect.
After the epoch of each descendant is determined, the descendant CNN is trained on the data for that many epochs to obtain the descendant CNN classifier.
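Equation (10) is likewise an image in the source; the mapping below is an assumption that satisfies the stated properties, projecting $\delta_i$ onto [min_epoch, max_epoch] with logarithmic growth so that small structural changes already earn noticeably more training.

```python
import math

def child_epochs(delta, min_epoch=25, max_epoch=50):
    """Assumed log mapping: delta = 1 gives max_epoch, delta -> 0 gives min_epoch."""
    scale = math.log1p(delta) / math.log(2.0)      # log2(1 + delta), in (0, 1]
    return round(min_epoch + (max_epoch - min_epoch) * scale)

print(child_epochs(1.0))   # freshly created chromosome -> 50
print(child_epochs(0.1))   # small change -> 28, close to min_epoch
```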
4) Application example
As an important branch of deep learning, the convolutional neural network is applied in image recognition, natural language processing, object detection, and other areas. The fast evolutionary method for optimizing the deep convolutional neural network structure is a technical improvement oriented towards the field of image recognition.
The image recognition problem aims to process, analyze, and understand the content of pictures by means of computer programs, so that the computer can automatically recognize objects and patterns of many different kinds from pictures. Taking the CIFAR-10 data set as an example, this section explains how the fast evolutionary method for optimizing the deep convolutional neural network structure is applied to improving an image classifier.
The CIFAR-10 problem collects 60000 color pictures of 10 different classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The picture size in CIFAR-10 is fixed, each picture contains only one kind of entity, and the images are 32 × 32 pixels.
In order to effectively evolve a CNN classifier with high classification accuracy on CIFAR-10 and a simple structure, the GNP-FEL algorithm takes the following concrete steps:
① Create the initial chromosomes.
Different CNN modules are spliced together by a random-walk strategy to create first-generation chromosomes of different lengths and module combinations. During initialization we first set a range, denoted [m, n], for the length of each chromosome during evolution, i.e., each chromosome contains at least m and at most n modules. Limiting the chromosome length prevents the CNN model from becoming overly complex and improves the efficiency of evolutionary computation: if a chromosome's structure is too simple or too complex, the PF curve during multi-objective optimization is usually too long, which increases the exploration of non-optimal solution space, raises the complexity of the evolutionary algorithm, and weakens its guidance. For the CIFAR-10 data set, the module length range of each chromosome is set to [5,12], and the numbers of chromosomes with different module lengths are kept as equal as possible, in order to balance the exploration of the solution space during evolution.
After the initial CNN chromosome population is created, the CNN structure corresponding to each chromosome is trained so that each finally becomes an image classifier oriented to CIFAR-10, and the concrete effect of each CNN classifier is evaluated from its classification accuracy on the test set and its number of weight parameters, using the multi-objective evaluation method proposed by Kalyanmoy et al.
② Crossover and mutation to create offspring chromosomes.
The purpose of crossover and mutation is to evolve CNN classifiers with a better classification effect on the CIFAR-10 data set. In this application, tournament selection picks the better-performing CNN classifiers of the current population as the parent chromosomes for crossover and mutation. The concrete crossover and mutation methods are consistent with the corresponding content of section 1). In the experiments the crossover probability is set to 0.2 and the mutation probability to 0.1; the chromosome structure of each child generated by crossover and mutation is preserved, and crossover and mutation stop when the total number of child chromosomes equals that of the parents. For each child chromosome, the initial weight parameters of the part of its structure inherited from a parent are taken from that parent, and the remaining part is initialized randomly. After weight initialization the chromosomes are trained with the incremental training method of section 3), which accelerates the training process and reduces the time complexity of evolutionary computation.
③ Select high-performance offspring with the multi-objective optimization method.
When the offspring chromosomes have been trained one by one into image classifiers for CIFAR-10, the parent and offspring chromosomes are merged, and the part of the chromosomes with higher performance is selected from them for subsequent evolution. The algorithm aims to evolve CNN classifiers with high classification accuracy and low structural complexity. From the accuracy and the number of weight parameters of each CNN classifier in the current chromosome population, the PF curve of the corresponding solution vectors can be drawn; combined with the multi-objective evaluation method proposed by Kalyanmoy et al., all CNN classifiers can be ranked from high to low performance, and after ranking the high-performance classifiers are selected for subsequent evolution.
④ Output the optimal CNN classifier.
Steps ② and ③ are repeated until the number of evolution iterations meets the stopping condition, and the iteration stops. The knee point on the last generation's PF curve is output as the optimal solution.
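Steps ①-④ combine into one main loop. The sketch below ties the earlier fragments together; train_and_score, train_incrementally, select_best, and knee_solution are placeholder names for the procedures described in sections 1)-3), not functions of any real library.

```python
import random

def gnp_fel(pop_size=20, generations=50, p_cross=0.2, p_mut=0.1):
    population = [init_chromosome() for _ in range(pop_size)]          # step 1
    fitness = [train_and_score(c) for c in population]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:                                # step 2
            r = random.random()
            if r < p_cross:
                i = tournament(population, fitness)
                j = tournament(population, fitness)
                children.extend(crossover(population[i], population[j]))
            elif r < p_cross + p_mut:
                i = tournament(population, fitness)
                children.append(mutate(population[i]))
        child_fitness = [train_incrementally(c) for c in children]    # section 3)
        population, fitness = select_best(population + children,      # step 3
                                          fitness + child_fitness, pop_size)
    return knee_solution(population, fitness)                          # step 4
```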
Experiment and result analysis: to validate the algorithm, we tested the GNP-FEL algorithm on the CIFAR-10 data set, the Fashion-MNIST data set, and the SVHN data set. The experiment has three parts. The first part compares the evolution results of the GNP-FEL algorithm and the GNP-EL algorithm (identical to GNP-FEL except that the incremental training method is omitted) and analyzes the PF curves and optimal CNN structures generated by the two algorithms. The second part counts the running times of the GNP-EL and GNP-FEL algorithms on the three data sets and establishes the effectiveness of incremental training. The third part compares the optimal CNN classifier generated by the GNP-FEL algorithm with CNN classifiers generated by other algorithms, showing the characteristics and advantages of the algorithm.
4.1) for the CIFAR-10 data set, the crossover probability is set to 0.2, the mutation probability to 0.1, and the number of evolution iterations to 50. When each CNN classifier is trained, the learning rate is set to 0.001 and the learning-rate decay coefficient to 0.95; max_epoch is 50 and min_epoch is 25. In the GNP-EL algorithm, the optimal solution $O_1$ has an error rate of 0.1550 and a CNN model with 438342 parameters; compared with the initial CNN classifiers, the optimal solution reduces the error rate by about 0.05 and nearly halves the number of parameters. In the GNP-FEL algorithm, the optimal solution $O_2$ has an error rate of 0.1170 and a CNN model with 613206 parameters; compared with the initial CNN classifiers, it reduces the error rate by about 0.08 and also nearly halves the number of parameters.
Comparing the optimal solutions of the two algorithms further, the error rate of $O_1$ is 0.038 higher than that of $O_2$, while $O_2$ has 174864 more parameters than $O_1$, so there is a certain difference between them. This difference is due to the randomness of the evolutionary algorithm. On the whole, however, $O_1$ and $O_2$ differ little in error rate and number of weight parameters, and the performance of the corresponding CNN classifiers is also quite close; they can be regarded as two sub-optimal solutions near the global optimum. This shows that for the CIFAR-10 data set the evolutionary effects of the GNP-EL and GNP-FEL algorithms are equivalent, and the whole evolutionary algorithm converges gradually towards the optimal solution during evolution.
For the Fashion-MNIST data set and the SVHN data set, we set the number of generations to 40; the other parameters are the same as for CIFAR-10.
On the Fashion-MNIST data set, the optimal solution $O_1$ of the GNP-EL algorithm has an error rate of 0.0776 and a CNN model with 133474 parameters; compared with the initial CNN classifiers, the optimal solution reduces the error rate by about 0.008 and more than halves the number of parameters. In the GNP-FEL algorithm, the optimal solution $O_2$ has an error rate of 0.0806 and a CNN model with 147126 parameters; compared with the initial CNN classifiers, it reduces the error rate by about 0.006 and cuts the number of parameters by nearly two thirds. The error rate of $O_1$ is 0.003 lower than that of $O_2$, and their CNN models have essentially equal numbers of parameters, which shows that the performance of $O_1$ and $O_2$ is very close.
On the SVHN data set, the optimal solution $O_1$ of the GNP-EL algorithm has an error rate of 0.0662 and a CNN model with 182610 parameters; compared with the initial CNN classifiers, the optimal solution reduces the error rate by about 0.015 and the number of parameters by about 50000. In the GNP-FEL algorithm, the optimal solution $O_2$ has an error rate of 0.0719 and a CNN model with 264441 parameters; compared with the initial CNN classifiers, it reduces the error rate by about 0.070 and likewise reduces the number of parameters by about 50000.
4.2) Fig. 7 shows the time both algorithms need to generate each generation of CNN classifiers during evolution on the CIFAR-10 data set. The figure shows that the average time the GNP-FEL algorithm needs to generate one generation of CNN classifiers is only 0.6 times that of the GNP-EL algorithm. Figs. 8 and 9 are the corresponding run-time plots for the Fashion-MNIST and SVHN data sets under the GNP-EL and GNP-FEL algorithms. The averaged curves in the figures show that on both data sets GNP-FEL runs in less than half the time of GNP-EL. Combining the above analysis, we conclude that using incremental learning in the evolutionary algorithm effectively reduces the algorithm's time complexity while keeping the output optimal solution stable.
4.3) Table 1 shows the results of several algorithms on the CIFAR-10 data set. NAS (Neural Architecture Search) is a model built with a reinforcement learning method. VGG and ReNet are manually designed CNN architectures. CGP-CNN and EDEN are two recent evolutionary algorithms for optimizing CNN structure.
Model | Error rate (%) | Parameters (×10⁵) | Run time (days) | GPUs
---|---|---|---|---
NAS | 3.65 | 374 | - | 800
CGP-CNN | 6.75 | 15.2 | 15.2 | 2
VGG | 7.94 | 152 | - | -
ReNet | 12.35 | - | - | -
EDEN | 25.50 | 1.73 | 1 | 1
GNP-EL | 15.50 | 4.38 | 9.8 | 1
GNP-FEL | 11.70 | 6.13 | 5.8 | 1

TABLE 1
As can be seen from Table 1, although NAS and VGG achieve good error rates, the structures of the two models are very complex, a large number of weight parameters must be trained, and they occupy considerable computing resources. CGP-CNN evolved a CNN classifier with strong performance in both error rate and number of weight parameters, but completing its evolution process took 15.2 days on two GPUs. The CNN classifier produced by EDEN has few weight parameters but the highest error rate among these algorithms. The GNP-EL and GNP-FEL algorithms proposed in this embodiment do not reach the best values of error rate or parameter count, but the optimal CNN structures they evolve strike a good balance between the two indices, classification error rate and number of model weight parameters. Moreover, one run of the GNP-EL algorithm on a single GPU takes about 9.8 days, while one run of GNP-FEL takes about 5.8 days, a large improvement over CGP-CNN.
Claims (2)
1. An image classifier construction method based on optimized deep convolutional neural network structure fast evolution is characterized by comprising the following steps:
the CIFAR-10 data set collects 60000 color pictures of 10 different classes; the picture size in CIFAR-10 is fixed, each picture contains only one kind of entity, and the images are 32 × 32 pixels; these pictures form the image training data, from which a CNN classifier for CIFAR-10 is effectively evolved;
the evolution method comprises the following steps:
1) CNN optimization method based on GNP
GNP is used as the base algorithm of the evolution process, which comprises the following operations: population initialization, selection of superior individuals, crossover, and mutation, carried out as follows:
1.1) in population initialization, a network structure is used to represent the evolving population; a network structure is expressed in the form of a Phenotype and a Genotype; in the Phenotype, shapes of different kinds represent different CNN modules and different paths represent different initial chromosomes; during initialization the structures of all chromosomes are generated randomly; the Genotype records the concrete encoding of each chromosome, and the hyper-parameters in the CNN modules are encoded as well;
1.2) after population initialization is completed, the obtained CNN structures are trained with the image training data, each chromosome forms an image classifier, the classification effect of the classifiers is tested, and the better-performing classifiers are selected for crossover and mutation; based on the GNP algorithm, corresponding crossover and mutation strategies are designed to update the structures and hyper-parameters of the chromosomes; the process is as follows:
1.2.1) crossover obtains a new chromosome by exchanging partial structures of two chromosomes; to perform a crossover, two chromosomes are selected as crossover objects by tournament selection and become the parent chromosomes; after selection, a cross point is chosen at random in each of the two chromosomes, and in the original population network structure graph the crossover is realized by modifying the connection paths of the two parent chromosomes at the cross points;
1.2.2) mutation constructs a new chromosome by mutating the hyper-parameters and the network structure of a chromosome; first a parent chromosome is selected by tournament selection, and after the parent is selected two mutation strategies are applied to the current chromosome: structural mutation and hyper-parameter mutation, where structural mutation changes the depth of the CNN classifier in order to evolve a CNN structure that extracts image features effectively, and hyper-parameter mutation searches for the optimal parameter combination of each module;
1.3) during evolution, the number of children generated in each generation is controlled by setting the crossover probability and the mutation probability of the population; in any generation, the child CNN structures obtained through crossover and mutation are first trained into image classifiers,
wherein a child CNN structure produced by the crossover operation comprises a part inherited from parent1, denoted $P_{i\_1}$, and a part inherited from parent2, denoted $P_{i\_2}$;
a child CNN structure produced by the mutation operation comprises a part inherited from the parent, denoted $P_i$, and a part produced by the mutation itself, denoted $M_i$;
the degree of structural change of the child CNN structure relative to the parent structure is given by equation (1), wherein $p_{i\_1}$ denotes the number of weight parameters contained in $P_{i\_1}$, $p_{i\_2}$ the number contained in $P_{i\_2}$, $p$ the number contained in $P_i$, and $m$ the number contained in $M_i$;
weight initialization: all weight parameters are set to random numbers with mean 0 and variance 1;
the offspring training batch: the training batch of the child CNN structure is calculated from its degree of structural change, with the concrete formula of equation (2), wherein min_epoch denotes the minimum training batch of a classifier and max_epoch the maximum training batch; equation (2) projects the training batch of a child to a value between min_epoch and max_epoch, and $epoch_i$ is a function that varies logarithmically with $\delta_i$; after the epoch of each descendant is determined, the descendant CNN is trained on the data for that many epochs to obtain the descendant CNN classifier;
the child CNN classifiers and the parent CNN classifiers are then combined, a multi-objective evaluation method is set up according to the structural complexity and the test accuracy of each CNN classifier, and the better image classifiers are selected into the next round of CNN structure evolution;
2) multi-objective network evaluation and its optimization method
The classification accuracy and the structural complexity of the CNN classifier are taken as optimization targets, and the CNN classifier is evaluated with a multi-objective optimization method, so that evolutionary computation can finally generate an optimal CNN classifier suited to practical applications.
2. The method for constructing the image classifier based on the optimized fast evolution of the deep convolutional neural network structure as claimed in claim 1, characterized in that:
in step 2), during evolutionary computation the fitness function values of the Pareto optimal solution set on the PF curve are computed by drawing on a density estimation method, thereby determining the concrete optimization indices of each Pareto solution;
for any solution $x_i$ of the MOP problem, two indices are defined: $i_{rank}$ and $i_{distance}$, where $i_{rank}$ denotes the dominance level of the solution, and a smaller $i_{rank}$ means a higher dominance level and a better corresponding solution; $i_{distance}$ denotes the crowding distance of the point, and a larger $i_{distance}$ means the current point covers a larger region and the corresponding solution is closer to the optimal solution; for two solution vectors with different rank values, the solution with the lower rank value is selected as the optimal solution, and if the rank values of the two solutions are equal, the solution with the larger distance value is considered more suitable as the optimal solution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810141306.7A CN108334949B (en) | 2018-02-11 | 2018-02-11 | Image classifier construction method based on optimized deep convolutional neural network structure fast evolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810141306.7A CN108334949B (en) | 2018-02-11 | 2018-02-11 | Image classifier construction method based on optimized deep convolutional neural network structure fast evolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108334949A CN108334949A (en) | 2018-07-27 |
CN108334949B true CN108334949B (en) | 2021-04-13 |
Family
ID=62929347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810141306.7A Active CN108334949B (en) | 2018-02-11 | 2018-02-11 | Image classifier construction method based on optimized deep convolutional neural network structure fast evolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334949B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105029B (en) * | 2018-10-29 | 2024-04-16 | 北京地平线机器人技术研发有限公司 | Neural network generation method, generation device and electronic equipment |
CN111222902B (en) * | 2018-11-27 | 2024-02-09 | 阿里巴巴集团控股有限公司 | Advertisement putting method, device, system, computing equipment and storage medium |
CN109783857A (en) * | 2018-12-12 | 2019-05-21 | 珠海博雅科技有限公司 | A kind of quick charge pump design method and device |
CN109726761B (en) * | 2018-12-29 | 2023-03-31 | 青岛海洋科学与技术国家实验室发展中心 | CNN evolution method, CNN-based AUV cluster working method, CNN evolution device and CNN-based AUV cluster working device and storage medium |
CN109784497B (en) * | 2019-01-15 | 2020-12-25 | 探智立方(北京)科技有限公司 | AI model automatic generation method based on computational graph evolution |
CN110135498A (en) * | 2019-05-17 | 2019-08-16 | 电子科技大学 | Image identification method based on deep evolution neural network |
CN110210609A (en) * | 2019-06-12 | 2019-09-06 | 北京百度网讯科技有限公司 | Model training method, device and terminal based on the search of neural frame |
US11928583B2 (en) * | 2019-07-08 | 2024-03-12 | International Business Machines Corporation | Adaptation of deep learning models to resource constrained edge devices |
CN110399917B (en) * | 2019-07-24 | 2023-04-18 | 东北大学 | Image classification method based on hyper-parameter optimization CNN |
CN110852435A (en) * | 2019-10-12 | 2020-02-28 | 沈阳航空航天大学 | Neural evolution calculation model |
CN112884118A (en) * | 2019-11-30 | 2021-06-01 | 华为技术有限公司 | Neural network searching method, device and equipment |
CN111260077A (en) * | 2020-01-14 | 2020-06-09 | 支付宝(杭州)信息技术有限公司 | Method and device for determining hyper-parameters of business processing model |
CN111415009B (en) * | 2020-03-19 | 2021-02-09 | 四川大学 | Convolutional variational self-encoder network structure searching method based on genetic algorithm |
CN112036512B (en) * | 2020-11-03 | 2021-03-26 | 浙江大学 | Image classification neural network architecture searching method and device based on network clipping |
US20220198260A1 (en) * | 2020-12-22 | 2022-06-23 | International Business Machines Corporation | Multi-level multi-objective automated machine learning |
CN112668473B (en) * | 2020-12-28 | 2022-04-08 | 东南大学 | Vehicle state accurate sensing method based on multi-feature deep fusion neural network |
CN113743605A (en) * | 2021-06-16 | 2021-12-03 | 温州大学 | Method for searching smoke and fire detection network architecture based on evolution method |
CN114461535B (en) * | 2022-04-14 | 2022-07-12 | 山东建筑大学 | Parallel mutation operator-oriented obstinate variant test data generation method and system |
CN114912589B (en) * | 2022-07-18 | 2022-10-04 | 中船重工(武汉)凌久高科有限公司 | Image identification method based on full-connection neural network optimization |
CN115309043B (en) * | 2022-07-25 | 2024-10-15 | 中国科学院光电技术研究所 | Active disturbance rejection control method for photoelectric tracking system |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971162A (en) * | 2014-04-04 | 2014-08-06 | 华南理工大学 | Method for improving BP (back propagation) neutral network and based on genetic algorithm |
CN105279555A (en) * | 2015-10-28 | 2016-01-27 | 清华大学 | Self-adaptive learning neural network implementation method based on evolutionary algorithm |
CN107609601A (en) * | 2017-09-28 | 2018-01-19 | 北京计算机技术及应用研究所 | A kind of ship seakeeping method based on multilayer convolutional neural networks |
Non-Patent Citations (3)

Title |
---|
Intelligent Arrhythmia Detection using Genetic Algorithm and Emphatic SVM (ESVM); Jalal A. Nasiri et al.; 2009 Third UKSim European Symposium on Computer Modeling and Simulation; 2006-12-31; 1-7 *
NSGA-II: a multi-objective optimization algorithm based on non-dominated sorting (Chinese translation); uploaded by Zhuyong90; Baidu Wenku; 2016-06-22; 1-4 *
Application of a genetic algorithm with improved crossover operation to neural network optimization; Zhang Xun et al.; Industrial Control Computer; 2012-12-31; 48-51 *
Also Published As
Publication number | Publication date |
---|---|
CN108334949A (en) | 2018-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334949B (en) | Image classifier construction method based on optimized deep convolutional neural network structure fast evolution | |
Baymurzina et al. | A review of neural architecture search | |
CN111898689B (en) | Image classification method based on neural network architecture search | |
WO2018161468A1 (en) | Global optimization, searching and machine learning method based on lamarck acquired genetic principle | |
US8489526B2 (en) | Controlling quarantining and biasing in cataclysms for optimization simulations | |
Ünal et al. | Evolutionary design of neural network architectures: a review of three decades of research | |
CN112465120A (en) | Fast attention neural network architecture searching method based on evolution method | |
CN107330902B (en) | Chaotic genetic BP neural network image segmentation method based on Arnold transformation | |
Wen et al. | Learning ensemble of decision trees through multifactorial genetic programming | |
CN114118369B (en) | Image classification convolutional neural network design method based on group intelligent optimization | |
Elhani et al. | Optimizing convolutional neural networks architecture using a modified particle swarm optimization for image classification | |
CN116542382A (en) | Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm | |
Bai et al. | A joint multiobjective optimization of feature selection and classifier design for high-dimensional data classification | |
CN117253037A (en) | Semantic segmentation model structure searching method, automatic semantic segmentation method and system | |
CN115908909A (en) | Evolutionary neural architecture searching method and system based on Bayes convolutional neural network | |
CN114241267A (en) | Structural entropy sampling-based multi-target architecture search osteoporosis image identification method | |
McGhie et al. | Gpcnn: evolving convolutional neural networks using genetic programming | |
CN114863508B (en) | Expression recognition model generation method, medium and device of self-adaptive attention mechanism | |
Babatunde et al. | Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture | |
CN115661546A (en) | Multi-objective optimization classification method based on feature selection and classifier joint design | |
Reiling | Convolutional neural network optimization using genetic algorithms | |
Liu et al. | Evolving hyperparameters for training deep neural networks against adversarial attacks | |
CN113704570A (en) | Large-scale complex network community detection method based on self-supervision learning type evolution | |
Arifando et al. | Hybrid Genetic Algorithm & Learning Vector Quantization for Classification of Social Assistance Recipients | |
Yagnik et al. | Optimizing Activity Recognition in Video Using Evolutionary Computation. |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |