CN111242281A - Weight optimization method for deep convolutional neural network - Google Patents

Weight optimization method for deep convolutional neural network

Info

Publication number
CN111242281A
CN111242281A
Authority
CN
China
Prior art keywords
individuals
neural network
convolutional neural
initial population
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010014858.9A
Other languages
Chinese (zh)
Inventor
安竹林
杨传广
徐勇军
程坦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Institute Of Data Intelligence Institute Of Computing Technology Chinese Academy Of Sciences
Original Assignee
Xiamen Institute Of Data Intelligence Institute Of Computing Technology Chinese Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Institute Of Data Intelligence Institute Of Computing Technology Chinese Academy Of Sciences
Priority to CN202010014858.9A
Publication of CN111242281A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for optimizing the weights of a deep convolutional neural network, comprising the following steps: obtaining an initial population and performing initialization and gene encoding; training all individuals in the initial population by gradient descent until a preset number of steps is reached; calculating and ranking individual fitness; applying the selection, crossover, and mutation operations of a genetic algorithm to the initial population to obtain a new generation; and judging whether a termination condition is reached, and if not, iteratively training and evolving the new generation. By combining a genetic algorithm with gradient descent to optimize the weights of a deep convolutional neural network, the invention can improve the network's recognition rate while also speeding up its training.

Description

Weight optimization method for deep convolutional neural network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a weight optimization method of a deep convolutional neural network.
Background
Deep learning has developed rapidly within artificial intelligence, approaching or surpassing human performance in image recognition, speech recognition, natural language processing, and other tasks, but its development has not stopped there: optimizing deep learning algorithms and combining them with techniques from other fields have become new directions for its development. As is well known, the artificial neural network underlying deep learning has been studied for decades, and, like other techniques in artificial intelligence, its inventive inspiration came from biology, namely the information-transfer mechanism of the human brain. In the field of optimization, intelligent algorithms such as the genetic algorithm and particle swarm optimization likewise originate from biology. These heuristic, biologically inspired algorithms have the advantages of strong global search capability and fast convergence, and in deep learning such heuristic methods are currently used mainly to search for optimal solutions or to assist gradient descent.
The gradient descent method exploits the fact that a function decreases fastest in the direction of the negative gradient: subtracting the gradient at the current point from the parameter values descends step by step toward a minimum of the function, thereby optimizing the parameters. The method has shortcomings, however: it easily falls into local optima, requires computing complex gradients, converges slowly, and fails to converge effectively on non-convex functions. The field combining deep learning with intelligent optimization algorithms arose to overcome these shortcomings of gradient descent by exploiting the complementary advantages of the two, and it has gradually attracted the attention of more researchers.
Genetic algorithms have long been applied to neural network optimization, and the emergence of deep neural networks brings new challenges to this research. Compared with traditional neural networks, deep neural networks have more layers and a larger network scale. In addition, the introduction of convolution operations makes the object being optimized more complex, so traditional genetic-algorithm-based methods for optimizing neural networks cannot be applied directly to deep neural networks.
At present, research on optimizing deep neural networks with genetic algorithms focuses mainly on reinforcement learning, for example Neuroevolution (NE). Its main idea is to let neural networks compete for survival according to the survival-of-the-fittest principle of the biological world, finally obtaining the individual with the highest fitness, that is, the best reinforcement-learning neural network. The biological inspiration of this field lies in the evolutionary process of the neurons of the human brain: the learning and memory functions the brain possesses today are inseparable from its complex neural network system, which is the product of a long evolutionary process rather than of the gradient descent and backpropagation used in deep learning. On this basis, evolving neural networks with evolutionary algorithms has become an emerging field that is feasible in principle.
Neuroevolution takes two main forms. The first is fixed-topology neuroevolution: the topology of the neural network is still designed by the researcher, and only the network weights are handed to the evolutionary algorithm to evolve toward the optimal solution. The second is Topology and Weight Evolving Artificial Neural Networks (TWEANN), in which both the topology and the network weights are evolved, finally yielding the optimal combination of topology and weights. More and more algorithms that evolve both topology and weights have since appeared. The first was NeuroEvolution of Augmenting Topologies (NEAT); it introduced historical gene markers and speciation, solved the problem that immature individuals in TWEANN could go extinct, and enlarged the scale of networks that can be evolved. HyperNEAT (Hypercube-based NEAT) then built on NEAT by encoding genes indirectly, using a Compositional Pattern Producing Network (CPPN) to generate network connections and greatly increasing the scale of evolvable network topologies. The emergence of Novelty Search (NS) meant that fitness was no longer the sole criterion for judging a neural network; by adding the notion of novelty, it became easier to find potentially optimal individuals.
Neuroevolution has achieved remarkable results in reinforcement learning, and agents evolved with it can even outperform deep learning in certain games. In supervised learning, image classifiers trained with evolutionary algorithms have also achieved good results. For deep convolutional neural networks, however, evolution takes so long that the overall performance still falls short of training with gradient descent alone, so no effective optimization method or result yet exists.
Disclosure of Invention
In order to solve the above problems, the invention provides a weight optimization method for a deep convolutional neural network.
The invention adopts the following technical scheme:
A deep convolutional neural network weight optimization method comprises the following steps:
S1, obtaining an initial population, initializing the weights and biases of its individuals, and performing gene encoding;
S2, performing gradient-descent parameter training on all individuals in the initial population until a preset number of steps is reached;
S3, loading the weights and biases of the individuals in the initial population into a computation graph, obtaining each individual's fitness on the test-set data, ranking the individuals by fitness in ascending order, and saving the individual with the highest fitness;
S4, applying the selection, crossover, and mutation operations of a genetic algorithm to the initial population to obtain a new generation;
S5, judging whether a termination condition is reached: if so, terminating; otherwise, executing step S6. The termination condition is that the maximum fitness meets the requirement or the maximum number of iteration steps is reached;
S6, jumping back to step S2 to iteratively train and evolve the new generation.
Preferably, step S4 comprises the following substeps:
S41, selecting n-1 individuals from the initial population with a roulette-wheel algorithm based on the fitness ranking;
S42, performing crossover among the selected individuals to obtain n-1 crossed individuals;
S43, performing mutation on the n-1 crossed individuals to obtain n-1 mutated individuals;
S44, combining the n-1 mutated individuals with the highest-fitness individual saved in step S3 to obtain a new generation of n individuals.
Preferably, the preset number of times described in step S2 is set by the following formula:
[formula image not reproduced in the text]
where ga_step is the interval, in steps, at which the genetic algorithm is applied, and step is the number of training steps.
Preferably, obtaining the initial population in step S1 specifically comprises: randomly generating an initial population according to preset parameters.
Preferably, the initialization in step S1 uses a normal-distribution initialization method.
Preferably, the crossover operation in step S42 uses single-point crossover: a crossover point is randomly selected and the gene sequence is split there, and the paired parents then exchange gene segments to form new offspring.
Compared with the background art, the above technical scheme gives the invention the following advantage:
By combining a genetic algorithm with gradient descent to optimize the weights of a deep convolutional neural network, the invention can improve the network's recognition rate while also speeding up its training.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 shows the gene encoding scheme;
FIG. 3 shows the parameter dimension information;
FIG. 4 is a schematic diagram of single-point crossover on a chromosome;
FIG. 5 is a schematic diagram of the topology under single-point crossover.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Examples
Referring to fig. 1, the invention discloses a method for optimizing weights of a deep convolutional neural network, which comprises the following steps:
s1, obtaining initial population, initializing individual weight and bias, and encoding gene. The initial population is randomly generated according to preset parameters, the preset parameters mainly aim at a genetic algorithm part, in the embodiment, a single-point crossing is adopted in a crossing mode, the probability is pc equal to 0.5, a replacement variation mode is adopted in a variation mode, values in a parameter range (set according to the weight and the bias distribution diagram of the convolutional neural network) are randomly generated firstly, then gene values are replaced, the individual variation probability pm is 0.1, and the parameter variation probability pw is 0.8. The initialization adopts a normal distribution initialization method. Since the weights and biases of each layer in the convolutional neural network are real numbers, the genes of the genetic algorithm are encoded by real numbers. Visual coding As shown in FIG. 2, the chromosome is a parameter list including 5 layers of parameters w of the neural network1,b1,w2,b2,w3,b3,w4,b4,w5,b5Therefore, the length of the chromosome is 10. The weights and offsets are not real scalars as in the conventional encoding method, but are arrays with dimension information, and the dimensions of the parameters are as shown in fig. 3 according to the structural design of the convolutional neural network.
S2, performing gradient-descent parameter training on all individuals in the initial population until a preset number of steps is reached. The preset number is set by the following formula:
[formula image not reproduced in the text]
where ga_step is the interval, in steps, at which the genetic algorithm is applied, and step is the number of training steps.
The formula above is essentially a step-decay schedule. Preliminary experiments showed that networks trained with ga_step = 2 or ga_step = 5 barely improve after 1000 generations, and networks with ga_step = 40 perform poorly in the first 700 generations, while networks using the decaying schedule maintain good results early in training and far surpass the others later. Because the classifier performs poorly early in training, applying the genetic algorithm frequently at that stage helps the network find good parameters faster and accelerates training; late in training, frequent application of the genetic algorithm would disturb gradient descent and disorder training, so its use must be reduced to let gradient descent proceed steadily.
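The decaying schedule just described can be sketched as below. The patent's exact formula is an image not reproduced in the text, so the thresholds and return values here are purely illustrative assumptions; only the qualitative behaviour, applying the genetic algorithm often early in training and rarely later, follows the description.

```python
def ga_interval(step):
    """Hypothetical step-decay schedule for ga_step (the real formula is
    not reproduced in the source). Early training: small interval, so the
    GA runs often; late training: large interval, so gradient descent
    proceeds undisturbed."""
    if step < 1000:
        return 5    # early: run the GA every 5 gradient steps
    if step < 5000:
        return 20   # middle: back off
    return 40       # late: rarely interrupt gradient descent

print(ga_interval(0), ga_interval(2000), ga_interval(10000))  # 5 20 40
```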
S3, loading the weights and biases of the individuals in the initial population into the computation graph, obtaining each individual's fitness on the test-set data, ranking the individuals by fitness in ascending order, and saving the individual with the highest fitness.
S4, applying the selection, crossover, and mutation operations of the genetic algorithm to the initial population to obtain a new generation. This comprises the following steps:
and S41, sequentially selecting n-1 individuals by adopting a roulette algorithm according to the ranking of the individual fitness for all the individuals of the initial population. The selection operation is responsible for selecting individuals in the population on a certain basis for reproduction or retention in the next generation. In the selection operation of the experiment, uniform sequencing is adopted, namely the fitness of all individuals in the population is arranged in an ascending order, and the obtained ranking is used as the selected basis. Roulette selection is then used, which is a pull-back sampling operation where the probability of a particular individual being selected into the next generation is the ratio of the individual's rank to the sum of the entire population's ranks. Firstly, a fan-shaped wheel disc is manufactured according to the fitness proportion of individuals, and the individuals are selected by rotating the wheel disc each time. It should be noted that the reason why the selection operation does not directly use the fitness as the sampling basis is the difference of the individual fitness, and in the experimental process, it is observed that the difference of all the individual fitness is mostly 10-2Level, so directly using the fitness ratio to sample is not obvious and cannotThe advantages of the selection operation are exerted to the maximum extent. In order to ensure that the optimal individuals can be reserved, the individuals with the maximum fitness automatically join the new population after each selection is finished.
S42, performing crossover among the selected individuals to obtain n-1 crossed individuals. Crossover is the process in which two paired individuals exchange part of their genes, producing two new individuals. Since single-point crossover preserves the integrity of the topology to the greatest extent while still achieving a topology-level exchange, this embodiment uses single-point crossover: a crossover point is randomly selected and the gene sequence is split there, and the paired parents then exchange gene segments to form new offspring (as shown in FIG. 4 and FIG. 5). During crossover, individuals are paired as the 1st with the 2nd, the 3rd with the 4th, ..., and the (n-2)th with the (n-1)th, yielding n-1 individuals after crossover.
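A minimal sketch of the single-point crossover in step S42, with placeholder strings standing in for the ten array-valued genes of the chromosome:

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Step S42: cut both chromosomes at one random point and swap the
    tail segments, producing two offspring (cf. FIG. 4)."""
    point = rng.randrange(1, len(parent_a))   # cut somewhere strictly inside
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

random.seed(0)
pa = ["w1", "b1", "w2", "b2", "w3", "b3"]
pb = ["W1", "B1", "W2", "B2", "W3", "B3"]
ca, cb = single_point_crossover(pa, pb)
print(ca, cb)
```

Because each gene is an entire layer's weight or bias array, cutting between genes swaps whole layers, which is why the text says single-point crossover preserves topological integrity.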
S43, performing mutation on the n-1 crossed individuals to obtain n-1 mutated individuals.
S44, combining the n-1 mutated individuals with the highest-fitness individual saved in step S3 to obtain a new generation of n individuals. Mutation is the operation of changing gene values on a chromosome with a certain probability; by the manner of change, it is mainly classified into perturbation mutation and replacement mutation. As the names imply, perturbation mutation increases or decreases a parameter value by the mutation value, while replacement mutation substitutes the mutation value for the parameter value. In this embodiment, the mutation operation uses replacement mutation: an individual to mutate is first determined according to the individual mutation probability, and the chosen individual then undergoes, once per position along the coding length, a parameter mutation governed each time by the parameter mutation probability.
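The replacement mutation just described can be sketched as below, using the embodiment's probabilities pm = 0.1 and pw = 0.8 as defaults and scalar genes for simplicity (the patent mutates array-valued parameters).

```python
import random

def replace_mutate(chromosome, low, high, pm=0.1, pw=0.8, rng=random):
    """Steps S43/S44 mutation: with individual mutation probability pm the
    chromosome mutates at all; each gene of a mutating individual is then
    replaced, with parameter mutation probability pw, by a fresh random
    value from the parameter range [low, high] (replacement mutation)."""
    if rng.random() >= pm:
        return list(chromosome)          # individual not chosen to mutate
    return [rng.uniform(low, high) if rng.random() < pw else gene
            for gene in chromosome]

random.seed(3)
out = replace_mutate([0.5, -0.2, 0.1], low=-1.0, high=1.0)
print(len(out))
```

The range [low, high] corresponds to the parameter range set from the network's weight and bias distributions in step S1.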
S5, judging whether a termination condition is reached: if so, terminating; otherwise, executing step S6. The termination condition is that the maximum fitness meets the requirement or the maximum number of iteration steps is reached.
S6, jumping back to step S2 to iteratively train and evolve the new generation.
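Putting steps S2 through S6 together, the hybrid training loop can be sketched as follows. The trainer, fitness function, and GA operators are passed in as stand-in callables, since the patent does not fix a concrete network or framework; the toy run at the bottom uses trivial stand-ins only to exercise the control flow.

```python
def evolve(population, train_step, fitness, select, crossover, mutate,
           generations, ga_interval):
    """Skeleton of steps S2-S6: train every individual by gradient descent
    for ga_interval(step) steps, then apply one round of selection,
    crossover, and mutation, always carrying the elite individual over."""
    step = 0
    for _ in range(generations):
        for ind in population:                    # S2: gradient training
            for _ in range(ga_interval(step)):
                train_step(ind)
        step += ga_interval(step)
        ranked = sorted(population, key=fitness)  # S3: ascending fitness
        elite = ranked[-1]                        # best individual is saved
        parents = select(ranked)                  # S41: choose n-1 parents
        children = mutate(crossover(parents))     # S42/S43
        population = children + [elite]           # S44: elitism, n individuals
    return max(population, key=fitness)

# Toy run: individuals are dicts, "training" nudges the weight upward,
# and the GA operators are identity maps (placeholders, not the patent's).
pop = [{"w": 0.0}, {"w": 1.0}]
best = evolve(pop,
              train_step=lambda ind: ind.update(w=ind["w"] + 0.1),
              fitness=lambda ind: ind["w"],
              select=lambda ranked: [dict(r) for r in ranked[:-1]],
              crossover=lambda ps: ps,
              mutate=lambda ps: ps,
              generations=3,
              ga_interval=lambda s: 1)
print(round(best["w"], 1))  # elite individual after 3 generations: 1.3
```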
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A deep convolutional neural network weight optimization method, characterized by comprising the following steps:
S1, obtaining an initial population, initializing the weights and biases of its individuals, and performing gene encoding;
S2, performing gradient-descent parameter training on all individuals in the initial population until a preset number of steps is reached;
S3, loading the weights and biases of the individuals in the initial population into a computation graph, obtaining each individual's fitness on the test-set data, ranking the individuals by fitness in ascending order, and saving the individual with the highest fitness;
S4, applying the selection, crossover, and mutation operations of a genetic algorithm to the initial population to obtain a new generation;
S5, judging whether a termination condition is reached: if so, terminating; otherwise, executing step S6, wherein the termination condition is that the maximum fitness meets the requirement or the maximum number of iteration steps is reached;
S6, jumping back to step S2 to iteratively train and evolve the new generation.
2. The deep convolutional neural network weight optimization method of claim 1, wherein step S4 comprises the following substeps:
S41, selecting n-1 individuals from the initial population with a roulette-wheel algorithm based on the fitness ranking;
S42, performing crossover among the selected individuals to obtain n-1 crossed individuals;
S43, performing mutation on the n-1 crossed individuals to obtain n-1 mutated individuals;
S44, combining the n-1 mutated individuals with the highest-fitness individual saved in step S3 to obtain a new generation of n individuals.
3. The deep convolutional neural network weight optimization method of claim 1 or 2, wherein the preset number of steps in step S2 is set by the following formula:
[formula image not reproduced in the text]
where ga_step is the interval, in steps, at which the genetic algorithm is applied, and step is the number of training steps.
4. The deep convolutional neural network weight optimization method of claim 1 or 2, wherein obtaining the initial population in step S1 specifically comprises: randomly generating an initial population according to preset parameters.
5. The deep convolutional neural network weight optimization method of claim 1 or 2, wherein the initialization in step S1 uses a normal-distribution initialization method.
6. The deep convolutional neural network weight optimization method of claim 2, wherein the crossover operation in step S42 uses single-point crossover: a crossover point is randomly selected and the gene sequence is split there, and the paired parents then exchange gene segments to form new offspring.
CN202010014858.9A 2020-01-07 2020-01-07 Weight optimization method for deep convolutional neural network Pending CN111242281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010014858.9A CN111242281A (en) 2020-01-07 2020-01-07 Weight optimization method for deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010014858.9A CN111242281A (en) 2020-01-07 2020-01-07 Weight optimization method for deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN111242281A (en) 2020-06-05

Family

ID=70877602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010014858.9A Pending CN111242281A (en) 2020-01-07 2020-01-07 Weight optimization method for deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111242281A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446432A (en) * 2020-11-30 2021-03-05 西安电子科技大学 Handwritten picture classification method based on quantum self-learning self-training network
CN112446432B (en) * 2020-11-30 2023-06-30 西安电子科技大学 Handwriting picture classification method based on quantum self-learning self-training network
WO2022127393A1 (en) * 2020-12-15 2022-06-23 International Business Machines Corporation Reinforcement learning for testing suite generation
GB2617737A (en) * 2020-12-15 2023-10-18 Ibm Reinforcement learning for testing suite generation
CN112580259A (en) * 2020-12-16 2021-03-30 天津水泥工业设计研究院有限公司 Intelligent mine automatic ore blending method and system based on genetic algorithm
CN112580259B (en) * 2020-12-16 2022-05-13 天津水泥工业设计研究院有限公司 Intelligent mine automatic ore blending method and system based on genetic algorithm
CN114239792A (en) * 2021-11-01 2022-03-25 荣耀终端有限公司 Model quantization method, device and storage medium
CN114879494A (en) * 2022-04-25 2022-08-09 复旦大学 Robot self-adaptive design method based on evolution and learning

Similar Documents

Publication Publication Date Title
CN111242281A (en) Weight optimization method for deep convolutional neural network
CN104751842B (en) The optimization method and system of deep neural network
CN108985515B (en) New energy output prediction method and system based on independent cyclic neural network
CN110222830B (en) Deep feed-forward network fault diagnosis method based on adaptive genetic algorithm optimization
CN113688573A (en) Extension spring optimization method based on improved black widow spider algorithm
CN112330487A (en) Photovoltaic power generation short-term power prediction method
CN113095477A (en) Wind power prediction method based on DE-BP neural network
CN117349732A (en) High-flow humidification therapeutic apparatus management method and system based on artificial intelligence
CN110188861A (en) A kind of Web service combination optimization method based on I-PGA algorithm
CN116401037B (en) Genetic algorithm-based multi-task scheduling method and system
CN113705098A (en) Air duct heater modeling method based on PCA and GA-BP network
CN116665786A (en) RNA layered embedding clustering method based on graph convolution neural network
CN117253037A (en) Semantic segmentation model structure searching method, automatic semantic segmentation method and system
CN117518792A (en) Ship motion non-parametric modeling method based on improved Gaussian process regression
CN113807005B (en) Bearing residual life prediction method based on improved FPA-DBN
CN115169754A (en) Energy scheduling method and device, electronic equipment and storage medium
CN112819161A (en) Variable-length gene genetic algorithm-based neural network construction system, method and storage medium
Sivaraj et al. An efficient grouping genetic algorithm
CN111639797A (en) Gumbel-softmax technology-based combined optimization method
Londt et al. A Two-Stage Hybrid GA-Cellular Encoding Approach to Neural Architecture Search
Garay et al. A GH-SOM optimization with SOM labelling and dunn index
Takaishi et al. Percolative Learning: Time-Series Prediction from Future Tendencies
CN113780575B (en) Visual classification method based on progressive deep learning model
CN112270952B (en) Method for identifying cancer drive pathway
CN117951558A (en) RBF water quality parameter spatial distribution prediction method and system based on adjacent points

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200605