CN110490320B - Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm - Google Patents

Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Info

Publication number
CN110490320B
CN110490320B CN201910696239.XA
Authority
CN
China
Prior art keywords
network
individual
training
code
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910696239.XA
Other languages
Chinese (zh)
Other versions
CN110490320A (en)
Inventor
魏巍
徐松正
李威
王聪
张艳宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201910696239.XA priority Critical patent/CN110490320B/en
Publication of CN110490320A publication Critical patent/CN110490320A/en
Application granted granted Critical
Publication of CN110490320B publication Critical patent/CN110490320B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention discloses a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm, which solves the technical problem of the low search efficiency of existing network structure search methods. In this scheme, the deep network structure is first encoded to form a network structure code, and network structure codes are randomly generated as the initial generation of a genetic algorithm. Selection, crossover, mutation and prediction are then applied to the individuals of the population, and only the networks corresponding to individuals with higher expected performance are actually trained. Finally, all individuals are evaluated and the next round of selection begins. After the algorithm finishes, the individual with the best fitness is selected as the optimal network structure for the specific task. By predicting network performance before a network is actually trained, the time the search algorithm spends training poorly performing networks is reduced, which greatly accelerates the search process.

Description

Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm
Technical Field
The invention relates to a network structure searching method, in particular to a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm.
Background
Document 1 "Lingxi Xie, Alan Yuille, Genetic CNN. computer Vision and Pattern Recognition (2017)" proposes a network structure searching method based on Genetic algorithm, which introduces Darwinian theory of evolution, considers the network structure as an individual in a population, and continuously updates the network structure through the processes of selection, intersection, variation and evaluation. However, the network structure search method requires complete training of the network before evaluating the network performance, which consumes a lot of time and computing resources.
Document 2, "Bowen Baker, Otkrist Gupta1, Ramesh rake: additive Neural Architecture Search Performance prediction, international Conference on Learning recovery (2018)", predicts the final Performance of the network by using the time sequence information of the network training earlier stage, and introduces an "Early Stop" mechanism to terminate the training process of the network with poor effect in advance. Although the method has a certain acceleration effect on the network search algorithm, the method still needs to carry out partial training on the network, thereby limiting the acceleration effect on the structure search algorithm.
Disclosure of Invention
In order to overcome the low search efficiency of existing network structure search methods, the invention provides a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm. The method first randomly generates neural networks with different structures, trains them completely, and uses the information collected during their training to train a network performance prediction model. In the network structure search stage, the deep network structure is encoded to form a network structure code, and network structure codes are randomly generated as the initial generation of a genetic algorithm; selection, crossover, mutation and prediction are then applied to the individuals of each generation, and only the networks corresponding to individuals with higher expected performance are actually trained; finally, all individuals are evaluated and the next round of selection begins. After the algorithm finishes, the individual with the best fitness is selected as the optimal network structure for the specific task. By predicting network performance before a network is actually trained, the time the search algorithm spends training poorly performing networks is reduced, which greatly accelerates the search process.
The technical scheme adopted by the invention for solving the technical problems is as follows: a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm is characterized by comprising the following steps:
step one, data preprocessing:
Firstly, define the image classification database X = {x_1, x_2, ..., x_n}^T ∈ R^{n×b}, where x_n ∈ R^{1×b} represents the n-th sample; the class label matrix is Y = {y_1, y_2, ..., y_n}^T ∈ R^{n×l}, where y_n ∈ R^{1×l} is the one-hot label of the n-th sample, n = {1, 2, ..., N}, N is the total number of samples, l is the total number of classes and b is the spectral dimension. Each sample in the image classification database X is then normalized to the range 0-1, and N_train samples and their class labels are randomly selected from it to form the training data X_train and the corresponding class labels Y_train, where N_train < N. The remaining data and labels in the data set form the test set, denoted X_test and Y_test respectively.
Step two, determining a coding rule of a network structure:
Firstly, M different network structures are generated, where the structure code of the m-th neural network is C_m. The code consists of S stages, i.e.

C_m = {c_m^1, c_m^2, ..., c_m^S}

where c_m^s is the coding segment of the s-th stage. The s-th stage contains K_s ordered nodes, each node representing a composite operation of convolution, batch normalization and ReLU activation, denoted v_{s,1}, v_{s,2}, ..., v_{s,K_s}. Nodes with smaller numbers in a stage may connect to nodes with larger numbers, and the connections between the nodes are represented with a binary code of K_s(K_s - 1)/2 bits. The 1st bit of the binary code represents the connection (v_{s,1}, v_{s,2}): the bit is 1 if the connection exists and 0 otherwise; the next two bits represent the connections (v_{s,1}, v_{s,3}) and (v_{s,2}, v_{s,3}) between the three nodes, and so on. Setting S = 3, K_1 = 3, K_2 = 4 and K_3 = 5, the network structure code length is 19 bits, i.e.

len(C_m) = Σ_{s=1}^{S} K_s(K_s - 1)/2 = 3 + 6 + 10 = 19    (1)
Step three, collecting training data of the network performance prediction model:
m mutually different structure codes C_1, C_2, ..., C_m are randomly generated, and after automatic compilation the deep network corresponding to each code is fully trained on the specified data set. An Adam optimizer is used to learn the network parameters, and training is iterated T times in total. Each time the network is trained on one batch, the current number of iterations t and the classification accuracy Ag_t on the validation set are recorded as the data required for training the prediction model: Data = [C_m, t, Ag_t], t = {1, 2, ..., T}.
Step four, constructing and training a network performance prediction model:
A network performance prediction model f is defined. The model applies a mapping μ to a structure code C and predicts the accuracy Ap_t of the corresponding neural network on the test set after t iterations of training, i.e.:

Ap_t = f(μ(C_m), t)    (2)
In the mapping phase, the model maps the structure code C into a network structure code group consisting of S structure codes {p_1, p_2, ..., p_S}, where the bits of p_s from bit Σ_{i=1}^{s-1} K_i(K_i - 1)/2 + 1 to bit Σ_{i=1}^{s} K_i(K_i - 1)/2 (the positions belonging to the s-th stage segment) take the values of the corresponding positions of the original structure code, and the remaining positions are filled with zeros, i.e.:

p_s[idx] = C[idx] if the idx-th bit belongs to the s-th stage segment, and p_s[idx] = 0 otherwise    (3)

where p_s[idx] and C[idx] are the values at the idx-th bit of the structure codes p_s and C.
After the structure code has been mapped, p_1, p_2, ..., p_S are fed in sequence into a single-layer long short-term memory network with a hidden size of 128, and the final hidden state h of the LSTM unit is obtained; h is called the network structure feature. Meanwhile, the iteration number t is fed into a multilayer perceptron consisting of a fully connected layer of size (1, 64), a ReLU activation layer, a fully connected layer of size (64, 32) and a fully connected layer of size (32, 1), which outputs the contribution D_t of the iteration number to the final classification accuracy of the network.
The contribution D_t is multiplied element-wise with the network structure feature h:

h[id] = D_t × h[id], id = {1, 2, ..., len(h)}    (4)

The result is fed into a small fully connected module consisting of a fully connected layer of size (128, 128), a dropout layer with drop probability 0.5, a ReLU activation layer, a fully connected layer of size (128, 32), a ReLU activation layer and a fully connected layer of size (32, 1). The output of this module is the predicted value Ap_t of the final classification accuracy of the current network.
Before training the performance prediction network, randomly initializing network parameters, and solving the following optimization problem by using a back propagation algorithm to learn the network parameters to obtain the optimal parameters theta of the network:
Figure GDA0003744175040000041
wherein | · | purple sweet 2 Is the norm of L2.
Step five, initializing a genetic algorithm:
The parameters of the genetic algorithm are set, including the population size G_N, the number of iteration rounds G_T, the mutation probability G_M, the crossover probability G_C, the mutation parameter q_M, the crossover parameter q_C and the threshold fit_mgn. G_N structure codes C_1^0, C_2^0, ..., C_{G_N}^0 are randomly generated as the initial population Ge^0, recorded as generation 0, and the i-th individual in the population is denoted C_i^0. The score of each individual in the population is then evaluated to obtain the individual scores fit_i^0, and the current highest accuracy is recorded as fit_max.
Step six, selecting the individuals:
The selection operation is applied to each individual of the previous generation population Ge^{j-1}, j = 1, 2, ..., G_T: following the roulette-wheel rule, individuals are selected according to their scores fit_i^{j-1} to form the new generation population Ge^j. The higher an individual's score, the greater its probability of being selected and retained in the next generation.
Step seven, performing cross operation on the individuals:
The crossover operation acts on the coding segment of each stage of an individual in the population. Every two individuals in the population are crossed with probability G_C; the operation is that the code strings of the three stages of the two individuals are exchanged, each with probability q_C.
Step eight, performing mutation operation on individuals:
The mutation operation acts on each bit of the individual code: each binary bit of the code is inverted with probability q_M, i.e., changed from 0 to 1 or from 1 to 0.
Step nine, predicting the performance of the network corresponding to the individual:
The network structure codes and the number of iterations at the end of training are input into the network performance prediction model to obtain the expected score fit̂_i^j of each individual in the population, i.e., the expected classification accuracy after the network has been fully trained:

fit̂_i^j = f(μ(C_i^j), T)    (6)
Step ten, evaluating the individual:
The expected score fit̂_i^j is compared with the current best score fit_max. If the expected score is high enough relative to fit_max (as judged with the threshold fit_mgn), the algorithm fully trains the network, tests it on the test set, and takes the actual performance on the test set as the actual score fit_i^j of the individual. Otherwise, the network is not actually trained, and the lower expected performance is taken directly as the individual's score fit_i^j = fit̂_i^j. After the evaluation, the current best individual score fit_max is updated and the procedure returns to step six, until the total number of iterations exceeds G_T. The optimal network structure is obtained after the algorithm terminates.
The invention has the following beneficial effects. The method randomly generates neural networks with different structures, trains them completely, and uses the information collected during their training to train a network performance prediction model. In the network structure search stage, the deep network structure is encoded to form a network structure code, and network structure codes are randomly generated as the initial generation of a genetic algorithm; selection, crossover, mutation and prediction are then applied to the individuals of each generation, and only the networks corresponding to individuals with higher expected performance are actually trained; finally, all individuals are evaluated and the next round of selection begins. After the algorithm finishes, the individual with the best fitness is selected as the optimal network structure for the specific task. By predicting network performance before a network is actually trained, the time the search algorithm spends training poorly performing networks is reduced, which greatly accelerates the search process.
Because a network performance prediction model is introduced into the genetic-algorithm-based deep neural network structure optimization method, the algorithm can predict network performance before a network is actually trained and cancel the actual training of networks whose expected performance is poor, which greatly reduces the time consumed by the structure optimization algorithm. Compared with the genetic-algorithm-based network structure search algorithm of the background art, the method improves the search speed by about 55% while the performance of the searched networks remains similar.
The present invention will be described in detail with reference to the following embodiments.
Detailed Description
The deep neural network structure optimization method based on the fusion of the prediction mechanism and the genetic algorithm specifically comprises the following steps:
1. and (4) preprocessing data.
Defining an image classification database X ═ { X ═ X 1 ,x 2 ...x n } T ∈R n×b The class label vector is Y ═ Y 1 ,y 2 ...y n } T ∈R n×l Wherein x is n ∈R 1×b Represents the nth sample data, y n ∈R 1×l Is a one-hot label of the nth sample data, where N ═ 1,2.. N }, N is the total number of samples, l represents the total number of classes of the samples, and b represents the spectral dimension; normalizing each sample in the hyperspectral image data X to be in the range of 0-1, and randomly selecting N from the samples train Obtaining training data X by individual sample data and class labels thereof train And its corresponding category label Y train Wherein N is train < N. In addition, the rest data and labels in the data set are all classified into a test set, and the data and labels are respectively marked as X test And Y test
2. Determining the deep network structure coding rule.
To optimize the deep network structure, its topology must be represented by a code. During coding the network is divided into several stages; the parameters of the convolution operations within a stage (number of channels, convolution kernel size, etc.) remain unchanged, and different stages are connected through pooling operations. Each stage of the deep network contains several ordered, numbered nodes, and each node represents a composite operation of convolution, batch normalization and ReLU activation. Nodes with smaller numbers in a stage may connect to nodes with larger numbers, and the connection pattern between nodes describes how data flows through the network within that stage.
M different network structures are generated during the network structure optimization process, and the structure of the m-th (m = {1, 2, ..., M}) neural network is coded as C_m. The code consists of S stages, i.e.

C_m = {c_m^1, c_m^2, ..., c_m^S}

where c_m^s is the coding segment of the s-th (s = {1, 2, ..., S}) stage. The s-th stage contains K_s nodes, denoted v_{s,1}, v_{s,2}, ..., v_{s,K_s}, so this stage requires a K_s(K_s - 1)/2-bit binary code (hereinafter one bit of binary code is referred to as a bit) to represent the connections between its nodes. The 1st bit represents the connection (v_{s,1}, v_{s,2}): the bit is 1 if the connection exists and 0 otherwise; the next two bits represent the connections (v_{s,1}, v_{s,3}) and (v_{s,2}, v_{s,3}) between the three nodes, and so on. In the experiments S = 3, K_1 = 3, K_2 = 4 and K_3 = 5, so the total length of the network structure code is 19 bits, that is:

len(C_m) = Σ_{s=1}^{S} K_s(K_s - 1)/2 = 3 + 6 + 10 = 19    (1)

where len() represents the length of the code (i.e., the number of bits in the binary code).
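As an illustrative, non-limiting sketch of this encoding (the helper names are chosen here for clarity and do not appear in the patent text), the following Python snippet builds a random 19-bit structure code for S = 3 stages with K_1 = 3, K_2 = 4, K_3 = 5 and decodes one stage segment back into its node connections:

```python
import random

K = [3, 4, 5]  # number of nodes per stage: K_1, K_2, K_3

def stage_code_length(k):
    # one bit for every ordered pair (v_i, v_j) with i < j
    return k * (k - 1) // 2

def random_structure_code(K):
    # concatenation of the per-stage segments; total length is 3 + 6 + 10 = 19 bits
    return [random.randint(0, 1) for _ in range(sum(stage_code_length(k) for k in K))]

def decode_stage(segment, k):
    # map one stage segment back to its connections (v_i, v_j); bit order follows
    # the description: (v1,v2), (v1,v3), (v2,v3), (v1,v4), ...
    connections, idx = [], 0
    for j in range(2, k + 1):
        for i in range(1, j):
            if segment[idx]:
                connections.append((i, j))
            idx += 1
    return connections

C = random_structure_code(K)
print(len(C))                      # 19
print(decode_stage(C[:3], K[0]))   # connections within stage 1
```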
3. Collecting training data for the network performance prediction model.
m mutually different structure codes C_1, C_2, ..., C_m are randomly generated. After the codes are generated, they are automatically compiled into computation graphs, and the deep networks corresponding to these graphs are fully trained on the specified data set. The network parameters are learned with an Adam optimizer whose parameters are set to a learning rate α = 0.001 and exponential decay factors β_1 = 0.9, β_2 = 0.999. The training process is iterated T times. During training, every time the network is trained on one batch, the number of iterations t the current network has experienced and its classification accuracy Ag_t on the validation set are recorded, giving the data required for training the prediction model: Data = [C_m, t, Ag_t], t = {1, 2, ..., T}.
4. Constructing and training the network performance prediction model.
Denote the network performance prediction model as f. The model first applies a mapping μ to a structure code C_m, and from the mapping result μ(C_m) it predicts the accuracy Ap_t of the corresponding neural network on the test set after t iterations of training, i.e.:

Ap_t = f(μ(C_m), t)    (2)
the specific structure of the prediction model is as follows:
(a) structure code mapping
In the mapping phase, the model maps a single structure code C into a network structure code group consisting of S structure codes {p_1, p_2, ..., p_S}. Denoting the mapping process as μ, the mapping from a structure code to a structure code group can be expressed as:

μ(C) = {p_1, p_2, ..., p_S}

where the bits of p_s from bit Σ_{i=1}^{s-1} K_i(K_i - 1)/2 + 1 to bit Σ_{i=1}^{s} K_i(K_i - 1)/2 (the positions belonging to the s-th stage segment) take the values of the corresponding positions of the original structure code, and all remaining positions are filled with zeros. Denoting the values at the idx-th bit of the structure codes p_s and C as p_s[idx] and C[idx], the mapping can be expressed as:

p_s[idx] = C[idx] if the idx-th bit belongs to the s-th stage segment, and p_s[idx] = 0 otherwise    (3)
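A minimal Python sketch of this mapping, reusing the K values assumed in the encoding sketch above (illustrative only):

```python
def map_structure_code(C, K):
    # mu(C): produce S zero-padded codes p_1..p_S, each of the same length as C,
    # keeping only the bits that belong to its own stage segment
    groups, start = [], 0
    for k in K:
        seg_len = k * (k - 1) // 2
        p = [0] * len(C)
        p[start:start + seg_len] = C[start:start + seg_len]
        groups.append(p)
        start += seg_len
    return groups  # [p_1, p_2, ..., p_S]
```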
(b) Network performance prediction model f:
After the structure code has been mapped into the structure code group {p_1, p_2, ..., p_S}, the codes p_1, p_2, ..., p_S are fed in sequence into a single-layer long short-term memory network (LSTM) with a hidden size of 128, finally yielding a one-dimensional array h of length 128, which is called the network structure feature of the predicted network.
While the network structure feature h is being obtained, the iteration number t is fed into a multilayer perceptron consisting of a fully connected layer of size (1, 64), a ReLU activation layer, a fully connected layer of size (64, 32) and a fully connected layer of size (32, 1). The multilayer perceptron outputs a scalar, which gives the contribution D_t of the iteration number to the final classification accuracy of the network.
Then the contribution degree D t Element-by-element multiplication is performed with the structural feature h of the network, and the operation can be expressed as:
h[id]=D t ×h[id],id={1,2,...,len(h)} (4)
and passing the operation result through a small-sized full-connection module. The full-link module is composed of a full-link module with the size of (128 ), a random deactivation layer with the deactivation probability of 0.5, a ReLU activation function layer, a full-link layer with the size of (128,32), a ReLU activation function layer and a full-link layer with the size of (32,1) which are sequentially connected. All-purposeThe output result of the connection module is the predicted value Ap of the final classification accuracy of the current network t
Before using a network performance prediction model to guide the network optimization process, random initialization needs to be performed on network parameters, and a back propagation algorithm is used to solve the following optimization problem for network training, so as to obtain the optimal parameter theta of the network:
Figure GDA0003744175040000091
wherein r is the number of samples contained in a single training batch, | · | | computationally 2 Is the norm of L2.
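A corresponding sketch of one training step, assuming the collected Data = [C_m, t, Ag_t] records have already been converted into tensors (the tensor shapes and the use of mean squared error over a batch are assumptions):

```python
model = PerformancePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # squared L2 distance between Ap_t and Ag_t, averaged over the batch

def train_step(code_groups, iters, accs):
    # code_groups: (r, S, 19); iters: (r, 1); accs: (r, 1) for one batch of r samples
    optimizer.zero_grad()
    loss = loss_fn(model(code_groups, iters), accs)
    loss.backward()
    optimizer.step()
    return loss.item()
```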
5. Initializing the genetic algorithm.
First, the parameters of the genetic algorithm are determined, namely the population size G_N, the number of iteration rounds G_T, the mutation probability G_M, the crossover probability G_C, the mutation parameter q_M, the crossover parameter q_C and the threshold fit_mgn. G_N structure codes C_1^0, C_2^0, ..., C_{G_N}^0 are randomly generated as the generation-0 initial population Ge^0, and the i-th individual in the population (i.e., the i-th structure code) is denoted C_i^0. The deep network corresponding to each individual in the population is then fully trained, and after testing on the test set the classification accuracy of the network is taken as the score fit_i^0 of the individual; the current highest accuracy is recorded as fit_max.
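Continuing the sketches above, the initialization might look as follows; the parameter values are purely illustrative, and fully_train_and_score is a hypothetical stand-in for compiling, fully training and testing the network of an individual:

```python
G_N, G_T = 20, 50        # population size and number of iteration rounds
G_C, G_M = 0.8, 0.2      # crossover and mutation probabilities
q_C, q_M = 0.3, 0.05     # crossover and mutation parameters
fit_mgn = 0.02           # margin used when judging predicted scores

def fully_train_and_score(code):
    # hypothetical stand-in: compile `code` into a network, train it completely,
    # and return its classification accuracy on the test set
    return random.random()  # dummy value for the sketch

population = [random_structure_code(K) for _ in range(G_N)]   # generation 0
scores = [fully_train_and_score(ind) for ind in population]   # fit_i^0
fit_max = max(scores)                                          # current best score
```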
6. Performing the selection operation on individuals.
The selection operation O_s is then applied to the individuals in the population. From the (j-1)-th generation population Ge^{j-1}, j = 1, 2, ..., G_T, the j-th generation population Ge^j is selected according to the roulette-wheel rule; the selection is based on the score fit_i^{j-1} of each individual in the current population. With roulette-wheel selection, individuals with higher scores have a greater probability of remaining in the next generation, and the process is iterated.
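A sketch of roulette-wheel selection over the current scores (this is the standard fitness-proportionate form; the exact sampling details are not specified in the text):

```python
import random

def roulette_select(population, scores):
    # fitness-proportionate (roulette-wheel) selection: the probability of an
    # individual being copied into the next generation is proportional to its score
    chosen = random.choices(population, weights=scores, k=len(population))
    return [list(ind) for ind in chosen]
```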
7. Performing the crossover operation on individuals.
Individuals in the population are crossed with probability G_C using the crossover parameter q_C. The crossover acts on the code string segment of each stage in an individual: every two individuals in the population are crossed with probability G_C, and the specific operation is that the code strings of the three stages of the two individuals are exchanged, each with probability q_C.
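A sketch of the stage-wise crossover; pairing adjacent individuals is an assumption, and the stage boundaries are computed from the K values assumed above:

```python
import random

def crossover(pop, K, G_C, q_C):
    # pair up adjacent individuals; with probability G_C a pair is crossed, and each
    # of the three stage segments is then exchanged independently with probability q_C
    bounds, start = [], 0
    for k in K:
        seg = k * (k - 1) // 2
        bounds.append((start, start + seg))
        start += seg
    for a, b in zip(pop[0::2], pop[1::2]):
        if random.random() < G_C:
            for lo, hi in bounds:
                if random.random() < q_C:
                    a[lo:hi], b[lo:hi] = b[lo:hi], a[lo:hi]
    return pop
```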
8. Performing the mutation operation on individuals.
Individuals that did not undergo crossover mutate with probability G_M: each binary bit of the individual's code string is inverted with probability q_M, i.e., changed from 0 to 1 or from 1 to 0. The mutation process acts on single binary bits.
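And a corresponding sketch of the bit-flip mutation, applied here to every individual with probability G_M rather than only to individuals that did not cross, which is a simplification:

```python
import random

def mutate(pop, G_M, q_M):
    # with probability G_M an individual mutates; each bit of a mutating individual
    # is then flipped (0 <-> 1) independently with probability q_M
    for individual in pop:
        if random.random() < G_M:
            for i in range(len(individual)):
                if random.random() < q_M:
                    individual[i] = 1 - individual[i]
    return pop
```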
9. Predicting the performance of the network corresponding to an individual.
The network structure codes and the number of iterations at the end of training are input into the network performance prediction model to obtain the expected score fit̂_i^j of each individual in the population, i.e., the expected classification accuracy after the network has been fully trained:

fit̂_i^j = f(μ(C_i^j), T)    (6)
10. Performing the evaluation operation on individuals.
After the expected score of an individual has been obtained in step 9, the expected score fit̂_i^j is compared with the current best score fit_max. If the expected score is high enough relative to fit_max (as judged with the threshold fit_mgn), the expected performance of the individual is good: the algorithm fully trains the individual, tests it on the test set, and takes the actual performance on the test set as the individual's actual score. If the expected score falls below this threshold, the expected performance of the individual is poor: such an individual is not actually trained, and the lower expected performance is taken directly as the individual's score fit_i^j. After the evaluation, the current best individual score fit_max is updated and the procedure returns to step 6, until the total number of iterations of the algorithm exceeds G_T. After the algorithm terminates, the optimal network structure is given.
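Putting steps 6 to 10 together, one generation of the prediction-gated loop might look as follows, continuing the sketches above; the acceptance test expected >= fit_max - fit_mgn is one plausible reading of how the threshold fit_mgn is used, and predict_score wraps the trained performance predictor:

```python
model.eval()  # disable dropout when the predictor is used for inference
T = 100       # illustrative value for the total number of training iterations

def predict_score(code):
    # stand-in for equation (6): feed mu(C) and the final iteration count T
    # into the trained performance predictor
    group = torch.tensor([map_structure_code(code, K)], dtype=torch.float32)
    with torch.no_grad():
        return model(group, torch.tensor([[float(T)]])).item()

for generation in range(1, G_T + 1):
    population = roulette_select(population, scores)           # step 6
    population = crossover(population, K, G_C, q_C)            # step 7
    population = mutate(population, G_M, q_M)                  # step 8
    scores = []
    for individual in population:
        expected = predict_score(individual)                   # step 9: expected score
        if expected >= fit_max - fit_mgn:                      # step 10: promising individual
            scores.append(fully_train_and_score(individual))   # train and test for the actual score
        else:
            scores.append(expected)                            # keep the lower expected score
    fit_max = max(fit_max, max(scores))                        # update the current best score

best = population[scores.index(max(scores))]  # optimal structure code after G_T rounds
```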
The method has a good acceleration effect on the optimization of various image classification network structures. Taking the optimization of a classification network structure on the Pavia University data set as an example, the traditional genetic-algorithm-based network structure optimization method needs 0.99 hours to give an optimal deep network structure with a classification accuracy of 89.1%, whereas the proposed method gives an optimal deep network structure with a classification accuracy of 88.6% in only 0.635 hours. Therefore, the deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm greatly accelerates the structure optimization process, while the classification accuracy of the finally searched optimal network structure on the specified data set is almost the same as that obtained by the traditional genetic-algorithm-based method.

Claims (1)

1. A deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm is characterized by comprising the following steps:
step one, data preprocessing:
first, define the image classification database X = {x_1, x_2, ..., x_n}^T ∈ R^{n×b}, where x_n ∈ R^{1×b} represents the n-th sample; the class label matrix is Y = {y_1, y_2, ..., y_n}^T ∈ R^{n×l}, where y_n ∈ R^{1×l} is the one-hot label of the n-th sample, n = {1, 2, ..., N}, N is the total number of samples, l is the total number of classes and b is the spectral dimension; each sample in the image classification database X is then normalized to the range 0-1, and N_train samples and their class labels are randomly selected from it to form the training data X_train and the corresponding class labels Y_train, where N_train < N; the remaining data and labels in the data set form the test set, denoted X_test and Y_test respectively;
Step two, determining a coding rule of a network structure:
firstly, M different network structures are generated, where the structure code of the m-th neural network is C_m; the code consists of S stages, i.e.

C_m = {c_m^1, c_m^2, ..., c_m^S}

where c_m^s is the coding segment of the s-th stage; the s-th stage contains K_s ordered nodes, each node representing a composite operation of convolution, batch normalization and ReLU activation, denoted v_{s,1}, v_{s,2}, ..., v_{s,K_s}; nodes with smaller numbers in a stage may connect to nodes with larger numbers, and the connections between the nodes are represented with a binary code of K_s(K_s - 1)/2 bits; the 1st bit of the binary code represents the connection (v_{s,1}, v_{s,2}): the bit is 1 if the connection exists and 0 otherwise; the next two bits represent the connections (v_{s,1}, v_{s,3}) and (v_{s,2}, v_{s,3}) between the three nodes; setting S = 3, K_1 = 3, K_2 = 4 and K_3 = 5, the network structure code length is 19 bits, i.e.

len(C_m) = Σ_{s=1}^{S} K_s(K_s - 1)/2 = 3 + 6 + 10 = 19    (1)

where len() represents the length of the structure code in the parentheses;
step three, collecting training data of the network performance prediction model:
m mutually different structure codes C_1, C_2, ..., C_m are randomly generated, and after automatic compilation the deep network corresponding to each code is fully trained on the specified data set; an Adam optimizer is used to learn the network parameters, and training is iterated T times in total; each time the network is trained on one batch, the current number of iterations t and the classification accuracy Ag_t on the validation set are recorded as the data required for training the prediction model: Data = [C_m, t, Ag_t], t = {1, 2, ..., T};
Step four, constructing and training a network performance prediction model:
a network performance prediction model f is defined; the model applies a mapping μ to a structure code C and predicts the accuracy Ap_t of the corresponding neural network on the test set after t iterations of training, i.e.:

Ap_t = f(μ(C_m), t)    (2)

in the mapping phase, the model maps the structure code C into a network structure code group consisting of S structure codes {p_1, p_2, ..., p_S}, where the bits of p_s from bit Σ_{i=1}^{s-1} K_i(K_i - 1)/2 + 1 to bit Σ_{i=1}^{s} K_i(K_i - 1)/2 (the positions belonging to the s-th stage segment) take the values of the corresponding positions of the original structure code, and the remaining positions are filled with zeros, i.e.:

p_s[idx] = C[idx] if the idx-th bit belongs to the s-th stage segment, and p_s[idx] = 0 otherwise    (3)

where p_s[idx] and C[idx] are the values at the idx-th bit of the structure codes p_s and C;
after the structure code is mapped, p is mapped 1 ,p 2 ...p s Sequentially inputting a single-layer long and short term memory network with a hidden layer size of 128 and finally obtaining a hidden state h of a long and short term memory network unit, wherein the hidden state h is called a network structure characteristic; meanwhile, inputting the iteration times t into a multilayer perceptron consisting of a full-link layer with the size of (1,64), a ReLU activation function layer, a full-link layer with the size of (64,32) and a full-link layer with the size of (32,1), and obtaining the contribution D of the iteration times to the final classification accuracy of the network t
Degree of contribution D t Element-by-element multiplication is carried out with the structural feature h of the network:
h[id]=D t ×h[id],id={1,2,...,len(h)} (4)
inputting the calculation result into a small-sized full-connection module; it contains a full-link layer of size (128 ), a random deactivation layer with deactivation probability of 0.5, a ReLU activation function layer, a full-link layer of size (128,32), a ReLU activation function layer and a full-link layer of size (32, 1); the output result of the full connection module is the predicted value Ap of the final classification accuracy of the current network t
Before training the performance prediction network, randomly initializing network parameters, and solving the following optimization problem by using a back propagation algorithm to learn the network parameters to obtain the optimal parameters theta of the network:
Figure FDA0003744175030000025
wherein | · | purple sweet 2 Is the norm of L2;
step five, initializing a genetic algorithm:
the parameters of the genetic algorithm are set, including the population size G_N, the number of iteration rounds G_T, the mutation probability G_M, the crossover probability G_C, the mutation parameter q_M, the crossover parameter q_C and the threshold fit_mgn; G_N structure codes C_1^0, C_2^0, ..., C_{G_N}^0 are randomly generated as the initial population Ge^0, recorded as generation 0, and the i-th individual in the population is denoted C_i^0; the score of each individual in the population is then evaluated to obtain the individual scores fit_i^0, and the current highest accuracy is recorded as fit_max;
Step six, selecting the individuals:
the selection operation is applied to each individual of the previous generation population Ge^{j-1}, j = 1, 2, ..., G_T: following the roulette-wheel rule, individuals are selected according to their scores fit_i^{j-1} to form the new generation population Ge^j; the higher an individual's score, the greater its probability of being selected and retained in the next generation;
step seven, performing cross operation on the individuals:
the crossover operation acts on the coding segment of each stage of the individuals in the population; every two individuals in the population are crossed with probability G_C, the operation being that the code strings of the three stages of the two individuals are exchanged, each with probability q_C;
step eight, performing mutation operation on individuals:
the mutation operation acts on each bit of the individual code: each binary bit of the code is inverted with probability q_M, i.e., changed from 0 to 1 or from 1 to 0;
step nine, predicting the performance of the network corresponding to the individual:
the network structure codes and the number of iterations at the end of training are input into the network performance prediction model to obtain the expected score fit̂_i^j of each individual in the population, i.e., the expected classification accuracy after the network has been fully trained:

fit̂_i^j = f(μ(C_i^j), T)    (6)
step ten, evaluating the individual:
the expected score fit̂_i^j is compared with the current best score fit_max; if the expected score is high enough relative to fit_max (as judged with the threshold fit_mgn), the algorithm fully trains the network, tests it on the test set, and takes the actual performance on the test set as the actual score fit_i^j of the individual; otherwise, the network is not actually trained, and the lower expected performance is taken directly as the individual's score fit_i^j = fit̂_i^j; after the evaluation, the current best individual score fit_max is updated and the procedure returns to step six, until the total number of iterations exceeds G_T; the optimal network structure is obtained after the algorithm terminates.
CN201910696239.XA 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm Active CN110490320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910696239.XA CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910696239.XA CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Publications (2)

Publication Number Publication Date
CN110490320A CN110490320A (en) 2019-11-22
CN110490320B true CN110490320B (en) 2022-08-23

Family

ID=68548791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910696239.XA Active CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Country Status (1)

Country Link
CN (1) CN110490320B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415009B (en) * 2020-03-19 2021-02-09 四川大学 Convolutional variational self-encoder network structure searching method based on genetic algorithm
CN112084877B (en) * 2020-08-13 2023-08-18 西安理工大学 NSGA-NET-based remote sensing image recognition method
CN112001485B (en) * 2020-08-24 2024-04-09 平安科技(深圳)有限公司 Group convolution number searching method and device
CN112183749B (en) * 2020-10-26 2023-04-18 天津大学 Deep learning library test method based on directed model variation
CN114842328B (en) * 2022-03-22 2024-03-22 西北工业大学 Hyperspectral change detection method based on collaborative analysis autonomous perception network structure
CN114943866B (en) * 2022-06-17 2024-04-02 之江实验室 Image classification method based on evolutionary neural network structure search
CN115994575B (en) * 2023-03-22 2023-06-02 方心科技股份有限公司 Power failure diagnosis neural network architecture design method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915445A (en) * 2012-09-17 2013-02-06 杭州电子科技大学 Method for classifying hyperspectral remote sensing images of improved neural network
CN103971162A (en) * 2014-04-04 2014-08-06 华南理工大学 Method for improving BP (back propagation) neutral network and based on genetic algorithm
CN105303252A (en) * 2015-10-12 2016-02-03 国家计算机网络与信息安全管理中心 Multi-stage nerve network model training method based on genetic algorithm
CN106503802A (en) * 2016-10-20 2017-03-15 上海电机学院 A kind of method of utilization genetic algorithm optimization BP neural network system
US9785886B1 (en) * 2017-04-17 2017-10-10 SparkCognition, Inc. Cooperative execution of a genetic algorithm with an efficient training algorithm for data-driven model creation
CN108021983A (en) * 2016-10-28 2018-05-11 谷歌有限责任公司 Neural framework search
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN109243172A (en) * 2018-07-25 2019-01-18 华南理工大学 Traffic flow forecasting method based on genetic algorithm optimization LSTM neural network
CN110020667A (en) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司 Searching method, system, storage medium and the equipment of neural network structure

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915445A (en) * 2012-09-17 2013-02-06 杭州电子科技大学 Method for classifying hyperspectral remote sensing images of improved neural network
CN103971162A (en) * 2014-04-04 2014-08-06 华南理工大学 Method for improving BP (back propagation) neutral network and based on genetic algorithm
CN105303252A (en) * 2015-10-12 2016-02-03 国家计算机网络与信息安全管理中心 Multi-stage nerve network model training method based on genetic algorithm
CN106503802A (en) * 2016-10-20 2017-03-15 上海电机学院 A kind of method of utilization genetic algorithm optimization BP neural network system
CN108021983A (en) * 2016-10-28 2018-05-11 谷歌有限责任公司 Neural framework search
US9785886B1 (en) * 2017-04-17 2017-10-10 SparkCognition, Inc. Cooperative execution of a genetic algorithm with an efficient training algorithm for data-driven model creation
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN109243172A (en) * 2018-07-25 2019-01-18 华南理工大学 Traffic flow forecasting method based on genetic algorithm optimization LSTM neural network
CN110020667A (en) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司 Searching method, system, storage medium and the equipment of neural network structure

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ACCELERATING NEURAL ARCHITECTURE SEARCH USING PERFORMANCE PREDICTION; Bowen Baker et al.; ICLR 2018; 20181231; 1-19 *
Genetic CNN; Lingxi Xie et al.; 2017 IEEE International Conference on Computer Vision; 20171231; 1388-1397 *
Hyperspectral Image Classification Based on Convolutional Neural Networks With Adaptive Network Structure; Chen Ding et al.; 2018 International Conference on Orange Technologies; 20190506; 1-5 *
NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm; Zhichao Lu et al.; arXiv; 20190418; 1-13 *
Identifying connections in biological neural networks with a dynamic Bayesian network structure search method (in Chinese); Chen Xiaoyan et al.; Life Science Research; 20171231; vol. 21, no. 6; 527-533 *
A variable-structure convolutional neural network method for extracting features from remote sensing images (in Chinese); Wang Huabin et al.; Acta Geodaetica et Cartographica Sinica; 20190531; vol. 48, no. 5; 583-596 *

Also Published As

Publication number Publication date
CN110490320A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN110490320B (en) Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
WO2022083624A1 (en) Model acquisition method, and device
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
US11087086B2 (en) Named-entity recognition through sequence of classification using a deep learning neural network
CN109753571B (en) Scene map low-dimensional space embedding method based on secondary theme space projection
CN111898689A (en) Image classification method based on neural network architecture search
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
CN111882042B (en) Neural network architecture automatic search method, system and medium for liquid state machine
Tirumala Evolving deep neural networks using coevolutionary algorithms with multi-population strategy
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN114625882B (en) Network construction method for improving unique diversity of image text description
CN113239897A (en) Human body action evaluation method based on space-time feature combination regression
CN112084877A (en) NSGA-NET-based remote sensing image identification method
Jastrzebska et al. Fuzzy cognitive map-driven comprehensive time-series classification
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
CN111461229A (en) Deep neural network optimization and image classification method based on target transfer and line search
CN116167353A (en) Text semantic similarity measurement method based on twin long-term memory network
CN115422945A (en) Rumor detection method and system integrating emotion mining
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
CN114863508A (en) Expression recognition model generation method, medium and device of adaptive attention mechanism
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN112416358B (en) Intelligent contract code defect detection method based on structured word embedded network
CN111259860A (en) Multi-order characteristic dynamic fusion sign language translation method based on data self-driving
Qu et al. Two-stage coevolution method for deep CNN: A case study in smart manufacturing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant