CN110490320A - Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm - Google Patents

Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Info

Publication number
CN110490320A
CN110490320A (application CN201910696239.XA)
Authority
CN
China
Prior art keywords
network
individual
coding
data
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910696239.XA
Other languages
Chinese (zh)
Other versions
CN110490320B (en)
Inventor
魏巍
徐松正
李威
王聪
张艳宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN201910696239.XA
Publication of CN110490320A
Application granted
Publication of CN110490320B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm, addressing the low search efficiency of existing network structure search methods. The technical solution is to first encode the deep network structure to form a structure coding, then randomly generate structure codings as the initial population of a genetic algorithm. The individuals in the population then undergo selection, crossover, mutation, and prediction, and only the networks corresponding to individuals with higher predicted performance are actually trained. Finally, the performance of all individuals is evaluated and the next round of selection begins. When the algorithm terminates, the individual with the best fitness is selected as the optimal network structure for the given task. By predicting network performance before actual training, the time the search algorithm spends training low-value networks is reduced, greatly accelerating the search process.

Description

Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm
Technical field
The present invention relates to network structure search methods, and more particularly to a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm.
Background technique
" Lingxi Xie, the Alan Yuille:Genetic CNN.Computer Vision and Pattern of document 1 Recognition (2017) " proposes a kind of network structure searching method based on genetic algorithm, this method introduce Darwin into Change and discuss thought, regard network structure as individual in population, network is constantly updated by selection, intersection, variation and evaluation process Structure.However, the network structure searching method before evaluating network performance, needs completely to train network, This process consumes plenty of time and computing resource.
" Bowen Baker, Otkrist Gupta1, the Ramesh Raskar:Accelerating Neural of document 2 Architecture Search using Performance Prediction.International Conference on Learning Representations (2018) " utilizes the time serial message of network training early period to the final performance of network It is predicted, and introduces " Early Stop " mechanism, terminate the training process of the poor network of effect in advance.This method is although right Searching algorithm has certain acceleration, but this method still needs to carry out network part training, to limit To the acceleration effect of search structure algorithm.
Summary of the invention
To overcome the low search efficiency of existing network structure search methods, the present invention provides a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm. The method randomly generates neural networks of various structures and fully trains them, using information from their training processes to train a network performance prediction model. In the network structure search phase, the deep network structure is first encoded to form a structure coding; structure codings are then generated at random as the initial population of the genetic algorithm. The individuals in the population undergo selection, crossover, mutation, and prediction, and only the networks corresponding to individuals with higher predicted performance are actually trained. Finally, the performance of all individuals is evaluated and the next round of selection begins. When the algorithm terminates, the individual with the best fitness is selected as the optimal network structure for the given task. By predicting network performance before actual training, the time the search algorithm spends training low-value networks is reduced, greatly accelerating the search process.
The technical solution adopted by the present invention to solve the technical problem is a deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm, characterized by comprising the following steps:
Step 1: data preprocessing:
First define the image classification database $X = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^{N \times b}$, where $x_n \in \mathbb{R}^{1 \times b}$ denotes the $n$-th sample; its class label matrix is $Y = [y_1, y_2, \ldots, y_N]^T \in \mathbb{R}^{N \times L}$, where $y_n \in \mathbb{R}^{1 \times L}$ is the one-hot label of the $n$-th sample, $n = 1, 2, \ldots, N$, $N$ is the total number of samples, $L$ is the number of classes, and $b$ is the spectral dimension. Each sample in $X$ is then normalized to the range $[0, 1]$, and $N_{train}$ samples and their class labels are randomly selected to obtain the training data $X_{train}$ and corresponding class labels $Y_{train}$, where $N_{train} < N$. In addition, all remaining data and labels in the dataset are assigned to the test set, denoted $X_{test}$ and $Y_{test}$.
Step 2: determining the coding rule of the network structure:
First generate $M$ different network structures, and denote the structure coding of the $m$-th neural network by $C_m$. The coding contains $S$ stages, i.e. $C_m = \{C_m^1, C_m^2, \ldots, C_m^S\}$, where $C_m^s$ is the coding segment of the $s$-th stage. Stage $s$ contains $K_s$ nodes, each representing a hybrid operation composed of convolution + batch normalization + ReLU activation and denoted $v_{s,1}, v_{s,2}, \ldots, v_{s,K_s}$. Within a stage, lower-numbered nodes connect to higher-numbered nodes, and the connections between nodes are represented by a $\frac{1}{2}K_s(K_s-1)$-bit binary code. The 1st bit represents the connection between $(v_{s,1}, v_{s,2})$: the bit is 1 if the connection exists and 0 if not; the next two bits represent the connections to the third node, i.e. $(v_{s,1}, v_{s,3})$ and $(v_{s,2}, v_{s,3})$; and so on. Setting $S = 3$, $K_1 = 3$, $K_2 = 4$, $K_3 = 5$, the total length of the structure coding is 19, i.e.:

$len(C_m) = \sum_{s=1}^{S} \frac{1}{2} K_s (K_s - 1) = 3 + 6 + 10 = 19 \quad (1)$
Step 3: collecting the training data of the network performance prediction model:
Randomly generate $M$ mutually distinct structure codings $C_1, C_2, \ldots, C_M$; after automatic compilation, the deep network corresponding to each coding is fully trained on the specified dataset. Training uses the Adam optimizer to learn the network parameters and runs for $T$ iterations in total. Each time the network completes training on one batch, the iteration count $t$ it has undergone and its classification accuracy $Ag_t$ on the validation set are recorded, yielding the data required for training the prediction model: $data = \{C_m, t, Ag_t\}$, $t = \{1, 2, \ldots, T\}$.
Step 4: construction and training of the network performance prediction model:
Define the network performance prediction model $f$: after the input structure coding $C_m$ is given and the mapping $\mu$ is applied to it, the model predicts the accuracy $Ap_t$ of the corresponding network on the test set after $t$ training iterations, that is:

$Ap_t = f(\mu(C_m), t) \quad (2)$
In the mapping phase, the model maps the structure coding $C$ into a set of $S$ structure codings $\mu(C) = \{p_1, p_2, \ldots, p_S\}$. In $p_s$, the bits from position $\sum_{i=1}^{s-1} len(C^i) + 1$ through position $\sum_{i=1}^{s} len(C^i)$ take the values of the corresponding positions of the original coding, and all remaining positions are filled with zeros, that is:

$p_s[idx] = \begin{cases} C[idx], & \sum_{i=1}^{s-1} len(C^i) < idx \le \sum_{i=1}^{s} len(C^i) \\ 0, & \text{otherwise} \end{cases} \quad (3)$

where $p[idx]$ and $C[idx]$ denote the value of the $idx$-th bit of the structure codings $p$ and $C$.
After the structure coding has been mapped, $p_1, p_2, \ldots, p_S$ are fed in sequence into a single-layer long short-term memory (LSTM) network with hidden size 128, finally yielding the hidden state $h$ of the LSTM unit, referred to as the network structure feature. Meanwhile, the iteration count $t$ is fed into a multi-layer perceptron composed of a fully connected layer of size (1, 64), a ReLU activation layer, a fully connected layer of size (64, 32), and a fully connected layer of size (32, 1), producing the contribution degree $D_t$ of the iteration count to the final classification accuracy of the network.
The contribution degree $D_t$ is multiplied element-wise with the structure feature $h$ of the network:

$h[id] = D_t \times h[id], \quad id = \{1, 2, \ldots, len(h)\} \quad (4)$
The result is fed into a small fully connected block consisting of a fully connected layer of size (128, 128), a random dropout layer with drop probability 0.5, a ReLU activation layer, a fully connected layer of size (128, 32), a ReLU activation layer, and a fully connected layer of size (32, 1). The output of the fully connected block is the predicted value $Ap_t$ of the final classification accuracy of the current network.
Before the performance prediction network is trained, its parameters are randomly initialized, and backpropagation is used to solve the following optimization problem to learn the network parameters, yielding the optimal parameters $\theta$:

$\theta = \arg\min_\theta \frac{1}{r} \sum_{i=1}^{r} \left\| Ap_t^{(i)} - Ag_t^{(i)} \right\|_2^2 \quad (5)$

where the sum runs over a training batch of $r$ samples and $\|\cdot\|_2$ is the L2 norm.
Step 5: initializing the genetic algorithm:
Set the parameters of the genetic algorithm, including the number of individuals in the population $G_N$, the number of iteration rounds $G_T$, the mutation probability $G_M$, the crossover probability $G_C$, the mutation parameter $q_M$, the crossover parameter $q_C$, and the threshold $A_{mgn}$, and randomly generate $G_N$ structure codings as the initial population $Ge_0$, recorded as generation 0, with the $i$-th individual in the population denoted $Ge_0^i$. The score of each individual in the population is then assessed to obtain its score $fit^i$, and the current highest accuracy is recorded as $fit_{max}$.
Step 6: performing a selection operation on individuals:
The selection operation is applied to each individual in the previous generation's population: from the population $Ge_{j-1}$, $j = 1, 2, \ldots, G_T$, a new generation $Ge_j$ is selected by roulette-wheel selection according to the individuals' scores $fit^i$; the higher an individual's score, the greater its probability of being chosen and surviving into the next generation.
Step 7: performing a crossover operation on individuals:
The crossover operation acts on the coding segments of each stage of the individuals: every pair of individuals in the population crosses with probability $G_C$, and the crossover exchanges each of the three stage segments between the two individuals with probability $q_C$.
Step 8: performing a mutation operation on individuals:
The mutation operation acts on each bit of an individual's coding: each binary digit of the coding is inverted with probability $q_M$, i.e. changed from 0 to 1 or from 1 to 0.
Step 9: predicting the performance of the networks corresponding to individuals:
The structure codings, together with the iteration count at the end of training, are input into the network performance prediction model to obtain the expected score $fit_{pre}^i$ of each individual in the population, i.e. the expected classification accuracy of the network after full training.
Step 10: performing an evaluation operation on individuals:
The expected score $fit_{pre}^i$ is compared with the current best score $fit_{max}$. If $fit_{pre}^i \ge fit_{max} - A_{mgn}$, the algorithm fully trains the network, tests it on the test set, and takes the actual performance on the test set as the individual's actual score $fit^i$. If $fit_{pre}^i < fit_{max} - A_{mgn}$, the network is not actually trained, and the lower predicted performance alone is taken as the individual's score $fit^i$. After the assessment, the current best individual score $fit_{max}$ is updated, and the process returns to step 6 until the total number of iterations exceeds $G_T$. The optimal network structure is obtained when the algorithm terminates.
The beneficial effects of the present invention are as follows. The method randomly generates neural networks of various structures and fully trains them, using information from their training processes to train a network performance prediction model. In the network structure search phase, the deep network structure is first encoded to form a structure coding; structure codings are then generated at random as the initial population of the genetic algorithm. The individuals in the population undergo selection, crossover, mutation, and prediction, and only the networks corresponding to individuals with higher predicted performance are actually trained. Finally, the performance of all individuals is evaluated and the next round of selection begins. When the algorithm terminates, the individual with the best fitness is selected as the optimal network structure for the given task. By predicting network performance before actual training, the time the search algorithm spends training low-value networks is reduced, greatly accelerating the search process.
Because a network performance prediction model is introduced into the genetic-algorithm-based deep neural network structure optimization method, the algorithm predicts each network's performance before actually training it and cancels the actual training of networks with poor predicted performance, thereby greatly reducing the time consumed by the structure optimization algorithm. Compared with the genetic-algorithm-based network structure search algorithm of the background art, this method improves search speed by 55% while the performance of the networks found remains similar.
The present invention is elaborated below with reference to specific embodiments.
Specific embodiment
The specific steps of the deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm according to the present invention are as follows:
1. Data preprocessing.
Define the image classification database $X = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^{N \times b}$ with class label matrix $Y = [y_1, y_2, \ldots, y_N]^T \in \mathbb{R}^{N \times L}$, where $x_n \in \mathbb{R}^{1 \times b}$ denotes the $n$-th sample and $y_n \in \mathbb{R}^{1 \times L}$ is its one-hot label, $n = 1, 2, \ldots, N$; $N$ is the total number of samples, $L$ the number of classes, and $b$ the spectral dimension. After normalizing each sample of the hyperspectral image data $X$ to the range $[0, 1]$, $N_{train}$ samples and their class labels are randomly selected to obtain the training data $X_{train}$ and corresponding class labels $Y_{train}$, where $N_{train} < N$. In addition, all remaining data and labels in the dataset are assigned to the test set, denoted $X_{test}$ and $Y_{test}$.
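As an illustration, a minimal Python sketch of this preprocessing step might look as follows; per-sample min-max scaling is an assumption (the patent does not specify the exact normalization scheme), and the function name `preprocess` is ours:

```python
import numpy as np

def preprocess(X, Y, n_train, seed=0):
    """Normalize each sample to [0, 1] and split into train/test sets."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    X = (X - lo) / (hi - lo + 1e-12)            # per-sample scaling to [0, 1]
    idx = np.random.default_rng(seed).permutation(len(X))
    tr, te = idx[:n_train], idx[n_train:]       # N_train random training samples
    return X[tr], Y[tr], X[te], Y[te]           # X_train, Y_train, X_test, Y_test
```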
2. Determining the coding rule of the deep network structure.
In order to optimize the deep network structure, its topology must be represented by a coding. The encoding process divides the network into multiple stages; within a stage, the parameters of the convolution operations (number of channels, kernel size, etc.) are kept constant, and different stages are connected by pooling operations. Each stage of the deep network contains several ordered, numbered nodes, each representing a "convolution + batch normalization + ReLU activation" hybrid operation. Within a stage, lower-numbered nodes may connect to higher-numbered nodes, and the connection pattern between nodes describes how data flows through the network at that stage.
During structure optimization, $M$ different network structures are generated. Denote the structure coding of the $m$-th ($m = \{1, 2, \ldots, M\}$) neural network by $C_m$; the coding contains $S$ stages, i.e. $C_m = \{C_m^1, C_m^2, \ldots, C_m^S\}$, where $C_m^s$ is the coding segment of the $s$-th ($s = 1, 2, \ldots, S$) stage. Stage $s$ contains $K_s$ nodes, denoted $v_{s,1}, v_{s,2}, \ldots, v_{s,K_s}$; the stage therefore needs a $\frac{1}{2}K_s(K_s-1)$-position binary code (each position hereafter called a bit) to represent the connection relationships among its nodes. The 1st bit represents the connection between $(v_{s,1}, v_{s,2})$: it is 1 if the connection exists and 0 if not; the next two bits represent the connections to the third node, i.e. $(v_{s,1}, v_{s,3})$ and $(v_{s,2}, v_{s,3})$; and so on. In the experiments, $S = 3$, $K_1 = 3$, $K_2 = 4$, $K_3 = 5$, and the total length of the structure coding is 19, that is:

$len(C_m) = \sum_{s=1}^{S} \frac{1}{2} K_s (K_s - 1) = 3 + 6 + 10 = 19 \quad (1)$

where $len(\cdot)$ denotes the length (i.e. the number of binary digits) of a coding.
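A minimal sketch of this encoding, assuming each coding is stored as a flat list of bits (the helper names are illustrative, not from the patent):

```python
import random

STAGE_NODES = (3, 4, 5)        # K_1, K_2, K_3

def stage_code_len(k):
    """Connection bits needed by a stage with k nodes: k*(k-1)/2."""
    return k * (k - 1) // 2

def random_structure_coding(stage_nodes=STAGE_NODES):
    """Sample one structure coding C_m as a flat list of 0/1 bits."""
    total = sum(stage_code_len(k) for k in stage_nodes)
    return [random.randint(0, 1) for _ in range(total)]

coding = random_structure_coding()
assert len(coding) == 19       # matches equation (1)
```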
3. Collecting the training data of the network performance prediction model.
Randomly generate $M$ mutually distinct structure codings $C_1, C_2, \ldots, C_M$. Once generated, the codings are automatically compiled into computation graphs, and the deep networks corresponding to these graphs are fully trained on the specified dataset. Training uses the Adam optimizer to learn the network parameters, with the optimizer parameters set to learning rate $\alpha = 0.001$ and exponential decay factors $\beta_1 = 0.9$, $\beta_2 = 0.999$. Training runs for $T$ iterations in total. During training, each time the network completes one batch, the iteration count $t$ it has undergone and its classification accuracy $Ag_t$ on the validation set are recorded; after collation this yields the data required for training the prediction model, $data = \{C_m, t, Ag_t\}$, $t = \{1, 2, \ldots, T\}$.
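The data collection could be sketched in Python (PyTorch) roughly as below; `build_network` and `validation_accuracy` are hypothetical helpers standing in for the automatic compilation of a coding into a computation graph and for evaluation on the validation set:

```python
import torch

def collect_prediction_data(codings, train_loader, val_loader, T):
    """Record (C_m, t, Ag_t) triples while fully training each candidate."""
    records = []
    for c in codings:
        net = build_network(c)          # hypothetical: coding -> network
        opt = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999))
        loss_fn = torch.nn.CrossEntropyLoss()
        t = 0
        while t < T:
            for x, y in train_loader:
                opt.zero_grad()
                loss_fn(net(x), y).backward()
                opt.step()
                t += 1                  # one batch counts as one iteration
                records.append((c, t, validation_accuracy(net, val_loader)))
                if t == T:
                    break
    return records
```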
4. Construction and training of the network performance prediction model.
Denote the network performance prediction model by $f$. The model first applies a mapping $\mu$ to the structure coding $C_m$ and then, from the mapping result $\mu(C_m)$, predicts the accuracy $Ap_t$ of the corresponding network on the test set after $t$ training iterations, that is:

$Ap_t = f(\mu(C_m), t) \quad (2)$
The specific structure of the prediction model is as follows:
(a) Structure coding mapping
In the mapping phase, the model maps a single structure coding $C$ into a set of $S$ structure codings. Denoting the mapping by $\mu$, the mapping of a structure coding may be expressed as:

$\mu(C) = \{p_1, p_2, \ldots, p_S\}$

Within this structure coding group, the bits of $p_s$ from position $\sum_{i=1}^{s-1} len(C^i) + 1$ through position $\sum_{i=1}^{s} len(C^i)$ take the values of the corresponding positions of the original coding, and the remaining positions are filled with zeros. Writing the values of the $idx$-th bits of codings $p$ and $C$ as $p[idx]$ and $C[idx]$, the mapping may be expressed as:

$p_s[idx] = \begin{cases} C[idx], & \sum_{i=1}^{s-1} len(C^i) < idx \le \sum_{i=1}^{s} len(C^i) \\ 0, & \text{otherwise} \end{cases} \quad (3)$
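A minimal sketch of the mapping $\mu$ under the flat-bit-list representation used in the earlier sketches (the function name is ours):

```python
def map_coding(coding, stage_nodes=(3, 4, 5)):
    """Map a coding C into {p_1, ..., p_S}: each p_s keeps only its own
    stage segment of C and zero-fills every other position, per (3)."""
    ps, start = [], 0
    for k in stage_nodes:
        seg = k * (k - 1) // 2
        p = [0] * len(coding)
        p[start:start + seg] = coding[start:start + seg]
        ps.append(p)
        start += seg
    return ps
```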
(b) network performance prediction model f:
After the structure coding mapping yields the coding group $\{p_1, p_2, \ldots, p_S\}$, the codings $p_1, p_2, \ldots, p_S$ are fed in sequence into a single-layer long short-term memory network (LSTM) with hidden size 128, finally producing a one-dimensional array $h$ of length 128, which we call the network structure feature of the network being predicted.
While the network structure feature $h$ is being computed, the iteration count $t$ is fed into a multi-layer perceptron composed of a fully connected layer of size (1, 64), a ReLU activation layer, a fully connected layer of size (64, 32), and a fully connected layer of size (32, 1). The multi-layer perceptron outputs a scalar value giving the contribution degree $D_t$ of the iteration count to the final classification accuracy of the network.
The contribution degree $D_t$ is then multiplied element-wise with the structure feature $h$ of the network, which may be expressed as:

$h[id] = D_t \times h[id], \quad id = \{1, 2, \ldots, len(h)\} \quad (4)$
The result is passed through a small fully connected block, formed by connecting in sequence a fully connected layer of size (128, 128), a random dropout layer with drop probability 0.5, a ReLU activation layer, a fully connected layer of size (128, 32), a ReLU activation layer, and a fully connected layer of size (32, 1). The output of the fully connected block is the predicted value $Ap_t$ of the final classification accuracy of the current network.
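Under the layer sizes stated above, the prediction model could be realized in PyTorch roughly as follows; the class name and tensor shapes are our assumptions, not part of the patent:

```python
import torch
import torch.nn as nn

class PerformancePredictor(nn.Module):
    """Sketch of f: LSTM over the mapped codings, MLP over t, FC head."""
    def __init__(self, code_len=19, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=code_len, hidden_size=hidden,
                            num_layers=1, batch_first=True)
        # Contribution degree D_t of the iteration count t.
        self.t_mlp = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(),
            nn.Linear(64, 32),
            nn.Linear(32, 1))
        # Small fully connected block producing the predicted accuracy Ap_t.
        self.head = nn.Sequential(
            nn.Linear(hidden, 128), nn.Dropout(0.5), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1))

    def forward(self, p_seq, t):
        # p_seq: (batch, S, code_len) mapped codings p_1..p_S; t: (batch, 1)
        _, (h, _) = self.lstm(p_seq)
        h = h.squeeze(0)                 # network structure feature
        return self.head(h * self.t_mlp(t))   # element-wise product, per (4)
```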
Before the network performance prediction model is used to guide the network search process, its parameters must be randomly initialized, and backpropagation is used to solve the following optimization problem to train the network, yielding the optimal parameters $\theta$:

$\theta = \arg\min_\theta \frac{1}{r} \sum_{i=1}^{r} \left\| Ap_t^{(i)} - Ag_t^{(i)} \right\|_2^2 \quad (5)$

where $r$ is the number of samples in a single training batch and $\|\cdot\|_2$ is the L2 norm.
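Training the predictor against the recorded validation accuracies, per the squared-error objective (5), might then be sketched as follows (reusing `map_coding` and `PerformancePredictor` from the sketches above):

```python
import random
import torch

def train_predictor(model, records, epochs=50, batch_size=64):
    """Fit f by backpropagation on the error between Ap_t and Ag_t."""
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        random.shuffle(records)
        for i in range(0, len(records), batch_size):
            batch = records[i:i + batch_size]
            p = torch.stack([torch.tensor(map_coding(c), dtype=torch.float32)
                             for c, _, _ in batch])       # (r, S, 19)
            t = torch.tensor([[float(it)] for _, it, _ in batch])
            ag = torch.tensor([[acc] for _, _, acc in batch])
            loss = ((model(p, t) - ag) ** 2).mean()       # objective (5)
            opt.zero_grad()
            loss.backward()
            opt.step()
```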
5. Genetic algorithm initialization.
First determine the parameters of the genetic algorithm, namely the number of individuals in the population $G_N$, the number of iteration rounds $G_T$, the mutation probability $G_M$, the crossover probability $G_C$, the mutation parameter $q_M$, the crossover parameter $q_C$, and the threshold $A_{mgn}$. Randomly generate $G_N$ structure codings as the generation-0 initial population $Ge_0$, with the $i$-th individual (i.e. the $i$-th structure coding) in the population denoted $Ge_0^i$. The deep network corresponding to each individual in the population is then fully trained and tested on the test set, and its classification accuracy is taken as the individual's score $fit^i$. The current highest accuracy is recorded as $fit_{max}$.
6. Selection operation on individuals.
Next, a selection operation $O_S$ is applied to the individuals in the population. From the $(j-1)$-th generation population $Ge_{j-1}$ ($j = 1, 2, \ldots, G_T$), the $j$-th generation population $Ge_j$ is selected by roulette-wheel selection, using the score $fit^i$ of each individual in the current population as the basis. The roulette-wheel scheme gives higher-scoring individuals a greater probability of surviving into the next generation, and this process is iterated continually.
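A minimal sketch of the roulette-wheel selection, assuming non-negative scores (helper name ours):

```python
import random

def select(population, scores, g_n):
    """Draw G_N survivors with probability proportional to fitness score."""
    return random.choices(population, weights=scores, k=g_n)
```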
7. Crossover operation on individuals.
A crossover operation with probability $G_C$ and parameter $q_C$ is applied to the individuals in the population. Crossover acts on the coding segment of each stage within an individual: every pair of individuals in the population crosses with probability $G_C$, and the concrete operation exchanges each of the three stage segments between the two individuals with probability $q_C$.
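A sketch of the stage-wise crossover under the flat-bit-list representation (helper name ours):

```python
import random

def crossover(a, b, g_c, q_c, stage_nodes=(3, 4, 5)):
    """With probability G_C, swap each stage segment of a pair w.p. q_C."""
    if random.random() >= g_c:
        return a, b
    a, b = a[:], b[:]
    start = 0
    for k in stage_nodes:
        seg = k * (k - 1) // 2
        if random.random() < q_c:       # exchange this stage's segment
            a[start:start + seg], b[start:start + seg] = \
                b[start:start + seg], a[start:start + seg]
        start += seg
    return a, b
```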
8. Mutation operation on individuals.
A mutation operation with probability $G_M$ is applied to individuals that did not undergo crossover. Mutation manifests as each binary digit in the individual's sequence being inverted with probability $q_M$, i.e. changed from 0 to 1 or from 1 to 0. Mutation acts on single binary digits.
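A corresponding mutation sketch (helper name ours):

```python
import random

def mutate(coding, g_m, q_m):
    """With probability G_M, invert each bit of the coding w.p. q_M."""
    if random.random() >= g_m:
        return coding
    return [bit ^ 1 if random.random() < q_m else bit for bit in coding]
```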
9. Predicting the performance of the networks corresponding to individuals.
The structure codings, together with the iteration count at the end of training, are input into the network performance prediction model to obtain the expected score $fit_{pre}^i$ of each individual in the population, i.e. the expected classification accuracy of the network after full training.
10. Evaluation operation on individuals.
After the expected scores of the individuals are obtained in step 9, each expected score $fit_{pre}^i$ is compared with the current best score $fit_{max}$. If $fit_{pre}^i \ge fit_{max} - A_{mgn}$, the predicted performance of the individual is good: the algorithm fully trains the network, tests it on the test set, and takes the actual performance on the test set as the individual's actual score $fit^i$. If $fit_{pre}^i < fit_{max} - A_{mgn}$, the predicted performance of the individual is poor: the algorithm does not actually train it and takes the lower predicted performance alone as the individual's score $fit^i$. After the assessment, the current best individual score $fit_{max}$ is updated, and the process returns to step 6 until the total number of iterations of the algorithm exceeds $G_T$. When the algorithm terminates, the optimal network structure is produced.
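The prediction-gated evaluation could be sketched as follows; `predict_accuracy` (the trained predictor f applied at iteration T) and `train_and_test` (full training plus test-set evaluation) are hypothetical helpers:

```python
def evaluate(population, fit_max, a_mgn):
    """Fully train only individuals predicted within A_mgn of fit_max."""
    scores = []
    for coding in population:
        fit_pre = predict_accuracy(coding)     # expected score fit_pre^i
        if fit_pre >= fit_max - a_mgn:
            fit = train_and_test(coding)       # promising: real training
        else:
            fit = fit_pre                      # unpromising: keep prediction
        scores.append(fit)
        fit_max = max(fit_max, fit)
    return scores, fit_max
```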
This method provides a good acceleration effect for a wide range of image classification structure optimization tasks. Taking the optimization of a classification network structure on the Pavia University dataset as an example, the traditional genetic-algorithm-based structure optimization method needs 0.99 hours to produce an optimal deep network structure with a classification accuracy of 89.1%, whereas our method needs only 0.635 hours to produce an optimal deep network structure with a classification accuracy of 88.6%. The deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm proposed by the present invention can thus greatly accelerate the structure optimization process, while the final classification accuracy of the optimal network structure found on the specified dataset is nearly identical to that of the traditional genetic-algorithm-based structure optimization method.

Claims (1)

1. A deep neural network structure optimization method based on the fusion of a prediction mechanism and a genetic algorithm, characterized by comprising the following steps:
Step 1: data preprocessing:
First define the image classification database $X = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^{N \times b}$, where $x_n \in \mathbb{R}^{1 \times b}$ denotes the $n$-th sample; its class label matrix is $Y = [y_1, y_2, \ldots, y_N]^T \in \mathbb{R}^{N \times L}$, where $y_n \in \mathbb{R}^{1 \times L}$ is the one-hot label of the $n$-th sample, $n = 1, 2, \ldots, N$, $N$ is the total number of samples, $L$ is the number of classes, and $b$ is the spectral dimension; then normalize each sample in $X$ to the range $[0, 1]$ and randomly select $N_{train}$ samples and their class labels to obtain the training data $X_{train}$ and corresponding class labels $Y_{train}$, where $N_{train} < N$; in addition, assign all remaining data and labels in the dataset to the test set, denoted $X_{test}$ and $Y_{test}$;
Step 2: determining the coding rule of the network structure:
First generate $M$ different network structures, and denote the structure coding of the $m$-th neural network by $C_m$; the coding contains $S$ stages, i.e. $C_m = \{C_m^1, C_m^2, \ldots, C_m^S\}$, where $C_m^s$ is the coding segment of the $s$-th stage; stage $s$ contains $K_s$ nodes, each representing a hybrid operation composed of convolution + batch normalization + ReLU activation and denoted $v_{s,1}, v_{s,2}, \ldots, v_{s,K_s}$; within a stage, lower-numbered nodes connect to higher-numbered nodes, and the connections between nodes are represented by a $\frac{1}{2}K_s(K_s-1)$-bit binary code; the 1st bit represents the connection between $(v_{s,1}, v_{s,2})$: the bit is 1 if the connection exists and 0 if not; the next two bits represent the connections to the third node, i.e. $(v_{s,1}, v_{s,3})$ and $(v_{s,2}, v_{s,3})$; setting $S = 3$, $K_1 = 3$, $K_2 = 4$, $K_3 = 5$, the total length of the structure coding is 19, i.e. $len(C_m) = \sum_{s=1}^{S} \frac{1}{2} K_s (K_s - 1) = 3 + 6 + 10 = 19 \quad (1)$;
Step 3: collecting the training data of the network performance prediction model:
Randomly generate $M$ mutually distinct structure codings $C_1, C_2, \ldots, C_M$; after automatic compilation, the deep network corresponding to each coding is fully trained on the specified dataset; training uses the Adam optimizer to learn the network parameters and runs for $T$ iterations in total; each time the network completes training on one batch, the iteration count $t$ it has undergone and its classification accuracy $Ag_t$ on the validation set are recorded, yielding the data required for training the prediction model: $data = \{C_m, t, Ag_t\}$, $t = \{1, 2, \ldots, T\}$;
Step 4: construction and training of the network performance prediction model:
Define the network performance prediction model $f$: after the input structure coding $C_m$ is given and the mapping $\mu$ is applied to it, the model predicts the accuracy $Ap_t$ of the corresponding network on the test set after $t$ training iterations, that is:

$Ap_t = f(\mu(C_m), t) \quad (2)$
In the mapping phase, the model maps the structure coding $C$ into a set of $S$ structure codings $\mu(C) = \{p_1, p_2, \ldots, p_S\}$; in $p_s$, the bits from position $\sum_{i=1}^{s-1} len(C^i) + 1$ through position $\sum_{i=1}^{s} len(C^i)$ take the values of the corresponding positions of the original coding, and all remaining positions are filled with zeros, that is:

$p_s[idx] = \begin{cases} C[idx], & \sum_{i=1}^{s-1} len(C^i) < idx \le \sum_{i=1}^{s} len(C^i) \\ 0, & \text{otherwise} \end{cases} \quad (3)$

where $p[idx]$ and $C[idx]$ denote the value of the $idx$-th bit of the structure codings $p$ and $C$;
After the structure coding has been mapped, $p_1, p_2, \ldots, p_S$ are fed in sequence into a single-layer long short-term memory (LSTM) network with hidden size 128, finally yielding the hidden state $h$ of the LSTM unit, referred to as the network structure feature; meanwhile, the iteration count $t$ is fed into a multi-layer perceptron composed of a fully connected layer of size (1, 64), a ReLU activation layer, a fully connected layer of size (64, 32), and a fully connected layer of size (32, 1), producing the contribution degree $D_t$ of the iteration count to the final classification accuracy of the network;
The contribution degree $D_t$ is multiplied element-wise with the structure feature $h$ of the network:

$h[id] = D_t \times h[id], \quad id = \{1, 2, \ldots, len(h)\} \quad (4)$
The result is fed into a small fully connected block consisting of a fully connected layer of size (128, 128), a random dropout layer with drop probability 0.5, a ReLU activation layer, a fully connected layer of size (128, 32), a ReLU activation layer, and a fully connected layer of size (32, 1); the output of the fully connected block is the predicted value $Ap_t$ of the final classification accuracy of the current network;
Before the performance prediction network is trained, its parameters are randomly initialized, and backpropagation is used to solve the following optimization problem to learn the network parameters, yielding the optimal parameters $\theta$:

$\theta = \arg\min_\theta \frac{1}{r} \sum_{i=1}^{r} \left\| Ap_t^{(i)} - Ag_t^{(i)} \right\|_2^2 \quad (5)$

where $r$ is the number of samples in a training batch and $\|\cdot\|_2$ is the L2 norm;
Step 5: initializing the genetic algorithm:
Set the parameters of the genetic algorithm, including the number of individuals in the population $G_N$, the number of iteration rounds $G_T$, the mutation probability $G_M$, the crossover probability $G_C$, the mutation parameter $q_M$, the crossover parameter $q_C$, and the threshold $A_{mgn}$, and randomly generate $G_N$ structure codings as the initial population $Ge_0$, recorded as generation 0, with the $i$-th individual in the population denoted $Ge_0^i$; then assess the score of each individual in the population to obtain its score $fit^i$, and record the current highest accuracy as $fit_{max}$;
Step 6: performing a selection operation on individuals:
The selection operation is applied to each individual in the previous generation's population: from the population $Ge_{j-1}$, $j = 1, 2, \ldots, G_T$, a new generation $Ge_j$ is selected by roulette-wheel selection according to the individuals' scores $fit^i$; the higher an individual's score, the greater its probability of being chosen and surviving into the next generation;
Step 7: performing a crossover operation on individuals:
The crossover operation acts on the coding segments of each stage of the individuals: every pair of individuals in the population crosses with probability $G_C$, and the crossover exchanges each of the three stage segments between the two individuals with probability $q_C$;
Step 8: performing a mutation operation on individuals:
The mutation operation acts on each bit of an individual's coding: each binary digit of the coding is inverted with probability $q_M$, i.e. changed from 0 to 1 or from 1 to 0;
Step 9: predicting the performance of the networks corresponding to individuals:
The structure codings, together with the iteration count at the end of training, are input into the network performance prediction model to obtain the expected score $fit_{pre}^i$ of each individual in the population, i.e. the expected classification accuracy of the network after full training;
Step 10: performing an evaluation operation on individuals:
The expected score $fit_{pre}^i$ is compared with the current best score $fit_{max}$; if $fit_{pre}^i \ge fit_{max} - A_{mgn}$, the algorithm fully trains the network, tests it on the test set, and takes the actual performance on the test set as the individual's actual score $fit^i$; if $fit_{pre}^i < fit_{max} - A_{mgn}$, the network is not actually trained, and the lower predicted performance alone is taken as the individual's score $fit^i$; after the assessment, the current best individual score $fit_{max}$ is updated, and the process returns to step 6 until the total number of iterations exceeds $G_T$; the optimal network structure is obtained when the algorithm terminates.
CN201910696239.XA 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm Active CN110490320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910696239.XA CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910696239.XA CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Publications (2)

Publication Number Publication Date
CN110490320A 2019-11-22
CN110490320B CN110490320B (en) 2022-08-23

Family

ID=68548791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910696239.XA Active CN110490320B (en) 2019-07-30 2019-07-30 Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm

Country Status (1)

Country Link
CN (1) CN110490320B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915445A (en) * 2012-09-17 2013-02-06 杭州电子科技大学 Method for classifying hyperspectral remote sensing images based on improved neural network
CN103971162A (en) * 2014-04-04 2014-08-06 华南理工大学 Method for improving BP (back propagation) neural network based on genetic algorithm
CN105303252A (en) * 2015-10-12 2016-02-03 国家计算机网络与信息安全管理中心 Multi-stage neural network model training method based on genetic algorithm
CN106503802A (en) * 2016-10-20 2017-03-15 上海电机学院 Method for optimizing BP neural network system using genetic algorithm
CN108021983A (en) * 2016-10-28 2018-05-11 谷歌有限责任公司 Neural architecture search
US9785886B1 (en) * 2017-04-17 2017-10-10 SparkCognition, Inc. Cooperative execution of a genetic algorithm with an efficient training algorithm for data-driven model creation
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 Deep neural network training and optimization algorithm based on evolutionary algorithm
CN109243172A (en) * 2018-07-25 2019-01-18 华南理工大学 Traffic flow forecasting method based on genetic algorithm optimization LSTM neural network
CN110020667A (en) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司 Neural network structure search method, system, storage medium and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BOWEN BAKER et al.: "Accelerating Neural Architecture Search Using Performance Prediction", ICLR 2018 *
CHEN DING et al.: "Hyperspectral Image Classification Based on Convolutional Neural Networks With Adaptive Network Structure", 2018 International Conference on Orange Technologies *
LINGXI XIE et al.: "Genetic CNN", 2017 IEEE International Conference on Computer Vision *
ZHICHAO LU et al.: "NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm", arXiv *
王华斌 et al.: "A variable-structure convolutional neural network method for element extraction from remote sensing imagery", Acta Geodaetica et Cartographica Sinica (《测绘学报》) *
陈晓艳 et al.: "Identifying biological neural network connections via dynamic Bayesian network structure search", Life Science Research (《生命科学研究》) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415009A (en) * 2020-03-19 2020-07-14 四川大学 Convolutional variational autoencoder network structure search method based on genetic algorithm
CN112084877B (en) * 2020-08-13 2023-08-18 西安理工大学 NSGA-NET-based remote sensing image recognition method
CN112084877A (en) * 2020-08-13 2020-12-15 西安理工大学 NSGA-NET-based remote sensing image identification method
CN112001485A (en) * 2020-08-24 2020-11-27 平安科技(深圳)有限公司 Group convolution number searching method and device
WO2021151311A1 (en) * 2020-08-24 2021-08-05 平安科技(深圳)有限公司 Group convolution number searching method and apparatus
CN112001485B (en) * 2020-08-24 2024-04-09 平安科技(深圳)有限公司 Group convolution number searching method and device
CN112183749A (en) * 2020-10-26 2021-01-05 天津大学 Deep learning library test method based on directed model mutation
CN112183749B (en) * 2020-10-26 2023-04-18 天津大学 Deep learning library test method based on directed model mutation
CN114842328A (en) * 2022-03-22 2022-08-02 西北工业大学 Hyperspectral change detection method based on cooperative analysis autonomous sensing network structure
CN114842328B (en) * 2022-03-22 2024-03-22 西北工业大学 Hyperspectral change detection method based on collaborative analysis autonomous perception network structure
CN114943866A (en) * 2022-06-17 2022-08-26 之江实验室 Image classification method based on evolutionary neural network structure search
CN114943866B (en) * 2022-06-17 2024-04-02 之江实验室 Image classification method based on evolutionary neural network structure search
CN115994575B (en) * 2023-03-22 2023-06-02 方心科技股份有限公司 Power failure diagnosis neural network architecture design method and system
CN115994575A (en) * 2023-03-22 2023-04-21 方心科技股份有限公司 Power failure diagnosis neural network architecture design method and system

Also Published As

Publication number Publication date
CN110490320B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN110490320A (en) Deep neural network structural optimization method based on forecasting mechanism and Genetic Algorithm Fusion
Zhang et al. Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance
CN104751842B (en) The optimization method and system of deep neural network
CN109948029A (en) Based on the adaptive depth hashing image searching method of neural network
CN109299262A (en) A kind of text implication relation recognition methods for merging more granular informations
Foster et al. Structure in the space of value functions
CN105279555A (en) Self-adaptive learning neural network implementation method based on evolutionary algorithm
CN110826638A (en) Zero sample image classification model based on repeated attention network and method thereof
CN108629326A (en) The action behavior recognition methods of objective body and device
CN106777402B (en) A kind of image retrieval text method based on sparse neural network
CN109460855A (en) A kind of throughput of crowded groups prediction model and method based on focus mechanism
CN108763376A (en) Syncretic relation path, type, the representation of knowledge learning method of entity description information
CN103905246B (en) Link prediction method based on grouping genetic algorithm
CN106874655A (en) Traditional Chinese medical science disease type classification Forecasting Methodology based on Multi-label learning and Bayesian network
CN111461437B (en) Data-driven crowd motion simulation method based on generation of countermeasure network
CN110580727B (en) Depth V-shaped dense network imaging method with increased information flow and gradient flow
CN110222838A (en) Deep neural network and its training method, device, electronic equipment and storage medium
CN114861890A (en) Method and device for constructing neural network, computing equipment and storage medium
CN112634019A (en) Default probability prediction method for optimizing grey neural network based on bacterial foraging algorithm
CN115328971A (en) Knowledge tracking modeling method and system based on double-graph neural network
CN116306902A (en) Time sequence data environment analysis and decision method, device, equipment and storage medium
CN111882042A (en) Automatic searching method, system and medium for neural network architecture of liquid state machine
Baruah et al. Data augmentation and deep neuro-fuzzy network for student performance prediction with MapReduce framework
CN109948589A (en) Facial expression recognizing method based on quantum deepness belief network
CN116258504B (en) Bank customer relationship management system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant