CN107092960A - An improved parallel-channel convolutional neural network training method - Google Patents

An improved parallel-channel convolutional neural network training method

Info

Publication number
CN107092960A
CN107092960A
Authority
CN
China
Prior art keywords
eigenmatrix, layer, neural networks, convolutional neural, convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710247556.4A
Other languages
Chinese (zh)
Inventor
屈景怡
朱威
李佳怡
吴仁彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN201710247556.4A
Publication of CN107092960A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

An improved parallel-channel convolutional neural network training method. In a convolutional neural network, feature extraction is performed on the data through a direct channel and a convolutional channel to obtain feature matrices; the feature matrices are merged and data dimensionality reduction is applied; the convolutional neural network is trained and the loss value of the current training pass is computed; the error term and weight gradient of each layer are calculated; whether the network has converged is judged from the loss value, and if it has not converged, the initialization parameters of the convolutional neural network are adjusted according to the weight gradients and training is repeated, while if it has converged the network training result is output. By introducing the direct channel, the present invention guarantees the flow of data through the network, overcomes the gradient instability problem in training deep convolutional neural networks, and can train deeper networks; by using both max pooling and average pooling, it keeps the feature-matrix dimensions consistent between the two feature-extraction stages and combines the advantages of the two pooling methods.

Description

An improved parallel-channel convolutional neural network training method
Technical field
The invention belongs to the technical field of deep learning and big data, and in particular relates to an improved parallel-channel convolutional neural network training method.
Background technology
With the development of society and the arrival of the big data era, the related technologies keep developing and innovating. Because deep learning can exploit massive amounts of data and improve classification accuracy by training deeper networks, it has achieved a series of breakthroughs in recent years. Scholars have tried to improve the performance of convolutional neural networks by increasing their scale, and the simplest way to increase network scale is to increase the depth.
However, in deep networks built on the traditional convolutional neural network structure, accuracy saturates and even degrades as the number of layers increases. The document "Romero A, Ballas N, Kahou S E, et al. Fitnets: Hints for thin deep nets [J]. arXiv preprint arXiv:1412.6550, 2014." proposes a multi-stage training method in which several shallow networks are first trained separately and finally combined, thereby realizing one deep network. Doing so requires manually tuning the parameters of each network separately, which is time-consuming and laborious, and training the shallow networks separately loses the correlation information between them, which affects the final performance of the network. The document "Lee C Y, Xie S, Gallagher P, et al. Deeply-Supervised Nets [C] // Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 2015: 562-570" instead introduces several auxiliary classifiers into the hidden layers of the deep convolutional neural network. Although this method can, to a certain extent, compensate for the gradient-vanishing problem during backward error propagation in deep networks, the introduced auxiliary classifiers also affect the final accuracy of the network.
The problem that deeper networks cannot be trained has never been fundamentally solved: the proposed network structures are still designed on the basis of traditional convolutional neural networks and merely apply various optimization tricks during training, such as better network initialization parameters and more efficient activation functions.
Summary of the invention
In order to solve the above problems, it is an object of the present invention to provide an improved parallel-channel convolutional neural network training method.
In order to achieve the above object, the improved parallel-channel convolutional neural network training method provided by the present invention comprises the following steps, performed in order:
1) Using the two parallel channels, direct and convolutional, feature extraction is performed on the data in the convolutional neural network, yielding a direct-channel feature matrix and a convolutional-channel feature matrix;
2) The two feature matrices obtained in step 1) are merged and input into a max-pooling layer and an average-pooling layer for data dimensionality reduction;
3) Steps 1) and 2) are repeated to obtain the final feature matrix;
4) The final feature matrix obtained in step 3) is subjected to global average pooling and input into a fully connected layer to be converted into a one-dimensional feature matrix; the one-dimensional feature matrix is classified with a softmax classifier, the convolutional neural network is trained, and the loss value of the current training pass is computed;
5) Gradient computation is performed with the error backpropagation algorithm, and the error term and weight gradient of each layer are calculated;
6) Whether the network has converged is judged from the loss value obtained in step 4); if it has not converged, the initialization parameters of the convolutional neural network are adjusted according to the weight gradients obtained in step 5) and training is repeated; if it has converged, the network training result is output.
In step 1), the method of performing feature extraction on the data in the convolutional neural network through the two parallel channels, direct and convolutional, to obtain the direct-channel feature matrix and the convolutional-channel feature matrix is: first, the data are input into the direct channel and the convolutional channel separately; then, in the direct channel the data are directly mapped and taken as the direct-channel feature matrix, while in the convolutional channel several convolutional layers perform convolution operations on the data, the input of each convolutional layer being the output of the previous convolutional layer, and the output matrix of the last convolutional layer is taken as the convolutional-channel feature matrix.
In step 2), the method of merging the two feature matrices obtained in step 1) and inputting them into the max-pooling layer and the average-pooling layer for data dimensionality reduction is: first, the feature matrix obtained from the direct channel and the feature matrix obtained from the convolutional channel are merged, i.e., a set of multiple feature matrices is obtained; then the resulting feature matrices are input into the max-pooling layer and the average-pooling layer respectively, the max-pooling layer using a filter to take the maximum value within the filter window and the average-pooling layer using a filter to take the average value within the filter window.
In step 4), the method of applying global average pooling to the final feature matrix obtained in step 3), inputting it into the fully connected layer to convert it into a one-dimensional feature matrix, classifying the one-dimensional feature matrix with the softmax classifier, training the convolutional neural network and computing the loss value of the current training pass is: first, global average pooling is applied to the final feature matrix, using a filter of the same size as the final feature matrix to compute the average value of the data in each feature matrix; then the result is input into the fully connected layer, where each neuron applies a nonlinear transformation to the data of the globally average-pooled feature matrices, yielding the one-dimensional feature matrix; finally, the one-dimensional feature matrix is input into the softmax classifier for classification.
In step 5), the method of performing gradient computation with the error backpropagation algorithm and calculating the error term and weight gradient of each layer is: first, the loss value of the last layer is computed from the softmax classifier result and taken as the error term of the last layer; then the error term of each layer is computed with the chain rule of the error backpropagation algorithm, the error term of the i-th feature matrix of the l-th convolutional layer being:

$$\delta_i^{(l)} = \frac{\partial J}{\partial A_i^{(l)}} = f'\!\left(A_i^{(l)}\right)\sum_{j=1}^{M}\delta_j^{(l+1)}\odot W_{ij}^{(l+1)}$$

where M is the number of feature matrices in layer l+1, $\delta_j^{(l+1)}$ denotes the error term of the j-th feature matrix of the (l+1)-th convolutional layer, $f'(\cdot)$ denotes the derivative of this layer's activation function, J denotes this layer's loss value, and $W_{ij}^{(l+1)}$ denotes the connection weight between the i-th feature matrix of layer l and the j-th feature matrix of layer l+1;

Finally, the weight gradient of each layer is computed with the formula

$$\frac{\partial J}{\partial W_{ij}^{(l)}} = \delta_j^{(l)}\odot X_i^{(l-1)}$$

where $X_i^{(l-1)}$ denotes the i-th feature matrix of layer l-1.
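For illustration, the chain rule above can be sketched numerically. The following is a minimal sketch, assuming ReLU activations and treating each feature matrix as a 2-D numpy array; the function names and the nested-list weight layout are illustrative assumptions, not part of the patent.

```python
import numpy as np

# Minimal sketch of the layer-l error terms, assuming ReLU activations;
# names and data layout are illustrative, not from the patent.
def relu_derivative(a: np.ndarray) -> np.ndarray:
    return (a > 0).astype(a.dtype)

def error_terms_layer_l(deltas_next, weights, pre_acts):
    """deltas_next[j]: error term delta_j^(l+1) of layer l+1 (M entries);
    weights[i][j]: connection weight between feature matrix i of layer l
    and feature matrix j of layer l+1; pre_acts[i]: pre-activation A_i^(l)."""
    deltas = []
    for i, a in enumerate(pre_acts):
        back = sum(deltas_next[j] * weights[i][j] for j in range(len(deltas_next)))
        deltas.append(relu_derivative(a) * back)  # f'(A_i) * sum_j delta_j (.) W_ij
    return deltas

# The weight gradient then pairs each error term with the layer's input:
# grad_W[i][j] = deltas[j] * X_prev[i], matching the formula above.
```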
In step 6), the method of judging from the loss value obtained in step 4) whether the network has converged, adjusting the initialization parameters of the convolutional neural network according to the weight gradients obtained in step 5) and retraining if it has not converged, and outputting the network training result if it has converged, is: first, the classification result is compared with the true value and the difference is computed as the loss value; then the loss value is compared with the preset classification threshold, and if it is below the classification threshold the network is judged to have converged, otherwise it has not converged; finally, if it has converged the network result is output, otherwise the initialization parameters of the convolutional neural network are adjusted according to the formulas

$$V(t+1) := \mu V(t) - \eta\,\nabla l\big(W(t)\big) - \eta\lambda W(t)$$
$$W(t+1) := W(t) + V(t+1)$$

where t represents the iteration number, V(t) is the momentum term, and μ is the momentum factor, which determines the contribution of the historical weight correction to the current weight correction; η is the learning rate; λ is the weight attenuation coefficient; and W represents the initialization parameters of the convolutional neural network.
The advantages of the improved parallel-channel convolutional neural network training method provided by the present invention are: 1) the introduction of the direct channel guarantees the flow of data through the network, overcomes the gradient instability problem in training deep convolutional neural networks, and allows deeper networks to be trained; 2) using both max pooling and average pooling keeps the feature-matrix dimensions consistent between the two feature-extraction stages and combines the advantages of the two pooling methods.
Brief description of the drawings
Fig. 1 is a flow chart of the improved parallel-channel convolutional neural network training method provided by the present invention.
Fig. 2 is a structure diagram of the parallel channels of the feature-extraction part;
Fig. 3 is a schematic diagram of the dual pooling layer;
Fig. 4 is a schematic diagram of the error-term calculation in the parallel-channel convolutional neural network;
Fig. 5 is a performance comparison of different pooling modes on the CIFAR-10 data set;
Fig. 6 is a curve of the training accuracy of the present invention on the CIFAR-10 data set as a function of the number of iterations.
Embodiment
The improved parallel-channel convolutional neural network training method provided by the present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the improved parallel-channel convolutional neural network training method provided by the present invention comprises the following steps, performed in order:
1) Using the two parallel channels, direct and convolutional, feature extraction is performed on the data in the convolutional neural network, yielding a direct-channel feature matrix and a convolutional-channel feature matrix;
A data set consisting of tens of thousands of small 32 × 32 colour images is input into the convolutional neural network; the present invention uses the CIFAR-10 data set, which consists of 60,000 colour images of size 32 × 32. Feature extraction is then performed on the data in the data set through the two parallel channels, direct and convolutional; the parallel-channel structure is shown in Fig. 2. In the direct channel, the direct-channel feature matrix is extracted with the mapping function Y = X. In the convolutional channel, all convolution kernels are of size 3 × 3 with stride 1, and only weight terms are used to convolve the output feature matrices of the previous layer, without bias terms; the j-th convolution response feature matrix of layer l, $A_j^{(l)}$, can therefore be computed by formula (1):

$$A_j^{(l)} = \sum_{i \in M} X_i^{(l-1)} \odot W_{ij}^{(l)} \tag{1}$$

In the formula, M represents the set of output feature matrices of the previous layer; $X_i^{(l-1)}$ denotes the i-th output feature matrix of layer l-1; $W_{ij}^{(l)}$ denotes the weights between the j-th convolution response feature matrix of layer l and the i-th output feature matrix of layer l-1, i.e., the convolution kernel to be learned; "⊙" denotes the Hadamard product operation, i.e., corresponding matrix elements are multiplied and summed.
A nonlinear transformation with the ReLU activation function is applied to each neuron of the convolution response feature matrix, giving the feature matrix of layer l as

$$X_j^{(l)} = \mathrm{ReLU}\big(A_j^{(l)}\big) \tag{2}$$
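As a concrete illustration of the two channels, the sketch below builds this feature-extraction stage in PyTorch. This is a minimal sketch under stated assumptions: the channel width and the number of convolutional layers are illustrative, and padding of 1 is assumed so that the convolutional-channel output keeps the same size as the direct channel for the later merge; the patent does not specify these details.

```python
import torch
import torch.nn as nn

# Minimal sketch of the parallel channels: 3x3 kernels, stride 1, no bias,
# ReLU after each convolution per formula (2). Padding 1 is an assumption.
class ParallelChannels(nn.Module):
    def __init__(self, channels: int, num_layers: int = 2):
        super().__init__()
        body = []
        for _ in range(num_layers):
            body.append(nn.Conv2d(channels, channels, kernel_size=3,
                                  stride=1, padding=1, bias=False))
            body.append(nn.ReLU(inplace=True))  # X^(l) = ReLU(A^(l))
        self.conv_channel = nn.Sequential(*body)

    def forward(self, x):
        return x, self.conv_channel(x)  # direct channel Y = X, conv channel
```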
2) The two feature matrices obtained in step 1) are merged and input into a max-pooling layer and an average-pooling layer for data dimensionality reduction;
When merging feature matrices, the feature matrices produced by every two convolutional layers and the direct-channel feature matrix are accumulated once, which is recorded as one accumulation module. For convenience of description, two convolutional layers are recorded as one convolution module and every two accumulation modules are recorded as one block structure, and a convolution decay factor is applied to the feature matrices produced by the convolution modules; the feature matrix produced by each block structure is then defined as follows:
$$Y = \lambda_1 H_{n1}\big(X, \{W^{(n1)}\}\big) + \lambda_2 H_{n2}\big[H_{n1}\big(X, \{W^{(n1)}\}\big), \{W^{(n2)}\}\big] + X \tag{3}$$
In the formula, X and Y represent the input and output data of the block structure, respectively; $\lambda_1$ and $\lambda_2$, the "convolution decay factors", are constants that need to be set in advance, and a convolution decay factor of a different size is set here for each convolution module; $W^{(n1)}$ and $W^{(n2)}$ represent the sets of feature-matrix weight parameters of the two accumulations, respectively, and are the parameters to be trained; $H_{n1}(\cdot)$ and $H_{n2}(\cdot)$ represent the equivalence-transform functions of the first and the second accumulation, respectively.
The expression of the equivalence-transform function $H_{n1}(\cdot)$ of the first accumulation is:

$$H_{n1}\big(X, \{W^{(n1)}\}\big) = W^{(m2)} \odot f\big(W^{(m1)} \odot X\big) \tag{4}$$

The expression of the equivalence-transform function $H_{n2}(\cdot)$ of the second accumulation is:

$$H_{n2}\big(Y_{n1}, \{W^{(n2)}\}\big) = W^{(m4)} \odot f\big(W^{(m3)} \odot Y_{n1}\big) \tag{5}$$
For convenience of description, let $Y_{n1} = H_{n1}(X, \{W^{(n1)}\})$ and $Y_{n2} = H_{n2}(Y_{n1}, \{W^{(n2)}\})$; formula (3) can therefore also be written in the following form:
$$Y = \lambda_1 Y_{n1} + \lambda_2 Y_{n2} + X \tag{6}$$
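The block structure of formula (6) can be sketched directly from formulas (4) and (5). The following is a minimal PyTorch sketch; the class name, channel width and the default decay factors are illustrative assumptions (the patent only requires $\lambda_1$ and $\lambda_2$ to be preset constants smaller than 1).

```python
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of one block structure, formula (6): Y = l1*Yn1 + l2*Yn2 + X.
class AccumulationBlock(nn.Module):
    def __init__(self, channels: int, lambda1: float = 0.5, lambda2: float = 0.5):
        super().__init__()
        def conv():  # 3x3, stride 1, no bias; padding 1 keeps sizes equal
            return nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False)
        self.w_m1, self.w_m2 = conv(), conv()  # weights of H_n1, formula (4)
        self.w_m3, self.w_m4 = conv(), conv()  # weights of H_n2, formula (5)
        self.lambda1, self.lambda2 = lambda1, lambda2

    def forward(self, x):
        y_n1 = self.w_m2(F.relu(self.w_m1(x)))     # formula (4)
        y_n2 = self.w_m4(F.relu(self.w_m3(y_n1)))  # formula (5)
        return self.lambda1 * y_n1 + self.lambda2 * y_n2 + x  # formula (6)
```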
To further reduce the data dimensionality and thus the computational load of the deep network, max pooling and average pooling are applied to the merged feature matrices; the dual-pooling mode is shown in Fig. 3. Both pooling methods use a filter of size 3 × 3 with stride 2. The max-pooling filter selects the maximum value within the filter window, which preserves the saliency information of the feature matrix as far as possible, while average pooling computes the average of the corresponding features within the filter window, which preserves the background information of the feature matrix. Through dual pooling, the number of feature matrices is doubled and the dimension of each single matrix is halved. Experiments show that on the CIFAR-10 data set the dual-pooling mode achieves higher classification accuracy than a single pooling mode, as shown in Fig. 5.
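A dual-pooling layer of this kind can be sketched as follows; this is a minimal PyTorch sketch, assuming padding of 1 (not specified in the patent) so that a 3 × 3 window with stride 2 exactly halves each spatial dimension.

```python
import torch
import torch.nn as nn

# Minimal sketch of the dual-pooling layer: 3x3 windows, stride 2.
class DualPool(nn.Module):
    def __init__(self):
        super().__init__()
        self.max_pool = nn.MaxPool2d(3, stride=2, padding=1)  # saliency information
        self.avg_pool = nn.AvgPool2d(3, stride=2, padding=1)  # background information

    def forward(self, x):
        # Concatenating along the channel axis doubles the matrix count
        # while each matrix's spatial dimension halves.
        return torch.cat([self.max_pool(x), self.avg_pool(x)], dim=1)
```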
3) Steps 1) and 2) are repeated to obtain the final feature matrix;
Because the colour images in the data set are of size 32 × 32, feature extraction and dual pooling are performed twice on the feature matrices to obtain the final feature matrix; in this way the network depth can be increased as much as possible.
4) The final feature matrix obtained in step 3) is subjected to global average pooling and input into a fully connected layer to be converted into a one-dimensional feature matrix; the one-dimensional feature matrix is classified with a softmax classifier, the convolutional neural network is trained, and the loss value of the current training pass is computed;
Applying global average pooling to the final feature matrix obtained in step 3) and inputting the result into the fully connected layer yields a feature matrix of dimension (1 × 1) × q, which is classified with the softmax classifier, where q is the number of classification categories. A set of r labelled samples can be expressed as $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(r)}, y^{(r)})\}$ with $y^{(i)} \in \{1, 2, \ldots, q\}$. The method of computing the loss value of the current training pass is: first, the probability p(y = j | x) of each category j is computed; then $h_\theta(x)$ is used to represent the q output probabilities, giving the function:

$$h_\theta\big(x^{(i)}\big) = \frac{1}{\sum_{j=1}^{q} e^{\theta_j^{T} x^{(i)}}}\begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ e^{\theta_2^{T} x^{(i)}} \\ \vdots \\ e^{\theta_q^{T} x^{(i)}} \end{bmatrix} \tag{7}$$

where $h_\theta(x)$ represents the output of the convolutional neural network, i is the sample index, θ is the network parameter, and $\sum_{j=1}^{q} e^{\theta_j^{T} x^{(i)}}$ is the normalization factor;

Finally, cross entropy is used as the loss function to compute the loss value; its expression is:

$$l = -\frac{1}{r}\left[\sum_{i=1}^{r}\sum_{j=1}^{q} 1\big\{y^{(i)} = j\big\}\log\frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{k=1}^{q} e^{\theta_k^{T} x^{(i)}}}\right] \tag{8}$$

where l represents the loss value, $1\{y^{(i)} = j\}$ equals 1 when $y^{(i)} = j$ and 0 otherwise, and r is the number of samples.
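Formulas (7) and (8) amount to a softmax followed by an averaged cross-entropy. The following is a minimal numpy sketch, assuming `scores` holds the raw network outputs $\theta_j^T x^{(i)}$ for r samples and q classes and `labels` holds the class indices $y^{(i)}$; the max-shift for numerical stability is an implementation detail, not part of the patent.

```python
import numpy as np

# Minimal sketch of formulas (7)-(8): softmax probabilities, cross-entropy loss.
def softmax_cross_entropy(scores: np.ndarray, labels: np.ndarray) -> float:
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)          # formula (7)
    r = scores.shape[0]
    # Formula (8): mean negative log-probability of the true class.
    return float(-np.log(probs[np.arange(r), labels]).mean())
```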
5) Gradient computation is performed with the error backpropagation algorithm, and the error term and weight gradient of each layer are calculated;
When the error is computed by backpropagation, the error value of the last layer equals the loss value, i.e., $\delta^{(0)} = l$, and the error term of the k-th accumulation module equals the error term of the (k+1)-th accumulation module multiplied by the connection weight between the two. The error term $\delta^{(k)}$ of the k-th accumulation module is therefore:
δ(k)(n1)(k+1) (9)
where $\delta^{(k+1)}$ is the error term of the (k+1)-th accumulation module and $\delta^{(n1)}$ is the error term of the convolutional channel of this accumulation module. The key to computing formula (9) is to compute $\delta^{(n1)}$, after which the error terms of the remaining accumulation modules can be obtained in turn from formula (9); the per-layer error terms are given by formula (10).
In formula (10), $\lambda_1$ and $\lambda_2$ are the convolution decay factors of the two convolution modules, both set to positive numbers smaller than 1; $\delta^{(m1)}$, $\delta^{(m2)}$, $\delta^{(m3)}$ and $\delta^{(m4)}$ are, in turn, the error terms of the four convolutional layers in Fig. 4; $\delta^{(n1)}$ and $\delta^{(n2)}$ are the error terms of the two convolutional channels, respectively.
Substituting the error term of each layer in formula (10) in turn yields the error term of the convolutional channel as formula (11).
In formula (11), the terms marked ① and ② are exactly the derivatives of the equivalence-transform functions (4) and (5), respectively; formula (11) can therefore be abbreviated as:
$$\delta^{(n1)} = \Big[\big(\lambda_2\delta^{(k+1)}\big) * H_{n1}'\big(A^{(m0)}\big) + \lambda_1\delta^{(k+1)}\Big] * H_{n2}'\big(A^{(m2)}\big) \tag{12}$$
Substituting formula (12) into formula (9), the error terms of all accumulation modules of the dual-channel convolutional neural network can be computed one by one.
According to the BP chain rule and the gradient calculation formula, the weight gradient of the last layer of the k-th accumulation module can be derived as formula (13).
6) Whether the network has converged is judged from the loss value obtained in step 4); if it has not converged, the initialization parameters of the convolutional neural network are adjusted according to the weight gradients obtained in step 5) and training is repeated; if it has converged, the network training result is output.
The loss value is computed from the loss function in step 4) and compared with the classification threshold; if it is below the classification threshold, the network has converged; otherwise the weight gradients obtained in step 5) are used to update the weights according to formulas (14) and (15), and the network is retrained:

$$V(t+1) := \mu V(t) - \eta\,\nabla l\big(W(t)\big) - \eta\lambda W(t) \tag{14}$$
$$W(t+1) := W(t) + V(t+1) \tag{15}$$
In the formulas, t represents the iteration number, V(t) is the momentum term, and μ is the momentum factor, which determines the contribution of the historical weight correction to the current weight correction; η is the learning rate; λ is the weight attenuation coefficient.
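The update of formulas (14) and (15) can be sketched in a few lines. This is a minimal numpy sketch, assuming the momentum-with-weight-decay form of (14) given above; the default values of μ, η and λ are illustrative, not specified by the patent.

```python
import numpy as np

# Minimal sketch of the parameter update, formulas (14)-(15).
def momentum_update(W, V, grad, mu=0.9, eta=0.01, lam=1e-4):
    V_next = mu * V - eta * grad - eta * lam * W  # formula (14)
    W_next = W + V_next                           # formula (15)
    return W_next, V_next
```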
Through the introduction of the direct channel, the present invention overcomes the gradient instability problem of deep convolutional neural networks and allows the network to be deepened as far as possible, thereby improving the classification accuracy. The test results on the CIFAR-10 data set show that the classification accuracy rate improves as the depth of the convolutional neural network increases; see Fig. 6.

Claims (6)

1. An improved parallel-channel convolutional neural network training method, characterized in that the method comprises the following steps, performed in order:
1) Using the two parallel channels, direct and convolutional, feature extraction is performed on the data in the convolutional neural network, yielding a direct-channel feature matrix and a convolutional-channel feature matrix;
2) The two feature matrices obtained in step 1) are merged and input into a max-pooling layer and an average-pooling layer for data dimensionality reduction;
3) Steps 1) and 2) are repeated to obtain the final feature matrix;
4) The final feature matrix obtained in step 3) is subjected to global average pooling and input into a fully connected layer to be converted into a one-dimensional feature matrix; the one-dimensional feature matrix is classified with a softmax classifier, the convolutional neural network is trained, and the loss value of the current training pass is computed;
5) Gradient computation is performed with the error backpropagation algorithm, and the error term and weight gradient of each layer are calculated;
6) Whether the network has converged is judged from the loss value obtained in step 4); if it has not converged, the initialization parameters of the convolutional neural network are adjusted according to the weight gradients obtained in step 5) and training is repeated; if it has converged, the network training result is output.
2. The improved parallel-channel convolutional neural network training method according to claim 1, characterized in that in step 1), the method of performing feature extraction on the data in the convolutional neural network through the two parallel channels, direct and convolutional, to obtain the direct-channel feature matrix and the convolutional-channel feature matrix is: first, the data are input into the direct channel and the convolutional channel separately; then, in the direct channel the data are directly mapped and taken as the direct-channel feature matrix, while in the convolutional channel several convolutional layers perform convolution operations on the data, the input of each convolutional layer being the output of the previous convolutional layer, and the output matrix of the last convolutional layer is taken as the convolutional-channel feature matrix.
3. The improved parallel-channel convolutional neural network training method according to claim 1, characterized in that in step 2), the method of merging the two feature matrices obtained in step 1) and inputting them into the max-pooling layer and the average-pooling layer for data dimensionality reduction is: first, the feature matrix obtained from the direct channel and the feature matrix obtained from the convolutional channel are merged, i.e., a set of multiple feature matrices is obtained; then the resulting feature matrices are input into the max-pooling layer and the average-pooling layer respectively, the max-pooling layer using a filter to take the maximum value within the filter window and the average-pooling layer using a filter to take the average value within the filter window.
4. The improved parallel-channel convolutional neural network training method according to claim 1, characterized in that in step 4), the method of applying global average pooling to the final feature matrix obtained in step 3), inputting it into the fully connected layer to convert it into a one-dimensional feature matrix, classifying the one-dimensional feature matrix with the softmax classifier, training the convolutional neural network and computing the loss value of the current training pass is: first, global average pooling is applied to the final feature matrix, using a filter of the same size as the final feature matrix to compute the average value of the data in each feature matrix; then the result is input into the fully connected layer, where each neuron applies a nonlinear transformation to the data of the globally average-pooled feature matrices, yielding the one-dimensional feature matrix; finally, the one-dimensional feature matrix is input into the softmax classifier for classification.
5. The improved parallel-channel convolutional neural network training method according to claim 1, characterized in that in step 5), the method of performing gradient computation with the error backpropagation algorithm and calculating the error term and weight gradient of each layer is: first, the loss value of the last layer is computed from the softmax classifier result and taken as the error term of the last layer; then the error term of each layer is computed with the chain rule of the error backpropagation algorithm, the error term of the i-th feature matrix of the l-th convolutional layer being:

$$\delta_i^{(l)} = \frac{\partial J}{\partial A_i^{(l)}} = f'\!\left(A_i^{(l)}\right)\sum_{j=1}^{M}\delta_j^{(l+1)}\odot W_{ij}^{(l+1)}$$

where M is the number of feature matrices in layer l+1, $\delta_j^{(l+1)}$ denotes the error term of the j-th feature matrix of the (l+1)-th convolutional layer, $f'(\cdot)$ denotes the derivative of this layer's activation function, J denotes this layer's loss value, and $W_{ij}^{(l+1)}$ denotes the connection weight between the i-th feature matrix of layer l and the j-th feature matrix of layer l+1;

Finally, the weight gradient of each layer is computed with the formula

$$\frac{\partial J}{\partial W_{ij}^{(l)}} = \delta_j^{(l)}\odot X_i^{(l-1)}$$

where $X_i^{(l-1)}$ denotes the i-th feature matrix of layer l-1.
6. The improved parallel-channel convolutional neural network training method according to claim 1, characterized in that in step 6), the method of judging from the loss value obtained in step 4) whether the network has converged, adjusting the initialization parameters of the convolutional neural network according to the weight gradients obtained in step 5) and retraining if it has not converged, and outputting the network training result if it has converged, is: first, the classification result is compared with the true value and the difference is computed as the loss value; then the loss value is compared with the preset classification threshold, and if it is below the classification threshold the network is judged to have converged, otherwise it has not converged; finally, if it has converged the network result is output, otherwise the initialization parameters of the convolutional neural network are adjusted according to the formulas

$$V(t+1) := \mu V(t) - \eta\,\nabla l\big(W(t)\big) - \eta\lambda W(t)$$
$$W(t+1) := W(t) + V(t+1)$$

where t represents the iteration number, V(t) is the momentum term, and μ is the momentum factor, which determines the contribution of the historical weight correction to the current weight correction; η is the learning rate; λ is the weight attenuation coefficient; and W represents the initialization parameters of the convolutional neural network.
CN201710247556.4A 2017-04-17 2017-04-17 An improved parallel-channel convolutional neural network training method Pending CN107092960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710247556.4A CN107092960A (en) 2017-04-17 2017-04-17 An improved parallel-channel convolutional neural network training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710247556.4A CN107092960A (en) 2017-04-17 2017-04-17 An improved parallel-channel convolutional neural network training method

Publications (1)

Publication Number Publication Date
CN107092960A true CN107092960A (en) 2017-08-25

Family

ID=59637594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710247556.4A Pending CN107092960A (en) 2017-04-17 2017-04-17 An improved parallel-channel convolutional neural network training method

Country Status (1)

Country Link
CN (1) CN107092960A (en)


Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729872A (en) * 2017-11-02 2018-02-23 北方工业大学 Facial expression recognition method and device based on deep learning
CN110007959A (en) * 2017-11-03 2019-07-12 畅想科技有限公司 Hard-wired stratification mantissa bit length for deep neural network selects
CN112005532A (en) * 2017-11-08 2020-11-27 爱维士软件有限责任公司 Malware classification of executable files over convolutional networks
CN109840584B (en) * 2017-11-24 2023-04-18 腾讯科技(深圳)有限公司 Image data classification method and device based on convolutional neural network model
CN109993301B (en) * 2017-12-29 2020-05-19 中科寒武纪科技股份有限公司 Neural network training device and related product
CN109993301A (en) * 2017-12-29 2019-07-09 北京中科寒武纪科技有限公司 Neural metwork training device and Related product
CN108182456A (en) * 2018-01-23 2018-06-19 哈工大机器人(合肥)国际创新研究院 A kind of target detection model and its training method based on deep learning
CN108182456B (en) * 2018-01-23 2022-03-18 哈工大机器人(合肥)国际创新研究院 Target detection model based on deep learning and training method thereof
US11455802B2 (en) 2018-03-29 2022-09-27 Beijing Bytedance Network Technology Co. Ltd. Video feature extraction method and device
CN108460457A (en) * 2018-03-30 2018-08-28 苏州纳智天地智能科技有限公司 A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN110555125A (en) * 2018-05-14 2019-12-10 桂林远望智能通信科技有限公司 Vehicle retrieval method based on local features
CN108596330B (en) * 2018-05-16 2022-03-15 中国人民解放军陆军工程大学 Parallel characteristic full-convolution neural network device and construction method thereof
CN108596330A (en) * 2018-05-16 2018-09-28 中国人民解放军陆军工程大学 A kind of full convolutional neural networks of Concurrent Feature and its construction method
CN110580523A (en) * 2018-06-07 2019-12-17 清华大学 Error calibration method and device for analog neural network processor
CN112020721A (en) * 2018-06-15 2020-12-01 富士通株式会社 Training method and device for classification neural network for semantic segmentation, and electronic equipment
CN108846445A (en) * 2018-06-26 2018-11-20 清华大学 A kind of convolutional neural networks filter technology of prunning branches based on similarity-based learning
CN108846445B (en) * 2018-06-26 2021-11-26 清华大学 Image processing method
CN108961245A (en) * 2018-07-06 2018-12-07 西安电子科技大学 Picture quality classification method based on binary channels depth parallel-convolution network
CN109102017A (en) * 2018-08-09 2018-12-28 百度在线网络技术(北京)有限公司 Neural network model processing method, device, equipment and readable storage medium storing program for executing
CN109102017B (en) * 2018-08-09 2021-08-03 百度在线网络技术(北京)有限公司 Neural network model processing method, device, equipment and readable storage medium
CN109144099A (en) * 2018-08-28 2019-01-04 北京航空航天大学 Unmanned aerial vehicle group action scheme fast evaluation method based on convolutional neural networks
CN111291771A (en) * 2018-12-06 2020-06-16 西安宇视信息科技有限公司 Method and device for optimizing characteristics of pooling layer
CN111291771B (en) * 2018-12-06 2024-04-02 西安宇视信息科技有限公司 Method and device for optimizing pooling layer characteristics
CN109739979A (en) * 2018-12-11 2019-05-10 中科恒运股份有限公司 Tuning method, tuning device and the terminal of neural network
CN110020652A (en) * 2019-01-07 2019-07-16 新而锐电子科技(上海)有限公司 The dividing method of Tunnel Lining Cracks image
CN109635790A (en) * 2019-01-28 2019-04-16 杭州电子科技大学 A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution
CN110222559A (en) * 2019-04-24 2019-09-10 深圳市微纳集成电路与系统应用研究院 Smog image detecting method and device based on convolutional neural networks
CN110276345A (en) * 2019-06-05 2019-09-24 北京字节跳动网络技术有限公司 Convolutional neural networks model training method, device and computer readable storage medium
CN110766681A (en) * 2019-10-28 2020-02-07 福建帝视信息科技有限公司 Bamboo strip surface defect detection method based on triple loss network
CN110766681B (en) * 2019-10-28 2023-04-14 福建帝视信息科技有限公司 Bamboo strip surface defect detection method based on triple loss network
CN111027630A (en) * 2019-12-13 2020-04-17 安徽理工大学 Image classification method based on convolutional neural network
CN110942106A (en) * 2019-12-13 2020-03-31 东华大学 Pooling convolutional neural network image classification method based on square average
CN111027630B (en) * 2019-12-13 2023-04-07 安徽理工大学 Image classification method based on convolutional neural network
CN110942106B (en) * 2019-12-13 2023-11-07 东华大学 Pooled convolutional neural network image classification method based on square average
CN110956342A (en) * 2020-01-02 2020-04-03 中国民航大学 CliqueNet flight delay prediction method based on attention mechanism
CN111967574B (en) * 2020-07-20 2024-01-23 华南理工大学 Tensor singular value delimitation-based convolutional neural network training method
CN111967574A (en) * 2020-07-20 2020-11-20 华南理工大学 Convolutional neural network training method based on tensor singular value delimitation
CN113011567B (en) * 2021-03-31 2023-01-31 深圳精智达技术股份有限公司 Training method and device of convolutional neural network model
CN113011567A (en) * 2021-03-31 2021-06-22 深圳精智达技术股份有限公司 Training method and device of convolutional neural network model
CN112801929A (en) * 2021-04-09 2021-05-14 宝略科技(浙江)有限公司 Local background semantic information enhancement method for building change detection

Similar Documents

Publication Publication Date Title
CN107092960A (en) An improved parallel-channel convolutional neural network training method
CN107247989A (en) A kind of neural network training method and device
CN108009594B (en) A kind of image-recognizing method based on change grouping convolution
CN108416755A (en) A kind of image de-noising method and system based on deep learning
CN106991440A (en) A kind of image classification algorithms of the convolutional neural networks based on spatial pyramid
CN108764471A (en) The neural network cross-layer pruning method of feature based redundancy analysis
CN107563567A (en) Core extreme learning machine Flood Forecasting Method based on sparse own coding
CN107437096A (en) Image classification method based on the efficient depth residual error network model of parameter
CN106934352A (en) A kind of video presentation method based on two-way fractal net work and LSTM
CN110245665A (en) Image, semantic dividing method based on attention mechanism
CN106650789A (en) Image description generation method based on depth LSTM network
CN109948029A (en) Based on the adaptive depth hashing image searching method of neural network
CN106951395A (en) Towards the parallel convolution operations method and device of compression convolutional neural networks
CN106971160A (en) Winter jujube disease recognition method based on depth convolutional neural networks and disease geo-radar image
CN109543502A (en) A kind of semantic segmentation method based on the multiple dimensioned neural network of depth
CN107609638A (en) A kind of method based on line decoder and interpolation sampling optimization convolutional neural networks
CN104050507B (en) Hyperspectral image classification method based on multilayer neural network
CN110263863A (en) Fine granularity mushroom phenotype recognition methods based on transfer learning Yu bilinearity InceptionResNetV2
CN106780482A (en) A kind of classification method of medical image
CN107392224A (en) A kind of crop disease recognizer based on triple channel convolutional neural networks
CN108921298A (en) Intensified learning multiple agent is linked up and decision-making technique
CN109817276A (en) A kind of secondary protein structure prediction method based on deep neural network
CN110321967A (en) Image classification innovatory algorithm based on convolutional neural networks
CN106951960A (en) A kind of learning method of neutral net and the neutral net
CN106339753A (en) Method for effectively enhancing robustness of convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170825