CN109886397A - A structured pruning compression optimization method for neural network convolutional layers - Google Patents

A structured pruning compression optimization method for neural network convolutional layers

Info

Publication number
CN109886397A
Authority
CN
China
Prior art keywords
convolutional layer
pruning
value
sparse
sparse value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910218652.5A
Other languages
Chinese (zh)
Inventor
梅魁志
张良
张增
薛建儒
鄢健宇
常藩
张向楠
王晓
陶纪安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910218652.5A
Publication of CN109886397A
Legal status: Pending (Current)


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a structured pruning compression optimization method for the convolutional layers of a neural network, comprising: (1) sparsity allocation for each convolutional layer: (1.1) training the original model to obtain the weight parameters of each prunable convolutional layer and computing an importance score for each convolutional layer; (1.2) sorting the layers by importance score in ascending order, partitioning the score range into equal-width intervals between the minimum and maximum scores, assigning sparsity values from small to large to the convolutional layers in each interval in turn, and adjusting through model retraining to obtain the sparsity configuration of all prunable convolutional layers; (2) structured pruning: selecting a convolution filter according to the sparsity value determined in step (1.2) and performing structured pruning training, wherein each convolutional layer uses only one type of convolution filter. The optimization method of the invention allows a deep neural network to run more easily on a resource-constrained platform, saving parameter storage space while accelerating model inference.

Description

A structured pruning compression optimization method for neural network convolutional layers
Technical field
The invention belongs to the fields of artificial intelligence, deep neural network optimization and image recognition, and in particular relates to a structured pruning compression optimization method for neural network convolutional layers.
Background art
In the field of artificial intelligence, the deep neural network is one of its cornerstones, and its complexity and portability directly affect the application of artificial intelligence in everyday life. Research on the acceleration and compression optimization of deep networks allows artificial intelligence to be realized more conveniently and to serve daily life more easily.
Currently, there are several common methods for accelerating and compressing deep networks: 1. Low-Rank: low-rank decomposition; 2. Pruning, which is further divided into structured pruning, kernel pruning and gradient pruning, and has a wide range of application; 3. Quantization, which is further divided into low-bit quantization, quantization for accelerating overall training, and gradient quantization for distributed training; 4. Knowledge Distillation; 5. Compact Network Design, which optimizes the model at the level of the network structure.
The present invention mainly makes further improvements on the second type of compression method, pruning. The idea of structured pruning is also used in the prior art, where each convolutional layer uses multiple convolution filters and the types of these convolution filters are obtained through training. The existing methods not only require very long training cycles and enormous computing resources (making it impossible to smoothly use large-scale training datasets), but such structured pruning also cannot save much computation and storage in the forward pass of the model.
In summary, a novel structured pruning compression optimization method for neural networks is needed.
Summary of the invention
The purpose of the present invention is to provide a structured pruning compression optimization method for neural network convolutional layers, so as to solve one or more of the above technical problems. The optimization method of the invention allows a deep neural network to run more easily on a resource-constrained platform, saving parameter storage space while accelerating model inference.
In order to achieve the above objectives, the invention adopts the following technical scheme:
A structured pruning compression optimization method for neural network convolutional layers, comprising:
(1) sparsity allocation for each convolutional layer, comprising:
(1.1) training the original model to obtain the weight parameters of each prunable convolutional layer, and computing an importance score for each convolutional layer;
(1.2) sorting the layers by importance score in ascending order, partitioning the score range into equal-width intervals between the minimum and maximum scores, assigning sparsity values from small to large to the convolutional layers in each interval in turn, and adjusting through model retraining to obtain the sparsity configuration of all prunable convolutional layers;
(2) structured pruning, comprising:
selecting a convolution filter according to the sparsity value determined in step (1.2), and performing structured pruning training;
wherein each convolutional layer uses only one type of convolution filter.
A further improvement of the present invention is that, in step 1, training the original model to obtain the weight parameters of each prunable convolutional layer specifically includes: the weight parameters k_{l,nchw}, where l is the layer index and n, c, h, w are the indices of the 4-D weight tensor of the convolutional layer; n is the input channel index, c is the output channel index, and h, w are respectively the height and width indices of the convolution kernel; N is the total number of input channels, C is the total number of output channels, and H, W are respectively the total height and width of the convolution kernel; n, c, h, w are positive integers, with n ∈ [1, N], c ∈ [1, C], h ∈ [1, H], w ∈ [1, W].
A further improvement of the present invention is that, in step 1, the importance score of each convolutional layer is calculated as follows:
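(The expression itself appears only as an image in the original publication. A plausible reconstruction, given that M_l is defined below as the mean of the squared kernel weights of layer l, is:

M_l = (1 / (N·C·H·W)) · Σ_{n=1}^{N} Σ_{c=1}^{C} Σ_{h=1}^{H} Σ_{w=1}^{W} (k_{l,nchw})²  )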
In the formula, for a given layer l, M_l denotes the average of the squared values of the convolution kernel operators of that layer; n, c, h, w are the indices of the 4-D weight tensor of the convolutional layer, where n is the input channel index, c is the output channel index, and h, w are respectively the height and width indices of the convolution kernel; N is the total number of input channels, C is the total number of output channels, and H, W are respectively the total height and width of the convolution kernel; n, c, h, w are positive integers, with n ∈ [1, N], c ∈ [1, C], h ∈ [1, H], w ∈ [1, W].
A further improvement of the present invention is that, in step 2, the specific steps of assigning sparsity values from small to large to the convolutional layers in each interval include: for each prunable convolutional layer, the sparsity configuration includes changing its sparsity value and retraining the model; if the model performance remains good, its sparsity value continues to be increased, and if the model performance suffers a large loss, the previous sparsity value is taken as its final sparsity value; the sparsity configuration of the convolutional layers is repeated until the sparsity configuration of the convolutional layers in the last interval is completed, yielding the initial structured-pruning sparsity configuration of all prunable convolutional layers;
wherein the evaluation criterion of the model performance is the accuracy or the mAP value in object detection; if the accuracy or mAP value does not decline, the model performance is considered to remain good, and if it declines beyond a preset threshold, the model performance is considered to suffer a large loss.
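As an illustration only, the following Python sketch outlines the interval-by-interval sparsity search described above; the callback names (retrain, evaluate), the sparsity steps and the drop threshold are hypothetical placeholders rather than part of the patent.

def assign_sparsity(intervals, sparsity_steps, retrain, evaluate, baseline, drop_threshold=0.01):
    """Greedy sparsity assignment: walk the intervals from least to most important,
    increasing each interval's sparsity until the retrained model degrades too much.
    retrain(layers, s) retrains the model with sparsity s on the given layers;
    evaluate() returns the current accuracy or mAP."""
    config = {}
    for layers in intervals:                 # least-important interval first
        chosen = 0.0
        for s in sparsity_steps:             # e.g. [0.2, 0.4, 0.6, 0.8]
            retrain(layers, s)
            if baseline - evaluate() <= drop_threshold:
                chosen = s                   # performance still good: keep increasing
            else:
                break                        # large loss: keep the previous sparsity value
        for layer in layers:
            config[layer] = chosen
    return config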
A further improvement of the present invention is that, after the initial structured-pruning sparsity configuration of all prunable convolutional layers is obtained, the convolutional layers close to the ends of the importance intervals are fine-tuned;
the fine-tuning includes increasing the sparsity value at the end with the smaller value and decreasing the sparsity value at the end with the larger value, performing a retraining operation immediately after each change, to obtain the final sparsity configuration of all prunable convolutional layers.
A further improvement of the present invention is that, in step (2), the convolution filter is a pruning template of the same size as the convolution kernel operator.
A further improvement of the present invention is that, in step (2), the convolution filter is described using three parameters: Kp_stride is the stride of pruning or retention, Kp_offset = i means that the position index of the first pruned value is i, and Kp_keepset = j means that the position index of the first retained value is j.
Compared with prior art, the invention has the following advantages:
The optimization method of the invention reasonably allocates a sparsity value to each convolutional layer according to its importance score, and performs structured pruning at the level of the convolution operator with one type of convolution filter per layer, through a training procedure of tuning parameters, retraining, and tuning parameters again to obtain the final model. On the premise that the performance does not drop significantly, the entire convolutional neural network obtains a reasonable structured pruning compression optimization, which not only greatly reduces the parameter storage space but also has great potential for computational optimization. In addition, after structured pruning, one data stream only needs to perform a single partially-regular data-reading operation, and the read data can be reused; this saves a large amount of storage resources on the hardware platform and a large number of arithmetic operations, has great potential for computational acceleration, allows the deep neural network to run more easily on a resource-constrained platform, and saves parameter storage space while accelerating model inference.
Further, after the initial structured-pruning sparsity configuration of all prunable convolutional layers is obtained, because one or several convolutional layers in the same importance interval were previously changed to the same sparsity value at the same time, the sparsity values of the convolutional layers close to the ends of the importance intervals may not be accurate. It is therefore subsequently necessary to fine-tune some of these convolutional layers near the interval ends by increasing the sparsity value at the end with the smaller value and decreasing the sparsity value at the end with the larger value, performing a retraining operation immediately after each change and comparing the model performance before and after, to obtain the final sparsity configuration of all prunable convolutional layers; the criterion for judging model performance is the accuracy or the mAP value in object detection.
Further, the invention chooses to adjust the sparsity rate manually, and each convolutional layer uses only one type of convolution filter, so that no lengthy training is needed to determine the sparsity value of each layer; and because each layer uses only one type of convolution filter, the forward pass of the model can save a large amount of storage and computing resources.
Detailed description of the invention
Fig. 1 is a schematic diagram of the structured pruning principle in the optimization method of the embodiment of the present invention;
Fig. 2 is a schematic diagram of the structured pruning operation principle in the optimization method of the embodiment of the present invention;
Fig. 3 is a schematic diagram of the sparsity value configuration principle in the optimization method of the embodiment of the present invention;
Fig. 4 is a schematic diagram of the convolution filter structures for a 3*3 convolution operator in the optimization method of the embodiment of the present invention.
Specific embodiment
The invention is further described in detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, Fig. 1 shows the overall pruning compression optimization principle. The structured pruning compression optimization method for deep neural network convolutional layers of the embodiment of the present invention consists of two parts: sparsity allocation for each convolutional layer and structured pruning.
(1) The sparsity allocation steps for each convolutional layer are as follows. First, the original model is trained to obtain the parameter data of each prunable convolutional layer, and the single-layer importance scores are calculated. The per-layer importance scores M_l are summed to obtain M, and the global importance proportion of each layer, D_l = M_l / M, is calculated. The layers are ranked by D_l in ascending order, and the range between the maximum and minimum of D_l is divided into equal-width intervals; the specific number of intervals is chosen empirically by observing the distribution of D_l, following the rule that the total number of intervals does not exceed half the total number of layers, so that the convolutional layers are distinguished as far as possible. Sparsity values are then configured from small to large, starting from the interval of convolutional layers with the smallest importance scores. A retraining operation is performed after every change of the sparsity value; if the model performance remains good, the sparsity value continues to be increased, until the model performance suffers a large loss, in which case the previously tested sparsity value is taken as the final value. Then, on this basis, the above work is repeated for the convolutional layers in the next importance-score interval, until the sparsity configuration of the convolutional layers in the last interval is completed, yielding the initial structured-pruning sparsity configuration of all prunable convolutional layers. Finally, the sparsity values of a few convolutional layers may be modified with retraining fine-tuning to obtain the final sparsity configuration of all prunable convolutional layers. The evaluation criterion of the model performance is the accuracy or the mAP value in object detection; if the accuracy or mAP value does not decline, the model performance is considered to remain good, and if it declines beyond a preset threshold, the model performance is considered to suffer a large loss.
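For illustration only, a minimal numpy sketch of the importance scoring M_l, the global proportion D_l = M_l / M, and the equal-width interval segmentation described above; the function and variable names are assumptions, not part of the patent.

import numpy as np

def importance_scores(weights):
    """weights: dict {layer_name: 4-D array of that layer's kernel weights}.
    M_l is the mean of the squared kernel values of layer l."""
    return {name: float(np.mean(np.square(w))) for name, w in weights.items()}

def segment_by_importance(scores, num_intervals):
    """D_l = M_l / M; split [min(D_l), max(D_l)] into equal-width intervals.
    num_intervals should not exceed half the number of prunable layers."""
    total = sum(scores.values())
    d = {name: m / total for name, m in scores.items()}
    lo, hi = min(d.values()), max(d.values())
    width = (hi - lo) / num_intervals or 1.0
    intervals = [[] for _ in range(num_intervals)]
    for name in sorted(d, key=d.get):                      # ascending importance
        idx = min(int((d[name] - lo) / width), num_intervals - 1)
        intervals[idx].append(name)
    return intervals                                        # least-important interval first

layers = {"conv1": np.random.randn(16, 3, 3, 3), "conv2": np.random.randn(32, 16, 3, 3),
          "conv3": np.random.randn(64, 32, 3, 3)}
print(segment_by_importance(importance_scores(layers), 1))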
The original neural network model selected in the embodiment of the present invention is the object detection network YOLOv3, whose convolution operators have sizes 3*3 and 1*1; the convolutional layers with 3*3 operators are selected for structured pruning.
The Pascal VOC 2012 dataset is selected; its training and validation set includes 11540 = 5717 + 5823 pictures, and its test set includes 4952 pictures. The official configuration file yolov3-voc.cfg is then used to train the original model, and the model is tested to obtain its mAP value at this point.
The importance scores of the convolutional layers whose convolution operators have size 3*3 are obtained from the original model parameters:
In the formula, for a given layer l, M_l denotes the average of the squared values of the convolution kernel operators of that layer; n, c, h, w are the indices of the 4-D weight tensor of the convolutional layer, where N is the total number of input channels, C is the total number of output channels, and H, W are respectively the height and width of the convolution kernel; n, c, h, w are positive integers, with n ∈ [1, N], c ∈ [1, C], h ∈ [1, H], w ∈ [1, W].
Referring to Fig. 3, the original model is trained to obtain the parameter data of each prunable convolutional layer, and the single-layer importance scores are calculated. The per-layer importance scores M_l are summed to obtain M, and the global importance proportion of each layer, D_l = M_l / M, is calculated. The layers are ranked by D_l in ascending order, and the range between the maximum and minimum of D_l is divided into equal-width intervals; the specific number of intervals is chosen empirically by observing the distribution of D_l, following the rule that the total number of intervals does not exceed half the total number of layers, so that the convolutional layers are distinguished as far as possible. A retraining operation is performed after every change of the sparsity value; if the model performance remains good, the sparsity value continues to be increased, until the model performance suffers a large loss, in which case the previously tested sparsity value is taken as the final value. The evaluation criterion of the model performance is the accuracy or the mAP value in object detection; if the accuracy or mAP value does not decline, the model performance is considered to remain good, and if it declines beyond a preset threshold, the model performance is considered to suffer a large loss. Then, on this basis, the above work is repeated for the convolutional layers in the next importance-score interval, until the sparsity configuration of the convolutional layers in the last interval is completed, yielding the initial structured-pruning sparsity configuration of all prunable convolutional layers. Finally, the sparsity values of a few convolutional layers may be modified with retraining fine-tuning to obtain the final sparsity configuration of all prunable convolutional layers. Because one or several convolutional layers in the same importance interval were previously changed to the same sparsity value at the same time, the sparsity values of the convolutional layers close to the ends of the importance intervals may not be accurate, so it is subsequently necessary to fine-tune some of these convolutional layers near the interval ends by increasing the sparsity value at the end with the smaller value and decreasing the sparsity value at the end with the larger value, performing a retraining operation immediately after each change and comparing the model performance before and after, to obtain the final sparsity configuration of all prunable convolutional layers; the criterion for judging model performance is the accuracy or the mAP value in object detection.
(2) The structured pruning steps are as follows: according to the sparsity value, one type is randomly selected from the convolution filters corresponding to that sparsity value, while observing the rule that one convolutional layer may select only one type of convolution filter; a convolution filter is simply a pruning template of the same size as the convolution kernel operator. Each convolutional layer uses the same convolution filter throughout, and structured pruning training is performed.
Referring to Fig. 2 and Fig. 4, according to the configured sparsity value, a 3*3 convolution filter is selected from those shown in Fig. 4, and a retraining operation is performed according to the structured pruning operation principle shown in Fig. 2. The convolution filter is described using three parameters: Kp_stride is the stride of pruning (or retention), Kp_offset = i means that the position index of the first pruned value is i, and Kp_keepset = j means that the position index of the first retained value is j.
In Fig. 2, the normal operation takes the input data (5*5) obtained from the previous layer and, according to the convolution kernel (3*3), rearranges the image blocks into columns (im2col) to obtain an input data matrix (9*9), which is then multiplied by the convolution kernel (9*1) to obtain the result (9*1). With structured pruning, the input data (5*5) undergoes, according to the convolution filter (3*3), an im2col operation that does not read the pruned part of the data, yielding an input data matrix (9*4), which is then multiplied by the convolution kernel pruned by the convolution filter (4*1) to obtain the result (9*1). Since each layer selects only one type of convolution filter, the input data of the previous layer only needs one im2col pass to obtain the input data matrix (9*4), which can then be reused by the other convolution kernels of this layer after pruning by the convolution filter, without having to perform multiple im2col operations on the input data because several different types of convolution filters exist in one layer. This both reduces the amount of computation and saves a large amount of storage resources; a sketch of the pruned im2col computation is given below.
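The following numpy sketch illustrates the pruned im2col computation described above for a 5*5 input and a 3*3 kernel, with a hypothetical filter mask that keeps 4 of the 9 kernel positions; it is a sketch of the principle under these assumptions, not the patented implementation.

import numpy as np

def im2col_pruned(x, ksize, keep_idx):
    """Gather only the kernel positions retained by the convolution filter.
    x: 2-D input; ksize: kernel side length; keep_idx: retained positions (row-major)."""
    out = x.shape[0] - ksize + 1
    cols = np.empty((out * out, len(keep_idx)))
    for r in range(out):
        for c in range(out):
            patch = x[r:r + ksize, c:c + ksize].reshape(-1)
            cols[r * out + c] = patch[keep_idx]          # pruned positions are never read
    return cols

x = np.arange(25, dtype=float).reshape(5, 5)             # 5*5 input from the previous layer
kernel = np.random.randn(3, 3)
keep = [0, 2, 6, 8]                                      # hypothetical mask keeping 4 of 9 positions
cols = im2col_pruned(x, 3, keep)                         # (9, 4) input data matrix
result = cols @ kernel.reshape(-1)[keep]                 # (9,) result, i.e. the 9*1 output column
print(cols.shape, result.shape)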
The present invention uses the convolution kernel size 3*3 to introduce the convolution filters in Fig. 4; in the figure, a "0" value indicates that the corresponding position of the convolution kernel is pruned, and a "1" value indicates that the weight parameter at the corresponding position of the convolution kernel is retained. The convolution filter shapes under each sparsity value are obtained by enumeration: the two combinations (Kp_stride, Kp_offset) and (Kp_stride, Kp_keepset) are enumerated with Kp_stride, Kp_offset, Kp_keepset ∈ [0, ksize² - 1], where the three convolution filter parameters are integers and ksize is the side length of the convolution kernel. Convolution filters whose pattern of "1" and "0" values is not symmetric are then weeded out.
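A rough Python sketch of this enumeration follows, under the assumptions that (Kp_stride, Kp_offset) prunes every Kp_stride-th position starting at Kp_offset, that (Kp_stride, Kp_keepset) keeps every Kp_stride-th position starting at Kp_keepset, and that "symmetric" means 180-degree rotational symmetry of the 0/1 pattern; the patent does not spell these details out.

import itertools
import numpy as np

def enumerate_filters(ksize=3):
    """Enumerate candidate 0/1 pruning templates ('0' = pruned, '1' = retained)."""
    n = ksize * ksize
    masks = set()
    # A stride of 0 selects nothing meaningful, so strides start at 1 in this sketch.
    for stride, start in itertools.product(range(1, n), range(n)):
        pruned = np.ones(n, dtype=int)
        pruned[start::stride] = 0            # (Kp_stride, Kp_offset): prune these positions
        kept = np.zeros(n, dtype=int)
        kept[start::stride] = 1              # (Kp_stride, Kp_keepset): keep these positions
        for m in (pruned, kept):
            grid = m.reshape(ksize, ksize)
            if np.array_equal(grid, np.rot90(grid, 2)):   # weed out asymmetric templates
                masks.add(tuple(int(v) for v in m))
    return [np.array(m).reshape(ksize, ksize) for m in sorted(masks)]

templates = enumerate_filters(3)
print(len(templates))                        # number of symmetric 3*3 convolution filters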
Although existing unstructured pruning compression methods can reach very high compression ratios, the compressed models are difficult to optimize computationally, which is unfavorable for realizing convolutional neural networks on resource-constrained hardware platforms. To address this problem, the present invention performs structured pruning operations on each convolutional layer. First, convolution filters whose data reading is smoother are selected according to the size of the convolution operator of the convolutional layer, and they are classified according to sparsity value; then the importance of each prunable convolutional layer relative to the entire convolutional neural network is assessed, and each prunable convolutional layer is assigned a suitable sparsity value and a suitable convolution filter; a training procedure of retraining, tuning parameters, and retraining again is used so that, on the premise that the performance does not drop significantly, the entire convolutional neural network obtains a reasonable structured pruning compression optimization, which not only greatly reduces the parameter storage space but also has great potential for computational optimization.
In summary, the structured pruning compression optimization method for neural network convolutional layers of the invention belongs to the structured branch of pruning methods, and the object of pruning is the convolution kernel operator. By allocating a suitable sparsity value to each convolutional layer according to its importance and then using the same convolution filter within each layer, the deep network model is not only greatly reduced in parameter storage space, but this one-filter-per-layer arrangement can also bring a large computational acceleration effect. After structured pruning with the method of the invention, one data stream only needs to perform a single partially-regular data-reading operation, and the read data can be reused, which saves a large amount of storage resources on the hardware platform and a large number of arithmetic operations, and has great potential for computational acceleration.
The above embodiments are merely illustrative of the technical scheme of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify or equivalently replace specific embodiments of the invention; any modification or equivalent replacement that does not depart from the spirit and scope of the invention falls within the pending claims of the invention.

Claims (7)

1. A structured pruning compression optimization method for neural network convolutional layers, characterized by comprising:
(1) sparsity allocation for each convolutional layer, comprising:
(1.1) training the original model to obtain the weight parameters of each prunable convolutional layer, and computing an importance score for each convolutional layer;
(1.2) sorting the layers by importance score in ascending order, partitioning the score range into equal-width intervals between the minimum and maximum scores, assigning sparsity values from small to large to the convolutional layers in each interval in turn, and adjusting through model retraining to obtain the sparsity configuration of all prunable convolutional layers;
(2) structured pruning, comprising:
selecting a convolution filter according to the sparsity value determined in step (1.2), and performing structured pruning training;
wherein each convolutional layer uses only one type of convolution filter.
2. The structured pruning compression optimization method for neural network convolutional layers according to claim 1, characterized in that, in step 1, training the original model to obtain the weight parameters of each prunable convolutional layer specifically includes:
the weight parameters k_{l,nchw}, where l is the layer index and n, c, h, w are the indices of the 4-D weight tensor of the convolutional layer; n is the input channel index, c is the output channel index, and h, w are respectively the height and width indices of the convolution kernel; N is the total number of input channels, C is the total number of output channels, and H, W are respectively the total height and width of the convolution kernel; n, c, h, w are positive integers, with n ∈ [1, N], c ∈ [1, C], h ∈ [1, H], w ∈ [1, W].
3. The structured pruning compression optimization method for neural network convolutional layers according to claim 1, characterized in that, in step 1, the importance score of each convolutional layer is calculated as follows:
in the formula, for a given layer l, M_l denotes the average of the squared values of the convolution kernel operators of that layer; n, c, h, w are the indices of the 4-D weight tensor of the convolutional layer, where n is the input channel index, c is the output channel index, and h, w are respectively the height and width indices of the convolution kernel; N is the total number of input channels, C is the total number of output channels, and H, W are respectively the total height and width of the convolution kernel; n, c, h, w are positive integers, with n ∈ [1, N], c ∈ [1, C], h ∈ [1, H], w ∈ [1, W].
4. The structured pruning compression optimization method for neural network convolutional layers according to claim 1, characterized in that, in step 2, the specific steps of assigning sparsity values from small to large to the convolutional layers in each interval include:
for each prunable convolutional layer, the sparsity configuration includes changing its sparsity value and retraining the model; if the model performance remains good, its sparsity value continues to be increased, and if the model performance suffers a large loss, the previous sparsity value is taken as its final sparsity value;
the sparsity configuration of the convolutional layers is repeated until the sparsity configuration of the convolutional layers in the last interval is completed, yielding the initial structured-pruning sparsity configuration of all prunable convolutional layers;
wherein the evaluation criterion of the model performance is the accuracy or the mAP value in object detection; if the accuracy or mAP value does not decline, the model performance is considered to remain good, and if it declines beyond a preset threshold, the model performance is considered to suffer a large loss.
5. The structured pruning compression optimization method for neural network convolutional layers according to claim 4, characterized in that, after the initial structured-pruning sparsity configuration of all prunable convolutional layers is obtained, the convolutional layers close to the ends of the importance intervals are fine-tuned;
the fine-tuning includes increasing the sparsity value at the end with the smaller value and decreasing the sparsity value at the end with the larger value, performing a retraining operation immediately after each change, to obtain the final sparsity configuration of all prunable convolutional layers.
6. The structured pruning compression optimization method for neural network convolutional layers according to claim 1, characterized in that, in step (2), the convolution filter is a pruning template of the same size as the convolution kernel operator.
7. The structured pruning compression optimization method for neural network convolutional layers according to claim 1, characterized in that, in step (2), the convolution filter is described using three parameters: Kp_stride is the stride of pruning or retention, Kp_offset = i means that the position index of the first pruned value is i, and Kp_keepset = j means that the position index of the first retained value is j.
CN201910218652.5A 2019-03-21 2019-03-21 A structured pruning compression optimization method for neural network convolutional layers Pending CN109886397A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910218652.5A CN109886397A (en) A structured pruning compression optimization method for neural network convolutional layers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910218652.5A CN109886397A (en) A structured pruning compression optimization method for neural network convolutional layers

Publications (1)

Publication Number Publication Date
CN109886397A true CN109886397A (en) 2019-06-14

Family

ID=66933609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910218652.5A Pending CN109886397A (en) A structured pruning compression optimization method for neural network convolutional layers

Country Status (1)

Country Link
CN (1) CN109886397A (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
CN110378468A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of neural network accelerator quantified based on structuring beta pruning and low bit
CN110598731A (en) * 2019-07-31 2019-12-20 浙江大学 Efficient image classification method based on structured pruning
CN110619391A (en) * 2019-09-19 2019-12-27 华南理工大学 Detection model compression method and device and computer readable storage medium
CN110619385A (en) * 2019-08-31 2019-12-27 电子科技大学 Structured network model compression acceleration method based on multi-stage pruning
CN110969240A (en) * 2019-11-14 2020-04-07 北京达佳互联信息技术有限公司 Pruning method, device, equipment and medium for deep convolutional neural network
CN111079691A (en) * 2019-12-27 2020-04-28 中国科学院重庆绿色智能技术研究院 Pruning method based on double-flow network
CN111340225A (en) * 2020-02-28 2020-06-26 中云智慧(北京)科技有限公司 Deep convolution neural network model compression and acceleration method
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method
CN112132062A (en) * 2020-09-25 2020-12-25 中南大学 Remote sensing image classification method based on pruning compression neural network
WO2020260991A1 (en) * 2019-06-26 2020-12-30 International Business Machines Corporation Dataset dependent low rank decomposition of neural networks
CN112241509A (en) * 2020-09-29 2021-01-19 上海兆芯集成电路有限公司 Graphics processor and method for accelerating the same
CN112613610A (en) * 2020-12-25 2021-04-06 国网江苏省电力有限公司信息通信分公司 Deep neural network compression method based on joint dynamic pruning
CN112749797A (en) * 2020-07-20 2021-05-04 腾讯科技(深圳)有限公司 Pruning method and device for neural network model
CN112949814A (en) * 2019-11-26 2021-06-11 联合汽车电子有限公司 Compression and acceleration method and device of convolutional neural network and embedded equipment
WO2021129570A1 (en) * 2019-12-25 2021-07-01 神思电子技术股份有限公司 Network pruning optimization method based on network activation and sparsification
CN113128694A (en) * 2019-12-31 2021-07-16 北京超星未来科技有限公司 Method, device and system for data acquisition and data processing in machine learning
CN113392953A (en) * 2020-03-12 2021-09-14 澜起科技股份有限公司 Method and apparatus for pruning convolutional layers in a neural network
CN113762463A (en) * 2021-07-26 2021-12-07 华南师范大学 Model pruning method and system for raspberry pi processor
CN114186633A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Distributed training method, device, equipment and storage medium of model
CN110298446B (en) * 2019-06-28 2022-04-05 济南大学 Deep neural network compression and acceleration method and system for embedded system
CN114548884A (en) * 2022-04-27 2022-05-27 中国科学院微电子研究所 Package identification method and system based on pruning lightweight model
CN115935263A (en) * 2023-02-22 2023-04-07 和普威视光电股份有限公司 Yoov 5 pruning-based edge chip detection and classification method and system
CN117114148A (en) * 2023-08-18 2023-11-24 湖南工商大学 Lightweight federal learning training method

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
WO2020260991A1 (en) * 2019-06-26 2020-12-30 International Business Machines Corporation Dataset dependent low rank decomposition of neural networks
CN113826120B (en) * 2019-06-26 2023-02-14 国际商业机器公司 Data set dependent low rank decomposition of neural networks
CN113826120A (en) * 2019-06-26 2021-12-21 国际商业机器公司 Data set dependent low rank decomposition of neural networks
GB2600055A (en) * 2019-06-26 2022-04-20 Ibm Dataset dependent low rank decomposition of neural networks
CN110298446B (en) * 2019-06-28 2022-04-05 济南大学 Deep neural network compression and acceleration method and system for embedded system
CN110378468A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of neural network accelerator quantified based on structuring beta pruning and low bit
WO2021004366A1 (en) * 2019-07-08 2021-01-14 浙江大学 Neural network accelerator based on structured pruning and low-bit quantization, and method
CN110598731A (en) * 2019-07-31 2019-12-20 浙江大学 Efficient image classification method based on structured pruning
CN110598731B (en) * 2019-07-31 2021-08-20 浙江大学 Efficient image classification method based on structured pruning
CN110619385A (en) * 2019-08-31 2019-12-27 电子科技大学 Structured network model compression acceleration method based on multi-stage pruning
CN110619385B (en) * 2019-08-31 2022-07-29 电子科技大学 Structured network model compression acceleration method based on multi-stage pruning
CN110619391B (en) * 2019-09-19 2023-04-18 华南理工大学 Detection model compression method and device and computer readable storage medium
CN110619391A (en) * 2019-09-19 2019-12-27 华南理工大学 Detection model compression method and device and computer readable storage medium
CN110969240A (en) * 2019-11-14 2020-04-07 北京达佳互联信息技术有限公司 Pruning method, device, equipment and medium for deep convolutional neural network
CN110969240B (en) * 2019-11-14 2022-12-09 北京达佳互联信息技术有限公司 Pruning method, device, equipment and medium for deep convolutional neural network
CN112949814B (en) * 2019-11-26 2024-04-26 联合汽车电子有限公司 Compression and acceleration method and device of convolutional neural network and embedded device
CN112949814A (en) * 2019-11-26 2021-06-11 联合汽车电子有限公司 Compression and acceleration method and device of convolutional neural network and embedded equipment
WO2021129570A1 (en) * 2019-12-25 2021-07-01 神思电子技术股份有限公司 Network pruning optimization method based on network activation and sparsification
CN111079691A (en) * 2019-12-27 2020-04-28 中国科学院重庆绿色智能技术研究院 Pruning method based on double-flow network
CN113128694A (en) * 2019-12-31 2021-07-16 北京超星未来科技有限公司 Method, device and system for data acquisition and data processing in machine learning
CN111368699B (en) * 2020-02-28 2023-04-07 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator
CN111340225A (en) * 2020-02-28 2020-06-26 中云智慧(北京)科技有限公司 Deep convolution neural network model compression and acceleration method
CN113392953A (en) * 2020-03-12 2021-09-14 澜起科技股份有限公司 Method and apparatus for pruning convolutional layers in a neural network
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method
CN111461322B (en) * 2020-03-13 2024-03-08 中国科学院计算技术研究所 Deep neural network model compression method
CN112749797A (en) * 2020-07-20 2021-05-04 腾讯科技(深圳)有限公司 Pruning method and device for neural network model
CN112132062A (en) * 2020-09-25 2020-12-25 中南大学 Remote sensing image classification method based on pruning compression neural network
CN112241509A (en) * 2020-09-29 2021-01-19 上海兆芯集成电路有限公司 Graphics processor and method for accelerating the same
CN112241509B (en) * 2020-09-29 2024-03-12 格兰菲智能科技有限公司 Graphics processor and acceleration method thereof
CN112613610A (en) * 2020-12-25 2021-04-06 国网江苏省电力有限公司信息通信分公司 Deep neural network compression method based on joint dynamic pruning
CN113762463A (en) * 2021-07-26 2021-12-07 华南师范大学 Model pruning method and system for raspberry pi processor
CN114186633A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Distributed training method, device, equipment and storage medium of model
CN114548884A (en) * 2022-04-27 2022-05-27 中国科学院微电子研究所 Package identification method and system based on pruning lightweight model
CN114548884B (en) * 2022-04-27 2022-07-12 中国科学院微电子研究所 Package identification method and system based on pruning lightweight model
CN115935263A (en) * 2023-02-22 2023-04-07 和普威视光电股份有限公司 Yoov 5 pruning-based edge chip detection and classification method and system
CN117114148A (en) * 2023-08-18 2023-11-24 湖南工商大学 Lightweight federal learning training method
CN117114148B (en) * 2023-08-18 2024-04-09 湖南工商大学 Lightweight federal learning training method

Similar Documents

Publication Publication Date Title
CN109886397A (en) A structured pruning compression optimization method for neural network convolutional layers
Li et al. Group sparsity: The hinge between filter pruning and decomposition for network compression
Lin et al. Hrank: Filter pruning using high-rank feature map
Li et al. OICSR: Out-in-channel sparsity regularization for compact deep neural networks
WO2018227800A1 (en) Neural network training method and device
CN110378468A (en) A kind of neural network accelerator quantified based on structuring beta pruning and low bit
CN108009594B (en) A kind of image-recognizing method based on change grouping convolution
CN106503731A (en) A kind of based on conditional mutual information and the unsupervised feature selection approach of K means
CN109740734B (en) Image classification method of convolutional neural network by optimizing spatial arrangement of neurons
CN109754359A (en) A kind of method and system that the pondization applied to convolutional neural networks is handled
CN111259933B (en) High-dimensional characteristic data classification method and system based on distributed parallel decision tree
CN109472352A (en) A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature
Gao et al. Vacl: Variance-aware cross-layer regularization for pruning deep residual networks
Kiaee et al. Alternating direction method of multipliers for sparse convolutional neural networks
CN112598129A (en) Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator
JP2022101461A (en) Joint sparse method based on mixed particle size used for neural network
Qi et al. Learning low resource consumption cnn through pruning and quantization
US11875263B2 (en) Method and apparatus for energy-aware deep neural network compression
CN115983366A (en) Model pruning method and system for federal learning
CN114781639A (en) Depth model compression method for multilayer shared codebook vector quantization of edge equipment
CN113610350B (en) Complex working condition fault diagnosis method, equipment, storage medium and device
CN107463528A (en) The gauss hybrid models split-and-merge algorithm examined based on KS
CN113505804A (en) Image identification method and system based on compressed deep neural network
CN107291897A (en) A kind of time series data stream clustering method based on small wave attenuation summary tree
CN111783976A (en) Neural network training process intermediate value storage compression method and device based on window gradient updating

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190614