CN112613610A - Deep neural network compression method based on joint dynamic pruning - Google Patents

Deep neural network compression method based on joint dynamic pruning

Info

Publication number
CN112613610A
Authority
CN
China
Prior art keywords
pruning
channel
dynamic
neural network
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011561741.9A
Other languages
Chinese (zh)
Inventor
张明明
宋浒
卢庆宁
俞俊
温磊
刘文盼
范江
查易艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NARI Group Corp
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
NARI Group Corp
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NARI Group Corp, Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd filed Critical NARI Group Corp
Priority to CN202011561741.9A
Publication of CN112613610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep neural network compression method based on joint dynamic pruning, which comprises the following steps. Step 1: acquiring two hyperparameters, a convolution-kernel dynamic pruning rate β and a channel dynamic compression rate α. Step 2: pruning M_i(1 - β) convolution kernels in the convolutional layer with the convolution-kernel dynamic pruning method. Step 3: selecting N_i·α channels to participate in training with the channel dynamic compression method. Step 4: updating the model parameters during training so that the convolution-kernel dynamic pruning converges to a subset of the channel dynamic compression. The invention accelerates training and inference, maintains model capacity, and effectively reduces the floating-point operation count and parameter scale of the model.

Description

Deep neural network compression method based on joint dynamic pruning
Technical Field
The invention relates to a neural network compression method, in particular to a deep neural network compression method based on joint dynamic pruning.
Background
While deep learning has driven advances in computer vision, natural language processing, and other fields, model complexity, high storage requirements, and computational resource consumption make deep models difficult to deploy on many hardware platforms. For example, the classic image-classification network VGG16 has about 130 million parameters, occupies roughly 500 MB of memory, and requires about 30.9 billion floating-point operations to complete a single image-recognition task. In fact, deep neural networks contain a great deal of redundancy: the remaining weights can be predicted from only a small fraction of the weights. Model compression is therefore theoretically feasible and practically necessary.
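For scale, the figures above can be sanity-checked with a short calculation. The parameter count and operation count used here are the commonly cited approximate values for VGG16 and are assumptions of this note, not taken from the patent:

```python
# Back-of-the-envelope check of the VGG16 figures cited above (assumed values).
params = 138_000_000                       # approximate VGG16 parameter count
bytes_per_param = 4                        # 32-bit floating-point weights
storage_mb = params * bytes_per_param / (1024 ** 2)
print(f"weights: ~{storage_mb:.0f} MB")    # ~526 MB, on the order of the ~500 MB cited

flops = 30.9e9                             # ~30.9 billion floating-point operations per image
print(f"compute: ~{flops / 1e9:.1f} GFLOPs per forward pass")
```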
Network pruning is a popular direction in the field of model compression. Its principle is to remove the less important weights in the network and then fine-tune the network until it converges again; how to measure the importance of weights therefore becomes the core problem of network pruning. Whatever the differences in parameter-selection criteria and pruning/training procedures, static pruning methods share one trait: pruned parameters are permanently removed from the model and no longer participate in inference or training. Although most parameters in a network are redundant, static pruning still permanently removes some critical parameters; no matter which judgment criterion is adopted, mis-pruning is hard to avoid, which inevitably causes a loss of network capacity.
Compared with static pruning, the purpose of dynamic pruning is to retain the capacity of the pruned part and avoid the reduction of model capacity caused by permanent pruning. The idea is similar to dropout, which prevents over-fitting in deep learning, except that dynamic pruning designs criteria to measure the relationship between the input image and the convolution kernels rather than discarding randomly: for a particular input image, the convolution kernels that can be activated exist and are limited. However, the dynamic selection that makes dynamic pruning unique is also what restricts its ability to compress the network. In theory, the activated convolution kernels are fixed for a particular input image, but for an unknown input image the activated kernels cannot be determined. Since the feature distribution of the input images is difficult to know while the neural network is constructed and trained, permanently removing some convolution kernels would still cause a small loss of model capacity whenever those kernels are activated by a particular input. Dynamic pruning algorithms therefore find it difficult to remove convolution kernels permanently, and their network compression ratios are significantly smaller than those of static pruning algorithms.
Disclosure of Invention
Purpose of the invention: the invention aims to overcome the defects of the prior art and provides a deep neural network compression method based on joint dynamic pruning, which addresses both the loss of network capacity caused by static pruning and the problem that the network compression ratio of dynamic pruning is significantly smaller than that of static pruning.
Technical scheme: the deep neural network compression method based on joint dynamic pruning of the invention comprises the following steps:
Step 1: acquiring two hyperparameters, the convolution-kernel dynamic pruning rate β and the channel dynamic compression rate α;
Step 2: pruning M_i(1 - β) convolution kernels in the convolutional layer by using the convolution-kernel dynamic pruning method, where M_i is the number of convolution kernels in the i-th layer;
Step 3: selecting N_i·α channels to participate in training by using the channel dynamic compression method, where N_i is the number of input channels of the i-th layer;
Step 4: updating the model parameters during training so that the convolution-kernel dynamic pruning converges to a subset of the channel dynamic compression.
The step 2 comprises the following steps:
Step 21: obtaining the L1 norm of each convolution kernel in the same convolutional layer;
Step 22: zeroing the M_i(1 - β) convolution kernels with the smallest L1 norms in the convolutional layer;
Step 23: updating the model parameters in back-propagation, and then returning to step 21 for the next iteration of the model;
Step 24: pruning the convolution kernels that are still zero when iteration is complete.
In step 2, following the principle of deleting model parameters to reduce complexity, the convolution kernels that have little influence on the convolution operation after iterative convergence are removed, as sketched in the code below.
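The following PyTorch-style sketch illustrates steps 21-24 (zeroing, rather than deleting, the kernels with the smallest L1 norms). It is a minimal illustration only: the function name dynamic_kernel_prune, the training-loop outline, and the use of torch/nn.Conv2d are assumptions of this sketch, not part of the patent.

```python
import torch
import torch.nn as nn

def dynamic_kernel_prune(conv: nn.Conv2d, beta: float) -> None:
    """Zero the M_i * (1 - beta) convolution kernels with the smallest L1 norm.

    Kernels are only zeroed, not deleted, so a mistakenly pruned kernel can
    regain weight through back-propagation in later iterations (step 23).
    """
    weight = conv.weight.data                    # shape: (M_i, N_i, k, k)
    num_to_zero = int(weight.size(0) * (1.0 - beta))
    if num_to_zero == 0:
        return
    l1 = weight.abs().sum(dim=(1, 2, 3))         # step 21: L1 norm of each kernel
    _, idx = torch.topk(l1, num_to_zero, largest=False)
    weight[idx] = 0.0                            # step 22: zero, do not delete

# Outline of the iteration (model, loader, loss_fn, optimizer are assumed to exist):
# for inputs, labels in loader:
#     for conv in (m for m in model.modules() if isinstance(m, nn.Conv2d)):
#         dynamic_kernel_prune(conv, beta=0.75)
#     loss = loss_fn(model(inputs), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()   # step 23
# After convergence (step 24), kernels that are still zero are pruned permanently.
```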
The step 3 comprises the following steps:
Step 31: sampling the input feature map with the global average pooling method;
Step 32: taking the sampled input feature map as the input of a prediction network and computing the channel importance function;
Step 33: zeroing, following the winner-takes-all principle, the N_i(1 - α) channels with the smallest channel-importance weights;
Step 34: replacing the scaling factor γ of the BN layer corresponding to the convolutional layer with the channel-importance weights;
Step 35: updating the model parameters in back-propagation. The prediction network in step 32 is a fully connected neural network and is independent of the training network.
In step 3, following the principle of reducing complexity without deleting model parameters, N_i·α channels are selected to participate in training; a code sketch of these sub-steps is given below.
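A minimal PyTorch-style sketch of steps 31-35 follows. The class name, the choice of a single fully connected layer, and the way the gated scores multiply the BN output are illustrative assumptions; the patent only specifies that the prediction network is a fully connected network separate from the training network.

```python
import torch
import torch.nn as nn

class ChannelImportancePredictor(nn.Module):
    """Prediction network for one convolutional layer: a fully connected layer,
    separate from the backbone, that maps the pooled input feature map to one
    importance weight per channel (steps 31-33)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.fc = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor, alpha: float) -> torch.Tensor:
        # Step 31: global average pooling, (B, N_i, H, W) -> (B, N_i)
        pooled = x.mean(dim=(2, 3))
        # Step 32: channel importance weights
        scores = self.fc(pooled)
        # Step 33: winner-takes-all - zero the (1 - alpha) fraction of channels
        # with the smallest importance weights
        k = int(scores.size(1) * (1.0 - alpha))
        if k > 0:
            _, idx = torch.topk(scores, k, dim=1, largest=False)
            scores = scores.scatter(1, idx, 0.0)
        return scores

# Step 34 (sketch): the returned scores stand in for the BN scaling factor gamma,
# so zeroed channels effectively skip their convolutions for this input, e.g.
#     y = bn_without_gamma(conv(x)) * scores.unsqueeze(-1).unsqueeze(-1)
# Step 35: the predictor is trained jointly with the backbone by back-propagation.
```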
Beneficial effects: compared with the prior art, the invention has the notable advantages of accelerating training and inference, maintaining model capacity, and effectively reducing the floating-point operation count and parameter scale of the model.
Drawings
FIG. 1 is a schematic diagram of the present invention;
FIG. 2 is a schematic diagram of convolution kernel dynamic pruning in accordance with the present invention;
FIG. 3 is a schematic diagram of dynamic compression of a channel according to the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the present invention, which mainly includes a convolution kernel dynamic pruning module and a channel dynamic compression module. The invention relates to a deep neural network compression method based on joint dynamic pruning, which comprises the following steps:
Step 1: acquiring two hyperparameters, the convolution-kernel dynamic pruning rate β and the channel dynamic compression rate α;
Step 2: pruning M_i(1 - β) convolution kernels in the convolutional layer by using the convolution-kernel dynamic pruning method, where M_i is the number of convolution kernels in the i-th layer;
Step 3: selecting N_i·α channels to participate in training by using the channel dynamic compression method, where N_i is the number of input channels of the i-th layer;
Step 4: updating the model parameters during training so that the convolution-kernel dynamic pruning converges to a subset of the channel dynamic compression.
In step 2, following the principle of deleting model parameters to reduce complexity, the convolution kernels that have the least influence on the convolution operation after iterative convergence are pruned. Step 2 comprises the following steps:
Step 21: obtaining the L1 norm of each convolution kernel in the same convolutional layer;
Step 22: zeroing the M_i(1 - β) convolution kernels with the smallest L1 norms in the convolutional layer;
Step 23: updating the model parameters in back-propagation, and then returning to step 21 for the next iteration of the model;
Step 24: pruning the convolution kernels that are still zero when iteration is complete.
In step 3, following the principle of reducing complexity without deleting model parameters, N_i·α channels are selected to participate in training. Step 3 comprises the following steps:
Step 31: sampling the input feature map with the global average pooling method. Global average pooling averages all elements of the same channel of the feature map, i.e., a W × H × N feature map is converted into a 1 × 1 × N tensor, which is further reduced to a 1 × N feature vector, where W is the image width, H is the image height, and N is the number of channels.
Step 32: taking the sampled input feature map as the input of a prediction network, and computing the channel importance function of the prediction network;
Step 33: zeroing, following the winner-takes-all principle, the N_i(1 - α) channels with the smallest weights in the channel importance function. Winner-takes-all means that the final winner of the competition obtains all or most of the share, while the losers are eliminated.
Step 34: replacing the scaling factor γ of the BN layer corresponding to the convolutional layer with the channel-importance weights;
Step 35: updating the model parameters in back-propagation.
The prediction network in step 32 is a fully connected neural network and is independent of the training network.
In this embodiment, each step is described using the i-th convolutional layer of a simple convolutional neural network. The input feature map of the i-th convolutional layer is X_i, and the number of input channels N_i is 4; the output feature map is X_{i+1}, and the number of output channels N_{i+1} is 4. The number of convolution kernels M_i of the i-th convolutional layer equals the number of output channels N_{i+1}, i.e., 4. Let the convolution-kernel dynamic pruning rate β be 0.75 and the channel dynamic compression rate α be 0.5.
In step 2, the schematic diagram of convolution-kernel dynamic pruning is shown in FIG. 2. First, the L1 norm of each convolution kernel in the i-th convolutional layer is calculated; the L1 norm is the sum of the absolute values of the weights in the convolution-kernel tensor. A convolution kernel with smaller weights produces smaller values in its output feature map, i.e., its influence is smaller than that of the other feature maps in the same layer. The L1 norms of this layer's convolution kernels are [0.457, 0.813, 0.345, 0.136], so the 4th convolution kernel falls within the smallest 25% of L1 norms. These convolution kernels are zeroed, but they are not pruned from the network. If a convolution kernel is mistakenly pruned, it will be updated to larger weights in back-propagation and therefore obtain higher importance in subsequent judgments, so the mis-pruning does not persist. During continuous iterative training, accidentally mis-pruned convolution kernels are recovered, so the pruned part of each layer tends to converge to N_{i+1}(1 - β) fixed convolution kernels, i.e., 1 fixed convolution kernel for this convolutional layer. Finally, the least important convolution kernels are permanently pruned, completing the convolution-kernel dynamic pruning.
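As a quick numeric check of this example (the L1 norms are the values quoted above; the use of torch and the variable names are illustrative):

```python
import torch

l1 = torch.tensor([0.457, 0.813, 0.345, 0.136])   # L1 norms of the 4 kernels of layer i
beta = 0.75
num_to_zero = int(l1.numel() * (1 - beta))         # M_i * (1 - beta) = 1 kernel
_, idx = torch.topk(l1, num_to_zero, largest=False)
print(idx)    # tensor([3]) -> the 4th kernel is zeroed (smallest 25% of L1 norms)
```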
In step 3, the schematic diagram of channel dynamic compression is shown in FIG. 3. Judging the importance of each channel from the input image requires a function whose input is the input image and whose output is a scalar importance value for each channel; this is implemented with a prediction network. Because the complexity of this additional operation must be kept as low as possible, the input feature map is sampled first. Global average pooling is used as the down-sampling method to reduce the image size and compress each channel of the input feature map into a scalar. For the input image X_i of the i-th convolutional layer, whose size is W_i × H_i × N_i, where W_i is the image width and H_i is the image height, global average pooling yields a 1 × 1 × 4 tensor, which is further reduced to a 1 × 4 tensor, for example [0.621, 1.846, 0.743, 0.543]. The sampled result is then used as the input of the prediction network.
The prediction network is a fully connected neural network layer separate from the training network; its number of input channels is N_i and its number of output channels is N_{i+1}. In this embodiment, the prediction network of the i-th layer has 4 input channels and 4 output channels. The prediction network learns the relationship between the input feature map and the channels to obtain a channel importance function. In this embodiment the weights output by the channel importance function are [0.842, 1.313, 0.594, 0.218]. Following the winner-takes-all principle, the output results of the N_{i+1}(1 - α) less important channels, i.e., 2 channels, are zeroed, giving [0.842, 1.313, 0, 0]. Then, in the BN layer corresponding to the i-th convolutional layer, the value of the scaling factor γ is replaced with the weights of the channel importance function, which is equivalent to skipping the convolution operations of the 1 - α fraction of channels. The prediction network and the classification network are trained together, so no manual intervention or statistics are needed.
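The same winner-takes-all gating, replayed with the numbers quoted in this embodiment (the tensors below simply reproduce those values; in training the scores would come from the learned fully connected layer):

```python
import torch

# 1 x 4 vector produced by global average pooling of the layer-i input (from above);
# in training it would be fed to the fully connected prediction layer.
pooled = torch.tensor([[0.621, 1.846, 0.743, 0.543]])

# Importance weights that the prediction network (N_i = 4 inputs, N_{i+1} = 4 outputs)
# is described as producing for this input.
scores = torch.tensor([[0.842, 1.313, 0.594, 0.218]])

alpha = 0.5
k = int(scores.size(1) * (1 - alpha))              # N_{i+1} * (1 - alpha) = 2 channels
_, idx = torch.topk(scores, k, dim=1, largest=False)
gated = scores.scatter(1, idx, 0.0)                # winner-takes-all gating
print(gated)                                       # tensor([[0.8420, 1.3130, 0.0000, 0.0000]])

# Step 34: `gated` replaces the BN scaling factor gamma of this layer, so the two
# zeroed channels effectively skip their convolutions for this particular input.
```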
In step 4, when training finishes, the convolution-kernel dynamic pruning finally converges to a subset of the channel dynamic compression: the convolution kernels zeroed by convolution-kernel dynamic pruning fall within the N_{i+1}(1 - α) channels discarded by channel dynamic compression, which does not converge onto these zeroed-out kernels during learning.
In this example, the network skips a 1 - α fraction of the convolution kernels, i.e., the least important 50% of the convolution kernels in each convolutional layer. The fraction of convolution kernels actually permanently removed from the model is 1 - β, i.e., 25%, but thanks to channel dynamic compression the complexity of the model is equivalent to that of a model with 50% of its convolution kernels permanently removed.
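Restating this complexity argument with the embodiment's numbers (illustrative only):

```python
m_i, beta, alpha = 4, 0.75, 0.5
permanently_pruned = int(m_i * (1 - beta))   # 1 of 4 kernels (25%) physically removed
skipped_per_input = int(m_i * (1 - alpha))   # 2 of 4 kernels (50%) skipped for each input
print(permanently_pruned, skipped_per_input) # 1 2 -> effective complexity of a 50%-pruned model
```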
The invention completes convolution-kernel pruning and channel pruning simultaneously; the two can be freely combined, which reduces the complexity of model training and inference and increases the processing speed of the feature maps while preserving model capacity.

Claims (6)

1. A deep neural network compression method based on joint dynamic pruning, characterized by comprising the following steps:
Step 1: acquiring two hyperparameters, a convolution-kernel dynamic pruning rate β and a channel dynamic compression rate α;
Step 2: pruning M_i(1 - β) convolution kernels in the convolutional layer by using the convolution-kernel dynamic pruning method, where M_i is the number of convolution kernels in the i-th layer;
Step 3: selecting N_i·α channels to participate in training by using the channel dynamic compression method, where N_i is the number of input channels of the i-th layer;
Step 4: updating the model parameters during training so that the convolution-kernel dynamic pruning converges to a subset of the channel dynamic compression.
2. The deep neural network compression method based on joint dynamic pruning according to claim 1, wherein: the step 2 comprises the following steps:
Step 21: obtaining the L1 norm of each convolution kernel in the same convolutional layer;
Step 22: zeroing the M_i(1 - β) convolution kernels with the smallest L1 norms in the convolutional layer;
Step 23: updating the model parameters in back-propagation, and then returning to step 21 for the next iteration of the model;
Step 24: pruning the convolution kernels that are still zero when iteration is complete.
3. The deep neural network compression method based on joint dynamic pruning according to claim 1, wherein in step 2, following the principle of deleting model parameters to reduce complexity, convolution kernels whose influence on the convolution operation after iterative convergence is less than a set threshold are pruned.
4. The deep neural network compression method based on joint dynamic pruning according to claim 1, wherein: the step 3 comprises the following steps:
Step 31: sampling the input feature map with a global average pooling method;
Step 32: taking the sampled input feature map as the input of a prediction network, and computing the channel importance function of the prediction network;
Step 33: zeroing, following the winner-takes-all principle, the N_i(1 - α) channels with the smallest weights of the channel importance function;
Step 34: replacing the scaling factor γ of the BN layer corresponding to the convolutional layer with the channel-importance weights;
Step 35: updating the model parameters in back-propagation.
5. The deep neural network compression method based on joint dynamic pruning of claim 4, wherein: the prediction network in step 32 is a fully connected neural network and is independent of the training network.
6. The deep neural network compression method based on joint dynamic pruning according to claim 1, wherein in step 3, following the principle of reducing complexity without deleting model parameters, N_i·α channels are selected to participate in training.
CN202011561741.9A 2020-12-25 2020-12-25 Deep neural network compression method based on joint dynamic pruning Pending CN112613610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011561741.9A CN112613610A (en) 2020-12-25 2020-12-25 Deep neural network compression method based on joint dynamic pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011561741.9A CN112613610A (en) 2020-12-25 2020-12-25 Deep neural network compression method based on joint dynamic pruning

Publications (1)

Publication Number Publication Date
CN112613610A true CN112613610A (en) 2021-04-06

Family

ID=75245247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011561741.9A Pending CN112613610A (en) 2020-12-25 2020-12-25 Deep neural network compression method based on joint dynamic pruning

Country Status (1)

Country Link
CN (1) CN112613610A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200082268A1 (en) * 2018-09-11 2020-03-12 National Tsing Hua University Electronic apparatus and compression method for artificial neural network
CN109886397A (en) * 2019-03-21 2019-06-14 西安交通大学 A kind of neural network structure beta pruning compression optimization method for convolutional layer
US20200364573A1 (en) * 2019-05-15 2020-11-19 Advanced Micro Devices, Inc. Accelerating neural networks with one shot skip layer pruning
CN110263841A (en) * 2019-06-14 2019-09-20 南京信息工程大学 A kind of dynamic, structured network pruning method based on filter attention mechanism and BN layers of zoom factor
CN111242287A (en) * 2020-01-15 2020-06-05 东南大学 Neural network compression method based on channel L1 norm pruning
CN111340225A (en) * 2020-02-28 2020-06-26 中云智慧(北京)科技有限公司 Deep convolution neural network model compression and acceleration method
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHANG CHILIANG; HU TAO; GUAN YINGDA; YE ZUOCHANG: "Accelerating Convolutional Neural Networks with Dynamic Channel Pruning", 2019 DATA COMPRESSION CONFERENCE (DCC), 29 March 2019 (2019-03-29) *
靳丽蕾; 杨文柱; 王思乐; 崔振超; 陈向阳; 陈丽萍: "A Hybrid Pruning Method for Convolutional Neural Network Compression" (一种用于卷积神经网络压缩的混合剪枝方法), Journal of Chinese Computer Systems (小型微型计算机系统), no. 12, 11 December 2018 (2018-12-11) *
韩冰冰: "Research on Model Compression and Acceleration Algorithms Based on Channel Pruning" (基于通道剪枝的模型压缩和加速算法研究), China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), no. 07, 15 July 2019 (2019-07-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949840B (en) * 2021-04-20 2024-02-02 中国人民解放军国防科技大学 Channel attention-guided convolutional neural network dynamic channel pruning method and device
CN113065644A (en) * 2021-04-26 2021-07-02 上海哔哩哔哩科技有限公司 Method, apparatus, device and medium for compressing neural network models
CN113065644B (en) * 2021-04-26 2023-09-29 上海哔哩哔哩科技有限公司 Method, apparatus, device and medium for compressing neural network model
CN113570035A (en) * 2021-07-07 2021-10-29 浙江工业大学 Attention mechanism method using multilayer convolution layer information
CN113570035B (en) * 2021-07-07 2024-04-16 浙江工业大学 Attention mechanism method utilizing multi-layer convolution layer information

Similar Documents

Publication Publication Date Title
CN112613610A (en) Deep neural network compression method based on joint dynamic pruning
KR102640237B1 (en) Image processing methods, apparatus, electronic devices, and computer-readable storage media
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
CN111583135B (en) Nuclear prediction neural network Monte Carlo rendering image denoising method
CN110349103A (en) It is a kind of based on deep neural network and jump connection without clean label image denoising method
CN108288270B (en) Target detection method based on channel pruning and full convolution deep learning
CN109934826A (en) A kind of characteristics of image dividing method based on figure convolutional network
CN109146813B (en) Multitask image reconstruction method, device, equipment and medium
CN113420651B (en) Light weight method, system and target detection method for deep convolutional neural network
CN109191411B (en) Multitask image reconstruction method, device, equipment and medium
CN112488070A (en) Neural network compression method for remote sensing image target detection
CN113554084B (en) Vehicle re-identification model compression method and system based on pruning and light convolution
CN115205147A (en) Multi-scale optimization low-illumination image enhancement method based on Transformer
CN113420794B (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN111860771A (en) Convolutional neural network computing method applied to edge computing
CN116740119A (en) Tobacco leaf image active contour segmentation method based on deep learning
CN112101364A (en) Semantic segmentation method based on parameter importance incremental learning
CN111325222A (en) Image normalization processing method and device and storage medium
CN115969329A (en) Sleep staging method, system, device and medium
CN111462090A (en) Multi-scale image target detection method
CN113393476B (en) Lightweight multi-path mesh image segmentation method and system and electronic equipment
CN117575915A (en) Image super-resolution reconstruction method, terminal equipment and storage medium
CN113850365A (en) Method, device, equipment and storage medium for compressing and transplanting convolutional neural network
CN113657392A (en) Small target semantic segmentation method and system based on low-rank mixed attention mechanism
CN113780550A (en) Convolutional neural network pruning method and device for quantizing feature map similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination