CN112633472A - Convolutional neural network compression method based on channel pruning - Google Patents

Convolutional neural network compression method based on channel pruning

Info

Publication number
CN112633472A
CN112633472A CN202011505386.3A CN202011505386A
Authority
CN
China
Prior art keywords
model
pruning
channel
method based
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011505386.3A
Other languages
Chinese (zh)
Inventor
王慧青
焦越
余厚云
李坤宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202011505386.3A priority Critical patent/CN112633472A/en
Publication of CN112633472A publication Critical patent/CN112633472A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network compression method based on channel pruning, which comprises the following steps: a channel selection method based on feature-map average activation is adopted within each convolutional layer; a channel selection method based on loss estimation is adopted between convolutional layers; and fine-tuning is carried out after the accuracy of the model drops. The invention can realize adaptive pruning among channels while controlling the overall pruning proportion, and achieves a good pruning effect.

Description

Convolutional neural network compression method based on channel pruning
Technical Field
The invention belongs to the field of computer vision, relates to a deep learning technology, and particularly relates to a convolutional neural network compression method based on channel pruning.
Background
In recent years, deep learning has developed rapidly in the field of computer vision, and more and more high-accuracy models have been proposed. However, as models become deeper, their computation and storage requirements become higher and higher. For example, VGG16 has more than 130 million parameters and requires nearly 15 billion floating-point operations to complete a single image recognition task. Practical applications often face resource constraints, so the network model must be compressed and accelerated.
There are many methods for model compression and acceleration, and channel pruning of convolutional layers is one of the most common. It does not damage the original model structure, and the parameters of the original model can be applied directly to the new model without modification, so it is easy to implement and does not depend on specific hardware or third-party libraries. However, pruning is a relatively coarse operation that easily damages the generalization capability of the model, causing its accuracy to drop sharply. Existing pruning methods usually either ignore the sensitivity differences between convolutional layers or require the pruning proportion of each layer to be set empirically, so the pruning effect is unsatisfactory, the flexibility is poor, and practical application is difficult.
Disclosure of Invention
In order to solve these problems, the invention discloses a convolutional neural network compression method based on channel pruning. By using feature-map activation as the pruning criterion within each layer and loss estimation as the criterion between layers, the method fully accounts for the sensitivity of different convolutional layers and realizes adaptive pruning while controlling the overall pruning proportion.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a convolutional neural network compression method based on channel pruning comprises the following steps:
firstly, calculating the average activation of all convolutional-layer feature maps in the model, and sorting the feature maps of each layer by their average activation;
secondly, among the feature maps with the smallest average activation in each convolutional layer, selecting the one with the smallest influence on the final loss of the model, and pruning the corresponding channel from the model;
thirdly, judging whether the accuracy of the model is lower than a threshold T; if so, fine-tuning the model until it approximately converges, and returning to the first step;
fourthly, judging whether the pruning proportion has reached a preset proportion R; if not, returning to the second step;
and fifthly, fine-tuning the model again to recover its accuracy.
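By way of illustration only, the following Python sketch shows how these five steps could fit together as a single pruning loop; every helper function called here (count_channels, average_activations, select_channel_by_loss, prune_channel, evaluate_accuracy, fine_tune) is a hypothetical placeholder for the operations detailed below, not an API defined by the invention.

```python
# Illustrative sketch of the five-step pruning loop; all helpers are hypothetical.
def channel_pruning(model, sample_set, test_set, T=0.90, R=0.50):
    total = count_channels(model)                    # total conv channels in the original model
    pruned = 0
    while pruned / total < R:                        # step 4: stop at preset proportion R
        ranks = average_activations(model, sample_set)             # step 1: rank channels per layer
        while pruned / total < R:
            layer, ch = select_channel_by_loss(model, ranks, test_set)  # step 2: loss-based choice
            prune_channel(model, layer, ch)
            pruned += 1
            if evaluate_accuracy(model, test_set) < T:             # step 3: accuracy check
                fine_tune(model)                                   # early-stopped fine-tuning
                break                                              # return to step 1
    fine_tune(model)                                               # step 5: final fine-tuning
    return model
```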
Further, the first step includes the following processes:
A part of the training set of the data set is taken as a sample set of size N, and the convolutional layer input matrix is denoted a ∈ R^(N×H×W×C), where H, W and C are the height, width and number of channels of the feature map, respectively. The average activation of the feature map corresponding to the k-th channel is then:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
The C channels of each layer are then sorted according to the average activation of their feature maps.
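As a minimal illustration (not part of the disclosure), the average activation and the per-layer channel ordering can be computed from a feature-map tensor of shape (N, H, W, C) as follows, assuming NumPy arrays:

```python
import numpy as np

def average_activation(a):
    # a: feature maps of shape (N, H, W, C); returns one mean value per channel, shape (C,)
    return a.mean(axis=(0, 1, 2))

def rank_channels(a):
    # channel indices sorted by ascending average activation
    return np.argsort(average_activation(a))
```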
Further, the second step includes the following processes:
Let S^(l) denote the set of all channels in layer l, and k^(l) the channel with the smallest average feature-map activation in layer l, namely:
k^{(l)} = \arg\min_{k \in S^{(l)}} \bar{a}_k^{(l)}
Let K = {k^(1), k^(2), ..., k^(L)} be the set of channels with the smallest average activation in each layer, where L is the number of convolutional layers. The set K is maintained during pruning: each channel in K is tentatively pruned, the resulting change in model loss on the test set is evaluated, and the channel that causes the smallest loss is finally selected as the pruning channel and removed from the model.
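A sketch of this selection step is given below for illustration; prune_channel, restore_channel and evaluate_loss are hypothetical helpers standing in for the tentative pruning and loss evaluation described above.

```python
def select_pruning_channel(model, K, test_set):
    # K maps each convolutional layer l to its least-activated channel k^(l).
    # Each candidate is tentatively pruned, the test loss is measured, and the
    # candidate causing the smallest loss is returned as the channel to prune.
    best, best_loss = None, float("inf")
    for layer, channel in K.items():
        backup = prune_channel(model, layer, channel)    # hypothetical: tentatively remove
        loss = evaluate_loss(model, test_set)            # hypothetical: loss on the test set
        restore_channel(model, layer, channel, backup)   # hypothetical: undo the removal
        if loss < best_loss:
            best, best_loss = (layer, channel), loss
    return best
```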
Further, the third step includes the following processes:
and testing the accuracy of the pruned model on the test set of the data set, and judging whether the accuracy is smaller than a preset threshold value T. If the value is less than the threshold value, the model is finely adjusted, namely, the convolution layer of the model is retrained with a smaller learning rate on the basis of the existing parameters. And adopting a method of early termination, stopping training when the loss of the model does not decrease within a plurality of rounds, and returning to the first step.
Further, the fourth step includes the following processes:
calculating the pruning proportion of the model, namely:
r = \frac{\text{number of pruned channels}}{\text{total number of channels in the original model}}
and if the pruning proportion is smaller than the preset proportion R, returning to the second step.
Further, the fifth step includes the following processes:
and fifthly, fine-tuning the whole model for more rounds by adopting a method similar to the third step, and recovering the accuracy of the model.
The invention has the beneficial effects that:
the invention adopts two different channel selection standards, adopts a channel selection method based on characteristic diagram activation in the convolution layer, and adopts a channel selection method based on loss estimation between the convolution layers.
Drawings
Fig. 1 is a flowchart of a model compression method based on channel pruning according to the present invention.
FIG. 2 compares the pruning results of the present invention on the VGG model with those of other methods, using the Cifar-10 data set.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention.
In this embodiment, TensorFlow is used as the deep learning framework to prune the convolutional layers of a VGG16 model trained on the Cifar-10 dataset. The flow chart is shown in FIG. 1, and the embodiment adopts the following steps:
step 1, randomly selecting 100 pictures from a cifar-10 training set as a sample set, taking the sample set as input, and countingCalculating the output matrix of each convolutional layer, i.e. the input matrix a ∈ R of the next convolutional layerN×H×W×CH, W and C are the height, width, and number of channels, respectively, of the feature map. Calculating the average activation of the characteristic diagram corresponding to the kth channel of the convolutional layer as follows:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
The C channels of each layer are sorted according to the average activation of their feature maps, and a sorted channel set is maintained.
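For illustration only, one possible TensorFlow/Keras realization of this step builds a probe model that exposes every convolutional output; the variables model (the trained VGG16) and samples (the 100 selected images) are assumptions, not part of the disclosure.

```python
import numpy as np
import tensorflow as tf

# model: trained VGG16-style Keras model; samples: array of shape (100, 32, 32, 3)
conv_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)]
probe = tf.keras.Model(inputs=model.input, outputs=[l.output for l in conv_layers])

feature_maps = probe(samples, training=False)      # one (N, H, W, C) tensor per conv layer
mean_activation = [tf.reduce_mean(f, axis=[0, 1, 2]).numpy() for f in feature_maps]

# Sorted channel indices (ascending average activation) for every convolutional layer
channel_order = [np.argsort(m) for m in mean_activation]
```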
Step 2, polling all convolutional layers: for each convolutional layer, tentatively cut out the channel with the smallest average feature-map activation, test and record the loss on the Cifar-10 test set, and then restore the model. After all convolutional layers have been polled, the channel causing the smallest loss is selected as the final channel and pruned, and the set of convolutional-layer channels is updated.
Step 3, calculating the accuracy of the model on the Cifar-10 test set, with the threshold T set to 90%; if the accuracy is lower than 90%, the pruned model is fine-tuned. Training uses stochastic gradient descent with a fixed learning rate of 2.4e-5 and Nesterov momentum of 0.9. With the early-stopping strategy, fine-tuning stops when the loss has not decreased for 20 consecutive rounds, the weights are restored to those with the lowest loss, and at most 250 rounds are trained. After fine-tuning, return to Step 1, recalculate the average activation of the feature maps, and start the next round of pruning.
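A possible tf.keras sketch of this fine-tuning setup is shown below, using the hyperparameters stated above; the variable names (model, x_train, y_train, x_test, y_test), the monitored validation loss and the sparse categorical cross-entropy loss are assumptions, not part of the disclosure.

```python
import tensorflow as tf

# Fine-tune with SGD (fixed learning rate 2.4e-5, Nesterov momentum 0.9),
# early stopping with patience 20 and weight restoration, at most 250 epochs.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=2.4e-5, momentum=0.9, nesterov=True),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True
)

model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=250,
    callbacks=[early_stop],
)
```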
Step 4, calculating the pruning proportion of the model, namely:
r = \frac{\text{number of pruned channels}}{\text{total number of channels in the original model}}
The preset pruning proportion R is set to 50%; if the pruning proportion of the model is less than 50%, return to Step 2 to continue pruning.
Step 5, fine-tuning the model with the same parameter settings as in Step 3 to recover the accuracy of the model.
To illustrate the superiority of the present invention, we also compared its pruning results on the VGG model trained on the Cifar-10 dataset with those of other pruning methods, including a greedy search method (ThiNet), a LASSO-regression reconstruction method (CP), a batch-normalization constraint method (Slimming), a discrimination-aware method (DCP), and a scaling method (WM). The comparison results are shown in FIG. 2.
It can be seen that ThiNet, CP and DCP all prune each layer at a fixed ratio, i.e. the number of parameters and floating-point operations is reduced to about 50% of the original value; this approach is relatively inflexible and requires experience to select the pruning ratio in practical applications. Both Slimming and the present invention can adopt different pruning proportions in each layer, but the present invention can additionally control the overall pruning proportion on top of the adaptive pruning. From the pruning results, Slimming has the highest parameter compression rate, while DCP performs best in accuracy, even improving the accuracy of the reference model by 0.17%. It should also be noted that DCP is a training-stage pruning method that requires full knowledge of the data set and training of the model from scratch, making it the most complex. In conclusion, the present invention achieves a very high compression rate and the most balanced performance at the cost of only a small accuracy drop (-0.42%).
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (5)

1. A convolutional neural network model compression method based on channel pruning is characterized by comprising the following steps:
firstly, calculating the average activation of all convolutional-layer feature maps in the model, and sorting the feature maps of each layer by their average activation;
secondly, among the feature maps with the smallest average activation in each convolutional layer, selecting the one with the smallest influence on the final loss of the model, and pruning the corresponding channel from the model;
thirdly, judging whether the accuracy of the model is lower than a threshold T; if so, fine-tuning the model until it converges, and returning to the first step;
fourthly, judging whether the pruning proportion has reached a preset proportion R; if not, returning to the second step; otherwise, executing the fifth step;
and fifthly, fine-tuning the model again to recover its accuracy.
2. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the calculation method of the mean activation of feature maps in the first step is as follows:
assuming the number of samples in the sample set is N, the convolutional layer input matrix is a ∈ R^(N×H×W×C), where H, W and C are the height, width and number of channels of the feature map, respectively; then the average activation of the feature map corresponding to the k-th channel is:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
3. the convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the specific process of the second step is:
let S^(l) denote the set of all channels in layer l, and k^(l) the channel with the smallest average feature-map activation in layer l, namely:
k^{(l)} = \arg\min_{k \in S^{(l)}} \bar{a}_k^{(l)}
let K = {k^(1), k^(2), ..., k^(L)} be the set of channels with the smallest average activation in each layer, where L is the number of convolutional layers; the set K is maintained during pruning, each channel in K is tentatively pruned, the change in model loss on the test set is evaluated, and the channel causing the smallest model loss is finally selected as the pruned channel.
4. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein in the third step, the method for fine tuning the model is as follows:
training all convolutional layers of the pruned model with a small learning rate; with the early-stopping method, training is stopped when the loss of the model has not decreased for a certain fixed number of rounds.
5. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the fine tuning method adopted in the fifth step is the same as that adopted in the third step.
CN202011505386.3A 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning Pending CN112633472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011505386.3A CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011505386.3A CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Publications (1)

Publication Number Publication Date
CN112633472A true CN112633472A (en) 2021-04-09

Family

ID=75317190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011505386.3A Pending CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Country Status (1)

Country Link
CN (1) CN112633472A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881227A (en) * 2022-05-13 2022-08-09 北京百度网讯科技有限公司 Model compression method, image processing method, device and electronic equipment

Similar Documents

Publication Publication Date Title
US7567972B2 (en) Method and system for data mining in high dimensional data spaces
US20220351043A1 (en) Adaptive high-precision compression method and system based on convolutional neural network model
CN111612144B (en) Pruning method and terminal applied to target detection
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN114065863B (en) Federal learning method, apparatus, system, electronic device and storage medium
TW202029074A (en) Method, apparatus and computer device for image processing and storage medium thereof
JPH0744514A (en) Learning data contracting method for neural network
CN110991621A (en) Method for searching convolutional neural network based on channel number
CN112990420A (en) Pruning method for convolutional neural network model
CN113283473B (en) CNN feature mapping pruning-based rapid underwater target identification method
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN112270405A (en) Filter pruning method and system of convolution neural network model based on norm
WO2023020456A1 (en) Network model quantification method and apparatus, device, and storage medium
US7809726B2 (en) Mechanism for unsupervised clustering
CN112633472A (en) Convolutional neural network compression method based on channel pruning
CN111160519A (en) Convolutional neural network model pruning method based on structure redundancy detection
US20240135698A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN109034372B (en) Neural network pruning method based on probability
WO2020087254A1 (en) Optimization method for convolutional neural network, and related product
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN110321799B (en) Scene number selection method based on SBR and average inter-class distance
CN112149716A (en) Model compression method and system based on FPGM (field programmable gate array)
CN113313246A (en) Method, apparatus and program product for determining model compression ratio
CN113177627B (en) Optimization system, retraining system, method thereof, processor and readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination