CN112633472A - Convolutional neural network compression method based on channel pruning - Google Patents

Convolutional neural network compression method based on channel pruning

Info

Publication number
CN112633472A
CN112633472A CN202011505386.3A CN202011505386A
Authority
CN
China
Prior art keywords
model
pruning
channel
method based
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011505386.3A
Other languages
Chinese (zh)
Inventor
王慧青
焦越
余厚云
李坤宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202011505386.3A priority Critical patent/CN112633472A/en
Publication of CN112633472A publication Critical patent/CN112633472A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network compression method based on channel pruning, which comprises the following steps: a channel selection method based on feature-map average activation is adopted within each convolutional layer; a channel selection method based on loss estimation is adopted between convolutional layers; and fine-tuning is carried out after the accuracy of the model drops. The invention can realize adaptive pruning among channels while controlling the overall pruning proportion, and achieves a good pruning effect.

Description

Convolutional neural network compression method based on channel pruning
Technical Field
The invention belongs to the field of computer vision, relates to a deep learning technology, and particularly relates to a convolutional neural network compression method based on channel pruning.
Background
In recent years, deep learning has developed rapidly in the field of computer vision, and more and more high-accuracy models have been proposed. However, as models become deeper, their computation and storage requirements become higher and higher. For example, VGG16 has more than 130 million parameters and requires nearly 15 billion floating-point operations to complete a single image recognition task. Practical applications often face resource constraints, so the network model must be compressed and accelerated.
There are many methods for model compression and acceleration, and channel pruning of convolutional layers is one of the most common. It does not damage the original model structure, and the parameters of the original model can be applied directly to the new model without modification, so it is easy to implement and does not depend on specific hardware or third-party libraries. However, pruning is a relatively coarse operation that easily damages the generalization capability of the model, causing its accuracy to drop sharply. Existing pruning methods usually either ignore the sensitivity differences between convolutional layers or require the pruning proportion of each layer to be set empirically, so the pruning effect is unsatisfactory, the flexibility is poor, and practical application is difficult.
Disclosure of Invention
In order to solve these problems, the invention discloses a convolutional neural network compression method based on channel pruning. By using feature-map activation as the pruning criterion within each layer and loss estimation as the criterion between layers, the method fully accounts for the sensitivity of different convolutional layers and realizes adaptive pruning while controlling the overall pruning proportion.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a convolutional neural network compression method based on channel pruning comprises the following steps:
firstly, calculating the average activation of all convolutional-layer feature maps in the model, and sorting the feature maps of each layer by their average activation;
secondly, among the feature maps with the smallest average activation in each convolutional layer, selecting the one with the smallest influence on the final loss of the model, and pruning the corresponding channel from the model;
thirdly, judging whether the accuracy of the model is lower than a threshold T; if so, fine-tuning the model until it approximately converges, and returning to the first step;
fourthly, judging whether the pruning proportion has reached a preset proportion R; if not, returning to the second step;
and fifthly, fine-tuning the model again to recover its accuracy.
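By way of illustration only, the following Python sketch shows how these five steps could fit together as a single pruning loop; every helper function called here (count_channels, average_activations, select_channel_by_loss, prune_channel, evaluate_accuracy, fine_tune) is a hypothetical placeholder for the operations detailed below, not an API defined by the invention.

```python
# Illustrative sketch of the five-step pruning loop; all helpers are hypothetical.
def channel_pruning(model, sample_set, test_set, T=0.90, R=0.50):
    total = count_channels(model)                    # total conv channels in the original model
    pruned = 0
    while pruned / total < R:                        # step 4: stop at preset proportion R
        ranks = average_activations(model, sample_set)             # step 1: rank channels per layer
        while pruned / total < R:
            layer, ch = select_channel_by_loss(model, ranks, test_set)  # step 2: loss-based choice
            prune_channel(model, layer, ch)
            pruned += 1
            if evaluate_accuracy(model, test_set) < T:             # step 3: accuracy check
                fine_tune(model)                                   # early-stopped fine-tuning
                break                                              # return to step 1
    fine_tune(model)                                               # step 5: final fine-tuning
    return model
```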
Further, the first step includes the following processes:
A part of the training set of the data set is taken as a sample set of size N, and the convolutional layer input matrix is denoted a ∈ R^(N×H×W×C), where H, W and C are the height, width and number of channels of the feature map, respectively. The average activation of the feature map corresponding to the k-th channel is then:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
The C channels of each layer are then sorted according to the average activation of their feature maps.
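As a minimal illustration (not part of the disclosure), the average activation and the per-layer channel ordering can be computed from a feature-map tensor of shape (N, H, W, C) as follows, assuming NumPy arrays:

```python
import numpy as np

def average_activation(a):
    # a: feature maps of shape (N, H, W, C); returns one mean value per channel, shape (C,)
    return a.mean(axis=(0, 1, 2))

def rank_channels(a):
    # channel indices sorted by ascending average activation
    return np.argsort(average_activation(a))
```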
Further, the second step includes the following processes:
Let S^(l) denote the set of all channels in layer l, and k^(l) the channel with the smallest average feature-map activation in layer l, namely:
k^{(l)} = \arg\min_{k \in S^{(l)}} \bar{a}_k^{(l)}
Let K = {k^(1), k^(2), ..., k^(L)} be the set of channels with the smallest average activation in each layer, where L is the number of convolutional layers. The set K is maintained during pruning: each channel in K is tentatively pruned, the resulting change in model loss on the test set is evaluated, and the channel that causes the smallest loss is finally selected as the pruning channel and removed from the model.
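A sketch of this selection step is given below for illustration; prune_channel, restore_channel and evaluate_loss are hypothetical helpers standing in for the tentative pruning and loss evaluation described above.

```python
def select_pruning_channel(model, K, test_set):
    # K maps each convolutional layer l to its least-activated channel k^(l).
    # Each candidate is tentatively pruned, the test loss is measured, and the
    # candidate causing the smallest loss is returned as the channel to prune.
    best, best_loss = None, float("inf")
    for layer, channel in K.items():
        backup = prune_channel(model, layer, channel)    # hypothetical: tentatively remove
        loss = evaluate_loss(model, test_set)            # hypothetical: loss on the test set
        restore_channel(model, layer, channel, backup)   # hypothetical: undo the removal
        if loss < best_loss:
            best, best_loss = (layer, channel), loss
    return best
```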
Further, the third step includes the following processes:
and testing the accuracy of the pruned model on the test set of the data set, and judging whether the accuracy is smaller than a preset threshold value T. If the value is less than the threshold value, the model is finely adjusted, namely, the convolution layer of the model is retrained with a smaller learning rate on the basis of the existing parameters. And adopting a method of early termination, stopping training when the loss of the model does not decrease within a plurality of rounds, and returning to the first step.
Further, the fourth step includes the following processes:
calculating the pruning proportion of the model, namely:
r = \frac{\text{number of pruned channels}}{\text{total number of channels in the original model}}
and if the pruning proportion is smaller than the preset proportion R, returning to the second step.
Further, the fifth step includes the following processes:
and fifthly, fine-tuning the whole model for more rounds by adopting a method similar to the third step, and recovering the accuracy of the model.
The invention has the beneficial effects that:
the invention adopts two different channel selection standards, adopts a channel selection method based on characteristic diagram activation in the convolution layer, and adopts a channel selection method based on loss estimation between the convolution layers.
Drawings
Fig. 1 is a flowchart of a model compression method based on channel pruning according to the present invention.
FIG. 2 compares the pruning results of the present invention on the VGG model with those of other methods, using the Cifar-10 data set.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention.
In this embodiment, TensorFlow is used as the deep learning framework to prune the convolutional layers of a VGG16 model trained on the Cifar-10 dataset. The flow chart is shown in FIG. 1, and the embodiment adopts the following steps:
step 1, randomly selecting 100 pictures from a cifar-10 training set as a sample set, taking the sample set as input, and countingCalculating the output matrix of each convolutional layer, i.e. the input matrix a ∈ R of the next convolutional layerN×H×W×CH, W and C are the height, width, and number of channels, respectively, of the feature map. Calculating the average activation of the characteristic diagram corresponding to the kth channel of the convolutional layer as follows:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
The C channels of each layer are sorted according to the average activation of their feature maps, and a sorted channel set is maintained.
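For illustration only, one possible TensorFlow/Keras realization of this step builds a probe model that exposes every convolutional output; the variables model (the trained VGG16) and samples (the 100 selected images) are assumptions, not part of the disclosure.

```python
import numpy as np
import tensorflow as tf

# model: trained VGG16-style Keras model; samples: array of shape (100, 32, 32, 3)
conv_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)]
probe = tf.keras.Model(inputs=model.input, outputs=[l.output for l in conv_layers])

feature_maps = probe(samples, training=False)      # one (N, H, W, C) tensor per conv layer
mean_activation = [tf.reduce_mean(f, axis=[0, 1, 2]).numpy() for f in feature_maps]

# Sorted channel indices (ascending average activation) for every convolutional layer
channel_order = [np.argsort(m) for m in mean_activation]
```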
Step 2, polling all convolutional layers: for each convolutional layer, tentatively cut out the channel with the smallest average feature-map activation, test and record the loss on the Cifar-10 test set, and then restore the model. After all convolutional layers have been polled, the channel causing the smallest loss is selected as the final channel and pruned, and the set of convolutional-layer channels is updated.
Step 3, calculating the accuracy of the model on the Cifar-10 test set, with the threshold T set to 90%; if the accuracy is lower than 90%, the pruned model is fine-tuned. Training uses stochastic gradient descent with a fixed learning rate of 2.4e-5 and Nesterov momentum of 0.9. With the early-stopping strategy, fine-tuning stops when the loss has not decreased for 20 consecutive rounds, the weights are restored to those with the lowest loss, and at most 250 rounds are trained. After fine-tuning, return to Step 1, recalculate the average activation of the feature maps, and start the next round of pruning.
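A possible tf.keras sketch of this fine-tuning setup is shown below, using the hyperparameters stated above; the variable names (model, x_train, y_train, x_test, y_test), the monitored validation loss and the sparse categorical cross-entropy loss are assumptions, not part of the disclosure.

```python
import tensorflow as tf

# Fine-tune with SGD (fixed learning rate 2.4e-5, Nesterov momentum 0.9),
# early stopping with patience 20 and weight restoration, at most 250 epochs.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=2.4e-5, momentum=0.9, nesterov=True),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True
)

model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=250,
    callbacks=[early_stop],
)
```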
Step 4, calculating the pruning proportion of the model, namely:
r = \frac{\text{number of pruned channels}}{\text{total number of channels in the original model}}
The preset pruning proportion R is set to 50%; if the pruning proportion of the model is less than 50%, return to Step 2 to continue pruning.
Step 5, fine-tuning the model with the same parameter settings as in Step 3 to recover the accuracy of the model.
To illustrate the superiority of the present invention, we also compared its pruning results on the VGG model trained on the Cifar-10 dataset with those of other pruning methods, including a greedy search method (ThiNet), a LASSO-regression reconstruction method (CP), a batch-normalization constraint method (Slimming), a discrimination-aware method (DCP), and a scaling method (WM). The comparison results are shown in FIG. 2.
It can be seen that ThiNet, CP and DCP all prune each layer at a fixed ratio, i.e. the number of parameters and floating-point operations is reduced to about 50% of the original value; this approach is relatively inflexible and requires experience to select the pruning ratio in practical applications. Both Slimming and the present invention can adopt different pruning proportions in each layer, but the present invention can additionally control the overall pruning proportion on top of the adaptive pruning. From the pruning results, Slimming has the highest parameter compression rate, while DCP performs best in accuracy, even improving the accuracy of the reference model by 0.17%. It should also be noted that DCP is a training-stage pruning method that requires full knowledge of the data set and training of the model from scratch, making it the most complex. In conclusion, the present invention achieves a very high compression rate and the most balanced performance at the cost of only a small accuracy drop (-0.42%).
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (5)

1. A convolutional neural network model compression method based on channel pruning is characterized by comprising the following steps:
firstly, calculating the average activation of all convolutional-layer feature maps in the model, and sorting the feature maps of each layer by their average activation;
secondly, among the feature maps with the smallest average activation in each convolutional layer, selecting the one with the smallest influence on the final loss of the model, and pruning the corresponding channel from the model;
thirdly, judging whether the accuracy of the model is lower than a threshold T; if so, fine-tuning the model until it converges, and returning to the first step;
fourthly, judging whether the pruning proportion has reached a preset proportion R; if not, returning to the second step; otherwise, executing the fifth step;
and fifthly, fine-tuning the model again to recover its accuracy.
2. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the calculation method of the mean activation of feature maps in the first step is as follows:
assuming the number of samples in the sample set is N, the convolutional layer input matrix is a ∈ R^(N×H×W×C), where H, W and C are the height, width and number of channels of the feature map, respectively; then the average activation of the feature map corresponding to the k-th channel is:
\bar{a}_k = \frac{1}{N H W} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} a_{n,i,j,k}
3. the convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the specific process of the second step is:
let S^(l) denote the set of all channels in layer l, and k^(l) the channel with the smallest average feature-map activation in layer l, namely:
k^{(l)} = \arg\min_{k \in S^{(l)}} \bar{a}_k^{(l)}
let K = {k^(1), k^(2), ..., k^(L)} be the set of channels with the smallest average activation in each layer, where L is the number of convolutional layers; the set K is maintained during pruning, each channel in K is tentatively pruned, the change in model loss on the test set is evaluated, and the channel causing the smallest model loss is finally selected as the pruned channel.
4. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein in the third step, the method for fine tuning the model is as follows:
training all convolutional layers of the pruned model with a small learning rate; with the early-stopping method, training is stopped when the loss of the model has not decreased for a certain fixed number of rounds.
5. The convolutional neural network model compression method based on channel pruning as claimed in claim 1, wherein the fine tuning method adopted in the fifth step is the same as that adopted in the third step.
CN202011505386.3A 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning Pending CN112633472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011505386.3A CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011505386.3A CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Publications (1)

Publication Number Publication Date
CN112633472A true CN112633472A (en) 2021-04-09

Family

ID=75317190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011505386.3A Pending CN112633472A (en) 2020-12-18 2020-12-18 Convolutional neural network compression method based on channel pruning

Country Status (1)

Country Link
CN (1) CN112633472A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881227A (en) * 2022-05-13 2022-08-09 北京百度网讯科技有限公司 Model compression method, image processing method, device and electronic equipment

Similar Documents

Publication Publication Date Title
US7567972B2 (en) Method and system for data mining in high dimensional data spaces
US20220351043A1 (en) Adaptive high-precision compression method and system based on convolutional neural network model
CN111612144B (en) Pruning method and terminal applied to target detection
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN114065863B (en) Federal learning method, apparatus, system, electronic device and storage medium
TW202029074A (en) Method, apparatus and computer device for image processing and storage medium thereof
JPH0744514A (en) Learning data contracting method for neural network
CN110991621A (en) Method for searching convolutional neural network based on channel number
CN112990420A (en) Pruning method for convolutional neural network model
CN113283473B (en) CNN feature mapping pruning-based rapid underwater target identification method
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN112270405A (en) Filter pruning method and system of convolution neural network model based on norm
WO2023020456A1 (en) Network model quantification method and apparatus, device, and storage medium
US7809726B2 (en) Mechanism for unsupervised clustering
CN112633472A (en) Convolutional neural network compression method based on channel pruning
CN111160519A (en) Convolutional neural network model pruning method based on structure redundancy detection
US20240135698A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN109034372B (en) Neural network pruning method based on probability
WO2020087254A1 (en) Optimization method for convolutional neural network, and related product
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN110321799B (en) Scene number selection method based on SBR and average inter-class distance
CN112149716A (en) Model compression method and system based on FPGM (field programmable gate array)
CN113313246A (en) Method, apparatus and program product for determining model compression ratio
CN113177627B (en) Optimization system, retraining system, method thereof, processor and readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination