CN112488313A - Convolutional neural network model compression method based on explicit weight - Google Patents

Convolutional neural network model compression method based on explicit weight

Info

Publication number
CN112488313A
Authority
CN
China
Prior art keywords
neural network
layer
network model
channel
explicit
Prior art date
Legal status
Pending
Application number
CN202011434519.2A
Other languages
Chinese (zh)
Inventor
骆春波
濮希同
罗杨
韦仕才
张赟疆
徐加朗
许燕
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011434519.2A
Publication of CN112488313A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network model compression method based on explicit weights, which comprises the following steps: acquiring training images; establishing the convolutional neural network model to be compressed and setting a target compression rate; adding explicit weights to the feature-map channels of the model; performing channel pruning according to the weight values; judging whether the overall compression rate of the model is greater than or equal to the target compression rate, and if so proceeding to the next step, otherwise adjusting the compression rate of each layer and returning to the previous step; and deleting all added explicit weights together with the computations associated with them, then saving the compressed neural network model and its parameters once the model's accuracy has been restored. The method generates a structured compressed model through channel pruning and solves the problems of large parameter count, high resource consumption, and low running speed in conventional convolutional neural network models.

Description

Convolutional neural network model compression method based on explicit weight
Technical Field
The invention relates to the field of neural network model compression, and in particular to a convolutional neural network model compression method based on explicit weights.
Background
Convolutional neural network algorithms have moved from theory into practical use in many fields, and the running speed and resource consumption of convolutional neural network models on each platform strongly affect their practical range of application. Many current convolutional neural network models suffer from parameter redundancy, which causes high resource consumption and low running speed, so they cannot be deployed on resource-constrained platforms. Deleting the redundant parameters of a convolutional neural network model reduces its resource consumption, increases its running speed, and widens its practical range of application.
Channel pruning is one of the most widely used convolutional neural network model compression methods. A traditional channel pruning algorithm first proposes a criterion for measuring channel importance, then evaluates each channel against that criterion and deletes low-importance feature-map channels together with the corresponding convolution kernels and the corresponding channels of the next layer's kernels. However, existing methods generally use implicit characteristics of the convolutional network as the importance criterion, such as a channel's sum of absolute values, proportion of non-zero values, or average gradient. Computing these criteria often requires a large amount of calculation, or the criteria play only a narrow role in the network, so model compression based on them suffers from heavy computation, complex implementation, and low compression rates.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a convolutional neural network model compression method based on explicit weights. It solves the problems of large parameter count, high resource consumption, and low running speed in conventional convolutional neural network models, and the problems of complex implementation, heavy computation, and low compression rate in existing channel-pruning-based compression methods.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a convolutional neural network model compression method based on explicit weights comprises the following steps:
S1, acquiring a plurality of training images;
S2, establishing the convolutional neural network model to be compressed and setting a target compression rate R;
S3, adding explicit weights to the feature-map channels of the convolutional neural network model;
S4, performing channel pruning on the feature-map channels of each convolutional layer of the model according to the magnitude of the explicit weights added in step S3;
S5, after channel pruning is completed, judging whether the overall compression rate of the model's convolutional layers is greater than or equal to the target compression rate; if so, pruning ends and step S6 is executed; if not, adjusting the compression rate of each layer and returning to step S4;
and S6, deleting the explicit weights added to all feature-map channels, and finally saving the compressed convolutional neural network model and its parameters.
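The iterative S4-S5 loop can be sketched as a toy calculation in which each layer loses D = ceil(N × r_l) channels and the per-layer rates are raised until the overall rate meets the target. The channel counts, starting rates, and the uniform +0.05 adjustment rule below are illustrative placeholders, not part of the invention, which leaves the adjustment to the practitioner:

```python
import math

def compress_schedule(channels, ratios, target_r, step=0.05):
    """Toy sketch of the S4-S5 loop: layers are modeled as channel counts only.

    channels: output-channel count N of each convolutional layer (made up)
    ratios:   per-layer target rates r_l (hypothetical starting values)
    """
    original = sum(channels)
    while True:
        # S4: D = ceil(N * r_l) channels would be pruned in each layer
        deleted = [math.ceil(n * r) for n, r in zip(channels, ratios)]
        overall = sum(deleted) / original
        # S5: stop once the overall compression rate reaches the target R
        if overall >= target_r:
            return [n - d for n, d in zip(channels, deleted)], overall
        # otherwise raise each layer's rate and prune again (placeholder rule)
        ratios = [round(min(r + step, 0.9), 2) for r in ratios]

remaining, overall = compress_schedule([64, 128], [0.2, 0.2], target_r=0.5)
print(remaining, overall)   # -> [32, 64] 0.5
```

In a real run the model would be retrained between iterations so that the explicit weights reflect the pruned architecture; this sketch only tracks the counting.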
The invention has the following beneficial effects: first, the compression method of adding explicit weights can effectively compress a convolutional neural network model, achieves a higher compression rate, is simple and intuitive to operate, reduces the amount of calculation, and requires no extra large-scale weight computation; second, the method has good robustness and can be applied to the compression of many convolutional neural network models; third, the method finally deletes the explicit weights added during the process, so the model returns to its initial form, the original information flow of the model is unchanged, and no new parameters or structures are added to the compressed model.
Further, the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, adding explicit weight parameters equal in number to the channels of the feature map output by that layer, the feature map output by the nonlinear activation layer of the model being expressed as:

X_i^l, i = 1, 2, 3, …, N

wherein X represents the feature map, the subscript i represents the channel index of the feature map, the superscript l represents the index of the nonlinear activation layer of the model, and N represents the total number of feature-map channels; the added explicit weight parameters are expressed as:

W_i^l = 1, i = 1, 2, 3, …, N

wherein W represents the added explicit weight parameters, equal in number to the total number N of feature-map channels of the convolutional layer;
then multiplying the added explicit weights and the feature map channel by channel, the feature map after the explicit weights are added being calculated as:

Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N

wherein Y represents the feature map after the explicit weights are added;
and finally, inputting the feature map Y with the added explicit weights into the next layer for convolution calculation.
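The channel-wise product described above can be sketched with NumPy broadcasting; the tensor sizes below are made up for illustration:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(1, 4, 8, 8)    # activation output X_i^l, N = 4 channels (NCHW layout)
W = np.ones(4)                     # explicit weights W_i^l, initialised to 1

Y = W[None, :, None, None] * X     # Y_i^l = W_i^l * X_i^l, channel by channel

# With the initial value W = 1 the weighted map equals the original one,
# so adding the explicit weights does not change the model's inference.
print(np.allclose(Y, X))           # -> True
```

In a trained network each W_i would become a learnable scalar updated by gradient descent; here the initialization alone is shown.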
The beneficial effect of this further scheme is: by adding a small number of temporary explicit weights, the importance of each channel of the convolutional neural network can be represented; the added storage and computation costs are very low, and the inference result of the original model is unaffected.
Further, the specific process of channel pruning in step S4 is as follows:
firstly, training the convolutional neural network model with the explicit weights added in step S3, and saving the model and its parameters after the model converges;
then, letting N be the number of channels of the feature map output by layer l and r_l be the target compression rate of that layer, the number D of feature-map channels to be deleted in layer l is:

D = ceil(N × r_l)

wherein ceil(·) represents the ceiling function;
then sorting the added explicit weights W of layer l from small to large, recording the channel index corresponding to each weight, selecting the channel indices corresponding to the D smallest explicit weights, and deleting those explicit weights, the corresponding feature-map channels, the corresponding convolution kernels in the layer, and the channels with the same indices in all convolution kernels of the next convolutional layer;
and finally, performing channel pruning on the feature maps of all convolutional layers according to this method.
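The selection step for one layer can be sketched as follows; the weight values are made up for illustration:

```python
import math
import numpy as np

def channels_to_prune(weights, r_l):
    """Indices of the D = ceil(N * r_l) channels with the smallest explicit
    weights; these channels (and the matching kernels) would be deleted."""
    d = math.ceil(len(weights) * r_l)
    order = np.argsort(weights)            # ascending: smallest weights first
    return sorted(order[:d].tolist())

w = np.array([0.9, 0.05, 0.6, 0.01, 0.7, 0.3])   # learned weights (hypothetical)
print(channels_to_prune(w, 0.5))   # -> [1, 3, 5]
print(channels_to_prune(w, 0.2))   # -> [1, 3]
```

A full implementation would use the returned indices to slice the layer's kernel tensor and the input channels of the next layer's kernels, which this sketch leaves out.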
The beneficial effect of this further scheme is: the parameter count and computation of the convolutional neural network model can be greatly reduced, which lowers the model's demands on the computing power, storage, and other resources of the running platform, allows the model to run on resource-constrained platforms, and widens the range of application of convolutional neural network algorithms.
Further, in the pruning process of step S4, when channel pruning reaches the last convolutional layer of a convolutional neural network model that has fully-connected layers, the explicit weights, feature-map channels, and convolution kernels of that layer corresponding to the selected channel indices are deleted, and the neurons corresponding to those channel indices in the following fully-connected layer are deleted.
The beneficial effect of this further scheme is: the generality of the method is improved, the parameter count of the fully-connected layers is further reduced, parameter redundancy is removed throughout the convolutional neural network model, the model's parameter count shrinks, its running speed increases, and the model becomes more efficient.
Drawings
FIG. 1 is a flow chart of a convolutional neural network model compression method based on explicit weights in the present invention;
FIG. 2 is a flow chart of the convolutional neural network channel pruning based on explicit weights in accordance with the present invention.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes that do not depart from the spirit and scope of the invention as defined in the appended claims are apparent, and everything produced using the inventive concept is protected.
As shown in fig. 1, a convolutional neural network model compression method based on explicit weights includes the following steps:
s1, acquiring a plurality of training images;
performing convolution and maximum pooling operations on the training images yields the feature maps of the training images at each convolutional layer;
s2, establishing a convolutional neural network model to be compressed, and setting a target compression ratio R;
A convolutional neural network model may be built from scratch with randomly initialized parameters, or model compression may be performed on an already trained model. The target compression rate R is the maximum desired compression rate for the model and serves as the flag for ending the iterative compression.
S3, adding explicit weight to a characteristic diagram channel of the convolutional neural network model;
After the nonlinear activation layer in each convolutional layer of the model, add weight parameters equal in number to the channels of that layer's output feature map, initialize all of them to 1, and multiply them channel by channel with the corresponding feature-map channels.
In the embodiment of the present invention, the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, adding explicit weight parameters equal in number to the channels of the feature map output by that layer, the feature map output by the nonlinear activation layer of the model being expressed as:

X_i^l, i = 1, 2, 3, …, N

wherein the subscript i represents the channel index of the feature map, the superscript l represents the index of the nonlinear activation layer of the model, and N represents the total number of feature-map channels; the added explicit weight parameters are expressed as:

W_i^l = 1, i = 1, 2, 3, …, N

wherein W represents the added explicit weight parameters, with initial value 1 and equal in number to the total number N of feature-map channels of the convolutional layer, so that adding W does not change the model's inference. The value of W is updated during training by gradient descent. Because W_i^l multiplies every value of the feature-map channel X_i^l, W_i^l scales the overall magnitude of the i-th channel of the feature map and can therefore reflect the importance of that channel: the larger W_i^l is, the larger its influence on all values of the i-th channel and on the model's inferred output. The invention therefore assumes that the importance of a feature-map channel of the convolutional neural network model is proportional to the weight value added for that layer.
Then multiplying the added explicit weights and the feature map channel by channel, the feature map after the explicit weights are added is calculated as:

Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N

wherein Y represents the feature map after the explicit weights are added;
finally, inputting the feature map Y with the added explicit weights into the next layer for convolution calculation;
s4, performing channel pruning on the characteristic diagram channel of each convolution layer of the convolutional neural network according to the size of the explicit weight increased in the step S3;
In the embodiment of the present invention, the specific process of channel pruning in step S4 is as follows:
firstly, training the convolutional neural network model with the explicit weights added in step S3, so that during training the added explicit weight parameters learn values that represent the importance of the corresponding feature-map channels.
Then, based on the target compression rate R, setting the compression rate r_l of each layer of the convolutional neural network model. The per-layer target compression rate r_l is a hyperparameter that needs to be adjusted manually over several experiments. Empirically, the values of r_l follow a spindle-shaped distribution: in the shallow and deep layers of the neural network model r_l can be set large, while in the middle layers of the model r_l should be set small. Letting N be the number of channels of the feature map output by layer l and r_l be the target compression rate of that layer, the number of feature-map channels to be deleted in layer l is:

D = ceil(N × r_l)

wherein ceil(·) represents the ceiling function;
then sorting the added explicit weights W of layer l from small to large, recording the channel index corresponding to each weight, selecting the channel indices corresponding to the D smallest explicit weights, and deleting those explicit weights, the corresponding feature-map channels, the corresponding convolution kernels in the layer, and the channels with the same indices in all convolution kernels of the next convolutional layer;
finally, according to this method and the values of the added explicit weight parameters W, pruning all convolutional layers of the convolutional neural network model layer by layer; the pruning process is shown in FIG. 2.
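One way to realize the per-layer schedule described above is to interpolate r_l from large values at the two ends of the network toward a small value in the middle. The function below and its endpoint values are illustrative assumptions: the patent leaves the exact shape of the distribution to manual tuning.

```python
def spindle_ratios(num_layers, r_edge=0.5, r_mid=0.2):
    """Per-layer target rates r_l: large in shallow and deep layers, small in
    the middle (a hypothetical realization; requires num_layers >= 2)."""
    mid = (num_layers - 1) / 2
    return [r_mid + (r_edge - r_mid) * abs(i - mid) / mid
            for i in range(num_layers)]

print([round(r, 2) for r in spindle_ratios(5)])   # -> [0.5, 0.35, 0.2, 0.35, 0.5]
```

Linear interpolation is only one choice; any schedule that keeps middle-layer rates small while allowing aggressive pruning at the ends fits the description.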
In the embodiment of the present invention, in step S4, for a convolutional neural network model with fully-connected layers, when channel pruning is performed in the last convolutional layer, the explicit weights, feature-map channels, and convolution kernels of that layer corresponding to the selected channel indices are deleted, and the neurons with the corresponding indices in the following fully-connected layer are deleted.
S5, after channel pruning is completed, judging whether the compression rate of each layer of the neural network model is greater than or equal to the target compression rate; if so, pruning ends and step S6 is entered; if not, the compression rate of each layer is adjusted and the process returns to step S4;
this step also trains the convolutional neural network in order to fine-tune the model and recover its accuracy;
S6, deleting the explicit weights added to all feature maps, and finally saving the compressed convolutional neural network model and its parameters;
after model compression is finished, all the weights added to the neural network model in step S3 are deleted together with the computations associated with them, restoring the computation flow of images through the neural network model to the initial state of step S2. After a few batches of training, the model can recover the accuracy it had at the end of step S4. Once the accuracy of the neural network model is recovered, the compressed neural network model and its parameters are saved, and model compression ends.
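The effect of deleting the weights can be sketched numerically: once training has driven W away from 1, dropping the channel-wise multiply changes the forward pass, which is exactly why the brief fine-tuning described above is needed. The tensor sizes and weight values below are made up:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(2, 3, 4, 4)          # feature map of the surviving channels
W = np.array([1.3, 0.8, 1.1])            # their learned explicit weights (hypothetical)

with_w = W[None, :, None, None] * X      # forward pass during compression (steps S3-S5)
without_w = X                            # forward pass after the weights are deleted (S6)

# The two passes differ once W != 1, so a few training batches are needed
# to recover the accuracy lost when the explicit weights are removed.
print(np.allclose(with_w, without_w))    # -> False
```

After this removal the model carries no trace of the compression machinery: its parameter layout is that of an ordinary, smaller convolutional network.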
The invention proposes explicit weights that represent the importance of each feature-map channel and performs channel pruning according to the weight values: the smaller weights are deleted together with the corresponding feature-map channels, convolution kernels, and the corresponding channels of the next layer's convolution kernels. After channel pruning is finished, the added weights are deleted and the model is fine-tuned to recover its accuracy quickly. This method of adding temporary explicit weights gives convolutional neural network model compression a direct basis for judging parameter importance; it solves the high resource consumption and low running speed caused by the heavy parameter redundancy of existing convolutional neural network models, and it solves the heavy computation and the implicit, unintuitive pruning criteria of existing channel pruning methods.
It will be appreciated by those of ordinary skill in the art that the embodiments described here are intended to help the reader understand the principles of the invention, and the invention is not limited to these specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (4)

1. A convolutional neural network model compression method based on explicit weights, characterized by comprising the following steps:
S1, acquiring a plurality of training images;
S2, establishing the convolutional neural network model to be compressed and setting a target compression rate R;
S3, adding explicit weights to the feature-map channels of the convolutional neural network model;
S4, performing channel pruning on the feature-map channels of each convolutional layer of the model according to the magnitude of the explicit weights added in step S3;
S5, after channel pruning is completed, judging whether the overall compression rate of the model's convolutional layers is greater than or equal to the target compression rate; if so, pruning ends and step S6 is executed; if not, adjusting the compression rate of each layer and returning to step S4;
and S6, deleting the explicit weights added to all feature-map channels, continuing training until the model converges, and saving the compressed convolutional neural network model and its parameters.
2. The explicit-weight-based convolutional neural network model compression method according to claim 1, characterized in that the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, adding explicit weight parameters equal in number to the channels of the feature map output by that layer, the feature map output by the nonlinear activation layer of the model being expressed as:

X_i^l, i = 1, 2, 3, …, N

wherein X represents the feature map, the subscript i represents the channel index of the feature map, the superscript l represents the index of the nonlinear activation layer of the model, and N represents the total number of feature-map channels; the added explicit weight parameters being expressed as:

W_i^l = 1, i = 1, 2, 3, …, N

wherein W represents the added explicit weight parameters, equal in number to the total number N of feature-map channels of the convolutional layer;
then multiplying the added explicit weights and the feature map channel by channel, the weighted feature map being calculated as:

Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N

wherein Y represents the feature map after the explicit weights are added;
and finally, inputting the feature map Y with the added explicit weights into the next layer for convolution calculation.
3. The explicit-weight-based convolutional neural network model compression method according to claim 2, characterized in that the specific process of channel pruning in step S4 is as follows:
firstly, training the convolutional neural network model with the explicit weights added in step S3, and saving the model and its parameters after the model converges;
then, letting N be the number of channels of the feature map output by layer l and r_l be the target compression rate of that layer, the number D of feature-map channels to be deleted in layer l is expressed as:

D = ceil(N × r_l)

wherein ceil(·) represents the ceiling function;
then sorting the added explicit weights W of layer l from small to large, recording the channel index corresponding to each weight, selecting the channel indices corresponding to the D smallest explicit weights, deleting those explicit weights, the corresponding feature-map channels, and the corresponding convolution kernels in the layer, and deleting the channels with the same indices in all convolution kernels of the next convolutional layer;
and finally, performing channel pruning on the feature-map channels of all convolutional layers according to this method.
4. The method according to claim 3, characterized in that, in the channel pruning process of step S4, for a convolutional neural network model with fully-connected layers, when channel pruning is performed in the last convolutional layer, the explicit weights, feature-map channels, and convolution kernels of that layer corresponding to the selected channel indices are deleted, and the neurons corresponding to those channel indices in the following fully-connected layer are deleted.
CN202011434519.2A 2020-12-10 2020-12-10 Convolutional neural network model compression method based on explicit weight Pending CN112488313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011434519.2A CN112488313A (en) 2020-12-10 2020-12-10 Convolutional neural network model compression method based on explicit weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011434519.2A CN112488313A (en) 2020-12-10 2020-12-10 Convolutional neural network model compression method based on explicit weight

Publications (1)

Publication Number Publication Date
CN112488313A 2021-03-12

Family

ID=74939850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011434519.2A Pending CN112488313A (en) 2020-12-10 2020-12-10 Convolutional neural network model compression method based on explicit weight

Country Status (1)

Country Link
CN (1) CN112488313A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033804A (en) * 2021-03-29 2021-06-25 北京理工大学重庆创新中心 Convolution neural network compression method for remote sensing image
CN113033804B (en) * 2021-03-29 2022-07-01 北京理工大学重庆创新中心 Convolution neural network compression method for remote sensing image
CN113762502A (en) * 2021-04-22 2021-12-07 腾讯科技(深圳)有限公司 Training method and device of neural network model
CN113762502B (en) * 2021-04-22 2023-09-19 腾讯科技(深圳)有限公司 Training method and device for neural network model
CN113408724A (en) * 2021-06-17 2021-09-17 博众精工科技股份有限公司 Model compression method and device
TWI767757B (en) * 2021-06-18 2022-06-11 中華電信股份有限公司 Method for determining weight initial value of binary classification problem in neural network and computer readable medium thereof
CN115134425A (en) * 2022-06-20 2022-09-30 北京京东乾石科技有限公司 Message processing method and device

Similar Documents

Publication Publication Date Title
CN112488313A (en) Convolutional neural network model compression method based on explicit weight
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN111079899A (en) Neural network model compression method, system, device and medium
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
CN109784420B (en) Image processing method and device, computer equipment and storage medium
CN112488070A (en) Neural network compression method for remote sensing image target detection
CN111612144A (en) Pruning method and terminal applied to target detection
CN113240307B (en) Power system economic dispatching method based on improved differential evolution algorithm
CN113222138A (en) Convolutional neural network compression method combining layer pruning and channel pruning
CN113159276A (en) Model optimization deployment method, system, equipment and storage medium
CN113610227A (en) Efficient deep convolutional neural network pruning method
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
KR102505946B1 (en) Method and system for training artificial neural network models
Park et al. Squantizer: Simultaneous learning for both sparse and low-precision neural networks
CN115983320A (en) Federal learning model parameter quantification method based on deep reinforcement learning
CN116187387A (en) Neural network model quantization method, device, computer equipment and storage medium
CN112488291B (en) 8-Bit quantization compression method for neural network
CN113554104B (en) Image classification method based on deep learning model
CN115564043A (en) Image classification model pruning method and device, electronic equipment and storage medium
CN113033804B (en) Convolution neural network compression method for remote sensing image
CN114239826A (en) Neural network pruning method, medium and electronic device
CN115170902A (en) Training method of image processing model
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
Sarkar et al. An incremental pruning strategy for fast training of CNN models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210312