CN112488313A - Convolutional neural network model compression method based on explicit weight - Google Patents
Convolutional neural network model compression method based on explicit weight
- Publication number
- CN112488313A (application CN202011434519.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- layer
- network model
- channel
- explicit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a convolutional neural network model compression method based on explicit weights, which comprises the following steps: acquiring training images; establishing a convolutional neural network model to be compressed and setting a target compression ratio; adding explicit weights to the feature map channels of the convolutional neural network model; performing channel pruning according to the weight values; judging whether the overall compression ratio of the model is greater than or equal to the target compression ratio, and if so proceeding to the next step, otherwise adjusting the compression ratio of each layer and returning to the previous step; and deleting all the added explicit weights together with the calculations that involve them, and storing the compressed neural network model and its parameters after the accuracy of the model is restored. The method can generate a structured compressed model through channel pruning, and solves the problems of large parameter quantity, high resource consumption and low running speed of conventional convolutional neural network models.
Description
Technical Field
The invention relates to the field of neural network model compression, in particular to a convolutional neural network model compression method based on explicit weight.
Background
Convolutional neural network algorithms have been put into practical use in many fields, and the running speed of convolutional neural network models on each platform and the amount of resources they consume have an important influence on the practical application range of these algorithms. Because many current convolutional neural network models suffer from parameter redundancy, which causes high resource consumption and low running speed, they cannot be deployed on platforms with limited resources. Deleting the redundant parameters of a convolutional neural network model reduces its resource consumption, increases its running speed, and can therefore expand its practical application range.
Channel pruning is one of the most widely used convolutional neural network model compression methods. A traditional channel pruning algorithm generally proposes a criterion for measuring channel importance, computes the importance of each channel according to that criterion, and deletes the feature map channels of low importance together with their corresponding convolution kernels and the corresponding channels of the convolution kernels in the next layer. However, existing methods generally use some intrinsic characteristic of the convolutional network as the importance criterion, such as the sum of absolute values of a channel, the proportion of non-zero values, or the average gradient. These criteria are implicit: obtaining the weights often requires a large amount of calculation, or the measured quantity plays a specific role inside the network, so model compression based on such criteria suffers from heavy computation, complex implementation and low compression ratio.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a convolutional neural network model compression method based on explicit weights, which solves the problems of large parameter quantity, high resource consumption and low running speed of conventional convolutional neural network models, as well as the problems of complex implementation, heavy computation and low compression ratio of existing channel pruning based compression methods.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a convolutional neural network model compression method based on explicit weight comprises the following steps:
s1, acquiring a plurality of training images;
s2, establishing a convolutional neural network model to be compressed, and setting a target compression ratio R;
s3, adding explicit weights to the feature map channels of the convolutional neural network model;
s4, performing channel pruning on the feature map channels of each convolutional layer of the convolutional neural network model according to the size of the explicit weights added in step S3;
s5, after channel pruning is completed, judging whether the overall compression ratio of the convolutional layers of the convolutional neural network model is greater than or equal to the target compression ratio; if yes, the pruning is finished and step S6 is executed; if not, adjusting the compression ratio of each layer and returning to execute step S4;
and S6, deleting the explicit weights added to all the feature map channels, and finally storing the compressed convolutional neural network model and its parameters.
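The loop between steps S4 and S5 amounts to a compression-ratio check on the convolutional layers. A minimal sketch with purely illustrative per-layer parameter counts (none of these numbers come from the patent):

```python
def overall_compression_ratio(orig_params, pruned_params):
    """Fraction of convolutional-layer parameters removed by pruning
    (the quantity compared against the target R in step S5)."""
    return 1.0 - pruned_params / orig_params

# Hypothetical per-layer parameter counts before and after channel pruning.
orig = [1728, 36864, 73728]
pruned = [1296, 20736, 46080]

R = 0.3  # target compression ratio set in step S2
achieved = overall_compression_ratio(sum(orig), sum(pruned))
done = achieved >= R  # True: go to S6; False: adjust per-layer rates, redo S4
```

If `done` is false, the per-layer rates r_l are adjusted and step S4 is repeated, exactly as the step list above describes.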
The invention has the beneficial effects that: firstly, the compression method based on added explicit weights can compress a convolutional neural network model effectively with a high compression ratio; the operation is simple and intuitive, the amount of calculation is reduced, and no extra heavy weight computation is needed. The method has good robustness and can be applied to the compression of various convolutional neural network models. Finally, the explicit weights added during the process are deleted, so the model is restored to its initial form: the original information flow of the model is not changed, and no new parameter or structure is added to the compressed model.
Further, the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, explicit weight parameters are added whose number equals the number of channels of the feature map output by that convolutional layer; the feature map output by the nonlinear activation layer of the model is represented as:
X_i^l, i = 1, 2, 3, …, N
wherein X represents the feature map, the subscript i represents the channel index of the feature map, the superscript l represents the layer number of the model's nonlinear activation layer, and N represents the total number of feature map channels; the added explicit weight parameters are expressed as:
W_i^l = 1, i = 1, 2, 3, …, N
wherein W represents the added explicit weight parameters, equal in number to the total number N of feature map channels of the convolutional layer;
the added explicit weights are then multiplied channel by channel with the feature map, and the weighted feature map is computed by the formula:
Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N
wherein Y represents the feature map after the explicit weights are applied;
and finally, the weighted feature map Y is input into the next layer for convolution calculation.
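The channel-by-channel multiplication of the explicit weights with the feature map described above can be sketched numerically as follows (numpy is an assumed stand-in; the patent does not prescribe a framework):

```python
import numpy as np

# Feature map X from the nonlinear activation layer: N channels of size H x W.
N, H, W_sp = 4, 8, 8
X = np.random.randn(N, H, W_sp).astype(np.float32)

# One explicit weight per channel, initialized to 1 so inference is unchanged.
w = np.ones(N, dtype=np.float32)

# Channel-wise product Y_i = w_i * X_i, broadcast over the spatial dimensions.
Y = w[:, None, None] * X
```

With the initial value 1 the product leaves the feature map unchanged, which is why adding the weights does not affect the original model's inference.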
The beneficial effects of the above further scheme are: the importance of each channel of the convolutional neural network can be represented by adding a small number of temporary explicit weights; the added storage and calculation costs are very low, and the inference result of the original model is not affected.
Further, the specific process of channel pruning in step S4 is as follows:
firstly, the convolutional neural network model with the explicit weights added in step S3 is trained, and the model and its parameters are stored after the model converges;
then, let the number of channels of the output feature map of the l-th layer be N and the target compression ratio of that layer be r_l; the number D of feature map channels to be deleted in the l-th layer is:
D = ceil(N × r_l)
wherein ceil(·) represents the ceiling function;
then the added explicit weights W of the l-th layer are sorted from small to large, the channel index corresponding to each weight is recorded, the channel indexes corresponding to the D smallest explicit weights are selected, and the explicit weights corresponding to those channel indexes, the corresponding feature map channels, the corresponding convolution kernels in the convolutional layer, and the channels corresponding to those channel indexes in all convolution kernels of the next convolutional layer are deleted;
and finally, performing channel pruning on the feature maps of all the convolutional layers according to the method.
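The per-layer channel selection described above (sort the explicit weights, take the D = ceil(N × r_l) smallest) can be sketched as follows; the weight values are purely illustrative:

```python
import math
import numpy as np

def channels_to_prune(weights, r_l):
    """Indices of the D = ceil(N * r_l) smallest explicit weights of a layer."""
    N = len(weights)
    D = math.ceil(N * r_l)
    order = np.argsort(weights)      # ascending: least important channels first
    return sorted(order[:D].tolist())

# Hypothetical trained explicit weights for a 6-channel layer.
w = np.array([0.9, 0.05, 0.7, 0.1, 0.8, 0.02])
idx = channels_to_prune(w, r_l=0.5)  # D = ceil(6 * 0.5) = 3
```

The returned indices identify the explicit weights, feature map channels and convolution kernels to delete, as in the text above.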
The beneficial effects of the above further scheme are: the parameter quantity and computation of the convolutional neural network model can be greatly reduced, which in turn reduces the model's demands on the computing power, storage and other resources of the running platform, so that the convolutional neural network model can run on platforms with limited resources, widening the application range of convolutional neural network algorithms.
Further, in the pruning process of step S4, for a convolutional neural network model with fully connected layers, when channel pruning is performed in the last convolutional layer, the explicit weights, feature map channels and convolution kernels of that layer corresponding to the selected channel indexes are deleted, and the neurons corresponding to those channel indexes in the following fully connected layer are deleted.
The beneficial effects of the above further scheme are: the universality of the method is improved, the parameter quantity of the full connection layer is further reduced, the parameter redundancy of the whole convolutional neural network model is eliminated, the model parameter quantity is reduced, the model operation speed is accelerated, and the model becomes more efficient.
Drawings
FIG. 1 is a flow chart of a convolutional neural network model compression method based on explicit weights in the present invention;
FIG. 2 is a flow chart of the convolutional neural network channel pruning based on explicit weights in accordance with the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent within the spirit and scope of the invention as defined in the appended claims, and all matter produced using the inventive concept is protected.
As shown in fig. 1, a convolutional neural network model compression method based on explicit weights includes the following steps:
s1, acquiring a plurality of training images;
convolution and max pooling operations are performed on the training images to obtain the feature maps of the training images at each convolutional layer;
s2, establishing a convolutional neural network model to be compressed, and setting a target compression ratio R;
a convolutional neural network model can be newly established with randomly initialized parameters, or model compression can be performed on the basis of an already trained model. The target compression ratio R represents the maximum desired compression ratio of the compressed model and serves as the flag for ending the iterative compression of the model.
S3, adding explicit weights to the feature map channels of the convolutional neural network model;
after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, weight parameters are added whose number equals the number of channels of the output feature map of that layer; all the added weight parameters are initialized to 1 and multiplied channel by channel with the corresponding feature map channels.
In the embodiment of the present invention, the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, explicit weight parameters are added whose number equals the number of channels of the feature map output by that convolutional layer; the feature map output by the nonlinear activation layer of the model is represented as:
X_i^l, i = 1, 2, 3, …, N
wherein the subscript i represents the channel index of the feature map, the superscript l represents the layer number of the model's nonlinear activation layer, and N represents the total number of feature map channels; the added explicit weight parameters are expressed as:
W_i^l = 1, i = 1, 2, 3, …, N
wherein W represents the added explicit weight parameters with an initial value of 1, equal in number to the total number N of feature map channels of the convolutional layer; adding W does not change the inference of the model. The value of W is updated during model training according to gradient descent. Since W_i^l multiplies all values of the feature map channel X_i^l, W_i^l affects the overall magnitude of the i-th channel of the feature map and can therefore reflect the importance of that channel: the larger W_i^l is, the greater its influence on all values of the i-th channel and on the model's inferred output. The invention therefore assumes that the importance of a feature map channel of the convolutional neural network model is proportional to the added weight value of that layer.
The added explicit weights are then multiplied channel by channel with the feature map, and the weighted feature map is computed by the formula:
Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N
wherein Y represents the feature map after the explicit weights are applied;
finally, the weighted feature map Y is input into the next layer for convolution calculation;
s4, performing channel pruning on the feature map channels of each convolutional layer of the convolutional neural network according to the size of the explicit weights added in step S3;
in the embodiment of the present invention, the specific process of the channel pruning in step S4 is as follows:
firstly, the convolutional neural network model with the explicit weights added in step S3 is trained, so that during training each added explicit weight parameter learns a value that represents the importance of its corresponding feature map channel.
Then, based on the target compression ratio R, the compression ratio r_l of each layer of the convolutional neural network model is set. The per-layer target compression ratio r_l is a hyper-parameter that needs to be adjusted manually over several experiments; empirically, r_l follows a spindle-shaped distribution: in the shallow and deep layers of the neural network model r_l can be set large, while in the middle layers r_l can be set small. Let the number of channels of the output feature map of the l-th layer be N and the target compression ratio be r_l; the number of feature map channels to be deleted in the l-th layer is then:
D = ceil(N × r_l)
wherein ceil(·) represents the ceiling function;
then the added explicit weights W of the l-th layer are sorted from small to large, the channel index corresponding to each weight is recorded, the channel indexes corresponding to the D smallest explicit weights are selected, and the explicit weights corresponding to those channel indexes, the corresponding feature map channels, the corresponding convolution kernels in the convolutional layer, and the channels corresponding to those channel indexes in all convolution kernels of the next convolutional layer are deleted;
finally, according to the method above and the values of the added explicit weight parameters W, all convolutional layers of the convolutional neural network model are pruned layer by layer; the pruning process is shown in FIG. 2:
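Deleting a pruned channel's convolution kernel in layer l together with the matching channel of every kernel in layer l+1, as described above, can be sketched with plain arrays; the (out, in, kH, kW) weight layout is an assumption for illustration, not something the patent specifies:

```python
import numpy as np

def prune_conv_pair(W_l, W_next, prune_idx):
    """Delete the pruned output channels of layer l and the matching input
    channels of layer l+1 (conv weight layout assumed: out, in, kH, kW)."""
    drop = set(prune_idx)
    keep = [i for i in range(W_l.shape[0]) if i not in drop]
    return W_l[keep], W_next[:, keep]

W_l = np.random.randn(8, 3, 3, 3)      # layer l: 8 output channels
W_next = np.random.randn(16, 8, 3, 3)  # layer l+1 consumes those 8 channels
W_l2, W_next2 = prune_conv_pair(W_l, W_next, prune_idx=[2, 5])
```

Both deletions are needed to keep the pruned network consistent: removing an output channel of layer l shrinks the input expected by layer l+1.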
in the embodiment of the present invention, in step S4, for a convolutional neural network model with fully connected layers, when channel pruning is performed in the last convolutional layer, the explicit weights, feature map channels and convolution kernels of that layer corresponding to the selected channel indexes are deleted, and the neurons of the following fully connected layer corresponding to those indexes are deleted.
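For the fully connected case described above, here is a sketch under the assumption of channel-major flattening of the last feature map (the patent does not specify the layout); all names and shapes are illustrative:

```python
import numpy as np

# Last conv layer outputs C channels of spatial size H x W; the flattened
# (channel-major) activations feed a fully connected layer.
C, H, W = 4, 2, 2
fc = np.random.randn(10, C * H * W)  # FC weights: (out_features, in_features)

def prune_fc_inputs(fc, prune_idx, H, W):
    """Drop the FC input weights fed by pruned feature map channels."""
    per_ch = H * W
    n_ch = fc.shape[1] // per_ch
    keep = [c for c in range(n_ch) if c not in set(prune_idx)]
    cols = [c * per_ch + k for c in keep for k in range(per_ch)]
    return fc[:, cols]

fc2 = prune_fc_inputs(fc, prune_idx=[1], H=H, W=W)  # drop channel 1
```

Each pruned channel removes H × W input connections of the fully connected layer, which is how the parameter count of the fully connected layer is reduced as well.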
S5, after channel pruning is completed, it is judged whether the overall compression ratio of the convolutional layers of the neural network model is greater than or equal to the target compression ratio; if yes, the pruning ends and step S6 is entered; if not, the compression ratio of each layer is adjusted and the process returns to step S4;
this step also includes a training process for the convolutional neural network, used to fine-tune the model and recover its accuracy;
s6, deleting the explicit weights added to all the feature maps, and finally storing the compressed convolutional neural network model and its parameters;
after the model compression is finished, all the weights added to the neural network model in step S3 are deleted, together with the calculations that involve them, and the calculation flow of images through the neural network model is restored to the initial state of step S2; after a few batches of training, the model can recover the accuracy it had at the end of step S4; once the accuracy of the neural network model is recovered, the compressed neural network model and its parameters are stored, and the model compression ends.
The invention introduces explicit weights representing the importance of each feature map channel, performs channel pruning according to the weight values, and deletes the smaller weights together with the corresponding feature map channels, convolution kernels, and the corresponding channels of the convolution kernels in the next layer. After channel pruning is finished, the added weights are deleted and the model is fine-tuned to quickly recover its accuracy. By adding temporary explicit weights, the method provides a direct basis for judging parameter importance during convolutional neural network model compression; it solves the problems of high resource consumption and low running speed caused by the large number of redundant parameters in conventional convolutional neural network models, and the problems of heavy computation, implicit pruning criteria and lack of intuitiveness in conventional channel pruning methods.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention, and that the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.
Claims (4)
1. A convolutional neural network model compression method based on explicit weight is characterized by comprising the following steps:
s1, acquiring a plurality of training images;
s2, establishing a convolutional neural network model to be compressed, and setting a target compression ratio R;
s3, adding explicit weights to the feature map channels of the convolutional neural network model;
s4, performing channel pruning on the feature map channels of each convolutional layer of the convolutional neural network model according to the size of the explicit weights added in step S3;
s5, after channel pruning is completed, judging whether the overall compression ratio of the convolutional layers of the convolutional neural network model is greater than or equal to the target compression ratio; if yes, the pruning is finished and step S6 is executed; if not, adjusting the compression ratio of each layer and returning to execute step S4;
and S6, deleting the explicit weights added to all the feature map channels, continuing training until the model converges, and storing the compressed convolutional neural network model and its parameters.
2. The explicit weight-based convolutional neural network model compression method as claimed in claim 1, wherein the specific process of step S3 is:
firstly, after the nonlinear activation layer in each convolutional layer of the convolutional neural network model, explicit weight parameters are added whose number equals the number of channels of the feature map output by that convolutional layer; the feature map output by the nonlinear activation layer of the model is represented as:
X_i^l, i = 1, 2, 3, …, N
wherein X represents the feature map, the subscript i represents the channel index of the feature map, the superscript l represents the layer number of the model's nonlinear activation layer, and N represents the total number of feature map channels; the added explicit weight parameters are expressed as:
W_i^l = 1, i = 1, 2, 3, …, N
wherein W represents the added explicit weight parameters, equal in number to the total number N of feature map channels of the convolutional layer;
the added explicit weights are then multiplied channel by channel with the feature map, and the weighted feature map is computed by the formula:
Y_i^l = W_i^l × X_i^l, i = 1, 2, 3, …, N
wherein Y represents the feature map after the explicit weights are applied;
and finally, the weighted feature map Y is input into the next layer for convolution calculation.
3. The explicit weight-based convolutional neural network model compression method as claimed in claim 2, wherein the specific process of channel pruning in step S4 is as follows:
firstly, the convolutional neural network model with the explicit weights added in step S3 is trained, and the model and its parameters are stored after the model converges;
then, let the number of output feature map channels of the l-th layer be N and the target compression ratio be r_l; the number D of feature map channels to be deleted in the l-th layer is represented as:
D = ceil(N × r_l)
wherein ceil(·) represents the ceiling function;
then the added explicit weights W of the l-th layer are sorted from small to large, the channel index corresponding to each weight is recorded, the channel indexes corresponding to the D smallest explicit weights are selected, the explicit weights corresponding to those channel indexes, the corresponding feature map channels and the corresponding convolution kernels in the convolutional layer are deleted, and the channels corresponding to those channel indexes in all convolution kernels of the next convolutional layer are deleted;
and finally, channel pruning is performed on the feature map channels of all the convolutional layers according to the above method.
4. The method according to claim 3, wherein in the channel pruning process of step S4, for a convolutional neural network model with fully connected layers, when channel pruning is performed in the last convolutional layer, the explicit weights, feature map channels and convolution kernels of that convolutional layer corresponding to the selected channel indexes are deleted, and the neurons corresponding to those channel indexes in the following fully connected layer are deleted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011434519.2A CN112488313A (en) | 2020-12-10 | 2020-12-10 | Convolutional neural network model compression method based on explicit weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112488313A (en) | 2021-03-12 |
Family
ID=74939850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011434519.2A Pending CN112488313A (en) | 2020-12-10 | 2020-12-10 | Convolutional neural network model compression method based on explicit weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488313A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN113033804A (en) * | 2021-03-29 | 2021-06-25 | 北京理工大学重庆创新中心 | Convolution neural network compression method for remote sensing image |
CN113033804B (en) * | 2021-03-29 | 2022-07-01 | 北京理工大学重庆创新中心 | Convolution neural network compression method for remote sensing image |
CN113408724A (en) * | 2021-06-17 | 2021-09-17 | 博众精工科技股份有限公司 | Model compression method and device |
CN113762502A (en) * | 2021-04-22 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Training method and device of neural network model |
CN113762502B (en) * | 2021-04-22 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Training method and device for neural network model |
TWI767757B (en) * | 2021-06-18 | 2022-06-11 | 中華電信股份有限公司 | Method for determining weight initial value of binary classification problem in neural network and computer readable medium thereof |
CN115134425A (en) * | 2022-06-20 | 2022-09-30 | 北京京东乾石科技有限公司 | Message processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210312 |