CN114049514B - Image classification network compression method based on parameter reinitialization - Google Patents


Info

Publication number
CN114049514B
Authority
CN
China
Prior art keywords
layer
channels
parameter
network
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111251560.0A
Other languages
Chinese (zh)
Other versions
CN114049514A (en)
Inventor
苏雨 (Su Yu)
刘广哲 (Liu Guangzhe)
张科 (Zhang Ke)
王靖宇 (Wang Jingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202111251560.0A
Publication of CN114049514A
Application granted
Publication of CN114049514B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image classification network compression method based on parameter reinitialization, and belongs to the technical field of image processing and recognition. During network pre-training, parameter re-initialization is performed on the unimportant channels once every t epochs (complete iterations over the training data set). Re-initializing the parameters of unimportant channels during network training introduces more filter forms into the model and reactivates wrongly pruned convolution channels, which helps improve the performance of the compressed network model. Applied to the image classification task, the method reduces the parameter count and computation of the model while maintaining classification accuracy, making the model convenient to use on mobile devices such as mobile phones.

Description

Image classification network compression method based on parameter reinitialization
Technical Field
The invention belongs to the technical field of image processing and recognition, and particularly relates to an image classification network compression method based on parameter reinitialization.
Background
With the development of neural network technology, more and more network models are replacing traditional hand-crafted models and have achieved great success in machine vision fields such as image classification and face recognition. Convolutional neural network models extract image features by means of huge training data sets and large numbers of complex operations; repeated iterative training improves the model's ability to abstract the essential features of the target, yielding robust recognition results.
However, neural network models with superior performance are large in scale and therefore demand substantial storage space and computing resources: they not only consume large amounts of computing resources for optimization in the training stage, but also require complex operations in the actual inference stage. This is unfriendly to resource-limited terminal devices such as mobile phones, automobiles and satellites, and real-time performance is poor.
To reduce the complexity of a neural network model while maintaining its superior performance, neural network model compression methods have received increasing attention. Parameter pruning is an effective compression method that can remove redundant structures and parameters from a network model. Song Shefan, Wang Guoshu and Cheng Buyun (Sparse training image recognition algorithm with mixed-threshold pruning, Science Technology and Engineering, 2021, 21(2): 638-643) proposed a sparse-training pruning algorithm: during training, an L1 regularization constraint is applied to the scaling factors of the batch normalization layer (BN layer) so that the convolution channels tend to become sparse, and channels whose scaling factors are close to 0 are then removed, achieving the goal of pruning network parameters. This method identifies important convolution channels by means of sparse training, but unimportant channels remain suppressed throughout sparse training, so it is difficult for them to regain importance under the sparsity regularization. This easily causes erroneous pruning, makes the optimal pruning scheme hard to find, and degrades the overall performance of the model.
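As a rough illustration of this prior-art sparse-training scheme, the following minimal PyTorch sketch adds an L1 penalty on all BN scaling factors to the training loss, driving the scaling factors of unimportant channels toward zero; the function name and the penalty weight l1_lambda are illustrative assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, l1_lambda: float = 1e-4) -> torch.Tensor:
    """L1 norm of all BN scaling factors (gamma), weighted by the sparsity coefficient."""
    terms = [m.weight.abs().sum()
             for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    return l1_lambda * torch.stack(terms).sum()

# During sparse training: loss = criterion(output, target) + bn_l1_penalty(model)
```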
Disclosure of Invention
Technical problem to be solved
Existing sparse-pruning methods for convolutional neural networks apply sparse training to all convolution channels in the network model; unimportant channels remain suppressed throughout, which easily causes erroneous pruning. To avoid these defects of the prior art, the invention provides an image classification network compression method based on parameter reinitialization, which improves the performance of the compressed model by re-initializing unimportant parameters during network training.
Technical proposal
An image classification network compression method based on parameter reinitialization, characterized in that: during network pre-training, parameter re-initialization is performed on the unimportant channels once every t epochs (complete iterations over the training data set), with the following steps:

Step 1: according to the expected compression rate $p_0$, $0 < p_0 < 1$, determine the compression rate currently required as $p = s_p p_0$; here $s_p$ is a quantity that increases gradually from 0 to 1 as the number of iterations grows, following a cubic function:

$$s_p = (e_i/(e-t))^3$$

where $e_i$ is the current iteration number and $e$ is the total number of iterations in the pre-training stage;

for a convolution layer with n output channels, the number of channels whose parameters need to be re-initialized is computed from the current compression rate p as $n_1 = np$;

Step 2: the n channels in the BN layer correspond to the n channels of the convolution layer, so the scaling factor $\gamma$ in the BN layer can be used as the channel importance measure; sort the n channels by importance, compute the mean $\mu_m$ and standard deviation $\sigma_m$ of the convolution-layer weights $w_m$ corresponding to the most important channel, and record that channel's scaling factor in the BN layer as $\gamma_m$ and its offset as $\beta_m$;

Step 3: randomly re-initialize the convolution-layer parameters corresponding to the $n_1$ channels to be re-initialized, drawing them from a normal distribution with mean $\mu_m$ and standard deviation $\sigma_m$; at the same time, change the scaling factors of these $n_1$ channels in the BN layer to $s\gamma_m$ and their offsets to $s\beta_m$, where $s$ is a quantity that decreases gradually from 1 to 0 as the number of iterations grows, decaying as a cubic function:

$$s = 1 - (e_i/(e-t))^3$$

Step 4: the mean $\mu$ and standard deviation $\sigma$ in the BN layer are not updated during the parameter re-initialization process, so these statistical parameters need to be updated on one batch of training data, during which the other parameters are kept fixed;

after the network pre-training stage is completed, the $n_1$ unimportant convolution-layer channels and BN-layer channels are pruned according to the BN-layer scaling factor $\gamma$, yielding the compressed network model.
A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
A computer program product comprising computer executable instructions which, when executed, are adapted to implement the method described above.
Advantageous effects
According to the image classification network compression method based on parameter reinitialization, the parameters of unimportant channels are re-initialized during network training; this introduces more filter forms into the model and reactivates wrongly pruned convolution channels, thereby improving the performance of the compressed network model. Applied to the image classification task, the method reduces the parameter count and computation of the model while maintaining classification accuracy, making the model convenient to use on mobile devices such as mobile phones.
Compared with the prior art, the invention has the following beneficial effects:
(1) The image classification network compression method based on parameter reinitialization designed by the invention can reactivate convolution channels that were wrongly pruned during network training, which helps improve the performance of the compressed network. For example, the number of parameters of ResNet-56 after compression is reduced from 0.85M to 0.35M, the computation is reduced from 127M to 53.7M, and the accuracy is improved from 92.76% to 93.10%.
(2) The parameter re-initialization method provides a new idea for network training: introducing more filter forms through parameter re-initialization during training helps extract richer features of the target image, thereby improving network performance. For example, the image classification accuracy of ResNet-56 after compression is improved from 92.76% to 93.10%.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1 is a parameter reinitialization flowchart.
Fig. 2 is a graph of the frequency distribution of the reinitialization of the channel parameters in the convolutional layer.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
For a convolution layer in a neural network, the calculation process is as follows:
$$Y = X * w \qquad (1)$$

where $X \in \mathbb{R}^{c \times h \times w}$ is the input feature-map tensor, $Y \in \mathbb{R}^{n \times h' \times w'}$ is the output feature-map tensor, $w \in \mathbb{R}^{n \times c \times k \times k}$ is the convolution weight parameter, c and n are the numbers of input and output channels, h and w are the height and width of the input feature map, h' and w' are the height and width of the output feature map, k × k is the size of the convolution kernel, and * denotes the image convolution operation.
The convolution layer is often followed by a BN layer, which is computed as:

$$y = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \qquad (2)$$

where $x$ is the input of the BN layer, i.e., the output of the preceding convolution layer, $y$ is the output of the BN layer, $\mu$ is the mean of $x$ within each training batch, $\sigma$ is the standard deviation of $x$ within each training batch, $\gamma$ and $\beta$ are the trainable scaling factor and offset of the BN layer, respectively, and $\epsilon$ is a small positive number that prevents the denominator from being 0.
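For concreteness, a minimal PyTorch sketch of the computations in equations (1) and (2); the dimensions are arbitrary illustrative values, not taken from the patent.

```python
import torch
import torch.nn as nn

c, n, h, w, k = 16, 32, 28, 28, 3        # input/output channels, spatial size, kernel size
conv = nn.Conv2d(c, n, kernel_size=k, padding=1)
bn = nn.BatchNorm2d(n)                   # gamma = bn.weight, beta = bn.bias

x = torch.randn(1, c, h, w)              # input feature map X (batch of 1)
y = bn(conv(x))                          # equation (1) followed by equation (2)
print(conv.weight.shape)                 # torch.Size([32, 16, 3, 3]), i.e. n x c x k x k
print(y.shape)                           # torch.Size([1, 32, 28, 28]); padding=1 keeps h' = h, w' = w
```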
During network pre-training, once every t iterations (t epochs) over the complete training data set, the parameter re-initialization flow shown in Fig. 1 is performed on the unimportant channels.
(1) According to the expected compression rate $p_0$ ($0 < p_0 < 1$), the compression rate currently required is determined as $p = s_p p_0$, where $s_p$ is a quantity that increases gradually from 0 to 1 as the number of iterations grows, following a cubic function:

$$s_p = (e_i/(e-t))^3 \qquad (3)$$

where $e_i$ is the current iteration number and $e$ is the total number of iterations in the pre-training stage. The denominator is taken as $e - t$ because no re-initialization is needed at the final iteration; at the last re-initialization (i.e., iteration $e - t$), $s_p$ grows to 1.

For a convolution layer with n output channels, the number of channels whose parameters need to be re-initialized is computed from the current compression rate p as $n_1 = np$.

(2) The n channels in the BN layer correspond to the n channels of the convolution layer, so the scaling factor $\gamma$ in the BN layer can be used as the channel importance measure. Sort the n channels by importance, compute the mean $\mu_m$ and standard deviation $\sigma_m$ of the convolution-layer weights $w_m$ corresponding to the most important channel, and record that channel's scaling factor in the BN layer as $\gamma_m$ and its offset as $\beta_m$.

(3) Randomly re-initialize the convolution-layer parameters corresponding to the $n_1$ channels to be re-initialized, drawing them from a normal distribution with mean $\mu_m$ and standard deviation $\sigma_m$. At the same time, change the scaling factors of these $n_1$ channels in the BN layer to $s\gamma_m$ and their offsets to $s\beta_m$, where $s$ is a quantity that decreases gradually from 1 to 0 as the number of iterations grows, decaying as a cubic function:

$$s = 1 - (e_i/(e-t))^3 \qquad (4)$$

(4) The mean $\mu$ and standard deviation $\sigma$ in the BN layer are not updated during the parameter re-initialization process, so these statistical parameters need to be updated on one batch of training data, during which the other parameters are kept fixed.

After the network pre-training stage is completed, the $n_1$ unimportant convolution-layer channels and BN-layer channels are pruned according to the BN-layer scaling factor $\gamma$, yielding the compressed network model. Training this model by the conventional training method then yields the final network model, achieving the goal of model compression.
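The following is a minimal PyTorch sketch of steps (1) to (4) for a single convolution/BN pair, using the schedules of equations (3) and (4). All function and variable names are illustrative assumptions rather than the patent's reference implementation.

```python
import torch
import torch.nn as nn

def schedules(e_i: int, e: int, t: int, p0: float):
    """Cubic schedules (3) and (4): current compression rate p and BN decay factor s."""
    s_p = (e_i / (e - t)) ** 3
    return s_p * p0, 1.0 - s_p

@torch.no_grad()
def reinit_channels(conv: nn.Conv2d, bn: nn.BatchNorm2d, p: float, s: float):
    n1 = int(conv.out_channels * p)              # step (1): number of channels to re-initialize
    if n1 == 0:
        return
    order = torch.argsort(bn.weight.abs())       # gamma as importance measure, ascending
    weak, strong = order[:n1], order[-1]         # n1 least important channels; most important one

    # Step (2): statistics of the most important channel's convolution weights.
    w_m = conv.weight[strong]
    mu_m, sigma_m = w_m.mean(), w_m.std()

    # Step (3): re-draw the weak channels' weights from N(mu_m, sigma_m) and set
    # their BN scaling factor and offset to s * gamma_m and s * beta_m.
    conv.weight[weak] = torch.randn_like(conv.weight[weak]) * sigma_m + mu_m
    bn.weight[weak] = s * bn.weight[strong]      # gamma_m = bn.weight[strong]
    bn.bias[weak] = s * bn.bias[strong]          # beta_m = bn.bias[strong]

@torch.no_grad()
def refresh_bn_stats(model: nn.Module, batch: torch.Tensor):
    # Step (4): one forward pass in training mode updates the BN running mean and
    # variance; no backward pass is taken, so all other parameters remain fixed.
    model.train()
    model(batch)
```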
An embodiment of the present invention is described below in connection with an image classification example, although the technical content of the present invention is not limited to the described scope. The embodiment includes the following steps:
(1) A convolutional neural network for image classification is built, and an image dataset with a large number of training samples and labels is constructed.
(2) The network is pre-trained on the training set, and the above parameter re-initialization method is applied to the unimportant channels once every t epochs.
(3) No re-initialization is needed at the last epoch of pre-training. After pre-training is finished, the $n_1$ unimportant convolution-layer channels and BN-layer channels are pruned according to the BN-layer scaling factor $\gamma$, yielding the compressed network model.
(4) The compressed network model is trained by the conventional training method to obtain the final network model. Its parameter count and computation are lower than those of the original model, so the effect of network compression is achieved while correct classification results are ensured. A high-level sketch of this overall workflow is given below.
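Under the same assumptions, a sketch of the overall embodiment workflow: train_one_epoch, prunable_pairs, prune_by_bn_gamma and finetune are assumed helper functions (not shown), while schedules, reinit_channels and refresh_bn_stats come from the sketch above; the constants follow the ResNet-56 example below.

```python
e, t, p0 = 160, 5, 0.6                         # total epochs, re-init period, compression rate
for e_i in range(1, e + 1):
    train_one_epoch(model, train_loader)       # one epoch of standard training
    if e_i % t == 0 and e_i <= e - t:          # no re-initialization at the final epochs
        p, s = schedules(e_i, e, t, p0)        # cubic schedules (3) and (4)
        for conv, bn in prunable_pairs(model): # e.g. the first conv layer of each block
            reinit_channels(conv, bn, p, s)
        refresh_bn_stats(model, next(iter(train_loader))[0])

compressed = prune_by_bn_gamma(model, p0)      # cut the lowest-gamma channels after pre-training
finetune(compressed, train_loader)             # conventional training of the pruned model
```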
The method of the invention is used to compress a ResNet-56 network. The ResNet-56 model contains multiple BasicBlocks, each containing two convolution layers; to ensure feature alignment, only the first convolution layer of each block is compressed, with the compression rate set to 0.6, i.e., 60% of the channels undergo parameter re-initialization. The pre-training stage performs parameter re-initialization once every t = 5 epochs, and a total of e = 160 epochs are trained. Fig. 2 plots the re-initialization frequency of the channel parameters in the convolution layer of the final BasicBlock, which has 64 channels in total: channels re-initialized less frequently are the relatively important, retained channels, while channels re-initialized more frequently are the relatively unimportant ones. After some of the unimportant channels are re-initialized, wrongly pruned channels (corresponding to the channels in the middle of the figure) can be recovered, because the newly introduced filter forms may become important in subsequent training and replace other channels.
Table 1 shows the compression results for the ResNet-56 network using the present invention. The original ResNet-56 model before compression contains 0.85M parameters and requires 127M computations, with an accuracy of 92.76% on CIFAR-10; the model obtained by the present compression method has 0.35M parameters, requires 53.7M computations, and its accuracy is improved to 93.10%. Therefore, the proposed parameter re-initialization method can effectively reactivate wrongly pruned convolution channels and introduce more filter forms, improving the accuracy of the network model while compressing the network scale.
Table 1. Network compression results of the present invention

Evaluation index      Parameters (M)   Computation (M)   Accuracy
Before compression    0.85             127               92.76%
After compression     0.35             53.7              93.10%
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (4)

1. An image classification network compression method based on parameter reinitialization, characterized in that: during network pre-training, parameter re-initialization is performed on the unimportant channels once every t epochs (complete iterations over the training data set), with the following steps:

Step 1: according to the expected compression rate $p_0$, $0 < p_0 < 1$, determine the compression rate currently required as $p = s_p p_0$; here $s_p$ is a quantity that increases gradually from 0 to 1 as the number of iterations grows, following a cubic function:

$$s_p = (e_i/(e-t))^3$$

where $e_i$ is the current iteration number and $e$ is the total number of iterations in the pre-training stage;

for a convolution layer with n output channels, the number of channels whose parameters need to be re-initialized is computed from the current compression rate p as $n_1 = np$;

Step 2: the n channels in the BN layer correspond to the n channels of the convolution layer, so the scaling factor $\gamma$ in the BN layer can be used as the channel importance measure; sort the n channels by importance, compute the mean $\mu_m$ and standard deviation $\sigma_m$ of the convolution-layer weights $w_m$ corresponding to the most important channel, and record that channel's scaling factor in the BN layer as $\gamma_m$ and its offset as $\beta_m$;

Step 3: randomly re-initialize the convolution-layer parameters corresponding to the $n_1$ channels to be re-initialized, drawing them from a normal distribution with mean $\mu_m$ and standard deviation $\sigma_m$; at the same time, change the scaling factors of these $n_1$ channels in the BN layer to $s\gamma_m$ and their offsets to $s\beta_m$, where $s$ is a quantity that decreases gradually from 1 to 0 as the number of iterations grows, decaying as a cubic function:

$$s = 1 - (e_i/(e-t))^3$$

Step 4: the mean $\mu$ and standard deviation $\sigma$ in the BN layer are not updated during the parameter re-initialization process, so these statistical parameters need to be updated on one batch of training data, during which the other parameters are kept fixed;

after the network pre-training stage is completed, the $n_1$ unimportant convolution-layer channels and BN-layer channels are pruned according to the BN-layer scaling factor $\gamma$, yielding the compressed network model.
2. A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
3. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
4. A computer program product comprising computer executable instructions which, when executed, are adapted to implement the method of claim 1.
CN202111251560.0A 2021-10-24 2021-10-24 Image classification network compression method based on parameter reinitialization Active CN114049514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111251560.0A CN114049514B (en) 2021-10-24 2021-10-24 Image classification network compression method based on parameter reinitialization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111251560.0A CN114049514B (en) 2021-10-24 2021-10-24 Image classification network compression method based on parameter reinitialization

Publications (2)

Publication Number Publication Date
CN114049514A CN114049514A (en) 2022-02-15
CN114049514B (en) 2024-03-19

Family

ID=80206039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111251560.0A Active CN114049514B (en) 2021-10-24 2021-10-24 Image classification network compression method based on parameter reinitialization

Country Status (1)

Country Link
CN (1) CN114049514B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902716A (en) * 2019-01-22 2019-06-18 厦门美图之家科技有限公司 A kind of training method and image classification method being aligned disaggregated model
CN110619385A (en) * 2019-08-31 2019-12-27 电子科技大学 Structured network model compression acceleration method based on multi-stage pruning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598731B (en) * 2019-07-31 2021-08-20 浙江大学 Efficient image classification method based on structured pruning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902716A (en) * 2019-01-22 2019-06-18 厦门美图之家科技有限公司 A kind of training method and image classification method being aligned disaggregated model
CN110619385A (en) * 2019-08-31 2019-12-27 电子科技大学 Structured network model compression acceleration method based on multi-stage pruning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pruning algorithm based on the GoogLeNet model; Peng Dongliang; Wang Tianxing; Control and Decision; 2019-05-28 (No. 06); full text *

Also Published As

Publication number Publication date
CN114049514A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
CN103870694B (en) Empirical mode decomposition denoising method based on revised wavelet threshold value
CN101639934A (en) SAR image denoising method based on contour wave domain block hidden Markov model
CN110490814B (en) Mixed noise removing method and system based on smooth rank constraint and storage medium
CN111476346B (en) Deep learning network architecture based on Newton conjugate gradient method
CN109671029A (en) Image denoising algorithm based on gamma norm minimum
CN111598894B (en) Retina blood vessel image segmentation system based on global information convolution neural network
CN115222625A (en) Laser radar point cloud denoising method based on multi-scale noise
CN110223231A (en) A kind of rapid super-resolution algorithm for reconstructing of noisy image
CN113392732A (en) Partial discharge ultrasonic signal anti-interference method and system
CN107845120B (en) PET image reconstruction method, system, terminal and readable storage medium
CN112990139A (en) Denoising method based on variable modal decomposition weighted reconstruction signal combined with wavelet threshold
CN114049514B (en) Image classification network compression method based on parameter reinitialization
CN107292855A (en) A kind of image de-noising method of the non local sample of combining adaptive and low-rank
CN116543259A (en) Deep classification network noise label modeling and correcting method, system and storage medium
CN104318046A (en) System and method for incrementally converting high dimensional data into low dimensional data
CN117054803A (en) Method and system for identifying grounding faults of distribution network containing distributed photovoltaic
WO2022227957A1 (en) Graph autoencoder-based fusion subspace clustering method and system
CN114374393A (en) Redundancy removal compression method for industrial Internet of things time sequence data
CN114037858A (en) Image classification network layer pruning method based on Taylor expansion
CN112927169A (en) Remote sensing image denoising method based on wavelet transformation and improved weighted nuclear norm minimization
CN114611665A (en) Multi-precision hierarchical quantization method and device based on weight oscillation influence degree
CN105139353B (en) A kind of blind separating method for replacing aliased image
Qie et al. Data repair without prior knowledge using deep convolutional neural networks
CN114124908B (en) Control method for data transmission in equipment production based on artificial intelligence
CN113554078B (en) Method for improving classification accuracy of graphs under continuous learning based on comparison type concentration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant