CN113627595B - Probability-based MobileNet V1 network channel pruning method - Google Patents

Probability-based MobileNet V1 network channel pruning method

Info

Publication number
CN113627595B
CN113627595B (application CN202110903135.9A; earlier publication CN113627595A)
Authority
CN
China
Prior art keywords
pruning
channel
mobilenet
network
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110903135.9A
Other languages
Chinese (zh)
Other versions
CN113627595A (en)
Inventor
赵汉理
史开杰
潘飞
卢望龙
黄辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202110903135.9A priority Critical patent/CN113627595B/en
Publication of CN113627595A publication Critical patent/CN113627595A/en
Application granted granted Critical
Publication of CN113627595B publication Critical patent/CN113627595B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a probability-based MobileNetV1 network channel pruning method comprising three stages: pre-training, pruning and fusion. Pre-training stage: the network is trained with the cross-entropy loss plus an L1 loss on the BN scaling factors to obtain a pre-trained model. Pruning stage: exploiting the properties of the BN and ReLU layers built into the MobileNetV1 design, the probability that each BN channel's output is not greater than 0 is estimated, and channels for which this probability is high are pruned. Fusion stage: because the residual influence of a pruned channel on accuracy typically survives as a constant in the offset factor of the BN layer following the depthwise convolution, the invention fuses this constant into the offset factor of the next BN layer to obtain the final pruned network. By implementing the method, the time needed to obtain a pruned network is shortened and its computational cost is reduced, while the accuracy is kept as close as possible to that of the pre-trained network.

Description

Probability-based MobileNet V1 network channel pruning method
Technical Field
The invention relates to the field of neural network pruning algorithms, in particular to a probability-based MobileNet V1 pruning algorithm.
Background
Convolutional Neural Networks (CNNs) have attracted widespread industrial attention because they achieve very high recognition and detection accuracy in computer vision. However, the speed of CNN computation constrains hardware deployment, so accelerating neural network computation while retaining high accuracy is a very important problem. MobileNetV1 (see Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017) reduced the computational cost of neural networks by design. However, for a given task and a given network, not all channels matter for the output, and channels with little influence on the final result can be deleted. Currently popular pruning methods judge channel importance using only the scaling factor of the Batch Normalization (BN) layer present in the network design, without fully considering the BN offset factor or the architecture of the neural network; moreover, these methods require three stages (pre-training, pruning, and fine-tuning), which makes the whole pruning pipeline very time-consuming. In view of this, the channel-importance criterion of the present invention considers the BN scaling factor, the BN offset factor, and the ReLU layer following the BN layer simultaneously; and the computation carried by a pruned path is fused into the bias of the following convolution so that the fine-tuning stage can be removed. The invention uses the mathematical properties of the BN and ReLU layers commonly paired in network design to compute the probability that a given channel can be deleted for a given task. At the same time, since deleting channels can still degrade performance on the task, the invention proposes offset-factor fusion: for a pruned channel, its contribution to downstream computation is concentrated in the BN offset factor, and the computation involving this offset is fused into the offset factor of the next BN layer. Compared with not fusing, this yields a pruned MobileNetV1 model with higher accuracy, and the fusion requires no extra parameters. Finally, no fine-tuning stage is needed, which speeds up the whole pruning pipeline.
Disclosure of Invention
The technical problem to be solved by embodiments of the invention is to provide a probability-based MobileNetV1 network channel pruning method which makes full use of the BN and ReLU layers in the network design, selects channels to prune from a probabilistic point of view so that the channels that ought to be pruned are removed; meanwhile, the constants left in the depthwise convolutions after pruning are fused into the offset factor of the next BN layer; and the fine-tuning stage is removed, accelerating the whole pruning algorithm.
In order to solve the technical problems, the embodiment of the invention provides a probability-based MobileNetV1 network channel pruning method, which comprises the following steps:
step S1, giving a training set and a test set; during MobileNetV1 training, computing the cross-entropy loss loss_cls between predicted and ground-truth labels, and additionally the L1 loss loss_norm on the BN scaling factors; computing gradients from these two losses and updating the MobileNetV1 parameters to obtain a pre-trained model;
step S2, given a parameter z ∈ [2,4], defining Z = β + z×|γ|, where β and γ are the trainable offset factor and scaling factor of a BN layer; computing Z for every channel of all BN layers of the pre-trained model of step S1;
step S3, for each Z output in step S2, pruning the channel if Z is less than 0 and otherwise keeping it. In MobileNetV1 the basic module is the depthwise separable convolution, which consists of a depthwise convolution and a pointwise convolution. A depthwise convolution maps input channels to output channels one-to-one, so a paired input and output channel must be pruned, or kept, together, otherwise the convolution pattern is destroyed. For a pair of input and output channels there are 4 cases: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned. Channel pruning is applied to depthwise convolutions matching cases 2), 3) and 4), yielding a preliminary pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant unaffected by the network input; the result of this constant computation is fused into the offset factor of the next BN layer to reduce the accuracy drop of the pruned network, and the fusion introduces no extra parameters;
step S5, outputting and saving the final pruned MobileNetV1 model.
As a further refinement, in step S1 the given training set comprises images and their corresponding class labels. Two losses are computed: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm. The total loss is the weighted sum Loss = loss_cls + 10^-5 × loss_norm. The gradients required for back-propagation are computed from the total loss and the MobileNetV1 parameters are updated, yielding the final pre-trained MobileNetV1 model.
As a further improvement, step S1 gives a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]}, where Image_i is the i-th training sample, Label_i the ground-truth label of the i-th training sample, Image_j the j-th test sample, Label_j the ground-truth label of the j-th test sample, M the number of samples in D_train and N the number of samples in D_test. The parameters of the given MobileNetV1 network and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 state comprises the iteration count q, the network parameters θ_q = {(W_q^l, B_q^l)} and the best-model parameters θ_best, where l indexes the network layer, W denotes the parameters of the corresponding convolution layer, and B the learnable BN parameters, namely the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolution layer at iteration q, and B_q^l the learnable parameters of the l-th BN layer at iteration q. The iteration count q is initialized to 1 and increased by 1 each epoch, for 150 epochs in total; θ_q is initialized to θ_1, and θ_best is initialized to θ_1. The SGD optimizer is initialized with learning rate 0.01, momentum 0.9 and weight decay 4×10^-5.
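A minimal PyTorch sketch of this initialization, following the note in the detailed embodiment that weight decay applies to convolution parameters but not BN parameters; the function name make_optimizer is illustrative:

```python
import torch

def make_optimizer(model):
    """SGD setup per step S1: lr 0.01, momentum 0.9, weight decay 4e-5
    on convolution (and FC) weights only, no decay on BN parameters."""
    bn_params, conv_params = [], []
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            bn_params += list(m.parameters())
        elif isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            conv_params += list(m.parameters())
    return torch.optim.SGD(
        [{"params": conv_params, "weight_decay": 4e-5},
         {"params": bn_params, "weight_decay": 0.0}],
        lr=0.01, momentum=0.9)
```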
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 and a forward pass produces the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the MobileNetV1 prediction for training sample Image_i.
According to the preset cross-entropy loss function and L1 loss function, the error between the predictions Predict_i and the ground-truth labels Label_i on D_train gives the cross-entropy loss value; the L1 loss value is obtained from the scaling factors γ of all BN layers in MobileNetV1 (loss_norm depends only on the BN scaling factors γ). The two values are added to obtain the final loss value, which is back-propagated to adjust the MobileNetV1 parameters θ_q. The loss formulas are:

loss_cls = -(1/M) × Σ_{i=1}^{M} log(Predict_i[Label_i])
loss_norm = Σ_{b=1}^{A} |γ_b|
Loss = loss_cls + 10^-5 × loss_norm

where Label_i is the ground-truth label of image i, Predict_i the prediction output by MobileNetV1, M the number of training samples, γ_b the scaling factor of a given channel of a given BN layer (γ and β are scalars in the present invention; a subscript is added only where needed), and A the total number of BN scaling factors, i.e. the sum of the channel counts of all BN layers in the MobileNetV1 network. Loss is the final loss value; back-propagation through Loss updates the MobileNetV1 parameters θ_q.
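A minimal sketch of this total loss in PyTorch, assuming the BN layers are nn.BatchNorm2d modules; total_loss and the lam argument are illustrative names:

```python
import torch
import torch.nn as nn

def total_loss(model, logits, labels, lam=1e-5):
    """Loss = loss_cls + 1e-5 * loss_norm, where loss_norm is the L1 norm
    of all BN scaling factors (the BN .weight tensors)."""
    loss_cls = nn.functional.cross_entropy(logits, labels)
    loss_norm = sum(m.weight.abs().sum()
                    for m in model.modules()
                    if isinstance(m, nn.BatchNorm2d))
    return loss_cls + lam * loss_norm
```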
The MobileNetV1 network is evaluated with the test set D_test; if the current parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. Meanwhile, at the end of each parameter update, check whether the training iteration count has reached the maximum of 150; if so, the training stage ends and step S2 begins; otherwise training continues with q = q + 1.
The network parameters θ_q are updated as follows:

W_{q+1}^l = W_q^l - η × ∂Loss/∂W_q^l
B_{q+1}^l = B_q^l - η × ∂Loss/∂B_q^l

where W_q^l and B_q^l denote the parameters of the l-th convolution layer and of the l-th BN layer in the model at iteration q; the parameters at iteration q+1 are obtained by updating those at iteration q; η is the learning rate 0.01 among the hyper-parameters; and the gradients ∂Loss/∂W_q^l and ∂Loss/∂B_q^l of the convolution-layer and BN-layer parameters are obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is then computed. The samples of D_test are taken as MobileNetV1 inputs and propagated layer by layer to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the ground-truth labels Label_j of D_test as reference, the predictions Predict_j in P_test are compared one by one with the Label_j in D_test to compute the accuracy on D_test. Let ACC_q denote the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) that of the best-model parameters θ_best; if ACC_q > ACC_best, set θ_best = θ_q. After 150 epochs the pre-trained MobileNetV1 with parameters θ_best is obtained.
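A short sketch of this accuracy evaluation, assuming a standard PyTorch DataLoader over D_test; evaluate is an illustrative name:

```python
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu"):
    """Top-1 accuracy on D_test; the best-scoring parameters become theta_best."""
    model.eval()
    correct = total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        predict = model(images).argmax(dim=1)   # Predict_j
        correct += (predict == labels).sum().item()
        total += labels.numel()
    return correct / total
```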
Step S2: given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and scaling factor of a BN layer. Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = γ × (x - E(x)) / √(Var(x) + ε) + β

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics accumulated during network training, and ε prevents a zero denominator, with value 10^-5. γ and β are called the scaling factor and offset factor of the BN layer. Viewed as a random variable, x̂ has variance γ² and mean β. If Z ≤ 0, then in probabilistic terms the BN output x̂ is less than or equal to 0 with high probability; conversely, if Z > 0 the probability that the BN output is below 0 is not large. Meanwhile, the BN layer is followed by a ReLU layer, whose formula is:

ReLU(x̂) = max(0, x̂)
Then, when x̂ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and any network computation in which this zero value participates can be pruned; otherwise, no pruning is performed.
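For intuition, the bound implied by the Gaussian view above can be checked numerically; the following sketch (prob_nonpositive is an illustrative name) computes P(x̂ ≤ 0) = Φ(-β/|γ|), with Φ the standard normal CDF:

```python
import math

def prob_nonpositive(beta, gamma):
    """P(BN output <= 0) for a channel viewed as N(beta, gamma^2)."""
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return phi(-beta / abs(gamma))

# If Z = beta + z*|gamma| <= 0 with z = 2, then P(output <= 0) >= Phi(2) ~ 0.977;
# with z = 4 the bound rises to ~0.99997, so the ReLU almost surely outputs 0.
print(prob_nonpositive(beta=-2.5, gamma=1.0))  # ~0.9938
```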
For the MobileNetV1 parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z×|γ|, and the Z value of the idx-th channel of the l-th layer is stored in the array entry Z^l_idx, giving the array {Z^l_idx}.
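A sketch of this Z computation over all BN layers of a PyTorch model; compute_z_scores and the default z value are illustrative:

```python
import torch

def compute_z_scores(model, z=3.0):
    """Z = beta + z * |gamma| per BN channel; Z <= 0 marks a prune candidate.
    In nn.BatchNorm2d, .weight is gamma and .bias is beta."""
    scores = {}
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            scores[name] = m.bias.detach() + z * m.weight.detach().abs()
    return scores
```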
Step S3: using the array {Z^l_idx} obtained in step S2, the MobileNetV1 parameters θ_best trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) a depthwise convolution; 2) a pointwise convolution. For the method of pruning the channels of the pointwise convolution, see Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2736-2744. For the depthwise convolution, input channels and output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned.
Table 1: the 4 cases involved in depthwise-convolution pruning

Case | Input channel | Output channel | Action
1) | kept (Z_in > 0) | kept (Z_out > 0) | no pruning
2) | kept (Z_in > 0) | pruned (Z_out ≤ 0) | prune the pair
3) | pruned (Z_in ≤ 0) | kept (Z_out > 0) | prune the pair and fuse the constant (step S4)
4) | pruned (Z_in ≤ 0) | pruned (Z_out ≤ 0) | prune the pair
In Table 1, for a given pair of input and output channels: if case 1) applies, both channels are important and no pruning is needed; if case 2) applies, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; if case 3) applies, then by the formula of step S2 the input channel outputs 0, the output channel receives a fixed 0 input, and its output is a constant, so the pair can be pruned, with step S4 applied to preserve the accuracy of the computed values; if case 4) applies, both channels output 0 and the pair is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
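The four cases reduce to simple mask logic on the Z scores of the BN layers before and after the depthwise convolution; a sketch under that assumption (function and variable names illustrative):

```python
def depthwise_prune_masks(z_in, z_out):
    """Given per-channel Z scores of the BN before (input) and after (output)
    a depthwise conv, keep a pair only in case 1 (both Z > 0).  Cases 2 and 4
    are pruned outright; case-3 channels are also pruned but remembered so
    their constant output can be fused into the next BN (step S4)."""
    keep_in = z_in > 0
    keep_out = z_out > 0
    keep = keep_in & keep_out           # case 1: keep the pair
    case3 = (~keep_in) & keep_out       # case 3: prune, then fuse the constant
    return keep, case3
```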
Step S4: the pruned MobileNetV1 parameters θ_p1 obtained in step S3 will usually score lower on the test set than the network parameters θ_best of step S1. This is because, during pruning, case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the affected values directly and fuses them into the β of the next BN layer, reducing the numerical error between the network before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The computation for case 3) of Table 1 is as follows. The output of layer l is the input of layer l+1; for convenience of explanation, assume the affected channel is the k-th, so the output of the l-th layer on channel k is

x̂_k^l = γ_k^l × (0 - E_k^l) / √(Var_k^l + ε) + β_k^l

where x̂_k^l is, under parameters θ_best, the output of the k-th channel of the l-th BN layer in MobileNetV1, its input being 0 because the corresponding input channel was pruned. E_k^l is the mean obtained during training and is a fixed constant at test time; Var_k^l likewise, so x̂_k^l is also a fixed constant. The constant ReLU(x̂_k^l) enters the corresponding next-layer pointwise convolution, whose output channel j computes

y_j^{l+1} = Σ_{k∈K1} W_{j,k}^{l+1} × ReLU(x̂_k^l) + Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l)

where K1 and K3 are the sets of channel positions in cases 1) and 3) of Table 1 (cases 2) and 4) output 0 and can be omitted from the sum); W^{l+1} denotes the convolution weights, l the layer index and k the channel index (training is finished at this point, so the subscript q of step S1 is dropped). At test time, the K1 terms vary with the input and must still be computed in the pruned network, whereas for k ∈ K3 both x̂_k^l and W_{j,k}^{l+1} are fixed, so c_j = Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l) is a fixed constant. Meanwhile, the BN of layer l+1 computes

x̂^{l+1} = γ^{l+1} × (x^{l+1} - E^{l+1}) / √(Var^{l+1} + ε) + β^{l+1}

In case 3) of Table 1, without pruning the offset factor of the layer-(l+1) BN is β^{l+1}. With pruning, the invention fuses the case-3) constant into this offset factor, the new offset factor for output channel j being

β_fusion,j^{l+1} = β_j^{l+1} + γ_j^{l+1} × c_j / √(Var_j^{l+1} + ε)

where the subscript fusion marks the offset factor after fusion, which replaces β^{l+1}. By changing the β parameter of the layer-(l+1) BN in this way, the pruned network reproduces, as far as possible, the same numerical results as the unpruned MobileNetV1. With the above computation, the BN parameters in θ_p1 are updated to their fused values, giving the final pruned MobileNetV1 with parameters θ_p.
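A sketch of this fusion for one depthwise/pointwise pair in PyTorch, assuming BN modules with running statistics; fuse_case3_constant and its arguments are illustrative names:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse_case3_constant(bn_dw, case3_idx, pw_weight, bn_next):
    """Fold the constant output of case-3 depthwise channels into the next BN.
    bn_dw: BN after the depthwise conv; case3_idx: indices of case-3 channels;
    pw_weight: pointwise weights of shape (C_out, C_in, 1, 1);
    bn_next: BN after the pointwise conv."""
    k = case3_idx
    # BN applied to a zero input on channel k gives a fixed constant.
    c = bn_dw.bias[k] - bn_dw.weight[k] * bn_dw.running_mean[k] \
        / torch.sqrt(bn_dw.running_var[k] + bn_dw.eps)
    r = F.relu(c)                                      # constant after ReLU
    contrib = (pw_weight[:, k, 0, 0] * r).sum(dim=1)   # c_j per output channel
    # beta_fusion = beta + gamma * c_j / sqrt(Var + eps)
    bn_next.bias.add_(bn_next.weight * contrib
                      / torch.sqrt(bn_next.running_var + bn_next.eps))
```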
Step S5: the final pruned network model is saved.
Embodiments of the invention have the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time consumed by the whole pruning algorithm, needing only two stages: pre-training and pruning. Conventional methods add a third stage, fine-tuning, to reach a good result. With only two stages, the invention removes a considerable amount of computation while keeping accuracy at the level of the pre-trained network.
2. Compared with existing network channel pruning methods, the method judges whether a channel needs pruning from a probabilistic point of view, which is more principled and more interpretable than methods based only on the magnitude of the scaling factor.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings required in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below are only some embodiments of the invention, and one skilled in the art could obtain other drawings from them without inventive effort; such drawings also fall within the scope of the invention.
Fig. 1 is a flowchart of the probability-based MobileNetV1 network channel pruning method according to an embodiment of the present invention.
Detailed Description of the Preferred Embodiments
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in fig. 1, in an embodiment of the present invention, a probability-based MobileNetV1 network channel pruning method is provided, where the method includes the following steps:
Step S1: a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given, where Image_i is the i-th training sample, Label_i the ground-truth label of the i-th training sample, Image_j the j-th test sample and Label_j the ground-truth label of the j-th test sample. All images have dimensions 3×224×224 (3 is the number of channels, the first 224 the image height and the second 224 the image width; the batch dimension is ignored here, as it does not affect the operations) and all labels have dimension 1000 (the number of classes; the batch dimension is again ignored). M is the number of samples in D_train and N the number in D_test. The parameters of the given MobileNetV1 network (structure shown in Table 2) and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 state comprises the iteration count q, the network parameters θ_q = {(W_q^l, B_q^l)} and the best-model parameters θ_best, where l indexes the network layer, W denotes the parameters of the corresponding convolution layer and B the learnable BN parameters, namely the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolution layer at iteration q and B_q^l the learnable parameters of the l-th BN layer at iteration q. The iteration count q starts at 1 and increases by 1 per epoch, for 150 epochs in total; θ_q is initialized to θ_1 and θ_best to θ_1. The SGD optimizer is initialized with learning rate 0.01, momentum 0.9 and weight decay 4×10^-5 (the convolution parameters require weight decay, the BN parameters do not).
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 and a forward pass produces the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the MobileNetV1 prediction for training sample Image_i.
The loss function is computed. According to the preset cross-entropy loss function and L1 loss function, the error between the predictions Predict_i and the ground-truth labels Label_i on D_train gives the cross-entropy loss value; the L1 loss value is obtained from the scaling factors γ of all BN layers in MobileNetV1 (loss_norm depends only on the BN scaling factors γ). Adding the two values gives the final loss value, which is back-propagated to adjust the MobileNetV1 parameters θ_q. The loss formulas are:

loss_cls = -(1/M) × Σ_{i=1}^{M} log(Predict_i[Label_i])
loss_norm = Σ_{b=1}^{A} |γ_b|
Loss = loss_cls + 10^-5 × loss_norm

where Label_i is the ground-truth label of image i, Predict_i the prediction output by MobileNetV1, M the number of training samples, γ_b the scaling factor of a given channel of a given BN layer, and A the total number of BN scaling factors, i.e. the sum of the channel counts of all BN layers in the MobileNetV1 network. Loss is the final loss value; back-propagation through Loss updates the MobileNetV1 parameters θ_q.
The MobileNetV1 network is evaluated with the test set D_test; if the current parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. Meanwhile, at the end of each parameter update, check whether the training iteration count has reached the maximum of 150; if so, the training stage ends and step S2 begins; otherwise training continues with q = q + 1.
The network parameters θ_q are updated as follows:

W_{q+1}^l = W_q^l - η × ∂Loss/∂W_q^l
B_{q+1}^l = B_q^l - η × ∂Loss/∂B_q^l

where W_q^l and B_q^l denote the parameters of the l-th convolution layer and of the l-th BN layer in the model at iteration q; the parameters at iteration q+1 are obtained by updating those at iteration q; η is the learning rate 0.01 among the hyper-parameters; and the gradients ∂Loss/∂W_q^l and ∂Loss/∂B_q^l of the convolution-layer and BN-layer parameters are obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is then computed. The samples of D_test are taken as MobileNetV1 inputs and propagated layer by layer to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the ground-truth labels Label_j of D_test as reference, the predictions Predict_j in P_test are compared one by one with the Label_j in D_test to compute the accuracy on D_test. Let ACC_q denote the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) that of the best-model parameters θ_best; if ACC_q > ACC_best, set θ_best = θ_q. After 150 epochs the pre-trained MobileNetV1 with parameters θ_best is obtained.
Table 2: default BN and ReLU layers after each layer of convolution
In the above table, s1 denotes a convolution stride of 1, s2 a stride of 2, and dw a depthwise convolution; entries without dw are pointwise convolutions (except the first, which is a standard convolution).
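For reference, one block of this structure can be sketched in PyTorch as follows; depthwise_separable is an illustrative name:

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out, stride):
    """One MobileNetV1 block: 3x3 depthwise conv (groups=c_in) + BN + ReLU,
    then 1x1 pointwise conv + BN + ReLU, matching the dw / s1-s2 notation."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                  groups=c_in, bias=False),
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```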
Step S2: given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and scaling factor of a BN layer, both scalars (i.e. each is just one number). Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = γ × (x - E(x)) / √(Var(x) + ε) + β

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics accumulated during network training, and ε prevents a zero denominator, with value 10^-5. γ and β are called the scaling factor and offset factor of the BN layer. Viewed as a random variable, x̂ has variance γ² and mean β. If Z ≤ 0, then in probabilistic terms the BN output x̂ is less than or equal to 0 with high probability; conversely, if Z > 0 the probability that the BN output is below 0 is not large. Meanwhile, the BN layer is followed by a ReLU layer, whose formula is:

ReLU(x̂) = max(0, x̂)
Then, when x̂ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and any network computation in which this zero value participates can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z×|γ|, and the Z value of the idx-th channel of the l-th layer is stored in the array entry Z^l_idx, giving the array {Z^l_idx}.
Step S3: using the array {Z^l_idx} obtained in step S2, the MobileNetV1 parameters θ_best trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) a depthwise convolution; 2) a pointwise convolution. For the method of pruning the channels of the pointwise convolution, see Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2736-2744. For the depthwise convolution, input channels and output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned.
In Table 1, for a given pair of input and output channels: if case 1) applies, both channels are important and no pruning is needed; if case 2) applies, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; if case 3) applies, then by the formula of step S2 the input channel outputs 0, the output channel receives a fixed 0 input, and its output is a constant, so the pair can be pruned, with step S4 applied to preserve the accuracy of the computed values; if case 4) applies, both channels output 0 and the pair is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
Step S4: the pruned MobileNetV1 parameters θ_p1 obtained in step S3 will usually score lower on the test set than the network parameters θ_best of step S1. This is because, during pruning, case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the affected values directly and fuses them into the β of the next BN layer, reducing the numerical error between the network before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The computation for case 3) of Table 1 is as follows. The output of layer l is the input of layer l+1; for convenience of explanation, assume the affected channel is the k-th, so the output of the l-th layer on channel k is

x̂_k^l = γ_k^l × (0 - E_k^l) / √(Var_k^l + ε) + β_k^l

where x̂_k^l is, under parameters θ_best, the output of the k-th channel of the l-th BN layer in MobileNetV1, its input being 0 because the corresponding input channel was pruned. E_k^l is the mean obtained during training and is a fixed constant at test time; Var_k^l likewise, so x̂_k^l is also a fixed constant (it is ultimately fused into the scalar β, so it can be treated as a scalar here). The constant ReLU(x̂_k^l) enters the corresponding next-layer pointwise convolution, whose output channel j computes

y_j^{l+1} = Σ_{k∈K1} W_{j,k}^{l+1} × ReLU(x̂_k^l) + Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l)

where K1 and K3 are the sets of channel positions in cases 1) and 3) of Table 1 (cases 2) and 4) output 0 and can be omitted from the sum); W^{l+1} denotes the convolution weights, l the layer index and k the channel index (training is finished at this point, so the subscript q of step S1 is dropped). At test time, the K1 terms vary with the input, so they are not pruned and must still be computed by a forward pass in the pruned network; for k ∈ K3, both x̂_k^l and W_{j,k}^{l+1} are fixed, so c_j = Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l) is a fixed constant. Meanwhile, the BN of layer l+1 computes

x̂^{l+1} = γ^{l+1} × (x^{l+1} - E^{l+1}) / √(Var^{l+1} + ε) + β^{l+1}

In case 3) of Table 1, without pruning the offset factor of the layer-(l+1) BN is β^{l+1}. With pruning, the invention fuses the case-3) constant into this offset factor, the new offset factor for output channel j being

β_fusion,j^{l+1} = β_j^{l+1} + γ_j^{l+1} × c_j / √(Var_j^{l+1} + ε)

where the subscript fusion marks the offset factor after fusion, which replaces β^{l+1}. By changing the β parameter of the layer-(l+1) BN in this way, the pruned network reproduces, as far as possible, the same numerical results as the unpruned MobileNetV1. With the above computation, the BN parameters in θ_p1 are updated to their fused values, giving the final pruned MobileNetV1 with parameters θ_p.
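As a usage note, the effect of the fusion can be sanity-checked by comparing the outputs of the unpruned network θ_best and the fused pruned network θ_p on random inputs; a sketch under these assumptions (fusion_gap is an illustrative name):

```python
import torch

@torch.no_grad()
def fusion_gap(model_ref, model_pruned, n=8):
    """Compare theta_best and theta_p outputs.  The gap should be small but
    need not be exactly zero: beta-fusion restores the case-3 constants
    exactly, while cases 2) and 4) drop channels whose output is only
    almost surely zero."""
    model_ref.eval(); model_pruned.eval()
    x = torch.randn(n, 3, 224, 224)
    return (model_ref(x) - model_pruned(x)).abs().max().item()
```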
Step S5: the final pruned network model is saved.
Embodiments of the invention have the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time of the whole algorithm, needing only two stages: pre-training and pruning. Conventional methods add a third stage, fine-tuning, to reach a good result. With only two stages, the invention removes a considerable amount of computation while keeping accuracy at the level of the pre-trained network.
2. Compared with existing network channel pruning methods, the method judges whether a channel needs pruning from a probabilistic point of view.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by a program instructing related hardware, the program being stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk.
The above disclosure is only a preferred embodiment of the present invention and certainly does not limit the scope of the invention; equivalent changes made according to the claims of the present invention still fall within the scope of the invention.

Claims (1)

1. A probability-based MobileNetV1 network channel pruning method, the method comprising the steps of:
step S1, giving a training set and a test set; during MobileNetV1 training, computing the cross-entropy loss loss_cls between predicted and ground-truth labels, and additionally the L1 loss loss_norm on the BN scaling factors; computing gradients from these two losses and updating the MobileNetV1 parameters to obtain a pre-trained model;
step S2, given a parameter z ∈ [2,4], defining Z = β + z×|γ|, where β and γ are the trainable offset factor and scaling factor of a BN layer; computing Z for every channel of all BN layers of the pre-trained model of step S1;
step S3, for each Z output in step S2, pruning the channel if Z is less than 0 and otherwise keeping it; in MobileNetV1 the basic module is the depthwise separable convolution, comprising a depthwise convolution and a pointwise convolution; a depthwise convolution maps input channels to output channels one-to-one, so a paired input and output channel must be pruned, or kept, together, otherwise the convolution pattern is destroyed; for a pair of input and output channels there are 4 cases: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned; applying channel pruning to depthwise convolutions matching cases 2), 3) and 4) to obtain a preliminary pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant unaffected by the network input; fusing the result of this constant computation into the offset factor of the next BN layer to reduce the accuracy drop of the pruned network, the fusion introducing no extra parameters;
step S5, outputting and saving the final pruned MobileNetV1 model;
in said step S1, the given training set comprises images and their corresponding class labels; two losses are computed: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm; the total loss is the weighted sum Loss = loss_cls + 10^-5 × loss_norm; gradients for back-propagation are computed from the total loss and the MobileNetV1 parameters are updated to obtain the pre-trained MobileNetV1 model.
CN202110903135.9A 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method Active CN113627595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Publications (2)

Publication Number Publication Date
CN113627595A CN113627595A (en) 2021-11-09
CN113627595B true CN113627595B (en) 2023-07-25

Family

ID=78383364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110903135.9A Active CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Country Status (1)

Country Link
CN (1) CN113627595B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764471A (en) * 2018-05-17 2018-11-06 西安电子科技大学 The neural network cross-layer pruning method of feature based redundancy analysis
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining
CN111291806A (en) * 2020-02-02 2020-06-16 西南交通大学 Identification method of label number of industrial product based on convolutional neural network
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training
KR102165273B1 (en) * 2019-04-02 2020-10-13 국방과학연구소 Method and system for channel pruning of compact neural networks

Also Published As

Publication number Publication date
CN113627595A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
US11093826B2 (en) Efficient determination of optimized learning settings of neural networks
CN109034205B (en) Image classification method based on direct-push type semi-supervised deep learning
WO2022141754A1 (en) Automatic pruning method and platform for general compression architecture of convolutional neural network
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN113052239B (en) Image classification method and system of neural network based on gradient direction parameter optimization
CN111259940A (en) Target detection method based on space attention map
US11610154B1 (en) Preventing overfitting of hyperparameters during training of network
Hebbal et al. Multi-objective optimization using deep Gaussian processes: application to aerospace vehicle design
CN112381763A (en) Surface defect detection method
Tang et al. On training recurrent networks with truncated backpropagation through time in speech recognition
US11574193B2 (en) Method and system for training of neural networks using continuously differentiable models
CN114937204A (en) Lightweight multi-feature aggregated neural network remote sensing change detection method
CN115809624B (en) Automatic analysis design method for integrated circuit microstrip line transmission line
CN112818893A (en) Lightweight open-set landmark identification method facing mobile terminal
CN112507114A (en) Multi-input LSTM-CNN text classification method and system based on word attention mechanism
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN113627595B (en) Probability-based MobileNet V1 network channel pruning method
Li et al. Filter pruning via probabilistic model-based optimization for accelerating deep convolutional neural networks
US20200372363A1 (en) Method of Training Artificial Neural Network Using Sparse Connectivity Learning
He et al. GA-based optimization of generative adversarial networks on stock price prediction
WO2022104271A1 (en) Automatic early-exiting machine learning models
Ni et al. Linear Range in Gradient Descent
CN112052626B (en) Automatic design system and method for neural network
US11900238B1 (en) Removing nodes from machine-trained network based on introduction of probabilistic noise during training
Sri et al. Facial emotion recognition using dcnn algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant