CN113627595A - Probability-based MobileNet V1 network channel pruning method - Google Patents
- Publication number: CN113627595A
- Application number: CN202110903135.9A
- Authority
- CN
- China
- Prior art keywords
- pruning
- channel
- network
- mobilenet
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention provides a probability-based MobileNetV1 network channel pruning method. The method comprises three stages: pre-training, pruning, and fusion. Pre-training stage: train with the cross-entropy loss plus an L1 loss on the BN scaling factors to obtain a pre-trained model. Pruning stage: using the properties of the BN and ReLU layers built into the MobileNetV1 network design, compute the probability that the output of each BN channel is less than 0, and prune the channels for which this probability is high. Fusion stage: because the influence of a pruned channel on accuracy is usually concentrated in the offset factor of the BN layer following the depthwise convolution, the invention fuses this contribution into the offset factor of the next BN layer to obtain the final pruned network. By implementing the invention, the time needed to obtain a pruned network is shortened, the computation of the network is reduced, and accuracy close to that of the pre-trained network is retained as far as possible.
Description
Technical Field
The invention relates to the field of neural network pruning algorithms, and in particular to a probability-based MobileNetV1 pruning algorithm.
Background
Convolutional Neural Networks (CNNs) have received wide attention from industry because they achieve very high recognition and detection accuracy in computer vision. However, the speed of convolutional neural network computation constrains hardware deployment, so accelerating neural network computation while maintaining high accuracy is an important problem. The proposal of MobileNetV1 (see: Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017.) substantially reduced the computational load of neural networks. However, for a given task and a given network, not all channels are important to the output, and channels that have little influence on the final output can be deleted. Currently popular pruning methods judge channel importance using only the scaling factor of the Batch Normalization (BN) layers in the network design, without fully considering the offset factor of the BN layer or the architecture of the network; moreover, these pruning methods require three processes (pre-training, pruning and fine-tuning), so the whole pruning pipeline takes a long time. In view of the above, the channel-importance criterion of the present invention considers the scaling factor and the offset factor of BN together with the ReLU layer that follows the BN layer, and the computation carried by pruned channels is fused into the offset factors of the following layer to remove the fine-tuning step. The invention uses the mathematical properties of the BN and ReLU layers commonly used in network design to compute the probability that a channel can be deleted in a given task. Because deleting a channel may degrade performance on the task, the invention further proposes fusing offset factors.
That is, for a pruned channel, its contribution to the downstream computation is usually concentrated in the offset factor of the BN layer. The computation associated with this offset factor is fused into the offset factor of the next BN layer; compared with a non-fusing method, a pruned MobileNetV1 model with higher accuracy is obtained, and the fusion introduces no extra parameters. Finally, no fine-tuning stage is needed, which speeds up the whole pruning pipeline.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a probability-based MobileNetV1 network channel pruning method that makes full use of the BN and ReLU layers in the network design, selects channels for pruning from a probabilistic viewpoint, and removes the channels that are most likely to be prunable; meanwhile, the constants that remain after pruning the depthwise convolution are fused into the offset factor of the next BN layer; and the fine-tuning stage is removed, accelerating the whole pruning algorithm.
In order to solve the above technical problem, an embodiment of the present invention provides a probability-based MobileNetV1 network channel pruning method, including the following steps:
Step S1: given a training set and a test set, during the training of MobileNetV1 compute not only the cross-entropy loss loss_cls between the predicted labels and the true labels, but also the L1 loss loss_norm on the BN scaling factors; compute gradients from these two losses and update the parameters of MobileNetV1, obtaining a pre-trained model;
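The combined training objective of step S1 can be sketched as follows; this is a minimal plain-Python illustration (the names `total_loss` and `bn_gammas` are ours, not the patent's), where `loss_cls` stands for an already-computed cross-entropy value and `bn_gammas` for the scaling factors gathered from every BN channel of the network:

```python
def total_loss(loss_cls, bn_gammas, lam=1e-5):
    """Step-S1 loss: cross-entropy plus a weighted L1 penalty on BN gammas."""
    loss_norm = sum(abs(g) for g in bn_gammas)  # L1 acts only on scaling factors
    return loss_cls + lam * loss_norm

# Example: CE of 0.7 and two BN channels with gammas 0.3 and -0.2
loss = total_loss(0.7, [0.3, -0.2])  # 0.7 + 1e-5 * (0.3 + 0.2)
```

In a real training loop the cross-entropy term drives accuracy while the small L1 term pushes unimportant scaling factors toward 0, which is what later makes the Z criterion meaningful.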
Step S2: given a parameter z ∈ [2, 4], define Z = β + z × |γ|, where β and γ are the trainable parameters of the BN layer: the offset factor and the scaling factor, respectively; compute Z for every channel of all BN layers in the pre-trained model of step S1;
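The per-channel score of step S2 is a one-liner; a sketch for a single BN layer (function name is illustrative), using the step-S3 threshold that Z below 0 marks a prunable channel:

```python
def channel_scores(betas, gammas, z=3.0):
    """Per-channel Z = beta + z*|gamma| for one BN layer; Z < 0 means prunable."""
    return [b + z * abs(g) for b, g in zip(betas, gammas)]

scores = channel_scores([-1.0, 0.1, -0.2], [0.1, 0.2, 0.5], z=3.0)
prune_mask = [s < 0 for s in scores]  # only the first channel is prunable
```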
Step S3: for each Z output by step S2, prune the channel if Z < 0; otherwise keep it. In MobileNetV1 the basic module is the depthwise separable convolution, which consists of a depthwise convolution and a pointwise convolution. In the depthwise convolution the input channels and output channels correspond one to one, so a given input channel and its output channel must be pruned or kept together, otherwise the convolution pattern is broken. For a given pair of input and output channels there are 4 cases: 1) the input channel is kept and the output channel is kept; 2) the input channel is kept and the output channel is pruned; 3) the input channel is pruned and the output channel is kept; 4) the input channel is pruned and the output channel is pruned. Channel pruning is performed on the depthwise convolutions matching cases 2), 3) and 4), giving a preliminary pruned MobileNetV1 model;
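The four-way decision for a depthwise channel pair can be sketched directly from the Z scores (a hypothetical helper, not the patent's code):

```python
def depthwise_case(z_in, z_out):
    """Classify a depthwise (input, output) channel pair into the four cases;
    a channel is pruned when its Z score is below 0."""
    in_pruned, out_pruned = z_in < 0, z_out < 0
    if not in_pruned and not out_pruned:
        return 1  # keep both channels
    if not in_pruned and out_pruned:
        return 2  # output unimportant: prune the pair
    if in_pruned and not out_pruned:
        return 3  # output becomes a constant: prune and fuse (step S4)
    return 4      # both unimportant: prune the pair

cases = [depthwise_case(zi, zo)
         for zi, zo in [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]]
# cases == [1, 2, 3, 4]
```

Only case 3 needs special handling, because pruning there changes the values the next layer sees.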
Step S4: in case 3) of step S3, the output channel outputs a constant that does not depend on the network input; the associated computation is fused into the offset factor of the next BN layer, which reduces the accuracy loss of the pruned network, and the fusion adds no extra parameters;
Step S5: output and save the final pruned MobileNetV1 model.
As a further improvement, in step S1 the given training set comprises images and corresponding class labels. The loss function has two parts: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm. The total loss is the weighted sum Loss = loss_cls + 10^-5 × loss_norm. The gradients required for back-propagation are computed from the total loss, and the parameters of the MobileNetV1 model are updated to obtain the final pre-trained MobileNetV1 model.
As a further improvement, in step S1 a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given, where Image_i is the i-th training sample, Label_i the true label of the i-th training sample, Image_j the j-th test sample, Label_j the true label of the j-th test sample, M the number of samples in D_train, and N the number of samples in D_test. The parameters of the given MobileNetV1 network and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 parameters include the iteration number q, the network parameters θ_q, and the network parameters θ_best of the best model; l denotes the index of the network layer, W the parameters of the corresponding convolutional layer, and B the learnable parameters of the BN layer, i.e. the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolutional layer in the q-th iteration, and B_q^l the learnable parameters of the l-th BN layer in the q-th iteration. The iteration number q is initialized to 1 and incremented by 1 each time, for 150 iterations in total; the network parameters θ_q are initialized to θ_1, and the best-model parameters θ_best are initialized to θ_1. The initialization of the SGD optimizer includes learning rate 0.01, momentum 0.9, and weight decay coefficient 4 × 10^-5.
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 for forward computation, giving the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the label predicted by MobileNetV1 for the training sample Image_i.
From the preset cross-entropy loss function and L1 loss function, the error between the predicted labels Predict_i and the true labels Label_i of the training set D_train gives the cross-entropy loss value, and the scaling factors γ of all BN layers in MobileNetV1 give the L1 loss value. The two are added to obtain the final loss value, which is back-propagated to adjust the network parameters θ_q of MobileNetV1. The loss has two parts: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm, where loss_norm acts only on the BN scaling factors γ. The loss function is formulated as:

loss_cls = -(1/M) × Σ_{i=1}^{M} Label_i · log(Predict_i),
loss_norm = Σ_{b=1}^{A} |γ_b|,
Loss = loss_cls + 10^-5 × loss_norm,

where Label_i is the true label of the image, Predict_i is the predicted label output by MobileNetV1, M is the number of training samples, γ_b is the scaling factor of one channel of one BN layer (in the invention γ and β are scalars; sub- and superscripts are added only where needed), and A is the total number of BN scaling factors in MobileNetV1, i.e. the sum of the channel counts of all BN layers in the network. Loss is the final loss value; it is back-propagated and the parameters θ_q of the MobileNetV1 network are updated.
The MobileNetV1 network is evaluated with the test set D_test; if the parameters θ_q of the MobileNetV1 network give the highest test accuracy so far, let θ_best = θ_q. At the end of each parameter update, check whether the number of training iterations has reached the maximum of 150; if so, the training stage ends and the method proceeds to step S2; otherwise training continues with q = q + 1.
The formula for updating the network parameters θ_q is:

W_{q+1}^l = W_q^l − η × ∂Loss/∂W_q^l,
B_{q+1}^l = B_q^l − η × ∂Loss/∂B_q^l,

where W_q^l and B_q^l are the parameters of the l-th convolutional layer and of the l-th BN layer among the model parameters of the q-th iteration; W_{q+1}^l and B_{q+1}^l are the parameters of the (q+1)-th iteration obtained from the q-th update; η is the learning rate 0.01 among the hyper-parameters; ∂Loss/∂W_q^l and ∂Loss/∂B_q^l are the gradients of the convolutional-layer parameters and of the BN-layer parameters, obtained by the chain rule.
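The parameter update can be sketched in scalar form with the stated hyper-parameters; this assumes the common momentum-SGD variant (the patent does not spell out the exact variant), and real updates run elementwise over the W and B tensors, with weight decay applied to convolution weights:

```python
def sgd_step(theta, grad, velocity, lr=0.01, momentum=0.9, weight_decay=4e-5):
    """One scalar SGD-with-momentum step using the step-S1 hyper-parameters."""
    g = grad + weight_decay * theta  # L2 weight decay folded into the gradient
    v = momentum * velocity + g      # momentum buffer update
    return theta - lr * v, v

theta, v = 1.0, 0.0
theta, v = sgd_step(theta, grad=0.5, velocity=v)
```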
The accuracy of the MobileNetV1 network on the test set is computed. The samples of the test set D_test are used as input to MobileNetV1 and computed layer by layer, giving the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]} for the corresponding samples. Taking the true labels Label_j of D_test as reference, the predicted labels Predict_j of P_test are compared one by one with the true labels Label_j of D_test, and the accuracy on D_test is computed. Let ACC_q be the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) the accuracy of the best-model parameters θ_best; if ACC_q > ACC_best, then let θ_best = θ_q. After 150 iterations of training, the pre-trained MobileNetV1 with parameters θ_best is obtained.
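The accuracy computation and best-model bookkeeping above amount to a simple running maximum; a sketch with illustrative values (labels as plain integers, parameter sets as placeholder strings):

```python
def accuracy(predicted, true):
    """Top-1 accuracy: fraction of predictions matching the true labels."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)

# Track the best (ACC_q, theta_q) seen across iterations, as in step S1:
acc_best, theta_best = 0.0, None
history = [(0.60, "theta_1"), (0.70, "theta_2"), (0.65, "theta_3")]
for acc_q, theta_q in history:
    if acc_q > acc_best:
        acc_best, theta_best = acc_q, theta_q
# theta_best == "theta_2", acc_best == 0.7
```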
Step S2: given z ∈ [2, 4], let Z = β + z × |γ|, where β and γ are the offset factor and the scaling factor of the BN layer, respectively. Compute Z for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The computation of a BN layer is:

x̂ = γ × (x − E(x)) / √(Var(x) + ε) + β,

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics obtained during network training, and ε, whose value is 10^-5, prevents the denominator from being 0. γ and β are called the scaling factor and the offset factor of the BN layer, respectively. The output x̂ has variance γ² and mean β. If Z ≤ 0, then from a probabilistic viewpoint the output x̂ of the BN layer is less than or equal to 0 with high probability; conversely, the probability that the output of the BN layer is less than 0 is not large. Meanwhile, a ReLU layer follows the BN layer, with the formula:
then, whenWhen the probability is greater than or equal to 0, the equal probability of the ReLU outputs 0, and the network calculation with the value of 0 can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 network parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z × |γ|, and the Z value of channel idx of layer l is stored into the array entry Z_idx^l, giving the array {Z_idx^l}.
Step S3: according to the array {Z_idx^l} obtained in step S2, the parameters θ_best of the MobileNetV1 trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) the depthwise convolution; 2) the pointwise convolution. For the channel pruning of the pointwise convolution, refer to the network slimming method (Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming [C] // Proceedings of the IEEE International Conference on Computer Vision. 2017: 2736-2744.). For the depthwise convolution, the input channels and output channels correspond one to one; as shown in Table 1, under the method of computing Z in step S2 there are 4 cases in total: 1) the input channel is kept and the output channel is kept; 2) the input channel is kept and the output channel is pruned; 3) the input channel is pruned and the output channel is kept; 4) the input channel is pruned and the output channel is pruned.
Table 1: the 4 cases involved in depthwise convolution pruning

Case 1): input channel kept, output channel kept: no pruning
Case 2): input channel kept, output channel pruned: prune the pair
Case 3): input channel pruned, output channel kept: prune the pair and fuse the constant (step S4)
Case 4): input channel pruned, output channel pruned: prune the pair
In Table 1, for a given pair of input and output channels: in case 1) both channels are important and no pruning is needed; in case 2), however important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; in case 3), by the formula of step S2 the output of the input channel is 0, so the input of the output channel is fixed to 0 and the output of the output channel is a constant; the pair can be pruned, and the invention applies step S4 to preserve the accuracy of the computed values; in case 4) the input channel outputs 0 and the output channel outputs 0, so the pair is pruned directly. This yields a preliminary pruned MobileNetV1 with parameters θ_p1.
Step S4: the accuracy on the test set of the pruned MobileNetV1 parameters θ_p1 obtained in step S3 is usually lower than that of the network parameters θ_best of step S1. This is because case 3) of Table 1 is pruned directly during pruning, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the numerical contribution involved and fuses it directly into the β of the next BN layer, reducing the error between the network values before and after pruning. The accuracy of the pruned network on the test set can then be close to the accuracy of θ_best on the test set.
The numerical computation involved in case 3) of Table 1 is as follows. For case 3) of Table 1, the output of layer l−1 is the input of layer l; for ease of explanation consider the k-th channel. Since input channel k is pruned, its activation is 0, so the output of the depthwise convolution in channel k is 0, and the output of the k-th channel of the l-th BN layer is:

x̂_k^l = γ_k^l × (0 − E(x_k^l)) / √(Var(x_k^l) + ε) + β_k^l,

where x̂_k^l is the output of the k-th channel of the l-th BN layer in the MobileNetV1 with parameters θ_best; the mean E(x_k^l) obtained during training is a fixed constant in the testing phase; so is Var(x_k^l); therefore x̂_k^l is also a fixed constant. After the ReLU, the constant a_k^l = max(0, x̂_k^l) enters the next pointwise convolution, which computes, for output channel j:

y_j^{l+1} = Σ_{k∈K1} W_{j,k}^{l+1} × ReLU(x̂_k^l) + Σ_{k∈K3} W_{j,k}^{l+1} × a_k^l,

where K1 and K3 are the sets of channel positions corresponding to cases 1) and 3) of Table 1; the outputs of cases 2) and 4) of Table 1 are 0 and can be omitted from the formula; W_{j,k}^{l+1} is the convolution weight, l the layer index, and k the channel index (training has been completed and this is the test phase, so the q of step S1 is omitted). At test time, in y_j^{l+1} the first sum depends on the input, while in the second sum x̂_k^l is a fixed constant and a_k^l is a fixed constant, so c_j^{l+1} = Σ_{k∈K3} W_{j,k}^{l+1} × a_k^l is a fixed constant. Meanwhile, the computation of the (l+1)-th BN layer is as follows:
in case 3) of Table 1, the offset factor of l +1 layer BN without pruning isIn the case of pruning, the invention fuses the constants in case 3) to the bias of the l +1 layer BNOf the shift factors, the new shift factor isfusion represents the offset factor after fusion, usingReplacement ofBy changing the beta parameter of the l +1 th layer BN through the fusion, the same numerical calculation result as the non-pruned MobileNet V1 network can be achieved as far as possible under the pruning condition. Therefore, by the above calculation, θ will bep1Parameter update of BN inObtaining the final pruning MobileNet V1 with the parameter of thetap。
Step S5: save the final pruned network model.
The implementation of the embodiment of the invention has the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time consumed by the whole pruning algorithm, needing only two processes: pre-training and pruning. Traditional methods add a third process, fine-tuning, to obtain good results. With only two processes, the invention removes a considerable amount of computation while keeping accuracy comparable to the pre-trained network.
2. Compared with existing network channel pruning methods, the method decides from a probabilistic viewpoint whether a given channel needs to be pruned. Compared with methods based only on the magnitude of the scaling factor, this is more principled and more interpretable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is within the scope of the present invention for those skilled in the art to obtain other drawings based on the drawings without inventive exercise.
Fig. 1 is a flowchart of a probability-based MobileNetV1 network channel pruning method according to an embodiment of the present invention;
detailed description of the invention
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, in the embodiment of the present invention, a probability-based MobileNetV1 network channel pruning method is provided, where the method includes the following steps:
Step S1: a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given, where Image_i is the i-th training sample, Label_i the true label of the i-th training sample, Image_j the j-th test sample, and Label_j the true label of the j-th test sample. Every Image has dimensions 3 × 224 × 224 (3 is the number of channels, the first 224 the image height, the second 224 the image width; the batch size is ignored here and does not affect the operations), and every label has dimension 1000 (the number of classes to be predicted; the batch size is again ignored). M is the number of samples in D_train and N the number of samples in D_test. The parameters of the given MobileNetV1 network (the network structure is shown in Table 2) and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 parameters include the iteration number q, the network parameters θ_q, and the best-model parameters θ_best; l denotes the index of the network layer, W the parameters of the corresponding convolutional layer, and B the learnable parameters of the BN layer, i.e. the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolutional layer in the q-th iteration, and B_q^l the learnable parameters of the l-th BN layer in the q-th iteration. The iteration number q is initialized to 1 and incremented by 1 each time, for 150 iterations in total; θ_q is initialized to θ_1, and θ_best is initialized to θ_1. The initialization of the SGD optimizer includes learning rate 0.01, momentum 0.9, and weight decay coefficient 4 × 10^-5 (the convolution parameters use weight decay; the BN parameters do not).
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 for forward computation, giving the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the label predicted by MobileNetV1 for the training sample Image_i.
The loss function is computed. From the preset cross-entropy loss function and L1 loss function, the error between the predicted labels Predict_i and the true labels Label_i of the training set D_train gives the cross-entropy loss value, and the scaling factors γ of all BN layers in MobileNetV1 give the L1 loss value. The two are added to obtain the final loss value, which is back-propagated to adjust the network parameters θ_q of MobileNetV1. The loss has two parts: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm, where loss_norm acts only on the BN scaling factors γ. The loss function is formulated as:

loss_cls = -(1/M) × Σ_{i=1}^{M} Label_i · log(Predict_i),
loss_norm = Σ_{b=1}^{A} |γ_b|,
Loss = loss_cls + 10^-5 × loss_norm,

where Label_i is the true label of the image, Predict_i is the predicted label output by MobileNetV1, M is the number of training samples, γ_b is the scaling factor of one channel of one BN layer, and A is the total number of BN scaling factors in MobileNetV1, i.e. the sum of the channel counts of all BN layers in the network. Loss is the final loss value; it is back-propagated and the parameters θ_q of the MobileNetV1 network are updated.
The MobileNetV1 network is evaluated with the test set D_test; if the parameters θ_q of the MobileNetV1 network give the highest test accuracy so far, let θ_best = θ_q. At the end of each parameter update, check whether the number of training iterations has reached the maximum of 150; if so, the training stage ends and the method proceeds to step S2; otherwise training continues with q = q + 1.
The formula for updating the network parameters θ_q is:

W_{q+1}^l = W_q^l − η × ∂Loss/∂W_q^l,
B_{q+1}^l = B_q^l − η × ∂Loss/∂B_q^l,

where W_q^l and B_q^l are the parameters of the l-th convolutional layer and of the l-th BN layer among the model parameters of the q-th iteration; W_{q+1}^l and B_{q+1}^l are the parameters of the (q+1)-th iteration obtained from the q-th update; η is the learning rate 0.01 among the hyper-parameters; ∂Loss/∂W_q^l and ∂Loss/∂B_q^l are the gradients of the convolutional-layer parameters and of the BN-layer parameters, obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is computed. The samples of the test set D_test are used as input to MobileNetV1 and computed layer by layer, giving the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]} for the corresponding samples. Taking the true labels Label_j of D_test as reference, the predicted labels Predict_j of P_test are compared one by one with the true labels Label_j of D_test, and the accuracy on D_test is computed. Let ACC_q be the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) the accuracy of the best-model parameters θ_best; if ACC_q > ACC_best, then let θ_best = θ_q. After 150 iterations of training, the pre-trained MobileNetV1 with parameters θ_best is obtained.
Table 2: default BN layer and ReLU layer after each layer of convolution
In the table above, s1 means the convolution stride is 1, s2 means the convolution stride is 2, dw marks a depthwise convolution, and layers without dw are pointwise convolutions (except the first, which is a standard convolution).
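The cost advantage of the depthwise-separable layers in Table 2 can be illustrated with a parameter count (a rough sketch; BN parameters and biases ignored, function name is ours):

```python
def dws_vs_standard_params(cin, cout, k=3):
    """Parameter counts of a depthwise-separable block (k*k depthwise followed
    by 1*1 pointwise) versus a single standard k*k convolution."""
    standard = k * k * cin * cout
    separable = k * k * cin + cin * cout
    return separable, standard

sep, std = dws_vs_standard_params(32, 64)
# sep == 2336 (9*32 + 32*64), std == 18432 (9*32*64)
```

For the 3×3 kernels used throughout MobileNetV1, the separable form needs roughly 1/9 to 1/8 of the standard convolution's parameters, which is the structural saving that channel pruning then compounds.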
Step S2: given z ∈ [2, 4], let Z = β + z × |γ|, where β and γ are the offset factor and the scaling factor of the BN layer, respectively, and both are scalars (a single number per channel). Compute Z for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The computation of a BN layer is:

x̂ = γ × (x − E(x)) / √(Var(x) + ε) + β,

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics obtained during network training, and ε, whose value is 10^-5, prevents the denominator from being 0. γ and β are called the scaling factor and the offset factor of the BN layer, respectively. The output x̂ has variance γ² and mean β. If Z ≤ 0, then from a probabilistic viewpoint the output x̂ of the BN layer is less than or equal to 0 with high probability; conversely, the probability that the output of the BN layer is less than 0 is not large. Meanwhile, a ReLU layer follows the BN layer, with the formula:
then, whenWhen the probability is greater than or equal to 0, the equal probability of the ReLU outputs 0, and the network calculation with the value of 0 can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 network parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z × |γ|, and the Z value of channel idx of layer l is stored into the array entry Z_idx^l, giving the array {Z_idx^l}.
Step S3: according to the array {Z_idx^l} obtained in step S2, the parameters θ_best of the MobileNetV1 trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) the depthwise convolution; 2) the pointwise convolution. For the channel pruning of the pointwise convolution, refer to the network slimming method (Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming [C] // Proceedings of the IEEE International Conference on Computer Vision. 2017: 2736-2744.). For the depthwise convolution, the input channels and output channels correspond one to one; as shown in Table 1, under the method of computing Z in step S2 there are 4 cases in total: 1) the input channel is kept and the output channel is kept; 2) the input channel is kept and the output channel is pruned; 3) the input channel is pruned and the output channel is kept; 4) the input channel is pruned and the output channel is pruned.
In Table 1, for a given pair of input and output channels: in case 1) both channels are important and no pruning is needed; in case 2), however important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; in case 3), by the formula of step S2 the output of the input channel is 0, so the input of the output channel is fixed to 0 and the output of the output channel is a constant; the pair can be pruned, and the invention applies step S4 to preserve the accuracy of the computed values; in case 4) the input channel outputs 0 and the output channel outputs 0, so the pair is pruned directly. This yields a preliminary pruned MobileNetV1 with parameters θ_p1.
Step S4, the pruned MobileNetV1 parameters θ_p1 obtained in step S3 usually achieve lower accuracy on the test set than the network parameters θ_best of step S1. This is because direct pruning of case 3) in Table 1 introduces errors between the computed values of the MobileNetV1 network before and after pruning. The invention evaluates the relevant constant directly and fuses it into the β of the next BN layer, thereby reducing the numerical error of the network computation before and after pruning. The accuracy of the pruned network on the test set can then approach the accuracy of θ_best on the test set.
The numerical computation involved in case 3) of Table 1 is derived as follows. For case 3), the output of layer l−1 is the input of layer l; for ease of explanation, consider the k-th channel. Since the pruned input channel contributes a fixed 0, the output of the k-th channel of layer l is:

c_k = ReLU( γ_l,k × (0 − μ_l,k) / √(σ²_l,k + ε) + β_l,k )

where c_k is the output of the k-th channel of the l-th BN layer (followed by ReLU) in the MobileNetV1 with parameters θ_best; μ_l,k is the mean obtained during training and is a fixed constant in the test phase, and σ²_l,k likewise; therefore c_k is also a fixed constant scalar (it is eventually fused into the scalar β; because of broadcast computation, c_k can be treated as a scalar here). The constant c_k enters the computation of the next pointwise convolution layer as follows:

x^(l+1) = Σ_{k∈K1} W_k^(l+1) ∗ a_k^l + Σ_{k∈K3} W_k^(l+1) × c_k

where K1 and K3 are the sets of channel positions corresponding to cases 1) and 3) of Table 1; the outputs of cases 2) and 4) of Table 1 are 0 and can be omitted from the formula; W^(l+1) denotes the convolution weights, l the layer index, and k the channel index (training has been completed at this point and we are in the test phase, so the q of step S1 is ignored); a_k^l denotes the output of an unpruned channel. At test time, in the first summation the input a_k^l changes with the network input, so forward computation must be carried out in the pruned network; in the second summation, W_k^(l+1) is a fixed constant and c_k is also a fixed constant, so Σ_{k∈K3} W_k^(l+1) × c_k is a fixed constant. Meanwhile, the operation of the (l+1)-th BN layer is:

y^(l+1) = γ^(l+1) × (x^(l+1) − μ^(l+1)) / √((σ^(l+1))² + ε) + β^(l+1)

In case 3) of Table 1, without pruning, the offset factor of the (l+1)-th BN layer is β^(l+1). With pruning, the invention fuses the constants of case 3) into the offset factor of the (l+1)-th BN layer; the new offset factor is

β_fusion = β^(l+1) + γ^(l+1) × ( Σ_{k∈K3} W_k^(l+1) × c_k ) / √((σ^(l+1))² + ε)

where β_fusion denotes the fused offset factor, which replaces β^(l+1). By changing the β parameter of the (l+1)-th BN layer through this fusion, the pruned network can reproduce, as closely as possible, the numerical result of the unpruned MobileNetV1. Therefore, through the above computation, the BN parameters in θ_p1 are updated, yielding the final pruned MobileNetV1 with parameters θ_p.
Step S5, the final pruned network model is saved.
The embodiment of the invention has the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the overall runtime of the algorithm: only two processes are needed, pre-training and pruning. Traditional methods add a third process, fine-tuning, to obtain a better result. With only two processes, the invention removes a considerable amount of computation while keeping accuracy on par with the pre-trained network.
2. Compared with existing network channel pruning methods, the method decides from a probabilistic perspective whether a given channel needs to be pruned.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (2)
1. A probability-based MobileNet V1 network channel pruning method is characterized by comprising the following steps:
step S1, a training set and a test set are given; during the training of MobileNetV1, in addition to the cross-entropy loss function loss_cls between the predicted labels and the real labels, the L1 loss function loss_norm of the BN scaling factors is also calculated; gradients are calculated using the two loss functions, and the parameters of MobileNetV1 are updated to obtain a pre-trained model;
step S2, a parameter z ∈ [2,4] is given, and Z is defined as Z = β + z × |γ|, wherein β and γ are the trainable parameters of the BN layer, namely the offset factor and the scaling factor respectively; Z is calculated for every channel of all BN layers in the pre-trained model of step S1;
step S3, for each Z output by step S2, the channel is pruned if Z < 0 and is not pruned otherwise; in MobileNetV1, the basic module is the depthwise separable convolution, which comprises a depthwise convolution and a pointwise convolution; the depthwise convolution is characterized in that input channels and output channels are in one-to-one correspondence, so a given input channel and output channel must be pruned, or kept, at the same time, otherwise the convolution pattern is broken; for a given pair of input and output channels there are 4 cases: 1) the input channel is not pruned and the output channel is not pruned; 2) the input channel is not pruned and the output channel is pruned; 3) the input channel is pruned and the output channel is not pruned; 4) the input channel is pruned and the output channel is pruned; channel pruning is performed on the depthwise convolution channels conforming to cases 2), 3) and 4), obtaining a preliminarily pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant that is unaffected by the network input; the relevant computed value is fused into the offset factor of the next BN layer, thereby reducing the accuracy drop of the pruned network, and no extra parameter is added in the fusion process;
and step S5, outputting and saving the final pruned MobileNetV1 model.
2. The probability-based MobileNetV1 network channel pruning method according to claim 1, wherein in step S1 the given training set comprises images and their corresponding class labels; the loss computation includes two terms: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm; the total loss is their weighted sum, Loss = loss_cls + 10^-5 × loss_norm; the gradients required for back propagation are calculated on the basis of the total loss, and the parameters of the MobileNetV1 model are updated to obtain a pre-trained MobileNetV1 model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110903135.9A CN113627595B (en) | 2021-08-06 | 2021-08-06 | Probability-based MobileNet V1 network channel pruning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113627595A true CN113627595A (en) | 2021-11-09 |
CN113627595B CN113627595B (en) | 2023-07-25 |
Family
ID=78383364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110903135.9A Active CN113627595B (en) | 2021-08-06 | 2021-08-06 | Probability-based MobileNet V1 network channel pruning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627595B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764471A (en) * | 2018-05-17 | 2018-11-06 | 西安电子科技大学 | The neural network cross-layer pruning method of feature based redundancy analysis |
CN109635936A (en) * | 2018-12-29 | 2019-04-16 | 杭州国芯科技股份有限公司 | A kind of neural networks pruning quantization method based on retraining |
CN111291806A (en) * | 2020-02-02 | 2020-06-16 | 西南交通大学 | Identification method of label number of industrial product based on convolutional neural network |
CN111652366A (en) * | 2020-05-09 | 2020-09-11 | 哈尔滨工业大学 | Combined neural network model compression method based on channel pruning and quantitative training |
KR102165273B1 (en) * | 2019-04-02 | 2020-10-13 | 국방과학연구소 | Method and system for channel pruning of compact neural networks |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022141754A1 (en) | Automatic pruning method and platform for general compression architecture of convolutional neural network | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN113435590B (en) | Edge calculation-oriented searching method for heavy parameter neural network architecture | |
CN111259940A (en) | Target detection method based on space attention map | |
US11610154B1 (en) | Preventing overfitting of hyperparameters during training of network | |
CN112766399B (en) | Self-adaptive neural network training method for image recognition | |
CN112381763A (en) | Surface defect detection method | |
US8626676B2 (en) | Regularized dual averaging method for stochastic and online learning | |
US11574193B2 (en) | Method and system for training of neural networks using continuously differentiable models | |
Hebbal et al. | Multi-objective optimization using deep Gaussian processes: application to aerospace vehicle design | |
CN114139683A (en) | Neural network accelerator model quantization method | |
CN112766603A (en) | Traffic flow prediction method, system, computer device and storage medium | |
CN113627595A (en) | Probability-based MobileNet V1 network channel pruning method | |
CN115599918B (en) | Graph enhancement-based mutual learning text classification method and system | |
CN116681945A (en) | Small sample class increment recognition method based on reinforcement learning | |
CN116579408A (en) | Model pruning method and system based on redundancy of model structure | |
US20200372363A1 (en) | Method of Training Artificial Neural Network Using Sparse Connectivity Learning | |
He et al. | GA-based optimization of generative adversarial networks on stock price prediction | |
Simon et al. | Towards a robust differentiable architecture search under label noise | |
CN114511069A (en) | Method and system for improving performance of low bit quantization model | |
CN111652430A (en) | Internet financial platform default rate prediction method and system | |
Wang et al. | Exploring quantization in few-shot learning | |
US20230325664A1 (en) | Method and apparatus for generating neural network | |
CN112052626B (en) | Automatic design system and method for neural network | |
US11900238B1 (en) | Removing nodes from machine-trained network based on introduction of probabilistic noise during training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||