CN113627595A - Probability-based MobileNet V1 network channel pruning method - Google Patents

Probability-based MobileNet V1 network channel pruning method

Info

Publication number
CN113627595A
Authority
CN
China
Prior art keywords
pruning
channel
network
mobilenet
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110903135.9A
Other languages
Chinese (zh)
Other versions
CN113627595B (en)
Inventor
赵汉理
史开杰
潘飞
卢望龙
黄辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202110903135.9A priority Critical patent/CN113627595B/en
Publication of CN113627595A publication Critical patent/CN113627595A/en
Application granted granted Critical
Publication of CN113627595B publication Critical patent/CN113627595B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a probability-based MobileNetV1 network channel pruning method. The method comprises three stages: pre-training, pruning and fusion. Pre-training stage: train with the cross-entropy loss plus an L1 loss on the BN scaling factors to obtain a pre-trained model. Pruning stage: using the properties of the BN and ReLU layers built into the MobileNetV1 network design, compute the probability that the output of each BN channel is less than 0, and prune the channels for which this probability is high. Fusion stage: because the influence of a pruned channel on accuracy usually resides in the offset factor of the BN layer following the depthwise convolution, the invention fuses it into the offset factor of the next BN layer to obtain the final pruned network. Implementing the invention shortens the time needed to obtain a pruned network and reduces the computation of the network, while keeping the accuracy as close as possible to that of the pre-trained network.

Description

Probability-based MobileNet V1 network channel pruning method
Technical Field
The invention relates to the field of neural network pruning algorithms, in particular to a probability-based MobileNet V1 pruning algorithm.
Background
Convolutional Neural Networks (CNNs) have received much attention from industry because they achieve very high recognition and detection accuracy in the field of computer vision. However, the speed of convolutional neural network computation constrains eventual hardware deployment, so how to accelerate neural network computation while achieving high accuracy is a very important problem. MobileNetV1 (see: Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017.) was an early step in reducing the computational load of neural networks. However, for a given task and a given network, not all channels are important to the output, and channels that have little influence on the final output can be deleted. The currently popular pruning methods judge channel importance using only the scaling factor of the Batch Normalization (BN) layers in the network design, without fully considering the offset factor of the BN layer or the architecture of the neural network; moreover, these pruning methods require three processes, pre-training, pruning and fine-tuning, so the whole pruning pipeline takes a huge amount of time. In view of this, the channel-importance criterion of the present invention considers the scaling factor and the offset factor of BN together with the ReLU layer behind the BN layer, and the computations contained in pruned channels are fused into the bias factors of the following layer so that the fine-tuning step can be removed. The invention uses the mathematical properties of the BN and ReLU layers commonly used in network design to compute the probability that a given channel can be deleted for a given task. Since performance on the task may degrade after a channel is deleted, the invention also proposes offset-factor fusion: for a pruned channel, its contribution to the downstream computation is usually concentrated in the offset factor of the BN, and the computation related to this offset factor is fused into the offset factor of the next BN layer. Compared with a non-fusing method, this yields a pruned MobileNetV1 model with higher accuracy, and the fusion requires no extra parameters. Finally, no fine-tuning stage is needed, which speeds up the whole pruning pipeline.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a probability-based MobileNetV1 network channel pruning method that makes full use of the BN and ReLU layers in the network design and selects channels for pruning from a probabilistic point of view, removing the channels that most deserve pruning; meanwhile, the constants remaining after pruning a depthwise convolution are fused into the offset factor of the next BN layer; and the fine-tuning stage is removed, accelerating the whole pruning algorithm.
In order to solve the above technical problem, an embodiment of the present invention provides a probability-based MobileNetV1 network channel pruning method, including the following steps:
step S1, given a training set and a test set, during the training of MobileNetV1 compute, besides the cross-entropy loss loss_cls between the predicted labels and the true labels, the L1 loss loss_norm of the BN scaling factors; compute gradients with these two loss functions and update the parameters of MobileNetV1, obtaining a pre-trained model;
step S2, given a parameter z ∈ [2,4], define Z = β + z×|γ|, where β and γ are the trainable parameters of the BN layer: the offset factor and the scaling factor, respectively; compute Z for every channel of all BN layers in the pre-trained model of step S1;
step S3, for each Z output by step S2, prune the channel if Z < 0, otherwise do not prune; in MobileNetV1 the basic module is the depthwise-separable convolution, which consists of a depthwise convolution and a pointwise convolution; in the depthwise convolution the input channels and the output channels correspond one-to-one, so a given input channel and its output channel must be pruned or kept together, otherwise the convolution pattern would be broken; for a given pair of input and output channels there are 4 cases: 1) input channel not pruned, output channel not pruned; 2) input channel not pruned, output channel pruned; 3) input channel pruned, output channel not pruned; 4) input channel pruned, output channel pruned; channel pruning is performed on the depthwise convolutions matching cases 2), 3) and 4), obtaining a preliminary pruned MobileNetV1 model;
step S4, in case 3) of step S3 the output channel outputs a constant, and this constant is unaffected by the network input; the related computation result is fused into the offset factor of the next BN layer, which lessens the drop in accuracy of the pruned network, and the fusion adds no extra parameters;
and step S5, output and save the final pruned MobileNetV1 model.
As a further improvement, in step S1 the given training set comprises images and their corresponding class labels. The loss consists of two terms: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm. The total loss is the weighted sum Loss = loss_cls + 10⁻⁵ × loss_norm. The gradients required for back-propagation are computed from the total loss and the parameters of the MobileNetV1 model are updated, yielding the final pre-trained MobileNetV1 model.
As a further improvement, in step S1 a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given; Image_i denotes the i-th sample of the training set, Label_i the true label of the i-th training sample, Image_j the j-th sample of the test set, Label_j the true label of the j-th test sample, M the number of samples in D_train and N the number of samples in D_test. The parameters of the given MobileNetV1 network and of the stochastic gradient descent optimizer SGD are initialized; the MobileNetV1 parameters comprise the iteration number q, the network parameters θ_q and the network parameters θ_best of the best model:

θ_q = {(W_l^q, B_l^q) | l ∈ [1, L]}

where l indexes the network layers (L is the number of layers), W denotes the parameters of the corresponding convolution layer, and B denotes the learnable parameters of the BN layer, namely a scaling factor γ and an offset factor β; W_l^q denotes the parameters of the l-th convolution layer at the q-th training iteration, and B_l^q the learnable parameters of the l-th BN layer at the q-th training iteration. The iteration number q is initialized to 1 and incremented by 1 each time, for 150 iterations in total; the network parameters θ_q are initialized to θ_1, and the best-model parameters θ_best are initialized to θ_1. The initialization of the SGD optimizer comprises the learning rate 0.01, the momentum 0.9 and the weight decay coefficient 4×10⁻⁵.
For iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are fed into MobileNetV1 for forward computation, yielding the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i denotes the label predicted by MobileNetV1 for training sample Image_i.
According to the preset cross-entropy loss function and L1 loss function, the error between the predicted labels Predict_i and the true labels Label_i of the training set D_train gives the cross-entropy loss value; the scaling factors γ of all BN layers in MobileNetV1 give the L1 loss value; the two are added to give the final loss value, which is back-propagated to adjust the network parameters θ_q of MobileNetV1. The loss consists of: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm, where loss_norm acts only on the BN scaling factors γ. The loss functions are:

loss_cls = −(1/M) × Σ_{i=1}^{M} Label_i × log(Predict_i)

loss_norm = Σ_{b=1}^{A} |γ_b|

Loss = loss_cls + 10⁻⁵ × loss_norm

where Label_i denotes the true label of an image, Predict_i the predicted label output by MobileNetV1, M the number of training samples, γ_b the scaling factor of one channel of one BN layer (in the invention γ and β are scalars; sub- and superscripts are attached only where needed), and A the number of scaling factors over all BN layers in MobileNetV1, i.e. the sum of the channel counts of all BN layers in the network. Loss is the final loss value; it is back-propagated and the parameters θ_q of the MobileNetV1 network are updated.
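As an illustration of the above loss, the following PyTorch-style sketch (the model and tensor names are illustrative, not from the patent) computes Loss = loss_cls + 10⁻⁵ × loss_norm by summing |γ| over the weights of all BN layers:

    import torch
    import torch.nn as nn

    def total_loss(model: nn.Module, logits: torch.Tensor,
                   labels: torch.Tensor, l1_weight: float = 1e-5) -> torch.Tensor:
        # loss_cls: cross entropy between predicted and true labels
        loss_cls = nn.functional.cross_entropy(logits, labels)
        # loss_norm: L1 norm of the scaling factors (gamma == bn.weight) of all BN layers
        loss_norm = sum(bn.weight.abs().sum()
                        for bn in model.modules()
                        if isinstance(bn, nn.BatchNorm2d))
        return loss_cls + l1_weight * loss_norm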
The MobileNetV1 network is evaluated with the test set D_test; if the network parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. At the end of each parameter update it is checked whether the number of training iterations has reached the maximum of 150; if so, the training stage ends and the method proceeds to step S2; otherwise training continues with q = q + 1.
The network parameters θ_q are updated as follows:

W_l^{q+1} = W_l^q − η × ∂Loss/∂W_l^q

B_l^{q+1} = B_l^q − η × ∂Loss/∂B_l^q

where W_l^q and B_l^q denote the parameters of the l-th convolution layer and the parameters of the l-th BN layer among the model network parameters of the q-th iteration; W_l^{q+1} and B_l^{q+1} denote the network parameters of the (q+1)-th iteration obtained from the q-th update; η denotes the learning rate, 0.01, among the hyper-parameters; ∂Loss/∂W_l^q and ∂Loss/∂B_l^q denote the gradients of the corresponding convolution-layer parameters and BN-layer parameters, obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is computed. The samples of the test set D_test are fed as input to MobileNetV1 and computed layer by layer through the network to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the true labels Label_j of D_test as reference, the predicted labels Predict_j of P_test are compared one by one with the true labels Label_j of D_test, and the accuracy on D_test is computed. The test accuracy of the current MobileNetV1 parameters θ_q is denoted ACC_q, and the accuracy of the best-model parameters θ_best is denoted ACC_best (initialized to 0); if ACC_q > ACC_best, then θ_best = θ_q. After 150 training iterations the pre-trained MobileNetV1 with parameters θ_best is obtained.
Step S2, given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and the scaling factor of the BN layer, respectively. Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = (x − E(x)) / √(Var(x) + ε)

ŷ = γ × x̂ + β

where x and ŷ denote the input and the output of the BN layer; E(x) and Var(x) are statistics obtained during network training; ε prevents the denominator from being 0 and equals 10⁻⁵; γ and β are called the scaling factor and the offset factor of the BN layer, respectively.
The BN output ŷ has variance γ² and mean β. If Z = β + z×|γ| ≤ 0, then from a probabilistic point of view the output ŷ of the BN layer is less than or equal to 0 with very high probability; conversely, the probability that the BN output is less than 0 is small. Meanwhile, a ReLU layer follows the BN layer:

ReLU(x) = max(0, x)

Therefore, when ŷ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and network computation whose value is 0 can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 network parameters θ_best saved in step S1, the Z values of all channels of the BN layer in each layer are computed according to Z = β + z×|γ|, and the Z value of channel idx of layer l is stored in the array entry Z_l^idx, giving the array {Z_l^idx}.
Step S3, according to the array {Z_l^idx} obtained in step S2, the parameters θ_best of the MobileNetV1 trained in step S1 are pruned. The depthwise-separable convolution consists of two parts: 1) the depthwise convolution; 2) the pointwise convolution. For the channel pruning of the pointwise convolution, refer to the network slimming method (Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2736-2744.). For the depthwise convolution, the input channels and the output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel not pruned, output channel not pruned; 2) input channel not pruned, output channel pruned; 3) input channel pruned, output channel not pruned; 4) input channel pruned, output channel pruned.
Table 1: the 4 cases involved in depthwise-convolution pruning

Case  Input channel  Output channel  Action
1)    not pruned     not pruned      keep
2)    not pruned     pruned          prune directly
3)    pruned         not pruned      prune and fuse (step S4)
4)    pruned         pruned          prune directly

In Table 1, for a given pair of input and output channels: if case 1) holds, both channels are important and no pruning is needed; if case 2) holds, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so it is pruned directly; if case 3) holds, the output of the input channel is 0 by the formula of step S2, the input of the output channel is fixed at 0 and the output channel outputs a constant, so it can be pruned, and the invention proceeds to step S4 to preserve the accuracy of the computed values; if case 4) holds, the input channel outputs 0 and the output channel outputs 0, so it is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
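The case analysis above reduces to two boolean masks per depthwise convolution; the following sketch (mask names are illustrative, not from the patent) keeps a pair only in case 1) and records case-3 positions for the fusion of step S4:

    import torch

    def depthwise_keep_masks(in_keep: torch.Tensor, out_keep: torch.Tensor):
        # in_keep / out_keep: boolean masks (Z >= 0) of the BN layers before and
        # after the depthwise convolution; channels correspond one-to-one.
        keep = in_keep & out_keep        # case 1): keep the channel pair
        case3 = ~in_keep & out_keep      # case 3): prune, then fuse its constant
        # cases 2) and 4) (output channel pruned) are simply pruned.
        return keep, case3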
Step S4, the accuracy on the test set of the pruned MobileNetV1 parameters θ_p1 obtained in step S3 is usually lower than that of the parameters θ_best of step S1. This is because case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the related values directly and fuses them into the β of the next BN layer, reducing the error between the network's computed values before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The numerical computation involved in case 3) of Table 1 is as follows. For case 3) in Table 1, the output of layer l−1 is the input of layer l; for ease of explanation consider the k-th channel, so x_k^l = 0. The output of layer l is:

ŷ_k^l = γ_k^l × (x_k^l − E(x_k^l)) / √(Var(x_k^l) + ε) + β_k^l = β_k^l − γ_k^l × E(x_k^l) / √(Var(x_k^l) + ε)

where ŷ_k^l is the output of the k-th channel of the l-th BN layer in the MobileNetV1 with parameters θ_best; E(x_k^l) is the mean obtained during training and is a fixed constant in the test phase; Var(x_k^l) is likewise fixed, so ŷ_k^l is also a fixed constant. The constant ŷ_k^l enters the corresponding next-layer pointwise convolution as:

x^{l+1} = Σ_{k∈K1} W_k^{l+1} ∗ ŷ_k^l + Σ_{k∈K3} W_k^{l+1} × ŷ_k^l

where K1 and K3 are the sets of channel positions corresponding to cases 1) and 3) of Table 1; the outputs of cases 2) and 4) of Table 1 are 0 and can be omitted from the formula; W_k^{l+1} denotes the convolution weight, l the layer index and k the channel index (training is complete at this point and we are in the test phase, so the iteration index q of step S1 is dropped). At test time, in Σ_{k∈K3} W_k^{l+1} × ŷ_k^l the weight W_k^{l+1} is a fixed constant and ŷ_k^l is also a fixed constant, so Σ_{k∈K3} W_k^{l+1} × ŷ_k^l is a fixed constant. Meanwhile, the (l+1)-th BN layer computes:

ŷ^{l+1} = γ^{l+1} × (x^{l+1} − E(x^{l+1})) / √(Var(x^{l+1}) + ε) + β^{l+1}

In case 3) of Table 1, the offset factor of the (l+1)-th BN layer without pruning is β^{l+1}. Under pruning, the invention fuses the constants of case 3) into the offset factor of the (l+1)-th BN layer; the new offset factor is

β_fusion^{l+1} = β^{l+1} + γ^{l+1} × (Σ_{k∈K3} W_k^{l+1} × ŷ_k^l) / √(Var(x^{l+1}) + ε)

where the subscript fusion marks the offset factor after fusion, and β_fusion^{l+1} replaces β^{l+1}. Changing the β parameter of the (l+1)-th BN layer in this way achieves, as far as possible, the same numerical results under pruning as the unpruned MobileNetV1 network. Therefore, through the above computation, the BN parameters in θ_p1 are updated with β_fusion^{l+1}, yielding the final pruned MobileNetV1 with parameters θ_p.
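A minimal sketch of this fusion, assuming PyTorch modules for the depthwise BN (layer l), the following pointwise convolution and its BN (layer l+1); the β_fusion formula above is implemented line by line, and the helper name is illustrative:

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def fuse_case3(bn_dw: nn.BatchNorm2d, conv_pw: nn.Conv2d,
                   bn_pw: nn.BatchNorm2d, case3: torch.Tensor) -> None:
        # constant output of a depthwise channel whose input was pruned:
        # y_hat_k = beta_k - gamma_k * E(x_k) / sqrt(Var(x_k) + eps)
        y_hat = bn_dw.bias - bn_dw.weight * bn_dw.running_mean \
                / torch.sqrt(bn_dw.running_var + bn_dw.eps)
        w = conv_pw.weight[:, :, 0, 0]            # 1x1 conv weights, [out, in]
        # C = sum over case-3 channels of W^{l+1}_k * y_hat_k
        C = (w[:, case3] * y_hat[case3]).sum(dim=1)
        # beta_fusion = beta^{l+1} + gamma^{l+1} * C / sqrt(Var^{l+1} + eps)
        bn_pw.bias += bn_pw.weight * C / torch.sqrt(bn_pw.running_var + bn_pw.eps)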
And step S5, save the final pruned network model.
Implementation of the embodiments of the invention has the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time consumed by the whole pruning algorithm, requiring only two processes: pre-training and pruning. Traditional methods add a third process, fine-tuning, to obtain better results. With only two processes, the invention can remove a considerable amount of computation while keeping accuracy on par with the pre-trained network.
2. Compared with existing network channel pruning methods, the method decides clearly, from a probabilistic point of view, whether a given channel needs to be pruned. Compared with methods based only on the magnitude of parameters, this is more reasonable and more interpretable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings obtained from them by those skilled in the art without inventive effort also fall within the scope of the present invention.
Fig. 1 is a flowchart of a probability-based MobileNetV1 network channel pruning method according to an embodiment of the present invention;
Detailed Description of the Invention
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, in the embodiment of the present invention, a probability-based MobileNetV1 network channel pruning method is provided, where the method includes the following steps:
Step S1, a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given; Image_i denotes the i-th sample of the training set, Label_i the true label of the i-th training sample, Image_j the j-th sample of the test set and Label_j the true label of the j-th test sample. Every Image has dimensions 3 × 224 × 224 (3 is the number of channels, the first 224 the image height and the second 224 the image width; the batch size is ignored here and has no influence on the operations), and every Label has dimension 1000 (1000 is the number of classes to be distinguished; the batch size is again ignored). M denotes the number of samples in D_train and N the number of samples in D_test. The parameters of the given MobileNetV1 network (the network structure is shown in Table 2) and of the stochastic gradient descent optimizer SGD are initialized; the MobileNetV1 parameters comprise the iteration number q, the network parameters θ_q and the network parameters θ_best of the best model:

θ_q = {(W_l^q, B_l^q) | l ∈ [1, L]}

where l indexes the network layers (L is the number of layers), W denotes the parameters of the corresponding convolution layer and B the learnable parameters of the BN layer, namely a scaling factor γ and an offset factor β; W_l^q denotes the parameters of the l-th convolution layer at the q-th training iteration and B_l^q the learnable parameters of the l-th BN layer at the q-th training iteration. The iteration number q is initialized to 1 and incremented by 1 each time, for 150 iterations in total; θ_q is initialized to θ_1 and θ_best to θ_1. The initialization of the SGD optimizer comprises the learning rate 0.01, the momentum 0.9 and the weight decay coefficient 4×10⁻⁵ (weight decay is applied to the convolution parameters but not to the BN parameters).
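A sketch of this optimizer initialization, making the parenthetical note explicit via separate SGD parameter groups (the grouping helper is illustrative; FC weights are assumed to be decayed like convolution weights):

    import torch
    import torch.nn as nn

    def build_sgd(model: nn.Module) -> torch.optim.SGD:
        decay, no_decay = [], []
        for m in model.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                decay += list(m.parameters(recurse=False))
            elif isinstance(m, nn.BatchNorm2d):
                no_decay += list(m.parameters(recurse=False))
        return torch.optim.SGD(
            [{"params": decay, "weight_decay": 4e-5},      # convolution/FC weights
             {"params": no_decay, "weight_decay": 0.0}],   # BN gamma and beta
            lr=0.01, momentum=0.9)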
For iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are fed into MobileNetV1 for forward computation, yielding the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i denotes the label predicted by MobileNetV1 for training sample Image_i.
The loss function is computed. According to the preset cross-entropy loss function and L1 loss function, the error between the predicted labels Predict_i and the true labels Label_i of the training set D_train gives the cross-entropy loss value; the scaling factors γ of all BN layers in MobileNetV1 give the L1 loss value; the two are added to give the final loss value, which is back-propagated to adjust the network parameters θ_q of MobileNetV1. The loss consists of: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm, where loss_norm acts only on the BN scaling factors γ. The loss functions are:

loss_cls = −(1/M) × Σ_{i=1}^{M} Label_i × log(Predict_i)

loss_norm = Σ_{b=1}^{A} |γ_b|

Loss = loss_cls + 10⁻⁵ × loss_norm

where Label_i denotes the true label of an image, Predict_i the predicted label output by MobileNetV1, M the number of training samples, γ_b the scaling factor of one channel of one BN layer, and A the number of scaling factors over all BN layers in MobileNetV1, i.e. the sum of the channel counts of all BN layers in the network. Loss is the final loss value; it is back-propagated and the parameters θ_q of the MobileNetV1 network are updated.
The MobileNetV1 network is evaluated with the test set D_test; if the network parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. At the end of each parameter update it is checked whether the number of training iterations has reached the maximum of 150; if so, the training stage ends and the method proceeds to step S2; otherwise training continues with q = q + 1.

The network parameters θ_q are updated as follows:

W_l^{q+1} = W_l^q − η × ∂Loss/∂W_l^q

B_l^{q+1} = B_l^q − η × ∂Loss/∂B_l^q

where W_l^q and B_l^q denote the parameters of the l-th convolution layer and the parameters of the l-th BN layer among the model network parameters of the q-th iteration; W_l^{q+1} and B_l^{q+1} denote the network parameters of the (q+1)-th iteration obtained from the q-th update; η denotes the learning rate, 0.01, among the hyper-parameters; ∂Loss/∂W_l^q and ∂Loss/∂B_l^q denote the gradients of the corresponding convolution-layer parameters and BN-layer parameters, obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is computed. The samples of the test set D_test are fed as input to MobileNetV1 and computed layer by layer through the network to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the true labels Label_j of D_test as reference, the predicted labels Predict_j of P_test are compared one by one with the true labels Label_j of D_test, and the accuracy on D_test is computed. The test accuracy of the current MobileNetV1 parameters θ_q is denoted ACC_q, and the accuracy of the best-model parameters θ_best is denoted ACC_best (initialized to 0); if ACC_q > ACC_best, then θ_best = θ_q. After 150 training iterations the pre-trained MobileNetV1 with parameters θ_best is obtained.
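A minimal evaluation sketch matching this paragraph (the loader and model names are placeholders); θ_best is kept as the state dict with the highest test accuracy so far:

    import copy
    import torch

    @torch.no_grad()
    def accuracy(model, test_loader, device="cuda") -> float:
        model.eval()
        correct = total = 0
        for images, labels in test_loader:
            pred = model(images.to(device)).argmax(dim=1)
            correct += (pred == labels.to(device)).sum().item()
            total += labels.numel()
        return correct / total

    # acc_q = accuracy(model, test_loader)
    # if acc_q > acc_best:
    #     acc_best, theta_best = acc_q, copy.deepcopy(model.state_dict())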
Table 2: default BN layer and ReLU layer after each layer of convolution
Figure BDA0003200698500000126
Figure BDA0003200698500000131
In the above table, s1 indicates that the convolution kernel step size is 1, s2 indicates that the convolution kernel step size is 2, dw indicates the depth convolution, and no dw indicates the point-by-point convolution (except the first one indicating the standard convolution).
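One depthwise-separable row pair of Table 2 can be sketched as follows (an assumed PyTorch rendering; channel counts and strides are taken from the corresponding table rows, with BN and ReLU after each convolution as the caption states):

    import torch.nn as nn

    def dw_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
        return nn.Sequential(
            # depthwise: groups == in_ch, so input/output channels pair one-to-one
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # pointwise: 1x1 convolution mixing channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    # e.g. the first separable block of Table 2: dw_separable(32, 64, stride=1)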
Step S2, given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and the scaling factor of the BN layer, respectively, and both are scalars (i.e. single numbers). Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = (x − E(x)) / √(Var(x) + ε)

ŷ = γ × x̂ + β

where x and ŷ denote the input and the output of the BN layer; E(x) and Var(x) are statistics obtained during network training; ε prevents the denominator from being 0 and equals 10⁻⁵; γ and β are called the scaling factor and the offset factor of the BN layer, respectively.
The BN output ŷ has variance γ² and mean β. If Z = β + z×|γ| ≤ 0, then from a probabilistic point of view the output ŷ of the BN layer is less than or equal to 0 with very high probability; conversely, the probability that the BN output is less than 0 is small. Meanwhile, a ReLU layer follows the BN layer:

ReLU(x) = max(0, x)

Therefore, when ŷ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and network computation whose value is 0 can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 network parameters θ_best saved in step S1, the Z values of all channels of the BN layer in each layer are computed according to Z = β + z×|γ|, and the Z value of channel idx of layer l is stored in the array entry Z_l^idx, giving the array {Z_l^idx}.
Step S3, according to the array {Z_l^idx} obtained in step S2, the parameters θ_best of the MobileNetV1 trained in step S1 are pruned. The depthwise-separable convolution consists of two parts: 1) the depthwise convolution; 2) the pointwise convolution. For the channel pruning of the pointwise convolution, refer to the network slimming method (Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2736-2744.). For the depthwise convolution, the input channels and the output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel not pruned, output channel not pruned; 2) input channel not pruned, output channel pruned; 3) input channel pruned, output channel not pruned; 4) input channel pruned, output channel pruned.
In Table 1, for a given pair of input and output channels: if case 1) holds, both channels are important and no pruning is needed; if case 2) holds, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so it is pruned directly; if case 3) holds, the output of the input channel is 0 by the formula of step S2, the input of the output channel is fixed at 0 and the output channel outputs a constant, so it can be pruned, and the invention proceeds to step S4 to preserve the accuracy of the computed values; if case 4) holds, the input channel outputs 0 and the output channel outputs 0, so it is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
Step S4, the accuracy on the test set of the pruned MobileNetV1 parameters θ_p1 obtained in step S3 is usually lower than that of the parameters θ_best of step S1. This is because case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the related values directly and fuses them into the β of the next BN layer, reducing the error between the network's computed values before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The numerical computation involved in case 3) of Table 1 is as follows. For case 3) in Table 1, the output of layer l−1 is the input of layer l; for ease of explanation consider the k-th channel, so x_k^l = 0. The output of layer l is:

ŷ_k^l = γ_k^l × (x_k^l − E(x_k^l)) / √(Var(x_k^l) + ε) + β_k^l = β_k^l − γ_k^l × E(x_k^l) / √(Var(x_k^l) + ε)

where ŷ_k^l is the output of the k-th channel of the l-th BN layer in the MobileNetV1 with parameters θ_best; E(x_k^l) is the mean obtained during training and is a fixed constant in the test phase; Var(x_k^l) is likewise fixed, so ŷ_k^l is also a fixed constant scalar (it is eventually fused into the scalar β; because of broadcasting, ŷ_k^l can be treated as a scalar here). The constant ŷ_k^l enters the corresponding next-layer pointwise convolution as:

x^{l+1} = Σ_{k∈K1} W_k^{l+1} ∗ ŷ_k^l + Σ_{k∈K3} W_k^{l+1} × ŷ_k^l

where K1 and K3 are the sets of channel positions corresponding to cases 1) and 3) of Table 1; the outputs of cases 2) and 4) of Table 1 are 0 and can be omitted from the formula; W_k^{l+1} denotes the convolution weight, l the layer index and k the channel index (training is complete at this point and we are in the test phase, so the iteration index q of step S1 is dropped). At test time, in Σ_{k∈K1} W_k^{l+1} ∗ ŷ_k^l the inputs ŷ_k^l change with the network input, so forward computation must be performed in the pruned network just as in the unpruned one; in Σ_{k∈K3} W_k^{l+1} × ŷ_k^l, the weight W_k^{l+1} is a fixed constant and ŷ_k^l is also a fixed constant, so Σ_{k∈K3} W_k^{l+1} × ŷ_k^l is a fixed constant. Meanwhile, the (l+1)-th BN layer computes:

ŷ^{l+1} = γ^{l+1} × (x^{l+1} − E(x^{l+1})) / √(Var(x^{l+1}) + ε) + β^{l+1}

In case 3) of Table 1, the offset factor of the (l+1)-th BN layer without pruning is β^{l+1}. Under pruning, the invention fuses the constants of case 3) into the offset factor of the (l+1)-th BN layer; the new offset factor is

β_fusion^{l+1} = β^{l+1} + γ^{l+1} × (Σ_{k∈K3} W_k^{l+1} × ŷ_k^l) / √(Var(x^{l+1}) + ε)

where the subscript fusion marks the offset factor after fusion, and β_fusion^{l+1} replaces β^{l+1}. Changing the β parameter of the (l+1)-th BN layer in this way achieves, as far as possible, the same numerical results under pruning as the unpruned MobileNetV1 network. Therefore, through the above computation, the BN parameters in θ_p1 are updated with β_fusion^{l+1}, yielding the final pruned MobileNetV1 with parameters θ_p.
And step S5, save the final pruned network model.
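Putting steps S1 to S5 together, a high-level driver might look as follows; the helper names (pretrain, compute_Z, prune_channels, fuse_all_case3) are hypothetical stand-ins for the sketches above, not functions defined by the patent:

    import torch

    def prune_mobilenet_v1(model, train_loader, test_loader, z: float = 3.0):
        theta_best = pretrain(model, train_loader, test_loader)     # step S1
        model.load_state_dict(theta_best)
        masks = {name: (v >= 0)                                     # step S2
                 for name, v in compute_Z(model, z).items()}
        pruned = prune_channels(model, masks)                       # step S3
        fuse_all_case3(pruned, masks)                               # step S4
        torch.save(pruned.state_dict(), "mobilenet_v1_pruned.pth")  # step S5
        return pruned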
The embodiment of the invention has the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time of the whole algorithm, requiring only two processes: pre-training and pruning. Traditional methods add a third process, fine-tuning, to obtain better results. With only two processes, the invention can remove a considerable amount of computation while keeping accuracy on par with the pre-trained network.
2. Compared with existing network channel pruning methods, the method decides clearly, from a probabilistic point of view, whether a given channel needs to be pruned.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (2)

1. A probability-based MobileNet V1 network channel pruning method is characterized by comprising the following steps:
step S1, given a training set and a test set, during the training of MobileNetV1 computing, besides the cross-entropy loss loss_cls between the predicted labels and the true labels, the L1 loss loss_norm of the BN scaling factors; computing gradients with these two loss functions and updating the parameters of MobileNetV1, obtaining a pre-trained model;
step S2, given a parameter z ∈ [2,4], defining Z = β + z×|γ|, where β and γ are the trainable parameters of the BN layer: the offset factor and the scaling factor, respectively; computing Z for every channel of all BN layers in the pre-trained model of step S1;
step S3, for each Z output by step S2, pruning the channel if Z < 0 and otherwise not pruning; in MobileNetV1 the basic module is the depthwise-separable convolution, which comprises a depthwise convolution and a pointwise convolution; in the depthwise convolution the input channels and the output channels correspond one-to-one, so a given input channel and its output channel must be pruned or kept together, otherwise the convolution pattern would be broken; for a given pair of input and output channels there are 4 cases: 1) input channel not pruned, output channel not pruned; 2) input channel not pruned, output channel pruned; 3) input channel pruned, output channel not pruned; 4) input channel pruned, output channel pruned; performing channel pruning on the depthwise convolutions matching cases 2), 3) and 4), obtaining a preliminary pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant that is unaffected by the network input; fusing the related computation result into the offset factor of the next BN layer, which lessens the drop in accuracy of the pruned network, the fusion adding no extra parameters;
and step S5, outputting and saving the final pruned MobileNetV1 model.
2. The probability-based MobileNetV1 network channel pruning method according to claim 1, wherein in step S1 the given training set comprises images and corresponding class labels; the loss consists of two terms: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm; the total loss is the weighted sum Loss = loss_cls + 10⁻⁵ × loss_norm; the gradients required for back-propagation are computed from the total loss and the parameters of the MobileNetV1 model are updated, obtaining the pre-trained MobileNetV1 model.
CN202110903135.9A 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method Active CN113627595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Publications (2)

Publication Number Publication Date
CN113627595A true CN113627595A (en) 2021-11-09
CN113627595B CN113627595B (en) 2023-07-25

Family

ID=78383364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110903135.9A Active CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Country Status (1)

Country Link
CN (1) CN113627595B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764471A (en) * 2018-05-17 2018-11-06 西安电子科技大学 The neural network cross-layer pruning method of feature based redundancy analysis
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining
CN111291806A (en) * 2020-02-02 2020-06-16 西南交通大学 Identification method of label number of industrial product based on convolutional neural network
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training
KR102165273B1 (en) * 2019-04-02 2020-10-13 국방과학연구소 Method and system for channel pruning of compact neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764471A (en) * 2018-05-17 2018-11-06 西安电子科技大学 The neural network cross-layer pruning method of feature based redundancy analysis
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining
KR102165273B1 (en) * 2019-04-02 2020-10-13 국방과학연구소 Method and system for channel pruning of compact neural networks
CN111291806A (en) * 2020-02-02 2020-06-16 西南交通大学 Identification method of label number of industrial product based on convolutional neural network
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training

Also Published As

Publication number Publication date
CN113627595B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
WO2022141754A1 (en) Automatic pruning method and platform for general compression architecture of convolutional neural network
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN113435590B (en) Edge calculation-oriented searching method for heavy parameter neural network architecture
CN111259940A (en) Target detection method based on space attention map
US11610154B1 (en) Preventing overfitting of hyperparameters during training of network
CN112766399B (en) Self-adaptive neural network training method for image recognition
CN112381763A (en) Surface defect detection method
US8626676B2 (en) Regularized dual averaging method for stochastic and online learning
US11574193B2 (en) Method and system for training of neural networks using continuously differentiable models
Hebbal et al. Multi-objective optimization using deep Gaussian processes: application to aerospace vehicle design
CN114139683A (en) Neural network accelerator model quantization method
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN113627595A (en) Probability-based MobileNet V1 network channel pruning method
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
CN116681945A (en) Small sample class increment recognition method based on reinforcement learning
CN116579408A (en) Model pruning method and system based on redundancy of model structure
US20200372363A1 (en) Method of Training Artificial Neural Network Using Sparse Connectivity Learning
He et al. GA-based optimization of generative adversarial networks on stock price prediction
Simon et al. Towards a robust differentiable architecture search under label noise
CN114511069A (en) Method and system for improving performance of low bit quantization model
CN111652430A (en) Internet financial platform default rate prediction method and system
Wang et al. Exploring quantization in few-shot learning
US20230325664A1 (en) Method and apparatus for generating neural network
CN112052626B (en) Automatic design system and method for neural network
US11900238B1 (en) Removing nodes from machine-trained network based on introduction of probabilistic noise during training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant