CN113627595B - Probability-based MobileNet V1 network channel pruning method - Google Patents

Probability-based MobileNet V1 network channel pruning method

Info

Publication number
CN113627595B
CN113627595B (application CN202110903135.9A; earlier publication CN113627595A)
Authority
CN
China
Prior art keywords
pruning
channel
mobilenet
network
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110903135.9A
Other languages
Chinese (zh)
Other versions
CN113627595A (en)
Inventor
赵汉理
史开杰
潘飞
卢望龙
黄辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202110903135.9A priority Critical patent/CN113627595B/en
Publication of CN113627595A publication Critical patent/CN113627595A/en
Application granted granted Critical
Publication of CN113627595B publication Critical patent/CN113627595B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a probability-based MobileNetV1 network channel pruning method comprising three stages: pre-training, pruning and fusion. Pre-training stage: the network is trained with the cross-entropy loss plus an L1 loss on the BN scaling factors to obtain a pre-trained model. Pruning stage: exploiting the properties of the BN and ReLU layers built into the MobileNetV1 design, the probability that each BN channel's output is not greater than 0 is estimated, and channels for which this probability is high are pruned. Fusion stage: because the residual influence of a pruned channel on accuracy typically survives as a constant in the offset factor of the BN layer following the depthwise convolution, the invention fuses this constant into the offset factor of the next BN layer to obtain the final pruned network. By implementing the method, the time needed to obtain a pruned network is shortened and its computational cost is reduced, while the accuracy is kept as close as possible to that of the pre-trained network.

Description

Probability-based MobileNet V1 network channel pruning method
Technical Field
The invention relates to the field of neural network pruning algorithms, in particular to a probability-based MobileNet V1 pruning algorithm.
Background
Convolutional Neural Networks (CNNs) have attracted widespread industrial attention because they achieve very high recognition and detection accuracy in computer vision. However, the speed of CNN computation constrains hardware deployment, so accelerating neural network computation while retaining high accuracy is a very important problem. MobileNetV1 (see Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017) reduced the computational cost of neural networks by design. However, for a given task and a given network, not all channels matter for the output, and channels with little influence on the final result can be deleted. Currently popular pruning methods judge channel importance using only the scaling factor of the Batch Normalization (BN) layer present in the network design, without fully considering the BN offset factor or the architecture of the neural network; moreover, these methods require three stages (pre-training, pruning, and fine-tuning), which makes the whole pruning pipeline very time-consuming. In view of this, the channel-importance criterion of the present invention considers the BN scaling factor, the BN offset factor, and the ReLU layer following the BN layer simultaneously; and the computation carried by a pruned path is fused into the bias of the following convolution so that the fine-tuning stage can be removed. The invention uses the mathematical properties of the BN and ReLU layers commonly paired in network design to compute the probability that a given channel can be deleted for a given task. At the same time, since deleting channels can still degrade performance on the task, the invention proposes offset-factor fusion: for a pruned channel, its contribution to downstream computation is concentrated in the BN offset factor, and the computation involving this offset is fused into the offset factor of the next BN layer. Compared with not fusing, this yields a pruned MobileNetV1 model with higher accuracy, and the fusion requires no extra parameters. Finally, no fine-tuning stage is needed, which speeds up the whole pruning pipeline.
Disclosure of Invention
The technical problem to be solved by embodiments of the invention is to provide a probability-based MobileNetV1 network channel pruning method which makes full use of the BN and ReLU layers in the network design, selects channels to prune from a probabilistic point of view so that the channels that ought to be pruned are removed; meanwhile, the constants left in the depthwise convolutions after pruning are fused into the offset factor of the next BN layer; and the fine-tuning stage is removed, accelerating the whole pruning algorithm.
In order to solve the technical problems, the embodiment of the invention provides a probability-based MobileNetV1 network channel pruning method, which comprises the following steps:
step S1, giving a training set and a test set; during MobileNetV1 training, computing the cross-entropy loss loss_cls between predicted and ground-truth labels, and additionally the L1 loss loss_norm on the BN scaling factors; computing gradients from these two losses and updating the MobileNetV1 parameters to obtain a pre-trained model;
step S2, given a parameter z ∈ [2,4], defining Z = β + z×|γ|, where β and γ are the trainable offset factor and scaling factor of a BN layer; computing Z for every channel of all BN layers of the pre-trained model of step S1;
step S3, for each Z output in step S2, pruning the channel if Z is less than 0 and otherwise keeping it. In MobileNetV1 the basic module is the depthwise separable convolution, which consists of a depthwise convolution and a pointwise convolution. A depthwise convolution maps input channels to output channels one-to-one, so a paired input and output channel must be pruned, or kept, together, otherwise the convolution pattern is destroyed. For a pair of input and output channels there are 4 cases: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned. Channel pruning is applied to depthwise convolutions matching cases 2), 3) and 4), yielding a preliminary pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant unaffected by the network input; the result of this constant computation is fused into the offset factor of the next BN layer to reduce the accuracy drop of the pruned network, and the fusion introduces no extra parameters;
step S5, outputting and saving the final pruned MobileNetV1 model.
As a further refinement, in step S1 the given training set comprises images and their corresponding class labels. Two losses are computed: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm. The total loss is the weighted sum Loss = loss_cls + 10^-5 × loss_norm. The gradients required for back-propagation are computed from the total loss and the MobileNetV1 parameters are updated, yielding the final pre-trained MobileNetV1 model.
As a further improvement, step S1 gives a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]}, where Image_i is the i-th training sample, Label_i the ground-truth label of the i-th training sample, Image_j the j-th test sample, Label_j the ground-truth label of the j-th test sample, M the number of samples in D_train and N the number of samples in D_test. The parameters of the given MobileNetV1 network and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 state comprises the iteration count q, the network parameters θ_q = {(W_q^l, B_q^l)} and the best-model parameters θ_best, where l indexes the network layer, W denotes the parameters of the corresponding convolution layer, and B the learnable BN parameters, namely the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolution layer at iteration q, and B_q^l the learnable parameters of the l-th BN layer at iteration q. The iteration count q is initialized to 1 and increased by 1 each epoch, for 150 epochs in total; θ_q is initialized to θ_1, and θ_best is initialized to θ_1. The SGD optimizer is initialized with learning rate 0.01, momentum 0.9 and weight decay 4×10^-5.
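A minimal PyTorch sketch of this initialization, following the note in the detailed embodiment that weight decay applies to convolution parameters but not BN parameters; the function name make_optimizer is illustrative:

```python
import torch

def make_optimizer(model):
    """SGD setup per step S1: lr 0.01, momentum 0.9, weight decay 4e-5
    on convolution (and FC) weights only, no decay on BN parameters."""
    bn_params, conv_params = [], []
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            bn_params += list(m.parameters())
        elif isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            conv_params += list(m.parameters())
    return torch.optim.SGD(
        [{"params": conv_params, "weight_decay": 4e-5},
         {"params": bn_params, "weight_decay": 0.0}],
        lr=0.01, momentum=0.9)
```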
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 and a forward pass produces the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the MobileNetV1 prediction for training sample Image_i.
According to the preset cross-entropy loss function and L1 loss function, the error between the predictions Predict_i and the ground-truth labels Label_i on D_train gives the cross-entropy loss value; the L1 loss value is obtained from the scaling factors γ of all BN layers in MobileNetV1 (loss_norm depends only on the BN scaling factors γ). The two values are added to obtain the final loss value, which is back-propagated to adjust the MobileNetV1 parameters θ_q. The loss formulas are:

loss_cls = -(1/M) × Σ_{i=1}^{M} log(Predict_i[Label_i])
loss_norm = Σ_{b=1}^{A} |γ_b|
Loss = loss_cls + 10^-5 × loss_norm

where Label_i is the ground-truth label of image i, Predict_i the prediction output by MobileNetV1, M the number of training samples, γ_b the scaling factor of a given channel of a given BN layer (γ and β are scalars in the present invention; a subscript is added only where needed), and A the total number of BN scaling factors, i.e. the sum of the channel counts of all BN layers in the MobileNetV1 network. Loss is the final loss value; back-propagation through Loss updates the MobileNetV1 parameters θ_q.
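A minimal sketch of this total loss in PyTorch, assuming the BN layers are nn.BatchNorm2d modules; total_loss and the lam argument are illustrative names:

```python
import torch
import torch.nn as nn

def total_loss(model, logits, labels, lam=1e-5):
    """Loss = loss_cls + 1e-5 * loss_norm, where loss_norm is the L1 norm
    of all BN scaling factors (the BN .weight tensors)."""
    loss_cls = nn.functional.cross_entropy(logits, labels)
    loss_norm = sum(m.weight.abs().sum()
                    for m in model.modules()
                    if isinstance(m, nn.BatchNorm2d))
    return loss_cls + lam * loss_norm
```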
The MobileNetV1 network is evaluated with the test set D_test; if the current parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. Meanwhile, at the end of each parameter update, check whether the training iteration count has reached the maximum of 150; if so, the training stage ends and step S2 begins; otherwise training continues with q = q + 1.
The network parameters θ_q are updated as follows:

W_{q+1}^l = W_q^l - η × ∂Loss/∂W_q^l
B_{q+1}^l = B_q^l - η × ∂Loss/∂B_q^l

where W_q^l and B_q^l denote the parameters of the l-th convolution layer and of the l-th BN layer in the model at iteration q; the parameters at iteration q+1 are obtained by updating those at iteration q; η is the learning rate 0.01 among the hyper-parameters; and the gradients ∂Loss/∂W_q^l and ∂Loss/∂B_q^l of the convolution-layer and BN-layer parameters are obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is then computed. The samples of D_test are taken as MobileNetV1 inputs and propagated layer by layer to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the ground-truth labels Label_j of D_test as reference, the predictions Predict_j in P_test are compared one by one with the Label_j in D_test to compute the accuracy on D_test. Let ACC_q denote the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) that of the best-model parameters θ_best; if ACC_q > ACC_best, set θ_best = θ_q. After 150 epochs the pre-trained MobileNetV1 with parameters θ_best is obtained.
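A short sketch of this accuracy evaluation, assuming a standard PyTorch DataLoader over D_test; evaluate is an illustrative name:

```python
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu"):
    """Top-1 accuracy on D_test; the best-scoring parameters become theta_best."""
    model.eval()
    correct = total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        predict = model(images).argmax(dim=1)   # Predict_j
        correct += (predict == labels).sum().item()
        total += labels.numel()
    return correct / total
```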
Step S2: given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and scaling factor of a BN layer. Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = γ × (x - E(x)) / √(Var(x) + ε) + β

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics accumulated during network training, and ε prevents a zero denominator, with value 10^-5. γ and β are called the scaling factor and offset factor of the BN layer. Viewed as a random variable, x̂ has variance γ² and mean β. If Z ≤ 0, then in probabilistic terms the BN output x̂ is less than or equal to 0 with high probability; conversely, if Z > 0 the probability that the BN output is below 0 is not large. Meanwhile, the BN layer is followed by a ReLU layer, whose formula is:

ReLU(x̂) = max(0, x̂)
Then, when x̂ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and any network computation in which this zero value participates can be pruned; otherwise, no pruning is performed.
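For intuition, the bound implied by the Gaussian view above can be checked numerically; the following sketch (prob_nonpositive is an illustrative name) computes P(x̂ ≤ 0) = Φ(-β/|γ|), with Φ the standard normal CDF:

```python
import math

def prob_nonpositive(beta, gamma):
    """P(BN output <= 0) for a channel viewed as N(beta, gamma^2)."""
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return phi(-beta / abs(gamma))

# If Z = beta + z*|gamma| <= 0 with z = 2, then P(output <= 0) >= Phi(2) ~ 0.977;
# with z = 4 the bound rises to ~0.99997, so the ReLU almost surely outputs 0.
print(prob_nonpositive(beta=-2.5, gamma=1.0))  # ~0.9938
```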
For the MobileNetV1 parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z×|γ|, and the Z value of the idx-th channel of the l-th layer is stored in the array entry Z^l_idx, giving the array {Z^l_idx}.
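A sketch of this Z computation over all BN layers of a PyTorch model; compute_z_scores and the default z value are illustrative:

```python
import torch

def compute_z_scores(model, z=3.0):
    """Z = beta + z * |gamma| per BN channel; Z <= 0 marks a prune candidate.
    In nn.BatchNorm2d, .weight is gamma and .bias is beta."""
    scores = {}
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            scores[name] = m.bias.detach() + z * m.weight.detach().abs()
    return scores
```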
Step S3: using the array {Z^l_idx} obtained in step S2, the MobileNetV1 parameters θ_best trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) a depthwise convolution; 2) a pointwise convolution. For the method of pruning the channels of the pointwise convolution, see Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2736-2744. For the depthwise convolution, input channels and output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned.
Table 1: the 4 cases involved in depthwise-convolution pruning

Case | Input channel | Output channel | Action
1) | kept (Z_in > 0) | kept (Z_out > 0) | no pruning
2) | kept (Z_in > 0) | pruned (Z_out ≤ 0) | prune the pair
3) | pruned (Z_in ≤ 0) | kept (Z_out > 0) | prune the pair and fuse the constant (step S4)
4) | pruned (Z_in ≤ 0) | pruned (Z_out ≤ 0) | prune the pair
In Table 1, for a given pair of input and output channels: if case 1) applies, both channels are important and no pruning is needed; if case 2) applies, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; if case 3) applies, then by the formula of step S2 the input channel outputs 0, the output channel receives a fixed 0 input, and its output is a constant, so the pair can be pruned, with step S4 applied to preserve the accuracy of the computed values; if case 4) applies, both channels output 0 and the pair is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
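The four cases reduce to simple mask logic on the Z scores of the BN layers before and after the depthwise convolution; a sketch under that assumption (function and variable names illustrative):

```python
def depthwise_prune_masks(z_in, z_out):
    """Given per-channel Z scores of the BN before (input) and after (output)
    a depthwise conv, keep a pair only in case 1 (both Z > 0).  Cases 2 and 4
    are pruned outright; case-3 channels are also pruned but remembered so
    their constant output can be fused into the next BN (step S4)."""
    keep_in = z_in > 0
    keep_out = z_out > 0
    keep = keep_in & keep_out           # case 1: keep the pair
    case3 = (~keep_in) & keep_out       # case 3: prune, then fuse the constant
    return keep, case3
```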
Step S4: the pruned MobileNetV1 parameters θ_p1 obtained in step S3 will usually score lower on the test set than the network parameters θ_best of step S1. This is because, during pruning, case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the affected values directly and fuses them into the β of the next BN layer, reducing the numerical error between the network before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The computation for case 3) of Table 1 is as follows. The output of layer l is the input of layer l+1; for convenience of explanation, assume the affected channel is the k-th, so the output of the l-th layer on channel k is

x̂_k^l = γ_k^l × (0 - E_k^l) / √(Var_k^l + ε) + β_k^l

where x̂_k^l is, under parameters θ_best, the output of the k-th channel of the l-th BN layer in MobileNetV1, its input being 0 because the corresponding input channel was pruned. E_k^l is the mean obtained during training and is a fixed constant at test time; Var_k^l likewise, so x̂_k^l is also a fixed constant. The constant ReLU(x̂_k^l) enters the corresponding next-layer pointwise convolution, whose output channel j computes

y_j^{l+1} = Σ_{k∈K1} W_{j,k}^{l+1} × ReLU(x̂_k^l) + Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l)

where K1 and K3 are the sets of channel positions in cases 1) and 3) of Table 1 (cases 2) and 4) output 0 and can be omitted from the sum); W^{l+1} denotes the convolution weights, l the layer index and k the channel index (training is finished at this point, so the subscript q of step S1 is dropped). At test time, the K1 terms vary with the input and must still be computed in the pruned network, whereas for k ∈ K3 both x̂_k^l and W_{j,k}^{l+1} are fixed, so c_j = Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l) is a fixed constant. Meanwhile, the BN of layer l+1 computes

x̂^{l+1} = γ^{l+1} × (x^{l+1} - E^{l+1}) / √(Var^{l+1} + ε) + β^{l+1}

In case 3) of Table 1, without pruning the offset factor of the layer-(l+1) BN is β^{l+1}. With pruning, the invention fuses the case-3) constant into this offset factor, the new offset factor for output channel j being

β_fusion,j^{l+1} = β_j^{l+1} + γ_j^{l+1} × c_j / √(Var_j^{l+1} + ε)

where the subscript fusion marks the offset factor after fusion, which replaces β^{l+1}. By changing the β parameter of the layer-(l+1) BN in this way, the pruned network reproduces, as far as possible, the same numerical results as the unpruned MobileNetV1. With the above computation, the BN parameters in θ_p1 are updated to their fused values, giving the final pruned MobileNetV1 with parameters θ_p.
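A sketch of this fusion for one depthwise/pointwise pair in PyTorch, assuming BN modules with running statistics; fuse_case3_constant and its arguments are illustrative names:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse_case3_constant(bn_dw, case3_idx, pw_weight, bn_next):
    """Fold the constant output of case-3 depthwise channels into the next BN.
    bn_dw: BN after the depthwise conv; case3_idx: indices of case-3 channels;
    pw_weight: pointwise weights of shape (C_out, C_in, 1, 1);
    bn_next: BN after the pointwise conv."""
    k = case3_idx
    # BN applied to a zero input on channel k gives a fixed constant.
    c = bn_dw.bias[k] - bn_dw.weight[k] * bn_dw.running_mean[k] \
        / torch.sqrt(bn_dw.running_var[k] + bn_dw.eps)
    r = F.relu(c)                                      # constant after ReLU
    contrib = (pw_weight[:, k, 0, 0] * r).sum(dim=1)   # c_j per output channel
    # beta_fusion = beta + gamma * c_j / sqrt(Var + eps)
    bn_next.bias.add_(bn_next.weight * contrib
                      / torch.sqrt(bn_next.running_var + bn_next.eps))
```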
Step S5: the final pruned network model is saved.
Embodiments of the invention have the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time consumed by the whole pruning algorithm, needing only two stages: pre-training and pruning. Conventional methods add a third stage, fine-tuning, to reach a good result. With only two stages, the invention removes a considerable amount of computation while keeping accuracy at the level of the pre-trained network.
2. Compared with existing network channel pruning methods, the method judges whether a channel needs pruning from a probabilistic point of view, which is more principled and more interpretable than methods based only on the magnitude of the scaling factor.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings required in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below are only some embodiments of the invention, and one skilled in the art could obtain other drawings from them without inventive effort; such drawings also fall within the scope of the invention.
Fig. 1 is a flowchart of the probability-based MobileNetV1 network channel pruning method according to an embodiment of the present invention.
Detailed Description of the Preferred Embodiments
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in fig. 1, in an embodiment of the present invention, a probability-based MobileNetV1 network channel pruning method is provided, where the method includes the following steps:
Step S1: a training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} and a test set D_test = {(Image_j, Label_j) | j ∈ [1, N]} are given, where Image_i is the i-th training sample, Label_i the ground-truth label of the i-th training sample, Image_j the j-th test sample and Label_j the ground-truth label of the j-th test sample. All images have dimensions 3×224×224 (3 is the number of channels, the first 224 the image height and the second 224 the image width; the batch dimension is ignored here, as it does not affect the operations) and all labels have dimension 1000 (the number of classes; the batch dimension is again ignored). M is the number of samples in D_train and N the number in D_test. The parameters of the given MobileNetV1 network (structure shown in Table 2) and of the stochastic gradient descent optimizer SGD are initialized. The MobileNetV1 state comprises the iteration count q, the network parameters θ_q = {(W_q^l, B_q^l)} and the best-model parameters θ_best, where l indexes the network layer, W denotes the parameters of the corresponding convolution layer and B the learnable BN parameters, namely the scaling factor γ and the offset factor β; W_q^l denotes the parameters of the l-th convolution layer at iteration q and B_q^l the learnable parameters of the l-th BN layer at iteration q. The iteration count q starts at 1 and increases by 1 per epoch, for 150 epochs in total; θ_q is initialized to θ_1 and θ_best to θ_1. The SGD optimizer is initialized with learning rate 0.01, momentum 0.9 and weight decay 4×10^-5 (the convolution parameters require weight decay, the BN parameters do not).
For a given iteration q, the samples of the training set D_train = {(Image_i, Label_i) | i ∈ [1, M]} are input into MobileNetV1 and a forward pass produces the corresponding prediction set P_train = {(Image_i, Predict_i) | i ∈ [1, M]}, where Predict_i is the MobileNetV1 prediction for training sample Image_i.
The loss function is computed. According to the preset cross-entropy loss function and L1 loss function, the error between the predictions Predict_i and the ground-truth labels Label_i on D_train gives the cross-entropy loss value; the L1 loss value is obtained from the scaling factors γ of all BN layers in MobileNetV1 (loss_norm depends only on the BN scaling factors γ). Adding the two values gives the final loss value, which is back-propagated to adjust the MobileNetV1 parameters θ_q. The loss formulas are:

loss_cls = -(1/M) × Σ_{i=1}^{M} log(Predict_i[Label_i])
loss_norm = Σ_{b=1}^{A} |γ_b|
Loss = loss_cls + 10^-5 × loss_norm

where Label_i is the ground-truth label of image i, Predict_i the prediction output by MobileNetV1, M the number of training samples, γ_b the scaling factor of a given channel of a given BN layer, and A the total number of BN scaling factors, i.e. the sum of the channel counts of all BN layers in the MobileNetV1 network. Loss is the final loss value; back-propagation through Loss updates the MobileNetV1 parameters θ_q.
The MobileNetV1 network is evaluated with the test set D_test; if the current parameters θ_q achieve the highest test accuracy so far, set θ_best = θ_q. Meanwhile, at the end of each parameter update, check whether the training iteration count has reached the maximum of 150; if so, the training stage ends and step S2 begins; otherwise training continues with q = q + 1.
The network parameters θ_q are updated as follows:

W_{q+1}^l = W_q^l - η × ∂Loss/∂W_q^l
B_{q+1}^l = B_q^l - η × ∂Loss/∂B_q^l

where W_q^l and B_q^l denote the parameters of the l-th convolution layer and of the l-th BN layer in the model at iteration q; the parameters at iteration q+1 are obtained by updating those at iteration q; η is the learning rate 0.01 among the hyper-parameters; and the gradients ∂Loss/∂W_q^l and ∂Loss/∂B_q^l of the convolution-layer and BN-layer parameters are obtained by the chain rule.
The accuracy of the MobileNetV1 network on the test set is then computed. The samples of D_test are taken as MobileNetV1 inputs and propagated layer by layer to obtain the prediction set P_test = {(Image_j, Predict_j) | j ∈ [1, N]}. Taking the ground-truth labels Label_j of D_test as reference, the predictions Predict_j in P_test are compared one by one with the Label_j in D_test to compute the accuracy on D_test. Let ACC_q denote the test accuracy of the current MobileNetV1 parameters θ_q and ACC_best (initialized to 0) that of the best-model parameters θ_best; if ACC_q > ACC_best, set θ_best = θ_q. After 150 epochs the pre-trained MobileNetV1 with parameters θ_best is obtained.
Table 2: default BN and ReLU layers after each layer of convolution
In the above table, s1 denotes a convolution stride of 1, s2 a stride of 2, and dw a depthwise convolution; entries without dw are pointwise convolutions (except the first, which is a standard convolution).
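For reference, one block of this structure can be sketched in PyTorch as follows; depthwise_separable is an illustrative name:

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out, stride):
    """One MobileNetV1 block: 3x3 depthwise conv (groups=c_in) + BN + ReLU,
    then 1x1 pointwise conv + BN + ReLU, matching the dw / s1-s2 notation."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                  groups=c_in, bias=False),
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```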
Step S2: given z ∈ [2,4], let Z = β + z×|γ|, where β and γ are the offset factor and scaling factor of a BN layer, both scalars (i.e. each is just one number). Z is computed for all BN layers of the MobileNetV1 model parameters θ_best obtained in step S1.
The principle is as follows. The BN layer computes:

x̂ = γ × (x - E(x)) / √(Var(x) + ε) + β

where x and x̂ are the input and output of the BN layer, E(x) and Var(x) are statistics accumulated during network training, and ε prevents a zero denominator, with value 10^-5. γ and β are called the scaling factor and offset factor of the BN layer. Viewed as a random variable, x̂ has variance γ² and mean β. If Z ≤ 0, then in probabilistic terms the BN output x̂ is less than or equal to 0 with high probability; conversely, if Z > 0 the probability that the BN output is below 0 is not large. Meanwhile, the BN layer is followed by a ReLU layer, whose formula is:

ReLU(x̂) = max(0, x̂)
Then, when x̂ is less than or equal to 0 with high probability, the ReLU outputs 0 with the same probability, and any network computation in which this zero value participates can be pruned; otherwise, no pruning is performed.
For the MobileNetV1 parameters θ_best saved in step S1, the Z values of all channels of every BN layer are computed according to Z = β + z×|γ|, and the Z value of the idx-th channel of the l-th layer is stored in the array entry Z^l_idx, giving the array {Z^l_idx}.
Step S3: using the array {Z^l_idx} obtained in step S2, the MobileNetV1 parameters θ_best trained in step S1 are pruned. The depthwise separable convolution consists of two parts: 1) a depthwise convolution; 2) a pointwise convolution. For the method of pruning the channels of the pointwise convolution, see Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2736-2744. For the depthwise convolution, input channels and output channels correspond one-to-one; as shown in Table 1, under the Z criterion of step S2 there are 4 cases in total: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned.
In Table 1, for a given pair of input and output channels: if case 1) applies, both channels are important and no pruning is needed; if case 2) applies, then no matter how important the input channel is, the output channel is unimportant and its output is 0, so the pair is pruned directly; if case 3) applies, then by the formula of step S2 the input channel outputs 0, the output channel receives a fixed 0 input, and its output is a constant, so the pair can be pruned, with step S4 applied to preserve the accuracy of the computed values; if case 4) applies, both channels output 0 and the pair is pruned directly. This yields the preliminary pruned MobileNetV1 with parameters θ_p1.
Step S4: the pruned MobileNetV1 parameters θ_p1 obtained in step S3 will usually score lower on the test set than the network parameters θ_best of step S1. This is because, during pruning, case 3) of Table 1 was pruned directly, which introduces errors between the values computed by the MobileNetV1 network before and after pruning. The invention computes the affected values directly and fuses them into the β of the next BN layer, reducing the numerical error between the network before and after pruning, so that the accuracy of the pruned network on the test set can approach that of θ_best.
The computation for case 3) of Table 1 is as follows. The output of layer l is the input of layer l+1; for convenience of explanation, assume the affected channel is the k-th, so the output of the l-th layer on channel k is

x̂_k^l = γ_k^l × (0 - E_k^l) / √(Var_k^l + ε) + β_k^l

where x̂_k^l is, under parameters θ_best, the output of the k-th channel of the l-th BN layer in MobileNetV1, its input being 0 because the corresponding input channel was pruned. E_k^l is the mean obtained during training and is a fixed constant at test time; Var_k^l likewise, so x̂_k^l is also a fixed constant (it is ultimately fused into the scalar β, so it can be treated as a scalar here). The constant ReLU(x̂_k^l) enters the corresponding next-layer pointwise convolution, whose output channel j computes

y_j^{l+1} = Σ_{k∈K1} W_{j,k}^{l+1} × ReLU(x̂_k^l) + Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l)

where K1 and K3 are the sets of channel positions in cases 1) and 3) of Table 1 (cases 2) and 4) output 0 and can be omitted from the sum); W^{l+1} denotes the convolution weights, l the layer index and k the channel index (training is finished at this point, so the subscript q of step S1 is dropped). At test time, the K1 terms vary with the input, so they are not pruned and must still be computed by a forward pass in the pruned network; for k ∈ K3, both x̂_k^l and W_{j,k}^{l+1} are fixed, so c_j = Σ_{k∈K3} W_{j,k}^{l+1} × ReLU(x̂_k^l) is a fixed constant. Meanwhile, the BN of layer l+1 computes

x̂^{l+1} = γ^{l+1} × (x^{l+1} - E^{l+1}) / √(Var^{l+1} + ε) + β^{l+1}

In case 3) of Table 1, without pruning the offset factor of the layer-(l+1) BN is β^{l+1}. With pruning, the invention fuses the case-3) constant into this offset factor, the new offset factor for output channel j being

β_fusion,j^{l+1} = β_j^{l+1} + γ_j^{l+1} × c_j / √(Var_j^{l+1} + ε)

where the subscript fusion marks the offset factor after fusion, which replaces β^{l+1}. By changing the β parameter of the layer-(l+1) BN in this way, the pruned network reproduces, as far as possible, the same numerical results as the unpruned MobileNetV1. With the above computation, the BN parameters in θ_p1 are updated to their fused values, giving the final pruned MobileNetV1 with parameters θ_p.
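As a usage note, the effect of the fusion can be sanity-checked by comparing the outputs of the unpruned network θ_best and the fused pruned network θ_p on random inputs; a sketch under these assumptions (fusion_gap is an illustrative name):

```python
import torch

@torch.no_grad()
def fusion_gap(model_ref, model_pruned, n=8):
    """Compare theta_best and theta_p outputs.  The gap should be small but
    need not be exactly zero: beta-fusion restores the case-3 constants
    exactly, while cases 2) and 4) drop channels whose output is only
    almost surely zero."""
    model_ref.eval(); model_pruned.eval()
    x = torch.randn(n, 3, 224, 224)
    return (model_ref(x) - model_pruned(x)).abs().max().item()
```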
Step S5: the final pruned network model is saved.
Embodiments of the invention have the following beneficial effects:
1. Compared with existing network channel pruning methods, the method reduces the time of the whole algorithm, needing only two stages: pre-training and pruning. Conventional methods add a third stage, fine-tuning, to reach a good result. With only two stages, the invention removes a considerable amount of computation while keeping accuracy at the level of the pre-trained network.
2. Compared with existing network channel pruning methods, the method judges whether a channel needs pruning from a probabilistic point of view.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by a program instructing related hardware, the program being stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk.
The above disclosure is only a preferred embodiment of the present invention and certainly does not limit the scope of the invention; equivalent changes made according to the claims of the present invention still fall within the scope of the invention.

Claims (1)

1. A probability-based MobileNetV1 network channel pruning method, the method comprising the steps of:
step S1, giving a training set and a test set; during MobileNetV1 training, computing the cross-entropy loss loss_cls between predicted and ground-truth labels, and additionally the L1 loss loss_norm on the BN scaling factors; computing gradients from these two losses and updating the MobileNetV1 parameters to obtain a pre-trained model;
step S2, given a parameter z ∈ [2,4], defining Z = β + z×|γ|, where β and γ are the trainable offset factor and scaling factor of a BN layer; computing Z for every channel of all BN layers of the pre-trained model of step S1;
step S3, for each Z output in step S2, pruning the channel if Z is less than 0 and otherwise keeping it; in MobileNetV1 the basic module is the depthwise separable convolution, comprising a depthwise convolution and a pointwise convolution; a depthwise convolution maps input channels to output channels one-to-one, so a paired input and output channel must be pruned, or kept, together, otherwise the convolution pattern is destroyed; for a pair of input and output channels there are 4 cases: 1) input channel kept, output channel kept; 2) input channel kept, output channel pruned; 3) input channel pruned, output channel kept; 4) input channel pruned, output channel pruned; applying channel pruning to depthwise convolutions matching cases 2), 3) and 4) to obtain a preliminary pruned MobileNetV1 model;
step S4, for case 3) of step S3, the output channel outputs a constant unaffected by the network input; fusing the result of this constant computation into the offset factor of the next BN layer to reduce the accuracy drop of the pruned network, the fusion introducing no extra parameters;
step S5, outputting and saving the final pruned MobileNetV1 model;
in said step S1, the given training set comprises images and their corresponding class labels; two losses are computed: 1) the cross-entropy loss loss_cls; 2) the L1 loss loss_norm; the total loss is the weighted sum Loss = loss_cls + 10^-5 × loss_norm; gradients for back-propagation are computed from the total loss and the MobileNetV1 parameters are updated to obtain the pre-trained MobileNetV1 model.
CN202110903135.9A 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method Active CN113627595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110903135.9A CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Publications (2)

Publication Number Publication Date
CN113627595A CN113627595A (en) 2021-11-09
CN113627595B true CN113627595B (en) 2023-07-25

Family

ID=78383364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110903135.9A Active CN113627595B (en) 2021-08-06 2021-08-06 Probability-based MobileNet V1 network channel pruning method

Country Status (1)

Country Link
CN (1) CN113627595B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764471A (en) * 2018-05-17 2018-11-06 西安电子科技大学 The neural network cross-layer pruning method of feature based redundancy analysis
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining
CN111291806A (en) * 2020-02-02 2020-06-16 西南交通大学 Identification method of label number of industrial product based on convolutional neural network
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training
KR102165273B1 (en) * 2019-04-02 2020-10-13 국방과학연구소 Method and system for channel pruning of compact neural networks

Also Published As

Publication number Publication date
CN113627595A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
US11093826B2 (en) Efficient determination of optimized learning settings of neural networks
CN109034205B (en) Image classification method based on direct-push type semi-supervised deep learning
WO2022141754A1 (en) Automatic pruning method and platform for general compression architecture of convolutional neural network
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN113052239B (en) Image classification method and system of neural network based on gradient direction parameter optimization
CN111259940A (en) Target detection method based on space attention map
US11610154B1 (en) Preventing overfitting of hyperparameters during training of network
Hebbal et al. Multi-objective optimization using deep Gaussian processes: application to aerospace vehicle design
CN112381763A (en) Surface defect detection method
Tang et al. On training recurrent networks with truncated backpropagation through time in speech recognition
US11574193B2 (en) Method and system for training of neural networks using continuously differentiable models
CN114937204A (en) Lightweight multi-feature aggregated neural network remote sensing change detection method
CN115809624B (en) Automatic analysis design method for integrated circuit microstrip line transmission line
CN112818893A (en) Lightweight open-set landmark identification method facing mobile terminal
CN112507114A (en) Multi-input LSTM-CNN text classification method and system based on word attention mechanism
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN113627595B (en) Probability-based MobileNet V1 network channel pruning method
Li et al. Filter pruning via probabilistic model-based optimization for accelerating deep convolutional neural networks
US20200372363A1 (en) Method of Training Artificial Neural Network Using Sparse Connectivity Learning
He et al. GA-based optimization of generative adversarial networks on stock price prediction
WO2022104271A1 (en) Automatic early-exiting machine learning models
Ni et al. Linear Range in Gradient Descent
CN112052626B (en) Automatic design system and method for neural network
US11900238B1 (en) Removing nodes from machine-trained network based on introduction of probabilistic noise during training
Sri et al. Facial emotion recognition using dcnn algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant