CN114913441A - Channel pruning method, target detection method and remote sensing image vehicle detection method - Google Patents


Info

Publication number
CN114913441A
CN114913441A
Authority
CN
China
Prior art keywords
convolution
channel
model
decoupling
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210738608.9A
Other languages
Chinese (zh)
Other versions
CN114913441B (en)
Inventor
方乐缘
朱定舜
吴洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202210738608.9A priority Critical patent/CN114913441B/en
Publication of CN114913441A publication Critical patent/CN114913441A/en
Application granted granted Critical
Publication of CN114913441B publication Critical patent/CN114913441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/17 - Terrestrial scenes taken from planes or by drones
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 - Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a channel pruning method comprising the steps of determining a target network model; training the target network model to obtain a basic network model; equivalently decoupling the convolution layers of the basic network model to obtain a basic network decoupling model; training the basic network decoupling model to obtain a decoupling model; determining the channels that can finally be compressed and the channels to be reserved; and equivalently merging the decoupling model to obtain the network model after channel pruning, completing the final channel pruning. The invention also discloses a target detection method comprising the channel pruning method, and a remote sensing image vehicle detection method comprising the target detection method. The convolution layers in the model are equivalently decoupled into a cascade of the original convolution and a structural convolution, the two parts are trained separately and equivalently merged back into the original network, and channels are finally cut according to the parameters of the structural convolution. The method therefore retains the original accuracy of the model while achieving a high compression rate and good reliability.

Description

Channel pruning method, target detection method and remote sensing image vehicle detection method
Technical Field
The invention belongs to the field of digital signal processing, and particularly relates to a channel pruning method, a target detection method and a remote sensing image vehicle detection method.
Background
With the development of economic technology and the improvement of people's living standards, target detection technology has been widely applied in production and daily life, bringing great convenience. Ensuring the accuracy and speed of target detection has therefore become a key focus of research on target detection technology.
At present, target detection with unmanned aerial vehicles (UAVs) is already in widespread use. Unlike offline target detection, target detection on edge devices such as UAVs must detect targets in the captured images in real time. However, such platforms are limited in computing power, memory and power consumption, so deep-learning-based target detection methods generally cannot be deployed in real time; high-precision, lightweight target detection is therefore particularly important for edge devices such as UAVs.
To enable real-time deployment of deep neural networks on the device side, researchers have conducted extensive research on model compression methods, whose purpose is to simplify a model so as to reduce its computation and storage requirements without affecting its performance. Channel pruning is an important model compression method: the structure of the model does not need to be redefined, and the model size is reduced by directly deleting redundant channels, which shortens the training time of the deep neural network and accelerates model inference. Channel pruning makes it possible to deploy deep-learning target detection methods on edge devices.
However, the performance of a deep neural network is closely related to the number of convolution channels, and pruning those channels may degrade the performance of the model to some extent, so a trade-off must be made between the degree of pruning and performance. In traditional model pruning, every parameter participates in training and pruning simultaneously, i.e. accuracy training and pruning training are coupled. On the one hand, the weight penalty term introduced in pruning training (such as structured sparsity) changes the optimization objective of the model and can severely degrade the performance of the deep neural network during training; on the other hand, if the pruning constraint is relaxed to preserve model performance, the degree of pruning cannot be guaranteed and a pruned model with a high compression rate cannot be obtained.
Disclosure of Invention
The invention aims to provide a channel pruning method which is high in compression rate and good in reliability and can keep the original precision of a model.
It is a further object of the present invention to provide a method of object detection comprising said method of channel pruning.
The invention also aims to provide a remote sensing image vehicle detection method comprising the target detection method.
The channel pruning method provided by the invention comprises the following steps:
S1, determining a target network model;
S2, acquiring a training data set and a loss function, and training the target network model determined in step S1 with the acquired training data set and loss function to obtain a basic network model;
S3, equivalently decoupling the convolution layers of the basic network model obtained in step S2 to obtain a basic network decoupling model;
S4, training the basic network decoupling model obtained in step S3 with the training data set and loss function obtained in step S2 to obtain a decoupling model;
S5, determining the channels that can finally be compressed and the channels to be reserved according to the decoupling model obtained in step S4;
S6, equivalently merging the decoupling model obtained in step S4 according to the compressible and reserved channels determined in step S5 to obtain the network model after channel pruning, completing the channel pruning of the final target network model.
The acquiring of the training data set in step S2 specifically includes the following steps:
acquiring training pictures; carrying out random multi-scale transformation on the obtained training pictures; after transformation, randomly flipping left and right with a set probability; finally, unifying the picture sizes by padding with gray values;
arranging the labels into a unified format (n, x, y, w, h), where n is the target category, (x, y) are the center coordinates of the target frame normalized by the image width and height, and (w, h) are the normalized width and height of the target frame.
Step S3, performing equivalent decoupling on the convolutional layer of the basic network model obtained in step S2 to obtain a basic network decoupling model, specifically including the following steps:
the c-th convolution layer $w_c$ of the basic network model W obtained in step S2 is equivalently decoupled into a cascade of the original convolution $w_c$ and a structural convolution $w_e$;
where the structural convolution $w_e$ is a convolution layer with 1 × 1 kernels; the initial weight of $w_e$ is the $d_o \times d_o$ identity matrix, $d_o$ being the number of output channels of the original convolution layer $w_c$.
To speed up the data processing flow, the structural convolution $w_e$ is moved after the batch normalization layer that follows the original convolution layer $w_c$.
Step S4, which is to train the basic network decoupling model obtained in step S3 by using the training data set and the loss function obtained in step S2, to obtain a decoupling model, specifically includes the following steps:
A. setting a learning rate by using the training data set and the loss function obtained in step S2, and training the basic network decoupling model obtained in step S3 again;
during training, the first N rounds are trained normally; after the N-th round, the parameters of the structural convolution are sorted by magnitude, the channels to be compressed are selected, and an extra penalty gradient is applied to the corresponding structural-convolution parameters;
B. the parameters of the structural convolution are written as the matrix $Q=\left[q_{d,i}\right]_{D\times D}$, where D is the number of convolution-kernel channels of the structural convolution layer and $q_{d,i}$ is the parameter at position i of the d-th channel of the structural convolution; the channel importance $I_d$ of the d-th channel of the original convolution is then calculated from the structural-convolution parameters as
$$I_d=\sqrt{\sum_{i=1}^{D} q_{d,i}^{2}}$$
C. selecting the number M of channels to be compressed: initially M = 0; from the N-th round onward, M increases by Y after every X batches of training until the preset channel compression rate is reached; meanwhile, when channels are selected, the number of channels of each convolution is not allowed to fall below a set value S; X, Y and S are all preset positive integers.
D. the convolution parameters are updated as
$$\hat{W}=W-l\,G$$
where $\hat{W}$ is the updated convolution parameter, W the convolution parameter before the update, l the learning rate, and G the back-propagated gradient of the loss function with respect to the convolution;
in the structural convolution, for channels that do not need to be compressed, the parameters are updated in the same way as the original convolution parameters; for channels that need to be compressed, the gradient update is changed and an extra penalty gradient is applied, the parameters being updated as
$$\hat{Q}=Q-l\,\bigl(G+\lambda\,\mathrm{sign}(Q)\bigr)$$
where Q is the structural-convolution parameter before the update, $\hat{Q}$ the updated structural-convolution parameter, $\lambda\,\mathrm{sign}(Q)$ the imposed penalty gradient, $\lambda>0$ the penalty factor, and $\mathrm{sign}(\cdot)$ the sign function, equal to 1 for positive arguments, 0 at zero and −1 for negative arguments.
Step S5, determining the channels that can finally be compressed and the channels to be reserved according to the decoupling model obtained in step S4, specifically includes the following steps:
calculating the channel importance of each channel of the original convolution from the parameters of the structural convolution, the importance of the d-th channel being $I_d$;
if the importance $I_d$ of a channel of the original convolution satisfies $I_d<k$, where k is the pruning threshold and k = 0.01, the channel corresponding to the original convolution is determined to be a channel to be cut, and cutting it does not reduce the performance of the model.
The equivalent merging in step S6 of the decoupling model obtained in step S4, according to the compressible and reserved channels determined in step S5, specifically includes the following steps:
a. combining the calculation formulas of the convolution layer and the batch normalization layer gives
$$y=\gamma\,\frac{(w*x+b)-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta$$
where x is the input feature, y the output of the input feature after the convolution layer and the batch normalization layer, w the weight parameter of the convolution layer, b the bias parameter of the convolution layer, $\gamma$ the scaling coefficient of the batch normalization layer, $\mu$ the mean of the batch normalization layer, $\sigma$ the standard deviation of the batch normalization layer, $\varepsilon$ a set minimum value, $\beta$ the offset coefficient of the batch normalization layer, and $*$ the convolution operator;
b. rearranging the combined formula into the format of a convolution gives
$$y=\hat{w}*x+\hat{b}$$
and the corresponding convolution is the new convolution;
c. the weight and the bias of the new convolution obtained in step b are calculated as
$$\hat{w}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,w,\qquad \hat{b}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,(b-\mu)+\beta$$
where $\hat{w}$ is the weight parameter of the new convolution and $\hat{b}$ the bias parameter of the new convolution;
d. the new convolution obtained in step b is then merged with the structural convolution, and the weight and the bias of the merged convolution are calculated as
$$w'=w_{Q}*\hat{w},\qquad b'=w_{Q}\,\hat{b}$$
where $w'$ is the weight of the merged convolution layer, $w_Q$ the weight of the structural convolution, $\hat{w}$ the new-convolution weight derived from the weight w of the original convolution, $b'$ the bias of the merged convolution layer, and $\hat{b}$ the new-convolution bias derived from the bias b of the original convolution;
e. in the merged convolution layer of step d, if the layer contains channels to be cut, the parameters of $w'$ and $b'$ on the corresponding channels are deleted simultaneously, completing the cutting of the corresponding channels.
The invention also provides a target detection method comprising the channel pruning method, which comprises the following steps:
(1) constructing an original model of target detection;
(2) performing channel pruning on the target detection original model constructed in the step (1) by adopting the channel pruning method, thereby obtaining a target detection model;
(3) performing actual target detection by using the target detection model obtained in step (2).
The invention also provides a remote sensing image vehicle detection method comprising the target detection method, which comprises the following steps:
1) acquiring a remote sensing image vehicle detection data set;
2) constructing the original target detection model as a YOLOv5 model;
3) performing channel pruning on the target detection original model constructed in the step 2) by adopting the channel pruning method, thereby obtaining a cut target detection model;
4) performing actual vehicle detection on the remote sensing images by using the target detection model obtained in step 3).
In the channel pruning method, the target detection method and the remote sensing image vehicle detection method of the invention, the convolution layers in the model are equivalently decoupled into a cascade of the original convolution and a structural convolution, exploiting the fact that convolution is linear, separable and combinable; precision-related training then follows the normal training procedure, while pruning-related training operates only on the structural convolution; after training, the cascade is equivalently merged back into the original network, and channels are finally pruned according to the parameters of the structural convolution. The method therefore retains the original accuracy of the model while achieving a high compression rate and good reliability.
Drawings
FIG. 1 is a schematic process flow diagram of the channel pruning method of the present invention.
Fig. 2 is a schematic view of the pruning principle of the channel pruning method of the present invention.
FIG. 3 is a schematic method flow chart of the target detection method of the present invention.
FIG. 4 is a schematic method flow diagram of the remote sensing image vehicle detection method of the present invention.
Detailed Description
Fig. 1 is a schematic flow chart of the channel pruning method of the present invention: the channel pruning method provided by the invention comprises the following steps:
S1, determining a target network model, such as the YOLOv5 network;
S2, acquiring a training data set and a loss function, and training the target network model determined in step S1 with the acquired training data set and loss function to obtain a basic network model; this specifically comprises the following steps:
acquiring training pictures; carrying out random multi-scale transformation on the obtained training pictures, the scale parameter preferably being taken from a set range; after transformation, randomly flipping left and right with a set probability (preferably 50%); finally, unifying the picture sizes (preferably to 640 × 640) by padding with gray values;
arranging the labels into a unified format (n, x, y, w, h), where n is the target category, (x, y) are the center coordinates of the target frame normalized by the image width and height, and (w, h) are the normalized width and height of the target frame;
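For concreteness, the following is a minimal Python/OpenCV sketch of the preprocessing just described; the gray padding value 114 and the scale range (0.5, 1.5) are illustrative assumptions, since the preferred scale parameter appears in the original only as an image.

```python
import random

import cv2
import numpy as np

def preprocess(img, labels, size=640, scale_range=(0.5, 1.5), flip_p=0.5):
    """Random multi-scale transform, random left-right flip, then gray
    padding to a unified square size. `labels` is an (m, 5) array whose
    rows are (n, x, y, w, h) with coordinates normalized to the image."""
    s = random.uniform(*scale_range)               # random multi-scale factor
    img = cv2.resize(img, None, fx=s, fy=s)
    if random.random() < flip_p:                   # random left-right flip
        img = np.ascontiguousarray(img[:, ::-1])
        labels[:, 1] = 1.0 - labels[:, 1]          # mirror the normalized x-center
    h, w = img.shape[:2]
    r = min(size / h, size / w)                    # fit into the square canvas
    nh, nw = round(h * r), round(w * r)
    canvas = np.full((size, size, 3), 114, dtype=img.dtype)  # gray padding
    canvas[:nh, :nw] = cv2.resize(img, (nw, nh))
    labels[:, 1::2] *= nw / size                   # re-normalize x and w
    labels[:, 2::2] *= nh / size                   # re-normalize y and h
    return canvas, labels
```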
then, the target network model determined in step S1 is trained with the obtained training data set and loss function; during training, the learning rate is preferably set to 0.01; after training is finished, the basic network model W is obtained;
S3, carrying out equivalent decoupling on the convolution layer of the basic network model obtained in the step S2 to obtain a basic network decoupling model; the method specifically comprises the following steps:
the basic network model obtained in the step S2WTo (1) acA convolution layerw c Equivalent decoupling as cascaded protolayersw c And structural convolutionw e
Wherein the structure is convolutedw e A convolutional layer of 1 x 1 cores; structural convolutionw e Has an initial weight ofd o *d o The unit matrix of (a) is,d o is laminated to the original coilw c The number of output channels.
To speed up the data processing flow, the structure is convolvedw e Translating to the original convolution layerw c The subsequent batch normalization layer;
the specific process of the step is as follows:
for modelWTo (1) acA convolution layerw c Let the input feature map bex i The output characteristic diagram isy c Then the process is represented as
Figure DEST_PATH_IMAGE066
Then, the layers are laminated on the original windingw c Post-join structural convolutionw e Wherein, the input and output channels of the original convolution are respectivelyd i Andd o w e convolution layer of 1 x 1 kernel with initial weight ofd o *d o The unit matrix of (2) is set as an input characteristic diagramx e =y c The feature map is output through structural convolution asy e The process is represented as
Figure DEST_PATH_IMAGE068
(ii) a Since the initial weight of the structural convolution is an identity matrix, the structural convolution is performed by using the identity matrixx e =y e
Laminating the layersw c Equivalent decoupling as cascaded protolayersw c And structural convolutionw e The whole process is
Figure DEST_PATH_IMAGE070
(ii) a Therefore, the decoupling transformation is completely equivalent mathematically, and the performance of the model before and after the structural convolution is added is completely consistent; in order to simplify data processing brought by the merging model in the step 5, in the actual decoupling operation, after the structural convolution is translated to the batch normalization layer, the performances before and after translation are still completely consistent;
S4, training the basic network decoupling model obtained in step S3 with the training data set and loss function obtained in step S2 to obtain a decoupling model; this specifically comprises the following steps:
A. setting the learning rate (the same as in step S2) and training the basic network decoupling model obtained in step S3 again, using the training data set and the loss function obtained in step S2;
during training, the first N rounds are trained normally; after the N-th round the model has finished adapting to the decoupled parameters, so the structural-convolution parameters are sorted by magnitude, the channels to be compressed are selected, and an extra penalty gradient is applied to the corresponding structural-convolution parameters; N is preferably 5;
B. the parameters of the structural convolution are written as the matrix $Q=\left[q_{d,i}\right]_{D\times D}$, where D is the number of convolution-kernel channels of the structural convolution layer and $q_{d,i}$ is the parameter at position i of the d-th channel of the structural convolution; the channel importance $I_d$ of the d-th channel of the original convolution is then calculated from the structural-convolution parameters as
$$I_d=\sqrt{\sum_{i=1}^{D} q_{d,i}^{2}}$$
C. selecting the number M of channels to be compressed: initially M = 0; from the N-th round onward, M increases by Y after every X batches of training until the preset channel compression rate is reached; meanwhile, when channels are selected, the number of channels of each convolution is not allowed to fall below a set value S; X, Y and S are all preset positive integers; X is preferably 256 and Y is preferably 16; when the compression requirement is not high, S is preferably 8, which gives better network performance;
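A sketch of this schedule and of the per-convolution floor S follows, under the stated preferences (X = 256, Y = 16, S = 8); the function names, and applying the floor per convolution, are illustrative choices rather than the patent's wording.

```python
import torch

def update_m(M, batches_since_round_n, X=256, Y=16, M_target=256):
    """M starts at 0 and grows by Y after every X training batches,
    capped at a preset compression target (M_target is illustrative)."""
    if batches_since_round_n > 0 and batches_since_round_n % X == 0:
        M = min(M + Y, M_target)
    return M

def select_to_compress(importance, M, S=8):
    """Pick the M least-important channels of one convolution, but never
    leave the convolution with fewer than S surviving channels."""
    M = min(M, max(importance.numel() - S, 0))
    return torch.argsort(importance)[:M]           # channel indices to penalize
```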
D. the convolution parameters are updated as
$$\hat{W}=W-l\,G$$
where $\hat{W}$ is the updated convolution parameter, W the convolution parameter before the update, l the learning rate, and G the back-propagated gradient of the loss function with respect to the convolution;
in the structural convolution, for channels that do not need to be compressed, the parameters are updated in the same way as the original convolution parameters; for channels that need to be compressed, the gradient update is changed and an extra penalty gradient is applied, the parameters being updated as
$$\hat{Q}=Q-l\,\bigl(G+\lambda\,\mathrm{sign}(Q)\bigr)$$
where Q is the structural-convolution parameter before the update, $\hat{Q}$ the updated structural-convolution parameter, $\lambda\,\mathrm{sign}(Q)$ the imposed penalty gradient, $\lambda>0$ the penalty factor, and $\mathrm{sign}(\cdot)$ the sign function, equal to 1 for positive arguments, 0 at zero and −1 for negative arguments;
under the action of the penalty gradient, the structural-convolution parameters corresponding to a channel to be compressed gradually approach zero; when a row of parameters in the structural convolution approaches zero, the corresponding channel of the preceding convolution layer is deactivated, and that channel can be removed in the subsequent steps;
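A sketch of the two update rules follows; indexing the channels along the output dimension of the structural-convolution weight and the value lam = 1e-4 are assumptions, since the patent gives the penalty factor only as an image.

```python
import torch

@torch.no_grad()
def sgd_step_with_penalty(q, grad, lr, compress_idx, lam=1e-4):
    """Plain step W <- W - l*G for unselected channels; channels selected
    for compression additionally receive the penalty gradient lam*sign(Q)."""
    step = grad.clone()
    step[compress_idx] += lam * torch.sign(q[compress_idx])
    q -= lr * step

# Usage on a structural convolution w_e, after loss.backward():
# sgd_step_with_penalty(w_e.weight, w_e.weight.grad, lr=0.01, compress_idx=idx)
```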
S5, determining the channels that can finally be compressed and the channels to be reserved according to the decoupling model obtained in step S4; this specifically comprises the following steps:
after pruning training, some channels of the structural convolution approach zero, i.e. under the action of the structural convolution the output of the corresponding channel of the original convolution filter can be ignored, so removing these channels does not affect network performance;
calculating the channel importance of each channel of the original convolution from the parameters of the structural convolution, the importance of the d-th channel being $I_d$;
setting a threshold k; if the importance $I_d$ of a channel of the original convolution satisfies $I_d<k$, where k is the pruning threshold and k = 0.01, the channel corresponding to the original convolution is determined to be a channel to be cut, and the performance of the model is not reduced after cutting;
S6, equivalently merging the decoupling model obtained in step S4 according to the compressible and reserved channels determined in step S5 to obtain the network model after channel pruning, completing the channel pruning of the final target network model; this specifically comprises the following steps:
a. the convolution layer is calculated as
$$y_{\mathrm{conv}}=w*x+b$$
and the batch normalization layer is calculated as
$$y_{\mathrm{bn}}=\gamma\,\frac{x-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta$$
combining the calculation formulas of the convolution layer and the batch normalization layer gives
$$y=\gamma\,\frac{(w*x+b)-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta$$
where x is the input feature, y the output of the input feature after the convolution layer and the batch normalization layer, w the weight parameter of the convolution layer, b the bias parameter of the convolution layer, $\gamma$ the scaling coefficient of the batch normalization layer, $\mu$ the mean of the batch normalization layer, $\sigma$ the standard deviation of the batch normalization layer, $\varepsilon$ a set minimum value, $\beta$ the offset coefficient of the batch normalization layer, and $*$ the convolution operator;
b. rearranging the combined formula into the format of a convolution gives
$$y=\hat{w}*x+\hat{b}$$
and the corresponding convolution is the new convolution;
c. the weight and the bias of the new convolution obtained in step b are calculated as
$$\hat{w}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,w,\qquad \hat{b}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,(b-\mu)+\beta$$
where $\hat{w}$ is the weight parameter of the new convolution and $\hat{b}$ the bias parameter of the new convolution;
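Steps a to c amount to the standard folding of a batch normalization layer into the preceding convolution; a sketch:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold batch normalization into the preceding convolution:
    w_hat = gamma*w/sqrt(sigma^2+eps), b_hat = gamma*(b-mu)/sqrt(sigma^2+eps)+beta."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups,
                      bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.view(-1, 1, 1, 1))
    b = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((b - bn.running_mean) * scale + bn.bias)
    return fused
```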
d. the new convolution is calculated as
$$y=\hat{w}*x+\hat{b}$$
the structural convolution is calculated as
$$y_e=w_Q*x_e$$
and the combined formula in convolution format is
$$y=w'*x+b'$$
the new convolution obtained in step b is then merged with the structural convolution, and the weight and the bias of the merged convolution layer are calculated as
$$w'=w_{Q}*\hat{w},\qquad b'=w_{Q}\,\hat{b}$$
where $w'$ is the weight of the merged convolution layer, $w_Q$ the weight of the structural convolution, $\hat{w}$ the new-convolution weight derived from the weight w of the original convolution, $b'$ the bias of the merged convolution layer, and $\hat{b}$ the new-convolution bias derived from the bias b of the original convolution;
e. in the merged convolution layer of step d, if the layer contains channels to be cut, the parameters of $w'$ and $b'$ on the corresponding channels are deleted simultaneously, completing the cutting of the corresponding channels.
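A sketch of steps d and e, absorbing the 1 × 1 structural convolution into the fused convolution and then deleting the cut output channels; groups = 1 is assumed, and in a full network the input channels of the following layer must be cut to match.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def merge_and_cut(fused: nn.Conv2d, w_e: nn.Conv2d, kept: torch.Tensor) -> nn.Conv2d:
    """Compute w' = w_Q * w_hat and b' = w_Q b_hat, then drop cut channels."""
    q = w_e.weight[:, :, 0, 0]                          # (D, D) structural weights
    w = torch.einsum('oc,ckhw->okhw', q, fused.weight)  # merged weight w'
    b = q @ fused.bias                                  # merged bias b'
    out = nn.Conv2d(fused.in_channels, kept.numel(), fused.kernel_size,
                    fused.stride, fused.padding, fused.dilation, bias=True)
    out.weight.copy_(w[kept])                           # delete cut channels of w'
    out.bias.copy_(b[kept])                             # ... and of b'
    return out
```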
Fig. 2 is a schematic view of the pruning principle of the channel pruning method of the present invention: the structurally decoupled channel pruning method uses the magnitude of the structural-convolution parameters to represent the importance of each channel of the original convolution, i.e. the strength of the information-transmission capability of the corresponding convolution channel. Channels that can be cut are selected gradually and iteratively; the penalty gradient attenuates the unimportant channels toward zero so that they are progressively deactivated during iterative pruning, and when the networks are merged these channels can be cut with almost no loss of performance, achieving performance-lossless pruning.
Fig. 3 is a schematic flow chart of the target detection method of the present invention. The invention provides a target detection method comprising the channel pruning method, which comprises the following steps:
(1) constructing an original model of target detection;
(2) performing channel pruning on the target detection original model constructed in the step (1) by adopting the channel pruning method, thereby obtaining a target detection model;
(3) performing actual target detection by using the target detection model obtained in step (2).
Fig. 4 is a schematic flow chart of the remote sensing image vehicle detection method of the present invention. The invention provides a remote sensing image vehicle detection method comprising the target detection method, which comprises the following steps:
1) acquiring a remote sensing image vehicle detection data set;
2) constructing the original target detection model as a YOLOv5 model;
3) performing channel pruning on the target detection original model constructed in the step 2) by adopting the channel pruning method, thereby obtaining a cut target detection model;
4) performing actual vehicle detection on the remote sensing images by using the target detection model obtained in step 3).
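To illustrate how the decoupling step can be wired across a whole Conv-BN network such as YOLOv5, the following hedged sketch inserts an identity 1 × 1 structural convolution after every batch normalization layer; the traversal strategy is ours, not specified by the patent.

```python
import torch
import torch.nn as nn

def decouple_model(model: nn.Module) -> int:
    """Append an identity 1x1 structural convolution after every
    BatchNorm2d, mirroring the decoupling step of the pruning method."""
    targets = [(parent, name, child) for parent in model.modules()
               for name, child in parent.named_children()
               if isinstance(child, nn.BatchNorm2d)]
    for parent, name, bn in targets:
        d = bn.num_features
        w_e = nn.Conv2d(d, d, kernel_size=1, bias=False)
        with torch.no_grad():
            w_e.weight.copy_(torch.eye(d).view(d, d, 1, 1))
        setattr(parent, name, nn.Sequential(bn, w_e))
    return len(targets)

# Works on any Conv-BN backbone; a YOLOv5 model loaded elsewhere behaves the same.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.SiLU())
assert decouple_model(net) == 1
```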

Claims (9)

1. A channel pruning method comprises the following steps:
S1, determining a target network model;
S2, acquiring a training data set and a loss function, and training the target network model determined in step S1 with the acquired training data set and loss function to obtain a basic network model;
S3, equivalently decoupling the convolution layers of the basic network model obtained in step S2 to obtain a basic network decoupling model;
S4, training the basic network decoupling model obtained in step S3 with the training data set and loss function obtained in step S2 to obtain a decoupling model;
S5, determining the channels that can finally be compressed and the channels to be reserved according to the decoupling model obtained in step S4;
S6, equivalently merging the decoupling model obtained in step S4 according to the compressible and reserved channels determined in step S5 to obtain the network model after channel pruning, completing the channel pruning of the final target network model.
2. The channel pruning method according to claim 1, wherein the step of obtaining the training data set in step S2 specifically includes the following steps:
acquiring training pictures; carrying out random multi-scale transformation on the obtained training pictures; after transformation, randomly flipping left and right with a set probability; finally, unifying the picture sizes by padding with gray values;
arranging the labels into a unified format (n, x, y, w, h), where n is the target category, (x, y) are the center coordinates of the target frame normalized by the image width and height, and (w, h) are the normalized width and height of the target frame.
3. The channel pruning method according to claim 2, wherein the step S3 of equivalently decoupling the convolutional layer of the basic network model obtained in the step S2 to obtain a basic network decoupling model specifically comprises the following steps:
the c-th convolution layer $w_c$ of the basic network model W obtained in step S2 is equivalently decoupled into a cascade of the original convolution $w_c$ and a structural convolution $w_e$;
where the structural convolution $w_e$ is a convolution layer with 1 × 1 kernels; the initial weight of $w_e$ is the $d_o \times d_o$ identity matrix, $d_o$ being the number of output channels of the original convolution layer $w_c$.
4. The channel pruning method according to claim 3, characterized in that, to speed up the data processing flow, the structural convolution $w_e$ is moved after the batch normalization layer that follows the original convolution layer $w_c$.
5. The channel pruning method according to claim 4, wherein the step S4 of training the basic network decoupling model obtained in the step S3 by using the training data set and the loss function obtained in the step S2 to obtain the decoupling model specifically comprises the steps of:
A. setting a learning rate by using the training data set and the loss function obtained in step S2, and training the basic network decoupling model obtained in step S3 again;
during training, the first N rounds are trained normally; after the N-th round, the parameters of the structural convolution are sorted by magnitude, the channels to be compressed are selected, and an extra penalty gradient is applied to the corresponding structural-convolution parameters;
B. the parameters of the structural convolution are written as the matrix $Q=\left[q_{d,i}\right]_{D\times D}$, where D is the number of convolution-kernel channels of the structural convolution layer and $q_{d,i}$ is the parameter at position i of the d-th channel of the structural convolution; the channel importance $I_d$ of the d-th channel of the original convolution is then calculated from the structural-convolution parameters as
$$I_d=\sqrt{\sum_{i=1}^{D} q_{d,i}^{2}}$$
C. selecting the number M of channels to be compressed: initially M = 0; from the N-th round onward, M increases by Y after every X batches of training until the preset channel compression rate is reached; meanwhile, when channels are selected, the number of channels of each convolution is not allowed to fall below a set value S; X, Y and S are all preset positive integers;
D. the convolution parameters are updated as
$$\hat{W}=W-l\,G$$
where $\hat{W}$ is the updated convolution parameter, W the convolution parameter before the update, l the learning rate, and G the back-propagated gradient of the loss function with respect to the convolution;
in the structural convolution, for channels that do not need to be compressed, the parameters are updated in the same way as the original convolution parameters; for channels that need to be compressed, the gradient update is changed and an extra penalty gradient is applied, the parameters being updated as
$$\hat{Q}=Q-l\,\bigl(G+\lambda\,\mathrm{sign}(Q)\bigr)$$
where Q is the structural-convolution parameter before the update, $\hat{Q}$ the updated structural-convolution parameter, $\lambda\,\mathrm{sign}(Q)$ the imposed penalty gradient, $\lambda>0$ the penalty factor, and $\mathrm{sign}(\cdot)$ the sign function, equal to 1 for positive arguments, 0 at zero and −1 for negative arguments.
6. the channel pruning method according to claim 5, wherein the step S5 of determining the channels that can be finally compressed and the remaining channels according to the decoupling model obtained in the step S4 specifically comprises the steps of:
calculating the channel importance of each channel of the original convolution from the parameters of the structural convolution, the importance of the d-th channel being $I_d$;
if the importance $I_d$ of a channel of the original convolution satisfies $I_d<k$, where k is the pruning threshold and k = 0.01, the channel corresponding to the original convolution is determined to be a channel to be cut, and cutting it does not reduce the performance of the model.
7. The channel pruning method according to claim 6, wherein the step S6 of equivalently merging the decoupling models obtained in the step S4 according to the channels that can be compressed and the reserved channels determined in the step S5 includes the following steps:
a. combining the calculation formulas of the convolution layer and the batch normalization layer gives
$$y=\gamma\,\frac{(w*x+b)-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta$$
where x is the input feature, y the output of the input feature after the convolution layer and the batch normalization layer, w the weight parameter of the convolution layer, b the bias parameter of the convolution layer, $\gamma$ the scaling coefficient of the batch normalization layer, $\mu$ the mean of the batch normalization layer, $\sigma$ the standard deviation of the batch normalization layer, $\varepsilon$ a set minimum value, $\beta$ the offset coefficient of the batch normalization layer, and $*$ the convolution operator;
b. rearranging the combined formula into the format of a convolution gives
$$y=\hat{w}*x+\hat{b}$$
and the corresponding convolution is the new convolution;
c. the weight and the bias of the new convolution obtained in step b are calculated as
$$\hat{w}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,w,\qquad \hat{b}=\frac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}\,(b-\mu)+\beta$$
where $\hat{w}$ is the weight parameter of the new convolution and $\hat{b}$ the bias parameter of the new convolution;
d. the new convolution obtained in step b is then merged with the structural convolution, and the weight and the bias of the merged convolution are calculated as
$$w'=w_{Q}*\hat{w},\qquad b'=w_{Q}\,\hat{b}$$
where $w'$ is the weight of the merged convolution layer, $w_Q$ the weight of the structural convolution, $\hat{w}$ the new-convolution weight derived from the weight w of the original convolution, $b'$ the bias of the merged convolution layer, and $\hat{b}$ the new-convolution bias derived from the bias b of the original convolution;
e. in the merged convolution layer of step d, if the layer contains channels to be cut, the parameters of $w'$ and $b'$ on the corresponding channels are deleted simultaneously, completing the cutting of the corresponding channels.
8. An object detection method comprising the channel pruning method according to any one of claims 1 to 7, characterized by comprising the steps of:
(1) constructing an original model of target detection;
(2) performing channel pruning on the target detection original model constructed in the step (1) by adopting the channel pruning method, thereby obtaining a target detection model;
(3) performing actual target detection by using the target detection model obtained in step (2).
9. A remote sensing image vehicle detection method including the object detection method of claim 8, characterized by comprising the steps of:
1) acquiring a remote sensing image vehicle detection data set;
2) constructing the original target detection model as a YOLOv5 model;
3) performing channel pruning on the target detection original model constructed in the step 2) by adopting the channel pruning method, thereby obtaining a cut target detection model;
4) performing actual vehicle detection on the remote sensing images by using the target detection model obtained in step 3).
CN202210738608.9A 2022-06-28 2022-06-28 Channel pruning method, target detection method and remote sensing image vehicle detection method Active CN114913441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210738608.9A CN114913441B (en) 2022-06-28 2022-06-28 Channel pruning method, target detection method and remote sensing image vehicle detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210738608.9A CN114913441B (en) 2022-06-28 2022-06-28 Channel pruning method, target detection method and remote sensing image vehicle detection method

Publications (2)

Publication Number Publication Date
CN114913441A true CN114913441A (en) 2022-08-16
CN114913441B CN114913441B (en) 2024-04-16

Family

ID=82772813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210738608.9A Active CN114913441B (en) 2022-06-28 2022-06-28 Channel pruning method, target detection method and remote sensing image vehicle detection method

Country Status (1)

Country Link
CN (1) CN114913441B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009095A (en) * 2019-03-04 2019-07-12 东南大学 Road driving area efficient dividing method based on depth characteristic compression convolutional network
US20210049423A1 (en) * 2019-07-31 2021-02-18 Zhejiang University Efficient image classification method based on structured pruning
WO2021129570A1 (en) * 2019-12-25 2021-07-01 神思电子技术股份有限公司 Network pruning optimization method based on network activation and sparsification
WO2021208151A1 (en) * 2020-04-13 2021-10-21 商汤集团有限公司 Model compression method, image processing method and device
CN111680781A (en) * 2020-04-20 2020-09-18 北京迈格威科技有限公司 Neural network processing method, neural network processing device, electronic equipment and storage medium
CN111967594A (en) * 2020-08-06 2020-11-20 苏州浪潮智能科技有限公司 Neural network compression method, device, equipment and storage medium
CN113222142A (en) * 2021-05-28 2021-08-06 上海天壤智能科技有限公司 Channel pruning and quick connection layer pruning method and system
CN113255892A (en) * 2021-06-01 2021-08-13 上海交通大学烟台信息技术研究院 Method and device for searching decoupled network structure and readable storage medium
CN114065923A (en) * 2021-11-30 2022-02-18 南京航空航天大学 Compression method, system and accelerating device of convolutional neural network
CN114594461A (en) * 2022-03-14 2022-06-07 杭州电子科技大学 Sonar target detection method based on attention perception and zoom factor pruning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOHAN DING et al.: "ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting", arXiv:2007.03260v4, 14 August 2021 (2021-08-14), pages 1-11 *
GUO Qingbei (郭庆北): "Research on Compression and Acceleration Technology of Deep Convolutional Neural Networks", China Doctoral Dissertations Full-text Database (Information Science and Technology), 15 March 2022 (2022-03-15), pages 140-26 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730654A (en) * 2022-11-23 2023-03-03 湖南大学 Layer pruning method, kitchen garbage detection method and remote sensing image vehicle detection method
CN115730654B (en) * 2022-11-23 2024-05-14 湖南大学 Layer pruning method, kitchen waste detection method and remote sensing image vehicle detection method
CN116579409A (en) * 2023-07-11 2023-08-11 菲特(天津)检测技术有限公司 Intelligent camera model pruning acceleration method and acceleration system based on re-parameterization

Also Published As

Publication number Publication date
CN114913441B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN114913441A (en) Channel pruning method, target detection method and remote sensing image vehicle detection method
CN110532859B (en) Remote sensing image target detection method based on deep evolution pruning convolution net
CN107748895B (en) Unmanned aerial vehicle landing landform image classification method based on DCT-CNN model
CN110766063B (en) Image classification method based on compressed excitation and tightly connected convolutional neural network
CN110909667B (en) Lightweight design method for multi-angle SAR target recognition network
CN108288270B (en) Target detection method based on channel pruning and full convolution deep learning
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN110378381A (en) Object detecting method, device and computer storage medium
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN106203363A (en) Human skeleton motion sequence Activity recognition method
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
WO2022062164A1 (en) Image classification method using partial differential operator-based general-equivariant convolutional neural network model
CN116416561A (en) Video image processing method and device
CN116071668A (en) Unmanned aerial vehicle aerial image target detection method based on multi-scale feature fusion
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN112561041A (en) Neural network model acceleration method and platform based on filter distribution
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN115048870A (en) Target track identification method based on residual error network and attention mechanism
CN113554084A (en) Vehicle re-identification model compression method and system based on pruning and light-weight convolution
CN114154626B (en) Filter pruning method for image classification task
CN113850373B (en) Class-based filter pruning method
CN114972753A (en) Lightweight semantic segmentation method and system based on context information aggregation and assisted learning
CN113850365A (en) Method, device, equipment and storage medium for compressing and transplanting convolutional neural network
CN115620120B (en) Street view image multi-scale high-dimensional feature construction quantization method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant