CN112802141A - Model compression method and terminal applied to image target detection - Google Patents

Model compression method and terminal applied to image target detection

Info

Publication number
CN112802141A
CN112802141A (application CN202110300622.6A)
Authority
CN
China
Prior art keywords
importance factor
layer
model
preset
importance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110300622.6A
Other languages
Chinese (zh)
Other versions
CN112802141B (en)
Inventor
潘成龙
张宇
刘东剑
杨伟强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Santachi Video Technology Shenzhen Co ltd
Original Assignee
Santachi Video Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Santachi Video Technology Shenzhen Co ltd filed Critical Santachi Video Technology Shenzhen Co ltd
Priority to CN202110300622.6A priority Critical patent/CN112802141B/en
Publication of CN112802141A publication Critical patent/CN112802141A/en
Application granted granted Critical
Publication of CN112802141B publication Critical patent/CN112802141B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model compression method and a terminal applied to image target detection. An importance factor layer independent of the original convolutional network is added after each convolutional layer to be pruned in a preset target detection algorithm, and the importance factor vector of each importance factor layer is sparsified, so that features that do not contribute to the algorithm model are preliminarily removed. A threshold for the importance factor parameters is determined according to a preset pruning rate, the importance of each convolutional-layer channel is judged against this threshold, and the convolutional-layer channels whose importance factor parameters fall below the threshold are deleted. The algorithm model can therefore be pruned without depending on any specific layer structure; using this compression method in a target detection algorithm greatly reduces the model size while limiting the precision loss. Finally, the pruned model is fine-tuned to a preset precision, so that model precision and accuracy are preserved while the model is compressed, and the method is easy to implement and deploy without requiring large amounts of computing time or resources.

Description

Model compression method and terminal applied to image target detection
Technical Field
The invention relates to the field of computer image processing, in particular to a model compression method and a terminal applied to image target detection.
Background
A common way to compress a convolutional neural network model exploits the feature relationships between successive layers: a specific subset of the channels of one layer is often sufficient to reproduce, exactly or approximately, the output of the next layer, so all channels outside this subset can be deleted, which removes channels and compresses the model. In practice, however, this approach must minimize the reconstruction error between the complete model and the pruned model with a least-squares fit, which makes it difficult to apply in industry.
Another common method uses known parameters of the network, or parameters of an existing network layer, as the criterion of channel importance, for example the γ parameter of the BatchNorm layer (a layer originally introduced to accelerate convergence and prevent overfitting). The γ parameters of the BatchNorm layers are sparsified before pruning; a larger γ then indicates that the corresponding BatchNorm channel, and hence the corresponding convolutional-layer channel, is more important, while a channel with a small γ is considered unimportant and the corresponding convolutional-layer channel is deleted. This method, however, requires the deep neural network to contain BatchNorm layers and cannot be used without them, and it consumes a large amount of time and resources for later training and fine-tuning.
Another commonly used method treats the weights themselves as the measure of channel importance, using the L1 norm of a channel's weights as the criterion: when the L1 norm of a channel's weights is small, the channel is judged unimportant and can be deleted. In practice, this criterion is too simplistic, and the resulting loss of accuracy is severe, especially in the field of target detection.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a model compression method and terminal applied to image target detection that do not depend on any specific layer structure during compression and that preserve precision after compression.
In order to solve the technical problems, the invention adopts the technical scheme that:
a model compression method applied to image target detection comprises the following steps:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
a model compression terminal for image object detection, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
The invention has the beneficial effects that: an importance factor layer independent of the original convolutional network is added after each convolutional layer to be pruned in a preset target detection algorithm, and the importance factor vector of each importance factor layer is sparsified; a threshold for the importance factor parameters is determined according to a preset pruning rate, the importance of each convolutional-layer channel is judged against this threshold, and the convolutional-layer channels whose importance factor parameters fall below the threshold are deleted. The prior art usually sparsifies the parameters of the original convolutional layers or other original network parameters, which changes the original network parameters substantially, greatly reduces the precision after pruning, and requires extensive subsequent training and fine-tuning. In contrast, the present method adds a new importance factor layer that is unrelated to the original network and sparsifies only this newly added layer, so the parameter distribution of the original network is not changed during sparsification, the original network is unaffected, the precision after pruning is almost lossless, and little time is needed to retrain and fine-tune the pruned model, saving computation time and resources. The algorithm model can be pruned without depending on a specific layer structure, so the method is applicable to any network, and using this compression method in a target detection algorithm greatly reduces the model size while limiting the precision loss. The pruned model is fine-tuned to a preset precision, so that model precision is guaranteed while the model is compressed, the accuracy of the compressed model is further ensured, and the method improves training accuracy while remaining easy to implement and deploy.
Drawings
FIG. 1 is a flowchart of a model compression method applied to image target detection according to an embodiment of the present invention;
FIG. 2 is a diagram of a model compression terminal applied to image target detection according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a model compression method applied to image target detection according to an embodiment of the present invention, in which an importance factor layer is added after a BatchNorm layer;
FIG. 4 is a schematic diagram of a model compression method applied to image target detection according to an embodiment of the present invention, in which an importance factor layer is added after another Norm layer;
FIG. 5 is a line graph of the number of channels retained after the RetinaNet algorithm is compressed by three model compression methods applied to image target detection, according to an embodiment of the present invention.
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
The noun explains:
the deep learning target detection means that a deep learning technology is utilized to find out all interested targets in an image, the category and the position of the targets in the image are determined at the same time, the position of an object in the image is marked by using an identification frame on the basis of positioning the targets, and the category of the object is given;
model pruning, which means measuring the importance of each neuron weight in deep learning by different methods, and pruning unimportant neurons according to the importance degree of the neurons to achieve the purpose of model compression;
finenetune model: the original data set is utilized to perform a small amount of training on a new model, and the long time of the original training model is not needed, which is also called as a fine tuning model;
mAP: namely mean-AP, mAP is an algorithm evaluation standard of passalvac challenge, each category has an AP value, and finally, the average of the AP values of all the categories is the mAP, and the closer to 1, the more excellent the algorithm is;
BatchNorm layer: the name of a specific layer in the neural network, which is used to speed up algorithm convergence and prevent overfitting;
backbone: a backbone network in the convolutional neural network, i.e., a main network used for extracting features;
RetinaNet: a classical target detection algorithm that can be combined with different backbones to form detectors with different performance;
ResNet50: a typical residual network (ResNet), widely used in object classification and related fields; a classical neural network commonly used as the backbone of computer vision tasks.
Referring to fig. 1, an embodiment of the present invention provides a model compression method applied to image target detection, including the steps of:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
From the above description, the beneficial effects of the present invention are: an importance factor layer independent of the original convolutional network is added after each convolutional layer to be pruned in a preset target detection algorithm, and the importance factor vector of each importance factor layer is sparsified; a threshold for the importance factor parameters is determined according to a preset pruning rate, the importance of each convolutional-layer channel is judged against this threshold, and the convolutional-layer channels whose importance factor parameters fall below the threshold are deleted. The prior art usually sparsifies the parameters of the original convolutional layers or other original network parameters, which changes the original network parameters substantially, greatly reduces the precision after pruning, and requires extensive subsequent training and fine-tuning. In contrast, the present method adds a new importance factor layer that is unrelated to the original network and sparsifies only this newly added layer, so the parameter distribution of the original network is not changed during sparsification, the original network is unaffected, the precision after pruning is almost lossless, and little time is needed to retrain and fine-tune the pruned model, saving computation time and resources. The algorithm model can be pruned without depending on a specific layer structure, so the method is applicable to any network, and using this compression method in a target detection algorithm greatly reduces the model size while limiting the precision loss. The pruned model is fine-tuned to a preset precision, so that model precision is guaranteed while the model is compressed, the accuracy of the compressed model is further ensured, and the method improves training accuracy while remaining easy to implement and deploy.
Further, the adding of the importance factor layer independent of the original convolutional layer network after the convolutional layer needing pruning in the target detection algorithm model comprises:
and judging whether a BatchNorm layer is included after the convolutional layer needing pruning of the target detection algorithm model, if so, adding the importance factor layer after the BatchNorm layer, and if not, adding the importance factor layer after other Norm layers after the convolutional layer.
As can be seen from the above description, the importance factor layer is added after the BatchNorm layer after the convolutional layer or after the other Norm layer after the convolutional layer, and is applicable to any convolutional layer structure.
Further, the importance factor vector sparsifying for each of the importance factor layers comprises:
forming the importance factor parameters of each importance factor layer into corresponding importance factor vectors S;
the importance factor vector is sparsified by minimizing:
L = Σ_(x, y) l( f(x, W), y ) + λ · Σ_(s ∈ Γ) g(s)
wherein x represents the input data during training, y represents the desired output data, W represents the weights of the current convolutional layer, l() represents the loss function used during training, f() represents the output value computed for the input data based on the weights of the current convolutional layer, λ represents a balance coefficient, Γ represents the set of importance factor vectors of all importance factor layers, g(s) represents the L1 regularization of an importance factor vector, L represents the loss function with the adjustment factor added, and λ · Σ_(s ∈ Γ) g(s) is the added adjustment factor;
and sparsifying the importance factor vector to a stable state.
According to the above description, the importance factor vectors are sparsified and regularized according to the formula, which gives a preliminary ordering of the importance vectors, removes the influence of unimportant features, simplifies the subsequent calculation, and reduces the computational workload.
Further, the calculating the threshold value of the importance factor parameter in the importance factor vector according to the preset pruning rate includes:
calculating the threshold index threshold_id of the importance factor parameter by the following formula:
threshold_id = floor( len(sorted_s) * p);
in the formula, floor() represents the round-down function, len() returns the length of an array, sorted_s represents the array obtained by sorting all importance factor parameters of each importance factor layer, and p represents the pruning rate;
calculating the threshold value threshold of the importance factor parameter from the threshold index threshold_id by the following formula:
threshold = sorted_s[threshold_id].
As can be seen from the above description, the threshold index is calculated from the pruning rate and the number of importance factor parameters of each importance factor layer, an array is obtained by sorting the importance factor parameters of each importance factor layer, and the corresponding threshold is taken from the sorted array at that index, so the threshold can be obtained simply and intuitively.
Further, the determining whether the fine tuning model reaches a preset precision includes:
based on a preset algorithm evaluation standard, calculating the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard, and judging whether the average number reaches a preset precision.
According to the above description, the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard can be used for accurately obtaining the evaluation value of the current fine tuning model, and the precision of the model after compression is further ensured by comparing the evaluation value with the preset precision.
Referring to fig. 2, an embodiment of the present invention provides a model compression terminal applied to image object detection, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
From the above description, the beneficial effects of the present invention are: an importance factor layer independent of the original convolutional network is added after each convolutional layer to be pruned in a preset target detection algorithm, and the importance factor vector of each importance factor layer is sparsified; a threshold for the importance factor parameters is determined according to a preset pruning rate, the importance of each convolutional-layer channel is judged against this threshold, and the convolutional-layer channels whose importance factor parameters fall below the threshold are deleted. The prior art usually sparsifies the parameters of the original convolutional layers or other original network parameters, which changes the original network parameters substantially, greatly reduces the precision after pruning, and requires extensive subsequent training and fine-tuning. In contrast, the present method adds a new importance factor layer that is unrelated to the original network and sparsifies only this newly added layer, so the parameter distribution of the original network is not changed during sparsification, the original network is unaffected, the precision after pruning is almost lossless, and little time is needed to retrain and fine-tune the pruned model, saving computation time and resources. The algorithm model can be pruned without depending on a specific layer structure, so the method is applicable to any network, and using this compression method in a target detection algorithm greatly reduces the model size while limiting the precision loss. The pruned model is fine-tuned to a preset precision, so that model precision is guaranteed while the model is compressed, the accuracy of the compressed model is further ensured, and the method improves training accuracy while remaining easy to implement and deploy.
Further, the adding of the importance factor layer independent of the original convolutional layer network after the convolutional layer needing pruning in the target detection algorithm model comprises:
and judging whether a BatchNorm layer is included after the convolutional layer needing pruning of the target detection algorithm model, if so, adding the importance factor layer after the BatchNorm layer, and if not, adding the importance factor layer after other Norm layers after the convolutional layer.
As can be seen from the above description, the importance factor layer is added after the BatchNorm layer after the convolutional layer or after the other Norm layer after the convolutional layer, and is applicable to any convolutional layer structure.
Further, the importance factor vector sparsifying for each of the importance factor layers comprises:
forming the importance factor parameters of each importance factor layer into corresponding importance factor vectors S;
the importance factor vector is sparsified by minimizing:
L = Σ_(x, y) l( f(x, W), y ) + λ · Σ_(s ∈ Γ) g(s)
wherein x represents the input data during training, y represents the desired output data, W represents the weights of the current convolutional layer, l() represents the loss function used during training, f() represents the output value computed for the input data based on the weights of the current convolutional layer, λ represents a balance coefficient, Γ represents the set of importance factor vectors of all importance factor layers, g(s) represents the L1 regularization of an importance factor vector, L represents the loss function with the adjustment factor added, and λ · Σ_(s ∈ Γ) g(s) is the added adjustment factor;
and sparsifying the importance factor vector to a stable state.
According to the above description, the importance factor vectors are sparsified and regularized according to the formula, which gives a preliminary ordering of the importance vectors, removes the influence of unimportant features, simplifies the subsequent calculation, and reduces the computational workload.
Further, the calculating the threshold value of the importance factor parameter in the importance factor vector according to the preset pruning rate includes:
calculating the threshold index threshold_id of the importance factor parameter by the following formula:
threshold_id = floor( len(sorted_s) * p);
in the formula, floor() represents the round-down function, len() returns the length of an array, sorted_s represents the array obtained by sorting all importance factor parameters of each importance factor layer, and p represents the pruning rate;
calculating the threshold value threshold of the importance factor parameter from the threshold index threshold_id by the following formula:
threshold = sorted_s[threshold_id].
As can be seen from the above description, the threshold index is calculated from the pruning rate and the number of importance factor parameters of each importance factor layer, an array is obtained by sorting the importance factor parameters of each importance factor layer, and the corresponding threshold is taken from the sorted array at that index, so the threshold can be obtained simply and intuitively.
Further, the determining whether the fine tuning model reaches a preset precision includes:
based on a preset algorithm evaluation standard, calculating the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard, and judging whether the average number reaches a preset precision.
According to the above description, the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard can be used for accurately obtaining the evaluation value of the current fine tuning model, and the precision of the model after compression is further ensured by comparing the evaluation value with the preset precision.
The above-mentioned model compression method and terminal applied to image target detection of the present invention are applicable to the compression of convolutional layer structures of various convolutional neural networks, and are particularly applicable to the model compression in the field of target detection, and the following description is made by specific embodiments:
example one
Referring to fig. 1, a model compression method applied to image target detection includes the steps of:
S1, training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
specifically, in this embodiment, a baseline deep learning target detection algorithm is selected, the target detection algorithm is trained to convergence on a preset existing data set, and objective model evaluation data are obtained by evaluating the algorithm model with the PASCAL VOC test standard;
S2, adding an importance factor layer independent of the original convolution network after each convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
wherein adding an importance factor layer independent of an original convolutional layer network after a convolutional layer needing pruning in the target detection algorithm model comprises:
judging whether a BatchNorm layer is included after the convolutional layer needing pruning of the target detection algorithm model, if so, adding the importance factor layer after the BatchNorm layer, and if not, adding the importance factor layer after other Norm layers after the convolutional layer;
specifically, referring to fig. 3 and 4, in the present embodiment, depending on the actual structure of the convolutional layer, if a BatchNorm layer follows the convolutional layer, the importance factor layer is added after the BatchNorm layer; otherwise the importance factor layer is added after the other Norm layer that follows the convolutional layer, such as a GroupNorm layer. The newly added importance factor layer has no substantive relationship with the original convolutional layer, so the original convolutional layer is not affected during sparsification;
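As an illustration of this step, the following is a minimal PyTorch-style sketch of one possible importance factor layer: a channel-wise learnable scale inserted after the BatchNorm (or other Norm) layer that follows a convolutional layer. The names ImportanceFactor and conv_block, and the choice to initialize the factors to one, are assumptions of this sketch and are not taken from the patent:

```python
import torch
import torch.nn as nn

class ImportanceFactor(nn.Module):
    """Channel-wise importance factor layer, independent of the original network.

    Holds one learnable scalar per channel; the vector s = (s1, ..., sc) is the
    importance factor vector of this layer.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        # Initialized to 1 so that inserting the layer leaves the network output
        # unchanged (an assumption of this sketch, not a requirement of the patent).
        self.s = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W); scale each channel by its importance factor.
        return x * self.s.view(1, -1, 1, 1)

def conv_block(in_ch: int, out_ch: int, use_batchnorm: bool = True) -> nn.Sequential:
    """Convolution followed by a Norm layer, with an importance factor layer appended."""
    norm = nn.BatchNorm2d(out_ch) if use_batchnorm else nn.GroupNorm(1, out_ch)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        norm,                       # BatchNorm when available, another Norm layer otherwise
        ImportanceFactor(out_ch),   # added after the BatchNorm / other Norm layer
        nn.ReLU(inplace=True),
    )
```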
wherein the importance factor vector sparsifying for each of the importance factor layers comprises:
forming the importance factor parameters of each importance factor layer into corresponding importance factor vectors S;
the importance factor vector is thinned by:
Figure 924449DEST_PATH_IMAGE007
wherein x represents input data during training, y represents desired output data, W represents a weight of a current convolutional layer, l () represents a loss function during training, f () represents output value calculation for the input data based on the weight of the current convolutional layer, λ represents a balance coefficient,
Figure 592190DEST_PATH_IMAGE008
a set of importance factor vectors representing all importance factor layers, g(s) a L1 regularization of the importance factor vectors, L a loss function with adjustment factors added,
Figure 464331DEST_PATH_IMAGE009
is an added regulatory factor;
sparsifying the importance factor vector to a steady state;
the importance factor layer has no correlation with the original network layers and is used to reflect the importance of the channels in the corresponding convolutional layer: each importance factor parameter of the importance factor layer corresponds to one channel of the convolutional layer, and the importance factor parameter serves as the importance criterion of that convolutional-layer channel;
specifically, S = (s1, s2, s3, …, sc), where c is the number of channels of the convolutional layer and each entry corresponds to one channel, so that a trainable, learnable one-dimensional importance factor vector S is multiplied onto the output of each original channel;
specifically, in this embodiment, the importance factor vectors are sparsified according to the sparsification formula, and the importance factor vector of each layer is sparsified until it is stable; at this point, within each importance factor layer, the larger the value of an importance factor parameter, the more important the corresponding channel of that layer and hence the corresponding convolutional-layer channel, so the importance of a convolutional-layer channel can be judged from its importance factor parameter;
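A minimal sketch of how this sparsification objective might be trained, reusing the ImportanceFactor layer from the earlier sketch; the balance coefficient value lam and the helper names importance_l1 and sparsify_step are illustrative assumptions:

```python
def importance_l1(model: nn.Module) -> torch.Tensor:
    """Sum of the L1 norms g(s) of all importance factor vectors in the model."""
    return sum(m.s.abs().sum() for m in model.modules() if isinstance(m, ImportanceFactor))

def sparsify_step(model, images, targets, task_loss_fn, optimizer, lam=1e-4):
    """One optimization step of L = l(f(x, W), y) + lambda * sum over s of g(s)."""
    optimizer.zero_grad()
    loss = task_loss_fn(model(images), targets) + lam * importance_l1(model)
    loss.backward()
    optimizer.step()
    return float(loss)
```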
S3, calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
wherein the threshold index threshold_id of the importance factor parameter is calculated by the following formula:
threshold_id = floor( len(sorted_s) * p);
in the formula, floor() represents the round-down function, len() returns the length of an array, sorted_s represents the array obtained by sorting all importance factor parameters of each importance factor layer, and p represents the pruning rate;
the threshold value threshold of the importance factor parameter is then obtained from the threshold index threshold_id by the following formula:
threshold = sorted_s[threshold_id];
specifically, with this method of calculating the threshold of the importance factor parameter, the threshold can be computed from the desired pruning rate p, and the pruning strength of each layer is then determined by this threshold;
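A short sketch of this threshold computation, assuming a single global threshold taken over the importance factor parameters of all importance factor layers (a per-layer threshold could be computed the same way on each layer's own vector); the function name importance_threshold is an illustrative assumption:

```python
import math

def importance_threshold(model: nn.Module, p: float) -> float:
    """Threshold of the importance factor parameters for pruning rate p."""
    all_s = torch.cat([m.s.detach().abs().flatten()
                       for m in model.modules() if isinstance(m, ImportanceFactor)])
    sorted_s, _ = torch.sort(all_s)                      # ascending order
    threshold_id = math.floor(len(sorted_s) * p)         # threshold_id = floor(len(sorted_s) * p)
    threshold_id = min(threshold_id, len(sorted_s) - 1)  # guard against p = 1.0
    return sorted_s[threshold_id].item()                 # threshold = sorted_s[threshold_id]
```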
S4, judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, and if so, pruning away the convolutional layer channel corresponding to that importance factor parameter;
specifically, in this embodiment, after the threshold is determined, the channels of the convolutional layers to be pruned whose importance factor parameters are smaller than the threshold are deleted, which completes the model pruning;
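One way this pruning step could look in code, continuing the sketches above: building the per-layer keep masks is shown in full, while prune_conv_bn only rebuilds a single Conv2d + BatchNorm2d pair, and rewiring the input channels of the following layer is architecture-specific and omitted. All names are illustrative assumptions:

```python
def build_keep_masks(model: nn.Module, threshold: float) -> dict:
    """For each importance factor layer, mark which channels of the preceding conv layer to keep."""
    return {name: (m.s.detach().abs() >= threshold)
            for name, m in model.named_modules() if isinstance(m, ImportanceFactor)}

def prune_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, keep: torch.Tensor):
    """Rebuild a Conv2d + BatchNorm2d pair keeping only the selected output channels."""
    idx = keep.nonzero(as_tuple=True)[0]
    new_conv = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data.copy_(conv.weight.data[idx])
    if conv.bias is not None:
        new_conv.bias.data.copy_(conv.bias.data[idx])
    new_bn = nn.BatchNorm2d(len(idx))
    new_bn.weight.data.copy_(bn.weight.data[idx])
    new_bn.bias.data.copy_(bn.bias.data[idx])
    new_bn.running_mean.copy_(bn.running_mean[idx])
    new_bn.running_var.copy_(bn.running_var[idx])
    return new_conv, new_bn
```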
S5, training the pruned target detection algorithm model based on the preset data set to obtain a fine-tuning model, judging whether the fine-tuning model reaches the preset precision, stopping training if so, and otherwise continuing to train the fine-tuning model until the preset precision is reached;
specifically, in this embodiment, the preset data set is used to perform a small amount of training on the pruned algorithm model to obtain a Finetune model, and training is stopped once the Finetune model reaches the preset precision;
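A minimal sketch of the fine-tuning loop with this stopping condition; evaluate_map is assumed to be a user-supplied function returning the mAP on the preset data set, and all names and the epoch limit are illustrative assumptions:

```python
def finetune_until(model, train_loader, optimizer, task_loss_fn,
                   evaluate_map, target_map: float, max_epochs: int = 20):
    """Fine-tune the pruned model until it reaches the preset precision (mAP)."""
    for _ in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = task_loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
        if evaluate_map(model) >= target_map:   # stop once the preset precision is reached
            break
    return model
```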
Referring to fig. 5, in this embodiment model pruning is performed on the RetinaNet algorithm with a ResNet50 backbone, and the characteristics and differences of the pruning algorithms are compared through the channel indices and channel counts retained in the same network under different pruning schemes. Since model pruning must obey the corresponding pruning rules, only the first two convolutional layers of each block that makes up ResNet50 are pruned and the third convolutional layer of each block is left intact; on this principle 32 layers can be pruned, and fig. 5 shows the number of channels retained in each layer after the same trained RetinaNet model is pruned with the different methods;
In fig. 5, L1_Norm represents the L1 regularization method, the network-slimming curve represents the existing network slimming model pruning method, and Scale represents the method provided in this embodiment. The channels finally retained by the different algorithms on the same model all differ. Because this method also uses regularization-sparsified parameters as the criterion of channel importance, its channel-count distribution is close to that of the network slimming method; however, this method sparsifies a newly added importance factor layer, so the original convolutional layers and Norm layers are not affected by the sparsification, whereas the network slimming method affects the parameters of the whole network by modifying the γ parameters of the existing BatchNorm layers, thereby causing an accuracy loss for the whole network. Since almost no precision is lost after sparsification, this method achieves essentially lossless pruning, and few fine-tuning iterations are needed, or even none;
The deeper a convolutional layer lies in the network, the more channels it has, and the shallow channels are more important than the deep ones; combining the high-precision pruning method of this embodiment with the experimental data in fig. 5, the pruned model obtained here tends to preserve the channel counts of the shallow layers and to discard channels in the deepest layers, which helps prevent overfitting of the compressed model and leaves less redundancy in the shallow filters.
Example two
The difference between this embodiment and the first embodiment is that it further defines how to determine whether the fine-tuning model reaches the preset precision:
wherein, the judging whether the fine tuning model reaches the preset precision comprises:
based on a preset algorithm evaluation standard, calculating the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard, and judging whether the average number reaches a preset precision;
specifically, in this embodiment the mAP (mean AP), the algorithm evaluation standard of the PASCAL VOC challenge, is used for algorithm evaluation; each category has an AP value, the mAP is the average of the AP values over all categories, and the closer the mAP is to 1, the better the algorithm;
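As a simple illustration, the mAP used as the precision criterion is just the mean of the per-class AP values; the sketch below assumes the per-class APs have already been computed by a PASCAL-VOC-style evaluator, and the class names and AP values in the comment are hypothetical:

```python
def mean_average_precision(per_class_ap: dict) -> float:
    """mAP: the average of the AP values over all evaluation categories."""
    if not per_class_ap:
        return 0.0
    return sum(per_class_ap.values()) / len(per_class_ap)

# Hypothetical example:
# mean_average_precision({"person": 0.81, "car": 0.78, "dog": 0.74})  # -> about 0.777
```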
specifically, in this embodiment, pruning precision and model size are tested on the RetinaNet algorithm with ResNet50 as the feature-extraction backbone; RetinaNet is trained and tested on the public PASCAL VOC data set, training on the VOC2012 + VOC2007 training sets and testing on the VOC2007 test set; the pruning ratio is set to 0.5, and the per-class AP values and the mAP after pruning with this method are shown in Table 1;
TABLE 1 Comparison of AP and mAP for the RetinaNet algorithm without pruning and with pruning by the method of this example
(The per-class AP values of Table 1 are provided as an image in the original publication and are not reproduced here.)
As can be seen from Table 1, when the pruning ratio is set to 0.5, the size of the pruned model is about 0.5 times that of the original model and the precision loss is 0.043, which demonstrates that the algorithm is effective; ResNet50 is already a relatively compact feature-extraction backbone, and a more complex, more redundant network would allow a larger compression ratio with a smaller precision loss.
EXAMPLE III
Referring to fig. 2, a model compression terminal applied to image object detection includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the model compression method applied to image object detection in the first embodiment or the second embodiment.
In summary, the model compression method and terminal applied to image target detection provided by the invention select the target detection algorithm to be used, add an importance factor layer independent of the original convolutional network after each convolutional layer to be pruned, and sparsify the importance factor vector of each importance factor layer; a threshold for the importance factor parameters is determined according to a preset pruning rate, the importance of each convolutional-layer channel is judged against this threshold, and the convolutional-layer channels whose importance factor parameters fall below the threshold are deleted. The prior art usually takes a scale parameter of an existing network layer associated with the convolutional layer as the importance index, but sparsifying these existing parameters affects that network layer and the convolutional-layer data, so the precision after sparsification drops sharply and a large amount of subsequent fine-tuning and training is needed. Here, because the importance factor layer is newly added and has no relation to the original network layer structure, the original network is not affected during sparsification; only the newly added importance factor layer is sparsified, lossless pruning is achieved, and the pruned model needs only a small amount of fine-tuning training, or none at all, which saves computation time and resources. The algorithm model is pruned without depending on a specific layer structure, so the method is applicable to any network, and using this compression method in a target detection algorithm greatly reduces the model size while limiting the precision loss. The pruned model is trained to a preset precision by fine-tuning: the new pruned model is evaluated with the mAP, and training continues whenever the evaluation result has not reached the preset precision. The compression process also tends to keep more channels in the shallow layers and to prune more channels in the deep layers, so the redundancy of the shallow filters is smaller, the precision after compression is further ensured, a large amount of computation time and resources is not required, and training accuracy is improved while the method remains easy to implement and deploy.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (10)

1. A model compression method applied to image target detection is characterized by comprising the following steps:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
2. The method of claim 1, wherein the adding of the importance factor layer independent of the original convolutional layer network after the convolutional layer requiring pruning in the object detection algorithm model comprises:
and judging whether a BatchNorm layer is included after the convolutional layer needing pruning of the target detection algorithm model, if so, adding the importance factor layer after the BatchNorm layer, and if not, adding the importance factor layer after other Norm layers after the convolutional layer.
3. The method of claim 1, wherein the sparsifying of the importance factor vector for each of the importance factor layers comprises:
forming the importance factor parameters of each importance factor layer into corresponding importance factor vectors S;
the importance factor vector is sparsified by minimizing:
L = Σ_(x, y) l( f(x, W), y ) + λ · Σ_(s ∈ Γ) g(s)
wherein x represents the input data during training, y represents the desired output data, W represents the weights of the current convolutional layer, l() represents the loss function used during training, f() represents the output value computed for the input data based on the weights of the current convolutional layer, λ represents a balance coefficient, Γ represents the set of importance factor vectors of all importance factor layers, g(s) represents the L1 regularization of an importance factor vector, L represents the loss function with the adjustment factor added, and λ · Σ_(s ∈ Γ) g(s) is the added adjustment factor;
and sparsifying the importance factor vector to a stable state.
4. The method of claim 1, wherein the calculating the threshold value of the importance factor parameter in the importance factor vector according to the preset pruning rate comprises:
calculating the threshold index threshold_id of the importance factor parameter by the following formula:
threshold_id = floor( len(sorted_s) * p);
in the formula, floor() represents the round-down function, len() returns the length of an array, sorted_s represents the array obtained by sorting all importance factor parameters of each importance factor layer, and p represents the pruning rate;
calculating the threshold value threshold of the importance factor parameter from the threshold index threshold_id by the following formula:
threshold = sorted_s[threshold_id].
5. The method as claimed in any one of claims 1 to 4, wherein the determining whether the fine tuning model reaches a predetermined precision comprises:
based on a preset algorithm evaluation standard, calculating the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard, and judging whether the average number reaches a preset precision.
6. A model compression terminal applied to image object detection, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements the following steps when executing the computer program:
training a preset target detection algorithm based on a preset data set to obtain a converged target detection algorithm model;
adding an importance factor layer independent of an original convolution network after a convolution layer needing pruning in the target detection algorithm model, and sparsifying the importance factor vector of each importance factor layer;
calculating the threshold value of the importance factor parameter in the importance factor vector according to a preset pruning rate;
judging whether each importance factor parameter of each importance factor layer is smaller than the threshold value, if so, cutting out the convolutional layer channel corresponding to the importance factor parameter;
training the pruned target detection algorithm model based on the preset data set to obtain a fine tuning model, judging whether the fine tuning model reaches preset precision, if so, stopping training, and if not, continuing training the fine tuning model until the preset precision is reached.
7. The model compression terminal applied to image object detection according to claim 6, wherein the adding of the importance factor layer independent of the original convolution network after the convolution layer requiring pruning in the object detection algorithm model comprises:
and judging whether a BatchNorm layer is included after the convolutional layer needing pruning of the target detection algorithm model, if so, adding the importance factor layer after the BatchNorm layer, and if not, adding the importance factor layer after other Norm layers after the convolutional layer.
8. The model compression terminal applied to image target detection as claimed in claim 6, wherein the importance factor vector sparsification for each of the importance factor layers comprises:
forming the importance factor parameters of each importance factor layer into corresponding importance factor vectors S;
the importance factor vector is sparsified by minimizing:
L = Σ_(x, y) l( f(x, W), y ) + λ · Σ_(s ∈ Γ) g(s)
wherein x represents the input data during training, y represents the desired output data, W represents the weights of the current convolutional layer, l() represents the loss function used during training, f() represents the output value computed for the input data based on the weights of the current convolutional layer, λ represents a balance coefficient, Γ represents the set of importance factor vectors of all importance factor layers, g(s) represents the L1 regularization of an importance factor vector, L represents the loss function with the adjustment factor added, and λ · Σ_(s ∈ Γ) g(s) is the added adjustment factor;
and sparsifying the importance factor vector to a stable state.
9. The model compression terminal applied to image target detection according to claim 6, wherein the calculating the threshold value of the importance factor parameter in the importance factor vector according to the preset pruning rate comprises:
calculating the threshold index threshold_id of the importance factor parameter by the following formula:
threshold_id = floor( len(sorted_s) * p);
in the formula, floor() represents the round-down function, len() returns the length of an array, sorted_s represents the array obtained by sorting all importance factor parameters of each importance factor layer, and p represents the pruning rate;
calculating the threshold value threshold of the importance factor parameter from the threshold index threshold_id by the following formula:
threshold = sorted_s[threshold_id].
10. The terminal of any one of claims 6 to 9, wherein the determining whether the fine-tuning model reaches the preset precision comprises:
based on a preset algorithm evaluation standard, calculating the average number of data obtained by the fine tuning model in all evaluation categories of the algorithm evaluation standard, and judging whether the average number reaches a preset precision.
CN202110300622.6A 2021-03-22 2021-03-22 Model compression method and terminal applied to image target detection Active CN112802141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300622.6A CN112802141B (en) 2021-03-22 2021-03-22 Model compression method and terminal applied to image target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110300622.6A CN112802141B (en) 2021-03-22 2021-03-22 Model compression method and terminal applied to image target detection

Publications (2)

Publication Number Publication Date
CN112802141A true CN112802141A (en) 2021-05-14
CN112802141B CN112802141B (en) 2021-08-24

Family

ID=75817309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300622.6A Active CN112802141B (en) 2021-03-22 2021-03-22 Model compression method and terminal applied to image target detection

Country Status (1)

Country Link
CN (1) CN112802141B (en)

Citations (7)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764471A (en) * 2018-05-17 2018-11-06 西安电子科技大学 The neural network cross-layer pruning method of feature based redundancy analysis
CN109445935A (en) * 2018-10-10 2019-03-08 杭州电子科技大学 A kind of high-performance big data analysis system self-adaption configuration method under cloud computing environment
CN109543766A (en) * 2018-11-28 2019-03-29 钟祥博谦信息科技有限公司 Image processing method and electronic equipment, storage medium
CN110276450A (en) * 2019-06-25 2019-09-24 交叉信息核心技术研究院(西安)有限公司 Deep neural network structural sparse system and method based on more granularities
CN111062382A (en) * 2019-10-30 2020-04-24 北京交通大学 Channel pruning method for target detection network
CN112183748A (en) * 2020-09-30 2021-01-05 中国科学院自动化研究所 Model compression method, system and related equipment based on sparse convolutional neural network
CN112101487A (en) * 2020-11-17 2020-12-18 深圳感臻科技有限公司 Compression method and device for fine-grained recognition model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨文慧: "Neural network model compression method based on parameter and feature redundancy", China Master's Theses Full-text Database *
纪荣嵘 et al.: "A survey of deep neural network compression and acceleration", Journal of Computer Research and Development *

Also Published As

Publication number Publication date
CN112802141B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN108615071B (en) Model testing method and device
US10733332B2 (en) Systems for solving general and user preference-based constrained multi-objective optimization problems
US7545986B2 (en) Adaptive resampling classifier method and apparatus
CN111612144B (en) Pruning method and terminal applied to target detection
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN112001403B (en) Image contour detection method and system
CN114037844A (en) Global rank perception neural network model compression method based on filter characteristic diagram
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
US20210073633A1 (en) Neural network rank optimization device and optimization method
CN112488313A (en) Convolutional neural network model compression method based on explicit weight
CN113125440A (en) Method and device for judging object defects
Woods Ramsay curve IRT for Likert-type data
CN114662666A (en) Decoupling method and system based on beta-GVAE and related equipment
CN112802141B (en) Model compression method and terminal applied to image target detection
CN113239199B (en) Credit classification method based on multi-party data set
CN108090564A (en) Based on network weight is initial and the redundant weighting minimizing technology of end-state difference
CN105117330B (en) CNN code test methods and device
CN111539508A (en) Generator excitation system parameter identification algorithm based on improved wolf algorithm
WO2023102844A1 (en) Method and apparatus for determining pruning module, and computer-readable storage medium
CN116468102A (en) Pruning method and device for cutter image classification model and computer equipment
CN115170902A (en) Training method of image processing model
CN110751201A (en) SAR equipment task failure cause reasoning method based on textural feature transformation
CN113610350B (en) Complex working condition fault diagnosis method, equipment, storage medium and device
CN106709598B (en) Voltage stability prediction and judgment method based on single-class samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant