CN111612145A - Model compression and acceleration method based on heterogeneous separation convolution kernel - Google Patents

Info

Publication number
CN111612145A
CN111612145A
Authority
CN
China
Prior art keywords
convolution
channel
convolution kernel
representative
spconv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010442785.3A
Other languages
Chinese (zh)
Inventor
门爱东 (Aidong Men)
张秋林 (Qiulin Zhang)
姜竹青 (Zhuqing Jiang)
路齐硕 (Qishuo Lu)
韩佳男 (Jianan Han)
曾正欣 (Zhengxin Zeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010442785.3A
Publication of CN111612145A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, motivated by the observation that convolution feature maps contain a large amount of similarity. SPConv divides the input feature map into a "representative channel" part and a "redundant channel" part: a computationally expensive convolution kernel with strong feature-extraction capability extracts the essential information stored in the representative channels, while a convolution kernel with very small computational overhead extracts the subtle detail information hidden in the redundant channels. The two outputs are then combined by the parameter-free feature fusion method designed by the invention. SPConv is a plug-and-play convolution module that can directly replace convolutions in current network architectures. Experiments on image classification and object detection datasets show that, while the parameter count and floating-point operations are greatly reduced, both model performance and inference speed on the GPU exceed those of the baseline methods.

Description

Model compression and acceleration method based on heterogeneous separation convolution kernel
Technical Field
The invention belongs to the field of computer vision (backbone networks) and is applicable to computer-vision sub-fields such as image classification and object detection.
Background
With the development of neural networks, neural network models have achieved breakthrough performance in computer vision, model sizes have grown continuously, and the computational demands placed on GPUs have risen accordingly. Computational resources, however, are limited, and there is strong demand for embedding large neural networks into mobile devices. How to compress and accelerate a large neural network model under limited computational resources, while ensuring that model performance is not significantly degraded, has therefore become one of the research hotspots of current neural network research.
Designing efficient convolution operations is one of the main research directions in model compression. Many methods achieve very good compression by making reasonable use of group-wise, depth-wise and point-wise convolutions, such as Xception, MobileNet, ResNeXt and ShuffleNet. These methods show that the capacity of a large convolution kernel can be approximated by a number of separate small convolution kernels. Other methods, such as HetConv, OctConv and GhostConv, also greatly reduce parameter counts and floating-point operations by making reasonable improvements to the original convolution kernel. The other main direction of model compression is quantization of model parameters, with binary networks and ternary networks as the main representatives.
Although model compression and acceleration are now well developed, most methods do not achieve both at once. On the one hand, methods that design efficient convolution kernels typically approximate a large convolution kernel with several separate small kernels, but such fragmented small kernels are in fact unfriendly to GPU computation, so the model's inference speed drops. On the other hand, compressing a model by quantizing its parameter values greatly reduces parameter precision and therefore also degrades model performance. A method is therefore proposed herein that compresses the model size while preserving model performance, and at the same time accelerates model inference on the GPU.
Disclosure of Invention
The invention aims to overcome the defects of the prior art. Based on the observation that convolution feature maps contain a large amount of similarity, it provides SPConv, a model compression and acceleration method that greatly reduces model parameters and floating-point operations while preserving model accuracy and inference speed.
The technical problem to be solved by the invention is addressed by the following technical scheme:
SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, as shown in fig. 2, comprises the following steps:
step 1, dividing the input feature map into two parts according to a ratio α, one part called the "representative channels" and the other called the "redundant channels";
step 2, performing three operations on the representative channels: 1) dividing the features into several principal parts by group convolution (Group-wise Convolution); 2) recovering the cross-channel information lost by the group convolution through point convolution (Point-wise Convolution); 3) directly adding the feature maps obtained by the two operations to combine their information;
step 3, for the redundant channels, extracting the hidden subtle detail information with a computationally cheap point convolution;
step 4, performing parameter-free feature fusion on the features obtained in steps 2 and 3.
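The four steps above can be sketched in pure Python. This is a minimal illustration, not the patent's implementation: tensors are nested lists, the batch dimension is omitted, the grouped 3x3 is simplified to a per-channel (depth-wise) 3x3, and all weight shapes are assumptions.

```python
def conv1x1(x, w):
    """Point convolution: x is [C_in][H][W], w is [C_out][C_in]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(wd)] for i in range(h)] for o in range(len(w))]

def conv3x3_depthwise(x, kernels):
    """Per-channel 3x3 convolution with zero padding: a group convolution
    with one channel per group, standing in for the grouped 3x3 of step 2."""
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for c, ch in enumerate(x):
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < wd:
                            s += kernels[c][di + 1][dj + 1] * ch[ii][jj]
                plane[i][j] = s
        out.append(plane)
    return out

def spconv_branches(x, alpha, w_g3, w_p_rep, w_p_red):
    """Steps 1-3: split the channels by ratio alpha, run the heavy branch on
    the representative part and a cheap point convolution on the rest."""
    k = max(1, int(alpha * len(x)))
    rep, red = x[:k], x[k:]                      # step 1: split by ratio alpha
    g = conv3x3_depthwise(rep, w_g3)             # step 2.1: grouped 3x3
    p = conv1x1(rep, w_p_rep)                    # step 2.2: point conv
    u3 = [[[g[c][i][j] + p[c][i][j]              # step 2.3: add the two maps
            for j in range(len(g[0][0]))]
           for i in range(len(g[0]))] for c in range(len(g))]
    u1 = conv1x1(red, w_p_red)                   # step 3: cheap branch
    return u3, u1
```

Step 4, the parameter-free fusion of the two returned feature maps, carries no weights of its own, which is why the only learned parameters in the unit are the three small convolutions above.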
The parameter-free feature fusion method in step 4 comprises the following steps:
(1) assume the features output by steps 2 and 3 have shape N×C×H×W, denoted U3 and U1 (the subscript indicates the convolution kernel size); perform global average pooling over the spatial dimensions to obtain two matrices S3 and S1 of shape N×C, called "feature importance matrices":
S_k(n,c) = (1/(H·W)) Σ_{i,j} U_k(n,c,i,j),  k ∈ {1, 3}
(2) stack the channel importance matrices S1 and S3 along the channel dimension, then apply SoftMax across the stack to obtain the weight coefficients β and γ for each channel:
β = exp(S3) / (exp(S3) + exp(S1)),  γ = exp(S1) / (exp(S3) + exp(S1))
(3) compute the weighted sum of the two features:
Y = βU3 + γU1
Through the above operations, the model retains the important information while the inference speed is accelerated.
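The fusion steps can likewise be sketched in pure Python (a minimal illustration under the same nested-list convention, batch dimension omitted; not the patent's implementation):

```python
import math

def global_avg_pool(u):
    """Step (1): spatial global average pooling -> one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]

def fuse(u3, u1):
    """Steps (2)-(3): per-channel SoftMax over the stacked importance
    scores, then the weighted sum Y = beta*U3 + gamma*U1."""
    s3, s1 = global_avg_pool(u3), global_avg_pool(u1)
    out = []
    for c in range(len(u3)):
        e3, e1 = math.exp(s3[c]), math.exp(s1[c])
        beta, gamma = e3 / (e3 + e1), e1 / (e3 + e1)   # beta + gamma == 1
        out.append([[beta * u3[c][i][j] + gamma * u1[c][i][j]
                     for j in range(len(u3[c][0]))]
                    for i in range(len(u3[c]))])
    return out
```

When the two branches carry equally strong responses the weights collapse to 0.5/0.5; a branch with a stronger average response is weighted up, without adding any learned parameters.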
The invention has the advantages and positive effects that:
the method is reasonable in design, and provides the SPConv through a large number of similar phenomena in observed convolution characteristic graphs, and the method is a model compression and acceleration method based on heterogeneous separation convolution kernels. The SPConv achieves large-amplitude parameter quantity and floating point number calculation quantity reduction under the condition of ensuring model precision and GPU reasoning speed. This is superior to current work, which, although achieving larger magnitudes of parameter and floating-point number computations, has much lower GPU inference speeds than the baseline approach.
Drawings
FIG. 1 illustrates the large amount of similarity among convolution feature maps observed by the invention; fig. 2 is a diagram of the overall SPConv network framework proposed by the invention.
Detailed Description
The invention is further described below with reference to examples:
the SPConv provided by the invention directly replaces a 3x3 convolution kernel in a classical network (such as ResNet, VGGNet and the like), and can improve the performance of a model and the inference speed on a GPU while ensuring the large reduction of parameter and floating point number calculation amount without adjusting other hyper-parameters and the like. The specific experiment is as follows: the following experiment was conducted according to the method of the present invention to explain the recognition effect of the present invention.
Test environment: Python 3.6; PyTorch framework; Ubuntu 16.04; NVIDIA Tesla V100 GPU.
Test data: the selected datasets are the small classification dataset CIFAR-10, the large classification dataset ImageNet and the object detection dataset MS COCO. CIFAR-10 comprises 50,000 training pictures and 10,000 validation pictures; ImageNet comprises 1.28 million training images and 50,000 validation pictures; the MS COCO object detection dataset comprises 35,000 training pictures and 5,000 validation pictures.
Test metrics: for the image classification task, Top-1 and Top-5 accuracy are used as evaluation metrics; for the object detection task, mAP (mean Average Precision) is used. These metrics are computed for the current mainstream algorithms and the results are compared, showing that the method performs better than other current work in the fields of image classification and object detection.
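For reference, the Top-k accuracy used as the classification metric can be computed as follows (a generic metric sketch, not code from the patent; `scores` holds one row of per-class scores per sample):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k classes with
    the highest scores (k=1 gives Top-1, k=5 gives Top-5)."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return hits / len(scores)
```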
The test results were as follows:
TABLE 1. Performance comparison of the invention on the CIFAR-10 dataset at different feature separation ratios (table supplied as an image in the original filing)
TABLE 2. Performance comparison of the invention on the ImageNet dataset at different feature separation ratios (table supplied as an image in the original filing)
TABLE 3. Performance comparison of the invention on the MS COCO dataset at different feature separation ratios (table supplied as an image in the original filing)
The comparison data show that in the image classification task, on both the small CIFAR dataset and the large ImageNet dataset, the proposed SPConv greatly reduces parameter count and floating-point operations while its accuracy remains slightly better than the baseline method and exceeds other current work; its inference speed on the GPU is slightly better than the baseline method and far exceeds other current work.
In the field of object detection, the experimental results show that the method outperforms the baseline method while greatly reducing the backbone's parameter count and floating-point operations.
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, and that other embodiments derived from the teachings of the present invention by those skilled in the art are also within the scope of the present invention.

Claims (3)

  1. SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, characterized in that:
    the conventional 3x3 convolution kernel operation is performed on only a representative portion of the input channels; the input feature map is divided into two parts at a separation ratio α, one part referred to as the "representative channels" and the other as the "redundant channels".
  2. The model compression and acceleration method based on heterogeneous separated convolution kernels according to claim 1, characterized in that:
    essential information in the representative channels is extracted with a computationally expensive convolution kernel, and hidden subtle detail information in the redundant channels is extracted with a computationally cheap convolution kernel; the method comprises the following two steps:
    2.1 performing group convolution and point convolution respectively on the representative channels to extract the principal essential features, then directly adding the two convolution outputs;
    2.2 performing a computationally cheap point convolution on the other portion, the "redundant channels", to supplement the hidden detail information.
  3. The model compression and acceleration method based on heterogeneous separated convolution kernels according to claim 1 or 2, characterized in that:
    a "parameter-free feature fusion method" is designed to fuse the features generated in claim 2; the method comprises the following 3 steps:
    3.1 performing global average pooling over the spatial dimensions on the feature maps obtained by convolving the representative channels and the redundant channels respectively;
    3.2 stacking the two resulting matrices along the channel dimension, then applying SoftMax in the channel dimension to obtain the inter-channel importance matrix;
    3.3 using the obtained channel importance matrix as per-channel weights to compute the weighted sum of the feature maps obtained from the representative and redundant channels, yielding the final convolution output.
CN202010442785.3A 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel Pending CN111612145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010442785.3A CN111612145A (en) 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel

Publications (1)

Publication Number Publication Date
CN111612145A true CN111612145A (en) 2020-09-01

Family

ID=72199589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010442785.3A Pending CN111612145A (en) 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel

Country Status (1)

Country Link
CN (1) CN111612145A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113411583A (en) * 2021-05-24 2021-09-17 西北工业大学 Image compression method based on dimension splitting
CN113411583B (en) * 2021-05-24 2022-09-02 西北工业大学 Image compression method based on dimension splitting
CN113850368A (en) * 2021-09-08 2021-12-28 深圳供电局有限公司 Lightweight convolutional neural network model suitable for edge-end equipment
CN113762200A (en) * 2021-09-16 2021-12-07 深圳大学 Mask detection method based on LFFD
CN113762200B (en) * 2021-09-16 2023-06-30 深圳大学 Mask detection method based on LFD

Similar Documents

Publication Publication Date Title
Liu et al. FDDWNet: a lightweight convolutional neural network for real-time semantic segmentation
Paszke et al. Enet: A deep neural network architecture for real-time semantic segmentation
CN111612145A (en) Model compression and acceleration method based on heterogeneous separation convolution kernel
CN114067153B (en) Image classification method and system based on parallel double-attention light-weight residual error network
CN111639692A (en) Shadow detection method based on attention mechanism
Li et al. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN111612017B (en) Target detection method based on information enhancement
CN111028146A (en) Image super-resolution method for generating countermeasure network based on double discriminators
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN111860683B (en) Target detection method based on feature fusion
CN110866938B (en) Full-automatic video moving object segmentation method
CN110598788A (en) Target detection method and device, electronic equipment and storage medium
CN112037228A (en) Laser radar point cloud target segmentation method based on double attention
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN111899203B (en) Real image generation method based on label graph under unsupervised training and storage medium
CN112836651B (en) Gesture image feature extraction method based on dynamic fusion mechanism
CN114155371A (en) Semantic segmentation method based on channel attention and pyramid convolution fusion
CN113065426A (en) Gesture image feature fusion method based on channel perception
CN114529982A (en) Lightweight human body posture estimation method and system based on stream attention
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN112989919B (en) Method and system for extracting target object from image
CN118247645A (en) Novel DDCE-YOLOv s model underwater image target detection method
CN113327227A (en) Rapid wheat head detection method based on MobilenetV3
CN113177546A (en) Target detection method based on sparse attention module

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200901