CN111612145A - Model compression and acceleration method based on heterogeneous separation convolution kernel - Google Patents

Info

Publication number
CN111612145A
CN111612145A
Authority
CN
China
Prior art keywords
convolution
channel
convolution kernel
representative
spconv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010442785.3A
Other languages
Chinese (zh)
Inventor
门爱东 (Aidong Men)
张秋林 (Qiulin Zhang)
姜竹青 (Zhuqing Jiang)
路齐硕 (Qishuo Lu)
韩佳男 (Jianan Han)
曾正欣 (Zhengxin Zeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010442785.3A
Publication of CN111612145A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, motivated by the observation that convolution feature maps contain a large amount of similarity. SPConv divides the input feature map into a "representative channel" part and a "redundant channel" part: a computationally expensive convolution kernel with strong feature-extraction capability extracts the essential information stored in the representative channels, while a convolution kernel with very small computational overhead extracts the subtle detail information hidden in the redundant channels. The two outputs are then combined by the parameter-free feature fusion method designed by the invention. SPConv is a plug-and-play convolution module that can directly replace convolutions in current network architectures. Experiments on image classification and object detection datasets show that, while the parameter count and floating-point operations are greatly reduced, both model performance and inference speed on the GPU exceed those of the baseline methods.

Description

Model compression and acceleration method based on heterogeneous separation convolution kernel
Technical Field
The invention belongs to the field of computer vision (backbone networks) and is applicable to computer-vision sub-fields such as image classification and object detection.
Background
With the development of neural networks, neural network models have achieved breakthrough performance in computer vision, model sizes have grown continuously, and the computational demands placed on GPUs have risen accordingly. Computational resources, however, are limited, and there is strong demand for embedding large neural networks into mobile devices. How to compress and accelerate a large neural network model under limited computational resources, while ensuring that model performance is not significantly degraded, has therefore become one of the research hotspots of current neural network research.
Designing efficient convolution operations is one of the main research directions in model compression. Many methods achieve very good compression by making reasonable use of group-wise, depth-wise and point-wise convolutions, such as Xception, MobileNet, ResNeXt and ShuffleNet. These methods show that the capacity of a large convolution kernel can be approximated by a number of separate small convolution kernels. Other methods, such as HetConv, OctConv and GhostConv, also greatly reduce parameter counts and floating-point operations by making reasonable improvements to the original convolution kernel. The other main direction of model compression is quantization of model parameters, with binary networks and ternary networks as the main representatives.
Although model compression and acceleration are now well developed, most methods do not achieve both at once. On the one hand, methods that design efficient convolution kernels typically approximate a large convolution kernel with several separate small kernels, but such fragmented small kernels are in fact unfriendly to GPU computation, so the model's inference speed drops. On the other hand, compressing a model by quantizing its parameter values greatly reduces parameter precision and therefore also degrades model performance. A method is therefore proposed herein that compresses the model size while preserving model performance, and at the same time accelerates model inference on the GPU.
Disclosure of Invention
The invention aims to overcome the defects of the prior art. Based on the observation that convolution feature maps contain a large amount of similarity, it provides SPConv, a model compression and acceleration method that greatly reduces model parameters and floating-point operations while preserving model accuracy and inference speed.
The technical problem to be solved by the invention is addressed by the following technical scheme:
SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, as shown in fig. 2, comprises the following steps:
step 1, dividing the input feature map into two parts according to a ratio α, one part called the "representative channels" and the other called the "redundant channels";
step 2, performing three operations on the representative channels: 1) dividing the features into several principal parts by group convolution (Group-wise Convolution); 2) recovering the cross-channel information lost by the group convolution through point convolution (Point-wise Convolution); 3) directly adding the feature maps obtained by the two operations to combine their information;
step 3, for the redundant channels, extracting the hidden subtle detail information with a computationally cheap point convolution;
step 4, performing parameter-free feature fusion on the features obtained in steps 2 and 3.
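The four steps above can be sketched in pure Python. This is a minimal illustration, not the patent's implementation: tensors are nested lists, the batch dimension is omitted, the grouped 3x3 is simplified to a per-channel (depth-wise) 3x3, and all weight shapes are assumptions.

```python
def conv1x1(x, w):
    """Point convolution: x is [C_in][H][W], w is [C_out][C_in]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(wd)] for i in range(h)] for o in range(len(w))]

def conv3x3_depthwise(x, kernels):
    """Per-channel 3x3 convolution with zero padding: a group convolution
    with one channel per group, standing in for the grouped 3x3 of step 2."""
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for c, ch in enumerate(x):
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < wd:
                            s += kernels[c][di + 1][dj + 1] * ch[ii][jj]
                plane[i][j] = s
        out.append(plane)
    return out

def spconv_branches(x, alpha, w_g3, w_p_rep, w_p_red):
    """Steps 1-3: split the channels by ratio alpha, run the heavy branch on
    the representative part and a cheap point convolution on the rest."""
    k = max(1, int(alpha * len(x)))
    rep, red = x[:k], x[k:]                      # step 1: split by ratio alpha
    g = conv3x3_depthwise(rep, w_g3)             # step 2.1: grouped 3x3
    p = conv1x1(rep, w_p_rep)                    # step 2.2: point conv
    u3 = [[[g[c][i][j] + p[c][i][j]              # step 2.3: add the two maps
            for j in range(len(g[0][0]))]
           for i in range(len(g[0]))] for c in range(len(g))]
    u1 = conv1x1(red, w_p_red)                   # step 3: cheap branch
    return u3, u1
```

Step 4, the parameter-free fusion of the two returned feature maps, carries no weights of its own, which is why the only learned parameters in the unit are the three small convolutions above.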
The parameter-free feature fusion method in step 4 comprises the following steps:
(1) assume the features output by steps 2 and 3 have shape N×C×H×W, denoted U3 and U1 (the subscript indicates the convolution kernel size); perform global average pooling over the spatial dimensions to obtain two matrices S3 and S1 of shape N×C, called "feature importance matrices":
S_k(n,c) = (1/(H·W)) Σ_{i,j} U_k(n,c,i,j),  k ∈ {1, 3}
(2) stack the channel importance matrices S1 and S3 along the channel dimension, then apply SoftMax across the stack to obtain the weight coefficients β and γ for each channel:
β = exp(S3) / (exp(S3) + exp(S1)),  γ = exp(S1) / (exp(S3) + exp(S1))
(3) compute the weighted sum of the two features:
Y = βU3 + γU1
Through the above operations, the model retains the important information while the inference speed is accelerated.
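The fusion steps can likewise be sketched in pure Python (a minimal illustration under the same nested-list convention, batch dimension omitted; not the patent's implementation):

```python
import math

def global_avg_pool(u):
    """Step (1): spatial global average pooling -> one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]

def fuse(u3, u1):
    """Steps (2)-(3): per-channel SoftMax over the stacked importance
    scores, then the weighted sum Y = beta*U3 + gamma*U1."""
    s3, s1 = global_avg_pool(u3), global_avg_pool(u1)
    out = []
    for c in range(len(u3)):
        e3, e1 = math.exp(s3[c]), math.exp(s1[c])
        beta, gamma = e3 / (e3 + e1), e1 / (e3 + e1)   # beta + gamma == 1
        out.append([[beta * u3[c][i][j] + gamma * u1[c][i][j]
                     for j in range(len(u3[c][0]))]
                    for i in range(len(u3[c]))])
    return out
```

When the two branches carry equally strong responses the weights collapse to 0.5/0.5; a branch with a stronger average response is weighted up, without adding any learned parameters.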
The invention has the advantages and positive effects that:
the method is reasonable in design, and provides the SPConv through a large number of similar phenomena in observed convolution characteristic graphs, and the method is a model compression and acceleration method based on heterogeneous separation convolution kernels. The SPConv achieves large-amplitude parameter quantity and floating point number calculation quantity reduction under the condition of ensuring model precision and GPU reasoning speed. This is superior to current work, which, although achieving larger magnitudes of parameter and floating-point number computations, has much lower GPU inference speeds than the baseline approach.
Drawings
FIG. 1 illustrates the large amount of similarity among convolution feature maps observed by the invention; fig. 2 is a diagram of the overall SPConv network framework proposed by the invention.
Detailed Description
The invention is further described below with reference to examples:
the SPConv provided by the invention directly replaces a 3x3 convolution kernel in a classical network (such as ResNet, VGGNet and the like), and can improve the performance of a model and the inference speed on a GPU while ensuring the large reduction of parameter and floating point number calculation amount without adjusting other hyper-parameters and the like. The specific experiment is as follows: the following experiment was conducted according to the method of the present invention to explain the recognition effect of the present invention.
Test environment: Python 3.6; PyTorch framework; Ubuntu 16.04; NVIDIA Tesla V100 GPU.
Test data: the selected datasets are the small classification dataset CIFAR-10, the large classification dataset ImageNet and the object detection dataset MS COCO. CIFAR-10 comprises 50,000 training pictures and 10,000 validation pictures; ImageNet comprises 1.28 million training images and 50,000 validation pictures; the MS COCO object detection dataset comprises 35,000 training pictures and 5,000 validation pictures.
Test metrics: for the image classification task, Top-1 and Top-5 accuracy are used as evaluation metrics; for the object detection task, mAP (mean Average Precision) is used. These metrics are computed for the current mainstream algorithms and the results are compared, showing that the method performs better than other current work in the fields of image classification and object detection.
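For reference, the Top-k accuracy used as the classification metric can be computed as follows (a generic metric sketch, not code from the patent; `scores` holds one row of per-class scores per sample):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k classes with
    the highest scores (k=1 gives Top-1, k=5 gives Top-5)."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return hits / len(scores)
```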
The test results were as follows:
TABLE 1. Performance comparison of the invention on the CIFAR-10 dataset at different feature separation ratios (table supplied as an image in the original filing)
TABLE 2. Performance comparison of the invention on the ImageNet dataset at different feature separation ratios (table supplied as an image in the original filing)
TABLE 3. Performance comparison of the invention on the MS COCO dataset at different feature separation ratios (table supplied as an image in the original filing)
The comparison data show that in the image classification task, on both the small CIFAR dataset and the large ImageNet dataset, the proposed SPConv greatly reduces parameter count and floating-point operations while its accuracy remains slightly better than the baseline method and exceeds other current work; its inference speed on the GPU is slightly better than the baseline method and far exceeds other current work.
In the field of object detection, the experimental results show that the method outperforms the baseline method while greatly reducing the backbone's parameter count and floating-point operations.
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, and that other embodiments derived from the teachings of the present invention by those skilled in the art are also within the scope of the present invention.

Claims (3)

  1. SPConv, a model compression and acceleration method based on heterogeneous separated convolution kernels, characterized in that:
    the conventional 3x3 convolution kernel operation is performed on only a representative portion of the input channels; the input feature map is divided into two parts at a separation ratio α, one part referred to as the "representative channels" and the other as the "redundant channels".
  2. The model compression and acceleration method based on heterogeneous separated convolution kernels according to claim 1, characterized in that:
    essential information in the representative channels is extracted with a computationally expensive convolution kernel, and hidden subtle detail information in the redundant channels is extracted with a computationally cheap convolution kernel; the method comprises the following two steps:
    2.1 performing group convolution and point convolution respectively on the representative channels to extract the principal essential features, then directly adding the two convolution outputs;
    2.2 performing a computationally cheap point convolution on the other portion, the "redundant channels", to supplement the hidden detail information.
  3. The model compression and acceleration method based on heterogeneous separated convolution kernels according to claim 1 or 2, characterized in that:
    a "parameter-free feature fusion method" is designed to fuse the features generated in claim 2; the method comprises the following 3 steps:
    3.1 performing global average pooling over the spatial dimensions on the feature maps obtained by convolving the representative channels and the redundant channels respectively;
    3.2 stacking the two resulting matrices along the channel dimension, then applying SoftMax in the channel dimension to obtain the inter-channel importance matrix;
    3.3 using the obtained channel importance matrix as per-channel weights to compute the weighted sum of the feature maps obtained from the representative and redundant channels, yielding the final convolution output.
CN202010442785.3A 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel Pending CN111612145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010442785.3A CN111612145A (en) 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel

Publications (1)

Publication Number Publication Date
CN111612145A true CN111612145A (en) 2020-09-01

Family

ID=72199589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010442785.3A Pending CN111612145A (en) 2020-05-22 2020-05-22 Model compression and acceleration method based on heterogeneous separation convolution kernel

Country Status (1)

Country Link
CN (1) CN111612145A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113411583A (en) * 2021-05-24 2021-09-17 西北工业大学 Image compression method based on dimension splitting
CN113411583B (en) * 2021-05-24 2022-09-02 西北工业大学 Image compression method based on dimension splitting
CN113850368A (en) * 2021-09-08 2021-12-28 深圳供电局有限公司 Lightweight convolutional neural network model suitable for edge-end equipment
CN113762200A (en) * 2021-09-16 2021-12-07 深圳大学 Mask detection method based on LFFD
CN113762200B (en) * 2021-09-16 2023-06-30 深圳大学 Mask detection method based on LFD

Similar Documents

Publication Publication Date Title
Liu et al. FDDWNet: a lightweight convolutional neural network for real-time semantic segmentation
Paszke et al. Enet: A deep neural network architecture for real-time semantic segmentation
CN111612145A (en) Model compression and acceleration method based on heterogeneous separation convolution kernel
CN114067153B (en) Image classification method and system based on parallel double-attention light-weight residual error network
CN111639692A (en) Shadow detection method based on attention mechanism
Li et al. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN111612017B (en) Target detection method based on information enhancement
CN111028146A (en) Image super-resolution method for generating countermeasure network based on double discriminators
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN111860683B (en) Target detection method based on feature fusion
CN110866938B (en) Full-automatic video moving object segmentation method
CN110598788A (en) Target detection method and device, electronic equipment and storage medium
CN112037228A (en) Laser radar point cloud target segmentation method based on double attention
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN111899203B (en) Real image generation method based on label graph under unsupervised training and storage medium
CN112836651B (en) Gesture image feature extraction method based on dynamic fusion mechanism
CN114155371A (en) Semantic segmentation method based on channel attention and pyramid convolution fusion
CN113065426A (en) Gesture image feature fusion method based on channel perception
CN114529982A (en) Lightweight human body posture estimation method and system based on stream attention
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN112989919B (en) Method and system for extracting target object from image
CN118247645A (en) Novel DDCE-YOLOv s model underwater image target detection method
CN113327227A (en) Rapid wheat head detection method based on MobilenetV3
CN113177546A (en) Target detection method based on sparse attention module

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200901