CN112016639B - Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet - Google Patents


Info

Publication number
CN112016639B
CN112016639B (application CN202011199528.8A)
Authority
CN
China
Prior art keywords
feature map
convolution
output
characteristic
module
Prior art date
Legal status
Active
Application number
CN202011199528.8A
Other languages
Chinese (zh)
Other versions
CN112016639A (en)
Inventor
Xie Luofeng (谢罗峰)
Zhu Yangyang (朱杨洋)
Xie Zhengfeng (谢政峰)
Yin Ming (殷鸣)
Yin Guofu (殷国富)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202011199528.8A
Publication of CN112016639A
Application granted
Publication of CN112016639B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a flexible separable convolution framework, a feature extraction method, and their application in VGG and ResNet. The framework comprises a feature map cluster-partitioning module, a first convolution module, a second convolution module, a feature map fusion module, and an attention (SE) module. The cluster-partitioning module divides the feature maps into maps characterizing the main information and maps characterizing supplementary information; the first convolution module applies ordinary convolution to the main-information maps; the second convolution module applies grouped convolution to the supplementary-information maps; the fusion module first concatenates the convolved maps, then adds the original feature maps and applies an activation; the SE module multiplies the extracted channel weights with the feature maps to generate the output feature maps. Combining ordinary convolution, grouped convolution, a residual branch, and the SE attention mechanism, the framework reduces the computation and parameter count of the network while preserving accuracy, and can be dropped into a neural-network convolution layer in plug-and-play fashion.

Description

Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet
Technical Field
The invention relates to the technical field of image processing, and in particular to a flexible separable convolution framework, a feature extraction method, and their application in VGG and ResNet.
Background
In recent years, deep convolutional neural networks have shown excellent performance in computer vision tasks such as image recognition, object detection, and semantic segmentation. However, traditional deep convolutional neural networks need a large number of parameters and floating-point operations to reach satisfactory accuracy, and their inference time is long. In real application scenarios such as mobile or embedded devices, limited memory and computing resources make practical deployment on small devices difficult, and the low-latency requirements of practical applications are hard to meet. Although hardware will keep improving, compression theory for deep convolutional neural network models is crucial at present and remains a hot topic in deep convolutional neural network research.
Disclosure of Invention
The invention provides a flexible separable convolution framework, a feature extraction method, and their application in VGG and ResNet. The convolution framework comprises an ordinary convolution module, a grouped convolution module, a residual branch module, and an attention (SE) module; it reduces the computation and parameter count of the network while preserving accuracy, and can be used in ordinary convolution layers of a neural network, such as the convolution layers of a VGG convolutional neural network and the residual blocks of a ResNet network.
In order to achieve the purpose, the invention adopts the following technical scheme:
a flexible separable deep learning convolution framework comprises a feature map clustering division module, a first convolution operation module, a second convolution operation module, a feature map fusion module, an attention mechanism SE module, M input channels and N output channels;
the characteristic graph clustering and dividing module is used for dividing the M input characteristic graphs into a characteristic main information characteristic graph and a characteristic supplementary information characteristic graph;
the first convolution operation module is used for performing common convolution and BN operation on the characteristic main information characteristic diagram and outputting a main characteristic diagram;
the second convolution operation module is used for performing packet convolution and BN operation on the characteristic supplementary information feature graph and outputting a supplementary feature graph;
the feature map fusion module is used for splicing the output main feature map and the output supplementary feature map along the depth direction, adding the spliced output main feature map and the output supplementary feature map to the preprocessed input feature map, and then outputting the input feature map through the ReLU activation;
and the attention mechanism SE module is used for multiplying the extracted feature map channel weight output by the feature map fusion module and the feature map output by the feature map fusion module to output N output feature maps.
Further, the feature map cluster-partitioning module divides the input feature maps, according to the hyper-parameter supplementary-information ratio α ∈ (0, 1), into main-information feature maps Mrep and supplementary-information feature maps Mred.
Further, the main-information feature maps Mrep = (1-α)M, and the supplementary-information feature maps Mred = αM.
Further, the preprocessed input feature maps are the input feature maps after a 1×1 convolution and a BN operation.
The invention also provides a feature map processing method for the flexible separable deep-learning convolution framework, comprising the following steps (a minimal code sketch follows the steps):
(1) acquire M input feature maps and divide them, according to the hyper-parameter supplementary-information ratio α, into main-information feature maps and supplementary-information feature maps;
(2) apply ordinary convolution and a BN operation to the main-information feature maps to output the main feature maps, and apply grouped convolution and a BN operation to the supplementary-information feature maps to output the supplementary feature maps;
(3) concatenate the output main and supplementary feature maps along the depth direction, add the result to the input feature maps after a 1×1 convolution and BN operation, and output through a ReLU activation;
(4) extract channel weights from the activated output and multiply them with that output to generate the N output feature maps.
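The following PyTorch sketch puts steps (1)-(4) together. It is a minimal sketch under stated assumptions: the class name FSConv, the kernel size k, the group count, and the SE reduction ratio r are illustrative choices rather than values fixed by the patent, and the channel split here is a simple slice standing in for the clustering division described above.

    import math
    import torch
    import torch.nn as nn

    class FSConv(nn.Module):
        """Sketch of the flexible separable convolution framework (assumed names)."""
        def __init__(self, m_in, n_out, alpha=0.25, k=3, groups=4, r=16):
            super().__init__()
            m_red = int(alpha * m_in)   # supplementary-information maps (assumes alpha*m_in >= 1)
            m_rep = m_in - m_red        # main-information maps
            n_red = int(alpha * n_out)
            n_rep = n_out - n_red
            g = math.gcd(groups, math.gcd(m_red, n_red))  # group count must divide both widths
            # step (2a): ordinary k x k convolution + BN on the main-information maps
            self.rep = nn.Sequential(
                nn.Conv2d(m_rep, n_rep, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(n_rep))
            # step (2b): cheap grouped convolution + BN on the supplementary maps
            self.red = nn.Sequential(
                nn.Conv2d(m_red, n_red, k, padding=k // 2, groups=g, bias=False),
                nn.BatchNorm2d(n_red))
            # residual shortcut: 1 x 1 convolution + BN so the input matches the concatenation
            self.shortcut = nn.Sequential(
                nn.Conv2d(m_in, n_out, 1, bias=False),
                nn.BatchNorm2d(n_out))
            self.relu = nn.ReLU(inplace=True)
            # step (4): squeeze-and-excitation channel weights
            hidden = max(1, n_out // r)
            self.se = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(n_out, hidden, 1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, n_out, 1), nn.Sigmoid())
            self.m_rep = m_rep

        def forward(self, x):
            # step (1): split the input maps by the supplementary-information ratio alpha
            x_rep, x_red = x[:, :self.m_rep], x[:, self.m_rep:]
            # step (3): concatenate along depth, add the preprocessed input, activate
            y = torch.cat([self.rep(x_rep), self.red(x_red)], dim=1)
            y = self.relu(y + self.shortcut(x))
            # step (4): reweight the channels with the extracted SE weights
            return y * self.se(y)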
The invention applies the flexible separable deep-learning convolution framework to the VGG convolutional neural network, where the convolution layers adopt the framework.
The invention also applies the flexible separable deep-learning convolution framework to the residual block of a ResNet network: the block comprises two convolution frameworks, and a hyper-parameter channel scaling factor β is set on the output channels of the first framework and the input channels of the second.
The invention has the following beneficial effects:
(1) After the input feature maps are divided into maps characterizing the main information and maps characterizing supplementary information, the main-information maps undergo "ordinary convolution + BN" to preserve information integrity, while the supplementary-information maps undergo low-cost "grouped convolution + BN" to supplement the overall features, reducing information loss. Using different kinds of convolution in the same layer naturally makes the generated output feature maps more likely to differ, improving network performance. Because grouped convolution extracts information differently from ordinary convolution, communication among the input channels is reduced and the filter depth shrinks, so generating the output feature maps costs fewer parameters and less computation (see the rough count after this list). Grouped convolution loses some features, but a moderate loss in the supplementary information is quite acceptable.
(2) A residual shortcut branch is introduced: the input and output feature maps are added before activation and output. This eases backward propagation of distinguishable information from the front layers and passes their distinguishable main information directly into the rear layers, fusing input-layer and output-layer features and providing multi-level semantic information. The rear-layer information becomes more distinguishable, the rear layers no longer need to re-extract feature diversity from the front-layer maps by convolution, and learning the network parameters becomes easier.
(3) The convolution framework provided by the invention is plug-and-play; in particular, it can be applied to the VGG convolutional neural network.
(4) To apply the convolution framework to a ResNet network, the original structure of the basic residual module is preserved and the overall framework of the shallow network is unchanged, making the module plug-and-play inside shallow residual networks. Only the number of intermediate channels of the residual module changes: a channel scaling factor β is set on the output channels of the 1st convolution layer and the input channels of the 2nd, so the input and output channels of the improved residual block stay consistent with the original residual block.
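A rough, bias-free per-layer parameter count illustrates effect (1); the function name and the channel/group numbers below are illustrative only:

    def conv_params(c_in, c_out, k, g=1):
        # each filter of a grouped convolution only sees c_in / g input channels
        return (c_in // g) * k * k * c_out

    print(conv_params(256, 256, 3))       # ordinary 3x3 convolution: 589824
    print(conv_params(256, 256, 3, g=8))  # grouped 3x3 convolution:   73728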
Drawings
FIG. 1 is a schematic diagram of the feature map cluster-partitioning principle and the convolution operation principle of the present invention.
Fig. 2 is a schematic structural diagram of a convolution frame provided in the present invention.
Fig. 3 is a schematic structural diagram of the convolution framework applied to the ResNet network residual block according to the present invention.
FIG. 4 is a table of the classification performance of FSConv_VGG-16 with different hyper-parameters on CIFAR-10.
FIG. 5 is a comparison of FSConv_VGG-16 with state-of-the-art methods for compressing VGG-16.
FIG. 6 is a comparison of FSConv_VGG-16 on CIFAR-100 with state-of-the-art methods for compressing VGG-16.
FIG. 7 is a table of the classification performance of FSBneck_ResNet-20 with different hyper-parameters on CIFAR-10.
FIG. 8 is a table exploring the possibility of FSBneck_ResNet-20 replacing the baseline ResNet-56/110 on CIFAR-10.
FIG. 9 is a comparison of FSBneck_ResNet-20 on CIFAR-10 with state-of-the-art methods for compressing ResNet.
FIG. 10 is a comparison of FSBneck_ResNet-20 on CIFAR-100 with state-of-the-art methods for compressing ResNet.
Detailed Description
Example 1
This embodiment provides a flexible separable deep-learning convolution framework that improves network performance while reducing computation and network parameters, on the premise of preserving accuracy.
As shown in fig. 1, the flexible separable deep-learning convolution framework of this embodiment includes a feature map cluster-partitioning module, a first convolution module, a second convolution module, a feature map fusion module, an attention (SE) module, M input channels, and N output channels.
The feature map cluster-partitioning module divides the M input feature maps, according to the hyper-parameter supplementary-information ratio α ∈ (0, 1), into main-information feature maps and supplementary-information feature maps; in this embodiment the split is Mrep = (1-α)M main-information maps and Mred = αM supplementary-information maps (a worked example follows).
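A minimal numeric sketch of the split, assuming M = 64 and α = 0.25 (both values illustrative):

    M, alpha = 64, 0.25
    M_red = int(alpha * M)   # 16 supplementary-information maps -> grouped convolution
    M_rep = M - M_red        # 48 main-information maps -> ordinary convolution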
The first convolution module applies ordinary convolution and a BN operation to the main-information feature maps and outputs the main feature maps; this embodiment uses an ordinary k×k convolution on the main-information maps to preserve the integrity of their information.
The second convolution module applies grouped convolution and a BN operation to the supplementary-information feature maps and outputs the supplementary feature maps; this embodiment uses grouped convolution on the supplementary-information maps to supplement the overall features and reduce information loss.
The feature map fusion module concatenates the output main and supplementary feature maps along the depth direction, adds the result to the preprocessed input feature maps, and outputs through a ReLU activation. Preprocessing means adjusting the shape C×H×W of the input feature maps with "1×1 convolution + BN" so that it matches the shape of the depth-wise concatenation of the output main and supplementary feature maps.
The attention (SE) module extracts channel weights from the output of the feature map fusion module, multiplies the extracted weights with that output, and produces the N output feature maps.
The method for processing feature maps with the flexible separable deep-learning convolution framework comprises the following steps (a quick shape check follows the steps):
(1) acquire M input feature maps and divide them, according to the hyper-parameter supplementary-information ratio α, into main-information feature maps Mrep = (1-α)M and supplementary-information feature maps Mred = αM;
(2) apply ordinary convolution and a BN operation to the main-information feature maps to output the main feature maps, and apply grouped convolution and a BN operation to the supplementary-information feature maps to output the supplementary feature maps;
(3) concatenate the output main and supplementary feature maps along the depth direction, add the result to the input feature maps after a 1×1 convolution and BN operation, and output through a ReLU activation;
(4) extract channel weights from the activated output and multiply them with that output to generate the N output feature maps.
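A quick shape check of the FSConv sketch given earlier (names and sizes illustrative): 64 input maps in, 128 output maps out, with α = 0.25 routing 16 maps through grouped convolution.

    x = torch.randn(8, 64, 32, 32)       # batch of 8, M = 64 input maps
    block = FSConv(64, 128, alpha=0.25)
    print(block(x).shape)                # torch.Size([8, 128, 32, 32]), N = 128 output maps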
Example 2
As shown in fig. 2, this embodiment provides a VGG convolutional neural network comprising convolution layers, pooling layers, and fully-connected layers. Only the structure of the convolution layers is changed, replacing them with the convolution framework of Embodiment 1; the resulting network is called FSConv_VGG.
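A hedged sketch of the swap, using torchvision's vgg16_bn as a stand-in for the VGG variant and the FSConv class from the Embodiment 1 sketch; the α value and the decision to keep the 3-channel stem convolution ordinary are illustrative assumptions.

    import torch.nn as nn
    import torchvision.models as models

    vgg = models.vgg16_bn(num_classes=10)
    features = list(vgg.features)
    for i, layer in enumerate(features):
        # swap each 3x3 convolution, keeping the 3-channel stem ordinary
        if isinstance(layer, nn.Conv2d) and layer.in_channels > 3:
            features[i] = FSConv(layer.in_channels, layer.out_channels, alpha=0.25)
            # FSConv already ends in BN + ReLU + SE, so the following BatchNorm2d
            # and ReLU layers could be replaced by nn.Identity() to avoid doubling.
    vgg.features = nn.Sequential(*features)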
Example 3
As shown in fig. 3, this embodiment provides a ResNet-20 network whose residual block comprises a first convolution layer and a second convolution layer connected in sequence, both adopting the convolution framework of Embodiment 1 with the same structure. A hyper-parameter channel scaling factor β is introduced on the output channels of the first convolution layer and the input channels of the second; the value of β depends on the device memory and computing resources. This network is called FSBneck_ResNet-20.
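A sketch of such a residual block under stated assumptions: FSBneck is an illustrative name, FSConv is the Embodiment 1 sketch (each frame already carries its own shortcut branch), and whether the block's outer identity shortcut is retained is left as in fig. 3.

    import torch.nn as nn

    class FSBneck(nn.Module):
        def __init__(self, channels, alpha=0.25, beta=0.5):
            super().__init__()
            mid = max(4, int(channels * beta))       # beta scales the intermediate width
            self.conv1 = FSConv(channels, mid, alpha=alpha)
            self.conv2 = FSConv(mid, channels, alpha=alpha)

        def forward(self, x):
            # input and output widths match the original residual block
            return self.conv2(self.conv1(x))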
The original convolution layers of VGG-16 and the original residual modules of ResNet-20 were replaced as in Embodiment 2 and Embodiment 3, respectively, and the effectiveness of the convolution framework of Embodiment 1 was verified on the public datasets CIFAR-10 and CIFAR-100.
VGG-16 on CIFAR-10/100
The CIFAR-10/100 datasets consist of 50,000 training color images and 10,000 test color images of size 32 × 32 pixels, containing 10 and 100 classes respectively. VGG-16, with 13 convolution layers and 3 fully-connected layers, was originally designed for 1000-class ImageNet. For CIFAR-10/100, a variant widely used in the literature was selected: VGG-15 with 2 fully-connected layers, equipped with batch normalization after each layer. The proposed FSConv replaces all 3 × 3 convolution layers, with all other configurations unchanged. All models were trained on CIFAR-10/100 for 200 epochs, optimized with SGD (momentum 0.9, weight decay 5e-4, batch size 128, initial learning rate 0.1, divided by 10 every 50 epochs). Images were augmented by random horizontal flipping and random cropping after 4-pixel zero padding to prevent overfitting (a sketch of this recipe follows).
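A sketch of that training recipe in PyTorch; the dataset path, variable names, and the use of torchvision's vgg16_bn as a stand-in for the networks under test are illustrative assumptions.

    import torch
    import torchvision
    import torchvision.transforms as T

    transform = T.Compose([
        T.RandomHorizontalFlip(),
        T.RandomCrop(32, padding=4),   # zero-pad 4 px, then a random 32x32 crop
        T.ToTensor(),
    ])
    train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                             transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
    model = torchvision.models.vgg16_bn(num_classes=10)   # stand-in network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)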
First, the effect of FSConv's one hyper-parameter, the supplementary-information ratio α, on model performance is analyzed; the model is then compared with advanced methods.
Exploring the effect of hyper-parameters on model performance on CIFAR-10
FSConv contains only one hyper-parameter: the supplementary-information ratio α. Setting α divides the same layer's input feature maps into maps containing supplementary information and maps containing main information. The more feature maps in a layer go through ordinary convolution, the more easily the model extracts information from each type of map and the higher its performance, but also the higher the computation cost. FIG. 4 shows FSConv_VGG-16's exploration of the effect of α on model performance on CIFAR-10. The ratio (1-α) is the share of the input feature maps M used for ordinary convolution (Mrep); the experiments show that as (1-α) decreases, computation and parameters fall sharply while the model suffers no serious accuracy loss. (1-α) can therefore be tuned to the specific hardware so that computation and parameters meet the usage requirements.
Comparison with advanced models
The comparison covers different types of model-compression methods: GhostNet and SPConv.
As shown in FIG. 5, on the CIFAR-10 dataset the FSConv_VGG-16 model of Embodiment 2 matches the accuracy of the baseline VGG-16_Baseline (93.8%) while compressing computation to 32.58% and parameters to 37.97% of the original. This shows considerable redundancy in the VGG model: each layer's input feature maps contain a limited number of main-information classes for generating the output maps, so extracting the supplementary information does not require expensive ordinary convolution; cheap grouped convolution suffices. Compared with advanced models, FSConv outperforms all competitors, with better performance at fewer floating-point operations and similar parameter counts.
Having verified FSConv's advanced performance on small-image classification on CIFAR-10, a greater challenge was posed by verifying its fine-grained recognition ability on the CIFAR-100 dataset (20 superclasses of 5 classes each, 100 classes in total).
As shown in FIG. 6, on the CIFAR-100 dataset one FSConv_VGG-16 configuration exceeds the baseline by 1.79% on Top-5 and 0.97% on Top-1 while compressing computation to 31.27% and parameters to 37.09% of the original; another exceeds VGG-16_Baseline by 1.77% on Top-5 and 0.82% on Top-1 while compressing computation to 19.50% and parameters to 25.24%. Meanwhile, advanced models with larger computation and similar or larger parameter counts than FSConv_VGG-16 achieve lower accuracy than our model.
ResNet on CIFAR-10/100
ResNet consists of three stages with 16, 32, and 64 filters respectively, far fewer feature maps than VGG-16's 64-128-256-512. ResNet-20 contains only 0.27M parameters, about 1.8% of VGG-16's. The ResNet-20 structure is therefore more compact: each layer has fewer feature maps carrying supplementary information, and most maps carry main information, so lightening the network effectively is more challenging for the FSBneck model. The FSBneck model replaces all residual modules of the ResNet-20 baseline, with all other configurations unchanged. All models were trained on CIFAR-10/100 for 200 epochs, optimized with SGD (momentum 0.9, weight decay 5e-4, batch size 128, initial learning rate 0.1, divided by 10 every 75 epochs). Images were augmented by random horizontal flipping and random cropping after 4-pixel zero padding to prevent overfitting.
First, the effects of FSBneck's two hyper-parameters, the supplementary-information ratio α and the channel scaling factor β, on model performance are analyzed; the feasibility of the shallow ResNet-20 replacing the deep ResNet-56/110 is then explored; finally the model is compared with advanced methods.
Exploring the influence of the hyper-parameters on the performance of the FSBneck model on CIFAR-10
ResNet-20 is already much smaller than VGG-16, with far fewer feature maps carrying the same kind of main information; most main-information classes may have only one feature map. To avoid losing main information, the channels of the residual module's first convolution output / second convolution input are expanded, increasing the number of feature maps per convolution layer that contain the same kind of main information; the network can then be compressed effectively by adjusting the layers' supplementary-information ratio α.
As shown in FIG. 7, adjusting the two hyper-parameters α and β simultaneously yields ResNet-20 configurations with similar parameter counts or similar floating-point computation. The table shows that (1) at similar parameter counts, network performance rises with computation; (2) at similar computation, network performance still rises with parameter count. In practice, for a specific mobile device, a hyper-parameter combination suited to its hardware performance can therefore be chosen flexibly.
Possibility of replacing ResNet-56/110 with FSBneck_ResNet-20
As shown in FIG. 8, observing the classification performance of the baseline ResNet-20/56/110 on CIFAR-10 shows that performance rises as the network deepens. However, the figure also shows that the deep networks' superior performance is bought with a sharp increase in computation and parameters, and the accuracy gain is not proportional to that cost: the slope of the accuracy-versus-cost relationship keeps decreasing, i.e., an ever larger cost buys an ever smaller performance gain. Deep networks are also harder to train. To address this, computation and parameters are increased moderately by widening the channels of the shallow ResNet-20; the cost stays well below that of the deep networks, yet ResNet-20's performance reaches or even exceeds that of the deep ResNet-56/110.
FSBneck is likewise compared with several representative advanced model-compression methods on ResNet: GhostNet and SPConv.
The parameter count of each advanced model is taken as the memory budget of the mobile device to be deployed. In the experiments, keeping the parameter count similar to the advanced model's, the two hyper-parameters α and β of FSBneck_ResNet-20 are adjusted to obtain models with different computation. The table shows only part of the compressed networks; readers can tune the two hyper-parameters to their actual requirements so that floating-point computation and parameters fit the target mobile device, obtaining an excellent compressed network.
As shown in FIG. 9, against CIFAR-10_ResNet-20 the FSBneck_ResNet-20 model achieves the best network performance, with slightly more parameters than the advanced networks but less computation. Against CIFAR-10_ResNet-56, FSBneck_ResNet-20 can compress the original ResNet-56's computation to as little as 17.85% and its parameters to 25.00%, with accuracy dropping only 0.84%; with markedly less computation than the most advanced model, its performance is 1.10% better. Compressing ResNet-56's computation to 33.18% and parameters to 37.34% reaches accuracy comparable to the baseline. Against CIFAR-10_ResNet-110, FSBneck_ResNet-20 compresses computation to as little as 14.58% and parameters to 24.81% with a 1.61% accuracy drop; at half the computation of the most advanced model, its performance is 0.64% better. Compressing ResNet-110's computation to 33.48% and parameters to 37.42% reaches accuracy comparable to the baseline.
As shown in FIG. 10, against CIFAR-100_ResNet-20 the FSBneck_ResNet-20 model compresses the original ResNet-20's computation to 35.11% and its parameters to 47.05% of the original with accuracy comparable to the baseline. Against CIFAR-100_ResNet-56, it can compress computation to as little as 17.85% and parameters to 25.51% while reaching Top-5 accuracy comparable to the baseline; compressing computation to 33.18% and parameters to 37.77%, its Top-5 accuracy exceeds the baseline model by 0.29%. Against CIFAR-100_ResNet-110, it can compress computation to as little as 14.13% and parameters to 26.94% and still exceed the baseline model's Top-5 accuracy; compressing computation to 25.12% and parameters to 37.28%, its Top-5 accuracy exceeds the baseline model by 0.8%. Meanwhile, advanced models with larger computation and similar or larger parameter counts than FSBneck_ResNet-20 achieve lower accuracy.
The above describes only a preferred embodiment of the present invention; the scope of the invention is not limited thereto, and any modification or replacement based on the technical solution and inventive concept provided herein falls within the scope of the invention.

Claims (7)

1. A flexible separable deep-learning convolution framework, characterized in that it comprises a feature map cluster-partitioning module, a first convolution module, a second convolution module, a feature map fusion module, an attention (SE) module, M input channels, and N output channels;
the feature map cluster-partitioning module divides the M input feature maps into feature maps characterizing the main information and feature maps characterizing supplementary information;
the first convolution module applies ordinary convolution and a BN operation to the main-information feature maps and outputs the main feature maps;
the second convolution module applies grouped convolution and a BN operation to the supplementary-information feature maps and outputs the supplementary feature maps;
the feature map fusion module concatenates the output main and supplementary feature maps along the depth direction, adds the result to the preprocessed input feature maps, and outputs through a ReLU activation;
and the attention (SE) module multiplies the channel weights extracted from the fusion module's output with that output to produce the N output feature maps.
2. The flexible separable deep-learning convolution framework of claim 1, wherein the feature map cluster-partitioning module divides the input feature maps, according to the hyper-parameter supplementary-information ratio α ∈ (0, 1), into main-information feature maps Mrep and supplementary-information feature maps Mred.
3. The flexible separable deep-learning convolution framework of claim 2, wherein the main-information feature maps Mrep = (1-α)M and the supplementary-information feature maps Mred = αM.
4. The flexible separable deep-learning convolution framework of claim 1, wherein the preprocessed input feature maps are the input feature maps after a 1×1 convolution and a BN operation.
5. Use of a flexible separable deep-learning convolution framework according to any one of claims 1 to 4 in a VGG convolutional neural network.
6. Use of a flexible separable deep-learning convolution framework according to any one of claims 1 to 4 in a ResNet network residual block, characterized in that two of the flexible separable deep-learning convolution frameworks are included, and a hyper-parameter channel scaling factor β is set on the output channels of the first convolution framework and the input channels of the second.
7. A feature extraction method of a flexible separable deep-learning convolution framework, characterized by comprising the following steps:
(1) acquiring M input feature maps and dividing them, according to the hyper-parameter supplementary-information ratio α, into main-information feature maps Mrep = (1-α)M and supplementary-information feature maps Mred = αM;
(2) applying ordinary convolution and a BN operation to the main-information feature maps to output the main feature maps, and applying grouped convolution and a BN operation to the supplementary-information feature maps to output the supplementary feature maps;
(3) concatenating the output main and supplementary feature maps along the depth direction, adding the result to the input feature maps after a 1×1 convolution and BN operation, and outputting through a ReLU activation;
(4) extracting channel weights from the activated output and multiplying them with that output to generate the N output feature maps.
CN202011199528.8A 2020-11-02 2020-11-02 Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet Active CN112016639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011199528.8A CN112016639B (en) 2020-11-02 2020-11-02 Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011199528.8A CN112016639B (en) 2020-11-02 2020-11-02 Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet

Publications (2)

Publication Number Publication Date
CN112016639A (en) 2020-12-01
CN112016639B (en) 2021-01-26

Family

ID=73527739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011199528.8A Active CN112016639B (en) 2020-11-02 2020-11-02 Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet

Country Status (1)

Country Link
CN (1) CN112016639B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949460B (en) * 2021-02-26 2024-02-13 陕西理工大学 Human behavior network model based on video and identification method
CN113850368A (en) * 2021-09-08 2021-12-28 深圳供电局有限公司 Lightweight convolutional neural network model suitable for edge-end equipment
CN117524252B (en) * 2023-11-13 2024-04-05 北方工业大学 Light-weight acoustic scene perception method based on drunken model


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704866B (en) * 2017-06-15 2021-03-23 清华大学 Multitask scene semantic understanding model based on novel neural network and application thereof
CN110796027B (en) * 2019-10-10 2023-10-17 天津大学 Sound scene recognition method based on neural network model of tight convolution
CN111311538B (en) * 2019-12-28 2023-06-06 北京工业大学 Multi-scale lightweight road pavement detection method based on convolutional neural network
CN111523546B (en) * 2020-04-16 2023-06-16 湖南大学 Image semantic segmentation method, system and computer storage medium
CN111753736A (en) * 2020-06-24 2020-10-09 北京软通智慧城市科技有限公司 Human body posture recognition method, device, equipment and medium based on packet convolution

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470208A (en) * 2018-02-01 2018-08-31 华南理工大学 It is a kind of based on be originally generated confrontation network model grouping convolution method
CN108288075A (en) * 2018-02-02 2018-07-17 沈阳工业大学 A kind of lightweight small target detecting method improving SSD
CN110197217A (en) * 2019-05-24 2019-09-03 中国矿业大学 It is a kind of to be interlocked the image classification method of fused packet convolutional network based on depth
CN110490866A (en) * 2019-08-22 2019-11-22 四川大学 Metal based on depth characteristic fusion increases material forming dimension real-time predicting method
CN110680278A (en) * 2019-09-10 2020-01-14 广州视源电子科技股份有限公司 Electrocardiosignal recognition device based on convolutional neural network
CN110782001A (en) * 2019-09-11 2020-02-11 东南大学 Improved method for using shared convolution kernel based on group convolution neural network
CN110728200A (en) * 2019-09-23 2020-01-24 武汉大学 Real-time pedestrian detection method and system based on deep learning
CN110766721A (en) * 2019-09-30 2020-02-07 南京航空航天大学 Carrier landing cooperative target detection method based on airborne vision
CN111209921A (en) * 2020-01-07 2020-05-29 南京邮电大学 License plate detection model based on improved YOLOv3 network and construction method
CN111461144A (en) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 Method for accelerating convolutional neural network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Hang Zhang et al. ResNeSt: Split-Attention Networks. arXiv:2004.08955v1 [cs.CV], 2020, pp. 1-22. *
Jie Hu et al. Squeeze-and-Excitation Networks. arXiv:1709.01507v4 [cs.CV], 2019, pp. 1-13. *
Kai Han et al. GhostNet: More Features from Cheap Operations. arXiv:1911.11907v2 [cs.CV], 2020, pp. 1-10. *
Qiulin Zhang et al. Split to Be Slim: An Overlooked Redundancy in Vanilla Convolution. arXiv:2006.12085v1 [cs.CV], 2020, pp. 1-7. *
Ting Zhang et al. Interleaved Group Convolutions. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4373-4382. *
Yuan Runyi. Research on Parallel Algorithms for Inspection and Segmentation Based on Deep Learning. China Master's Theses Full-text Database, Information Science and Technology, 2019, No. 12, pp. I140-60. *
Zhou Yue et al. Design of Convolutional Neural Networks Based on Grouped Modules. Microelectronics & Computer, 2019, Vol. 36, No. 2, pp. 68-72. *
Wang Hanyang. Research and Application of Object Detection Algorithms Based on Fully Convolutional Networks. China Master's Theses Full-text Database, Information Science and Technology, 2019, No. 9, pp. I138-648. *

Also Published As

Publication number Publication date
CN112016639A (en) 2020-12-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant