CN112560732B - Feature extraction method of multi-scale feature extraction network - Google Patents
Feature extraction method of multi-scale feature extraction network
- Publication number
- CN112560732B (application CN202011530198.6A)
- Authority
- CN
- China
- Prior art keywords
- scale
- layer
- map
- feature
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a feature extraction method of a multi-scale feature extraction network. The network comprises a dimension-reduction convolution layer, a scale feature extraction layer, a merging layer and a feature fusion layer connected in sequence, and the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch. Features of different scales can be extracted and then fused by the feature fusion layer, so the network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever multi-scale feature extraction is needed, improving target detection accuracy. The method performs feature dimension reduction, target detection and core multi-scale feature extraction on an input feature map to be extracted and rapidly obtains the scale feature extraction map, with the advantages of high target detection accuracy and a small amount of computation.
Description
Technical Field
The present invention relates to a neural network for feature extraction, and more particularly, to a multi-scale feature extraction network and a feature extraction method thereof.
Background
Multi-scale target detection has long been a research hotspot and a difficulty in the field of computer vision. To improve its accuracy, network structures such as FPN, PA-Net, NAS-FPN and BiFPN have been proposed one after another. However, because these structures are relatively complex, the improvement in detection accuracy comes with too large a computational load, which delays inference and makes industrial application and popularization of multi-scale target detection difficult.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a feature extraction method of a multi-scale feature extraction network that improves target detection accuracy.
The technical solution adopted to solve the technical problem is as follows:
the multi-scale feature extraction network comprises a dimension-reduction convolution layer, a scale feature extraction layer, a merging layer and a feature fusion layer connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch.
The large-target detection branch comprises a downsampling feature layer, a first hole (dilated) convolution layer and an upsampling recovery layer connected in sequence; the original-feature detection branch comprises a second hole convolution layer; the small-target detection branch comprises an upsampling feature layer, a third hole convolution layer and a downsampling recovery layer connected in sequence.
The first hole convolution layer comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second and third hole convolution layers have the same structure as the first hole convolution layer.
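A 3×3 convolution with dilation rate d spans an effective window of 3 + 2(d−1) pixels, so rates 1, 2 and 3 give 3×3, 5×5 and 7×7 receptive fields from the same number of weights. A minimal sketch, assuming a PyTorch implementation and an illustrative channel count that is not specified in the patent:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)                     # toy feature map
for d in (1, 2, 3):                                # the three dilation ("hole") rates
    conv = nn.Conv2d(16, 16, kernel_size=3, dilation=d, padding=d)  # padding=d keeps H and W unchanged
    print(d, conv(x).shape)                        # every output stays 1 x 16 x 64 x 64
    # effective receptive field: 3 + 2*(d-1)  ->  3, 5, 7
```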
The convolution kernels of the dimension-reduction convolution layer and the feature fusion layer are 1×1.
The feature extraction method of the multi-scale feature extraction network comprises the following steps:
(1) The feature map to be extracted is input into the dimension-reduction convolution layer for 1×1 convolution, which performs feature fusion and dimension reduction on it to form an original feature map;
(2) The downsampling feature layer downsamples the original feature map to form a downsampled feature map; the upsampling feature layer upsamples the original feature map to form an upsampled feature map;
(3) The first, second and third hole convolution layers apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively, generating a downsampled first scale map, a downsampled second scale map, a downsampled third scale map, an original first scale map, an original second scale map, an original third scale map, an upsampled first scale map, an upsampled second scale map and an upsampled third scale map;
(4) The upsampling recovery layer upsamples the downsampled first, second and third scale maps respectively, and the downsampling recovery layer downsamples the upsampled first, second and third scale maps respectively;
(5) The merging layer merges the three scale maps obtained by upsampling the downsampled first, second and third scale maps, the original first, second and third scale maps, and the three scale maps obtained by downsampling the upsampled first, second and third scale maps; the feature fusion layer then performs 1×1 convolution fusion to form the scale feature extraction map. A worked shape trace of this flow is sketched after these steps.
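To make the data flow concrete, the following hypothetical shape trace is offered; the channel count of 144 and the 64×64 spatial size are assumptions chosen so that every 1/3 split is exact, and are not taken from the patent:

```python
# Hypothetical input: feature map to be extracted, 144 x 64 x 64
# (1) 1x1 dimension-reduction conv   -> original feature map      48 x 64 x 64   (depth cut to 1/3)
# (2) downsample (MAXPOOL, stride 2) -> downsampled feature map   48 x 32 x 32
#     upsample  (UP-CONV, stride 2)  -> upsampled feature map     48 x 128 x 128
# (3) three 3x3 hole convs (rates 1, 2, 3), each on 1/3 of its branch's channels
#     -> nine scale maps: three per branch, 16 channels each
# (4) upsample the three downsampled scale maps back to 64 x 64;
#     downsample the three upsampled scale maps back to 64 x 64
# (5) merge the three 48-channel branch outputs -> 144 x 64 x 64,
#     then 1x1 fusion conv -> scale feature extraction map 144 x 64 x 64 (same depth as the input)
```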
The beneficial effects of the invention are as follows: features of different scales can be extracted and then fused by the feature fusion layer, so that the multi-scale feature extraction network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever multi-scale feature extraction is needed, improving target detection accuracy. The feature extraction method performs feature dimension reduction, target detection and core multi-scale feature extraction on the input feature map to be extracted and rapidly obtains the scale feature extraction map, with the advantages of high target detection accuracy and a small amount of computation.
Drawings
The invention will be further described with reference to the drawings and examples.
FIG. 1 is a schematic diagram of a network architecture of the present invention;
FIG. 2 is a flow chart of the feature extraction method of the present invention.
Description of the embodiments
Referring to FIG. 1, the multi-scale feature extraction network comprises a dimension-reduction convolution layer 1, a scale feature extraction layer, a merging layer 2 and a feature fusion layer 3 connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch. Features of different scales can be extracted and then fused by the feature fusion layer 3, so the network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever a neural network needs multi-scale feature extraction, improving target detection accuracy.
The large-target detection branch comprises a downsampling feature layer 4, a first hole convolution layer 5 and an upsampling recovery layer 6 connected in sequence; the original-feature detection branch comprises a second hole convolution layer 7; the small-target detection branch comprises an upsampling feature layer 8, a third hole convolution layer 9 and a downsampling recovery layer 10 connected in sequence. In this embodiment, the upsampling mode is UP-CONV and the downsampling mode is MAXPOOL.
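A minimal sketch of the two sampling operations named in this embodiment, assuming PyTorch and reading UP-CONV as a stride-2 transposed convolution (one common interpretation; the patent does not fix the exact operator):

```python
import torch.nn as nn

channels = 48                                               # illustrative branch width
downsample = nn.MaxPool2d(kernel_size=2, stride=2)          # MAXPOOL: halves height and width
upsample = nn.ConvTranspose2d(channels, channels,           # UP-CONV: doubles height and width
                              kernel_size=2, stride=2)
```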
The first hole convolution layer 5 comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second hole convolution layer 7 and the third hole convolution layer 9 have the same structure as the first hole convolution layer 5, i.e. three 3×3 hole convolutions with dilation rates of 1, 2 and 3. Each convolution kernel acts on only 1/3 of the channels of its hole convolution layer, so the large-target detection branch, the original-feature detection branch and the small-target detection branch all have multi-scale feature extraction capability.
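One way to read "each convolution kernel acts on only 1/3 of the channels" is a split-and-convolve layer; the sketch below follows that assumption in PyTorch, and the class name HoleConvLayer is hypothetical rather than from the patent:

```python
import torch
import torch.nn as nn

class HoleConvLayer(nn.Module):
    """Three 3x3 hole (dilated) convs with rates 1, 2, 3, each applied to one third of the channels."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 3 == 0, "channel count is assumed divisible by 3"
        third = channels // 3
        self.branches = nn.ModuleList([
            nn.Conv2d(third, third, kernel_size=3, dilation=d, padding=d)
            for d in (1, 2, 3)
        ])

    def forward(self, x):
        # split the channels into thirds and convolve each third with a different dilation rate
        parts = torch.chunk(x, 3, dim=1)
        return [conv(p) for conv, p in zip(self.branches, parts)]   # the three scale maps
```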
The convolution kernels of the dimension-reduction convolution layer 1 and the feature fusion layer 3 are 1×1, giving them feature dimension-reduction and feature fusion capability while saving computation time.
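As a rough illustration of why 1×1 kernels are cheap (the channel counts 144 and 48 are assumptions, not values from the patent), a 1×1 reduction needs roughly one ninth of the weights of a 3×3 convolution of the same width:

```python
from torch import nn

reduce_1x1 = nn.Conv2d(144, 48, kernel_size=1)   # 144*48*1 + 48 bias = 6,960 parameters
same_3x3 = nn.Conv2d(144, 48, kernel_size=3)     # 144*48*9 + 48 bias = 62,256 parameters
print(sum(p.numel() for p in reduce_1x1.parameters()))   # 6960
print(sum(p.numel() for p in same_3x3.parameters()))     # 62256
```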
Referring to FIG. 1 and FIG. 2, the feature extraction method of the multi-scale feature extraction network comprises the following steps:
(1) The feature map to be extracted is input into the dimension-reduction convolution layer 1 for 1×1 convolution, which performs feature fusion and dimension reduction on the input feature map to form the original feature map. This saves computation time, and the feature depth of the original feature map is reduced to 1/3 of the depth of the feature map to be extracted.
(2) The downsampling feature layer 4 downsamples the original feature map to form the downsampled feature map; its depth is the same as that of the original feature map, while its width and height are halved. The purpose of downsampling is to detect large targets and to reduce the amount of computation.
The upsampling feature layer 8 upsamples the original feature map to form the upsampled feature map; its depth is the same as that of the original feature map, while its width and height are doubled. The purpose of upsampling is to detect small targets.
(3) The first hole convolution layer 5 to the third hole convolution layer 9 apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively. The dilation rates of the three 3×3 hole convolutions are 1 (i.e. a standard 3×3 convolution), 2 and 3, and each convolution kernel acts on only 1/3 of the channels of its layer. This generates a downsampled first scale map 11, a downsampled second scale map 12, a downsampled third scale map 13, an original first scale map 14, an original second scale map 15, an original third scale map 16, an upsampled first scale map 17, an upsampled second scale map 18 and an upsampled third scale map 19, so that the downsampled, original and upsampled feature maps receive convolutions with different receptive fields and multi-scale features are extracted.
(4) The upsampling recovery layer 6 upsamples the downsampled first scale map 11, the downsampled second scale map 12 and the downsampled third scale map 13 respectively, and the downsampling recovery layer 10 downsamples the upsampled first scale map 17, the upsampled second scale map 18 and the upsampled third scale map 19 respectively, so that the widths and heights of these six maps are kept consistent with those of the original first scale map 14, the original second scale map 15 and the original third scale map 16, which facilitates the merging and feature fusion of the fifth step.
(5) The merging layer 2 merges the three scale maps obtained by upsampling the downsampled first scale map 11, the downsampled second scale map 12 and the downsampled third scale map 13, the original first scale map 14, the original second scale map 15 and the original third scale map 16, and the three scale maps obtained by downsampling the upsampled first scale map 17, the upsampled second scale map 18 and the upsampled third scale map 19. The feature fusion layer 3 then performs 1×1 convolution fusion to form the scale feature extraction map, completing the multi-scale feature extraction while keeping the feature depth the same as that of the feature map to be extracted. A consolidated code sketch of this embodiment follows.
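Putting the pieces together, the following is a minimal end-to-end sketch of the embodiment described above, assuming PyTorch and reusing the hypothetical HoleConvLayer from the earlier sketch; all class and variable names, and the choice of ConvTranspose2d for UP-CONV, are illustrative assumptions rather than the patent's reference implementation:

```python
import torch
import torch.nn as nn

class MultiScaleFeatureExtraction(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        assert in_channels % 9 == 0, "assumed divisible by 9 so every 1/3 split is exact"
        mid = in_channels // 3                                        # depth after dimension reduction
        self.reduce = nn.Conv2d(in_channels, mid, kernel_size=1)      # dimension-reduction conv layer (1)
        self.down = nn.MaxPool2d(2, 2)                                # downsampling feature layer (4)
        self.up = nn.ConvTranspose2d(mid, mid, 2, stride=2)           # upsampling feature layer (8)
        self.hole_down = HoleConvLayer(mid)                           # first hole convolution layer (5)
        self.hole_orig = HoleConvLayer(mid)                           # second hole convolution layer (7)
        self.hole_up = HoleConvLayer(mid)                             # third hole convolution layer (9)
        self.restore_up = nn.ConvTranspose2d(mid // 3, mid // 3, 2, stride=2)  # upsampling recovery layer (6)
        self.restore_down = nn.MaxPool2d(2, 2)                        # downsampling recovery layer (10)
        self.fuse = nn.Conv2d(3 * mid, in_channels, kernel_size=1)    # feature fusion layer (3)

    def forward(self, x):
        orig = self.reduce(x)                                         # step 1: original feature map
        large = self.hole_down(self.down(orig))                       # steps 2-3: large-target branch scale maps
        same = self.hole_orig(orig)                                   # original-branch scale maps
        small = self.hole_up(self.up(orig))                           # small-target branch scale maps
        large = [self.restore_up(m) for m in large]                   # step 4: restore width and height
        small = [self.restore_down(m) for m in small]
        merged = torch.cat(large + same + small, dim=1)               # step 5: merging layer (2)
        return self.fuse(merged)                                      # scale feature extraction map

# hypothetical usage: depth and spatial size are preserved end to end
net = MultiScaleFeatureExtraction(144)
out = net(torch.randn(1, 144, 64, 64))                               # -> torch.Size([1, 144, 64, 64])
```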
The above embodiments do not limit the protection scope of the invention; equivalent modifications and variations made by those skilled in the art without departing from the overall inventive concept still fall within the protection scope of the invention.
Claims (1)
1. A feature extraction method of a multi-scale feature extraction network, characterized in that the multi-scale feature extraction network comprises a dimension-reduction convolution layer (1), a scale feature extraction layer, a merging layer (2) and a feature fusion layer (3) connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch;
the large-target detection branch comprises a downsampling feature layer (4), a first hole convolution layer (5) and an upsampling recovery layer (6) connected in sequence; the original-feature detection branch comprises a second hole convolution layer (7); the small-target detection branch comprises an upsampling feature layer (8), a third hole convolution layer (9) and a downsampling recovery layer (10) connected in sequence; the first hole convolution layer (5) comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second hole convolution layer (7) and the third hole convolution layer (9) have the same structure as the first hole convolution layer (5);
the convolution kernels of the dimension-reduction convolution layer (1) and the feature fusion layer (3) are 1×1;
the feature extraction method of the multi-scale feature extraction network comprises the following steps:
in the first step, the feature map to be extracted is input into the dimension-reduction convolution layer for 1×1 convolution, which performs feature fusion and dimension reduction on it to form an original feature map;
in the second step, the downsampling feature layer downsamples the original feature map to form a downsampled feature map, and the upsampling feature layer upsamples the original feature map to form an upsampled feature map;
in the third step, the first, second and third hole convolution layers apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively, generating a downsampled first scale map, a downsampled second scale map, a downsampled third scale map, an original first scale map, an original second scale map, an original third scale map, an upsampled first scale map, an upsampled second scale map and an upsampled third scale map;
in the fourth step, the upsampling recovery layer upsamples the downsampled first, second and third scale maps respectively, and the downsampling recovery layer downsamples the upsampled first, second and third scale maps respectively;
and in the fifth step, the merging layer merges the three scale maps obtained by upsampling the downsampled first, second and third scale maps, the original first, second and third scale maps, and the three scale maps obtained by downsampling the upsampled first, second and third scale maps, after which the feature fusion layer performs 1×1 convolution fusion to form a scale feature extraction map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530198.6A CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530198.6A CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112560732A CN112560732A (en) | 2021-03-26 |
CN112560732B true CN112560732B (en) | 2023-07-04 |
Family
ID=75031388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011530198.6A Active CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560732B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115187820A (en) * | 2021-04-06 | 2022-10-14 | 中国科学院深圳先进技术研究院 | Light-weight target detection method, device, equipment and storage medium |
CN113313668B (en) * | 2021-04-19 | 2022-09-27 | 石家庄铁道大学 | Subway tunnel surface disease feature extraction method |
CN113378786B (en) * | 2021-07-05 | 2023-09-19 | 广东省机场集团物流有限公司 | Ultra-light target detection network and method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
CN111462127A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Real-time semantic segmentation method and system for automatic driving |
CN111860693A (en) * | 2020-07-31 | 2020-10-30 | 元神科技(杭州)有限公司 | Lightweight visual target detection method and system |
CN111898539A (en) * | 2020-07-30 | 2020-11-06 | 国汽(北京)智能网联汽车研究院有限公司 | Multi-target detection method, device, system, equipment and readable storage medium |
CN111899191A (en) * | 2020-07-21 | 2020-11-06 | 武汉工程大学 | Text image restoration method and device and storage medium |
CN111967524A (en) * | 2020-08-20 | 2020-11-20 | 中国石油大学(华东) | Multi-scale fusion feature enhancement algorithm based on Gaussian filter feedback and cavity convolution |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11651206B2 (en) * | 2018-06-27 | 2023-05-16 | International Business Machines Corporation | Multiscale feature representations for object recognition and detection |
-
2020
- 2020-12-22 CN CN202011530198.6A patent/CN112560732B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
CN111462127A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Real-time semantic segmentation method and system for automatic driving |
CN111899191A (en) * | 2020-07-21 | 2020-11-06 | 武汉工程大学 | Text image restoration method and device and storage medium |
CN111898539A (en) * | 2020-07-30 | 2020-11-06 | 国汽(北京)智能网联汽车研究院有限公司 | Multi-target detection method, device, system, equipment and readable storage medium |
CN111860693A (en) * | 2020-07-31 | 2020-10-30 | 元神科技(杭州)有限公司 | Lightweight visual target detection method and system |
CN111967524A (en) * | 2020-08-20 | 2020-11-20 | 中国石油大学(华东) | Multi-scale fusion feature enhancement algorithm based on Gaussian filter feedback and cavity convolution |
Non-Patent Citations (4)
Title |
---|
AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network; Jin Chen et al.; EURASIP Journal on Wireless Communications and Networking; 1-7 *
Multi-target detection method for remote sensing images based on FD-SSD; Zhu Minchao et al.; Computer Applications and Software; Vol. 36, No. 1; 232-238 *
Indoor crowd detection network based on multi-level features and hybrid attention mechanism; Shen Wenxiang et al.; Journal of Computer Applications; Vol. 39, No. 12; 3496-3502 *
Research on environment perception and control methods for driverless vehicles based on deep learning; Li Jianming; China Master's Theses Full-text Database, Engineering Science and Technology II; No. 01; C035-469 *
Also Published As
Publication number | Publication date |
---|---|
CN112560732A (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112560732B (en) | Feature extraction method of multi-scale feature extraction network | |
CN108475415B (en) | Method and system for image processing | |
CN110909642A (en) | Remote sensing image target detection method based on multi-scale semantic feature fusion | |
CN110047058B (en) | Image fusion method based on residual pyramid | |
CN108717569A (en) | Expansion full convolution neural network and construction method thereof | |
CN112990219B (en) | Method and device for image semantic segmentation | |
CN110674704A (en) | Crowd density estimation method and device based on multi-scale expansion convolutional network | |
CN111932480A (en) | Deblurred video recovery method and device, terminal equipment and storage medium | |
CN115187820A (en) | Light-weight target detection method, device, equipment and storage medium | |
CN115953303A (en) | Multi-scale image compressed sensing reconstruction method and system combining channel attention | |
CN111612825A (en) | Image sequence motion occlusion detection method based on optical flow and multi-scale context | |
Yang et al. | Image super-resolution reconstruction based on improved Dirac residual network | |
Deng et al. | Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution | |
CN110599495B (en) | Image segmentation method based on semantic information mining | |
CN115565034A (en) | Infrared small target detection method based on double-current enhanced network | |
CN106033594B (en) | Spatial information restoration methods based on the obtained feature of convolutional neural networks and device | |
CN113313162A (en) | Method and system for detecting multi-scale feature fusion target | |
CN114820423A (en) | Automatic cutout method based on saliency target detection and matching system thereof | |
CN111582353B (en) | Image feature detection method, system, device and medium | |
CN111428809B (en) | Crowd counting method based on spatial information fusion and convolutional neural network | |
CN115775214B (en) | Point cloud completion method and system based on multi-stage fractal combination | |
CN111402140A (en) | Single image super-resolution reconstruction system and method | |
CN117952883A (en) | Backlight image enhancement method based on bilateral grid and significance guidance | |
CN116681978A (en) | Attention mechanism and multi-scale feature fusion-based saliency target detection method | |
CN116740376A (en) | Pyramid integration and attention enhancement-based target detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||