CN113392960A - Target detection network and method based on mixed hole convolution pyramid - Google Patents

Target detection network and method based on mixed hole convolution pyramid

Info

Publication number
CN113392960A
Authority
CN
China
Prior art keywords
feature
module
network
pyramid
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110646653.7A
Other languages
Chinese (zh)
Other versions
CN113392960B (en)
Inventor
殷光强
殷康宁
候少麒
梁杰
丁一寅
曾宇昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110646653.7A priority Critical patent/CN113392960B/en
Publication of CN113392960A publication Critical patent/CN113392960A/en
Application granted granted Critical
Publication of CN113392960B publication Critical patent/CN113392960B/en
Legal status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the technical field of digital image processing, and in particular to a target detection network and method based on a mixed hole (dilated) convolution pyramid. The target detection network comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network uses a hierarchically cascaded network structure to extract target image features; the hybrid receptive field module performs feature enhancement on the highest-level feature map output at the top of the backbone network; the low-level embedded feature pyramid module, building on the feature pyramid, fuses high-level features downward and generates the final feature maps to be detected by means of low-level embedding; the detection module locates and classifies the feature maps to be detected and outputs the results. The target detection network and method effectively alleviate the missed and false detections caused by scale variation and occlusion.

Description

Target detection network and method based on mixed hole convolution pyramid
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a target detection network and a target detection method based on a mixed hole (dilated) convolution pyramid.
Background
Object detection is one of the most widely used computer-vision tasks in real life; its goal is to locate and identify specific objects in an image. Traditional target detection methods can be divided into single-stage and two-stage methods. The core of the two-stage method is region proposal: the input image is selectively searched to generate region proposal boxes, a convolutional neural network then extracts features from each proposal box, and a classifier performs classification. The single-stage method directly outputs the detection result through a convolutional neural network.
Through a series of variants, the two families have converged on a common point: a large number of anchor boxes must be generated in advance during detection, and such algorithms are collectively called Anchor-based target detection algorithms. Anchor boxes are a set of rectangular boxes obtained by running a clustering algorithm on the training set before training; they represent the dominant width and height distribution of targets in the dataset. During inference, n candidate rectangular boxes are derived from the anchors on the feature map and are then further classified and regressed. As with the two-stage algorithm, processing of the candidate boxes still goes through two steps: coarse foreground classification and fine multi-class classification.
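For illustration only (not part of the patent), the sketch below shows how Anchor-based detectors commonly derive anchor width/height priors by clustering ground-truth box sizes before training, here with a YOLO-style k-means that uses 1 - IoU as the distance; the function names and the choice of k are hypothetical.

```python
# Illustrative sketch: clustering ground-truth (width, height) pairs into k anchor priors.
import numpy as np

def iou_wh(wh, centers):
    """IoU between boxes of size wh (N, 2) and cluster centers (K, 2),
    assuming all boxes share the same top-left corner."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh[:, None, 0] * wh[:, None, 1] + \
            centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centers), axis=1)  # nearest = highest IoU
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers  # k anchor (width, height) priors
```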
The single-stage target detection algorithm lacks the fine processing of the two-stage algorithm and performs poorly when faced with problems such as multi-scale targets and occlusion. In addition, although Anchor-based algorithms alleviate to some extent the explosion in candidate-box computation caused by selective search, generating a large number of anchor boxes of different sizes in every grid cell still causes computational redundancy. Most importantly, anchor generation depends on many hyperparameter settings, and manual parameter tuning seriously affects the localization accuracy and classification performance.
In the prior art, the patent with publication number CN110222712A discloses "a multi-item target detection algorithm based on deep learning". The proposed algorithm obtains an augmented RoI set through a multi-scale sliding window and selective search; the dense RoI set is generated exhaustively with the multi-scale sliding window, which is computationally heavy and inefficient.
The patent with publication number CN112115883A discloses a "non-maximum suppression method and apparatus based on an Anchor-free target detection algorithm", which uses a CenterNet-style network model to perform detection by predicting the upper-left corner, lower-right corner and center point of an object, and applies non-maximum suppression to avoid multiple detection boxes on the same target. However, relatively complicated post-processing is required to group the corner-point pairs belonging to the same target, which is inefficient.
The patent with publication number CN112101153A discloses a "remote sensing target detection method based on a receptive field module and a multi-feature pyramid", which extracts features from visible-light remote sensing images with a VGG network to obtain feature maps of different sizes, cascades and fuses these feature maps, obtains optimized feature maps through a strided-convolution feature pyramid, and then performs multi-scale detection through receptive-field information mining. The method exploits feature maps of different sizes, but its fusion scheme is redundant and the backbone network is weak, which affects the final detection result.
Disclosure of Invention
To solve the above technical problems, the invention provides a target detection network and a target detection method based on a mixed hole (dilated) convolution pyramid, which can effectively alleviate the missed and false detections caused by scale variation and occlusion.
The invention is realized by adopting the following technical scheme:
a target detection network based on a hybrid void convolution pyramid is characterized in that: the system comprises a backbone network, a mixed reception field module, a low-level embedded characteristic pyramid module and a detection module; the backbone network extracts target picture features by using a layered cascade network structure; the mixed receptive field module is used for carrying out feature enhancement on the highest layer feature map output from the topmost end of the backbone network; the low-layer embedded feature pyramid module is used for fusing high-layer features downwards on the basis of a feature pyramid and generating a final feature graph to be detected in a low-layer embedding mode; the detection module is used for positioning and classifying the characteristic diagram to be detected and outputting a result.
The low-level embedded feature pyramid module generates the final feature maps to be detected through the following steps:
a. the low-level embedded feature pyramid module fuses the current-level feature map with the higher-level feature map after channel compression and upsampling, forming a composite feature map and completing the embedding of high-level semantic information;
b. the composite feature map is fused with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information;
c. each mixed feature map passes through a composite convolution layer to generate the final feature map to be detected.
The fusion in steps a and b is element-wise, channel-wise addition.
The composite convolution layer in step c is formed by cascading a 3 × 3 convolution layer, a BN layer and a LeakyReLU activation layer.
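A minimal PyTorch sketch of such a composite convolution layer (3 × 3 convolution, BN, LeakyReLU) is shown below; the channel arguments and the choice of PyTorch are assumptions for illustration, not taken from the patent.

```python
import torch.nn as nn

class CompositeConv(nn.Module):
    """Composite convolution layer: 3x3 conv -> BatchNorm -> LeakyReLU."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)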
The hybrid receptive field module comprises four parallel branches: one 1 × 1 convolution branch and three 3 × 3 convolution branches with dilation rates of 1, 2 and 4 respectively. The module concatenates the feature maps obtained in parallel by the dilated convolution layers with different dilation rates, then uses a 1 × 1 convolution layer to fuse the feature information and reduce the channel dimension to a specified number.
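The following is a sketch, under stated assumptions, of how the hybrid receptive field module could be wired in PyTorch: a 1 × 1 branch plus three 3 × 3 branches with dilation rates 1, 2 and 4, concatenation, and a 1 × 1 fusion convolution that reduces the channels to a specified number. The branch width `branch_channels` and the framework are illustrative choices.

```python
import torch
import torch.nn as nn

class HybridReceptiveField(nn.Module):
    def __init__(self, in_channels, out_channels, branch_channels=256):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, branch_channels, kernel_size=1)
        self.branch2 = nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                                 padding=1, dilation=1)
        self.branch3 = nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                                 padding=2, dilation=2)
        self.branch4 = nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                                 padding=4, dilation=4)
        # 1x1 fusion reduces the concatenated channels to the specified number
        self.fuse = nn.Conv2d(4 * branch_channels, out_channels, kernel_size=1)

    def forward(self, c5):
        feats = [self.branch1(c5), self.branch2(c5),
                 self.branch3(c5), self.branch4(c5)]
        return self.fuse(torch.cat(feats, dim=1))
```

With padding equal to each dilation rate, all four branches keep the spatial size of C5, so their outputs can be concatenated directly.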
The backbone network is a single-stage detection network based on Res2Net50; the Anchor-free mechanism of FCOS is introduced for target prediction, performing pixel-by-pixel prediction, and a Centerness branch network is added to the loss function part.
The feature maps output by the backbone network comprise C3, C4 and C5, with sizes of 100 × 100, 50 × 50 and 25 × 25 respectively.
A target detection method based on a mixed hole convolution pyramid comprises the following steps:
i. build a backbone network based on the Anchor-free mechanism and obtain feature maps C3, C4 and C5 through it; the highest-level feature map C5 output by the backbone network is enhanced by the hybrid receptive field module and then output to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module, together with the feature maps C4 and C3 output by the backbone network, forms composite features through upsampling and downsampling operations; the composite features pass through a composite convolution layer to generate the feature maps to be detected, which are delivered to the detection module for target localization and classification;
iii. train the above network, test the model after each round, save the best model weights, test the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtain the trained network model;
iv. detect targets using the trained network model and output the detection results.
In the process of training the network in step iii, the loss function is as follows:
L({p_{x,y}}, {t_{x,y}}) = (1/N) Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N) Σ_{x,y} k · L_reg(t_{x,y}, t*_{x,y})
where p_{x,y} denotes the classification prediction probability, t_{x,y} denotes the regression prediction coordinates, c*_{x,y} and t*_{x,y} are the corresponding ground-truth class and box targets, and N denotes the number of positive samples; k is an indicator function that equals 1 if the current prediction is a positive sample and 0 otherwise;
L_cls is the Focal Loss function, whose specific expression is:
L_cls = -y (1 - y')^γ · log(y') - (1 - y) · y'^γ · log(1 - y')
where y is the sample label, y' is the predicted probability that the sample is positive, and γ is the focusing parameter;
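A minimal sketch of this Focal Loss (without the optional α balancing term), assuming binary labels y ∈ {0, 1}, predicted positive probabilities y', and a default γ of 2:

```python
import torch

def focal_loss(y_pred, y_true, gamma=2.0, eps=1e-7):
    # clamp probabilities away from 0/1 for numerical stability
    y_pred = y_pred.clamp(eps, 1.0 - eps)
    pos = -y_true * (1.0 - y_pred) ** gamma * torch.log(y_pred)
    neg = -(1.0 - y_true) * y_pred ** gamma * torch.log(1.0 - y_pred)
    return (pos + neg).mean()
```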
L_reg is the GIoU Loss function, calculated as follows:
IoU = |A ∩ B| / |A ∪ B|
GIoU = IoU - |C \ (A ∪ B)| / |C|
L_reg = 1 - GIoU
where A and B represent the predicted box and the ground-truth box, and IoU is their intersection over union. Their minimum convex set C, i.e. the smallest box enclosing both A and B, is computed first; GIoU is then obtained using C, and L_reg follows from it.
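A sketch of the GIoU loss computation described above, assuming axis-aligned boxes given as (x1, y1, x2, y2) tensors of shape (N, 4); the small eps term is added only for numerical safety:

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    # intersection of A and B
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # smallest enclosing (convex) box C
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (area_c - union) / (area_c + eps)
    return (1.0 - giou).mean()  # L_reg = 1 - GIoU
```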
Compared with the prior art, the invention has the beneficial effects that:
1. The invention improves the structure of the feature pyramid by providing a low-level embedded feature pyramid module, which effectively addresses the insufficient handling of multi-scale variation in target detection; it fuses shallow and high-level feature information, applies normalization and activation functions to the fused output, and optimizes model training.
The invention designs a hybrid receptive field module that, while keeping the model parameter count under control, enlarges the receptive field to capture more global feature detail by using multi-scale dilated convolutions in combination with the multi-scale outputs of the feature pyramid, thereby addressing target occlusion.
The method introduces an Anchor-free mechanism and combines the low-level embedded feature pyramid module with the hybrid receptive field module, reducing invalid computation caused by redundant candidate boxes, improving localization accuracy and effectively alleviating missed detections.
2. The target detection network of the invention can handle the multi-scale and occlusion problems of target detection scenes and can be used in a plug-and-play manner. By introducing an Anchor-free algorithm and combining the low-level embedded feature pyramid module with the hybrid receptive field module, it reduces invalid computation caused by redundant candidate boxes, improves localization accuracy, and overcomes the large parameter counts, heavy redundant computation, low applicability, low efficiency and frequent missed detections of existing target detection approaches under practical conditions.
3. The backbone network adopts the Anchor-free mechanism of FCOS for pixel-by-pixel prediction, performing target detection without relying on predefined anchor boxes or proposal regions; this reduces invalid computation caused by redundant candidate boxes, improves localization accuracy and effectively alleviates missed detections. The Centerness mechanism quickly filters negative samples, suppresses low-quality prediction boxes far from the target center, increases the weight of prediction boxes close to the target center, and improves detection performance. The introduced Res2Net50 network replaces the single 3 × 3 convolution layer used in ResNet50 with a hierarchically cascaded feature set inside each residual block, which is better balanced in terms of network width, depth and resolution.
4. Unlike other networks, which process features after fusing multiple levels (C3, C4 and C5), the hybrid receptive field module of the invention is applied before feature fusion: it is embedded between C5 of the backbone network and the feature pyramid level P5, improving the representational capability of the C5 features, and the final detection prediction is made only after the hybrid receptive field module and the low-level embedded feature pyramid module. Using convolution layers with different dilation rates improves the adaptability of the model to targets of different scales; after the feature maps are concatenated, a 1 × 1 convolution layer fuses the feature information and reduces the channel dimension to a specified number, improving the flexibility of the hybrid receptive field module.
5. Compared with the standard feature pyramid, the features output by the low-level embedded feature pyramid module of the invention contain not only rich semantic information but also concrete detail information, jointly improving both multi-scale detection performance and localization precision.
Drawings
The invention will be described in further detail below with reference to the accompanying drawings and the detailed description, in which:
FIG. 1 is a schematic diagram of the overall structure of a target detection network according to the present invention;
FIG. 2 is a schematic flow chart of a target detection method according to the present invention;
FIG. 3 is a schematic diagram of the hybrid receptive field module according to the present invention;
FIG. 4 is a schematic diagram of a low-level embedded feature pyramid module according to the present invention;
FIG. 5 is a schematic view of the composite convolution layer of the present invention.
Detailed Description
Example 1
As a basic embodiment, the invention comprises a target detection network based on a mixed hole convolution pyramid, which includes a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network extracts target image features using a hierarchically cascaded network structure; the hybrid receptive field module performs feature enhancement on the highest-level feature map output at the top of the backbone network; the low-level embedded feature pyramid module, building on the feature pyramid, fuses high-level features downward and generates the final feature maps to be detected by means of low-level embedding; the detection module locates and classifies the feature maps to be detected and outputs the results.
The backbone network can be a single-stage detection network based on Res2Net50, which has stronger feature extraction capability without increasing the computational load; the Anchor-free mechanism of FCOS is introduced for target prediction to predict pixel by pixel, and a Centerness branch network is added to the loss function part to suppress low-quality detection boxes and improve detection performance.
A target detection method based on a mixed hole convolution pyramid comprises the following steps:
i. build a backbone network based on the Anchor-free mechanism and obtain feature maps C3, C4 and C5 through it; the highest-level feature map C5 output by the backbone network is enhanced by the hybrid receptive field module and then output to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module, together with the feature maps C4 and C3 output by the backbone network, forms composite features through upsampling and downsampling operations; the composite features pass through a composite convolution layer to generate the feature maps to be detected, which are delivered to the detection module for target localization and classification;
iii. train the above network, test the model after each round, save the best model weights, test the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtain the trained network model;
iv. detect targets using the trained network model and output the detection results.
Example 2
As a preferred embodiment, and with reference to FIG. 1, the invention comprises a target detection network based on a mixed hole convolution pyramid, which includes a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module.
The backbone network adopts a single-stage detection structure and introduces the Anchor-free mechanism of FCOS (Fully Convolutional One-Stage object detection), performing pixel-by-pixel prediction without relying on predefined anchor boxes or proposal regions; this reduces invalid computation caused by redundant candidate boxes, improves localization accuracy and effectively alleviates missed detections. The Centerness mechanism quickly filters negative samples, suppresses low-quality prediction boxes far from the target center, increases the weight of prediction boxes close to the target center, and improves detection performance. The expression of Centerness is shown in formula (1), where l*, r*, t*, b* denote the distances from a pixel to the left, right, top and bottom sides of the prediction box; the Centerness value lies between 0 and 1, being larger the closer the pixel is to the true target center and smaller the farther it is from it.
centerness* = √( (min(l*, r*) / max(l*, r*)) · (min(t*, b*) / max(t*, b*)) )    (1)
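A one-line sketch of this Centerness target, assuming per-pixel distance tensors l*, r*, t*, b* of equal shape:

```python
import torch

def centerness_target(l, r, t, b):
    """Centerness as in FCOS, from distances to the four sides of the box."""
    lr = torch.min(l, r) / torch.max(l, r)
    tb = torch.min(t, b) / torch.max(t, b)
    return torch.sqrt(lr * tb)  # in (0, 1], largest at the box centre
```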
The backbone network introduces Res2Net50, which replaces the single 3 × 3 convolution layer used in ResNet50 with a hierarchically cascaded feature set inside each residual block and is better balanced in terms of network width, depth and resolution. The feature maps at C3, C4 and C5 have sizes of 100 × 100, 50 × 50 and 25 × 25 respectively.
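The following is an illustrative Res2Net-style bottleneck, a simplified sketch rather than the exact Res2Net50 implementation (batch normalization and downsampling are omitted): the single 3 × 3 convolution of a ResNet bottleneck is replaced by a hierarchical cascade over `scales` channel groups.

```python
import torch
import torch.nn as nn

class Res2NetBottleneck(nn.Module):
    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        self.conv_in = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        # one 3x3 conv per group except the first, which is passed through
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False)
            for _ in range(scales - 1)
        )
        self.conv_out = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv_in(x)
        xs = torch.chunk(out, self.scales, dim=1)
        ys = [xs[0]]          # first group: identity
        prev = None
        for i, conv in enumerate(self.convs):
            # each later group also receives the previous group's output
            inp = xs[i + 1] if prev is None else xs[i + 1] + prev
            prev = self.relu(conv(inp))
            ys.append(prev)
        out = self.conv_out(torch.cat(ys, dim=1))
        return self.relu(out + x)  # residual connection
```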
The hybrid receptive field module concatenates the feature maps obtained in parallel by dilated convolution layers with different dilation rates, improving the network's ability to capture global features and compensating for the gridding effect caused by a single dilated convolution. The hybrid receptive field module of the present application uses dilated convolution layers throughout to effectively address the target occlusion problem.
Referring to FIG. 3, to fully exploit the hybrid receptive field module, the module of the present invention differs from other networks, which process features after fusing multiple levels (C3, C4, C5): it is applied before feature fusion, embedded between C5 of the backbone network and the feature pyramid level P5 to improve the representational capability of the C5 features, and the final detection prediction is made after the hybrid receptive field module and the low-level embedded feature pyramid module. The hybrid receptive field module consists of four parallel branches: one 1 × 1 convolution branch and three 3 × 3 convolution branches with dilation rates of 1, 2 and 4 respectively. The 3 × 3 dilated convolution with dilation rate 4 captures more global contextual detail, enhances reasoning capability and addresses target occlusion; using convolution layers with different dilation rates improves the adaptability of the model to targets of different scales.
The high-level features output by C5 carry rich semantic information. Unlike the conventionally adopted cascaded combination of features, the parallel feature combination adopted by the invention allows the network to learn parameters better suited to the current dataset. Parallel branch 1, a 1 × 1 convolution layer, preserves as much image detail as possible without changing the feature map size while controlling the number of feature channels, reducing subsequent computation. The 3 × 3 convolution kernels have few parameters, so feature information can be processed while further limiting network computation. Dilated convolution captures more global feature detail, enhances reasoning capability and helps recognize occluded targets; the arrangement of different dilation rates eliminates the gridding effect while improving the adaptability of the model to multi-scale targets. Parallel branch 2 is a 3 × 3 convolution with dilation rate 1, suited to small and medium targets; parallel branch 3 is a 3 × 3 convolution with dilation rate 2, suited to medium targets; and parallel branch 4 is a 3 × 3 convolution with dilation rate 4, suited to medium and large targets.
After the feature maps are concatenated, a 1 × 1 convolution layer fuses the feature information and reduces the channel dimension to a specified number, improving the flexibility of the hybrid receptive field module.
The feature pyramid gives the feature map at each level strong semantic information by fusing higher-level features downward, and prediction can be performed at each level. Compared with the standard feature pyramid, the features output by the low-level embedded feature pyramid module of the present application contain not only rich semantic information but also concrete detail information, jointly improving both multi-scale detection performance and localization accuracy.
Referring to FIG. 4, C5' is the feature map obtained after the low-level embedded feature pyramid module; referring to FIG. 5, the composite convolution layer (formed by cascading a 3 × 3 convolution layer, a BN layer and a LeakyReLU activation layer) processes the fused features, optimizes model training and improves the nonlinear expressiveness of the features.
The low-level embedded feature pyramid module first fuses the current-level feature map with the higher-level feature map, which has undergone channel compression and upsampling, by element-wise, channel-wise addition, forming a composite feature map and completing the embedding of high-level semantic information; it then fuses the composite feature map with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information; finally, each mixed feature map passes through the designed composite convolution layer, generating the final feature map to be detected, which enters the next module.
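A sketch of this fusion for one pyramid level is given below, assuming nearest-neighbour upsampling, adaptive max pooling for downsampling, and 1 × 1 channel alignment of the higher and lower levels so that element-wise addition is possible; these resampling choices and the channel arguments are assumptions for illustration, not specified by the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class LowLevelEmbeddedFusion(nn.Module):
    def __init__(self, cur_channels, high_channels, low_channels, out_channels):
        super().__init__()
        # 1x1 convs align higher/lower level channels with the current level
        self.compress_high = nn.Conv2d(high_channels, cur_channels, kernel_size=1)
        self.compress_low = nn.Conv2d(low_channels, cur_channels, kernel_size=1)
        # composite convolution layer: 3x3 conv -> BN -> LeakyReLU
        self.composite = nn.Sequential(
            nn.Conv2d(cur_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, cur, high, low):
        # step a: embed high-level semantics (compress, upsample, element-wise add)
        high = F.interpolate(self.compress_high(high), size=cur.shape[-2:], mode="nearest")
        composite = cur + high
        # step b: embed low-level details (compress, downsample, element-wise add)
        low = F.adaptive_max_pool2d(self.compress_low(low), output_size=cur.shape[-2:])
        mixed = composite + low
        # step c: composite convolution layer produces the map to be detected
        return self.composite(mixed)
```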
Referring to FIG. 1, a target detection method based on a mixed hole convolution pyramid comprises the following steps:
i. build a backbone network based on the Anchor-free mechanism and obtain feature maps C3, C4 and C5 through it; the highest-level feature map C5 output by the backbone network is enhanced by the hybrid receptive field module and then output to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module, together with the feature maps C4 and C3 output by the backbone network, forms composite features through upsampling and downsampling operations; the composite features pass through a composite convolution layer to generate the feature maps to be detected, which are delivered to the detection module for target localization and classification;
iii. train the above network, test the model after each round, save the best model weights, test the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtain the trained network model;
iv. detect targets using the trained network model and output the detection results.
Wherein, in the process of training the network, the loss function is as follows:
L({p_{x,y}}, {t_{x,y}}) = (1/N) Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N) Σ_{x,y} k · L_reg(t_{x,y}, t*_{x,y})
where p_{x,y} denotes the classification prediction probability, t_{x,y} denotes the regression prediction coordinates, c*_{x,y} and t*_{x,y} are the corresponding ground-truth class and box targets, and N denotes the number of positive samples; k is an indicator function that equals 1 if the current prediction is a positive sample and 0 otherwise;
L_cls is the Focal Loss function, whose specific expression is:
L_cls = -y (1 - y')^γ · log(y') - (1 - y) · y'^γ · log(1 - y')
where y is the sample label, y' is the predicted probability that the sample is positive, and γ is the focusing parameter. Compared with the ordinary cross-entropy loss, Focal Loss adds the γ factor; by controlling the value of γ, the influence of easy samples is reduced and more attention is paid to hard samples.
L_reg is the GIoU Loss function, calculated as follows:
IoU = |A ∩ B| / |A ∪ B|
GIoU = IoU - |C \ (A ∪ B)| / |C|
L_reg = 1 - GIoU
where A and B represent the predicted box and the ground-truth box, and IoU is their intersection over union. Their minimum convex set C, i.e. the smallest box enclosing both A and B, is computed first; GIoU is then obtained using C, and L_reg follows from it.
In summary, after reading the present disclosure, those skilled in the art may make various other modifications, without creative effort, according to the technical solutions and concepts disclosed herein, and such modifications fall within the protection scope of the present disclosure.

Claims (9)

1. A target detection network based on a mixed hole convolution pyramid, characterized by comprising a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module; the backbone network extracts target image features using a hierarchically cascaded network structure; the hybrid receptive field module is used to perform feature enhancement on the highest-level feature map output at the top of the backbone network; the low-level embedded feature pyramid module, building on the feature pyramid, is used to fuse high-level features downward and generate the final feature maps to be detected by means of low-level embedding; the detection module is used to locate and classify the feature maps to be detected and output the results.

2. The target detection network based on a mixed hole convolution pyramid according to claim 1, characterized in that the low-level embedded feature pyramid module generates the final feature maps to be detected through the following steps: a. the low-level embedded feature pyramid module fuses the current-level feature map with the higher-level feature map after channel compression and upsampling, forming a composite feature map and completing the embedding of high-level semantic information; b. the composite feature map is fused with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information; c. each mixed feature map passes through a composite convolution layer to generate the final feature map to be detected.

3. The target detection network based on a mixed hole convolution pyramid according to claim 2, characterized in that the fusion in steps a and b is element-wise, channel-wise addition.

4. The target detection network based on a mixed hole convolution pyramid according to claim 2, characterized in that the composite convolution layer in step c is formed by cascading a 3 × 3 convolution layer, a BN layer and a LeakyReLU activation layer.

5. The target detection network based on a mixed hole convolution pyramid according to claim 1, characterized in that the hybrid receptive field module comprises four parallel branches: one 1 × 1 convolution branch and three 3 × 3 convolution branches with dilation rates of 1, 2 and 4 respectively; the hybrid receptive field module concatenates the feature maps obtained in parallel by the dilated convolution layers with different dilation rates, then uses a 1 × 1 convolution layer to fuse the feature information and reduce the channel dimension to a specified number.

6. The target detection network based on a mixed hole convolution pyramid according to claim 1, characterized in that the backbone network is a single-stage detection network based on Res2Net50; the Anchor-free mechanism of FCOS is introduced for target prediction, performing pixel-by-pixel prediction, and a Centerness branch network is added to the loss function part.

7. The target detection network based on a mixed hole convolution pyramid according to claim 6, characterized in that the feature maps output by the backbone network comprise C3, C4 and C5, with sizes of 100 × 100, 50 × 50 and 25 × 25.

8. A target detection method based on a mixed hole convolution pyramid, characterized by comprising the following steps: i. build a backbone network based on the Anchor-free mechanism and obtain feature maps C3, C4 and C5 through it; the highest-level feature map C5 output by the backbone network is enhanced by the hybrid receptive field module and then output to the low-level embedded feature pyramid module; ii. the low-level embedded feature pyramid module, together with the feature maps C4 and C3 output by the backbone network, forms composite features through upsampling and downsampling operations; the composite features pass through a composite convolution layer to generate the feature maps to be detected, which are delivered to the detection module for target localization and classification; iii. train the above network, test the model after each round, save the best model weights, test the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtain the trained network model; iv. detect targets using the trained network model and output the detection results.

9. The target detection method based on a mixed hole convolution pyramid according to claim 8, characterized in that, in the process of training the network in step iii, the loss function is:

L({p_{x,y}}, {t_{x,y}}) = (1/N) Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N) Σ_{x,y} k · L_reg(t_{x,y}, t*_{x,y})

where p_{x,y} denotes the classification prediction probability, t_{x,y} denotes the regression prediction coordinates, and N denotes the number of positive samples; k is an indicator function that equals 1 if the current prediction is a positive sample and 0 otherwise;

L_cls is the Focal Loss function, whose specific expression is:

L_cls = -y (1 - y')^γ · log(y') - (1 - y) · y'^γ · log(1 - y')

where y is the sample label, y' is the model's predicted probability that the sample is positive, and γ is the focusing parameter;

L_reg is the GIoU Loss function, calculated as follows:

IoU = |A ∩ B| / |A ∪ B|

GIoU = IoU - |C \ (A ∪ B)| / |C|

L_reg = 1 - GIoU

where A and B represent the predicted box and the ground-truth box, and IoU is their intersection over union; their minimum convex set C, i.e. the smallest box enclosing both A and B, is computed first, GIoU is then obtained using C, and L_reg follows from it.
CN202110646653.7A 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid Active CN113392960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110646653.7A CN113392960B (en) 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110646653.7A CN113392960B (en) 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid

Publications (2)

Publication Number Publication Date
CN113392960A (en) 2021-09-14
CN113392960B (en) 2022-08-30

Family

ID=77620186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110646653.7A Active CN113392960B (en) 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid

Country Status (1)

Country Link
CN (1) CN113392960B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887455A (en) * 2021-10-11 2022-01-04 东北大学 Face mask detection system and method based on improved FCOS
CN113902896A (en) * 2021-09-24 2022-01-07 西安电子科技大学 Infrared target detection method based on enlarged receptive field
CN113947774A (en) * 2021-10-08 2022-01-18 东北大学 Lightweight vehicle target detection system
CN113963177A (en) * 2021-11-11 2022-01-21 电子科技大学 A CNN-based method for building mask contour vectorization
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114170587A (en) * 2021-12-13 2022-03-11 微民保险代理有限公司 Vehicle indicator lamp identification method and device, computer equipment and storage medium
CN114283488A (en) * 2022-03-08 2022-04-05 北京万里红科技有限公司 Method for generating detection model and method for detecting eye state by using detection model
CN114339049A (en) * 2021-12-31 2022-04-12 深圳市商汤科技有限公司 A video processing method, apparatus, computer equipment and storage medium
CN114494108A (en) * 2021-11-15 2022-05-13 北京知见生命科技有限公司 A method and system for quality control of pathological slices based on target detection
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 A deep feature extraction method for transparent object detection in complex environment
CN115100516A (en) * 2022-06-07 2022-09-23 北京科技大学 A Relation Learning-Based Object Detection Method for Remote Sensing Images
CN115861855A (en) * 2022-12-15 2023-03-28 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN115984105A (en) * 2022-12-07 2023-04-18 深圳大学 Method and device for optimizing hole convolution, computer equipment and storage medium
CN117132761A (en) * 2023-08-25 2023-11-28 京东方科技集团股份有限公司 Target detection method and device, storage medium and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN109543672A (en) * 2018-10-15 2019-03-29 天津大学 Object detecting method based on dense characteristic pyramid network
CN111260630A (en) * 2020-01-16 2020-06-09 高新兴科技集团股份有限公司 Improved lightweight small target detection method
CN112070729A (en) * 2020-08-26 2020-12-11 西安交通大学 Anchor-free remote sensing image target detection method and system based on scene enhancement
CN112183649A (en) * 2020-09-30 2021-01-05 佛山市南海区广工大数控装备协同创新研究院 An Algorithm for Predicting Pyramid Feature Maps
CN112365501A (en) * 2021-01-13 2021-02-12 南京理工大学 Weldment contour detection algorithm based on convolutional neural network
CN112419237A (en) * 2020-11-03 2021-02-26 中国计量大学 Automobile clutch master cylinder groove surface defect detection method based on deep learning
CN112446327A (en) * 2020-11-27 2021-03-05 中国地质大学(武汉) Remote sensing image target detection method based on non-anchor frame
CN112651351A (en) * 2020-12-29 2021-04-13 珠海大横琴科技发展有限公司 Data processing method and device
CN112801117A (en) * 2021-02-03 2021-05-14 四川中烟工业有限责任公司 Multi-channel receptive field guided characteristic pyramid small target detection network and detection method
CN112819748A (en) * 2020-12-16 2021-05-18 机科发展科技股份有限公司 Training method and device for strip steel surface defect recognition model

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN109543672A (en) * 2018-10-15 2019-03-29 天津大学 Object detecting method based on dense characteristic pyramid network
CN111260630A (en) * 2020-01-16 2020-06-09 高新兴科技集团股份有限公司 Improved lightweight small target detection method
CN112070729A (en) * 2020-08-26 2020-12-11 西安交通大学 Anchor-free remote sensing image target detection method and system based on scene enhancement
CN112183649A (en) * 2020-09-30 2021-01-05 佛山市南海区广工大数控装备协同创新研究院 An Algorithm for Predicting Pyramid Feature Maps
CN112419237A (en) * 2020-11-03 2021-02-26 中国计量大学 Automobile clutch master cylinder groove surface defect detection method based on deep learning
CN112446327A (en) * 2020-11-27 2021-03-05 中国地质大学(武汉) Remote sensing image target detection method based on non-anchor frame
CN112819748A (en) * 2020-12-16 2021-05-18 机科发展科技股份有限公司 Training method and device for strip steel surface defect recognition model
CN112651351A (en) * 2020-12-29 2021-04-13 珠海大横琴科技发展有限公司 Data processing method and device
CN112365501A (en) * 2021-01-13 2021-02-12 南京理工大学 Weldment contour detection algorithm based on convolutional neural network
CN112801117A (en) * 2021-02-03 2021-05-14 四川中烟工业有限责任公司 Multi-channel receptive field guided characteristic pyramid small target detection network and detection method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GAO S et al.: "Res2Net: A new multi-scale backbone architecture", IEEE Transactions on Pattern Analysis and Machine Intelligence *
GUO C et al.: "AugFPN: Improving multi-scale feature learning for object detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition *
MA J et al.: "Dual refinement feature pyramid networks for object detection", arXiv:2012.01733 *
TIAN Zhi et al.: "FCOS: Fully convolutional one-stage object detection", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) *
HOU Shaoqi et al.: "Object detection algorithm based on dilated convolution pyramid", Journal of University of Electronic Science and Technology of China *
JIANG Shihao et al.: "Instance segmentation based on Mask R-CNN and multi-feature fusion", Computer Technology and Development *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902896A (en) * 2021-09-24 2022-01-07 西安电子科技大学 Infrared target detection method based on enlarged receptive field
CN113947774A (en) * 2021-10-08 2022-01-18 东北大学 Lightweight vehicle target detection system
CN113947774B (en) * 2021-10-08 2024-05-14 东北大学 A lightweight vehicle target detection system
CN113887455A (en) * 2021-10-11 2022-01-04 东北大学 Face mask detection system and method based on improved FCOS
CN113887455B (en) * 2021-10-11 2024-05-28 东北大学 A face mask detection system and method based on improved FCOS
CN113963177A (en) * 2021-11-11 2022-01-21 电子科技大学 A CNN-based method for building mask contour vectorization
CN114494108A (en) * 2021-11-15 2022-05-13 北京知见生命科技有限公司 A method and system for quality control of pathological slices based on target detection
CN114170587A (en) * 2021-12-13 2022-03-11 微民保险代理有限公司 Vehicle indicator lamp identification method and device, computer equipment and storage medium
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN113989498B (en) * 2021-12-27 2022-07-12 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114339049A (en) * 2021-12-31 2022-04-12 深圳市商汤科技有限公司 A video processing method, apparatus, computer equipment and storage medium
CN114283488A (en) * 2022-03-08 2022-04-05 北京万里红科技有限公司 Method for generating detection model and method for detecting eye state by using detection model
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 A deep feature extraction method for transparent object detection in complex environment
CN114693939B (en) * 2022-03-16 2024-04-30 中南大学 Method for extracting depth features of transparent object detection under complex environment
CN115100516A (en) * 2022-06-07 2022-09-23 北京科技大学 A Relation Learning-Based Object Detection Method for Remote Sensing Images
CN115984105A (en) * 2022-12-07 2023-04-18 深圳大学 Method and device for optimizing hole convolution, computer equipment and storage medium
CN115861855A (en) * 2022-12-15 2023-03-28 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN115861855B (en) * 2022-12-15 2023-10-24 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN117132761A (en) * 2023-08-25 2023-11-28 京东方科技集团股份有限公司 Target detection method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113392960B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN113392960A (en) Target detection network and method based on mixed hole convolution pyramid
CN110263705B (en) Two phases of high-resolution remote sensing image change detection system for the field of remote sensing technology
CN110335270B (en) Power transmission line defect detection method based on hierarchical regional feature fusion learning
CN112906718B (en) A multi-target detection method based on convolutional neural network
CN117557922B (en) Improved YOLOv8 drone aerial target detection method
CN113052834A (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN112528913A (en) Grit particulate matter particle size detection analytic system based on image
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112183649A (en) An Algorithm for Predicting Pyramid Feature Maps
CN117635628B (en) A land-sea segmentation method based on contextual attention and boundary perception guidance
CN117173120A (en) Chip weld void defect detection method and system
CN117095155A (en) Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network
CN118469946A (en) Insulator defect detection method for multiple defect categories based on multi-angle feature enhancement
CN112700450A (en) Image segmentation method and system based on ensemble learning
CN116524319A (en) Night vehicle detection method and system based on improved YOLOv5 convolutional neural network
CN118429355A (en) Lightweight power distribution cabinet shell defect detection method based on feature enhancement
CN117853803A (en) Small sample motor car anomaly detection method and system based on feature enhancement and communication network
CN117670791A (en) Road disease detection method and device based on multi-scale fusion strategy and improved YOLOv5
CN117058386A (en) Asphalt road crack detection method based on improved deep Labv3+ network
CN116664535A (en) Transmission tower ground wire image detection method based on directional representation
CN117853397A (en) Image tampering detection and positioning method and system based on multi-level feature learning
CN116228626A (en) Surface defect detection method of magnetic inductance element based on improved YOLOv5
CN115761223A (en) An Instance Segmentation Method of Remote Sensing Image Using Data Synthesis
CN119152453B (en) An infrared highway foreign object detection method based on Mamba architecture
CN119152315B (en) ElectroTrackNet electric selection track recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant