CN113392960A - Target detection network and method based on mixed hole convolution pyramid - Google Patents
- Publication number
- CN113392960A (application CN202110646653.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- layer
- pyramid
- module
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of digital image processing, and in particular to a target detection network and method based on a hybrid dilated convolution pyramid. The target detection network comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network extracts target picture features using a hierarchically cascaded network structure; the hybrid receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network; the low-level embedded feature pyramid module fuses high-level features downwards on the basis of the feature pyramid and generates the final feature maps to be detected in a low-level embedding manner; the detection module locates and classifies objects on the feature maps to be detected and outputs the result. The target detection network and method can effectively alleviate the missed and false detections caused by scale variation and occlusion.
Description
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a target detection network and method based on a hybrid dilated convolution pyramid.
Background
Object detection is one of the most widespread applications of computer vision in real life; its task is to locate and recognize specific objects in a picture. Conventional target detection methods can be divided into single-stage and two-stage methods. The core of the two-stage method is region proposal: the input image is selectively searched to generate region proposal boxes, a convolutional neural network then extracts features from each proposal box, and a classifier performs classification. The single-stage method directly outputs the detection result through a convolutional neural network.
Through successive variants, the two families have converged on a common point: a large number of anchor boxes must be generated in advance during detection, and such algorithms are collectively called Anchor-based target detection algorithms. The anchor boxes are a group of rectangular boxes obtained by running a clustering algorithm on the training set before training; they represent the dominant width and height distribution of the targets in the data set. During inference, n candidate rectangular boxes are derived from the anchor boxes on the feature map, and these rectangular boxes are then further classified and regressed. In the two-stage algorithm, the candidate boxes additionally pass through two steps: coarse foreground classification and fine multi-class classification.
The single-stage target detection algorithm lacks the fine processing of the two-stage algorithm and performs poorly when facing multi-scale and occluded targets. In addition, although Anchor-based algorithms alleviate to some extent the explosion in candidate-box computation caused by selective search, generating a large number of anchor boxes of different sizes in every grid cell still causes computational redundancy. Most importantly, anchor-box generation depends on many hyper-parameter settings, and manual parameter tuning seriously affects the localization accuracy and classification performance.
In the prior art, the patent with publication number CN110222712A discloses "a multi-item target detection algorithm based on deep learning". The proposed algorithm obtains an augmented RoI set through a multi-scale sliding window and selective search, generating a dense RoI set in an exhaustive manner, which is computationally heavy and inefficient.
The patent with publication number CN112115883A discloses "a method and apparatus for non-maximum suppression based on an Anchor-free target detection algorithm". It uses a CenterNet network model to perform target detection by predicting the upper-left corner point, lower-right corner point and center point of an object, and applies non-maximum suppression to avoid multiple detection boxes on the same target; however, it requires rather complicated post-processing to group the corner-point pairs belonging to the same target, which is inefficient.
The patent with publication number CN112101153A discloses "a remote sensing target detection method based on a receptive field module and a multi-feature pyramid". It extracts features from a visible-light remote sensing image with a VGG network to obtain feature maps of different sizes, cascades and fuses them, obtains an optimized feature map through a strided-convolution feature pyramid, and then performs multi-scale detection via receptive-field information mining. The method exploits feature maps of different sizes, but its feature-map fusion is redundant and the backbone network performs poorly, which degrades the final detection result.
Disclosure of Invention
To solve the above technical problems, the invention provides a target detection network and method based on a hybrid dilated convolution pyramid, which can effectively alleviate the missed and false detections caused by scale variation and occlusion.
The invention is realized by adopting the following technical scheme:
a target detection network based on a hybrid void convolution pyramid is characterized in that: the system comprises a backbone network, a mixed reception field module, a low-level embedded characteristic pyramid module and a detection module; the backbone network extracts target picture features by using a layered cascade network structure; the mixed receptive field module is used for carrying out feature enhancement on the highest layer feature map output from the topmost end of the backbone network; the low-layer embedded feature pyramid module is used for fusing high-layer features downwards on the basis of a feature pyramid and generating a final feature graph to be detected in a low-layer embedding mode; the detection module is used for positioning and classifying the characteristic diagram to be detected and outputting a result.
The low-level embedded feature pyramid module generates the final feature maps to be detected through the following steps:
a. the module fuses the current-level feature map with the higher-level feature map that has undergone channel compression and upsampling, forming a composite feature map and completing the embedding of high-level semantic information;
b. the composite feature map is fused with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information;
c. each mixed feature map passes through a composite convolution layer to generate a final feature map to be detected.
The fusion in steps a and b is element-wise, channel-wise addition.
The composite convolution layer in step c is formed by cascading a 3 × 3 convolution layer, a BN layer and a LeakyReLU activation layer.
The hybrid receptive field module comprises four parallel branches: a 1 × 1 convolution-layer branch and three 3 × 3 convolution-layer branches with dilation rates of 1, 2 and 4 respectively. After concatenating the feature maps obtained in parallel from the dilated convolution layers with different dilation rates, the module fuses the feature information with a 1 × 1 convolution layer and reduces the channel dimension to a specified number.
The backbone network is a single-stage detection network based on the Res2Net50 network; the Anchor-free mechanism of FCOS is introduced for target prediction, performing pixel-by-pixel prediction, and a Center-ness branch network is added to the loss-function part.
The feature maps output by the backbone network comprise C3, C4 and C5, with sizes 100 × 100, 50 × 50 and 25 × 25 respectively.
A target detection method based on a hybrid dilated convolution pyramid, characterized in that it comprises the following steps:
i. building a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, enhancing the highest-level feature map C5 with the hybrid receptive field module, and outputting the result to the low-level embedded feature pyramid module;
ii. combining, in the low-level embedded feature pyramid module, the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations to form composite features, passing the composite features through the composite convolution layer to generate the feature maps to be detected, and feeding them to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, evaluating the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
In the process of training the network in step iii, the loss function is:

$$L(\{p_{x,y}\}, \{t_{x,y}\}) = \frac{1}{N}\sum_{x,y} L_{cls}(p_{x,y}, c^*_{x,y}) + \frac{1}{N}\sum_{x,y} k \cdot L_{reg}(t_{x,y}, t^*_{x,y})$$

where $p_{x,y}$ denotes the class prediction probability, $t_{x,y}$ the regression prediction coordinates, and N the number of positive samples; k is an indicator function that equals 1 if the current prediction is a positive sample and 0 otherwise.
$L_{cls}$ is the Focal Loss function:

$$L_{cls} = -y(1-y')^{\gamma}\log(y') - (1-y)(y')^{\gamma}\log(1-y')$$

where y is the sample label, y' is the predicted probability that the sample is positive, and γ is the focusing parameter.
$L_{reg}$ is the GIoU Loss function, computed as:

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad L_{reg} = 1 - GIoU$$

where A and B denote the predicted box and the ground-truth box and IoU is their intersection over union; the minimum convex set C, i.e. the smallest bounding box enclosing A and B, is computed first and then combined with A and B to obtain GIoU and hence $L_{reg}$.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention improves the structure of the feature pyramid and proposes a low-level embedded feature pyramid module, which effectively addresses the weakness of target detection under multi-scale variation: it fuses shallow feature information with high-level feature information, applies normalization and an activation function to the fused output, and thereby eases model training.
The invention designs a hybrid receptive field module which, while keeping the model parameter count under control, enlarges the receptive field with multi-size dilated convolutions and combines it with the multi-scale outputs of the feature pyramid to acquire more global feature detail, so as to handle target occlusion.
The method introduces an Anchor-free mechanism and combines the low-level embedded feature pyramid module with the hybrid receptive field module, which reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy, and effectively alleviates missed detections.
2. The target detection network can solve the multi-scale and occlusion problems of target detection scenes and can be used in a plug-and-play manner. By introducing an Anchor-free algorithm and combining the low-level embedded feature pyramid module with the hybrid receptive field module, it reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy, and addresses the large parameter counts, redundant computation, poor applicability, low efficiency and frequent misses of existing target detection systems in practical settings.
3. The backbone network adopts the Anchor-free mechanism of FCOS to perform pixel-by-pixel prediction, detecting targets without relying on predefined anchor boxes or proposal regions; this reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy and alleviates missed detections. The Center-ness mechanism quickly filters negative samples, suppresses low-quality prediction boxes far from the target center and increases the weight of prediction boxes close to it, improving detection performance. The Res2Net50 network replaces the single 3 × 3 convolution layer used in ResNet50 with a hierarchically cascaded feature set inside each residual block, which is better balanced in network width, depth and resolution.
4. Unlike other networks that perform feature processing after fusing the multi-level (C3, C4, C5) features, the hybrid receptive field module is embedded before feature fusion, between C5 of the backbone network and the feature pyramid level P5, improving the representational capability of the C5 features; the final detection predictions are made only after the hybrid receptive field module and the low-level embedded feature pyramid module. Using convolution layers with different dilation rates improves the model's adaptability to targets of different scales; after concatenation, a 1 × 1 convolution layer fuses the feature information and reduces the channel dimension to a specified number, increasing the flexibility of the module.
5. Compared with the plain feature pyramid, the features output by the low-level embedded feature pyramid module contain not only rich semantic information but also concrete detail information, achieving a double improvement in multi-scale detection quality and localization precision.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the detailed description, in which:
FIG. 1 is a schematic diagram of the overall structure of a target detection network according to the present invention;
FIG. 2 is a schematic flow chart of a target detection method according to the present invention;
FIG. 3 is a schematic diagram of the hybrid receptive field module of the present invention;
FIG. 4 is a schematic diagram of a low-level embedded feature pyramid module according to the present invention;
FIG. 5 is a schematic view of the composite convolution layer of the present invention.
Detailed Description
Example 1
As a basic embodiment, the invention comprises a target detection network based on a hybrid dilated convolution pyramid, which includes a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network extracts target picture features using a hierarchically cascaded network structure. The hybrid receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network. The low-level embedded feature pyramid module fuses high-level features downwards on the basis of the feature pyramid and generates the final feature maps to be detected in a low-level embedding manner. The detection module locates and classifies objects on the feature maps to be detected and outputs the result.
The backbone network can be a single-stage detection network based on the Res2Net50 network, which provides stronger feature extraction without increasing the computational load; the Anchor-free mechanism of FCOS is introduced for pixel-by-pixel target prediction, and a Center-ness branch network is added to the loss-function part to suppress low-quality detection boxes and improve detection performance.
A target detection method based on a hybrid dilated convolution pyramid comprises the following steps:
i. building a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, enhancing the highest-level feature map C5 with the hybrid receptive field module, and outputting the result to the low-level embedded feature pyramid module;
ii. combining, in the low-level embedded feature pyramid module, the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations to form composite features, passing the composite features through the composite convolution layer to generate the feature maps to be detected, and feeding them to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, evaluating the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
Example 2
As a preferred embodiment, the invention comprises a target detection network based on a hybrid dilated convolution pyramid; referring to FIG. 1 of the specification, the network comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module.
The backbone network adopts a single-stage detection structure and introduces the Anchor-free mechanism of FCOS (Fully Convolutional One-Stage object detection), performing pixel-by-pixel prediction without relying on predefined anchor boxes or proposal regions; this reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy and alleviates missed detections. The Center-ness mechanism quickly filters negative samples, suppresses low-quality prediction boxes far from the target center and increases the weight of prediction boxes close to it, improving detection performance. The Center-ness target is given by formula (1):

$$centerness^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}} \qquad (1)$$

where $l^*$, $r^*$, $t^*$ and $b^*$ denote the distances from a pixel to the left, right, top and bottom sides of the ground-truth box. The value lies between 0 and 1: the closer the pixel is to the true target center, the larger the Center-ness value, and the farther away, the smaller.
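As an illustrative sketch (the function name and plain-Python style are assumptions, not the patent's code), the Center-ness target of formula (1) can be computed per pixel as:

```python
import math

def centerness_target(l, r, t, b):
    """Center-ness of a pixel given its distances to the left, right,
    top and bottom sides of the ground-truth box (formula (1))."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

A pixel at the exact center (equal distances on both axes) scores 1.0, while off-center pixels score progressively lower, which is what lets the detection head down-weight low-quality boxes far from the target center.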
The backbone network introduces the Res2Net50 network, replacing the single 3 × 3 convolution layer used in ResNet50 with a hierarchically cascaded feature set inside each residual block, which is better balanced in network width, depth and resolution. The feature maps at C3, C4 and C5 have sizes of 100 × 100, 50 × 50 and 25 × 25 respectively.
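A minimal PyTorch sketch of the Res2Net idea referenced above, under stated assumptions: channel count and `scale` are illustrative, and the real Res2Net50 block also has 1 × 1 bottleneck convolutions, omitted here. The channels are split into groups, and every group after the first adds the previous group's output before its own 3 × 3 convolution, forming the hierarchical cascade inside one residual block.

```python
import torch
import torch.nn as nn

class Res2NetBlock(nn.Module):
    """Illustrative Res2Net-style block: channels split into `scale` groups;
    each group after the first receives the previous group's output before
    its own 3x3 convolution, then all groups are re-concatenated."""
    def __init__(self, ch, scale=4):
        super().__init__()
        assert ch % scale == 0, "channels must divide evenly into groups"
        self.width = ch // scale
        self.convs = nn.ModuleList(
            nn.Conv2d(self.width, self.width, 3, padding=1)
            for _ in range(scale - 1))

    def forward(self, x):
        parts = torch.split(x, self.width, dim=1)
        outs, prev = [parts[0]], None
        for conv, part in zip(self.convs, parts[1:]):
            prev = conv(part if prev is None else part + prev)
            outs.append(prev)
        return torch.cat(outs, dim=1) + x  # residual connection
```

The cascade gives later groups an effectively larger receptive field at negligible extra cost, which is the width/depth/resolution trade-off the text refers to.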
The hybrid receptive field module concatenates feature maps obtained in parallel from dilated convolution layers with different dilation rates, improving the network's ability to capture global features and compensating for the gridding effect caused by a single dilated convolution. By using multiple dilated convolution layers, the hybrid receptive field module of the present application effectively addresses target occlusion.
Referring to FIG. 3 of the specification: to exploit the module fully, and unlike other networks that perform feature processing after fusing the multi-level (C3, C4, C5) features, the hybrid receptive field module is embedded before feature fusion, between C5 of the backbone network and the feature pyramid level P5, improving the representational capability of the C5 features; the final detection predictions are made after the module's output passes through the low-level embedded feature pyramid module. The hybrid receptive field module consists of four parallel branches: one 1 × 1 convolution-layer branch and three 3 × 3 convolution-layer branches with dilation rates of 1, 2 and 4 respectively. The 3 × 3 dilated convolution with rate 4 captures more global contextual feature detail, strengthening reasoning and addressing target occlusion, while using convolution layers with different dilation rates improves the model's adaptability to targets of different scales.
The high-level features output by C5 carry rich semantic information. Unlike the conventionally adopted cascaded feature combination, the parallel feature combination adopted by the invention lets the network train parameters better suited to the current data set. Parallel branch 1, a 1 × 1 convolution layer, preserves the image's detail information without changing the feature-map size while controlling the number of channels, reducing subsequent computation. The 3 × 3 convolution kernels have few parameters, so they can process the feature information while keeping the network's computation low. The dilated convolutions acquire more global feature detail and strengthen reasoning, recognizing occluded targets well, and the arrangement of different dilation rates removes the gridding effect while improving adaptability to multi-scale targets. Parallel branch 2 is a 3 × 3 convolution with dilation rate 1, suited to small and medium targets; parallel branch 3 is a 3 × 3 convolution with dilation rate 2, suited to medium targets; and parallel branch 4 is a 3 × 3 convolution with dilation rate 4, suited to medium and large targets.
After concatenating the feature maps, a 1 × 1 convolution layer fuses the feature information and reduces the channel dimension to a specified number, increasing the flexibility of the hybrid receptive field module.
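A minimal PyTorch sketch of the four-branch layout just described; the channel sizes are illustrative assumptions, not the patent's values. The 1 × 1 branch and the three 3 × 3 branches with dilation rates 1, 2 and 4 run in parallel, their outputs are concatenated, and a final 1 × 1 convolution fuses them and reduces the channels to the specified number.

```python
import torch
import torch.nn as nn

class HybridReceptiveField(nn.Module):
    """Four parallel branches over C5: a 1x1 conv plus three 3x3 convs
    with dilation rates 1, 2, 4; concatenate, then fuse with a 1x1 conv.
    Setting padding equal to the dilation keeps the spatial size fixed."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        self.b4 = nn.Conv2d(in_ch, out_ch, 3, padding=4, dilation=4)
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)  # reduce channels back

    def forward(self, c5):
        feats = [self.b1(c5), self.b2(c5), self.b3(c5), self.b4(c5)]
        return self.fuse(torch.cat(feats, dim=1))
```

Because every branch preserves the spatial size, the module is a drop-in enhancement between C5 and P5, as the text describes.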
The feature pyramid fuses high-level features downwards so that the feature map at each level carries strong semantic information and can be predicted on separately. Compared with the plain feature pyramid, the features output by the low-level embedded feature pyramid module of the present application contain not only rich semantic information but also concrete detail information, achieving a double improvement in multi-scale detection quality and localization precision.
Referring to FIG. 4 of the specification, C5' is the feature map after passing through the low-level embedded feature pyramid module; referring to FIG. 5, the composite convolution layer (a cascade of a 3 × 3 convolution layer, a BN layer and a LeakyReLU activation layer) processes the fused features, eases model training and improves the non-linear expressiveness of the features.
The low-level embedded feature pyramid module first fuses the current-level feature map with the higher-level feature map that has undergone channel compression and upsampling, by element-wise, channel-wise addition, forming a composite feature map and completing the embedding of high-level semantic information. It then fuses the composite feature map with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information. Finally, each mixed feature map passes through the composite convolution layer, producing a final feature map to be detected that enters the next module.
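The three fusion steps above can be sketched in PyTorch for one middle pyramid level; the channel counts, the nearest-neighbor upsampling and the strided 3 × 3 downsampling convolution are illustrative assumptions (the patent specifies only element-wise addition and the composite convolution layer).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompositeConv(nn.Module):
    """Step c: 3x3 conv -> BN -> LeakyReLU."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
            nn.LeakyReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class LowLevelEmbeddedFusion(nn.Module):
    """One fusion step of the low-level embedded feature pyramid:
    a. compress + upsample the higher-level map, add element-wise;
    b. downsample the lower-level map, add element-wise;
    c. pass the mixed map through the composite convolution."""
    def __init__(self, low_ch, cur_ch, high_ch, out_ch):
        super().__init__()
        self.compress_cur = nn.Conv2d(cur_ch, out_ch, 1)
        self.compress_high = nn.Conv2d(high_ch, out_ch, 1)
        self.down_low = nn.Conv2d(low_ch, out_ch, 3, stride=2, padding=1)
        self.composite = CompositeConv(out_ch)

    def forward(self, c_low, c_cur, c_high):
        cur = self.compress_cur(c_cur)
        high = F.interpolate(self.compress_high(c_high),
                             size=cur.shape[-2:], mode="nearest")
        composite = cur + high      # step a: embed high-level semantics
        mixed = composite + self.down_low(c_low)  # step b: embed detail
        return self.composite(mixed)              # step c
```

With the backbone's C3/C4/C5 sizes (100 × 100, 50 × 50, 25 × 25), this step would, for instance, produce the 50 × 50 output map from all three levels at once.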
A target detection method based on a hybrid dilated convolution pyramid, referring to FIG. 1 of the specification, comprises the following steps:
i. building a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, enhancing the highest-level feature map C5 with the hybrid receptive field module, and outputting the result to the low-level embedded feature pyramid module;
ii. combining, in the low-level embedded feature pyramid module, the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations to form composite features, passing the composite features through the composite convolution layer to generate the feature maps to be detected, and feeding them to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, evaluating the hybrid receptive field module and the low-level embedded feature pyramid module on the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
During training, the loss function is:

$$L(\{p_{x,y}\}, \{t_{x,y}\}) = \frac{1}{N}\sum_{x,y} L_{cls}(p_{x,y}, c^*_{x,y}) + \frac{1}{N}\sum_{x,y} k \cdot L_{reg}(t_{x,y}, t^*_{x,y})$$

where $p_{x,y}$ denotes the class prediction probability, $t_{x,y}$ the regression prediction coordinates, and N the number of positive samples; k is an indicator function that equals 1 if the current prediction is a positive sample and 0 otherwise.
$L_{cls}$ is the Focal Loss function:

$$L_{cls} = -y(1-y')^{\gamma}\log(y') - (1-y)(y')^{\gamma}\log(1-y')$$

where y is the sample label, y' is the predicted probability that the sample is positive, and γ is the focusing parameter. Compared with the ordinary cross-entropy loss, Focal Loss adds the γ factor; controlling γ reduces the influence of easy samples and focuses training on hard samples.
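A small numeric sketch of the binary Focal Loss described above; the function name and the ε guard against log(0) are illustrative additions:

```python
import math

def focal_loss(y, p, gamma=2.0, eps=1e-12):
    """Binary focal loss: y is the 0/1 label, p the predicted positive
    probability, gamma the focusing parameter; gamma=0 recovers the
    ordinary cross-entropy loss."""
    return -(y * (1 - p) ** gamma * math.log(p + eps)
             + (1 - y) * p ** gamma * math.log(1 - p + eps))
```

For an easy positive (p = 0.9), γ = 2 scales the cross-entropy term by (1 − 0.9)² = 0.01, so easy samples contribute far less than hard ones, which is exactly the focusing behaviour the text describes.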
$L_{reg}$ is the GIoU Loss function, computed as:

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad L_{reg} = 1 - GIoU$$

where A and B denote the predicted box and the ground-truth box and IoU is their intersection over union; the minimum convex set C, i.e. the smallest bounding box enclosing A and B, is computed first and then combined with A and B to obtain GIoU and hence $L_{reg}$.
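The GIoU computation above can be sketched for axis-aligned (x1, y1, x2, y2) boxes as follows (the function name is illustrative); for such boxes the minimum convex set C is simply the smallest enclosing rectangle.

```python
def giou_loss(box_a, box_b):
    """GIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union
    # smallest enclosing box C (the minimum convex set for rectangles)
    c_area = ((max(ax2, bx2) - min(ax1, bx1))
              * (max(ay2, by2) - min(ay1, by1)))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

Unlike a plain IoU loss, the GIoU term still provides a useful gradient when the boxes do not overlap: the farther apart they are, the larger the loss.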
In summary, after reading this disclosure, those skilled in the art can make various other modifications without creative effort according to the technical solutions and concepts of the invention, all of which fall within the protection scope of the invention.
Claims (9)
1. A target detection network based on a hybrid dilated convolution pyramid, characterized in that: it comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module; the backbone network extracts target picture features using a hierarchically cascaded network structure; the hybrid receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network; the low-level embedded feature pyramid module fuses high-level features downwards on the basis of a feature pyramid and generates the final feature maps to be detected in a low-level embedding manner; the detection module locates and classifies objects on the feature maps to be detected and outputs the result.
2. The hybrid hole convolution pyramid-based target detection network of claim 1, wherein: the low-layer embedded characteristic pyramid module is used for generating a final characteristic diagram to be detected, and specifically comprises the following steps:
a. the low-level embedded feature pyramid module fuses the current-level feature map with the channel-compressed and up-sampled high-level feature map to form a composite feature map, completing the embedding of high-level semantic information;
b. the composite feature map is fused with the down-sampled low-level feature map to form a mixed feature map, completing the embedding of low-level detail information;
c. each mixed feature map passes through the composite convolutional layer to generate a final feature map to be detected.
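Steps a and b above can be sketched on 1-D "feature maps" (an illustrative sketch only; it assumes 2× nearest-neighbour up-sampling, stride-2 down-sampling, and the element-wise addition of claim 3 — real layers operate on C × H × W tensors):

```python
import numpy as np

def upsample2(x):
    # nearest-neighbour up-sampling: each element repeated twice
    return np.repeat(x, 2)

def downsample2(x):
    # stride-2 down-sampling
    return x[::2]

def embed(high, current, low):
    # step a: composite map = current map + up-sampled high-level map
    composite = current + upsample2(high)
    # step b: mixed map = composite map + down-sampled low-level map
    return composite + downsample2(low)

high = np.array([1.0, 2.0])        # coarse, high-level semantics
current = np.array([1.0] * 4)      # current pyramid level
low = np.array([0.5] * 8)          # fine, low-level detail
print(embed(high, current, low))   # -> [2.5 2.5 3.5 3.5]
```

The mixed map keeps the current level's resolution while carrying both coarser semantic context and finer detail, which is the point of the two-way embedding.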
3. The mixed hole convolution pyramid-based target detection network of claim 2, wherein the fusion in steps a and b is element-wise, channel-wise addition.
4. The mixed hole convolution pyramid-based target detection network of claim 2, wherein the composite convolutional layer in step c is formed by connecting a 3 × 3 convolutional layer, a BN layer and a LeakyReLU activation layer in sequence.
5. The mixed hole convolution pyramid-based target detection network of claim 1, wherein the mixed receptive field module comprises four parallel branches: a 1 × 1 convolutional layer branch and three 3 × 3 convolutional layer branches with hole rates of 1, 2 and 4 respectively; the mixed receptive field module concatenates the feature maps obtained in parallel from the hole convolutional layers with different hole rates, then performs feature information fusion with a 1 × 1 convolutional layer and reduces the channel dimension to a specified number.
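The four branches cover receptive fields of different sizes because the effective kernel of a k × k convolution with hole (dilation) rate d is k + (k - 1)(d - 1). A quick check (illustrative only, not the patent's code):

```python
# Effective kernel size of a k x k convolution with hole rate d.
def effective_kernel(k, d):
    return k + (k - 1) * (d - 1)

# The module's four parallel branches as (kernel size, hole rate):
branches = [(1, 1), (3, 1), (3, 2), (3, 4)]
print([effective_kernel(k, d) for k, d in branches])  # -> [1, 3, 5, 9]
```

So the branches see 1 × 1, 3 × 3, 5 × 5 and 9 × 9 windows at the same parameter cost as plain 3 × 3 convolutions, which is what the concatenate-then-fuse design exploits.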
6. The mixed hole convolution pyramid-based target detection network of claim 1, wherein the backbone network is a single-stage detection network based on the Res2Net50 network; the Anchor-free mechanism of FCOS is introduced for target prediction, pixel-by-pixel prediction is performed, and a Center-ness branch network is added to the loss function part.
7. The mixed hole convolution pyramid-based target detection network of claim 6, wherein the feature maps output by the backbone network comprise C3, C4 and C5, with sizes of 100 × 100, 50 × 50 and 25 × 25 respectively.
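The quoted sizes are consistent with an 800 × 800 input and the usual C3/C4/C5 strides of 8, 16 and 32 (the input size is an assumption, not stated in the claim):

```python
# Assumed: 800 x 800 input; C3/C4/C5 at strides 8, 16, 32.
def pyramid_sizes(input_size, strides=(8, 16, 32)):
    return [input_size // s for s in strides]

print(pyramid_sizes(800))  # -> [100, 50, 25]
```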
8. A target detection method based on a mixed hole convolution pyramid, characterized in that the method comprises the following steps:
i. building a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, performing feature enhancement on the highest-level feature map C5 output by the backbone network through the mixed receptive field module, and outputting the result to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module forms composite features from the feature maps C4 and C3 output by the backbone network through up-sampling and down-sampling operations, generates the feature maps to be detected after the composite features pass through the composite convolutional layer, and delivers the feature maps to be detected to the detection module for target localization and classification;
iii. training the network, testing the model of each round, saving the best training model weights, testing the real-time performance of the mixed receptive field module and the low-level embedded feature pyramid module with the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
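The "test each round, keep the best weights" logic of the training step can be sketched as follows (a minimal sketch with a stand-in evaluate() callback instead of a real validation run; all names are illustrative):

```python
# Track the best-scoring round; a real run would also write the
# model weights to disk whenever the score improves.
def train_and_keep_best(num_rounds, evaluate):
    best_score, best_round = float("-inf"), None
    for rnd in range(num_rounds):
        score = evaluate(rnd)          # e.g. mAP on the test set
        if score > best_score:         # save weights only on improvement
            best_score, best_round = score, rnd
    return best_round, best_score

# Stand-in metric: improves, peaks at round 3, then degrades.
scores = [0.41, 0.55, 0.62, 0.71, 0.68]
print(train_and_keep_best(5, lambda r: scores[r]))  # -> (3, 0.71)
```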
9. The method of claim 8, characterized in that: in the process of training the network in step iii, the loss function is as follows:

L = (1/N) Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N) Σ_{x,y} 1{c*_{x,y} > 0} · L_reg(t_{x,y}, t*_{x,y})

where p_{x,y} represents the class prediction probability, t_{x,y} represents the regression prediction coordinates, c*_{x,y} and t*_{x,y} are the corresponding ground-truth label and coordinates, and N represents the number of positive samples; 1{·} is an indicator function, equal to 1 if the current prediction is determined to be a positive sample and 0 otherwise;
L_cls is the Focal Loss function, in the specific form:

L_cls = -y(1 - y')^γ log(y') - (1 - y)(y')^γ log(1 - y')

where y is the sample label, y' is the predicted probability that the sample is a positive example, and γ is the focusing parameter;
L_reg is the GIoU Loss function; the specific calculation process is as follows:

L_reg = 1 - GIoU,  GIoU = IoU - |C \ (A ∪ B)| / |C|

where A and B represent the predicted box and the real (ground-truth) box, IoU is their intersection-over-union ratio, and C is their minimum convex set, i.e., the smallest bounding box enclosing both A and B; GIoU is obtained by combining C with A ∪ B as above, and L_reg then follows.
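A pure-Python sketch combining the Focal classification term and the GIoU regression term of the claim-9 loss for a single binary class (illustrative only; `detection_loss` and its (p, y, giou) tuples are hypothetical names, and the giou values are assumed precomputed):

```python
import math

def focal_loss(p, y, gamma=2.0):
    # y = 1: -(1 - p)^gamma * log(p);  y = 0: -p^gamma * log(1 - p)
    if y == 1:
        return -((1 - p) ** gamma) * math.log(p)
    return -(p ** gamma) * math.log(1 - p)

def detection_loss(preds, gamma=2.0):
    # preds: list of (p, y, giou); the regression term only counts
    # locations the indicator function marks as positive (y == 1).
    n_pos = max(1, sum(1 for _, y, _ in preds if y == 1))
    cls = sum(focal_loss(p, y, gamma) for p, y, _ in preds)
    reg = sum(1 - g for _, y, g in preds if y == 1)
    return cls / n_pos + reg / n_pos
```

With γ = 0 the focal term reduces to ordinary cross-entropy; larger γ down-weights well-classified samples, which is the focusing behaviour the claim relies on.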
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110646653.7A CN113392960B (en) | 2021-06-10 | 2021-06-10 | Target detection network and method based on mixed hole convolution pyramid |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110646653.7A CN113392960B (en) | 2021-06-10 | 2021-06-10 | Target detection network and method based on mixed hole convolution pyramid |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113392960A true CN113392960A (en) | 2021-09-14 |
CN113392960B CN113392960B (en) | 2022-08-30 |
Family
ID=77620186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110646653.7A Active CN113392960B (en) | 2021-06-10 | 2021-06-10 | Target detection network and method based on mixed hole convolution pyramid |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113392960B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113887455A (en) * | 2021-10-11 | 2022-01-04 | 东北大学 | Face mask detection system and method based on improved FCOS |
CN113947774A (en) * | 2021-10-08 | 2022-01-18 | 东北大学 | Lightweight vehicle target detection system |
CN113963177A (en) * | 2021-11-11 | 2022-01-21 | 电子科技大学 | CNN-based building mask contour vectorization method |
CN113989498A (en) * | 2021-12-27 | 2022-01-28 | 北京文安智能技术股份有限公司 | Training method of target detection model for multi-class garbage scene recognition |
CN114170587A (en) * | 2021-12-13 | 2022-03-11 | 微民保险代理有限公司 | Vehicle indicator lamp identification method and device, computer equipment and storage medium |
CN114283488A (en) * | 2022-03-08 | 2022-04-05 | 北京万里红科技有限公司 | Method for generating detection model and method for detecting eye state by using detection model |
CN114339049A (en) * | 2021-12-31 | 2022-04-12 | 深圳市商汤科技有限公司 | Video processing method and device, computer equipment and storage medium |
CN114494108A (en) * | 2021-11-15 | 2022-05-13 | 北京知见生命科技有限公司 | Pathological section quality control method and system based on target detection |
CN114693939A (en) * | 2022-03-16 | 2022-07-01 | 中南大学 | Transparency detection depth feature extraction method under complex environment |
CN115861855A (en) * | 2022-12-15 | 2023-03-28 | 福建亿山能源管理有限公司 | Operation and maintenance monitoring method and system for photovoltaic power station |
CN115984105A (en) * | 2022-12-07 | 2023-04-18 | 深圳大学 | Method and device for optimizing hole convolution, computer equipment and storage medium |
CN117132761A (en) * | 2023-08-25 | 2023-11-28 | 京东方科技集团股份有限公司 | Target detection method and device, storage medium and electronic equipment |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985269A (en) * | 2018-08-16 | 2018-12-11 | 东南大学 | Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure |
CN109543672A (en) * | 2018-10-15 | 2019-03-29 | 天津大学 | Object detecting method based on dense characteristic pyramid network |
CN111260630A (en) * | 2020-01-16 | 2020-06-09 | 高新兴科技集团股份有限公司 | Improved lightweight small target detection method |
CN112070729A (en) * | 2020-08-26 | 2020-12-11 | 西安交通大学 | Anchor-free remote sensing image target detection method and system based on scene enhancement |
CN112183649A (en) * | 2020-09-30 | 2021-01-05 | 佛山市南海区广工大数控装备协同创新研究院 | Algorithm for predicting pyramid feature map |
CN112365501A (en) * | 2021-01-13 | 2021-02-12 | 南京理工大学 | Weldment contour detection algorithm based on convolutional neural network |
CN112419237A (en) * | 2020-11-03 | 2021-02-26 | 中国计量大学 | Automobile clutch master cylinder groove surface defect detection method based on deep learning |
CN112446327A (en) * | 2020-11-27 | 2021-03-05 | 中国地质大学(武汉) | Remote sensing image target detection method based on non-anchor frame |
CN112651351A (en) * | 2020-12-29 | 2021-04-13 | 珠海大横琴科技发展有限公司 | Data processing method and device |
CN112801117A (en) * | 2021-02-03 | 2021-05-14 | 四川中烟工业有限责任公司 | Multi-channel receptive field guided characteristic pyramid small target detection network and detection method |
CN112819748A (en) * | 2020-12-16 | 2021-05-18 | 机科发展科技股份有限公司 | Training method and device for strip steel surface defect recognition model |
- 2021-06-10: CN202110646653.7A filed; granted as CN113392960B (active)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985269A (en) * | 2018-08-16 | 2018-12-11 | 东南大学 | Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure |
CN109543672A (en) * | 2018-10-15 | 2019-03-29 | 天津大学 | Object detecting method based on dense characteristic pyramid network |
CN111260630A (en) * | 2020-01-16 | 2020-06-09 | 高新兴科技集团股份有限公司 | Improved lightweight small target detection method |
CN112070729A (en) * | 2020-08-26 | 2020-12-11 | 西安交通大学 | Anchor-free remote sensing image target detection method and system based on scene enhancement |
CN112183649A (en) * | 2020-09-30 | 2021-01-05 | 佛山市南海区广工大数控装备协同创新研究院 | Algorithm for predicting pyramid feature map |
CN112419237A (en) * | 2020-11-03 | 2021-02-26 | 中国计量大学 | Automobile clutch master cylinder groove surface defect detection method based on deep learning |
CN112446327A (en) * | 2020-11-27 | 2021-03-05 | 中国地质大学(武汉) | Remote sensing image target detection method based on non-anchor frame |
CN112819748A (en) * | 2020-12-16 | 2021-05-18 | 机科发展科技股份有限公司 | Training method and device for strip steel surface defect recognition model |
CN112651351A (en) * | 2020-12-29 | 2021-04-13 | 珠海大横琴科技发展有限公司 | Data processing method and device |
CN112365501A (en) * | 2021-01-13 | 2021-02-12 | 南京理工大学 | Weldment contour detection algorithm based on convolutional neural network |
CN112801117A (en) * | 2021-02-03 | 2021-05-14 | 四川中烟工业有限责任公司 | Multi-channel receptive field guided characteristic pyramid small target detection network and detection method |
Non-Patent Citations (6)
Title |
---|
GAO S et al.: "Res2Net: A new multi-scale backbone architecture", IEEE Transactions on Pattern Analysis and Machine Intelligence *
GUO C et al.: "AugFPN: Improving multi-scale feature learning for object detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition *
MA J et al.: "Dual refinement feature pyramid networks for object detection", arXiv:2012.01733 *
TIAN Zhi et al.: "FCOS: Fully convolutional one-stage object detection", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) *
HOU Shaoqi et al.: "Target detection algorithm based on dilated convolution pyramid", Journal of University of Electronic Science and Technology of China *
JIANG Shihao et al.: "Instance segmentation based on Mask R-CNN and multi-feature fusion", Computer Technology and Development *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113947774B (en) * | 2021-10-08 | 2024-05-14 | 东北大学 | Lightweight vehicle target detection system |
CN113947774A (en) * | 2021-10-08 | 2022-01-18 | 东北大学 | Lightweight vehicle target detection system |
CN113887455A (en) * | 2021-10-11 | 2022-01-04 | 东北大学 | Face mask detection system and method based on improved FCOS |
CN113887455B (en) * | 2021-10-11 | 2024-05-28 | 东北大学 | Face mask detection system and method based on improved FCOS |
CN113963177A (en) * | 2021-11-11 | 2022-01-21 | 电子科技大学 | CNN-based building mask contour vectorization method |
CN114494108A (en) * | 2021-11-15 | 2022-05-13 | 北京知见生命科技有限公司 | Pathological section quality control method and system based on target detection |
CN114170587A (en) * | 2021-12-13 | 2022-03-11 | 微民保险代理有限公司 | Vehicle indicator lamp identification method and device, computer equipment and storage medium |
CN113989498A (en) * | 2021-12-27 | 2022-01-28 | 北京文安智能技术股份有限公司 | Training method of target detection model for multi-class garbage scene recognition |
CN113989498B (en) * | 2021-12-27 | 2022-07-12 | 北京文安智能技术股份有限公司 | Training method of target detection model for multi-class garbage scene recognition |
CN114339049A (en) * | 2021-12-31 | 2022-04-12 | 深圳市商汤科技有限公司 | Video processing method and device, computer equipment and storage medium |
CN114283488A (en) * | 2022-03-08 | 2022-04-05 | 北京万里红科技有限公司 | Method for generating detection model and method for detecting eye state by using detection model |
CN114693939A (en) * | 2022-03-16 | 2022-07-01 | 中南大学 | Transparency detection depth feature extraction method under complex environment |
CN114693939B (en) * | 2022-03-16 | 2024-04-30 | 中南大学 | Method for extracting depth features of transparent object detection under complex environment |
CN115984105A (en) * | 2022-12-07 | 2023-04-18 | 深圳大学 | Method and device for optimizing hole convolution, computer equipment and storage medium |
CN115861855B (en) * | 2022-12-15 | 2023-10-24 | 福建亿山能源管理有限公司 | Operation and maintenance monitoring method and system for photovoltaic power station |
CN115861855A (en) * | 2022-12-15 | 2023-03-28 | 福建亿山能源管理有限公司 | Operation and maintenance monitoring method and system for photovoltaic power station |
CN117132761A (en) * | 2023-08-25 | 2023-11-28 | 京东方科技集团股份有限公司 | Target detection method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113392960B (en) | 2022-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113392960B (en) | Target detection network and method based on mixed hole convolution pyramid | |
CN111967305B (en) | Real-time multi-scale target detection method based on lightweight convolutional neural network | |
CN110084124B (en) | Feature enhancement target detection method based on feature pyramid network | |
CN111461083A (en) | Rapid vehicle detection method based on deep learning | |
CN113052834B (en) | Pipeline defect detection method based on convolution neural network multi-scale features | |
CN114049584A (en) | Model training and scene recognition method, device, equipment and medium | |
CN114627360A (en) | Substation equipment defect identification method based on cascade detection model | |
CN112906718A (en) | Multi-target detection method based on convolutional neural network | |
CN111429466A (en) | Space-based crowd counting and density estimation method based on multi-scale information fusion network | |
CN113034444A (en) | Pavement crack detection method based on MobileNet-PSPNet neural network model | |
CN113850324B (en) | Multispectral target detection method based on Yolov4 | |
CN114170526B (en) | Remote sensing image multi-scale target detection and identification method based on lightweight network | |
CN112183649A (en) | Algorithm for predicting pyramid feature map | |
CN116503318A (en) | Aerial insulator multi-defect detection method, system and equipment integrating CAT-BiFPN and attention mechanism | |
CN117079163A (en) | Aerial image small target detection method based on improved YOLOX-S | |
CN117765378B (en) | Method and device for detecting forbidden articles in complex environment with multi-scale feature fusion | |
CN112528904A (en) | Image segmentation method for sand particle size detection system | |
CN112700450A (en) | Image segmentation method and system based on ensemble learning | |
CN113901928A (en) | Target detection method based on dynamic super-resolution, and power transmission line component detection method and system | |
CN111027542A (en) | Target detection method improved based on fast RCNN algorithm | |
CN113870162A (en) | Low-light image enhancement method integrating illumination and reflection | |
CN117649526A (en) | High-precision semantic segmentation method for automatic driving road scene | |
CN112132207A (en) | Target detection neural network construction method based on multi-branch feature mapping | |
CN112418229A (en) | Unmanned ship marine scene image real-time segmentation method based on deep learning | |
CN117197530A (en) | Insulator defect identification method based on improved YOLOv8 model and cosine annealing learning rate decay method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||