CN116994034A - Small target detection algorithm based on feature pyramid - Google Patents


Info

Publication number
CN116994034A
CN116994034A (application number CN202310803480.4A)
Authority
CN
China
Prior art keywords
attention
nwd
module
convolution
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310803480.4A
Other languages
Chinese (zh)
Inventor
张丽娟
王敏慧
姜雨彤
周悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Technology
Priority to CN202310803480.4A
Publication of CN116994034A
Legal status: Pending

Links

Classifications

    • G06V 10/764 — image or video recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06N 3/0464 — convolutional networks [CNN, ConvNet]
    • G06N 3/048 — activation functions
    • G06N 3/08 — learning methods
    • G06V 10/44 — local feature extraction, e.g. edges, contours, corners; connectivity analysis
    • G06V 10/52 — scale-space analysis, e.g. wavelet analysis
    • G06V 10/766 — recognition using regression, e.g. by projecting features on hyperplanes
    • G06V 10/806 — fusion of extracted features at the feature extraction level
    • G06V 10/82 — recognition using neural networks
    • G06V 2201/07 — target detection

Abstract

The invention provides a small target detection algorithm based on a feature pyramid, comprising the following steps. First, a bottom-up path is added after the FPN structure, improving performance through path enhancement and aggregation. Second, attention is added to the fusion module: an attention weight map is used to screen the important information in the original feature map, reducing the interference of redundant information with small target prediction. Third, because IoU is very sensitive to the positional deviations of small targets, which greatly degrades the performance of anchor-based detectors, an NWD metric is introduced. The results show that, under the same training parameters, the improved model achieves a higher mAP value, and the mAP_s for small targets improves markedly.

Description

Small target detection algorithm based on feature pyramid
Technical Field
The invention belongs to the field of target detection and designs a novel small target detection algorithm based on a feature pyramid, aimed at small target objects. The invention can effectively improve the detection accuracy of small targets, accurately produce target classifications, and lay the groundwork for subsequent image-processing work.
Background
Object detection is the important task of classifying and locating objects of interest in images or videos, and it is the basis for solving complex visual tasks such as object segmentation, scene understanding, object tracking, and image description. Small targets are ubiquitous in real-world applications, including smart medicine, defect detection, driving assistance, large-scale surveillance, and rescue at sea. Small target detection has now evolved into a popular sub-field of target detection and an important basis for verifying the reliability of target detection algorithms.
While target detection has made great progress in performance and speed with the rapid development of deep learning, most algorithms are directed at objects of normal size. Small target objects often exhibit very limited visual feature information, which increases the difficulty of detecting them and has made progress in small target detection very slow. The feature pyramid network (Feature Pyramid Network, FPN) is a highly representative structure in current small target detection algorithms; it enriches features through hierarchical detection to improve detection performance. Although FPN-based methods have achieved many satisfactory results, problems remain, such as inconsistent gradient computation across layers and insufficient exploitation of shallow features, all of which reduce the effectiveness of the FPN structure. Therefore, to solve the problems caused by the FPN structure, the invention makes a thorough improvement to it, which better raises small target detection performance in different environments.
Most existing small target detection methods can be roughly divided into four categories: data enhancement, multi-scale learning, custom training strategies for small targets, and feature enhancement strategies. One simple and efficient approach in data enhancement is to collect more small target data; another is to use simple augmentations, including rotation, image flipping, and upsampling. The multi-resolution image pyramid is a basic approach to multi-scale learning. To reduce its computational cost, some studies proposed constructing the FPN, and many later approaches have tried to further improve it, such as PANet, BiFPN, and Recursive-FPN. Multi-scale learning strategies typically improve tiny object detection (TOD) performance at the price of additional computation. An object detector is generally unable to obtain satisfactory performance on both small and large objects simultaneously; inspired by this fact, SNIP and SNIPER are designed to train selectively on objects within a certain scale range. Finally, among feature enhancement strategies, some studies have proposed using a GAN to enhance the feature representation of small objects, with PGAN being the first attempt to apply a GAN to small target detection.
Most methods dedicated to small target detection incur additional annotation or computational costs. In contrast, the method provided by the invention does not increase extra cost in the reasoning stage, and can better improve the detection efficiency and detection precision of the small target.
Disclosure of Invention
The invention aims to improve the FPN structure of an existing algorithm so as to raise the detection precision and efficiency for small targets while reducing the occupation of computing resources. A small target detection algorithm based on the feature pyramid is provided that achieves higher small target detection accuracy.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the small target detection algorithm based on the feature pyramid is realized by the following steps:
step one, in order to alleviate the problem that shallow features are underutilized in the FPN of DetectoRS, a bottom-up feature extraction structure is added after the FPN structure to reinforce the accurate positional information carried by the shallow layers. The added bottom-up path lets the positional information of shallow features propagate more easily, improving small target detection performance;
step two, in order to better extract small target features, an attention module is added to the fusion module of DetectoRS. The module comprises two parts: channel attention and spatial attention. The channel attention module adopts a local cross-channel interaction strategy without dimensionality reduction, implemented by a one-dimensional convolution. The spatial attention module adopts a Contextual Transformer (CoT) block, which fully exploits the contextual information among input keys to guide the learning of a dynamic attention matrix, thereby strengthening the visual representation capacity;
step three, the IoU (Intersection over Union) based metric is very sensitive to the positional deviations of small targets, and its use in anchor-based detectors can significantly degrade detection performance. To solve this problem, a new metric, the Normalized Wasserstein Distance (NWD), is introduced. The NWD metric can easily be embedded into the label assignment, non-maximum suppression (NMS), and loss function of any anchor-based detector to replace the usual IoU metric, effectively improving small target detection performance.
In step one, a bottom-up path is added after the FPN structure, enhancing performance through path enhancement and aggregation. The enhancement path starts from the shallowest layer P2 of the FPN, which is mapped directly to the shallowest feature N2 of the enhancement path. Each subsequent layer is then built during upward propagation through a lateral connection: a higher-resolution feature map Ni and a coarser map Pi+1 are taken to generate a new feature map Ni+1. Each feature map Ni first passes through a 3×3 convolution layer with stride 2 to reduce the spatial size. Each element of the feature map Pi+1 is then added to the downsampled map through the lateral connection. The fused feature map is processed by another 3×3 convolution layer to generate Ni+1 for the subsequent subnetwork. This iterative process terminates upon reaching P5.
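The iteration above can be sketched in a few lines of numpy. Only the path structure Ni+1 = conv(downsample(Ni) + Pi+1) is illustrated: the learned 3×3 convolutions are replaced by stand-ins (2×2 average pooling for the stride-2 convolution, identity for the smoothing convolution), so this is a shape-bookkeeping sketch, not the trained network:

```python
import numpy as np

def downsample(x):
    """Stand-in for the stride-2 3x3 convolution: 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def smooth(x):
    """Stand-in for the stride-1 3x3 convolution applied after fusion."""
    return x

def bottom_up_path(P):
    """P: [P2, P3, P4, P5], each (C, H, W) with H and W halving per level.
    Builds N2..N5 with N2 = P2 and N_{i+1} = smooth(downsample(N_i) + P_{i+1})."""
    N = [P[0]]                                         # N2 is mapped directly from P2
    for P_next in P[1:]:                               # iterate upward until P5
        N.append(smooth(downsample(N[-1]) + P_next))   # lateral connection + fusion
    return N

pyramid = [np.random.rand(256, 64 >> i, 64 >> i) for i in range(4)]
N = bottom_up_path(pyramid)   # output shapes mirror the input pyramid
```

The sketch makes the key property visible: each Ni keeps the resolution of the corresponding Pi, so the shallow, high-resolution positional information travels upward through at most a few fused layers.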
The attention module in step two is divided into two parts: channel attention and spatial attention. The convolution kernel size k of the channel attention is determined adaptively, with k proportional to the channel dimension. The convolutional features are first aggregated using global average pooling (GAP); a one-dimensional convolution with the adaptively determined kernel size k is then performed. Finally, the channel attention is obtained through a sigmoid function. The spatial attention adopts a CoT block, which integrates contextual information and self-attention into a unified architecture. The CoT block first applies a 3×3 convolution to obtain spatial key features as the static context information K1; the query is directly the input feature x, and the value is the result of a 1×1 convolution. K1 and the query are then concatenated, and the attention matrix A is obtained through two consecutive 1×1 convolutions. Finally, A is multiplied with the value to obtain the feature map K2, and K1 and K2 are fused to obtain the final feature attention output.
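The channel-attention branch (GAP, adaptive 1D convolution without dimensionality reduction, sigmoid) can be sketched as follows in numpy. The adaptive-kernel rule — the nearest odd value of |log2(C)/γ + b| with γ = 2 and b = 1 — is an assumption borrowed from the ECA design that the text appears to describe, and a uniform kernel stands in for the learned 1D convolution weights:

```python
import numpy as np

def adaptive_kernel_size(channels, gamma=2, b=1):
    """k grows with the channel dimension: nearest odd value of |log2(C)/gamma + b|."""
    t = int(abs(np.log2(channels) / gamma + b))
    return t if t % 2 else t + 1

def channel_attention(x, gamma=2, b=1):
    """x: (C, H, W) feature map. GAP over space, a k-tap 1D convolution across
    channels (local cross-channel interaction, no dimensionality reduction),
    a sigmoid, then channel-wise rescaling of the input."""
    c = x.shape[0]
    k = adaptive_kernel_size(c, gamma, b)
    gap = x.mean(axis=(1, 2))                      # (C,) channel descriptor
    kernel = np.ones(k) / k                        # stand-in for learned 1D weights
    mixed = np.convolve(gap, kernel, mode="same")  # 1D conv across channels
    weights = 1.0 / (1.0 + np.exp(-mixed))         # sigmoid gate per channel
    return x * weights[:, None, None]

x = np.random.rand(256, 8, 8)
y = channel_attention(x)   # same shape as x, channels rescaled by attention
```

For a 256-channel map this rule gives k = 5, so each channel weight is informed only by its few neighbors, which is what keeps the module cheap in computing resources.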
In step three, the NWD metric models each bounding box as a two-dimensional Gaussian distribution and computes the distance between the two distributions using the Wasserstein distance from optimal transport theory. The distribution distance is then exponentially normalized into the similarity metric NWD. The calculation formulas are as follows:

W_2^2(N_a, N_b) = ||[cx_a, cy_a, w_a/2, h_a/2]^T - [cx_b, cy_b, w_b/2, h_b/2]^T||_2^2 (1)

NWD(N_a, N_b) = exp(-sqrt(W_2^2(N_a, N_b)) / C) (2)

where N_a and N_b are the Gaussian models of boxes A = (cx_a, cy_a, w_a, h_a) and B = (cx_b, cy_b, w_b, h_b), and C is a constant closely related to the dataset.
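The metric can be checked with a small numpy sketch; the constant C is a dataset-dependent hyperparameter, and the value used below is purely illustrative:

```python
import numpy as np

def wasserstein2(box_a, box_b):
    """Squared 2-Wasserstein distance between the Gaussians fitted to two
    (cx, cy, w, h) boxes: the squared L2 distance between the vectors
    [cx, cy, w/2, h/2] of the two boxes."""
    va = np.array([box_a[0], box_a[1], box_a[2] / 2, box_a[3] / 2])
    vb = np.array([box_b[0], box_b[1], box_b[2] / 2, box_b[3] / 2])
    return float(np.sum((va - vb) ** 2))

def nwd(box_a, box_b, C=12.8):
    """Exponential normalization into a similarity in (0, 1]; C illustrative."""
    return float(np.exp(-np.sqrt(wasserstein2(box_a, box_b)) / C))

same = nwd((10, 10, 4, 4), (10, 10, 4, 4))   # identical boxes
near = nwd((10, 10, 4, 4), (12, 10, 4, 4))   # shifted by half a box width
far = nwd((10, 10, 4, 4), (50, 50, 4, 4))    # no overlap at all
```

Note how smoothly the similarity decays: a 2-pixel shift of a 4×4 box still scores high under NWD, whereas the IoU of those two boxes has already dropped to 1/3, which is exactly the sensitivity to small positional deviations that motivates the replacement.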
The modification with the NWD metric mainly covers three parts: positive and negative label assignment, NMS, and the regression loss function. Label assignment generates anchors of different scales and aspect ratios and then assigns them binary labels for training the classification and regression heads. Specifically, a positive label is assigned to two kinds of anchors: (1) the anchor having the highest NWD value with a ground-truth box, provided that value exceeds θn, and (2) any anchor whose NWD value with some ground-truth box exceeds the positive threshold θp. Accordingly, if an anchor's NWD value is below the negative threshold θn for all ground-truth boxes, the anchor is assigned a negative label. Anchors assigned neither a positive nor a negative label do not participate in training. In NMS, IoU is replaced with the NWD metric: all prediction boxes are sorted by score, the highest-scoring prediction box M is selected, and all other prediction boxes having significant overlap with M (using a predefined threshold Nt) are suppressed; this process is applied recursively to the remaining boxes. The regression loss function is defined as:
L_NWD = 1 - NWD(N_P, N_G) (3)

where N_P is the Gaussian distribution model of the prediction box P, and N_G is the Gaussian distribution model of the ground-truth box G.
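The NWD-based NMS and the loss of formula (3) can be sketched together as follows; the threshold Nt and the constant C below are illustrative values, not values fixed by the invention:

```python
import numpy as np

def nwd(box_a, box_b, C=12.8):
    """NWD between (cx, cy, w, h) boxes, per formulas (1)-(2); C illustrative."""
    va = np.array([box_a[0], box_a[1], box_a[2] / 2, box_a[3] / 2])
    vb = np.array([box_b[0], box_b[1], box_b[2] / 2, box_b[3] / 2])
    return float(np.exp(-np.linalg.norm(va - vb) / C))

def nwd_nms(boxes, scores, nt=0.5, C=12.8):
    """Greedy NMS with NWD replacing IoU: keep the highest-scoring box M,
    suppress the remaining boxes whose NWD with M exceeds the threshold Nt,
    then recurse on the rest."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order if nwd(boxes[m], boxes[i], C) <= nt]
    return keep

def nwd_loss(pred, gt, C=12.8):
    """Regression loss of formula (3): L_NWD = 1 - NWD(N_P, N_G)."""
    return 1.0 - nwd(pred, gt, C)

boxes = [(10, 10, 4, 4), (10.5, 10, 4, 4), (50, 50, 4, 4)]
scores = [0.9, 0.8, 0.7]
keep = nwd_nms(boxes, scores)        # the near-duplicate box is suppressed
loss = nwd_loss(boxes[0], boxes[0])  # a perfect prediction gives zero loss
```

The same `nwd` call would serve label assignment as well: compare each anchor against all ground-truth boxes and apply the θp/θn thresholds described above.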
The main advantages of the method provided by the invention are: (1) the added bottom-up path allows the positional information of shallow features to propagate more easily, improving small target detection accuracy; (2) the attention module reduces the interference of redundant information with small target predictions while occupying few computing resources; (3) introducing NWD in place of IoU as a better measure of the similarity between two bounding boxes significantly improves the detector's small target performance.
Drawings
FIG. 1 is a flow chart of the feature pyramid-based small target detection algorithm of the present invention;
FIGS. 2 and 3 are structure diagrams of the channel attention module and the spatial attention module of the feature pyramid-based small target detection algorithm of the present invention;
FIG. 4 is a structure diagram of the fusion module of the feature pyramid-based small target detection algorithm of the present invention;
FIG. 5 is a network structure diagram of the feature pyramid-based small target detection algorithm of the present invention;
FIGS. 6 and 7 are examples of image detection results on the AI-TOD dataset, comparing the feature pyramid-based small target detection algorithm of the present invention with the original model;
FIGS. 8 and 9 are comparisons of the AP indices obtained on the AI-TOD dataset by the feature pyramid-based small target detection algorithm of the present invention and the original model.
Detailed Description
The present invention is described in detail below with reference to the drawings so that those skilled in the art can better understand the present invention. It should be noted that modifications can be made to the present invention by those skilled in the art without departing from the core concept of the present invention, which falls within the scope of the present invention.
As shown in fig. 1, the general flow of the small target detection algorithm based on the feature pyramid of the present invention includes the following steps:
in the first step, a bottom-up path is added after the FPN structure, constructed as follows: starting from the shallowest layer P2 of the FPN, P2 is mapped directly to the shallowest feature N2 of the enhancement path. Each subsequent layer is then built during upward propagation through a lateral connection: a higher-resolution feature map Ni and a coarser map Pi+1 are taken to generate a new feature map Ni+1. Each feature map Ni first passes through a 3×3 convolution layer with stride 2 to reduce the spatial size. Each element of the feature map Pi+1 is then added to the downsampled map through the lateral connection. The fused feature map is processed by another 3×3 convolution layer to generate Ni+1 for the subsequent subnetwork. This iterative process terminates upon reaching P5;
in the second step, an attention module is added to the fusion module; it is divided into two parts: channel attention and spatial attention. The convolution kernel size k of the channel attention is determined adaptively, with k proportional to the channel dimension. The convolutional features are first aggregated using global average pooling (GAP); a one-dimensional convolution with the adaptively determined kernel size k is then performed. Finally, the channel attention is obtained through a sigmoid function. The spatial attention adopts a CoT block, which integrates contextual information and self-attention into a unified architecture. The CoT block first applies a 3×3 convolution to obtain spatial key features as the static context information K1; the query is directly the input feature x, and the value is the result of a 1×1 convolution. K1 and the query are then concatenated, and the attention matrix A is obtained through two consecutive 1×1 convolutions. Finally, A is multiplied with the value to obtain the feature map K2, and K1 and K2 are fused to obtain the final feature attention output;
the IoU metric is modified in step three to be an NWD metric. The NWD index uses the wasperstein distance in the most common transmission theory to calculate the distribution distance. And then carrying out index normalization on the distribution distance to convert the distribution distance into a similarity measurement index NWD. The calculation formula is as follows:
(1)
(2)
The modification with the NWD metric mainly covers three parts: positive and negative label assignment, NMS, and the regression loss function. Label assignment generates anchors of different scales and aspect ratios and then assigns them binary labels for training the classification and regression heads. Specifically, a positive label is assigned to two kinds of anchors: (1) the anchor having the highest NWD value with a ground-truth box, provided that value exceeds θn, and (2) any anchor whose NWD value with some ground-truth box exceeds the positive threshold θp. Accordingly, if an anchor's NWD value is below the negative threshold θn for all ground-truth boxes, the anchor is assigned a negative label. Anchors assigned neither a positive nor a negative label do not participate in training. In NMS, IoU is replaced with the NWD metric: all prediction boxes are sorted by score, the highest-scoring prediction box M is selected, and all other prediction boxes having significant overlap with M (using a predefined threshold Nt) are suppressed; this process is applied recursively to the remaining boxes. The regression loss function is defined as:
L_NWD = 1 - NWD(N_P, N_G) (3)

where N_P is the Gaussian distribution model of the prediction box P, and N_G is the Gaussian distribution model of the ground-truth box G.
Figs. 2, 3 and 4 are structure diagrams of the attention module of the present invention. The spatial attention module and the channel attention module together form the complete attention module, which is added into the fusion module.
FIG. 5 is a network structure diagram of the feature pyramid-based small target detection algorithm of the present invention. As shown, on the basis of DetectoRS, the bottom-up path structure is introduced after the FPN structure and the attention module is added to the original fusion module.
Figs. 6 through 9 show comparative examples of the original model and the model of the present invention. From figs. 6 and 7 it is evident that the model of the invention detects small targets more accurately, finding more small targets with higher detection accuracy than the original model. Figs. 8 and 9 show that, by the AP indices, both the mAP value and the mAP_s value for small targets are significantly improved.

Claims (4)

1. A small target detection algorithm based on a feature pyramid, wherein the method is realized through the following steps:
step one, in order to alleviate the problem that shallow features are underutilized in the FPN of DetectoRS, a bottom-up feature extraction structure is added after the FPN structure to reinforce the accurate positional information carried by the shallow layers. The added bottom-up path lets the positional information of shallow features propagate more easily, improving small target detection performance;
step two, in order to better extract small target features, an attention module is added to the fusion module of DetectoRS. The module comprises two parts: channel attention and spatial attention. The channel attention module adopts a local cross-channel interaction strategy without dimensionality reduction, implemented by a one-dimensional convolution. The spatial attention module adopts a Contextual Transformer (CoT) block, which fully exploits the contextual information among input keys to guide the learning of a dynamic attention matrix, thereby strengthening the visual representation capacity;
step three, the IoU (Intersection over Union) based metric is very sensitive to the positional deviations of small targets, and its use in anchor-based detectors can significantly degrade detection performance. To solve this problem, a new metric, the Normalized Wasserstein Distance (NWD), is introduced. The NWD metric can easily be embedded into the label assignment, non-maximum suppression (NMS), and loss function of any anchor-based detector to replace the usual IoU metric, effectively improving small target detection performance.
2. The feature pyramid-based small object detection algorithm according to claim 1, wherein in the first step, a bottom-up path is added after the FPN structure, and the construction process is as follows:
starting from the shallowest layer P2 of the FPN, P2 is mapped directly to the shallowest feature N2 of the enhancement path. Each subsequent layer is then built during upward propagation through a lateral connection: a higher-resolution feature map Ni and a coarser map Pi+1 are taken to generate a new feature map Ni+1. Each feature map Ni first passes through a 3×3 convolution layer with stride 2 to reduce the spatial size. Each element of the feature map Pi+1 is then added to the downsampled map through the lateral connection. The fused feature map is processed by another 3×3 convolution layer to generate Ni+1 for the subsequent subnetwork. This iterative process terminates upon reaching P5.
3. The feature pyramid-based small target detection algorithm according to claim 2, wherein the second step adds an attention module to the fusion module, and the attention module is divided into two parts: channel attention and spatial attention. The specific operation is described as follows:
the convolution kernel size k of the channel attention is determined adaptively, with k proportional to the channel dimension. The convolutional features are first aggregated using global average pooling (GAP); a one-dimensional convolution with the adaptively determined kernel size k is then performed. Finally, the channel attention is obtained through a sigmoid function. The spatial attention adopts a CoT block, which integrates contextual information and self-attention into a unified architecture. The CoT block first applies a 3×3 convolution to obtain spatial key features as the static context information K1; the query is directly the input feature x, and the value is the result of a 1×1 convolution. K1 and the query are then concatenated, and the attention matrix A is obtained through two consecutive 1×1 convolutions. Finally, A is multiplied with the value to obtain the feature map K2, and K1 and K2 are fused to obtain the final feature attention output.
4. The feature pyramid-based small object detection algorithm according to claim 3, wherein the IoU metric is modified in step three into the NWD metric. The NWD metric models each bounding box as a two-dimensional Gaussian distribution and computes the distance between the two distributions using the Wasserstein distance from optimal transport theory. The distribution distance is then exponentially normalized into the similarity metric NWD. The calculation formulas are as follows:

W_2^2(N_a, N_b) = ||[cx_a, cy_a, w_a/2, h_a/2]^T - [cx_b, cy_b, w_b/2, h_b/2]^T||_2^2 (1)

NWD(N_a, N_b) = exp(-sqrt(W_2^2(N_a, N_b)) / C) (2)

where N_a and N_b are the Gaussian models of boxes A = (cx_a, cy_a, w_a, h_a) and B = (cx_b, cy_b, w_b, h_b), and C is a constant closely related to the dataset.
The modification with the NWD metric mainly covers three parts: positive and negative label assignment, NMS, and the regression loss function. Label assignment generates anchors of different scales and aspect ratios and then assigns them binary labels for training the classification and regression heads. Specifically, a positive label is assigned to two kinds of anchors: (1) the anchor having the highest NWD value with a ground-truth box, provided that value exceeds θn, and (2) any anchor whose NWD value with some ground-truth box exceeds the positive threshold θp. Accordingly, if an anchor's NWD value is below the negative threshold θn for all ground-truth boxes, the anchor is assigned a negative label. Anchors assigned neither a positive nor a negative label do not participate in training. In NMS, IoU is replaced with the NWD metric: all prediction boxes are sorted by score, the highest-scoring prediction box M is selected, and all other prediction boxes having significant overlap with M (using a predefined threshold Nt) are suppressed; this process is applied recursively to the remaining boxes. The regression loss function is defined as:
L_NWD = 1 - NWD(N_P, N_G) (3)

where N_P is the Gaussian distribution model of the prediction box P, and N_G is the Gaussian distribution model of the ground-truth box G.
CN202310803480.4A 2023-07-03 2023-07-03 Small target detection algorithm based on feature pyramid Pending CN116994034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310803480.4A CN116994034A (en) 2023-07-03 2023-07-03 Small target detection algorithm based on feature pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310803480.4A CN116994034A (en) 2023-07-03 2023-07-03 Small target detection algorithm based on feature pyramid

Publications (1)

Publication Number Publication Date
CN116994034A true CN116994034A (en) 2023-11-03

Family

ID=88533021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310803480.4A Pending CN116994034A (en) 2023-07-03 2023-07-03 Small target detection algorithm based on feature pyramid

Country Status (1)

Country Link
CN (1) CN116994034A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237830A (en) * 2023-11-10 2023-12-15 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN117237830B (en) * 2023-11-10 2024-02-20 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention

Similar Documents

Publication Publication Date Title
Jin et al. Multi-feature fusion and enhancement single shot detector for traffic sign recognition
Yuan et al. Anomaly detection in traffic scenes via spatial-aware motion reconstruction
KR100483832B1 (en) Method of describing image texture descriptor
Ju et al. A simple and efficient network for small target detection
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
Derpanis et al. Classification of traffic video based on a spatiotemporal orientation analysis
Wang et al. Multifocus image fusion using convolutional neural networks in the discrete wavelet transform domain
Min et al. New approach to vehicle license plate location based on new model YOLO‐L and plate pre‐identification
TW202207077A (en) Text area positioning method and device
Wang et al. An advanced YOLOv3 method for small-scale road object detection
Gao et al. CAMRL: A joint method of channel attention and multidimensional regression loss for 3D object detection in automated vehicles
CN111626120B (en) Target detection method based on improved YOLO-6D algorithm in industrial environment
CN111723693A (en) Crowd counting method based on small sample learning
Liu et al. Study of human action recognition based on improved spatio-temporal features
CN116994034A (en) Small target detection algorithm based on feature pyramid
Li et al. Lcnn: Low-level feature embedded cnn for salient object detection
Wu et al. FSANet: Feature-and-spatial-aligned network for tiny object detection in remote sensing images
Yu et al. Traffic sign detection based on visual co-saliency in complex scenes
Gu et al. Embedded and real-time vehicle detection system for challenging on-road scenes
Xiao et al. Pedestrian object detection with fusion of visual attention mechanism and semantic computation
CN112446431A (en) Feature point extraction and matching method, network, device and computer storage medium
CN103324753A (en) Image retrieval method based on symbiotic sparse histogram
CN115527133A (en) High-resolution image background optimization method based on target density information
Li et al. Fast object detection from unmanned surface vehicles via objectness and saliency
Tighkhorshid et al. Car depth estimation within a monocular image using a light CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination