CN115797684A

CN115797684A - Infrared small target detection method and system based on context information

Info

Publication number: CN115797684A
Application number: CN202211461433.8A
Authority: CN
Inventors: 付莹; 李峻宇; 郑德智; 宋韬; 林德福
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-11-21
Filing date: 2022-11-21
Publication date: 2023-03-14

Abstract

The invention relates to an infrared small target detection method and system based on context information, belonging to the technical field of computer vision. First, an infrared small target dataset is obtained, and a small target detection network is trained on it using a classification loss function, a confidence loss function and a position loss function. The trained small target detection network is then used to extract features from the infrared image. Finally, the extracted features are further fused, and infrared small target detection is performed on the fused features to obtain the final target detection result. The invention also provides an infrared small target detection system based on context information. The invention does not rely on additional infrared image denoising, enhancement or other processing modules, and training is carried out end to end, so it is simple to implement, performs well and is robust. The additional computational cost is extremely low, which favors low-latency, high-speed infrared small target detection, effectively improves the detection rate and reduces the missed detection rate.

Description

A method and system for infrared small target detection based on context information

Technical field

The invention relates to a method and device for detecting small targets in infrared images, in particular to an infrared small target detection method and device based on context information, and belongs to the technical field of computer vision processing.

Background art

Compared with visible-light images, infrared images are not affected by extreme weather or environmental conditions and can be captured without external illumination; infrared imaging offers strong detection capability and a long operating range. From infrared surveillance systems to infrared guidance systems, infrared images have important research and application value in both civilian and military fields. However, compared with visible-light images, infrared images suffer from poor resolution, blurred imaging and a low signal-to-noise ratio, so small objects in them are easily submerged by noise. Effectively detecting small targets in infrared images is therefore a challenging task that has received extensive attention from the signal processing and computer vision communities.

Small target detection is a technique for detecting small targets in images. It can determine the category and location of small targets under natural lighting conditions. At present, the main small target detection methods are based on deep learning and deep convolutional neural networks, and the technique is widely used in surveillance and security, autonomous driving, remote sensing satellites and other fields. According to the definition of the COCO dataset, targets smaller than 32×32 pixels are in principle called small targets. Small targets occupy few pixels, and their detection performance is far below that of large targets. When the detection scene is complex, for example when targets occlude one another, are occluded by the background, or are densely packed, small targets are affected more severely than large targets, which further increases the difficulty of small target detection.

Context information: objects usually appear together with a corresponding environment. In addition to the features of the object itself, there is a close relationship between the object and its surroundings; this information is the so-called feature context information. Because infrared images are severely affected by noise and clutter, small targets in them are highly susceptible to interference. Detecting objects by combining the features of the small target with other target-related information in the image can therefore effectively improve detection results and reduce the probability of missing small targets.

Summary of the invention

The purpose of the present invention is to address the defects and deficiencies of existing infrared small target detection techniques. Existing methods do not make full use of the local context information around the target or the global context information of the whole image, adapt poorly to the feature variations of different categories of small targets, and fuse shallow and deep features improperly. To solve these technical problems, the invention proposes an infrared small target detection method and system based on context information. The invention effectively improves infrared small target detection performance and works well in practical applications.

To achieve the above purpose, the present invention adopts the following technical solution.

An infrared small target detection method based on context information comprises the following steps:

Step 1: Obtain and process an infrared small target dataset.

Step 2: Train the small target detection network using a classification loss function, a confidence loss function and a position loss function.

Step 3: Use the trained small target detection network to extract features from the infrared image and obtain the feature results. In this method, the infrared image does not need to be preprocessed.

Step 4: Further fuse the extracted features and perform infrared small target detection on the fused features to obtain the final target detection result.

To achieve the purpose of the present invention, the invention further proposes an infrared small target detection system based on context information, comprising an image processing module, a target information learning module, a feature extraction module, and a feature fusion and target detection module.

Beneficial effects

Compared with the prior art, the method and system of the present invention have the following advantages:

1. The invention does not rely on additional infrared image denoising, enhancement or other processing modules, and training is carried out end to end, so it is simple to implement, performs well and is robust.

2. The additional computational overhead of the invention is extremely low, which favors low-latency, high-speed infrared small target detection. The invention effectively improves the detection rate of small targets and reduces the missed detection rate, and the accuracy for targets of other scales also improves.

Brief description of the drawings

Figure 1 is a flow chart of the method of the present invention.

Figure 2 is a schematic diagram of the feature extraction method used in the method of the present invention.

Figure 3 is a schematic diagram of the framework of the method of the present invention and the internal details of feature fusion.

Figure 4 is a flow chart of the system of the present invention.

Detailed description

To better illustrate the purpose and advantages of the present invention, the invention is described in detail below with reference to the accompanying drawings.

As shown in Figure 1, an infrared small target detection method based on context information comprises the following steps:

Step 1: Obtain an infrared small target dataset and perform data augmentation.

A high-quality dataset is indispensable for deep-learning-based infrared small target detection methods to achieve good performance. However, existing infrared small target datasets are few in number and small in scale, and contain a low proportion of small targets. The invention therefore first expands the infrared small target dataset through data augmentation, which improves the robustness of the whole detection method.

Specifically, in an image containing infrared small targets, the small targets that do not overlap with other targets are identified and randomly copied and pasted to other positions in the image. A pasted copy must not occlude other targets and must keep a distance from them.

Further, on top of copying and pasting small targets, other data augmentation operations (for example rotation and translation, scaling and cropping, or mosaic augmentation) can be applied; mosaic augmentation is preferred.
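The copy-paste augmentation described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names, the `(x, y, w, h)` box format, the `margin` parameter that stands in for "keeps a distance", and the retry limit are all assumptions, and plain Python lists are used instead of an image library to keep the sketch dependency-free.

```python
import random

def boxes_too_close(a, b, margin):
    """Return True if boxes a and b (x, y, w, h) overlap or lie within `margin` pixels."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax + aw + margin <= bx or bx + bw + margin <= ax or
                ay + ah + margin <= by or by + bh + margin <= ay)

def copy_paste_small_targets(image, boxes, margin=2, max_tries=50, rng=None):
    """Copy each isolated target to a random free location.

    image: 2-D list of pixel values (rows of equal length), modified in place.
    boxes: list of (x, y, w, h) target boxes; the returned list adds the pasted copies.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    result = list(boxes)
    for i, box in enumerate(boxes):
        # Only targets that do not overlap any other original target are copied.
        if any(boxes_too_close(box, other, 0)
               for j, other in enumerate(boxes) if j != i):
            continue
        x, y, w, h = box
        patch = [row[x:x + w] for row in image[y:y + h]]
        for _ in range(max_tries):
            candidate = (rng.randint(0, width - w), rng.randint(0, height - h), w, h)
            # The pasted copy must keep a distance from every existing target.
            if all(not boxes_too_close(candidate, other, margin) for other in result):
                nx, ny = candidate[0], candidate[1]
                for dy in range(h):
                    image[ny + dy][nx:nx + w] = patch[dy]
                result.append(candidate)
                break
    return result
```

Other augmentations such as mosaic would be applied after this step; they are not shown here.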

Specifically, in step 2, the total loss function $L(x, x')$ is expressed as:
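The formula itself is not reproduced in the text. One form consistent with the symbol definitions that follow, modeled on the loss commonly used in anchor-based single-stage detectors, is sketched here as an assumption rather than as the exact expression of the invention. In particular, applying the indicator $I_{kij}$ only to the position and classification terms is a guess:

$$L(x, x') = \sum_{k} \alpha_k \sum_{i=1}^{s^2} \sum_{j=1}^{B} \left[ I_{kij}\,\alpha_{box}\,L_{CIoU} + \alpha_{obj}\,L_{obj} + I_{kij}\,\alpha_{cls}\,L_{cls} \right]$$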

where $x$ and $x'$ denote the predicted value and the ground-truth value respectively; $\alpha_{box}$, $\alpha_{obj}$ and $\alpha_{cls}$ denote the weights of the three loss functions; $L_{CIoU}$, $L_{obj}$ and $L_{cls}$ denote the position loss function, confidence loss function and classification loss function of the target detection task; $k$, $s^2$ and $B$ denote the number of output feature maps, of grid cells, and of anchors (that is, positions) on each grid cell; $I_{kij}$ indicates whether the j-th anchor box of the i-th grid cell of the k-th output feature map is a positive sample, taking the value 1 for a positive sample and 0 for a negative sample; and $\alpha_k$ balances the weights of the output features at different scales.

Step 3: Use the trained small target detection network to extract features from the infrared image and obtain the feature results. The infrared image does not need to be preprocessed.

Conventional infrared small target detection methods mainly extract the features of the object itself and lack the ability to capture and analyze context information. Small targets carry little texture information; without context information to supplement and enhance them, they are easily submerged by the complex background of a low-contrast infrared image, and the varied shapes of small targets make detection harder still.

For this reason, the method first performs feature extraction on the image. Because features of different shapes need different context information, the extraction uses dynamic context information extraction, shown in Figure 2, to establish long-range dependencies among the pieces of information in the features. The positional encoding added at the input supplies the position information of the features and alleviates the lack of feature information for distant infrared targets. The input deep feature is split into blocks, flattened into a sequence, given position information, and fed into a multi-head attention mechanism for weighted summation. A residual connection is then applied to refine the result and speed up convergence, followed by two fully connected layers and a second residual connection. Finally, a deformable convolution layer adds an offset term during convolution, so that the features of objects are well expressed even when objects of very different shapes and sizes appear at the same position.

Specifically, dynamic context information extraction proceeds as follows:

The input feature F has size C×H×W, where C is the number of channels, H the height and W the width. Given a block size P, the C×H×W feature is divided into N blocks of size P×P×C.

The N blocks are then linearly transformed into N feature vectors, and a flag vector $x_p$ is added at the start of the sequence:

$$F_1 = E + F_0$$

where $F_0$ denotes the output vector result, $W_N$ is the weight parameter applied to the N-th block, and Concat[] is the concatenation operation.

The resulting $F_0$ is the output of block embedding. After block embedding, the features still lack the relative position information between blocks. The positional encoding $E$ is therefore added to $F_0$ to obtain $F_1$, the result after adding position information.
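The block splitting and positional encoding steps above can be sketched as follows. This is an illustration under stated assumptions: $F_0$ is taken to be the flag vector followed by the N linearly projected blocks, a single projection matrix `weight` is shared by all blocks (the source speaks of a weight parameter $W_N$ without giving the formula), and the positional encoding `pos_enc` is supplied by the caller. All function names are hypothetical.

```python
def split_into_blocks(feature, p):
    """Split a C x H x W feature (nested lists) into N flattened P x P x C blocks."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    assert h % p == 0 and w % p == 0, "H and W must be divisible by P"
    blocks = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            blocks.append([feature[ch][top + dy][left + dx]
                           for ch in range(c) for dy in range(p) for dx in range(p)])
    return blocks  # N = (H / P) * (W / P) vectors, each of length P * P * C

def linear(vec, weight):
    """Multiply a row vector of length n by an n x d weight matrix."""
    return [sum(v * weight[i][j] for i, v in enumerate(vec))
            for j in range(len(weight[0]))]

def embed_blocks(feature, p, weight, flag, pos_enc):
    """Return F1 = E + F0, with F0 = [flag; block_1 W; ...; block_N W] (assumed form)."""
    f0 = [list(flag)] + [linear(b, weight) for b in split_into_blocks(feature, p)]
    return [[a + e for a, e in zip(row, pe)] for row, pe in zip(f0, pos_enc)]
```

For a 1×4×4 feature and P = 2, this yields N = 4 blocks, so $F_1$ has five rows: the flag vector plus one row per block.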

$F_1$, with position information embedded, is multiplied by three different parameter matrices and mapped to a query matrix, a key matrix and a value matrix. The attention mechanism produces multiple attention results, which represent different context information in the image. These attention results are concatenated and normalized to obtain the final aggregated context information:

$$head_i = \mathrm{Attention}(F_1 W_q,\ F_1 W_k,\ F_1 W_v)$$

$$F_M = \mathrm{Concat}[head_1;\ head_2;\ head_3;\ \ldots;\ head_i]\,W_M$$

where Attention() denotes the attention operation; Q, K and V denote the query matrix, key matrix and value matrix; T denotes transposition; a scaling factor is applied to the attention scores before the Softmax operation; $F_1$ is the result after adding position information; $W_q$, $W_k$, $W_v$ and $W_M$ are learnable parameter matrices; $head_i$ is the output of the i-th attention head; and $F_M$ is the multi-head attention output feature. Concat denotes the concatenation operation.
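The multi-head attention step can be sketched as follows. The attention formula is not reproduced in the text, so the sketch assumes the standard scaled dot-product form, $\mathrm{Softmax}(QK^T/\sqrt{d_k})V$, where $d_k$ is the key dimension; this matches the symbols Q, K, V, T and the scaling factor named above but is not confirmed by the source. Function names are hypothetical, and plain lists are used to keep the sketch dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V (assumed form)."""
    d_k = len(k[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(f1, heads, w_m):
    """heads: list of (W_q, W_k, W_v) triples, one per head; w_m: output projection W_M."""
    outputs = [attention(matmul(f1, w_q), matmul(f1, w_k), matmul(f1, w_v))
               for w_q, w_k, w_v in heads]
    # Concatenate the head outputs along the feature axis, then project with W_M.
    concat = [sum((out[i] for out in outputs), []) for i in range(len(f1))]
    return matmul(concat, w_m)
```

Each head sees the same $F_1$ but uses its own projection matrices, so different heads can capture different context information.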

The feed-forward neural network consists of two fully connected layers. After the residual connection and normalization, the multi-head attention output feature $F_M$ is mapped to a high-dimensional space by the first fully connected layer and mapped back to a low-dimensional space by the second, which further retains the useful information:

$$F_2 = F_M[0] + F_1$$

$$X = F_2 W_{fc1} W_{fc2} + F_1$$

where $F_2$ is the result after the residual connection, $F_M[0]$ is the flag vector, $X$ is the output, and $W_{fc1}$ and $W_{fc2}$ are the weights of the two fully connected layers.

After the context information has been processed, the output $X$ passes through a deformable convolution, which dynamically adjusts the effective information and relates the different small targets to their context information:
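The formula for this step is not reproduced in the text. A form consistent with the symbol definitions that follow, and with the commonly used deformable convolution formulation, is given here as an assumption. In particular, the symbol $\Delta p_n$ for the learned offset and the reading of $R$ as the set of sampling positions around $p_0$ are not confirmed by the source:

$$Y(p_0) = \sum_{p_n \in R} W(p_n)\, X(p_0 + p_n + \Delta p_n)$$

Because $p_0 + p_n + \Delta p_n$ is generally fractional, implementations of this formulation typically evaluate $X$ at that location by bilinear interpolation.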

where $Y(p_0)$ is the output of the deformable convolution; $X$ and $Y$ are the input and output feature maps; $p_0$ is a position in the output feature; $p_n$ is an adjacent position; $R$ denotes the real-number range; the function W() gives the weight at $p_n$; and the offset value is learned from the input features by a parallel convolution.

Because of noise, the features of different small targets in infrared images vary greatly, which places heavy demands on the model's feature fusion capability. Ordinary target detection models rely on simple upsampling, convolution and feature concatenation. They do not analyze the positional features of objects in the spatial dimension or their semantic features in the channel dimension, or they fuse only the prominent features in the image and ignore the small target information, so the final detection accuracy for small targets is low.

Therefore, in this method, after features have been extracted from the image, the extracted features are fused and target detection is performed on the fused features. During feature fusion, a multi-information fusion layer aggregates the channel and spatial information of the multiple features.

The aggregated features greatly improve the expression of the position information and semantic information of objects. A new feature scale is added and fused during feature fusion to supplement the deep small target features, which enriches the detail features of small targets.

To add as much spatio-temporal information about the object as possible during feature fusion and retain more target features, the multi-information fusion layer performs an information fusion operation at every feature scale, as shown in Figure 3.

The multi-information fusion module (MFM) fuses the information of different layers through multiple residual structures. Its structure, shown in Figure 3(c), contains three parts. The first is the IC layer, shown in Figure 3(b), which refines the feature information. Next, global pooling and max pooling are performed at the channel level, the pooled results are passed through a fully connected layer with shared weights, and after multiplication and addition the result is normalized by the softmax function to obtain the extracted channel information. This is multiplied with the input to enhance the channel information.

After the channel information has been extracted and enhanced, global pooling and max pooling are performed at each position of the image. The results are added, passed through a 7×7 convolution that superimposes the features, and normalized by the softmax function to enhance the position information. Finally, a 1×1 convolution can be applied to further integrate the channel and spatial information.
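The channel and position enhancement steps described above can be sketched as follows. Several simplifications are assumptions made for brevity: "global pooling" is read as global average pooling, the shared fully connected layer is a single weight matrix, the pooled branches are simply added, and the IC layer, the 7×7 convolution and the final 1×1 convolution are omitted. Function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def shared_fc(vec, weight):
    """Apply one fully connected layer (n x n weight) to a length-n vector."""
    return [sum(v * weight[i][j] for i, v in enumerate(vec))
            for j in range(len(weight[0]))]

def channel_attention(feature, weight):
    """feature: C x H x W nested lists; weight: C x C matrix shared by both branches."""
    flat = [[v for row in ch for v in row] for ch in feature]
    avg = [sum(ch) / len(ch) for ch in flat]  # global average pooling per channel
    mx = [max(ch) for ch in flat]             # max pooling per channel
    # Both pooled vectors pass through the same FC layer and are then summed.
    summed = [a + b for a, b in zip(shared_fc(avg, weight), shared_fc(mx, weight))]
    scale = softmax(summed)                   # normalized channel weights
    enhanced = [[[v * s for v in row] for row in ch] for ch, s in zip(feature, scale)]
    return enhanced, scale

def spatial_attention(feature):
    """Weight each position by average and max statistics taken across channels."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    pooled = [sum(feature[ch][y][x] for ch in range(c)) / c
              + max(feature[ch][y][x] for ch in range(c))
              for y in range(h) for x in range(w)]
    scale = softmax(pooled)                   # the 7x7 convolution is omitted here
    return [[[feature[ch][y][x] * scale[y * w + x] for x in range(w)]
             for y in range(h)] for ch in range(c)]
```

Applying `channel_attention` first and `spatial_attention` second mirrors the order given in the text.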

Deep features are rich in semantic information, but this is mostly the semantic information of targets. After multiple downsampling operations, the features related to small targets are easily masked by noise and hard to localize, whereas shallow features carry rich texture and position information for small targets. To make effective use of shallow features for enhancing the detail information of small targets and supplementing their position information, an additional feature scale dedicated to small objects is added, together with an additional detection head that outputs its detection results. The structures are named as in Figure 3(a): the outputs of the dynamic context information extraction module and the three subsequent multi-information fusion modules (MFM) are T5, T4, T3 and T2, whose sizes are 1/32, 1/16, 1/8 and 1/4 of the original image respectively. The features of the same size that are connected with T5, T4 and T3 are denoted R4, R3 and R2.

When the feature map has been processed to the T3 layer, the method continues to upsample the features and adds the T2 layer after the upsampling, and the T2 layer is connected with the features of the same size from the second layer of the backbone network. This improves the representation of small target details and passes on shallow detail information. A small target detection head is attached after the T2 layer to reduce the feature coupling between small targets and other targets in the same layer, which lowers the missed detection rate for small targets, raises the probability of detecting them, and alleviates the poor accuracy caused by excessively large scale. To match the channels of the subsequent network, an R2 layer is added after the T2 layer and connected with the T3 layer features of the same dimension.
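The output sizes stated above follow directly from the downsampling ratios. The short sketch below computes them for a given input size; the 640×640 input in the usage note is only an example and is not taken from the source, and the function name is hypothetical.

```python
def feature_map_sizes(height, width, strides=(32, 16, 8, 4)):
    """Return the spatial size of T5, T4, T3 and T2 for an input of height x width."""
    names = ("T5", "T4", "T3", "T2")
    return {name: (height // s, width // s) for name, s in zip(names, strides)}
```

For a 640×640 input, T2 would be 160×160, so a target of 8×8 pixels still covers about 2×2 cells there, whereas at T5 (20×20) it falls well inside a single cell.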

To achieve the purpose of the present invention, the invention further proposes an end-to-end infrared small target detection system based on context information. As shown in Figure 4, it comprises an infrared image processing module 10, a target information learning module 20, a feature extraction module 30, and a feature fusion and target detection module 40.

The infrared image processing module 10 processes the infrared image dataset used to train the small target detection model. This module increases the number of small targets, enriches the variety of scenes in the dataset and improves the robustness of the model.

The small target information learning module 20 guides the small target detection model to learn robust image features. This module trains the model on the infrared small target dataset and outputs the trained small target detection model.

The image feature extraction module 30 uses the dynamic context information extraction module to extract the information surrounding the target and the globally related information from the image features, and adapts to the contour variations of different small targets. It extracts stable, clean small target features from the infrared image to enable accurate infrared small target detection.

The feature fusion and target detection module 40 fuses the extracted features. It identifies the category, position, size and shape of the targets of interest from the fused image features and produces the final infrared small target detection result.

The modules are connected as follows:

The output of the infrared image processing module 10 is connected to the input of the small target information learning module 20.

The output of the small target information learning module 20 is connected to the input of the image feature extraction module 30.

The output of the image feature extraction module 30 is connected to the input of the feature fusion and target detection module 40.
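The connections listed above form a simple chain, which can be sketched as follows. The function and parameter names are hypothetical, and each module is passed in as a plain callable so that the sketch shows only the data flow, not any particular implementation.

```python
def run_pipeline(dataset, image, augment, train, extract, fuse_and_detect):
    """Chain the four modules in the order given by their connections."""
    augmented = augment(dataset)       # infrared image processing module 10
    model = train(augmented)           # small target information learning module 20
    features = extract(model, image)   # image feature extraction module 30
    return fuse_and_detect(features)   # feature fusion and target detection module 40
```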

Claims

1. An infrared small target detection method based on context information, characterized by comprising the following steps:

Step 1: Obtain an infrared small target dataset and perform data augmentation;

Step 2: Train the small target detection network using a classification loss function, a confidence loss function and a position loss function;

Step 3: Use the trained small target detection network to extract features from the infrared image and obtain the feature results;

First, feature extraction is performed on the image; during extraction, long-range dependencies among the pieces of information in the features are established through dynamic context information extraction; the input deep feature is split into blocks, flattened into a sequence and given position information, and then fed into a multi-head attention mechanism for weighted summation;

Then, a residual connection is applied to refine the result and speed up convergence, followed by two fully connected layers and a second residual connection; this is followed by a deformable convolution layer, which adds an offset term during convolution;

Step 4: Further fuse the extracted features and perform infrared small target detection on the fused features to obtain the final target detection result.

2. The infrared small target detection method based on context information as claimed in claim 1, characterized in that, when data augmentation is performed in step 1, the small targets that do not overlap with other targets are identified in an image containing infrared small targets and randomly copied and pasted to other positions in the image; a pasted copy does not occlude other targets and keeps a distance from them.

3. The infrared small target detection method based on context information as claimed in claim 2, characterized in that, on top of copying and pasting small targets, other data augmentation operations are further applied, including rotation and translation, scaling and cropping, and mosaic augmentation.

4. The infrared small target detection method based on context information as claimed in claim 1, characterized in that, in step 2, the total loss function $L(x, x')$ is expressed as:

where $x$ and $x'$ denote the predicted value and the ground-truth value respectively; $\alpha_{box}$, $\alpha_{obj}$ and $\alpha_{cls}$ denote the weights of the three loss functions; $L_{CIoU}$, $L_{obj}$ and $L_{cls}$ denote the position loss function, confidence loss function and classification loss function of the target detection task; $k$, $s^2$ and $B$ denote the number of output feature maps, of grid cells, and of positions on each grid cell; $I_{kij}$ indicates whether the j-th anchor box of the i-th grid cell of the k-th output feature map is a positive sample, taking the value 1 for a positive sample and 0 for a negative sample; and $\alpha_k$ balances the weights of the output features at different scales.

5. a kind of infrared small target detection method based on context information as claimed in claim 1, is characterized in that, in step 3, the process of dynamic context information extraction is as follows:

For the input feature F, its feature size is C×H×W, where C represents the number of channels, H represents the height, and W represents the width; given the size P of the block, divide C×H×W into N P× P×C block, P means block;

After obtaining N blocks, linearly transform them into feature vectors of N lengths, and add a flag vector x _p at the starting position of the vector;

F ₁ =E+F ₀

Among them, F ₀ represents the output vector result,

Indicates the Nth block, W _N is the weight parameter, Concat[] is the splicing operation; the final F ₀ is the output result of block embedding;

Add the location coding information E and add it to F ₀ to get F ₁ , and F ₁ represents the result after adding the location information;

F ₁ after embedding position information is multiplied by three different parameter matrices, and mapped to query matrix, queried key-value matrix and value matrix; after being processed by the attention mechanism, multiple attention results are obtained, which are used to represent the image Different context information in different context information; these attention results are stitched together and standardized to get the final summary result of context information:

head _i = Attention(F ₁ W _q ; F ₁ W _k ; F ₁ W _v )

F _M =Concat[head _i ; head _i ; head _i ; . . . ; head _i ] W _M

Among them, Attention() represents the operation of the attention mechanism, Q, K, and V represent the query matrix, the queried key-value matrix and the value matrix, respectively, T represents the transpose operation,

Represents the scaling factor; F ₁ represents the result after adding position information, W _q , W _k , W _v , W _M are learnable parameter matrices, Softmax represents the Softmax operation, head _i represents the output of multiple attention results, F _M Represents multi-head attention output features; Concat represents addition operation;

The feed-forward neural network includes two layers of fully connected layers. The multi-head attention output feature F _M after residual normalization is mapped to the high-dimensional space by the first fully connected layer, and the low-dimensional space is mapped by the second fully connected layer. , to further retain useful information, the process is:

F ₂ =F _M [0]+F ₁

X＝F ₂ W _fc1 W _fc2 +F ₁

Among them, F ₂ represents the result after the residual, F _M [0] is the flag bit vector, X is the output result, W _fc1 and W _fc2 are the weights of the two fully connected layers;

After the context information has been processed, the output X is passed through a deformable convolution, which dynamically adjusts the effective information and relates the different small targets to their context information:

Here, Y(p₀) denotes the output of the deformable convolution; X and Y are the input and output feature maps, respectively; p₀ denotes a position in the output feature; p_n denotes an adjacent position; R denotes the range of real numbers; the function W() gives the weight at p_n; and Δp_n is the offset value, learned from the input features by a parallel convolution.
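The symbols defined above match the usual formulation of deformable convolution. Assuming that standard form, the operation can be written as below; here Δp_n denotes the learned offset added to the adjacent position p_n, and R is read as the set of sampling positions covered by the kernel. Both readings are assumptions of this sketch.

```latex
Y(p_0) = \sum_{p_n \in R} W(p_n)\, X\left(p_0 + p_n + \Delta p_n\right)
```

Because p₀ + p_n + Δp_n is generally fractional, implementations of this form typically sample X by bilinear interpolation.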

6. The infrared small target detection method based on context information according to claim 1, characterized in that, in the feature fusion process of step 4, a multi-information fusion layer aggregates the channel and spatial information of multiple features, and the multi-information fusion layer performs the information fusion operation at each feature scale;

The multi-information fusion module fuses the information of different layers through multiple residual structures and consists of three parts. First, an IC layer refines the feature information. Next, global pooling and maximum pooling are performed at the channel level; the pooled information is organized by a fully connected layer with shared weights, multiplied and added, and then normalized by the softmax function to obtain the extracted channel information, which is multiplied with the input information;

After the channel information has been extracted and enhanced, global pooling and maximum pooling are performed at each position of the image; the two results are added, passed through a convolution that superimposes the features, and normalized by the softmax function to enhance the position information; finally, a convolution integrates the channel and spatial information.
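The channel and spatial steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed module: "global pooling" is taken to mean global average pooling, the IC layer and both convolutions are omitted, and all function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def channel_weights(x, w_fc):
    """x is C x H x W (nested lists); w_fc is a C x C shared weight matrix."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]  # assumed average pooling
    peak = [max(map(max, ch)) for ch in x]                          # maximum pooling

    def fc(v):
        # The same weights are applied to both pooled vectors (shared weights).
        return [sum(w * a for w, a in zip(row, v)) for row in w_fc]

    return softmax([a + b for a, b in zip(fc(avg), fc(peak))])

def spatial_weights(x):
    """Pool across channels at every position, add, and normalize with softmax."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    pooled = [sum(x[k][i][j] for k in range(c)) / c + max(x[k][i][j] for k in range(c))
              for i in range(h) for j in range(w)]
    flat = softmax(pooled)  # the convolution before softmax is omitted in this sketch
    return [flat[i * w:(i + 1) * w] for i in range(h)]

def fuse(x, w_fc):
    """Apply channel weighting first, then spatial weighting."""
    cw = channel_weights(x, w_fc)
    xc = [[[v * cw[k] for v in row] for row in ch] for k, ch in enumerate(x)]
    sw = spatial_weights(xc)
    return [[[v * sw[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in xc]
```

The output keeps the C × H × W shape of the input, so the module can be applied at every feature scale.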

7. The infrared small target detection method based on context information according to claim 6, characterized in that one additional feature scale is added to focus specifically on small objects, and one additional detection head is added to output the detection results;

The outputs of the dynamic context information extraction module and of the three subsequent multi-information fusion modules are denoted T5, T4, T3, and T2; their sizes are 1/32, 1/16, 1/8, and 1/4 of the original image, respectively; the features of the same size connected with T5, T4, and T3 are denoted R4, R3, and R2;

When the feature map has been processed up to the T3 layer, the features continue to be upsampled, and the T2 layer is added after the upsampling; the T2 layer is connected with the features of the same size from the second layer of the backbone network; the small target detection head is attached after the T2 layer; the R2 layer is added after the T2 layer and is connected with the T3-layer features of the same dimension.
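To make the four feature scales concrete, the short sketch below computes the spatial size of T5 through T2 from the ratios stated above. The function name and the 640 × 640 input size are hypothetical examples, and integer division is an assumption for inputs that are not multiples of 32.

```python
def feature_sizes(height, width):
    """Spatial size of each output for the stated ratios 1/32, 1/16, 1/8, 1/4."""
    strides = {"T5": 32, "T4": 16, "T3": 8, "T2": 4}
    return {name: (height // s, width // s) for name, s in strides.items()}

sizes = feature_sizes(640, 640)  # hypothetical input resolution
for name, size in sizes.items():
    print(name, size)
```

The added T2 scale has four times as many positions as T3, so a target only a few pixels wide still covers at least one feature cell.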

8. An infrared small target detection system based on context information, implementing the method according to claim 1, characterized in that it comprises an infrared image processing module (10), a small target information learning module (20), an image feature extraction module (30), and a feature fusion and target detection module (40);

Wherein, the infrared image processing module (10) is used for processing the infrared image dataset used for training the small target detection model;

The small target information learning module (20) guides the small target detection model to learn robust image features; this module trains the model on the infrared small target dataset and outputs the trained small target detection model;

The image feature extraction module (30) uses the dynamic context information extraction module to extract the information surrounding the target and the globally related information in the image features, adapts to the contour changes of different small targets, and extracts stable and clean small target features from the infrared image;

The feature fusion and target detection module (40) fuses the extracted features, identifies and extracts the category, position, size and shape of the target of interest from the fused image features, and obtains the final infrared small target detection result;

The connections between the above modules are as follows:

The output end of the infrared image processing module (10) is connected with the input end of the small target information learning module (20);

The output end of the small target information learning module (20) is connected with the input end of the image feature extraction module (30);

The output end of the image feature extraction module (30) is connected with the input end of the feature fusion and target detection module (40).