CN114842235A - Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation

Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation

Info

Publication number
CN114842235A
CN114842235A
Authority
CN
China
Prior art keywords
target
infrared
feature
image
shape prior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210284099.7A
Other languages
Chinese (zh)
Inventor
秦翰林
欧洪璇
延翔
罗国慧
张昱赓
孙鹏
陈嘉欣
冯冬竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210284099.7A priority Critical patent/CN114842235A/en
Publication of CN114842235A publication Critical patent/CN114842235A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation, which comprises the following steps: performing a Gaussian filtering operation on the input infrared original image to enhance dim and small targets; performing shape prior-based segmentation on the Gaussian-filtered infrared image to obtain target candidate regions; cropping the target candidate regions and inputting them into a multi-scale feature extraction module to obtain a feature representation of the small target; inputting the feature representation of the small target into a feature aggregation network to obtain a tensor-spliced image; and performing batch normalization and a nonlinear transformation on the tensor-spliced image, then outputting the target classification result through Softmax. The shape prior-based segmentation module makes full use of prior information about small targets to obtain suspicious target regions and reduces the overall number of parameters to improve algorithm efficiency, while the multi-scale feature extraction and aggregation module provides a sufficient number of feature channels for small targets to ensure their detectability.

Description

Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation
Technical Field
The invention belongs to the technical field of infrared target detection, and particularly relates to an infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation.
Background
Infrared imaging detection technology can be used to detect and track unmanned aerial vehicles and is an effective technical means for UAV surveillance. In practical scenes, however, because of long-distance imaging and atmospheric radiation interference, the target has a low signal-to-noise ratio, occupies only a few pixels, lacks shape, texture and structural information, and is easily disturbed by complex background clutter and random noise, so conventional target detection and recognition algorithms cannot balance detection accuracy and detection efficiency.
To solve the infrared dim and small target detection problem, two classes of methods are mainly used at present: single-frame methods and multi-frame methods. Because multi-frame detection algorithms generally perform segmentation based on prior information such as the shape of the small target and the continuity of its gray-level motion trajectory, they consume more time than single-frame detection algorithms and are therefore not suitable for real-time applications. The present invention mainly concerns single-frame detection algorithms.
Existing single-frame detection methods are broadly divided into traditional detection algorithms and neural network-based detection algorithms. Most traditional detection methods rely on prior knowledge of the target: they improve target contrast by suppressing the background and enhancing the target, and then extract the target by adaptive threshold segmentation; owing to noise and the lack of robust features, traditional algorithms suffer from a high false alarm rate. Most deep learning detection methods obtain candidate regions through an anchor mechanism, then share parameters and unify classification and regression to obtain the detection result; however, such methods are mainly designed for general object detection and still perform unsatisfactorily on dim and small targets with a low signal-to-noise ratio and extremely few pixels.
Fan et al., in the literature "Fan Zunlin, Bi Duyan, et al. Dim infrared image enhancement based on convolutional neural network [J]. Neurocomputing, 2018, 272: 396-404", address the target blur and background complexity caused by the long shooting distances common in current infrared imaging systems and propose a convolutional neural network enhancement method that suppresses background clutter while enhancing the small target; handwritten characters from the MNIST dataset are used to simulate the difference between the foreground and the background of infrared images, the small target and the background are predicted, and the contrast of weak infrared images in which background clutter submerges the small target is improved. However, this method does not consider the influence of the noise present in the imaging system itself on detection, so Deng et al. of the National University of Defense Technology, in the literature "Deng Q, Lu H, Tao H, et al. Multi-scale convolutional neural networks for space infrared point objects discrimination [J]. IEEE Access, 2019: 1-1", propose a multi-scale convolutional neural network, considering that the noise of the infrared detection system and the lack of robust features make the detection of space objects at long observation distances difficult. The network structure proposed by this method consists of three parts: transformation, partial convolution and full convolution; small-target training data are generated with an infrared radiation model, and the inherent properties of small targets in different scenes are taken into account. The method can improve system performance and is more robust to the noise of the detection system. Nevertheless, most current deep learning detection methods treat infrared dim and small target detection as a binary classification or saliency detection problem, and their detection accuracy on dim and small targets is still not ideal.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides an infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation. The technical problem to be solved by the invention is achieved through the following technical solution:
An infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation comprises the following steps:
performing a Gaussian filtering operation on the input infrared original image to enhance dim and small targets;
performing shape prior-based segmentation on the Gaussian-filtered infrared image to obtain a target candidate region;
cropping the target candidate region and inputting it into a multi-scale feature extraction module to obtain a feature representation of the small target;
inputting the feature representation of the small target into a feature aggregation network to obtain a tensor-spliced image;
and performing batch normalization and a nonlinear transformation on the tensor-spliced image, and outputting the target classification result through Softmax.
In an embodiment of the present invention, S1 includes:
S11: preprocessing the infrared original image with a 3 x 3 Gaussian kernel template;
S12: performing the Gaussian filtering operation with the imfilter() function.
In one embodiment of the present invention, the enhanced image after the Gaussian filtering operation is expressed as:
F(x, y) = I(x, y) ⊗ G(x, y)
wherein (x, y) are the coordinates of a pixel in the infrared original image, I denotes the infrared original image, G denotes the Gaussian kernel, ⊗ denotes convolution, and F denotes the enhanced image.
In an embodiment of the present invention, S2 includes:
S21: processing the Gaussian-filtered infrared image with a shape prior-based segmentation algorithm to exclude large-size continuous background regions and high-energy noise of a preset number of pixels;
S22: in the image from which the large-size continuous background regions and the high-energy noise of a preset number of pixels have been excluded, selecting point regions with an aspect ratio of no more than 2 to fit the suspicious target, and segmenting the region around the fitted boundary to exclude strip-shaped edge regions, thereby obtaining the target candidate region.
In an embodiment of the present invention, S21 includes:
S211: fusing the prior shape information of the dim and small target into an energy function so as to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
In an embodiment of the present invention, S211 includes:
S2111: combining the energy of the shape prior into an energy function;
S2112: excluding large-size continuous background regions and high-energy noise of a preset number of pixels through the energy function, the energy of the region term and the energy of the boundary term.
In one embodiment of the present invention, the expression of the energy function is:
E(L) = R(L) + B(L) + E_shape
where R(L) is the region term, B(L) is the boundary term, and E_shape is the shape prior term;
the combined energy expression of the region term and the boundary term is as follows:
E(L) = αR(L) + B(L);
the energy expression of the boundary term is as follows:
B(L) = Σ_{(p,q)∈N} B_{p,q}·δ(l_p ≠ l_q), where N is the set of neighboring pixel pairs, B_{p,q} is the boundary penalty between neighboring pixels p and q, and δ(l_p ≠ l_q) equals 1 when the labels differ and 0 otherwise
wherein α is a relative importance factor between the region term and the boundary term, and R_p(l_p) is the weight of assigning label l_p to pixel p.
In an embodiment of the present invention, S3 includes:
S31: processing the Gaussian-filtered image with convolution kernels of five sizes to obtain the feature representation of the small target.
In an embodiment of the present invention, S31 includes:
S311: using convolution kernels of five sizes, 3 × 3, 5 × 5, 7 × 7, 9 × 9 and 11 × 11, the numbers of which are 1, 2, 3, 4 and 5 respectively;
S312: the five groups of convolution kernels correspond to targets with sizes of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 respectively, the numbers of the five groups of convolution kernels are 5, 4, 3, 2 and 1 respectively, and the numbers of the corresponding feature maps are 5, 4, 3, 2 and 1, so that the 15 feature maps together with the infrared original image form a 16-channel feature map;
S313: concatenating the image and the feature maps produced by the five groups of convolution kernels and inputting them into an intermediate max pooling layer whose kernel size and stride are both set to 2, so as to obtain 32-channel mapped features, which constitute the feature representation of the small target.
In an embodiment of the present invention, S4 includes:
S41: performing convolution and pooling on the feature representation of the small target to obtain a feature map of the intermediate layer;
S42: downsampling the feature representation of the small target, decomposing pixels at the same positions into 4 sub-images during the downsampling;
S43: performing tensor splicing on the feature map of the intermediate layer and the 4 sub-images.
The invention has the following beneficial effects:
The shape prior-based segmentation module makes full use of the prior information of dim and small targets to obtain suspicious target regions and reduces the global number of parameters to improve algorithm efficiency, while the multi-scale feature extraction and aggregation module provides a sufficient number of feature channels for dim and small targets, which ensures their detectability and significantly reduces the false alarm rate at a high recall rate.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a schematic flow chart of an infrared small and weak target identification method based on shape prior segmentation and multi-scale feature aggregation according to an embodiment of the present invention;
fig. 2 is a structural diagram of an infrared small and weak target identification method based on shape prior segmentation and multi-scale feature aggregation according to an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve the intended purpose and their effects, the infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, which is to be read in connection with the accompanying drawings. The technical means and effects of the present invention adopted to achieve the predetermined purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only and are not used for limiting the technical scheme of the present invention.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the article or device comprising the element.
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flow chart of an infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to an embodiment of the present invention, and fig. 2 is a structural diagram of the method. With reference to fig. 1 and fig. 2, the method includes:
S1: performing a Gaussian filtering operation on the input infrared original image to enhance dim and small targets.
Specifically, step S1 includes:
S11: preprocessing the infrared original image with a 3 x 3 Gaussian kernel template;
S12: performing the Gaussian filtering operation with the imfilter() function.
First, the input infrared original image is preprocessed with a 3 x 3 Gaussian kernel template, and the Gaussian filtering operation is performed with the imfilter() function, i.e. the value of each pixel of the whole image is replaced by a weighted average of itself and the pixels in its neighborhood. This effectively suppresses the background region, enhances the weak target to be detected, and facilitates subsequent detection and identification. The Gaussian-filtered enhanced image is expressed as:
F(x, y) = I(x, y) ⊗ G(x, y)
wherein (x, y) are the coordinates of a pixel in the infrared original image, I denotes the infrared original image, G denotes the Gaussian kernel, ⊗ denotes convolution, and F denotes the enhanced image.
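The imfilter() function referenced above is MATLAB's spatial filtering routine; for illustration, a minimal Python sketch of the same 3 x 3 Gaussian smoothing step is given below. The standard deviation of 0.85 and the replicate-style border handling are assumptions, since the patent only specifies the 3 x 3 template size.

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel_3x3(sigma=0.85):
    # Normalized 3 x 3 Gaussian kernel; sigma is an assumed value.
    ax = np.array([-1.0, 0.0, 1.0])
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def enhance(infrared_image):
    # Replace each pixel by the weighted average of its 3 x 3 neighborhood,
    # which suppresses background clutter and enhances the dim small target.
    kernel = gaussian_kernel_3x3()
    return convolve(infrared_image.astype(np.float32), kernel, mode="nearest")
```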
S2: performing shape prior-based segmentation on the Gaussian-filtered infrared image to obtain a target candidate region.
Specifically, step S2 includes:
S21: processing the Gaussian-filtered infrared image with a shape prior-based segmentation algorithm to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
Specifically, step S21 includes:
S211: fusing the prior shape information of the dim and small target into an energy function so as to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
Specifically, step S211 includes:
S2111: combining the energy of the shape prior into an energy function;
S2112: excluding large-size continuous background regions and high-energy noise of a preset number of pixels through the energy function, the energy of the region term and the energy of the boundary term.
S22: in the image from which the large-size continuous background regions and the high-energy noise of a preset number of pixels have been excluded, selecting point regions with an aspect ratio of no more than 2 to fit the suspicious target, and segmenting the region around the fitted boundary to exclude strip-shaped edge regions, thereby obtaining the target candidate region.
First, the Gaussian-filtered infrared image is processed with a shape prior-based segmentation algorithm to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
Specifically, according to the characteristic analysis of the target, there are three types of regions in the infrared image: large continuous background regions (such as cloud layers), strip-shaped edge regions (long strips), and small dot regions, the latter being where a dim target may exist. The prior shape information of the dim and small target is integrated into an energy function to generate candidate regions with a higher recall rate, i.e. to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
Fusing the prior shape information of the dim and small target into the energy function to exclude large-size continuous background regions and high-energy noise of a preset number of pixels specifically includes the following steps. Based on the gray-level information of the infrared image, an edge is defined by the Euclidean distance between the gray values of two pixels. A distance function φ is therefore introduced to express the shape prior, where φ(x, y) represents the minimum Euclidean distance between a point (x, y) and an edge. Since the drone target can be approximately represented by a rectangle with an aspect ratio of no more than 2, a rectangular shape template is constructed. The energy term of the shape template is denoted by E_shape, and the energy of the shape prior is combined into the energy function:
E(L) = R(L) + B(L) + E_shape
where R(L) is the region term, B(L) is the boundary term, and E_shape is the shape prior term. The combined energy formula for the region term and the boundary term is expressed as follows:
E(L) = αR(L) + B(L);
B(L) = Σ_{(p,q)∈N} B_{p,q}·δ(l_p ≠ l_q), where N is the set of neighboring pixel pairs, B_{p,q} is the boundary penalty between neighboring pixels p and q, and δ(l_p ≠ l_q) equals 1 when the labels differ and 0 otherwise.
where α is the relative importance factor between the region term and the boundary term, and R_p(l_p) is the weight of assigning label l_p to pixel p. The weight R_p(l_p) is obtained by comparing the intensity of pixel p with the intensity models (given histograms) of the target and the background: when a pixel is more likely to belong to the target, the weight of assigning that pixel to the target should be smaller, which reduces the energy in the equation, so the region term is minimized when all pixels are correctly assigned to the target and the background. The energy of the shape prior is represented by selecting the approximate center pixel of the target as a reference point and, according to the distance of each pixel from the center pixel, setting the pixel values around the center point to 255 and the other values to 0. The new image is then regarded as the shape template of the target and incorporated into the energy function. In this way, large-size continuous background regions (clouds and buildings) and high-energy noise of about a few pixels (e.g. hot pixels) can be excluded, because the value of the central part of the target is relatively larger than that elsewhere in the template.
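The patent gives the energy only in the abstract form E(L) = R(L) + B(L) + E_shape, so the sketch below merely illustrates how such an energy could be evaluated for one candidate binary labeling. The histogram-based region weights, the Gaussian-weighted boundary penalties and the template-difference shape term are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np

def shape_template(shape, center, half_size=2):
    # Rectangular shape prior: pixels near the assumed target center are set
    # to 255 and everything else to 0 (half_size is an assumed template radius).
    tpl = np.zeros(shape, dtype=np.float32)
    cy, cx = center
    tpl[max(cy - half_size, 0):cy + half_size + 1,
        max(cx - half_size, 0):cx + half_size + 1] = 255.0
    return tpl

def segmentation_energy(labels, image, target_hist, background_hist,
                        template, alpha=1.0, sigma=10.0):
    # Toy evaluation of E(L) = alpha*R(L) + B(L) + E_shape for a binary
    # labeling (0 = background, 1 = target); image is an integer gray image.
    gray = image.astype(np.float32)

    # Region term: negative log-likelihood of each pixel under the intensity
    # histogram of the class it is assigned to.
    p_t = target_hist[image]
    p_b = background_hist[image]
    region = np.where(labels == 1, -np.log(p_t + 1e-6), -np.log(p_b + 1e-6)).sum()

    # Boundary term: penalize label changes between neighboring pixels,
    # weighted by their gray-level similarity.
    def pair_cost(a, b, la, lb):
        return (np.exp(-((a - b) ** 2) / (2 * sigma ** 2)) * (la != lb)).sum()

    boundary = (pair_cost(gray[:, 1:], gray[:, :-1], labels[:, 1:], labels[:, :-1]) +
                pair_cost(gray[1:, :], gray[:-1, :], labels[1:, :], labels[:-1, :]))

    # Shape prior term: disagreement between the labeling and the template.
    e_shape = np.abs(labels * 255.0 - template).sum() / 255.0

    return alpha * region + boundary + e_shape
```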
Then, in the image from which the large-size continuous background regions and the high-energy noise of a preset number of pixels have been excluded, point regions with an aspect ratio of no more than 2 are selected to fit the suspicious target, and the region around the fitted boundary is segmented to exclude strip-shaped edge regions, yielding the target candidate regions.
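As a rough illustration of this candidate-region step, the sketch below keeps only point-like connected regions whose bounding-box aspect ratio is no greater than 2 and crops a small window around each of them. The connected-component analysis and the 4-pixel crop margin are assumptions; the patent does not specify how the point regions are fitted.

```python
import numpy as np
from scipy import ndimage

def candidate_regions(binary_mask, image, max_aspect_ratio=2.0, pad=4):
    # Keep point regions (aspect ratio <= 2) and discard strip-shaped regions,
    # then crop a padded window around each surviving region.
    labeled, _ = ndimage.label(binary_mask)
    crops = []
    for sl in ndimage.find_objects(labeled):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        ratio = max(h, w) / max(min(h, w), 1)
        if ratio <= max_aspect_ratio:             # point region, not a strip
            y0 = max(sl[0].start - pad, 0)
            y1 = min(sl[0].stop + pad, image.shape[0])
            x0 = max(sl[1].start - pad, 0)
            x1 = min(sl[1].stop + pad, image.shape[1])
            crops.append(image[y0:y1, x0:x1])
    return crops
```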
S3: cropping the target candidate region and inputting it into the multi-scale feature extraction module to obtain the feature representation of the small target.
Specifically, step S3 includes:
S31: processing the Gaussian-filtered image with convolution kernels of five sizes to obtain the feature representation of the small target.
Specifically, step S31 includes:
S311: using convolution kernels of five sizes, 3 × 3, 5 × 5, 7 × 7, 9 × 9 and 11 × 11, the numbers of which are 1, 2, 3, 4 and 5 respectively;
S312: the five groups of convolution kernels correspond to targets with sizes of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 respectively, the numbers of the five groups of convolution kernels are 5, 4, 3, 2 and 1 respectively, and the numbers of the corresponding feature maps are 5, 4, 3, 2 and 1, so that the 15 feature maps together with the infrared original image form a 16-channel feature map;
S313: concatenating the image and the feature maps produced by the five groups of convolution kernels and inputting them into an intermediate max pooling layer whose kernel size and stride are both set to 2, so as to obtain 32-channel mapped features, which constitute the feature representation of the small target.
The Gaussian-filtered image is processed with convolution kernels of five sizes to obtain the feature representation of the small target, i.e. multi-scale information of the small target is extracted with multi-scale convolution kernels.
Extracting the multi-scale information of the small target with multi-scale convolution kernels specifically includes the following steps:
convolution kernels of five sizes, 3 × 3, 5 × 5, 7 × 7, 9 × 9 and 11 × 11, are used, and their numbers are 1, 2, 3, 4 and 5 respectively;
the above 15 convolution kernels correspond to targets with sizes of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 respectively, the numbers of the convolution kernels are 5, 4, 3, 2 and 1 respectively, and the numbers of the corresponding feature maps are 5, 4, 3, 2 and 1; that is, the 15 feature maps together with the original image form a 16-channel feature map, providing a sufficient number of feature channels for the dim and small target to ensure its detectability;
the image and the feature maps produced by the five groups of convolution kernels are concatenated and input into an intermediate max pooling layer whose kernel size and stride are both set to 2; the number of feature maps is doubled to obtain 32-channel mapped features while keeping the memory consumed during feature extraction as low as possible; these 32-channel mapped features constitute the feature representation of the small target.
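For illustration, a minimal PyTorch sketch of such a multi-scale extraction module follows. The 'same' padding, the split of the 15 kernels across the five sizes (the text lists the per-size counts in two different orders), and the extra 16-to-32-channel convolution placed before the pooling layer are assumptions, since max pooling alone would not change the channel count.

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    # Five groups of convolution kernels (3x3 ... 11x11) whose outputs are
    # concatenated with the input image to form a 16-channel feature map,
    # followed by a stride-2 max pooling stage producing 32 channels.
    def __init__(self):
        super().__init__()
        sizes = [3, 5, 7, 9, 11]
        counts = [5, 4, 3, 2, 1]          # 15 kernels in total (assumed split)
        self.branches = nn.ModuleList(
            nn.Conv2d(1, c, kernel_size=k, padding=k // 2)
            for k, c in zip(sizes, counts)
        )
        # Assumed channel doubling; the patent only states that 32-channel
        # mapped features are obtained after the pooling stage.
        self.expand = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        # x: (N, 1, H, W) cropped candidate region
        feats = [branch(x) for branch in self.branches]
        fused = torch.cat([x] + feats, dim=1)   # 1 + 15 = 16 channels
        return self.pool(self.expand(fused))    # 32-channel feature representation
```

Stacking all five kernel sizes on the same crop lets targets from roughly 1 × 1 to 9 × 9 pixels each find a matching receptive field.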
S4: inputting the feature representation of the small target into a feature aggregation network to obtain a tensor-spliced image.
Specifically, step S4 includes:
S41: performing convolution and pooling on the feature representation of the small target to obtain a feature map of the intermediate layer;
S42: downsampling the feature representation of the small target, decomposing pixels at the same positions into 4 sub-images during the downsampling;
S43: performing tensor splicing on the feature map of the intermediate layer and the 4 sub-images.
After the decomposition, the number of channels becomes 4 times larger while the spatial resolution is downsampled by a factor of 2, so that feature information at the higher resolution is effectively fused.
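A minimal PyTorch sketch of this aggregation step is given below. Pixel unshuffle is used here as one standard way of decomposing same-position pixels into 4 sub-images (2x spatial downsampling with 4x more channels), and the channel width of the convolution branch is an assumed value.

```python
import torch
import torch.nn as nn

class FeatureAggregation(nn.Module):
    # One branch applies convolution and pooling to obtain the intermediate
    # feature map; the other decomposes each 2x2 pixel group into 4 sub-images
    # (space-to-depth); the two results are tensor-spliced along the channels.
    def __init__(self, in_channels=32):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),  # assumed width
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.space_to_depth = nn.PixelUnshuffle(downscale_factor=2)

    def forward(self, x):
        # x: (N, 32, H, W) feature representation of the small target
        mid = self.conv_branch(x)             # (N, 64, H/2, W/2) intermediate map
        subs = self.space_to_depth(x)         # (N, 128, H/2, W/2): 4 sub-images per channel
        return torch.cat([mid, subs], dim=1)  # tensor splicing along channels
```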
S5: performing batch normalization and a nonlinear transformation on the tensor-spliced image, and outputting the target classification result through Softmax.
The tensor-spliced image obtained in S4 is batch-normalized before the nonlinear transformation is applied. Finally, the unmanned aerial vehicle target classification result is output through Softmax.
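The closing stage could look like the sketch below. The ReLU nonlinearity, the global average pooling, the 192 input channels (matching the aggregation sketch above) and the two-class target/background output are assumptions beyond what the patent states.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    # Batch normalization, a nonlinear transformation and a Softmax over the
    # class scores, applied to the tensor-spliced feature map.
    def __init__(self, in_channels=192, num_classes=2):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.act(self.bn(x))                  # batch normalization + nonlinearity
        x = self.pool(x).flatten(1)               # collapse spatial dimensions
        return torch.softmax(self.fc(x), dim=1)   # class probabilities via Softmax
```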
In this embodiment, the shape prior-based segmentation module makes full use of the prior information of dim and small targets to obtain suspicious target regions and reduces the global number of parameters to improve algorithm efficiency, while the multi-scale feature extraction and aggregation module provides a sufficient number of feature channels for dim and small targets, which ensures their detectability and significantly reduces the false alarm rate at a high recall rate.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. An infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation, characterized by comprising the following steps:
S1: performing a Gaussian filtering operation on the input infrared original image to enhance dim and small targets;
S2: performing shape prior-based segmentation on the Gaussian-filtered infrared image to obtain a target candidate region;
S3: cropping the target candidate region and inputting it into a multi-scale feature extraction module to obtain a feature representation of the small target;
S4: inputting the feature representation of the small target into a feature aggregation network to obtain a tensor-spliced image;
S5: performing batch normalization and a nonlinear transformation on the tensor-spliced image, and outputting the target classification result through Softmax.
2. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 1, wherein S1 includes:
S11: preprocessing the infrared original image with a 3 x 3 Gaussian kernel template;
S12: performing the Gaussian filtering operation with the imfilter() function.
3. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 2, wherein the enhanced image after the Gaussian filtering operation is expressed as:
F(x, y) = I(x, y) ⊗ G(x, y)
wherein (x, y) are the coordinates of a pixel in the infrared original image, I denotes the infrared original image, G denotes the Gaussian kernel, ⊗ denotes convolution, and F denotes the enhanced image.
4. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 1, wherein S2 includes:
S21: processing the Gaussian-filtered infrared image with a shape prior-based segmentation algorithm to exclude large-size continuous background regions and high-energy noise of a preset number of pixels;
S22: in the image from which the large-size continuous background regions and the high-energy noise of a preset number of pixels have been excluded, selecting point regions with an aspect ratio of no more than 2 to fit the suspicious target, and segmenting the region around the fitted boundary to exclude strip-shaped edge regions, thereby obtaining the target candidate region.
5. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 4, wherein S21 includes:
S211: fusing the prior shape information of the dim and small target into an energy function so as to exclude large-size continuous background regions and high-energy noise of a preset number of pixels.
6. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 5, wherein S211 includes:
S2111: combining the energy of the shape prior into an energy function;
S2112: excluding large-size continuous background regions and high-energy noise of a preset number of pixels through the energy function, the energy of the region term and the energy of the boundary term.
7. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 6, wherein the expression of the energy function is:
E(L) = R(L) + B(L) + E_shape
wherein R(L) is the region term, B(L) is the boundary term, and E_shape is the shape prior term;
the combined energy expression of the region term and the boundary term is as follows:
E(L) = αR(L) + B(L);
the energy expression of the boundary term is as follows:
B(L) = Σ_{(p,q)∈N} B_{p,q}·δ(l_p ≠ l_q), where N is the set of neighboring pixel pairs, B_{p,q} is the boundary penalty between neighboring pixels p and q, and δ(l_p ≠ l_q) equals 1 when the labels differ and 0 otherwise
wherein α is a relative importance factor between the region term and the boundary term, and R_p(l_p) is the weight of assigning label l_p to pixel p.
8. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 1, wherein S3 includes:
S31: processing the Gaussian-filtered image with convolution kernels of five sizes to obtain the feature representation of the small target.
9. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 8, wherein S31 includes:
S311: using convolution kernels of five sizes, 3 × 3, 5 × 5, 7 × 7, 9 × 9 and 11 × 11, the numbers of which are 1, 2, 3, 4 and 5 respectively;
S312: the five groups of convolution kernels correspond to targets with sizes of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 respectively, the numbers of the five groups of convolution kernels are 5, 4, 3, 2 and 1 respectively, and the numbers of the corresponding feature maps are 5, 4, 3, 2 and 1, so that the 15 feature maps together with the infrared original image form a 16-channel feature map;
S313: concatenating the image and the feature maps produced by the five groups of convolution kernels and inputting them into an intermediate max pooling layer whose kernel size and stride are both set to 2, so as to obtain 32-channel mapped features, which constitute the feature representation of the small target.
10. The infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation according to claim 1, wherein S4 includes:
S41: performing convolution and pooling on the feature representation of the small target to obtain a feature map of the intermediate layer;
S42: downsampling the feature representation of the small target, decomposing pixels at the same positions into 4 sub-images during the downsampling;
S43: performing tensor splicing on the feature map of the intermediate layer and the 4 sub-images.
CN202210284099.7A 2022-03-22 2022-03-22 Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation Pending CN114842235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210284099.7A CN114842235A (en) 2022-03-22 2022-03-22 Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210284099.7A CN114842235A (en) 2022-03-22 2022-03-22 Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation

Publications (1)

Publication Number Publication Date
CN114842235A true CN114842235A (en) 2022-08-02

Family

ID=82561648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210284099.7A Pending CN114842235A (en) 2022-03-22 2022-03-22 Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation

Country Status (1)

Country Link
CN (1) CN114842235A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808685A (en) * 2024-02-29 2024-04-02 广东琴智科技研究院有限公司 Method and device for enhancing infrared image data
CN117808685B (en) * 2024-02-29 2024-05-07 广东琴智科技研究院有限公司 Method and device for enhancing infrared image data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination