CN116310709A - Lightweight infrared target detection method based on improved PF-YOLO


Info

Publication number
CN116310709A
CN116310709A
Authority
CN
China
Prior art keywords
feature
yolo
infrared target
improved
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310053253.4A
Other languages
Chinese (zh)
Inventor
王琦
李文博
高尚
于化龙
崔弘杨
陈建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202310053253.4A priority Critical patent/CN116310709A/en
Publication of CN116310709A publication Critical patent/CN116310709A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)

Abstract

The invention discloses a lightweight infrared target detection method based on improved PF-YOLO, comprising the following steps. Step 1: regenerate anchor boxes for the image and preprocess the infrared targets in the image. Step 2: extract features from the images processed in step 1 to obtain feature maps of four different sizes. Step 3: apply receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps. Step 4: bidirectionally transmit and fuse, through a feature pyramid method, the semantic and positional information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, obtaining feature maps of two different sizes. Step 5: send the two feature maps obtained in step 4 into a YOLO Head and post-process the result output by the YOLO Head to obtain the final infrared target detection result. The invention effectively addresses the low accuracy, large parameter counts, and poor real-time performance of existing methods.

Description

Lightweight infrared target detection method based on improved PF-YOLO
Technical Field
The invention relates to the technical field of infrared target detection, in particular to a lightweight infrared target detection method based on improved PF-YOLO.
Background
Infrared imaging offers strong penetrating power, long working distance, low sensitivity to weather, strong anti-interference capability, high measurement precision, and the ability to work day and night. Images obtained with it have therefore attracted extensive attention in scientific research, and market demand for them keeps growing. However, although most current methods achieve high accuracy, they were proposed for general-purpose platforms and place few demands on real-time detection. In many application fields, the method can only be deployed on an embedded platform, where real-time requirements are strict. Research into lightweight infrared target detection methods has therefore become a recent hotspot in academia and industry. Considering real-time performance, accuracy, and related factors, YOLOv4-Tiny is a comparatively good choice.
However, the inventors of the present application found that when infrared target detection is implemented directly on YOLOv4-Tiny, accuracy is unsatisfactory, because infrared images suffer from blurred edges, severe occlusion, and small targets that are hard to recognize. Moreover, although YOLOv4-Tiny already outperforms small models such as YOLOv3 and YOLOv5s in parameter count and detection speed, further lightweight optimization is still needed.
In summary, existing methods suffer from low accuracy, large parameter counts, and poor real-time performance.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a lightweight infrared target detection method based on improved PF-YOLO, intended to solve the problems of low accuracy, large parameter counts, and poor real-time performance in the prior art.
The invention provides a lightweight infrared target detection method based on improved PF-YOLO, comprising the following steps:
step 1: regenerate anchor boxes for the image and preprocess the infrared targets in the image;
step 2: extract features from the images processed in step 1 to obtain feature maps of four different sizes;
step 3: apply receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps;
step 4: bidirectionally transmit and fuse, through a feature pyramid method, the semantic and positional information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, obtaining feature maps of two different sizes;
step 5: send the two feature maps obtained in step 4 into a YOLO Head and post-process the result output by the YOLO Head to obtain the final infrared target detection result.
Further, in step 1, the anchor boxes are regenerated for the image by a K-means clustering method.
Further, in step 1, the infrared target preprocessing includes Mosaic enhancement.
Further, in step 2, feature extraction is performed on the images processed in step 1 using the feature extraction network CSPDarkNet53_Tiny as the backbone network.
Further, in step 3, target-feature enhancement is applied to the feature map first, and receptive-field enhancement second.
Further, in step 3, the target features are enhanced by a channel attention mechanism, and the network receptive field is then enlarged by a spatial pyramid pooling method.
Further, the convolutions in the spatial pyramid pooling method are depth separable convolutions.
Further, in step 4, the feature pyramid method is an improved feature pyramid method, specifically: a bottom-up pyramid structure, i.e. an additional fusion path, is added to the feature pyramid method, and the convolutions in the feature pyramid method are depth separable convolutions.
Further, in step 5, the result output by the YOLO Head is processed by the soft-NMS algorithm.
The invention has the beneficial effects that:
Among the single-stage methods with relatively good real-time performance, the invention selects YOLOv4-Tiny as the base method to guarantee real-time operation.
The invention regenerates the anchor boxes with a K-means clustering method, so that their sizes better match the actual objects.
Meanwhile, on top of the base method, the invention applies the enhancement modules in a different order from the usual practice: the backbone output is first sent into the visual attention mechanism SE to improve target localization, and the SE output is then sent into the spatial pyramid pooling module SPP to enlarge the model's receptive field. To meet the lightweight requirement, the convolutions in the SPP module are replaced with depth separable convolutions with different dilation rates.
Secondly, inspired by the bidirectional information transmission of the path aggregation network PANet, the invention proposes the P-FPN method, i.e. an improved feature pyramid method, which transmits semantic and positional information while using depth separable convolutions, improving detection precision.
When post-processing the feature maps, the soft-NMS algorithm is used to reduce the missed and false detections caused by occlusion between targets.
The invention thus effectively addresses the low accuracy, large parameter counts, and poor real-time performance of existing methods.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 is a schematic diagram of a PF-YOLO network architecture in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network architecture of YOLOv4-Tiny;
FIG. 3 is a schematic diagram of a method for preprocessing images according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of the Res_Block module, a component of YOLOv4-Tiny;
FIG. 5 is a schematic diagram of the structure of a channel attention SE Block and spatial pyramid pooling method SPP in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a structure in which the conventional convolution in the spatial pyramid pooling method SPP is replaced with a depth separable convolution in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a feature delivery module P-FPN according to an embodiment of the invention;
FIG. 8 is a schematic diagram illustrating the post-processing algorithm soft-NMS in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The invention will be further elucidated with reference to specific examples. It will be appreciated by those skilled in the art that these examples are intended to illustrate the invention and not to limit the scope of the invention, and that various equivalent modifications to the invention fall within the scope of the invention as defined in the claims appended hereto.
The invention provides a lightweight infrared target detection method based on improved PF-YOLO; its network architecture is shown in figure 1, and it comprises the following steps:
step 1: in order to enable the Anchor frame Anchor to be more consistent with the infrared target size in the data set, the invention regenerates the Anchor frame Anchor for the data set by using a K-means method. In the basic method YOLOv4-Tiny shown in fig. 2, the size of the Anchor frame Anchor is generic in view of a number of factors, and thus there is a probability that it is not the most suitable size for a particular data set. The K-means clustering algorithm divides the marked target size into a specified number of categories according to the marked target size, so that the size of the Anchor frame Anchor is more consistent with the size of an actual target, and the detection effect of the method is improved. The invention divides the target size into 6 types, and finally obtains the sizes of 6 groups of Anchor frames Anchor. In addition, when the infrared target is preprocessed, the method is added with a Mosaic image enhancement method besides preprocessing the images in the data set by using a traditional image processing mode including turning, zooming and warping. The method is a high-order version of the multi-sample data enhancement method, as shown in fig. 3, four random pictures in the data set are spliced, and then the spliced images are transmitted into a network for learning, which is equivalent to learning four pictures at a time, so that the learning efficiency is greatly improved.
Step 2: after the processing of step 1, the images are passed into the backbone network CSPDarkNet53_Tiny for feature extraction. CSPDarkNet53_Tiny is composed of CBL and Res_Block modules. The CBL module is a stack of three basic operations: a Conv2D convolution layer, a BN normalization layer, and a LeakyReLU activation layer. The Res_Block module consists of CBL modules, a residual network, and MaxPool2d max pooling. As shown in fig. 4, inside the Res_Block module the feature layer output by a 3×3 convolution is split along the channel dimension into two parts: the first part is kept as-is, the second part enters the residual network, and the result of the second part is finally fused with the first part again. After feature extraction by the backbone network, four feature maps of different sizes are output.
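The channel split-and-fuse pattern of the Res_Block module can be illustrated in isolation. This sketch only mimics the routing (keep one half of the channels, send the other half through a residual path, concatenate); the real module also contains convolutions and max pooling, and `residual_fn` is a stand-in, not a component named in the patent:

```python
import numpy as np

def csp_split_fuse(x, residual_fn):
    """Channel split-and-fuse pattern of the Res_Block module on a (C, H, W) map.

    The map is split in half along channels; the first half is kept as-is,
    the second half goes through the residual path (residual_fn), and the
    two halves are concatenated again.
    """
    c = x.shape[0]
    part1, part2 = x[: c // 2], x[c // 2:]
    return np.concatenate([part1, residual_fn(part2)], axis=0)
```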
Step 3: take the smallest feature map output in step 2 and first apply the channel attention mechanism SE Block to it, obtaining an enhanced feature map. As shown in the left half of fig. 5, SE Block consists of a global average pooling layer, a fully connected layer with a ReLU activation function, and a fully connected layer with a Sigmoid activation function. It models the interdependence between feature map channels to strengthen feature expression. Meanwhile, because the infrared targets in the data set used by the method have blurred contours, low contrast, and similar problems, the SPP spatial pyramid pooling method is introduced to process the feature map output by the SE Block, fusing features of different scales, enlarging the network receptive field, and improving the model's localization ability. As shown in the right half of fig. 5, the pipeline is roughly: an image is input, convolution layers extract features, and the spatial pyramid pooling layer extracts fixed-size features. The invention pools the feature map with windows of sizes 1, 5, 9, and 13. To improve real-time performance and lighten the method, the convolutions in the SPP module are replaced with depth separable convolutions, with the dilation rates of the main convolutions set to 0, 1, 3, 5, as shown in fig. 6.
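A minimal sketch of the SE Block computation described above on a single (C, H, W) feature map: global average pooling, two fully connected layers with ReLU and Sigmoid, then channel-wise rescaling. The weight shapes and the reduction ratio are illustrative assumptions, not values stated in the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C, C//r) weights of the first FC layer (ReLU),
    w2: (C//r, C) weights of the second FC layer (Sigmoid).
    Returns the input rescaled channel-wise by the learned attention weights.
    """
    squeeze = x.mean(axis=(1, 2))                        # global average pooling -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)               # first FC layer + ReLU
    excite = sigmoid(hidden @ w2)                        # second FC layer -> (C,) in (0, 1)
    return x * excite[:, None, None]                     # channel-wise rescaling
```

Since every excitation weight lies in (0, 1), the block can only attenuate channels relative to one another, which is how it emphasizes target-bearing channels.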
Step 4: following the feature-information transmission idea of the path aggregation network PANet, the invention improves the feature pyramid FPN module in YOLOv4-Tiny by adding a bottom-up pyramid structure before the original one and replacing the standard convolutions in the structure with depth separable convolutions, as shown in fig. 7. The semantic and positional information contained in the three feature maps output in step 2 that were not processed in step 3, and in the feature map processed in step 3, is transmitted and fused bidirectionally, yielding feature maps of two different sizes.
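One reason depth separable convolutions lighten P-FPN and the SPP module is their parameter count: a k×k standard convolution is replaced by a k×k depthwise stage (one filter per input channel) plus a 1×1 pointwise stage. The 256-channel example below is illustrative only; the patent does not specify its layer widths:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """k x k depthwise stage followed by a 1 x 1 pointwise stage (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 256 channels to 256 channels.
standard = conv_params(256, 256, 3)                   # 589824 weights
separable = depthwise_separable_params(256, 256, 3)   # 2304 + 65536 = 67840 weights
ratio = separable / standard                          # about 0.115, i.e. ~8.7x fewer
```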
Step 5: the output obtained in step 4 is sent into the YOLO Head, which outputs tensors of the desired size and generates all detection boxes on the whole feature map according to the Anchor sizes set in step 1, ready for post-processing. Meanwhile, the original NMS non-maximum suppression algorithm is replaced with the soft-NMS algorithm for post-processing the YOLO Head output, producing the final infrared target detection result. During post-processing, NMS sorts the detection boxes by score, keeps the highest-scoring box, and deletes every other box whose overlap with it exceeds a certain proportion. Such an algorithm may mistakenly delete the detection box of a neighboring object and is therefore unsuitable where targets are densely arranged. In the data set, the infrared targets in some images occlude and overlap one another; to mitigate this, NMS is changed to soft-NMS. Soft-NMS likewise ranks the detection boxes by score and keeps the highest-scoring box, but merely lowers the scores of boxes whose overlap exceeds the threshold; that is, an attenuation function based on the amount of overlap is applied to neighboring detection boxes instead of setting their scores to zero outright. Soft-NMS therefore alleviates, to a certain extent, the mistaken deletion caused by occlusion. Taking the case shown in fig. 8 as an example, detection boxes 1 and 2 overlap heavily; with the conventional NMS algorithm, box 2 may be deleted by mistake while detecting the target in box 1, whereas soft-NMS lowers the score of box 2 instead of deleting it, so the box still exists among the candidates when the target in box 2 is detected.
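The soft-NMS behaviour described above can be sketched as follows. The Gaussian decay `exp(-iou**2 / sigma)` and the default thresholds are common choices from the soft-NMS literature, not values stated in the patent:

```python
import numpy as np

def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.5, sigma=0.5, score_thresh=0.001):
    """Soft-NMS sketch: decay overlapping scores instead of deleting boxes.

    Boxes whose IoU with the kept box exceeds iou_thresh have their score
    multiplied by exp(-iou^2 / sigma); boxes falling below score_thresh
    are discarded. Returns kept indices in processing order.
    """
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    idxs = list(range(len(boxes)))
    keep = []
    while idxs:
        best = max(idxs, key=lambda i: scores[i])   # highest remaining score
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = _iou(boxes[best], boxes[i])
            if iou > iou_thresh:
                scores[i] *= np.exp(-(iou ** 2) / sigma)
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```

With the default low score threshold, a heavily overlapped neighbor survives with a reduced score, matching the fig. 8 scenario; hard NMS would have deleted it outright.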
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (9)

1. The lightweight infrared target detection method based on improved PF-YOLO, characterized by comprising the following steps:
step 1: regenerating anchor boxes for the image, and preprocessing the infrared targets in the image;
step 2: extracting features from the images processed in step 1 to obtain feature maps of four different sizes;
step 3: applying receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps;
step 4: bidirectionally transmitting and fusing, through a feature pyramid method, the semantic information and position information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, to obtain feature maps of two different sizes;
step 5: sending the two feature maps obtained in step 4 into a YOLO Head, and post-processing the result output by the YOLO Head to obtain a final infrared target detection result.
2. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 1, anchor boxes are regenerated for the image by a K-means clustering method.
3. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1 or 2, wherein the infrared target preprocessing in step 1 includes Mosaic enhancement.
4. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 2, feature extraction is performed on the images processed in step 1 using the feature extraction network CSPDarkNet53_Tiny as the backbone network.
5. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 3, target-feature enhancement is applied to the feature map first, and receptive-field enhancement second.
6. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1 or 5, wherein in step 3, the target features are enhanced by a channel attention mechanism, and the network receptive field is then enlarged by a spatial pyramid pooling method.
7. The lightweight infrared target detection method based on improved PF-YOLO according to claim 6, wherein the convolutions in the spatial pyramid pooling method are depth separable convolutions.
8. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 4, the feature pyramid method is an improved feature pyramid method, specifically: a bottom-up pyramid structure, i.e. an additional fusion path, is added to the feature pyramid method, and the convolutions in the feature pyramid method are depth separable convolutions.
9. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 5, the result output by the YOLO Head is processed by the soft-NMS algorithm.
CN202310053253.4A 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO Pending CN116310709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310053253.4A CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310053253.4A CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Publications (1)

Publication Number Publication Date
CN116310709A 2023-06-23

Family

ID=86798610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310053253.4A Pending CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Country Status (1)

Country Link
CN (1) CN116310709A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120019A (en) * 2021-11-08 2022-03-01 Guizhou University Lightweight target detection method
CN115424104A (en) * 2022-08-19 2022-12-02 Xidian University Target detection method based on feature fusion and attention mechanism
CN115588112A (en) * 2022-09-06 2023-01-10 Jiangsu University of Science and Technology Target detection method based on RFEF-YOLO



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination