CN116310709A - Lightweight infrared target detection method based on improved PF-YOLO


Info

Publication number
CN116310709A
CN116310709A
Authority
CN
China
Prior art keywords
feature
yolo
infrared target
improved
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310053253.4A
Other languages
Chinese (zh)
Inventor
王琦
李文博
高尚
于化龙
崔弘杨
陈建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202310053253.4A priority Critical patent/CN116310709A/en
Publication of CN116310709A publication Critical patent/CN116310709A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)

Abstract

The invention discloses a lightweight infrared target detection method based on improved PF-YOLO, comprising the following steps. Step 1: regenerate anchor boxes for the image and preprocess the infrared targets in the image. Step 2: extract features from the images processed in step 1 to obtain feature maps of four different sizes. Step 3: apply receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps. Step 4: bidirectionally transmit and fuse, through a feature pyramid method, the semantic and positional information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, obtaining feature maps of two different sizes. Step 5: send the two feature maps obtained in step 4 into a YOLO Head and post-process the result output by the YOLO Head to obtain the final infrared target detection result. The invention effectively addresses the low accuracy, large parameter counts, and poor real-time performance of existing methods.

Description

Lightweight infrared target detection method based on improved PF-YOLO
Technical Field
The invention relates to the technical field of infrared target detection, in particular to a lightweight infrared target detection method based on improved PF-YOLO.
Background
Infrared imaging offers strong penetrating power, long working distance, low sensitivity to weather, strong anti-interference capability, high measurement precision, and the ability to work day and night. Images obtained with it have therefore attracted extensive attention in scientific research, and market demand for them keeps growing. However, although most current methods achieve high accuracy, they were proposed for general-purpose platforms and place few demands on real-time detection. In many application fields, the method can only be deployed on an embedded platform, where real-time requirements are strict. Research into lightweight infrared target detection methods has therefore become a recent hotspot in academia and industry. Considering real-time performance, accuracy, and related factors, YOLOv4-Tiny is a comparatively good choice.
However, the inventors of the present application found that when infrared target detection is implemented directly on YOLOv4-Tiny, accuracy is unsatisfactory, because infrared images suffer from blurred edges, severe occlusion, and small targets that are hard to recognize. Moreover, although YOLOv4-Tiny already outperforms small models such as YOLOv3 and YOLOv5s in parameter count and detection speed, further lightweight optimization is still needed.
In summary, existing methods suffer from low accuracy, large parameter counts, and poor real-time performance.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a lightweight infrared target detection method based on improved PF-YOLO, intended to solve the problems of low accuracy, large parameter counts, and poor real-time performance in the prior art.
The invention provides a lightweight infrared target detection method based on improved PF-YOLO, comprising the following steps:
step 1: regenerate anchor boxes for the image and preprocess the infrared targets in the image;
step 2: extract features from the images processed in step 1 to obtain feature maps of four different sizes;
step 3: apply receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps;
step 4: bidirectionally transmit and fuse, through a feature pyramid method, the semantic and positional information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, obtaining feature maps of two different sizes;
step 5: send the two feature maps obtained in step 4 into a YOLO Head and post-process the result output by the YOLO Head to obtain the final infrared target detection result.
Further, in step 1, the anchor boxes are regenerated for the image by a K-means clustering method.
Further, in step 1, the infrared target preprocessing includes Mosaic enhancement.
Further, in step 2, feature extraction is performed on the images processed in step 1 using the feature extraction network CSPDarkNet53_Tiny as the backbone network.
Further, in step 3, target-feature enhancement is applied to the feature map first, and receptive-field enhancement second.
Further, in step 3, the target features are enhanced by a channel attention mechanism, and the network receptive field is then enlarged by a spatial pyramid pooling method.
Further, the convolutions in the spatial pyramid pooling method are depth separable convolutions.
Further, in step 4, the feature pyramid method is an improved feature pyramid method, specifically: a bottom-up pyramid structure, i.e. an additional fusion path, is added to the feature pyramid method, and the convolutions in the feature pyramid method are depth separable convolutions.
Further, in step 5, the result output by the YOLO Head is processed by the soft-NMS algorithm.
The invention has the beneficial effects that:
Among the single-stage methods with relatively good real-time performance, the invention selects YOLOv4-Tiny as the base method to guarantee real-time operation.
The invention regenerates the anchor boxes with a K-means clustering method, so that their sizes better match the actual objects.
Meanwhile, on top of the base method, the invention applies the enhancement modules in a different order from the usual practice: the backbone output is first sent into the visual attention mechanism SE to improve target localization, and the SE output is then sent into the spatial pyramid pooling module SPP to enlarge the model's receptive field. To meet the lightweight requirement, the convolutions in the SPP module are replaced with depth separable convolutions with different dilation rates.
Secondly, inspired by the bidirectional information transmission of the path aggregation network PANet, the invention proposes the P-FPN method, i.e. an improved feature pyramid method, which transmits semantic and positional information while using depth separable convolutions, improving detection precision.
When post-processing the feature maps, the soft-NMS algorithm is used to reduce the missed and false detections caused by occlusion between targets.
The invention thus effectively addresses the low accuracy, large parameter counts, and poor real-time performance of existing methods.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 is a schematic diagram of a PF-YOLO network architecture in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network architecture of YOLOv4-Tiny;
FIG. 3 is a schematic diagram of a method for preprocessing images according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of the Res_Block module, a component of YOLOv4-Tiny;
FIG. 5 is a schematic diagram of the structure of a channel attention SE Block and spatial pyramid pooling method SPP in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a structure in which the conventional convolution in the spatial pyramid pooling method SPP is replaced with a depth separable convolution in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a feature delivery module P-FPN according to an embodiment of the invention;
FIG. 8 is a schematic diagram illustrating the post-processing algorithm soft-NMS in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The invention will be further elucidated with reference to specific examples. It will be appreciated by those skilled in the art that these examples are intended to illustrate the invention and not to limit the scope of the invention, and that various equivalent modifications to the invention fall within the scope of the invention as defined in the claims appended hereto.
The invention provides a lightweight infrared target detection method based on improved PF-YOLO; its network architecture is shown in figure 1, and it comprises the following steps:
step 1: in order to enable the Anchor frame Anchor to be more consistent with the infrared target size in the data set, the invention regenerates the Anchor frame Anchor for the data set by using a K-means method. In the basic method YOLOv4-Tiny shown in fig. 2, the size of the Anchor frame Anchor is generic in view of a number of factors, and thus there is a probability that it is not the most suitable size for a particular data set. The K-means clustering algorithm divides the marked target size into a specified number of categories according to the marked target size, so that the size of the Anchor frame Anchor is more consistent with the size of an actual target, and the detection effect of the method is improved. The invention divides the target size into 6 types, and finally obtains the sizes of 6 groups of Anchor frames Anchor. In addition, when the infrared target is preprocessed, the method is added with a Mosaic image enhancement method besides preprocessing the images in the data set by using a traditional image processing mode including turning, zooming and warping. The method is a high-order version of the multi-sample data enhancement method, as shown in fig. 3, four random pictures in the data set are spliced, and then the spliced images are transmitted into a network for learning, which is equivalent to learning four pictures at a time, so that the learning efficiency is greatly improved.
Step 2: after the processing of step 1, the images are passed into the backbone network CSPDarkNet53_Tiny for feature extraction. CSPDarkNet53_Tiny is composed of CBL and Res_Block modules. The CBL module is a stack of three basic operations: a Conv2D convolution layer, a BN normalization layer, and a LeakyReLU activation layer. The Res_Block module consists of CBL modules, a residual network, and MaxPool2d max pooling. As shown in fig. 4, inside the Res_Block module the feature layer output by a 3×3 convolution is split along the channel dimension into two parts: the first part is kept as-is, the second part enters the residual network, and the result of the second part is finally fused with the first part again. After feature extraction by the backbone network, four feature maps of different sizes are output.
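The channel split-and-fuse pattern of the Res_Block module can be illustrated in isolation. This sketch only mimics the routing (keep one half of the channels, send the other half through a residual path, concatenate); the real module also contains convolutions and max pooling, and `residual_fn` is a stand-in, not a component named in the patent:

```python
import numpy as np

def csp_split_fuse(x, residual_fn):
    """Channel split-and-fuse pattern of the Res_Block module on a (C, H, W) map.

    The map is split in half along channels; the first half is kept as-is,
    the second half goes through the residual path (residual_fn), and the
    two halves are concatenated again.
    """
    c = x.shape[0]
    part1, part2 = x[: c // 2], x[c // 2:]
    return np.concatenate([part1, residual_fn(part2)], axis=0)
```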
Step 3: take the smallest feature map output in step 2 and first apply the channel attention mechanism SE Block to it, obtaining an enhanced feature map. As shown in the left half of fig. 5, SE Block consists of a global average pooling layer, a fully connected layer with a ReLU activation function, and a fully connected layer with a Sigmoid activation function. It models the interdependence between feature map channels to strengthen feature expression. Meanwhile, because the infrared targets in the data set used by the method have blurred contours, low contrast, and similar problems, the SPP spatial pyramid pooling method is introduced to process the feature map output by the SE Block, fusing features of different scales, enlarging the network receptive field, and improving the model's localization ability. As shown in the right half of fig. 5, the pipeline is roughly: an image is input, convolution layers extract features, and the spatial pyramid pooling layer extracts fixed-size features. The invention pools the feature map with windows of sizes 1, 5, 9, and 13. To improve real-time performance and lighten the method, the convolutions in the SPP module are replaced with depth separable convolutions, with the dilation rates of the main convolutions set to 0, 1, 3, 5, as shown in fig. 6.
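A minimal sketch of the SE Block computation described above on a single (C, H, W) feature map: global average pooling, two fully connected layers with ReLU and Sigmoid, then channel-wise rescaling. The weight shapes and the reduction ratio are illustrative assumptions, not values stated in the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C, C//r) weights of the first FC layer (ReLU),
    w2: (C//r, C) weights of the second FC layer (Sigmoid).
    Returns the input rescaled channel-wise by the learned attention weights.
    """
    squeeze = x.mean(axis=(1, 2))                        # global average pooling -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)               # first FC layer + ReLU
    excite = sigmoid(hidden @ w2)                        # second FC layer -> (C,) in (0, 1)
    return x * excite[:, None, None]                     # channel-wise rescaling
```

Since every excitation weight lies in (0, 1), the block can only attenuate channels relative to one another, which is how it emphasizes target-bearing channels.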
Step 4: following the feature-information transmission idea of the path aggregation network PANet, the invention improves the feature pyramid FPN module in YOLOv4-Tiny by adding a bottom-up pyramid structure before the original one and replacing the standard convolutions in the structure with depth separable convolutions, as shown in fig. 7. The semantic and positional information contained in the three feature maps output in step 2 that were not processed in step 3, and in the feature map processed in step 3, is transmitted and fused bidirectionally, yielding feature maps of two different sizes.
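One reason depth separable convolutions lighten P-FPN and the SPP module is their parameter count: a k×k standard convolution is replaced by a k×k depthwise stage (one filter per input channel) plus a 1×1 pointwise stage. The 256-channel example below is illustrative only; the patent does not specify its layer widths:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """k x k depthwise stage followed by a 1 x 1 pointwise stage (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 256 channels to 256 channels.
standard = conv_params(256, 256, 3)                   # 589824 weights
separable = depthwise_separable_params(256, 256, 3)   # 2304 + 65536 = 67840 weights
ratio = separable / standard                          # about 0.115, i.e. ~8.7x fewer
```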
Step 5: the output obtained in step 4 is sent into the YOLO Head, which outputs tensors of the desired size and generates all detection boxes on the whole feature map according to the Anchor sizes set in step 1, ready for post-processing. Meanwhile, the original NMS non-maximum suppression algorithm is replaced with the soft-NMS algorithm for post-processing the YOLO Head output, producing the final infrared target detection result. During post-processing, NMS sorts the detection boxes by score, keeps the highest-scoring box, and deletes every other box whose overlap with it exceeds a certain proportion. Such an algorithm may mistakenly delete the detection box of a neighboring object and is therefore unsuitable where targets are densely arranged. In the data set, the infrared targets in some images occlude and overlap one another; to mitigate this, NMS is changed to soft-NMS. Soft-NMS likewise ranks the detection boxes by score and keeps the highest-scoring box, but merely lowers the scores of boxes whose overlap exceeds the threshold; that is, an attenuation function based on the amount of overlap is applied to neighboring detection boxes instead of setting their scores to zero outright. Soft-NMS therefore alleviates, to a certain extent, the mistaken deletion caused by occlusion. Taking the case shown in fig. 8 as an example, detection boxes 1 and 2 overlap heavily; with the conventional NMS algorithm, box 2 may be deleted by mistake while detecting the target in box 1, whereas soft-NMS lowers the score of box 2 instead of deleting it, so the box still exists among the candidates when the target in box 2 is detected.
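The soft-NMS behaviour described above can be sketched as follows. The Gaussian decay `exp(-iou**2 / sigma)` and the default thresholds are common choices from the soft-NMS literature, not values stated in the patent:

```python
import numpy as np

def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.5, sigma=0.5, score_thresh=0.001):
    """Soft-NMS sketch: decay overlapping scores instead of deleting boxes.

    Boxes whose IoU with the kept box exceeds iou_thresh have their score
    multiplied by exp(-iou^2 / sigma); boxes falling below score_thresh
    are discarded. Returns kept indices in processing order.
    """
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    idxs = list(range(len(boxes)))
    keep = []
    while idxs:
        best = max(idxs, key=lambda i: scores[i])   # highest remaining score
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = _iou(boxes[best], boxes[i])
            if iou > iou_thresh:
                scores[i] *= np.exp(-(iou ** 2) / sigma)
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```

With the default low score threshold, a heavily overlapped neighbor survives with a reduced score, matching the fig. 8 scenario; hard NMS would have deleted it outright.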
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (9)

1. The lightweight infrared target detection method based on improved PF-YOLO, characterized by comprising the following steps:
step 1: regenerating anchor boxes for the image, and preprocessing the infrared targets in the image;
step 2: extracting features from the images processed in step 1 to obtain feature maps of four different sizes;
step 3: applying receptive-field enhancement and target-feature enhancement to the smallest of the four feature maps;
step 4: bidirectionally transmitting and fusing, through a feature pyramid method, the semantic information and position information contained in the three feature maps not processed in step 3 and in the feature map processed in step 3, to obtain feature maps of two different sizes;
step 5: sending the two feature maps obtained in step 4 into a YOLO Head, and post-processing the result output by the YOLO Head to obtain a final infrared target detection result.
2. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 1, anchor boxes are regenerated for the image by a K-means clustering method.
3. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1 or 2, wherein the infrared target preprocessing in step 1 includes Mosaic enhancement.
4. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 2, feature extraction is performed on the images processed in step 1 using the feature extraction network CSPDarkNet53_Tiny as the backbone network.
5. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 3, target-feature enhancement is applied to the feature map first, and receptive-field enhancement second.
6. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1 or 5, wherein in step 3, the target features are enhanced by a channel attention mechanism, and the network receptive field is then enlarged by a spatial pyramid pooling method.
7. The lightweight infrared target detection method based on improved PF-YOLO according to claim 6, wherein the convolutions in the spatial pyramid pooling method are depth separable convolutions.
8. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 4, the feature pyramid method is an improved feature pyramid method, specifically: a bottom-up pyramid structure, i.e. an additional fusion path, is added to the feature pyramid method, and the convolutions in the feature pyramid method are depth separable convolutions.
9. The lightweight infrared target detection method based on improved PF-YOLO according to claim 1, wherein in step 5, the result output by the YOLO Head is processed by the soft-NMS algorithm.
CN202310053253.4A 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO Pending CN116310709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310053253.4A CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310053253.4A CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Publications (1)

Publication Number Publication Date
CN116310709A 2023-06-23

Family

ID=86798610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310053253.4A Pending CN116310709A (en) 2023-02-03 2023-02-03 Lightweight infrared target detection method based on improved PF-YOLO

Country Status (1)

Country Link
CN (1) CN116310709A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120019A (en) * 2021-11-08 2022-03-01 Guizhou University Lightweight target detection method
CN115424104A (en) * 2022-08-19 2022-12-02 Xidian University Target detection method based on feature fusion and attention mechanism
CN115588112A (en) * 2022-09-06 2023-01-10 Jiangsu University of Science and Technology Target detection method based on RFEF-YOLO



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination