CN116229217A - Infrared target detection method applied to complex environment - Google Patents

Infrared target detection method applied to complex environment

Info

Publication number
CN116229217A
Authority
CN
China
Prior art keywords
feature
module
infrared
feature map
network
Prior art date
Legal status
Pending
Application number
CN202310369678.6A
Other languages
Chinese (zh)
Inventor
王慧
虞继敏
周尚波
李舜
吴涛
张鑫
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202310369678.6A
Publication of CN116229217A
Legal status: Pending

Classifications

    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/10048 Infrared image
    • G06V2201/07 Target detection


Abstract

The invention belongs to the field of infrared target detection and specifically relates to an infrared target detection method applied to complex environments, comprising the following steps: acquiring an infrared image to be detected and preprocessing it; extracting features at different scales from the infrared image with a backbone feature extraction network; performing enhanced fusion of the multi-scale features with a neck enhanced feature extraction network to obtain fused feature maps; and feeding the fused effective feature maps into a prediction output network to obtain the target detection result. The method effectively improves infrared target detection accuracy, performs better on infrared targets that are easily occluded in complex scenes, significantly reduces the number of parameters, and meets real-time detection requirements.

Description

Infrared target detection method applied to complex environment
Technical Field
The invention belongs to the field of infrared target detection, and particularly relates to an infrared target detection method applied to a complex environment.
Background
Infrared images are formed from thermal radiation and offer outstanding advantages such as long target detection range, strong concealment, and availability both day and night. As imaging ranges expand, the demand for intelligent target detection methods for infrared images keeps growing. Conventional infrared image target detection methods include threshold-based methods, edge-detection-based methods, and the like, but such methods are suitable only for detection in a single scene. Because real environments are complex and infrared targets have weak features, accurate detection becomes difficult: detection models struggle to extract important features, especially for targets occluded by obstacles, so their practicality is poor. Detection methods based on convolutional neural networks can automatically learn features from the input data, are robust to changes in complex environments, and are highly adaptable.
Among existing infrared target detection methods is a method for detecting infrared targets in complex scenes (patent application No. CN202210207336.X). It performs Mosaic data augmentation on the input infrared image; optimizes and improves the structure of the CSPDarknet53 feature extraction network and adds an ECA attention module to it; slices the input image with a Focus structure and applies several convolutions, then extracts feature information with the optimized CSPDarknet53 network to obtain feature maps at different scales, adding an SPP module after the feature extraction network to mitigate the accuracy loss caused by target scale variation; fuses the high-level strong semantic information of the smallest feature map with low-level strong localization features through a feature pyramid network and a path aggregation network, combining the two structures to obtain detection layers at different scales that carry both strong semantic and strong localization features; and uses the Varifocal Loss as the loss function for the confidence and class probability of detected objects, realizing multi-scale detection and producing different prediction boxes. In this method, the input infrared image is processed by an improved backbone feature extraction network, feature information at different scales is fused by combining the feature pyramid and path aggregation structures, the network loss function is optimized, feature maps at different scales are finally predicted, and detection of densely occluded objects is improved with Distance-IoU (DIoU)-based non-maximum suppression, so the method can be widely applied in fields such as autonomous driving and night security.
However, this method has the following problems: 1. the CSPDarknet53 feature extraction module has too many parameters and produces redundant feature maps during feature extraction; 2. its multi-scale feature fusion needs strengthening, and its resistance to background interference is weak.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an infrared target detection method applied to complex environments, which comprises the following steps: acquiring an infrared image to be detected and preprocessing the infrared image; and inputting the preprocessed infrared image into a trained infrared target detection model to obtain a detection result; the infrared target detection model comprises a backbone feature extraction network, a neck enhanced feature extraction network and prediction output feature layers.
The process of training the infrared target detection model comprises the following steps:
S1: acquiring a training data set, the training data set comprising infrared images and the category labels corresponding to the infrared images;
S2: preprocessing the infrared images in the training data set and feeding the preprocessed infrared images into the infrared target detection model for training;
S3: extracting features at different scales from the infrared image with the backbone feature extraction network;
S4: performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network to obtain fused feature maps;
S5: feeding the fused feature maps into the prediction output feature layer to obtain the target detection result;
S6: calculating the model loss function from the target detection result, iteratively adjusting the model parameters, evaluating the accuracy of the detection result with the performance evaluation indexes, and completing training when the accuracy meets the requirement.
Preferably, the backbone feature extraction network comprises a GSeConv module, a C3Ghost module and an SPPF module: the GSeConv module extracts shallow features of the infrared image, the C3Ghost module reduces redundant information in the shallow features, and the SPPF module enlarges the receptive field of the network and gathers context information after the redundant information has been removed.
Further, the GSeConv module extracts shallow features of the infrared image as follows: the channels of the input are compressed with a 1×1 convolution whose number of kernels is half the number of input channels; the output of the 1×1 convolution is reconstructed with a 3×3 depthwise (layer-by-layer) convolution to obtain a mixed feature map; the mixed feature map is split along the channel dimension into two groups; the first group is superimposed, along the channel dimension, with the condensed features produced by a pointwise convolution; and the superimposed maps are concatenated with the second group to obtain the output.
Preferably, performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network EPANet comprises: passing the 32× downsampled feature map extracted by the backbone through the SPPF module to obtain an output feature map of size 20×20; transforming the size of the 32× downsampled feature map with a 1×1 convolution module and an upsampling module, and concatenating the transformed feature map with the 16× downsampled backbone feature map to obtain feature map A; feeding feature map A into a C3GS module and an upsampling module to extract further features and superimposing the 8× downsampled backbone feature map to obtain feature map B, realizing the bottom-up fusion process; and feeding feature map B into C3GS and downsampling modules to obtain the 40×40 and 80×80 fused feature maps.
Preferably, the C3GS feature extraction module of the neck network reduces the number of parameters without losing accuracy. In this module the channels are split into two branches: one branch is passed through unchanged, while the other first extracts features with the GSeConv module and then weights the corresponding feature information with a SimAM attention mechanism, highlighting useful feature details in the image; the concatenated feature maps are then shuffled to promote information exchange between channels and strengthen the learning capacity of the network. The neck multi-scale feature fusion network EPANet realizes bottom-up and top-down information fusion by upsampling and downsampling the features. To obtain a better fusion effect, nodes that contribute little to the network are removed, reducing the network depth and making the neck lighter; a fusion edge is then added between the original shallow network and the bottom output node of the neck to fuse higher-level features and shorten the propagation path of context information, thereby extracting features with richer semantic information.
Preferably, the model loss function is expressed as:
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + α·v
v = (4/π²) · (arctan(w^gt / h^gt) − arctan(w / h))²,   α = v / ((1 − IoU) + v)
wherein IoU denotes the intersection-over-union of the predicted box and the ground-truth box, ρ²(b, b^gt) denotes the squared distance between the center points of the predicted box and the ground-truth box, ρ denotes the Euclidean distance between the two center points, b denotes the coordinates of the predicted center point, b^gt denotes the coordinates of the ground-truth center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, α is a balancing parameter, v measures the consistency of the aspect ratios, w^gt and h^gt denote the width and height of the ground-truth box, and w and h denote the width and height of the predicted box.
Preferably, the performance evaluation indexes are calculated as:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
where precision denotes the detection precision, TP is the number of detection boxes whose intersection-over-union with the ground truth exceeds 0.5, FP is the number of detection boxes whose intersection-over-union with the ground truth is at most 0.5, recall denotes the recall rate, and FN is the number of ground-truth targets that are not detected.
The beneficial effects are that:
the method effectively improves the detection precision of the infrared targets, has better detection effect on the infrared targets which are easy to be blocked in complex scenes, simultaneously remarkably reduces the quantity of parameters, meets the real-time detection requirement, and is friendly to the deployment of edge equipment.
Drawings
FIG. 1 is a flowchart of an infrared target detection algorithm in an embodiment of the invention;
FIG. 2 is a block diagram of a GSeConv module in an embodiment of the invention;
FIG. 3 is a block diagram of a C3Ghost module in an embodiment of the invention;
FIG. 4 is a block diagram of a C3GS module according to an embodiment of the invention;
FIG. 5 is a block diagram of a SimAM module according to an embodiment of the present invention;
FIG. 6 is a diagram of an EPANet network in an embodiment of the present invention;
FIG. 7 is a diagram of an infrared detection model overall framework in an embodiment of the invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
An embodiment of an infrared target detection method applied to a complex environment comprises the following steps: acquiring an infrared image to be detected and preprocessing the infrared image; and inputting the preprocessed infrared image into a trained infrared target detection model to obtain a detection result; the infrared target detection model comprises a backbone feature extraction network, a neck enhanced feature extraction network and prediction output feature layers. Specifically, as shown in fig. 1, a given infrared image is first passed through the shallow layers of the backbone, which extract fine-grained information such as basic contours and textures; the main feature extraction of the backbone is carried out by the GSeConv and C3Ghost modules. The multi-scale features extracted by the backbone are then fused by the neck EPANet network, further strengthening the extraction of target features; the main feature extraction module of the neck is the C3GS module. Finally, the prediction output feature layers use the three effective feature layers enhanced by the fusion network to classify the target category and regress the bounding-box position, producing the output result.
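For orientation, a minimal PyTorch-style skeleton of this three-stage composition is sketched below. The class and argument names are invented for illustration; the concrete backbone (GSeConv/C3Ghost/SPPF), neck (EPANet) and prediction heads must be supplied by the caller and are not reproduced here.

```python
import torch.nn as nn

class InfraredDetector(nn.Module):
    """Illustrative three-stage detector: backbone -> neck -> per-scale prediction heads."""
    def __init__(self, backbone: nn.Module, neck: nn.Module, heads: nn.ModuleList):
        super().__init__()
        self.backbone = backbone   # GSeConv / C3Ghost / SPPF feature extractor
        self.neck = neck           # EPANet-style multi-scale fusion
        self.heads = heads         # one prediction head per fused scale

    def forward(self, x):
        p3, p4, p5 = self.backbone(x)                      # features at 8x / 16x / 32x downsampling
        fused = self.neck(p3, p4, p5)                      # three enhanced, fused feature maps
        return [h(f) for h, f in zip(self.heads, fused)]   # class scores and box regressions per scale
```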
In this embodiment, a method for training the infrared target detection model is provided; its steps are listed below, and a minimal training-loop sketch follows the list. The method comprises:
S1: acquiring a training data set, the training data set comprising infrared images and the category labels corresponding to the infrared images;
S2: preprocessing the infrared images in the training data set and feeding the preprocessed infrared images into the infrared target detection model for training;
S3: extracting features at different scales from the infrared image with the backbone feature extraction network;
S4: performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network to obtain fused feature maps;
S5: feeding the fused feature maps into the prediction output feature layer to obtain the target detection result;
S6: calculating the model loss function from the target detection result, iteratively adjusting the model parameters, evaluating the accuracy of the detection result with the performance evaluation indexes, and completing training when the accuracy meets the requirement.
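The sketch below is a minimal training loop consistent with steps S1-S6, assuming a PyTorch model, data loaders and a CIoU-based criterion are available. The learning rate and the `evaluate` helper are assumptions; the momentum, weight decay and 100 training iterations follow the simulation settings reported later.

```python
import torch

def train(model, train_loader, val_loader, criterion, epochs=100, device="cuda"):
    model.to(device)
    # SGD with momentum 0.9 and weight decay 0.005 as reported; the learning rate is an assumed value
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=0.005)
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:     # S2-S5: preprocessed images through backbone, neck and heads
            preds = model(images.to(device))
            loss = criterion(preds, targets)     # S6: loss computed from the detection results
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                     # S6: adjust the model parameters
        evaluate(model, val_loader)              # S6: hypothetical helper computing precision/recall/mAP
```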
As shown in fig. 7, the backbone feature extraction network includes a GSeConv module, a C3Ghost module and an SPPF module: the GSeConv module extracts shallow features of the infrared image, the C3Ghost module reduces redundant information in the shallow features, and the SPPF module enlarges the receptive field of the network and gathers context information after the redundant information has been removed.
Specifically, the shallow layers contain fine-grained feature information such as basic contours and textures. Insufficient extraction of this information in the shallow network causes partial loss of information about the target to be detected and blurs the expression of global features, which directly degrades the feature extraction quality of the deeper network and thus the detection accuracy of the model. As shown in fig. 2, the GSeConv module is the core feature extraction module of the shallow part of the backbone; it effectively suppresses interference from irrelevant information, reduces the loss of shallow features, and strengthens the expression of target feature information. The module operates as follows: the channels of the input feature map are first compressed with a 1×1 convolution whose number of kernels is set to half the number of output channels, and the features are then reconstructed with few parameters by a 3×3 depthwise (layer-by-layer) convolution. The resulting mixed feature map is split along the channel dimension into two groups; one group is superimposed, along the channel dimension, with the condensed features produced by a pointwise convolution, and the other group is mapped directly to the next layer; the superimposed maps are then convolved to produce the output. As shown in fig. 3, C3Ghost adopts a CSP architecture, which addresses the problem of duplicated gradient information in the network by routing gradients along different paths so that the propagated gradient information exhibits correlation differences; the purpose of the CSP architecture is to improve accuracy while reducing inference cost.
The neck enhanced feature extraction network performs enhanced fusion of the multi-scale features. The effective feature layers produced by the backbone are fused with neck features after upsampling and downsampling, realizing the bottom-up and top-down fusion processes. To obtain a better fusion effect, nodes that contribute little to the network are removed, reducing the network depth and making the neck lighter; a fusion edge is then added between the original shallow network and the bottom output node of the neck to fuse higher-level features and shorten the propagation path of context information, thereby extracting features with richer semantic information. The C3GS feature extraction module of the neck reduces the number of parameters without losing accuracy: the feature map is split along the channel dimension, one of the split groups is fed into a GSeConv module to extract deep features, the deep feature information is weighted by a SimAM attention mechanism, and the concatenated output feature maps are shuffled across channels to promote information exchange and strengthen the learning capacity of the network.
The GSeConv module extracts shallow features of the infrared image as follows: the channels of the input are compressed with a 1×1 convolution whose number of kernels is half the number of input channels; the output of the 1×1 convolution is reconstructed with a 3×3 depthwise (layer-by-layer) convolution to obtain a mixed feature map; the mixed feature map is split along the channel dimension into two groups; the first group is superimposed, along the channel dimension, with the condensed features produced by a pointwise convolution; and the superimposed maps are concatenated with the second group to obtain the output.
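One possible PyTorch wiring of such a block, inferred from the textual description (1×1 compression, 3×3 depthwise reconstruction, channel split, pointwise "condensed" features, concatenation), is sketched below. The exact channel arithmetic in the patent may differ; this version assumes the output channel count is divisible by 4.

```python
import torch
import torch.nn as nn

class GSeConv(nn.Module):
    """Hedged sketch of a GSeConv-style block; not the patent's exact layout."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        half = out_channels // 2
        self.compress = nn.Conv2d(in_channels, half, 1, bias=False)             # 1x1 channel compression
        self.dw = nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False)  # 3x3 depthwise reconstruction
        self.pw = nn.Conv2d(half // 2, half, 1, bias=False)                     # pointwise conv -> condensed features

    def forward(self, x):
        mixed = self.dw(self.compress(x))            # mixed feature map
        g1, g2 = torch.chunk(mixed, 2, dim=1)        # split into two channel groups
        g1 = torch.cat([g1, self.pw(g1)], dim=1)     # superimpose condensed pointwise features on group 1
        return torch.cat([g1, g2], dim=1)            # splice with group 2 -> out_channels in total
```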
The C3Ghost module comprises a residual network branch and a convolution branch, the residual network branch being composed of a plurality of residual structures. The input information is fed into the residual network branch and the convolution branch respectively: the residual network branch extracts deep features of the input information and increases the back-propagated gradient values between layers, while the convolution branch directly extracts features of the input information. The feature maps extracted by the two branches are superimposed along the channel dimension to obtain the output feature map.
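A hedged sketch of a CSP-style block matching this description follows: a branch of stacked residual units using a cheap depthwise ("ghost"-like) operation, a plain convolution branch, and channel-wise concatenation. The internal layer choices (activation, kernel sizes, block count) are illustrative, not the patent's exact design.

```python
import torch
import torch.nn as nn

class GhostResidual(nn.Module):
    """One residual unit: pointwise conv followed by a cheap depthwise conv, with a skip connection."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.SiLU(),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)   # skip connection strengthens back-propagated gradients

class C3GhostSketch(nn.Module):
    def __init__(self, channels, n=2):
        super().__init__()
        half = channels // 2
        self.res_in = nn.Conv2d(channels, half, 1, bias=False)
        self.res_branch = nn.Sequential(*[GhostResidual(half) for _ in range(n)])  # deep-feature branch
        self.conv_branch = nn.Conv2d(channels, half, 1, bias=False)                # direct convolution branch
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        a = self.res_branch(self.res_in(x))
        b = self.conv_branch(x)
        return self.fuse(torch.cat([a, b], dim=1))   # superpose the two branches along channels
```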
In this embodiment, the C3GS feature extraction module of the neck network reduces the number of parameters without losing accuracy. As shown in fig. 4, the channels are split into two branches: one branch is passed through unchanged, while the other first extracts features with the GSeConv module and then weights the corresponding feature information with a SimAM attention mechanism, highlighting useful feature details in the image; the concatenated feature maps are shuffled to promote information exchange between channels and strengthen the learning capacity of the network. The structure of the SimAM attention mechanism is shown in fig. 5. As shown in fig. 6, the neck multi-scale feature fusion network EPANet realizes bottom-up and top-down information fusion by upsampling and downsampling the features. To obtain a better fusion effect, nodes that contribute little to the network are removed, reducing the network depth and making the neck lighter; a fusion edge is then added between the original shallow network and the bottom output node of the neck to fuse higher-level features and shorten the propagation path of context information, thereby extracting features with richer semantic information.
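The SimAM weighting below follows the published parameter-free formulation (a per-neuron energy computed from the deviation to the spatial mean); the surrounding split/shuffle wiring is a hedged sketch of the C3GS idea, with a plain 3×3 convolution standing in for the GSeConv block sketched above.

```python
import torch
import torch.nn as nn

def simam(x, e_lambda: float = 1e-4):
    """Parameter-free SimAM attention: weight each activation by its energy."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * torch.sigmoid(e_inv)

def channel_shuffle(x, groups: int = 2):
    """Interleave channel groups to promote inter-channel information exchange."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class C3GSSketch(nn.Module):
    """Split channels; pass one half through unchanged, run the other half through a
    feature extractor (GSeConv in the patent, a plain 3x3 conv here) weighted by SimAM;
    concatenate and shuffle."""
    def __init__(self, channels):
        super().__init__()
        self.extract = nn.Conv2d(channels // 2, channels // 2, 3, padding=1, bias=False)

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)
        b = simam(self.extract(b))
        return channel_shuffle(torch.cat([a, b], dim=1))
```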
Performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network EPANet comprises the following steps: passing the 32× downsampled feature map extracted by the backbone through the SPPF module to obtain an output feature map of size 20×20; transforming the size of the 32× downsampled feature map with a 1×1 convolution module and an upsampling module, and concatenating the transformed feature map with the 16× downsampled backbone feature map to obtain feature map A; feeding feature map A into a C3GS module and an upsampling module to extract further features and superimposing the 8× downsampled backbone feature map to obtain feature map B, realizing the bottom-up fusion process; and feeding feature map B into C3GS and downsampling modules to obtain the 40×40 and 80×80 fused feature maps.
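The shape-level sketch below traces this fusion path with identity stand-ins for the 1×1 convolution, SPPF and C3GS modules, assuming a 640×640 input; it only illustrates the tensor flow, and the channel counts are arbitrary.

```python
import torch
import torch.nn.functional as F

p3 = torch.randn(1, 128, 80, 80)   # backbone feature, 8x downsampling
p4 = torch.randn(1, 256, 40, 40)   # backbone feature, 16x downsampling
p5 = torch.randn(1, 512, 20, 20)   # backbone feature, 32x downsampling (SPPF output, 20x20)

def up(t):
    return F.interpolate(t, scale_factor=2, mode="nearest")

feat_a = torch.cat([up(p5), p4], dim=1)       # upsample and splice with the 16x map -> feature map A (40x40)
feat_b = torch.cat([up(feat_a), p3], dim=1)   # further fuse with the 8x map -> feature map B (80x80)
out_80 = feat_b                               # 80x80 fused map (C3GS would refine it in the real model)
out_40 = torch.cat([F.max_pool2d(out_80, 2), feat_a], dim=1)   # downsampling path -> 40x40 fused map
print(out_80.shape, out_40.shape)             # [1, 896, 80, 80] and [1, 1664, 40, 40]
```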
The C3GS module comprises a convolution branch and a special processing branch: the convolution branch directly extracts the input channel information, while the special processing branch splits the feature map along the channel dimension, feeds one split group into the GSeConv module to extract deep features, weights the deep feature information with a SimAM attention mechanism, and shuffles the concatenated output feature maps across channels to promote information exchange; the feature maps obtained from the two branches are then superimposed and output.
The post-processing stage of the model's prediction output comprises two parts: confirming the target class and regressing the target location. The regression loss for the detection-box position is the CIoU loss, which considers not only the overlapping area and center-point distance between the predicted box and the ground-truth box but also introduces an aspect-ratio factor between the two bounding boxes.
Specifically, processing the fused feature maps with the prediction output feature layers comprises the following steps: feeding the three fused feature maps output by the neck enhanced feature extraction network EPANet into the three corresponding prediction output feature layers; assigning preset prior boxes of three scales in each prediction feature layer; taking the detection head for small targets as an example, dividing the 80×80 fused feature map into an 80×80 grid, each cell corresponding to three preset prior boxes; sliding a predictor over the divided fused feature map and, when the predictor reaches a given cell, predicting the confidence and regression parameters of each prior box; fine-tuning the predicted boxes by regression with the CIoU loss function and decoding the class information and position coordinates of the predicted boxes to locate the target box; and removing redundant predicted boxes by non-maximum suppression, keeping the optimal predicted box to obtain the final detection result.
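The decoding of class and box information is model-specific, but the final confidence filtering and class-wise non-maximum suppression can be illustrated with torchvision; the thresholds below are common defaults, not values from the patent.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, classes, score_thr=0.25, iou_thr=0.45):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,); classes: (N,) integer labels."""
    keep = scores > score_thr                       # drop low-confidence prior-box predictions
    boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
    if boxes.numel() == 0:
        return boxes, scores, classes
    # offset boxes by class so suppression only happens within the same category
    offsets = classes.to(boxes.dtype) * (boxes.max() + 1)
    keep_idx = nms(boxes + offsets[:, None], scores, iou_thr)
    return boxes[keep_idx], scores[keep_idx], classes[keep_idx]
```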
The model loss function is expressed as:
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + α·v
v = (4/π²) · (arctan(w^gt / h^gt) − arctan(w / h))²,   α = v / ((1 − IoU) + v)
wherein IoU denotes the intersection-over-union of the predicted box and the ground-truth box, ρ²(b, b^gt) denotes the squared distance between the center points of the predicted box and the ground-truth box, ρ denotes the Euclidean distance between the two center points, b denotes the coordinates of the predicted center point, b^gt denotes the coordinates of the ground-truth center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, α is a balancing parameter, v measures the consistency of the aspect ratios, w^gt and h^gt denote the width and height of the ground-truth box, and w and h denote the width and height of the predicted box.
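As a concrete reference, a standard CIoU implementation consistent with the formula above (boxes in (x1, y1, x2, y2) corner format; a common public formulation, not the patent's own code) looks like this:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss: 1 - IoU + normalized centre distance + alpha*v aspect-ratio term."""
    # intersection and union
    x1 = torch.max(pred[..., 0], target[..., 0]); y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2]); y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared centre distance over squared diagonal of the smallest enclosing box
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    wp = pred[..., 2] - pred[..., 0]; hp = pred[..., 3] - pred[..., 1]
    wt = target[..., 2] - target[..., 0]; ht = target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```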
The performance evaluation indexes commonly used for the target detection task are as follows:
accuracy (P): positive and negative sample samples that are correctly identified are the ratio of positive samples.
P = TP / (TP + FP)
Recall (R): the proportion of all positive samples in the test set that are correctly identified.
R = TP / (TP + FN)
Mean average precision (mAP): the mean of the per-category average precision.
where TP is the number of detection boxes whose intersection-over-union with the ground truth exceeds 0.5, FP is the number of detection boxes whose intersection-over-union with the ground truth is at most 0.5, and FN is the number of ground-truth targets that are not detected.
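A minimal helper computing these two indexes from the counts defined above (mAP additionally averages the per-category average precision) might look like this:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN); counts use an IoU threshold of 0.5."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# example with assumed counts
print(precision_recall(tp=90, fp=10, fn=20))   # (0.9, 0.818...)
```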
Simulation implementation conditions: the environment is based on PyTorch 1.12.0, CUDA 11.7 and Python 3.7. The model was trained on an NVIDIA GeForce RTX 3060 GPU. During the training phase, the optimizer weight decay is set to 0.005, the SGD momentum to 0.9, and the model is trained for 100 iterations.
Compared with the most advanced infrared target detection models, the detection accuracy is improved by 2 percentage points, and the detection of small infrared targets occluded in complex scenes is better.
While the foregoing describes the embodiments, aspects and advantages of the present invention in detail, it should be understood that the foregoing embodiments are merely exemplary and do not limit the invention; any changes, substitutions, alterations and the like made without departing from the spirit and principles of the invention fall within its scope.

Claims (9)

1. An infrared target detection method applied to a complex environment, characterized by comprising the following steps: acquiring an infrared image to be detected and preprocessing the infrared image; and inputting the preprocessed infrared image into a trained infrared target detection model to obtain a detection result; the infrared target detection model comprises a backbone feature extraction network, a neck enhanced feature extraction network and a prediction output feature layer;
the process of training the infrared target detection model comprises the following steps:
S1: acquiring a training data set, the training data set comprising infrared images and the category labels corresponding to the infrared images;
S2: preprocessing the infrared images in the training data set and feeding the preprocessed infrared images into the infrared target detection model for training;
S3: extracting features at different scales from the infrared image with the backbone feature extraction network;
S4: performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network to obtain fused feature maps;
S5: feeding the fused feature maps into the prediction output feature layer to obtain the target detection result;
S6: calculating the model loss function from the target detection result, iteratively adjusting the model parameters, evaluating the accuracy of the detection result with the performance evaluation indexes, and completing training when the accuracy meets the requirement.
2. The infrared target detection method applied to a complex environment according to claim 1, wherein the backbone feature extraction network comprises a GSeConv module, a C3Ghost module and an SPPF module, the GSeConv module being used to extract shallow features of the infrared image, the C3Ghost module being used to reduce redundant information in the shallow features, and the SPPF module being used to enlarge the receptive field of the network and obtain context information after the redundant information is removed.
3. The infrared target detection method applied to a complex environment according to claim 2, wherein the GSeConv module extracts shallow features of the infrared image as follows: the channels of the input are compressed with a 1×1 convolution whose number of kernels is half the number of input channels; the output of the 1×1 convolution is reconstructed with a 3×3 depthwise (layer-by-layer) convolution to obtain a mixed feature map; the mixed feature map is split along the channel dimension into two groups; the first group is superimposed, along the channel dimension, with the condensed features produced by a pointwise convolution; and the superimposed maps are concatenated with the second group to obtain the output.
4. The infrared target detection method applied to a complex environment according to claim 2, wherein the C3Ghost module comprises a residual network branch and a convolution branch, the residual network branch being composed of a plurality of residual structures; the input information is fed into the residual network branch and the convolution branch respectively, the residual network branch extracting deep features of the input information and increasing the back-propagated gradient values between layers, and the convolution branch directly extracting features of the input information; and the feature maps extracted by the two branches are superimposed along the channel dimension to obtain the output feature map.
5. The infrared target detection method applied to a complex environment according to claim 1, wherein performing enhanced fusion of the multi-scale features with the neck enhanced feature extraction network EPANet comprises: passing the 32× downsampled feature map extracted by the backbone through the SPPF module to obtain an output feature map of size 20×20; transforming the size of the 32× downsampled feature map with a 1×1 convolution module and an upsampling module, and concatenating the transformed feature map with the 16× downsampled backbone feature map to obtain feature map A; feeding feature map A into a C3GS module and an upsampling module to extract further features and superimposing the 8× downsampled backbone feature map to obtain feature map B; and feeding feature map B into C3GS and downsampling modules to obtain the 40×40 and 80×80 fused feature maps.
6. The infrared target detection method applied to a complex environment according to claim 5, wherein the C3GS module comprises a convolution branch and a special processing branch: the convolution branch directly extracts the input channel information, while the special processing branch splits the feature map along the channel dimension, feeds one split group into the GSeConv module to extract deep features, weights the deep feature information with a SimAM attention mechanism, and shuffles the concatenated output feature maps across channels to promote information exchange; the feature maps obtained from the two branches are then superimposed and output.
7. The infrared target detection method applied to a complex environment according to claim 1, wherein processing the fused feature maps with the prediction output feature layers comprises: feeding the three fused feature maps output by the neck enhanced feature extraction network EPANet into the three corresponding prediction output feature layers; assigning preset prior boxes of three scales in each prediction feature layer; for small-target detection, dividing the 80×80 fused feature map into an 80×80 grid, each cell corresponding to three preset prior boxes; sliding a predictor over the divided fused feature map and, at each corresponding cell, predicting the confidence and regression parameters through the prior boxes; fine-tuning the predicted boxes by regression with the CIoU loss function and decoding the class information and position coordinates of the predicted boxes; and removing redundant predicted boxes by non-maximum suppression, keeping the optimal predicted boxes to obtain the final detection result; detection of targets at the other scales follows the same process as small-target detection.
8. The method for detecting an infrared target in a complex environment according to claim 1, wherein the expression of the model loss function is:
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + α·v
v = (4/π²) · (arctan(w^gt / h^gt) − arctan(w / h))²,   α = v / ((1 − IoU) + v)
wherein IoU denotes the intersection-over-union of the predicted box and the ground-truth box, ρ²(b, b^gt) denotes the squared distance between the center points of the predicted box and the ground-truth box, ρ denotes the Euclidean distance between the two center points, b denotes the coordinates of the predicted center point, b^gt denotes the coordinates of the ground-truth center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, α is a balancing parameter, v measures the consistency of the aspect ratios, w^gt and h^gt denote the width and height of the ground-truth box, and w and h denote the width and height of the predicted box.
9. The method for detecting an infrared target applied to a complex environment according to claim 1, wherein the calculation formula of the performance evaluation index is:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
where precision denotes the detection precision, TP is the number of detection boxes whose intersection-over-union with the ground truth exceeds 0.5, FP is the number of detection boxes whose intersection-over-union with the ground truth is at most 0.5, recall denotes the recall rate, and FN is the number of ground-truth targets that are not detected.
CN202310369678.6A 2023-04-07 2023-04-07 Infrared target detection method applied to complex environment Pending CN116229217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310369678.6A CN116229217A (en) 2023-04-07 2023-04-07 Infrared target detection method applied to complex environment

Publications (1)

Publication Number Publication Date
CN116229217A true CN116229217A (en) 2023-06-06

Family

ID=86577056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310369678.6A Pending CN116229217A (en) 2023-04-07 2023-04-07 Infrared target detection method applied to complex environment

Country Status (1)

Country Link
CN (1) CN116229217A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665016A (en) * 2023-06-26 2023-08-29 中国科学院长春光学精密机械与物理研究所 Single-frame infrared dim target detection method based on improved YOLOv5
CN116665016B (en) * 2023-06-26 2024-02-23 中国科学院长春光学精密机械与物理研究所 Single-frame infrared dim target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination