
Real-time detection method for lightweight infrared dim targets

Info

Publication number: CN115223026A
Application number: CN202210896509.3A
Authority: CN (China)
Prior art keywords: feature, extraction, infrared, module, target
Legal status: Pending
Original language: Chinese (zh)
Inventors: 刘晓涛, 魏子翔, 刘静
Assignee: Guangzhou Institute of Technology of Xidian University
Priority date: 2022-07-28
Filing date: 2022-07-28
Publication date: 2022-10-21
Classifications

    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/40: Extraction of image or video features
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection


Abstract

The invention relates to the technical field of image recognition, and in particular to a lightweight method for real-time detection of infrared dim and small targets, comprising the following steps: step one, constructing an infrared dim and small target dataset by acquiring infrared dim and small target images and then selecting and labeling them; step two, constructing an infrared dim and small target detection network based on Mobilenetv3-Unet; step three, training the Mobilenetv3-Unet model with the dataset from step one to obtain a trained Mobilenetv3-Unet model; and step four, performing infrared dim and small target detection on an input picture with the trained model. The Mobilenetv3-Unet model in step two comprises a feature extraction backbone network and a feature aggregation network, where the feature extraction backbone network is an optimized Mobilenetv3 model.

Description

Real-time detection method for lightweight infrared dim targets
Technical Field
The invention relates to real-time target detection methods, and in particular to a lightweight method for real-time detection of infrared dim and small targets; it belongs to the technical field of image recognition.
Background
Infrared dim and small target detection is a difficult and key research topic in the field of target detection and plays an important role in national defense and military command. Traditional methods mainly include region-of-interest search guided by target characteristics, threshold segmentation exploiting the physical characteristics of infrared imaging, and background prediction modeling. Infrared dim and small target detection algorithms based on deep learning mainly include methods based on target detection, such as the YOLO series, and methods based on semantic segmentation. However, these algorithms either detect inefficiently and struggle with complex, changeable scenes, or rely on huge network models that are difficult to deploy on resource-limited edge devices.
In recent years, with the development of deep learning, target detection algorithms based on deep learning have advanced greatly. Deep learning extracts abstract features from images through the nonlinear transformations of network layers, trained with the back-propagation algorithm, so that targets can be identified accurately. However, because infrared images have low imaging resolution and low contrast and dim and small targets lack distinctive texture, target detection algorithms designed for common datasets are not suitable for infrared dim and small target detection. Moreover, network depth and width have been continuously increased to improve detection performance, so the computational cost of networks keeps growing; training a single network can occupy hundreds of servers for days. Training and deployment thus become ever more difficult, which makes designing a dedicated network for a specific application scenario important.
With the rapid development of deep learning methods, the field of target detection has made breakthrough progress, bringing new methods and ideas to infrared dim and small target detection research. The patent application with publication number CN114549959A, entitled 'Infrared dim and small target real-time detection method and system based on a target detection model', discloses an infrared small target detection method based on yolov4-tiny, which addresses the low detection accuracy and poor real-time performance of existing infrared dim and small target detection methods; however, the method remains difficult to deploy on resource-limited edge devices.
At present, most networks focus only on detection performance and continuously increase network complexity to improve the detection of infrared dim and small targets. However, infrared dim and small targets occupy few pixels and lack texture features, and an overly deep, complicated network can lose their features. Designing a dedicated lightweight network tailored to the characteristics of infrared dim and small targets is therefore of great significance for real-time operation on resource-limited devices.
The present invention has been made in view of the above circumstances to help solve the above problems.
Disclosure of Invention
The invention aims to solve the above problems by providing a lightweight method for real-time detection of infrared dim and small targets, which improves the detection of infrared dim and small targets in complex scenes while using fewer computing resources.
The invention achieves this aim through the following technical scheme: a lightweight real-time detection method for infrared dim and small targets, comprising the following steps:
step one, constructing an infrared dim and small target dataset: acquiring infrared dim and small target images, then selecting and labeling them;
step two, constructing an infrared dim and small target detection network based on Mobilenetv3-Unet;
step three, training the Mobilenetv3-Unet model with the infrared dim and small target dataset from step one to obtain a trained Mobilenetv3-Unet model;
and step four, performing infrared dim and small target detection on the input picture with the trained Mobilenetv3-Unet model.
Further, the Mobilenetv3-Unet model in step two includes a feature extraction backbone network and a feature aggregation network, where the feature extraction backbone network is an optimized Mobilenetv3 model that extracts features from the input image and passes them to the feature aggregation network for feature enhancement; the feature aggregation network outputs the detection result.
Furthermore, the optimized Mobilenetv3 model is the feature extraction part of Mobilenetv3-small, with the number of down-sampling stages optimized and the convolution kernel sizes adjusted for infrared dim and small target detection.
Further, the feature extraction backbone network is specifically as follows:
the input image passes through a convolutional layer to obtain a first extracted feature; the first extracted feature passes through a block1 module to obtain a second extracted feature; the second extracted feature passes through a block2 module to obtain a third extracted feature; the third extracted feature passes through a block3 module to obtain a fourth extracted feature; and the fourth extracted feature passes through a block4 module to obtain a fifth extracted feature.
Furthermore, each block module is a stack of bneck modules; a bneck module mainly implements depthwise separable convolution, an SE channel attention mechanism, and a residual connection.
Further, the convolution kernel sizes are adjusted as follows:
convHead consists of a convolutional layer with a 3x3 kernel and stride 1, a BN layer, and an h-swish activation function; block1 consists of 1 bneck module with a 3x3 kernel and stride 2; block2 consists of 2 bneck modules with 3x3 kernels and strides 2 and 1; block3 consists of 5 bneck modules, all with 5x5 kernels, the first with stride 2 and the rest with stride 1; block4 consists of 3 bneck modules with 7x7 kernels and strides 2, 1 and 1.
Further, the bneck module is a parameterizable module whose controllable parameters include the convolution kernel size, the type of activation function, and whether the attention mechanism is used. In the attention branch, global average pooling is first applied to the feature map to obtain a 1x1xC vector; each channel of the feature map describes part of the features, so this operation aggregates global information. Then, to obtain an importance score for each channel, two fully connected layers are applied, and the final result is again 1x1xC.
Further, the feature aggregation network consists of four feature aggregation modules, connected as follows:
the fourth and fifth extracted features of the feature extraction backbone network are the inputs of FA4; the third extracted feature and the output of FA4 are the inputs of FA3; the second extracted feature and the output of FA3 are the inputs of FA2; and the first extracted feature and the output of FA2 are the inputs of FA1.
Further, the input of each feature aggregation module consists of shallow features and deep features. The shallow features are the outputs of the backbone at each stage, while the deep features are the output of the preceding feature aggregation module. We propose a simple but efficient aggregation module that autonomously adjusts the weights of the two inputs using the attention mechanism, as shown in fig. 2.
Further, in the third step, a BCE loss function and a DICE loss function are adopted for training, where the BCE loss function is:
$$L_{BCE} = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$$

where y is the ground-truth label and ŷ is the predicted value.
The calculation formula of DICE is:

$$\mathrm{DICE} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X denotes the ground-truth result and Y the predicted result.
The trained Mobilenetv3-Unet model is used to detect infrared dim and small targets, and the detection results are evaluated from the number of true positives N_TP, the number of false positives N_FP, the number of false negatives N_FN and, for accuracy, the number of true negatives N_TN:
1) Accuracy: the proportion of all samples, positive and negative, that are correctly classified

$$\mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}$$

2) Precision: the proportion of samples predicted as positive that are actually positive

$$\mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$

3) Recall: the proportion of actual positive samples that are correctly detected

$$\mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

4) Comprehensive evaluation index (F-Measure): the harmonic mean of precision and recall

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The technical effects and advantages of the invention are as follows:
The method detects infrared dim and small targets with a Mobilenetv3-Unet model, using Mobilenetv3 as the feature extraction backbone and improving the feature extractor for the characteristics of infrared dim and small targets. While greatly reducing network parameters and computation, it improves detection performance, making the network better suited to dim and small target detection, raising detection accuracy, and lowering network complexity. Compared with the prior art, the invention improves the detection of infrared dim and small targets in complex scenes, uses fewer computing resources, and allows the detection algorithm to be deployed easily on resource-limited edge devices.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a basic network module designed by the present invention;
FIG. 3 is a diagram of a network architecture according to the present invention;
FIG. 4 illustrates the detection results of the present invention on dim and small targets in different scenes.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to figs. 1-4, a lightweight real-time detection method for infrared dim and small targets includes the following steps:
step one: constructing an infrared dim and small target dataset;
step two: constructing an infrared dim and small target detection network based on Mobilenetv3-Unet;
step three: training the Mobilenetv3-Unet model with the infrared dim and small target dataset from step one to obtain a trained Mobilenetv3-Unet model;
step four: performing infrared dim and small target detection on the input picture with the trained Mobilenetv3-Unet model.
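For orientation, step four can be sketched in PyTorch (the framework used in the simulation experiments below) as follows; the checkpoint path, input file name, and the 0.5 decision threshold are illustrative assumptions, not prescribed by the invention.

    # Minimal sketch of step four: single-image inference with a trained model.
    # File names and the 0.5 threshold are assumptions for illustration only.
    import cv2
    import numpy as np
    import torch

    model = torch.load("mobilenetv3_unet.pth", map_location="cpu")  # hypothetical checkpoint
    model.eval()

    img = cv2.imread("infrared_frame.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input picture
    x = torch.from_numpy(img.astype(np.float32) / 255.0)[None, None]  # shape (1, 1, H, W)

    with torch.no_grad():
        prob = torch.sigmoid(model(x))         # per-pixel target probability map
    mask = (prob > 0.5).squeeze().numpy()      # binary dim and small target mask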
The Mobilenetv3-Unet model in step two comprises a feature extraction backbone network and a feature aggregation network, where the feature extraction backbone network is an optimized Mobilenetv3 model that extracts features from the input image and passes them to the feature aggregation network for feature enhancement; the feature aggregation network outputs the detection result.
The optimized Mobilenetv3 model is the feature extraction part of Mobilenetv3-small, with the number of down-sampling stages optimized and the convolution kernel sizes adjusted for infrared dim and small target detection.
The feature extraction backbone network is specifically as follows:
the input image passes through a convolutional layer to obtain a first extracted feature; the first extracted feature passes through a block1 module to obtain a second extracted feature; the second extracted feature passes through a block2 module to obtain a third extracted feature; the third extracted feature passes through a block3 module to obtain a fourth extracted feature; and the fourth extracted feature passes through a block4 module to obtain a fifth extracted feature.
Each block module is a stack of bneck modules; a bneck module mainly implements depthwise separable convolution, an SE channel attention mechanism, and a residual connection.
The feature aggregation network consists of four feature aggregation modules, connected as follows:
the fourth and fifth extracted features of the feature extraction backbone network are the inputs of FA4; the third extracted feature and the output of FA4 are the inputs of FA3; the second extracted feature and the output of FA3 are the inputs of FA2; and the first extracted feature and the output of FA2 are the inputs of FA1.
As shown in fig. 3, convHead consists of a convolutional layer with a 3x3 kernel and stride 1, a BN layer, and an h-swish activation function; block1 consists of 1 bneck module with a 3x3 kernel and stride 2; block2 consists of 2 bneck modules with 3x3 kernels and strides 2 and 1; block3 consists of 5 bneck modules, all with 5x5 kernels, the first with stride 2 and the rest with stride 1; block4 consists of 3 bneck modules with 7x7 kernels and strides 2, 1 and 1.
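As one concrete reading of this configuration, the following PyTorch sketch wires convHead and block1 through block4 and returns the five extracted features. The bneck internals are reduced here to a depthwise-plus-pointwise convolution (the full module, with SE attention and residual connection, is described next); the channel widths and the single-channel infrared input are assumptions, since the patent does not list them.

    # Sketch of the optimized Mobilenetv3-small backbone described above.
    # Kernel sizes and strides follow the text; channel widths are assumptions.
    import torch.nn as nn

    def bneck(cin, cout, k, s):
        # Simplified bneck: depthwise conv + pointwise conv (SE/residual omitted here).
        return nn.Sequential(
            nn.Conv2d(cin, cin, k, stride=s, padding=k // 2, groups=cin, bias=False),
            nn.BatchNorm2d(cin),
            nn.Hardswish(),
            nn.Conv2d(cin, cout, 1, bias=False),
            nn.BatchNorm2d(cout),
        )

    class Backbone(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv_head = nn.Sequential(            # 3x3 conv, stride 1, BN, h-swish
                nn.Conv2d(1, 16, 3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(16),
                nn.Hardswish(),
            )
            self.block1 = bneck(16, 16, 3, 2)          # 1 module, 3x3, stride 2
            self.block2 = nn.Sequential(               # 2 modules, 3x3, strides 2, 1
                bneck(16, 24, 3, 2), bneck(24, 24, 3, 1))
            self.block3 = nn.Sequential(               # 5 modules, 5x5, first stride 2
                bneck(24, 40, 5, 2), *[bneck(40, 40, 5, 1) for _ in range(4)])
            self.block4 = nn.Sequential(               # 3 modules, 7x7, strides 2, 1, 1
                bneck(40, 96, 7, 2), bneck(96, 96, 7, 1), bneck(96, 96, 7, 1))

        def forward(self, x):
            f1 = self.conv_head(x)   # first extracted feature
            f2 = self.block1(f1)     # second extracted feature
            f3 = self.block2(f2)     # third extracted feature
            f4 = self.block3(f3)     # fourth extracted feature
            f5 = self.block4(f4)     # fifth extracted feature
            return f1, f2, f3, f4, f5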
specifically, the bneck module is a controllable parameter module, and the controllable parameters include the size of the convolution kernel, the type of the activation function, and whether to use an attention mechanism.
The schematic diagram of the bneck module is shown in fig. 2, first, global average pooling is performed on the feature map to obtain a result of 1x1xC, each channel in the feature map is equivalent to describing a part of features, and is equivalent to being global after operation, next, in order to obtain a score of importance of each feature map, two full connections are required, and finally, the whole result is also 1x1xC:
Figure BDA0003769278810000081
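A minimal sketch of this channel attention follows, matching the 1x1xC shapes above; the reduction ratio r=4 between the two fully connected layers is an assumption.

    # Sketch of the SE channel attention inside bneck: GAP -> FC -> FC -> rescale.
    # The reduction ratio r is an assumption; the patent fixes only the 1x1xC shapes.
    import torch
    import torch.nn as nn

    class SEAttention(nn.Module):
        def __init__(self, channels, r=4):
            super().__init__()
            self.fc1 = nn.Linear(channels, channels // r)  # first full connection (W1)
            self.fc2 = nn.Linear(channels // r, channels)  # second full connection (W2)

        def forward(self, x):
            b, c, _, _ = x.shape
            z = x.mean(dim=(2, 3))              # global average pooling -> (B, C)
            s = torch.relu(self.fc1(z))         # delta: intermediate activation
            s = torch.sigmoid(self.fc2(s))      # sigma: per-channel importance scores
            return x * s.view(b, c, 1, 1)       # rescale each channel of the input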
The feature aggregation network consists of four feature aggregation modules, connected as follows:
the fourth and fifth extracted features of the feature extraction backbone network are the inputs of FA4; the third extracted feature and the output of FA4 are the inputs of FA3; the second extracted feature and the output of FA3 are the inputs of FA2; and the first extracted feature and the output of FA2 are the inputs of FA1.
Specifically, the input of each feature aggregation module consists of shallow features and deep features. The shallow features are the outputs of the backbone at each stage, while the deep features are the output of the preceding feature aggregation module. We propose a simple but efficient aggregation module that autonomously adjusts the weights of the two inputs using the attention mechanism, as shown in FIG. 2.
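Since fig. 2 is not reproduced here, the sketch below is only one plausible reading of this description: the deep feature is upsampled to the shallow feature's resolution, and channel-attention weights computed from the concatenated inputs rescale the two branches before fusion. The channel arithmetic and all names are assumptions.

    # Hedged sketch of a feature aggregation (FA) module: attention-weighted fusion
    # of a shallow backbone feature with the upsampled deep feature. The precise
    # design in fig. 2 may differ; channel handling here is illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FAModule(nn.Module):
        def __init__(self, shallow_ch, deep_ch, out_ch):
            super().__init__()
            self.align = nn.Conv2d(deep_ch, shallow_ch, 1)   # match channel counts
            self.gate = nn.Sequential(                       # per-channel weights for both inputs
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * shallow_ch, 2 * shallow_ch, 1),
                nn.Sigmoid(),
            )
            self.fuse = nn.Conv2d(shallow_ch, out_ch, 3, padding=1)

        def forward(self, shallow, deep):
            deep = self.align(deep)
            deep = F.interpolate(deep, size=shallow.shape[2:],
                                 mode="bilinear", align_corners=False)
            w = self.gate(torch.cat([shallow, deep], dim=1))  # (B, 2C, 1, 1) weights
            w_s, w_d = w.chunk(2, dim=1)                      # split into the two branches
            return self.fuse(w_s * shallow + w_d * deep)      # weighted sum, then 3x3 conv

Under this reading, FA4 takes the fifth extracted feature as its deep input and the fourth as its shallow input, FA3 takes the output of FA4 as its deep input, and so on down to FA1.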
Preferably, a BCE loss function and a DICE loss function are adopted for training in step three, where the BCE loss function is:

$$L_{BCE} = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$$

where y is the ground-truth label and ŷ is the predicted value.
The calculation formula of DICE is:

$$\mathrm{DICE} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X denotes the ground-truth result and Y the predicted result.
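The two losses combine directly into a training objective. A sketch follows; the 1:1 weighting of the terms and the smoothing constant are assumptions.

    # Sketch of the BCE + DICE training loss for the per-pixel target mask.
    # The equal weighting and the smoothing eps are assumptions, not from the patent.
    import torch
    import torch.nn.functional as F

    def bce_dice_loss(pred_logits, target, eps=1e-6):
        bce = F.binary_cross_entropy_with_logits(pred_logits, target)
        prob = torch.sigmoid(pred_logits)
        inter = (prob * target).sum()        # |X ∩ Y|, computed softly
        dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
        return bce + (1 - dice)              # minimize BCE while maximizing DICE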
The trained Mobilenetv3-Unet model is used to detect infrared dim and small targets, and the detection results are evaluated from the number of true positives N_TP, the number of false positives N_FP, the number of false negatives N_FN and, for accuracy, the number of true negatives N_TN:
1) Accuracy: the proportion of all samples, positive and negative, that are correctly classified

$$\mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}$$

2) Precision: the proportion of samples predicted as positive that are actually positive

$$\mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$

3) Recall: the proportion of actual positive samples that are correctly detected

$$\mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

4) Comprehensive evaluation index (F-Measure): the harmonic mean of precision and recall

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
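The four indices can be computed from binary masks as sketched below; pixel-level counting is an assumption, since the patent does not state the matching granularity.

    # Sketch of the four evaluation indices from binary prediction/ground-truth masks.
    import numpy as np

    def evaluate(pred, gt):
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp = np.sum(pred & gt)       # N_TP: true positives
        tn = np.sum(~pred & ~gt)     # N_TN: true negatives
        fp = np.sum(pred & ~gt)      # N_FP: false positives
        fn = np.sum(~pred & gt)      # N_FN: false negatives
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-12)
        return accuracy, precision, recall, f1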
The technical effects of the invention are further explained below with reference to simulation experiments:
1. Simulation conditions and contents:
The simulation experiments are implemented with the PyTorch framework in a hardware environment of a GeForce GTX 2080Ti GPU and 32 GB of RAM, under Ubuntu 18.04.
Simulation experiment: after constructing the infrared dim and small target dataset according to the invention, the optimized network is trained on the training set for 50 epochs. The test set is then fed to the trained infrared dim and small target detection network for detection, as shown in the figure.
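A minimal training-loop sketch of this setup, reusing the bce_dice_loss from the sketch above; the model and dataset class names are hypothetical, and the Adam optimizer, learning rate, and batch size are assumptions.

    # Sketch of the 50-epoch training described above. Mobilenetv3Unet and
    # IRSmallTargetDataset are hypothetical names; optimizer settings are assumptions.
    import torch
    from torch.utils.data import DataLoader

    model = Mobilenetv3Unet()
    loader = DataLoader(IRSmallTargetDataset("train/"), batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(50):                          # 50 passes over the training set
        for img, mask in loader:
            loss = bce_dice_loss(model(img), mask)   # combined BCE + DICE loss
            opt.zero_grad()
            loss.backward()
            opt.step()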
2. Simulation result analysis:
Compared with other infrared dim and small target detection algorithms, the detection results obtained by the method show clear advantages: with F1 as the main evaluation index for infrared dim and small target detection, the method reaches 0.6706, while the prior art reaches only 0.6502. The detection results show that the method achieves a good detection effect on infrared dim and small targets in various complex scenes.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, no single embodiment necessarily stands alone; the description is arranged this way only for clarity. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other implementations that those skilled in the art will understand.

Claims (10)

1. A lightweight real-time detection method for infrared dim and small targets, characterized in that the detection method comprises the following steps:
step one, constructing an infrared dim and small target dataset: acquiring infrared dim and small target images, then selecting and labeling them;
step two, constructing an infrared dim and small target detection network based on Mobilenetv3-Unet;
step three, training the Mobilenetv3-Unet model with the infrared dim and small target dataset from step one to obtain a trained Mobilenetv3-Unet model;
and step four, performing infrared dim and small target detection on the input picture with the trained Mobilenetv3-Unet model.
2. The lightweight real-time detection method for infrared dim and small targets according to claim 1, characterized in that: the Mobilenetv3-Unet model in step two comprises a feature extraction backbone network and a feature aggregation network, where the feature extraction backbone network is an optimized Mobilenetv3 model that extracts features from the input image and passes them to the feature aggregation network for feature enhancement, and the feature aggregation network outputs the detection result.
3. The lightweight real-time detection method for infrared dim and small targets according to claim 2, characterized in that: the optimized Mobilenetv3 model is the feature extraction part of Mobilenetv3-small, with the number of down-sampling stages optimized and the convolution kernel sizes adjusted for infrared dim and small target detection.
4. The lightweight real-time detection method for infrared dim and small targets according to claim 2, characterized in that the feature extraction backbone network is specifically as follows:
the input image passes through a convolutional layer to obtain a first extracted feature; the first extracted feature passes through a block1 module to obtain a second extracted feature; the second extracted feature passes through a block2 module to obtain a third extracted feature; the third extracted feature passes through a block3 module to obtain a fourth extracted feature; and the fourth extracted feature passes through a block4 module to obtain a fifth extracted feature.
5. The lightweight real-time detection method for infrared dim and small targets according to claim 4, characterized in that: each block module is a stack of bneck modules, and a bneck module mainly implements depthwise separable convolution, an SE channel attention mechanism, and a residual connection.
6. The lightweight real-time detection method for infrared dim and small targets according to claim 3, characterized in that the convolution kernel sizes are adjusted as follows:
convHead consists of a convolutional layer with a 3x3 kernel and stride 1, a BN layer, and an h-swish activation function; block1 consists of 1 bneck module with a 3x3 kernel and stride 2; block2 consists of 2 bneck modules with 3x3 kernels and strides 2 and 1; block3 consists of 5 bneck modules, all with 5x5 kernels, the first with stride 2 and the rest with stride 1; block4 consists of 3 bneck modules with 7x7 kernels and strides 2, 1 and 1.
7. The lightweight real-time detection method for infrared dim and small targets according to claim 5, characterized in that: the bneck module is a parameterizable module whose controllable parameters include the convolution kernel size, the type of activation function, and whether the attention mechanism is used; global average pooling is applied to the feature map to obtain a 1x1xC vector, where each channel of the feature map describes part of the features, so the operation aggregates global information; then, to obtain an importance score for each channel, two fully connected layers are applied, and the final result is again 1x1xC.
8. The lightweight real-time detection method for infrared dim and small targets according to claim 2, characterized in that the feature aggregation network consists of four feature aggregation modules, connected as follows:
the fourth and fifth extracted features of the feature extraction backbone network are the inputs of FA4; the third extracted feature and the output of FA4 are the inputs of FA3; the second extracted feature and the output of FA3 are the inputs of FA2; and the first extracted feature and the output of FA2 are the inputs of FA1.
9. The lightweight real-time detection method for infrared dim and small targets according to claim 8, characterized in that: the input of each feature aggregation module consists of shallow features and deep features, where the shallow features are the outputs of the backbone at each stage and the deep features are the output of the preceding feature aggregation module; the aggregation module autonomously adjusts the weights of the two inputs using the attention mechanism shown in fig. 2.
10. The lightweight real-time detection method for infrared dim and small targets according to claim 1, characterized in that: in step three, a BCE loss function and a DICE loss function are adopted for training, where the BCE loss function is:

$$L_{BCE} = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$$

where y is the ground-truth label and ŷ is the predicted value;
the calculation formula of DICE is:

$$\mathrm{DICE} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X denotes the ground-truth result and Y the predicted result;
the trained Mobilenetv3-Unet model is used to detect infrared dim and small targets, and the detection results are evaluated from the number of true positives N_TP, the number of false positives N_FP, the number of false negatives N_FN and, for accuracy, the number of true negatives N_TN:
1) Accuracy: the proportion of all samples, positive and negative, that are correctly classified

$$\mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}$$

2) Precision: the proportion of samples predicted as positive that are actually positive

$$\mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$

3) Recall: the proportion of actual positive samples that are correctly detected

$$\mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

4) Comprehensive evaluation index (F-Measure): the harmonic mean of precision and recall

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117576488A * | 2024-01-17 | 2024-02-20 | 海豚乐智科技(成都)有限责任公司 | Infrared dim target detection method based on target image reconstruction
CN117576488B * | 2024-01-17 | 2024-04-05 | 海豚乐智科技(成都)有限责任公司 | Infrared dim target detection method based on target image reconstruction


Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination