CN113989624A - Infrared low-slow small target detection method and device, computing equipment and storage medium - Google Patents
Infrared low-slow small target detection method and device, computing equipment and storage medium
- Publication number: CN113989624A
- Application number: CN202111490363.4A
- Authority
- CN
- China
- Prior art keywords
- network
- coordinate
- matrix
- low
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, wherein the method comprises the following steps: acquiring an infrared image to be detected; inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image; and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model. According to the scheme, the low-slow small target can be accurately detected according to the infrared image.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium.
Background
A low-slow small target refers to an aircraft or airborne object having all or some of the following characteristics: a flight speed below 200 km/h, a flight altitude below 1 km, and a small radar reflection area. With the development of the aviation industry, in particular general aviation and the national opening of low-altitude airspace, the "low, slow and small" flight problem has become increasingly prominent. The wide use of civil unmanned aerial vehicles in particular poses a serious threat to air-defense safety; monitoring and early warning can be implemented correctly only if such vehicles can be detected effectively.
Low-slow small target detection must cope with complex scenes, and the targets themselves fly at low altitude, are small in size and are highly maneuverable, so existing models cannot detect low-slow small targets directly. It is therefore desirable to provide a method capable of accurately detecting low-slow small targets.
Disclosure of Invention
Based on the problem that the existing model cannot directly detect the low-slow small target, the embodiment of the invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, and the low-slow small target can be accurately detected.
In a first aspect, an embodiment of the present invention provides an infrared low-slow small target detection method, including:
acquiring an infrared image to be detected;
inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
Preferably, the training mode of the detection model includes:
acquiring a plurality of labeled data samples, wherein the data samples are images containing low-slow small targets; the data samples satisfy the following: the image backgrounds cover one or more specified background environments, the target types cover the specified types, and the target size is not smaller than a set size;
for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network;
and obtaining the trained detection model.
Preferably, the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales;
the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
Preferably, the data samples output by the backbone network further include: the third feature map is different from the first feature map and the second feature map in prediction scale;
the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
Preferably, the performing, by the coordinate attention processing module, coordinate attention processing on the data sample output by the neck network includes:
coding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix;
performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain a feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix having the same number of channels as the data sample;
and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
Preferably, encoding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain the horizontal tensor matrix and the vertical tensor matrix includes:
the data samples are encoded in the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
wherein W and H are the width and height of the image, respectively, c is the number of channels of the data sample image, X is the input data sample,the output of the c-th channel with height h,is the output of the c-th channel with width w.
Preferably, the performing the feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain the feature mapping matrix includes:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1 is a 1 × 1 convolution operation, [·,·] is a concatenate operation along the spatial dimension, Bh is the horizontal tensor matrix, Bw is the vertical tensor matrix, and f is the resulting feature mapping matrix.
In a second aspect, an embodiment of the present invention further provides an infrared low-slow small target detection apparatus, including:
the image acquisition unit is used for acquiring an infrared image to be detected;
the detection model identification unit is used for inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and the detection result generation unit is used for obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the method described in any embodiment of this specification.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed in a computer, the computer program causes the computer to execute the method described in any embodiment of the present specification.
The embodiment of the invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, wherein a coordinate attention processing module is added to a YOLOv4-tiny network to embed position information into channel attention, so that when an infrared image is subjected to feature recognition, the position of a low-slow small target in the infrared image can be more accurately positioned, and the purpose of accurately detecting the low-slow small target according to the infrared image is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of an infrared low-slow small target detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another infrared low-slow small target detection method according to an embodiment of the present invention;
Fig. 3 is a diagram of a network architecture for infrared low-slow small target detection according to an embodiment of the present invention;
Fig. 4 is a diagram of a hardware architecture of a computing device according to an embodiment of the present invention;
Fig. 5 is a structural diagram of an infrared low-slow small target detection device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As described above, low-slow small target detection must cope with complex scenes, and the targets fly at low altitude, are small and highly maneuverable, so existing models cannot detect them directly. Moreover, an infrared low-slow small target detection model is typically deployed on highly maneuverable platforms such as aircraft, where hardware resources are commonly insufficient, so the detection model needs to be lightweight. To address the complex detection background and the small target size, a coordinate attention processing module is added to the YOLOv4-tiny network so that the position of a low-slow small target in an infrared image can be located more accurately. The scheme can therefore accurately detect low-slow small targets from infrared images.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides an infrared low-slow small target detection method, including:
and step 100, acquiring an infrared image to be detected.
Step 102, inputting the infrared image into a detection model generated by pre-training.
And step 104, obtaining a detection result of the low-slow small target in the infrared image according to a result output by the head network of the detection model.
In the embodiment of the invention, the coordinate attention processing module is added to the YOLOv4-tiny network, and the position information is embedded into the channel attention, so that the position of a low-slow small target in an infrared image can be more accurately positioned when the infrared image is subjected to feature recognition, and the purpose of accurately detecting the low-slow small target according to the infrared image is achieved.
The manner in which the various steps shown in fig. 1 are performed is described below.
First, in step 100, an infrared image to be detected is acquired.
In the embodiment of the invention, each frame of infrared image of the infrared video collected in real time or the collected infrared video can be used as the infrared image to be detected, and the infrared low-slow small target detection model is used for detecting and identifying the low-slow small target of each frame of infrared image, so that the aim of detecting the low-slow small target in real time is fulfilled.
Then, aiming at step 102, inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network carries out feature recognition on the infrared image by sequentially utilizing a main network, a neck network, a coordinate attention processing module and a head network.
When each frame of infrared image in the infrared video is required to be detected to be a low-slow small target through the infrared low-slow small target detection model, the infrared low-slow small target detection network needs to be trained in advance to obtain the detection model.
Then, the following describes a training method of the detection model.
In the embodiment of the present invention, referring to fig. 2, the detection model can be trained at least by using the following method as shown in steps 200-204:
Step 200, acquiring a plurality of labeled data samples, wherein the data samples are images containing low-slow small targets.
In practical application, the data samples can be selected according to actual requirements; for example, background images without low-slow small targets can be acquired as negative samples. The background environment, and the type and size of the low-slow small targets in the positive samples, can be chosen according to the requirements and the actual scene.
In the embodiment of the invention, self-captured infrared images and frames intercepted from low-slow small target infrared videos on the network can be used as data samples. The image backgrounds need to cover the specified background environments, such as one or more of buildings, cloud layers, clear sky, mountains and trees; the target types need to cover the specified types, for example unmanned aerial vehicles, balloons and birds; in addition, the target size in a data sample may be required to be not smaller than a set size, for example larger than 10 × 10 pixels. This ensures the diversity of the data samples, improves the model's ability to learn the features of low-slow small targets, and enables the model to accurately identify targets of different sizes from images with different background environments.
In order that the detection model obtained after training can detect low-slow small targets from unlabeled infrared images, the low-slow small targets in the data samples need to be labeled in advance. In the embodiment of the invention, each low-slow small target in a data sample can be labeled with the labelImg labeling tool; the label can include the category of the low-slow small target, the abscissa and ordinate of its center point, and the width and length of the target. To input the data samples and their corresponding labels into the infrared low-slow small target detection network simultaneously during training, the label files output by labelImg need to be converted from xml format to txt format.
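As an illustrative (non-limiting) sketch, the xml-to-txt conversion described above might look as follows. The element names follow the Pascal VOC layout that labelImg emits; the function name and class mapping are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_ids):
    """Convert one labelImg (Pascal VOC) annotation into YOLO txt lines:
    '<class> <cx> <cy> <w> <h>', with coordinates normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = class_ids[obj.findtext("name")]
        xmin = float(obj.findtext("bndbox/xmin"))
        ymin = float(obj.findtext("bndbox/ymin"))
        xmax = float(obj.findtext("bndbox/xmax"))
        ymax = float(obj.findtext("bndbox/ymax"))
        cx = (xmin + xmax) / 2.0 / img_w   # normalized center abscissa
        cy = (ymin + ymax) / 2.0 / img_h   # normalized center ordinate
        w = (xmax - xmin) / img_w          # normalized target width
        h = (ymax - ymin) / img_h          # normalized target length
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each returned line corresponds to one labeled low-slow small target and can be written to the txt file accompanying the image.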
In addition, in order to train the infrared low-slow small target detection network and obtain an optimal detection model, the labeled data samples can be divided into a training set, a verification set and a test set in a ratio of 8:1:1.
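The 8:1:1 division can be sketched as follows (the function name, seed and shuffling strategy are assumptions; the patent only specifies the ratio):

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle the samples and split them into train/val/test subsets
    according to the given ratios (default 8:1:1)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```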
Step 202, in the embodiment of the invention, each data sample in the training set is input into the infrared low-slow small target detection network, which is based on the YOLOv4-tiny network and sequentially uses a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on each data sample.
In the embodiment of the invention, the CSPDarknet53-tiny network is adopted as the backbone network. When a residual block is processed, the original stack of the residual block is split into two parts: the trunk part continues to stack the residual block, while the other part introduces a residual edge to the convolution result. This enhances the learning capacity of the backbone network, keeps it lightweight, preserves accuracy and reduces memory cost. In addition, to increase speed, the activation function is changed to LeakyReLU.
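For reference, LeakyReLU keeps positive inputs unchanged and scales negative inputs by a small slope. A minimal sketch (the slope value 0.1 is an assumption; the patent does not specify it):

```python
def leaky_relu(x, negative_slope=0.1):
    """LeakyReLU activation: x for x >= 0, negative_slope * x otherwise."""
    return x if x >= 0 else negative_slope * x
```

Unlike ReLU, the nonzero negative slope keeps gradients flowing for negative activations, which is one reason it is favored in lightweight YOLO-family networks.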
Fig. 3 shows the structure of the modified YOLOv4-tiny network, in which boxes 1, 2, 3 and 4 are the backbone network, the neck network, the coordinate attention processing module and the head network, respectively. For each data sample, the backbone network first performs feature extraction to obtain feature maps of different scales. The feature maps output by the backbone network are then input into the neck network for feature fusion. For small target detection, the extraction of small-target feature information needs to be strengthened. To obtain richer small-target position information, a scale is added to the data samples output by the backbone network, and information of different scales is then fused through a feature pyramid network (FPN). This remedies the limitation that the original YOLOv4-tiny network has only two prediction scales and improves the accuracy of small-target position information extraction.
Therefore, in addition to the first feature map and the second feature map of different prediction scales, the data samples output by the backbone network include a third feature map whose prediction scale differs from both. For example, the backbone network originally outputs feature maps of sizes 13 × 13 and 26 × 26; in the embodiment of the invention, a newly added feature map of size 52 × 52 serves as the third feature map.
In the embodiment of the invention, the neck network improves on the feature pyramid network (FPN) by performing feature fusion and feature processing on the newly added scale. The improved FPN fuses features between different output layers of the backbone network, preserving the rich semantic information of deep layers while retaining the geometric detail of shallow layers, thereby enhancing feature extraction.
After the neck network performs feature fusion on the feature maps output by the backbone network, coordinate attention processing is applied to the data samples output by the neck network. Because a scale has been added, the data samples output by the neck network include not only the processed first feature map (output after convolution of the first feature map) and the spliced feature map (obtained by upsampling the processed first feature map and tensor-splicing it with the second feature map), but also the feature extraction map (obtained by upsampling the spliced feature map, tensor-splicing it with the third feature map, and performing feature extraction processing).
For example, referring to fig. 3, the backbone network outputs a first feature map of size 13 × 13, a second feature map of size 26 × 26, and a newly added third feature map of size 52 × 52. In the neck network, the 13 × 13 feature map is convolved and output; it is also upsampled, tensor-spliced with the 26 × 26 feature map, convolved and output. For the newly added scale, the spliced feature map output by that convolution is upsampled, tensor-spliced with the 52 × 52 third feature map from the backbone network, passed through feature extraction processing of convolution, regularization and excitation, convolved again and output. The purpose is to perform secondary feature extraction on the newly added 52 × 52 scale and improve the detection accuracy for low-slow small targets. Thus, after the neck network fuses the 13 × 13, 26 × 26 and 52 × 52 feature maps output by the backbone network, it outputs feature maps of sizes 13 × 13, 26 × 26 and 52 × 52.
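The upsample-and-splice operations of the neck network can be sketched as follows. This is a simplified illustration using nearest-neighbour upsampling; the channel counts in the usage below are assumptions, not values from the patent:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tensor_splice(deep, shallow):
    """Upsample the deeper (smaller) map and concatenate it with the
    shallower map along the channel axis, as in FPN-style tensor splicing."""
    return np.concatenate([upsample2x(deep), shallow], axis=0)
```

For instance, splicing a 13 × 13 map with a 26 × 26 map first doubles the spatial size of the former so the two align, then stacks their channels.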
In the embodiment of the invention, a coordinate attention processing module is added to perform coordinate attention processing on the data samples output by the neck network. Coordinate attention embeds position information into channel attention, so that the subsequent network can attend over a larger area without introducing heavy computation. It is therefore suitable for a lightweight low-slow small target detection network: it improves the detection and recognition of low-slow small targets without consuming excessive computing resources or slowing down detection.
Specifically, the coordinate attention processing may be performed on the data samples output by the neck network through the following steps S1-S3:
and S1, encoding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix.
For example, each channel is encoded along the horizontal coordinate direction and the vertical coordinate direction using pooling kernels of sizes (H,1) and (1, W) for the input data sample image X, respectively.
The data samples are encoded along the horizontal coordinate direction according to the following formula:

B_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)

The data samples are encoded along the vertical coordinate direction according to the following formula:

B_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)

wherein W and H are respectively the width and height of the image, c is the channel index of the data sample image, X is the input data sample image, B_c^h(h) is the output of the c-th channel at height h, and B_c^w(w) is the output of the c-th channel at width w.
The two encoding formulas aggregate the features of the data sample image X along the horizontal and vertical directions respectively, yielding the horizontal tensor matrix Bh and the vertical tensor matrix Bw.
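The directional encodings above can be sketched in NumPy as follows (the function name and the (C, H, W) layout are assumptions; the patent gives only the formulas):

```python
import numpy as np

def coordinate_pool(x):
    """Directional encoding of a (C, H, W) feature map: Bh[c, h] averages
    row h of channel c over all W columns, and Bw[c, w] averages column w
    of channel c over all H rows."""
    b_h = x.mean(axis=2)  # shape (C, H): output of channel c at height h
    b_w = x.mean(axis=1)  # shape (C, W): output of channel c at width w
    return b_h, b_w
```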
And S2, performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain a feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix having the same number of channels as the data sample.
In the embodiment of the present invention, the feature mapping matrix is obtained from the horizontal tensor matrix and the vertical tensor matrix as follows:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1 is a 1 × 1 convolution operation, [·,·] is a concatenate operation along the spatial dimension, Bh is the horizontal tensor matrix, Bw is the vertical tensor matrix, and f is the resulting feature mapping matrix.
Then, the feature mapping matrix is decomposed into two separate tensor matrices along the horizontal coordinate direction and the vertical coordinate direction, and each is passed through a 1 × 1 convolution to become the horizontal attention weight matrix and the vertical attention weight matrix, with the same number of channels as the data sample image X.
Specifically, the following manner may be used:
gh=σ(Fh(fh))
gw=σ(Fw(fw))
wherein gh and gw are respectively the horizontal attention weight matrix and the vertical attention weight matrix of the data sample image X, σ is the sigmoid activation function, Fh and Fw are 1 × 1 convolution operations, and fh and fw are the horizontal and vertical feature mapping tensor matrices obtained by decomposing the feature mapping matrix f.
And S3, multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
In the embodiment of the invention, the coordinate attention weight processing is carried out on each data sample, so that the coordinate attention processing module can capture the long-range dependence along one spatial direction and store the accurate position information along the other spatial direction, and the network can be helped to more accurately position the interested target.
The coordinate attention processed data sample Y can be obtained as follows:

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)

wherein c is the channel index of the data sample image, y_c(i, j) is the c-th channel image of the coordinate attention processed data sample Y, x_c(i, j) is the c-th channel image of the input data sample image X, g_c^h is the horizontal attention weight matrix of the c-th channel, and g_c^w is the vertical attention weight matrix of the c-th channel.
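The final reweighting step can be sketched in NumPy (names and array layout are assumptions; g_h of shape (C, H) and g_w of shape (C, W) are the attention weights from step S2):

```python
import numpy as np

def apply_coordinate_attention(x, g_h, g_w):
    """y_c(i, j) = x_c(i, j) * g_h[c, i] * g_w[c, j] for a (C, H, W) map,
    implemented by broadcasting the two weight matrices over the map."""
    return x * g_h[:, :, None] * g_w[:, None, :]
```

Broadcasting multiplies every pixel of channel c by its row weight and its column weight, which is how position information along both spatial directions is injected into the channel response.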
Each data sample after coordinate attention processing is input into the head network, which adopts a YOLOv3 detection head to detect and recognize low-slow small targets. To let the head network identify low-slow small targets in the feature maps more quickly, preselected anchor frames are generated on the basis of the YOLOv3 detection head by pre-training a K-means clustering algorithm on the self-made data set.
In the embodiment of the invention, in order to enable the detection model to be more fit with the form of low and medium-low small targets in the infrared image, the network convergence is accelerated, and the training precision and speed are improved. Training the K-means clustering algorithm by using a training set in advance to obtain 9 preselected anchor frames with different sizes: (10,10),(14,13),(16,16),(22,18),(24,9),(26,21),(31,13),(40,33),(47,17).
Then, when the head network performs feature recognition on each data sample after coordinate attention processing, a plurality of preselected anchor frames are firstly generated, then the category and the offset are predicted for each preselected anchor frame, then the position of the preselected anchor frame is adjusted according to the predicted offset so as to obtain a predicted boundary frame, and finally the predicted boundary frame containing low and slow small targets is screened. Then, the head network identifies whether the frame contains the low-slow small target and the category, the central point abscissa, the central point ordinate, the target width and the target length of the low-slow small target according to each prediction boundary frame.
And step 204, obtaining the trained detection model.
In this step, using transfer learning, each infrared image of the training set in step 200 is used to train the infrared low-slow small target detection network of step 202. And (4) verifying each model trained to a set number of times by using a verification set, so as to select the training models when the loss function is converged and preselect a set number of training models. And testing the preselected training model by using the test set, and selecting the optimal detection model according to the evaluation index.
For example, the training times can be preset for 300 generations according to experience, and when the infrared low-slow small target detection network is trained for 300 generations by using the training set, a part of data samples in the verification set are used for verifying the training model trained for 300 generations, and it is found that the loss function is fast to converge. Training is continued for the training model of 300 generations and verification is carried out, and when the loss function is found to be converged when the training is carried out for 301 generations, the training models of 301 generations, 302 generations and 303 generations are preselected.
And testing the training models of 301 generation, 302 generation and 303 generation by using a test set, and evaluating the training results respectively, wherein the evaluation indexes are average precision, accuracy and recall rate.
The P-R curve is plotted according to accuracy and recall using the following formula:
wherein, P is the precision rate, R is the recall rate, C is the number of categories, AP is the area between a P-R curve drawn according to the precision rate and the recall rate and a coordinate axis, and mAP is the average AP value of each category.
And determining the training model with the best evaluation index as the optimal detection model according to the higher average precision and the larger AP and mAP values in the evaluation index and the better detection performance and effect of the model.
Finally, in step 104, according to the result output by the head network of the detection model, the detection result of the low and slow small targets in the infrared image is obtained.
And if the result output by the head network contains the low-slow small target, the detection result of the infrared image is the low-slow small target information containing the low-slow small target and the training label, otherwise, the detection structure of the infrared image is not containing the low-slow small target.
As shown in fig. 4 and 5, an embodiment of the present invention provides an infrared low-slow small target detection apparatus. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. In terms of hardware, as shown in fig. 4, a hardware architecture diagram of a computing device in which an infrared low-speed small target detection apparatus provided in the embodiment of the present invention is located is shown, where in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 4, the computing device in which the apparatus is located in the embodiment may generally include other hardware, such as a forwarding chip responsible for processing a packet, and the like. Taking a software implementation as an example, as shown in fig. 5, as a logical means, the device is formed by reading a corresponding computer program in a non-volatile memory into a memory by a CPU of a computing device where the device is located and running the computer program. The small target detection device that this embodiment provided hangs down slowly includes:
an image obtaining unit 501, configured to obtain an infrared image to be detected;
a detection model recognition unit 502, configured to input the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
a detection result generating unit 503, configured to obtain a detection result of a low-slow small target in the infrared image according to a result output by the head network of the detection model.
In an embodiment of the present invention, the detection model identification unit 502 is specifically configured to, when executing generation of a detection model, obtain a plurality of data samples labeled with labels, where the data samples are images including low and slow small targets; the number of data samples satisfies: the image background covers one or more of the specified background environment, the specified type of the target type, and the target size is not smaller than the set size; for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network; and obtaining the trained detection model.
In an embodiment of the present invention, in the detection model identification unit 502, the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales; the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
In an embodiment of the present invention, in the detection model identifying unit 502, the data samples output by the backbone network further include: the third feature map is different from the first feature map and the second feature map in prediction scale; the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
In an embodiment of the present invention, when the coordinate attention processing module performs coordinate attention processing on the data sample output by the neck network, the identifying unit 502 is specifically configured to encode each channel along a horizontal coordinate direction and a vertical coordinate direction for the data sample, so as to obtain a horizontal tensor matrix and a vertical tensor matrix; performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain an feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix which have the same channel number as the data sample; and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
In an embodiment of the present invention, when the identifying unit 502 performs the encoding of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain the horizontal tensor matrix and the vertical tensor matrix, the identifying unit is specifically configured to encode the data samples along the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
wherein W and H are the width and height of the image, respectively, c is the number of channels of the data sample image, X is the input data sample,the output of the c-th channel with height h,is the output of the c-th channel with width w.
In an embodiment of the present invention, when performing the feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain the feature mapping matrix, the identifying unit 502 is further specifically configured to obtain the feature mapping matrix according to the following formula:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1Is a 1 × 1 convolution operation, [,.]For concatenate operations along the spatial dimension, BhIs a matrix of horizontal tensors, BwIs the vertical tensor matrix and f is the resulting eigenmap matrix.
It is understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation to an infrared low-speed small target detection apparatus. In other embodiments of the present invention, an infrared low-slow small target detection apparatus may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the content of information interaction, execution process, and the like among the modules in the device is based on the same concept as the method embodiment of the present invention, specific content can be referred to the description in the method embodiment of the present invention, and is not described herein again.
The embodiment of the invention also provides computing equipment which comprises a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the infrared low-slow small target detection method in any embodiment of the invention is realized.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the processor is enabled to execute an infrared low-slow small target detection method in any embodiment of the present invention.
Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion module to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An infrared low-slow small target detection method is characterized by comprising the following steps:
acquiring an infrared image to be detected;
inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
2. The method of claim 1, wherein the training of the test model comprises:
acquiring a plurality of data samples marked with labels, wherein the data samples are images comprising low and slow small targets; the number of data samples satisfies: the image background covers one or more of the specified background environment, the specified type of the target type, and the target size is not smaller than the set size;
for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network;
and obtaining the trained detection model.
3. The method of claim 2,
the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales;
the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
4. The method of claim 3,
the data samples output by the backbone network further comprise: the third feature map is different from the first feature map and the second feature map in prediction scale;
the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
5. The method of any of claims 2-4, wherein the coordinate attention processing, with the coordinate attention processing module, of the data samples output by the neck network comprises:
coding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix;
performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain an feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix which have the same channel number as the data sample;
and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
6. The method of claim 5, wherein encoding each channel in a horizontal coordinate direction and a vertical coordinate direction for the pair of data samples, respectively, resulting in a horizontal tensor matrix and a vertical tensor matrix, comprises:
the data samples are encoded in the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
7. The method of claim 5, wherein the characterizing the horizontal tensor matrix and the vertical tensor matrix to obtain an eigen mapping matrix comprises:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1Is a 1 × 1 convolution operation, [,.]For concatenate operations along the spatial dimension, BhIs a matrix of horizontal tensors, BwIs the vertical tensor matrix and f is the resulting eigenmap matrix.
8. An infrared low-slow small target detection device is characterized by comprising:
the image acquisition unit is used for acquiring an infrared image to be detected;
the detection model identification unit is used for inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and the detection result generation unit is used for obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
9. A computing device comprising a memory having stored therein a computer program and a processor that, when executing the computer program, implements the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111490363.4A CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111490363.4A CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989624A true CN113989624A (en) | 2022-01-28 |
CN113989624B CN113989624B (en) | 2024-10-18 |
Family
ID=79733495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111490363.4A Active CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989624B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821519A (en) * | 2022-03-21 | 2022-07-29 | 上海应用技术大学 | Traffic sign identification method and system based on coordinate attention |
CN118097475A (en) * | 2024-04-28 | 2024-05-28 | 北京鲲鹏凌昊智能技术有限公司 | Low-speed small target detection method, electronic equipment and computer program product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464910A (en) * | 2020-12-18 | 2021-03-09 | 杭州电子科技大学 | Traffic sign identification method based on YOLO v4-tiny |
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
CN113344847A (en) * | 2021-04-21 | 2021-09-03 | 安徽工业大学 | Long tail clamp defect detection method and system based on deep learning |
-
2021
- 2021-12-08 CN CN202111490363.4A patent/CN113989624B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464910A (en) * | 2020-12-18 | 2021-03-09 | 杭州电子科技大学 | Traffic sign identification method based on YOLO v4-tiny |
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
CN113344847A (en) * | 2021-04-21 | 2021-09-03 | 安徽工业大学 | Long tail clamp defect detection method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
HUIXUAN FU等: "Improved YOLOv4 Marine Target Detection Combined with CBAM", SYMMETRY, 8 April 2021 (2021-04-08) * |
MINGFENG ZHA等: "A Lightweight YOLOv4-Based Forestry Pest Detection Method Using Coordinate Attention and Feature Fusion", ENTROPY, 27 November 2021 (2021-11-27) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821519A (en) * | 2022-03-21 | 2022-07-29 | 上海应用技术大学 | Traffic sign identification method and system based on coordinate attention |
CN114821519B (en) * | 2022-03-21 | 2024-05-21 | 上海应用技术大学 | Traffic sign recognition method and system based on coordinate attention |
CN118097475A (en) * | 2024-04-28 | 2024-05-28 | 北京鲲鹏凌昊智能技术有限公司 | Low-speed small target detection method, electronic equipment and computer program product |
Also Published As
Publication number | Publication date |
---|---|
CN113989624B (en) | 2024-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
CN111507371B (en) | Method and device for automatically evaluating reliability of label on training image | |
CN113989624B (en) | Infrared low-speed small target detection method, device, computing equipment and storage medium | |
CN112070135A (en) | Power equipment image detection method and device, power equipment and storage medium | |
CN112837315A (en) | Transmission line insulator defect detection method based on deep learning | |
CN111178451A (en) | License plate detection method based on YOLOv3 network | |
CN112528934A (en) | Improved YOLOv3 traffic sign detection method based on multi-scale feature layer | |
CN113139594B (en) | Self-adaptive detection method for airborne image unmanned aerial vehicle target | |
CN111611933B (en) | Information extraction method and system for document image | |
CN114612835A (en) | Unmanned aerial vehicle target detection model based on YOLOv5 network | |
CN115830399B (en) | Classification model training method, device, equipment, storage medium and program product | |
CN114170531B (en) | Infrared image target detection method and device based on difficult sample transfer learning | |
CN112766409A (en) | Feature fusion method for remote sensing image target detection | |
CN117011616B (en) | Image content auditing method and device, storage medium and electronic equipment | |
CN117576073A (en) | Road defect detection method, device and medium based on improved YOLOv8 model | |
CN115909280A (en) | Traffic sign recognition algorithm based on multi-head attention mechanism | |
CN116823793A (en) | Device defect detection method, device, electronic device and readable storage medium | |
CN116964588A (en) | Target detection method, target detection model training method and device | |
CN116597326A (en) | Unmanned aerial vehicle aerial photography small target detection method based on improved YOLOv7 algorithm | |
CN111241905A (en) | Power transmission line nest detection method based on improved SSD algorithm | |
CN113392837A (en) | License plate recognition method and device based on deep learning | |
CN113033518B (en) | Image detection method, image detection device, electronic equipment and storage medium | |
Bellam et al. | A Practical Approach of Recognizing and Detecting Traffic Signs using Deep Neural Network Model | |
CN117876942B (en) | Unmanned aerial vehicle and bird monitoring method based on convolutional neural network | |
CN116630876A (en) | Airport scene monitoring image target detection method based on YOLO frame |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |