CN113989624A - Infrared low-slow small target detection method and device, computing equipment and storage medium - Google Patents
Infrared low-slow small target detection method and device, computing equipment and storage medium
- Publication number: CN113989624A
- Application number: CN202111490363.4A
- Authority
- CN
- China
- Prior art keywords
- network
- coordinate
- matrix
- low
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, wherein the method comprises the following steps: acquiring an infrared image to be detected; inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image; and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model. According to the scheme, the low-slow small target can be accurately detected according to the infrared image.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium.
Background
A low-slow small target refers to an aircraft or airborne object having all or some of the following characteristics: a flight speed below 200 km/h, a flight altitude below 1 km, and a small radar reflection area. With the development of the aviation industry, in particular general aviation and the national opening of low-altitude airspace, the "low, slow and small" flight problem has become increasingly prominent. The wide use of civil unmanned aerial vehicles in particular poses a serious threat to air-defense safety; monitoring and early warning can be implemented correctly only if such vehicles can be detected effectively.
Low-slow small target detection must cope with complex scenes, and the targets themselves fly at low altitude, are small in size and are highly maneuverable, so existing models cannot detect low-slow small targets directly. It is therefore desirable to provide a method capable of accurately detecting low-slow small targets.
Disclosure of Invention
Based on the problem that the existing model cannot directly detect the low-slow small target, the embodiment of the invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, and the low-slow small target can be accurately detected.
In a first aspect, an embodiment of the present invention provides an infrared low-slow small target detection method, including:
acquiring an infrared image to be detected;
inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
Preferably, the training mode of the detection model includes:
acquiring a plurality of labeled data samples, wherein the data samples are images containing low-slow small targets; the data samples satisfy the following: the image backgrounds cover one or more specified background environments, the target types cover the specified types, and the target size is not smaller than a set size;
for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network;
and obtaining the trained detection model.
Preferably, the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales;
the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
Preferably, the data samples output by the backbone network further include: the third feature map is different from the first feature map and the second feature map in prediction scale;
the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
Preferably, the performing, by the coordinate attention processing module, coordinate attention processing on the data sample output by the neck network includes:
coding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix;
performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain a feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix having the same number of channels as the data sample;
and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
Preferably, encoding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain the horizontal tensor matrix and the vertical tensor matrix includes:
the data samples are encoded in the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
wherein W and H are the width and height of the image, respectively, c is the number of channels of the data sample image, X is the input data sample,the output of the c-th channel with height h,is the output of the c-th channel with width w.
Preferably, the performing the feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain the feature mapping matrix includes:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1 is a 1 × 1 convolution operation, [·,·] is a concatenate operation along the spatial dimension, Bh is the horizontal tensor matrix, Bw is the vertical tensor matrix, and f is the resulting feature mapping matrix.
In a second aspect, an embodiment of the present invention further provides an infrared low-slow small target detection apparatus, including:
the image acquisition unit is used for acquiring an infrared image to be detected;
the detection model identification unit is used for inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and the detection result generation unit is used for obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the method described in any embodiment of this specification.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed in a computer, the computer program causes the computer to execute the method described in any embodiment of the present specification.
The embodiment of the invention provides an infrared low-slow small target detection method, an infrared low-slow small target detection device, a computing device and a storage medium, wherein a coordinate attention processing module is added to a YOLOv4-tiny network to embed position information into channel attention, so that when an infrared image is subjected to feature recognition, the position of a low-slow small target in the infrared image can be more accurately positioned, and the purpose of accurately detecting the low-slow small target according to the infrared image is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of an infrared low-slow small target detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another infrared low-slow small target detection method according to an embodiment of the present invention;
Fig. 3 is a diagram of a network architecture for infrared low-slow small target detection according to an embodiment of the present invention;
Fig. 4 is a diagram of a hardware architecture of a computing device according to an embodiment of the present invention;
Fig. 5 is a structural diagram of an infrared low-slow small target detection device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As described above, low-slow small target detection must cope with complex scenes, and the targets fly at low altitude, are small and highly maneuverable, so existing models cannot detect them directly. Moreover, an infrared low-slow small target detection model is typically deployed on highly maneuverable platforms such as aircraft, where hardware resources are commonly insufficient, so the detection model needs to be lightweight. To address the complex detection background and the small target size, a coordinate attention processing module is added to the YOLOv4-tiny network so that the position of a low-slow small target in an infrared image can be located more accurately. The scheme can therefore accurately detect low-slow small targets from infrared images.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides an infrared low-slow small target detection method, including:
and step 100, acquiring an infrared image to be detected.
Step 102, inputting the infrared image into a detection model generated by pre-training.
And step 104, obtaining a detection result of the low-slow small target in the infrared image according to a result output by the head network of the detection model.
In the embodiment of the invention, the coordinate attention processing module is added to the YOLOv4-tiny network, and the position information is embedded into the channel attention, so that the position of a low-slow small target in an infrared image can be more accurately positioned when the infrared image is subjected to feature recognition, and the purpose of accurately detecting the low-slow small target according to the infrared image is achieved.
The manner in which the various steps shown in fig. 1 are performed is described below.
First, in step 100, an infrared image to be detected is acquired.
In the embodiment of the invention, each frame of infrared image of the infrared video collected in real time or the collected infrared video can be used as the infrared image to be detected, and the infrared low-slow small target detection model is used for detecting and identifying the low-slow small target of each frame of infrared image, so that the aim of detecting the low-slow small target in real time is fulfilled.
Then, aiming at step 102, inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network carries out feature recognition on the infrared image by sequentially utilizing a main network, a neck network, a coordinate attention processing module and a head network.
When each frame of infrared image in the infrared video is required to be detected to be a low-slow small target through the infrared low-slow small target detection model, the infrared low-slow small target detection network needs to be trained in advance to obtain the detection model.
Then, the following describes a training method of the detection model.
In the embodiment of the present invention, referring to fig. 2, the detection model can be trained at least by using the following method as shown in steps 200-204:
Step 200, acquiring a plurality of labeled data samples, wherein the data samples are images containing low-slow small targets.
In practical application, the data samples can be selected according to actual requirements; for example, background images without low-slow small targets can be acquired as negative samples. The background environment, and the type and size of the low-slow small targets in the positive samples, can be chosen according to the requirements and the actual scene.
In the embodiment of the invention, self-captured infrared images and frames intercepted from low-slow small target infrared videos on the network can be used as data samples. The image backgrounds need to cover the specified background environments, such as one or more of buildings, cloud layers, clear sky, mountains and trees; the target types need to cover the specified types, for example unmanned aerial vehicles, balloons and birds; in addition, the target size in a data sample may be required to be not smaller than a set size, for example larger than 10 × 10 pixels. This ensures the diversity of the data samples, improves the model's ability to learn the features of low-slow small targets, and enables the model to accurately identify targets of different sizes from images with different background environments.
In order that the detection model obtained after training can detect low-slow small targets from unlabeled infrared images, the low-slow small targets in the data samples need to be labeled in advance. In the embodiment of the invention, each low-slow small target in a data sample can be labeled with the labelImg labeling tool; the label can include the category of the low-slow small target, the abscissa and ordinate of its center point, and the width and length of the target. To input the data samples and their corresponding labels into the infrared low-slow small target detection network simultaneously during training, the label files output by labelImg need to be converted from xml format to txt format.
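As an illustrative (non-limiting) sketch, the xml-to-txt conversion described above might look as follows. The element names follow the Pascal VOC layout that labelImg emits; the function name and class mapping are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_ids):
    """Convert one labelImg (Pascal VOC) annotation into YOLO txt lines:
    '<class> <cx> <cy> <w> <h>', with coordinates normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = class_ids[obj.findtext("name")]
        xmin = float(obj.findtext("bndbox/xmin"))
        ymin = float(obj.findtext("bndbox/ymin"))
        xmax = float(obj.findtext("bndbox/xmax"))
        ymax = float(obj.findtext("bndbox/ymax"))
        cx = (xmin + xmax) / 2.0 / img_w   # normalized center abscissa
        cy = (ymin + ymax) / 2.0 / img_h   # normalized center ordinate
        w = (xmax - xmin) / img_w          # normalized target width
        h = (ymax - ymin) / img_h          # normalized target length
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each returned line corresponds to one labeled low-slow small target and can be written to the txt file accompanying the image.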
In addition, in order to train the infrared low-slow small target detection network and obtain an optimal detection model, the labeled data samples can be divided into a training set, a verification set and a test set in a ratio of 8:1:1.
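The 8:1:1 division can be sketched as follows (the function name, seed and shuffling strategy are assumptions; the patent only specifies the ratio):

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle the samples and split them into train/val/test subsets
    according to the given ratios (default 8:1:1)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```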
Step 202, in the embodiment of the invention, each data sample in the training set is input into the infrared low-slow small target detection network, which is based on the YOLOv4-tiny network and sequentially uses a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on each data sample.
In the embodiment of the invention, the CSPDarknet53-tiny network is adopted as the backbone network. When a residual block is processed, the original stack of the residual block is split into two parts: the trunk part continues to stack the residual block, while the other part introduces a residual edge to the convolution result. This enhances the learning capacity of the backbone network, keeps it lightweight, preserves accuracy and reduces memory cost. In addition, to increase speed, the activation function is changed to LeakyReLU.
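For reference, LeakyReLU keeps positive inputs unchanged and scales negative inputs by a small slope. A minimal sketch (the slope value 0.1 is an assumption; the patent does not specify it):

```python
def leaky_relu(x, negative_slope=0.1):
    """LeakyReLU activation: x for x >= 0, negative_slope * x otherwise."""
    return x if x >= 0 else negative_slope * x
```

Unlike ReLU, the nonzero negative slope keeps gradients flowing for negative activations, which is one reason it is favored in lightweight YOLO-family networks.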
Fig. 3 shows the structure of the modified YOLOv4-tiny network, in which boxes 1, 2, 3 and 4 are the backbone network, the neck network, the coordinate attention processing module and the head network, respectively. For each data sample, the backbone network first performs feature extraction to obtain feature maps of different scales. The feature maps output by the backbone network are then input into the neck network for feature fusion. For small target detection, the extraction of small-target feature information needs to be strengthened. To obtain richer small-target position information, a scale is added to the data samples output by the backbone network, and information of different scales is then fused through a feature pyramid network (FPN). This remedies the limitation that the original YOLOv4-tiny network has only two prediction scales and improves the accuracy of small-target position information extraction.
Therefore, in addition to the first feature map and the second feature map of different prediction scales, the data samples output by the backbone network include a third feature map whose prediction scale differs from both. For example, the backbone network originally outputs feature maps of sizes 13 × 13 and 26 × 26; in the embodiment of the invention, a newly added feature map of size 52 × 52 serves as the third feature map.
In the embodiment of the invention, the neck network improves on the feature pyramid network (FPN) by performing feature fusion and feature processing on the newly added scale. The improved FPN fuses features between different output layers of the backbone network, preserving the rich semantic information of deep layers while retaining the geometric detail of shallow layers, thereby enhancing feature extraction.
After the neck network performs feature fusion on the feature maps output by the backbone network, coordinate attention processing is applied to the data samples output by the neck network. Because a scale has been added, the data samples output by the neck network include not only the processed first feature map (output after convolution of the first feature map) and the spliced feature map (obtained by upsampling the processed first feature map and tensor-splicing it with the second feature map), but also the feature extraction map (obtained by upsampling the spliced feature map, tensor-splicing it with the third feature map, and performing feature extraction processing).
For example, referring to fig. 3, the backbone network outputs a first feature map of size 13 × 13, a second feature map of size 26 × 26, and a newly added third feature map of size 52 × 52. In the neck network, the 13 × 13 feature map is convolved and output; it is also upsampled, tensor-spliced with the 26 × 26 feature map, convolved and output. For the newly added scale, the spliced feature map output by that convolution is upsampled, tensor-spliced with the 52 × 52 third feature map from the backbone network, passed through feature extraction processing of convolution, regularization and excitation, convolved again and output. The purpose is to perform secondary feature extraction on the newly added 52 × 52 scale and improve the detection accuracy for low-slow small targets. Thus, after the neck network fuses the 13 × 13, 26 × 26 and 52 × 52 feature maps output by the backbone network, it outputs feature maps of sizes 13 × 13, 26 × 26 and 52 × 52.
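The upsample-and-splice operations of the neck network can be sketched as follows. This is a simplified illustration using nearest-neighbour upsampling; the channel counts in the usage below are assumptions, not values from the patent:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tensor_splice(deep, shallow):
    """Upsample the deeper (smaller) map and concatenate it with the
    shallower map along the channel axis, as in FPN-style tensor splicing."""
    return np.concatenate([upsample2x(deep), shallow], axis=0)
```

For instance, splicing a 13 × 13 map with a 26 × 26 map first doubles the spatial size of the former so the two align, then stacks their channels.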
In the embodiment of the invention, a coordinate attention processing module is added to perform coordinate attention processing on the data samples output by the neck network. Coordinate attention embeds position information into channel attention, so that the subsequent network can attend over a larger area without introducing heavy computation. It is therefore suitable for a lightweight low-slow small target detection network: it improves the detection and recognition of low-slow small targets without consuming excessive computing resources or slowing down detection.
Specifically, the coordinate attention processing may be performed on the data samples output by the neck network through the following steps S1-S3:
and S1, encoding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix.
For example, each channel is encoded along the horizontal coordinate direction and the vertical coordinate direction using pooling kernels of sizes (H,1) and (1, W) for the input data sample image X, respectively.
The data samples are encoded along the horizontal coordinate direction according to the following formula:

B_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)

The data samples are encoded along the vertical coordinate direction according to the following formula:

B_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)

wherein W and H are respectively the width and height of the image, c is the channel index of the data sample image, X is the input data sample image, B_c^h(h) is the output of the c-th channel at height h, and B_c^w(w) is the output of the c-th channel at width w.
The two encoding formulas aggregate the features of the data sample image X along the horizontal and vertical directions respectively, yielding the horizontal tensor matrix Bh and the vertical tensor matrix Bw.
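The directional encodings above can be sketched in NumPy as follows (the function name and the (C, H, W) layout are assumptions; the patent gives only the formulas):

```python
import numpy as np

def coordinate_pool(x):
    """Directional encoding of a (C, H, W) feature map: Bh[c, h] averages
    row h of channel c over all W columns, and Bw[c, w] averages column w
    of channel c over all H rows."""
    b_h = x.mean(axis=2)  # shape (C, H): output of channel c at height h
    b_w = x.mean(axis=1)  # shape (C, W): output of channel c at width w
    return b_h, b_w
```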
And S2, performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain a feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix having the same number of channels as the data sample.
In the embodiment of the present invention, the feature mapping matrix is obtained from the horizontal tensor matrix and the vertical tensor matrix as follows:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1 is a 1 × 1 convolution operation, [·,·] is a concatenate operation along the spatial dimension, Bh is the horizontal tensor matrix, Bw is the vertical tensor matrix, and f is the resulting feature mapping matrix.
Then, the feature mapping matrix is decomposed into two separate tensor matrices along the horizontal coordinate direction and the vertical coordinate direction, and each is passed through a 1 × 1 convolution to become the horizontal attention weight matrix and the vertical attention weight matrix, with the same number of channels as the data sample image X.
Specifically, the following manner may be used:
gh=σ(Fh(fh))
gw=σ(Fw(fw))
wherein gh and gw are respectively the horizontal attention weight matrix and the vertical attention weight matrix of the data sample image X, σ is the sigmoid activation function, Fh and Fw are 1 × 1 convolution operations, and fh and fw are the horizontal and vertical feature mapping tensor matrices obtained by decomposing the feature mapping matrix f.
And S3, multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
In the embodiment of the invention, the coordinate attention weight processing is carried out on each data sample, so that the coordinate attention processing module can capture the long-range dependence along one spatial direction and store the accurate position information along the other spatial direction, and the network can be helped to more accurately position the interested target.
The coordinate attention processed data sample Y can be obtained as follows:

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)

wherein c is the channel index of the data sample image, y_c(i, j) is the c-th channel image of the coordinate attention processed data sample Y, x_c(i, j) is the c-th channel image of the input data sample image X, g_c^h is the horizontal attention weight matrix of the c-th channel, and g_c^w is the vertical attention weight matrix of the c-th channel.
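The final reweighting step can be sketched in NumPy (names and array layout are assumptions; g_h of shape (C, H) and g_w of shape (C, W) are the attention weights from step S2):

```python
import numpy as np

def apply_coordinate_attention(x, g_h, g_w):
    """y_c(i, j) = x_c(i, j) * g_h[c, i] * g_w[c, j] for a (C, H, W) map,
    implemented by broadcasting the two weight matrices over the map."""
    return x * g_h[:, :, None] * g_w[:, None, :]
```

Broadcasting multiplies every pixel of channel c by its row weight and its column weight, which is how position information along both spatial directions is injected into the channel response.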
Each data sample after coordinate attention processing is input into the head network, which adopts a YOLOv3 detection head to detect and recognize low-slow small targets. To let the head network identify low-slow small targets in the feature maps more quickly, preselected anchor frames are generated on the basis of the YOLOv3 detection head by pre-training a K-means clustering algorithm on the self-made data set.
In the embodiment of the invention, in order to enable the detection model to be more fit with the form of low and medium-low small targets in the infrared image, the network convergence is accelerated, and the training precision and speed are improved. Training the K-means clustering algorithm by using a training set in advance to obtain 9 preselected anchor frames with different sizes: (10,10),(14,13),(16,16),(22,18),(24,9),(26,21),(31,13),(40,33),(47,17).
Then, when the head network performs feature recognition on each data sample after coordinate attention processing, a plurality of preselected anchor frames are firstly generated, then the category and the offset are predicted for each preselected anchor frame, then the position of the preselected anchor frame is adjusted according to the predicted offset so as to obtain a predicted boundary frame, and finally the predicted boundary frame containing low and slow small targets is screened. Then, the head network identifies whether the frame contains the low-slow small target and the category, the central point abscissa, the central point ordinate, the target width and the target length of the low-slow small target according to each prediction boundary frame.
And step 204, obtaining the trained detection model.
In this step, using transfer learning, each infrared image of the training set in step 200 is used to train the infrared low-slow small target detection network of step 202. And (4) verifying each model trained to a set number of times by using a verification set, so as to select the training models when the loss function is converged and preselect a set number of training models. And testing the preselected training model by using the test set, and selecting the optimal detection model according to the evaluation index.
For example, the training times can be preset for 300 generations according to experience, and when the infrared low-slow small target detection network is trained for 300 generations by using the training set, a part of data samples in the verification set are used for verifying the training model trained for 300 generations, and it is found that the loss function is fast to converge. Training is continued for the training model of 300 generations and verification is carried out, and when the loss function is found to be converged when the training is carried out for 301 generations, the training models of 301 generations, 302 generations and 303 generations are preselected.
And testing the training models of 301 generation, 302 generation and 303 generation by using a test set, and evaluating the training results respectively, wherein the evaluation indexes are average precision, accuracy and recall rate.
The P-R curve is plotted according to accuracy and recall using the following formula:
wherein, P is the precision rate, R is the recall rate, C is the number of categories, AP is the area between a P-R curve drawn according to the precision rate and the recall rate and a coordinate axis, and mAP is the average AP value of each category.
And determining the training model with the best evaluation index as the optimal detection model according to the higher average precision and the larger AP and mAP values in the evaluation index and the better detection performance and effect of the model.
Finally, in step 104, according to the result output by the head network of the detection model, the detection result of the low and slow small targets in the infrared image is obtained.
And if the result output by the head network contains the low-slow small target, the detection result of the infrared image is the low-slow small target information containing the low-slow small target and the training label, otherwise, the detection structure of the infrared image is not containing the low-slow small target.
As shown in fig. 4 and 5, an embodiment of the present invention provides an infrared low-slow small target detection apparatus. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. In terms of hardware, as shown in fig. 4, a hardware architecture diagram of a computing device in which an infrared low-speed small target detection apparatus provided in the embodiment of the present invention is located is shown, where in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 4, the computing device in which the apparatus is located in the embodiment may generally include other hardware, such as a forwarding chip responsible for processing a packet, and the like. Taking a software implementation as an example, as shown in fig. 5, as a logical means, the device is formed by reading a corresponding computer program in a non-volatile memory into a memory by a CPU of a computing device where the device is located and running the computer program. The small target detection device that this embodiment provided hangs down slowly includes:
an image obtaining unit 501, configured to obtain an infrared image to be detected;
a detection model recognition unit 502, configured to input the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
a detection result generating unit 503, configured to obtain a detection result of a low-slow small target in the infrared image according to a result output by the head network of the detection model.
In an embodiment of the present invention, the detection model identification unit 502 is specifically configured to, when executing generation of a detection model, obtain a plurality of data samples labeled with labels, where the data samples are images including low and slow small targets; the number of data samples satisfies: the image background covers one or more of the specified background environment, the specified type of the target type, and the target size is not smaller than the set size; for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network; and obtaining the trained detection model.
In an embodiment of the present invention, in the detection model identification unit 502, the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales; the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
In an embodiment of the present invention, in the detection model identifying unit 502, the data samples output by the backbone network further include: the third feature map is different from the first feature map and the second feature map in prediction scale; the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
In an embodiment of the present invention, when the coordinate attention processing module performs coordinate attention processing on the data sample output by the neck network, the identifying unit 502 is specifically configured to encode each channel along a horizontal coordinate direction and a vertical coordinate direction for the data sample, so as to obtain a horizontal tensor matrix and a vertical tensor matrix; performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain an feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix which have the same channel number as the data sample; and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
In an embodiment of the present invention, when the identifying unit 502 performs the encoding of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain the horizontal tensor matrix and the vertical tensor matrix, the identifying unit is specifically configured to encode the data samples along the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
wherein W and H are the width and height of the image, respectively, c is the number of channels of the data sample image, X is the input data sample,the output of the c-th channel with height h,is the output of the c-th channel with width w.
In an embodiment of the present invention, when performing the feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain the feature mapping matrix, the identifying unit 502 is further specifically configured to obtain the feature mapping matrix according to the following formula:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1Is a 1 × 1 convolution operation, [,.]For concatenate operations along the spatial dimension, BhIs a matrix of horizontal tensors, BwIs the vertical tensor matrix and f is the resulting eigenmap matrix.
It is understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation to an infrared low-speed small target detection apparatus. In other embodiments of the present invention, an infrared low-slow small target detection apparatus may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the content of information interaction, execution process, and the like among the modules in the device is based on the same concept as the method embodiment of the present invention, specific content can be referred to the description in the method embodiment of the present invention, and is not described herein again.
The embodiment of the invention also provides computing equipment which comprises a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the infrared low-slow small target detection method in any embodiment of the invention is realized.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the processor is enabled to execute an infrared low-slow small target detection method in any embodiment of the present invention.
Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion module to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An infrared low-slow small target detection method is characterized by comprising the following steps:
acquiring an infrared image to be detected;
inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
2. The method of claim 1, wherein the training of the test model comprises:
acquiring a plurality of data samples marked with labels, wherein the data samples are images comprising low and slow small targets; the number of data samples satisfies: the image background covers one or more of the specified background environment, the specified type of the target type, and the target size is not smaller than the set size;
for each data sample, performing: inputting the data sample into the backbone network, inputting the data sample output by the backbone network into a neck network, performing coordinate attention processing on the data sample output by the neck network by using the coordinate attention processing module, and inputting the data sample after coordinate attention processing into the head network;
and obtaining the trained detection model.
3. The method of claim 2,
the data samples output by the backbone network include: the first feature map and the second feature map of different prediction scales;
the coordinate attention processing module respectively carries out coordinate attention processing on the following data samples output by the neck network: the first feature map is processed and output after convolution processing is carried out on the first feature map, and the spliced feature map is obtained by tensor splicing the processed first feature map and the second feature map after upsampling is carried out on the processed first feature map.
4. The method of claim 3,
the data samples output by the backbone network further comprise: the third feature map is different from the first feature map and the second feature map in prediction scale;
the coordinate attention processing module further performs coordinate attention processing on the following data samples output by the neck network: and carrying out tensor splicing on the spliced characteristic diagram and the third characteristic diagram after carrying out upsampling on the spliced characteristic diagram, and carrying out characteristic extraction processing to obtain a characteristic extraction diagram.
5. The method of any of claims 2-4, wherein the coordinate attention processing, with the coordinate attention processing module, of the data samples output by the neck network comprises:
coding each channel of the data samples along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal tensor matrix and a vertical tensor matrix;
performing feature mapping on the horizontal tensor matrix and the vertical tensor matrix to obtain an feature mapping matrix, and decomposing the feature mapping matrix along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal attention weight matrix and a vertical attention weight matrix which have the same channel number as the data sample;
and multiplying each channel of the data sample by the corresponding horizontal attention weight matrix and the vertical attention weight matrix to obtain the data sample after coordinate attention processing.
6. The method of claim 5, wherein encoding each channel in a horizontal coordinate direction and a vertical coordinate direction for the pair of data samples, respectively, resulting in a horizontal tensor matrix and a vertical tensor matrix, comprises:
the data samples are encoded in the horizontal coordinate direction according to the following formula:
the data samples are encoded along the vertical coordinate direction according to the following formula:
7. The method of claim 5, wherein the characterizing the horizontal tensor matrix and the vertical tensor matrix to obtain an eigen mapping matrix comprises:
f=δ(F1([Bh,Bw]))
wherein δ is a nonlinear activation function, F1Is a 1 × 1 convolution operation, [,.]For concatenate operations along the spatial dimension, BhIs a matrix of horizontal tensors, BwIs the vertical tensor matrix and f is the resulting eigenmap matrix.
8. An infrared low-slow small target detection device is characterized by comprising:
the image acquisition unit is used for acquiring an infrared image to be detected;
the detection model identification unit is used for inputting the infrared image into a detection model generated by pre-training; the detection model is obtained based on YOLOv4-tiny network training, and the YOLOv4-tiny network sequentially utilizes a backbone network, a neck network, a coordinate attention processing module and a head network to perform feature recognition on the infrared image;
and the detection result generation unit is used for obtaining the detection result of the low-slow small target in the infrared image according to the result output by the head network of the detection model.
9. A computing device comprising a memory having stored therein a computer program and a processor that, when executing the computer program, implements the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111490363.4A CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111490363.4A CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989624A true CN113989624A (en) | 2022-01-28 |
CN113989624B CN113989624B (en) | 2024-10-18 |
Family
ID=79733495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111490363.4A Active CN113989624B (en) | 2021-12-08 | 2021-12-08 | Infrared low-speed small target detection method, device, computing equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989624B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821519A (en) * | 2022-03-21 | 2022-07-29 | 上海应用技术大学 | Traffic sign identification method and system based on coordinate attention |
CN118097475A (en) * | 2024-04-28 | 2024-05-28 | 北京鲲鹏凌昊智能技术有限公司 | Low-speed small target detection method, electronic equipment and computer program product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464910A (en) * | 2020-12-18 | 2021-03-09 | 杭州电子科技大学 | Traffic sign identification method based on YOLO v4-tiny |
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
CN113344847A (en) * | 2021-04-21 | 2021-09-03 | 安徽工业大学 | Long tail clamp defect detection method and system based on deep learning |
-
2021
- 2021-12-08 CN CN202111490363.4A patent/CN113989624B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464910A (en) * | 2020-12-18 | 2021-03-09 | 杭州电子科技大学 | Traffic sign identification method based on YOLO v4-tiny |
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
CN113344847A (en) * | 2021-04-21 | 2021-09-03 | 安徽工业大学 | Long tail clamp defect detection method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
HUIXUAN FU等: "Improved YOLOv4 Marine Target Detection Combined with CBAM", SYMMETRY, 8 April 2021 (2021-04-08) * |
MINGFENG ZHA等: "A Lightweight YOLOv4-Based Forestry Pest Detection Method Using Coordinate Attention and Feature Fusion", ENTROPY, 27 November 2021 (2021-11-27) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821519A (en) * | 2022-03-21 | 2022-07-29 | 上海应用技术大学 | Traffic sign identification method and system based on coordinate attention |
CN114821519B (en) * | 2022-03-21 | 2024-05-21 | 上海应用技术大学 | Traffic sign recognition method and system based on coordinate attention |
CN118097475A (en) * | 2024-04-28 | 2024-05-28 | 北京鲲鹏凌昊智能技术有限公司 | Low-speed small target detection method, electronic equipment and computer program product |
Also Published As
Publication number | Publication date |
---|---|
CN113989624B (en) | 2024-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
CN111507371B (en) | Method and device for automatically evaluating reliability of label on training image | |
CN113989624B (en) | Infrared low-speed small target detection method, device, computing equipment and storage medium | |
CN112070135A (en) | Power equipment image detection method and device, power equipment and storage medium | |
CN112837315A (en) | Transmission line insulator defect detection method based on deep learning | |
CN111178451A (en) | License plate detection method based on YOLOv3 network | |
CN112528934A (en) | Improved YOLOv3 traffic sign detection method based on multi-scale feature layer | |
CN113139594B (en) | Self-adaptive detection method for airborne image unmanned aerial vehicle target | |
CN111611933B (en) | Information extraction method and system for document image | |
CN114612835A (en) | Unmanned aerial vehicle target detection model based on YOLOv5 network | |
CN115830399B (en) | Classification model training method, device, equipment, storage medium and program product | |
CN114170531B (en) | Infrared image target detection method and device based on difficult sample transfer learning | |
CN112766409A (en) | Feature fusion method for remote sensing image target detection | |
CN117011616B (en) | Image content auditing method and device, storage medium and electronic equipment | |
CN117576073A (en) | Road defect detection method, device and medium based on improved YOLOv8 model | |
CN115909280A (en) | Traffic sign recognition algorithm based on multi-head attention mechanism | |
CN116823793A (en) | Device defect detection method, device, electronic device and readable storage medium | |
CN116964588A (en) | Target detection method, target detection model training method and device | |
CN116597326A (en) | Unmanned aerial vehicle aerial photography small target detection method based on improved YOLOv7 algorithm | |
CN111241905A (en) | Power transmission line nest detection method based on improved SSD algorithm | |
CN113392837A (en) | License plate recognition method and device based on deep learning | |
CN113033518B (en) | Image detection method, image detection device, electronic equipment and storage medium | |
Bellam et al. | A Practical Approach of Recognizing and Detecting Traffic Signs using Deep Neural Network Model | |
CN117876942B (en) | Unmanned aerial vehicle and bird monitoring method based on convolutional neural network | |
CN116630876A (en) | Airport scene monitoring image target detection method based on YOLO frame |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |