CN111860637B - Single-shot multi-frame infrared target detection method - Google Patents

Single-shot multi-frame infrared target detection method

Info

Publication number
CN111860637B
CN111860637B (application CN202010689129.3A)
Authority
CN
China
Prior art keywords
feature
network
scale
fusion
target
Prior art date
Legal status
Active
Application number
CN202010689129.3A
Other languages
Chinese (zh)
Other versions
CN111860637A (en)
Inventor
Liu Gang (刘刚)
Liu Sen (刘森)
Liu Zhonghua (刘中华)
Xiao Chunbao (肖春宝)
Cao Zixuan (曹紫绚)
Zhang Wenbo (张文波)
Zhang Peigen (张培根)
Xu Laixiang (许来祥)
Current Assignee
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202010689129.3A
Publication of CN111860637A
Application granted
Publication of CN111860637B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The single-shot multi-frame infrared target detection method starts from the feature pyramid network and, based on the unequal contribution of each feature layer to the fused output, realizes bidirectional multi-scale weighted feature fusion between low-resolution, strong-semantic feature layers and high-resolution, weak-semantic feature layers, thereby constructing an auxiliary network. Starting from the intersection over union (IoU), it considers the influence of both the overlapping and non-overlapping regions on the objective function, constructs a detector localization loss that remains invariant to target scale change, builds the target detector, and improves the detection model's sensitivity to small-target localization errors. A VGG16 convolutional neural network serves as the feature extraction network and is integrated with the auxiliary network and the target detector to form a single-shot multi-frame infrared target detection model that fuses multi-scale weighted feature fusion with a scale-invariant localization loss. The invention has autonomous learning capability and a high detection rate, and offers an effective route to infrared imaging guidance target detection in complex environments.

Description

Single-shot multi-frame infrared target detection method
Technical Field
The invention belongs to the technical field of infrared target detection, and particularly relates to a single-shot multi-frame infrared target detection method.
Background
At present, target detection is the basis on which an infrared imaging guidance automatic target recognition system completes subsequent tasks such as recognition and tracking. Existing systems lack the ability to learn target features autonomously and cannot cope once the task environment exceeds pre-planned conditions. Single-stage target detection based on deep learning offers autonomous learning capability and high computational efficiency, making it an effective route to infrared imaging guidance target detection in complex environments. The Single Shot MultiBox Detector (SSD) is a classical single-stage detection model. The SSD target detection model can be decomposed into two modules: a feature extractor and a target detector. The feature extractor extracts features from the input image, and the target detector predicts target locations and classes from those features. The feature extractor comprises two parts: a feature extraction network and an auxiliary network. The feature extraction network is generally a modified image classification network, so a transfer learning effect can be realized with weights pre-trained on an image classification dataset. The auxiliary network transforms and fuses the features output by the feature extraction network. The target detector is made up of several fully connected or convolutional layers, each of which can be regarded as a collection of detectors. Each detector outputs only one detection result, so the number of detectors sets the upper limit on the number of detectable targets. Each detector consists of one locator and one classifier: the locator maps input features to target location information, and the classifier maps them to target category information.
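To make this decomposition concrete, the following is a minimal sketch in PyTorch; the class names, channel counts and anchor numbers are illustrative assumptions rather than the patent's exact configuration:

```python
# Minimal sketch of the SSD decomposition described above (PyTorch).
# Channel counts, anchor numbers and module interfaces are illustrative
# assumptions, not the patent's exact configuration.
import torch
import torch.nn as nn

class DetectorHead(nn.Module):
    """One detection head: a locator (4 offsets per anchor) plus a classifier."""
    def __init__(self, in_ch: int, num_anchors: int, num_classes: int):
        super().__init__()
        self.locator = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)
        self.classifier = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)

    def forward(self, feat: torch.Tensor):
        return self.locator(feat), self.classifier(feat)

class SSDLike(nn.Module):
    """Feature extractor (backbone + auxiliary network) feeding per-scale heads."""
    def __init__(self, backbone: nn.Module, auxiliary: nn.Module, heads: nn.ModuleList):
        super().__init__()
        self.backbone, self.auxiliary, self.heads = backbone, auxiliary, heads

    def forward(self, x: torch.Tensor):
        feats = self.auxiliary(self.backbone(x))  # list of multi-scale feature maps
        return [head(f) for head, f in zip(self.heads, feats)]
```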
At present, infrared imaging guidance target detection based on the SSD model faces two main challenges: (1) Effectively representing and processing multi-scale features is one of the major difficulties in SSD feature extractor design. Although existing methods improve the detection of targets, particularly small targets, through better representation and fusion of multi-scale features, considerable room for improvement remains because the connection forms between feature layers of different scales are simplistic and the fusion treats all layers equally. Studying more effective feature selection and fusion mechanisms that fuse low-resolution, strong-semantic feature layers with high-resolution, weak-semantic feature layers is a key route to improving infrared small-target detection; (2) The localization loss of the SSD target detector is generally computed with L1 or L2 loss functions, which do not account for the effect of target scale change on the loss; for the same absolute error, small targets are clearly more sensitive than large targets, which reduces the model's sensitivity to small-target localization errors.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a single-shot multi-frame infrared target detection method that fuses multi-scale weighted feature fusion with a scale-invariant localization loss, has autonomous learning capability and a high detection rate, and offers an effective route to infrared imaging guidance target detection in complex environments.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a single-shot multi-frame infrared target detection method comprises the following steps:
s1: starting from the feature pyramid network, describing inequality of contribution of each feature layer to fusion output based on a learnable weight, realizing bidirectional multi-scale feature weighted fusion between the feature layer with low resolution and strong semantics and the feature layer with high resolution and weak semantics, and constructing an auxiliary network;
s2: from the cross-over ratio, simultaneously considering the influence of the overlapping area and the non-overlapping area on the objective function, constructing the detector positioning loss which keeps invariance to the objective scale change, and constructing the objective detector;
s3: taking a VGG16 convolutional neural network as a feature extraction network and integrating the VGG16 convolutional neural network with an auxiliary network and a target detector to form a single-shot multi-frame infrared target detection model integrating multi-scale feature weighted fusion and scale invariance positioning loss;
s31: adding a weight for representing the importance of each input feature in the feature fusion process, and learning the importance of the input feature through network training;
s32: integrating multi-scale bidirectional jump connection and rapid normalization weighting feature fusion, serving as a functional network layer, and repeating for a plurality of times to construct a bidirectional feature weighting fusion pyramid network serving as an auxiliary network;
s33: focusing on the influence of the overlapping area and the non-overlapping area on the objective function, integrating the factors which keep invariance to the objective scale change, and constructing the positioning loss of the detector;
s34: the feature extraction network, the auxiliary network and the target detector are combined to form a single-shot multi-frame infrared target detection model for realizing fusion of multi-scale feature weighted fusion and scale invariance positioning loss.
Further, step S1 specifically includes: starting from the top-down multi-scale feature fusion idea of the feature pyramid network (Feature Pyramid Network, FPN), a bottom-up path is added to the FPN to further fuse features, forming a bidirectional path; on this basis, a node with only one input edge and no feature fusion contributes little to the feature fusion network and is removed; a skip connection is added between non-adjacent nodes at the same level, fusing more features at little extra cost; in addition, to achieve higher-level feature fusion, the bidirectional path is treated as a functional network layer and repeated multiple times.
Further, the bidirectional multi-scale weighted feature fusion in step S1 specifically includes the following steps:
In the feature fusion process, a weight representing the importance of each input feature is added, and the importance is learned through network training; the weighted fusion equation is:

$O = \sum_i \frac{\omega_i}{\epsilon + \sum_j \omega_j} F_i$

where $\omega_i$ is a learnable weight and $F_i$ is the i-th input feature of the current layer; a ReLU is applied after each $\omega_i$ to ensure $\omega_i \geq 0$, $\sum_j \omega_j$ is the sum of the weights of the input features of the current layer, and $\epsilon$ is a small constant.
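A minimal sketch of this fast-normalized weighted fusion, assuming a PyTorch module (the class name and the choice of one scalar weight per input are illustrative):

```python
# Hedged sketch of the fast-normalized weighted fusion equation above (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one learnable w_i per input
        self.eps = eps

    def forward(self, feats):  # feats: list of tensors with identical shapes
        w = F.relu(self.weights)      # ReLU keeps every w_i >= 0
        w = w / (self.eps + w.sum())  # fast normalization: weights sum to ~1
        return sum(wi * fi for wi, fi in zip(w, feats))
```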
Further, in step S2, a detector localization loss that remains invariant to target scale change is constructed, specifically as follows:

Starting from the intersection over union (Intersection over Union, IoU), a locator loss that remains invariant to target scale change is constructed:

$IoU = \frac{|A \cap B|}{|A \cup B|}, \quad GIoC = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \quad L_{loc} = 1 - GIoC;$

where A and B are the predicted and ground-truth anchor boxes respectively, and C is their closure (smallest enclosing box); the dominant term of the overlapping region is the intersection of the predicted and ground-truth boxes, while the dominant term of the non-overlapping region is the area of the closure C minus the union of the two boxes, with the remainder of the union exerting a secondary influence; GIoC thus attends simultaneously to the influence of the overlapping and non-overlapping regions on the objective function, at low computational cost.
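As a hedged sketch, the GIoC localization loss could be computed as below for axis-aligned boxes in (x1, y1, x2, y2) form; the function name and box layout are assumptions, not the patent's notation:

```python
# Hedged sketch of the GIoC localization loss (L_loc = 1 - GIoC) in PyTorch.
import torch

def gioc_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection A ∩ B
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_a = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_b = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)

    # Smallest enclosing box C (the "closure" of A and B)
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    gioc = iou - (area_c - union) / (area_c + eps)
    return (1.0 - gioc).mean()
```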
Further, the VGG16 convolutional neural network in step S3 comprises 13 convolutional layers, 5 pooling layers and 2 fully connected layers; this VGG16 network forms the feature extraction network of the single-shot multi-frame detector.
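For illustration, such a feature extraction network could be assembled from torchvision's ImageNet-pretrained VGG16 as sketched below; converting the fully connected layers fc6/fc7 into convolutions follows common SSD practice and is an assumption here, since the patent does not spell out that detail:

```python
# Hedged sketch: VGG16 backbone for the feature extraction network.
import torch.nn as nn
from torchvision.models import vgg16

def build_backbone() -> nn.Sequential:
    # The 13 convolutional + 5 pooling layers live in vgg16().features;
    # replacing fc6/fc7 with convolutions is common SSD practice (an assumption).
    base = vgg16(weights="IMAGENET1K_V1").features
    fc_as_conv = nn.Sequential(
        nn.Conv2d(512, 1024, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # fc6
        nn.Conv2d(1024, 1024, kernel_size=1), nn.ReLU(inplace=True),            # fc7
    )
    return nn.Sequential(base, fc_as_conv)
```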
The beneficial effects of the invention are as follows:
the single-shot multi-frame infrared target detection method disclosed by the invention is integrated with multi-scale feature weighted fusion and scale invariance positioning loss, has autonomous learning capability and high detection rate, and is an effective way for solving the problem of infrared imaging guidance target detection in a complex environment;
the invention starts from a feature pyramid network, and describes the inequality of the contribution of each feature layer to fusion output based on the learnable weight, so as to realize the bidirectional multi-scale feature weighted fusion between the feature layer with low resolution and strong semantics and the feature layer with high resolution and weak semantics; and from the cross-over ratio, simultaneously considering the influence of the overlapping area and the non-overlapping area on the objective function, constructing the detector positioning loss which keeps invariance to the target scale change, and improving the sensitivity of the detection model to small target positioning errors. The model provided by the invention has autonomous learning capability and high detection rate, and is an effective way for solving the problem of detection of the infrared imaging guidance target in a complex environment.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the embodiments or the prior-art description are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a multi-scale feature layer connection strategy of the present invention;
FIG. 2 shows the training-set and validation-set loss curves of the present invention;
FIG. 3 shows the precision-recall (P-R) curves and AP values for the infrared targets of the present invention.
Detailed Description
Specific examples are given below to further clarify and fully describe the technical scheme of the invention. The present embodiment is a preferred embodiment based on the technical scheme of the invention, but the scope of the invention is not limited to the following embodiments.
A single-shot multi-frame infrared target detection method comprises the following steps:
S1: starting from the feature pyramid network, describe the unequal contribution of each feature layer to the fused output with learnable weights, realize bidirectional multi-scale weighted feature fusion between low-resolution, strong-semantic feature layers and high-resolution, weak-semantic feature layers, and construct the auxiliary network;
S2: starting from the intersection over union, consider the influence of both the overlapping and non-overlapping regions on the objective function, construct a detector localization loss that remains invariant to target scale change, and construct the target detector, improving the detection model's sensitivity to small-target localization errors;
S3: take a VGG16 convolutional neural network as the feature extraction network and integrate it with the auxiliary network and the target detector to form a single-shot multi-frame infrared target detection model fusing multi-scale weighted feature fusion and scale-invariant localization loss;
S31: during feature fusion, add a weight representing the importance of each input feature, and learn that importance through network training;
S32: integrate multi-scale bidirectional skip connections with fast-normalized weighted feature fusion into a functional network layer, and repeat it several times to construct a bidirectional weighted-fusion feature pyramid network that serves as the auxiliary network;
S33: attend simultaneously to the influence of the overlapping and non-overlapping regions on the objective function, integrate a factor that remains invariant to target scale change, and construct the detector localization loss;
to improve the small target detection capability of the single-stage model, the factor that keeps invariance to the target scale change should be integrated into the detector positioning loss function, and the intersection ratio IoU has invariance to the target scale change:
a and B are respectively a prediction anchor frame and a real anchor frame;
however, ioU only focuses on the overlapping area of the predicted anchor frame and the real anchor frame, and other non-overlapping areas are also required to be focused on in order to better reflect the overlapping ratio of the predicted frame and the real frame. Based on this, the invention improves IoU:
c is the closure of the predicted anchor frame and the real anchor frame. The most important of the overlapping area is the intersection area of the predicted frame and the real frame, the most important of the non-overlapping area is the area of the closure C where the predicted frame and the real frame are combined, and the combined area of the removed intersection area is the secondary influence non-overlapping area, and the GIoC reflects the overlapping area and the non-overlapping area at the same time;
the GIoC-based detector positioning penalty for maintaining invariance to target dimensional changes can be designed as:
L loc =1-GIoC;
s34: the feature extraction network, the auxiliary network and the target detector are combined to form a single-shot multi-frame infrared target detection model for realizing fusion of multi-scale feature weighted fusion and scale invariance positioning loss. The batch size was 32, iterating 20 ten thousand times. The initial learning rate was set to 0.001, divided by 10 at iterations to 5 ten thousand, 10 ten thousand and 15 ten thousand, respectively, with a random gradient descent with a magnitude of 0.9 and a weight decay parameter of 0.0005 for network optimization.
Furthermore, in the classical SSD single-stage target detection network, features near the bottom layers carry little semantic information but rich positional information, while features near the top layers are semantically rich but positionally coarse. To improve the detection rate of small infrared targets, the invention starts from the FPN and fuses feature layers of different scales. As an optimization, step S1 constructs a bidirectionally skip-connected FPN: starting from the top-down multi-scale feature fusion idea of the FPN, a bottom-up path is added to further fuse features, forming a bidirectional path; a node with only one input edge and no feature fusion contributes little to the feature fusion network and can be removed; a skip connection is added between non-adjacent nodes at the same level, fusing more features at little extra cost; and to achieve higher-level feature fusion, the bidirectional path is treated as a functional network layer and repeated multiple times. The structure of the bidirectionally skip-connected FPN is shown in Fig. 1(d), and a sketch of one such layer follows.
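A hedged sketch of one such bidirectional skip-connected layer over five pyramid levels; the level range P3 to P7 and the treatment of the end levels are assumptions based on the description and Fig. 1(d), with `fuse` standing for the weighted fusion plus convolution step and `up`/`down` for the Resize operations:

```python
# Hedged sketch of one bidirectional skip-connected pyramid layer.
def bifpn_layer(p3, p4, p5, p6, p7, fuse, up, down):
    # Top-down pass produces intermediate features (single-input nodes removed).
    p6_td = fuse([p6, up(p7)])
    p5_td = fuse([p5, up(p6_td)])
    p4_td = fuse([p4, up(p5_td)])
    p3_out = fuse([p3, up(p4_td)])          # lowest level: output directly
    # Bottom-up pass adds skip connections from the original inputs.
    p4_out = fuse([p4, p4_td, down(p3_out)])
    p5_out = fuse([p5, p5_td, down(p4_out)])
    p6_out = fuse([p6, p6_td, down(p5_out)])
    p7_out = fuse([p7, down(p6_out)])       # highest level: input plus path below
    return p3_out, p4_out, p5_out, p6_out, p7_out
```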
Further, the bidirectional multi-scale weighted feature fusion in step S1 specifically includes the following steps:
Considering that different input features have different resolutions, their contributions to the output features are often unequal; in the feature fusion process, a weight representing the importance of each input feature is therefore added and learned through network training. The weighted fusion equation is:

$O = \sum_i \frac{\omega_i}{\epsilon + \sum_j \omega_j} F_i$

where $\omega_i$ is a learnable weight and $F_i$ is the i-th input feature of the current layer; a ReLU is applied after each $\omega_i$ to ensure $\omega_i \geq 0$, $\sum_j \omega_j$ is the sum of the weights of the input features of the current layer, and $\epsilon$ is a small constant (e.g., 0.0001) that avoids numerical instability. With reference to Fig. 1(d), the level-6 fusion process can be described as:

$P_6^{td} = \mathrm{Conv}\left(\frac{\omega_1 P_6^{in} + \omega_2 \mathrm{Resize}(P_7^{in})}{\omega_1 + \omega_2 + \epsilon}\right), \quad P_6^{out} = \mathrm{Conv}\left(\frac{\omega_1' P_6^{in} + \omega_2' P_6^{td} + \omega_3' \mathrm{Resize}(P_5^{out})}{\omega_1' + \omega_2' + \omega_3' + \epsilon}\right)$

where $P_6^{in}$ is the initial level-6 feature, $P_6^{td}$ is the level-6 intermediate feature on the top-down path, and $P_6^{out}$ is the level-6 output feature on the bottom-up path; Resize is an upsampling or downsampling operation that matches resolutions, and Conv is a convolution for feature processing. All other feature layers are fused in a similar manner.
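The level-6 fusion above, written as a hedged sketch; the weights are assumed to be ReLU-clamped learnable scalars, and `conv` and `resize` stand in for the Conv and Resize operations:

```python
# Hedged sketch of the two-stage level-6 fusion described above.
def fuse_level6(p6_in, p7_in, p5_out, w, wp, conv, resize, eps=1e-4):
    # Top-down intermediate feature: P6_in fused with the upsampled P7_in.
    p6_td = conv((w[0] * p6_in + w[1] * resize(p7_in)) / (w[0] + w[1] + eps))
    # Bottom-up output: P6_in, P6_td and the downsampled P5_out fused together.
    p6_out = conv((wp[0] * p6_in + wp[1] * p6_td + wp[2] * resize(p5_out))
                  / (wp[0] + wp[1] + wp[2] + eps))
    return p6_td, p6_out
```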
Further, in step S2, a detector localization loss that remains invariant to target scale change is constructed, specifically as follows:

Starting from the intersection over union (Intersection over Union, IoU), a locator loss that remains invariant to target scale change is constructed:

$IoU = \frac{|A \cap B|}{|A \cup B|}, \quad GIoC = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \quad L_{loc} = 1 - GIoC;$

where A and B are the predicted and ground-truth anchor boxes respectively, and C is their closure (smallest enclosing box); the dominant term of the overlapping region is the intersection of the predicted and ground-truth boxes, while the dominant term of the non-overlapping region is the area of the closure C minus the union of the two boxes, with the remainder of the union exerting a secondary influence; GIoC thus attends simultaneously to the influence of the overlapping and non-overlapping regions on the objective function, at low computational cost.
Further, the VGG16 convolutional neural network in step S3 comprises 13 convolutional layers, 5 pooling layers and 2 fully connected layers; this VGG16 network forms the feature extraction network of the single-shot multi-frame detector.
Further, after step S34, the following experiments in this embodiment demonstrate the effects of the invention:
1. PASCAL VOC dataset
The training portions of VOC2007 and VOC2012 in the PASCAL VOC dataset were selected as the training set, and testing was performed on the VOC2007 test set. VGG16 pre-trained on the ImageNet dataset serves as the backbone network of the invention. The model of the invention, named WFSSD, is compared with SSD, DSSD, RSSD and FSSD at input image sizes of 300×300 and 512×512; the results are shown in Table 1;
TABLE 1 PASCAL VOC2007 test set detection results
As can be seen from Table 1, the mAP of WFSSD300 reaches 84.7%, an increase of 7.2 percentage points over SSD300; the mAP of WFSSD512 reaches 86.6%, an increase of 7.1 percentage points over SSD512. Although the feature extraction network of DSSD adopts the stronger ResNet-101, the mAP of the proposed model still exceeds that of DSSD. The two improved SSD models RSSD and FSSD do not consider the differing contributions of the feature layers to the fused output when fusing high and low feature layers, performing simple unweighted superposition; the invention instead adopts learned weighted fusion across feature layers and introduces a loss that is invariant to target scale change into the detector, thereby surpassing them in detection performance. In terms of speed, the WFSSD model differs little from SSD; although YOLOv3 detects faster, its accuracy gap with the proposed model is clear.
2. Ablation experiments
The bidirectional weighted-fusion feature pyramid network and the scale-invariant GIoC localization loss are the two innovations of the invention; to analyse their respective contributions to the performance gain of the WFSSD model, ablation results are shown in Table 2;
TABLE 2 influence of different Components on the detection Performance of the inventive model
As can be seen from Table 2, the bidirectional weighted-fusion feature pyramid network and the scale-invariant GIoC localization loss influence the model's performance through the feature extractor's auxiliary network and the locator loss respectively, with the pyramid network having the greater effect.
3. Self-built infrared dataset
The experimental data come from infrared aircraft videos, stored frame by frame for a total of 5,582 frames (352×240). Targets are divided into three attitude categories: lateral, backward and back; fuselage and tail flame are distinguished during detection, giving 6 manually annotated target categories: back fuselage (BAF), back tail flame (BAT), lateral fuselage (LAF), lateral tail flame (LAT), backward fuselage (BWF) and backward tail flame (BWT). The dataset contains 19,936 manually annotated targets in total, distributed as BAF 3,385, BAT 2,730, LAF 6,438, LAT 4,904, BWF 352 and BWT 2,127. Because this distribution is unbalanced, a geometric-transformation-based data augmentation method is used during training to enlarge the sample size and compensate, as sketched below.
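A minimal sketch of such geometric-transformation augmentation with torchvision; the specific transform set is an assumption, and in detection training the box coordinates must be transformed consistently with the image, which this image-level sketch omits:

```python
# Hedged sketch: geometric transforms used to enlarge under-represented classes.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
])
```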
The training and validation loss curves of the WFSSD300 model are shown in Fig. 2: the training loss decreases continuously, and the validation loss decreases gradually before converging to a stable state. The training parameters match those used for the PASCAL VOC dataset; the AP and mAP results on the test set are shown in Table 3, where WFSSD exceeds SSD, DSSD, RSSD and FSSD in detection accuracy.
TABLE 3 Infrared dataset detection results
The precision-recall (P-R) curves and AP values for the six-class test are shown in Fig. 3.
The 19,936 manually annotated targets were tested on the trained model; the results fall into three cases: True (detected and correctly classified), False (detected but misclassified) and Miss (not detected), with percentages shown in Table 4. In the original images, 13,785 targets (69.6% of the total) have annotation boxes whose width and height are both below 50 pixels; their detection results are shown in Table 5. 7,666 targets (38.45% of the total) have annotation boxes below 25 pixels; their results are shown in Table 6. 1,402 targets (7.03% of the total) have annotation boxes below 12 pixels; their results are shown in Table 7;
TABLE 4 Small-target detection capability comparison (all)
TABLE 5 Small-target detection capability comparison (< 50)
TABLE 6 Small-target detection capability comparison (< 25)
TABLE 7 Small-target detection capability comparison (< 12)
As can be seen from Tables 4 to 7, as target size decreases, the proportion of correctly detected targets (True) falls and the missed-detection proportion (Miss) rises for both the conventional SSD and WFSSD. At every size, WFSSD outperforms the conventional SSD and the other improved SSDs on both indices, and its advantage grows markedly as size decreases. Benefiting from weighted feature fusion, WFSSD detects more small targets.
In summary, the invention starts from the feature pyramid network and, with learnable weights describing the unequal contribution of each feature layer to the fused output, realizes bidirectional multi-scale weighted feature fusion between low-resolution, strong-semantic feature layers and high-resolution, weak-semantic feature layers; starting from the intersection over union, it considers the influence of both the overlapping and non-overlapping regions on the objective function, constructs a detector localization loss that remains invariant to target scale change, and improves the detection model's sensitivity to small-target localization errors. The proposed model has autonomous learning capability and a high detection rate, and offers an effective route to infrared imaging guidance target detection in complex environments.
The foregoing has outlined and described the features, principles and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the above embodiments, which, together with the descriptions, merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A single-shot multi-frame infrared target detection method, characterized in that the method comprises the following steps:
S1: starting from the feature pyramid network, describe the unequal contribution of each feature layer to the fused output with learnable weights, realize bidirectional multi-scale weighted feature fusion between low-resolution, strong-semantic feature layers and high-resolution, weak-semantic feature layers, and construct the auxiliary network;
S2: starting from the intersection over union, consider the influence of both the overlapping and non-overlapping regions on the objective function, construct a detector localization loss that remains invariant to target scale change, and construct the target detector;
S3: take a VGG16 convolutional neural network as the feature extraction network and integrate it with the auxiliary network and the target detector to form a single-shot multi-frame infrared target detection model fusing multi-scale weighted feature fusion and scale-invariant localization loss;
S31: during feature fusion, add a weight representing the importance of each input feature, and learn that importance through network training;
S32: integrate multi-scale bidirectional skip connections with fast-normalized weighted feature fusion into a functional network layer, and repeat it several times to construct a bidirectional weighted-fusion feature pyramid network that serves as the auxiliary network;
S33: attend simultaneously to the influence of the overlapping and non-overlapping regions on the objective function, integrate a factor that remains invariant to target scale change, and construct the detector localization loss;
S34: combine the feature extraction network, the auxiliary network and the target detector into the single-shot multi-frame infrared target detection model fusing multi-scale weighted feature fusion and scale-invariant localization loss.
2. The single-shot multi-frame infrared target detection method according to claim 1, characterized in that step S1 specifically includes: starting from the top-down multi-scale feature fusion idea of the feature pyramid network (Feature Pyramid Network, FPN), a bottom-up path is added to the FPN to further fuse features, forming a bidirectional path; on this basis, a node with only one input edge and no feature fusion contributes little to the feature fusion network and is removed; a skip connection is added between non-adjacent nodes at the same level, fusing more features at little extra cost; in addition, to achieve higher-level feature fusion, the bidirectional path is treated as a functional network layer and repeated multiple times.
3. The single-shot multi-frame infrared target detection method according to claim 1, characterized in that the bidirectional multi-scale weighted feature fusion in step S1 specifically includes the following steps:
In the feature fusion process, a weight representing the importance of each input feature is added, and the importance is learned through network training; the weighted fusion equation is:

$O = \sum_i \frac{\omega_i}{\epsilon + \sum_j \omega_j} F_i$

where $\omega_i$ is a learnable weight and $F_i$ is the i-th input feature of the current layer; a ReLU is applied after each $\omega_i$ to ensure $\omega_i \geq 0$, $\sum_j \omega_j$ is the sum of the weights of the input features of the current layer, and $\epsilon$ is a small constant.
4. The single-shot multi-frame infrared target detection method according to claim 1, characterized in that in step S2 a detector localization loss that remains invariant to target scale change is constructed, specifically as follows:
Starting from the intersection over union (Intersection over Union, IoU), a locator loss that remains invariant to target scale change is constructed:

$IoU = \frac{|A \cap B|}{|A \cup B|}, \quad GIoC = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \quad L_{loc} = 1 - GIoC;$

where A and B are the predicted and ground-truth anchor boxes respectively, and C is their closure (smallest enclosing box); the dominant term of the overlapping region is the intersection of the predicted and ground-truth boxes, while the dominant term of the non-overlapping region is the area of the closure C minus the union of the two boxes, with the remainder of the union exerting a secondary influence; GIoC thus attends simultaneously to the influence of the overlapping and non-overlapping regions on the objective function, at low computational cost.
5. The single-shot multi-frame infrared target detection method according to claim 1, characterized in that the VGG16 convolutional neural network in step S3 comprises 13 convolutional layers, 5 pooling layers and 2 fully connected layers; this VGG16 network forms the feature extraction network of the single-shot multi-frame detector.
CN202010689129.3A 2020-07-17 2020-07-17 Single-shot multi-frame infrared target detection method Active CN111860637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010689129.3A CN111860637B (en) 2020-07-17 2020-07-17 Single-shot multi-frame infrared target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010689129.3A CN111860637B (en) 2020-07-17 2020-07-17 Single-shot multi-frame infrared target detection method

Publications (2)

Publication Number Publication Date
CN111860637A CN111860637A (en) 2020-10-30
CN111860637B (en) 2023-11-21

Family

ID=72983197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010689129.3A Active CN111860637B (en) 2020-07-17 2020-07-17 Single-shot multi-frame infrared target detection method

Country Status (1)

Country Link
CN (1) CN111860637B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270366B (en) * 2020-11-02 2022-08-26 重庆邮电大学 Micro target detection method based on self-adaptive multi-feature fusion
CN112560853B (en) * 2020-12-14 2024-06-11 中科云谷科技有限公司 Image processing method, device and storage medium
CN113012228B (en) * 2021-03-23 2023-06-20 华南理工大学 Workpiece positioning system and workpiece positioning method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563433A * 2017-08-29 2018-01-09 University of Electronic Science and Technology of China Infrared small target detection method based on convolutional neural networks
CN109034210A * 2018-07-04 2018-12-18 Academy of Broadcasting Science, SAPPRFT Object detection method based on super feature fusion and multi-scale pyramid network
CN109344821A * 2018-08-30 2019-02-15 Xidian University Small target detection method based on feature fusion and deep learning
WO2019144575A1 * 2018-01-24 2019-08-01 Sun Yat-sen University Fast pedestrian detection method and device
CN110826554A * 2018-08-10 2020-02-21 Xidian University Infrared target detection method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Single-stage object detection using convolution kernel pyramid and dilated convolution; Liu Tao; Wang Xili; Journal of Image and Graphics, No. 01; full text *

Also Published As

Publication number Publication date
CN111860637A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111860637B (en) Single-shot multi-frame infrared target detection method
CN108052911A (en) Multi-modal remote sensing image high-level characteristic integrated classification method based on deep learning
CN107909027A (en) It is a kind of that there is the quick human body target detection method for blocking processing
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN113313706B (en) Power equipment defect image detection method based on detection reference point offset analysis
CN116863539A (en) Fall figure target detection method based on optimized YOLOv8s network structure
CN109344720B (en) Emotional state detection method based on self-adaptive feature selection
CN110347853B (en) Image hash code generation method based on recurrent neural network
CN114897802A (en) Metal surface defect detection method based on improved fast RCNN algorithm
CN109558803B (en) SAR target identification method based on convolutional neural network and NP criterion
Liu et al. An advanced YOLOv3 method for small object detection
Yu et al. Remote sensing image classification based on RBF neural network based on fuzzy C-means clustering algorithm
CN113139549A (en) Parameter self-adaptive panorama segmentation method based on multitask learning
Su et al. Segmented handwritten text recognition with recurrent neural network classifiers
CN115018884B (en) Visible light infrared visual tracking method based on multi-strategy fusion tree
CN116363507A (en) XGBoost and deep neural network fusion remote sensing image classification method based on snake optimization algorithm
CN108427957B (en) Image classification method and system
CN109635254A (en) Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model
CN112633323B (en) Gesture detection method and system for classroom
CN114998567A (en) Infrared point group target identification method based on multi-mode feature discrimination
Duan et al. Improved YOLOv5 object detection algorithm for remote sensing images
CN113377985A (en) Pyramid network-based traditional Chinese medicine image classification and retrieval method
CN107563418A (en) A kind of picture attribute detection method based on area sensitive score collection of illustrative plates and more case-based learnings
Chu et al. EfficientFCOS: An efficient one-stage object detection model based on FCOS

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant