CN115410012A - Method and system for detecting infrared small target in night airport clear airspace and application

Publication number: CN115410012A (granted publication: CN115410012B)
Application number: CN202211359429.0A
Authority: CN (China)
Prior art keywords: target; infrared small target; feature; infrared
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 屈景怡, 刘闪亮, 李云龙, 吴仁彪
Applicant and assignee (current and original): Civil Aviation University of China

Classifications

    • G06V 10/774: Image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/761: Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention belongs to the technical field of image recognition and discloses a method and a system for detecting small infrared targets in the clear airspace of an airport at night, and an application thereof. The method comprises the following steps: setting the initial parameters of a heterogeneous parallel network model, and inputting the images of a training set into the heterogeneous parallel network model for training to obtain a deep-learning-based infrared small-target detection model; inputting the infrared small-target image to be detected into the detection model, performing feature extraction on the image through a heterogeneous parallel backbone network, and fusing the feature maps through a pixel aggregation network to obtain multiple layers of feature maps containing target information; then processing the feature maps containing target information through a prediction structure to obtain the category and position information of each target and to generate a corresponding target box, so that the predicted category and position of each target are displayed directly in the image. The invention is easy to deploy on hardware equipment; the model is simple to operate and easy to train.

Description

Method and system for detecting infrared small target in night airport clear airspace and application
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a method and a system for detecting small infrared targets in the clear airspace of an airport at night, and to an application thereof.
Background
Small-target detection in infrared images plays an important role in target early warning, ground monitoring and flight guidance. For visible-light target detection, existing methods achieve good detection performance. However, infrared images captured at night differ greatly from visible-light images: the long imaging distance, low image contrast, weak texture features and small proportion of the target in the image all increase the difficulty of detecting small infrared targets. In order to provide more comprehensive protection and realize 24-hour supervision of important places, research on small-target detection in night-time infrared images is particularly important.
Infrared small-target detection can generally be divided into continuous-frame video target detection and single-frame image target detection. Continuous-frame video detection mainly exploits the correspondence of a target between consecutive frames of a video, whereas single-frame detection recognizes the target directly from the information contained in a single image. Video-based detection realizes target detection using prior information such as target shape and trajectory continuity together with spatio-temporal information; however, a fast-moving unmanned aerial vehicle changes rapidly relative to the background in an infrared image, so trajectory continuity is difficult to guarantee and continuous-frame methods are hard to apply. In contrast, single-frame target detection only needs to compute target information within a single image, has a computational complexity markedly lower than that of video target detection, is easy to implement in hardware and is widely used in infrared target detection; it can be roughly divided into model-driven and deep-learning-based single-frame infrared target detection methods.
Model-driven methods for single-frame infrared target detection generally model the target point: a small target in an infrared image is regarded as a point that deviates from the highly correlated background pixels and is marked as a target. The common shortcomings of model-driven methods are that detection performance is limited when the background is a mixture of buildings, trees, vehicles and so on, and that the requirement of real-time detection is difficult to meet. With the development of computer vision, more and more infrared target detection methods based on deep learning are being used. Deep-learning-based infrared target detection meets the requirement of real-time detection, and its detection performance keeps improving as machine vision technology advances.
Through the above analysis, the problems and shortcomings of the prior art are as follows:
(1) In the prior art, the detection speed and accuracy for small infrared targets in the clear airspace of an airport at night are low.
(2) In the prior art, the practicability of application and deployment in various different environments is poor.
(3) In the prior art, the proportion of small-target information in the image cannot be effectively enhanced, and effective support cannot be provided for subsequent feature extraction and target detection.
(4) In the prior art, the feature maps obtained from three different channels cannot be effectively fused, the information content of the target is low, and effective support is not provided for subsequent target detection.
Disclosure of Invention
In order to overcome the problems in the related art, the disclosed embodiments of the invention provide a method and a system for detecting small infrared targets in the clear airspace of an airport at night, and an application thereof.
The technical scheme is as follows: the method for detecting small infrared targets in the clear airspace of an airport at night comprises the following steps:
S1, setting the initial parameters of a heterogeneous parallel network model, and inputting the images of the training set of an infrared small-target database into the parameterized heterogeneous parallel network model for training to obtain a deep-learning infrared small-target detection model;
S2, inputting the infrared small-target image to be detected into the infrared small-target detection model, performing feature extraction on the image through the heterogeneous parallel backbone network, and splicing the feature maps of three different channels, the three different channels comprising: a Similarity Object Enhancement (SOE) module channel, a general feature extraction structure channel, and a backbone network structure channel based on a parameter-free Simple Attention Module (SimAM);
S3, performing feature splicing on part of the obtained multi-layer feature maps through a pixel aggregation network to obtain multiple layers of feature maps from which targets of different sizes can be predicted;
S4, judging the category and position of each target through the prediction structure, obtaining the confidence of each small infrared target, generating a target prediction box for each target, and predicting the category and position information of the target;
S5, deleting the low-scoring boxes, as scored by the IoU (Intersection over Union) formula, from the obtained target prediction boxes with the non-maximum suppression (NMS) method to obtain the final target boxes, and storing and displaying the category and position information of the targets.
In one embodiment, in step S1, the heterogeneous parallel network model builds a heterogeneous parallel backbone network in VGG form from the basic Focus, C3 and SPP modules, generates multi-layer fused feature maps in combination with a pixel aggregation network, and predicts the category and position of each target with a prediction network; the position information mapped back to the original image is corrected with an L1 loss function to obtain the final classification and accurate position information of the target. The target prediction box loss function is calculated according to equation (1), in which p denotes the probability of returning to the correct position after a target position is mapped back to the original image and α is the influence-range factor. Because the loss function is calculated with the L1 loss, the loss value is insensitive to outliers and abnormal values, the gradient changes relatively little, the model does not easily deviate from the optimum during training, and several target bounding boxes with small deviation are obtained.
The initial parameters of the heterogeneous parallel network model comprise the number of network layers and the weights and bias values of the neurons in each layer.
In one embodiment, in step S2, the feature map obtained through the similarity target enhancement module channel, the feature map from the general feature extraction structure, and the feature map obtained through the SimAM-based backbone network structure of the heterogeneous parallel network model are spliced, so as to enhance the proportion and importance of the target information in the feature maps during the convolution operations.
In one embodiment, in step S5, the score of a target box is calculated with the IoU formula, defined as the ratio between the intersection and the union of two boxes; the IoU is calculated as shown in equation (2):

IoU(A, B) = |A ∩ B| / |A ∪ B|     (2)

where A and B denote the two boxes, A ∪ B denotes their union region, and A ∩ B denotes their intersection region.
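For reference, equation (2) can be computed as in the following sketch for axis-aligned boxes in (x1, y1, x2, y2) format; the box format is an assumption, since the text does not specify one.

import torch

def iou(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) tensors."""
    x1 = torch.max(box_a[0], box_b[0])
    y1 = torch.max(box_a[1], box_b[1])
    x2 = torch.min(box_a[2], box_b[2])
    y2 = torch.min(box_a[3], box_b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)      # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                              # union area
    return inter / union.clamp(min=1e-9)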
In one embodiment, the similarity target enhancement module first partitions an H × W feature map into (H × W)/(2 × 2) small 2 × 2 blocks B_s and, with each small block as its centre, extends it to a 4 × 4 large block, giving (H × W)/(2 × 2) large blocks B_l; the Wasserstein distance between each small block and the corresponding large block is then calculated. The Wasserstein distance is defined as in equation (3):

W_2(P, Q) = ( inf_{γ ∈ Γ(P, Q)} E_{(x, y) ~ γ} [ ||x - y||^2 ] )^{1/2}     (3)

When the two blocks are modelled as Gaussian distributions, the distance between two rectangular blocks is approximately defined as in equation (4):

W_2^2( N(μ_s, Σ_s), N(μ_l, Σ_l) ) = ||μ_s - μ_l||^2 + ||Σ_s^{1/2} - Σ_l^{1/2}||_F^2     (4)

and is simplified as equation (5):

W_a(B_s, B_l) = ( (μ_s - μ_l)^2 + (σ_s - σ_l)^2 )^{1/2}     (5)

where μ_s, σ_s and μ_l, σ_l are respectively the mean and the variance of the feature points in the regions B_s and B_l. The Wasserstein distance measures the distance between two distributions and is used here as their similarity: the distance between the central block and its surrounding region is taken as the similarity between the central block and the surrounding region. The larger the W_a value, the higher the similarity between the small block and the large block and the greater the probability that the small block is background; the smaller the W_a value, the lower the similarity between the small block and the large block and the greater the likelihood that the small block is a target. Computing block by block finally yields an (H × W)/(2 × 2) similarity matrix W_soe (Wasserstein-SOE).

The feature map input to the similarity target enhancement module is of size H × W. It is divided into 2 × 2 blocks, the corresponding 4 × 4 block radiating outwards from the centre of each 2 × 2 block is obtained, and, sliding in units of one 2 × 2 block with a step length of 1, equation (5) is evaluated to obtain the (H × W)/(2 × 2) matrix W_soe.
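A minimal PyTorch sketch of this block-wise computation is given below; it assumes that the 2 × 2 small blocks are non-overlapping (giving the (H × W)/(2 × 2) output) and approximates the 4 × 4 surrounding block with a padded stride-2 pooling window, which is not necessarily how the patent implements the sliding.

import torch
import torch.nn.functional as F

def wasserstein_similarity_matrix(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """feat: (B, C, H, W) with even H, W. Returns W_soe of shape (B, C, H/2, W/2)."""
    # Mean and variance of each 2 x 2 small block B_s.
    mu_s = F.avg_pool2d(feat, kernel_size=2, stride=2)
    var_s = (F.avg_pool2d(feat ** 2, kernel_size=2, stride=2) - mu_s ** 2).clamp(min=0)
    # Mean and variance of the 4 x 4 large block B_l centred on each small block.
    mu_l = F.avg_pool2d(feat, kernel_size=4, stride=2, padding=1)
    var_l = (F.avg_pool2d(feat ** 2, kernel_size=4, stride=2, padding=1) - mu_l ** 2).clamp(min=0)
    # Simplified distance of equation (5): sqrt((mu_s - mu_l)^2 + (sigma_s - sigma_l)^2).
    return torch.sqrt((mu_s - mu_l) ** 2 + (var_s.sqrt() - var_l.sqrt()) ** 2 + eps)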
In one embodiment, enhancing the proportion and importance of the target information in the feature map during the convolution operations further comprises: for an input feature map F of size H × W × C, taking the set small blocks and forming the similarity matrix W_soe as described above; then taking W_soe' = 1/W_soe, called the Wasserstein similarity, which yields a Wasserstein similarity matrix W_soe' positively correlated with the target; normalizing and activating this similarity matrix with a Sigmoid function to obtain the matrix W; and combining W with the original input feature map, so that the feature values of targets remain essentially unchanged while a new feature map F' carrying the target-information weights is obtained, thereby enhancing the target information in the feature map.
In one embodiment, in the feature fusion stage of the heterogeneous parallel network model, a pixel aggregation network structure performs top-down and bottom-up feature fusion, fusing multi-layer feature maps from different depths to obtain multi-layer feature maps of different sizes; feature maps of different depths correspond to targets of different sizes, each feature map responding to targets of a particular size.
In one embodiment, the heterogeneous parallel network model performs pixel-by-pixel prediction on the feature maps in the prediction structure to obtain potential targets and classifies them to obtain the category information of each target; the corresponding target position information on the feature map is propagated back to the original image through a regression strategy to obtain the position of the target in the original image, giving the category and approximate position of the target; finally, the non-optimal target boxes are deleted by the non-maximum suppression method of step S5 to obtain the final accurate target box positions, and the position and classification information is stored and displayed.
Another object of the present invention is to provide a system implementing the method for detecting small infrared targets in the clear airspace of an airport at night, the system comprising:
a deep-learning infrared small-target detection model acquisition module, for setting the initial parameters of the heterogeneous parallel network model and inputting the training-set images of the infrared small-target database into the parameterized model for training to obtain a deep-learning infrared small-target detection model;
a different-channel feature map splicing module, for inputting the infrared small-target image to be detected into the infrared small-target detection model, performing feature extraction on the image through the heterogeneous parallel backbone network, and splicing the feature maps from three different channels;
a predictable-target feature map obtaining module, for performing feature splicing on part of the obtained multi-layer feature maps through the pixel aggregation network to obtain multi-layer feature maps containing target information of different sizes;
a target category and position information acquisition module, for judging the category and position of each target through the prediction structure, obtaining the confidence of each small infrared target, generating a target box for each target, and predicting the category and position information of the target; and
a target box acquisition module, for deleting the non-optimal target boxes from the obtained target prediction boxes with the non-maximum suppression method to obtain the final target boxes, and storing and displaying the category and position information of the targets.
Another object of the present invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the method for detecting small infrared targets in the clear airspace of an airport at night.
Another object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the method for detecting small infrared targets in the clear airspace of an airport at night.
By combining all the technical schemes, the invention has the following advantages and positive effects:
First, regarding the technical problems in the prior art and the difficulty of solving them, the technical problems addressed by the technical scheme of the present invention are closely combined with the results and data obtained during research and development, and solving them brings creative technical effects. The specific description is as follows:
The method first sets the initial parameters of the heterogeneous parallel network model of the detection method for small infrared targets in the clear airspace of an airport at night, and inputs the training-set images of the infrared small-target database into the parameterized model for training to obtain a deep-learning-based infrared small-target detection model. The infrared small-target image to be detected is input into the trained detection model, feature extraction is performed on the image through the heterogeneous parallel backbone network, and the feature maps are fused through the pixel aggregation network to obtain multiple layers of feature maps containing target information; the feature maps containing target information are then processed through the prediction structure to obtain the category and position information of each target and to generate a corresponding target box, so that the predicted category and position of each target are displayed directly in the image. The invention has good adaptability and detection performance for small infrared targets, with a detection rate as high as 80.0%; the detection speed is high, reaching 31.2 frames per second; and the model is small, only 30.5M, and easy to deploy on hardware equipment. The model is simple to operate and easy to train.
Secondly, regarding the technical solution as a whole or from the perspective of products, the technical effects and advantages of the technical solution to be protected by the present invention are specifically described as follows:
the method is in a form of combining a model-driven-based method and a deep learning-based method, firstly, an SOE model is provided based on a model-driven principle to increase the difference between a target and a background, so that more potential target information can be purposefully acquired in the convolution operation process, and the model is modularized and can be flexibly used in a neural network. In addition, based on the convolutional neural network principle, a heterogeneous parallel backbone network structure is constructed, the feature graph obtained by the similarity target enhancement module, a general feature extraction structure and the feature graph obtained by the backbone network structure based on the SimAM are spliced, the occupation ratio of feature graph target information in the convolutional operation process is increased, and the effectiveness of the method is verified in a large number of experiments of an infrared small target data set.
The advantages of the present invention over the prior art further include: the integral model of the invention has good performance for infrared target detection. The invention provides a method for detecting a small infrared target in a clear airspace of an airport at night, which has the advantages of high detection speed and high accuracy, can realize the detection speed of 31.2 frames per second for input images of 256 pixels multiplied by 256 pixels, can reach the accuracy of 80.0 percent, can completely realize the real-time detection of important places, monitors the dynamics of the places in real time and ensures the safety of the places.
The integral model of the invention is suitable for application and deployment in various environments. The invention provides a method for detecting infrared small targets in a clear airspace of an airport at night, and a data set used by the method comprises the infrared small targets from different visual angles, different scenes and different distances, so that the method has better adaptability to the infrared targets in different environments, and the application range of the method is better expanded.
The SOE module provided by the invention effectively enhances the proportion of small target information in the image and provides effective support for subsequent feature extraction and target detection. The SOE module effectively highlights the information of the small targets in the feature map in each channel through the relation of the mean value and the variance of the local feature blocks in the feature map, so that the information of multi-learning small targets with purposiveness and tendency is provided in the subsequent feature extraction and feature fusion processes, and effective technical support is provided for the subsequent detection of the infrared small targets. The module has strong flexibility and can be applied to any stage of the network according to the requirements of researchers.
The heterogeneous parallel trunk network structure effectively fuses feature maps obtained by three different channels, and improves the information content of a target. The heterogeneous parallel backbone network structure comprises three different convolution channels, namely a common convolution network channel, a convolution network channel based on SimAM and a convolution network channel based on an SOE module, and splicing the feature maps obtained by the three channels to realize fusion of the feature maps from the three different channels, effectively enhance the proportion of target information in the feature maps and provide effective support for detection of subsequent targets.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of a method for detecting infrared small targets in a clear airspace of an airport at night according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a method for detecting infrared small targets in a clear airspace of an airport at night according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the detection system for small infrared targets in the clear airspace of an airport at night provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of a schematic frame structure of a similarity target enhancement module provided in an embodiment of the present invention;
FIG. 5 (a) is a comparison of the original images before and after processing by the SOE module provided in the embodiment of the present invention;
FIG. 5 (b) is an input feature diagram of a comparison before and after processing by an SOE module according to an embodiment of the present invention;
FIG. 5 (c) is a feature diagram of the processed similarity enhancement module before and after the SOE module processing according to the embodiment of the present invention;
FIG. 6 is a diagram of the heterogeneous parallel backbone structure provided by an embodiment of the present invention, in which the general convolution channel in the upper layer performs conventional convolution operations on the input feature map, the SimAM channel in the middle layer is a parameter-free attention convolution channel, and the lower layer is a convolution channel containing the SOE module of the present invention;
fig. 7 is a PAN feature fusion structure diagram provided in an embodiment of the present invention;
FIG. 8 is a diagram of an example of a data set provided by an embodiment of the present invention;
fig. 9 is a comparison diagram of Receiver Operating Characteristic (ROC) curves provided in the embodiment of the present invention, where A, B, C and D represent different backbone network structures and E is the backbone network structure of the present invention, the other parameters being the same;
fig. 10 (a) is 8 original images of small infrared targets to be detected in the improved front and rear small infrared target detection results provided by the embodiment of the present invention;
fig. 10 (b) is a diagram of the results of the method for detecting small infrared targets in the clear airspace of an airport at night, among the detection results of small infrared targets before and after the improvement provided by the embodiment of the present invention;
in the figure: 1. an infrared small target detection model acquisition module for deep learning; 2. different channel characteristic diagram splicing modules; 3. a characteristic diagram obtaining module capable of predicting targets; 4. a target type and position information acquisition module; 5. and a target frame acquisition module.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as broadly as the present invention is capable of modification in various respects, all without departing from the spirit and scope of the present invention.
The invention relates to the technical field of detection of small infrared targets in video and image processing for important places, and in particular to a method, a system and an application for detecting small infrared targets in the clear airspace of an airport at night, which are used for detecting and locating small targets in infrared images.
1. Illustrative embodiments:
example 1
As shown in fig. 1, the method for detecting infrared small targets in a clear airspace of an airport at night according to the embodiment of the present invention includes the following steps:
s101: setting initial parameters of a heterogeneous parallel network model of the detection method of the small infrared targets in the clean airspace of the airport at night, inputting training set images in an infrared small target database into the heterogeneous parallel network model of the detection method of the small infrared targets in the clean airspace of the airport at night after the set parameters are input for training, and obtaining a deeply-learned infrared small target detection model;
s102: inputting an infrared small target image to be detected into the infrared small target detection model trained in the step S101, performing feature extraction on the heterogeneous parallel backbone network infrared small target image, and splicing feature maps of three different channels, namely a Similarity Object Enhancement (SOE) Module channel and a general feature extraction structure channel, a backbone network structure channel based on a Simple Attention mechanism without parameters (SimAM), and the like;
s103: performing feature splicing on part of the multilayer feature map obtained in the step S102 through a Pixel Aggregation Network (PAN), and then obtaining a plurality of layers of feature maps containing objects of different sizes and predictable targets;
s104: secondly, judging the type and the position of the target through a prediction structure so as to obtain the confidence coefficient of each infrared small target, and generating a target prediction frame corresponding to each target so as to directly predict and obtain the type and the position information of the target;
s105: deleting the Non-optimal target frame from the plurality of target prediction frames obtained in step S104 by using a Non-Maximum Suppression (NMS) method to obtain an optimal target frame, and storing and displaying the category and position information of the target.
Deleting the non-optimal target frame from the plurality of target prediction frames obtained in the step S104, and obtaining the optimal target frame is: and deleting the target frames with low scores in the obtained multiple target prediction frames by IOU (interaction-over-unity) formula calculation to obtain the target frames.
Example 2
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 1, further, in step S101 the heterogeneous parallel network model builds a heterogeneous parallel backbone network in VGG form from basic structures such as Focus, C3 and SPP, generates multi-layer fused feature maps in combination with a pixel aggregation network, predicts the category and position of each target with a prediction network, and corrects the position information mapped back to the original image with an L1 loss function to obtain the final classification and accurate position information of the target. The initial parameters of the heterogeneous parallel network model include the number of network layers and the weights and bias values of the neurons in each layer. The initial learning rate of the model is 0.001, and a warmup strategy is adopted during the initial iterations to keep the model from oscillating, speed up convergence and improve the final result; the learning rate is decayed by a factor of 0.1 between the 800th and 1000th epoch, and the maximum training epoch is 1200.
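A minimal sketch of this training schedule is shown below; the optimizer type and the warmup length are not stated in the text and are assumptions.

import torch

def build_optimizer_and_scheduler(model: torch.nn.Module, warmup_epochs: int = 3):
    # Initial learning rate 0.001; SGD with momentum is assumed, not stated in the text.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

    def lr_lambda(epoch: int) -> float:
        if epoch < warmup_epochs:            # warmup during the initial iterations
            return (epoch + 1) / warmup_epochs
        if epoch < 800:
            return 1.0
        if epoch < 1000:                     # decay by 0.1x in the 800th to 1000th epoch
            return 0.1
        return 0.01                          # a second 0.1x step at epoch 1000 is one interpretation (assumption)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler              # train for at most 1200 epochs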
In the embodiment of the invention, the obtained target prediction boxes correct the position information mapped back to the original image through the L1 loss function; the target prediction box loss function is calculated according to equation (1), in which p denotes the probability of returning to the correct position after a target position is mapped to the original image and α is the influence-range factor. Because the loss function is calculated with the L1 loss, the loss value is insensitive to outliers and abnormal values, the gradient changes relatively little, the model does not easily deviate from the optimum during training, and several target bounding boxes with small deviation are obtained.
Example 3
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 1, further, in step S102 the heterogeneous parallel backbone network of the heterogeneous parallel network model splices the feature map obtained through the Similarity Object Enhancement (SOE) module channel provided by the present invention, the feature map from the general feature extraction structure, and the feature map obtained through the backbone network structure based on the parameter-free Simple Attention Module (SimAM), so as to increase the proportion and importance of the target information in the feature maps during the convolution operations.
Example 4
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 1, further, in step S102 the similarity target enhancement module channel of the heterogeneous parallel backbone network is constructed by combining the model-driven similarity target enhancement module provided by the present invention with operations such as convolution and pooling, so that the network tends to obtain more target information during feature extraction and provides feature maps containing more target information for subsequent target prediction.
Example 5
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 4, further, the similarity target enhancement module enhances the contrast between target and background through similarity comparison and is inserted into the network as a plug-in module. According to the invention, the model-driven SOE module is constructed from the relation between the mean and variance of the local feature blocks in the feature map, combined with the Wasserstein distance principle. A Wasserstein value is obtained from the mean and variance of each local feature block; the whole feature map is divided into several feature blocks to obtain a one-to-one Wasserstein distance matrix, which is converted into a Wasserstein similarity matrix positively correlated with the infrared target; the matrix is activated through a Sigmoid function and combined with the input feature map to obtain a new feature map, and the target information in the feature map is highlighted by the similarity target enhancement module. The module can be added anywhere in the convolutional neural network to enhance the target feature information.
Example 6
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 1, further, in step S103, in the feature fusion stage of the heterogeneous parallel network model, a PAN structure is adopted to perform top-down and bottom-up feature fusion, fusing multi-layer feature maps from different depths and obtaining multi-layer feature maps of different sizes, as sketched below. Feature maps of different depths correspond to targets of different sizes, each feature map responding to targets of a particular size; in addition, one target may be detected simultaneously by several feature maps.
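A condensed sketch of such a PAN-style fusion is given below; the channel counts and the use of a single convolution per merge are illustrative assumptions, not the patent's exact configuration.

import torch
import torch.nn as nn

class PANFusion(nn.Module):
    """Inputs: C3 (stride 8), C4 (stride 16), C5 (stride 32), all with c channels."""
    def __init__(self, c: int = 128):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.top_down4 = nn.Conv2d(2 * c, c, 3, padding=1)
        self.top_down3 = nn.Conv2d(2 * c, c, 3, padding=1)
        self.down3 = nn.Conv2d(c, c, 3, stride=2, padding=1)
        self.bottom_up4 = nn.Conv2d(2 * c, c, 3, padding=1)
        self.down4 = nn.Conv2d(c, c, 3, stride=2, padding=1)
        self.bottom_up5 = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, c3, c4, c5):
        # Top-down path: upsample the deeper map and splice it with the shallower one.
        t4 = self.top_down4(torch.cat([self.up(c5), c4], dim=1))
        p3 = self.top_down3(torch.cat([self.up(t4), c3], dim=1))
        # Bottom-up path: downsample and splice again to re-integrate detail.
        p4 = self.bottom_up4(torch.cat([self.down3(p3), t4], dim=1))
        p5 = self.bottom_up5(torch.cat([self.down4(p4), c5], dim=1))
        return p3, p4, p5          # three maps, each responding to targets of a particular size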
Example 7
Based on the method for detecting small infrared targets in the clear airspace of an airport at night provided in embodiment 1, further, in step S104, in the prediction stage of the heterogeneous parallel network model, the prediction structure first performs pixel-by-pixel prediction on the feature maps to obtain potential targets and classifies them to obtain the category information of each target. In addition, the corresponding target position information on the feature map is propagated back to the original image through a regression strategy, i.e. the position of the target in the original image is obtained, giving the category and approximate position of the target; finally, the non-optimal target boxes are deleted with the NMS algorithm in step S105 to obtain the final accurate target box positions, and the position and classification information is stored and displayed.
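As an illustration of such a per-pixel prediction structure, the following sketch outputs, for every feature-map point and anchor, box parameters, a confidence score and class scores; the exact head layout of the patent is not specified, so this arrangement is an assumption.

import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        # Per anchor: 4 box parameters + 1 confidence + num_classes class scores.
        self.pred = nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)
        self.num_anchors = num_anchors
        self.num_classes = num_classes

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feat.shape
        out = self.pred(feat)
        # (B, A, 5 + num_classes, H, W): one prediction per anchor at every feature-map pixel.
        return out.view(b, self.num_anchors, 5 + self.num_classes, h, w)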
In the embodiment of the present invention, the NMS compares the scores of the multiple boxes and deletes the boxes with low scores; the score of a target box is calculated with the IoU formula, as shown in equation (2):

IoU(A, B) = |A ∩ B| / |A ∪ B|     (2)

where A and B denote the two boxes, A ∪ B denotes their union region and A ∩ B denotes their intersection region.
The infrared small-target database is derived from video sequences of flying unmanned aerial vehicles in several different scenes and is made, through manual annotation, into a data set usable by the airport small-target recognition model.
Example 8
As shown in fig. 2, the method for detecting small infrared targets in the clear airspace of an airport at night according to the embodiment of the present invention inputs the infrared small-target image into a trained similarity-target-enhanced heterogeneous parallel network architecture, i.e. the improved similarity-target-enhanced heterogeneous parallel infrared small-target detection neural network model. In this model, the heterogeneous parallel network structure performs multi-stage feature extraction directly on the input image and splices the fused features, and the PAN structure fuses the information of several feature-map layers to obtain multiple layers of fused feature maps; the prediction structure classifies targets directly on the multi-layer feature maps and simultaneously regresses the target boxes to generate several predicted boxes that may be targets, after which the NMS method produces the final detection result, and the target category, the corresponding target probability and the target boxes are displayed. Specifically, the steps are as follows:
the method comprises the following steps: setting initial parameters of a heterogeneous parallel network model of the detection method of the night airport clear airspace infrared small target, inputting an infrared small target image in an infrared small target database into the set heterogeneous parallel network model of the detection method of the night airport clear airspace infrared small target for training to obtain a detection model of the detection method of the night airport clear airspace infrared small target;
the heterogeneous parallel network model of the detection method for the infrared small target in the clean airspace of the airport at night is used for constructing a heterogeneous parallel backbone network in a VGG (virtual ground gateway) form on the basis of basic structures such as Focus, C3 and SPP (spot-shaped Power map), generating a multi-layer fused feature map by combining a PAN (personal area network) structure, further generating a plurality of prediction target frames which are possibly targets according to the prediction network, and then generating a final detection result by adopting an NMS (network management system) method so as to obtain the final classification and accurate position information of the targets. Initial parameters of a heterogeneous parallel network model of the detection method of the infrared small target in the clean airspace of the airport at night comprise the number of network layers, the weight and the bias value of neurons in each layer; the initial learning rate of a heterogeneous parallel network model of the detection method of the infrared small target in the clean airspace of the airport at night is 0.001, and a warmup strategy is adopted in the initial iteration process, so that the model is prevented from oscillating, the convergence rate of the model is higher, and the model effect is better; decay at a rate of 0.1 times in 800 th to 1000 th Epoch, with a maximum training Epoch of 1200.
Step two: the infrared small-target image to be detected is input into the infrared small-target detection model trained in step one; the whole network consists of the heterogeneous parallel backbone network structure, the PAN feature fusion structure and the detection structure. The original image is input, the feature maps are extracted by the heterogeneous parallel network and fused into multi-layer feature maps by the PAN feature fusion structure, and the detection structure then classifies and regresses the targets in the feature maps to obtain the final result image.
The heterogeneous parallel backbone network of the model consists of three different channels: the feature map obtained through the SOE channel provided by the present invention, the feature map from the general feature extraction structure, and the feature map obtained through the SimAM-based backbone network channel are spliced, which increases the proportion and importance of the target information in the feature maps during the convolution operations.
The SOE channel is a hybrid structure built from model-driven and deep-learning components, combining the similarity target enhancement module SOE with a deep-learning strategy; the SOE module is modelled, in combination with the Wasserstein distance principle, on the relation between the mean and variance of the local feature blocks in the feature map. A Wasserstein value is obtained from the mean and variance of each local feature block; the whole feature map is divided into several feature blocks to obtain a Wasserstein distance matrix in one-to-one correspondence with the input feature map, which is converted into a similarity matrix positively correlated with the infrared target; the matrix is then activated through a Sigmoid function and combined with the input feature map to obtain a new feature map, and the target information in the feature map is highlighted by the similarity target enhancement module. The module can be added anywhere in the convolutional neural network to enhance the target feature information.
The similarity target enhancement module SOE first partitions the H × W feature map into (H × W)/(2 × 2) small 2 × 2 blocks B_s and, taking each small block as the centre, extends it to a 4 × 4 large block, giving (H × W)/(2 × 2) large blocks B_l, and then calculates the Wasserstein distance between each small block and the corresponding large block. The Wasserstein distance is defined as in equation (3):

W_2(P, Q) = ( inf_{γ ∈ Γ(P, Q)} E_{(x, y) ~ γ} [ ||x - y||^2 ] )^{1/2}     (3)

The distance between two rectangular blocks, modelled as Gaussian distributions, is approximately defined as in equation (4):

W_2^2( N(μ_s, Σ_s), N(μ_l, Σ_l) ) = ||μ_s - μ_l||^2 + ||Σ_s^{1/2} - Σ_l^{1/2}||_F^2     (4)

and is simplified as equation (5):

W_a(B_s, B_l) = ( (μ_s - μ_l)^2 + (σ_s - σ_l)^2 )^{1/2}     (5)

where μ_s, σ_s and μ_l, σ_l are respectively the mean and the variance of the feature points in B_s and B_l. The distance between the central block and its surrounding domain is taken as the similarity between them: the larger the W_a value, the higher the similarity between the small block and the large block and the higher the probability that the small block is background; the smaller the W_a value, the lower the similarity between the small block and the large block and the higher the probability that the small block is a target. On this theoretical basis, the computation is carried out block by block to finally obtain an (H × W)/(2 × 2) similarity matrix W_soe; the calculation process is shown in fig. 4.

In fig. 4, the feature map input to the similarity target enhancement module is of size H × W; it is divided into 2 × 2 small blocks, the one-to-one corresponding 4 × 4 large block radiating outwards from the centre of each 2 × 2 small block is obtained, and, sliding in units of one 2 × 2 small block with a step length of 1, equation (5) is evaluated to obtain the (H × W)/(2 × 2) matrix W_soe. The values of the similarity matrix indicate the similarity between the current point and its surrounding points: the smaller the value, the greater the probability that the region is a target.
In the embodiment of the present invention, the implementation flow of the similarity target enhancement module is as follows:
Input: a feature map F with dimensions H × W × C; parameter: I = (W × H)/(2 × 2).
Initialization: the similarity matrix W_soe.
1) For c = 1, 2, 3, ..., C:
2) divide the c-th channel feature map into 2 × 2 small blocks, obtaining I = (W × H)/(2 × 2) small blocks;
3) for i = 1, 2, 3, ..., I:
4) compute W_a for each region according to equation (5);
5) obtain a (W/2) × (H/2) matrix;
6) obtain the (W/2) × (H/2) × C matrix W_soe;
7) obtain W_soe' = 1/W_soe according to the correlation with the target;
8) obtain the similarity matrix W' through the activation function;
9) combine the input feature map with the similarity matrix W to obtain a new feature map F'.
Output: a new feature map F' of size H × W × C containing the enhanced target information.
Illustratively, the implementation of the similarity target enhancement module specifically includes the following steps: first, the set small blocks are taken from the input matrix F and the similarity matrix W_soe is obtained; then W_soe' = 1/W_soe, called the Wasserstein similarity, is taken, giving a Wasserstein similarity matrix W_soe' that is positively correlated with the target. Then, on the basis of the similarity matrix W_soe', a Sigmoid function is used for normalization and activation to obtain the matrix W; combining W with the original input feature map leaves the feature values of targets essentially unchanged while reducing the feature values of the background, enlarging the difference between background and target and producing a new feature map F' carrying the target-information weights, which realizes the enhancement of the target information in the feature map. In the embodiment of the invention, this process is modularized as the similarity target enhancement module SOE, which can be added at any step of the neural network as required; here a W_soe channel is constructed by adding the SOE module into the convolution process.
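A minimal plug-in sketch of this 9-step flow is given below; mapping the (W/2) × (H/2) weight matrix back onto the H × W feature map with nearest-neighbour interpolation, and the pooling-based block statistics, are assumptions rather than the patent's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SOE(nn.Module):
    """Similarity target enhancement: reweight a feature map by its block-wise Wasserstein similarity."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:             # x: (B, C, H, W)
        # Steps 1-6: per-channel 2x2 block statistics and the simplified distance of equation (5).
        mu_s = F.avg_pool2d(x, kernel_size=2, stride=2)
        sig_s = (F.avg_pool2d(x ** 2, kernel_size=2, stride=2) - mu_s ** 2).clamp(min=0).sqrt()
        mu_l = F.avg_pool2d(x, kernel_size=4, stride=2, padding=1)
        sig_l = (F.avg_pool2d(x ** 2, kernel_size=4, stride=2, padding=1) - mu_l ** 2).clamp(min=0).sqrt()
        w_soe = torch.sqrt((mu_s - mu_l) ** 2 + (sig_s - sig_l) ** 2 + self.eps)
        w = torch.sigmoid(1.0 / (w_soe + self.eps))                  # steps 7-8: W_soe' = 1/W_soe, then Sigmoid
        w = F.interpolate(w, size=x.shape[2:], mode="nearest")       # broadcast the weights back to H x W
        return x * w                                                 # step 9: F' = F reweighted by W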
To visualize the change produced by the similarity target enhancement module, the embodiment of the present invention provides heat maps corresponding to the feature maps: fig. 5 (a) is the original image, fig. 5 (b) the input feature map, and fig. 5 (c) the feature map processed by the similarity enhancement module.
Heterogeneous parallel network: a small target in an infrared image occupies a small proportion of the image and has an unclear outline because the shooting distance is long and the night-time image resolution is low. The invention therefore constructs a heterogeneous parallel backbone network and adds the similarity target enhancement module SOE into the backbone network for feature extraction; during auxiliary feature extraction, features are extracted in a set direction so that the feature map contains more potential target information, the weight of the target is increased, the importance of potential targets in the feature map is purposefully raised during convolution, and the target is highlighted. In addition, this channel is spliced with the feature maps obtained from a conventional convolution channel and from the SimAM-based channel, so that the feature map contains more target information while retaining the correlation between the original background and the target, further assisting the detection performance of the model. Finally, by fusing the feature maps of the three different channels, the proportion of potential targets is increased while the original basic information is retained, improving the effect of target detection. The structure is shown schematically in fig. 6.
As shown in fig. 6, the output feature map is extracted through three convolution channels, namely the general convolution channel in the upper layer, the SimAM channel in the middle layer and the convolution channel containing the SOE module in the lower layer; the outputs of the three channels are spliced to fuse the feature information from the different operations and passed on to the feature fusion structure. The general convolution channel in the upper layer performs conventional convolution operations on the input feature map and extracts features adaptively; the SimAM channel in the middle layer increases the attention paid to potential targets during convolution through a parameter-free attention model; and the lower convolution channel contains the similarity target enhancement module described above, which increases the weight of the target, purposefully raising the importance of potential targets in the feature map during convolution and highlighting the target. Finally, by fusing the feature maps of the three different channels, the proportion of potential targets is increased while the original basic information is retained, improving the effect of target detection.
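One stage of such a three-channel block could look like the sketch below; the SimAM channel uses the common parameter-free formulation of SimAM, and the single 3 × 3 convolution per channel and the 1 × 1 fusion convolution are assumptions, not the patent's exact layer configuration (the SOE module is the plug-in sketched above).

import torch
import torch.nn as nn

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM attention: reweight each activation by its estimated importance."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * torch.sigmoid(e_inv)

class HeteroParallelBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.plain_conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.simam_conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.soe = SOE()                                      # plug-in module from the earlier sketch
        self.soe_conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.fuse = nn.Conv2d(3 * out_channels, out_channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.plain_conv(x)                                # upper channel: conventional convolution
        b = simam(self.simam_conv(x))                         # middle channel: SimAM attention
        c = self.soe_conv(self.soe(x))                        # lower channel: SOE-enhanced convolution
        return self.fuse(torch.cat([a, b, c], dim=1))         # splice the three channels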
The PAN structure used in step three is shown in fig. 7: part of the multi-layer feature maps obtained in step two are spliced to obtain multi-layer feature maps, containing depth information, from which targets can be predicted. As shown in fig. 6, deep features contain more semantic information and shallow features contain more detail. In the invention, not only are feature maps of different depths within the PAN structure fused, but the shallow feature maps from the backbone network of step two are also fused and spliced; by fusing feature maps of different depths, the feature maps finally input to the detection structure therefore contain more feature information.
In fig. 7, the feature map output by the backbone network is first convolved and two upsampling operations enlarge its size, followed by downsampling to further integrate the feature information. In addition, after every upsampling or downsampling operation a feature splicing operation is performed with the shallow feature map of the same size, which compensates for the loss of target information as the number of convolution operations increases. Finally, the three feature layers P3, P4 and P5 on the right of fig. 7 (the feature layers of levels 3, 4 and 5 of the PAN) are input to the detection structure; the prediction box sizes assigned to each pixel of each feature layer are listed in Table 1.
Table 1 (reproduced as an image in the original) lists the prediction box sizes assigned to each pixel of the feature layers P3, P4 and P5.
Each pixel in the feature map reflects a target region in the original image, and the prediction box size refers to the sizes of the targets that can be predicted at each pixel; for example, [10, 13, 16, 30, 33, 23] means that targets of size 10 × 13, 16 × 30 and 33 × 23 can be predicted, and the size of the target box is automatically corrected within a certain range according to the confidence. Targets in the feature map are predicted according to the target weights obtained during training, and the position and category of each target are marked in the image to be detected by propagating the predictions back to that image.
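Purely as an illustration, a YOLO-style decode of the per-pixel predictions into image coordinates, using the anchor sizes quoted above for one feature layer, could look as follows; the patent does not give its decode equations, so this is an assumption (pred has the shape produced by the DetectionHead sketch earlier).

import torch

def decode_boxes(pred: torch.Tensor, stride: int, anchors=((10, 13), (16, 30), (33, 23))):
    """pred: (B, A, 5 + num_classes, H, W). Returns boxes (B, A, 4, H, W) and confidence (B, A, H, W)."""
    b, a, _, h, w = pred.shape
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    anchors_t = torch.tensor(anchors, dtype=pred.dtype).view(1, a, 2, 1, 1)
    cx = (pred[:, :, 0].sigmoid() + gx) * stride              # box centre x in the original image
    cy = (pred[:, :, 1].sigmoid() + gy) * stride              # box centre y in the original image
    wh = pred[:, :, 2:4].exp() * anchors_t                    # width/height corrected around the anchor size
    conf = pred[:, :, 4].sigmoid()                            # target confidence
    return torch.stack([cx, cy, wh[:, :, 0], wh[:, :, 1]], dim=2), conf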
Step four: the type and position of each target are then judged by the prediction structure, giving the confidence of each infrared small target and generating a target frame for each target, so that the category and position information of the target is predicted directly. The detection structure receives the feature maps of different depths obtained in step three, processes each feature layer separately, and predicts targets feature point by feature point; for each predicted target, the class probabilities are sorted from large to small and the class with the highest probability is selected as the final result. The feature points of each target are then mapped back to the feature map corresponding to the original image, yielding the position of the target in the original image. Because the same target may be predicted on different feature maps, several predicted target frames may be obtained for it.
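A minimal sketch of the per-feature-point class selection described in step four, assuming sigmoid-activated objectness and class logits (the activation choice and array layout are assumptions, not stated in the text above).

```python
import numpy as np

def pick_classes(obj_logits, cls_logits):
    """obj_logits: (H, W); cls_logits: (H, W, num_classes).
    Returns, per feature point, the best class index and its confidence,
    taking confidence = objectness * class probability."""
    obj = 1.0 / (1.0 + np.exp(-obj_logits))
    cls = 1.0 / (1.0 + np.exp(-cls_logits))
    conf = obj[..., None] * cls
    return conf.argmax(axis=-1), conf.max(axis=-1)
```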
Step five: the non-optimal target frames among the multiple target prediction frames obtained in step four are deleted with the non-maximum suppression (NMS) method, giving the optimal target frame, and the category and position information of the targets is stored and displayed. Since step four may produce several predicted frames for the same target, the invention applies the NMS method to select the frame that best matches the target. When the overlap of two target frames is high, the two prediction frames are regarded as the same target and the frame with the higher target probability is kept as the final target frame; when the overlap of two prediction frames is small, they are regarded as two different targets, both frames are drawn on the original image, and each is marked with its final target type and position.
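The IoU-based non-maximum suppression described in step five can be sketched in a few lines of plain Python; the overlap threshold of 0.5 is illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms(boxes, scores, iou_thr=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it strongly
    (treated as the same target), and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep
```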
The database used in the present invention comes from a public infrared dataset with a consistent picture size of 256 × 256 pixels. To help verify the effectiveness of the present invention, fig. 8 shows several representative sample images from this dataset, which facilitates understanding of the objects of the present invention.
Fig. 9 compares receiver operating characteristic (ROC) curves, in which A, B, C, D and E denote different backbone network structures: A is a backbone with a single general convolution channel; B is a backbone with a SimAM channel; C is a backbone with a W_soe channel; D is a backbone combining a general convolution channel with a SimAM channel; and E is the backbone combining a general convolution channel, a SimAM channel and a W_soe channel; all other parameters are identical. Fig. 10 (a) shows 8 original infrared images containing small targets to be detected, and fig. 10 (b) shows the corresponding detection results, before and after the improvement, of the method for detecting infrared small targets in the clear airspace of an airport at night provided by the embodiment of the present invention.
Example 9
As shown in fig. 3, the detection system for infrared small targets in clean airspace of airport at night provided by the embodiment of the present invention includes:
the deep learning infrared small target detection model acquisition module 1 is used for setting the initial parameters of the heterogeneous parallel network model of the method for detecting infrared small targets in the clear airspace of an airport at night, inputting the training set images from an infrared small target database into the parameterized model for training, and obtaining a deep-learning infrared small target detection model;
the different-channel feature map splicing module 2 is used for inputting the infrared small target image to be detected into the trained infrared small target detection model, performing feature extraction on the infrared small target image through the heterogeneous parallel backbone network, and splicing the feature maps from three different channels;
a predictable target feature map obtaining module 3, configured to perform feature splicing on the obtained partial multilayer feature maps through a Pixel Aggregation Network (PAN), so as to obtain multilayer predictable target feature maps containing target information of different sizes;
the target type and position information acquisition module 4 is used for judging the type and position of the target through the prediction structure so as to obtain the confidence coefficient of each infrared small target and generate a target frame corresponding to each target, so that the target type and position information is directly obtained through prediction;
and the target frame acquiring module 5 is configured to delete a Non-optimal target frame from the obtained multiple target prediction frames by using a Non-Maximum Suppression (NMS) method, obtain an optimal target frame, and store and display the category and location information of the target.
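The five modules above could be composed at inference time roughly as follows; the class, attribute and method names are purely illustrative and not taken from the patent.

```python
class NightAirportIRSmallTargetSystem:
    """Hypothetical wiring of modules 1-5: training is handled by module 1,
    detection runs modules 2-5 in sequence."""
    def __init__(self, trainer, backbone, pan, head, nms_fn):
        self.trainer = trainer      # module 1: builds the trained detection model
        self.backbone = backbone    # module 2: three-channel feature extraction
        self.pan = pan              # module 3: pixel aggregation network fusion
        self.head = head            # module 4: category / position prediction
        self.nms_fn = nms_fn        # module 5: non-maximum suppression

    def detect(self, image):
        feats = self.backbone(image)              # spliced multi-channel features
        p3, p4, p5 = self.pan(*feats)             # multi-depth predictable maps
        boxes, scores, classes = self.head(p3, p4, p5)
        keep = self.nms_fn(boxes, scores)
        return [(boxes[i], scores[i], classes[i]) for i in keep]
```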
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
For the information interaction, execution process and other contents between the above-mentioned devices/units, because the embodiments of the method of the present invention are based on the same concept, the specific functions and technical effects thereof can be referred to the method embodiments specifically, and are not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
2. The application example is as follows:
application example
An embodiment of the present invention provides a computer device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
Embodiments of the present invention further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the above method embodiments may be implemented.
The embodiment of the present invention further provides an information data processing terminal, where the information data processing terminal is configured to provide a user input interface to implement the steps in the above method embodiments when implemented on an electronic device, and the information data processing terminal is not limited to a mobile phone, a computer, or a switch.
The embodiment of the present invention further provides a server, where the server is configured to provide a user input interface to implement the steps in the above method embodiments when implemented on an electronic device.
Embodiments of the present invention provide a computer program product, which, when running on an electronic device, enables the electronic device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium and used for instructing related hardware to implement the steps of the method embodiments of the present invention. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, or a magnetic or optical disk.
3. Evidence of the relevant effects of the examples:
Experiments demonstrate the effectiveness of the method. The method for detecting infrared small targets in the clear airspace of an airport at night first sets the initial parameters of the heterogeneous parallel network model and trains the model on the training set images from the infrared small target database, yielding a deep-learning infrared small target detection model. The infrared small target image to be detected is then input into the trained model: features are extracted by the heterogeneous parallel backbone network and fused by the pixel aggregation network, producing multiple layers of feature maps containing target information. These feature maps are processed by the prediction structure to obtain the category and position information of each target, and a target frame is generated for each target, so that the predicted category and position are displayed directly in the image. The invention shows good adaptability and detection performance for infrared small targets; the detection speed is high, the model is small and easy to deploy on hardware devices; and the model is simple to operate and easy to train. The comparison with other existing methods in table 2 shows that the invention achieves the best results in terms of model size, detection speed, missed detection rate, and precision rate.
Table 2: comparison of the proposed method with existing methods in terms of model size, detection speed, missed detection rate, and precision rate.
The above description covers only preferred embodiments of the present invention and does not limit its scope; any modification, equivalent replacement, or improvement made by those skilled in the art within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for detecting infrared small targets in a clean airspace of an airport at night is characterized by comprising the following steps:
s1, setting initial parameters of a heterogeneous parallel network model, inputting images of a training set in an infrared small target database into the heterogeneous parallel network model with the set parameters for training, and obtaining a deep-learning infrared small target detection model;
s2, inputting the infrared small target image to be detected into an infrared small target detection model, extracting the characteristics of the infrared small target image of the heterogeneous parallel backbone network, and splicing the characteristic diagrams of three different channels, wherein the three different channels comprise: a similarity target enhancement module channel, a general feature extraction structure channel and a backbone network structure channel based on a simple attention-free mechanism;
s3, performing feature splicing on the obtained partial multilayer feature maps through a pixel aggregation network, and then obtaining multilayer feature maps containing predictable targets with different sizes;
s4, judging the type and the position of the target through the prediction structure, obtaining the confidence coefficient of each infrared small target, generating a target prediction frame corresponding to each target, and predicting and obtaining the type and the position information of the target;
s5: and deleting the target frames with low scores in the plurality of obtained target prediction frames calculated by an IOU formula by using a non-maximum value inhibition method to obtain target frames, and storing and displaying the category and position information of the targets.
2. The method for detecting the infrared small target in the net airspace of the airport at night according to claim 1, wherein in step S1, the heterogeneous parallel network model builds a heterogeneous parallel backbone network in a VGG form based on a basic Focus module, a C3 module and an SPP module, generates a multi-layer fused feature map by combining a pixel aggregation network, and predicts the category and position of the target according to a prediction network; correcting the position information returned to the original image according to the L1 loss function to obtain the final classification and accurate position information of the target;
the target prediction box loss function is calculated as shown in formula (1), in which one quantity indicates the probability that the regression is correct after a certain target position is mapped to the original image and another quantity is an influence range factor;
the initial parameters of the heterogeneous parallel network model comprise: the number of network layers, the weight and the bias value of each layer of neuron.
3. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 2, wherein the heterogeneous parallel backbone network of the heterogeneous parallel network model, on the basis of the backbone network structure with the simple parameter-free attention mechanism, splices the feature map obtained by the similarity target enhancement module channel with the feature map obtained by the general feature extraction structure, thereby enhancing the proportion and importance of the feature map target information during the convolution operation.
4. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 1, wherein in step S5 the target frame score calculated by the IOU formula is defined as the ratio between the intersection and the union of two frames, the IOU being calculated as shown in formula (2):

IoU(A, B) = |A ∩ B| / |A ∪ B|      (2)

wherein A and B respectively represent the two frames, A ∪ B represents the union region of the two frames, and A ∩ B represents the intersection region of the two frames.
5. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 3, wherein the similarity target enhancement module first divides the H × W feature map into 2 × 2 small blocks p and, taking each small block as the center, extends it outward to a 4 × 4 large block B, and then calculates the Wasserstein distance between each small block and its corresponding large block; the Wasserstein distance is calculated as shown in formula (3):

W(P_p, P_B) = ( inf_{γ ∈ Π(P_p, P_B)} E_{(x, y) ~ γ} [ ‖x − y‖² ] )^{1/2}      (3)

the distance between two rectangular blocks is defined approximately as:

W²(p, B) ≈ ‖(μ_p, σ_p) − (μ_B, σ_B)‖²      (4)

and simplified as:

W(p, B) = ( (μ_p − μ_B)² + (σ_p − σ_B)² )^{1/2}      (5)

wherein μ_p, σ_p and μ_B, σ_B are respectively the mean and variance of the feature points in the regions of p and B; the Wasserstein distance, which measures the distance between two distributions, is taken as the similarity of the two distributions, so that the distance between the central block and its surrounding region is calculated as the similarity between the central block and the surrounding region; the larger this value, the higher the similarity between the small block and the large block and the greater the probability that the small block is background, while the smaller this value, the lower the similarity and the greater the probability that the small block is a target; calculating block by block yields the final similarity matrix W_soe;

the feature map input to the similarity target enhancement module is of size H × W; it is divided into 2 × 2 small blocks, the corresponding 4 × 4 large block radiating outward from the center of each 2 × 2 small block is obtained, and, sliding in units of 2 × 2 small blocks with a step length of 1, formula (5) is evaluated to obtain the similarity matrix W_soe.
6. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 3, wherein enhancing the proportion and importance of the feature map target information during the convolution operation comprises:

for an input feature map F of size H × W × C, forming the similarity matrix W_soe from the set small blocks, and then taking W_soe' = 1/W_soe, called the Wasserstein similarity, so as to obtain a similarity matrix W_soe' that is positively correlated with the target;

on the basis of the similarity matrix W_soe', a Sigmoid function is used for normalization and activation to obtain a matrix W; combining W with the original input feature map leaves the feature values of the target essentially unchanged and produces a new feature map F' containing the weight of the target information, so that the target information in the feature map is enhanced.
7. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 2, wherein, in the feature fusion stage of the heterogeneous parallel network model, a pixel aggregation network structure performs top-down and bottom-up feature fusion, fusing multiple layers of feature maps from different depths and producing multiple layers of feature maps of different sizes; the feature maps of different depths correspond respectively to targets of different sizes, each feature map being responsible for targets of a particular size.
8. The method for detecting infrared small targets in the clear airspace of an airport at night according to claim 7, wherein the heterogeneous parallel network model predicts the feature map pixel by pixel in the prediction structure to obtain potential targets and classifies them to obtain the category information of each target; the corresponding target position information on the feature map is mapped back to the original image through a regression strategy to obtain the position of the target in the original image together with its category and approximate position information; finally, the non-optimal target frames are deleted by the non-maximum suppression method of step S5 to obtain the final accurate target frame positions, and the position information and classification information are stored and displayed.
9. A system for implementing the method for detecting night airport net airspace infrared small targets according to any one of claims 1-8, wherein the system for detecting night airport net airspace infrared small targets comprises:
the deep learning infrared small target detection model acquisition module (1) is used for setting initial parameters of the heterogeneous parallel network model, inputting images of a training set in an infrared small target database into the heterogeneous parallel network model with the set parameters for training, and obtaining a deep learning infrared small target detection model;
the different-channel feature map splicing module (2) inputs the infrared small target image to be detected into the infrared small target detection model, performs feature extraction on the infrared small target image through the heterogeneous parallel backbone network, and splices the feature maps from three different channels;
a characteristic diagram obtaining module (3) for predicting targets, which carries out characteristic splicing on the obtained partial multilayer characteristic diagrams through a pixel aggregation network so as to obtain multilayer characteristic diagrams containing target information with different sizes and capable of predicting targets;
the target type and position information acquisition module (4) is used for judging the type and position of the target through the prediction structure, acquiring the confidence coefficient of each infrared small target, generating a target frame corresponding to each target and predicting and acquiring the type and position information of the target;
and the target frame acquisition module (5) deletes the non-optimal target frame in the obtained target prediction frames by using a non-maximum value inhibition method to obtain a target frame, and stores and displays the type and position information of the target.
10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the method for detecting infrared small targets in the clear airspace of an airport at night according to any one of claims 1-8.
CN202211359429.0A 2022-11-02 2022-11-02 Method and system for detecting infrared small target in night airport clear airspace and application Active CN115410012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211359429.0A CN115410012B (en) 2022-11-02 2022-11-02 Method and system for detecting infrared small target in night airport clear airspace and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211359429.0A CN115410012B (en) 2022-11-02 2022-11-02 Method and system for detecting infrared small target in night airport clear airspace and application

Publications (2)

Publication Number Publication Date
CN115410012A true CN115410012A (en) 2022-11-29
CN115410012B CN115410012B (en) 2023-02-28

Family

ID=84169334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211359429.0A Active CN115410012B (en) 2022-11-02 2022-11-02 Method and system for detecting infrared small target in night airport clear airspace and application

Country Status (1)

Country Link
CN (1) CN115410012B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818964A (en) * 2021-03-31 2021-05-18 中国民航大学 Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN114648714A (en) * 2022-01-25 2022-06-21 湖南中南智能装备有限公司 YOLO-based workshop normative behavior monitoring method
US20220207728A1 (en) * 2019-04-05 2022-06-30 Oxford University Innovation Limited Quality assessment in video endoscopy
CN114758288A (en) * 2022-03-15 2022-07-15 华北电力大学 Power distribution network engineering safety control detection method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220207728A1 (en) * 2019-04-05 2022-06-30 Oxford University Innovation Limited Quality assessment in video endoscopy
CN112818964A (en) * 2021-03-31 2021-05-18 中国民航大学 Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network
CN114648714A (en) * 2022-01-25 2022-06-21 湖南中南智能装备有限公司 YOLO-based workshop normative behavior monitoring method
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN114758288A (en) * 2022-03-15 2022-07-15 华北电力大学 Power distribution network engineering safety control detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINGYI QU ET AL.: ""Research on recognition algorithm of LSS based on video in airport clearance area"", 《2021 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA》 *
LIU SHANLIANG ET AL.: ""Airport small target detection method based on A-YOLOv5s"", 《Journal of Safety and Environment》 *
LIN YE: ""Research and application of cross-domain face synthesis based on generative adversarial networks"", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *

Also Published As

Publication number Publication date
CN115410012B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US20200241545A1 (en) Automatic braking of autonomous vehicles using machine learning based prediction of behavior of a traffic entity
US11373067B2 (en) Parametric top-view representation of scenes
JP2021516806A (en) Neural network for object detection and characterization
US20210326609A1 (en) Object classification using extra-regional context
JP2021515939A (en) Monocular depth estimation method and its devices, equipment and storage media
US20190301861A1 (en) Method and apparatus for binocular ranging
Abdi et al. Deep learning traffic sign detection, recognition and augmentation
CN112200129A (en) Three-dimensional target detection method and device based on deep learning and terminal equipment
CN114359851A (en) Unmanned target detection method, device, equipment and medium
Li et al. Implementation of deep-learning algorithm for obstacle detection and collision avoidance for robotic harvester
CN115223117B (en) Training and using method, device, medium and equipment of three-dimensional target detection model
CN117157678A (en) Method and system for graph-based panorama segmentation
Khalifa et al. A novel multi-view pedestrian detection database for collaborative intelligent transportation systems
CN114972758A (en) Instance segmentation method based on point cloud weak supervision
WO2022217434A1 (en) Cognitive network, method for training cognitive network, and object recognition method and apparatus
CN115410012B (en) Method and system for detecting infrared small target in night airport clear airspace and application
CN116844129A (en) Road side target detection method, system and device for multi-mode feature alignment fusion
Abu-Khadrah et al. Pervasive computing of adaptable recommendation system for head-up display in smart transportation
US20230252658A1 (en) Depth map completion in visual content using semantic and three-dimensional information
US20220237402A1 (en) Static occupancy tracking
Schennings Deep convolutional neural networks for real-time single frame monocular depth estimation
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN114972182A (en) Object detection method and device
Zeng High efficiency pedestrian crossing prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20221129

Assignee: TIANDY TECHNOLOGIES Co.,Ltd.

Assignor: CIVIL AVIATION University OF CHINA

Contract record no.: X2024980002702

Denomination of invention: A Detection Method, System, and Application of Infrared Small Targets in Night Airport Clearance Area

Granted publication date: 20230228

License type: Common License

Record date: 20240312