CN116468928B - Thermal infrared small target detection method based on visual perception correlator - Google Patents


Publication number
CN116468928B
CN116468928B CN202211702320.2A CN202211702320A CN116468928B CN 116468928 B CN116468928 B CN 116468928B CN 202211702320 A CN202211702320 A CN 202211702320A CN 116468928 B CN116468928 B CN 116468928B
Authority
CN
China
Prior art keywords
thermal infrared
visible light
convolution block
correlator
visual perception
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211702320.2A
Other languages
Chinese (zh)
Other versions
CN116468928A (en)
Inventor
徐小雨
詹伟达
于永吉
朱德鹏
韩登
李国宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Application filed by Changchun University of Science and Technology
Priority to CN202211702320.2A
Publication of CN116468928A
Application granted
Publication of CN116468928B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Photometry And Measurement Of Optical Pulse Characteristics (AREA)
  • Radiation Pyrometers (AREA)

Abstract

The invention belongs to the technical field of computer vision and in particular relates to a thermal infrared small target detection method based on a visual perception correlator. The method comprises the following steps. Step 1, constructing a thermal infrared small target detection network: the whole network comprises four parts, namely an input image, a region of interest, a visual perception correlator and an output. The input image comprises a thermal infrared image and a visible light image. The region of interest part consists of three parts, namely feature extraction, proposal generation and region of interest generation, and forms two branches, each with one input and one output. By designing the visual perception correlator, the invention lets the thermal infrared and visible light inputs complement each other, which alleviates the problem of insufficient features in thermal infrared small target detection. In addition, in small-sample detection tasks, the visual perception correlator improves thermal infrared small target detection performance by using the thermal infrared image features and the visible light image features in a crossed manner.

Description

Thermal infrared small target detection method based on visual perception correlator
Technical Field
The invention relates to the technical field of computer vision, in particular to a thermal infrared small target detection method based on a visual perception correlator.
Background
Thermal infrared small target detection is a technology that uses computers to detect weak, small targets in thermal infrared images. Compared with general target detection, thermal infrared small targets have the following characteristics. The thermal infrared image background contains a large amount of noise and clutter, and a target is easily submerged in the background, so both the contrast and the signal-to-noise ratio are low. Because of the long distance between camera and object, a thermal infrared target typically occupies only about one to ten pixels in the image. The shape and size of targets also vary across scenes and situations, depending on the type of target. Among existing thermal infrared small target detection methods, for example, a GAN generates a synthesized thermal infrared image from a visible light image to enhance the thermal infrared image, and a task-adjusted domain-adaptive network is introduced to regulate the thermal infrared image and the visible image at the detection head. Humans can associate useful information from other sensors to make more reliable decisions. However, there is currently no explicit way to store the available visible light image information, use it to supplement the target features of the thermal infrared image, and thereby associate the features of the two modalities.
To address the above problems, and with reference to conventional methods in the field of thermal infrared small target detection, a thermal infrared small target detection method based on a visual perception correlator is designed. The visual perception correlator extracts feature information from the input thermal infrared image, uses that information to complete the thermal infrared image features, and explicitly associates it with the features of the input visible light image.
Chinese patent application publication No. CN114882322A, entitled "Thermal infrared small target detection method based on a bidirectional attention aggregation mechanism", proposes a bidirectional attention aggregation mechanism that extracts shape information from two directions based on low-level features, guides the refinement of high-level features, and promotes reconstruction of the shape and edges of the target. However, that method cannot learn thermal infrared features through a correlator and associate them with the given thermal infrared small target features, so it cannot enhance the input thermal infrared small target image features, which results in lower detection accuracy.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a thermal infrared small target detection method based on a visual perception correlator, which solves the problem that existing detection methods have low detection accuracy for thermal infrared small targets.
(II) Technical solution
To achieve the above purpose, the invention adopts the following technical solution:
A thermal infrared small target detection method based on a visual perception correlator comprises the following steps:
Step 1, constructing a thermal infrared small target detection network: the whole network comprises four parts, namely an input image, a region of interest, a visual perception correlator and an output.
The input image comprises a thermal infrared image and a visible light image.
The region of interest part consists of three parts, namely feature extraction, proposal generation and region of interest generation, and forms two branches, each with one input and one output. The input of the first branch is the thermal infrared image, which passes sequentially through feature extraction and region of interest generation and, in parallel with feature extraction, passes sequentially through proposal generation and region of interest generation; the corresponding output is the thermal infrared image region of interest. The input of the second branch is the visible light image, which passes through the same sequence of feature extraction, proposal generation and region of interest generation; the corresponding output is the visible light image region of interest.
The visual perception correlator consists of a thermal infrared visual perception correlator and a visible light visual perception correlator, which form two branches, each with one input and one output. The two inputs are the thermal infrared image region of interest and the visible light image region of interest obtained by the region of interest part. In the first branch, the thermal infrared image region of interest is input into the thermal infrared visual perception correlator, which also receives the visible light image region of interest; the output of the first branch is the output of the thermal infrared visual perception correlator. In the second branch, the visible light image region of interest is input into the visible light visual perception correlator; the output of the second branch is the output of the visible light visual perception correlator.
The output consists of a detection head with one input and three outputs. The input is obtained by adding the outputs of the visible light visual perception correlator and the thermal infrared visual perception correlator; passing the sum through the detection head yields a classification output, a regression output and an object output. The detection head consists of five convolution blocks.
Step 2, establishing a thermal infrared small target detection data set: K visible light small target images and the K corresponding thermal infrared small target images of different types of targets are acquired with a visible light camera and a thermal infrared camera, and the targets in each thermal infrared image and each visible light image are labeled. M thermal infrared images, the corresponding visible light images and the labels of each image form a training sample set R; the remaining K-M thermal infrared images, visible light images and labels form a test sample set E, where K ≥ 1000, H ≥ 256 and M ≥ 4.
Step 3, training the thermal infrared small target detection network: the data set prepared in step 2 is input into the network model constructed in step 1 for training.
Step 4, selecting and minimizing a loss function: the loss function between the network output and the label is minimized. When the number of training iterations reaches a set threshold or the value of the loss function falls within a set range, the model parameters are considered pre-trained and are saved. At the same time, an optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system.
Step 5, fine-tuning the model: the model is trained and fine-tuned on a public thermal infrared small target detection data set to obtain stable, usable model parameters, so that the model finally achieves a better fusion effect.
Step 6, saving the optimal model: the finally determined model parameters are frozen. When thermal infrared small target detection is needed, the image to be detected is input directly into the network to obtain the final thermal infrared small target detection result.
In step 1, region of interest generation and proposal generation adopt the RoI and RPN structures.
The feature extraction part consists of convolution block one, convolution block two, convolution block three, convolution block four, space channel convolution block one, space channel convolution block two and space channel convolution block three, and has one input and one output. The input passes sequentially through convolution block one, convolution block two, space channel convolution block one, space channel convolution block two and space channel convolution block three. Meanwhile, convolution block three forms a residual structure with convolution block one, and convolution block four forms a residual structure with the series connection of convolution block one and convolution block two. Convolution blocks one, two, three and four share one structure, which consists of convolution layer two, a regularization layer, an activation function, convolution layer three, a regularization layer and an activation function. In convolution blocks one, two and three, convolution layer two has a kernel size of 3×3 and a stride of 2, and convolution layer three has a kernel size of 1×1 and a stride of 1. In convolution block four, convolution layer two has a kernel size of 11 and a stride of 4, and convolution layer three has a kernel size of 1×1. Space channel convolution blocks one, two and three share one structure, in which convolution layer one has a kernel size of 1×1.
The visual perception correlator comprises a thermal infrared visual perception correlator and a visible light visual perception correlator. The thermal infrared visual perception correlator consists of a thermal infrared key-value memory, a thermal infrared similarity vector, a regularization layer, an addressing vector and a visible light key-value memory. Its input feature is the thermal infrared region of interest feature, and its output feature is the associated visible light image region of interest feature. The thermal infrared key-value memory is the feature vector of the input thermal infrared region of interest. The thermal infrared similarity vector is obtained by a cosine similarity calculation on the thermal infrared key-value memory. The addressing vector is obtained by passing the thermal infrared similarity vector through the regularization layer. The visible light key-value memory is the feature vector of the visible light region of interest that is input into the thermal infrared visual perception correlator and determined according to the addressing vector.
The visible light visual perception correlator consists of a visible light key-value memory, a visible light similarity vector, a regularization layer, an addressing vector and a thermal infrared key-value memory. Its input feature is the visible light region of interest feature, and its output feature is the associated thermal infrared region of interest feature. The visible light key-value memory is the feature vector of the input visible light region of interest. The visible light similarity vector is obtained by a cosine similarity calculation on the visible light key-value memory. The addressing vector is obtained by passing the visible light similarity vector through the regularization layer. The thermal infrared key-value memory is the feature vector of the thermal infrared region of interest that is input into the thermal infrared visual perception correlator and determined according to the addressing vector.
The detection head consists of convolution block five, convolution block six, convolution block seven, convolution block eight and convolution block nine, which share one structure. The structure consists of convolution layer four, a regularization layer and an activation function connected in sequence; the convolution kernel size is 3×3 and the stride is 1.
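The correlator described above (cosine similarity, then a normalization step that yields an addressing vector, then a read from the other modality's memory) can be illustrated with a minimal NumPy sketch. Several points are assumptions made for illustration and are not stated in the source: the "regularization layer" is modeled as a softmax, each memory is an array of S slots of dimension D, and the names `correlate`, `tir_memory` and `vis_memory` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax (assumed stand-in for the regularization layer)."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correlate(query_feats, key_memory, value_memory):
    """Recall features of the other modality through a key-value memory.

    query_feats:  (N, D) RoI feature vectors of the input modality
    key_memory:   (S, D) memory slots of the input modality (hypothetical layout)
    value_memory: (S, D) memory slots of the other modality (hypothetical layout)
    """
    # Similarity vector: cosine similarity between each RoI feature and each key slot.
    sim = l2_normalize(query_feats) @ l2_normalize(key_memory).T   # (N, S)
    # Addressing vector: similarities normalized to sum to 1 per RoI.
    addr = softmax(sim, axis=1)                                    # (N, S)
    # Output: addressed read from the other modality's memory.
    return addr @ value_memory, addr                               # (N, D), (N, S)

rng = np.random.default_rng(0)
tir_rois = rng.normal(size=(4, 16))     # thermal infrared RoI features
vis_rois = rng.normal(size=(4, 16))     # visible light RoI features
tir_memory = rng.normal(size=(8, 16))   # thermal infrared key-value memory
vis_memory = rng.normal(size=(8, 16))   # visible light key-value memory

# The thermal infrared correlator recalls visible light features, and vice versa.
recalled_vis, addr_tir = correlate(tir_rois, tir_memory, vis_memory)
recalled_tir, addr_vis = correlate(vis_rois, vis_memory, tir_memory)

# The detection head input is the sum of the two correlator outputs.
head_input = recalled_vis + recalled_tir
print(head_input.shape)
```

Summing the two outputs mirrors the statement that the detection head input is obtained by adding the outputs of the two correlators; how the memories are filled and trained is not covered by this sketch.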
(III) Beneficial effects
Compared with the prior art, the invention provides a thermal infrared small target detection method based on a visual perception correlator, which has the following beneficial effects:
By designing the visual perception correlator, the invention lets the thermal infrared and visible light inputs complement each other, which alleviates the problem of insufficient features in thermal infrared small target detection. In addition, in small-sample detection tasks, the visual perception correlator improves thermal infrared small target detection performance by using the thermal infrared and visible light image features in a crossed manner.
The invention designs the space channel convolution block to replace the ordinary convolutional downsampling operation, completely replacing strided convolution and pooling layers. Activations are moved from the spatial dimensions to the channel dimension, so the feature map can be downsampled without losing learnable information.
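A minimal sketch of this idea, assuming the space-to-channel step is a standard space-to-depth rearrangement with a 2×2 block followed by a 1×1 convolution and a ReLU. The block size, the ReLU and the helper names are assumptions for illustration, and the residual connection of the block is omitted.

```python
import numpy as np

def space_to_channel(x, block=2):
    """Rearrange (C, H, W) into (C*block*block, H/block, W/block) without dropping values."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, block, block, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

def conv1x1(x, weight):
    """1x1 convolution: a per-pixel linear map over channels. weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))                # input feature map (C, H, W)
y = space_to_channel(x, block=2)              # halves H and W, multiplies C by 4
w = rng.normal(size=(12, 12))                 # hypothetical 1x1 kernel weights
out = np.maximum(conv1x1(y, w), 0.0)          # 1x1 convolution followed by ReLU
print(y.shape, out.shape)
```

Unlike a stride-2 convolution or pooling, the rearrangement keeps every input value: each output channel holds one of the four sub-sampled grids of an input channel, and the 1×1 convolution then mixes them.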
By combining the Dice loss function with the cross-entropy loss, the method exploits the ability of the Dice loss function to measure sample similarity and reduces the similarity calculation error in the visual perception correlator.
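One possible form of such a combined loss is sketched below. The source does not give the formula or the weighting, so the binary formulation, the smoothing constant and the equal weighting (`dice_weight=0.5`) are assumptions.

```python
import numpy as np

def cross_entropy(probs, targets, eps=1e-7):
    """Binary cross entropy averaged over all elements."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

def dice_loss(probs, targets, smooth=1.0):
    """1 - Dice coefficient, which measures overlap between prediction and label."""
    inter = np.sum(probs * targets)
    denom = np.sum(probs) + np.sum(targets)
    return float(1.0 - (2.0 * inter + smooth) / (denom + smooth))

def combined_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of Dice loss and cross entropy (the weighting is an assumption)."""
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * cross_entropy(probs, targets))

targets = np.array([1.0, 0.0, 1.0, 0.0])
good = np.array([0.9, 0.1, 0.8, 0.2])   # close to the labels
bad = np.array([0.2, 0.7, 0.3, 0.9])    # far from the labels
print(combined_loss(good, targets), combined_loss(bad, targets))
```

The Dice term depends on the overlap of the whole prediction with the label, while the cross-entropy term penalizes each element independently, so the two terms respond differently to the same error.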
Drawings
FIG. 1 is a flow chart of the thermal infrared small target detection method based on a visual perception correlator;
FIG. 2 is a network structure diagram of the thermal infrared small target detection method based on a visual perception correlator of the invention;
FIG. 3 is a structural diagram of the thermal infrared visual perception correlator of the invention;
FIG. 4 is a structural diagram of the visible light visual perception correlator of the invention;
FIG. 5 is the feature extraction structure of the invention;
FIG. 6 is the structure of the space channel convolution block in the feature extraction structure of the invention;
FIG. 7 is a schematic diagram of the space-to-channel convolution block of the invention;
FIG. 8 is the structure of convolution blocks one, two, three and four of the invention;
FIG. 9 is a schematic diagram of the composition of the detection head of the invention;
FIG. 10 is the structure of convolution blocks five, six, seven, eight and nine of the invention;
FIG. 11 is a comparison of relevant indexes between the prior art and the method proposed by the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
As shown in fig. 1, a thermal infrared small target detection method based on a visual perception correlator specifically comprises the following steps:
Step 1, constructing a thermal infrared small target detection network model: the whole network comprises four parts, namely an input image, a region of interest, a visual perception correlator and an output;
the input image comprises a thermal infrared image and a visible light image, and the corresponding small target data sets are captured with a thermal infrared camera and a visible light camera respectively;
the region of interest part consists of three parts, namely feature extraction, proposal generation and region of interest generation, and forms two branches, each with one input and one output. The input of the first branch is the thermal infrared image, which passes sequentially through feature extraction and region of interest generation and, in parallel with feature extraction, passes sequentially through proposal generation and region of interest generation; the corresponding output is the thermal infrared image region of interest. The input of the second branch is the visible light image, which passes through the same sequence; the corresponding output is the visible light image region of interest. The feature extraction part consists of convolution block one, convolution block two, convolution block three, convolution block four, space channel convolution block one, space channel convolution block two and space channel convolution block three, and has one input and one output. The input passes sequentially through convolution block one, convolution block two, space channel convolution block one, space channel convolution block two and space channel convolution block three. Meanwhile, convolution block three forms a residual structure with convolution block one, and convolution block four forms a residual structure with the series connection of convolution block one and convolution block two. Convolution blocks one, two, three and four share one structure, which consists of convolution layer two, a regularization layer, an activation function, convolution layer three, a regularization layer and an activation function. In convolution blocks one, two and three, convolution layer two has a kernel size of 3×3 and a stride of 2, and convolution layer three has a kernel size of 1×1 and a stride of 1. In convolution block four, convolution layer two has a kernel size of 11 and a stride of 4, and convolution layer three has a kernel size of 1×1. Space channel convolution blocks one, two and three share one structure, in which convolution layer one has a kernel size of 1×1;
the visual perception correlator consists of a thermal infrared visual perception correlator and a visible light visual perception correlator, which form two branches, each with one input and one output. The two inputs are the thermal infrared image region of interest and the visible light image region of interest obtained by the region of interest part. In the first branch, the thermal infrared image region of interest is input into the thermal infrared visual perception correlator, which also receives the visible light image region of interest; the output of the first branch is the output of the thermal infrared visual perception correlator. In the second branch, the visible light image region of interest is input into the visible light visual perception correlator; the output of the second branch is the output of the visible light visual perception correlator;
the thermal infrared visual perception correlator consists of a thermal infrared key-value memory, a thermal infrared similarity vector, a regularization layer, an addressing vector and a visible light key-value memory. Its input feature is the thermal infrared region of interest feature, and its output feature is the associated visible light image region of interest feature. The thermal infrared key-value memory is the feature vector of the input thermal infrared region of interest; the thermal infrared similarity vector is obtained by a cosine similarity calculation on the thermal infrared key-value memory; the addressing vector is obtained by passing the thermal infrared similarity vector through the regularization layer; and the visible light key-value memory is the feature vector of the visible light region of interest that is input into the thermal infrared visual perception correlator and determined according to the addressing vector;
the visible light visual perception correlator consists of a visible light key-value memory, a visible light similarity vector, a regularization layer, an addressing vector and a thermal infrared key-value memory. Its input feature is the visible light region of interest feature, and its output feature is the associated thermal infrared region of interest feature. The visible light key-value memory is the feature vector of the input visible light region of interest; the visible light similarity vector is obtained by a cosine similarity calculation on the visible light key-value memory; the addressing vector is obtained by passing the visible light similarity vector through the regularization layer; and the thermal infrared key-value memory is the feature vector of the thermal infrared region of interest that is input into the thermal infrared visual perception correlator and determined according to the addressing vector;
the output consists of a detection head. The detection head consists of five convolution blocks, namely convolution block five, convolution block six, convolution block seven, convolution block eight and convolution block nine, which share one structure. The structure consists of convolution layer four, a regularization layer and an activation function connected in sequence; the convolution kernel size is 3×3 and the stride is 1;
Step 2, establishing a thermal infrared small target detection data set: K visible light small target images and the K corresponding thermal infrared small target images of different types of targets are acquired with a visible light camera and a thermal infrared camera, and the targets in each thermal infrared image and each visible light image are labeled. M thermal infrared images, the corresponding visible light images and the labels of each image form a training sample set R; the remaining K-M thermal infrared images, visible light images and labels form a test sample set E, where K ≥ 1000, H ≥ 256 and M ≥ 4. The self-made data set images are augmented, subjected to random diffraction transformation and cropped to the input image size to serve as the input of the whole network;
step 3, training a thermal infrared small target detection network, and inputting the thermal infrared small target data set obtained in the step 2 into the network model constructed in the step 1 for training;
step 4, selecting a minimized loss function, outputting the loss function of the image and the label through a minimized network, and considering that model parameters are trained until the training times reach a set threshold value or the value of the loss function reaches a set range, and storing the model parameters, wherein the loss function is selected to use two parts of classification and regression loss in the training process, so that the monitoring signal of a positive sample is fully utilized, and meanwhile, if the positive sample has a very high cross-over ratio, the contribution of the corresponding loss function is larger in the training process, so that the training can be focused on samples with high quality;
step 5, fine tuning the model: the parameters of the whole network model are fine-tuned on a thermal infrared small target detection data set to obtain stable, usable model parameters, which further improves the thermal infrared small target detection capability of the model;
step 6, saving the model: the finally determined model parameters are frozen; when thermal infrared small target detection is needed, the image to be detected is input directly into the network to obtain the final thermal infrared small target detection result.
Example 2:
as shown in fig. 1, a thermal infrared small target detection method based on a visual perception correlator specifically comprises the following steps:
step 1, constructing a thermal infrared small target detection network;
as shown in fig. 2, the whole network comprises four parts of an input image, a region of interest, a visual perception correlator and an output;
the input image comprises a thermal infrared image and a visible light image, and a corresponding small target data set is obtained by shooting with a thermal infrared camera and a visible light camera respectively;
the region of interest part consists of three sub-parts, namely feature extraction, suggestion (proposal) generation and region of interest generation, arranged in two branches, each branch having one input and one output; the input of the first branch is the thermal infrared image, which passes through feature extraction and region of interest generation in sequence and, in parallel with feature extraction, through suggestion generation and region of interest generation, and the corresponding output is the thermal infrared image region of interest; the input of the second branch is the visible light image, which is processed in the same way, and the corresponding output is the visible light image region of interest; as shown in fig. 5, the feature extraction part consists of convolution block one, convolution block two, convolution block three, convolution block four, space channel convolution block one, space channel convolution block two and space channel convolution block three, and the input passes in sequence through convolution block one, convolution block two, the space channel convolution block and convolution block three; meanwhile, convolution block three and convolution block one form a residual structure, and convolution block four forms a residual structure with the series structure of convolution block one and convolution block two; as shown in fig. 6, a space channel convolution block is formed by a space channel operation, a convolution layer one and an activation function connected in series, together with a residual connection; as shown in fig. 7, the principle of the space channel method is that a 4×4 image is divided into four equal 2×2 frames, and the information at the 4 different positions within the four 2×2 frames is extracted and placed into four separate 2×2 images, that is, a 2×2 image with 4 channels; the method is applicable to images of any size; as shown in fig. 8, convolution block one, convolution block two, convolution block three and convolution block four share one structure, which consists of a convolution layer two, a regularization layer, an activation function, a convolution layer three, a regularization layer and an activation function; the convolution kernel size of convolution layer two is 3×3 and the convolution kernel size of convolution layer three is 1×1, and the step length of both layers is 1;
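The space channel rearrangement described for fig. 7 can be sketched as follows. This is a minimal illustration, assuming a single-channel image stored as a list of rows and a height and width divisible by the block size; the function name `space_to_channel` is hypothetical and not taken from the source.

```python
def space_to_channel(image, block=2):
    """Rearrange an H x W image into block*block sub-images of size
    (H/block) x (W/block): sub-image (dy, dx) collects the pixel at
    position (dy, dx) of every block x block frame."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        # Assumption of this sketch: sizes are multiples of the block size.
        raise ValueError("image size must be divisible by the block size")
    channels = []
    for dy in range(block):
        for dx in range(block):
            channels.append(
                [[image[y][x] for x in range(dx, width, block)]
                 for y in range(dy, height, block)]
            )
    return channels


# A 4 x 4 image becomes four 2 x 2 images, one per position inside a frame.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
channels = space_to_channel(image)
```

No pixel is discarded by this step: the spatial resolution is halved while the channel count grows fourfold, which is presumably why it suits small targets that occupy only a few pixels.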
as shown in fig. 3, the thermal infrared visual perception correlator consists of a thermal infrared key value memory, thermal infrared similar vectors, a regularization layer, addressing vectors and a visible light key value memory; its input is the thermal infrared region of interest feature and its output is the correlated visible light image region of interest feature; the thermal infrared key value memory holds the feature vectors of the input thermal infrared region of interest, the thermal infrared similar vectors are obtained by cosine similarity calculation on the thermal infrared key value memory, the addressing vectors are obtained by passing the thermal infrared similar vectors through the regularization layer, and the visible light key value memory holds the feature vectors of the visible light region of interest input to the thermal infrared visual perception correlator, determined according to the addressing vectors;
in detail, the thermal infrared key value memory is obtained by dividing the input thermal infrared region of interest feature into N equal parts stored as vectors, namely thermal infrared key value 1, thermal infrared key value 2, ……, thermal infrared key value N, and the input thermal infrared feature is also flattened; the thermal infrared similar vectors are obtained by measuring the cosine similarity between each part of the thermal infrared feature in the thermal infrared key value memory and the flattened thermal infrared feature, and therefore comprise thermal infrared similar vector 1, thermal infrared similar vector 2, ……, thermal infrared similar vector N; the addressing vectors are obtained by regularizing thermal infrared similar vector 1, thermal infrared similar vector 2, ……, thermal infrared similar vector N, and comprise addressing vector 1, addressing vector 2, ……, addressing vector N; the visible light key value memory holds the input visible light region of interest feature vectors determined according to the addressing vectors, thereby correlating the perception content of the visible light small target; the final output is obtained by flattening the features in the visible light key value memory and applying a weighted summation;
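The read operation of the thermal infrared visual perception correlator can be sketched as a key value memory lookup. This is only an illustration under stated assumptions: the regularization layer is taken to be a softmax, the query and every key are assumed to have the same length, and the names `correlate`, `source_keys` and `target_values` are hypothetical. The source does not specify these details.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Assumed form of the regularization layer: weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correlate(query, source_keys, target_values):
    """Address the target-modality memory with source-modality similarities.

    query         : flattened thermal infrared feature
    source_keys   : N thermal infrared key vectors
    target_values : N visible light value vectors
    """
    similarities = [cosine_similarity(query, key) for key in source_keys]
    addressing = softmax(similarities)
    dim = len(target_values[0])
    # Weighted summation over the N entries of the target memory.
    output = [sum(w * value[i] for w, value in zip(addressing, target_values))
              for i in range(dim)]
    return output, addressing


output, addressing = correlate(
    query=[1.0, 0.0],
    source_keys=[[1.0, 0.0], [0.0, 1.0]],
    target_values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The visible light visual perception correlator would follow the same pattern with the two modalities swapped.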
as shown in fig. 4, the visible light visual perception correlator consists of a visible light key value memory, visible light similarity vectors, a regularization layer, addressing vectors and a thermal infrared key value memory; its input is the visible light region of interest feature and its output is the correlated thermal infrared region of interest feature; the visible light key value memory holds the feature vectors of the input visible light region of interest, the visible light similarity vectors are obtained by cosine similarity calculation on the visible light key value memory, the addressing vectors are obtained by passing the visible light similarity vectors through the regularization layer, and the thermal infrared key value memory holds the feature vectors of the thermal infrared region of interest input to the visible light visual perception correlator, determined according to the addressing vectors; in detail, the visible light key value memory is obtained by dividing the input visible light region of interest feature into N equal parts stored as vectors, namely visible light key value 1, visible light key value 2, ……, visible light key value N, and the input visible light feature is also flattened; the visible light similarity vectors are obtained by measuring the cosine similarity between each part of the visible light feature in the visible light key value memory and the flattened visible light feature, and therefore comprise visible light similarity vector 1, visible light similarity vector 2, ……, visible light similarity vector N; the addressing vectors are obtained by regularizing visible light similarity vector 1, visible light similarity vector 2, ……, visible light similarity vector N, and comprise addressing vector 1, addressing vector 2, ……, addressing vector N; the thermal infrared key value memory holds the input thermal infrared region of interest feature vectors determined according to the addressing vectors, thereby correlating the perception content of the thermal infrared small target; the final output is obtained by flattening the features in the thermal infrared key value memory and applying a weighted summation;
as shown in fig. 9, the output part consists of a detection head with one input and three outputs; the input is obtained by adding the outputs of the visible light visual perception correlator and the thermal infrared visual perception correlator in the visual perception correlator, and the sum is passed through the detection head to obtain the three outputs, namely the category output, the regression output and the object output; the detection head consists of five convolution blocks, namely convolution block five, convolution block six, convolution block seven, convolution block eight and convolution block nine, which share one structure; as shown in fig. 10, the structure consists of a convolution layer four, a regularization layer and an activation function connected in sequence, wherein the convolution kernel size of convolution layer four is 3×3 and the step length is 1;
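One detection head convolution block (3×3 convolution with step length 1, a regularization layer, an activation function) and the addition of the two correlator outputs can be sketched as below. Several details are assumptions, since the source does not state them: a single channel, zero padding of 1, standardization over the whole feature map as the regularization layer, and ReLU as the activation function. All function names are hypothetical.

```python
import math


def conv3x3(feature, kernel):
    """3x3 convolution with step length 1 and zero padding of 1 (assumed)."""
    height, width = len(feature), len(feature[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += feature[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def normalize(feature, eps=1e-5):
    """Assumed regularization layer: zero mean, unit variance over the map."""
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + eps)
    return [[(v - mean) / scale for v in row] for row in feature]


def relu(feature):
    """Assumed activation function."""
    return [[max(0.0, v) for v in row] for row in feature]


def conv_block(feature, kernel):
    """Convolution, regularization and activation connected in sequence."""
    return relu(normalize(conv3x3(feature, kernel)))


def fuse(thermal_out, visible_out):
    """Element-wise addition of the two correlator outputs."""
    return [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(thermal_out, visible_out)]


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fused = fuse([[1, 2], [3, 4]], [[0, 0], [0, 0]])
block_out = conv_block(fused, identity)
```

Because the step length is 1 and the padding is 1, the block preserves the spatial size of its input, so five such blocks can be chained without resizing.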
step 2, establishing a thermal infrared small target detection data set: K visible light small target images and K corresponding thermal infrared small target images of different targets are acquired with a visible light camera and a thermal infrared camera, and the targets in each thermal infrared image and each visible light image are then labeled; M thermal infrared images, the corresponding visible light images and the labels of each image form a training sample set R, and the remaining K-M thermal infrared images, visible light images and labels form a test sample set E, wherein K ≥ 1000, H ≥ 256 and M ≥ 4;
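The division of the K labeled image pairs into the training sample set R (M samples) and the test sample set E (K-M samples) can be sketched as follows. The random shuffle, the seed and the sample representation as `(thermal, visible, label)` triplets are assumptions for illustration; the source does not say how the M training samples are chosen.

```python
import random


def split_dataset(samples, num_train, seed=0):
    """Split (thermal, visible, label) triplets into a training set R of
    num_train samples and a test set E with the remaining samples."""
    if not 0 < num_train < len(samples):
        raise ValueError("num_train must be between 1 and len(samples) - 1")
    shuffled = list(samples)
    # Assumption: samples are assigned at random with a fixed seed.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]


# Hypothetical file names standing in for K = 10 registered image pairs.
samples = [(f"ir_{i}.png", f"vis_{i}.png", f"label_{i}.txt") for i in range(10)]
train_set_r, test_set_e = split_dataset(samples, num_train=8)
```

Keeping each thermal infrared image, its visible light counterpart and its label in one triplet guarantees that a registered pair is never split across R and E.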
step 3, training the thermal infrared small target detection network: the images obtained in step 2 are enhanced, subjected to random diffraction transformation, cropped to the input image size and used as the input of the whole network, and the images are labeled; the random size and random position can be realized by a software algorithm;
step 4, selecting and minimizing a loss function: a loss function is computed between the output of the network and the labels, and a better detection effect is achieved by minimizing it; the loss function is a combination of a classification loss and a regression loss, that is, a classification loss and a regression box loss; the total loss is expressed by a classification loss term and L_Reg; the classification loss term uses the Dice loss function together with cross entropy, and its value is computed from each target detection result generated for each thermal infrared and visible light image sample and the corresponding label in the training sample set; L_Reg is the small target regression box loss, computed from each small target detection result generated for each thermal infrared and visible light image sample and the corresponding label in the training sample set; during back propagation the parameters for the multiple classification and regression box tasks are shared, so that different feature maps learn more semantic information beyond the label information; the total loss function is defined over these two terms (the formula itself is not preserved in this text);
wherein the network prediction denotes the thermal infrared and visible light output obtained after training, y denotes the labels of the thermal infrared and visible light image samples in the training sample set, the regression box prediction denotes the predicted thermal infrared and visible light small target regression boxes, and x denotes the corresponding regression box labels in the training sample set;
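A classification loss that combines the Dice loss function with cross entropy, as named in step 4, might look like the following minimal sketch. The equal weighting of the two terms, the binary form of the cross entropy and the smoothing constants are assumptions, because the formula of the source is not available; this is not presented as the patented loss.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Dice loss: 1 - 2|P∩T| / (|P| + |T|) on flattened probabilities."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy; predictions are clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def classification_loss(pred, target):
    """Assumed combination: the two terms are simply added."""
    return dice_loss(pred, target) + binary_cross_entropy(pred, target)


perfect = classification_loss([1.0, 0.0, 1.0], [1, 0, 1])
uncertain = classification_loss([0.5, 0.5, 0.5], [1, 0, 1])
```

The Dice term measures overlap and is less sensitive to the heavy imbalance between the few target pixels and the background, while the cross entropy term supplies a per-element gradient; this is a common motivation for pairing them in small target detection.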
the number of training rounds is set to 300, and the number of images input to the network each time is 10; the upper limit of the number of images input each time is determined mainly by the performance of the computer's graphics processor, and in general a larger number of images per input gives a better and more stable network; the learning rate during training is set to 0.0001, which ensures that the network fits quickly without over-fitting; the adaptive moment estimation (Adam) algorithm is selected as the network parameter optimizer, whose advantage is that, after bias correction, the learning rate of each iteration stays within a definite range, which keeps the parameters stable; the threshold of the loss function value is set to about 0.0003, and the training of the whole network can be considered basically complete once the loss function value falls below 0.0003;
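The training settings above (300 rounds, learning rate 0.0001, adaptive moment estimation with bias correction, loss threshold 0.0003) can be tied together in a small sketch on a single scalar parameter. The values `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the commonly used defaults and are not given in the source; the quadratic toy loss and the function names are hypothetical.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One adaptive moment estimation update with bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def train(loss_and_grad, w0, max_rounds=300, loss_threshold=3e-4, lr=1e-4):
    """Stop when the loss drops below the threshold or the rounds run out.
    Returns the parameter, the last loss and the number of updates made."""
    w, m, v = w0, 0.0, 0.0
    for round_index in range(1, max_rounds + 1):
        loss, grad = loss_and_grad(w)
        if loss < loss_threshold:
            return w, loss, round_index - 1
        w, m, v = adam_step(w, grad, m, v, round_index, lr=lr)
    return w, loss_and_grad(w)[0], max_rounds


def toy_loss(w):
    """Hypothetical stand-in for the network loss: (w - 3)^2 and its gradient."""
    return (w - 3.0) ** 2, 2.0 * (w - 3.0)


w_final, loss_final, updates = train(toy_loss, w0=0.0)
```

After bias correction the size of each update is roughly bounded by the learning rate, which is the stability property the text attributes to this optimizer: on the toy loss the parameter moves by about 0.0001 per round.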
step 5, fine tuning the model: the parameters of the whole network model are fine-tuned on the SIRST thermal infrared small target detection data set to obtain stable, usable model parameters, further improving the thermal infrared small target detection capability of the model;
step 6, saving the model: after network training is complete, all parameters in the network are saved; registered thermal infrared and visible light small target images are then input into the network to obtain the detection result; the network places no requirement on the size of the two input images, and any size is acceptable, provided that the two images have the same size.
The implementations of the convolution layer, activation function, regularization layer, RPN and RoI are algorithms well known to those skilled in the art; the specific procedures and methods can be found in the corresponding textbooks or technical literature.
The invention constructs a thermal infrared small target detection method based on a visual perception correlator and further verifies the feasibility and superiority of the method by computing evaluation indexes on the detection results for weak targets in thermal infrared-visible light small target images; the indexes of the prior art and of the proposed method are compared in FIG. 11, which shows that the proposed method has a higher precision (AP) and recall rate (Recall) and a lower miss rate (MR) than the prior art, further indicating that the proposed method has a better thermal infrared small target detection effect.
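Two of the indexes named above can be illustrated with a short sketch. It assumes that MR is read as the miss rate and defined as the fraction of labeled targets that are not detected, so that it equals one minus the recall; the source does not define the indexes, and the counts in the example are made up for illustration rather than taken from FIG. 11.

```python
def recall_and_miss_rate(true_positives, false_negatives):
    """Recall = TP / (TP + FN); miss rate = FN / (TP + FN) = 1 - recall."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one labeled target is required")
    recall = true_positives / total
    return recall, 1.0 - recall


# Hypothetical counts: 90 of 100 labeled targets are detected.
recall, miss_rate = recall_and_miss_rate(true_positives=90, false_negatives=10)
```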
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention and does not limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A thermal infrared small target detection method based on a visual perception correlator, characterized by comprising the following steps:
step 1, constructing a thermal infrared small target detection network: the whole network comprises four parts, namely an input image, a region of interest, a visual perception correlator and an output;
the input image includes a thermal infrared image, a visible light image;
the method comprises the steps that an interested region consists of three parts of feature extraction, suggestion generation and interested region generation, two branches are respectively formed, the two branches are respectively provided with one input and one output, the input of the first branch is a thermal infrared image, the thermal infrared image sequentially passes through the feature extraction and the interested region generation, and sequentially passes through the suggestion generation and the interested region generation in parallel during feature extraction to obtain a corresponding output of the thermal infrared image interested region, the input of the second branch is a visible light image, the visible light image sequentially passes through the feature extraction and the interested region generation, and the corresponding output is a visible light image interested region;
the visual perception correlator comprises a thermal infrared visual perception correlator and a visible light visual perception correlator; the thermal infrared visual perception correlator consists of a thermal infrared key value memory, a thermal infrared similar vector, a regularization layer, an addressing vector and a visible light key value memory; the visible light visual perception correlator consists of a visible light key value memory, a visible light similarity vector, a regularization layer, an addressing vector and a thermal infrared key value memory; the thermal infrared visual perception correlator and the visible light visual perception correlator correlate the thermal infrared image region of interest and the visible light image region of interest to obtain correlated thermal infrared and visible light image interesting features;
the detection head consists of a convolution block five, a convolution block six, a convolution block seven, a convolution block eight and a convolution block nine, wherein the five convolution blocks have the same structure, and a detection result is obtained by using convolution, regularization and an activation function on the related image interesting characteristics;
step 2, establishing a thermal infrared small target detection data set: obtaining K visible light small target images and K corresponding thermal infrared small target images of different types of targets by a visible light camera and a thermal infrared camera, and marking targets in each thermal infrared image and each visible light image;
step 3: training a thermal infrared small target detection network: inputting the data set prepared in the step 2 into the network model constructed in the step 1 for training;
step 4, selecting a minimized loss function: outputting a loss function of the image and the label through a minimized network, considering that the model parameters are pre-trained and finishing until the training times reach a set threshold value or the value of the loss function reaches a set range, and storing the model parameters; simultaneously selecting an optimal evaluation index to measure the accuracy of the algorithm and evaluating the performance of the system;
step 5, fine tuning the model: training and fine-tuning the model by using a public thermal infrared small target detection data set to obtain stable and usable model parameters, and finally enabling the model to have a better fusion effect;
step 6, storing the optimal model: and solidifying the finally determined model parameters, and directly inputting the image to be detected into a network to obtain a final thermal infrared small target detection result when the thermal infrared small target detection operation is needed.
2. The visual perception correlator-based thermal infrared small target detection method according to claim 1, characterized in that: in step 1, the region of interest generation adopts the RoI and RPN structures.
3. The visual perception correlator-based thermal infrared small target detection method according to claim 1, characterized in that: the feature extraction part consists of a first convolution block, a second convolution block, a third convolution block, a fourth convolution block, a first space channel convolution block, a second space channel convolution block and a third space channel convolution block, and has one input and one output, wherein the input passes in sequence through the first convolution block, the second convolution block, the first space channel convolution block, the second space channel convolution block and the third space channel convolution block; meanwhile, the third convolution block and the first convolution block form a residual structure, and the fourth convolution block forms a residual structure with the series connection of the first convolution block and the second convolution block; the first space channel convolution block, the second space channel convolution block and the third space channel convolution block share one structure, wherein the convolution kernel of the first convolution layer is 1×1 and the step length is 1.
4. The visual perception correlator-based thermal infrared small target detection method according to claim 1, characterized in that: the thermal infrared visual perception correlator consists of a thermal infrared key value memory, a thermal infrared similar vector, a regularization layer, an addressing vector and a visible light key value memory, wherein the input feature is the thermal infrared region of interest feature, the output feature is the correlated visible light image region of interest feature, the thermal infrared key value memory is the feature vector of the input thermal infrared region of interest, the thermal infrared similar vector is obtained by cosine similarity calculation on the thermal infrared key value memory, the addressing vector is obtained by processing the thermal infrared similar vector through the regularization layer, and the visible light key value memory is the feature vector of the visible light region of interest input to the thermal infrared visual perception correlator, determined according to the addressing vector.
5. The visual perception correlator-based thermal infrared small target detection method according to claim 1, characterized in that: the visible light visual perception correlator consists of a visible light key value memory, a visible light similarity vector, a regularization layer, an addressing vector and a thermal infrared key value memory, wherein the input feature is the visible light region of interest feature, the output feature is the correlated thermal infrared region of interest feature, the visible light key value memory is the feature vector of the input visible light region of interest, the visible light similarity vector is obtained by cosine similarity calculation on the visible light key value memory, the addressing vector is obtained by processing the visible light similarity vector through the regularization layer, and the thermal infrared key value memory is the feature vector of the thermal infrared region of interest input to the visible light visual perception correlator, determined according to the addressing vector.
6. The visual perception correlator-based thermal infrared small target detection method according to claim 1, characterized in that: the detection head consists of a convolution block five, a convolution block six, a convolution block seven, a convolution block eight and a convolution block nine, which share one structure; the structure consists of a convolution layer four, a regularization layer and an activation function connected in sequence, wherein the convolution kernel is 3×3 and the step length is 1.
CN202211702320.2A 2022-12-29 2022-12-29 Thermal infrared small target detection method based on visual perception correlator Active CN116468928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211702320.2A CN116468928B (en) 2022-12-29 2022-12-29 Thermal infrared small target detection method based on visual perception correlator

Publications (2)

Publication Number Publication Date
CN116468928A CN116468928A (en) 2023-07-21
CN116468928B true CN116468928B (en) 2023-12-19

Family

ID=87175979

Country Status (1)

Country Link
CN (1) CN116468928B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN112418163A (en) * 2020-12-09 2021-02-26 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN113610167A (en) * 2021-08-10 2021-11-05 宿迁旺春机械制造有限公司 Equipment risk detection method based on metric learning and visual perception
CN114359626A (en) * 2021-12-15 2022-04-15 安徽大学 Visible light-thermal infrared obvious target detection method based on condition generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A novel visible-depth-thermal image dataset of salient object detection for robotic visual perception;Kechen Song;《IEEE TRANSACTION 2022》;第1558-1569页 *
基于注意力机制的红外目标检测方法;顾星;《激光与光电子学进展》;第293-300页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant