CN115035003A - Infrared and visible light image adversarial fusion method with interactive compensation attention - Google Patents

Infrared and visible light image adversarial fusion method with interactive compensation attention

Info

Publication number
CN115035003A
Authority
CN
China
Prior art keywords
attention
infrared
image
interactive
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210376347.0A
Other languages
Chinese (zh)
Inventor
王志社
邵文禹
陈彦林
杨帆
孙婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Science and Technology filed Critical Taiyuan University of Science and Technology
Priority to CN202210376347.0A priority Critical patent/CN115035003A/en
Publication of CN115035003A publication Critical patent/CN115035003A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an infrared and visible light image adversarial fusion method with interactive compensation attention. A triple-path multi-scale encoder-decoder network is constructed in an interactive compensation generator. Under the action of the network's interactive attention module and compensation attention module, the infrared path and the visible light path provide additional intensity and gradient information for the connection path, so that the fused image retains more prominent infrared targets and richer texture details, the feature extraction and feature reconstruction capabilities are enhanced, and the resulting attention feature maps focus on infrared image target perception and visible light image texture detail representation. During training, the interactive compensation generator is optimized by dual discriminators, which constrain the data-distribution similarity between the fusion result and the source images more evenly, so that the interactive compensation generator produces a more balanced fusion result.

Description

Infrared and visible light image adversarial fusion method with interactive compensation attention
Technical Field
The invention relates to the technical field of image processing, and in particular to an infrared and visible light image adversarial fusion method with interactive compensation attention.
Background
Infrared and visible light image fusion aims to integrate the advantages of the two types of sensors: the fused image generated from their complementary information has better target perception and scene representation, which benefits both human observation and subsequent computational processing. Infrared sensors, which are sensitive to heat-source radiation, can capture salient target regions, but the resulting infrared images typically lack structural features and texture detail. Conversely, visible light sensors acquire rich scene information and texture detail through reflected-light imaging; visible light images have higher spatial resolution and abundant texture detail, but they cannot effectively highlight targets, are easily affected by the external environment, and degrade severely under low-illumination conditions. Because the infrared and visible imaging mechanisms differ, the two types of images carry strong complementary information, and only by applying fusion technology can the cooperative detection capability of the two sensors be effectively improved. Such fusion is widely used in remote sensing, medical diagnosis, intelligent driving, security monitoring and other fields.
Currently, infrared and visible light image fusion techniques can be broadly divided into traditional fusion methods and deep learning fusion methods. Traditional methods usually extract image features with the same feature transformation or feature representation, combine them with a suitable fusion rule, and then reconstruct the final fused image by inverse transformation. Because infrared and visible sensors image differently, infrared images characterize target features through pixel intensity, while visible light images characterize scene texture through edges and gradients. Traditional methods ignore these inherently different characteristics and extract features from both source images indiscriminately with the same transformation or representation model, which inevitably leads to low fusion performance and poor visual quality. In addition, the fusion rules are designed manually, are increasingly complex and computationally expensive, and limit the practical application of image fusion.
In recent years, deep learning fusion methods have achieved satisfactory results, because convolution has strong feature extraction capability and model parameters can be learned from large amounts of data. Nevertheless, these methods still have shortcomings. First, they rely blindly on convolution to extract image features and do not consider the interaction between the internal features of the two image types, so the local feature extraction capability is insufficient, which easily reduces target brightness and blurs texture details in the fused image. Second, they depend entirely on convolution to extract local image features and ignore the global dependency of image features, so global feature information cannot be extracted effectively and is easily lost in the fused image.
In summary, a method is urgently needed that can extract local and global features of both image types simultaneously, effectively enhance the representation capability of depth features, suppress irrelevant information while enhancing useful information, and thereby further improve the fusion performance of infrared and visible light images.
Disclosure of Invention
The invention provides an infrared and visible light image adversarial fusion method with interactive compensation attention. It aims to solve the technical problem that existing deep learning fusion methods extract only local image features and cannot model the local feature interaction and global feature compensation relationships between the two image types, which easily leads to an unbalanced fusion result, i.e. the fused image cannot effectively retain typical infrared targets and visible texture details at the same time. The technical scheme is as follows:
an infrared and visible light image adversarial fusion method with interactive compensation attention, comprising:
S1, taking as the input of a pre-trained interactive compensation generator the triple paths consisting of the infrared path and the visible light path corresponding to the infrared image and the visible light image to be fused, and the connection path obtained by channel-connecting the two images; the interactive compensation generator establishes a triple-path multi-scale encoding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network;
S2, extracting the multi-scale depth features of the triple paths with the four 3×3 convolutional layers of the interactive attention coding network, where the first and second convolutional layers use stride 1 and extract shallow image features, and the third and fourth convolutional layers use stride 2 and extract multi-scale depth features of the image; the shallow features and multi-scale depth features pass through three levels of interactive attention to obtain the final interactive attention map;
S3, directly channel-connecting, through the fusion layer, the final interactive attention map with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path, to obtain a fused attention feature map;
S4, reconstructing features with the four 3×3 convolutional layers of the compensation attention decoding network, where the first and second convolutional layers are each accompanied by an upsampling operation; after the upsampling and convolution of the first convolutional layer, the fused attention feature map is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scale, to obtain the fused image.
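Purely as an illustration of how steps S1 to S4 are invoked at inference time, the following Python (PyTorch) sketch assembles the triple-path input and passes it to a trained generator; the names fuse and generator are illustrative and not taken from the filing:

```python
import torch

def fuse(ir, vis, generator):
    """ir, vis: single-channel tensors of shape (1, 1, H, W), scaled to [-1, 1].
    generator: a trained interactive compensation generator (hypothetical module)."""
    concat = torch.cat([ir, vis], dim=1)    # S1: connection path from channel concatenation
    generator.eval()
    with torch.no_grad():
        fused = generator(ir, vis, concat)  # S2-S4 run inside the generator
    return fused
```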
Optionally, the numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input to the interactive attention module of the interactive attention coding network to generate an interactive attention fusion map, denoted Φ_F.
Optionally, the numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU.
Optionally, the interactive attention module operates on the input features Φ_m and Φ_n ∈ R^(H×W×C) as follows. First, in a channel attention model, the depth features are mapped to channel vectors by global average pooling and maximum pooling; after two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients Ŵ_m^C and Ŵ_n^C, respectively expressed as Ŵ_m^C = δ(Conv(Con(σ(Conv(AP(Φ_m))), σ(Conv(MP(Φ_m)))))) and Ŵ_n^C = δ(Conv(Con(σ(Conv(AP(Φ_n))), σ(Conv(MP(Φ_n)))))), where Conv denotes the convolution operation, Con denotes channel connection, AP(·) and MP(·) denote global average pooling and maximum pooling, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, and C denotes the number of input channels;
then a Softmax operation is applied to obtain the corresponding final channel weighting coefficients W_m^C and W_n^C, respectively expressed as W_m^C = Softmax(Ŵ_m^C) and W_n^C = Softmax(Ŵ_n^C);
the final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps Φ_m^C and Φ_n^C, respectively expressed as Φ_m^C = W_m^C ⊗ Φ_m and Φ_n^C = W_n^C ⊗ Φ_n;
then the corresponding channel interactive attention maps are taken as the input of a spatial attention model; global average pooling and maximum pooling are applied, the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients Ŵ_m^S and Ŵ_n^S, respectively expressed as Ŵ_m^S = δ(Conv(Con(AP(Φ_m^C), MP(Φ_m^C)))) and Ŵ_n^S = δ(Conv(Con(AP(Φ_n^C), MP(Φ_n^C))));
then a Softmax operation is used to obtain the final spatial weighting coefficients W_m^S and W_n^S, respectively expressed as W_m^S = Softmax(Ŵ_m^S) and W_n^S = Softmax(Ŵ_n^S);
the final spatial weighting coefficients are multiplied with the corresponding channel attention maps to obtain the corresponding spatial interactive attention maps Φ_m^S and Φ_n^S, respectively expressed as Φ_m^S = W_m^S ⊗ Φ_m^C and Φ_n^S = W_n^S ⊗ Φ_n^C;
finally, the two spatial interactive attention maps are channel-connected to obtain the interactive attention fusion map Φ_F, expressed as Φ_F = Con(Φ_m^S, Φ_n^S).
Optionally, the compensation attention module operates on an input infrared image feature or visible light image feature Φ_m ∈ R^(H×W×C) as follows. First, in a channel attention model, the feature map is converted to channel vectors by global average pooling and maximum pooling; after two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient W_m^C, expressed as W_m^C = δ(Conv(Con(σ(Conv(AP(Φ_m))), σ(Conv(MP(Φ_m)))))), where H and W denote the height and width of the image and C denotes the number of input channels;
then the channel weighting coefficient is multiplied with the input feature to obtain the corresponding channel attention map Φ_m^C, expressed as Φ_m^C = W_m^C ⊗ Φ_m;
then the channel attention map is taken as the input of a spatial attention model; global average pooling and maximum pooling are applied, the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient W_m^S, expressed as W_m^S = δ(Conv(Con(AP(Φ_m^C), MP(Φ_m^C))));
finally, the spatial weighting coefficient is multiplied with the input channel attention map to obtain the corresponding spatial attention map Φ_m^S, expressed as Φ_m^S = W_m^S ⊗ Φ_m^C.
Optionally, before S1, the method further comprises:
S01, constructing the interactive compensation generator: with the infrared path, the visible light path and the connection path formed by channel-connecting the infrared image and the visible light image as input, establishing a triple-path multi-scale encoding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network, used to generate an initial fused image;
the interactive attention coding network extracts the multi-scale depth features of the triple paths with four 3×3 convolutional layers, where the first and second convolutional layers use stride 1 and extract shallow image features, and the third and fourth convolutional layers use stride 2 and extract multi-scale depth features of the image; the numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input to the interactive attention module to generate an interactive attention fusion map, denoted Φ_F; the final interactive attention map is obtained after three levels of interactive attention;
the fusion layer directly connects the final interactive attention map with the compensation attention map of the fourth convolution layer of the infrared path and the visible light path through a channel to obtain a fusion attention feature map;
the compensation attention decoding network respectively adopts convolution layers with 4 convolution kernels of 3 multiplied by 3 to reconstruct characteristics, wherein the first convolution layer and the second convolution layer are accompanied by an up-sampling operation; the number of input channels of the four convolutional layers is 384, 192, 96 and 32 respectively, the number of output channels is 128, 64, 32 and 1 respectively, and the activation function is PReLU; the fused attention feature map is subjected to up-sampling operation and first-layer convolution, and the obtained output is in channel connection with the infrared path compensation attention map and the visible path compensation attention map of the corresponding scales, so that an initial fused image is finally obtained;
S02, constructing a dual-discriminator model comprising an infrared discriminator and a visible light discriminator; during training, the initial fused image produced by the interactive compensation generator is input, together with the infrared image and the visible light image, to the corresponding discriminators, so as to constrain the fused image to have data distributions similar to those of the infrared image and the visible light image, respectively; when the adversarial game among the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained;
the infrared discriminator and the visible light discriminator have the same network structure and are composed of 4 convolutional layers and 1 full-connection layer, all convolutional layers adopt 3 x3 kernel size and LeakyRelu activation function, the step length is 2, the input channels of the corresponding convolutional layers are respectively 1, 16, 32 and 64, and the number of output channels is respectively 16, 32, 64 and 128;
s03, training a network model: taking the infrared image and the visible light image as training data sets, and adopting a loss function representing the pixel intensity of the infrared image and the edge gradient of the visible light image to supervise network model training to obtain optimal network model parameters;
the loss function comprises an interaction compensation generator loss function and a discriminator loss function; in the interactive compensation generator, the loss function is formed by a competing loss function L adv And a content loss function L con Composition, represented as L G =L adv +L con (ii) a The content loss function of the interaction compensation generator may be expressed as
Figure BDA0003589223530000061
Wherein H and W respectively represent the height and width of the image, | · | | purple F And | · | non-conducting phosphor 1 Representing the Frobenius norm, the L1 norm,
Figure BDA0003589223530000062
representing gradient operators, I f Representing the initial fused image, I ir Representing an infrared image, I vis Representing a visible light image; in the infrared discriminator and the visible light discriminator, the resistance loss function is expressed as
Figure BDA0003589223530000063
N represents the number of training images; meanwhile, the respective loss functions of the infrared discriminator and the visible light discriminator are respectively expressed as
Figure BDA0003589223530000064
And
Figure BDA0003589223530000065
wherein λ is a regularization parameter, | | · | computation circuitry 2 Represents the L2 norm; first item representationThe wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty for limiting the learning ability of the infrared discriminator and the visible light discriminator.
Optionally, the training data set uses 25 pairs of infrared and visible light images from the TNO data set; the original images are segmented into 128×128 patches with a sliding window of stride 12 and their gray values are converted to the range [-1, 1], finally yielding 18813 image pairs as the training set;
during training, an Adam optimizer is used to update the network model parameters, and the batch size and the number of epochs are set to 4 and 16, respectively; the learning rates of the interactive compensation generator and the discriminators are set to 1×10^-4 and 4×10^-4, respectively, with the corresponding iteration numbers set to 1 and 2; the regularization parameter λ is set to 10.
By means of the scheme, the invention has the following characteristics:
1. In the interactive compensation generator, a triple-path multi-scale encoder-decoder network is constructed. Under the action of the network's interactive attention module and compensation attention module, the infrared path and the visible light path provide additional intensity and gradient information for the connection path, so that more prominent infrared targets and richer texture details can be retained in the fused image.
2. The invention develops an interactive attention module and a compensation attention module to transfer path features and model global features along the channel and spatial dimensions, enhancing the feature extraction and feature reconstruction capabilities; the resulting attention feature maps focus on infrared image target perception and visible light image texture detail representation.
3. When training the interactive compensation generator, the invention designs dual discriminators comprising an infrared discriminator and a visible light discriminator and optimizes the interactive compensation generator through them. Using the infrared and visible light discriminators constrains the data-distribution similarity between the fusion result and the source images more evenly, so that the interactive compensation generator produces a more balanced fusion result and acquires more similar pixel distributions and finer texture detail information from the source images.
4. The invention provides an end-to-end (i.e. the pre-trained network model is identical to the test network model, and no additional fusion rule needs to be added at test time) generative adversarial fusion method for infrared and visible light images; the fusion effect is significantly improved, the method can also be applied to the fusion of multi-modal, multi-focus and medical images, and it has high application value in the field of image fusion.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of a process of fusing an infrared image to be fused and a visible light image to be fused through an interactive attention coding network, a fusion layer and a compensation attention decoding network.
FIG. 3 is a data processing diagram of the interactive attention module.
FIG. 4 is a data processing diagram of the compensate attention module.
FIG. 5 is a schematic diagram of a training process of the interactive compensation generator.
FIG. 6 is a schematic comparison of the first set of Solider_with_jeep fusion results.
FIG. 7 is a comparison of the second set of Street fusion results.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
As shown in FIG. 1, the present invention provides an infrared and visible light image adversarial fusion method with interactive compensation attention, which comprises:
S1, taking as the input of a pre-trained interactive compensation generator the triple paths consisting of the infrared path and the visible light path corresponding to the infrared image and the visible light image to be fused, and the connection path obtained by channel-connecting the two images; the interactive compensation generator establishes a triple-path multi-scale encoding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network.
S2, extracting the multi-scale depth features of the triple paths with the four 3×3 convolutional layers of the interactive attention coding network, where the first and second convolutional layers use stride 1 and extract shallow image features, and the third and fourth convolutional layers use stride 2 and extract multi-scale depth features of the image; the shallow features and multi-scale depth features pass through three levels of interactive attention to obtain the final interactive attention map.
The numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path (corresponding to C in FIGS. 2 to 5), denoted Φ_m and Φ_n, and then input to the interactive attention module of the interactive attention coding network (Inter_Att in FIG. 2) to generate an interactive attention fusion map, denoted Φ_F.
S3, directly channel-connecting, through the fusion layer, the final interactive attention map with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path, to obtain a fused attention feature map.
S4, reconstructing features with the four 3×3 convolutional layers of the compensation attention decoding network, where the first and second convolutional layers are each accompanied by an upsampling operation (Upsampling in FIG. 2); after the upsampling and convolution of the first convolutional layer, the fused attention feature map is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scale, to obtain the fused image.
The numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, and the numbers of output channels are 128, 64, 32 and 1; the activation function is PReLU. In the compensation attention decoding network, the features of different scales obtained by the infrared path and the visible light path in the interactive attention coding network through the compensation attention module (Comp_Att in FIG. 2) are channel-connected with the features of the corresponding scale in the connection path, and the reconstruction of the feature map is completed together with the upsampling operations to obtain the initial fused image. The infrared path and the visible light path provide additional intensity and gradient information for the connection path, improving the feature decoding capability.
FIG. 2 is a schematic diagram of the process of fusing the infrared image and the visible light image to be fused through the interactive attention coding network, the fusion layer and the compensation attention decoding network. In FIG. 2, Conv denotes a convolution operation, k3 denotes a 3×3 convolution kernel, s1 denotes a convolution with stride 1, and In16 denotes 16 output channels; the remaining parameters in FIG. 2 follow the same convention.
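As a rough PyTorch sketch of the layer layout just described (four 3×3 convolutions per path, stride 1 then stride 2 in the encoder, upsampling in the first two decoder stages), the following illustrative modules may help; the channel widths only partially follow the counts listed above and are otherwise assumptions:

```python
import torch
import torch.nn as nn

def conv_prelu(in_ch, out_ch, stride=1):
    # basic 3x3 convolution + PReLU block used throughout the generator
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.PReLU(),
    )

class PathEncoder(nn.Module):
    """One encoder path: layers 1-2 use stride 1 (shallow features),
    layers 3-4 use stride 2 (multi-scale depth features)."""
    def __init__(self, in_ch=1, widths=(16, 32, 64, 128)):   # widths partly assumed
        super().__init__()
        self.conv1 = conv_prelu(in_ch, widths[0], stride=1)
        self.conv2 = conv_prelu(widths[0], widths[1], stride=1)
        self.conv3 = conv_prelu(widths[1], widths[2], stride=2)
        self.conv4 = conv_prelu(widths[2], widths[3], stride=2)

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        f4 = self.conv4(f3)
        return f1, f2, f3, f4   # kept for the compensation attention skip connections

class DecoderStage(nn.Module):
    """One decoder stage: optional 2x upsampling (first two stages only) followed by a
    3x3 convolution; its output is channel-concatenated with the compensation
    attention map of the matching scale before the next stage."""
    def __init__(self, in_ch, out_ch, upsample=True):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest') if upsample else nn.Identity()
        self.conv = conv_prelu(in_ch, out_ch, stride=1)

    def forward(self, x, skip=None):
        x = self.conv(self.up(x))
        if skip is not None:
            x = torch.cat([x, skip], dim=1)
        return x
```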
Optionally, as shown in FIG. 3, the interactive attention module operates on the input features Φ_m and Φ_n ∈ R^(H×W×C) as follows. First, in a channel attention model, the depth features are mapped to channel vectors by global average pooling and maximum pooling; after two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients Ŵ_m^C and Ŵ_n^C, respectively expressed as Ŵ_m^C = δ(Conv(Con(σ(Conv(AP(Φ_m))), σ(Conv(MP(Φ_m)))))) and Ŵ_n^C = δ(Conv(Con(σ(Conv(AP(Φ_n))), σ(Conv(MP(Φ_n)))))), where Conv denotes the convolution operation, Con denotes channel connection, AP(·) and MP(·) denote global average pooling and maximum pooling, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, and C denotes the number of input channels.
Then a Softmax operation is applied to obtain the corresponding final channel weighting coefficients W_m^C and W_n^C, respectively expressed as W_m^C = Softmax(Ŵ_m^C) and W_n^C = Softmax(Ŵ_n^C).
The final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps Φ_m^C and Φ_n^C, respectively expressed as Φ_m^C = W_m^C ⊗ Φ_m and Φ_n^C = W_n^C ⊗ Φ_n.
Next, the corresponding channel interactive attention maps are taken as the input of a spatial attention model; global average pooling and maximum pooling are applied, the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients Ŵ_m^S and Ŵ_n^S, respectively expressed as Ŵ_m^S = δ(Conv(Con(AP(Φ_m^C), MP(Φ_m^C)))) and Ŵ_n^S = δ(Conv(Con(AP(Φ_n^C), MP(Φ_n^C)))).
Then a Softmax operation is used to obtain the final spatial weighting coefficients W_m^S and W_n^S, respectively expressed as W_m^S = Softmax(Ŵ_m^S) and W_n^S = Softmax(Ŵ_n^S).
The final spatial weighting coefficients are multiplied with the corresponding channel attention maps to obtain the corresponding spatial interactive attention maps Φ_m^S and Φ_n^S, respectively expressed as Φ_m^S = W_m^S ⊗ Φ_m^C and Φ_n^S = W_n^S ⊗ Φ_n^C.
Finally, the two spatial interactive attention maps are channel-connected to obtain the interactive attention fusion map Φ_F, expressed as Φ_F = Con(Φ_m^S, Φ_n^S).
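The following PyTorch sketch mirrors the processing order described above (dual pooling, convolutions with PReLU, Sigmoid weighting, Softmax, channel then spatial attention, final concatenation). It is only an interpretation: the exact formulas appear as figures in the original filing, and the channel reduction ratio, the 7×7 spatial convolution and the joint Softmax across the two paths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttention(nn.Module):
    """Illustrative interactive attention module for two feature maps of equal shape."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # two convolutional layers with a PReLU, applied to each pooled channel vector
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.channel_fuse = nn.Conv2d(2 * channels, channels, 1)  # after concatenating the AP/MP branches
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def _channel_weight(self, x):
        ap = self.channel_mlp(F.adaptive_avg_pool2d(x, 1))
        mp = self.channel_mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(self.channel_fuse(torch.cat([ap, mp], dim=1)))

    def _spatial_weight(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))

    def forward(self, phi_m, phi_n):
        # channel attention: initial weights per path, then Softmax across the two paths
        w = torch.softmax(torch.stack([self._channel_weight(phi_m),
                                       self._channel_weight(phi_n)]), dim=0)
        phi_m_c, phi_n_c = w[0] * phi_m, w[1] * phi_n
        # spatial attention on the channel-interactive maps, again normalised across paths
        s = torch.softmax(torch.stack([self._spatial_weight(phi_m_c),
                                       self._spatial_weight(phi_n_c)]), dim=0)
        phi_m_s, phi_n_s = s[0] * phi_m_c, s[1] * phi_n_c
        # interactive attention fusion map: channel concatenation of the two branches
        return torch.cat([phi_m_s, phi_n_s], dim=1)
```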
Optionally, as shown in FIG. 4, the compensation attention module operates on an input infrared image feature or visible light image feature Φ_m ∈ R^(H×W×C) as follows. First, in a channel attention model, the feature map is converted to channel vectors by global average pooling and maximum pooling; after two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient W_m^C, expressed as W_m^C = δ(Conv(Con(σ(Conv(AP(Φ_m))), σ(Conv(MP(Φ_m)))))), where H and W denote the height and width of the image and C denotes the number of input channels.
Then the channel weighting coefficient is multiplied with the input feature to obtain the corresponding channel attention map Φ_m^C, expressed as Φ_m^C = W_m^C ⊗ Φ_m.
Next, the channel attention map is taken as the input of a spatial attention model; global average pooling and maximum pooling are applied, the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient W_m^S, expressed as W_m^S = δ(Conv(Con(AP(Φ_m^C), MP(Φ_m^C)))).
Finally, the spatial weighting coefficient is multiplied with the input channel attention map to obtain the corresponding spatial attention map Φ_m^S, expressed as Φ_m^S = W_m^S ⊗ Φ_m^C.
The interactive attention module and the compensation attention module are used for establishing a global dependency relationship of local features, realizing feature interaction and compensation of triple paths and enhancing feature extraction and feature reconstruction capabilities.
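To make the sequential channel-then-spatial weighting of the compensation attention module concrete, here is a hedged PyTorch sketch for a single path; as with the previous sketch, the layer shapes (reduction ratio, 7×7 spatial convolution) are assumptions, since the filing gives the formulas only as figures:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompensationAttention(nn.Module):
    """Illustrative compensation attention for one path (infrared or visible)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.channel_fuse = nn.Conv2d(2 * channels, channels, 1)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, phi):
        # channel weighting coefficient from average- and max-pooled channel descriptors
        ap = self.channel_mlp(F.adaptive_avg_pool2d(phi, 1))
        mp = self.channel_mlp(F.adaptive_max_pool2d(phi, 1))
        w_c = torch.sigmoid(self.channel_fuse(torch.cat([ap, mp], dim=1)))
        phi_c = w_c * phi                       # channel attention map
        # spatial weighting coefficient from channel-wise average and maximum maps
        avg_map = phi_c.mean(dim=1, keepdim=True)
        max_map, _ = phi_c.max(dim=1, keepdim=True)
        w_s = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return w_s * phi_c                      # spatial (compensation) attention map
```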
The above describes how the infrared image and the visible light image to be fused are fused. Before the interactive compensation generator can be used for this purpose, it must be trained in advance; the following describes the process of training the interactive compensation generator.
Specifically, the method for training the interactive compensation generator comprises the following steps:
S01, constructing the interactive compensation generator: with the infrared path, the visible light path and the connection path formed by channel-connecting the infrared image and the visible light image as input, establishing a triple-path multi-scale encoding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network, used to generate an initial fused image.
The interactive attention coding network extracts the multi-scale depth features of the triple paths with four 3×3 convolutional layers, where the first and second convolutional layers use stride 1 and extract shallow image features, and the third and fourth convolutional layers use stride 2 and extract multi-scale depth features of the image; the numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input to the interactive attention module to generate an interactive attention fusion map, denoted Φ_F; the final interactive attention map is obtained after three levels of interactive attention;
the fusion layer directly connects the final interactive attention map with the compensation attention map of the fourth convolution layer of the infrared path and the visible light path through a channel to obtain a fusion attention feature map;
the compensation attention decoding network respectively adopts convolution layers with 4 convolution kernels of 3 multiplied by 3 to reconstruct characteristics, wherein the first convolution layer and the second convolution layer are accompanied by an up-sampling operation; the number of input channels of the four convolutional layers is 384, 192, 96 and 32 respectively, the number of output channels is 128, 64, 32 and 1 respectively, and the activation function is PReLU; the fused attention feature map is subjected to up-sampling operation and first-layer convolution, and the obtained output is in channel connection with the infrared path compensation attention map and the visible path compensation attention map of the corresponding scales, so that an initial fused image is finally obtained.
S02, constructing a dual-discriminator model comprising an infrared discriminator and a visible light discriminator; during training, the initial fused image produced by the interactive compensation generator is input, together with the infrared image and the visible light image, to the corresponding discriminators, so as to constrain the fused image to have data distributions similar to those of the infrared image and the visible light image, respectively; when the adversarial game among the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained.
The infrared discriminator drives the fused image to retain as much infrared pixel intensity information as possible, while the visible light discriminator drives it to contain as much visible light detail information as possible. The final fusion result, obtained when the adversarial game is balanced, therefore possesses both the infrared pixel intensity and the visible light texture detail information of the source images.
The infrared discriminator and the visible light discriminator have the same network structure, each consisting of 4 convolutional layers and 1 fully connected layer; all convolutional layers use a 3×3 kernel, a LeakyReLU activation function and stride 2; the numbers of input channels of the corresponding convolutional layers are 1, 16, 32 and 64, respectively, and the numbers of output channels are 16, 32, 64 and 128, respectively.
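A minimal PyTorch sketch of this discriminator structure, assuming 128×128 input patches (matching the training crops) and a LeakyReLU slope of 0.2 (the slope is not stated in the filing):

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Shared structure of the infrared and visible light discriminators:
    four 3x3 stride-2 convolutions with LeakyReLU, then one fully connected layer."""
    def __init__(self, patch_size=128):
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        layers = []
        for i in range(4):
            layers += [nn.Conv2d(chans[i], chans[i + 1], kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]
        self.features = nn.Sequential(*layers)
        # a 128x128 input becomes an 8x8 feature map after four stride-2 convolutions
        self.fc = nn.Linear(chans[-1] * (patch_size // 16) ** 2, 1)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))   # scalar critic score, no sigmoid
```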
S03, training the network model: the infrared image and the visible light image are used as the training data set, and loss functions representing the pixel intensity of the infrared image and the edge gradient of the visible light image are adopted to supervise network model training, obtaining the optimal network model parameters, i.e. the parameters of the optimal interactive compensation generator.
The loss function comprises the interactive compensation generator loss function and the discriminator loss functions. In the interactive compensation generator, the loss function consists of an adversarial loss L_adv and a content loss L_con, expressed as L_G = L_adv + L_con. Considering that the infrared image represents target features through pixel intensity while the visible light image represents scene texture through edges and gradients, the Frobenius norm is used to constrain the similarity of pixel intensity between the infrared image and the fused image, and the L1 norm is used to constrain the similarity of gradient changes between the visible light image and the fused image. The content loss of the interactive compensation generator can therefore be expressed as
L_con = (1/(H·W)) · ( ‖I_f − I_ir‖_F² + ‖∇I_f − ∇I_vis‖_1 ),
where H and W denote the height and width of the image, ‖·‖_F and ‖·‖_1 denote the Frobenius norm and the L1 norm, ∇ denotes the gradient operator, I_f denotes the initial fused image, I_ir denotes the infrared image and I_vis denotes the visible light image. In the dual discriminators, the infrared discriminator and the visible light discriminator aim to balance the authenticity of the fused image against the source images, forcing the generated fused image to approach the real data distributions of both the infrared image and the visible light image. For the infrared discriminator and the visible light discriminator, the adversarial loss is expressed as
L_adv = −(1/N) · Σ_{n=1..N} [ D_ir(I_f^n) + D_vis(I_f^n) ],
where N denotes the number of training images. The respective loss functions of the infrared discriminator and the visible light discriminator are expressed as
L_Dir = (1/N) · Σ_{n=1..N} [ D_ir(I_f^n) − D_ir(I_ir^n) ] + λ·( ‖∇_Î D_ir(Î)‖_2 − 1 )² and
L_Dvis = (1/N) · Σ_{n=1..N} [ D_vis(I_f^n) − D_vis(I_vis^n) ] + λ·( ‖∇_Î D_vis(Î)‖_2 − 1 )²,
where λ is a regularization parameter and ‖·‖_2 denotes the L2 norm; the first term represents the Wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty that limits the learning ability of the infrared and visible light discriminators.
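The following PyTorch sketch shows one way to realise these losses under a WGAN-GP reading of the description (Wasserstein distance term plus gradient penalty); the exact formulas are given as figures in the original filing, and the equal weighting of the intensity and gradient terms in the content loss is an assumption:

```python
import torch
import torch.nn.functional as F

def image_gradient(img):
    # simple finite-difference gradient as a stand-in for the gradient operator
    dx = F.pad(img[..., :, 1:] - img[..., :, :-1], (0, 1))
    dy = F.pad(img[..., 1:, :] - img[..., :-1, :], (0, 0, 0, 1))
    return dx.abs() + dy.abs()

def content_loss(fused, ir, vis):
    # Frobenius-norm constraint on infrared pixel intensity, L1 constraint on visible gradients
    h, w = fused.shape[-2:]
    intensity = torch.norm(fused - ir, p='fro') ** 2
    texture = torch.norm(image_gradient(fused) - image_gradient(vis), p=1)
    return (intensity + texture) / (h * w)

def generator_adversarial_loss(d_ir, d_vis, fused):
    # Wasserstein-style adversarial term driven by both discriminators
    return -(d_ir(fused).mean() + d_vis(fused).mean())

def discriminator_loss(critic, real, fake, lam=10.0):
    # Wasserstein distance term plus a gradient penalty limiting the critic
    w_dist = critic(fake).mean() - critic(real).mean()
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    inter = (alpha * real + (1.0 - alpha) * fake.detach()).requires_grad_(True)
    grad = torch.autograd.grad(critic(inter).sum(), inter, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return w_dist + lam * penalty
```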
The training data set uses 25 pairs of infrared and visible light images from the TNO data set; the original images are segmented into 128×128 patches with a sliding window of stride 12 and their gray values are converted to the range [-1, 1], finally yielding 18813 image pairs as the training set. During training, an Adam optimizer is used to update the network model parameters, and the batch size and the number of epochs are set to 4 and 16, respectively; the learning rates of the interactive compensation generator and the discriminators (the infrared discriminator and the visible light discriminator) are set to 1×10^-4 and 4×10^-4, respectively, with the corresponding iteration numbers set to 1 and 2; in the loss function, the regularization parameter λ is set to 10. The experimental training platform is an Intel i9-10850K CPU, 64 GB of memory and an NVIDIA GeForce GTX3090 GPU; the implementation environment is Python with PyTorch.
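As a small illustration of this training-data preparation (128×128 crops with a stride-12 sliding window, gray values mapped to [-1, 1]), consider the sketch below; the function name and the assumption of 8-bit grayscale input are illustrative:

```python
import numpy as np

def make_patches(img, size=128, stride=12):
    """img: 2-D uint8 grayscale array. Returns an array of normalised patches."""
    patches = []
    for top in range(0, img.shape[0] - size + 1, stride):
        for left in range(0, img.shape[1] - size + 1, stride):
            patch = img[top:top + size, left:left + size].astype(np.float32)
            patches.append(patch / 127.5 - 1.0)   # map [0, 255] to [-1, 1]
    return np.stack(patches)
```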
Further, in order to verify the image fusion effect of the interactive compensation generator obtained by training through the method, the embodiment of the invention also verifies the trained interactive compensation generator.
Specifically, in the testing phase, 22 image pairs from the TNO data set were selected for test validation. Nine representative methods were selected for comparison, including MDLatLRR, DenseFuse, IFCNN, Res2Fusion, SEDRFuse, RFN-Nest, PMGI, FusionGAN and GANMCC. In addition, eight metrics, namely average gradient (AG), information entropy (EN), standard deviation (SD), mutual information (MI), spatial frequency (SF), nonlinear correlation information entropy (NCIE), Qabf and visual information fidelity (VIF), were used as objective evaluation metrics. The verification results include the following two aspects.
(1) Subjective evaluation. FIG. 6 and FIG. 7 show the subjective comparison results for the two image pairs Solider_with_jeep and Street. By comparison, the fusion method of the invention has three advantages. First, the fusion result retains the high-brightness target information of the infrared image: for typical infrared targets, such as the car in FIG. 6 and the pedestrian in FIG. 7, the fusion results of the invention have brighter target features than the other methods. Second, the fusion result preserves the texture details of the visible light image: for representative details such as the house edge in FIG. 6 and the billboard in FIG. 7, the fusion results of the invention are more distinct and more precise than those of the other methods. Finally, the fusion result achieves higher contrast and a better visual effect. Compared with the source images and the other fusion results, the method of the invention better retains prominent target features and rich scene detail information and obtains a more balanced fusion result.
(2) Objective evaluation. Table 1 gives the objective comparison results on the 22 image pairs of the TNO data set. The optimal and suboptimal means are marked in bold and underlined, respectively. It can be seen that the method of the invention achieves the optimal mean on the metrics AG, EN, MI, SF, NCIE and VIF, and the suboptimal mean on SD and Qabf. The objective experiments show that the method has better fusion performance than the other methods. The maximal EN indicates that the abundant useful information of the source images is preserved; this is because the method uses a triple path, with the infrared path and the visible light path providing additional intensity and gradient information for the connection path. The maximal MI and NCIE show that the fusion result has strong correlation and similarity with the source images; this is because the method uses dual discriminators to supervise and optimize the interactive compensation generator, which produces a more balanced fusion result. The maximal AG, SF and VIF indicate that better image contrast and visual effects are obtained. By adopting the interactive attention module and the compensation attention module, the method establishes long-range dependencies of local features, and the obtained attention feature maps focus on infrared target perception and visible texture detail representation.
TABLE 1: Objective comparison results on the 22 image pairs of the TNO data set (the optimal and suboptimal means are marked in bold and underlined, respectively).
All of the above optional technical solutions may be combined arbitrarily; the combined structures are not described in detail here.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and it should be noted that, for those skilled in the art, many modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (7)

1. An infrared and visible light image adversarial fusion method with interactive compensation attention, characterized by comprising:
S1, taking as the input of a pre-trained interactive compensation generator the triple paths consisting of the infrared path and the visible light path corresponding to the infrared image and the visible light image to be fused, and the connection path obtained by channel-connecting the two images, wherein the interactive compensation generator establishes a triple-path multi-scale encoding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network;
S2, extracting the multi-scale depth features of the triple paths with the four 3×3 convolutional layers of the interactive attention coding network, wherein the first and second convolutional layers use stride 1 and extract shallow image features, and the third and fourth convolutional layers use stride 2 and extract multi-scale depth features of the image; the shallow features and multi-scale depth features pass through three levels of interactive attention to obtain the final interactive attention map;
S3, directly channel-connecting, through the fusion layer, the final interactive attention map with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path, to obtain a fused attention feature map;
S4, reconstructing features with the four 3×3 convolutional layers of the compensation attention decoding network, wherein the first and second convolutional layers are each accompanied by an upsampling operation; after the upsampling and convolution of the first convolutional layer, the fused attention feature map is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scale, to obtain the fused image.
2. The method of claim 1, wherein the numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, the numbers of output channels are 16, 32 and 64, respectively, the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, respectively, the numbers of output channels are 16, 32 and 64, respectively, and the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input to the interactive attention module of the interactive attention coding network to generate an interactive attention fusion map, denoted Φ_F.
3. The method as claimed in claim 1, wherein the numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, the numbers of output channels are 128, 64, 32 and 1, and the activation function is PReLU.
4. The infrared and visible light image anti-fusion method for interactively compensating attention of claim 2, wherein in the interactive attention module, for the input features $\Phi_m$ and $\Phi_n$:
first, in the channel attention model, the depth features are mapped to channel vectors by a global average pooling operation and a maximum pooling operation; after each pooled vector passes through two convolutional layers with a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients $\tilde{W}_c^m$ and $\tilde{W}_c^n$, respectively expressed as
$\tilde{W}_c^m = \delta\big(\mathrm{Conv}\big(\mathrm{Con}\big[\mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{AP}(\Phi_m)))),\ \mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{MP}(\Phi_m))))\big]\big)\big)$ and
$\tilde{W}_c^n = \delta\big(\mathrm{Conv}\big(\mathrm{Con}\big[\mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{AP}(\Phi_n)))),\ \mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{MP}(\Phi_n))))\big]\big)\big)$,
wherein Conv denotes a convolution operation, Con denotes a channel connection operation, AP(·) and MP(·) denote the global average pooling and maximum pooling operations, respectively, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, a Softmax operation is adopted to obtain the corresponding final channel weighting coefficients $W_c^m$ and $W_c^n$, respectively expressed as
$W_c^m = e^{\tilde{W}_c^m}/(e^{\tilde{W}_c^m}+e^{\tilde{W}_c^n})$ and $W_c^n = e^{\tilde{W}_c^n}/(e^{\tilde{W}_c^m}+e^{\tilde{W}_c^n})$;
the final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps $\Phi_c^m$ and $\Phi_c^n$, respectively expressed as
$\Phi_c^m = W_c^m \otimes \Phi_m$ and $\Phi_c^n = W_c^n \otimes \Phi_n$;
then, the corresponding channel interactive attention maps are taken as the input of the spatial attention model: a global average pooling operation and a maximum pooling operation are performed, and the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients $\tilde{W}_s^m$ and $\tilde{W}_s^n$, respectively expressed as
$\tilde{W}_s^m = \delta(\mathrm{Conv}(\mathrm{Con}[\mathrm{AP}(\Phi_c^m),\ \mathrm{MP}(\Phi_c^m)]))$ and $\tilde{W}_s^n = \delta(\mathrm{Conv}(\mathrm{Con}[\mathrm{AP}(\Phi_c^n),\ \mathrm{MP}(\Phi_c^n)]))$;
then, the final spatial weighting coefficients $W_s^m$ and $W_s^n$ are obtained by a Softmax operation, respectively expressed as
$W_s^m = e^{\tilde{W}_s^m}/(e^{\tilde{W}_s^m}+e^{\tilde{W}_s^n})$ and $W_s^n = e^{\tilde{W}_s^n}/(e^{\tilde{W}_s^m}+e^{\tilde{W}_s^n})$;
the final spatial weighting coefficients are multiplied with the corresponding channel interactive attention maps to obtain the corresponding spatial interactive attention maps $\Phi_s^m$ and $\Phi_s^n$, respectively expressed as
$\Phi_s^m = W_s^m \otimes \Phi_c^m$ and $\Phi_s^n = W_s^n \otimes \Phi_c^n$;
finally, the two spatial interactive attention maps are channel-connected to obtain the interactive attention fusion map $\Phi_F$, expressed as
$\Phi_F = \mathrm{Con}[\Phi_s^m,\ \Phi_s^n]$.
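The channel-then-spatial interactive attention of claim 4 can be pictured with the sketch below. It is written in PyTorch and reconstructed from the prose only (the filed formulas are image placeholders above), so the separate pooling branches, the reduction ratio, the 7×7 spatial kernel and the exact form of the cross-branch softmax coupling are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttention(nn.Module):
    """Sketch of the interactive attention module of claim 4 (assumed details).

    Channel step: AP/MP pooled vectors -> two 1x1 convs with a PReLU -> concatenated
    -> 1x1 conv + Sigmoid -> initial weights; a softmax across the two branches gives
    the final, mutually normalised ("interactive") weights. Spatial step: the same
    pattern on channel-wise average/max maps.
    """

    def __init__(self, channels, reduction=4):  # reduction ratio is an assumption
        super().__init__()
        def channel_mlp():
            return nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1),
                nn.PReLU(),
                nn.Conv2d(channels // reduction, channels, 1),
            )
        self.mlp_ap = channel_mlp()
        self.mlp_mp = channel_mlp()
        self.channel_fuse = nn.Conv2d(2 * channels, channels, 1)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # kernel size assumed

    def _channel_weight(self, x):
        ap = self.mlp_ap(F.adaptive_avg_pool2d(x, 1))
        mp = self.mlp_mp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(self.channel_fuse(torch.cat([ap, mp], dim=1)))

    def _spatial_weight(self, x):
        ap = x.mean(dim=1, keepdim=True)
        mp = x.max(dim=1, keepdim=True).values
        return torch.sigmoid(self.spatial_conv(torch.cat([ap, mp], dim=1)))

    def forward(self, phi_m, phi_n):
        # Final channel weights: softmax over the two branches (the "interaction").
        wc_m, wc_n = self._channel_weight(phi_m), self._channel_weight(phi_n)
        wc = torch.softmax(torch.stack([wc_m, wc_n]), dim=0)
        phi_c_m, phi_c_n = wc[0] * phi_m, wc[1] * phi_n

        # Final spatial weights, again softmax-coupled across the two branches.
        ws_m, ws_n = self._spatial_weight(phi_c_m), self._spatial_weight(phi_c_n)
        ws = torch.softmax(torch.stack([ws_m, ws_n]), dim=0)
        phi_s_m, phi_s_n = ws[0] * phi_c_m, ws[1] * phi_c_n

        # Interactive attention fusion map: channel concatenation of the two maps.
        return torch.cat([phi_s_m, phi_s_n], dim=1)
```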
5. The infrared and visible light image anti-fusion method for interactively compensating attention of claim 3, wherein in the attention compensation module, for the input infrared image features or visible light image features $\Phi$:
first, in the channel attention model, the feature maps are mapped to a channel vector by a global average pooling operation and a maximum pooling operation; after each pooled vector passes through two convolutional layers with a PReLU activation layer, the output feature vectors are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient $W_c$, expressed as
$W_c = \delta\big(\mathrm{Conv}\big(\mathrm{Con}\big[\mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{AP}(\Phi)))),\ \mathrm{Conv}(\sigma(\mathrm{Conv}(\mathrm{MP}(\Phi))))\big]\big)\big)$,
wherein H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, the channel weighting coefficient is multiplied with the input features to obtain the corresponding channel attention map $\Phi_c$, expressed as
$\Phi_c = W_c \otimes \Phi$;
then, the channel attention map is taken as the input of the spatial attention model: a global average pooling operation and a maximum pooling operation are performed, and the output spatial feature maps are channel-connected and input to a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient $W_s$, expressed as
$W_s = \delta(\mathrm{Conv}(\mathrm{Con}[\mathrm{AP}(\Phi_c),\ \mathrm{MP}(\Phi_c)]))$;
finally, the spatial weighting coefficient is multiplied with the input channel attention map to obtain the corresponding spatial attention map $\Phi_s$, expressed as
$\Phi_s = W_s \otimes \Phi_c$.
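Claim 5's compensation attention module is the same channel-then-spatial pattern applied to a single path, without the cross-branch softmax. Below is a minimal sketch under the same assumptions as the previous block (reduction ratio and spatial kernel size are guesses).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompensationAttention(nn.Module):
    """Single-branch channel + spatial attention as described in claim 5 (assumed details)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        def channel_mlp():
            return nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1),
                nn.PReLU(),
                nn.Conv2d(channels // reduction, channels, 1),
            )
        self.mlp_ap, self.mlp_mp = channel_mlp(), channel_mlp()
        self.channel_fuse = nn.Conv2d(2 * channels, channels, 1)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, phi):
        # Channel weighting: pooled vectors -> conv/PReLU/conv -> concat -> conv -> sigmoid.
        ap = self.mlp_ap(F.adaptive_avg_pool2d(phi, 1))
        mp = self.mlp_mp(F.adaptive_max_pool2d(phi, 1))
        w_c = torch.sigmoid(self.channel_fuse(torch.cat([ap, mp], dim=1)))
        phi_c = w_c * phi                      # channel attention map

        # Spatial weighting computed on the channel attention map.
        sp = torch.cat([phi_c.mean(dim=1, keepdim=True),
                        phi_c.max(dim=1, keepdim=True).values], dim=1)
        w_s = torch.sigmoid(self.spatial_conv(sp))
        return w_s * phi_c                     # compensation (spatial) attention map
```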
6. The infrared and visible light image anti-fusion method for interactively compensating attention of claim 1, further comprising, before the step S1:
S01, constructing an interactive compensation generator: establishing a three-path multi-scale coding-decoding network framework whose inputs are an infrared path, a visible light path, and a connection path formed by channel-connecting the infrared image and the visible light image, the framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network and being used for generating an initial fused image;
the interactive attention coding network extracts the multi-scale depth features of the three paths using four convolutional layers with 3×3 kernels in each path, wherein the first and second convolutional layers have a stride of 1 and extract shallow image features, and the third and fourth convolutional layers have a stride of 2 and extract multi-scale depth features of the image; the numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, the numbers of output channels are 16, 32 and 64, respectively, the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, respectively, the numbers of output channels are 16, 32 and 64, respectively, and the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are each channel-connected with the features of the connection path, recorded as $\Phi_m$ and $\Phi_n$, and then input to the interactive attention module to generate an interactive attention fusion map, recorded as $\Phi_F$; the final interactive attention map is obtained after three levels of interactive attention;
the fusion layer directly channel-connects the final interactive attention map with the compensation attention maps of the fourth convolutional layer of the infrared path and the visible light path to obtain a fused attention feature map;
the compensation attention decoding network reconstructs the features using four convolutional layers with 3×3 kernels, wherein the first and second convolutional layers are each combined with an up-sampling operation; the numbers of input channels of the four convolutional layers are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU; the fused attention feature map is up-sampled and passed through the first convolutional layer, and the resulting output is channel-connected with the infrared path compensation attention map and the visible light path compensation attention map of the corresponding scale, so that the initial fused image is finally obtained;
S02, constructing a dual discriminator comprising an infrared discriminator and a visible light discriminator: in the training process, the initial fused image obtained by the interactive compensation generator, together with the infrared image and the visible light image, is input to the corresponding discriminators so as to constrain the fused image to have a data distribution similar to that of the infrared image and of the visible light image, respectively; when the adversarial game among the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained;
the infrared discriminator and the visible light discriminator have the same network structure, each consisting of four convolutional layers and one fully connected layer; all convolutional layers use a 3×3 kernel, a LeakyReLU activation function and a stride of 2, the numbers of input channels of the corresponding convolutional layers are 1, 16, 32 and 64, respectively, and the numbers of output channels are 16, 32, 64 and 128, respectively;
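A minimal sketch of the shared discriminator topology just described (four stride-2 3×3 convolutions plus one fully connected layer); the LeakyReLU slope and the 128×128 input size used to dimension the fully connected layer are assumptions, the latter borrowed from the training patches of claim 7.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Shared topology of the infrared and visible light discriminators (sketch)."""

    def __init__(self, negative_slope=0.2):  # slope not given in the claims
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(negative_slope)]
        self.features = nn.Sequential(*layers)
        # 128x128 input patches shrink to 8x8 after four stride-2 convolutions.
        self.fc = nn.Linear(128 * 8 * 8, 1)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

# Usage sketch: one instance per modality; the raw score (no sigmoid) suits the
# Wasserstein-style losses described in the next step.
d_ir, d_vis = Discriminator(), Discriminator()
score = d_ir(torch.randn(4, 1, 128, 128))
```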
S03, training the network model: taking infrared and visible light images as the training data set, and supervising the training of the network model with a loss function that represents the pixel intensity of the infrared image and the edge gradient of the visible light image, to obtain the optimal network model parameters;
the loss function comprises an interaction compensation generator loss function and a discriminator loss function; in the interaction compensation generator, the loss function consists of an adversarial loss function $L_{adv}$ and a content loss function $L_{con}$, expressed as $L_G = L_{adv} + L_{con}$; the content loss function of the interaction compensation generator may be expressed as
$L_{con} = \frac{1}{HW}\big(\|I_f - I_{ir}\|_F^2 + \|\nabla I_f - \nabla I_{vis}\|_1\big)$,
wherein H and W respectively denote the height and width of the image, $\|\cdot\|_F$ and $\|\cdot\|_1$ denote the Frobenius norm and the L1 norm, $\nabla$ denotes the gradient operator, $I_f$ denotes the initial fused image, $I_{ir}$ denotes the infrared image, and $I_{vis}$ denotes the visible light image; with respect to the infrared discriminator and the visible light discriminator, the adversarial loss function is expressed as
$L_{adv} = -\frac{1}{N}\sum_{i=1}^{N}\big(D_{ir}(I_f^i) + D_{vis}(I_f^i)\big)$,
wherein N denotes the number of training images and $D_{ir}(\cdot)$ and $D_{vis}(\cdot)$ denote the outputs of the infrared and visible light discriminators; meanwhile, the respective loss functions of the infrared discriminator and the visible light discriminator are expressed as
$L_{D_{ir}} = \frac{1}{N}\sum_{i=1}^{N}\big(D_{ir}(I_f^i) - D_{ir}(I_{ir}^i)\big) + \lambda\big(\|\nabla_{\hat{I}} D_{ir}(\hat{I})\|_2 - 1\big)^2$ and
$L_{D_{vis}} = \frac{1}{N}\sum_{i=1}^{N}\big(D_{vis}(I_f^i) - D_{vis}(I_{vis}^i)\big) + \lambda\big(\|\nabla_{\hat{I}} D_{vis}(\hat{I})\|_2 - 1\big)^2$,
wherein λ is a regularization parameter, $\|\cdot\|_2$ denotes the L2 norm, and $\hat{I}$ is sampled between the fused image and the corresponding source image; the first term represents the Wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty that limits the learning ability of the infrared discriminator and the visible light discriminator.
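The loss terms of step S03 could be implemented along the following lines. Because the filed formulas are image placeholders, the equal weighting of the intensity and gradient terms, the finite-difference gradient, and the WGAN-GP form of the discriminator losses are reconstructions and assumptions rather than the exact filed definitions.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    """Simple finite-difference stand-in for the claim's gradient operator."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return F.pad(dx, (0, 1, 0, 0)).abs() + F.pad(dy, (0, 0, 0, 1)).abs()

def content_loss(i_f, i_ir, i_vis, mu=1.0):
    """Pixel-intensity term toward the infrared image plus gradient term toward the
    visible image; the trade-off weight mu is an assumption (not given in the claims)."""
    h, w = i_f.shape[-2:]
    intensity = torch.norm(i_f - i_ir, p='fro') ** 2
    edge = torch.norm(gradient(i_f) - gradient(i_vis), p=1)
    return (intensity + mu * edge) / (h * w)

def generator_adv_loss(d_ir, d_vis, i_f):
    """Generator adversarial term: raise the discriminator scores of the fused image."""
    return -(d_ir(i_f).mean() + d_vis(i_f).mean())

def discriminator_loss(d, real, fake, lam=10.0):
    """WGAN-GP style critic loss: Wasserstein term plus gradient penalty
    (lambda = 10 as stated in claim 7)."""
    fake = fake.detach()  # the critic update does not backpropagate into the generator
    w_dist = d(fake).mean() - d(real).mean()
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(d(x_hat).sum(), x_hat, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return w_dist + lam * gp
```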
7. The infrared and visible light image anti-fusion method for interactively compensating attention of claim 6, wherein the training data set uses 25 pairs of infrared and visible light images from the TNO dataset; the original images are divided into 128×128 patches using a sliding window with a step size of 12, the gray value range is converted to [-1, 1], and 18813 image pairs are finally obtained as the training set;
in the training process, an Adam optimizer is used to update the network model parameters, with the batch size and the number of epochs set to 4 and 16, respectively; the learning rates of the interactive compensation generator and the dual discriminator are set to 1×10^-4 and 4×10^-4, respectively, and the corresponding numbers of iterations are set to 1 and 2, respectively;
in the loss function, the regularization parameter λ is set to 10.
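Putting the hyper-parameters of claim 7 together, the training setup would look roughly like the sketch below. The generator and discriminator objects are placeholder modules introduced only so the optimizer wiring runs, and all names are illustrative; the negative exponents on the learning rates follow the corrected values above, which assume the published text lost a minus sign during extraction.

```python
import torch
from torch import nn, optim

# Placeholder modules standing in for the interactive compensation generator and the
# two discriminators; only the optimizer and hyper-parameter wiring is the point here.
generator = nn.Conv2d(2, 1, 3, padding=1)
d_ir, d_vis = nn.Conv2d(1, 1, 3), nn.Conv2d(1, 1, 3)

# Learning rates 1e-4 (generator) and 4e-4 (dual discriminator), per claim 7 as edited.
opt_g = optim.Adam(generator.parameters(), lr=1e-4)
opt_d = optim.Adam(list(d_ir.parameters()) + list(d_vis.parameters()), lr=4e-4)

batch_size, epochs = 4, 16
g_steps, d_steps = 1, 2   # iterations per training step: generator once, discriminators twice
lambda_gp = 10.0          # regularization parameter of the gradient penalty
```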
CN202210376347.0A 2022-04-11 2022-04-11 Infrared and visible light image anti-fusion method for interactively compensating attention Withdrawn CN115035003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210376347.0A CN115035003A (en) 2022-04-11 2022-04-11 Infrared and visible light image anti-fusion method for interactively compensating attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210376347.0A CN115035003A (en) 2022-04-11 2022-04-11 Infrared and visible light image anti-fusion method for interactively compensating attention

Publications (1)

Publication Number Publication Date
CN115035003A true CN115035003A (en) 2022-09-09

Family

ID=83119944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210376347.0A Withdrawn CN115035003A (en) 2022-04-11 2022-04-11 Infrared and visible light image anti-fusion method for interactively compensating attention

Country Status (1)

Country Link
CN (1) CN115035003A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200394409A1 (en) * 2019-01-03 2020-12-17 Lucomm Technologies, Inc. System for physical-virtual environment fusion
CN113706406A (en) * 2021-08-11 2021-11-26 武汉大学 Infrared and visible light image fusion method based on feature space multi-classification countermeasure mechanism
CN114187214A (en) * 2021-11-12 2022-03-15 国网辽宁省电力有限公司电力科学研究院 Infrared and visible light image fusion system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHISHE WANG et al.: "Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning", ARXIV, 29 March 2022 (2022-03-29), pages 1 - 13 *
RAN XIN; REN LEI: "Detection method for weak and small targets on water based on visible light video image processing", Journal of Shanghai Maritime University, no. 02, 15 June 2010 (2010-06-15) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311186A (en) * 2022-10-09 2022-11-08 济南和普威视光电技术有限公司 Cross-scale attention confrontation fusion method for infrared and visible light images and terminal
CN115311186B (en) * 2022-10-09 2023-02-03 济南和普威视光电技术有限公司 Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN115423734A (en) * 2022-11-02 2022-12-02 国网浙江省电力有限公司金华供电公司 Infrared and visible light image fusion method based on multi-scale attention mechanism
CN115546489A (en) * 2022-11-23 2022-12-30 南京理工大学 Multi-modal image semantic segmentation method based on cross-modal feature enhancement and interaction
CN116363036A (en) * 2023-05-12 2023-06-30 齐鲁工业大学(山东省科学院) Infrared and visible light image fusion method based on visual enhancement
CN116363036B (en) * 2023-05-12 2023-10-10 齐鲁工业大学(山东省科学院) Infrared and visible light image fusion method based on visual enhancement
CN116664462A (en) * 2023-05-19 2023-08-29 兰州交通大学 Infrared and visible light image fusion method based on MS-DSC and I_CBAM
CN116664462B (en) * 2023-05-19 2024-01-19 兰州交通大学 Infrared and visible light image fusion method based on MS-DSC and I_CBAM
CN118411575A (en) * 2024-07-03 2024-07-30 华东交通大学 Chest pathology image classification method and system based on observation mode and feature fusion
CN118411575B (en) * 2024-07-03 2024-08-23 华东交通大学 Chest pathology image classification method and system based on observation mode and feature fusion
CN118446912A (en) * 2024-07-11 2024-08-06 江西财经大学 Multi-mode image fusion method and system based on multi-scale attention sparse cascade
CN118446912B (en) * 2024-07-11 2024-09-27 江西财经大学 Multi-mode image fusion method and system based on multi-scale attention sparse cascade

Similar Documents

Publication Publication Date Title
CN115035003A (en) Infrared and visible light image anti-fusion method for interactively compensating attention
Ren et al. Single image dehazing via multi-scale convolutional neural networks with holistic edges
Li et al. Underwater scene prior inspired deep underwater image and video enhancement
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
US20200265597A1 (en) Method for estimating high-quality depth maps based on depth prediction and enhancement subnetworks
CN111145131A (en) Infrared and visible light image fusion method based on multi-scale generation type countermeasure network
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN113592018B (en) Infrared light and visible light image fusion method based on residual dense network and gradient loss
CN109255358B (en) 3D image quality evaluation method based on visual saliency and depth map
CN114049335B (en) Remote sensing image change detection method based on space-time attention
CN113283444B (en) Heterogeneous image migration method based on generation countermeasure network
CN109255774A (en) A kind of image interfusion method, device and its equipment
CN113762277B (en) Multiband infrared image fusion method based on Cascade-GAN
CN112991371B (en) Automatic image coloring method and system based on coloring overflow constraint
CN114782298B (en) Infrared and visible light image fusion method with regional attention
Singh et al. Weighted least squares based detail enhanced exposure fusion
CN113781375B (en) Vehicle-mounted vision enhancement method based on multi-exposure fusion
CN113920171B (en) Bimodal target tracking method based on feature level and decision level fusion
CN117292117A (en) Small target detection method based on attention mechanism
Kumar et al. Underwater image enhancement using deep learning
CN117495718A (en) Multi-scale self-adaptive remote sensing image defogging method
CN110689510B (en) Sparse representation-based image fusion method introducing dictionary information
CN107578406A (en) Based on grid with Wei pool statistical property without with reference to stereo image quality evaluation method
CN115457265B (en) Image defogging method and system based on generation of countermeasure network and multi-scale fusion
CN116980549A (en) Video frame processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220909