CN115035003A - Infrared and visible light image adversarial fusion method with interactive compensation attention - Google Patents
Infrared and visible light image adversarial fusion method with interactive compensation attention
- Publication number: CN115035003A
- Application: CN202210376347.0A
- Authority
- CN
- China
- Prior art keywords
- attention
- infrared
- image
- interactive
- visible light
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction (G06T—Image data processing or generation, in general)
- G06N3/08—Learning methods (G06N3/02—Neural networks; G06N—Computing arrangements based on specific computational models)
- G06T2207/10048—Infrared image (G06T2207/10—Image acquisition modality)
- G06T2207/20081—Training; Learning (G06T2207/20—Special algorithmic details)
- G06T2207/20084—Artificial neural networks [ANN] (G06T2207/20—Special algorithmic details)
- G06T2207/20221—Image fusion; Image merging (G06T2207/20212—Image combination)
Abstract
The invention relates to an infrared and visible light image adversarial fusion method with interactive compensation attention. A triple-path multi-scale encoder-decoder network is constructed in an interactive compensation generator. Under the action of the interactive attention module and the compensation attention module of this network, the infrared path and the visible light path provide additional intensity and gradient information for the connection path, so that more prominent infrared targets and richer texture details are retained in the fused image; the feature extraction and feature reconstruction capabilities are enhanced, and the obtained attention feature maps focus on infrared target perception and visible texture detail representation. During training, the interactive compensation generator is optimized by dual discriminators, which constrain the similarity of data distributions between the fusion result and the source images more evenly, so that the interactive compensation generator produces a more balanced fusion result.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an infrared and visible light image adversarial fusion method with interactive compensation attention.
Background
Infrared and visible light image fusion aims to integrate the advantages of the two types of sensors: the fused image generated from their complementary information has better target perception and scene representation, which benefits both human observation and subsequent computational processing. Infrared sensors, which are sensitive to thermal radiation, can capture salient target regions, but the resulting infrared images usually lack structural features and texture detail. In contrast, visible light sensors acquire rich scene information and texture details through reflected light, and visible images have higher spatial resolution; however, they cannot effectively highlight targets, are easily affected by the external environment, and degrade severely under low-illumination conditions. Because the infrared and visible imaging mechanisms differ, the two types of images carry strong complementary information, and only by applying fusion technology can the cooperative detection capability of infrared and visible imaging sensors be effectively improved. Image fusion is therefore widely applied in fields such as remote sensing, medical diagnosis, intelligent driving, and security surveillance.
Currently, infrared and visible light image fusion techniques can be broadly divided into conventional fusion methods and deep learning based fusion methods. Conventional methods usually extract image features with a single feature transformation or feature representation, combine them with a suitable fusion rule, and reconstruct the final fused image by an inverse transform. Since infrared and visible sensors have different imaging mechanisms, infrared images characterize target features by pixel intensity, while visible images characterize scene texture by edges and gradients. Conventional methods ignore these inherent differences between the source images and extract features indiscriminately with the same transformation or representation model, which inevitably leads to low fusion performance and poor visual quality. Moreover, the fusion rules are designed manually and have become increasingly complex and computationally expensive, which limits the practical application of image fusion.
In recent years, convolution operations, with their strong feature extraction capability and their ability to learn model parameters from large amounts of data, have allowed deep learning based fusion methods to achieve satisfactory results. Nevertheless, shortcomings remain. First, these methods rely blindly on convolution to extract image features and do not consider the interaction between the internal features of the two image types, so the local feature extraction capability is insufficient and the fusion tends to reduce target brightness and blur texture details. Second, they depend entirely on convolution to extract local image features, ignore the global dependency of image features, and cannot effectively extract global feature information, which easily causes the fused image to lose global information.
In summary, there is an urgent need for a method that can extract local and global features of both image types simultaneously, effectively enhance the representation capability of deep features, suppress irrelevant information while enhancing useful information, and thereby further improve the fusion performance of infrared and visible light images.
Disclosure of Invention
The invention provides an infrared and visible light image adversarial fusion method with interactive compensation attention, and aims to solve the following technical problem: existing deep learning fusion methods extract only local image features and cannot model the local feature interaction relationship and the global feature compensation relationship between the two image types, which easily leads to unbalanced fusion results, i.e., the fused image cannot simultaneously and effectively retain typical infrared targets and visible texture details. The technical scheme is as follows:
An infrared and visible light image adversarial fusion method with interactive compensation attention, comprising:
S1, taking as input of a pre-trained interactive compensation generator the triple paths consisting of an infrared path and a visible light path corresponding to the infrared image to be fused and the visible light image to be fused, respectively, and a connection path obtained by channel-connecting the infrared image to be fused and the visible light image to be fused, wherein the interactive compensation generator establishes a triple-path multi-scale coding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network;
S2, extracting multi-scale depth features of the triple paths with the 4 convolutional layers of 3×3 kernels adopted by the interactive attention coding network, wherein the first and second convolutional layers of the interactive attention coding network use a stride of 1 and extract shallow image features, the third and fourth convolutional layers use a stride of 2 and extract multi-scale depth features, and the shallow features and multi-scale depth features undergo three levels of interactive attention to obtain the final interactive attention map;
S3, directly channel-connecting, through the fusion layer, the final interactive attention map with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path, to obtain a fused attention feature map;
S4, reconstructing features with the 4 convolutional layers of 3×3 kernels adopted by the compensation attention decoding network, wherein the first and second convolutional layers of the compensation attention decoding network are accompanied by an upsampling operation; the fused attention feature map, after the upsampling operation and the convolution of the first convolutional layer, is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scale to obtain the fused image.
Optionally, the numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are respectively channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input into the interactive attention module of the interactive attention coding network to generate an interactive attention fusion map, denoted Φ_F.
Optionally, the numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU.
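As an illustration of the triple-path encoder of steps S1-S4 and the channel configuration above, the following PyTorch sketch builds one encoder path; the class name, the padding, and the fourth-layer width (the listed channel counts do not cover all four layers) are assumptions made for this sketch, not details disclosed by the patent.

```python
import torch
import torch.nn as nn

class PathEncoder(nn.Module):
    """One encoder path: four 3x3 convolutional layers with strides 1, 1, 2, 2 and PReLU."""
    def __init__(self, in_ch, widths=(16, 32, 64, 128)):   # fourth width assumed
        super().__init__()
        self.blocks = nn.ModuleList()
        c = in_ch
        for w, s in zip(widths, (1, 1, 2, 2)):
            self.blocks.append(nn.Sequential(
                nn.Conv2d(c, w, kernel_size=3, stride=s, padding=1), nn.PReLU()))
            c = w

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)        # per-scale features, later passed to the attention modules
        return feats

# Triple-path input (S1): infrared, visible, and their channel concatenation.
ir, vis = torch.randn(1, 1, 128, 128), torch.randn(1, 1, 128, 128)
enc_ir, enc_vis, enc_cat = PathEncoder(1), PathEncoder(1), PathEncoder(2)
f_ir, f_vis, f_cat = enc_ir(ir), enc_vis(vis), enc_cat(torch.cat([ir, vis], dim=1))
```

The connection-path input widths listed above (2, 16, 64, 128) suggest that the concatenated attention outputs are fed back into that path at each scale; the sketch keeps the three paths independent for brevity.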
Optionally, the interactive attention module, for input features Φ_m and Φ_n ∈ R^(H×W×C), first maps the depth features to channel vectors in a channel attention model using a global average pooling operation and a maximum pooling operation, respectively; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients W_C^m and W_C^n, respectively represented as W_C^m = δ(Conv(Con[Conv(σ(Conv(AP(Φ_m)))), Conv(σ(Conv(MP(Φ_m))))])) and W_C^n = δ(Conv(Con[Conv(σ(Conv(AP(Φ_n)))), Conv(σ(Conv(MP(Φ_n))))])),
wherein Conv denotes a convolution operation, Con denotes a channel connection operation, AP(·) and MP(·) denote the global average pooling operation and the maximum pooling operation, respectively, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, a Softmax operation is adopted to obtain the corresponding final channel weighting coefficients, Softmax(W_C^m) and Softmax(W_C^n); the final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps Φ_C^m and Φ_C^n, respectively expressed as Φ_C^m = Softmax(W_C^m) ⊗ Φ_m and Φ_C^n = Softmax(W_C^n) ⊗ Φ_n, where ⊗ denotes element-wise multiplication;
then, the corresponding channel interactive attention maps are taken as the input of a spatial attention model; after a global average pooling operation and a maximum pooling operation, the output spatial feature maps are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients W_S^m and W_S^n, respectively represented as W_S^m = δ(Conv(Con[AP(Φ_C^m), MP(Φ_C^m)])) and W_S^n = δ(Conv(Con[AP(Φ_C^n), MP(Φ_C^n)]));
then, the final spatial weighting coefficients are obtained with a Softmax operation and multiplied with the corresponding channel attention maps to obtain the corresponding spatial interactive attention maps Φ_S^m and Φ_S^n, respectively represented as Φ_S^m = Softmax(W_S^m) ⊗ Φ_C^m and Φ_S^n = Softmax(W_S^n) ⊗ Φ_C^n;
finally, the two spatial interactive attention maps are channel-connected to obtain the interactive attention fusion map Φ_F, represented as Φ_F = Con[Φ_S^m, Φ_S^n].
Optionally, the compensation attention module, for an input infrared image feature or visible light image feature Φ_m ∈ R^(H×W×C), first maps the feature to a channel vector in a channel attention model using a global average pooling operation and a maximum pooling operation; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient W_C, represented as W_C = δ(Conv(Con[Conv(σ(Conv(AP(Φ_m)))), Conv(σ(Conv(MP(Φ_m))))])),
wherein H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, the channel weighting coefficient is multiplied with the input feature to obtain the corresponding channel attention map Φ_C, represented as Φ_C = W_C ⊗ Φ_m;
then, the channel attention map is taken as the input of a spatial attention model; after a global average pooling operation and a maximum pooling operation, the output spatial feature maps are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient W_S, represented as W_S = δ(Conv(Con[AP(Φ_C), MP(Φ_C)]));
finally, the spatial weighting coefficient is multiplied with the input channel attention map to obtain the corresponding spatial attention map Φ_S, represented as Φ_S = W_S ⊗ Φ_C.
Optionally, before S1, the method further comprises:
S01, constructing the interactive compensation generator: taking as input an infrared path, a visible light path and a connection path obtained by channel-connecting the infrared image and the visible light image, a triple-path multi-scale coding-decoding network framework is established, which comprises an interactive attention coding network, a fusion layer and a compensation attention decoding network and is used for generating an initial fused image;
the interactive attention coding network extracts multi-scale depth features of the triple paths with 4 convolutional layers of 3×3 kernels for each path, wherein the first and second convolutional layers use a stride of 1 and extract shallow image features, and the third and fourth convolutional layers use a stride of 2 and extract multi-scale depth features; the numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are respectively channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input into the interactive attention module to generate an interactive attention fusion map, denoted Φ_F; the final interactive attention map is obtained after three levels of interactive attention;
the fusion layer directly channel-connects the final interactive attention map with the compensation attention maps of the fourth convolutional layers of the infrared path and the visible light path to obtain a fused attention feature map;
the compensation attention decoding network reconstructs features with 4 convolutional layers of 3×3 kernels, wherein the first and second convolutional layers are accompanied by an upsampling operation; the numbers of input channels of the four convolutional layers are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU; after the upsampling operation and the first-layer convolution, the output of the fused attention feature map is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scales, finally yielding the initial fused image;
S02, constructing a dual-discriminator model comprising an infrared discriminator and a visible light discriminator: in the training process, the initial fused image obtained by the interactive compensation generator is input, together with the infrared image and the visible light image, into the corresponding discriminators, so as to constrain the fused image to have a data distribution similar to those of the infrared image and the visible light image, respectively; when the adversarial game between the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained;
the infrared discriminator and the visible light discriminator have the same network structure, each consisting of 4 convolutional layers and 1 fully connected layer; all convolutional layers use 3×3 kernels, the LeakyReLU activation function and a stride of 2, the numbers of input channels of the corresponding convolutional layers are 1, 16, 32 and 64, respectively, and the numbers of output channels are 16, 32, 64 and 128, respectively;
S03, training the network model: taking the infrared images and visible light images as the training data set, the network model training is supervised with loss functions that characterize the pixel intensity of the infrared image and the edge gradients of the visible light image, to obtain the optimal network model parameters;
the loss function comprises an interaction compensation generator loss function and a discriminator loss function; in the interactive compensation generator, the loss function is formed by a competing loss function L adv And a content loss function L con Composition, represented as L G =L adv +L con (ii) a The content loss function of the interaction compensation generator may be expressed asWherein H and W respectively represent the height and width of the image, | · | | purple F And | · | non-conducting phosphor 1 Representing the Frobenius norm, the L1 norm,representing gradient operators, I f Representing the initial fused image, I ir Representing an infrared image, I vis Representing a visible light image; in the infrared discriminator and the visible light discriminator, the resistance loss function is expressed asN represents the number of training images; meanwhile, the respective loss functions of the infrared discriminator and the visible light discriminator are respectively expressed asAndwherein λ is a regularization parameter, | | · | computation circuitry 2 Represents the L2 norm; first item representationThe wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty for limiting the learning ability of the infrared discriminator and the visible light discriminator.
Optionally, the training data set uses 25 pairs of infrared and visible light images from the TNO data set; a sliding window with a step size of 12 segments the original images into 128×128 patches, the gray value range is converted to [-1, 1], and 18813 pairs of image patches are finally obtained as the training set;
in the training process, an Adam optimizer is used to update the network model parameters, and the batch size and number of epochs are set to 4 and 16, respectively; the learning rates of the interactive compensation generator and the discriminators are set to 1×10⁻⁴ and 4×10⁻⁴, respectively, and the corresponding numbers of iterations per training step are set to 1 and 2, respectively; the regularization parameter λ is set to 10.
By means of the scheme, the invention has the following characteristics:
1. In the interactive compensation generator, a triple-path multi-scale encoder-decoder network is constructed. Under the action of the interactive attention module and the compensation attention module of this network, the infrared path and the visible light path provide additional intensity and gradient information for the connection path, so that more prominent infrared targets and richer texture details can be retained in the fused image.
2. The invention develops an interactive attention module and a compensation attention module to transfer path features and to model global features along the channel and spatial dimensions, which enhances the feature extraction and feature reconstruction capabilities; the obtained attention feature maps focus on infrared target perception and visible texture detail representation.
3. When training the interactive compensation generator, the invention designs dual discriminators comprising an infrared discriminator and a visible light discriminator and optimizes the interactive compensation generator through them; using both discriminators constrains the similarity of data distributions between the fusion result and the source images more evenly, so that the interactive compensation generator produces a more balanced fusion result and acquires more similar pixel distributions and finer texture detail information from the source images.
4. The invention provides an end-to-end generative adversarial fusion method for infrared and visible light images (i.e., the pre-trained network model is identical to the test network model and no additional fusion rule needs to be added at test time); the fusion effect is significantly improved, the method can also be applied to the fusion of multi-modal, multi-focus and medical images, and it has high application value in the field of image fusion.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of a process of fusing an infrared image to be fused and a visible light image to be fused through an interactive attention coding network, a fusion layer and a compensation attention decoding network.
FIG. 3 is a data processing diagram of the interactive attention module.
FIG. 4 is a data processing diagram of the compensation attention module.
FIG. 5 is a schematic diagram of a training process of the interactive compensation generator.
FIG. 6 is a schematic comparison diagram of the first set of Soldier_with_jeep fusion results.
FIG. 7 is a comparison of the second set of Street fusion results.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
As shown in FIG. 1, the present invention provides an infrared and visible light image adversarial fusion method with interactive compensation attention, which comprises:
S1, the triple paths consisting of an infrared path and a visible light path corresponding to the infrared image to be fused and the visible light image to be fused, respectively, and a connection path obtained by channel-connecting the two images are taken as the input of a pre-trained interactive compensation generator, wherein the interactive compensation generator establishes a triple-path multi-scale coding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network.
S2, multi-scale depth features of the triple paths are extracted with the 4 convolutional layers of 3×3 kernels adopted by the interactive attention coding network, wherein the first and second convolutional layers of the interactive attention coding network use a stride of 1 and extract shallow image features, the third and fourth convolutional layers use a stride of 2 and extract multi-scale depth features, and the shallow features and multi-scale depth features undergo three levels of interactive attention to obtain the final interactive attention map.
The numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are respectively channel-connected with the features of the connection path (corresponding to C in FIGS. 2 to 5), denoted Φ_m and Φ_n, and then input into the interactive attention module of the interactive attention coding network (Inter_Att in FIG. 2) to generate an interactive attention fusion map, denoted Φ_F.
S3, the final interactive attention map is directly channel-connected, through the fusion layer, with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path to obtain a fused attention feature map.
S4, features are reconstructed with the 4 convolutional layers of 3×3 kernels adopted by the compensation attention decoding network, wherein the first and second convolutional layers of the compensation attention decoding network are accompanied by an upsampling operation (Upsampling in FIG. 2); the fused attention feature map, after the upsampling operation and the convolution of the first convolutional layer, is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scale to obtain the fused image.
The numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU. In the compensation attention decoding network, the features of different scales obtained by the infrared path and the visible light path in the interactive attention coding network through the compensation attention module (Comp_Att in FIG. 2) are channel-connected with the corresponding-scale features of the connection path, and the reconstruction of the feature maps is completed together with the upsampling operations to obtain the initial fused image. The infrared path and the visible light path provide additional intensity and gradient information for the connection path, improving the feature decoding capability.
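The decoding pattern described above can be sketched as follows; the interpolation mode, the padding and the channel widths of the compensation attention maps (chosen here only so that the shapes match the stated decoder input widths 384, 192, 96, 32) are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    """One stage of the compensation attention decoding network: optional 2x upsampling
    followed by a 3x3 convolution with PReLU."""
    def __init__(self, in_ch, out_ch, upsample=True):
        super().__init__()
        self.upsample = upsample
        self.conv = nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.PReLU())

    def forward(self, x):
        if self.upsample:
            x = F.interpolate(x, scale_factor=2, mode='nearest')
        return self.conv(x)

# Stated widths: input channels 384, 192, 96, 32 and output channels 128, 64, 32, 1;
# only the first two stages upsample.
stages = nn.ModuleList([DecoderStage(384, 128), DecoderStage(192, 64),
                        DecoderStage(96, 32, upsample=False), DecoderStage(32, 1, upsample=False)])

fused = torch.randn(1, 384, 32, 32)              # fused attention feature map (deepest scale)
comp_ir = torch.randn(1, 32, 64, 64)             # compensation attention maps of the matching
comp_vis = torch.randn(1, 32, 64, 64)            # scale; their widths are assumed here
x = stages[0](fused)                             # upsampling + first-layer convolution -> (1, 128, 64, 64)
x = torch.cat([x, comp_ir, comp_vis], dim=1)     # channel connection -> (1, 192, 64, 64)
x = stages[1](x)                                 # -> (1, 64, 128, 128), and so on through stages 2 and 3
```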
FIG. 2 is a schematic diagram of the process of fusing the infrared image to be fused and the visible light image to be fused through the interactive attention coding network, the fusion layer and the compensation attention decoding network. In FIG. 2, Conv denotes a convolution operation, k3 a 3×3 convolution kernel, s1 a stride of 1 and n16 an output channel number of 16; the remaining parameters in FIG. 2 follow the same convention.
Optionally, as shown in FIG. 3, the interactive attention module, for input features Φ_m and Φ_n ∈ R^(H×W×C), first maps the depth features to channel vectors in a channel attention model using a global average pooling operation and a maximum pooling operation, respectively; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients W_C^m and W_C^n, respectively represented as W_C^m = δ(Conv(Con[Conv(σ(Conv(AP(Φ_m)))), Conv(σ(Conv(MP(Φ_m))))])) and W_C^n = δ(Conv(Con[Conv(σ(Conv(AP(Φ_n)))), Conv(σ(Conv(MP(Φ_n))))])), wherein Conv denotes a convolution operation, Con denotes a channel connection operation, AP(·) and MP(·) denote the global average pooling operation and the maximum pooling operation, respectively, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, a Softmax operation is adopted to obtain the corresponding final channel weighting coefficients, Softmax(W_C^m) and Softmax(W_C^n); the final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps Φ_C^m and Φ_C^n, respectively expressed as Φ_C^m = Softmax(W_C^m) ⊗ Φ_m and Φ_C^n = Softmax(W_C^n) ⊗ Φ_n, where ⊗ denotes element-wise multiplication;
then, the corresponding channel interactive attention maps are taken as the input of a spatial attention model; after a global average pooling operation and a maximum pooling operation, the output spatial feature maps are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients W_S^m and W_S^n, respectively represented as W_S^m = δ(Conv(Con[AP(Φ_C^m), MP(Φ_C^m)])) and W_S^n = δ(Conv(Con[AP(Φ_C^n), MP(Φ_C^n)]));
then, the final spatial weighting coefficients are obtained with a Softmax operation and multiplied with the corresponding channel attention maps to obtain the corresponding spatial interactive attention maps Φ_S^m and Φ_S^n, respectively represented as Φ_S^m = Softmax(W_S^m) ⊗ Φ_C^m and Φ_S^n = Softmax(W_S^n) ⊗ Φ_C^n;
finally, the two spatial interactive attention maps are channel-connected to obtain the interactive attention fusion map Φ_F, represented as Φ_F = Con[Φ_S^m, Φ_S^n].
Optionally, as shown in FIG. 4, the compensation attention module, for an input infrared image feature or visible light image feature Φ_m ∈ R^(H×W×C), first maps the feature to a channel vector in a channel attention model using a global average pooling operation and a maximum pooling operation; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient W_C, represented as W_C = δ(Conv(Con[Conv(σ(Conv(AP(Φ_m)))), Conv(σ(Conv(MP(Φ_m))))])), where H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, the channel weighting coefficient is multiplied with the input feature to obtain the corresponding channel attention map Φ_C, represented as Φ_C = W_C ⊗ Φ_m;
then, the channel attention map is taken as the input of a spatial attention model; after a global average pooling operation and a maximum pooling operation, the output spatial feature maps are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient W_S, represented as W_S = δ(Conv(Con[AP(Φ_C), MP(Φ_C)]));
finally, the spatial weighting coefficient is multiplied with the input channel attention map to obtain the corresponding spatial attention map Φ_S, represented as Φ_S = W_S ⊗ Φ_C.
The interactive attention module and the compensation attention module are used for establishing a global dependency relationship of local features, realizing feature interaction and compensation of triple paths and enhancing feature extraction and feature reconstruction capabilities.
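For illustration only, the following PyTorch sketch gives one possible implementation of the two modules described above (FIG. 3 and FIG. 4). The reduction ratio, the 1×1 convolutions in the channel branch, the 7×7 spatial kernel and the joint Softmax normalisation across the two paths are assumptions of this sketch, not details disclosed herein.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttention(nn.Module):
    """Channel + spatial attention over two inputs (Phi_m, Phi_n), returning Phi_F."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        def channel_mlp():                      # "two convolutional layers and a PReLU"
            return nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1), nn.PReLU(),
                nn.Conv2d(channels // reduction, channels, 1))
        self.mlp_m, self.mlp_n = channel_mlp(), channel_mlp()
        self.ch_fuse_m = nn.Conv2d(2 * channels, channels, 1)
        self.ch_fuse_n = nn.Conv2d(2 * channels, channels, 1)
        self.sp_conv_m = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.sp_conv_n = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    @staticmethod
    def _pool_cat(x):                           # spatial AP/MP, channel-connected
        return torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)

    def _channel_weight(self, x, mlp, fuse):    # initial W_C: AP/MP -> MLP -> concat -> Conv -> Sigmoid
        avg, mx = mlp(F.adaptive_avg_pool2d(x, 1)), mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(fuse(torch.cat([avg, mx], dim=1)))

    def forward(self, phi_m, phi_n):
        # channel weights, normalised across the two paths with Softmax (assumed scope)
        wc = torch.softmax(torch.stack([
            self._channel_weight(phi_m, self.mlp_m, self.ch_fuse_m),
            self._channel_weight(phi_n, self.mlp_n, self.ch_fuse_n)]), dim=0)
        phi_cm, phi_cn = wc[0] * phi_m, wc[1] * phi_n            # channel interactive attention maps
        # spatial weights, normalised the same way
        ws = torch.softmax(torch.stack([
            torch.sigmoid(self.sp_conv_m(self._pool_cat(phi_cm))),
            torch.sigmoid(self.sp_conv_n(self._pool_cat(phi_cn)))]), dim=0)
        return torch.cat([ws[0] * phi_cm, ws[1] * phi_cn], dim=1)  # Phi_F

class CompensationAttention(nn.Module):
    """Single-input channel + spatial attention, returning the compensation attention map Phi_S."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, 1))
        self.ch_fuse = nn.Conv2d(2 * channels, channels, 1)
        self.sp_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg, mx = self.mlp(F.adaptive_avg_pool2d(x, 1)), self.mlp(F.adaptive_max_pool2d(x, 1))
        w_c = torch.sigmoid(self.ch_fuse(torch.cat([avg, mx], dim=1)))   # W_C
        phi_c = w_c * x                                                  # Phi_C
        s = torch.cat([phi_c.mean(1, keepdim=True), phi_c.max(1, keepdim=True).values], dim=1)
        return torch.sigmoid(self.sp_conv(s)) * phi_c                    # Phi_S = W_S * Phi_C

# shape check: two 32-channel feature maps -> a 64-channel interactive attention fusion map
ia, ca = InteractiveAttention(32), CompensationAttention(32)
f = ia(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))           # (1, 64, 64, 64)
g = ca(torch.randn(1, 32, 64, 64))                                       # (1, 32, 64, 64)
```

Applying the Softmax jointly across the two paths is one plausible reading of the interactive normalisation; applying it to each coefficient separately would also be consistent with the formulas above.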
The above describes how the infrared image to be fused and the visible light image to be fused are fused. Before the interactive compensation generator can perform this fusion, it must be trained in advance; the following describes the training process.
Specifically, the method for training the interactive compensation generator comprises the following steps:
S01, constructing the interactive compensation generator: taking as input an infrared path, a visible light path and a connection path obtained by channel-connecting the infrared image and the visible light image, a triple-path multi-scale coding-decoding network framework is established, which comprises an interactive attention coding network, a fusion layer and a compensation attention decoding network and is used for generating an initial fused image.
The interactive attention coding network extracts multi-scale depth features of the triple paths with 4 convolutional layers of 3×3 kernels for each path, wherein the first and second convolutional layers use a stride of 1 and extract shallow image features, and the third and fourth convolutional layers use a stride of 2 and extract multi-scale depth features. The numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU. Starting from the second convolutional layer, the features of the infrared path and the visible light path are respectively channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input into the interactive attention module to generate an interactive attention fusion map, denoted Φ_F; the final interactive attention map is obtained after three levels of interactive attention.
The fusion layer directly channel-connects the final interactive attention map with the compensation attention maps of the fourth convolutional layers of the infrared path and the visible light path to obtain a fused attention feature map.
The compensation attention decoding network reconstructs features with 4 convolutional layers of 3×3 kernels, wherein the first and second convolutional layers are accompanied by an upsampling operation; the numbers of input channels of the four convolutional layers are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU. After the upsampling operation and the first-layer convolution, the output of the fused attention feature map is channel-connected with the infrared-path and visible-path compensation attention maps of the corresponding scales, finally yielding the initial fused image.
S02, constructing a dual-discriminator model comprising an infrared discriminator and a visible light discriminator: in the training process, the initial fused image obtained by the interactive compensation generator is input, together with the infrared image and the visible light image, into the corresponding discriminators, so as to constrain the fused image to have a data distribution similar to those of the infrared image and the visible light image, respectively; when the adversarial game between the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained.
The infrared discriminator drives the fused image to retain as much infrared pixel intensity information as possible, while the visible light discriminator drives the fused image to contain as much visible light detail information as possible; the final fusion result, obtained when the adversarial game reaches equilibrium, therefore possesses both the infrared pixel intensity and the visible light texture detail information of the source images.
The infrared discriminator and the visible light discriminator have the same network structure, each consisting of 4 convolutional layers and 1 fully connected layer; all convolutional layers use 3×3 kernels, the LeakyReLU activation function and a stride of 2, the numbers of input channels of the corresponding convolutional layers are 1, 16, 32 and 64, respectively, and the numbers of output channels are 16, 32, 64 and 128, respectively.
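As an illustration of this discriminator structure, the following PyTorch sketch builds one such network; the 128×128 input patch size, the padding, the LeakyReLU negative slope and the scalar (non-sigmoid) output of the fully connected layer are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Four 3x3 stride-2 convolutions with LeakyReLU, then one fully connected layer."""
    def __init__(self, in_size=128):
        super().__init__()
        chans = [1, 16, 32, 64, 128]             # stated input/output channel progression
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]        # negative slope assumed
        self.features = nn.Sequential(*layers)
        feat = in_size // 16                     # four stride-2 convolutions halve H and W each time
        self.fc = nn.Linear(128 * feat * feat, 1)  # scalar critic score (no sigmoid, WGAN-style)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

# Identical structure, separate weights for the two discriminators.
d_ir, d_vis = Discriminator(), Discriminator()
scores = d_ir(torch.randn(4, 1, 128, 128))       # -> (4, 1)
```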
s03, training a network model: the infrared image and the visible light image are used as training data sets, loss functions representing the pixel intensity of the infrared image and the edge gradient of the visible light image are adopted to supervise network model training, and optimal network model parameters, namely parameters of an optimal interaction compensation generator, are obtained.
The loss functions comprise the interactive compensation generator loss function and the discriminator loss functions. In the interactive compensation generator, the loss function consists of an adversarial loss L_adv and a content loss L_con, expressed as L_G = L_adv + L_con. Considering that the infrared image represents target features by pixel intensity while the visible image represents scene texture by edges and gradients, the Frobenius norm is adopted to constrain the similarity between the pixel intensities of the infrared image and the fused image, and the L1 norm is adopted to constrain the similarity between the gradient variations of the visible light image and the fused image. The content loss of the interactive compensation generator may therefore be expressed as L_con = (1/(HW)) · ( ||I_f − I_ir||_F² + ||∇I_f − ∇I_vis||_1 ), wherein H and W denote the height and width of the image, respectively, ||·||_F and ||·||_1 denote the Frobenius norm and the L1 norm, ∇ denotes the gradient operator, I_f denotes the initial fused image, I_ir denotes the infrared image and I_vis denotes the visible light image. In the dual discriminators, the infrared discriminator and the visible light discriminator aim to balance the authenticity of the fused image against the source images, forcing the generated fused image to approach the real data distributions of the infrared image and the visible light image simultaneously. The adversarial loss, defined with respect to the two discriminators, is expressed as L_adv = −(1/N) · Σ_n [ D_ir(I_f^n) + D_vis(I_f^n) ], where the sum runs over the N training images. Meanwhile, the respective loss functions of the infrared discriminator and the visible light discriminator are expressed as L_D_ir = E[D_ir(I_f)] − E[D_ir(I_ir)] + λ · E[(||∇_Î D_ir(Î)||_2 − 1)²] and L_D_vis = E[D_vis(I_f)] − E[D_vis(I_vis)] + λ · E[(||∇_Î D_vis(Î)||_2 − 1)²], wherein λ is a regularization parameter, ||·||_2 denotes the L2 norm and Î denotes the samples at which the gradient penalty is evaluated; the first term represents the Wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty that limits the learning ability of the infrared discriminator and the visible light discriminator.
The training data set uses 25 pairs of infrared and visible light images from the TNO data set; a sliding window with a step size of 12 segments the original images into 128×128 patches and the gray value range is converted to [-1, 1], finally yielding 18813 pairs of image patches as the training set. In the training process, an Adam optimizer is used to update the network model parameters, and the batch size and number of epochs are set to 4 and 16, respectively. The learning rates of the interactive compensation generator and the discriminators (the infrared discriminator and the visible light discriminator) are set to 1×10⁻⁴ and 4×10⁻⁴, respectively, and the corresponding numbers of iterations per training step are set to 1 and 2, respectively. In the loss function, the regularization parameter λ is set to 10. The experimental training platform is an Intel i9-10850K CPU, 64 GB of memory and an NVIDIA GeForce GTX 3090 GPU, and the compilation environment is the Python and PyTorch platforms.
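For illustration, the following sketch outlines the data preparation and one adversarial training step under the schedule above (two discriminator updates per generator update); it reuses the generator_loss and discriminator_loss helpers from the loss sketch earlier, and all names are assumptions rather than the patent's reference code.

```python
import torch

def to_patches(img, patch=128, stride=12):
    """Slide a window over a (1, H, W) grayscale tensor already scaled to [-1, 1]."""
    c, h, w = img.shape
    cols = img.unfold(1, patch, stride).unfold(2, patch, stride)
    return cols.contiguous().view(c, -1, patch, patch).transpose(0, 1)

def build_optimizers(gen, d_ir, d_vis):
    """Adam optimizers with the stated learning rates (1e-4 generator, 4e-4 discriminators)."""
    return (torch.optim.Adam(gen.parameters(), lr=1e-4),
            torch.optim.Adam(d_ir.parameters(), lr=4e-4),
            torch.optim.Adam(d_vis.parameters(), lr=4e-4))

def train_step(gen, d_ir, d_vis, opt_g, opt_d_ir, opt_d_vis, ir, vis, lam=10.0):
    for _ in range(2):                                   # two discriminator iterations
        fused = gen(ir, vis).detach()
        for d, opt, src in ((d_ir, opt_d_ir, ir), (d_vis, opt_d_vis, vis)):
            opt.zero_grad()
            discriminator_loss(d, src, fused, lam).backward()
            opt.step()
    opt_g.zero_grad()                                    # one generator iteration
    fused = gen(ir, vis)
    loss_g = generator_loss(d_ir, d_vis, fused, ir, vis)
    loss_g.backward()
    opt_g.step()
    return loss_g.item()
```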
Further, in order to verify the image fusion effect of the interactive compensation generator obtained by training through the method, the embodiment of the invention also verifies the trained interactive compensation generator.
Specifically, in the testing phase, 22 pairs of images from the TNO data set were selected for verification. Nine representative methods were selected for comparison, including MDLatLRR, DenseFuse, IFCNN, Res2Fusion, SEDRFuse, RFN-Nest, PMGI, FusionGAN and GANMCC. In addition, 8 indices, namely average gradient (AG), information entropy (EN), standard deviation (SD), mutual information (MI), spatial frequency (SF), nonlinear correlation information entropy (NCIE), Qabf and visual information fidelity (VIF), were used as objective evaluation indices. The verification results cover the following two aspects.
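For reference, the following sketch computes a few of the listed indices (EN, SD, AG, SF) for a grayscale image; the exact formula conventions (e.g., base-2 entropy, the 1/√2 scaling in AG) follow common usage and are assumptions, since the patent does not define them.

```python
import numpy as np

def entropy(img):                                     # EN: information entropy (bits)
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def standard_deviation(img):                          # SD
    return float(np.std(img.astype(np.float64)))

def average_gradient(img):                            # AG: mean local gradient magnitude
    gx, gy = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

def spatial_frequency(img):                           # SF: row/column frequency combination
    x = img.astype(np.float64)
    rf = np.sqrt(np.mean((x[:, 1:] - x[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((x[1:, :] - x[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))
```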
(1) Subjective evaluation. FIGS. 6 and 7 show the subjective comparison results for two pairs of images, Soldier_with_jeep and Street. By comparison, the fusion method of the present invention shows three advantages. First, the fusion result retains the high-brightness target information of the infrared image: for typical infrared targets, such as the car in FIG. 6 and the pedestrian in FIG. 7, the fusion results of the invention have brighter target features than the other methods. Second, the fusion result preserves the texture details of the visible light image: for representative details such as the house edge in FIG. 6 and the billboard in FIG. 7, the fusion result of the invention is more distinct and more precise than those of the other methods. Finally, the fusion result achieves higher contrast and a better visual effect. Compared with the source images and the other fusion results, the method of the invention better retains the salient target features and rich scene detail information and obtains a more balanced fusion result.
(2) Objective evaluation. Table 1 gives the objective comparison results on the 22 pairs of images of the TNO data set, with the optimal and suboptimal mean values marked in bold and underlined, respectively. The method of the invention achieves the optimal mean values of the indices AG, EN, MI, SF, NCIE and VIF, and suboptimal mean values of SD and Qabf. The objective experiments show that the method has better fusion performance than the other methods. The largest EN indicates that the abundant useful information of the source images can be maintained, because the method uses triple paths in which the infrared path and the visible light path provide additional intensity and gradient information for the connection path. The largest MI and NCIE show that the fusion result has strong correlation and similarity with the source images, because the method uses dual discriminators to supervise and optimize the interactive compensation generator, which produces a more balanced fusion result. The largest AG, SF and VIF indicate that better image contrast and visual effect are obtained: the method adopts the interactive attention module and the compensation attention module to establish the long-range dependency of local features, and the obtained attention feature maps focus on infrared target perception and visible texture detail representation.
TABLE 1
The optional technical solutions described above can be combined arbitrarily; the structures resulting from such combinations are not described in detail here.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and it should be noted that, for those skilled in the art, many modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (7)
1. An infrared and visible light image adversarial fusion method with interactive compensation attention, characterized by comprising:
S1, taking as input of a pre-trained interactive compensation generator the triple paths consisting of an infrared path and a visible light path corresponding to the infrared image to be fused and the visible light image to be fused, respectively, and a connection path obtained by channel-connecting the infrared image to be fused and the visible light image to be fused, wherein the interactive compensation generator establishes a triple-path multi-scale coding-decoding network framework comprising an interactive attention coding network, a fusion layer and a compensation attention decoding network;
S2, extracting multi-scale depth features of the triple paths with the 4 convolutional layers of 3×3 kernels adopted by the interactive attention coding network, wherein the first and second convolutional layers of the interactive attention coding network use a stride of 1 and extract shallow image features, the third and fourth convolutional layers use a stride of 2 and extract multi-scale depth features, and the shallow features and multi-scale depth features undergo three levels of interactive attention to obtain the final interactive attention map;
S3, directly channel-connecting, through the fusion layer, the final interactive attention map with the compensation attention maps obtained from the fourth convolutional layers of the infrared path and the visible light path to obtain a fused attention feature map;
S4, reconstructing features with the 4 convolutional layers of 3×3 kernels adopted by the compensation attention decoding network, wherein the first and second convolutional layers of the compensation attention decoding network are accompanied by an upsampling operation; and channel-connecting the fused attention feature map, after the upsampling operation and the convolution of the first convolutional layer, with the infrared-path and visible-path compensation attention maps of the corresponding scale to obtain the fused image.
2. The method according to claim 1, wherein the numbers of input channels of the four convolutional layers of the infrared path and the visible light path of the interactive attention coding network are 1, 16 and 32, respectively, the numbers of output channels are 16, 32 and 64, respectively, the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, the numbers of output channels are 16, 32 and 64, respectively, and the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are respectively channel-connected with the features of the connection path, denoted Φ_m and Φ_n, and then input into the interactive attention module of the interactive attention coding network to generate an interactive attention fusion map, denoted Φ_F.
3. The method according to claim 1, wherein the numbers of input channels of the four convolutional layers of the compensation attention decoding network are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU.
4. The infrared and visible light image adversarial fusion method with interactive compensation attention according to claim 2, wherein
the interactive attention module, for input features Φ_m and Φ_n ∈ R^(H×W×C), first maps the depth features to channel vectors in a channel attention model using a global average pooling operation and a maximum pooling operation, respectively; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the initial channel weighting coefficients W_C^m and W_C^n, respectively represented as W_C^m = δ(Conv(Con[Conv(σ(Conv(AP(Φ_m)))), Conv(σ(Conv(MP(Φ_m))))])) and W_C^n = δ(Conv(Con[Conv(σ(Conv(AP(Φ_n)))), Conv(σ(Conv(MP(Φ_n))))])),
wherein Conv denotes a convolution operation, Con denotes a channel connection operation, AP(·) and MP(·) denote the global average pooling operation and the maximum pooling operation, respectively, σ and δ denote the PReLU and Sigmoid activation functions, H and W denote the height and width of the image, respectively, and C denotes the number of input channels;
then, a Softmax operation is adopted to obtain the corresponding final channel weighting coefficients, Softmax(W_C^m) and Softmax(W_C^n); the final channel weighting coefficients are multiplied with the respective input features to obtain the corresponding channel interactive attention maps Φ_C^m and Φ_C^n, respectively expressed as Φ_C^m = Softmax(W_C^m) ⊗ Φ_m and Φ_C^n = Softmax(W_C^n) ⊗ Φ_n, where ⊗ denotes element-wise multiplication;
then, the corresponding channel interactive attention maps are taken as the input of a spatial attention model; after a global average pooling operation and a maximum pooling operation, the output spatial feature maps are channel-connected and input into a convolutional layer and a Sigmoid activation layer to obtain the respective initial spatial weighting coefficients W_S^m and W_S^n, respectively represented as W_S^m = δ(Conv(Con[AP(Φ_C^m), MP(Φ_C^m)])) and W_S^n = δ(Conv(Con[AP(Φ_C^n), MP(Φ_C^n)])); then, the final spatial weighting coefficients are obtained with a Softmax operation and multiplied with the corresponding channel attention maps to obtain the corresponding spatial interactive attention maps Φ_S^m and Φ_S^n, respectively represented as Φ_S^m = Softmax(W_S^m) ⊗ Φ_C^m and Φ_S^n = Softmax(W_S^n) ⊗ Φ_C^n.
5. The attention-compensating interactive infrared and visible image anti-fusion method of claim 3,
the attention compensation module, for an input infrared image feature or visible light image feature Φ, first uses global average pooling and maximum pooling operations in the channel attention model to map the feature to channel vectors; after passing through two convolutional layers and a PReLU activation layer, the output feature vectors are channel-concatenated and input to a convolutional layer and a Sigmoid activation layer to obtain the channel weighting coefficient W_C, expressed as W_C = δ(Conv(Con[Conv(σ(Conv(AP(Φ)))), Conv(σ(Conv(MP(Φ))))])) ∈ R^(1×1×C), wherein H and W denote the height and width of the image and C denotes the number of input channels;

then, the channel weighting coefficient is multiplied with the input feature to obtain the corresponding channel attention map Φ_C, expressed as Φ_C = W_C ⊗ Φ;

then, the channel attention map is taken as the input of the spatial attention model; after global average pooling and maximum pooling operations, the output spatial feature maps are channel-concatenated and input to a convolutional layer and a Sigmoid activation layer to obtain the spatial weighting coefficient W_S, expressed as W_S = δ(Conv(Con[AP(Φ_C), MP(Φ_C)])) ∈ R^(H×W×1), which is multiplied with the channel attention map to obtain the corresponding compensation attention map Φ_S = W_S ⊗ Φ_C.
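A corresponding single-path sketch of the attention compensation module of claim 5 is shown below; as before, the bottleneck ratio, shared channel MLP and 7×7 spatial kernel are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompensatoryAttention(nn.Module):
    """Single-path channel + spatial attention producing a compensation attention map."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        self.channel_fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, phi: torch.Tensor) -> torch.Tensor:
        # Channel weighting coefficient from pooled channel descriptors.
        avg = self.channel_mlp(F.adaptive_avg_pool2d(phi, 1))
        mx = self.channel_mlp(F.adaptive_max_pool2d(phi, 1))
        w_c = torch.sigmoid(self.channel_fuse(torch.cat([avg, mx], dim=1)))
        phi_c = w_c * phi                              # channel attention map
        # Spatial weighting coefficient from channel-wise avg/max maps.
        s = torch.cat([phi_c.mean(dim=1, keepdim=True),
                       phi_c.max(dim=1, keepdim=True).values], dim=1)
        w_s = torch.sigmoid(self.spatial_conv(s))
        return w_s * phi_c                             # compensation attention map
```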
6. The infrared and visible light image adversarial fusion method with interactive compensatory attention as claimed in claim 1, further comprising, before said step S1:
S01, constructing an interactive compensation generator: taking an infrared path, a visible light path, and a connection path that channel-concatenates the infrared image and the visible light image as inputs, a three-path multi-scale encoding-decoding network framework is established, which comprises an interactive attention coding network, a fusion layer and a compensation attention decoding network and is used to generate an initial fused image;
the interactive attention coding network uses four convolutional layers with 3×3 kernels on each of the three paths to extract multi-scale depth features, wherein the first and second convolutional layers have a stride of 1 and extract shallow image features, and the third and fourth convolutional layers have a stride of 2 and extract multi-scale depth features of the image; the numbers of input channels of the four convolutional layers of the infrared path and the visible light path are 1, 16 and 32, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the numbers of input channels of the four convolutional layers of the connection path are 2, 16, 64 and 128, respectively, and the numbers of output channels are 16, 32 and 64, respectively; the activation function is PReLU; starting from the second convolutional layer, the features of the infrared path and the visible light path are channel-concatenated with the features of the connection path, denoted Φ_m and Φ_n, and then input to the interactive attention module to generate an interactive attention fusion map, denoted Φ_F; the final interactive attention map is obtained after three levels of interactive attention;
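A minimal sketch of a single encoder path under the layer description above follows; the connection path is omitted because its inputs are concatenations that cannot be expressed as a plain sequential stack, and the fourth layer's channel widths are not stated in the claim, so the 64 and 128 values used here are assumptions.

```python
import torch.nn as nn


def make_encoder_path(in_channels, out_channels):
    """One path of the interactive attention coding network: four 3x3 convolutions,
    stride 1 for the first two (shallow features) and stride 2 for the last two
    (multi-scale depth features), each followed by PReLU."""
    strides = (1, 1, 2, 2)
    layers = []
    for c_in, c_out, stride in zip(in_channels, out_channels, strides):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
                   nn.PReLU()]
    return nn.Sequential(*layers)


# The claim lists 1, 16, 32 input and 16, 32, 64 output channels for the infrared
# and visible light paths; the fourth layer (64 -> 128 below) is an assumption.
ir_path = make_encoder_path((1, 16, 32, 64), (16, 32, 64, 128))
vis_path = make_encoder_path((1, 16, 32, 64), (16, 32, 64, 128))
```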
the fusion layer directly channel-concatenates the final interactive attention map with the compensation attention maps of the fourth convolutional layers of the infrared path and the visible light path to obtain a fused attention feature map;
the compensation attention decoding network uses four convolutional layers with 3×3 kernels to reconstruct the features, wherein the first and second convolutional layers are accompanied by an up-sampling operation; the numbers of input channels of the four convolutional layers are 384, 192, 96 and 32, respectively, the numbers of output channels are 128, 64, 32 and 1, respectively, and the activation function is PReLU; the fused attention feature map undergoes up-sampling and the first convolutional layer, and the resulting output is channel-concatenated with the infrared path and visible light path compensation attention maps of the corresponding scales, finally yielding the initial fused image;
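The decoder channel widths are fully listed, so they can be sketched directly; the upsampling mode, its placement before the convolution, and the 64- and 32-channel widths of the concatenated compensation attention maps are assumptions chosen so the listed widths chain correctly.

```python
import torch
import torch.nn as nn


class CompensationDecoder(nn.Module):
    """Four 3x3 convolutions with PReLU; the first two blocks include 2x upsampling.
    Channel widths follow the claim: 384->128, 192->64, 96->32, 32->1."""

    def __init__(self):
        super().__init__()
        cfg = [(384, 128, True), (192, 64, True), (96, 32, False), (32, 1, False)]
        self.blocks = nn.ModuleList()
        for c_in, c_out, upsample in cfg:
            layers = []
            if upsample:
                layers.append(nn.Upsample(scale_factor=2, mode='bilinear',
                                          align_corners=False))
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.PReLU()]
            self.blocks.append(nn.Sequential(*layers))

    def forward(self, fused, comp_maps):
        # comp_maps: infrared + visible compensation attention maps of the two
        # intermediate scales, already channel-concatenated (assumed 64 and 32
        # channels so that 128 + 64 = 192 and 64 + 32 = 96).
        x = self.blocks[0](fused)                                  # 384 -> 128
        x = self.blocks[1](torch.cat([x, comp_maps[0]], dim=1))    # 192 -> 64
        x = self.blocks[2](torch.cat([x, comp_maps[1]], dim=1))    # 96 -> 32
        return self.blocks[3](x)                                   # 32 -> 1 initial fused image
```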
S02, constructing dual discriminators comprising an infrared discriminator and a visible light discriminator: during training, the initial fused image produced by the interactive compensation generator, together with the infrared image and the visible light image, is input to the corresponding discriminators so as to constrain the fused image to have data distributions similar to those of the infrared image and the visible light image, respectively; when the adversarial game between the interactive compensation generator, the infrared discriminator and the visible light discriminator reaches equilibrium, the final fusion result is obtained;
the infrared discriminator and the visible light discriminator have the same network structure and consist of four convolutional layers and one fully connected layer; all convolutional layers use 3×3 kernels, the LeakyReLU activation function and a stride of 2; the numbers of input channels of the convolutional layers are 1, 16, 32 and 64, respectively, and the numbers of output channels are 16, 32, 64 and 128, respectively;
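A minimal PyTorch sketch of such a discriminator is given below; the 128×128 input size matches the training patches of claim 7, while the padding, the LeakyReLU slope and the scalar output of the fully connected layer are assumptions.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Four 3x3, stride-2 convolutions with LeakyReLU (1->16->32->64->128 channels)
    followed by a single fully connected layer producing a scalar score."""

    def __init__(self, patch_size: int = 128):
        super().__init__()
        channels = [1, 16, 32, 64, 128]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]
        self.features = nn.Sequential(*layers)
        side = patch_size // 16                 # four stride-2 layers halve H and W
        self.fc = nn.Linear(128 * side * side, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))
```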
S03, training the network model: using the infrared images and the visible light images as the training data set, and supervising the network model training with a loss function that characterizes the pixel intensity of the infrared image and the edge gradient of the visible light image, to obtain the optimal network model parameters;
the loss function comprises an interactive compensation generator loss function and discriminator loss functions; for the interactive compensation generator, the loss function is composed of an adversarial loss L_adv and a content loss L_con, expressed as L_G = L_adv + L_con; the content loss of the interactive compensation generator combines a Frobenius-norm term between the initial fused image and the infrared image and an L1-norm term between the gradients of the initial fused image and the visible light image, wherein H and W denote the height and width of the image, ‖·‖_F and ‖·‖_1 denote the Frobenius norm and the L1 norm, ∇ denotes the gradient operator, I_f denotes the initial fused image, I_ir denotes the infrared image, and I_vis denotes the visible light image; the adversarial loss against the infrared and visible light discriminators is averaged over the N training images; the loss functions of the infrared discriminator and the visible light discriminator each consist of two terms, wherein λ is a regularization parameter and ‖·‖_2 denotes the L2 norm; the first term represents the Wasserstein distance between the fusion result and the infrared or visible light image, and the second term is a gradient penalty that limits the learning ability of the infrared discriminator and the visible light discriminator.
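The equation images for these losses are lost in this text rendering; the block below is a hedged reconstruction consistent with the verbal description (a Frobenius-norm intensity term, an L1 gradient term, averaging over the N training images, and Wasserstein critics with a gradient penalty). The trade-off weight ξ, the interpolated samples Î and the exact signs are assumptions, not the patent's own formulas.

```latex
% Hedged reconstruction (WGAN-GP style); not the patent's exact formulas.
\begin{aligned}
L_{con}    &= \tfrac{1}{HW}\Big(\big\|I_f - I_{ir}\big\|_F
              + \xi\,\big\|\nabla I_f - \nabla I_{vis}\big\|_1\Big),\\
L_{adv}    &= \tfrac{1}{N}\sum_{n=1}^{N}\Big(-D_{ir}\big(I_f^{(n)}\big)
              - D_{vis}\big(I_f^{(n)}\big)\Big),\\
L_{D_{ir}} &= \mathbb{E}\big[D_{ir}(I_f)\big] - \mathbb{E}\big[D_{ir}(I_{ir})\big]
              + \lambda\,\mathbb{E}\Big[\big(\big\|\nabla_{\hat I} D_{ir}(\hat I)\big\|_2 - 1\big)^{2}\Big],
\end{aligned}
```

with L_{D_vis} defined analogously by replacing I_ir and D_ir with I_vis and D_vis, and Î denoting samples interpolated between the fused image and the corresponding source image.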
7. The infrared and visible light image adversarial fusion method with interactive compensatory attention according to claim 6, wherein the training data set uses 25 pairs of infrared and visible light images from the TNO dataset; the original images are divided into 128×128 patches using a sliding window with a stride of 12, the gray value range is converted to [-1, 1], and 18813 image pairs are finally obtained as the training set;
during training, an Adam optimizer is used to update the network model parameters, and the batch size and number of epochs are set to 4 and 16, respectively; the learning rates of the interactive compensation generator and the dual discriminators are set to 1×10^-4 and 4×10^-4, respectively, and the corresponding numbers of iterations per training step are set to 1 and 2, respectively;
in the loss function, the regularization parameter λ is set to 10.
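A minimal sketch of the training-data preparation and optimizer settings of claim 7 follows; the 8-bit gray-value normalization and the names sliding_window_crops, generator, d_ir and d_vis are illustrative assumptions.

```python
import numpy as np
import torch


def sliding_window_crops(img: np.ndarray, patch: int = 128, stride: int = 12) -> np.ndarray:
    """Cut one grayscale image (H, W) with 8-bit values into overlapping
    128x128 patches rescaled to [-1, 1] using a stride-12 sliding window."""
    img = img.astype(np.float32) / 127.5 - 1.0
    h, w = img.shape
    patches = [img[y:y + patch, x:x + patch]
               for y in range(0, h - patch + 1, stride)
               for x in range(0, w - patch + 1, stride)]
    return np.stack(patches)


# Optimizer settings from claim 7 (generator lr 1e-4, discriminators lr 4e-4,
# batch size 4, 16 epochs, 1 generator update and 2 discriminator updates per
# step, gradient-penalty weight lambda = 10). `generator`, `d_ir` and `d_vis`
# are assumed to be the modules sketched earlier.
# opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
# opt_d = torch.optim.Adam(list(d_ir.parameters()) + list(d_vis.parameters()), lr=4e-4)
```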
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210376347.0A CN115035003A (en) | 2022-04-11 | 2022-04-11 | Infrared and visible light image anti-fusion method for interactively compensating attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115035003A true CN115035003A (en) | 2022-09-09 |
Family
ID=83119944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210376347.0A Withdrawn CN115035003A (en) | 2022-04-11 | 2022-04-11 | Infrared and visible light image anti-fusion method for interactively compensating attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035003A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200394409A1 (en) * | 2019-01-03 | 2020-12-17 | Lucomm Technologies, Inc. | System for physical-virtual environment fusion |
CN113706406A (en) * | 2021-08-11 | 2021-11-26 | 武汉大学 | Infrared and visible light image fusion method based on feature space multi-classification countermeasure mechanism |
CN114187214A (en) * | 2021-11-12 | 2022-03-15 | 国网辽宁省电力有限公司电力科学研究院 | Infrared and visible light image fusion system and method |
Non-Patent Citations (2)
Title |
---|
ZHISHE WANG et al.: "Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning", ARXIV, 29 March 2022 (2022-03-29), pages 1-13 *
RAN Xin; REN Lei: "Detection method for weak and small targets on water based on visible light video image processing", Journal of Shanghai Maritime University, No. 02, 15 June 2010 (2010-06-15) *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311186A (en) * | 2022-10-09 | 2022-11-08 | 济南和普威视光电技术有限公司 | Cross-scale attention confrontation fusion method for infrared and visible light images and terminal |
CN115311186B (en) * | 2022-10-09 | 2023-02-03 | 济南和普威视光电技术有限公司 | Cross-scale attention confrontation fusion method and terminal for infrared and visible light images |
CN115423734A (en) * | 2022-11-02 | 2022-12-02 | 国网浙江省电力有限公司金华供电公司 | Infrared and visible light image fusion method based on multi-scale attention mechanism |
CN115546489A (en) * | 2022-11-23 | 2022-12-30 | 南京理工大学 | Multi-modal image semantic segmentation method based on cross-modal feature enhancement and interaction |
CN116363036A (en) * | 2023-05-12 | 2023-06-30 | 齐鲁工业大学(山东省科学院) | Infrared and visible light image fusion method based on visual enhancement |
CN116363036B (en) * | 2023-05-12 | 2023-10-10 | 齐鲁工业大学(山东省科学院) | Infrared and visible light image fusion method based on visual enhancement |
CN116664462A (en) * | 2023-05-19 | 2023-08-29 | 兰州交通大学 | Infrared and visible light image fusion method based on MS-DSC and I_CBAM |
CN116664462B (en) * | 2023-05-19 | 2024-01-19 | 兰州交通大学 | Infrared and visible light image fusion method based on MS-DSC and I_CBAM |
CN118411575A (en) * | 2024-07-03 | 2024-07-30 | 华东交通大学 | Chest pathology image classification method and system based on observation mode and feature fusion |
CN118411575B (en) * | 2024-07-03 | 2024-08-23 | 华东交通大学 | Chest pathology image classification method and system based on observation mode and feature fusion |
CN118446912A (en) * | 2024-07-11 | 2024-08-06 | 江西财经大学 | Multi-mode image fusion method and system based on multi-scale attention sparse cascade |
CN118446912B (en) * | 2024-07-11 | 2024-09-27 | 江西财经大学 | Multi-mode image fusion method and system based on multi-scale attention sparse cascade |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115035003A (en) | Infrared and visible light image anti-fusion method for interactively compensating attention | |
Ren et al. | Single image dehazing via multi-scale convolutional neural networks with holistic edges | |
Li et al. | Underwater scene prior inspired deep underwater image and video enhancement | |
CN111709902B (en) | Infrared and visible light image fusion method based on self-attention mechanism | |
US20200265597A1 (en) | Method for estimating high-quality depth maps based on depth prediction and enhancement subnetworks | |
CN111145131A (en) | Infrared and visible light image fusion method based on multi-scale generation type countermeasure network | |
CN115311186B (en) | Cross-scale attention confrontation fusion method and terminal for infrared and visible light images | |
CN113592018B (en) | Infrared light and visible light image fusion method based on residual dense network and gradient loss | |
CN109255358B (en) | 3D image quality evaluation method based on visual saliency and depth map | |
CN114049335B (en) | Remote sensing image change detection method based on space-time attention | |
CN113283444B (en) | Heterogeneous image migration method based on generation countermeasure network | |
CN109255774A (en) | A kind of image interfusion method, device and its equipment | |
CN113762277B (en) | Multiband infrared image fusion method based on Cascade-GAN | |
CN112991371B (en) | Automatic image coloring method and system based on coloring overflow constraint | |
CN114782298B (en) | Infrared and visible light image fusion method with regional attention | |
Singh et al. | Weighted least squares based detail enhanced exposure fusion | |
CN113781375B (en) | Vehicle-mounted vision enhancement method based on multi-exposure fusion | |
CN113920171B (en) | Bimodal target tracking method based on feature level and decision level fusion | |
CN117292117A (en) | Small target detection method based on attention mechanism | |
Kumar et al. | Underwater image enhancement using deep learning | |
CN117495718A (en) | Multi-scale self-adaptive remote sensing image defogging method | |
CN110689510B (en) | Sparse representation-based image fusion method introducing dictionary information | |
CN107578406A (en) | Based on grid with Wei pool statistical property without with reference to stereo image quality evaluation method | |
CN115457265B (en) | Image defogging method and system based on generation of countermeasure network and multi-scale fusion | |
CN116980549A (en) | Video frame processing method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20220909 |