CN114187221A - Infrared and visible light image fusion method based on adaptive weight learning - Google Patents

Infrared and visible light image fusion method based on adaptive weight learning

Info

Publication number: CN114187221A
Application number: CN202111513619.9A
Authority: CN (China)
Legal status: Pending
Inventors: 董毅, 翟佳, 彭实, 陈�峰, 郭单, 谢晓丹
Original and current assignee: Beijing Institute of Environmental Features
Classifications

    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06F 18/251: Pattern recognition; fusion techniques of input or preprocessed data
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06T 2207/10048: Image acquisition modality; infrared image


Abstract

The invention relates to an infrared and visible light image fusion method based on adaptive weight learning. A depth feature adaptive extraction module is constructed from a pixel-level attention mechanism and adaptive fusion-weight generation, and a generator network is built from these modules in a cross-hierarchy cascade; a dual-channel discriminator network is built on the twin-network idea. The infrared image and the visible light image are first concatenated and input to the generator to produce a fused image, and fusion is completed through the adversarial game between the generator and the discriminator. By integrating the pixel-level attention mechanism and the hierarchical cascade idea, the method strengthens the network's deep feature extraction capability while reducing the number of network parameters, improves the generation quality of the fused image, and achieves fusion enhancement of infrared and visible light images.

Description

Infrared and visible light image fusion method based on adaptive weight learning
Technical Field
The invention relates to the technical field of computer vision, in particular to an infrared and visible light image fusion method based on adaptive weight learning.
Background
An infrared image is formed by imaging the thermal radiation of objects and is unaffected by illumination conditions, but it captures scene detail poorly and blurs target edges. A visible light image is formed by imaging reflected light; it captures the detailed texture of targets and restores the scene well. Because the two modalities carry highly complementary imaging information, fusing the infrared and visible light images compensates for the limited information-gathering capability of a single sensor, improves environmental adaptability, and yields high-quality scene imagery.
At present, infrared and visible light image fusion enhancement is mainly applied to face recognition, target detection, target tracking, defect detection, and the like. However, infrared and visible light images involve complex background environments and complex combinations of targets and surroundings, which poses great challenges to image fusion; existing artificial intelligence techniques often fail to deliver a satisfactory fusion result. For example, existing fusion algorithms based on adversarial neural networks use deep features inefficiently and restore the texture information of the fused image insufficiently.
To address these shortcomings, the invention provides an infrared and visible light image fusion method based on adaptive weight learning, which strengthens the network's deep feature extraction capability while reducing the number of network parameters and improves the generation quality of the fused image.
Disclosure of Invention
The technical problems the invention aims to solve are the low utilization efficiency of deep features in existing adversarial-network-based image fusion algorithms and their insufficient restoration of texture detail in the fused image. The proposed infrared and visible light image fusion method based on adaptive weight learning integrates a pixel-level attention mechanism and a hierarchical cascade idea, strengthens deep feature extraction while reducing the number of network parameters, and improves the generation quality of the fused image.
To solve these problems, the invention provides an infrared and visible light image fusion method based on adaptive weight learning, comprising the following steps: constructing a generator and a discriminator based on a generative adversarial mechanism; feeding the source data, after a connection (concatenation) operation, into the generator network, where a fused image is generated through deep feature extraction and hierarchical feature fusion; passing the fused image into a dual-channel discriminator for discrimination; and outputting the fused image result through adversarial learning between the generator and the discriminator.
Preferably, the step of constructing the generator and the discriminator based on the generative adversarial mechanism includes the following process: constructing the generator by building a depth feature adaptive extraction module from a pixel-level attention mechanism and a fusion-weight adaptive learning mechanism, and assembling a plurality of such modules into the generator network in a cross-layer cascade.
Preferably, the depth feature adaptive extraction module comprises an attention subnetwork, a non-attention subnetwork and a fusion weight adaptive generation subnetwork.
Preferably, the attention subnetwork and the non-attention subnetwork extract image features and compress their channels; the fusion-weight adaptive generation subnetwork adaptively generates weight factors and performs channel recovery and adaptive feature reconstruction on the features output by the attention and non-attention subnetworks.
Preferably, the step of feeding the source data, after the connection operation, into the generator network and generating the fused image through deep feature extraction and hierarchical feature fusion specifically comprises: performing the connection operation on the source data to obtain an input image; after gradient extraction, feeding the input image as a whole into the generator; and obtaining the fused image through the generator network.
Preferably, the source data includes an infrared image and a visible light image.
Preferably, the step of constructing the generator and the discriminator based on the generative adversarial mechanism includes the following process: constructing the discriminator as a two-path discriminator using a twin network.
Preferably, the two paths of the discriminator are a visible light discriminator and an infrared discriminator: the visible light discriminator distinguishes the fused image from the visible light image, and the infrared discriminator distinguishes the fused image from the infrared image.
Preferably, the loss function of the generator is L_G, the adversarial loss function is L_adv, the gradient loss function is L_grad, and the structural similarity loss function is L_ssim; then:

L_G = L_adv + L_grad + L_ssim,

L_adv = (1/N) · Σ_{n=1}^{N} (D(I_f^n) − d)²,

L_grad = (1/(H·W)) · (β1·‖∇I_f − ∇I_ir‖² + β2·‖∇I_f − ∇I_vis‖²),

L_ssim = 1 − SSIM,

where N is the number of fused images, D(I_f^n) is the classification result of the n-th fused image, d is the spurious data value that the generator wishes the discriminator to trust, β1 and β2 are constant parameters, I_f, I_ir and I_vis denote the fused image, the infrared image and the visible light image respectively, H and W are the input image height and width, ∇I_f, ∇I_ir and ∇I_vis denote the gradient values of the fused, infrared and visible light images respectively, and SSIM is the value computed with an image multi-scale structural similarity method.
Preferably, the loss function of the infrared discriminator is L_D_ir and the loss function of the visible light discriminator is L_D_vis; then:

L_D_ir = (1/N) · Σ_{n=1}^{N} [ (D_IR(I_ir) − a)² + (D_IR(I_f) − b)² ],

L_D_vis = (1/N) · Σ_{n=1}^{N} [ (D_vis(I_vis) − c)² + (D_vis(I_f) − b)² ],

where N is the number of fused images; a, b and c are the truth values of the infrared image, the fused image and the visible light image respectively; D_IR(I_ir) and D_IR(I_f) are the infrared discriminator's results on the infrared image and the fused image; and D_vis(I_vis) and D_vis(I_f) are the visible light discriminator's results on the visible light image and the fused image.
Implementing the infrared and visible light image fusion method based on adaptive weight learning has the following beneficial effects. A depth feature adaptive extraction module is constructed from a pixel-level attention mechanism and adaptive fusion-weight generation, and a generator network is built from these modules in a cross-hierarchy cascade; a dual-channel discriminator network is built on the twin-network idea. The infrared image and the visible light image are first concatenated and input to the generator to produce a fused image, and fusion is completed through the adversarial game between the generator and the discriminator. By integrating the pixel-level attention mechanism and the hierarchical cascade idea, the method strengthens deep feature extraction while reducing the number of network parameters, improves the generation quality of the fused image, achieves fusion enhancement of infrared and visible light images, and has great application value in face recognition, target detection, target tracking, defect detection, and the like.
Drawings
FIG. 1 is a flowchart of an infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention;
FIG. 2 is a network framework diagram of an infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention;
FIG. 3 is a structural framework diagram of a generator of an infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a depth feature adaptive extraction module of an infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention;
FIG. 5 is a structural framework diagram of a discriminator of an infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention;
In the figures: 1: source data; 11: visible light image; 12: infrared image; 2: generator; 21: depth feature adaptive extraction module; 211: attention subnetwork; 212: non-attention subnetwork; 213: fusion-weight adaptive generation subnetwork; 22: cascade operation module; 23: DCB module; 3: fused image; 41: visible light discriminator; 42: infrared discriminator.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
FIG. 1 is a flowchart of the infrared and visible light image fusion method based on adaptive weight learning according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps. Step S01: construct a generator and a discriminator based on a generative adversarial mechanism. Step S02: feed the source data, after a connection operation, into the generator network, generating a fused image through deep feature extraction and hierarchical feature fusion. Step S03: pass the fused image into a dual-channel discriminator for discrimination. Step S04: output the fused image result through adversarial learning between the generator and the discriminator.
FIG. 2 is a network framework diagram of the infrared and visible light image fusion method according to an embodiment of the present invention. As shown in FIG. 2, the network structure is as follows. The generator 2 and the discriminator are constructed based on a generative adversarial mechanism. For the generator 2, a depth feature adaptive extraction module 21 (DFAFE module) is built from a pixel-level attention mechanism and a fusion-weight adaptive learning mechanism, and the generator network is assembled from a cascade operation module 22 and several depth feature adaptive extraction modules 21 in a cross-layer cascade. For the discriminator, a twin network is used to build a two-way discriminator. The source data 1 (an infrared image 12 and a visible light image 11) are first connected and then enter the generator network, which produces a fused image 3 through deep feature extraction and hierarchical feature fusion; the fused image then enters the dual-channel discriminator for discrimination; finally, high-quality fusion of the infrared and visible light images is achieved through adversarial learning between the generator 2 and the discriminator.
In the method provided by the embodiment of the present invention, the depth feature adaptive extraction module 21 comprises three subnetworks: an attention subnetwork 211, a non-attention subnetwork 212, and a fusion-weight adaptive generation subnetwork 213. The attention subnetwork 211 and the non-attention subnetwork 212 extract image features and compress the channels; the fusion-weight adaptive generation subnetwork 213 adaptively generates weight factors and performs channel recovery and adaptive feature reconstruction on the features extracted by subnetworks 211 and 212, improving the generalization capability of the network representation while adding only a few parameters. The method connects several depth feature adaptive extraction modules 21 across hierarchy levels in a hierarchical cascade to form a multi-level feature extraction network, realizing the extraction and fusion of hierarchical features.
According to the method provided by the embodiment of the invention, under the guidance of an adversarial neural network design framework, a depth feature adaptive extraction module 21 based on a pixel-level attention mechanism and adaptive fusion-weight generation is designed to extract image features efficiently; several of these modules are then cascaded across layers to further improve deep feature extraction and hierarchical feature fusion, thereby constructing the generator network. A discriminator set is then built on a two-way neural network. Finally, the infrared image 12 and the visible light image 11 are connected and input into the network, and fusion is completed through the adversarial game between the generator 2 and the discriminator. Experimental results show that the fused image 3 generated by the method retains the target information of the infrared image 12 while preserving more of the detail information of the visible light image 11, and performs better in both subjective and objective evaluation.
FIG. 3 is a structural framework diagram of the generator 2 of the method according to an embodiment of the present invention. The generator network is built in a multi-level cascade with the depth feature adaptive extraction module 21 as the basic unit, so that hierarchical features can be fused effectively and the overall feature characterization capability of the network improves. The data flow of the generator network is as follows: the infrared image 12 and the visible light image 11 are first connected to obtain an input image; after gradient extraction, the connected input is fed as a whole into the generator 2; the fused image 3 is then obtained through the feature extraction network built from cross-layer cascaded depth feature adaptive extraction modules 21. The DCB module 23 (dense connection block) in the network comprises two convolutional layers, each activated by the ReLU function.
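The connect-then-extract-gradients step at the generator's input can be sketched as follows. This is a minimal numpy illustration under one plausible reading of the data flow: the two source images are stacked as channels and their gradient magnitudes are appended as extra channels. The gradient operator is an assumption, since the text does not name one; `np.gradient` is used as a simple stand-in.

```python
import numpy as np

def prepare_generator_input(ir, vis):
    """Concatenate an infrared and a visible image along the channel
    axis and append their gradient magnitudes as extra channels."""
    stacked = np.stack([ir, vis], axis=0)            # (2, H, W)
    # Per-channel spatial gradient magnitude (np.gradient is a stand-in
    # for whatever gradient operator the method actually uses).
    grads = [np.hypot(*np.gradient(ch)) for ch in stacked]
    grads = np.stack(grads, axis=0)                  # (2, H, W)
    # The generator receives image and gradient channels as one tensor.
    return np.concatenate([stacked, grads], axis=0)  # (4, H, W)

x = prepare_generator_input(np.random.rand(64, 64), np.random.rand(64, 64))
print(x.shape)  # (4, 64, 64)
```

The helper name and channel layout are illustrative, not taken from the patent.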
FIG. 4 is a schematic structural diagram of the depth feature adaptive extraction module 21 of the method according to an embodiment of the present invention. The module is mainly used for extracting feature information efficiently and is composed of the attention subnetwork 211, the non-attention subnetwork 212, and the fusion-weight adaptive generation subnetwork 213. The attention subnetwork 211 consists of two 3 × 3 convolutions and one 1 × 1 convolution, with a pixel-level attention mechanism attached to the first 3 × 3 convolution; the mechanism assigns corresponding weights to different channels and thereby improves the network's ability to extract important information. The non-attention subnetwork 212, in order to preserve the original information as much as possible, uses one 3 × 3 convolution for feature mapping and one 1 × 1 convolution for channel reconstruction.
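The pixel-level re-weighting itself can be sketched in a few lines. This is a hedged numpy illustration, not the patent's implementation: a 1 × 1 convolution (a channel-mixing matrix product) produces a weight per channel per pixel, squashed by a sigmoid and multiplied back onto the features. The 1 × 1 parameterisation of the attention map is an assumption; the text only states that the mechanism assigns weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_attention(feat, w):
    """Re-scale a (C, H, W) feature map with per-pixel, per-channel
    attention weights in (0, 1). `w` plays the role of a learned
    1x1-convolution weight of shape (C, C)."""
    c, h, wd = feat.shape
    flat = feat.reshape(c, -1)     # (C, H*W)
    attn = sigmoid(w @ flat)       # (C, H*W): one weight per channel per pixel
    return (flat * attn).reshape(c, h, wd)

feat = np.random.rand(8, 16, 16)
out = pixel_attention(feat, np.random.randn(8, 8) * 0.1)
assert out.shape == feat.shape
```

Because the sigmoid keeps every weight strictly between 0 and 1, the mechanism can only attenuate features, never amplify them, which matches its role of emphasising important positions relative to the rest.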
To filter redundant information in the feature network and reduce the number of parameters, a fusion-weight adaptive generation subnetwork 213 is provided in the depth feature adaptive extraction module. Through a self-learning process, this subnetwork dynamically assigns weights to the attention subnetwork 211 and the non-attention subnetwork 212, thereby weakening ineffective attention features and letting the two subnetworks reach an adaptive balance.
In addition, after the input feature x_{n−1} passes through the attention subnetwork 211 and the non-attention subnetwork 212 respectively, each branch outputs a feature whose channel count is half the input channel count, denoted x'_n and x''_n. These two features enter the fusion-weight adaptive generation subnetwork 213 as inputs, where a 1 × 1 convolution lifts the channels of x'_n and x''_n so that their channel counts again match that of x_{n−1}; x'_n and x''_n are then multiplied by the weights γ1 and γ2 respectively and added element-wise, and the result is output after a final 1 × 1 convolution. Through this feature channel transformation and adaptive reconstruction, the generalization capability of the network representation improves while adding only a few parameters, the adaptive extraction of deep features is strengthened, complex backgrounds and target details in infrared and visible light images are captured more easily, and efficient, high-quality fusion of the two modalities is achieved.
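The channel-lift, weight, add, and 1 × 1 output steps above can be sketched numerically. This is a minimal shape-level illustration in numpy: a 1 × 1 convolution is just a matrix product over the channel axis, and the scalars `g1`/`g2` stand in for the adaptively generated factors γ1 and γ2 (all weights here are random or identity stand-ins, not learned values).

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a channel-mixing matrix product."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def adaptive_fuse(x_att, x_plain, w_lift1, w_lift2, g1, g2, w_out):
    """Lift both half-channel branch outputs back to the full channel
    count, scale by g1/g2, add element-wise, then apply a final 1x1
    convolution, as in the fusion-weight adaptive generation step."""
    a = conv1x1(x_att, w_lift1) * g1
    b = conv1x1(x_plain, w_lift2) * g2
    return conv1x1(a + b, w_out)

C, H, W = 8, 16, 16
x_att = np.random.rand(C // 2, H, W)    # attention branch, channels halved
x_plain = np.random.rand(C // 2, H, W)  # non-attention branch
lift = np.random.randn(C, C // 2) * 0.1
out = adaptive_fuse(x_att, x_plain, lift, lift, 0.7, 0.3, np.eye(C))
assert out.shape == (C, H, W)
```

The point of the sketch is the shape discipline: both branches re-enter at C channels, so the weighted sum is well defined before the final 1 × 1 convolution.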
FIG. 5 is a structural framework diagram of the discriminator of the method according to an embodiment of the present invention. The discriminator forms an adversarial game with the generator 2. The invention adopts a dual-discriminator structure, namely a visible light discriminator 41 (Discriminator-VIS) and an infrared discriminator 42 (Discriminator-IR), used respectively to distinguish the fused image from the visible light image and from the infrared image. The discriminator adopts a twin network structure with shared network parameters. Each single-branch network uses a three-layer convolution structure with a stride of 2 and outputs a scalar, i.e. a binary classification result.
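The spatial footprint of one discriminator branch follows from the standard convolution output-size formula. The text fixes only the layer count (3) and the stride (2); the kernel size 3 and padding 1 below are assumptions for illustration.

```python
def conv_out(size, kernel, stride, pad=0):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

def discriminator_map_size(size, layers=3, kernel=3, stride=2, pad=1):
    # Three stride-2 convolutions, as in one discriminator branch.
    for _ in range(layers):
        size = conv_out(size, kernel, stride, pad)
    return size

print(discriminator_map_size(64))  # 8, i.e. the side length shrunk by 2**3
```

Whatever kernel/padding is actually used, three stride-2 layers reduce each spatial dimension by roughly a factor of 8 before the scalar classification output.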
In the method provided by the embodiment of the invention, the loss function of the generator 2 is L_G, the adversarial loss function is L_adv, the gradient loss function is L_grad, and the structural similarity loss function is L_ssim; then:

L_G = L_adv + L_grad + L_ssim,

L_adv = (1/N) · Σ_{n=1}^{N} (D(I_f^n) − d)²,

L_grad = (1/(H·W)) · (β1·‖∇I_f − ∇I_ir‖² + β2·‖∇I_f − ∇I_vis‖²),

L_ssim = 1 − SSIM,

where N is the number of fused images, D(I_f^n) is the classification result of the n-th fused image, d is the spurious data value that the generator wishes the discriminator to trust, β1 and β2 are constant parameters, I_f, I_ir and I_vis denote the fused image, the infrared image and the visible light image respectively, H and W are the input image height and width, ∇I_f, ∇I_ir and ∇I_vis denote the gradient values of the fused, infrared and visible light images respectively, and SSIM is the value computed with an image multi-scale structural similarity method. To let the adversarial game between the generator and the discriminator reach a balanced state and make the two promote each other through synchronous learning evolution, the loss function L_G of the generator 2 comprises three parts: the adversarial loss L_adv, which controls the adversarial process between the generator and the discriminator so that the fused image obtains more infrared and visible light detail information; the gradient loss L_grad, which controls the generator's learning of the thermal radiation information of the infrared image and the texture detail information of the visible light image; and the structural similarity loss L_ssim, which controls the structural similarity between the fused image and the infrared and visible light images.
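The three generator-loss terms can be sketched numerically. This is a hedged illustration, not the patent's exact implementation: the least-squares adversarial term and the squared gradient-matching term are the standard forms the listed symbols (N, d, β1, β2, H, W) suggest, and a single-scale whole-image SSIM replaces the multi-scale variant for brevity.

```python
import numpy as np

def grad_mag(img):
    gy, gx = np.gradient(img)
    return np.hypot(gy, gx)

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-scale, whole-image SSIM (a simplification of the
    multi-scale SSIM the method actually uses)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

def generator_loss(d_scores, fused, ir, vis, d=1.0, beta1=1.0, beta2=1.0):
    """L_G = L_adv + L_grad + L_ssim for one (fused, ir, vis) triple;
    d_scores are the discriminator outputs for N fused images."""
    n = len(d_scores)
    h, w = fused.shape
    l_adv = sum((s - d) ** 2 for s in d_scores) / n
    l_grad = (beta1 * ((grad_mag(fused) - grad_mag(ir)) ** 2).sum()
              + beta2 * ((grad_mag(fused) - grad_mag(vis)) ** 2).sum()) / (h * w)
    l_ssim = 1.0 - 0.5 * (ssim_global(fused, ir) + ssim_global(fused, vis))
    return l_adv + l_grad + l_ssim
```

A quick sanity check: if the fused image equals both sources and the discriminator already outputs the target value d, every term vanishes and the total loss is zero.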
In the method provided by the embodiment of the invention, the loss function of the infrared discriminator 42 is L_D_ir and the loss function of the visible light discriminator 41 is L_D_vis; then:

L_D_ir = (1/N) · Σ_{n=1}^{N} [ (D_IR(I_ir) − a)² + (D_IR(I_f) − b)² ],

L_D_vis = (1/N) · Σ_{n=1}^{N} [ (D_vis(I_vis) − c)² + (D_vis(I_f) − b)² ],

where N is the number of fused images; a, b and c are the truth values of the infrared image, the fused image and the visible light image respectively; D_IR(I_ir) and D_IR(I_f) are the infrared discriminator's results on the infrared image and the fused image; and D_vis(I_vis) and D_vis(I_f) are the visible light discriminator's results on the visible light image and the fused image. The dual-discriminator design effectively captures the thermal radiation information of the infrared image and the texture detail information of the visible light image.
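The two discriminator losses can be written out directly from the symbol definitions. This sketch assumes the standard least-squares form, with a, b, c as the target truth values for the infrared, fused, and visible images; each argument is a list of scalar discriminator outputs over N images.

```python
def discriminator_losses(d_ir_real, d_ir_fused, d_vis_real, d_vis_fused,
                         a=1.0, b=0.0, c=1.0):
    """L_D_ir and L_D_vis: each discriminator is pushed toward its
    real-image truth value on the source modality and toward the
    fused-image truth value b on generated images."""
    n = len(d_ir_real)
    l_ir = sum((x - a) ** 2 + (y - b) ** 2
               for x, y in zip(d_ir_real, d_ir_fused)) / n
    l_vis = sum((x - c) ** 2 + (y - b) ** 2
                for x, y in zip(d_vis_real, d_vis_fused)) / n
    return l_ir, l_vis

l_ir, l_vis = discriminator_losses([1.0], [0.0], [1.0], [0.0])
print(l_ir, l_vis)  # 0.0 0.0: every output already hits its target
```

The default target values a = c = 1 and b = 0 are assumptions; the patent leaves the concrete truth values unspecified.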
The fusion performance of the infrared and visible light image fusion method based on adaptive weight learning was verified experimentally on the public image fusion dataset TNO: about 20 groups of infrared and visible light image pairs were randomly extracted and expanded to 10200 image pairs through data augmentation. The training and validation datasets were constructed at a ratio of 7:3. In addition, 10 pairs of infrared and visible images were randomly extracted from the TNO dataset to build a test dataset.
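Given the 10200 augmented pairs and the 7:3 ratio stated above, the split sizes follow by simple arithmetic (the exact counts are not stated in the text; this only evaluates the ratio).

```python
total_pairs = 10200
train_pairs = total_pairs * 7 // 10   # 7:3 train/validation split
val_pairs = total_pairs - train_pairs
print(train_pairs, val_pairs)  # 7140 3060
```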
In the experiments, the comparison algorithms are likewise adversarial-mechanism-based neural network fusion algorithms, namely FusionGAN, DDcGAN, and FusionDN.
To evaluate the fused images obtained by the different methods objectively, four indexes were selected: information entropy (EN), standard deviation (SD), mutual information (MI), and multi-scale structural similarity (MS-SSIM). A larger EN indicates more information in the fused image and more retained image detail; a larger SD indicates a higher-quality, clearer image; MI measures the degree of similarity between images, and a larger MI indicates that the fused image carries more information from the source images and is of better quality; MS-SSIM measures the structural similarity between an image and its source images.
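The two simplest of these indexes can be computed in a few lines of numpy; the definitions below are the standard Shannon entropy over an 8-bit histogram and the plain intensity standard deviation (MI and MS-SSIM need joint histograms and multi-scale filtering and are omitted here for brevity).

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (EN) of an 8-bit image, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def standard_deviation(img):
    """Standard deviation (SD): spread of intensities around the mean."""
    return float(np.std(img))

flat = np.full((8, 8), 128, dtype=np.uint8)  # constant image: no information
print(entropy(flat), standard_deviation(flat))  # 0.0 0.0
```

A constant image scores zero on both indexes, consistent with the interpretation above: larger EN and SD mean more information and more contrast in the fused result.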
The statistics of the experimental results are shown in the table below; the proposed infrared and visible light image fusion algorithm improves on multiple indexes compared with the comparison algorithms, illustrating the effectiveness of the proposed algorithm.
[Table: objective evaluation results (EN, SD, MI, MS-SSIM) of the compared fusion methods; original table image not reproduced here.]
In conclusion, by implementing the infrared and visible light image fusion method based on adaptive weight learning, the generated fusion image retains the target information of the infrared image and more detailed information of the visible light image, and obtains better performance in both subjective evaluation and objective evaluation.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An infrared and visible light image fusion method based on adaptive weight learning, characterized by comprising the following steps:
constructing a generator and a discriminator based on a generative adversarial mechanism;
feeding the source data, after a connection operation, into the generator network, and generating a fused image through depth feature extraction and hierarchical feature fusion;
feeding the fused image into a two-channel discriminator for discrimination;
and outputting the fused image result through the adversarial learning of the generator and the discriminator.
2. The infrared and visible light image fusion method based on adaptive weight learning as claimed in claim 1, wherein the step of constructing the generator and the discriminator based on the generative adversarial mechanism comprises: constructing the generator, wherein a depth feature adaptive extraction module is built using a pixel-level attention mechanism and a fusion-weight adaptive learning mechanism, and the generator network is built by cascading a plurality of depth feature adaptive extraction modules across layers.
3. The infrared and visible light image fusion method based on adaptive weight learning of claim 2, wherein the depth feature adaptive extraction module comprises an attention subnetwork, a non-attention subnetwork and a fusion weight adaptive generation subnetwork.
4. The infrared and visible light image fusion method based on adaptive weight learning as claimed in claim 3, wherein the attention subnetwork and the non-attention subnetwork extract image features and compress channels, and the fusion weight adaptive generation subnetwork adaptively generates weight factors and performs channel restoration and adaptive feature reconstruction on the features output by the attention subnetwork and the non-attention subnetwork.
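The blending step described above can be sketched as follows. This is a hedged NumPy illustration of the idea only: the per-pixel sigmoid gating, the function names, and the two-branch blend are our assumptions, not the patent's exact architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_feature_fusion(feat_att, feat_plain, weight_logits):
    # The fusion-weight subnetwork emits one logit per pixel; squashing it
    # to (0, 1) yields an adaptively learned blend of the attention-branch
    # and non-attention-branch features, rather than a fixed fusion weight.
    w = sigmoid(weight_logits)
    return w * feat_att + (1.0 - w) * feat_plain
```

With zero logits the two branches are averaged; large positive logits select the attention branch almost entirely.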
5. The infrared and visible light image fusion method based on adaptive weight learning according to claim 3, wherein the step of feeding the source data into the generator network after a connection operation and generating a fused image through depth feature extraction and hierarchical feature fusion specifically comprises:
performing the connection operation on the source data to obtain an input image;
extracting gradients and inputting the input image into the generator as a whole;
and obtaining the fused image through the generator network.
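The connection-plus-gradient preprocessing in the steps above might look like this. A NumPy sketch under our own assumptions: a 3×3 Laplacian stencil standing in for the gradient operator, and channel-axis stacking as the "connection operation" — the patent does not specify either choice:

```python
import numpy as np

def laplacian(img):
    # crude gradient/edge map: 3x3 Laplacian stencil on a zero-padded image
    p = np.pad(img.astype(np.float64), 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] +
            p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * p[1:-1, 1:-1])

def build_generator_input(ir, vis):
    # "connection operation": stack the two source images and their gradient
    # maps along a new channel axis before they enter the generator network
    return np.stack([ir, vis, laplacian(ir), laplacian(vis)], axis=0)
```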
6. The infrared and visible light image fusion method based on adaptive weight learning according to any one of claims 1-5, characterized in that the source data comprises an infrared image and a visible light image.
7. The infrared and visible light image fusion method based on adaptive weight learning as claimed in claim 6, wherein the step of constructing the generator and the discriminator based on the generative adversarial mechanism comprises: constructing the discriminator, wherein a two-way discriminator is built using a twin (Siamese) network.
8. The infrared and visible light image fusion method based on adaptive weight learning of claim 7, wherein the two-way discriminator comprises a visible light discriminator and an infrared discriminator, the visible light discriminator being used for discriminating the fused image from the visible light image, and the infrared discriminator being used for discriminating the fused image from the infrared image.
9. The infrared and visible light image fusion method based on adaptive weight learning of claim 8, wherein the loss function of the generator is L_G, the adversarial loss function is L_adv, the gradient loss function is L_grad, and the structural similarity loss function is L_ssim; then:

L_G = L_adv + L_grad + L_ssim,

L_adv = (1/N) Σ_{n=1}^{N} (D(I_f^n) − d)^2,

L_grad = β1 · (1/(H·W)) ‖∇I_f − ∇I_ir‖_F^2 + β2 · (1/(H·W)) ‖∇I_f − ∇I_vis‖_F^2,

L_ssim = 1 − SSIM,

in the formulas, N represents the number of fused images, D(I_f^n) represents the classification result of the discriminator on the n-th fused image, d represents the spurious data value that the generator wishes the discriminator to trust, β1 and β2 are constant parameters, I_f, I_ir and I_vis respectively represent the fused image, the infrared image and the visible light image, H and W are the input image height and width, ∇I_f, ∇I_ir and ∇I_vis respectively represent the gradient values of the fused image, the infrared image and the visible light image, and SSIM is the value calculated by the image multi-scale structural similarity method.
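Reading the generator objective of claim 9 as a least-squares adversarial term plus weighted gradient and SSIM penalties, a toy NumPy version could look like this. Note this is our reconstruction from the garbled claim text — in particular, the exact placement of β1 and β2 is an assumption:

```python
import numpy as np

def generator_loss(d_fused, d_target, grad_f, grad_ir, grad_vis,
                   ssim_value, beta1, beta2):
    # L_adv: push discriminator scores on fused images toward the "spurious"
    # value d that the generator wants the discriminator to trust
    l_adv = np.mean((d_fused - d_target) ** 2)
    # L_grad: keep fused-image gradients close to both source gradients,
    # weighted by the constants beta1 (infrared) and beta2 (visible)
    h, w = grad_f.shape
    l_grad = (beta1 * np.sum((grad_f - grad_ir) ** 2) +
              beta2 * np.sum((grad_f - grad_vis) ** 2)) / (h * w)
    # L_ssim: structural-similarity penalty, using a precomputed SSIM score
    l_ssim = 1.0 - ssim_value
    return float(l_adv + l_grad + l_ssim)
```

When the fused image matches both sources perfectly and the discriminator is fully fooled, every term vanishes and the loss is zero.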
10. The infrared and visible light image fusion method based on adaptive weight learning of claim 8, wherein the loss function of the infrared discriminator is L_D_ir and the loss function of the visible light discriminator is L_D_vis; then:

L_D_ir = (1/N) Σ_{n=1}^{N} (D_ir(I_ir^n) − a)^2 + (1/N) Σ_{n=1}^{N} (D_ir(I_f^n) − b)^2,

L_D_vis = (1/N) Σ_{n=1}^{N} (D_vis(I_vis^n) − c)^2 + (1/N) Σ_{n=1}^{N} (D_vis(I_f^n) − b)^2,

in the formulas, N represents the number of fused images, a, b and c respectively represent the truth value of the infrared image, the truth value of the fused image and the truth value of the visible light image, D_ir(I_ir) represents the discrimination result of the infrared discriminator on the infrared image, D_ir(I_f) represents the discrimination result of the infrared discriminator on the fused image, D_vis(I_vis) represents the discrimination result of the visible light discriminator on the visible light image, and D_vis(I_f) represents the discrimination result of the visible light discriminator on the fused image.
CN202111513619.9A 2021-12-13 2021-12-13 Infrared and visible light image fusion method based on adaptive weight learning Pending CN114187221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111513619.9A CN114187221A (en) 2021-12-13 2021-12-13 Infrared and visible light image fusion method based on adaptive weight learning


Publications (1)

Publication Number Publication Date
CN114187221A true CN114187221A (en) 2022-03-15

Family

ID=80543335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111513619.9A Pending CN114187221A (en) 2021-12-13 2021-12-13 Infrared and visible light image fusion method based on adaptive weight learning

Country Status (1)

Country Link
CN (1) CN114187221A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898429A (en) * 2022-05-10 2022-08-12 电子科技大学 Thermal infrared-visible light cross-modal face recognition method
CN114898429B (en) * 2022-05-10 2023-05-30 电子科技大学 Thermal infrared-visible light cross-modal face recognition method
CN115393679A (en) * 2022-08-01 2022-11-25 国网江苏省电力有限公司南通供电分公司 RGB-infrared power transmission line defect image feature fusion method and system
CN116843588A (en) * 2023-06-20 2023-10-03 大连理工大学 Infrared and visible light image fusion method for target semantic hierarchy mining
CN116843588B (en) * 2023-06-20 2024-02-06 大连理工大学 Infrared and visible light image fusion method for target semantic hierarchy mining

Similar Documents

Publication Publication Date Title
CN114187221A (en) Infrared and visible light image fusion method based on adaptive weight learning
CN109543640B (en) Living body detection method based on image conversion
Do et al. Forensics face detection from GANs using convolutional neural network
CN112288663A (en) Infrared and visible light image fusion method and system
CN113592736B (en) Semi-supervised image deblurring method based on fused attention mechanism
CN111915525B (en) Low-illumination image enhancement method capable of generating countermeasure network based on improved depth separation
Li et al. Deep dehazing network with latent ensembling architecture and adversarial learning
CN103020933B (en) A kind of multisource image anastomosing method based on bionic visual mechanism
CN111179208B (en) Infrared-visible light image fusion method based on saliency map and convolutional neural network
CN106295501A (en) The degree of depth based on lip movement study personal identification method
Zhang et al. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image
CN110082821A (en) A kind of no label frame microseism signal detecting method and device
CN111914758A (en) Face in-vivo detection method and device based on convolutional neural network
Zhu et al. Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks
Wang et al. Suspect multifocus image fusion based on sparse denoising autoencoder neural network for police multimodal big data analysis
Te et al. Exploring hypergraph representation on face anti-spoofing beyond 2d attacks
Balamurali et al. Multiple parameter algorithm approach for adult image identification
Yang et al. Research on digital camouflage pattern generation algorithm based on adversarial autoencoder network
Khan et al. Efficient person re-identification for IoT-assisted cyber-physical systems
CN114708173A (en) Image fusion method, computer program product, storage medium, and electronic device
WO2021258284A1 (en) Edge processing data de-identification
Aparna Swarm intelligence for automatic video image contrast adjustment
CN116311434A (en) Face counterfeiting detection method and device, electronic equipment and storage medium
CN116091844A (en) Image data processing method and system based on edge calculation
Feng et al. Coal mine image dust and fog clearing algorithm based on deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination