CN115601282A - Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network - Google Patents


Info

Publication number
CN115601282A
CN115601282A
Authority
CN
China
Prior art keywords
image
dif
infrared
visible light
vis
Prior art date
Legal status
Pending
Application number
CN202211405079.7A
Other languages
Chinese (zh)
Inventor
康家银
武凌霄
张文娟
姬云翔
马寒雁
Current Assignee
Jiangsu Ocean University
Original Assignee
Jiangsu Ocean University
Priority date
Filing date
Publication date
Application filed by Jiangsu Ocean University
Priority to CN202211405079.7A
Publication of CN115601282A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network, whose main process is as follows: a differential image is calculated and preprocessed; the source images (the infrared image I_ir and the visible light image I_vis) and the differential images (the infrared differential image I_dif-ir and the visible light differential image I_dif-vis) are taken as input to train the network models (a generator model and discriminator models); and a fused image is generated using the trained generator model. In the practical application of infrared and visible light image fusion, the method can not only fully retain the thermal radiation information in the infrared image, but also effectively reproduce the texture details in the visible light image.

Description

Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network
Technical Field
The invention relates to the field of image processing, in particular to an infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network.
Background
The image fusion task aims to fuse the different information in images obtained by various sensors into a single image so as to meet various application requirements. An infrared image contains rich thermal radiation information and allows targets to be distinguished from the background by their different thermal responses, but its resolution is generally low and it lacks texture details; a visible light image has higher resolution and rich details and matches human visual perception well, but it is easily affected by external factors; a fused image obtained by image fusion technology combines the advantages of both images.
In existing multi-modal image fusion research, the fusion of infrared and visible light images is one of the key branches. Researchers have proposed different fusion methods and strategies for the characteristics of these images. According to the machine learning technique used in the fusion strategy, fusion methods can be roughly divided into methods based on conventional algorithms and methods based on deep learning. Common fusion methods based on conventional algorithms include multi-scale transform based methods, sparse representation based methods, subspace based methods, and the like. Conventional methods usually apply the same transformation to different images and cannot extract key features from an image in a manner targeted to its characteristics; in addition, their fusion strategies cannot selectively retain various detailed information, which affects the fusion result. In contrast, deep learning based fusion methods can effectively solve these problems. Common deep learning based fusion methods include methods based on auto-encoders (AE), convolutional neural networks (CNN), and generative adversarial networks (GAN). Among existing methods, AE-based image fusion requires the encoder and decoder of the network to be trained on a public data set to reach their best performance, and it depends on a manually preset feature fusion strategy, which limits the fusion effect of infrared and visible light images to a certain extent. CNN-based infrared and visible light image fusion requires ground-truth images for training the deep learning model; in fact, the infrared and visible light image fusion task has no ground truth, and fused images are assessed mainly by subjective human visual evaluation assisted by objective evaluation indexes, so CNN-based fusion strategies are limited and the fusion result is affected. GAN-based fusion strategies perform unsupervised training by establishing an adversarial game between a generator, which aims to generate an image with infrared intensities and additional visible light gradients, and a discriminator, which aims to distinguish the generated image from the source images, so that the final fused image has both the clear thermal radiation intensity of the infrared image and the texture details of the visible light image. The GAN-based fusion strategy therefore makes up for the shortcomings of the above methods and is better suited to the infrared and visible light image fusion task.
Among existing GAN-based infrared and visible light image fusion methods, some researchers use a single-generator, single-discriminator structure, in which the discriminator distinguishes the fused image from the visible light image so as to guide the generator to retain as many texture details of the visible light image as possible. To solve the information imbalance in the fusion result caused by a single discriminator, some researchers have proposed a single-generator, dual-discriminator structure, in which the source images of the two modalities are discriminated by two discriminators. In addition, some researchers have introduced a differential discriminator on top of existing work, proposing a single-generator, three-discriminator structure in which the differential image serves as an additional network input, thereby improving fusion performance. As GAN models for image fusion have developed, increasing the number of discriminators constrains the generator from multiple angles and improves fusion performance to some extent. Moreover, the differential image focuses on the information unique to each source image and helps the fusion network retain more source image information. However, most existing methods apply a thresholding operation with a threshold of 0, or take absolute values, on the differential image to avoid negative gray values. In practice, the thresholding operation loses part of the source image information, while the absolute value operation retains all information but does not highlight the information unique to each modality.
Aiming at these problems of existing GAN-based methods for fusing infrared and visible light images, the invention provides a novel infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network. The proposed network model adopts a single-generator, four-discriminator structure: two differential discriminators are added on top of state-of-the-art algorithms to establish adversarial training with the generator, further constraining the optimization direction of the generator. First, the generator adopts a dual-encoder, single-decoder structure, in which different encoders extract the features of images of different modalities and the decoder reconstructs the fused image from the fused features. Second, unlike the absolute value operation applied to the differential image in other methods, the method normalizes the differential image so as to highlight the information unique to each of the two source modalities. Finally, to avoid the generator converging poorly because of excessive constraints from the discriminators, the loss function is designed with the source image loss as the main term and the differential image loss as an auxiliary term. Experimental results on public data sets show that the proposed algorithm not only fully retains the thermal radiation information in the infrared image but also effectively reproduces the texture details in the visible light image.
Disclosure of Invention
The present invention aims to solve the above-mentioned problems of the background art by providing an infrared and visible light image fusion method based on a multi-discriminator-generated countermeasure network.
In order to achieve the purpose, the invention provides the following technical scheme: the method for fusing the infrared and visible light images based on the multi-discriminator generation countermeasure network is characterized in that: the method comprises the following steps 1 to 3 to complete the fusion of the infrared and visible light images:
Step 1: calculate and preprocess the differential images: compute the differences between the infrared image I_ir and the visible light image I_vis and normalize them to obtain the differential images I_dif-ir and I_dif-vis;
Step 2: train the network model with the source images and the differential images as input, where the training process comprises the following steps 2-1 to 2-4:
Step 2-1: concatenate the infrared image I_ir with the differential image I_dif-ir, and the visible light image I_vis with the differential image I_dif-vis, as the inputs of the two encoders of generator G in step 2-2;
Step 2-2: generator G performs feature extraction and fusion on the data from step 2-1 and then reconstructs a fused image I_F from the fused features;
Step 2-3: input the fused image, together with the infrared image I_ir, the visible light image I_vis and the differential images I_dif-ir and I_dif-vis, into the discriminators (D_ir, D_vis, D_dif-ir and D_dif-vis), establishing adversarial training with generator G;
Step 2-4: loop steps 2-1 to 2-3 for iterative training; when the adversarial training approaches equilibrium, i.e. the discriminators can no longer distinguish whether an input sample comes from an image generated by the generator or from a real image, the training is terminated and the generator G required for fusion is obtained;
Step 3: generate the fused image using the trained generator model G. Specifically, the infrared image and the visible light image are each concatenated with the corresponding differential image and input together into the generator trained in step 2 to obtain the final fusion result.
As a preferred technical scheme of the invention, the calculation of the differential images in step 1 comprises the following specific steps:
Step 1-1: subtract the visible light image I_vis from the infrared image I_ir to obtain an infrared differential image I_dif-ir that emphasizes thermal radiation intensity;
Step 1-2: subtract the infrared image I_ir from the visible light image I_vis to obtain a visible light differential image I_dif-vis that highlights texture details.
As a preferred technical scheme of the invention, the normalization in the preprocessing of the differential images in step 1 maps the pixel gray values into the range 0 to 1, according to the following formula:
v'(i, j) = (v(i, j) - v_min) / (v_max - v_min)
where v(i, j) is the gray value of the pixel at (i, j) in the differential image, and v_min and v_max are the minimum and maximum gray values in the differential image, respectively.
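As an illustration only (the patent specifies no implementation; the function and variable names below are hypothetical), step 1 could be realized roughly as follows in Python:

```python
import numpy as np

def min_max_normalize(img):
    """Map the gray values of a differential image into [0, 1]."""
    v_min, v_max = img.min(), img.max()
    if v_max == v_min:  # guard against division by zero on a constant image
        return np.zeros_like(img, dtype=np.float32)
    return (img - v_min) / (v_max - v_min)

def compute_differential_images(i_ir, i_vis):
    """Step 1: I_dif-ir = normalize(I_ir - I_vis), I_dif-vis = normalize(I_vis - I_ir)."""
    i_ir = i_ir.astype(np.float32)
    i_vis = i_vis.astype(np.float32)
    i_dif_ir = min_max_normalize(i_ir - i_vis)   # emphasizes thermal radiation intensity
    i_dif_vis = min_max_normalize(i_vis - i_ir)  # highlights texture details
    return i_dif_ir, i_dif_vis
```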
As a preferred technical scheme of the invention, the generator network G in step 2-2 adopts a dual-encoder, single-decoder structure, as follows: first, each source image is concatenated with the differential image that highlights the information unique to that modality, and the result is used as the input of one encoder branch; second, the two encoder branches are responsible for extracting the features of the infrared image and of the visible light image, respectively; finally, the high-dimensional features of the different modalities are concatenated, input into the decoder, and reconstructed into the fused image.
As a preferred technical scheme of the invention, the four discriminators in step 2-3 adopt the same network structure, a five-layer convolutional neural network: the first four layers use 3×3 convolution kernels with the stride set to 2; batch normalization layers are added to the second through fourth layers; in the last layer, the features extracted by the convolutional layers are first integrated by a fully-connected layer, and a scalar is then computed with a Tanh activation function.
As a preferred technical scheme of the invention, during the iterative training in step 2-4, loss functions are used to evaluate the prediction error of the model. They consist of a generator loss function and discriminator loss functions. The generator loss function L_G is mainly composed of an adversarial loss L_adv and a content loss L_content and feeds back the training loss of the generator network; the four discriminators use similar loss functions L_D, which feed the discriminators' judgments of their inputs back to the generator and thereby establish adversarial training with it. The specific formulas are as follows:
L_G = L_adv + λ·L_content
where λ is a weight parameter; the discriminator losses L_D-ir, L_D-vis, L_D-dif-ir and L_D-dif-vis (defined below) correspond to the four discriminators D_ir, D_vis, D_dif-ir and D_dif-vis, respectively.
Specifically, the adversarial loss L_adv is mainly used to constrain the optimization direction of the generator and is defined as:
L_adv = E[log(1 - D_ir(I_F))] + E[log(1 - D_vis(I_F))] + E[log(1 - D_dif-ir(I_F))] + E[log(1 - D_dif-vis(I_F))]
where E[·] denotes expectation and D(·) is the classification probability that the discriminator assigns to its input image;
Specifically, the content loss L_content guides the generator, by comparing the differences between the fused image and the input images, to produce a fusion result that retains both the thermal radiation information of the infrared image and the texture information of the visible light image. It is defined as:
L_content = α·L_int + β·L_grad + γ·L_SSIM
where α, β and γ are weight parameters, L_int is the intensity loss, L_grad is the gradient loss, and L_SSIM is the structural similarity loss; L_int, L_grad and L_SSIM are defined as follows:
L_int = (1/(H·W)) · [ω·L_int-img + (1 - ω)·L_int-dif]
L_grad = (1/(H·W)) · [ω·L_grad-img + (1 - ω)·L_grad-dif]
L_SSIM = ω·L_SSIM-img + (1 - ω)·L_SSIM-dif
where ω is a weight parameter, H and W are the height and width of the input image, L_int-img is the source image intensity loss, L_int-dif is the differential image intensity loss, L_grad-img is the source image gradient loss, L_grad-dif is the differential image gradient loss, L_SSIM-img is the source image structural similarity loss, L_SSIM-dif is the differential image structural similarity loss, and L_SSIM(·,·) is the structural similarity between two images; L_int-img, L_int-dif, L_grad-img, L_grad-dif, L_SSIM-img and L_SSIM-dif are defined as follows:
L_int-img = a·||I_F - I_ir||_F + (1 - a)·||I_F - I_vis||_F
L_int-dif = a·||I_F - I_dif-ir||_F + (1 - a)·||I_F - I_dif-vis||_F
L_grad-img = a·||∇I_F - ∇I_ir||_F + (1 - a)·||∇I_F - ∇I_vis||_F
L_grad-dif = a·||∇I_F - ∇I_dif-ir||_F + (1 - a)·||∇I_F - ∇I_dif-vis||_F
L_SSIM-img = (1 - L_SSIM(I_F, I_ir)) + (1 - L_SSIM(I_F, I_vis))
L_SSIM-dif = (1 - L_SSIM(I_F, I_dif-ir)) + (1 - L_SSIM(I_F, I_dif-vis))
where a is a weight parameter, ||·||_F denotes the Frobenius norm, and ∇ denotes the gradient operator.
Specifically, the loss function of each discriminator is defined as:
L_D-ir = E[-log(D_ir(I_ir))] + E[-log(1 - D_ir(I_F))]
L_D-vis = E[-log(D_vis(I_vis))] + E[-log(1 - D_vis(I_F))]
L_D-dif-ir = E[-log(D_dif-ir(I_dif-ir))] + E[-log(1 - D_dif-ir(I_F))]
L_D-dif-vis = E[-log(D_dif-vis(I_dif-vis))] + E[-log(1 - D_dif-vis(I_F))]
where the input image of discriminators D_ir and D_vis is a source image (I_ir or I_vis) or the fused image (I_F), and the input image of discriminators D_dif-ir and D_dif-vis is a differential image (I_dif-ir or I_dif-vis) or the fused image (I_F).
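For illustration, a minimal PyTorch sketch of the generator loss is given below. It assumes the cross-entropy-style losses and the ω/(H·W) weighting reconstructed above; all weight values are placeholders, and the gradient and SSIM terms are omitted for brevity (they follow the same source-image/differential-image weighting pattern).

```python
import torch

def adversarial_loss(discriminators, i_f):
    """L_adv: sum over the four discriminators of E[log(1 - D(I_F))]."""
    eps = 1e-8
    return sum(torch.mean(torch.log(torch.clamp(1.0 - d(i_f), min=eps)))
               for d in discriminators)

def frobenius(x):
    """Frobenius norm of an image tensor."""
    return torch.norm(x, p='fro')

def weighted_intensity(i_f, ref_a, ref_b, a=0.5):
    """a*||I_F - ref_a||_F + (1 - a)*||I_F - ref_b||_F."""
    return a * frobenius(i_f - ref_a) + (1.0 - a) * frobenius(i_f - ref_b)

def generator_loss(discriminators, i_f, i_ir, i_vis, i_dif_ir, i_dif_vis,
                   lam=1.0, alpha=1.0, omega=0.7, a=0.5):
    """L_G = L_adv + lambda * L_content, keeping only the intensity part of L_content here."""
    h, w = i_f.shape[-2:]
    l_int_img = weighted_intensity(i_f, i_ir, i_vis, a)          # source image intensity loss
    l_int_dif = weighted_intensity(i_f, i_dif_ir, i_dif_vis, a)  # differential image intensity loss
    l_int = (omega * l_int_img + (1.0 - omega) * l_int_dif) / (h * w)
    l_content = alpha * l_int  # + beta * l_grad + gamma * l_ssim in the full loss
    return adversarial_loss(discriminators, i_f) + lam * l_content
```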
As a preferred technical scheme of the invention, in the dual-encoder, single-decoder structure, the dual encoder comprises two branches used to extract infrared thermal radiation intensity features and visible light texture features, respectively. Each branch consists of four convolutional layers that are densely connected in the manner of DenseNet. Specifically, the first layer consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function; Convolutional Block Attention Modules (CBAM) are added to the last three layers; the number of channels in all convolutional layers is set to 64 and the convolution stride is set to 1. CBAM is introduced to improve the feature extraction capability and mainly consists of the following steps A to F (an illustrative code sketch follows the steps):
Step A: first input the feature map into the channel attention module, and perform global max pooling and global average pooling over the width and height of the input feature map to obtain two feature maps;
Step B: input the two feature maps into a multilayer perceptron with shared parameters to generate the respective channel attention maps, then obtain the final channel attention map through element-wise summation and a Sigmoid activation function;
Step C: multiply the original input features element-wise by the channel attention map and input the result into the spatial attention module;
Step D: perform max pooling and global average pooling along the channel dimension of the feature map input into the spatial attention module to obtain two feature maps;
Step E: concatenate them along the channel dimension, apply a convolutional layer, and generate the spatial attention map through a Sigmoid activation function;
Step F: multiply the input of the spatial attention module element-wise by the spatial attention map to obtain the final output features.
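The following PyTorch sketch illustrates steps A to F. It is only one possible implementation under assumptions: the reduction ratio and the spatial convolution kernel size are not specified in the patent.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""
    def __init__(self, channels=64, reduction=8, spatial_kernel=7):
        super().__init__()
        # Shared MLP for the channel attention (steps A-B)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Convolution over the concatenated channel-wise max and mean maps (step E)
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Steps A-B: channel attention map from global max and average pooling
        max_pool = torch.amax(x, dim=(2, 3))
        avg_pool = torch.mean(x, dim=(2, 3))
        channel_attn = torch.sigmoid(self.mlp(max_pool) + self.mlp(avg_pool)).view(b, c, 1, 1)
        # Step C: apply channel attention to the input features
        x = x * channel_attn
        # Steps D-E: spatial attention map from channel-wise max and mean
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        mean_map = torch.mean(x, dim=1, keepdim=True)
        spatial_attn = torch.sigmoid(self.spatial_conv(torch.cat([max_map, mean_map], dim=1)))
        # Step F: apply spatial attention to obtain the output features
        return x * spatial_attn
```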
As a preferred technical scheme of the invention, in the dual-encoder, single-decoder structure, the single-decoder network consists of two convolutional layers: the first layer consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function, and the second layer consists of a 3×3 convolution kernel and a Tanh activation function.
Compared with the prior art, the infrared and visible light image fusion method based on the multi-discriminator generation countermeasure network of the invention has the following technical effects.
The beneficial effects of the invention are as follows: the proposed adversarial fusion framework comprises one generator and four discriminators, and the differential images are used as auxiliary information to further improve the fusion performance of the network. In the proposed method, the differential images serve not only as additional information accompanying the source images, guiding the generator to focus on the information unique to images of different modalities, but also as real data distributions that assist the adversarial training between the differential discriminators and the generator. In the proposed network model, the generator adopts a dual-encoder, single-decoder structure, in which the encoders extract the features of the different modalities, mainly through a densely connected structure combined with an attention module, and the decoder reconstructs the fused image from the concatenated high-dimensional features. Each discriminator judges whether its input image comes from a real image or from an image generated by the generator, and the generator is constrained and optimized according to this judgment.
Drawings
FIG. 1 is a flow chart of the infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to the invention;
FIG. 2 is the overall fusion framework of the infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to the invention;
FIG. 3 shows the differential images and the effect of their preprocessing in the method of the invention;
FIG. 4 is an example of the generator network structure in the method of the invention;
FIG. 5 is an example of the Convolutional Block Attention Module (CBAM) structure in the method of the invention;
FIG. 6 is an example of the discriminator network structure in the method of the invention.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the present invention more comprehensible and clear for those skilled in the art, and thus define the scope of the present invention more clearly.
The embodiment is as follows: referring to FIG. 1, the invention provides a technical solution: the method for fusing infrared and visible light images based on a multi-discriminator generation countermeasure network completes the fusion of infrared and visible light images according to steps 1 to 3: step 1, calculate and preprocess the differential images; step 2, train the network model with the source images and the differential images as input; and step 3, generate the fused image using the trained generator model.
Experimental groups: referring to fig. 2, the method for fusing infrared and visible light images based on a multi-discriminator-generated countermeasure network includes the following steps:
As shown in FIG. 3, step 1 is performed as follows: subtracting the visible light image I_vis from the infrared image I_ir yields an infrared differential image that emphasizes thermal radiation intensity, as shown in FIG. 3(c); subtracting the infrared image I_ir from the visible light image I_vis yields a visible light differential image that highlights texture details, as shown in FIG. 3(d). The differential images are normalized to map the pixel gray values into the range 0 to 1, as follows:
v'(i, j) = (v(i, j) - v_min) / (v_max - v_min)
where v(i, j) is the gray value of the pixel at (i, j) in the differential image, and v_min and v_max are the minimum and maximum gray values in the differential image, respectively. FIGS. 3(e) and (f) show the normalized infrared differential image I_dif-ir and visible light differential image I_dif-vis.
The training task of step 2 is executed according to steps 2-1 to 2-4:
Step 2-1: concatenate the infrared image I_ir with the differential image I_dif-ir, and the visible light image I_vis with the differential image I_dif-vis, as the inputs of the two encoders of generator G in step 2-2;
Step 2-2: generator G performs feature extraction and fusion on the data from step 2-1 and then reconstructs a fused image I_F from the fused features. As a specific embodiment of the invention, the network structure of the generator is shown in FIG. 4. The dual encoder comprises two branches for extracting infrared thermal radiation intensity features and visible light texture features, respectively; each branch consists of four convolutional layers that are densely connected in the manner of DenseNet so that multi-layer features are fully utilized. Specifically, the first layer extracts shallow features of the image and consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function; the last three layers extract deep features of the image and add a Convolutional Block Attention Module on top of the structure of the first layer. The number of channels in all convolutional layers is set to 64 and the convolution stride is set to 1.
In the deep feature layers of the encoder, CBAM is introduced to improve the feature extraction capability. CBAM comprises two sub-modules, a channel attention module and a spatial attention module, where the channel attention module describes the relationships between channels and the spatial attention module describes the spatial relationships of the deep features; its structure is shown in FIG. 5. Specifically, 1) the feature map is input into the channel attention module, and global max pooling and global average pooling are performed over the width and height of the input feature map to obtain two feature maps; the two feature maps are input into a multilayer perceptron with shared parameters to generate the respective channel attention maps, and the final channel attention map is then obtained through element-wise summation and a Sigmoid activation function; 2) the original input features are multiplied element-wise by the channel attention map and the result is input into the spatial attention module; max pooling and global average pooling are performed along the channel dimension of the feature map input into the spatial attention module to obtain two feature maps; these are concatenated along the channel dimension, convolved by a convolutional layer, and passed through a Sigmoid activation function to generate the spatial attention map; finally, the input of the spatial attention module is multiplied element-wise by the spatial attention map to obtain the final output features.
After the two encoder branches extract the features of the source images of different modalities, the features are concatenated along the channel dimension and input into the decoder. The decoder reconstructs the fused image from the concatenated high-dimensional features, and its network structure consists of two convolutional layers: the first layer consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function, and the second layer consists of a 3×3 convolution kernel and a Tanh activation function.
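As an illustration, the generator described above could be sketched in PyTorch as follows, reusing the CBAM class from the earlier sketch. This is an assumed implementation: nn.InstanceNorm2d stands in for the switchable normalization layer, and the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """3x3 conv + normalization + Leaky ReLU (InstanceNorm2d stands in for switchable normalization)."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DenseEncoder(nn.Module):
    """Four densely connected conv layers; CBAM follows each of the last three layers."""
    def __init__(self, in_ch=2, ch=64):
        super().__init__()
        self.layers = nn.ModuleList(ConvBlock(in_ch + i * ch, ch) for i in range(4))
        self.attn = nn.ModuleList(CBAM(ch) for _ in range(3))

    def forward(self, x):
        feats = [x]
        for i, layer in enumerate(self.layers):
            out = layer(torch.cat(feats, dim=1))  # dense connection over all previous outputs
            if i > 0:
                out = self.attn[i - 1](out)       # CBAM on the last three layers
            feats.append(out)
        return feats[-1]

class Generator(nn.Module):
    """Dual-encoder, single-decoder generator reconstructing the fused image I_F."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc_ir = DenseEncoder(in_ch=2, ch=ch)   # input: I_ir concatenated with I_dif-ir
        self.enc_vis = DenseEncoder(in_ch=2, ch=ch)  # input: I_vis concatenated with I_dif-vis
        self.decoder = nn.Sequential(
            ConvBlock(2 * ch, ch),                                  # conv + norm + Leaky ReLU
            nn.Conv2d(ch, 1, kernel_size=3, stride=1, padding=1),   # second decoder layer
            nn.Tanh(),
        )

    def forward(self, ir_pair, vis_pair):
        fused = torch.cat([self.enc_ir(ir_pair), self.enc_vis(vis_pair)], dim=1)
        return self.decoder(fused)
```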
Step 2-3: input the fused image, together with the infrared image I_ir, the visible light image I_vis and the differential images I_dif-ir and I_dif-vis, into the discriminators (D_ir, D_vis, D_dif-ir and D_dif-vis), establishing adversarial training with generator G;
As a specific embodiment of the invention, the four discriminators adopt the same network structure, a five-layer convolutional neural network whose details are shown in FIG. 6. Specifically, the first four layers use 3×3 convolution kernels with the stride set to 2; batch normalization layers are added to the second through fourth layers; in the last layer, the features extracted by the convolutional layers are integrated by a fully-connected layer (FC), and a scalar is then computed with a Tanh activation function to reflect the probability, as judged by the discriminator, that the input image comes from a source image or a differential image rather than from the fused image.
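A minimal PyTorch sketch of such a discriminator is given below for illustration. The channel width, the input image size and the Leaky ReLU activations after the convolutional layers are assumptions; the patent only fixes the kernel size, the stride, the batch normalization layers and the FC + Tanh output.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Five-layer discriminator: four 3x3 stride-2 conv layers (BN on layers 2-4), then FC + Tanh."""
    def __init__(self, in_ch=1, ch=32, img_size=128):
        super().__init__()
        layers = []
        for i in range(4):
            layers.append(nn.Conv2d(in_ch if i == 0 else ch, ch,
                                    kernel_size=3, stride=2, padding=1))
            if i >= 1:                              # batch normalization on the 2nd-4th layers
                layers.append(nn.BatchNorm2d(ch))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
        self.features = nn.Sequential(*layers)
        feat_size = img_size // 16                  # four stride-2 convs shrink H and W by 16x
        self.fc = nn.Linear(ch * feat_size * feat_size, 1)

    def forward(self, x):
        f = self.features(x).flatten(start_dim=1)
        return torch.tanh(self.fc(f))               # scalar score per input image
```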
Step 2-4: perform iterative training by looping steps 2-1 to 2-3, and terminate the training when the adversarial training approaches equilibrium, i.e. the discriminators can no longer distinguish whether an input sample comes from an image generated by the generator or from a real image.
The loss functions consist of a generator loss function and discriminator loss functions. The generator loss function L_G is mainly composed of an adversarial loss L_adv and a content loss L_content and feeds back the training loss of the generator network; the four discriminators use similar loss functions L_D, which feed the discriminators' judgments of their inputs back to the generator and thereby establish adversarial training with it. The specific formulas are as follows:
L_G = L_adv + λ·L_content
where λ is a weight parameter; the discriminator losses L_D-ir, L_D-vis, L_D-dif-ir and L_D-dif-vis (defined below) correspond to the four discriminators D_ir, D_vis, D_dif-ir and D_dif-vis, respectively.
The adversarial loss L_adv is mainly used to constrain the optimization direction of the generator and is defined as:
L_adv = E[log(1 - D_ir(I_F))] + E[log(1 - D_vis(I_F))] + E[log(1 - D_dif-ir(I_F))] + E[log(1 - D_dif-vis(I_F))]
where E[·] denotes expectation and D(·) is the classification probability that the discriminator assigns to its input image.
The content loss L_content guides the generator, by comparing the differences between the fused image and the input images, to produce a fusion result that retains both the thermal radiation information of the infrared image and the texture information of the visible light image. It is defined as:
L_content = α·L_int + β·L_grad + γ·L_SSIM
where α, β and γ are weight parameters, L_int is the intensity loss, L_grad is the gradient loss, and L_SSIM is the structural similarity loss; L_int, L_grad and L_SSIM are defined as follows:
L_int = (1/(H·W)) · [ω·L_int-img + (1 - ω)·L_int-dif]
L_grad = (1/(H·W)) · [ω·L_grad-img + (1 - ω)·L_grad-dif]
L_SSIM = ω·L_SSIM-img + (1 - ω)·L_SSIM-dif
where ω is a weight parameter, H and W are the height and width of the input image, L_int-img is the source image intensity loss, L_int-dif is the differential image intensity loss, L_grad-img is the source image gradient loss, L_grad-dif is the differential image gradient loss, L_SSIM-img is the source image structural similarity loss, L_SSIM-dif is the differential image structural similarity loss, and L_SSIM(·,·) is the structural similarity between two images; L_int-img, L_int-dif, L_grad-img, L_grad-dif, L_SSIM-img and L_SSIM-dif are defined as follows:
L_int-img = a·||I_F - I_ir||_F + (1 - a)·||I_F - I_vis||_F
L_int-dif = a·||I_F - I_dif-ir||_F + (1 - a)·||I_F - I_dif-vis||_F
L_grad-img = a·||∇I_F - ∇I_ir||_F + (1 - a)·||∇I_F - ∇I_vis||_F
L_grad-dif = a·||∇I_F - ∇I_dif-ir||_F + (1 - a)·||∇I_F - ∇I_dif-vis||_F
L_SSIM-img = (1 - L_SSIM(I_F, I_ir)) + (1 - L_SSIM(I_F, I_vis))
L_SSIM-dif = (1 - L_SSIM(I_F, I_dif-ir)) + (1 - L_SSIM(I_F, I_dif-vis))
where a is a weight parameter, ||·||_F denotes the Frobenius norm, and ∇ denotes the gradient operator.
The loss function of each discriminator is defined as:
L_D-ir = E[-log(D_ir(I_ir))] + E[-log(1 - D_ir(I_F))]
L_D-vis = E[-log(D_vis(I_vis))] + E[-log(1 - D_vis(I_F))]
L_D-dif-ir = E[-log(D_dif-ir(I_dif-ir))] + E[-log(1 - D_dif-ir(I_F))]
L_D-dif-vis = E[-log(D_dif-vis(I_dif-vis))] + E[-log(1 - D_dif-vis(I_F))]
where the input image of discriminators D_ir and D_vis is a source image (I_ir or I_vis) or the fused image (I_F), and the input image of discriminators D_dif-ir and D_dif-vis is a differential image (I_dif-ir or I_dif-vis) or the fused image (I_F).
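The adversarial training loop of step 2-4 could then look roughly like the sketch below. It reuses generator_loss from the earlier loss sketch, assumes the cross-entropy discriminator loss reconstructed above, and uses a single optimizer over all four discriminators for brevity; all of these are assumptions rather than details fixed by the patent.

```python
import torch

def train_step(gen, discs, opt_g, opt_d, batch, lam=1.0):
    """One adversarial iteration of step 2-4; `discs` is (D_ir, D_vis, D_dif-ir, D_dif-vis)."""
    i_ir, i_vis, i_dif_ir, i_dif_vis = batch
    ir_pair = torch.cat([i_ir, i_dif_ir], dim=1)     # encoder input for the infrared branch
    vis_pair = torch.cat([i_vis, i_dif_vis], dim=1)  # encoder input for the visible branch
    eps = 1e-8

    # Update the four discriminators: real images (source or differential) vs. the fused image
    i_f = gen(ir_pair, vis_pair).detach()
    opt_d.zero_grad()
    d_loss = 0.0
    for d, real in zip(discs, (i_ir, i_vis, i_dif_ir, i_dif_vis)):
        # clamp keeps the arguments of log positive for numerical safety
        d_loss = d_loss - torch.mean(torch.log(torch.clamp(d(real), min=eps))) \
                        - torch.mean(torch.log(torch.clamp(1.0 - d(i_f), min=eps)))
    d_loss.backward()
    opt_d.step()

    # Update the generator with L_G = L_adv + lambda * L_content
    opt_g.zero_grad()
    i_f = gen(ir_pair, vis_pair)
    g_loss = generator_loss(discs, i_f, i_ir, i_vis, i_dif_ir, i_dif_vis, lam=lam)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```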
Step 3: generate the fused image using the trained generator model. Specifically, the infrared image and the visible light image are each concatenated with the corresponding differential image and input together into the generator trained in step 2 to obtain the final fusion result.
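For completeness, a minimal sketch of this inference step is given below (again an assumption; the helper names are hypothetical):

```python
import torch

def min_max_normalize(x):
    """Torch counterpart of the min-max normalization applied to the differential images."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + 1e-8)

@torch.no_grad()
def fuse(gen, i_ir, i_vis):
    """Step 3: build the differential images, concatenate, and run the trained generator."""
    gen.eval()
    i_dif_ir = min_max_normalize(i_ir - i_vis)
    i_dif_vis = min_max_normalize(i_vis - i_ir)
    ir_pair = torch.cat([i_ir, i_dif_ir], dim=1)
    vis_pair = torch.cat([i_vis, i_dif_vis], dim=1)
    return gen(ir_pair, vis_pair)  # the fused image I_F
```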
Experimental conclusion: the invention provides an infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network, an end-to-end network model consisting of one generator and four discriminators. Experiments on public infrared and visible light image data sets show that, compared with existing methods, the fusion results obtained by the algorithm of the invention contain richer texture information and have better subjective visual quality. In addition, the objective evaluation shows that the average values achieved by the algorithm are 6.02%, 25.93%, 7.61% and 16.77% better than the averages of the comparison methods on indexes including information entropy, average gradient, correlation coefficient and the sum of correlations of differences, respectively. The proposed method can therefore better fuse the texture information of the visible light image while effectively retaining the thermal radiation information of the infrared image, improving on the performance of existing infrared and visible light image fusion algorithms.
The above examples only show some embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims (8)

1. An infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network, characterized in that the method comprises the following steps 1 to 3 to complete the fusion of the infrared and visible light images:
Step 1: calculate and preprocess the differential images: compute the differences between the infrared image I_ir and the visible light image I_vis and normalize them to obtain the differential images I_dif-ir and I_dif-vis;
Step 2: train the network model with the source images and the differential images as input, where the training process comprises the following steps 2-1 to 2-4:
Step 2-1: concatenate the infrared image I_ir with the differential image I_dif-ir, and the visible light image I_vis with the differential image I_dif-vis, as the inputs of the two encoders of generator G in step 2-2;
Step 2-2: generator G performs feature extraction and fusion on the data from step 2-1 and then reconstructs a fused image I_F from the fused features;
Step 2-3: input the fused image I_F, together with the infrared image I_ir, the visible light image I_vis and the differential images I_dif-ir and I_dif-vis, into the discriminators (D_ir, D_vis, D_dif-ir and D_dif-vis), establishing adversarial training with generator G;
Step 2-4: loop steps 2-1 to 2-3 for iterative training; when the adversarial training approaches equilibrium, i.e. the discriminators can no longer distinguish whether an input sample comes from an image generated by the generator or from a real image, the training is terminated, yielding the generator G required for fusion;
Step 3: generate the fused image using the trained generator G. Specifically, the infrared image and the visible light image are each concatenated with the corresponding differential image and input together into the generator trained in step 2 to obtain the final fusion result.
2. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 1, characterized in that the calculation of the differential images in step 1 comprises the following specific steps:
Step 1-1: subtract the visible light image I_vis from the infrared image I_ir to obtain an infrared differential image I_dif-ir that emphasizes thermal radiation intensity;
Step 1-2: subtract the infrared image I_ir from the visible light image I_vis to obtain a visible light differential image I_dif-vis that highlights texture details.
3. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 1, characterized in that the normalization in the preprocessing of the differential images in step 1 maps the pixel gray values into the range 0 to 1, according to the following formula:
v'(i, j) = (v(i, j) - v_min) / (v_max - v_min)
where v(i, j) is the gray value of the pixel at (i, j) in the differential image, and v_min and v_max are the minimum and maximum gray values in the differential image, respectively.
4. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 1, characterized in that the generator network G in step 2-2 adopts a dual-encoder, single-decoder structure, as follows: first, each source image is concatenated with the differential image that highlights the information unique to that modality, and the result is used as the input of one encoder branch; second, the two encoder branches are responsible for extracting the features of the infrared image and of the visible light image, respectively; finally, the high-dimensional features of the different modalities are concatenated, input into the decoder, and reconstructed into the fused image.
5. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 1, characterized in that the four discriminators in step 2-3 adopt the same network structure, a five-layer convolutional neural network: the first four layers use 3×3 convolution kernels with the stride set to 2; batch normalization layers are added to the second through fourth layers; in the last layer, the features extracted by the convolutional layers are first integrated by a fully-connected layer, and a scalar is then computed with a Tanh activation function.
6. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 1, characterized in that during the iterative training in step 2-4, loss functions are used to evaluate the prediction error of the model; they consist of a generator loss function and discriminator loss functions, where the generator loss function L_G is mainly composed of an adversarial loss L_adv and a content loss L_content and feeds back the training loss of the generator network, and the four discriminators use similar loss functions L_D, which feed the discriminators' judgments of their inputs back to the generator and thereby establish adversarial training with it; the specific formulas are as follows:
L_G = L_adv + λ·L_content
where λ is a weight parameter, and the discriminator losses L_D-ir, L_D-vis, L_D-dif-ir and L_D-dif-vis (defined below) correspond to the four discriminators D_ir, D_vis, D_dif-ir and D_dif-vis, respectively;
specifically, the adversarial loss L_adv is mainly used to constrain the optimization direction of the generator and is defined as:
L_adv = E[log(1 - D_ir(I_F))] + E[log(1 - D_vis(I_F))] + E[log(1 - D_dif-ir(I_F))] + E[log(1 - D_dif-vis(I_F))]
where E[·] denotes expectation and D(·) is the classification probability that the discriminator assigns to its input image;
specifically, the content loss L_content guides the generator, by comparing the differences between the fused image and the input images, to produce a fusion result that retains both the thermal radiation information of the infrared image and the texture information of the visible light image, and is defined as:
L_content = α·L_int + β·L_grad + γ·L_SSIM
where α, β and γ are weight parameters, L_int is the intensity loss, L_grad is the gradient loss, and L_SSIM is the structural similarity loss; L_int, L_grad and L_SSIM are defined as follows:
L_int = (1/(H·W)) · [ω·L_int-img + (1 - ω)·L_int-dif]
L_grad = (1/(H·W)) · [ω·L_grad-img + (1 - ω)·L_grad-dif]
L_SSIM = ω·L_SSIM-img + (1 - ω)·L_SSIM-dif
where ω is a weight parameter, H and W are the height and width of the input image, L_int-img is the source image intensity loss, L_int-dif is the differential image intensity loss, L_grad-img is the source image gradient loss, L_grad-dif is the differential image gradient loss, L_SSIM-img is the source image structural similarity loss, L_SSIM-dif is the differential image structural similarity loss, and L_SSIM(·,·) is the structural similarity between two images; L_int-img, L_int-dif, L_grad-img, L_grad-dif, L_SSIM-img and L_SSIM-dif are defined as follows:
L_int-img = a·||I_F - I_ir||_F + (1 - a)·||I_F - I_vis||_F
L_int-dif = a·||I_F - I_dif-ir||_F + (1 - a)·||I_F - I_dif-vis||_F
L_grad-img = a·||∇I_F - ∇I_ir||_F + (1 - a)·||∇I_F - ∇I_vis||_F
L_grad-dif = a·||∇I_F - ∇I_dif-ir||_F + (1 - a)·||∇I_F - ∇I_dif-vis||_F
L_SSIM-img = (1 - L_SSIM(I_F, I_ir)) + (1 - L_SSIM(I_F, I_vis))
L_SSIM-dif = (1 - L_SSIM(I_F, I_dif-ir)) + (1 - L_SSIM(I_F, I_dif-vis))
where a is a weight parameter, ||·||_F denotes the Frobenius norm, and ∇ denotes the gradient operator;
specifically, the loss function of each discriminator is defined as:
L_D-ir = E[-log(D_ir(I_ir))] + E[-log(1 - D_ir(I_F))]
L_D-vis = E[-log(D_vis(I_vis))] + E[-log(1 - D_vis(I_F))]
L_D-dif-ir = E[-log(D_dif-ir(I_dif-ir))] + E[-log(1 - D_dif-ir(I_F))]
L_D-dif-vis = E[-log(D_dif-vis(I_dif-vis))] + E[-log(1 - D_dif-vis(I_F))]
where the input image of discriminators D_ir and D_vis is a source image (I_ir or I_vis) or the fused image (I_F), and the input image of discriminators D_dif-ir and D_dif-vis is a differential image (I_dif-ir or I_dif-vis) or the fused image (I_F).
7. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 4, characterized in that in the dual-encoder, single-decoder structure, the dual encoder comprises two branches used to extract infrared thermal radiation intensity features and visible light texture features, respectively; each branch consists of four convolutional layers that are densely connected in the manner of DenseNet; specifically, the first layer consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function; Convolutional Block Attention Modules (CBAM) are added to the last three layers; the number of channels in all convolutional layers is set to 64 and the convolution stride is set to 1; CBAM is introduced to improve the feature extraction capability and mainly consists of the following steps A to F:
Step A: first input the feature map into the channel attention module, and perform global max pooling and global average pooling over the width and height of the input feature map to obtain two feature maps;
Step B: input the two feature maps into a multilayer perceptron with shared parameters to generate the respective channel attention maps, then obtain the final channel attention map through element-wise summation and a Sigmoid activation function;
Step C: multiply the original input features element-wise by the channel attention map and input the result into the spatial attention module;
Step D: perform max pooling and global average pooling along the channel dimension of the feature map input into the spatial attention module to obtain two feature maps;
Step E: concatenate them along the channel dimension, apply a convolutional layer, and generate the spatial attention map through a Sigmoid activation function;
Step F: multiply the input of the spatial attention module element-wise by the spatial attention map to obtain the final output features.
8. The infrared and visible light image fusion method based on a multi-discriminator generation countermeasure network according to claim 4, characterized in that in the dual-encoder, single-decoder structure, the single-decoder network consists of two convolutional layers: the first layer consists of a 3×3 convolution kernel, a switchable normalization layer and a Leaky ReLU activation function, and the second layer consists of a 3×3 convolution kernel and a Tanh activation function.
CN202211405079.7A 2022-11-10 2022-11-10 Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network Pending CN115601282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211405079.7A CN115601282A (en) 2022-11-10 2022-11-10 Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211405079.7A CN115601282A (en) 2022-11-10 2022-11-10 Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network

Publications (1)

Publication Number Publication Date
CN115601282A true CN115601282A (en) 2023-01-13

Family

ID=84853703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211405079.7A Pending CN115601282A (en) 2022-11-10 2022-11-10 Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network

Country Status (1)

Country Link
CN (1) CN115601282A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109539A (en) * 2023-03-21 2023-05-12 智洋创新科技股份有限公司 Infrared image texture information enhancement method and system based on generation of countermeasure network
CN116664462A (en) * 2023-05-19 2023-08-29 兰州交通大学 Infrared and visible light image fusion method based on MS-DSC and I_CBAM
CN116664462B (en) * 2023-05-19 2024-01-19 兰州交通大学 Infrared and visible light image fusion method based on MS-DSC and I_CBAM
CN117455774A (en) * 2023-11-17 2024-01-26 武汉大学 Image reconstruction method and system based on differential output
CN117455774B (en) * 2023-11-17 2024-05-14 武汉大学 Image reconstruction method and system based on differential output
CN117635418A (en) * 2024-01-25 2024-03-01 南京信息工程大学 Training method for generating countermeasure network, bidirectional image style conversion method and device
CN117635418B (en) * 2024-01-25 2024-05-14 南京信息工程大学 Training method for generating countermeasure network, bidirectional image style conversion method and device
CN117934978A (en) * 2024-03-22 2024-04-26 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning

Similar Documents

Publication Publication Date Title
CN115601282A (en) Infrared and visible light image fusion method based on multi-discriminator generation countermeasure network
CN111325155B (en) Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN112784764B (en) Expression recognition method and system based on local and global attention mechanism
CN111046964B (en) Convolutional neural network-based human and vehicle infrared thermal image identification method
CN111275618A (en) Depth map super-resolution reconstruction network construction method based on double-branch perception
CN112967178B (en) Image conversion method, device, equipment and storage medium
CN110443162B (en) Two-stage training method for disguised face recognition
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN110032925A (en) A kind of images of gestures segmentation and recognition methods based on improvement capsule network and algorithm
CN110351548B (en) Stereo image quality evaluation method guided by deep learning and disparity map weighting
CN114299559A (en) Finger vein identification method based on lightweight fusion global and local feature network
CN115393225A (en) Low-illumination image enhancement method based on multilevel feature extraction and fusion
CN116385832A (en) Bimodal biological feature recognition network model training method
CN113807497B (en) Unpaired image translation method for enhancing texture details
CN114743162A (en) Cross-modal pedestrian re-identification method based on generation of countermeasure network
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
CN113706404A (en) Depression angle human face image correction method and system based on self-attention mechanism
CN111382684B (en) Angle robust personalized facial expression recognition method based on antagonistic learning
CN107909565A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN113222879B (en) Generation countermeasure network for fusion of infrared and visible light images
CN115358961A (en) Multi-focus image fusion method based on deep learning
CN114898429A (en) Thermal infrared-visible light cross-modal face recognition method
CN116977455A (en) Face sketch image generation system and method based on deep two-way learning
CN114911967A (en) Three-dimensional model sketch retrieval method based on adaptive domain enhancement
CN114066844A (en) Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination