CN111696066A - Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP - Google Patents


Info

Publication number
CN111696066A
CN111696066A (application CN202010538631.4A)
Authority
CN
China
Prior art keywords
image
network
fusion
generator
multiband
Prior art date
Legal status
Granted
Application number
CN202010538631.4A
Other languages
Chinese (zh)
Other versions
CN111696066B (en)
Inventor
李大威
田嵩旺
蔺素珍
杨博
张海松
Current Assignee
North University of China
Original Assignee
North University of China
Priority date
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN202010538631.4A priority Critical patent/CN111696066B/en
Publication of CN111696066A publication Critical patent/CN111696066A/en
Application granted granted Critical
Publication of CN111696066B publication Critical patent/CN111696066B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06F 18/25: Pattern recognition; analysing; fusion techniques
    • G06N 3/045: Neural networks; architecture; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 3/4053: Scaling the whole image or part thereof; super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 2207/20221: Image fusion; image merging

Abstract

The invention relates to a method for synchronous fusion and enhancement of multiband images, i.e. performing multiband image fusion and image super-resolution at the same time, and in particular to a multiband image synchronous fusion and enhancement method based on an improved WGAN-GP. The method comprises the following steps: designing and constructing a generative adversarial network, whose generator comprises a feature extraction (encoding) sub-network, a feature combination sub-network, and a decoding sub-network; a generator model and the fusion-enhancement result are obtained through dynamic-balance training of the generator and the discriminator. The invention realizes an end-to-end neural network for synchronous fusion and enhancement of multiband images, so that low-resolution source images are fused into a high-quality fusion result.

Description

Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP
Technical Field
The invention relates to a method for synchronous fusion and enhancement of multiband images, in which a single network model performs multiband image fusion and image super-resolution at the same time, and in particular to a multiband image synchronous fusion and enhancement method based on an improved WGAN-GP.
Background
Multiband image fusion aims to combine complementary information extracted from images of the same scene in different wavebands into a single, more informative and comprehensive image for subsequent target extraction and decision making. In recent years image fusion technology has developed rapidly and many effective fusion methods have been proposed; however, owing to the limitations of imaging sensors and signal-transmission bandwidth, the resolution of most source images is limited, so a large proportion of the acquired infrared and visible-light images are of low resolution. For low-resolution source images, although existing image fusion methods can integrate the information of each waveband, the resulting fused image is still of low resolution, which hinders subsequent target recognition.
At present, the quality of a low-resolution image is usually improved by Super-Resolution (SR) reconstruction. In the field of image fusion, image quality is improved mainly in two ways: applying SR to the low-resolution source images and then fusing, or fusing first and then applying SR to the result. Studies have shown that both approaches can lose detail information or introduce noise. A new method is therefore needed that performs image fusion and image super-resolution simultaneously.
Disclosure of Invention
The invention provides a novel method for synchronous fusion and enhancement of multiband images based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), aiming to improve the fusion result of low-resolution source images.
The invention is realized by adopting the following technical scheme: the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP comprises the following steps:
designing and constructing a generative adversarial network: the generative adversarial network is divided into a generator model and a discriminator model; the generator comprises a feature extraction sub-network, a feature combination sub-network, and a decoding sub-network, wherein the feature extraction sub-network extracts features from the source images of different wavebands, the feature combination sub-network combines the extracted features in a high-level feature space, and the decoding sub-network converts the combined feature map into a fused image;
using the generative adversarial network, the multiband low-resolution source images are first input to the feature extraction sub-network, the extracted multiband image features are then combined in the high-level feature space, and a fused image is reconstructed through the decoding sub-network;
the reconstructed fused image and the label image corresponding to the multiband low-resolution source images are each sent to the discriminator for classification, and the generator and the discriminator are optimized iteratively; through their dynamic game, the fused image output by the generator becomes increasingly similar to the label image, until the discriminator can no longer distinguish them. The generator model obtained when the generator and the discriminator reach dynamic balance is the final multiband image synchronous fusion-enhancement network model, which is then applied to fuse multiband low-resolution source images.
In the above multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, the generator loss function comprises three parts: the adversarial loss L_adv, the content loss L_con, and the perceptual loss L_per. The adversarial loss is

$$L_{adv} = -\mathbb{E}_{x}\left[D\big(G(I_{LR})\big)\right]$$

where G denotes the generator, D the discriminator, E the expectation, x the generator sample input (here, the low-resolution source images), and I_{LR} a low-resolution source image input to the generator. The content loss is

$$L_{con} = \left\lVert G(I_{LR}) - I_{HRF}\right\rVert_F^2 + \lambda \left\lVert \nabla G(I_{LR}) - \nabla I_{HRF}\right\rVert_F^2$$

where ‖·‖_F denotes the Frobenius norm, y the real-sample input (here, the label images), I_{HRF} a label image, ∇ the gradient operator, and λ a weight coefficient. The perceptual loss is

$$L_{per} = \left\lVert \phi\big(G(I_{LR})\big) - \phi\big(I_{HRF}\big)\right\rVert_F^2$$

where φ denotes the feature extractor. In summary, the final composite loss function is

$$\min_{\theta_G} L_G = L_{adv} + \lambda_1 L_{con} + \lambda_2 L_{per}$$

where θ_G denotes the training parameters of the generator, and λ_1 and λ_2 are the weights of the content loss L_con and the perceptual loss L_per, respectively.

The discriminator loss function is

$$\min_{\theta_D} L_D = \mathbb{E}_{x}\left[D\big(G(x)\big)\right] - \mathbb{E}_{y}\left[D(y)\right] + \lambda_3\,\mathbb{E}_{\hat{x}}\left[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\right]$$

where x̂ is obtained by random sampling on the straight line between pairs of points drawn from the label data distribution y and the generated data distribution G(x), i.e.

$$\hat{x} = \alpha\, y + (1-\alpha)\, G(x), \quad \alpha \in [0,1]$$

θ_D denotes the training parameters of the discriminator, and λ_3 is the weight of the gradient penalty term.
In the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, when λ = 5, λ_1 = 1, λ_2 = 0.1, and λ_3 = 10, the loss terms are balanced and the network training achieves a good effect.
According to the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, the multiband low-resolution source images input to the generator are obtained as follows: the multiband images are partitioned into blocks by a sliding window; the blocked images are augmented by rotation and mirroring; the length and width of each block are then reduced to one quarter of the original by a Resize operation to obtain low-resolution images; finally, the low-resolution images are enlarged to the target size by bicubic interpolation, yielding the multiband low-resolution source images input to the generator.
According to the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, the label images are obtained as follows: the multiband images are partitioned into blocks by a sliding window; the blocks serve as high-resolution source images and are input separately to several fusion algorithms; the fusion results are then evaluated with several objective indices (information entropy, standard deviation, mutual information, average gradient, spatial frequency, contrast, peak signal-to-noise ratio, correlation coefficient, structural similarity, visual information fidelity, and edge information preservation value), and the result of the fusion algorithm with the largest number of optimal indices is selected as the label image.
In the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, the feature extraction sub-network consists of 8 consecutive convolutional layers, each followed by a rectified linear unit (ReLU); the feature combination sub-network consists of 1 merge (Concatenate) layer; the decoding sub-network consists of 8 consecutive convolutional layers, the first 7 of which are each followed by a ReLU; the discriminator consists of 6 convolutional layers, 3 max-pooling layers, and 2 fully connected layers, each convolutional layer being followed by a Leaky ReLU activation function.
According to the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, in order to improve the similarity between the input multiband low-resolution source images and the output fused image and preserve more information from the source images, three skip connections are placed between the feature extraction sub-network and the decoding sub-network.
To solve the problem of poor image quality when fusing low-resolution source images, the invention establishes an end-to-end network model that performs image fusion and image super-resolution synchronously, so that low-resolution source images are fused into a high-quality result. The invention avoids performing image fusion and image super-resolution step by step; the fusion result has high definition and distinct edges, and better conforms to the visual characteristics of the human eye.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is the generator network structure.
Fig. 3 is the discriminator network structure.
Fig. 4 is a high resolution infrared long wave image.
FIG. 5 is a low resolution infrared long wave image.
Fig. 6 is a high resolution near infrared image.
Fig. 7 is a low resolution near infrared image.
Fig. 8 is a high resolution visible light image.
Fig. 9 is a low resolution visible light image.
FIG. 10 is a fused image of the present invention.
Detailed Description
The multiband image synchronous fusion and enhancement method based on the improved WGAN-GP comprises the following steps:
The first step designs and constructs the generative adversarial network: the generative adversarial network is divided into a generator model and a discriminator model.
The generator network comprises a feature extraction (encoding) sub-network, a feature combination sub-network, and a decoding sub-network. The feature extraction (encoding) sub-network extracts features from the source images of different wavebands; the feature combination sub-network combines the extracted features in a high-level feature space; and the decoding sub-network converts the combined feature map into the final fused image. Specifically:
the feature extraction (coding) subnetwork consists of 8 consecutive convolutional layers, each of which is followed by a modified linear unit (ReLU), the number of these convolutional filters is 32, 64, 128, 256 and 256, respectively, each convolutional layer uses a 3 × 3 convolutional kernel, the step size is 1, and the padding is 0; the feature combination sub-network consists of 1 merging connection layer (Concatenate) and aims to combine the extracted different wave band information in a high-level feature space; the decoding subnetwork is also composed of 8 consecutive convolutional layers, unlike the feature extraction (coding) network, the number of these convolutional filters is 256, 128, 64, 32, and 1, respectively, each convolutional layer uses a 3 × 3 convolutional kernel, the step size is 1, and the padding is 0; in order to improve the similarity between the input multiband low-resolution source image and the output fusion image and save more information in the source image, three hop connections are arranged between a feature extraction (coding) sub-network and a decoding sub-network.
The discriminator network comprises 6 convolutional layers, 3 max-pooling layers, and 2 fully connected layers; each convolutional layer is followed by a Leaky ReLU activation function. The numbers of convolution filters are 64, 128, 256, and 256; each convolutional layer uses a 3 × 3 convolution kernel with stride 1 and padding 1. The two fully connected layers have 128 and 1 neurons, respectively.
The specific process is as follows: the multiband low-resolution source images are input to the generator; features are first extracted by the feature extraction (encoding) network for each waveband; the extracted band features are then combined in a high-level feature space by the feature combination network; and the combined feature map is input to the decoding sub-network, which expands the information layer by layer and converts the feature map into a fused image. The discriminator acts as a classifier whose task is to distinguish generated data from real data as well as possible (i.e., to distinguish the fused image output by the generator from the label image); the fusion result and the label image are each sent to the discriminator network for classification.
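The merge (Concatenate) layer in the feature combination sub-network can be illustrated with a channel-wise concatenation of the three band-specific feature maps; the shapes and array names below are hypothetical stand-ins, not values from the patent.

```python
import numpy as np

# Illustrative stand-in for the merge (Concatenate) layer: three
# band-specific feature maps of shape (H, W, C) are joined along the
# channel axis in the high-level feature space before decoding.
h, w, c = 16, 16, 256
lwir_feat = np.zeros((h, w, c))   # infrared long-wave branch features
nir_feat = np.zeros((h, w, c))    # near-infrared branch features
vis_feat = np.zeros((h, w, c))    # visible-light branch features

merged = np.concatenate([lwir_feat, nir_feat, vis_feat], axis=-1)
print(merged.shape)  # (16, 16, 768)
```

The decoder then operates on the stacked 768-channel map, which is why its first layers carry the largest filter counts.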
The second step establishes the input image data set: 20 groups of images are selected from the public TNO image fusion data set as the training set and 8 groups as the test set; each group comprises images of three wavebands: infrared long-wave (8-12 μm), near-infrared (700-1000 nm), and visible light (390-700 nm). The training-set images are partitioned by a sliding window of size 128 × 128 with step 64; the blocked images are augmented by rotation and mirroring; the length and width are then each reduced to one quarter of the original by a Resize operation to form the low-resolution training set. The test-set images are directly reduced to one quarter of the original size by the Resize operation.
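The sliding-window blocking and 4x reduction described in the second step can be sketched as follows. The strided downscale is a crude stand-in for the Resize operation (a real pipeline would use proper interpolation), and the helper names are illustrative.

```python
import numpy as np

def block_image(img, window=128, step=64):
    """Partition a 2-D image into overlapping window x window patches,
    mirroring the sliding-window settings described above."""
    patches = []
    h, w = img.shape
    for y in range(0, h - window + 1, step):
        for x in range(0, w - window + 1, step):
            patches.append(img[y:y + window, x:x + window])
    return patches

def downscale4(patch):
    """Crude 4x downscale by striding, standing in for the Resize step."""
    return patch[::4, ::4]

img = np.arange(256 * 256, dtype=np.float32).reshape(256, 256)
patches = block_image(img)               # 3 x 3 = 9 overlapping patches
low_res = [downscale4(p) for p in patches]
print(len(patches), low_res[0].shape)    # 9 (32, 32)
```

A 256 × 256 source yields 9 overlapping 128 × 128 blocks at step 64, each reduced to 32 × 32 as a low-resolution training sample.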
The third step establishes the label image data set: the blocked training-set images serve as high-resolution source images and are input separately to several well-performing fusion algorithms (LP-max-mean, DTCWT-max-mean, NSCT-max-mean, NSST-max-mean, LP-max-CL, and NSST-max-CL, where max-mean means the high frequencies are fused by taking the maximum absolute value and the low frequencies by averaging, and max-CL means the high frequencies are fused by taking the maximum and the low frequencies by window definition). The fusion results are then evaluated with several objective indices: Information Entropy (IE), Standard Deviation (SD), Mutual Information (MI), Average Gradient (AG), Spatial Frequency (SF), Contrast (C), Peak Signal-to-Noise Ratio (PSNR), Correlation Coefficient (CC), Structural Similarity (SSIM), Visual Information Fidelity (VIFF), and Edge Information Preservation Value (EIPV). The fusion result of the method with the largest number of optimal indices is selected as the label image to build the label image data set.
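The label-selection rule in the third step (keep the result of the algorithm that wins the most objective indices) can be sketched as below. The scores are made-up numbers, only some algorithms and four indices are shown for brevity, and all indices are assumed to be "higher is better".

```python
# Hypothetical sketch of the label-selection rule: for each candidate
# fusion algorithm, count how many objective indices it wins, then keep
# the result of the algorithm with the most wins.
scores = {
    "LP-max-mean":    [7.1, 34.2, 2.1, 5.0],
    "NSST-max-CL":    [7.3, 33.9, 2.4, 5.2],
    "DTCWT-max-mean": [7.0, 35.0, 2.0, 4.9],
}

def count_wins(scores):
    algos = list(scores)
    n_idx = len(next(iter(scores.values())))
    wins = {a: 0 for a in algos}
    for i in range(n_idx):
        best = max(algos, key=lambda a: scores[a][i])
        wins[best] += 1
    return wins

wins = count_wins(scores)
label_algo = max(wins, key=wins.get)
print(label_algo)  # NSST-max-CL (wins 3 of the 4 toy indices)
```

In the patent's setting the same tally runs over all eleven indices and all six candidate algorithms, per training block.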
The fourth step establishes the loss functions of the generative adversarial network: the generator loss function comprises three parts, the adversarial loss L_adv, the content loss L_con, and the perceptual loss L_per.

The adversarial loss L_adv is:

$$L_{adv} = -\mathbb{E}_{x}\left[D\big(G(I_{LR})\big)\right]$$

where G denotes the generator, D the discriminator, E the mathematical expectation, x the generator input samples (here, the multiband low-resolution source images), and I_{LR} a multiband low-resolution source image input to the generator.

The content loss L_con is:

$$L_{con} = \left\lVert G(I_{LR}) - I_{HRF}\right\rVert_F^2 + \lambda \left\lVert \nabla G(I_{LR}) - \nabla I_{HRF}\right\rVert_F^2$$

where ‖·‖_F denotes the Frobenius norm, y the real-sample input (here, the label images), I_{HRF} a label image, ∇ the gradient operator, and λ a weight coefficient.

The perceptual loss L_per is:

$$L_{per} = \sum_{i} \left\lVert \phi_i\big(G(I_{LR})\big) - \phi_i\big(I_{HRF}\big)\right\rVert_F^2$$

where φ_i denotes the feature extractor VGG19: the feature maps of the 2nd, 4th, 8th, and 12th convolutional layers of a pre-trained VGG19 model are used to compute the perceptual loss, with each layer contributing equally.

In summary, the final composite loss function is:

$$\min_{\theta_G} L_G = L_{adv} + \lambda_1 L_{con} + \lambda_2 L_{per}$$

where θ_G denotes the training parameters of the generator, and λ_1 and λ_2 are the weights of the content loss L_con and the perceptual loss L_per, respectively.

The discriminator loss function is:

$$\min_{\theta_D} L_D = \mathbb{E}_{x}\left[D\big(G(x)\big)\right] - \mathbb{E}_{y}\left[D(y)\right] + \lambda_3\,\mathbb{E}_{\hat{x}}\left[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\right]$$

where x̂ is obtained by random sampling on the straight line between pairs of points drawn from the label data distribution y and the generated data distribution G(x), i.e.

$$\hat{x} = \alpha\, y + (1-\alpha)\, G(x), \quad \alpha \in [0,1]$$

θ_D denotes the training parameters of the discriminator; the first two terms estimate the Wasserstein distance, the last term is the gradient penalty term that regularizes the network, and λ_3 is the weight of the gradient penalty term.
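The interpolation point x̂ and the gradient penalty term can be illustrated numerically. To avoid automatic differentiation, this toy sketch uses a linear critic D(v) = w · v, whose gradient with respect to the input is just w; all values are made up and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch of the WGAN-GP interpolation and penalty: a linear critic
# D(v) = w . v has grad_v D(v) = w, so the penalty can be computed in
# closed form instead of via autodiff.
w = np.array([0.6, 0.8])           # critic weights, L2 norm ~= 1
y = np.array([1.0, 2.0])           # a "label" sample
gx = np.array([0.0, 0.5])          # a generated sample G(x)

alpha = rng.uniform(0.0, 1.0)
x_hat = alpha * y + (1.0 - alpha) * gx   # point on the line between samples

grad_norm = np.linalg.norm(w)            # ||grad_{x_hat} D(x_hat)||_2
penalty = (grad_norm - 1.0) ** 2         # penalty term (before lambda_3)
print(round(penalty, 6))  # ~0.0: a critic with unit gradient norm is unpenalized
```

Because the penalty is zero exactly when the critic's gradient norm is 1, the λ_3 term pushes the discriminator toward the 1-Lipschitz behaviour the Wasserstein estimate requires.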
the fifth step is training to generate a confrontation network: inputting a multiband low-resolution source image into a generator, and sequentially obtaining a fused image through a feature extraction (coding) sub-network, a feature combination sub-network and a decoding sub-network; the label images corresponding to the fusion image and the multiband low-resolution source image are sent to a discriminator to be classified and identified, the fusion image output by the generator and the label image of the discriminator tend to be similar continuously through dynamic games of the generator and the discriminator, a generator network model obtained when a loss function reaches the minimum value is a final multiband image synchronous fusion enhanced network model, the generator is used for inputting the multiband low-resolution source image, and the output result is a final fusion and enhanced result image.
Training of neural networks
The specific training process for the neural network is as follows:
(1) The generator and the discriminator are trained in turn: the generator is trained once, then the discriminator is trained once, and this cycle repeats until the generator and the discriminator reach dynamic balance.
(2) The generator loss function and the discriminator loss function are designed as above. Experiments show that with λ = 5, λ_1 = 1, λ_2 = 0.1, and λ_3 = 10, training of the generator and the discriminator reaches balance and the image effect is optimal. The quality of images generated with a single loss term, or with only two of them, is inferior to that obtained with the composite loss function.
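The alternating schedule in step (1) can be sketched as a minimal loop; `train_generator_step` and `train_discriminator_step` are hypothetical callables standing in for one optimizer update each, and the dummy losses are placeholders.

```python
# Minimal sketch of the alternating schedule: train the generator once,
# then the discriminator once, and repeat until a stopping condition
# (here, a fixed number of rounds).
def train(train_generator_step, train_discriminator_step, n_rounds):
    history = []
    for _ in range(n_rounds):
        g_loss = train_generator_step()   # one generator update
        d_loss = train_discriminator_step()  # one discriminator update
        history.append((g_loss, d_loss))
    return history

# Dummy step functions returning constant losses, to show the call pattern.
hist = train(lambda: 1.0, lambda: 0.5, n_rounds=3)
print(len(hist), hist[0])  # 3 (1.0, 0.5)
```

In practice the stopping condition would be the dynamic balance described above rather than a fixed round count.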
In the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, image fusion has no available ground-truth standard fusion result, and the training set and test set of the invention comprise images of three wavebands: infrared long-wave (8-12 μm), near-infrared (700-1000 nm), and visible light (390-700 nm). To ensure that the training images are of low resolution, the length and width of the augmented blocked images are each reduced to one quarter of the original by the Resize method to form the training set; 80% of all data-set images are used for training and 20% for validation. The validation images are the 8 groups of multiband images of the TNO image fusion data set, directly reduced to one quarter of the original length and width by the Resize method. When a validation image is input to the generator, it is first enlarged to the target size by bicubic interpolation.
According to the multiband image synchronous fusion and enhancement method based on the improved WGAN-GP, the batch size during network training is between 12 and 20; this value determines the stability of error convergence, but too large a value occupies more memory and too small a value is time-consuming. The learning rate determines the convergence speed of the network: too large causes oscillation, too small consumes more time and reduces efficiency, so the learning rate is chosen between 0.002 and 0.00002; here it is set to 0.0002.
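The hyperparameters quoted in the description can be collected into a single configuration; the dictionary keys and the concrete batch size of 16 are illustrative choices, not values fixed by the patent.

```python
# Hypothetical training configuration gathering the stated hyperparameters.
config = {
    "batch_size": 16,        # illustrative choice within the stated 12-20 range
    "learning_rate": 2e-4,   # stated value, within the 2e-3 .. 2e-5 range
    "lambda_grad": 5,        # lambda, gradient-difference weight in L_con
    "lambda_1": 1,           # content-loss weight
    "lambda_2": 0.1,         # perceptual-loss weight
    "lambda_3": 10,          # gradient-penalty weight
}

# Sanity checks against the ranges given in the text.
assert 12 <= config["batch_size"] <= 20
assert 2e-5 <= config["learning_rate"] <= 2e-3
print(config["learning_rate"])  # 0.0002
```

Keeping the weights and optimizer settings in one structure makes the balance experiment in step (2) easy to reproduce by varying one entry at a time.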

Claims (7)

1. The multiband image synchronous fusion and enhancement method based on the improved WGAN-GP is characterized by comprising the following steps of:
designing and constructing a generative adversarial network: the generative adversarial network is divided into a generator model and a discriminator model; the generator comprises a feature extraction sub-network, a feature combination sub-network, and a decoding sub-network, wherein the feature extraction sub-network extracts features from the source images of different wavebands, the feature combination sub-network combines the extracted features in a high-level feature space, and the decoding sub-network converts the combined feature map into a fused image;
using the generative adversarial network, first inputting the multiband low-resolution source images separately to the feature extraction sub-network, then combining the extracted multiband image features in the high-level feature space, and then reconstructing a fused image through the decoding sub-network;
and sending the reconstructed fused image and the label image corresponding to the multiband low-resolution source images separately to the discriminator for classification, iteratively optimizing the generator and the discriminator so that, through their dynamic game, the fused image output by the generator becomes increasingly similar to the label image; the generator model obtained when the generator and the discriminator reach dynamic balance is the final multiband image synchronous fusion-enhancement network model, which is used to fuse multiband low-resolution source images.
2. The improved WGAN-GP based multiband image synchronous fusion and enhancement method according to claim 1, wherein the generator loss function comprises three parts: the adversarial loss L_adv, the content loss L_con, and the perceptual loss L_per; wherein the adversarial loss is

$$L_{adv} = -\mathbb{E}_{x}\left[D\big(G(I_{LR})\big)\right]$$

where G denotes the generator, D the discriminator, E the expectation, x the generator sample input, and I_{LR} a low-resolution source image input to the generator; the content loss is

$$L_{con} = \left\lVert G(I_{LR}) - I_{HRF}\right\rVert_F^2 + \lambda \left\lVert \nabla G(I_{LR}) - \nabla I_{HRF}\right\rVert_F^2$$

where ‖·‖_F denotes the Frobenius norm, y denotes the true sample input, I_{HRF} denotes the label image, ∇ denotes the gradient operator, and λ denotes a weight coefficient; the perceptual loss is

$$L_{per} = \left\lVert \phi\big(G(I_{LR})\big) - \phi\big(I_{HRF}\big)\right\rVert_F^2$$

where φ denotes the feature extractor; in summary, the final composite loss function is

$$\min_{\theta_G} L_G = L_{adv} + \lambda_1 L_{con} + \lambda_2 L_{per}$$

where θ_G denotes the training parameters of the generator, and λ_1 and λ_2 denote the weights of the content loss L_con and the perceptual loss L_per, respectively;

the discriminator loss function is

$$\min_{\theta_D} L_D = \mathbb{E}_{x}\left[D\big(G(x)\big)\right] - \mathbb{E}_{y}\left[D(y)\right] + \lambda_3\,\mathbb{E}_{\hat{x}}\left[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\right]$$

where x̂ is obtained by random sampling on the straight line between pairs of points drawn from the label data distribution y and the generated data distribution G(x), i.e.

$$\hat{x} = \alpha\, y + (1-\alpha)\, G(x), \quad \alpha \in [0,1]$$

θ_D denotes the training parameters of the discriminator, and λ_3 is the weight of the gradient penalty term.
3. The improved WGAN-GP based multiband image synchronous fusion and enhancement method according to claim 2, wherein λ = 5, λ_1 = 1, λ_2 = 0.1, and λ_3 = 10, with which the loss terms are balanced and the network training achieves a good effect.
4. The improved WGAN-GP based multiband image synchronous fusion and enhancement method according to any one of claims 1 to 3, wherein the multiband low-resolution source images input to the generator are obtained as follows: the multiband images are partitioned into blocks by a sliding window; the blocked images are augmented by rotation and mirroring; the length and width of each block are then reduced to one quarter of the original by a Resize operation to obtain low-resolution images; finally, the low-resolution images are enlarged to the target size by bicubic interpolation, yielding the multiband low-resolution source images input to the generator.
5. The improved WGAN-GP based multiband image synchronous fusion and enhancement method according to any one of claims 1 to 3, wherein the label images are obtained as follows: the multiband images are partitioned into blocks by a sliding window; the blocks serve as high-resolution source images and are input separately to several fusion algorithms; the fusion results are then evaluated with several objective indices (information entropy, standard deviation, mutual information, average gradient, spatial frequency, contrast, peak signal-to-noise ratio, correlation coefficient, structural similarity, visual information fidelity, and edge information preservation value), and the result of the fusion algorithm with the largest number of optimal indices is selected as the label image.
6. The improved WGAN-GP based multiband image synchronous fusion and enhancement method according to claim 3, wherein the feature extraction sub-network consists of 8 consecutive convolutional layers, each followed by a rectified linear unit (ReLU); the feature combination sub-network consists of 1 merge (Concatenate) layer; the decoding sub-network consists of 8 consecutive convolutional layers, the first 7 of which are each followed by a ReLU; and the discriminator consists of 6 convolutional layers, 3 max-pooling layers, and 2 fully connected layers, each convolutional layer being followed by a Leaky ReLU activation function.
7. The method of claim 6, wherein, in order to improve the similarity between the input multiband low-resolution source images and the output fused image and preserve more information from the source images, there are three skip connections between the feature extraction sub-network and the decoding sub-network.
CN202010538631.4A 2020-06-13 2020-06-13 Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP Active CN111696066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010538631.4A CN111696066B (en) 2020-06-13 2020-06-13 Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP


Publications (2)

Publication Number Publication Date
CN111696066A true CN111696066A (en) 2020-09-22
CN111696066B CN111696066B (en) 2022-04-19

Family

ID=72480821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010538631.4A Active CN111696066B (en) 2020-06-13 2020-06-13 Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP

Country Status (1)

Country Link
CN (1) CN111696066B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564109A (en) * 2018-03-21 2018-09-21 天津大学 A kind of Remote Sensing Target detection method based on deep learning
CN109325931A (en) * 2018-08-22 2019-02-12 中北大学 Based on the multi-modality images fusion method for generating confrontation network and super-resolution network
CN109816032A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 Zero sample classification method and apparatus of unbiased mapping based on production confrontation network
CN109948117A (en) * 2019-03-13 2019-06-28 南京航空航天大学 A kind of satellite method for detecting abnormality fighting network self-encoding encoder
EP3567531A1 (en) * 2018-05-09 2019-11-13 Volvo Car Corporation Forecast demand for mobility units
CN110555458A (en) * 2019-07-24 2019-12-10 中北大学 Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism
CN110675316A (en) * 2019-08-29 2020-01-10 中山大学 Multi-domain image conversion method, system and medium for generating countermeasure network based on condition
CN111127364A (en) * 2019-12-26 2020-05-08 吉林大学 Image data enhancement strategy selection method and face recognition image data enhancement method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Xiaoli et al.: "Multi-modal Image Fusion Based on Generative Adversarial Networks", Laser & Optoelectronics Progress *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508179A (en) * 2020-12-17 2021-03-16 上海依图网络科技有限公司 Method, apparatus and medium for constructing network structure
CN112634390A (en) * 2020-12-17 2021-04-09 深圳先进技术研究院 Wasserstein-based high-energy image synthesis method and device for generating confrontation network model
WO2022126480A1 (en) * 2020-12-17 2022-06-23 深圳先进技术研究院 High-energy image synthesis method and device based on wasserstein generative adversarial network model
CN112634390B (en) * 2020-12-17 2023-06-13 深圳先进技术研究院 High-energy image synthesis method and device for generating countermeasure network model based on Wasserstein
CN113112441A (en) * 2021-04-30 2021-07-13 中北大学 Multi-band low-resolution image synchronous fusion method based on dense network and local brightness traversal operator
CN113112441B (en) * 2021-04-30 2022-04-26 中北大学 Multi-band low-resolution image synchronous fusion method based on dense network and local brightness traversal operator
CN113435474A (en) * 2021-05-25 2021-09-24 中国地质大学(武汉) Remote sensing image fusion method based on double-generation antagonistic network
CN113657453A (en) * 2021-07-22 2021-11-16 珠海高凌信息科技股份有限公司 Harmful website detection method based on generation of countermeasure network and deep learning
CN115100089A (en) * 2022-06-08 2022-09-23 上海复瞰科技有限公司 Visible light and infrared image fusion method and system, storage medium and terminal

Also Published As

Publication number Publication date
CN111696066B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN111696066B (en) Multi-band image synchronous fusion and enhancement method based on improved WGAN-GP
CN109903228B (en) Image super-resolution reconstruction method based on convolutional neural network
CN109146784B (en) Image super-resolution reconstruction method based on multi-scale generation countermeasure network
Wang et al. Unsupervised real-world super-resolution: A domain adaptation perspective
CN111429355A (en) Image super-resolution reconstruction method based on generation countermeasure network
CN109949222B (en) Image super-resolution reconstruction method based on semantic graph
CN111915545B (en) Self-supervision learning fusion method of multiband images
CN113283444B (en) Heterogeneous image migration method based on generation countermeasure network
CN109472837A (en) The photoelectric image conversion method of confrontation network is generated based on condition
CN113610732B (en) Full-focus image generation method based on interactive countermeasure learning
CN113112441B (en) Multi-band low-resolution image synchronous fusion method based on dense network and local brightness traversal operator
US20030103667A1 (en) Neural network model for compressing/decompressing image/acoustic data files
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
CN116485934A (en) Infrared image colorization method based on CNN and ViT
Bose et al. Two headed dragons: Multimodal fusion and cross modal transactions
CN112037139B (en) Image defogging method based on RBW-cycleGAN network
CN111667406B (en) Video image super-resolution reconstruction method based on time domain correlation
CN116137043A (en) Infrared image colorization method based on convolution and transfomer
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
CN110866888B (en) Multi-modal MRI (magnetic resonance imaging) synthesis method based on potential information representation GAN (generic antigen)
CN115222636A (en) Remote sensing image fusion method based on multi-loss function for generating confrontation network
CN114677313A (en) Remote sensing image space spectrum fusion method and system for generating multi-confrontation network structure
Yang et al. TC-HISRNet: Hyperspectral image super-resolution network based on contextual band joint transformer and CNN
CN116580269B (en) Method for training model, method for processing image, electronic device and storage medium
Gowtham et al. Super-Resolution Generative Adversarial Network with Modified Architecture for Single Image Super-Resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant