CN115841438A - Infrared image and visible light image fusion method based on improved GAN network - Google Patents

Infrared image and visible light image fusion method based on improved GAN network

Info

Publication number
CN115841438A
CN115841438A (application CN202211300180.6A)
Authority
CN
China
Prior art keywords
image
layer
fusion
detail
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211300180.6A
Other languages
Chinese (zh)
Inventor
吴杰
高策
余毅
张艳超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Original Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Institute of Optics Fine Mechanics and Physics of CAS filed Critical Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority to CN202211300180.6A
Publication of CN115841438A
Legal status: Pending

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to an infrared image and visible light image fusion method based on an improved GAN network. The method first decomposes an infrared image and a visible light image into a base layer image and a detail layer image, respectively, through a guided filter; then enhances the base layer image of the visible light image by histogram mapping and fuses the base layer images using an improved LT algorithm; next, extracts feature information of the source images by combining a generator and an encoder in the improved GAN algorithm and guides the fusion of the detail layer images of the infrared and visible light images through adversarial learning; and finally, sets the relevant parameters for the base layer fused image and the detail layer fused image and obtains the final fused image by a weighted average method. The method effectively avoids the traditional fusion methods' excessive dependence on complex fusion rules and retains the global structural features and local textures of the image, giving the fused image a natural appearance.

Description

Infrared image and visible light image fusion method based on improved GAN network
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared image and visible light image fusion method based on an improved GAN network.
Background
Infrared and visible image fusion is a core technology in multi-sensor image fusion. An infrared image is a thermal radiation image of an object: it can produce clear foreground contour information even under the influence of environmental factors such as illumination, but it has the drawback of low resolution. A visible light image has the advantages of high resolution and rich target texture details, but it is easily affected by environmental factors such as haze and illumination. Fusing the two kinds of images can effectively combine their advantages and compensate for their respective weaknesses, producing clear images with a large amount of information; the technology is therefore widely applied in fields such as target detection, object recognition and military reconnaissance.
Image fusion is an information technology that fuses features such as details and contours of different images into a new image. A fused infrared and visible light image retains the clear texture details of the visible light image while keeping the anti-interference characteristics of the infrared image. Image fusion methods fall roughly into three categories. The first is the traditional transform-domain approach, which mainly includes Laplacian pyramid fusion and the non-subsampled Contourlet transform (NSCT); such algorithms analyze the image with filters according to its characteristics to realize fusion, and their drawback is an over-dependence on complex fusion rules. The second is the spatial-domain approach, which mainly includes principal component analysis; it reduces the dimensionality of the source image information and has the drawback of a large computational load. The third is the deep-learning approach, which mainly combines convolutional neural networks with image processing to better extract deep image features in preparation for the subsequent fusion stage. Beginning in 2018, GAN networks have gradually been applied to the field of image fusion; such methods are roughly divided into two stages: in the training stage, the images to be fused are used as a training set to train the network, which comprises a generator and a discriminator for extracting features and guiding image generation; the images are then fused through the trained model. The GAN algorithm avoids the complex fusion rules of traditional algorithms; however, random network initialization parameters make the result of the adversarial generation stage uncertain, and the extracted image features are blurred, so noise and artifacts appear in the fused image.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an infrared image and visible light image fusion method based on an improved GAN network, which can increase the detail texture information of an image after the infrared image and the visible light image are fused, effectively reduce artifacts and noise and keep high contrast.
In order to achieve the purpose, the invention adopts the following technical scheme:
an infrared image and visible light image fusion method based on an improved GAN network comprises the following steps:
(1) A guided filter is used to obtain the base layer image I_1^b of the infrared image I_1 and the base layer image I_2^b of the visible light image I_2, and the corresponding detail layer images I_1^d and I_2^d are computed respectively.
(2) The base layer image I_2^b is enhanced by histogram mapping to obtain the enhanced base layer image Î_2^b; the base layer image I_1^b and the enhanced base layer image Î_2^b are then fused by the improved LT algorithm to obtain the base layer fused image F_b, and the detail layer images I_1^d and I_2^d are used as a training set to perform fusion model training on the improved GAN network to obtain the detail layer fused image F_d. The improved GAN network comprises a generator, an encoder, a D_z discriminator and a target discriminator. The encoder maps the input detail layer image to a low-dimensional feature vector z through convolution operations; z and a label vector l are fed into the generator, which maps the low-dimensional representation to a high-dimensional image by deconvolution; the label vector l is concatenated with the low-dimensional feature vector z, and the resulting vector [z, l] is fed back to the generator, which outputs the detail layer fused image F_d. The detail layer image I_2^d is taken as the target image; the D_z discriminator is constructed between the detail layer images I_1^d and I_2^d and is used to force the distribution of the generated low-dimensional feature vector z to gradually approach the prior; the target discriminator is constructed between the detail layer image I_2^d and the detail layer fused image F_d and is used to make the detail layer fused image F_d learn adversarially against the detail layer image I_2^d.
(3) The corresponding pixels of the base layer fused image F_b and the detail layer fused image F_d are weighted and added to obtain the final fused image.
The invention provides an infrared image and visible light image fusion method based on an improved GAN network: a layered image fusion method based on an improved generative adversarial network (GAN) and the Laplacian transform (LT). The method first decomposes the infrared image and the visible light image into a base layer image and a detail layer image, respectively, through a guided filter; then enhances the base layer image of the visible light image by histogram mapping to improve the contour quality after fusion, and fuses the base layer images using an improved LT algorithm; next, extracts feature information of the source images by combining a generator and an encoder in the improved GAN algorithm and guides the fusion of the detail layer images of the infrared and visible light images through adversarial learning, making the detail information of the fused image richer; and finally, sets the relevant parameters for the base layer fused image and the detail layer fused image and obtains the final fused image by a weighted average method. The method effectively avoids the traditional fusion methods' excessive dependence on complex fusion rules and retains the global structural features and local textures of the image, giving the fused image a natural appearance. The invention has the following beneficial effects:
1) By exploiting the end-to-end characteristic of the GAN network, the method avoids the traditional methods' excessive dependence on hand-designed weight distributions and complex fusion rules, thereby improving algorithm performance;
2) The base layer image of the low-illumination visible light image is enhanced by histogram mapping and fused with the base layer image of the infrared image by the improved LT algorithm, so that the fused image has good contrast and overall appearance;
3) The target image and the generated image are discriminated by the dual discriminators and the encoder, and the generated image is guided by constraining the network parameters through low-rank prior decomposition, which improves the anti-interference capability and gives better stability and continuity;
4) The generator is combined with the depth feature migration module to enhance the extraction of detail information from the image; the fault tolerance is strong, information loss is well prevented, and the noise and artifacts of the fused image are reduced.
Drawings
FIG. 1 is a schematic diagram of an infrared image and visible light image fusion method based on an improved GAN network according to the present invention;
FIG. 2 is a schematic diagram of a histogram mapping algorithm;
FIG. 3 is a schematic diagram of the fusion of the base layer image I_1^b and the enhanced base layer image Î_2^b;
FIG. 4 is a block diagram of the improved GAN network;
FIG. 5 is a diagram of the densely connected network structure of the depth feature migration module;
FIG. 6 is a schematic view of fused image reconstruction.
Detailed Description
The technical solution of the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.
The invention provides an infrared image and visible light image fusion method based on an improved GAN network, as shown in figure 1, the method comprises the following steps:
(1) Guided filtering based image decomposition
The invention is a new visible light and infrared image fusion method, and the fusion principle is shown in FIG. 1. First, a guided filter is used to obtain the base layer image I_1^b of the infrared image I_1 and the base layer image I_2^b of the visible light image I_2, and the corresponding detail layer images I_1^d and I_2^d are computed respectively. Compared with traditional methods, the multi-scale decomposition based on the guided filter can separate overlapping features in space. Let I_k (k = 1, 2) be the input images, so that I_1 and I_2 are the infrared image and the visible light image respectively. For each input image I_k, the corresponding base layer image I_k^b is obtained by solving equation (1):

    I_k^b = argmin_{I_k^b} || I_k - I_k^b ||_F^2 + λ ( || g_x * I_k^b ||_F^2 + || g_y * I_k^b ||_F^2 )    (1)

where g_x = [-1, 1] denotes the horizontal gradient operator, g_y = [-1, 1]^T denotes the vertical gradient operator, and λ denotes the scale factor; in the above equation λ is set to 5.

The detail layer image I_k^d is obtained by subtracting the base layer part from the source image:

    I_k^d = I_k - I_k^b    (2)

The base layer image contains large-scale contour information, while the detail layer image contains small-scale texture information and edge information.
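As an illustrative, non-limiting sketch of this decomposition step (assuming equation (1) is the gradient-regularized smoothing written above, which has a closed-form solution in the Fourier domain; the function and variable names are not part of the invention):

```python
import numpy as np

def decompose(img, lam=5.0):
    """Two-scale decomposition into base and detail layers (equations (1)-(2)).

    The base layer solves min_B ||I - B||_F^2 + lam*(||g_x * B||^2 + ||g_y * B||^2)
    in the Fourier domain; the detail layer is the residual I - B.
    """
    I = img.astype(np.float64)
    h, w = I.shape
    gx = np.zeros((h, w)); gx[0, 0], gx[0, 1] = -1.0, 1.0   # horizontal gradient kernel
    gy = np.zeros((h, w)); gy[0, 0], gy[1, 0] = -1.0, 1.0   # vertical gradient kernel
    denom = 1.0 + lam * (np.abs(np.fft.fft2(gx)) ** 2 + np.abs(np.fft.fft2(gy)) ** 2)
    base = np.real(np.fft.ifft2(np.fft.fft2(I) / denom))
    detail = I - base
    return base, detail

# Example: base/detail decomposition of the infrared (I1) and visible (I2) images
# I1_b, I1_d = decompose(I1); I2_b, I2_d = decompose(I2)
```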
(2) Base layer image fusion and detail layer image fusion
For base layer fusion, the base layer image I_2^b of the visible light image is first enhanced with a sub-histogram mapping algorithm, which increases the contrast of low-illumination images and yields the enhanced base layer image Î_2^b; Î_2^b is then fused with I_1^b. The histogram mapping algorithm divides the image S into m × n subintervals using local minima, S = [s_1, s_2, ..., s_{m×n}], and reconstructs the enhanced base image T = [t_1, t_2, ..., t_{m×n}] from the sub-interval histograms, as shown in FIG. 2. The invention uses the local entropy to control the mapping range of each sub-histogram, which avoids the excessive histogram peaks produced by traditional methods. The local entropy measures the texture richness of an image block: the smaller the local entropy, the less texture the block contains and the larger the remapping range assigned to it. The mapping range R_j controlled by the local entropy is given by equation (3), where γ_j ∈ [0, 0.8] is a normalization factor, the local entropy is computed over the sub-histogram interval [m_j, m_{j+1}], p(·) denotes the probability density and u denotes a sub-histogram bin.

When the entropy of a histogram interval is small, R_j allocates more mapping intervals to it. The enhanced base layer image Î_2^b obtained after the enhancement process is given by equation (4), where R_t is the local entropy of each partition j.
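The exact formulas (3) and (4) are reproduced only as images in the original publication; the following Python sketch therefore assumes one plausible reading, in which each sub-histogram is allocated an output range that grows as its local entropy shrinks and is then equalized within that range. The function names, the weighting scheme and the split points are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def entropy(p):
    """Entropy of a (possibly unnormalized) sub-histogram."""
    s = p.sum()
    if s <= 0:
        return 0.0
    p = p[p > 0] / s
    return float(-(p * np.log2(p)).sum())

def sub_histogram_enhance(img, splits):
    """Hypothetical sub-histogram mapping enhancement (cf. equations (3)-(4)).

    `splits` are the local-minimum gray levels that partition the histogram.
    Each partition j gets an output range R_j that grows as its local entropy
    shrinks (low-texture partitions are stretched more), and the partition is
    then histogram-equalized into its allocated range.
    """
    img = img.astype(np.uint8)
    bounds = [0] + sorted(splits) + [256]
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    hist /= hist.sum()

    # Entropy per partition; smaller entropy -> larger output range.
    ents = np.array([entropy(hist[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])])
    weights = (ents.max() - ents) + 1e-3
    ranges = 255.0 * weights / weights.sum()

    out = np.zeros_like(img, dtype=np.float64)
    start = 0.0
    for (lo, hi), rj in zip(zip(bounds[:-1], bounds[1:]), ranges):
        mask = (img >= lo) & (img < hi)
        if mask.any():
            cdf = np.cumsum(hist[lo:hi])
            cdf = cdf / cdf[-1] if cdf[-1] > 0 else cdf
            out[mask] = start + rj * cdf[img[mask] - lo]
        start += rj
    return np.clip(out, 0, 255).astype(np.uint8)

# Example usage with illustrative split points (local minima of the histogram):
# I2_b_enh = sub_histogram_enhance(I2_b, splits=[64, 128, 192])
```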
The improved LT algorithm (i.e. the fusion rule based on the Laplacian pyramid transform) fuses the enhanced base layer image Î_2^b with the base layer image I_1^b through the following steps.

As shown in FIG. 3, the enhanced base layer image Î_2^b and the base layer image I_1^b are decomposed into pyramids {B_2^l} and {B_1^l} respectively, where l ∈ {1, 2, 3, 4}. For the top-level images, the regional average gradient is computed over an m × n window centered on each pixel, and the top-level images of the two pyramids are fused according to it. The regional average gradient is:

    G_k(i, j) = (1 / (m × n)) Σ_x Σ_y sqrt( (ΔI_x^2 + ΔI_y^2) / 2 )    (5)

where ΔI_x and ΔI_y are the first-order differences of the pixel f(x, y) along the x-axis and the y-axis, and G_1(i, j) and G_2(i, j) denote the regional average gradients of each top-level pixel, which reflect the sharpness of the image. The top-level image fusion is then expressed by equation (6), which selects between the two top-level coefficients according to their regional average gradients.

Fusion of the bottom three LT layers requires the regional energy of each layer of the infrared and visible light pyramids:

    E_k^l(i, j) = Σ_p Σ_q w(p, q) · [ B_k^l(i + p, j + q) ]^2    (7)

where p and q range over the 3 × 3 neighbourhood and w is a 3 × 3 weight matrix. For 0 < l < 4, the fusion result of the l-th layer of the LT image pyramid is given by equation (8), which combines the coefficients of the two pyramids according to their regional energies.
The base layer fusion algorithm for infrared and visible light images based on the improved LT enhances the base layer of the low-light visible light image by histogram mapping and fuses it with the base layer of the infrared image through the improved LT algorithm, so that the fused image has good contrast and overall appearance.
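An illustrative sketch of this base-layer fusion step is given below. Because equations (6) and (8) are reproduced only as images, the fusion rules are assumed here to be maximum-regional-average-gradient selection at the top level and maximum-regional-energy selection at the lower levels; the OpenCV pyramid routines and all parameter values are implementation choices, not part of the invention:

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Build an `levels`-layer Laplacian pyramid (last element is the coarsest level)."""
    g = [img.astype(np.float32)]
    for _ in range(levels - 1):
        g.append(cv2.pyrDown(g[-1]))
    lp = [g[i] - cv2.pyrUp(g[i + 1], dstsize=g[i].shape[::-1]) for i in range(levels - 1)]
    lp.append(g[-1])
    return lp

def region_energy(x, ksize=3):
    """Regional energy over a ksize x ksize window (cf. equation (7))."""
    return cv2.boxFilter(x * x, -1, (ksize, ksize), normalize=False)

def avg_gradient(x, ksize=7):
    """Regional average gradient over a ksize x ksize window (cf. equation (5))."""
    gy, gx = np.gradient(x)
    return cv2.boxFilter(np.sqrt((gx * gx + gy * gy) / 2.0), -1, (ksize, ksize))

def fuse_base_layers(b_ir, b_vis_enh, levels=4):
    p1, p2 = laplacian_pyramid(b_ir, levels), laplacian_pyramid(b_vis_enh, levels)
    fused = []
    for l, (a, b) in enumerate(zip(p1, p2)):
        if l == levels - 1:   # top level: keep the coefficient with larger average gradient
            mask = avg_gradient(a) >= avg_gradient(b)
        else:                 # lower levels: keep the coefficient with larger regional energy
            mask = region_energy(a) >= region_energy(b)
        fused.append(np.where(mask, a, b))
    # Reconstruct the fused base layer F_b from the fused pyramid
    out = fused[-1]
    for l in range(levels - 2, -1, -1):
        out = cv2.pyrUp(out, dstsize=fused[l].shape[::-1]) + fused[l]
    return out
```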
For detail layer fusion, the detail layer image I_1^d of the infrared image and the detail layer image I_2^d of the visible light image are used as the training set to perform fusion model training on the improved GAN network, and the image fusion process is optimized and guided through adversarial learning to obtain the detail layer fused image F_d. The improved GAN network comprises a generator, an encoder and two discriminators, as shown in FIG. 4. The encoder comprises five convolution layers and a BN layer; the generator comprises three densely connected modules and three deconvolution modules; the D_z discriminator and the target discriminator each consist of convolution layers, a BN layer, a Leaky ReLU layer and a fully connected layer. The improved GAN network of the invention differs from the conventional generative adversarial network (GAN) in two respects: first, a discriminator and an encoder are added to provide a prior-optimized model that guides the generation of the fused image; second, a depth feature migration module is added to the generator network, so that the network extracts the depth features of the source images better and obtains the best fusion effect. Compared with the conventional GAN, with the added encoder and discriminator, the detail layer fusion algorithm for infrared and visible light images based on the improved GAN constrains the network parameters by low-rank prior decomposition, guides the generated fused image closer to the target image, and designs the loss function of the network to ensure the stability of the network model.
The purpose of the encoder is to reduce the dimensionality of the source image while providing a prior initialization constraint model. The detail layer image is sent into the newly added encoder E as the input x; E maps x to a low-dimensional feature vector z through convolution operations, z is concatenated with the label vector l, and the network parameters are constrained by low-rank prior decomposition:

    E(x) = z ∈ R^n    (9)

where R^n denotes an n-dimensional space, and the output preserves the texture features of the input x.
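A minimal PyTorch sketch of such an encoder is shown below, assuming five stride-2 convolutions with batch normalization followed by global pooling and a linear projection to z; the channel widths and the latent size n are illustrative and are not specified by the patent:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical encoder E: maps a detail layer image x to a low-dimensional
    feature vector z, E(x) = z in R^n (equation (9))."""
    def __init__(self, in_ch=1, n=128):
        super().__init__()
        chs = [in_ch, 32, 64, 128, 256, 256]
        layers = []
        for i in range(5):                                  # five convolution layers
            layers += [nn.Conv2d(chs[i], chs[i + 1], 4, stride=2, padding=1),
                       nn.BatchNorm2d(chs[i + 1]),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(chs[-1], n)

    def forward(self, x):
        h = self.conv(x).mean(dim=(2, 3))   # global average pooling -> (B, 256)
        return self.fc(h)                   # z in R^n
```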
The purpose of the generator is to extract more detail information from the source images and to generate a fused image with rich texture; a depth feature migration module is introduced to realize feature migration and to raise the dimensionality of the low-dimensional feature vector z, as shown in FIG. 3. The depth feature migration module extracts features with real semantic information through dense connections. Suppose the network has L layers in total, x_0 is the input of the network, x_l is the output of the l-th layer, x_{l-1} is the output of the (l-1)-th layer, and H_l(·) is the nonlinear transformation acting on the l-th layer. The densely connected network structure of the depth feature migration module is shown in FIG. 5, where Input denotes the image input and BN-ReLU-Conv denotes a regularization layer, an activation layer and a convolution layer connected in sequence; the relationship is given by formula (10). The depth feature migration module integrates the features of each stage through dense connections and adds the output features of the previous layers to each input, thereby expanding the dimensionality. In the generator, the low-dimensional representation is mapped to a high-dimensional image by deconvolution; the label vector l is concatenated with the low-dimensional feature vector z, and the new vector [z, l] is fed back to the generator, which outputs the detail layer fused image F_d as in formula (11):

    x_l = H_l([x_0, x_1, ..., x_{l-1}])    (10)

    F_d = G(z, l) = G(E(x), l)    (11)

The generator network design combined with the depth feature migration module integrates the features of each stage through skip connections, adds the output features of the previous layer at each input, and maps the low-dimensional image to a high-dimensional image in combination with deconvolution, so that the fused image has richer detail information and artifacts and noise are effectively reduced.
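The following PyTorch sketch illustrates a densely connected block of the kind described by formula (10) and a generator of the kind described by formula (11); the channel counts, the latent and label sizes, the 16×16 starting resolution and the tanh output are illustrative assumptions rather than details taken from the patent:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Depth-feature-migration style block: each layer receives the
    concatenation of all previous outputs, x_l = H_l([x_0, ..., x_{l-1}])."""
    def __init__(self, in_ch, growth=32, n_layers=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.blocks.append(nn.Sequential(          # BN-ReLU-Conv, as in FIG. 5
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, 3, padding=1)))
            ch += growth
        self.out_ch = ch

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class Generator(nn.Module):
    """Hypothetical generator G(z, l): project [z, l] to a coarse feature map,
    then alternate three dense blocks with three stride-2 deconvolutions and
    output the detail layer fused image F_d = G(E(x), l)."""
    def __init__(self, n=128, n_label=2, base=64, start=16):
        super().__init__()
        self.base, self.start = base, start
        self.fc = nn.Linear(n + n_label, base * start * start)
        body, ch = [], base
        for _ in range(3):
            dense = DenseBlock(ch)
            body += [dense,
                     nn.ConvTranspose2d(dense.out_ch, base, 4, stride=2, padding=1),
                     nn.ReLU(inplace=True)]
            ch = base
        self.body = nn.Sequential(*body)
        self.head = nn.Conv2d(base, 1, 3, padding=1)

    def forward(self, z, label):
        h = self.fc(torch.cat([z, label], dim=1))
        h = h.view(-1, self.base, self.start, self.start)
        return torch.tanh(self.head(self.body(h)))
```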
The purpose of the discriminators is to distinguish the generated image from the target image and to pass information to the fusion process of the generator. The detail layer image I_2^d, which has richer texture features, is taken as the target image; the D_z discriminator is constructed between the detail layer images I_1^d and I_2^d, and the target discriminator is constructed between the detail layer image I_2^d and the detail layer fused image F_d. The D_z discriminator is used to force the distribution of the generated low-dimensional feature vector z to gradually approach the prior. The target discriminator is used to make the detail layer fused image F_d learn adversarially against the detail layer image I_2^d, so that the detail layer fused image F_d becomes more realistic. Referring again to FIG. 4, the D_z discriminator and the target discriminator each comprise three convolution modules, a normalization layer, an activation function layer and a fully connected layer; the normalization layer uses BN to normalize the data, the activation function layer uses Leaky ReLU as the activation function, and the fully connected layer makes the prediction on the data.
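A minimal PyTorch sketch of a discriminator with this structure follows (three convolution modules with batch normalization and Leaky ReLU, then a fully connected scoring layer); the channel widths are illustrative, and a D_z discriminator of the same style would take the latent vector z rather than an image:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Hypothetical target discriminator: scores how real a detail layer image looks."""
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(3):                              # three convolution modules
            out_ch = base * 2 ** i
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.BatchNorm2d(out_ch),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(ch, 1)                      # fully connected prediction layer

    def forward(self, x):
        h = self.conv(x).mean(dim=(2, 3))               # global average pooling
        return self.fc(h)
```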
The training loss function of the encoder E and the generator G in the invention is given by formula (12), where L(·) denotes a norm, x is the input, i.e. the detail layer image, G(·) and E(·) denote the generator output and the encoder output respectively, and l is the label vector.

The loss function of the D_z discriminator and the loss function L_D of the target discriminator are given by formula (13); in these expressions, b and c denote the real labels of the two detail layer images, with values in the ranges 0.4-0.6 and 0.7-1 respectively, d denotes the real label of the fused detail layer, with values in the range 0-0.3, N denotes the number of fused images, and the remaining terms denote the classification results of the target detail layer image and of F_d.
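Since formulas (12) and (13) are reproduced only as images, the sketch below assumes a least-squares (label-matching) form for the discriminator losses and a norm-based reconstruction term plus an adversarial term for the generator/encoder loss; the choice of the L1 norm, the weight lam and the function names are assumptions, not details taken from the patent:

```python
import torch

def soft_label(batch, lo, hi, device="cpu"):
    """Soft labels drawn from the ranges stated in the patent
    (b: 0.4-0.6, c: 0.7-1, d: 0-0.3)."""
    return lo + (hi - lo) * torch.rand(batch, 1, device=device)

def discriminator_loss(d_real, d_fused, real_label, fused_label):
    """Assumed least-squares discriminator loss: push the score of the target
    detail image toward its real label and the score of the fused image toward
    its (low) fused label."""
    return ((d_real - real_label) ** 2).mean() + ((d_fused - fused_label) ** 2).mean()

def generator_encoder_loss(x, x_fused, d_fused, real_label, lam=1.0):
    """Assumed generator/encoder loss: an L1 norm between the input detail image x
    and G(E(x), l) (cf. formula (12)) plus a term asking the target discriminator
    to score the fused image as real."""
    recon = torch.abs(x - x_fused).mean()
    adv = ((d_fused - real_label) ** 2).mean()
    return recon + lam * adv
```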
(3) Fused image reconstruction
Through the above steps (1) and (2), the base layer fused image F_b and the detail layer fused image F_d are obtained, and they are fused to obtain the final fused image. The corresponding pixels of the base layer fused image F_b and the detail layer fused image F_d are weighted and added as in formula (14) to obtain the final fused image, as shown in FIG. 6:

    F(x, y) = α F_b(x, y) + β F_d(x, y)    (14)

where (x, y) denotes the pixel coordinates of the fused image, α is the fusion parameter of the base layer and β is the fusion parameter of the detail layer; in this patent α = 0.6 and β = 0.4.
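A one-function sketch of this reconstruction step follows; clipping the result to an 8-bit range is an implementation assumption:

```python
import numpy as np

def reconstruct(f_base, f_detail, alpha=0.6, beta=0.4):
    """Final fusion (formula (14)): pixel-wise weighted addition of the base
    layer fused image F_b and the detail layer fused image F_d."""
    fused = alpha * f_base.astype(np.float64) + beta * f_detail.astype(np.float64)
    return np.clip(fused, 0, 255).astype(np.uint8)
```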
The technical features of the embodiments described above may be combined arbitrarily; for brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.

The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. An infrared image and visible light image fusion method based on an improved GAN network is characterized by comprising the following steps:
(1) A guided filter is used to obtain the base layer image I_1^b of the infrared image I_1 and the base layer image I_2^b of the visible light image I_2, and the corresponding detail layer images I_1^d and I_2^d are computed respectively;
(2) The base layer image I_2^b is enhanced by histogram mapping to obtain the enhanced base layer image Î_2^b; the enhanced base layer image Î_2^b and the base layer image I_1^b are fused by the improved LT algorithm to obtain the base layer fused image F_b, and the detail layer images I_1^d and I_2^d are used as a training set to perform fusion model training on the improved GAN network to obtain the detail layer fused image F_d, wherein the improved GAN network comprises a generator, an encoder, a D_z discriminator and a target discriminator; the encoder maps the input detail layer image to a low-dimensional feature vector z through convolution operations, z and a label vector l are fed into the generator, the low-dimensional representation is mapped to a high-dimensional image in the generator by deconvolution, the label vector l is concatenated with the low-dimensional feature vector z, and the resulting vector [z, l] is fed back to the generator, which outputs the detail layer fused image F_d; the detail layer image I_2^d is taken as the target image, the D_z discriminator is constructed between the detail layer images I_1^d and I_2^d and is used to force the distribution of the generated low-dimensional feature vector z to gradually approach the prior, and the target discriminator is constructed between the detail layer image I_2^d and the detail layer fused image F_d and is used to make the detail layer fused image F_d learn adversarially against the detail layer image I_2^d;
(3) The corresponding pixels of the base layer fused image F_b and the detail layer fused image F_d are weighted and added to obtain the final fused image.
2. The infrared image and visible light image fusion method based on the improved GAN network as claimed in claim 1, wherein, when the base layer image I_2^b is enhanced by the histogram mapping algorithm, the local entropy is used to control the mapping range of each sub-histogram; the mapping range R_j controlled by the local entropy is given by formula (3), where γ_j ∈ [0, 0.8] is a normalization factor, the local entropy is computed over the sub-histogram interval [m_j, m_{j+1}], p(·) denotes the probability density and u denotes a sub-histogram bin; and the enhanced base layer image Î_2^b obtained after the enhancement process is given by formula (4), where R_t is the local entropy of each partition j.
3. The infrared image and visible light image fusion method based on the improved GAN network as claimed in claim 1, wherein, when the enhanced base layer image Î_2^b and the base layer image I_1^b are fused by the improved LT algorithm, the enhanced base layer image Î_2^b and the base layer image I_1^b are decomposed into pyramids {B_2^l} and {B_1^l} respectively, where l ∈ {1, 2, 3, 4}; the regional average gradient is computed over an m × n window centered on each pixel, and the top-level images of the two pyramids are fused according to it, the regional average gradient being

    G_k(i, j) = (1 / (m × n)) Σ_x Σ_y sqrt( (ΔI_x^2 + ΔI_y^2) / 2 )    (5)

where ΔI_x and ΔI_y are the first-order differences of the pixel f(x, y) along the x-axis and the y-axis, and G_1(i, j) and G_2(i, j) denote the regional average gradients of each top-level pixel; the top-level image fusion is expressed by formula (6), which selects between the two top-level coefficients according to their regional average gradients; fusion of the bottom three LT layers requires the regional energy of each layer of the infrared and visible light pyramids,

    E_k^l(i, j) = Σ_p Σ_q w(p, q) · [ B_k^l(i + p, j + q) ]^2    (7)

where p and q range over the 3 × 3 neighbourhood and w is a 3 × 3 weight matrix; and for 0 < l < 4, the fusion result of the l-th layer of the LT image pyramid is given by formula (8), which combines the coefficients of the two pyramids according to their regional energies.
4. The infrared image and visible light image fusion method based on the improved GAN network as claimed in claim 1, wherein the improved GAN network incorporates a depth feature migration module, which integrates the features of each stage through skip connections, adds the output features of the previous layer to each input, and maps the low-dimensional image into a high-dimensional image in combination with deconvolution.
5. The infrared image and visible light image fusion method based on the improved GAN network as claimed in claim 1, wherein the D_z discriminator and the target discriminator each comprise three convolution modules, a BatchNorm normalization layer, a Leaky ReLU activation function layer and a fully connected layer.
6. The infrared image and visible light image fusion method based on the improved GAN network as claimed in claim 1, wherein the training loss function of the encoder E and the generator G is given by formula (12), where L(·) denotes a norm, x is the input, i.e. the detail layer image, G(·) and E(·) denote the generator output and the encoder output respectively, and l is the label vector; the loss function of the D_z discriminator and the loss function L_D of the target discriminator are given by formula (13), where b and c denote the real labels of the two detail layer images, with values in the ranges 0.4-0.6 and 0.7-1 respectively, d denotes the real label of the fused detail layer, with values in the range 0-0.3, N denotes the number of fused images, and the remaining terms denote the classification results of the detail layer images and of the detail layer fused image F_d.
CN202211300180.6A 2022-10-24 2022-10-24 Infrared image and visible light image fusion method based on improved GAN network Pending CN115841438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211300180.6A CN115841438A (en) 2022-10-24 2022-10-24 Infrared image and visible light image fusion method based on improved GAN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211300180.6A CN115841438A (en) 2022-10-24 2022-10-24 Infrared image and visible light image fusion method based on improved GAN network

Publications (1)

Publication Number Publication Date
CN115841438A true CN115841438A (en) 2023-03-24

Family

ID=85576426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211300180.6A Pending CN115841438A (en) 2022-10-24 2022-10-24 Infrared image and visible light image fusion method based on improved GAN network

Country Status (1)

Country Link
CN (1) CN115841438A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152120A (en) * 2023-04-20 2023-05-23 南京大学 Low-light image enhancement method and device integrating high-low frequency characteristic information

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN110109060A (en) A kind of radar emitter signal method for separating and system based on deep learning network
CN111539247B (en) Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN112967178B (en) Image conversion method, device, equipment and storage medium
CN109684922A (en) A kind of recognition methods based on the multi-model of convolutional neural networks to finished product dish
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
Jian et al. Infrared and visible image fusion based on deep decomposition network and saliency analysis
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN112669249A (en) Infrared and visible light image fusion method combining improved NSCT (non-subsampled Contourlet transform) transformation and deep learning
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN112884758A (en) Defective insulator sample generation method and system based on style migration method
CN115223017B (en) Multi-scale feature fusion bridge detection method based on depth separable convolution
CN115937774A (en) Security inspection contraband detection method based on feature fusion and semantic interaction
CN115841438A (en) Infrared image and visible light image fusion method based on improved GAN network
CN110148083B (en) Image fusion method based on rapid BEMD and deep learning
Zhou et al. MSAR‐DefogNet: Lightweight cloud removal network for high resolution remote sensing images based on multi scale convolution
CN108764287B (en) Target detection method and system based on deep learning and packet convolution
Qu et al. Low illumination enhancement for object detection in self-driving
CN114387195A (en) Infrared image and visible light image fusion method based on non-global pre-enhancement
CN112330562B (en) Heterogeneous remote sensing image transformation method and system
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN107529647B (en) Cloud picture cloud amount calculation method based on multilayer unsupervised sparse learning network
CN112686830A (en) Super-resolution method of single depth map based on image decomposition
CN110852255A (en) Traffic target detection method based on U-shaped characteristic pyramid
CN116128766A (en) Improved Retinex-Net-based infrared image enhancement method for power equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination