CN114841907A

CN114841907A - A method for multi-scale generative adversarial fusion networks for infrared and visible light images

Info

Publication number: CN114841907A
Application number: CN202210599873.3A
Authority: CN
Inventors: 王文卿; 张纪乾; 刘涵; 李余兴
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2022-05-27
Filing date: 2022-05-27
Publication date: 2022-08-02
Anticipated expiration: 2042-05-27
Also published as: CN114841907B

Abstract

The invention discloses a method for generating a confrontation fusion network in a multi-scale mode facing infrared and visible light images, which comprises the steps of selecting a plurality of infrared and visible light image pairs from a standard training set, inputting the image pairs into an edge-preserving filter, and obtaining a basic layer and a detail layer; and inputting the basic layer into a gradient filter to obtain a gradient map and a new basic layer, adding the gradient map and the original detail layer to obtain a new detail layer, calculating to obtain network parameters of the discriminator, training, and finally obtaining an output which is a final fusion image. The image obtained by fusion retains the target information and the texture information of the source image to the maximum extent, improves the quality of the fused image, and provides more convenient prerequisite for subsequent target detection and identification.

Description

A method for multi-scale generative adversarial fusion networks for infrared and visible light images

技术领域technical field

本发明属于数字图像处理中图像分解和图像融合技术领域，具体涉及一种面向红外与可见光图像的多尺度生成对抗融合网络的方法。The invention belongs to the technical field of image decomposition and image fusion in digital image processing, and in particular relates to a multi-scale generation confrontation fusion network method for infrared and visible light images.

背景技术Background technique

图像融合是信息融合领域的一个分支，属于交叉领域的研究，涉及到传感器成像、图像预处理、计算机视觉和人工智能等领域。随着多类型成像传感器的飞速发展，单一传感器提供的图像目标信息有限的问题得到了有效解决。针对同一场景，通过将来自相同或不同成像传感器的两张或多张源图像进行融合，能够得到信息丰富、清晰度高的融合图像。可见光传感器是利用物体反射光线进行成像，得到的图像具有高分辨率、细节丰富的特点。但在光照条件不好的情况下，获得的图像就不太清晰。红外传感器成像时，通过目标的热辐射信息进行成像，穿透力较强，同时可以解决可见光传感器在光照不足或有物体遮挡的情况下成像效果不好的问题，它在光照条件较差时仍能够探测到目标，但是所成图像细节信息和对比度信息不足。红外与可见光图像融合技术可实现将两类图像各自优势进行互补，保证最后所得融合图像中饱含热辐射信息、对比度信息和细节信息，以便更好的了解图像目标信息，最终实现系统全天候工作。近年来，基于多尺度图像融合技术取得了重要进展。通常，基于多尺度变换的红外和可见光图像融合方案包括三个步骤。首先，将每个源图像分解为一系列多尺度表示，然后根据给定的融合规则对源图像的多尺度表示进行融合，最后对融合后的图像进行相应的多尺度逆变换。同时随着深度学习的飞速发展，无监督深度学习在融合领域得到拓展并取得一定成果。此类方法虽更适合进行无参考图像的多源图像融合，但对于网络结构、损失函数的设计提出了更高的要求。因此，基于无监督的生成对抗网络融合方法逐渐受到了研究人员的关注。Image fusion is a branch of the field of information fusion, which is a cross-domain research involving sensor imaging, image preprocessing, computer vision, and artificial intelligence. With the rapid development of multi-type imaging sensors, the problem of limited image target information provided by a single sensor has been effectively solved. For the same scene, by fusing two or more source images from the same or different imaging sensors, a fusion image with rich information and high definition can be obtained. The visible light sensor uses the reflected light of the object to image, and the obtained image has the characteristics of high resolution and rich details. But in poor lighting conditions, the resulting images are less clear. When the infrared sensor is imaging, it uses the thermal radiation information of the target to image, and the penetrating power is strong. At the same time, it can solve the problem that the imaging effect of the visible light sensor is not good when the light is insufficient or there is an object occluded. The target can be detected, but the detail information and contrast information of the resulting image are insufficient. Infrared and visible light image fusion technology can complement the respective advantages of the two types of images, ensuring that the final fusion image is full of thermal radiation information, contrast information and detail information, so as to better understand the image target information, and finally realize the all-weather operation of the system. In recent years, important progress has been made in multi-scale image fusion technology. Generally, the fusion scheme of infrared and visible light images based on multi-scale transformation includes three steps. First, each source image is decomposed into a series of multi-scale representations, then the multi-scale representations of the source images are fused according to the given fusion rules, and finally the corresponding multi-scale inverse transformation is performed on the fused images. At the same time, with the rapid development of deep learning, unsupervised deep learning has been expanded in the field of fusion and achieved certain results. Although this method is more suitable for multi-source image fusion without reference images, it puts forward higher requirements for the design of network structure and loss function. Therefore, unsupervised generative adversarial network fusion methods have gradually attracted the attention of researchers.

发明内容SUMMARY OF THE INVENTION

本发明的目的是提供一种面向红外与可见光图像的多尺度生成对抗融合网络的方法，通过对源图像进行多尺度分解，然后将分解后的对应分量输入至生成对抗网络从而得到最终的融合图像，这样融合得到的图像最大限度地保留源图像的目标信息和纹理信息，提高了融合图像的质量，对后续目标检测和识别提供了更便利的先决条件。The purpose of the present invention is to provide a multi-scale generative adversarial fusion network method for infrared and visible light images, by performing multi-scale decomposition on the source image, and then inputting the decomposed corresponding components into the generative adversarial network to obtain the final fusion image , so that the fused image retains the target information and texture information of the source image to the greatest extent, improves the quality of the fused image, and provides a more convenient prerequisite for subsequent target detection and recognition.

本发明所采用的技术方案是，面向红外与可见光图像的多尺度生成对抗融合网络的方法，具体按照以下步骤实施：The technical solution adopted in the present invention is a method for multi-scale generation of an adversarial fusion network for infrared and visible light images, which is specifically implemented according to the following steps:

步骤1、从标准训练集中选取若干红外与可见光图像对，然后将图像对输入至边缘保持滤波器，得到基础层和细节层；Step 1. Select several infrared and visible light image pairs from the standard training set, and then input the image pairs into the edge preserving filter to obtain the base layer and the detail layer;

步骤2、将步骤1得到的基础层输入至梯度滤波器得到梯度图和新的基础层，梯度图和步骤1的细节层相加作为新的细节层；Step 2. Input the base layer obtained in step 1 into the gradient filter to obtain a gradient map and a new base layer, and add the gradient map and the detail layer of step 1 as a new detail layer;

步骤3、将步骤2得到的基础层和细节层输入至生成器网络G，经过生成器网络G后得到对应源图像对的融合图像，计算生成器损失函数L_G,对生成器网络G参数进行更新，得到最终的生成器网络参数，将源图像分别和融合图像输入至判别器网络D进行分类，计算判别器损失函数L_D，对判别器网络参数进行更新，得到最终的判别器网络参数；Step 3. Input the base layer and detail layer obtained in step 2 to the generator network G, and obtain the fusion image corresponding to the source image pair after passing through the generator network G, calculate the generator loss function L _G , and perform the generator network G parameters. update to obtain the final generator network parameters, input the source image and the fusion image to the discriminator network D for classification, calculate the discriminator loss function L _D , update the discriminator network parameters, and obtain the final discriminator network parameters;

步骤4、开始训练网络，判别迭代是否结束，即当前的迭代次数是否达到了设定的迭代次数，迭代次数达到设置的迭代次数求得的网络参数作为最终的网络参数，将网络参数保存；Step 4. Start training the network, and determine whether the iteration is over, that is, whether the current number of iterations has reached the set number of iterations, and the network parameters obtained when the number of iterations reaches the set number of iterations are used as the final network parameters, and the network parameters are saved;

步骤5、将步骤4得到的生成器网络参数加载到测试网络中的生成器网络中，将测试的红外与可见光源图像进行多尺度分解，即步骤1、步骤2中的滤波操作，然后将分解得到对应的基础层和细节层拼接起来当作测试网络的输入，得到的输出即为最终的融合图像。Step 5. Load the generator network parameters obtained in step 4 into the generator network in the test network, and perform multi-scale decomposition of the tested infrared and visible light source images, that is, the filtering operations in steps 1 and 2, and then decompose The corresponding base layer and detail layer are spliced together as the input of the test network, and the obtained output is the final fusion image.

本发明的特点还在于，The present invention is also characterized in that,

步骤1中滤波公式如下：The filtering formula in step 1 is as follows:

其中：in:

式(1)中I_q为输入图像，

为滤波后图像，q是I_q的一个像素点，s是q的像素集合，p是q领域中的一个像素，

是部分输入图像块，

是

周边图像块，

是空间滤波器内核，

是距离滤波器内核，空间内核和距离内核通常都以高斯的方式表示；In formula (1), I _q is the input image,

is the filtered image, q is a pixel of I _q , s is the set of pixels of q, p is a pixel in the field of q,

is a partial input image patch,

Yes

surrounding image blocks,

is the spatial filter kernel,

is the distance filter kernel, the spatial kernel and the distance kernel are usually expressed in Gaussian form;

式(3)中I_d0为经过双边滤波得到的细节层，I_b0为得到的基础层。In formula (3), I _d0 is the detail layer obtained by bilateral filtering, and I _b0 is the obtained base layer.

步骤2中in step 2

通过下面公式求出图像中所有像素点的梯度值：The gradient value of all pixels in the image is obtained by the following formula:

然后定义一个阈值Gmax，如若该像素点梯度值大于阈值设为白色，否则为黑色，这样求得梯度图I_G；Then define a threshold Gmax, if the gradient value of the pixel is greater than the threshold, set it as white, otherwise it is black, and thus obtain the gradient map _IG ;

将步骤1得到的基础层I_b0进行梯度滤波，得到梯度图I_G，然后将原基础层I_b0减去梯度图可得新基础层I_b，梯度图I_G同原细节层I_d0相加可得新细节层I_d。The base layer I _b0 obtained in step 1 is subjected to gradient filtering to obtain a gradient map I _G , and then the original base layer I _b0 is subtracted from the gradient map to obtain a new base layer I _b , and the gradient map I _G is added to the original detail layer I _d0 A new level of detail I _d is available.

步骤3中生成器网络结构由双流网络和其后面接的卷积神经网络组成，双流网络中上下两支网络结构相同，均为六层卷积神经网络，前四层结构相同，网络结构为3×3的卷积层、批量归一化层和激活层，该激活层的激活函数为Leaky Relu；后两层结构相同，由5×5的卷积层、批量归一化层和激活层组成，该激活层的激活函数为Leaky Relu，双流网络后面接的网络结构由1×1的卷积层和激活层组成，该激活层的激活函数为tanh，这一层卷积神经网络的输出就是最终的融合图像。In step 3, the generator network structure consists of a dual-stream network and a convolutional neural network followed by it. The upper and lower network structures in the dual-stream network are the same, both are six-layer convolutional neural networks, the first four layers have the same structure, and the network structure is 3 ×3 convolutional layer, batch normalization layer and activation layer, the activation function of this activation layer is Leaky Relu; the latter two layers have the same structure and consist of 5×5 convolutional layer, batch normalization layer and activation layer , the activation function of this activation layer is Leaky Relu, the network structure behind the two-stream network consists of a 1×1 convolution layer and an activation layer, the activation function of this activation layer is tanh, and the output of this layer of convolutional neural network is The final fused image.

步骤3中生成器损失函数L_G为：The generator loss function _LG in step 3 is:

L_G＝λL_content+L_Gen，(6)L _G =λL _content +L _Gen , (6)

其中，L_content是生成器输入和输出的比较后的内容损失，L_Gen是生成器和判别器的对抗损失，λ为常数；where L _content is the content loss after comparing the generator input and output, L _Gen is the adversarial loss between the generator and the discriminator, and λ is a constant;

其中，H和W分别是生成器输入的图像高度和宽度，||·||₂为计算二范数，I_f为生成器输出即融合图像，I_b为输入生成器的基础层，I_d为输入生成器的细节层，

为梯度算子，ξ为常量；Among them, H and W are the height and width of the image input by the generator, ||·|| ₂ is the calculation of the second norm, I _f is the output of the generator, i.e. the fusion image, I _b is the base layer of the input generator, I _d is the detail layer of the input generator,

is the gradient operator, and ξ is a constant;

L_Gen＝E[log(1-D_V(G(I_b,I_d)))]+E[log(1-D_I(G(I_b,I_d)))](8)L _Gen = E[log(1-D _V (G(I _b , I _d )))]+E[log(1-D _I (G(I _b , I _d )))](8)

D_V(G(I_b,I_d))表示以红外图像或者融合图像为输入的判别器判别值，G(I_b,I_d)表示生成器生成的融合图像，D_I(G(I_b,I_d))表示以可见光图像或者融合图像为输入的判别器判别值。D _V (G(I _b , I _d )) represents the discriminator discriminant value input with infrared image or fusion image, G(I _b , I _d ) represents the fusion image generated by the generator, D _I (G(I _{b )} , I _d )) represents the discriminator discriminant value that takes the visible light image or the fusion image as input.

计算生成器损失函数L_G，同时利用SGD对网络参数进行更新从而达到优化的目的，得到生成器的网络参数。Calculate the generator loss function L _G , and use SGD to update the network parameters to achieve the purpose of optimization, and obtain the network parameters of the generator.

步骤3中两个判别器网络D_I和D_V具有相同的网络结构，均由五层卷积神经网络组成，前四层结构相同，网络结构为3×3的卷积层、批量归一化层和激活层，该激活层的激活函数为Leaky Relu；最后一层为全连接层，输出的是对输入的分类结果，从而预测输出是融合图像还是源图像；In step 3, the two discriminator networks D _I and D _V have the same network structure, both composed of five-layer convolutional neural networks, the first four layers have the same structure, and the network structure is 3 × 3 convolutional layers, batch normalization layer and activation layer, the activation function of the activation layer is Leaky Relu; the last layer is a fully connected layer, the output is the classification result of the input, so as to predict whether the output is a fusion image or a source image;

输入为红外图像和融合图像的判别器损失函数为：The loss function of the discriminator whose input is the infrared image and the fused image is:

其中，D_I(I_I)表示以红外图像作为输入的判别器判别值，D_I(G(I_b,I_d))表示以融合图像作为输入的判别器判别值；Wherein, D _I (I _I ) represents the discriminator discriminant value that takes infrared image as input, D _I (G(I _b , I _d )) represents the discriminator discriminant value that takes fusion image as input;

输入为可见光图像和融合图像的判别器损失函数为：The loss function of the discriminator whose input is the visible light image and the fused image is:

其中，D_V(I_V)表示以可见光图像作为输入的判别器判别值，D_V(G(I_b,I_d))表示以融合图像作为输入的判别器判别值；Wherein, D _V ( _IV ) represents the discriminator discriminant value that takes the visible light image as input, and D _V (G(I _b , I _d )) represents the discriminator discriminant value that takes the fusion image as input;

给判别器输出设一个阈值，当其判别器输出值时大于预设的阈值时继续更新网络参数，直至小于预设的阈值，在此过程中要经过判别器D_I和D_V后，计算对应的判别器损失函数

和

更新网络参数的优化方法是SGD最终得到判别器的网络参数。Set a threshold for the output of the discriminator. When the output value of the discriminator is greater than the preset threshold, continue to update the network parameters until it is less than the preset threshold. In this process, after passing through the discriminator D _I and D _V , calculate the corresponding The discriminator loss function of

and

The optimization method for updating the network parameters is that SGD finally obtains the network parameters of the discriminator.

本发明的有益效果是，一种面向红外与可见光图像的多尺度生成对抗融合网络的方法，结合了多尺度分解和生成对抗网络，既将源图像进行了优化处理，也把融合效果好的神经网络应用到了融合过程中。一种面向红外与可见光图像的多尺度生成对抗融合网络的方法，首先通过边缘滤波保持器和梯度滤波得到图像的基础层和细节层，这样得到的图像分量最大限度保留了我们所需要的信息，然后利用生成对抗网络中生成器网络的两个分支网络分别融合多尺度分解后的基础层(结构信息)和细节层(细节信息)，最终将生成的基础层图像和细节层图像相加得到最后的融合图像，同时生成对抗网络中两个判别器结构对两个源图像和融合图像进行分类判别。经过本发明融合得到的图像最大限度地保留源图像的目标信息和纹理信息，提高了融合图像的质量，这样对后续目标检测和识别提供了更便利的条件。The beneficial effect of the present invention is that a multi-scale generation confrontation fusion network method for infrared and visible light images combines multi-scale decomposition and generation confrontation network, which not only optimizes the source image, but also integrates neural networks with good fusion effects. The network is applied to the fusion process. A method for multi-scale generative adversarial fusion network for infrared and visible light images. First, the base layer and detail layer of the image are obtained through edge filter holder and gradient filtering. The obtained image components retain the information we need to the greatest extent. Then, the two branch networks of the generator network in the generative adversarial network are used to fuse the base layer (structure information) and the detail layer (detail information) after multi-scale decomposition respectively, and finally the generated base layer image and detail layer image are added to get the final result. The fusion image is generated, and the two discriminator structures in the generative adversarial network are used to classify and discriminate the two source images and the fusion image. The image obtained by the fusion of the present invention retains the target information and texture information of the source image to the greatest extent, improves the quality of the fusion image, and thus provides more convenient conditions for subsequent target detection and recognition.

附图说明Description of drawings

图1是本发明的面向红外与可见光图像的多尺度生成对抗融合网络的方法整体流程图；Fig. 1 is the overall flow chart of the method for multi-scale generation confrontation fusion network for infrared and visible light images of the present invention;

图2是本发明的源图像双边滤波后的基础层和细节层图；Fig. 2 is the base layer and detail layer diagram after the source image bilateral filtering of the present invention;

图3是本发明的双边滤波后的基础层进行梯度滤波后得到新的基础层和细节层图；Fig. 3 is the base layer after bilateral filtering of the present invention to obtain new base layer and detail layer diagram after gradient filtering;

图4是本发明生成对抗网络中生成器的网络结构图；Fig. 4 is the network structure diagram of the generator in the generation confrontation network of the present invention;

图5是本发明生成对抗网络中判别器的网络结构图。FIG. 5 is a network structure diagram of the discriminator in the Generative Adversarial Network of the present invention.

具体实施方式Detailed ways

下面结合附图和具体实施方式对本发明进行详细说明。The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

本发明面向红外与可见光图像的多尺度生成对抗融合网络的方法；首先通过边缘滤波保持器和梯度滤波对源图像进行分解，得到图像的基础层和细节层，然后将基础层和细节层输入至生成对抗网络中生成器网络进行融合，然后将融合图像和两个源图像分别输入至判别器中进行判别来优化网络参数，得到最终的融合图像，实现图像融合。算法总体网络结构如图1所示，基于多尺度分解后生成对抗网络的红外与可见光图像融合过程主要分为以下三个阶段；The method of the present invention is oriented to the multi-scale generation confrontation fusion network of infrared and visible light images; firstly, the source image is decomposed by the edge filter holder and gradient filtering to obtain the base layer and the detail layer of the image, and then the base layer and the detail layer are input into In the generative adversarial network, the generator network is fused, and then the fused image and the two source images are respectively input into the discriminator for discrimination to optimize the network parameters, and the final fused image is obtained to realize image fusion. The overall network structure of the algorithm is shown in Figure 1. The fusion process of infrared and visible light images based on the multi-scale decomposition of the generative adversarial network is mainly divided into the following three stages;

1)源图像多尺度分解1) Multi-scale decomposition of source image

源图像的多尺度分解主要分为三步，首先将图像输入至边缘保持滤波器(双边滤波)得到基础层和细节层，得到的基础层和细节层如图2所示，然后将基础层通过梯度滤波得到梯度图和新的基础层，最后将梯度图和细节层相加当作新的细节层，得到新的基础层和新的细节层如图3所示。双边滤波和梯度滤波原理如下：The multi-scale decomposition of the source image is mainly divided into three steps. First, the image is input to the edge-preserving filter (bilateral filtering) to obtain the base layer and the detail layer. The obtained base layer and detail layer are shown in Figure 2, and then the base layer is passed through Gradient filtering obtains a gradient map and a new base layer. Finally, the gradient map and the detail layer are added as a new detail layer, and a new base layer and a new detail layer are obtained as shown in Figure 3. The principles of bilateral filtering and gradient filtering are as follows:

双边滤波是一种边缘保持滤波器，它可以达到保持边缘、降噪平滑的效果。和其他滤波原理一样，双边滤波也是采用加权平均的方法，用周边像素亮度值的加权平均代表某个像素的强度，所用的加权平均基于高斯分布。最重要的是，双边滤波的权重不仅考虑了像素的欧氏距离(如普通的高斯低通滤波，只考虑了位置对中心像素的影响)，还考虑了像素范围域中的辐射差异(例如卷积核中像素与中心像素之间相似程度、颜色强度，深度距离等)，在计算中心像素的时候同时考虑这两个权重。滤波公式如下：Bilateral filtering is an edge-preserving filter, which can achieve the effect of maintaining edges and reducing noise and smoothness. Like other filtering principles, bilateral filtering also uses a weighted average method, where the weighted average of the brightness values of surrounding pixels represents the intensity of a pixel, and the weighted average used is based on a Gaussian distribution. Most importantly, the weights of bilateral filtering take into account not only the Euclidean distance of pixels (like ordinary Gaussian low-pass filtering, which only considers the effect of position on the central pixel), but also the radiance differences in the pixel range domain (such as volume Similarity, color intensity, depth distance, etc. between the pixels in the product kernel and the center pixel), these two weights are considered at the same time when calculating the center pixel. The filtering formula is as follows:

式(1)中I_q为输入图像，

为滤波后图像In formula (1), I _q is the input image,

is the filtered image

步骤2，将步骤1得到的基础层输入至梯度滤波器得到梯度图和新的基础层，梯度图和步骤1的细节层相加作为新的细节层。梯度滤波原理如下：Step 2, input the base layer obtained in step 1 to the gradient filter to obtain a gradient map and a new base layer, and add the gradient map and the detail layer of step 1 as a new detail layer. The principle of gradient filtering is as follows:

梯度简单来说就是求导，三种不同的滤波器：Sobel、Scharr和Laplacian；Sobel、Scharr其实就是求一阶或者二阶导数；Scharr是对Sobel的优化；Laplacian是求二阶导数。这里采用的是Sobel滤波器，目的是让高频通过，阻挡低频，使得边缘更加明显达到增强图像的目的。其具体原理如下：Gradient is simply derivation, three different filters: Sobel, Scharr and Laplacian; Sobel, Scharr are actually the first or second derivative; Scharr is the optimization of Sobel; Laplacian is the second derivative. The Sobel filter is used here, the purpose is to let the high frequency pass, block the low frequency, and make the edge more obvious to enhance the image. The specific principle is as follows:

Sobel算子是一离散性差分算子，用来运算图像亮度函数的灰度之近似值。在图像的任何一点使用此算子，将会产生对应的灰度矢量或是其法矢量。The Sobel operator is a discrete difference operator, which is used to calculate the approximation of the gray level of the image brightness function. Using this operator at any point in the image will yield the corresponding grayscale vector or its normal vector.

该算子包含两组3x3的矩阵，分别为横向及纵向，将之与图像作平面卷积，即可分别得出横向及纵向的亮度差分近似值。如果以A代表原始图像，Gx及Gy分别代表经横向及纵向边缘检测的图像灰度值，其公式如下：The operator consists of two sets of 3x3 matrices, which are horizontal and vertical, respectively. By convolving them with the image plane, the approximation of the horizontal and vertical luminance differences can be obtained respectively. If A represents the original image, Gx and Gy represent the gray value of the image detected by the horizontal and vertical edges, respectively, and the formula is as follows:

G_X＝G_x*A and G_Y＝G_y*A (4)G _X =G _x *A and G _Y =G _y *A (4)

2)生成对抗网络参数获取2) Generative Adversarial Network Parameter Acquisition

判别器网络参数获取：将1)得到的基础层和细节层在图像通道的维度上两两拼接起来作为生成器的输入，生成器的网络结构图如图4所示，它是由一个双流网络后接一个卷积块组成。双流网络上下两支网络相同，由六层卷积神经网络组成，前四层结构相同，网络结构为3×3的卷积层、批量归一化层和激活层(激活函数为Leaky Relu)；后两层结构相同，由5×5的卷积层、批量归一化层和激活层(激活函数为Leaky Relu)组成。双流网络后面接的网络结构由1×1的卷积层和激活层(激活函数为tanh)组成，这一层输出的就是最终的融合图像。然后将各自的融合结果(基础层、细节层的融合图)进行拼接后得到最终的融合图像。在此过程中要经过生成器G后，计算生成器损失函数L_G，同时利用SGD(随机梯度下降)对网络参数进行更新从而达到优化的目的，得到生成器的网络参数。Obtaining the network parameters of the discriminator: The base layer and the detail layer obtained in 1) are spliced together in the dimension of the image channel as the input of the generator. The network structure diagram of the generator is shown in Figure 4, which is composed of a two-stream network. It is followed by a convolution block. The upper and lower two networks of the dual-stream network are the same and consist of six-layer convolutional neural networks. The first four layers have the same structure. The network structure is a 3×3 convolutional layer, a batch normalization layer and an activation layer (the activation function is Leaky Relu); The latter two layers have the same structure, consisting of a 5×5 convolutional layer, a batch normalization layer, and an activation layer (the activation function is Leaky Relu). The network structure followed by the two-stream network consists of a 1×1 convolutional layer and an activation layer (the activation function is tanh), and the output of this layer is the final fusion image. Then the respective fusion results (the fusion map of the base layer and the detail layer) are spliced to obtain the final fusion image. In this process, after passing through the generator _G , the generator loss function LG is calculated, and at the same time, SGD (stochastic gradient descent) is used to update the network parameters to achieve the purpose of optimization, and the network parameters of the generator are obtained.

判别器网络参数获取：因为源图像是一对，所以采用两个判别器，一个用来获取融合图像为红外图像的概率P_I，另一个获取融合图像为可见光图像的概率P_V。这两个判别器网络结构完全相同，如图5所示。它由五层卷积神经网络组成，前四层结构相同，网络结构为3×3的卷积层、批量归一化层和激活层(激活函数为Leaky Relu)；最后一层为全连接层，输出的是对输入的分类结果。输入源图像和融合图像得到两个概率，当概率大于预设的阈值时继续更新网络参数，直至概率小于预设的阈值。在此过程中要经过判别器D_I和D_V后，计算对应的判别器损失函数

和

更新网络参数的优化方法是SGD(随机梯度下降)，最终得到判别器的网络参数。Discriminator network parameter acquisition: Because the source image is a pair, two discriminators are used, one is used to obtain the probability P _I that the fusion image is an infrared image, and the other is used to obtain the probability P _V that the fusion image is a visible light image. The two discriminator network structures are exactly the same, as shown in Figure 5. It consists of five layers of convolutional neural networks. The first four layers have the same structure. The network structure is a 3×3 convolution layer, a batch normalization layer and an activation layer (the activation function is Leaky Relu); the last layer is a fully connected layer. , the output is the classification result of the input. Two probabilities are obtained from the input source image and the fusion image. When the probability is greater than the preset threshold, the network parameters are continuously updated until the probability is less than the preset threshold. In this process, after passing through the discriminator D _I and D _V , the corresponding discriminator loss function is calculated

and

The optimization method for updating the network parameters is SGD (Stochastic Gradient Descent), and finally the network parameters of the discriminator are obtained.

损失函数包括生成器损失函数L_G和两个判别器的损失函数

和

设计如下：The loss function includes the generator loss function _LG and the loss functions of the two discriminators

and

The design is as follows:

生成器损失函数目的是保存更多源图像的信息，其有内容损失和对抗损失两部分组成：The purpose of the generator loss function is to save more information about the source image, which consists of two parts: content loss and adversarial loss:

L_G＝λL_content+L_Gen， (5)L _G =λL _content +L _Gen , (5)

其中H和W分别是生成器输入的图像高度和宽度，||·||₂为计算二范数，I_f为生成器输出即融合图像，I_b为输入生成器的基础层，I_d为输入生成器的细节层，

为梯度算子，ξ为常量；where H and W are the height and width of the image input by the generator, ||·|| ₂ is the calculation of the second norm, I _f is the output of the generator, i.e. the fusion image, I _b is the base layer of the input generator, and I _d is The detail layer of the input generator,

is the gradient operator, and ξ is a constant;

L_Gen＝E[log(1-D_V(G(I_b,I_d)))]+E[log(1-D_I(G(I_b,I_d)))] (7)L _Gen = E[log(1-D _V (G(I _b , I _d )))]+E[log(1-D _I (G(I _b , I _d )))] (7)

用两个判别器就是有效的减少融合结果的信息丢失，其作用也是让生成器保存更多源图像信息；定义如下：Using two discriminators can effectively reduce the information loss of the fusion result, and its role is to allow the generator to save more source image information; the definition is as follows:

其中，D_V(I_V)表示以可见光图像作为输入的判别器判别值，D_V(G(I_b,I_d))表示以融合图像作为输入的判别器判别值，D_I(I_I)表示以红外图像作为输入的判别器判别值，D_I(G(I_b,I_d))表示以融合图像作为输入的判别器判别值。Among them, D _V ( _IV ) represents the discriminator discriminant value that takes the visible light image as input, D _V (G(I _b , I _d )) represents the discriminator discriminant value that takes the fusion image as input, D _I (I _I ) represents the discriminator discriminant value with infrared image as input, D _I (G(I _b , I _d )) represents the discriminator discriminant value with fusion image as input.

3)融合测试网络3) Fusion test network

用2)得到的生成器网络参数输入至生成器网络中，将测试的图片先进行多尺度分解，然后将分解得到对应的基础层和细节层拼接起来输入至生成器网络中，这样生成器的输出就是最终的融合图像。Input the generator network parameters obtained in 2) into the generator network, first perform multi-scale decomposition on the test image, and then splicing the corresponding base layer and detail layer obtained from the decomposition and input them into the generator network, so that the generator's The output is the final fused image.

本发明面向红外与可见光图像的多尺度生成对抗融合网络的方法，流程图如图1所示，具体按照以下步骤实施：The method of the present invention for the multi-scale generation confrontation fusion network of infrared and visible light images, the flowchart is shown in Figure 1, and the specific implementation is carried out according to the following steps:

结合图2，步骤1、从标准训练集中选取若干红外与可见光图像对，然后将图像对输入至边缘保持滤波器(双边滤波)，得到基础层和细节层；With reference to Figure 2, step 1, select several infrared and visible light image pairs from the standard training set, and then input the image pairs to the edge preserving filter (bilateral filtering) to obtain the base layer and the detail layer;

步骤1中滤波公式如下：The filtering formula in step 1 is as follows:

其中：in:

式(1)中I_q为输入图像，

是部分输入图像块，

是

周边图像块，

是空间滤波器内核，

is a partial input image patch,

Yes

surrounding image blocks,

is the spatial filter kernel,

结合图3，步骤2、将步骤1得到的基础层输入至梯度滤波器得到梯度图和新的基础层，梯度图和步骤1的细节层相加作为新的细节层；With reference to Figure 3, step 2, input the base layer obtained in step 1 into the gradient filter to obtain a gradient map and a new base layer, and add the gradient map and the detail layer of step 1 as a new detail layer;

步骤2中in step 2

梯度滤波原理如下：The principle of gradient filtering is as follows:

梯度滤波简单来说就是对图像进行求导，包括三种不同的滤波器：Sobel、Scharr和Laplacian；Sobel和Scharr是求一阶导数，Scharr是对Sobel的优化，Laplacian是求二阶导数。此处采用的是Sobel滤波器，目的是让高频信息通过，阻挡低频信息，使得边缘更加明显达到增强图像的目的。其Gradient filtering is simply the derivation of the image, including three different filters: Sobel, Scharr and Laplacian; Sobel and Scharr are the first derivative, Scharr is the optimization of Sobel, and Laplacian is the second derivative. The Sobel filter is used here, the purpose is to let high-frequency information pass, block low-frequency information, and make the edge more obvious to enhance the image. That

具体原理如下：The specific principles are as follows:

G_X＝G_x*A and G_y＝G_y*A (4)G _x =G _x *A and G _y =G _y *A (4)

然后定义一个阈值Gmax(此处定义为150)，如若该像素点梯度值大于阈值设为白色，否则为黑色，这样求得梯度图I_G；Then define a threshold Gmax (defined as 150 here), if the gradient value of the pixel is greater than the threshold, set it as white, otherwise it is black, and thus obtain the gradient map _IG ;

结合图4、图5，步骤3、将步骤2得到的基础层和细节层输入至生成器网络G，经过生成器网络G后得到对应源图像对的融合图像，计算生成器损失函数L_G,对生成器网络G参数进行更新，得到最终的生成器网络参数，将源图像分别和融合图像输入至判别器网络D进行分类，计算判别器损失函数L_D，对判别器网络参数进行更新，得到最终的判别器网络参数；With reference to Figure 4 and Figure 5, in step 3, the base layer and detail layer obtained in step 2 are input to the generator network G, and after the generator network G, the fusion image corresponding to the source image pair is obtained, and the generator loss function L _G is calculated, Update the generator network G parameters to obtain the final generator network parameters, input the source image and the fusion image to the discriminator network D for classification, calculate the discriminator loss function L _D , and update the discriminator network parameters to obtain The final discriminator network parameters;

L_G＝λL_content+L_Gen， (6)L _G =λL _content +L _Gen , (6)

is the gradient operator, and ξ is a constant;

L_Gen＝E[log(1-D_V(G(I_b,I_d)))]+E[log(1-D_I(G(I_b,I_d)))] (8)L _Gen = E[log(1-D _V (G(I _b , I _d )))]+E[log(1-D _I (G(I _b , I _d )))] (8)

计算生成器损失函数L_G，同时利用SGD(随机梯度下降)对网络参数进行更新从而达到优化的目的，得到生成器的网络参数。Calculate the generator loss function L _G , and use SGD (Stochastic Gradient Descent) to update the network parameters to achieve the purpose of optimization, and obtain the network parameters of the generator.

步骤3中两个判别器网络D_I和D_V具有相同的网络结构，均由五层卷积神经网络组成，前四层结构相同，网络结构为3×3的卷积层、批量归一化层和激活层，该激活层的激活函数为Leaky Relu；最后一层为全连接层，输出的是对输入的分类结果，从而预测输出是融合图像还是源图像(此处源图像有两种可能，红外与可见光图像。具体指代哪一个取决于判别器是D_I还是D_V，D_I指代红外图像，D_V指代可见光图像)；In step 3, the two discriminator networks D _I and D _V have the same network structure, both composed of five-layer convolutional neural networks, the first four layers have the same structure, and the network structure is 3 × 3 convolutional layers, batch normalization layer and activation layer, the activation function of the activation layer is Leaky Relu; the last layer is a fully connected layer, which outputs the classification result of the input, so as to predict whether the output is a fusion image or a source image (there are two possibilities for the source image here , infrared and visible light images. Which one refers to depends on whether the discriminator is _DI or _DV , _DI refers to infrared images, and _DV refers to visible light images);

和

更新网络参数的优化方法是SGD(随机梯度下降)，最终得到判别器的网络参数。Set a threshold for the output of the discriminator. When the output value of the discriminator is greater than the preset threshold, continue to update the network parameters until it is less than the preset threshold. In this process, after passing through the discriminator D _I and D _V , calculate the corresponding The discriminator loss function of

and

具体的，将步骤4得到的生成器网络参数加载到测试网络中的生成器网络中，将测试的红外与可见光源图像进行多尺度分解，即步骤1、步骤2中的滤波操作，然后将分解得到对应的基础层和细节层拼接起来当作测试网络的输入，得到的输出即为最终的融合图像。Specifically, the generator network parameters obtained in step 4 are loaded into the generator network in the test network, and the tested infrared and visible light source images are subjected to multi-scale decomposition, that is, the filtering operations in steps 1 and 2, and then the decomposition The corresponding base layer and detail layer are spliced together as the input of the test network, and the obtained output is the final fusion image.

Claims

1. The method for multi-scale generation confrontation fusion network for infrared and visible light images, characterized in that it is specifically implemented according to the following steps:

Step 1. Select several infrared and visible light image pairs from the standard training set, and then input the image pairs into the edge preserving filter to obtain the base layer and the detail layer;

Step 2. Input the base layer obtained in step 1 into the gradient filter to obtain a gradient map and a new base layer, and add the gradient map and the detail layer of step 1 as a new detail layer;

Step 3. Input the base layer and the detail layer obtained in step 2 to the generator network G, obtain the fusion image corresponding to the source image pair after passing through the generator network G, calculate the generator loss function L _G , and perform the generator network G parameters. update to obtain the final generator network parameters, input the source image and the fusion image to the discriminator network D for classification, calculate the discriminator loss function L _D , update the discriminator network parameters, and obtain the final discriminator network parameters;

Step 4. Start training the network, and determine whether the iteration is over, that is, whether the current number of iterations has reached the set number of iterations, and the network parameters obtained when the number of iterations reaches the set number of iterations are used as the final network parameters, and the network parameters are saved;

Step 5. Load the generator network parameters obtained in step 4 into the generator network in the test network, and perform multi-scale decomposition of the tested infrared and visible light source images, that is, the filtering operations in steps 1 and 2, and then decompose The corresponding base layer and detail layer are spliced together as the input of the test network, and the obtained output is the final fusion image.

2. the method for multi-scale generation confrontation fusion network oriented to infrared and visible light images according to claim 1, is characterized in that, in described step 1, filtering formula is as follows:

in:

In formula (1), I _q is the input image,

is a partial input image patch,

Yes

surrounding image blocks,

is the spatial filter kernel,

In formula (3), I _d0 is the detail layer obtained by bilateral filtering, and I _b0 is the obtained base layer.

3. The method for multi-scale generation confrontation fusion network for infrared and visible light images according to claim 2, wherein in the step 2

The gradient value of all pixels in the image is obtained by the following formula:

Then define a threshold Gmax, if the gradient value of the pixel is greater than the threshold, set it as white, otherwise it is black, and thus obtain the gradient map _IG ;

The base layer I _b0 obtained in step 1 is subjected to gradient filtering to obtain a gradient map I _G , and then the original base layer I _b0 is subtracted from the gradient map to obtain a new base layer I _b , and the gradient map I _G is added to the original detail layer I _d0 A new level of detail I _d is available.

4. The method for multi-scale generation confrontation fusion network for infrared and visible light images according to claim 3, characterized in that, in the step 3, the generator network structure is composed of a two-stream network and a convolutional neural network followed by it. , the upper and lower networks in the dual-stream network have the same structure, both are six-layer convolutional neural networks, the first four layers have the same structure, and the network structure is a 3×3 convolutional layer, a batch normalization layer and an activation layer. The activation function is Leaky Relu; the latter two layers have the same structure, consisting of a 5×5 convolution layer, a batch normalization layer and an activation layer. The activation function of the activation layer is Leaky Relu, and the network structure behind the two-stream network consists of 1 A ×1 convolutional layer and an activation layer are composed. The activation function of the activation layer is tanh, and the output of this layer of convolutional neural network is the final fusion image.

5. The method for multi-scale generation confrontation fusion network for infrared and visible light images according to claim 4, characterized in that, in the step 3, the generator loss function _LG is:

L _G =λL _content +L _Gen , (6)

where L _content is the content loss after comparing the generator input and output, L _Gen is the adversarial loss between the generator and the discriminator, and λ is a constant;

Among them, H and W are the height and width of the image input by the generator, ||·|| ₂ is the calculation of the second norm, I _f is the output of the generator, i.e. the fusion image, I _b is the base layer of the input generator, I _d is the detail layer of the input generator,

is the gradient operator, and ξ is a constant;

L _Gen = E[log(1-D _V (G(I _b , I _d )))]+E[log(1-D _I (G(I _b , I _d )))] (8)

D _V (G(I _b , I _d )) represents the discriminator discriminant value input with infrared image or fusion image, G(I _b , I _d ) represents the fusion image generated by the generator, D _I (G(I _{b )} , I _d )) represents the discriminator discriminant value input with visible light image or fusion image;

Calculate the generator loss function L _G , and use SGD to update the network parameters to achieve the purpose of optimization, and obtain the network parameters of the generator.

6. The method for multi-scale generation adversarial fusion network for infrared and visible light images according to claim 5, characterized in that, in the step 3, the two discriminator networks D _I and D _V have the same network structure, both of which have the same network structure. It consists of a five-layer convolutional neural network. The first four layers have the same structure. The network structure is a 3×3 convolution layer, a batch normalization layer and an activation layer. The activation function of the activation layer is Leaky Relu; the last layer is full The connection layer outputs the classification result of the input, thereby predicting whether the output is a fusion image or a source image;

The loss function of the discriminator whose input is the infrared image and the fused image is:

Wherein, D _I (I _I ) represents the discriminator discriminant value that takes infrared image as input, D _I (G(I _b , I _d )) represents the discriminator discriminant value that takes fusion image as input;

The loss function of the discriminator whose input is the visible light image and the fused image is:

Wherein, D _V ( _IV ) represents the discriminator discriminant value that takes the visible light image as input, and D _V (G(I _b , I _d )) represents the discriminator discriminant value that takes the fusion image as input;

Set a threshold for the output of the discriminator. When the output value of the discriminator is greater than the preset threshold, continue to update the network parameters until it is less than the preset threshold. In this process, after passing through the discriminator D _I and D _V , calculate the corresponding The discriminator loss function of

and