CN114841907A - A method for multi-scale generative adversarial fusion networks for infrared and visible light images - Google Patents

A method for multi-scale generative adversarial fusion networks for infrared and visible light images

Info

Publication number
CN114841907A
CN114841907A (application CN202210599873.3A)
Authority
CN
China
Prior art keywords
image
network
layer
discriminator
input
Prior art date
Legal status
Granted
Application number
CN202210599873.3A
Other languages
Chinese (zh)
Other versions
CN114841907B (en)
Inventor
王文卿
张纪乾
刘涵
李余兴
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202210599873.3A
Publication of CN114841907A
Application granted
Publication of CN114841907B
Legal status: Active
Anticipated expiration

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T5/20 Image enhancement or restoration using local operators
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/10048 Infrared image
    • G06T2207/20028 Bilateral filtering
    • G06T2207/20221 Image fusion; Image merging
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale generative adversarial fusion network method for infrared and visible light images. Several infrared and visible light image pairs are selected from a standard training set and fed into an edge-preserving filter to obtain a base layer and a detail layer; the base layer is passed through a gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the original detail layer to form a new detail layer. The discriminator network parameters are then computed and the network is trained, and the final output is the fused image. The fused image retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides a more convenient precondition for subsequent target detection and recognition.

Description

A method for a multi-scale generative adversarial fusion network for infrared and visible light images

Technical Field

The invention belongs to the technical field of image decomposition and image fusion in digital image processing, and in particular relates to a multi-scale generative adversarial fusion network method for infrared and visible light images.

Background Art

Image fusion is a branch of information fusion and an inherently interdisciplinary research topic involving sensor imaging, image preprocessing, computer vision and artificial intelligence. With the rapid development of multi-type imaging sensors, the problem that a single sensor provides only limited image target information has been effectively alleviated. For the same scene, fusing two or more source images from the same or different imaging sensors yields a fused image that is information-rich and sharp. A visible light sensor images by means of the light reflected from objects, and the resulting image has high resolution and rich detail; under poor illumination, however, the obtained image is not very clear. An infrared sensor images by means of the thermal radiation of the target, has strong penetrating power, and compensates for the poor imaging of visible light sensors under insufficient illumination or object occlusion: it can still detect the target in poor lighting conditions, but the resulting image lacks detail and contrast. Infrared and visible light image fusion makes the two types of images complement each other, so that the final fused image contains thermal radiation, contrast and detail information, enabling better understanding of the image target and, ultimately, all-weather operation of the system. In recent years, important progress has been made in multi-scale image fusion. A fusion scheme for infrared and visible light images based on multi-scale transforms usually includes three steps: first, each source image is decomposed into a series of multi-scale representations; then the multi-scale representations of the source images are fused according to given fusion rules; finally, the corresponding multi-scale inverse transform is applied to the fused representation. Meanwhile, with the rapid development of deep learning, unsupervised deep learning has been extended to the fusion field and has achieved certain results. Although such methods are better suited to multi-source image fusion without reference images, they place higher demands on the design of the network structure and the loss function. Therefore, fusion methods based on unsupervised generative adversarial networks have gradually attracted the attention of researchers.

Summary of the Invention

The purpose of the present invention is to provide a multi-scale generative adversarial fusion network method for infrared and visible light images, in which the source images are decomposed at multiple scales and the decomposed components are fed into a generative adversarial network to obtain the final fused image. The fused image obtained in this way retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides a more convenient precondition for subsequent target detection and recognition.

The technical solution adopted by the present invention is a multi-scale generative adversarial fusion network method for infrared and visible light images, implemented according to the following steps:

Step 1. Select several infrared and visible light image pairs from a standard training set, and feed each image pair into an edge-preserving filter to obtain a base layer and a detail layer;

Step 2. Feed the base layer obtained in step 1 into a gradient filter to obtain a gradient map and a new base layer, and add the gradient map to the detail layer of step 1 to form a new detail layer;

Step 3. Feed the base layer and detail layer obtained in step 2 into the generator network G to obtain the fused image corresponding to the source image pair; compute the generator loss function LG and update the parameters of the generator network G to obtain the final generator network parameters; feed the source images and the fused image into the discriminator network D for classification, compute the discriminator loss function LD, and update the discriminator network parameters to obtain the final discriminator network parameters;

Step 4. Start training the network and determine whether the iteration is finished, i.e. whether the current number of iterations has reached the preset number of iterations; the network parameters obtained when the number of iterations reaches the preset number are taken as the final network parameters and saved;

Step 5. Load the generator network parameters obtained in step 4 into the generator network of the test network, apply the multi-scale decomposition (the filtering operations of steps 1 and 2) to the test infrared and visible light source images, then concatenate the resulting base layers and detail layers as the input of the test network; the obtained output is the final fused image.

The present invention is further characterized as follows.

The filtering formula in step 1 is as follows:

Îq = (1/Wq) Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||) Ip,    (1)

where:

Wq = Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||),    (2)

In formula (1), Iq is the input image, Îq is the filtered image, q is a pixel of Iq, s is the neighborhood pixel set of q, p is a pixel in the neighborhood of q, Ωq is the local input image patch centred at q, Ωp is the image patch surrounding Ωq, f is the spatial filter kernel and g is the distance (range) filter kernel; both the spatial kernel and the distance kernel are usually expressed in Gaussian form;

Ib0 = Îq,  Id0 = Iq - Ib0,    (3)

In formula (3), Id0 is the detail layer obtained by bilateral filtering and Ib0 is the obtained base layer.
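By way of illustration only, a minimal Python sketch of this bilateral decomposition using OpenCV (the filter diameter and sigma values below are assumptions, not values specified by the patent):

```python
import cv2
import numpy as np

def bilateral_decompose(img, d=9, sigma_color=75, sigma_space=75):
    """Step 1: split a grayscale image into a base layer (bilateral-filtered image)
    and a detail layer (input minus base)."""
    img = img.astype(np.float32)
    base = cv2.bilateralFilter(img, d, sigma_color, sigma_space)  # I_b0
    detail = img - base                                           # I_d0
    return base, detail
```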

In step 2,

the gradient value of every pixel in the image is obtained by the following formula:

G = √(GX² + GY²)

A threshold Gmax is then defined; if the gradient value of a pixel is greater than the threshold the pixel is set to white, otherwise to black, which yields the gradient map IG;

The base layer Ib0 obtained in step 1 is gradient-filtered to obtain the gradient map IG; the gradient map is then subtracted from the original base layer Ib0 to give the new base layer Ib, and the gradient map IG is added to the original detail layer Id0 to give the new detail layer Id.
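A companion sketch of this gradient-map step using OpenCV's Sobel operator; the threshold value 150 and the 0-255 intensity scale follow the embodiment described below, while the kernel size is an assumption:

```python
import cv2
import numpy as np

def gradient_map(base, gmax=150.0, white=255.0):
    """Binary gradient map I_G: pixels whose Sobel gradient magnitude exceeds
    the threshold Gmax are set to white, all others to black."""
    gx = cv2.Sobel(base, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(base, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return np.where(mag > gmax, white, 0.0).astype(np.float32)

def refine_layers(base0, detail0, gmax=150.0):
    """Step 2: new base layer I_b = I_b0 - I_G, new detail layer I_d = I_d0 + I_G."""
    grad = gradient_map(base0, gmax)
    return base0 - grad, detail0 + grad
```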

In step 3, the generator network consists of a dual-stream network followed by a convolutional neural network. The upper and lower branches of the dual-stream network have the same structure: each is a six-layer convolutional neural network. The first four layers are identical, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers are identical, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU. The network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh; the output of this convolutional layer is the final fused image.
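Purely as an illustration, a PyTorch sketch of a generator with this layout (the patent does not name a framework; the channel width and the two-channel inputs, i.e. the concatenated infrared/visible layers, are assumptions):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k):
    # conv + batch normalization + Leaky ReLU, as described for the branch layers
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Branch(nn.Module):
    """One stream: four 3x3 conv blocks followed by two 5x5 conv blocks."""
    def __init__(self, in_ch=2, ch=32):
        super().__init__()
        layers = [conv_block(in_ch, ch, 3)] + [conv_block(ch, ch, 3) for _ in range(3)]
        layers += [conv_block(ch, ch, 5), conv_block(ch, ch, 5)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Dual-stream network (base / detail branches) followed by a 1x1 conv with tanh."""
    def __init__(self, ch=32):
        super().__init__()
        self.base_branch = Branch(in_ch=2, ch=ch)    # concatenated IR/VIS base layers
        self.detail_branch = Branch(in_ch=2, ch=ch)  # concatenated IR/VIS detail layers
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, 1, 1), nn.Tanh())

    def forward(self, base, detail):
        feats = torch.cat([self.base_branch(base), self.detail_branch(detail)], dim=1)
        return self.fuse(feats)  # fused image
```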

The generator loss function LG in step 3 is:

LG = λLcontent + LGen,    (6)

where Lcontent is the content loss comparing the generator input and output, LGen is the adversarial loss between the generator and the discriminators, and λ is a constant;

Lcontent = (1/(HW)) (||If - Ib||₂² + ξ||∇If - ∇Id||₂²),    (7)

where H and W are the height and width of the generator input image, ||·||₂ denotes the 2-norm, If is the generator output, i.e. the fused image, Ib is the base layer fed to the generator, Id is the detail layer fed to the generator, ∇ is the gradient operator, and ξ is a constant;

LGen = E[log(1 - DV(G(Ib, Id)))] + E[log(1 - DI(G(Ib, Id)))],    (8)

where DV(G(Ib, Id)) denotes the output of the discriminator that takes the visible light image or the fused image as input, G(Ib, Id) denotes the fused image generated by the generator, and DI(G(Ib, Id)) denotes the output of the discriminator that takes the infrared image or the fused image as input.

The generator loss function LG is computed and, at the same time, SGD is used to update the network parameters for optimization, giving the network parameters of the generator.
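A sketch of the generator loss of equations (6)-(8), written against the reconstruction of Lcontent given above; the values of λ and ξ and the finite-difference stand-in for the gradient operator are assumptions:

```python
import torch
import torch.nn.functional as F

def gradient(img):
    # finite-difference gradient magnitude, standing in for the ∇ operator
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return F.pad(dx, (0, 1, 0, 0)).abs() + F.pad(dy, (0, 0, 0, 1)).abs()

def generator_loss(fused, base, detail, d_ir_out, d_vis_out, lam=100.0, xi=5.0):
    """L_G = λ·L_content + L_Gen, equations (6)-(8); base/detail may hold the
    concatenated IR/VIS layers, broadcasting handles the channel difference."""
    h, w = fused.shape[-2:]
    l_content = (torch.norm(fused - base) ** 2
                 + xi * torch.norm(gradient(fused) - gradient(detail)) ** 2) / (h * w)
    # adversarial term: drive both discriminator outputs on the fused image towards 1
    l_gen = torch.log(1.0 - d_vis_out).mean() + torch.log(1.0 - d_ir_out).mean()
    return lam * l_content + l_gen
```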

In step 3, the two discriminator networks DI and DV have the same structure: each is a five-layer convolutional neural network. The first four layers are identical, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer that outputs the classification result for the input, i.e. it predicts whether the input is the fused image or a source image;

The loss function of the discriminator whose inputs are the infrared image and the fused image is:

LDI = -E[log DI(II)] - E[log(1 - DI(G(Ib, Id)))],    (9)

where DI(II) denotes the output of the discriminator with the infrared image as input, and DI(G(Ib, Id)) denotes the output of the discriminator with the fused image as input;

The loss function of the discriminator whose inputs are the visible light image and the fused image is:

LDV = -E[log DV(IV)] - E[log(1 - DV(G(Ib, Id)))],    (10)

where DV(IV) denotes the output of the discriminator with the visible light image as input, and DV(G(Ib, Id)) denotes the output of the discriminator with the fused image as input;

A threshold is set for the discriminator output; as long as the discriminator output value is greater than the preset threshold, the network parameters continue to be updated, until the output falls below the preset threshold. In this process, after passing through the discriminators DI and DV, the corresponding discriminator loss functions LDI and LDV are computed; the optimization method used to update the network parameters is SGD, which finally gives the network parameters of the discriminators.
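A corresponding PyTorch sketch of one discriminator and of the losses in equations (9)-(10), assuming the cross-entropy form reconstructed above; the channel width, strides and the 128×128 input size implied by the linear layer are assumptions:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Four conv/BN/LeakyReLU blocks followed by a fully connected classifier."""
    def __init__(self, ch=32, feat_hw=8):
        super().__init__()
        blocks, in_ch = [], 1
        for _ in range(4):
            blocks += [nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
                       nn.BatchNorm2d(ch),
                       nn.LeakyReLU(0.2, inplace=True)]
            in_ch = ch
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(ch * feat_hw * feat_hw, 1),
            nn.Sigmoid(),  # probability that the input is a source image
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def discriminator_loss(d, real, fused):
    """L_D = -E[log D(real)] - E[log(1 - D(fused))], applied to D_I or D_V."""
    eps = 1e-8
    return -(torch.log(d(real) + eps).mean()
             + torch.log(1.0 - d(fused.detach()) + eps).mean())
```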

The beneficial effect of the present invention is as follows. The multi-scale generative adversarial fusion network method for infrared and visible light images combines multi-scale decomposition with a generative adversarial network: the source images are optimized by the decomposition, and a neural network with good fusion performance is applied to the fusion process. The method first obtains the base layer and detail layer of each image through an edge-preserving filter and gradient filtering, so that the resulting image components retain the required information to the greatest extent; the two branch networks of the generator in the generative adversarial network then fuse the base layers (structure information) and detail layers (detail information) of the multi-scale decomposition respectively, and the generated base-layer and detail-layer images are finally combined to give the final fused image, while the two discriminators of the generative adversarial network classify and discriminate the two source images and the fused image. The image obtained by the fusion of the present invention retains the target information and texture information of the source images to the greatest extent and improves the quality of the fused image, which provides more convenient conditions for subsequent target detection and recognition.

Brief Description of the Drawings

Fig. 1 is the overall flowchart of the multi-scale generative adversarial fusion network method for infrared and visible light images of the present invention;

Fig. 2 shows the base layer and detail layer obtained by bilateral filtering of the source images according to the present invention;

Fig. 3 shows the new base layer and detail layer obtained by applying gradient filtering to the bilaterally filtered base layer according to the present invention;

Fig. 4 is the network structure diagram of the generator in the generative adversarial network of the present invention;

Fig. 5 is the network structure diagram of the discriminator in the generative adversarial network of the present invention.

Detailed Description of the Embodiments

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The present invention is a multi-scale generative adversarial fusion network method for infrared and visible light images. First, the source images are decomposed by an edge-preserving filter and gradient filtering to obtain the base layer and detail layer of each image; the base layers and detail layers are then fed into the generator network of a generative adversarial network for fusion, and the fused image and the two source images are fed into the discriminators for discrimination so as to optimize the network parameters, finally yielding the fused image and realizing image fusion. The overall network structure of the algorithm is shown in Fig. 1; the infrared and visible light image fusion process based on multi-scale decomposition followed by a generative adversarial network is mainly divided into the following three stages.

1) Multi-scale decomposition of the source images

The multi-scale decomposition of a source image consists of three steps. First, the image is fed into an edge-preserving filter (bilateral filter) to obtain the base layer and detail layer shown in Fig. 2; the base layer is then passed through a gradient filter to obtain a gradient map and a new base layer; finally, the gradient map is added to the detail layer to form a new detail layer, giving the new base layer and new detail layer shown in Fig. 3. The principles of bilateral filtering and gradient filtering are as follows.

Bilateral filtering is an edge-preserving filter that preserves edges while reducing noise and smoothing. Like other filters it uses a weighted average: the intensity of a pixel is represented by a weighted average of the brightness values of the surrounding pixels, where the weights follow a Gaussian distribution. Most importantly, the bilateral filter weights consider not only the Euclidean distance between pixels (as an ordinary Gaussian low-pass filter does, which only accounts for the influence of position on the central pixel) but also the radiometric differences in the range domain (for example the degree of similarity, colour intensity and depth distance between a pixel in the convolution kernel and the central pixel); both weights are taken into account when computing the central pixel. The filtering formula is as follows:

Îq = (1/Wq) Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||) Ip,    (1)

Wq = Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||),    (2)

In formula (1), Iq is the input image and Îq is the filtered image.

Step 2: the base layer obtained in step 1 is fed into the gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the detail layer of step 1 to form a new detail layer. The principle of gradient filtering is as follows.

Gradient filtering is, simply put, differentiation of the image. Three different filters are commonly used: Sobel, Scharr and Laplacian. Sobel and Scharr compute first-order derivatives, Scharr being an optimization of Sobel, while Laplacian computes second-order derivatives. The Sobel filter is used here; its purpose is to let high frequencies pass and block low frequencies so that the edges become more obvious, thereby enhancing the image. The specific principle is as follows:

The Sobel operator is a discrete difference operator used to compute an approximation of the gradient of the image brightness function. Applying this operator at any point of the image produces the corresponding gradient vector or its normal vector.

Gx = [-1 0 +1; -2 0 +2; -1 0 +1],   Gy = [+1 +2 +1; 0 0 0; -1 -2 -1]    (3)

The operator comprises two 3×3 matrices, one for the horizontal and one for the vertical direction. Convolving them with the image in the plane yields approximations of the horizontal and vertical brightness differences respectively. If A denotes the original image and GX and GY denote the image grey values obtained by horizontal and vertical edge detection respectively, the formula is as follows:

GX = Gx * A  and  GY = Gy * A    (4)

2) Acquisition of the generative adversarial network parameters

Generator network parameter acquisition: the base layers and detail layers obtained in 1) are concatenated pairwise along the image channel dimension as the input of the generator. The network structure of the generator is shown in Fig. 4: it consists of a dual-stream network followed by a convolutional block. The two branches of the dual-stream network are identical six-layer convolutional neural networks. The first four layers are identical, each being a 3×3 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU); the last two layers are identical, each being a 5×5 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU). The network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer (activation function tanh), and the output of this layer is the final fused image. The fusion results of the two branches (the fused base layer and fused detail layer) are concatenated to obtain the final fused image. In this process, after passing through the generator G, the generator loss function LG is computed and SGD (stochastic gradient descent) is used to update the network parameters for optimization, giving the network parameters of the generator.

Discriminator network parameter acquisition: because the source images come in pairs, two discriminators are used, one to obtain the probability PI that the fused image is an infrared image and the other to obtain the probability PV that the fused image is a visible light image. The two discriminators have exactly the same network structure, shown in Fig. 5: a five-layer convolutional neural network whose first four layers are identical, each being a 3×3 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU), and whose last layer is a fully connected layer that outputs the classification result for the input. Feeding in the source images and the fused image gives two probabilities; while a probability is greater than the preset threshold the network parameters continue to be updated, until the probability is less than the preset threshold. In this process, after passing through the discriminators DI and DV, the corresponding discriminator loss functions LDI and LDV are computed; the optimization method used to update the network parameters is SGD (stochastic gradient descent), which finally gives the network parameters of the discriminators.
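A highly simplified sketch of this alternating update scheme, reusing the hypothetical Generator, Discriminator, generator_loss and discriminator_loss sketched above; the iteration count, learning rate and data handling are assumptions:

```python
import torch

def train(pairs, iterations=1000, lr=1e-3):
    # pairs: iterable of (base, detail, ir, vis) tensors prepared by the
    # multi-scale decomposition of stage 1)
    g = Generator()
    d_ir, d_vis = Discriminator(), Discriminator()
    opt_g = torch.optim.SGD(g.parameters(), lr=lr)
    opt_d = torch.optim.SGD(list(d_ir.parameters()) + list(d_vis.parameters()), lr=lr)

    for _, (base, detail, ir, vis) in zip(range(iterations), pairs):
        # update the two discriminators on a source image and the current fused image
        fused = g(base, detail)
        opt_d.zero_grad()
        loss_d = discriminator_loss(d_ir, ir, fused) + discriminator_loss(d_vis, vis, fused)
        loss_d.backward()
        opt_d.step()

        # update the generator against the refreshed discriminators
        opt_g.zero_grad()
        fused = g(base, detail)
        loss_g = generator_loss(fused, base, detail, d_ir(fused), d_vis(fused))
        loss_g.backward()
        opt_g.step()

    torch.save(g.state_dict(), "generator.pth")  # parameters reused by the test network
```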

The loss functions comprise the generator loss function LG and the loss functions LDI and LDV of the two discriminators, designed as follows.

The purpose of the generator loss function is to preserve more information of the source images; it consists of a content loss and an adversarial loss:

LG = λLcontent + LGen,    (5)

where Lcontent is the content loss comparing the generator input and output, LGen is the adversarial loss between the generator and the discriminators, and λ is a constant;

Lcontent = (1/(HW)) (||If - Ib||₂² + ξ||∇If - ∇Id||₂²),    (6)

where H and W are the height and width of the generator input image, ||·||₂ denotes the 2-norm, If is the generator output, i.e. the fused image, Ib is the base layer fed to the generator, Id is the detail layer fed to the generator, ∇ is the gradient operator, and ξ is a constant;

LGen = E[log(1 - DV(G(Ib, Id)))] + E[log(1 - DI(G(Ib, Id)))],    (7)

where DV(G(Ib, Id)) denotes the output of the discriminator that takes the visible light image or the fused image as input, G(Ib, Id) denotes the fused image generated by the generator, and DI(G(Ib, Id)) denotes the output of the discriminator that takes the infrared image or the fused image as input.

Using two discriminators effectively reduces the information loss of the fusion result; their role is also to make the generator preserve more source image information. They are defined as follows:

LDI = -E[log DI(II)] - E[log(1 - DI(G(Ib, Id)))],    (8)

LDV = -E[log DV(IV)] - E[log(1 - DV(G(Ib, Id)))],    (9)

where DV(IV) denotes the output of the discriminator with the visible light image as input, DV(G(Ib, Id)) denotes its output with the fused image as input, DI(II) denotes the output of the discriminator with the infrared image as input, and DI(G(Ib, Id)) denotes its output with the fused image as input.

3) Fusion test network

The generator network parameters obtained in 2) are loaded into the generator network; the test images are first decomposed at multiple scales, and the corresponding base layers and detail layers obtained from the decomposition are concatenated and fed into the generator network, so that the output of the generator is the final fused image.
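An illustrative sketch of this test stage, again reusing the hypothetical helpers from the earlier sketches (bilateral_decompose, refine_layers and Generator); the file handling and intensity scaling are assumptions:

```python
import cv2
import numpy as np
import torch

def decompose(img):
    """Steps 1-2: bilateral decomposition followed by the gradient-map refinement."""
    base0, detail0 = bilateral_decompose(img)
    return refine_layers(base0, detail0)

def fuse(ir_path, vis_path, weights="generator.pth"):
    g = Generator()
    g.load_state_dict(torch.load(weights))
    g.eval()

    ir = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    vis = cv2.imread(vis_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)

    ir_b, ir_d = decompose(ir)
    vis_b, vis_d = decompose(vis)

    to_t = lambda a: torch.from_numpy(a)[None, None]      # 1x1xHxW tensor
    base = torch.cat([to_t(ir_b), to_t(vis_b)], dim=1)    # concatenated base layers
    detail = torch.cat([to_t(ir_d), to_t(vis_d)], dim=1)  # concatenated detail layers

    with torch.no_grad():
        fused = g(base, detail)
    return fused.squeeze().numpy()
```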

The multi-scale generative adversarial fusion network method for infrared and visible light images of the present invention, whose flowchart is shown in Fig. 1, is specifically implemented according to the following steps.

With reference to Fig. 2, step 1: several infrared and visible light image pairs are selected from a standard training set, and each image pair is fed into an edge-preserving filter (bilateral filter) to obtain a base layer and a detail layer;

The filtering formula in step 1 is as follows:

Îq = (1/Wq) Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||) Ip,    (1)

where:

Wq = Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||),    (2)

In formula (1), Iq is the input image, Îq is the filtered image, q is a pixel of Iq, s is the neighborhood pixel set of q, p is a pixel in the neighborhood of q, Ωq is the local input image patch centred at q, Ωp is the image patch surrounding Ωq, f is the spatial filter kernel and g is the distance (range) filter kernel; both the spatial kernel and the distance kernel are usually expressed in Gaussian form;

Ib0 = Îq,  Id0 = Iq - Ib0,    (3)

In formula (3), Id0 is the detail layer obtained by bilateral filtering and Ib0 is the obtained base layer.

With reference to Fig. 3, step 2: the base layer obtained in step 1 is fed into the gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the detail layer of step 1 to form a new detail layer;

In step 2,

the principle of gradient filtering is as follows:

Gradient filtering is, simply put, differentiation of the image, and includes three different filters: Sobel, Scharr and Laplacian. Sobel and Scharr compute first-order derivatives, Scharr being an optimization of Sobel, and Laplacian computes second-order derivatives. The Sobel filter is used here; its purpose is to let high-frequency information pass and block low-frequency information so that the edges become more obvious, thereby enhancing the image. Its specific principle is as follows:

The Sobel operator is a discrete difference operator used to compute an approximation of the gradient of the image brightness function. Applying this operator at any point of the image produces the corresponding gradient vector or its normal vector.

Gx = [-1 0 +1; -2 0 +2; -1 0 +1],   Gy = [+1 +2 +1; 0 0 0; -1 -2 -1]

The operator comprises two 3×3 matrices, one for the horizontal and one for the vertical direction. Convolving them with the image in the plane yields approximations of the horizontal and vertical brightness differences respectively. If A denotes the original image and GX and GY denote the image grey values obtained by horizontal and vertical edge detection respectively, the formula is as follows:

GX = Gx * A  and  GY = Gy * A    (4)

The gradient value of every pixel in the image is obtained by the following formula:

G = √(GX² + GY²),    (5)

A threshold Gmax (defined here as 150) is then set; if the gradient value of a pixel is greater than the threshold, the pixel is set to white, otherwise to black, which yields the gradient map IG;

The base layer Ib0 obtained in step 1 is gradient-filtered to obtain the gradient map IG; the gradient map is then subtracted from the original base layer Ib0 to give the new base layer Ib, and the gradient map IG is added to the original detail layer Id0 to give the new detail layer Id.

With reference to Fig. 4 and Fig. 5, step 3: the base layer and detail layer obtained in step 2 are fed into the generator network G, and the fused image corresponding to the source image pair is obtained after the generator network G; the generator loss function LG is computed and the parameters of the generator network G are updated to obtain the final generator network parameters; the source images and the fused image are fed into the discriminator network D for classification, the discriminator loss function LD is computed, and the discriminator network parameters are updated to obtain the final discriminator network parameters;

In step 3, the generator network consists of a dual-stream network followed by a convolutional neural network. The upper and lower branches of the dual-stream network have the same structure: each is a six-layer convolutional neural network. The first four layers are identical, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers are identical, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU. The network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh; the output of this convolutional layer is the final fused image.

The generator loss function LG in step 3 is:

LG = λLcontent + LGen,    (6)

where Lcontent is the content loss comparing the generator input and output, LGen is the adversarial loss between the generator and the discriminators, and λ is a constant;

Lcontent = (1/(HW)) (||If - Ib||₂² + ξ||∇If - ∇Id||₂²),    (7)

where H and W are the height and width of the generator input image, ||·||₂ denotes the 2-norm, If is the generator output, i.e. the fused image, Ib is the base layer fed to the generator, Id is the detail layer fed to the generator, ∇ is the gradient operator, and ξ is a constant;

LGen = E[log(1 - DV(G(Ib, Id)))] + E[log(1 - DI(G(Ib, Id)))],    (8)

where DV(G(Ib, Id)) denotes the output of the discriminator that takes the visible light image or the fused image as input, G(Ib, Id) denotes the fused image generated by the generator, and DI(G(Ib, Id)) denotes the output of the discriminator that takes the infrared image or the fused image as input.

The generator loss function LG is computed and, at the same time, SGD (stochastic gradient descent) is used to update the network parameters for optimization, giving the network parameters of the generator.

In step 3, the two discriminator networks DI and DV have the same structure: each is a five-layer convolutional neural network whose first four layers are identical, each being a 3×3 convolutional layer, a batch normalization layer and an activation layer with the activation function Leaky ReLU; the last layer is a fully connected layer that outputs the classification result for the input, i.e. it predicts whether the input is the fused image or a source image (the source image here may be either of two kinds, infrared or visible light; which one depends on whether the discriminator is DI or DV, with DI referring to the infrared image and DV referring to the visible light image);

The loss function of the discriminator whose inputs are the infrared image and the fused image is:

LDI = -E[log DI(II)] - E[log(1 - DI(G(Ib, Id)))],    (9)

where DI(II) denotes the output of the discriminator with the infrared image as input, and DI(G(Ib, Id)) denotes the output of the discriminator with the fused image as input;

The loss function of the discriminator whose inputs are the visible light image and the fused image is:

LDV = -E[log DV(IV)] - E[log(1 - DV(G(Ib, Id)))],    (10)

where DV(IV) denotes the output of the discriminator with the visible light image as input, and DV(G(Ib, Id)) denotes the output of the discriminator with the fused image as input;

A threshold is set for the discriminator output; as long as the discriminator output value is greater than the preset threshold, the network parameters continue to be updated, until the output falls below the preset threshold. In this process, after passing through the discriminators DI and DV, the corresponding discriminator loss functions LDI and LDV are computed; the optimization method used to update the network parameters is SGD (stochastic gradient descent), which finally gives the network parameters of the discriminators.

Step 4. Start training the network and determine whether the iteration is finished, i.e. whether the current number of iterations has reached the preset number of iterations; the network parameters obtained when the number of iterations reaches the preset number are taken as the final network parameters and saved;

Specifically, the generator network parameters obtained in step 4 are loaded into the generator network of the test network; the test infrared and visible light source images are decomposed at multiple scales (the filtering operations of steps 1 and 2), and the corresponding base layers and detail layers obtained from the decomposition are concatenated as the input of the test network; the obtained output is the final fused image.

Step 5. Load the generator network parameters obtained in step 4 into the generator network of the test network, apply the multi-scale decomposition (the filtering operations of steps 1 and 2) to the test infrared and visible light source images, then concatenate the resulting base layers and detail layers as the input of the test network; the obtained output is the final fused image.

Claims (6)

1. A multi-scale generative adversarial fusion network method for infrared and visible light images, characterized in that it is implemented according to the following steps:
Step 1. Select several infrared and visible light image pairs from a standard training set, and feed each image pair into an edge-preserving filter to obtain a base layer and a detail layer;
Step 2. Feed the base layer obtained in step 1 into a gradient filter to obtain a gradient map and a new base layer, and add the gradient map to the detail layer of step 1 to form a new detail layer;
Step 3. Feed the base layer and detail layer obtained in step 2 into the generator network G to obtain the fused image corresponding to the source image pair; compute the generator loss function LG and update the parameters of the generator network G to obtain the final generator network parameters; feed the source images and the fused image into the discriminator network D for classification, compute the discriminator loss function LD, and update the discriminator network parameters to obtain the final discriminator network parameters;
Step 4. Start training the network and determine whether the iteration is finished, i.e. whether the current number of iterations has reached the preset number of iterations; the network parameters obtained when the number of iterations reaches the preset number are taken as the final network parameters and saved;
Step 5. Load the generator network parameters obtained in step 4 into the generator network of the test network, apply the multi-scale decomposition (the filtering operations of steps 1 and 2) to the test infrared and visible light source images, then concatenate the resulting base layers and detail layers as the input of the test network; the obtained output is the final fused image.
2. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 1, characterized in that the filtering formula in step 1 is as follows:
Îq = (1/Wq) Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||) Ip,    (1)

where:

Wq = Σ_{p∈s} f(||p - q||) g(||Ωp - Ωq||),    (2)
In formula (1), Iq is the input image, Îq is the filtered image, q is a pixel of Iq, s is the neighborhood pixel set of q, p is a pixel in the neighborhood of q, Ωq is the local input image patch centred at q, Ωp is the image patch surrounding Ωq, f is the spatial filter kernel and g is the distance (range) filter kernel; both the spatial kernel and the distance kernel are usually expressed in Gaussian form;
Ib0 = Îq,  Id0 = Iq - Ib0,    (3)

In formula (3), Id0 is the detail layer obtained by bilateral filtering and Ib0 is the obtained base layer.
3. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 2, characterized in that, in step 2, the gradient value of every pixel in the image is obtained by the following formula:
G = √(GX² + GY²)
A threshold Gmax is then defined; if the gradient value of a pixel is greater than the threshold the pixel is set to white, otherwise to black, which yields the gradient map IG;
the base layer Ib0 obtained in step 1 is gradient-filtered to obtain the gradient map IG, the gradient map is then subtracted from the original base layer Ib0 to give the new base layer Ib, and the gradient map IG is added to the original detail layer Id0 to give the new detail layer Id.
4. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 3, characterized in that, in step 3, the generator network consists of a dual-stream network followed by a convolutional neural network; the upper and lower branches of the dual-stream network have the same structure, each being a six-layer convolutional neural network whose first four layers are identical, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer with the activation function Leaky ReLU, and whose last two layers are identical, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer with the activation function Leaky ReLU; the network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer with the activation function tanh, and the output of this convolutional layer is the final fused image.
5. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 4, characterized in that the generator loss function LG in step 3 is:
LG = λLcontent + LGen,    (6)
where Lcontent is the content loss comparing the generator input and output, LGen is the adversarial loss between the generator and the discriminators, and λ is a constant;
Lcontent = (1/(HW)) (||If - Ib||₂² + ξ||∇If - ∇Id||₂²),    (7)
where H and W are the height and width of the generator input image, ||·||₂ denotes the 2-norm, If is the generator output, i.e. the fused image, Ib is the base layer fed to the generator, Id is the detail layer fed to the generator, ∇ is the gradient operator, and ξ is a constant;
LGen = E[log(1 - DV(G(Ib, Id)))] + E[log(1 - DI(G(Ib, Id)))],    (8)
where DV(G(Ib, Id)) denotes the output of the discriminator that takes the visible light image or the fused image as input, G(Ib, Id) denotes the fused image generated by the generator, and DI(G(Ib, Id)) denotes the output of the discriminator that takes the infrared image or the fused image as input;
the generator loss function LG is computed and, at the same time, SGD is used to update the network parameters for optimization, giving the network parameters of the generator.
6. The method for a multi-scale generative adversarial fusion network for infrared and visible light images according to claim 5, characterized in that, in step 3, the two discriminator networks DI and DV have the same structure, each consisting of a five-layer convolutional neural network. The first four layers are identical, each composed of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer that outputs the classification result for the input, i.e. predicts whether the input is a fused image or a source image.
The loss function of the discriminator whose inputs are the infrared image and the fused image is:
LDI = E[−log DI(II)] + E[−log(1 − DI(G(Ib, Id)))]  (9)
where DI(II) denotes the discriminator output with the infrared image as input, and DI(G(Ib, Id)) denotes the discriminator output with the fused image as input;
the loss function of the discriminator whose inputs are the visible-light image and the fused image is:
LDV = E[−log DV(IV)] + E[−log(1 − DV(G(Ib, Id)))]  (10)
where DV(IV) denotes the discriminator output with the visible-light image as input, and DV(G(Ib, Id)) denotes the discriminator output with the fused image as input;
a threshold is set for the discriminator output: while the discriminator output is greater than the preset threshold, the network parameters continue to be updated, until the output falls below the preset threshold; in this process, after passing through the discriminators DI and DV, the corresponding discriminator loss functions LDI and LDV are computed; the optimization method used to update the network parameters is SGD, which finally yields the network parameters of the discriminators.
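The sketch below illustrates one discriminator as described in claim 6 (four 3×3 convolution blocks with batch normalization and Leaky ReLU, followed by a fully connected layer) together with the adversarial losses of Eqs. (8)–(10) as written above. The channel width, the stride-2 downsampling, the sigmoid output, and the input patch size assumed by the fully connected layer are all illustrative assumptions.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Four 3x3 conv blocks (BatchNorm + Leaky ReLU) followed by a fully connected layer."""
    def __init__(self, in_ch: int = 1, width: int = 32, patch: int = 128):
        super().__init__()
        blocks, ch = [], in_ch
        for _ in range(4):
            blocks += [nn.Conv2d(ch, width, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(width),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = width
        self.features = nn.Sequential(*blocks)
        self.fc = nn.Linear(width * (patch // 16) ** 2, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.fc(h))  # probability that x is a source image

def discriminator_loss(D, real, fused, eps: float = 1e-8):
    """LD = E[-log D(real)] + E[-log(1 - D(fused))], cf. Eqs. (9) and (10)."""
    return -(torch.log(D(real) + eps).mean()
             + torch.log(1.0 - D(fused.detach()) + eps).mean())

def generator_adversarial_loss(D_I, D_V, fused, eps: float = 1e-8):
    """LGen = E[log(1 - DV(G))] + E[log(1 - DI(G))], cf. Eq. (8)."""
    return (torch.log(1.0 - D_V(fused) + eps).mean()
            + torch.log(1.0 - D_I(fused) + eps).mean())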
CN202210599873.3A 2022-05-27 2022-05-27 A multi-scale generative adversarial fusion network approach for infrared and visible light images Active CN114841907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210599873.3A CN114841907B (en) 2022-05-27 2022-05-27 A multi-scale generative adversarial fusion network approach for infrared and visible light images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210599873.3A CN114841907B (en) 2022-05-27 2022-05-27 A multi-scale generative adversarial fusion network approach for infrared and visible light images

Publications (2)

Publication Number Publication Date
CN114841907A true CN114841907A (en) 2022-08-02
CN114841907B CN114841907B (en) 2025-03-18

Family

ID=82572920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210599873.3A Active CN114841907B (en) 2022-05-27 2022-05-27 A multi-scale generative adversarial fusion network approach for infrared and visible light images

Country Status (1)

Country Link
CN (1) CN114841907B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909560A (en) * 2017-09-22 2018-04-13 洛阳师范学院 A kind of multi-focus image fusing method and system based on SiR
CN109118467A (en) * 2018-08-31 2019-01-01 武汉大学 Based on the infrared and visible light image fusion method for generating confrontation network
CN110348319A (en) * 2019-06-18 2019-10-18 武汉大学 A kind of face method for anti-counterfeit merged based on face depth information and edge image
US20210133932A1 (en) * 2019-11-01 2021-05-06 Lg Electronics Inc. Color restoration method and apparatus
CN113222879A (en) * 2021-07-08 2021-08-06 中国工程物理研究院流体物理研究所 Generation countermeasure network for fusion of infrared and visible light images
CN114463235A (en) * 2022-01-27 2022-05-10 上海电力大学 A kind of infrared and visible light image fusion method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王娟 (WANG Juan); 柯聪 (KE Cong); 刘敏 (LIU Min); 熊炜 (XIONG Wei); 袁旭亮 (YUAN Xuliang); 丁畅 (DING Chang): "A survey of infrared and visible light image fusion algorithms under neural network frameworks", 激光杂志 (Laser Journal), no. 07, 25 July 2020 (2020-07-25) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333410A (en) * 2023-10-12 2024-01-02 江苏海洋大学 Infrared and visible light image fusion method based on Swin Transformer and GAN
CN117808691A (en) * 2023-12-12 2024-04-02 武汉工程大学 Image fusion method based on difference significance aggregation and joint gradient constraint
CN117934869A (en) * 2024-03-22 2024-04-26 中铁大桥局集团有限公司 A target detection method, system, computing device and medium

Also Published As

Publication number Publication date
CN114841907B (en) 2025-03-18

Similar Documents

Publication Publication Date Title
CN113537099B (en) Dynamic detection method for fire smoke in highway tunnel
CN112614077B (en) Unsupervised low-illumination image enhancement method based on generation countermeasure network
CN107145846B (en) A kind of insulator recognition methods based on deep learning
CN114841907A (en) A method for multi-scale generative adversarial fusion networks for infrared and visible light images
CN106169081B (en) A kind of image classification and processing method based on different illumination
CN103871029B (en) A kind of image enhaucament and dividing method
CN109978807B (en) Shadow removing method based on generating type countermeasure network
CN110288550B (en) Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition
CN114863097A (en) Infrared dim target detection method based on attention system convolutional neural network
CN111179202B (en) Single image defogging enhancement method and system based on generation countermeasure network
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN105550999A (en) Video image enhancement processing method based on background reuse
CN111179208A (en) Infrared-visible image fusion method based on saliency map and convolutional neural network
CN117974706B (en) A concave point segmentation method for rock slices based on dynamic threshold and local search
CN110135501A (en) High Dynamic Range Image Forensics Method Based on Neural Network Framework
CN117115033A (en) Electric power operation site weak light image enhancement method based on strong light inhibition
CN110472696A (en) A method of Terahertz human body image is generated based on DCGAN
CN113362375A (en) Moving object detection method for vehicle
Zhu et al. Infrared and visible image fusion using threshold segmentation and weight optimization
CN105608683A (en) Defogging method of single image
Satrasupalli et al. End to end system for hazy image classification and reconstruction based on mean channel prior using deep learning network
CN117745555A (en) Fusion method of multi-scale infrared and visible light images based on double partial differential equations
Abid et al. A novel neural network approach for image edge detection
CN113553919B (en) Object frequency feature expression method, network and image classification method based on deep learning
Li et al. Image object detection algorithm based on improved Gaussian mixture model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant