CN112561846A

CN112561846A - Method and device for training image fusion model and electronic equipment

Info

Publication number: CN112561846A
Application number: CN202011548642.7A
Authority: CN
Inventors: 龙勇志
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2020-12-23
Filing date: 2020-12-23
Publication date: 2021-03-26

Abstract

The application discloses a method for training an image fusion model and electronic equipment, and belongs to the field of image processing. The method comprises the following steps: acquiring a training data set, wherein the training data set comprises an image pair; inputting an image pair into an initial image fusion model, wherein the image fusion model is a deep neural network model comprising a shallow feature extraction network, a deep feature extraction network, a global feature fusion network and a feature reconstruction network; sequentially processing a shallow feature extraction network, a deep feature extraction network, a global feature fusion network and a feature reconstruction network to obtain a fusion image of the image pair; and calculating a difference value between the fused image and the image pair according to a preset loss function, updating the network parameters of the image fusion model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, and obtaining the trained image fusion model, wherein the loss function is used for calculating a structural loss value and a content loss value. The method and the device can improve the objective authenticity of the fusion image.

Description

Method, apparatus and electronic device for training image fusion model

技术领域technical field

本申请属于图像处理技术领域，具体涉及一种训练图像融合模型的方法、装置和电子设备。The present application belongs to the technical field of image processing, and in particular relates to a method, apparatus and electronic device for training an image fusion model.

背景技术Background technique

随着传感器成像技术的迅猛发展，单一传感器成像难以满足日常应用需求，多传感器成像已引领技术革新。图像融合是指将多个传感器探测的图像信息综合处理后，实现对探测场景更全面、更可靠的描述。With the rapid development of sensor imaging technology, it is difficult for single sensor imaging to meet the needs of daily applications, and multi-sensor imaging has led the technological innovation. Image fusion refers to the comprehensive processing of image information detected by multiple sensors to achieve a more comprehensive and reliable description of the detection scene.

红外与可见光作为图像处理领域最广泛应用的图像类型，红外图像可以高效捕捉场景热辐射且辨识场景高亮目标，可见光图像具有高分辨率特性可呈现场景细节纹理信息，二者图像信息具有高效互补性。因此对红外图像和可见光图像进行融合，可以获得丰富场景信息量的融合图像，能够对场景背景和目标进行清晰准确的描述。Infrared and visible light are the most widely used image types in the field of image processing. Infrared images can efficiently capture the thermal radiation of the scene and identify bright targets in the scene. The visible light image has high resolution characteristics and can present detailed texture information of the scene. The image information of the two is highly complementary. sex. Therefore, by fusing the infrared image and the visible light image, a fusion image with rich scene information can be obtained, and the scene background and target can be described clearly and accurately.

目前，针对红外图像与可见光图像的融合任务，通常使用基于多尺度分解融合算法，例如小波变换融合算法，将图像进行特征提取以及分解，随后对特征进行分类，同时针对不同特征及场景类别特征制定不同的特征融合策略，得到融合后的多组融合特征进行逆变换，将融合特征变换回融合图像。At present, for the fusion task of infrared image and visible light image, the fusion algorithm based on multi-scale decomposition, such as wavelet transform fusion algorithm, is usually used to extract and decompose the image features, and then classify the features. Different feature fusion strategies are used to obtain multiple sets of fused features after inverse transformation, and transform the fused features back into the fused image.

然而，经典基于多尺度分解的图像融合算法，通常沿用固定的变换以及特征提取层次对图像进行特征分解和重构，不仅对图像特征提取和分解存在较大的局限性，同时融合算法中需采用人为制定的特征融合规则，导致融合结果容易引入人为视觉伪影，破坏图像信息的客观真实性，影响融合图像的效果。However, the classical image fusion algorithm based on multi-scale decomposition usually uses a fixed transformation and feature extraction level to decompose and reconstruct the image. The artificially formulated feature fusion rules cause the fusion results to easily introduce artificial visual artifacts, destroy the objective authenticity of image information, and affect the effect of fusion images.

发明内容SUMMARY OF THE INVENTION

本申请实施例的目的是提供一种训练图像融合模型的方法，能够解决现有技术对红外图像与可见光图像进行融合需采用人为制定的特征融合规则，导致融合结果容易引入人为视觉伪影，破坏图像信息的客观真实性，影响融合图像的效果的问题。The purpose of the embodiments of the present application is to provide a method for training an image fusion model, which can solve the problem that the fusion of infrared images and visible light images in the prior art requires the use of artificially formulated feature fusion rules, resulting in the fusion results easily introducing artificial visual artifacts and damaging The objective authenticity of image information affects the effect of fused images.

为了解决上述技术问题，本申请是这样实现的：In order to solve the above technical problems, this application is implemented as follows:

第一方面，本申请实施例提供了一种训练图像融合模型的方法，该方法包括：In a first aspect, an embodiment of the present application provides a method for training an image fusion model, the method comprising:

获取训练数据集，所述训练数据集中包含图像对，所述图像对包含同一场景下的红外图像和可见光图像；Obtaining a training data set, the training data set includes an image pair, and the image pair includes an infrared image and a visible light image in the same scene;

将所述图像对输入初始的图像融合模型，所述图像融合模型为包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的深度神经网络模型；Inputting the image pair into an initial image fusion model, where the image fusion model is a deep neural network model comprising a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network;

经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像；Through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network, the fusion image of the image pair is obtained;

根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型，所述预设损失函数用于计算结构损失值和内容损失值。Calculate the difference value between the fused image and the image pair according to the preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than the preset threshold, and the training is completed The image fusion model of , the preset loss function is used to calculate the structure loss value and the content loss value.

第二方面，本申请实施例提供了一种训练图像融合模型的装置，该装置包括：In a second aspect, an embodiment of the present application provides an apparatus for training an image fusion model, the apparatus comprising:

数据获取模块，用于获取训练数据集，所述训练数据集中包含图像对，所述图像对包含同一场景下的红外图像和可见光图像；a data acquisition module, used for acquiring a training data set, the training data set includes image pairs, and the image pairs include infrared images and visible light images in the same scene;

数据输入模块，用于将所述图像对输入初始的图像融合模型，所述图像融合模型为包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的深度神经网络模型；A data input module for inputting the image pair into an initial image fusion model, where the image fusion model is a deep neural network model including a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network ;

数据处理模块，用于经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像；a data processing module, configured to obtain a fusion image of the image pair through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network;

参数调整模块，用于根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型，所述预设损失函数用于计算结构损失值和内容损失值。A parameter adjustment module, configured to calculate the difference value between the fusion image and the image pair according to a preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than the predetermined value. A threshold is set to obtain a trained image fusion model, and the preset loss function is used to calculate the structure loss value and the content loss value.

第三方面，本申请实施例提供了一种电子设备，该电子设备包括处理器、存储器及存储在所述存储器上并可在所述处理器上运行的程序或指令，所述程序或指令被所述处理器执行时实现如第一方面所述的方法的步骤。In a third aspect, embodiments of the present application provide an electronic device, the electronic device includes a processor, a memory, and a program or instruction stored on the memory and executable on the processor, the program or instruction being The processor implements the steps of the method according to the first aspect when executed.

第四方面，本申请实施例提供了一种可读存储介质，所述可读存储介质上存储程序或指令，所述程序或指令被处理器执行时实现如第一方面所述的方法的步骤。In a fourth aspect, an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented .

第五方面，本申请实施例提供了一种芯片，所述芯片包括处理器和通信接口，所述通信接口和所述处理器耦合，所述处理器用于运行程序或指令，实现如第一方面所述的方法。In a fifth aspect, an embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction, and implement the first aspect the method described.

在本申请实施例中，基于神经网络模型实现红外图像与可见光图像的融合，神经网络模型可以通过模拟人眼神经元结构，分解并提取更丰富且类型适宜的图像特征，可以提高特征提取的准确性。In the embodiments of the present application, the fusion of infrared images and visible light images is realized based on a neural network model. The neural network model can decompose and extract more abundant and suitable image features by simulating the structure of human eyes, which can improve the accuracy of feature extraction. sex.

此外，本申请通过设置预设损失函数迫使神经网络去拟合最为合适的图像特征融合规则和策略，以不添加人为融合规则和策略的前提下提升融合性能，高效融合图像信息。所述预设损失函数可用于计算结构损失值和内容损失值，在尽可能将红外图像和可见光图像的结构信息保留在融合图像中的基础上，使得融合图像可以体现红外图像和可见光图像中的细致内容信息。In addition, the present application forces the neural network to fit the most suitable image feature fusion rules and strategies by setting a preset loss function, so as to improve the fusion performance and efficiently fuse image information without adding artificial fusion rules and strategies. The preset loss function can be used to calculate the structure loss value and the content loss value. On the basis of keeping the structure information of the infrared image and the visible light image in the fusion image as much as possible, the fusion image can reflect the infrared image and the visible light image. Detailed content information.

由此，使用本申请训练得到的图像融合模型对红外图像和可见光图像进行融合，可以避免融合图像引入人为视觉伪影，提高融合图像的客观真实性，并且使得融合图像可以更好的保留结构特征和内容特征，进一步提高融合图像的效果。Therefore, using the image fusion model trained in the present application to fuse infrared images and visible light images can avoid the introduction of artificial visual artifacts into the fusion image, improve the objective authenticity of the fusion image, and enable the fusion image to better retain structural features. and content features to further improve the effect of fused images.

再者，训练完成的图像融合模型为端对端模型，在得到训练完成的图像融合模型之后，将待融合的图像对输入所述训练完成的图像融合模型，即可通过所述训练完成的图像融合模型输出融合图像，可以提高图像融合操作的便捷性。Furthermore, the trained image fusion model is an end-to-end model. After the trained image fusion model is obtained, the image pair to be fused is input into the trained image fusion model, and the trained image can be passed through. The fusion model outputs fusion images, which can improve the convenience of image fusion operations.

附图说明Description of drawings

图1是本申请的一种训练图像融合模型的方法实施例的步骤流程图；1 is a flow chart of steps of a method embodiment of a training image fusion model of the present application;

图2是某一场景下的红外图像和可见光图像的示意图；2 is a schematic diagram of an infrared image and a visible light image under a certain scene;

图3是本申请的一种包括至少两层聚集残余致密块的网络结构示意图；3 is a schematic diagram of a network structure comprising at least two layers of aggregated residual dense blocks according to the present application;

图4是本申请的一种图像融合模型的模型结构示意图；Fig. 4 is the model structure schematic diagram of a kind of image fusion model of the present application;

图5是本申请的一种图像对经过图像融合模型各网络模块处理的流程示意图；5 is a schematic flow chart of an image pair processed by each network module of an image fusion model according to the present application;

图6是本申请的一种像素级损失函数的计算流程示意图；6 is a schematic diagram of a calculation flow of a pixel-level loss function of the present application;

图7是本申请的一种特征级损失函数的计算流程示意图；7 is a schematic diagram of the calculation flow of a feature-level loss function of the present application;

图8是本申请的一种训练图像融合模型的装置实施例的结构示意图；8 is a schematic structural diagram of an embodiment of an apparatus for training an image fusion model of the present application;

图9是本申请的一种电子设备的结构示意图；9 is a schematic structural diagram of an electronic device of the present application;

图10是实现本申请实施例的一种电子设备的硬件结构示意图。FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

本申请的说明书和权利要求书中的术语“第一”、“第二”等是用于区别类似的对象，而不用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施，且“第一”、“第二”等所区分的对象通常为一类，并不限定对象的个数，例如第一对象可以是一个，也可以是多个。此外，说明书以及权利要求中“和/或”表示所连接对象的至少其中之一，字符“/”，一般表示前后关联对象是一种“或”的关系。The terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein, and distinguish between "first", "second", etc. The objects are usually of one type, and the number of objects is not limited. For example, the first object may be one or more than one. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.

下面结合附图，通过具体的实施例及其应用场景对本申请实施例提供的训练图像融合模型的方法进行详细地说明。The method for training an image fusion model provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

参照图1，示出了本申请的一种训练图像融合模型的方法实施例的步骤流程图，包括如下步骤：Referring to FIG. 1, a flow chart of steps of a method embodiment of a method for training an image fusion model of the present application is shown, including the following steps:

步骤101、获取训练数据集，所述训练数据集中包含图像对，所述图像对包含同一场景下的红外图像和可见光图像；Step 101, obtaining a training data set, the training data set includes an image pair, and the image pair includes an infrared image and a visible light image in the same scene;

步骤102、将所述图像对输入初始的图像融合模型，所述图像融合模型为包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的深度神经网络模型；Step 102, inputting the image pair into an initial image fusion model, where the image fusion model is a deep neural network model comprising a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network;

步骤103、经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像；Step 103, through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network, to obtain a fusion image of the image pair;

步骤104、根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型，所述预设损失函数用于计算结构损失值和内容损失值。Step 104: Calculate the difference value between the fused image and the image pair according to a preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than a preset threshold, The trained image fusion model is obtained, and the preset loss function is used to calculate the structure loss value and the content loss value.

本申请提供了一种训练图像融合模型的方法，该图像融合模型为深度神经网络模型，该深度神经网络模型包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络。本申请基于神经网络模型实现红外图像与可见光图像的融合，神经网络模型可以通过模拟人眼神经元结构，分解并提取更丰富且类型适宜的图像特征，可以提高特征提取的准确性。The present application provides a method for training an image fusion model, where the image fusion model is a deep neural network model, and the deep neural network model includes a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network. The present application realizes the fusion of infrared images and visible light images based on a neural network model. The neural network model can decompose and extract more abundant and suitable image features by simulating the neuron structure of the human eye, which can improve the accuracy of feature extraction.

所述图像融合模型可以为根据大量的训练数据和机器学习方法，对现有的神经网络进行无监督或者有监督训练而得到的。需要说明的是，本申请实施例对所述图像融合模型的模型结构以及训练方法不加以限制。所述图像融合模型可以是融合了多种神经网络的深度神经网络模型。所述神经网络包括但不限于以下的至少一种或者至少两种的组合、叠加、嵌套：CNN(Convolutional Neural Network，卷积神经网络)、LSTM(Long Short-TermMemory，长短时记忆)网络、RNN(Simple Recurrent Neural Network，循环神经网络)、注意力神经网络等。The image fusion model can be obtained by performing unsupervised or supervised training on an existing neural network according to a large amount of training data and machine learning methods. It should be noted that the embodiment of the present application does not limit the model structure and training method of the image fusion model. The image fusion model may be a deep neural network model that fuses multiple neural networks. The neural network includes but is not limited to at least one of the following or a combination, superposition and nesting of at least two: CNN (Convolutional Neural Network, convolutional neural network), LSTM (Long Short-TermMemory, long short-term memory) network, RNN (Simple Recurrent Neural Network, recurrent neural network), attention neural network, etc.

在本申请实施例中，收集的训练数据集由大量的图像对组成，每一个图像对包含同一场景下的红外图像和可见光图像。参照图2，示出了某一场景下的红外图像和可见光图像的示意图。如图2所示，图a和图b分别为某一场景下的红外图像和可见光图像，图a和图b可以组成一个图像对。In the embodiment of the present application, the collected training data set is composed of a large number of image pairs, and each image pair includes an infrared image and a visible light image in the same scene. Referring to FIG. 2 , a schematic diagram of an infrared image and a visible light image in a certain scene is shown. As shown in Figure 2, Figures a and b are infrared images and visible light images in a certain scene, respectively, and Figures a and b can form an image pair.

将训练数据集中的各图像对依次输入初始的图像融合模型，经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，可以得到图像对的融合图像。Input each image pair in the training data set into the initial image fusion model in turn, and through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network, the fusion image of the image pair can be obtained. .

其中，浅层特征提取网络用于提取图像浅层的基础特征图谱。深层特征提取网络用于对浅层特征提取网络提取的基础浅层特征图谱进一步提取多层次的深度特征信息。全局特征融合网络用于进一步整合提取的图像特征信息，使得融合网络所提取的全局特征图谱得以整合，深化网络多尺度特征图谱的信息量。特征重构网络用于对融合后的图像特征进行重构，得到图像对的融合图像。Among them, the shallow feature extraction network is used to extract the basic feature map of the shallow layer of the image. The deep feature extraction network is used to further extract multi-level deep feature information from the basic shallow feature map extracted by the shallow feature extraction network. The global feature fusion network is used to further integrate the extracted image feature information, so that the global feature maps extracted by the fusion network can be integrated, and the information content of the multi-scale feature maps of the network can be deepened. The feature reconstruction network is used to reconstruct the fused image features to obtain a fused image of the image pair.

最后，根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型。Finally, the difference value between the fused image and the image pair is calculated according to the preset loss function, and the network parameters of the image fusion model are updated according to the calculated difference value, until the calculated difference value is less than the preset threshold value, obtaining The trained image fusion model.

其中，结构损失值用于表示两幅图像的结构相似性，本申请利用结构损失值旨在将红外图像和可见光图像中的结构信息更好地保留在融合图像中。内容损失值用于表示两幅图像的内容相似性，本申请利用内容损失值旨在有效地计算图像对之间的像素灰度信息，尽可能地减少误差，使得融合图像能够保存红外图像和可见光图像中的细致内容信息。Among them, the structural loss value is used to represent the structural similarity of the two images, and this application uses the structural loss value to better preserve the structural information in the infrared image and the visible light image in the fusion image. The content loss value is used to represent the content similarity of two images. The purpose of this application is to use the content loss value to effectively calculate the pixel grayscale information between image pairs, reduce errors as much as possible, and enable the fusion image to save infrared images and visible light. Detailed content information in the image.

本申请通过设置预设损失函数迫使神经网络去拟合最为合适的图像特征融合规则和策略，以不添加人为融合规则和策略的前提下提升融合性能，高效融合图像信息。所述预设损失函数可用于计算结构损失值和内容损失值，在尽可能将红外图像和可见光图像的结构信息保留在融合图像中的基础上，使得融合图像可以体现红外图像和可见光图像中的细致内容信息。The present application forces the neural network to fit the most suitable image feature fusion rules and strategies by setting a preset loss function, so as to improve the fusion performance and efficiently fuse image information without adding artificial fusion rules and strategies. The preset loss function can be used to calculate the structure loss value and the content loss value. On the basis of keeping the structure information of the infrared image and the visible light image in the fusion image as much as possible, the fusion image can reflect the infrared image and the visible light image. Detailed content information.

此外，训练完成的图像融合模型为端对端模型，在得到训练完成的图像融合模型之后，将待融合的图像对输入所述训练完成的图像融合模型，即可通过所述训练完成的图像融合模型输出融合图像，可以提高图像融合操作的便捷性。In addition, the trained image fusion model is an end-to-end model. After the trained image fusion model is obtained, the image pair to be fused is input into the trained image fusion model, and then the trained image fusion model can be used. The model outputs fused images, which can improve the convenience of image fusion operations.

在本申请的一种可选实施例中，步骤103所述经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像，包括：In an optional embodiment of the present application, in step 103, the fusion of the image pair is obtained by sequentially processing the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network. images, including:

步骤S11、对所述图像对进行通道级联处理，得到级联图像；Step S11, performing channel concatenation processing on the image pair to obtain concatenated images;

步骤S12、将所述级联图像输入所述浅层特征提取网络以提取浅层特征图谱；Step S12, inputting the cascaded images into the shallow feature extraction network to extract a shallow feature map;

步骤S13、将所述浅层特征图谱输入所述深层特征提取网络以提取深度特征信息；Step S13, inputting the shallow feature map into the deep feature extraction network to extract deep feature information;

步骤S14、将所述浅层特征图谱和所述深度特征信息输入所述全局特征融合网络以对所述浅层特征图谱和所述深度特征信息进行整合，得到整合后的图像；Step S14, inputting the shallow feature map and the depth feature information into the global feature fusion network to integrate the shallow feature map and the depth feature information to obtain an integrated image;

步骤S15、将所述整合后的图像输入所述特征重构网络以对所述整合后的图像进行特征重构，得到融合图像。Step S15: Input the integrated image into the feature reconstruction network to perform feature reconstruction on the integrated image to obtain a fusion image.

需要说明的是，在将图像对输入初始的图像融合模型之前，可以对图像对进行精准对齐操作，然后再将对齐后的图像对输入初始的图像融合模型，以进一步提高融合效果。It should be noted that, before inputting the image pair into the initial image fusion model, the image pair can be precisely aligned, and then the aligned image pair can be input into the initial image fusion model to further improve the fusion effect.

在将图像对输入初始的图像融合模型之后，首先通过通道级联操作将输入的红外图像与可见光图像连接起来，得到级联图像，该级联图像可以理解为红外图像与可见光图像的通道级联图像容器I_v,r。在本申请实施例中，红外图像用I_r表示，可见光图像用I_v表示，I_r和I_v二者的通道级联数学表达式1-1如下所示：After inputting the image pair into the initial image fusion model, the input infrared image and the visible light image are first connected through the channel cascade operation to obtain a cascaded image, which can be understood as the channel cascade of the infrared image and the visible light image. Image container I _v,r . In the embodiment of the present application, the infrared image is represented by I _r , the visible light image is represented by I _v , and the channel cascade mathematical expression 1-1 of both I _r and I _v is as follows:

I_v,r＝Concat(I_v,I_r) (1-1)I _v,r =Concat(I _v ,I _r ) (1-1)

其中，Concat为图像通道级联函数。然后将级联图像I_v,r输入到浅层特征提取网络(本申请将浅层特征提取网络记为BFEnet)中，进行基础浅层特征提取，得到级联图像I_v,r的浅层特征图谱。Among them, Concat is the image channel cascade function. Then, the cascaded images I _v,r are input into the shallow feature extraction network (this application will denote the shallow feature extraction network as BFEnet), and basic shallow feature extraction is performed to obtain the shallow features of the cascaded images I _v,r Atlas.

进一步的，所述浅层特征提取网络可以为卷积神经网络。一个示例中，所述浅层特征提取网络可以包含两个卷积层，每个卷积层所使用的卷积核尺寸为3×3像素，以此从级联得到的I_v,r中提取图像基础浅层特征图谱，卷积层的操作如公式1-2表示：Further, the shallow feature extraction network may be a convolutional neural network. In an example, the shallow feature extraction network may include two convolutional layers, and the size of the convolution kernel used in each convolutional layer is 3×3 pixels, so as to extract from the concatenated I _v,r The basic shallow feature map of the image, the operation of the convolution layer is expressed as formula 1-2:

其中，

为BFEnet模块中第i个卷积层提取的浅层特征图谱。in,

Shallow feature map extracted for the ith convolutional layer in the BFEnet module.

可以理解的是，上述浅层特征提取网络包含两个卷积层，以及每个卷积层所使用的卷积核尺寸为3×3像素，仅作为本申请实施例的一种应用示例。本申请实施例对所述浅层特征提取网络包含的卷积层的个数，以及每个卷积层所使用的卷积核尺寸不做限制。It can be understood that the above shallow feature extraction network includes two convolution layers, and the size of the convolution kernel used in each convolution layer is 3×3 pixels, which is only an application example of the embodiment of the present application. The embodiments of the present application do not limit the number of convolutional layers included in the shallow feature extraction network and the size of the convolution kernel used by each convolutional layer.

随后将所述浅层特征图谱传入深层特征提取网络(本申请将深层特征提取网络记为RXDBs)，以进一步提取BFEnet中的多层次深度特征信息。深层特征提取网络的模块结构对红外图像与可见光图像进行针对性地改善，进一步提升模块内部的特征流通，使得模块内部的浅层及深层次特征均成为模块输出的组成部分，使得整个网络模块具有更佳的图像特征提取能力。The shallow feature map is then passed to a deep feature extraction network (denoted as RXDBs in this application) to further extract multi-level deep feature information in BFEnet. The module structure of the deep feature extraction network improves infrared images and visible light images in a targeted manner, and further improves the feature circulation inside the module, so that the shallow and deep features inside the module become part of the output of the module, so that the entire network module has Better image feature extraction capability.

接下来，将提取的特征图输入全局特征融合网络(本申请将全局特征融合网络记为GFF)进行全局特征融合，进一步整合提取的图像特征信息。Next, the extracted feature map is input into a global feature fusion network (this application will denote the global feature fusion network as GFF) to perform global feature fusion, and further integrate the extracted image feature information.

在本申请的一种可选实施例中，所述深层特征提取网络包括至少两层用于提取深度特征信息的聚集残余致密块，所述将所述浅层特征图谱和所述深度特征信息输入所述全局特征融合网络以对所述浅层特征图谱和所述深度特征信息进行整合，得到整合后的图像，包括：In an optional embodiment of the present application, the deep feature extraction network includes at least two layers of aggregated residual dense blocks for extracting deep feature information, and the shallow feature map and the deep feature information are input The global feature fusion network integrates the shallow feature map and the depth feature information to obtain an integrated image, including:

步骤S21、对每一层聚集残余致密块提取的深度特征信息进行特征融合，得到全局特征；Step S21, performing feature fusion on the depth feature information extracted from each layer of aggregated residual dense blocks to obtain global features;

步骤S22、对所述全局特征以及所述浅层特征提取网络提取的浅层特征图谱进行特征融合，得到整合后的图像。Step S22: Perform feature fusion on the global feature and the shallow feature map extracted by the shallow feature extraction network to obtain an integrated image.

在本申请实施例中，所述深层特征提取网络RXDBs包括至少两层用于提取深度特征信息的聚集残余致密块RXDB，所述全局特征融合网络GFF包括稠密特征融合模块(本申请将稠密特征融合模块记为DFF)和全局残余学习模块(本申请将全局残余学习模块记为GRL)。本申请实施例将每一层聚集残余致密块提取的深度特征信息输入所述稠密特征融合模块DFF进行特征融合，得到全局特征；以及将所述全局特征以及所述浅层特征提取网络提取的浅层特征图谱输入所述全局残余学习模块GRL进行特征融合，得到整合后的图像。In the embodiment of the present application, the deep feature extraction network RXDBs includes at least two layers of aggregated residual dense blocks RXDB for extracting deep feature information, and the global feature fusion network GFF includes a dense feature fusion module (this application fuses dense features The module is denoted as DFF) and the global residual learning module (the global residual learning module is denoted as GRL in this application). In the embodiment of the present application, the depth feature information extracted from each layer of aggregated residual dense blocks is input into the dense feature fusion module DFF for feature fusion to obtain global features; and the global features and the shallow feature extraction network extracted by the shallow feature extraction network are The layer feature map is input into the global residual learning module GRL for feature fusion to obtain an integrated image.

参照图3，示出了本申请的一种包括至少两层聚集残余致密块的网络结构示意图，该结构计算过程由公式1-3表示：Referring to FIG. 3, a schematic diagram of a network structure including at least two layers of aggregated residual dense blocks according to the present application is shown, and the structure calculation process is represented by formulas 1-3:

其中，

为第i个聚集残余致密块提取的图像特征。稠密特征融合模块(DFF)通过整合所有聚集残余致密块的特征得到全局特征。进一步地，为了避免模型过拟合并且提高计算效率，本申请在DFF模块内部进行特征图谱降维，具体计算如公式1-4所示：in,

Image features extracted for the ith aggregated residual dense block. The dense feature fusion module (DFF) obtains global features by integrating the features of all aggregated residual dense blocks. Further, in order to avoid model overfitting and improve computational efficiency, the present application performs feature map dimension reduction inside the DFF module, and the specific calculation is shown in formulas 1-4:

其中，

表示由6层聚集残余致密块提取的特征图谱，

为GFF模块中DFF得到的图像特征。全局残余学习模块GRL充分利用前面所有特征，其中包含BFENet首个卷积层所获得的基础浅层特征图谱，使得融合网络所提取的全局特征图谱得以整合，深化网络多尺度特征图谱的信息量，得到整合后的图像。全局残余学习模块GRL计算如公式1-5所示：in,

represents the feature map extracted from the 6-layer aggregated residual dense block,

It is the image feature obtained by DFF in the GFF module. The global residual learning module GRL makes full use of all the previous features, including the basic shallow feature map obtained by the first convolutional layer of BFENet, so that the global feature map extracted by the fusion network can be integrated, and the information content of the multi-scale feature map of the network can be deepened. Get the integrated image. The global residual learning module GRL is calculated as shown in Equation 1-5:

其中，

为GFF模块中GRL得到的图像特征。in,

It is the image feature obtained by GRL in the GFF module.

需要说明的是，所述深层特征提取网络包含6层聚集残余致密块，仅作为本申请实施例的一种应用示例，本申请实施例对所述深层特征提取网络包含的聚集残余致密块的数量不加以限制。It should be noted that the deep feature extraction network includes 6 layers of aggregated residual dense blocks, which is only an application example of this embodiment of the present application. Unrestricted.

最后，将所述整合后的图像输入特征重构网络(本申请将特征重构网络记为Rbnet)，以对所述整合后的图像进行特征重构，得到融合图像。Finally, the integrated image is input into a feature reconstruction network (referred to as Rbnet in this application) to perform feature reconstruction on the integrated image to obtain a fusion image.

所述特征重构网络可以为卷积神经网络。一个示例中，所述特征重构网络包含3个卷积层，前两个卷积层的卷积核尺寸为3×3像素，最后一个卷积层采用尺寸为1×1像素的卷积核，以便对融合图像的局部特征重构。特征重构网络计算公式如1-6所示：The feature reconstruction network may be a convolutional neural network. In an example, the feature reconstruction network includes three convolutional layers, the size of the convolution kernel of the first two convolutional layers is 3×3 pixels, and the size of the convolutional kernel of the last convolutional layer is 1×1 pixel. , in order to reconstruct the local features of the fused image. The calculation formula of the feature reconstruction network is shown in 1-6:

其中，I_f表示融合图像。Among them, If represents the _fused image.

可以理解的是，上述特征重构网络包含3个卷积层，以及前两层卷积核尺寸为3×3像素，最后一层卷积核尺寸为1×1像素，仅作为本申请实施例的一种应用示例。本申请实施例对所述特征重构网络包含的卷积层的个数，以及每个卷积层所使用的卷积核尺寸不做限制。It can be understood that the above feature reconstruction network includes 3 convolution layers, the size of the convolution kernel of the first two layers is 3 × 3 pixels, and the size of the convolution kernel of the last layer is 1 × 1 pixels, which is only an embodiment of this application. An example of application. The embodiments of the present application do not limit the number of convolutional layers included in the feature reconstruction network and the size of the convolutional kernel used by each convolutional layer.

参照图4，示出了本申请的一种图像融合模型的模型结构示意图。以及参照图5，示出了本申请的一种图像对经过图像融合模型各网络模块处理的流程示意图。Referring to FIG. 4 , a schematic structural diagram of an image fusion model of the present application is shown. And referring to FIG. 5 , a schematic flow chart of an image pair processed by each network module of the image fusion model according to the present application is shown.

本申请的图像融合模型对图像对依次经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的处理，得到所述图像对的融合图像，在此过程中，图像融合模型的中间层可以对图像特征进行复用，以增加特征的复用性和流通性，在提升对图像特征的高效融合的同时，可以进一步减小网络模型的尺寸和计算复杂度。The image fusion model of the present application sequentially processes the image pairs through the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network to obtain a fusion image of the image pair. In this process, The middle layer of the image fusion model can reuse image features to increase the reusability and circulation of features. While improving the efficient fusion of image features, it can further reduce the size and computational complexity of the network model.

在本申请的一种可选实施例中，所述预设损失函数包含结构损失函数和内容损失函数，所述根据预设损失函数计算所述融合图像与所述图像对之间的差异值，包括：In an optional embodiment of the present application, the preset loss function includes a structure loss function and a content loss function, and the difference value between the fused image and the image pair is calculated according to the preset loss function, include:

步骤S31、通过所述结构损失函数对第一结构损失和第二结构损失进行加权平均计算，得到所述融合图像的结构损失值，所述第一结构损失为所述红外图像与所述融合图像之间的结构损失，所述第二结构损失为所述可见光图像与所述融合图像之间的结构损失；Step S31: Perform a weighted average calculation on the first structure loss and the second structure loss through the structure loss function to obtain a structure loss value of the fusion image, where the first structure loss is the infrared image and the fusion image The structure loss between the visible light image and the fusion image is the second structure loss;

步骤S32、通过所述内容损失函数对第一内容损失和第二内容损失进行加权平均计算，得到所述融合图像的内容损失值，所述第一内容损失为所述红外图像与所述融合图像之间的内容损失，所述第二内容损失为所述可见光图像与所述融合图像之间的内容损失；Step S32: Perform a weighted average calculation on the first content loss and the second content loss through the content loss function to obtain a content loss value of the fused image, where the first content loss is the infrared image and the fused image The content loss between the two, the second content loss is the content loss between the visible light image and the fusion image;

步骤S33、对所述融合图像的结构损失值和内容损失值进行加权计算，得到所述融合图像与所述图像对之间的差异值。Step S33: Perform weighted calculation on the structure loss value and the content loss value of the fused image to obtain a difference value between the fused image and the image pair.

需要说明的是，本申请实施例对步骤S31和步骤S32的先后顺序不做限制。It should be noted that, this embodiment of the present application does not limit the sequence of step S31 and step S32.

在本申请实施例中，所述预设损失函数用于计算结构损失值和内容损失值。结构损失值用于将红外图像和可见光图像中的结构信息更好地保留在融合图像中。内容损失值用于在融合图像中保存红外图像和可见光图像中的细致内容信息。In the embodiment of the present application, the preset loss function is used to calculate the structure loss value and the content loss value. The structure loss value is used to better preserve the structural information in the infrared and visible images in the fused image. The content loss value is used to preserve detailed content information in the infrared and visible images in the fused image.

高效的网络损失函数能够快速地减小神经网络所得结果与预期真实目标间的差距。为此本申请针对红外图像与可见光图像融合任务的特征性，设计预设损失函数如公式1-7所示：An efficient network loss function can quickly reduce the gap between the results obtained by the neural network and the expected real target. For this reason, in view of the characteristics of the infrared image and visible light image fusion task, this application designs a preset loss function as shown in formula 1-7:

L＝αL_ssim+βL_content (1-7)L=αL _ssim +βL _content (1-7)

其中，α和β为可训练参数，L_ssim为结构损失函数，用于计算结构损失值，L_content为内容损失函数，用于计算内容损失值。L_ssim和L_content分别如公式1-8和1-9所示：Among them, α and β are trainable parameters, L _ssim is the structure loss function, which is used to calculate the structure loss value, and L _content is the content loss function, which is used to calculate the content loss value. L _ssim and L _content are shown in equations 1-8 and 1-9, respectively:

L_ssim＝1-SSIM(A,B) (1-8)L _ssim =1-SSIM(A,B) (1-8)

L_content＝||A-B||₂ (1-9)L _content =||AB|| ₂ (1-9)

SSIM(A,B)表示图像A和图像B之间的结构相似性(structural similarity,SSIM)。||·||₂表示l₂范数运算，用于计算图像A与图像B之间的欧氏距离。SSIM(A,B) represents the structural similarity (SSIM) between image A and image B. ||·|| ₂ represents the l ₂ norm operation, which is used to calculate the Euclidean distance between image A and image B.

在实际应用中，物体亮度信息与照度和反射系数有关，且场景中的物体的结构与照度相互独立，反射系数与物体目标自身有关。因此SSIM定义可以通过分离照度对物体的影响来反应一张图像中的结构信息，其中亮度对比常以两幅图像的平均灰度来作为估量指标，其中，图像的平均灰度和标准差分别如公式1-10和1-11所示。In practical applications, the brightness information of an object is related to the illuminance and reflection coefficient, and the structure of the object in the scene and the illuminance are independent of each other, and the reflection coefficient is related to the object target itself. Therefore, the SSIM definition can reflect the structural information in an image by separating the influence of illumination on the object. The brightness contrast is often measured by the average gray level of the two images. The average gray level and standard deviation of the images are as follows Equations 1-10 and 1-11 are shown.

上式中x_i代表图像对应位置的灰度值，N表示图像包含像素的数量，μ_x为计算所得图像的平均灰度值，σ_x为计算所得的图像标准差。In the above formula, x _i represents the gray value of the corresponding position of the image, N represents the number of pixels included in the image, μ _x is the average gray value of the calculated image, and σ _x is the calculated image standard deviation.

由此，本申请在保留丰富纹理细节的背景信息的同时，平滑地引入红外图像中的场景高亮目标，可以生成更加平滑自然的融合图像。Therefore, the present application can smoothly introduce the scene highlight target in the infrared image while retaining the background information with rich texture details, and can generate a smoother and more natural fusion image.

本申请实施例通过结构损失函数对红外图像与所述融合图像之间的结构损失、以及可见光图像与所述融合图像之间的结构损失进行加权平均计算，得到所述融合图像的结构损失值。In this embodiment of the present application, the structure loss between the infrared image and the fused image, and the structural loss between the visible light image and the fused image are weighted and averaged by the structure loss function, and the structure loss value of the fused image is obtained.

通过内容损失函数对红外图像与所述融合图像之间的内容损失、以及可见光图像与所述融合图像之间的内容损失进行加权平均计算，得到所述融合图像的内容损失值。The content loss between the infrared image and the fused image, and the content loss between the visible light image and the fused image are weighted and averaged by a content loss function to obtain a content loss value of the fused image.

对所述融合图像的结构损失值和内容损失值进行加权计算，即可得到所述融合图像与所述图像对之间的差异值，根据该差异值对图像融合模型的网络参数进行迭代更新，直到计算的差异值小于预设阈值，即可得到训练完成的图像融合模型。The weighted calculation is performed on the structure loss value and the content loss value of the fused image to obtain the difference value between the fused image and the image pair, and the network parameters of the image fusion model are iteratively updated according to the difference value, Until the calculated difference value is smaller than the preset threshold, the trained image fusion model can be obtained.

进一步地，本申请实施例针对预设损失函数设计两种不同的计算策略，分别为像素级损失函数和特征级损失函数。下面分别进行说明。Further, the embodiment of the present application designs two different calculation strategies for the preset loss function, which are a pixel-level loss function and a feature-level loss function, respectively. Each of them will be described below.

在本申请的一种可选实施例中，所述通过所述结构损失函数对第一结构损失和第二结构损失进行加权平均计算，得到所述融合图像的结构损失值，包括：In an optional embodiment of the present application, the weighted average calculation is performed on the first structure loss and the second structure loss by using the structure loss function to obtain the structure loss value of the fused image, including:

通过所述结构损失函数对像素级的第一结构损失和第二结构损失进行加权平均计算，得到像素级的结构损失值；Perform a weighted average calculation on the pixel-level first structure loss and the second structure loss through the structure loss function to obtain a pixel-level structure loss value;

所述通过所述内容损失函数对第一内容损失和第二内容损失进行加权平均计算，得到所述融合图像的内容损失值，包括：The weighted average calculation is performed on the first content loss and the second content loss through the content loss function to obtain the content loss value of the fused image, including:

通过所述内容损失函数对像素级的第一内容损失和第二内容损失进行加权平均计算，得到像素级的内容损失值；Perform a weighted average calculation on the pixel-level first content loss and the second content loss through the content loss function to obtain a pixel-level content loss value;

所述对所述融合图像的结构损失值和内容损失值进行加权计算，得到所述融合图像与所述图像对之间的差异值，包括：The weighted calculation is performed on the structure loss value and the content loss value of the fused image to obtain the difference value between the fused image and the image pair, including:

对所述像素级的结构损失值和所述像素级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间像素级的差异值。A weighted calculation is performed on the pixel-level structure loss value and the pixel-level content loss value to obtain a pixel-level difference value between the fused image and the image pair.

像素级损失函数L_Pixel-wise的计算策略，是直接作用于输入图像(红外图像和可见光图像)和输出图像(融合图像)的L_ssim和L_content的加权组合。计算公式如1-12所示：The calculation strategy of pixel-wise loss function L _Pixel-wise is a weighted combination of L _ssim and L _content directly acting on the input image (infrared image and visible light image) and output image (fused image). The calculation formula is shown in 1-12:

L_Pixel-wise＝αL_ssim+βL_content (1-12)L _Pixel-wise = αL _ssim + βL _content (1-12)

1-12中的L_ssim用于计算输入图像与输出图像的像素级的结构相似度，也即计算融合图像的像素级的结构损失值，该结构损失值由像素级的第一结构损失L_ssim-r和像素级的第二结构损失L_ssim-v加权平均所得到。其中，像素级的第一结构损失L_ssim-r计算如公式1-13所示，像素级的第二结构损失L_ssim-v计算如公式1-14所示，融合图像的像素级的结构损失如公式1-15所示：The L _ssim in 1-12 is used to calculate the pixel-level structural similarity between the input image and the output image, that is, to calculate the pixel-level structural loss value of the fused image, which is determined by the pixel-level first structural loss L _ssim . _-r and pixel-level second structural loss L _ssim-v weighted average. Among them, the first pixel-level structural loss L _ssim-r is calculated as shown in Equation 1-13, and the pixel-level second structural loss L _ssim-v is calculated as shown in Equation 1-14. The pixel-level structural loss of the fused image As shown in Equation 1-15:

L_ssim-r＝1-SSIM(I_r,I_f) (1-13)L _ssim-r =1-SSIM(I _r ,I _f ) (1-13)

L_ssim-v＝1-SSIM(I_v,I_f) (1-14)L _ssim-v =1-SSIM(I _v ,I _f ) (1-14)

L_ssim＝λL_ssim-r+(1-λ)L_ssim-v (1-15)L _ssim =λL _ssim-r +(1-λ)L _ssim-v (1-15)

类似地，1-12中的L_content用于计算输入图像与输出图像的像素级的内容相似度，也即计算融合图像的像素级的内容损失值，该内容损失值由像素级的第一内容损失L_content-r和像素级的第二内容损失L_content-v加权平均所得到。其中，像素级的第一内容损失L_content-r计算如公式1-16所示，像素级的第二内容损失L_content-v计算如公式1-17所示，融合图像的像素级的内容损失如公式1-18所示：Similarly, L _content in 1-12 is used to calculate the pixel-level content similarity between the input image and the output image, that is, to calculate the pixel-level content loss value of the fused image, which is determined by the pixel-level first content. A weighted average of the loss L _content-r and the pixel-level second content loss L _content-v . The calculation of the pixel-level first content loss L _content-r is shown in formula 1-16, the calculation of the pixel-level second content loss L _content-v is shown in formula 1-17, and the pixel-level content loss of the fused image As shown in Equation 1-18:

L_content-r＝||I_f-I_r||₂ (1-16)L _content-r =||I _f -I _r || ₂ (1-16)

L_content-v＝||I_f-I_v||₂ (1-17)L _content-v =||I _f -I _v || ₂ (1-17)

L_content＝δL_content-r+(1-δ)L_content-v (1-18)L _content =δL _content-r +(1-δ)L _content-v (1-18)

参照图6，示出了本申请的一种像素级损失函数的计算流程示意图。Referring to FIG. 6 , a schematic diagram of a calculation flow of a pixel-level loss function of the present application is shown.

像素级计算策略提供了网络损失函数的较为粗略的估计，为获得更为详尽的损失信息，本申请在网络训练中优选采用特征级损失函数计算策略。The pixel-level calculation strategy provides a relatively rough estimation of the network loss function. In order to obtain more detailed loss information, the present application preferably adopts the feature-level loss function calculation strategy in network training.

在本申请的一种可选实施例中，所述根据预设损失函数计算所述融合图像与所述图像对之间的差异值之前，还包括：In an optional embodiment of the present application, before calculating the difference value between the fused image and the image pair according to a preset loss function, the method further includes:

步骤S41、利用预设神经网络分别对所述红外图像、可见光图像、以及融合图像提取深度特征图谱；Step S41, using a preset neural network to extract a depth feature map from the infrared image, the visible light image, and the fused image, respectively;

步骤S42、基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一结构损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二结构损失；Step S42, based on the depth feature map of the infrared image and the depth feature map of the fused image, calculate the first structural loss of the feature level, and based on the depth feature map of the visible light image and the depth feature map of the fused image , calculate the feature-level second structure loss;

步骤S43、基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一内容损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二内容损失；Step S43, based on the depth feature map of the infrared image and the depth feature map of the fused image, calculate the first content loss of the feature level, and based on the depth feature map of the visible light image and the depth feature map of the fused image , calculate the feature-level second content loss;

所述通过所述结构损失函数对第一结构损失和第二结构损失进行加权平均计算，得到所述融合图像的结构损失值，包括：The weighted average calculation of the first structure loss and the second structure loss through the structure loss function to obtain the structure loss value of the fused image, including:

通过所述结构损失函数对所述特征级的第一结构损失和所述特征级的第二结构损失进行加权平均计算，得到特征级的结构损失值；A weighted average calculation is performed on the first structure loss of the feature level and the second structure loss of the feature level through the structure loss function to obtain a structure loss value of the feature level;

通过所述内容损失函数对所述特征级的第一内容损失和所述特征级的第二内容损失进行加权平均计算，得到特征级的内容损失值；Perform a weighted average calculation on the feature-level first content loss and the feature-level second content loss by using the content loss function to obtain a feature-level content loss value;

对所述特征级的结构损失值和所述特征级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间特征级的差异值。A weighted calculation is performed on the feature-level structure loss value and the feature-level content loss value to obtain a feature-level difference value between the fused image and the image pair.

特征级计算策略利用预设神经网络对需要计算损失的图像对进行特征提取，得到对应的两组深度特征图谱，从全新角度评估损失函数，即在计算得到的特征图谱中进行损失函数计算。The feature-level computing strategy uses a preset neural network to perform feature extraction on the image pairs that need to calculate the loss, and obtains two sets of corresponding deep feature maps, and evaluates the loss function from a new perspective, that is, the loss function is calculated in the calculated feature map.

所述预设神经网络用于提取图像的深度特征图谱，所述预设网络可以为预先训练好的神经网络。一个示例中，所述预设网络可以为VGG19网络。VGG网络是Oxford的VisualGeometry Group的组提出的，VGG网络有两种结构，分别是VGG16和VGG19，两种网络的区别在于深度不同。可以理解，本申请实施例对所述预设神经网络的种类不做限制。The preset neural network is used to extract the depth feature map of the image, and the preset network may be a pre-trained neural network. In an example, the preset network may be a VGG19 network. The VGG network is proposed by the VisualGeometry Group of Oxford. The VGG network has two structures, namely VGG16 and VGG19. The difference between the two networks is the depth. It can be understood that the embodiment of the present application does not limit the types of the preset neural network.

特征级损失函数L_Feature-wise的计算如公式1-19所示：The calculation of the feature-level loss function L _Feature-wise is shown in Equation 1-19:

L_Feature-wise＝αL_ssim+βL_content (1-19)L _Feature-wise = αL _ssim + βL _content (1-19)

1-19中的L_ssim用于计算输入图像与输出图像的特征级的结构相似度，也即计算融合图像的特征级的结构损失值，该结构损失值由特征级的第一结构损失L_ssim-r和特征级的第二结构损失L_ssim-v加权平均所得到。其中，特征级的第一结构损失L_ssim-r计算如公式1-20所示，特征级的第二结构损失L_ssim-v计算如公式1-21所示，融合图像的特征级的结构损失如公式1-22所示：The L _ssim in 1-19 is used to calculate the feature-level structural similarity of the input image and the output image, that is, to calculate the feature-level structural loss value of the fused image, which is determined by the feature-level first structural loss L _{ssim -r} and feature-level second structural loss L _ssim-v weighted average obtained. Among them, the first feature-level structural loss L _ssim-r is calculated as shown in formula 1-20, and the second feature-level structural loss L _ssim-v is calculated as shown in formula 1-21. The feature-level structural loss of the fused image As shown in Equation 1-22:

L_ssim＝λL_ssim-r+(1-λ)L_ssim-v (1-22)L _ssim =λL _ssim-r +(1-λ)L _ssim-v (1-22)

其中，VGG19网络的'ReLU3_2'层的特征图谱标识为m∈{1,2,...,M},M＝256。

分别表示VGG19网络从红外图像、可见光图像和融合图像中提取的深度特征图谱。Among them, the feature map of the 'ReLU3_2' layer of the VGG19 network is identified as m∈{1,2,...,M}, M=256.

denote the deep feature maps extracted by the VGG19 network from infrared images, visible light images, and fused images, respectively.

类似地，1-19中的L_content用于计算输入图像与输出图像的特征级的内容相似度，也即计算融合图像的特征级的内容损失值，该内容损失值由特征级的第一内容损失L_content-r和特征级的第二内容损失L_content-v加权平均所得到。其中，特征级的第一内容损失L_content-r计算如公式1-23所示，特征级的第二内容损失L_content-v计算如公式1-24所示，融合图像的特征级的内容损失如公式1-25所示：Similarly, L _content in 1-19 is used to calculate the feature-level content similarity between the input image and the output image, that is, to calculate the feature-level content loss value of the fused image, which is determined by the feature-level first content. The loss L _content-r and the feature-level second content loss L _content-v are weighted averaged. The calculation of the feature-level first content loss L _content-r is shown in formula 1-23, the calculation of the feature-level second content loss L _content-v is shown in formula 1-24, and the feature-level content loss of the fused image As shown in Equation 1-25:

L_content＝δL_content-r+(1-δ)L_content-v (1-25)L _content =δL _content-r +(1-δ)L _content-v (1-25)

参照图7，示出了本申请的一种特征级损失函数的计算流程示意图。Referring to FIG. 7 , a schematic diagram of the calculation flow of a feature-level loss function of the present application is shown.

在本申请的一个应用示例中，网络训练中网络框架相关模块的卷积层参数设置如表1所示,聚集残余致密块RXDB的内部卷积层参数设置如表2所示。为降低计算成本，本申请实施例可以将每个训练图像块对的像素值进行归一化处理，并将学习率设置为恒定的分段衰减，当网络训练迭代次数达到[5000,10000,14000,18000,20000,24000]时，则将网络训练学习率设置为[0.01,0.007,0.005,0.0025,0.001,0.0001,0.00005]对应数值。In an application example of this application, the parameter settings of the convolution layer of the network framework related modules in network training are shown in Table 1, and the parameter settings of the internal convolution layer of the aggregated residual dense block RXDB are shown in Table 2. In order to reduce the computational cost, the embodiment of the present application can normalize the pixel value of each training image block pair, and set the learning rate to a constant piecewise decay. When the number of network training iterations reaches [5000, 10000, 14000 , 18000, 20000, 24000], set the network training learning rate to the corresponding value of [0.01, 0.007, 0.005, 0.0025, 0.001, 0.0001, 0.00005].

表1Table 1

表2Table 2

可以理解的是，上述参数设置仅作为本申请的一种应用示例，本申请对实际应用中图像融合模型的参数设置不做限制。It can be understood that the above parameter setting is only an application example of the present application, and the present application does not limit the parameter setting of the image fusion model in practical application.

在本申请的一种可选实施例中，所述得到训练完成的图像融合模型之后，还包括：In an optional embodiment of the present application, after obtaining the trained image fusion model, the method further includes:

步骤S51、将待融合的图像对输入所述训练完成的图像融合模型，所述待融合的图像对包含同一场景下的红外图像和可见光图像；Step S51, inputting the image pair to be fused into the trained image fusion model, and the image pair to be fused includes an infrared image and a visible light image in the same scene;

步骤S52、通过所述训练完成的图像融合模型输出融合图像。Step S52, outputting a fused image through the image fusion model completed by the training.

在图像融合模型训练完成之后，即可利用训练完成的图像融合模型进行图像融合处理。在本申请实施例中，图像融合模型为端到端模型，将待融合的同一场景下的红外图像和可见光图像作为图像对输入训练完成的图像融合模型，即可输出融合模型。After the training of the image fusion model is completed, the image fusion process can be performed by using the trained image fusion model. In the embodiment of the present application, the image fusion model is an end-to-end model, and the infrared image and the visible light image in the same scene to be fused are used as the image pair to input the image fusion model after training, and then the fusion model can be output.

进一步地，在将待融合的图像对输入训练完成的图像融合模型之前，可以对图像对进行精准对齐操作，然后再将对齐后的图像对输入训练完成的图像融合模型，以提高融合效果。Further, before inputting the image pairs to be fused into the trained image fusion model, the image pairs can be precisely aligned, and then the aligned image pairs can be input into the trained image fusion model to improve the fusion effect.

综上，本申请提供了一种训练图像融合模型的方法，训练完成的图像融合模型能够有效实现对红外图像与可见光图像的融合，在不引入人为融合规则的前提下，有效提升图像特征分析和提取效果，进一步提升融合特征的丰富性，使得融合图像具有更佳的成像性能，在保留丰富纹理细节的背景信息的同时，平滑地引入红外图像中的场景高亮目标，生成平滑自然的融合图像。同时可以避免引入人为视觉伪影，提供融合图像的客观真实性，提升融合图像的效果。In summary, the present application provides a method for training an image fusion model, and the trained image fusion model can effectively realize the fusion of infrared images and visible light images, and effectively improve image feature analysis and performance without introducing artificial fusion rules. The extraction effect further enhances the richness of fusion features, making the fusion image have better imaging performance. While retaining the background information of rich texture details, it smoothly introduces the scene highlight target in the infrared image to generate a smooth and natural fusion image. . At the same time, the introduction of artificial visual artifacts can be avoided, the objective authenticity of the fused image can be provided, and the effect of the fused image can be improved.

需要说明的是，本申请实施例提供的训练图像融合模型的方法，执行主体可以为训练图像融合模型的装置，或者该训练图像融合模型的装置中的用于执行训练图像融合模型的方法的控制模块。本申请实施例中以训练图像融合模型的装置执行训练图像融合模型的方法为例，说明本申请实施例提供的训练图像融合模型的装置。It should be noted that, in the method for training an image fusion model provided by the embodiments of the present application, the execution subject may be a device for training an image fusion model, or the control of the method for executing a method for training an image fusion model in the device for training an image fusion model. module. In the embodiments of the present application, the apparatus for training an image fusion model provided by the embodiments of the present application is described by taking the method for training an image fusion model performed by an apparatus for training an image fusion model as an example.

参照图8，示出了本申请的一种训练图像融合模型的装置实施例的结构示意图，该装置包括：Referring to FIG. 8 , a schematic structural diagram of an embodiment of an apparatus for training an image fusion model of the present application is shown, and the apparatus includes:

数据获取模块801，用于获取训练数据集，所述训练数据集中包含图像对，所述图像对包含同一场景下的红外图像和可见光图像；A data acquisition module 801, configured to acquire a training data set, where the training data set includes an image pair, and the image pair includes an infrared image and a visible light image in the same scene;

数据输入模块802，用于将所述图像对输入初始的图像融合模型，所述图像融合模型为包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的深度神经网络模型；The data input module 802 is used to input the image pair into an initial image fusion model, where the image fusion model is a deep neural network including a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network Model;

数据处理模块803，用于经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像；The data processing module 803 is configured to obtain the fusion image of the image pair through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network;

参数调整模块804，用于根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型，所述预设损失函数用于计算结构损失值和内容损失值。A parameter adjustment module 804, configured to calculate the difference value between the fused image and the image pair according to a preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than A preset threshold is used to obtain a trained image fusion model, and the preset loss function is used to calculate the structure loss value and the content loss value.

可选地，所述数据处理模块803，包括：Optionally, the data processing module 803 includes:

级联处理子模块，用于对所述图像对进行通道级联处理，得到级联图像；a cascade processing sub-module for performing channel cascade processing on the image pair to obtain cascaded images;

浅层提取子模块，用于将所述级联图像输入所述浅层特征提取网络以提取浅层特征图谱；a shallow layer extraction sub-module for inputting the concatenated images into the shallow layer feature extraction network to extract a shallow layer feature map;

深层提取子模块，用于将所述浅层特征图谱输入所述深层特征提取网络以提取深度特征信息；a deep extraction sub-module for inputting the shallow feature map into the deep feature extraction network to extract deep feature information;

特征融合子模块，用于将所述浅层特征图谱和所述深度特征信息输入所述全局特征融合网络以对所述浅层特征图谱和所述深度特征信息进行整合，得到整合后的图像；A feature fusion submodule, configured to input the shallow feature map and the deep feature information into the global feature fusion network to integrate the shallow feature map and the deep feature information to obtain an integrated image;

特征重构子模块，用于将所述整合后的图像输入所述特征重构网络以对所述整合后的图像进行特征重构，得到融合图像。The feature reconstruction sub-module is used for inputting the integrated image into the feature reconstruction network to perform feature reconstruction on the integrated image to obtain a fusion image.

可选地，所述预设损失函数包含结构损失函数和内容损失函数，所述参数调整模块，包括：Optionally, the preset loss function includes a structure loss function and a content loss function, and the parameter adjustment module includes:

结构损失值计算子模块，用于通过所述结构损失函数对第一结构损失和第二结构损失进行加权平均计算，得到所述融合图像的结构损失值，所述第一结构损失为所述红外图像与所述融合图像之间的结构损失，所述第二结构损失为所述可见光图像与所述融合图像之间的结构损失；The structure loss value calculation sub-module is used to perform a weighted average calculation on the first structure loss and the second structure loss through the structure loss function to obtain the structure loss value of the fusion image, and the first structure loss is the infrared The structure loss between the image and the fused image, and the second structure loss is the structure loss between the visible light image and the fused image;

内容损失值计算子模块，用于通过所述内容损失函数对第一内容损失和第二内容损失进行加权平均计算，得到所述融合图像的内容损失值，所述第一内容损失为所述红外图像与所述融合图像之间的内容损失，所述第二内容损失为所述可见光图像与所述融合图像之间的内容损失；A content loss value calculation submodule, configured to perform a weighted average calculation on the first content loss and the second content loss through the content loss function to obtain a content loss value of the fused image, where the first content loss is the infrared The content loss between the image and the fused image, the second content loss is the content loss between the visible light image and the fused image;

差异值计算子模块，用于对所述融合图像的结构损失值和内容损失值进行加权计算，得到所述融合图像与所述图像对之间的差异值。The difference value calculation sub-module is configured to perform weighted calculation on the structure loss value and the content loss value of the fused image to obtain the difference value between the fused image and the image pair.

可选地，所述结构损失值计算子模块，具体用于通过所述结构损失函数对像素级的第一结构损失和第二结构损失进行加权平均计算，得到像素级的结构损失值；Optionally, the structure loss value calculation sub-module is specifically configured to perform a weighted average calculation on the pixel-level first structure loss and the second structure loss through the structure loss function to obtain a pixel-level structure loss value;

所述内容损失值计算子模块，具体用于通过所述内容损失函数对像素级的第一内容损失和第二内容损失进行加权平均计算，得到像素级的内容损失值；The content loss value calculation submodule is specifically configured to perform a weighted average calculation on the pixel-level first content loss and the second content loss through the content loss function to obtain a pixel-level content loss value;

所述差异值计算子模块，具体用于对所述像素级的结构损失值和所述像素级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间像素级的差异值。The difference value calculation sub-module is specifically configured to perform weighted calculation on the pixel-level structure loss value and the pixel-level content loss value to obtain a pixel-level difference value between the fused image and the image pair .

可选地，所述装置还包括：Optionally, the device further includes:

特征提取模块，用于利用预设神经网络分别对所述红外图像、可见光图像、以及融合图像提取深度特征图谱；a feature extraction module for extracting depth feature maps from the infrared image, the visible light image, and the fused image respectively by using a preset neural network;

第一计算模块，用于基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一结构损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二结构损失；a first calculation module, configured to calculate the first structural loss at the feature level based on the depth feature map of the infrared image and the depth feature map of the fusion image, and the depth feature map based on the visible light image and the fusion image The depth feature map of , calculates the feature-level second structure loss;

第二计算模块，用于基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一内容损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二内容损失；The second calculation module is configured to calculate the first content loss at feature level based on the depth feature map of the infrared image and the depth feature map of the fused image, and based on the depth feature map of the visible light image and the fused image The depth feature map of , calculates the feature-level second content loss;

所述结构损失值计算子模块，具体用于通过所述结构损失函数对所述特征级的第一结构损失和所述特征级的第二结构损失进行加权平均计算，得到特征级的结构损失值；The structure loss value calculation sub-module is specifically configured to perform a weighted average calculation on the first structure loss of the feature level and the second structure loss of the feature level through the structure loss function, so as to obtain the structure loss value of the feature level ;

所述内容损失值计算子模块，具体用于通过所述内容损失函数对所述特征级的第一内容损失和所述特征级的第二内容损失进行加权平均计算，得到特征级的内容损失值；The content loss value calculation submodule is specifically configured to perform a weighted average calculation on the first content loss at the feature level and the second content loss at the feature level through the content loss function to obtain a content loss value at the feature level ;

所述差异值计算子模块，具体用于对所述特征级的结构损失值和所述特征级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间特征级的差异值。The difference value calculation submodule is specifically configured to perform weighted calculation on the feature-level structure loss value and the feature-level content loss value to obtain the feature-level difference value between the fused image and the image pair .

可选地，所述深层特征提取网络包括至少两层用于提取深度特征信息的聚集残余致密块，所述特征融合子模块，包括：Optionally, the deep feature extraction network includes at least two layers of aggregated residual dense blocks for extracting deep feature information, and the feature fusion sub-module includes:

第一融合单元，用于对每一层聚集残余致密块提取的深度特征信息进行特征融合，得到全局特征；The first fusion unit is used to perform feature fusion on the depth feature information extracted from each layer of aggregated residual dense blocks to obtain global features;

第二融合单元，用于对所述全局特征以及所述浅层特征提取网络提取的浅层特征图谱进行特征融合，得到整合后的图像。The second fusion unit is configured to perform feature fusion on the global feature and the shallow feature map extracted by the shallow feature extraction network to obtain an integrated image.

可选地，其特征在于，所述装置还包括：Optionally, the device further includes:

图像输入模块，用于将待融合的图像对输入所述训练完成的图像融合模型，所述待融合的图像对包含同一场景下的红外图像和可见光图像；an image input module, configured to input the image pair to be fused into the image fusion model that has been trained, and the image pair to be fused includes an infrared image and a visible light image in the same scene;

图像融合模块，用于通过所述训练完成的图像融合模型输出融合图像。The image fusion module is used for outputting the fusion image through the image fusion model completed by the training.

本申请提供了一种训练图像融合模型的装置，通过该装置训练完成的图像融合模型能够有效实现对红外图像与可见光图像的融合，在不引入人为融合规则的前提下，有效提升图像特征分析和提取效果，进一步提升融合特征的丰富性，使得融合图像具有更佳的成像性能，在保留丰富纹理细节的背景信息的同时，平滑地引入红外图像中的场景高亮目标，生成平滑自然的融合图像。同时可以避免引入人为视觉伪影，提供融合图像的客观真实性，提升融合图像的效果。The present application provides a device for training an image fusion model. The image fusion model trained by the device can effectively realize the fusion of infrared images and visible light images, and effectively improve image feature analysis and performance without introducing artificial fusion rules. The extraction effect further enhances the richness of fusion features, making the fusion image have better imaging performance. While retaining the background information of rich texture details, it smoothly introduces the scene highlight target in the infrared image to generate a smooth and natural fusion image. . At the same time, the introduction of artificial visual artifacts can be avoided, the objective authenticity of the fused image can be provided, and the effect of the fused image can be improved.

本申请实施例中的训练图像融合模型的装置可以是装置，也可以是终端中的部件、集成电路、或芯片。该装置可以是移动电子设备，也可以为非移动电子设备。示例性的，移动电子设备可以为手机、平板电脑、笔记本电脑、掌上电脑、车载电子设备、可穿戴设备、超级移动个人计算机(ultra-mobile personal computer，UMPC)、上网本或者个人数字助理(personal digital assistant，PDA)等，非移动电子设备可以为服务器、网络附属存储器(Network Attached Storage，NAS)、个人计算机(personal computer，PC)、电视机(television，TV)、柜员机或者自助机等，本申请实施例不作具体限定。The apparatus for training an image fusion model in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. Exemplarily, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (personal digital assistant). assistant, PDA), etc., the non-mobile electronic device can be a server, a network attached storage (NAS), a personal computer (personal computer, PC), a television (television, TV), a teller machine or a self-service machine, etc. This application Examples are not specifically limited.

本申请实施例中的训练图像融合模型的装置可以为具有操作系统的装置。该操作系统可以为安卓(Android)操作系统，可以为ios操作系统，还可以为其他可能的操作系统，本申请实施例不作具体限定。The apparatus for training an image fusion model in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.

本申请实施例提供的训练图像融合模型的装置能够实现图1的方法实施例实现的各个过程，为避免重复，这里不再赘述。The apparatus for training an image fusion model provided in this embodiment of the present application can implement each process implemented by the method embodiment in FIG. 1 , and to avoid repetition, details are not repeated here.

可选的，如图9所示，本申请实施例还提供一种电子设备900，包括处理器901，存储器902，存储在存储器902上并可在所述处理器901上运行的程序或指令，该程序或指令被处理器901执行时实现上述训练图像融合模型的方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。Optionally, as shown in FIG. 9 , an embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, a program or instruction stored in the memory 902 and executable on the processor 901, When the program or instruction is executed by the processor 901, each process of the above-mentioned method embodiment for training an image fusion model can be achieved, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.

需要说明的是，本申请实施例中的电子设备包括上述所述的移动电子设备和非移动电子设备。It should be noted that the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.

图10为实现本申请实施例的一种电子设备的硬件结构示意图。FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

该电子设备1000包括但不限于：射频单元1001、网络模块1002、音频输出单元1003、输入单元1004、传感器1005、显示单元1006、用户输入单元1007、接口单元1008、存储器1009、以及处理器1010等部件。The electronic device 1000 includes but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010, etc. part.

本领域技术人员可以理解，电子设备1000还可以包括给各个部件供电的电源(比如电池)，电源可以通过电源管理系统与处理器1010逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图10中示出的电子设备结构并不构成对电子设备的限定，电子设备可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置，在此不再赘述。Those skilled in the art can understand that the electronic device 1000 may also include a power source (such as a battery) for supplying power to various components, and the power source may be logically connected to the processor 1010 through a power management system, so that the power management system can manage charging, discharging, and power functions. consumption management and other functions. The structure of the electronic device shown in FIG. 10 does not constitute a limitation on the electronic device, and the electronic device may include more or less components than the one shown, or combine some components, or arrange different components, which will not be repeated here. .

其中，处理器1010，用于获取训练数据集，所述训练数据集中包含图像对，所述图像对包含同一场景下的红外图像和可见光图像；将所述图像对输入初始的图像融合模型，所述图像融合模型为包含浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的深度神经网络模型；经过所述浅层特征提取网络、深层特征提取网络、全局特征融合网络、特征重构网络的依次处理，得到所述图像对的融合图像；根据预设损失函数计算所述融合图像与所述图像对之间的差异值，并根据计算的差异值更新所述图像融合模型的网络参数，直到计算的差异值小于预设阈值，得到训练完成的图像融合模型，所述预设损失函数用于计算结构损失值和内容损失值。The processor 1010 is configured to acquire a training data set, the training data set includes image pairs, and the image pairs include infrared images and visible light images in the same scene; the image pairs are input into the initial image fusion model, and the The image fusion model is a deep neural network model including a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network; after the shallow feature extraction network, the deep feature extraction network, and the global feature fusion network , the sequential processing of the feature reconstruction network, the fusion image of the image pair is obtained; the difference value between the fusion image and the image pair is calculated according to the preset loss function, and the image fusion value is updated according to the calculated difference value. The network parameters of the model are calculated until the calculated difference value is less than the preset threshold, and the image fusion model after training is obtained, and the preset loss function is used to calculate the structure loss value and the content loss value.

本申请基于神经网络模型实现红外图像与可见光图像的融合，神经网络模型可以通过模拟人眼神经元结构，分解并提取更丰富且类型适宜的图像特征，可以提高特征提取的准确性。The present application realizes the fusion of infrared images and visible light images based on a neural network model. The neural network model can decompose and extract more abundant and suitable image features by simulating the neuron structure of the human eye, which can improve the accuracy of feature extraction.

可选的，处理器110，还用于对所述图像对进行通道级联处理，得到级联图像；将所述级联图像输入所述浅层特征提取网络以提取浅层特征图谱；将所述浅层特征图谱输入所述深层特征提取网络以提取深度特征信息；将所述浅层特征图谱和所述深度特征信息输入所述全局特征融合网络以对所述浅层特征图谱和所述深度特征信息进行整合，得到整合后的图像；将所述整合后的图像输入所述特征重构网络以对所述整合后的图像进行特征重构，得到融合图像。Optionally, the processor 110 is further configured to perform channel concatenation processing on the image pair to obtain a concatenated image; input the concatenated image into the shallow feature extraction network to extract a shallow feature map; The shallow feature map is input into the deep feature extraction network to extract deep feature information; the shallow feature map and the deep feature information are input into the global feature fusion network to extract the shallow feature map and the depth feature information. The feature information is integrated to obtain an integrated image; the integrated image is input into the feature reconstruction network to perform feature reconstruction on the integrated image to obtain a fusion image.

可选地，所述预设损失函数包含结构损失函数和内容损失函数，所述根据预设损失函数计算所述融合图像与所述图像对之间的差异值，处理器110，还用于通过所述结构损失函数对第一结构损失和第二结构损失进行加权平均计算，得到所述融合图像的结构损失值，所述第一结构损失为所述红外图像与所述融合图像之间的结构损失，所述第二结构损失为所述可见光图像与所述融合图像之间的结构损失；通过所述内容损失函数对第一内容损失和第二内容损失进行加权平均计算，得到所述融合图像的内容损失值，所述第一内容损失为所述红外图像与所述融合图像之间的内容损失，所述第二内容损失为所述可见光图像与所述融合图像之间的内容损失；对所述融合图像的结构损失值和内容损失值进行加权计算，得到所述融合图像与所述图像对之间的差异值。Optionally, the preset loss function includes a structure loss function and a content loss function, and the difference value between the fused image and the image pair is calculated according to the preset loss function, and the processor 110 is further configured to pass The structure loss function performs a weighted average calculation on the first structure loss and the second structure loss to obtain the structure loss value of the fusion image, and the first structure loss is the structure between the infrared image and the fusion image loss, the second structural loss is the structural loss between the visible light image and the fused image; the first content loss and the second content loss are weighted and averaged through the content loss function to obtain the fused image , the first content loss is the content loss between the infrared image and the fused image, and the second content loss is the content loss between the visible light image and the fused image; The structure loss value and the content loss value of the fused image are weighted and calculated to obtain the difference value between the fused image and the image pair.

可选地，处理器110，还用于通过所述结构损失函数对像素级的第一结构损失和第二结构损失进行加权平均计算，得到像素级的结构损失值；通过所述内容损失函数对像素级的第一内容损失和第二内容损失进行加权平均计算，得到像素级的内容损失值；对所述像素级的结构损失值和所述像素级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间像素级的差异值。Optionally, the processor 110 is further configured to perform a weighted average calculation on the pixel-level first structure loss and the second structure loss by using the structure loss function to obtain a pixel-level structure loss value; Perform a weighted average calculation on the pixel-level first content loss and the second content loss to obtain a pixel-level content loss value; perform weighted calculation on the pixel-level structure loss value and the pixel-level content loss value to obtain the pixel-level structure loss value and the pixel-level content loss value. Pixel-level difference values between the fused image and the pair of images.

本申请实施例针对预设损失函数设计两种不同的计算策略，分别为像素级损失函数和特征级损失函数，适用于不同需求。The embodiment of the present application designs two different calculation strategies for the preset loss function, namely, a pixel-level loss function and a feature-level loss function, which are suitable for different requirements.

可选地，处理器110，还用于利用预设神经网络分别对所述红外图像、可见光图像、以及融合图像提取深度特征图谱；基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一结构损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二结构损失；基于所述红外图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第一内容损失，以及基于所述可见光图像的深度特征图谱和所述融合图像的深度特征图谱，计算特征级的第二内容损失；通过所述结构损失函数对所述特征级的第一结构损失和所述特征级的第二结构损失进行加权平均计算，得到特征级的结构损失值；通过所述内容损失函数对所述特征级的第一内容损失和所述特征级的第二内容损失进行加权平均计算，得到特征级的内容损失值；对所述特征级的结构损失值和所述特征级的内容损失值进行加权计算，得到所述融合图像与所述图像对之间特征级的差异值。Optionally, the processor 110 is further configured to use a preset neural network to extract a depth feature map from the infrared image, the visible light image, and the fused image, respectively; based on the depth feature map of the infrared image and the depth of the fused image. feature map, calculate the first structure loss at feature level, and calculate the second structure loss at feature level based on the depth feature map of the visible light image and the depth feature map of the fusion image; based on the depth feature map of the infrared image and the depth feature map of the fused image, calculate the first content loss at the feature level, and calculate the second content loss at the feature level based on the depth feature map of the visible light image and the depth feature map of the fused image; The structure loss function performs a weighted average calculation on the first structure loss of the feature level and the second structure loss of the feature level to obtain the structure loss value of the feature level; A weighted average calculation is performed on the first content loss and the second content loss at the feature level to obtain the content loss value at the feature level; the weighted calculation is performed on the structure loss value at the feature level and the content loss value at the feature level to obtain the content loss value at the feature level. feature-level difference value between the fused image and the image pair.

像素级计算策略提供了网络损失函数的较为粗略的估计，为获得更为详尽的损失信息，本申请在网络训练中优选采用特征级损失函数计算策略，以提高融合图像的精度。The pixel-level calculation strategy provides a relatively rough estimation of the network loss function. In order to obtain more detailed loss information, the present application preferably adopts the feature-level loss function calculation strategy in network training to improve the accuracy of the fusion image.

可选地，所述深层特征提取网络包括至少两层用于提取深度特征信息的聚集残余致密块，处理器110，还用于对每一层聚集残余致密块提取的深度特征信息进行特征融合，得到全局特征；对所述全局特征以及所述浅层特征提取网络提取的浅层特征图谱进行特征融合，得到整合后的图像。Optionally, the deep feature extraction network includes at least two layers of aggregated residual dense blocks for extracting depth feature information, and the processor 110 is further configured to perform feature fusion on the depth feature information extracted from each layer of aggregated residual dense blocks, Obtaining a global feature; performing feature fusion on the global feature and the shallow feature map extracted by the shallow feature extraction network to obtain an integrated image.

可选地，处理器110，还用于将待融合的图像对输入所述训练完成的图像融合模型，所述待融合的图像对包含同一场景下的红外图像和可见光图像；通过所述训练完成的图像融合模型输出融合图像。Optionally, the processor 110 is further configured to input the image pair to be fused into the image fusion model completed by the training, and the image pair to be fused includes an infrared image and a visible light image in the same scene; The image fusion model of the output fusion image.

训练完成的图像融合模型为端对端模型，在得到训练完成的图像融合模型之后，将待融合的图像对输入所述训练完成的图像融合模型，即可通过所述训练完成的图像融合模型输出融合图像，可以提高图像融合操作的便捷性。The trained image fusion model is an end-to-end model. After the trained image fusion model is obtained, the image pair to be fused is input into the trained image fusion model, and the trained image fusion model can be output. Fusion of images can improve the convenience of image fusion operations.

应理解的是，本申请实施例中，输入单元1004可以包括图形处理器(GraphicsProcessing Unit，GPU)1041和麦克风1042，图形处理器1041对在视频捕获模式或图像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。显示单元1006可包括显示面板1061，可以采用液晶显示器、有机发光二极管等形式来配置显示面板1061。用户输入单元1007包括触控面板1071以及其他输入设备1072。触控面板1071，也称为触摸屏。触控面板1071可包括触摸检测装置和触摸控制器两个部分。其他输入设备1072可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆，在此不再赘述。存储器1009可用于存储软件程序以及各种数据，包括但不限于应用程序和操作系统。处理器1010可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器1010中。It should be understood that, in this embodiment of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042. camera) to process the image data of still pictures or videos. The display unit 1006 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 1071 and other input devices 1072 . The touch panel 1071 is also called a touch screen. The touch panel 1071 may include two parts, a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which are not described herein again. Memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. The processor 1010 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface, and application programs, and the like, and the modem processor mainly processes wireless communication. It can be understood that, the above-mentioned modulation and demodulation processor may not be integrated into the processor 1010.

本申请实施例还提供一种可读存储介质，所述可读存储介质上存储有程序或指令，该程序或指令被处理器执行时实现上述训练图像融合模型的方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。Embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, each process of the above-mentioned method embodiment for training an image fusion model is implemented, and The same technical effect can be achieved, and in order to avoid repetition, details are not repeated here.

其中，所述处理器为上述实施例中所述的电子设备中的处理器。所述可读存储介质，包括计算机可读存储介质，如计算机只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等。Wherein, the processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.

本申请实施例另提供了一种芯片，所述芯片包括处理器和通信接口，所述通信接口和所述处理器耦合，所述处理器用于运行程序或指令，实现上述训练图像融合模型的方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement the above method for training an image fusion model Each process of the embodiment can achieve the same technical effect, and to avoid repetition, it will not be repeated here.

应理解，本申请实施例提到的芯片还可以称为系统级芯片、系统芯片、芯片系统或片上系统芯片等。It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system-on-chip, a system-on-a-chip, or a system-on-a-chip, or the like.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。此外，需要指出的是，本申请实施方式中的方法和装置的范围不限按示出或讨论的顺序来执行功能，还可包括根据所涉及的功能按基本同时的方式或按相反的顺序来执行功能，例如，可以按不同于所描述的次序来执行所描述的方法，并且还可以添加、省去、或组合各种步骤。另外，参照某些示例所描述的特征可在其他示例中被组合。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or apparatus that includes the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in the reverse order depending on the functions involved. To perform functions, for example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to some examples may be combined in other examples.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.

上面结合附图对本申请的实施例进行了描述，但是本申请并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本申请的启示下，在不脱离本申请宗旨和权利要求所保护的范围情况下，还可做出很多形式，均属于本申请的保护之内。The embodiments of the present application have been described above in conjunction with the accompanying drawings, but the present application is not limited to the above-mentioned specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of this application, without departing from the scope of protection of the purpose of this application and the claims, many forms can be made, which all fall within the protection of this application.

Claims

1. a method for training an image fusion model, wherein the method comprises:

Obtaining a training data set, the training data set includes an image pair, and the image pair includes an infrared image and a visible light image in the same scene;

Inputting the image pair into an initial image fusion model, where the image fusion model is a deep neural network model comprising a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network;

Through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network, the fusion image of the image pair is obtained;

Calculate the difference value between the fused image and the image pair according to the preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than the preset threshold, and the training is completed The image fusion model of , the preset loss function is used to calculate the structure loss value and the content loss value.

2 . The method according to claim 1 , characterized in that, through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network, the image pair is obtained. 3 . Fused images, including:

Perform channel concatenation processing on the image pair to obtain concatenated images;

inputting the concatenated images into the shallow feature extraction network to extract a shallow feature map;

Inputting the shallow feature map into the deep feature extraction network to extract deep feature information;

Inputting the shallow feature map and the deep feature information into the global feature fusion network to integrate the shallow feature map and the deep feature information to obtain an integrated image;

The integrated image is input into the feature reconstruction network to perform feature reconstruction on the integrated image to obtain a fusion image.

3. The method according to claim 1, wherein the preset loss function comprises a structure loss function and a content loss function, and the calculation of the difference between the fused image and the image pair according to the preset loss function. Difference values, including:

The first structure loss and the second structure loss are weighted and averaged by the structure loss function to obtain the structure loss value of the fusion image, and the first structure loss is the difference between the infrared image and the fusion image. structural loss, the second structural loss is the structural loss between the visible light image and the fusion image;

A weighted average calculation is performed on the first content loss and the second content loss through the content loss function to obtain the content loss value of the fused image, where the first content loss is the difference between the infrared image and the fused image. content loss, the second content loss is the content loss between the visible light image and the fusion image;

A weighted calculation is performed on the structure loss value and the content loss value of the fused image to obtain a difference value between the fused image and the image pair.

4. The method according to claim 3, wherein the weighted average calculation is performed on the first structure loss and the second structure loss by the structure loss function to obtain the structure loss value of the fused image, comprising:

Perform a weighted average calculation on the pixel-level first structure loss and the second structure loss through the structure loss function to obtain a pixel-level structure loss value;

The weighted average calculation is performed on the first content loss and the second content loss through the content loss function to obtain the content loss value of the fused image, including:

Perform a weighted average calculation on the pixel-level first content loss and the second content loss through the content loss function to obtain a pixel-level content loss value;

The weighted calculation is performed on the structure loss value and the content loss value of the fused image to obtain the difference value between the fused image and the image pair, including:

A weighted calculation is performed on the pixel-level structure loss value and the pixel-level content loss value to obtain a pixel-level difference value between the fused image and the image pair.

5. The method according to claim 3, wherein before calculating the difference value between the fused image and the image pair according to a preset loss function, the method further comprises:

Using a preset neural network to extract a depth feature map from the infrared image, the visible light image, and the fusion image, respectively;

A first structural loss at feature level is calculated based on the depth feature map of the infrared image and the depth feature map of the fused image, and features are calculated based on the depth feature map of the visible light image and the depth feature map of the fused image The second structural loss of the stage;

A feature-level first content loss is calculated based on the depth feature map of the infrared image and the depth feature map of the fused image, and features are calculated based on the depth feature map of the visible light image and the depth feature map of the fused image Level 2 loss of content;

The weighted average calculation of the first structure loss and the second structure loss through the structure loss function to obtain the structure loss value of the fused image, including:

A weighted average calculation is performed on the first structure loss of the feature level and the second structure loss of the feature level through the structure loss function to obtain a structure loss value of the feature level;

Perform a weighted average calculation on the feature-level first content loss and the feature-level second content loss by using the content loss function to obtain a feature-level content loss value;

A weighted calculation is performed on the feature-level structure loss value and the feature-level content loss value to obtain a feature-level difference value between the fused image and the image pair.

6. The method according to claim 2, wherein the deep feature extraction network comprises at least two layers of aggregated residual dense blocks for extracting deep feature information, the shallow feature map and the depth The feature information is input into the global feature fusion network to integrate the shallow feature map and the deep feature information to obtain an integrated image, including:

Perform feature fusion on the depth feature information extracted from each layer of aggregated residual dense blocks to obtain global features;

Feature fusion is performed on the global feature and the shallow feature map extracted by the shallow feature extraction network to obtain an integrated image.

7. method according to claim 1, is characterized in that, after described obtaining the image fusion model that the training completes, also comprises:

Inputting the image pair to be fused into the trained image fusion model, and the image pair to be fused includes an infrared image and a visible light image in the same scene;

The image fusion model completed by the training outputs a fusion image.

8. A device for training an image fusion model, wherein the device comprises:

a data acquisition module, used for acquiring a training data set, the training data set includes image pairs, and the image pairs include infrared images and visible light images in the same scene;

A data input module for inputting the image pair into an initial image fusion model, where the image fusion model is a deep neural network model including a shallow feature extraction network, a deep feature extraction network, a global feature fusion network, and a feature reconstruction network ;

a data processing module, configured to obtain a fusion image of the image pair through the sequential processing of the shallow feature extraction network, the deep feature extraction network, the global feature fusion network, and the feature reconstruction network;

A parameter adjustment module, configured to calculate the difference value between the fusion image and the image pair according to a preset loss function, and update the network parameters of the image fusion model according to the calculated difference value, until the calculated difference value is less than the predetermined value. A threshold is set to obtain a trained image fusion model, and the preset loss function is used to calculate the structure loss value and the content loss value.

9. The device according to claim 8, wherein the data processing module comprises:

a cascade processing sub-module for performing channel cascade processing on the image pair to obtain cascaded images;

a shallow layer extraction sub-module for inputting the concatenated images into the shallow layer feature extraction network to extract a shallow layer feature map;

a deep extraction sub-module for inputting the shallow feature map into the deep feature extraction network to extract deep feature information;

A feature fusion submodule, configured to input the shallow feature map and the deep feature information into the global feature fusion network to integrate the shallow feature map and the deep feature information to obtain an integrated image;

The feature reconstruction sub-module is used for inputting the integrated image into the feature reconstruction network to perform feature reconstruction on the integrated image to obtain a fusion image.

10. The apparatus according to claim 8, wherein the preset loss function comprises a structure loss function and a content loss function, and the parameter adjustment module comprises:

The structure loss value calculation sub-module is used to perform a weighted average calculation on the first structure loss and the second structure loss through the structure loss function to obtain the structure loss value of the fusion image, and the first structure loss is the infrared The structure loss between the image and the fused image, and the second structure loss is the structure loss between the visible light image and the fused image;

A content loss value calculation submodule, configured to perform a weighted average calculation on the first content loss and the second content loss through the content loss function to obtain a content loss value of the fused image, where the first content loss is the infrared The content loss between the image and the fused image, the second content loss is the content loss between the visible light image and the fused image;

The difference value calculation sub-module is configured to perform weighted calculation on the structure loss value and the content loss value of the fused image to obtain the difference value between the fused image and the image pair.

11. The apparatus according to claim 10, wherein the structure loss value calculation sub-module is specifically configured to perform a weighted average calculation on the pixel-level first structure loss and the second structure loss through the structure loss function , get the pixel-level structure loss value;

The content loss value calculation submodule is specifically configured to perform a weighted average calculation on the pixel-level first content loss and the second content loss through the content loss function to obtain a pixel-level content loss value;

The difference value calculation sub-module is specifically configured to perform weighted calculation on the pixel-level structure loss value and the pixel-level content loss value to obtain a pixel-level difference value between the fused image and the image pair .

12. The apparatus of claim 11, wherein the apparatus further comprises:

a feature extraction module for extracting depth feature maps from the infrared image, the visible light image, and the fused image respectively by using a preset neural network;

a first computing module, configured to calculate the first structural loss at feature level based on the depth feature map of the infrared image and the depth feature map of the fusion image, and the depth feature map based on the visible light image and the fusion image The depth feature map of , calculates the feature-level second structure loss;

The second calculation module is configured to calculate the first content loss at feature level based on the depth feature map of the infrared image and the depth feature map of the fused image, and based on the depth feature map of the visible light image and the fused image The depth feature map of , calculates the feature-level second content loss;

The structure loss value calculation sub-module is specifically configured to perform a weighted average calculation on the first structure loss of the feature level and the second structure loss of the feature level through the structure loss function, so as to obtain the structure loss value of the feature level ;

The content loss value calculation submodule is specifically configured to perform a weighted average calculation on the feature-level first content loss and the feature-level second content loss through the content loss function to obtain a feature-level content loss value ;

The difference value calculation submodule is specifically configured to perform weighted calculation on the feature-level structure loss value and the feature-level content loss value to obtain the feature-level difference value between the fused image and the image pair .

13. The apparatus according to claim 9, wherein the deep feature extraction network comprises at least two layers of aggregated residual dense blocks for extracting deep feature information, and the feature fusion sub-module comprises:

The first fusion unit is used to perform feature fusion on the depth feature information extracted from each layer of aggregated residual dense blocks to obtain global features;

The second fusion unit is configured to perform feature fusion on the global feature and the shallow feature map extracted by the shallow feature extraction network to obtain an integrated image.

14. The apparatus of claim 8, wherein the apparatus further comprises:

an image input module, configured to input the image pair to be fused into the image fusion model that has been trained, and the image pair to be fused includes an infrared image and a visible light image in the same scene;

The image fusion module is used for outputting the fusion image through the image fusion model completed by the training.

15. An electronic device, characterized in that it comprises a processor, a memory, and a program or instruction stored on the memory and executable on the processor, and the program or instruction is implemented when executed by the processor The steps of training an image fusion model method according to any one of claims 1 to 7.