CN116452480A - Method for fusing infrared and visible light images - Google Patents

Method for fusing infrared and visible light images

Info

Publication number
CN116452480A
Authority
CN
China
Prior art keywords
image
infrared
visible light
fusion
fusing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310397002.8A
Other languages
Chinese (zh)
Inventor
朱华毅
潘细朋
刘振丙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202310397002.8A
Publication of CN116452480A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/40 - Analysis of texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image fusion, and particularly relates to an infrared and visible light image fusion method based on the combination of convolution and a self-attention mechanism. The method comprises three stages: an encoder, a fusion strategy and a decoder. In the encoder stage, the visible light image and the infrared image are respectively input into a module that combines convolution with a self-attention mechanism to obtain image features. In the fusion-strategy stage, the obtained features are fused on the Y channel to obtain a fused image. Finally, the fused image is reconstructed by a cascaded decoder to obtain the final infrared and visible light fusion image. According to the invention, an image fusion model is established to obtain an infrared and visible light fusion image that not only contains salient targets and rich texture information but also benefits the completion of high-level vision tasks.

Description

Method for fusing infrared and visible light images
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method for fusing infrared and visible light images.
Background
With the rapid development of sensing hardware, multimodal imaging has attracted considerable attention in a wide range of applications, such as night-time surveillance and autonomous driving. In particular, the combination of infrared and visible light sensors offers significant advantages for subsequent intelligent processing. Visible light imaging provides rich detail with high spatial resolution under well-lit conditions, while infrared sensors capture the thermal radiation emitted by objects, highlighting thermal target structures that are insensitive to illumination changes. However, visible light cameras struggle to capture objects in dark conditions, while infrared images often suffer from blurred details and lower spatial resolution. Because of the pronounced differences in appearance between the two modalities, it is challenging to fuse them into visually appealing images or to support higher-level vision tasks such as segmentation, tracking and detection. Therefore, it is important to design an effective infrared and visible image fusion method.
Disclosure of Invention
The invention provides a method for fusing infrared and visible light images, aiming at the problem that an image captured by a single-modality sensor cannot effectively and comprehensively describe the imaging scene.
The technical scheme adopted by the invention is as follows:
A method of infrared and visible light image fusion, comprising the steps of:
S1: Perform data preprocessing on the infrared and visible light image data set: first pair each infrared image with its corresponding visible light image, then apply a scale transformation to both, and finally separate the color channels of the images;
S2: Construct an encoder to extract infrared image features and visible light image features;
S3: Fuse the features obtained in step S2 to obtain a fused image;
S4: Construct a decoder and reconstruct the fused image obtained in step S3, finally obtaining a fused image recovered from the fused features;
S5: Evaluate the quality of the fused image through the designed loss function and compute the loss of the fused image obtained in step S4, continuously training the encoder and decoder to obtain the model parameters that minimize the loss function.
Further, the step S2 includes:
S21: The input infrared image I_ir and visible light image I_vi are each projected by three 1x1 convolutions to obtain three groups of rich intermediate features; the three groups of intermediate features are joined and used as the input of a convolution module, and are also used, respectively, as the query, key and value inputs of a self-attention module;
S22: A combined convolution and self-attention module (ACmod) is designed to extract local and global features, where f_c and f_a denote the outputs of the convolution module and the self-attention module respectively; the final output f_out can be expressed as: f_out = α·f_a + β·f_c;
S23: The output is fed into a depth feature extraction module F_2conv composed of two convolution layers; its output feature F_out can be expressed as: F_out = F_2conv(f_out);
S24: The paired infrared image and visible light image are each passed through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi;
wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
Further, the step S3 includes:
S31: The output features of step S2 are fused; the process can be expressed as: I_fuse = C_fuse(f_ir, f_vi);
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
Further, the step S4 includes:
S41: A decoder structure for image reconstruction is designed, consisting of 4 cascaded convolutions and denoted F_4r(·);
S42: The fused image I_fuse obtained in step S3 is taken as the input of the decoder, and the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, is obtained; the process can be expressed as: F_fuse = F_4r(I_fuse).
Further, the step S5 includes:
S51: The loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int; the overall loss L_a of the fused image can be expressed as: L_a = L_int + γ·L_tex;
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a;
the texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation and ||·||_1 represents the L1 norm;
S52: The final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the overall loss is minimized.
The invention has the following beneficial effects:
(1) The invention provides a dual-branch feature extraction network based on the combination of convolution and a self-attention mechanism. By combining convolution and self-attention in the feature extraction module for the infrared and visible light images, the deep learning model can better extract the features of the source images and prevents the fused image from losing information in both global features and local gradient features.
(2) The method has an excellent running speed and can easily be deployed as a real-time preprocessing module for high-level vision tasks. By fusing the infrared and visible light images into a single image that combines the information of both modalities, the method can help improve the performance of high-level vision tasks (such as target tracking, target detection and semantic segmentation) and obtain better results.
(3) The invention provides a new workflow for fusing an infrared image and a visible light image: the input infrared and visible light images are first projected and up-scaled in the channel dimension (so that more feature information of the source images is captured), then used in turn as the inputs of the convolution and self-attention branches for feature extraction, and finally passed through convolutions to obtain the final fused image. This workflow fully combines two modules that have proven effective in deep learning models, namely convolution and the self-attention mechanism, and exploits the advantages of both to obtain a better fused image.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of an implementation of the present invention;
FIG. 3 is a schematic diagram of a fusion result in a daytime scene environment obtained by the processing of the present invention;
fig. 4 is a schematic diagram of a fusion result obtained by the processing of the present invention in a night scene environment.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, and not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort shall fall within the scope of protection of the invention.
Embodiment 1:
a method of infrared and visible light image fusion as shown in fig. 1 and 2, comprising the steps of:
s1: carrying out data preprocessing on an infrared and visible light image data set, firstly pairing an infrared image and a visible light image, then carrying out scale transformation on the infrared image and the visible light image, and finally carrying out separation on color channels on the images;
s2: constructing an encoder to realize the extraction of infrared image features and visible light image features;
s3: fusing the features obtained in the step S2 to obtain a fused image;
s4: constructing a decoder, and reconstructing the fusion image obtained in the step S3 to finally obtain a fusion image recovered from the fusion characteristics;
s5: and (3) judging the quality of the fusion image through the designed loss function, and calculating the loss of the fusion image obtained in the step S4, wherein the encoder and the decoder are continuously trained to obtain model parameters which minimize the loss function.
Wherein, the step S2 includes:
S21: The input infrared image I_ir and visible light image I_vi are each projected by three 1x1 convolutions to obtain three groups of rich intermediate features. On one hand, the three groups of features are joined and input into a convolution module; on the other hand, they are input, respectively as query, key and value, into a self-attention module.
S22: The intermediate features of step S21 are input into a combined convolution and self-attention module (ACmod) for extracting local and global features, where f_c and f_a denote the outputs of the convolution module and the self-attention module respectively; the final output f_out can be expressed as:
f_out = α·f_a + β·f_c
S23: The output is fed into a depth feature extraction module F_2conv composed of two convolution layers; its output feature F_out can be expressed as:
F_out = F_2conv(f_out)
S24: The paired infrared image and visible light image are each passed through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi.
Wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
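A minimal PyTorch sketch of the module described in S21-S24 follows. The channel widths, kernel sizes, number of attention heads and the way the three projected feature groups are joined for the convolution branch are assumptions; the learnable scalars alpha and beta realise f_out = α·f_a + β·f_c, and the two trailing convolutions realise F_2conv.

```python
import torch
import torch.nn as nn

class ACmod(nn.Module):
    """Combined convolution / self-attention module (illustrative sketch).

    Three 1x1 convolutions project the input into intermediate features that
    serve as query, key and value for the attention branch and, joined
    together, as input to the convolution branch. c_out must be divisible
    by the number of attention heads.
    """
    def __init__(self, c_in, c_out, heads=4):
        super().__init__()
        self.proj_q = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.proj_k = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.proj_v = nn.Conv2d(c_in, c_out, kernel_size=1)
        # Convolution branch on the joined projections (assumed join = concatenation).
        self.conv_branch = nn.Conv2d(3 * c_out, c_out, kernel_size=3, padding=1)
        # Self-attention branch over the flattened spatial grid.
        self.attn = nn.MultiheadAttention(c_out, heads, batch_first=True)
        # Learnable balance factors: f_out = alpha * f_a + beta * f_c.
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        # Depth feature extraction F_2conv: two stacked convolution layers.
        self.depth = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)

        # Local features from the convolution branch.
        f_c = self.conv_branch(torch.cat([q, k, v], dim=1))

        # Global features from the self-attention branch.
        flatten = lambda t: t.flatten(2).transpose(1, 2)   # (B, H*W, C)
        f_a, _ = self.attn(flatten(q), flatten(k), flatten(v))
        f_a = f_a.transpose(1, 2).reshape(b, -1, h, w)

        f_out = self.alpha * f_a + self.beta * f_c          # f_out = α·f_a + β·f_c
        return self.depth(f_out)                            # F_out = F_2conv(f_out)
```

For example, encoder = ACmod(c_in=1, c_out=64) maps a single-channel H×W input to a 64-channel feature map of the same spatial size, playing the role of f_ir or f_vi.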
Wherein, the step S3 includes:
S31: The output features of step S2 are fused; the process can be expressed as:
I_fuse = C_fuse(f_ir, f_vi)
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
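The fusion strategy C_fuse of S31 amounts to a single channel-wise concatenation; a sketch (the function name is illustrative):

```python
import torch

def fuse_features(f_ir, f_vi):
    """C_fuse: concatenate the encoder outputs along the channel dimension."""
    # f_ir, f_vi: (B, C, H, W) feature maps of the infrared and visible images.
    return torch.cat([f_ir, f_vi], dim=1)   # I_fuse: (B, 2C, H, W)
```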
Wherein the step S4 includes:
S41: A decoder structure for image reconstruction is designed, consisting of 4 cascaded convolutions and denoted F_4r(·).
S42: The fused image I_fuse obtained in step S3 is taken as the input of the decoder, and the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, is obtained; the process can be expressed as:
F_fuse = F_4r(I_fuse)
Wherein, the step S5 includes:
S51: The loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int; the overall loss L_a of the fused image can be expressed as:
L_a = L_int + γ·L_tex
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a.
The texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation, ||·||_1 represents the L1 norm, and |·| represents the absolute-value operation.
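The exact expressions for L_tex and L_int are given as formula images in the original filing and are not reproduced here; the sketch below therefore assumes the commonly used max-aggregated forms, which are consistent with the operators named above (gradient operation, L1 norm, absolute value) but are not necessarily the patented definitions. All tensors are assumed to be single-channel luminance maps of identical shape.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    """Finite-difference gradient magnitude, standing in for the gradient operator."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    # Pad back to the input size so the two maps can be added.
    return F.pad(dx.abs(), (0, 1, 0, 0)) + F.pad(dy.abs(), (0, 0, 0, 1))

def fusion_loss(fused, ir, vi, gamma=1.0):
    """L_a = L_int + gamma * L_tex (assumed max-aggregated definitions)."""
    # Intensity loss: L1 distance to the pixel-wise maximum of the two sources.
    l_int = F.l1_loss(fused, torch.maximum(ir, vi))
    # Texture loss: L1 distance between the gradient of the fused image and
    # the pixel-wise maximum of the source gradients.
    l_tex = F.l1_loss(gradient(fused), torch.maximum(gradient(ir), gradient(vi)))
    return l_int + gamma * l_tex
```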
S52: the final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the total loss is minimized.
To verify the feasibility and effectiveness of the method of the invention, experiments were performed.
Fig. 3 shows a schematic diagram of a fusion result in a daytime scene environment obtained by the processing of the method of the invention, and fig. 4 shows a schematic diagram of a fusion result in a night scene environment obtained by the processing of the method of the invention.
The results in fig. 3 and fig. 4 show that the invention preserves the salient targets of the infrared image well: the people in the fused images appear with high contrast and prominent contours, which facilitates visual observation. In addition, the fusion results preserve abundant texture details from the visible light image, which better matches the human visual system.
Overall, the fusion results obtained by the present invention contain prominent person targets and clear edge contours while preserving rich texture details.
While the invention has been described in terms of preferred embodiments, it is not limited thereto; any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the protection scope of the invention is therefore defined by the appended claims.

Claims (5)

1. A method for fusing infrared and visible light images, comprising the steps of:
s1: carrying out data preprocessing on an infrared and visible light image data set, firstly pairing an infrared image and a visible light image, then carrying out scale transformation on the infrared image and the visible light image, and finally carrying out separation on color channels on the images;
s2: constructing an encoder to realize the extraction of infrared image features and visible light image features;
s3: fusing the features obtained in the step S2 to obtain a fused image;
s4: constructing a decoder, and reconstructing the fusion image obtained in the step S3 to finally obtain a fusion image recovered from the fusion characteristics;
s5: and (3) judging the quality of the fusion image through the designed loss function, and calculating the loss of the fusion image obtained in the step S4, wherein the encoder and the decoder are continuously trained to obtain model parameters which minimize the loss function.
2. The method for fusing infrared and visible light images of claim 1, wherein said step S2 comprises:
S21: projecting the input infrared image I_ir and visible light image I_vi each through three 1x1 convolutions to obtain three groups of rich intermediate features, wherein the three groups of intermediate features are joined and used as the input of a convolution module, and are also used, respectively, as the query, key and value inputs of a self-attention module;
S22: designing a combined convolution and self-attention module (ACmod) to extract local and global features, wherein f_c and f_a denote the outputs of the convolution module and the self-attention module respectively, and the final output f_out can be expressed as: f_out = α·f_a + β·f_c;
S23: inputting the output into a depth feature extraction module F_2conv composed of two convolution layers, whose output feature F_out can be expressed as: F_out = F_2conv(f_out);
S24: passing the paired infrared image and visible light image each through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi;
wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
3. The method for fusing infrared and visible light images of claim 1, wherein said step S3 comprises:
S31: fusing the output features of step S2, wherein the process can be expressed as: I_fuse = C_fuse(f_ir, f_vi);
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
4. The method for fusing infrared and visible light images of claim 1, wherein said step S4 comprises:
S41: designing a decoder structure for image reconstruction, consisting of 4 cascaded convolutions and denoted F_4r(·);
S42: taking the fused image I_fuse obtained in step S3 as the input of the decoder to obtain the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, wherein the process can be expressed as: F_fuse = F_4r(I_fuse).
5. The method for fusing infrared and visible light images of claim 1, wherein said step S5 comprises:
S51: the loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int, and the overall loss L_a of the fused image can be expressed as: L_a = L_int + γ·L_tex;
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a;
the texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation and ||·||_1 represents the L1 norm;
S52: the final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the overall loss is minimized.
CN202310397002.8A 2023-04-14 2023-04-14 Method for fusing infrared and visible light images Pending CN116452480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310397002.8A CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310397002.8A CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Publications (1)

Publication Number Publication Date
CN116452480A (en) 2023-07-18

Family

ID=87126794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310397002.8A Pending CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Country Status (1)

Country Link
CN (1) CN116452480A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391983A (en) * 2023-10-26 2024-01-12 安徽大学 Infrared image and visible light image fusion method
CN117783780A (en) * 2023-12-26 2024-03-29 阳谷质上特种电缆有限公司 Cable fault detection method based on imaging technology


Similar Documents

Publication Publication Date Title
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN111145131B (en) Infrared and visible light image fusion method based on multiscale generation type countermeasure network
CN116452480A (en) Method for fusing infrared and visible light images
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN112419212B (en) Infrared and visible light image fusion method based on side window guide filtering
CN111274921A (en) Method for recognizing human body behaviors by utilizing attitude mask
CN114972748A (en) Infrared semantic segmentation method capable of explaining edge attention and gray level quantization network
CN116740419A (en) Target detection method based on graph regulation network
Deng et al. DRD-Net: Detail-recovery image deraining via context aggregation networks
CN110751271B (en) Image traceability feature characterization method based on deep neural network
Chen et al. Colorization of infrared images based on feature fusion and contrastive learning
Xie et al. YOLO-MS: Multispectral object detection via feature interaction and self-attention guided fusion
Liu et al. Multi-exposure image fusion via multi-scale and context-aware feature learning
CN117690161B (en) Pedestrian detection method, device and medium based on image fusion
Bajpai et al. Underwater U-Net: Deep learning with U-Net for visual underwater moving object detection
CN116137023B (en) Low-illumination image enhancement method based on background modeling and detail enhancement
Yue et al. Low-illumination traffic object detection using the saliency region of infrared image masking on infrared-visible fusion image
Patel et al. ThermISRnet: an efficient thermal image super-resolution network
Goncalves et al. Guidednet: Single image dehazing using an end-to-end convolutional neural network
CN113449552A (en) Pedestrian re-identification method based on blocking indirect coupling GAN network
CN113869151B (en) Cross-view gait recognition method and system based on feature fusion
CN115661451A (en) Deep learning single-frame infrared small target high-resolution segmentation method
CN114549958A (en) Night and disguised target detection method based on context information perception mechanism
Yuyao et al. The infrared-visible complementary recognition network based on context information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination