CN114782298A - Infrared and visible light image fusion method with regional attention - Google Patents

Infrared and visible light image fusion method with regional attention

Info

Publication number
CN114782298A
Authority
CN
China
Prior art keywords
image
fusion
infrared
encoder
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210434625.3A
Other languages
Chinese (zh)
Other versions
CN114782298B (en)
Inventor
杜友田
蓝宇
王航
王雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University filed Critical Xi'an Jiaotong University
Priority to CN202210434625.3A priority Critical patent/CN114782298B/en
Publication of CN114782298A publication Critical patent/CN114782298A/en
Application granted granted Critical
Publication of CN114782298B publication Critical patent/CN114782298B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00: Image enhancement or restoration
            • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
          • G06T 2207/00: Indexing scheme for image analysis or image enhancement
            • G06T 2207/10: Image acquisition modality
              • G06T 2207/10048: Infrared image
            • G06T 2207/20: Special algorithmic details
              • G06T 2207/20081: Training; Learning
              • G06T 2207/20084: Artificial neural networks [ANN]
              • G06T 2207/20212: Image combination
                • G06T 2207/20221: Image fusion; Image merging
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
              • G06N 3/08: Learning methods
                • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Infrared and visible light image fusion exploits the complementarity of the two modalities, combining thermal radiation, texture detail and other information from the same scene so that the fused image is more comprehensive and clearer, which benefits human observation and subsequent tasks. Image fusion typically proceeds in three steps: feature extraction, feature fusion and image reconstruction. The invention provides a fusion method with regional attention. First, high-dimensional features are extracted with an encoder; then a fusion strategy with salient-region attention fuses the features; finally, a decoder reconstructs the image. The invention aims to solve the problem of image fusion in scenes with insufficient illumination. The results show that the method fully retains the good texture details of the visible light image and uses the infrared image to supplement the content of underexposed areas. In addition, because salient regions receive attention, regions that are highlighted in the source images remain highlighted in the fused image, achieving a good complementary effect between the infrared and visible light images.

Description

Infrared and visible light image fusion method with regional attention
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an infrared and visible light image fusion method with regional attention.
Background
With the steady development of the hardware and software industries, the ability to collect information with sensors and to transmit and process it has grown steadily. In this context, vision-based sensors are widely used because they provide rich environmental information. A single type of sensor characterizes only one aspect of a scene and cannot meet the requirement of a comprehensive description of the monitored environment, so multi-sensor systems have attracted increasing attention and application. Multi-source imaging systems make up for the limited expressive capability of a single sensor. At present, image fusion technology has great application value in remote sensing, safe navigation, medical image analysis, anti-terrorism inspection, environmental protection, traffic monitoring, clear image reconstruction, disaster detection and forecasting, and especially in computer vision.
For visual multi-source sensor systems, infrared and visible light images can be acquired with relatively simple equipment, and their fusion is the most typical case. Because the two images have different imaging mechanisms, the visible light image generally has higher spatial resolution and contrast and suits human visual perception, but it is easily degraded by harsh conditions such as insufficient brightness, heavy rain and haze. The infrared image, by contrast, has better resistance to scene interference and displays objects hotter than their surroundings, such as pedestrians, more prominently; however, infrared images generally have low resolution and poor detail. Fusing the two images displays both kinds of information in one image, highlights the target, and yields richer detail and greater robustness to harsh environments than either single image. Therefore, infrared and visible light image fusion aims to fuse the details of infrared and visible light images of the same scene while preserving both the highlighted thermal-radiation targets of the infrared image and the high-resolution background texture details of the visible light image, so that the fused image is more informative and better suited to human recognition and aesthetics, automatic machine detection, and subsequent computer image processing.
The prior art and its defects are as follows.
The general steps of image fusion are feature extraction, feature fusion and feature reconstruction, where feature reconstruction is the inverse of feature extraction; feature extraction and fusion are the two most critical elements of image fusion. Among conventional methods, the multi-scale transform (MST) is the most common image fusion approach; its main strengths are an accurate representation of the spatial structure of an image and consistency between space and spectrum. Many multi-scale transforms have been proposed, such as pyramid transforms, wavelet transforms, contourlet transforms and related variants. In addition, fusion algorithms based on sparse representation (SR) and subspace-based methods such as principal component analysis and independent component analysis have also been proposed.
In recent years, deep learning has demonstrated state-of-the-art performance in many fields and has also been applied successfully to image fusion. These algorithms can be roughly divided into three types: auto-encoder (AE) based methods, CNN-based methods and GAN-based methods. Li et al. proposed a simple auto-encoder fusion architecture that includes an encoder, a fusion layer and a decoder. Later they increased the complexity of the encoder and proposed a nested, auto-encoder-based fusion method to obtain a more comprehensive feature fusion. The drawback of these methods is that fusion performance is limited by the manually designed fusion strategy. Zhang et al. developed a general image fusion framework with a generic network structure, i.e., a feature extraction layer, a fusion layer and an image reconstruction layer, and learned feature extraction, feature fusion and image reconstruction under the guidance of a class of complex loss functions. Such methods only address fusion at the global level and do not highlight the target regions of interest. Ma et al. introduced GANs into the image fusion community, using a discriminator to force the generator to synthesize a fused image with rich texture. They also introduced a detail loss and an edge-enhancement loss to improve the quality of detail information and sharpen the edges of hot objects. Because GANs are difficult to train, this approach fails to achieve good fusion quality and also fails to highlight salient information.
Disclosure of Invention
In order to overcome the above drawbacks of the prior art, an object of the present invention is to provide an infrared and visible light image fusion method with regional attention, used to solve the problem of fusing infrared and visible light images in scenes with insufficient illumination. The proposed method fully exploits the respective advantages of infrared and visible light images for scene representation. By extracting and fusing high-dimensional image features, the thermal radiation information of the infrared image and the texture information of the visible light image can be fully fused in scenes with insufficient illumination. Moreover, the regional attention module in the fusion network focuses on salient regions in the high-dimensional features, such as highlighted targets in the infrared image and well-exposed regions in the visible light image, and increases the pixel intensity of these parts during fusion to realize image fusion with regional attention, thereby achieving complementary advantages of the infrared and visible light images.
To achieve the above purpose, the invention adopts the following technical scheme:
an infrared and visible image fusion method with regional attention, comprising:
step 1, training an auto-encoder (Auto Encoder), wherein the auto-encoder comprises an encoder and a decoder;
step 1.1: reading an image I in the training set in RGB format, resizing the image, and converting it into the YCbCr color space;
step 1.2: inputting the luminance channel I_Y of the image into the encoder to obtain a high-dimensional feature map F;
step 1.3: inputting the high-dimensional feature map F into the decoder and outputting a luminance channel map O_Y;
step 1.4: calculating the loss between I_Y and O_Y according to the loss function, then optimizing the gradient, back-propagating, and updating the model parameters of the auto-encoder;
step 1.5: repeating steps 1.1 to 1.4 until the number of iterations over the whole training set reaches a set threshold, obtaining a trained auto-encoder;
step 2: making a fused image training set
acquiring infrared and visible light image pairs for training and performing sub-image cropping to expand the data set, wherein the cropping size is consistent with the image size adjusted in step 1, obtaining the fused image training set;
step 3: training the fusion network
step 3.1: converting the infrared and visible light image pairs (I_R, I_V) in the fused image training set into the YCbCr color space and extracting their respective luminance channel maps to obtain (I_RY, I_VY);
step 3.2: inputting I_RY and I_VY separately into the encoder trained in step 1 and computing the feature maps (F_R, F_V);
step 3.3: concatenating (F_R, F_V) along the feature dimension, inputting them into the fusion network, and computing the fused feature map F_F;
step 3.4: inputting F_F into the decoder for decoding to obtain the fused luminance-channel image O_FY;
step 3.5: calculating the loss value according to the loss function, then optimizing the gradient, back-propagating, and updating the model parameters of the fusion network;
step 3.6: repeating steps 3.1 to 3.5 until the number of iterations over the whole fused image training set reaches a set value, obtaining a trained fusion network;
step 4, obtaining the fused image
step 4.1: obtaining the fused luminance-channel image O_FY from the infrared and visible light image pair to be fused according to the method of steps 3.1 to 3.4;
step 4.2: concatenating O_FY with the CbCr channels of the visible light image along the feature dimension to obtain an image in YCbCr format, and then converting it into RGB format to obtain the fused image.
In one embodiment, the encoder has four convolutional layers with dense connections, and the decoder uses four directly connected convolutional layers.
In one embodiment, in the encoder and the decoder, the convolution kernel size is 3 × 3, the stride is 1, the padding is 1, and the ReLU activation function is used. In step 1.2, the input size is 256 × 256 × 1 and the size of the obtained high-dimensional feature map F is 256 × 256 × 128; in step 1.3, the luminance channel map O_Y has size 256 × 256 × 1.
In one embodiment, after step 1.5, the training data is replaced with test data and steps 1.1 to 1.3 are performed to obtain O_Y; O_Y is then concatenated with the CbCr channels from step 1.1 along the feature dimension to obtain an image in YCbCr format, which is converted into RGB format to obtain an output image O; whether O is consistent with I is verified subjectively.
In one embodiment, the calculation steps of step 3.3 are as follows:
(1) (F_R, F_V) are concatenated along the feature dimension and passed through the convolutional layers Conv_1, Conv_2 and Conv_3 to obtain the global information fusion feature map F_F_0;
(2) F_R and F_V are separately input into the same regional attention module RAB to compute the attention feature maps (M_R, M_V); (M_R, M_V) are concatenated along the feature dimension and input into the convolutional layer Conv_Att to obtain the fused attention feature map M_RV;
(3) the fused feature map is computed as F_F = F_F_0 + M_RV, i.e., the pixel values at corresponding positions are added.
In one embodiment, step 1.4 and step 3.5 both use the Adam optimizer to optimize the gradient; in step 3.5, the model parameters of the auto-encoder are fixed and only the model parameters of the fusion network are updated.
In one embodiment, in step 2, images containing scenes with insufficient illumination and salient targets are selected from the public data set TNO to form a training set and a test set, and the training set is expanded offline by sub-image cropping of the original infrared and visible light images, where the sub-image cropping size is 256 × 256 and the cropping stride is 16.
Compared with the prior art, the invention has the beneficial effects that:
First: in scenes with insufficient illumination, the texture information of the visible light image and the thermal radiation information of the infrared image can be fully fused. After training, the encoder can fully extract the high-dimensional features of the image, and because the loss is computed on the high-dimensional features, deep fusion of the features of all dimensions during fusion is guaranteed.
Second, on the basis of fusing the global content, attention is paid to regions that are prominently highlighted in the source images, and these regions remain highlighted in the fused image. The fusion network comprises two fusion paths, global fusion and salient-region fusion. The regional attention module extracts salient regions of the image at multiple scales, and the results of the two fusion paths are added, so that the salient regions receive higher-intensity luminance values and remain highlighted.
Third, the fused image has good contrast and clarity. During training, the structural loss is measured in terms of gray level, contrast and structural similarity. The gradient loss gives the fused image good texture details and increases sharpness. In addition, the strategy of fusing only the luminance channel of the image allows the invention to process both gray-scale and color images. Because the CbCr channels of the visible light image do not participate in the computation, the fusion result restores the color of the visible light image well.
Drawings
Figure 1 gives an overall block diagram of the scheme. The input is the infrared and visible light images to be fused and the output is the fused image. The network consists of an Encoder, an Attention Fusion Net and a Decoder. The dashed box indicates that the loss function consists of three parts: feature loss, structural similarity loss and gradient loss.
Fig. 2 gives the structure of the auto-encoder and the composition of the loss function required for training.
Fig. 3 gives the structure of the attention fusion network. The input is the feature maps (F_R, F_V) and the output is the fused feature map F_F.
Fig. 4 gives the network structure of the regional attention module. A feature map F is input and an attention map M is output.
Fig. 5 gives three sets of fused image examples. The boxes mark the fusion effect on salient targets.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
A visible light sensor can capture an image that is sufficiently clear and matches human viewing habits when lighting is adequate. The setting that best highlights the advantages of infrared and visible light image fusion is therefore the scene with insufficient illumination. How to make the fusion result compensate for underexposure and highlight the targets of interest, so as to better serve human observation and subsequent high-level tasks, is a current problem.
Most previous fusion methods design the fusion strategy from a global perspective and focus on fusing content such as image texture details; for targets that are salient in the infrared image, such as people and vehicles, brightness is reduced in the fused image because components of the visible light image are introduced. Some methods introduce attention to salient targets but require additional algorithms to obtain a binary target-segmentation map in advance. Moreover, existing methods pay insufficient attention to nighttime scenes, which are the most widespread application of infrared imaging.
Based on this, the invention provides an infrared and visible light image fusion method with regional attention; the overall structure is shown in Fig. 1, and the steps are as follows:
Step 1: train the auto-encoder (Auto Encoder). The structure of the auto-encoder is shown in Fig. 2; it includes an Encoder and a Decoder. Each rectangle in the figure represents a layer, and both the Encoder and the Decoder are composed of convolutional layers and activation layers. The loss includes a structural loss (ssim loss) and a content loss (pixel loss). In this embodiment, the Encoder has four convolutional layers with dense connections; the Decoder uses four directly connected convolutional layers. The convolution kernel size is 3 × 3, the stride is 1, and the padding is 1. The activation layers use the ReLU activation function. The parameters of each layer of the Encoder and the Decoder are set as follows:
Layer | Encoder | Decoder
L1 | Conv(I1, O32, K3, S1, P1), ReLU | Conv(I128, O64, K3, S1, P1), ReLU
L2 | Conv(I32, O32, K3, S1, P1), ReLU | Conv(I64, O32, K3, S1, P1), ReLU
L3 | Conv(I64, O32, K3, S1, P1), ReLU | Conv(I32, O16, K3, S1, P1), ReLU
L4 | Conv(I96, O32, K3, S1, P1), ReLU | Conv(I16, O1, K3, S1, P1), ReLU
(I: input channels, O: output channels, K: kernel size, S: stride, P: padding)
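To make the table concrete, the following PyTorch sketch reproduces the Encoder and Decoder with the channel widths listed above. The exact dense-connection wiring is an assumption read off the channel counts: each Encoder layer takes the concatenation of all previous 32-channel outputs, and the Encoder output is the concatenation of all four outputs, which gives the 128-channel feature map F mentioned in step 1.2. Class and variable names are chosen for illustration only.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Densely connected encoder: each layer sees the concatenation of all previous outputs."""
    def __init__(self):
        super().__init__()
        # Conv(I, O, K3, S1, P1) + ReLU, as in the layer table
        self.conv1 = nn.Sequential(nn.Conv2d(1,  32, 3, 1, 1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(32, 32, 3, 1, 1), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv2d(64, 32, 3, 1, 1), nn.ReLU())
        self.conv4 = nn.Sequential(nn.Conv2d(96, 32, 3, 1, 1), nn.ReLU())

    def forward(self, y):                              # y: B x 1 x 256 x 256 luminance
        f1 = self.conv1(y)
        f2 = self.conv2(f1)
        f3 = self.conv3(torch.cat([f1, f2], dim=1))
        f4 = self.conv4(torch.cat([f1, f2, f3], dim=1))
        return torch.cat([f1, f2, f3, f4], dim=1)      # B x 128 x 256 x 256 (feature map F)

class Decoder(nn.Module):
    """Plainly stacked decoder: 128 -> 64 -> 32 -> 16 -> 1 channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(128, 64, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(64,  32, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(32,  16, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(16,   1, 3, 1, 1), nn.ReLU(),
        )

    def forward(self, f):                              # f: B x 128 x 256 x 256
        return self.net(f)                             # B x 1 x 256 x 256 (luminance map O_Y)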
Step 1.1: an image I in the training set is read with the OpenCV imread function; the read image I is in RGB format and is resized to 256 × 256 × 3. It is then converted from RGB to the YCbCr color space; the conversion can use the OpenCV library function cvtColor. Finally, each pixel of the image is divided by 255, normalizing the pixel values to [0, 1], which gives the input image.
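A minimal sketch of this preprocessing with OpenCV follows; note that cv2.imread returns BGR data and that OpenCV's YCrCb conversion orders the channels Y, Cr, Cb. The function name and return convention are assumptions for illustration.

import cv2
import numpy as np

def load_luminance_and_chroma(path, size=(256, 256)):
    """Read an image, resize to 256x256, convert to YCbCr and normalize to [0, 1]."""
    bgr = cv2.imread(path)                             # OpenCV loads images in BGR order
    bgr = cv2.resize(bgr, size)                        # 256 x 256 x 3
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)     # channels ordered Y, Cr, Cb
    ycrcb = ycrcb.astype(np.float32) / 255.0           # normalize pixel values to [0, 1]
    y = ycrcb[:, :, 0:1]                               # luminance channel I_Y, 256 x 256 x 1
    crcb = ycrcb[:, :, 1:3]                            # chroma channels, kept for steps 1.7 / 4.2
    return y, crcb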
Step 1.2: luminance channel I of imageYThe input Encoder encorder has an input size of 256 × 256 × 1, resulting in a high-dimensional feature map F with a size of 256 × 256 × 128.
Step 1.3: inputting the high-dimensional feature map F into a Decoder to obtain an output brightness channel map OYThe size is 256 × 256 × 1.
Step 1.4: calculating I from the loss functionYAnd OYThe loss function is defined as:
Figure BDA0003612506110000071
wherein, mu (1-SSIM (O)Y,IY) Is the structural loss, and SSIM (. cndot.) is the structural similarity function.
Figure BDA0003612506110000072
For content loss, i.e. calculating IYAnd OYThe euclidean distance of (c). Mu is a hyperparameter used to balance the two losses. H and W are the height and width of the image, respectively.
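A sketch of this training loss in PyTorch is given below; the SSIM term uses the third-party pytorch_msssim package as one possible differentiable SSIM implementation (an assumption), and the content term is the Euclidean distance normalized by H × W as reconstructed above.

import torch
from pytorch_msssim import ssim   # assumed third-party differentiable SSIM

def autoencoder_loss(o_y, i_y, mu=1.0):
    """Structural loss + content loss for the auto-encoder (step 1.4).
    o_y, i_y: tensors of shape B x 1 x H x W with values in [0, 1]."""
    _, _, h, w = i_y.shape
    structural = mu * (1.0 - ssim(o_y, i_y, data_range=1.0))
    content = torch.norm(o_y - i_y, p=2) / (h * w)     # Euclidean distance, normalized by H*W
    return structural + content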
Step 1.5: and optimizing the gradient in a mode of an Adam optimizer and the like, reversely propagating, and updating the model parameters of the self-encoder.
Step 1.6: steps 1.1 to 1.5 are repeated. And obtaining the trained self-encoder until the iteration times on the whole training set reach a set threshold value.
In the present embodiment, an open source color image data set MS-COCO is used, and 80000 images are included in total. The algorithm is implemented with python and pytorch, based on GPU training of a block NVIDIA TITAN V, epoch set to 2, batch size set to 16, and hyperparameter μ set to 1.
Step 1.7: to verify the training, the training data can be changed into test data, and steps 1.1 to 1.3 are performed to obtain OY. Then adding OYConnecting with the CbCr channel image in the step 1.1 in characteristic dimension to obtain an image in YCbCr format, and then converting into RGB format to obtain an output image O; subjectively verifying whether the output image O coincides with the input image I.
And 2, making a fusion image training set and a test set.
From the disclosed infrared and visible image fusion data set TNO, images containing low-light scenes and having significant targets are selected to constitute a training set and a test set. This example picks 41 pairs of darker bright images as the training set and 25 pairs as the test set. Then, the training set is expanded in an off-line mode, wherein the expansion mode is as follows: and (3) performing sub-image cropping on the original infrared and visible light images, wherein the size of each sub-image is consistent with the size of the image adjusted in the step (1), namely 256 multiplied by 256, and the cropping moving step size is 16, so that 13940 pairs of infrared and visible light images are obtained finally.
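A sketch of this offline sliding-window cropping (256 × 256 patches, stride 16) might look as follows; the pairing and registration of the TNO files is assumed to have been handled beforehand, and the images are NumPy arrays as read by OpenCV.

def crop_pairs(ir_img, vis_img, patch=256, stride=16):
    """Cut aligned 256x256 sub-image pairs from a registered IR/visible pair
    with a sliding window of stride 16 (step 2 data augmentation)."""
    h, w = ir_img.shape[:2]
    pairs = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            ir_crop = ir_img[top:top + patch, left:left + patch]
            vis_crop = vis_img[top:top + patch, left:left + patch]
            pairs.append((ir_crop, vis_crop))
    return pairs

# Example: ir and vis are one registered training pair read with cv2.imread
# sub_pairs = crop_pairs(ir, vis)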
Step 3: train the fusion network.
Step 3.1: the infrared and visible light image pairs (I_R, I_V) in the fused image training set are read, and then the same operations as in step 1.1 are performed, i.e., conversion to the YCbCr color space and extraction of the respective luminance channel maps, to obtain (I_RY, I_VY).
Step 3.2: I_RY and I_VY are separately input into the Encoder trained in step 1, and the feature maps (F_R, F_V) are computed.
Step 3.3: will (F)R,FV) Connecting in characteristic dimensions, inputting into a fusion network, and calculating to obtain a fusion characteristic diagram FF. The structure of the fused layer is shown in FIG. 3. The fusion process includes two paths, namely global information fusion and attention feature map fusion, namely a global information fusion network and an attention feature map fusion network. The former contains three convolutional layers, Conv _1, Conv _2 and Conv _3, and the latter contains a regional attention module RAB and a convolutional layer Conv _ Att, in this embodiment, the network layer parameters may be set as:
[Table of fusion-network layer parameters, provided as images in the original publication.]
the calculation steps in the converged network are as follows:
(1) will (F)R,FV) Connecting in feature dimensions, and then calculating by Conv _1, Conv _2 and Conv _3 to obtain a global information fusion feature map FF_0
(2) The attention feature maps are calculated.
F_R and F_V are separately input into the regional attention module RAB to obtain the attention feature maps (M_R, M_V) of size 256 × 256 × 128. Note that the same RAB module is used for the two separate calculations. The structure of the regional attention module RAB is shown in Fig. 4. It comprises max pooling, global average pooling, a fully connected layer, an activation layer, an upsampling operation and a normalization operation. In Fig. 4, the multiplication symbol represents multiplying the weights with the feature map and the addition symbol represents feature map addition. To extract feature map weights at multiple scales, the module uses three max-pooling kernel sizes.
The specific calculation steps are as follows. A feature map F is input and max pooling is performed to obtain feature maps F_s of size (H/s) × (W/s) × 128, where H and W are the image height and width (both 256 in this example), s = 1, 2, 4, and the corresponding pooling kernel sizes are 1 × 1, 2 × 2 and 4 × 4. A global average pooling operation is then applied to F_s to obtain a vector of dimension 1 × 1 × 128, which is passed through a fully connected layer and an activation layer to finally obtain a weight vector ω_s of dimension 1 × 1 × 128. The weight ω_s^k of the k-th feature dimension measures the importance of the k-th feature layer F_s^k. On the other hand, to obtain a feature map of the same size as F, an upsampling operation is applied to F_s, and ω_s is then multiplied with the upsampled feature map dimension by dimension to obtain the weighted feature map
F'_s^k = ω_s^k · H_up(F_s^k),
where k denotes the k-th feature dimension and H_up(·) denotes the upsampling function. Finally, the feature maps of the three scales are added and normalized to obtain an attention feature map of dimensions H × W × 128:
M = σ(F'_1 + F'_2 + F'_4),
where σ(·) denotes the normalization operation. (A code sketch of this module and of the two fusion paths is given after step (4) below.)
(3) (M_R, M_V) are concatenated along the feature dimension and input into the convolutional layer Conv_Att to obtain the fused attention feature map M_RV of size H × W × 128.
(4) The final fused feature map is computed as F_F = F_F_0 + M_RV, i.e., the pixel values at corresponding positions are added.
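As announced in step (2), the following PyTorch sketch puts the regional attention module and the two fusion paths together. It follows the description above, but several details are assumptions: the channel widths of Conv_1 to Conv_3 and Conv_Att (the exact layer table is available only as an image in the original), the use of one fully connected layer with ReLU per scale, and the choice of a sigmoid for the normalization σ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RAB(nn.Module):
    """Regional attention block: multi-scale max pooling -> per-channel weights
    -> weighted, upsampled maps summed over scales and normalized."""
    def __init__(self, channels=128, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # one fully connected layer + activation per scale, producing a 1x1xC weight vector
        self.fc = nn.ModuleList(
            [nn.Sequential(nn.Linear(channels, channels), nn.ReLU()) for _ in scales]
        )

    def forward(self, feat):                           # feat: B x 128 x H x W
        b, c, h, w = feat.shape
        out = 0
        for fc, s in zip(self.fc, self.scales):
            fs = F.max_pool2d(feat, kernel_size=s)     # B x C x H/s x W/s
            wgt = fc(F.adaptive_avg_pool2d(fs, 1).flatten(1))     # global avg pool -> FC -> B x C
            up = F.interpolate(fs, size=(h, w), mode='nearest')   # back to H x W
            out = out + wgt.view(b, c, 1, 1) * up      # weight each channel, add over scales
        return torch.sigmoid(out)                      # normalization sigma (assumed sigmoid)

class AttentionFusionNet(nn.Module):
    """Global path (Conv_1..Conv_3 on the concatenated features) plus
    attention path (shared RAB + Conv_Att), summed element-wise."""
    def __init__(self, channels=128):
        super().__init__()
        self.global_path = nn.Sequential(              # Conv_1, Conv_2, Conv_3 (widths assumed)
            nn.Conv2d(2 * channels, channels, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(),
        )
        self.rab = RAB(channels)                       # same module applied to both inputs
        self.conv_att = nn.Conv2d(2 * channels, channels, 3, 1, 1)   # Conv_Att

    def forward(self, f_r, f_v):                       # F_R, F_V: B x 128 x H x W
        f_f0 = self.global_path(torch.cat([f_r, f_v], dim=1))        # global fusion map F_F_0
        m_rv = self.conv_att(torch.cat([self.rab(f_r), self.rab(f_v)], dim=1))
        return f_f0 + m_rv                             # F_F = F_F_0 + M_RV

In step 3.3, the two 128-channel encoder outputs F_R and F_V would be passed to AttentionFusionNet to obtain F_F, which is then decoded in step 3.4.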
Step 3.4: f is to beFDecoding by an input Decoder to obtain a fused image O of a brightness channelFY
Step 3.5: and calculating a loss value according to the loss function L, optimizing the loss gradient by using an Adam optimizer and the like, reversely propagating, and updating the model parameters of the fusion network.
The loss function L comprises three parts, the structure loss LssimCharacteristic loss LpixelAnd gradient loss LgradientThe calculation formula is as follows:
L=ωLssim+λLpixel+Lgradient
wherein, omega and lambda are hyper-parameters used for balancing various losses.
The structural loss L_ssim is calculated as:
L_ssim = δ(1 − SSIM(I_RY, O_FY)) + (1 − δ)(1 − SSIM(I_VY, O_FY))
where δ is a hyper-parameter used to balance the two loss terms.
The feature loss L_pixel is calculated as follows:
[Equation provided as an image in the original publication.]
where η is a hyper-parameter, the feature map size is H × W × C, and ‖·‖₂ denotes the Euclidean distance between feature maps.
The gradient loss L_gradient is calculated as follows:
[Equation provided as an image in the original publication.]
where the gradient operator in the formula denotes the Sobel gradient calculation, used to measure the fine-grained texture information of the image.
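The sketch below assembles the step 3.5 loss in PyTorch. The structural term follows the formula above and the Sobel gradients use fixed 3 × 3 kernels; because the exact L_pixel and L_gradient formulas are given only as images in the original, the forms used here (Euclidean distances between fused and source feature maps, and between fused and maximal source gradients) are assumptions, as are the default hyper-parameter values.

import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed third-party differentiable SSIM

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel(img):
    """Sobel gradient magnitude of a B x 1 x H x W image (fine-grained texture)."""
    kx = SOBEL_X.to(img.device, img.dtype)
    ky = SOBEL_Y.to(img.device, img.dtype)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def fusion_loss(o_fy, i_ry, i_vy, f_f, f_r, f_v,
                omega=1.0, lam=1.0, delta=0.5, eta=1.0):
    """L = omega*L_ssim + lambda*L_pixel + L_gradient (step 3.5).
    Default hyper-parameter values are placeholders, not the patent's settings."""
    _, c, h, w = f_f.shape
    l_ssim = delta * (1 - ssim(i_ry, o_fy, data_range=1.0)) \
             + (1 - delta) * (1 - ssim(i_vy, o_fy, data_range=1.0))
    # feature loss: distance between the fused features and both source features (assumed form)
    l_pixel = (torch.norm(f_f - f_r) + eta * torch.norm(f_f - f_v)) / (h * w * c)
    # gradient loss: fused gradients should track the stronger source gradient (assumed form)
    l_gradient = torch.norm(sobel(o_fy) - torch.max(sobel(i_ry), sobel(i_vy))) / (h * w)
    return omega * l_ssim + lam * l_pixel + l_gradient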
Step 3.6: and (5) repeating the steps 3.1 to 3.5 until the iteration times reach a set threshold value on the whole fusion image training set, thereby obtaining a trained fusion network. In this example, a GPU based on a block of NVIDIA TITAN V was trained, using an Adam optimizer, with a batch size and epoch of 4 and 2, respectively. The initial learning rate is set to 1 × 10-4The hyperparameters ω, λ, δ, η of the loss function are set to 1, 2.7, 0.5, respectively.
And 4, step 4: and inputting test data to obtain a fusion image.
Step 4.1: obtaining a fused image O of a brightness channel by the method of the steps 3.1 to 3.4 according to the test data or the infrared and visible light image pair to be fusedFY
And 4.2: is prepared from OFYAnd connecting with a CbCr channel of the visible image in a characteristic dimension to obtain an image in a YCbCr format, and converting the image into an RGB format to obtain a fused image.
Three sets of fused images were selected from the test, as shown in Fig. 5. As can be seen from the figure, the fused image incorporates the texture details of the visible light image, as marked by the dashed boxes, and the overall brightness of the image is improved to a certain extent. Meanwhile, the salient regions of the infrared image are also well preserved in the fused image, as marked by the solid boxes.

Claims (7)

1. A method for fusing infrared and visible images with regional attention, comprising:
step 1, training an auto-encoder, wherein the auto-encoder comprises an encoder and a decoder;
step 1.1: reading an image I in the training set in RGB format, resizing the image, and converting it into the YCbCr color space;
step 1.2: inputting the luminance channel I_Y of the image into the encoder to obtain a high-dimensional feature map F;
step 1.3: inputting the high-dimensional feature map F into the decoder and outputting a luminance channel map O_Y;
step 1.4: calculating the loss between I_Y and O_Y according to the loss function, then optimizing the gradient, back-propagating, and updating the model parameters of the auto-encoder;
step 1.5: repeating steps 1.1 to 1.4 until the number of iterations over the whole training set reaches a set threshold, obtaining a trained auto-encoder;
step 2: making a fused image training set
acquiring infrared and visible light image pairs for training and performing sub-image cropping to expand the data set, wherein the cropping size is consistent with the image size adjusted in step 1, obtaining the fused image training set;
step 3: training the fusion network
step 3.1: converting the infrared and visible light image pairs (I_R, I_V) in the fused image training set into the YCbCr color space and extracting their respective luminance channel maps to obtain (I_RY, I_VY);
step 3.2: inputting I_RY and I_VY separately into the encoder trained in step 1 and computing the feature maps (F_R, F_V);
step 3.3: concatenating (F_R, F_V) along the feature dimension, inputting them into the fusion network, and computing the fused feature map F_F;
step 3.4: inputting F_F into the decoder for decoding to obtain the fused luminance-channel image O_FY;
step 3.5: calculating the loss value according to the loss function, then optimizing the gradient, back-propagating, and updating the model parameters of the fusion network;
step 3.6: repeating steps 3.1 to 3.5 until the number of iterations over the whole fused image training set reaches a set value, obtaining a trained fusion network;
step 4, obtaining the fused image
step 4.1: obtaining the fused luminance-channel image O_FY from the infrared and visible light image pair to be fused according to the method of steps 3.1 to 3.4;
step 4.2: concatenating O_FY with the CbCr channels of the visible light image along the feature dimension to obtain an image in YCbCr format, and then converting it into RGB format to obtain the fused image.
2. The method of claim 1, wherein the encoder has four convolutional layers with dense connections; the decoder uses four directly connected convolutional layers; and in the encoder and the decoder, the convolution kernel size is 3 × 3, the stride is 1, the padding is 1, and the ReLU activation function is used.
3. The method for fusing infrared and visible light images with regional attention according to claim 2, wherein in step 1.2 the input size is 256 × 256 × 1 and the size of the obtained high-dimensional feature map F is 256 × 256 × 128, and in step 1.3 the luminance channel map O_Y has size 256 × 256 × 1.
4. The method for fusing infrared and visible light images with regional attention according to claim 1, wherein after step 1.5, the training data is replaced with test data and steps 1.1 to 1.3 are performed to obtain O_Y; O_Y is then concatenated with the CbCr channels from step 1.1 along the feature dimension to obtain an image in YCbCr format, which is converted into RGB format to obtain an output image O; and whether O is consistent with I is verified subjectively.
5. The method for fusing infrared and visible light images with regional attention according to claim 1, wherein the calculation steps of step 3.3 are as follows:
(1) concatenating (F_R, F_V) along the feature dimension and computing through the convolutional layers Conv_1, Conv_2 and Conv_3 to obtain the global information fusion feature map F_F_0;
(2) inputting F_R and F_V separately into the same regional attention module RAB to compute the attention feature maps (M_R, M_V); concatenating (M_R, M_V) along the feature dimension and inputting them into the convolutional layer Conv_Att to obtain the fused attention feature map M_RV;
(3) computing the fused feature map F_F = F_F_0 + M_RV, i.e., adding the pixel values at corresponding positions.
6. The method of claim 1, wherein step 1.4 and step 3.5 both use the Adam optimizer to optimize the gradient, and in step 3.5 the model parameters of the auto-encoder are fixed and only the model parameters of the fusion network are updated.
7. The method for fusing infrared and visible light images with regional attention according to claim 1, wherein in step 2, images containing scenes with insufficient illumination and salient targets are selected from the public data set TNO to form a training set and a test set, and the training set is expanded offline by sub-image cropping of the original infrared and visible light images, wherein the sub-image size is 256 × 256 and the cropping stride is 16.
CN202210434625.3A 2022-04-24 2022-04-24 Infrared and visible light image fusion method with regional attention Active CN114782298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210434625.3A CN114782298B (en) 2022-04-24 2022-04-24 Infrared and visible light image fusion method with regional attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210434625.3A CN114782298B (en) 2022-04-24 2022-04-24 Infrared and visible light image fusion method with regional attention

Publications (2)

Publication Number Publication Date
CN114782298A (en) 2022-07-22
CN114782298B (en) 2024-03-12

Family

ID=82433252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210434625.3A Active CN114782298B (en) 2022-04-24 2022-04-24 Infrared and visible light image fusion method with regional attention

Country Status (1)

Country Link
CN (1) CN114782298B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161201A (en) * 2019-12-06 2020-05-15 北京理工大学 Infrared and visible light image fusion method based on detail enhancement channel attention
US20220044374A1 (en) * 2019-12-17 2022-02-10 Dalian University Of Technology Infrared and visible light fusion method
CN111709902A (en) * 2020-05-21 2020-09-25 江南大学 Infrared and visible light image fusion method based on self-attention mechanism
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
何勇: "Periocular gender attribute recognition based on attention mechanism" (基于注意力机制的眼周性别属性识别), Enterprise Science and Technology & Development (企业科技与发展), no. 06, 10 June 2020 (2020-06-10) *
陈潮起; 孟祥超; 邵枫; 符冉迪: "An infrared and visible light image fusion method based on multi-scale low-rank decomposition" (一种基于多尺度低秩分解的红外与可见光图像融合方法), Acta Optica Sinica (光学学报), no. 11, 10 June 2020 (2020-06-10) *
陈艳菲; 桑农; 王洪伟; 但志平: "Visible and infrared image fusion algorithm based on visual attention" (基于视觉注意的可见光与红外图像融合算法), Journal of Huazhong University of Science and Technology (Natural Science Edition) (华中科技大学学报(自然科学版)), no. 1, 10 January 2014 (2014-01-10) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311186A (en) * 2022-10-09 2022-11-08 济南和普威视光电技术有限公司 Cross-scale attention confrontation fusion method for infrared and visible light images and terminal
CN115423734A (en) * 2022-11-02 2022-12-02 国网浙江省电力有限公司金华供电公司 Infrared and visible light image fusion method based on multi-scale attention mechanism
CN116363036A (en) * 2023-05-12 2023-06-30 齐鲁工业大学(山东省科学院) Infrared and visible light image fusion method based on visual enhancement
CN116363036B (en) * 2023-05-12 2023-10-10 齐鲁工业大学(山东省科学院) Infrared and visible light image fusion method based on visual enhancement

Also Published As

Publication number Publication date
CN114782298B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN110909690B (en) Method for detecting occluded face image based on region generation
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN110263705A (en) Towards two phase of remote sensing technology field high-resolution remote sensing image change detecting method
CN116071243B (en) Infrared image super-resolution reconstruction method based on edge enhancement
CN111222396A (en) All-weather multispectral pedestrian detection method
CN109034184B (en) Grading ring detection and identification method based on deep learning
CN114066831B (en) Remote sensing image mosaic quality non-reference evaluation method based on two-stage training
CN112686207A (en) Urban street scene target detection method based on regional information enhancement
CN111931857B (en) MSCFF-based low-illumination target detection method
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN114140672A (en) Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene
CN115841438A (en) Infrared image and visible light image fusion method based on improved GAN network
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN113095358A (en) Image fusion method and system
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
CN114913337A (en) Camouflage target frame detection method based on ternary cascade perception
CN117576402B (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN114155165A (en) Image defogging method based on semi-supervision
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN114331931A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN111832508B (en) DIE _ GA-based low-illumination target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant