CN116452480A - Method for fusing infrared and visible light images - Google Patents

Method for fusing infrared and visible light images

Info

Publication number
CN116452480A
Authority
CN
China
Prior art keywords
image
infrared
visible light
fusion
fusing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310397002.8A
Other languages
Chinese (zh)
Inventor
朱华毅
潘细朋
刘振丙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202310397002.8A
Publication of CN116452480A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/40 - Analysis of texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image fusion, and particularly relates to an infrared and visible light image fusion method based on the combination of convolution and a self-attention mechanism. The method comprises three stages: an encoder, a fusion strategy and a decoder. In the encoder stage, the visible light image and the infrared image are respectively input into a module that combines convolution with a self-attention mechanism to obtain image features. In the fusion-strategy stage, the obtained features are fused on the Y channel to obtain a fused image. Finally, the fused image is reconstructed by a cascaded decoder to obtain the final infrared and visible light fusion image. According to the invention, an image fusion model is established to obtain an infrared and visible light fusion image that not only contains salient targets and rich texture information but also benefits the completion of high-level vision tasks.

Description

Method for fusing infrared and visible light images
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method for fusing infrared and visible light images.
Background
With the rapid development of sensing hardware, multimodal imaging has attracted considerable attention in a wide range of applications, such as night-time surveillance and autonomous driving. In particular, the combination of infrared and visible light sensors offers significant advantages for subsequent intelligent processing. Visible light imaging provides rich detail with high spatial resolution under well-lit conditions, while infrared sensors capture the thermal radiation emitted by objects, highlighting thermal target structures that are insensitive to illumination changes. However, visible light cameras struggle to capture objects in dark conditions, while infrared images often suffer from blurred details and lower spatial resolution. Because of the pronounced differences in appearance between the two modalities, it is challenging to fuse them into visually appealing images or to support higher-level vision tasks such as segmentation, tracking and detection. Therefore, it is important to design an effective infrared and visible image fusion method.
Disclosure of Invention
The invention provides a method for fusing infrared and visible light images, aiming at the problem that an image captured by a single-modality sensor cannot effectively and comprehensively describe the imaging scene.
The technical scheme adopted by the invention is as follows:
A method of infrared and visible light image fusion, comprising the steps of:
S1: Perform data preprocessing on the infrared and visible light image data set: first pair each infrared image with its corresponding visible light image, then apply a scale transformation to both, and finally separate the color channels of the images;
S2: Construct an encoder to extract infrared image features and visible light image features;
S3: Fuse the features obtained in step S2 to obtain a fused image;
S4: Construct a decoder and reconstruct the fused image obtained in step S3, finally obtaining a fused image recovered from the fused features;
S5: Evaluate the quality of the fused image through the designed loss function and compute the loss of the fused image obtained in step S4, continuously training the encoder and decoder to obtain the model parameters that minimize the loss function.
Further, the step S2 includes:
S21: The input infrared image I_ir and visible light image I_vi are each projected by three 1x1 convolutions to obtain three groups of rich intermediate features; the three groups of intermediate features are joined and used as the input of a convolution module, and are also used, respectively, as the query, key and value inputs of a self-attention module;
S22: A combined convolution and self-attention module (ACmod) is designed to extract local and global features, where f_c and f_a denote the outputs of the convolution module and the self-attention module respectively; the final output f_out can be expressed as: f_out = α·f_a + β·f_c;
S23: The output is fed into a depth feature extraction module F_2conv composed of two convolution layers; its output feature F_out can be expressed as: F_out = F_2conv(f_out);
S24: The paired infrared image and visible light image are each passed through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi;
wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
Further, the step S3 includes:
S31: The output features of step S2 are fused; the process can be expressed as: I_fuse = C_fuse(f_ir, f_vi);
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
Further, the step S4 includes:
S41: A decoder structure for image reconstruction is designed, consisting of 4 cascaded convolutions and denoted F_4r(·);
S42: The fused image I_fuse obtained in step S3 is taken as the input of the decoder, and the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, is obtained; the process can be expressed as: F_fuse = F_4r(I_fuse).
Further, the step S5 includes:
S51: The loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int; the overall loss L_a of the fused image can be expressed as: L_a = L_int + γ·L_tex;
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a;
the texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation and ||·||_1 represents the L1 norm;
S52: The final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the overall loss is minimized.
The invention has the following beneficial effects:
(1) The invention provides a dual-branch feature extraction network based on the combination of convolution and a self-attention mechanism. By combining convolution and self-attention in the feature extraction module for the infrared and visible light images, the deep learning model can better extract the features of the source images and prevents the fused image from losing information in both global features and local gradient features.
(2) The method has an excellent running speed and can easily be deployed as a real-time preprocessing module for high-level vision tasks. By fusing the infrared and visible light images into a single image that combines the information of both modalities, the method can help improve the performance of high-level vision tasks (such as target tracking, target detection and semantic segmentation) and obtain better results.
(3) The invention provides a new workflow for fusing an infrared image and a visible light image: the input infrared and visible light images are first projected and up-scaled in the channel dimension (so that more feature information of the source images is captured), then used in turn as the inputs of the convolution and self-attention branches for feature extraction, and finally passed through convolutions to obtain the final fused image. This workflow fully combines two modules that have proven effective in deep learning models, namely convolution and the self-attention mechanism, and exploits the advantages of both to obtain a better fused image.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of an implementation of the present invention;
FIG. 3 is a schematic diagram of a fusion result in a daytime scene environment obtained by the processing of the present invention;
fig. 4 is a schematic diagram of a fusion result obtained by the processing of the present invention in a night scene environment.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, and not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort shall fall within the scope of protection of the invention.
Embodiment 1:
a method of infrared and visible light image fusion as shown in fig. 1 and 2, comprising the steps of:
s1: carrying out data preprocessing on an infrared and visible light image data set, firstly pairing an infrared image and a visible light image, then carrying out scale transformation on the infrared image and the visible light image, and finally carrying out separation on color channels on the images;
s2: constructing an encoder to realize the extraction of infrared image features and visible light image features;
s3: fusing the features obtained in the step S2 to obtain a fused image;
s4: constructing a decoder, and reconstructing the fusion image obtained in the step S3 to finally obtain a fusion image recovered from the fusion characteristics;
s5: and (3) judging the quality of the fusion image through the designed loss function, and calculating the loss of the fusion image obtained in the step S4, wherein the encoder and the decoder are continuously trained to obtain model parameters which minimize the loss function.
Wherein, the step S2 includes:
S21: The input infrared image I_ir and visible light image I_vi are each projected by three 1x1 convolutions to obtain three groups of rich intermediate features. On one hand, the three groups of features are joined and input into a convolution module; on the other hand, they are input, respectively as query, key and value, into a self-attention module.
S22: The intermediate features of step S21 are input into a combined convolution and self-attention module (ACmod) for extracting local and global features, where f_c and f_a denote the outputs of the convolution module and the self-attention module respectively; the final output f_out can be expressed as:
f_out = α·f_a + β·f_c
S23: The output is fed into a depth feature extraction module F_2conv composed of two convolution layers; its output feature F_out can be expressed as:
F_out = F_2conv(f_out)
S24: The paired infrared image and visible light image are each passed through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi.
Wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
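A minimal PyTorch sketch of the module described in S21-S24 follows. The channel widths, kernel sizes, number of attention heads and the way the three projected feature groups are joined for the convolution branch are assumptions; the learnable scalars alpha and beta realise f_out = α·f_a + β·f_c, and the two trailing convolutions realise F_2conv.

```python
import torch
import torch.nn as nn

class ACmod(nn.Module):
    """Combined convolution / self-attention module (illustrative sketch).

    Three 1x1 convolutions project the input into intermediate features that
    serve as query, key and value for the attention branch and, joined
    together, as input to the convolution branch. c_out must be divisible
    by the number of attention heads.
    """
    def __init__(self, c_in, c_out, heads=4):
        super().__init__()
        self.proj_q = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.proj_k = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.proj_v = nn.Conv2d(c_in, c_out, kernel_size=1)
        # Convolution branch on the joined projections (assumed join = concatenation).
        self.conv_branch = nn.Conv2d(3 * c_out, c_out, kernel_size=3, padding=1)
        # Self-attention branch over the flattened spatial grid.
        self.attn = nn.MultiheadAttention(c_out, heads, batch_first=True)
        # Learnable balance factors: f_out = alpha * f_a + beta * f_c.
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        # Depth feature extraction F_2conv: two stacked convolution layers.
        self.depth = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)

        # Local features from the convolution branch.
        f_c = self.conv_branch(torch.cat([q, k, v], dim=1))

        # Global features from the self-attention branch.
        flatten = lambda t: t.flatten(2).transpose(1, 2)   # (B, H*W, C)
        f_a, _ = self.attn(flatten(q), flatten(k), flatten(v))
        f_a = f_a.transpose(1, 2).reshape(b, -1, h, w)

        f_out = self.alpha * f_a + self.beta * f_c          # f_out = α·f_a + β·f_c
        return self.depth(f_out)                            # F_out = F_2conv(f_out)
```

For example, encoder = ACmod(c_in=1, c_out=64) maps a single-channel H×W input to a 64-channel feature map of the same spatial size, playing the role of f_ir or f_vi.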
Wherein, the step S3 includes:
S31: The output features of step S2 are fused; the process can be expressed as:
I_fuse = C_fuse(f_ir, f_vi)
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
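The fusion strategy C_fuse of S31 amounts to a single channel-wise concatenation; a sketch (the function name is illustrative):

```python
import torch

def fuse_features(f_ir, f_vi):
    """C_fuse: concatenate the encoder outputs along the channel dimension."""
    # f_ir, f_vi: (B, C, H, W) feature maps of the infrared and visible images.
    return torch.cat([f_ir, f_vi], dim=1)   # I_fuse: (B, 2C, H, W)
```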
Wherein the step S4 includes:
S41: A decoder structure for image reconstruction is designed, consisting of 4 cascaded convolutions and denoted F_4r(·).
S42: The fused image I_fuse obtained in step S3 is taken as the input of the decoder, and the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, is obtained; the process can be expressed as:
F_fuse = F_4r(I_fuse)
Wherein, the step S5 includes:
S51: The loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int; the overall loss L_a of the fused image can be expressed as:
L_a = L_int + γ·L_tex
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a.
The texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation, ||·||_1 represents the L1 norm, and |·| represents the absolute-value operation.
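The exact expressions for L_tex and L_int are given as formula images in the original filing and are not reproduced here; the sketch below therefore assumes the commonly used max-aggregated forms, which are consistent with the operators named above (gradient operation, L1 norm, absolute value) but are not necessarily the patented definitions. All tensors are assumed to be single-channel luminance maps of identical shape.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    """Finite-difference gradient magnitude, standing in for the gradient operator."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    # Pad back to the input size so the two maps can be added.
    return F.pad(dx.abs(), (0, 1, 0, 0)) + F.pad(dy.abs(), (0, 0, 0, 1))

def fusion_loss(fused, ir, vi, gamma=1.0):
    """L_a = L_int + gamma * L_tex (assumed max-aggregated definitions)."""
    # Intensity loss: L1 distance to the pixel-wise maximum of the two sources.
    l_int = F.l1_loss(fused, torch.maximum(ir, vi))
    # Texture loss: L1 distance between the gradient of the fused image and
    # the pixel-wise maximum of the source gradients.
    l_tex = F.l1_loss(gradient(fused), torch.maximum(gradient(ir), gradient(vi)))
    return l_int + gamma * l_tex
```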
S52: the final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the total loss is minimized.
To verify the feasibility and effectiveness of the method of the invention, experiments were performed.
Fig. 3 shows a schematic diagram of a fusion result in a daytime scene environment obtained by the processing of the method of the invention, and fig. 4 shows a schematic diagram of a fusion result in a night scene environment obtained by the processing of the method of the invention.
The results in fig. 3 and fig. 4 show that the invention preserves the salient targets of the infrared image well: the people in the fused images appear with high contrast and prominent contours, which facilitates visual observation. In addition, the fusion results preserve abundant texture details from the visible light image, which better matches the human visual system.
Overall, the fusion results obtained by the present invention contain prominent person targets and clear edge contours while preserving rich texture details.
While the invention has been described in terms of preferred embodiments, it is not limited thereto; any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the protection scope of the invention is therefore defined by the appended claims.

Claims (5)

1. A method for fusing infrared and visible light images, comprising the steps of:
s1: carrying out data preprocessing on an infrared and visible light image data set, firstly pairing an infrared image and a visible light image, then carrying out scale transformation on the infrared image and the visible light image, and finally carrying out separation on color channels on the images;
s2: constructing an encoder to realize the extraction of infrared image features and visible light image features;
s3: fusing the features obtained in the step S2 to obtain a fused image;
s4: constructing a decoder, and reconstructing the fusion image obtained in the step S3 to finally obtain a fusion image recovered from the fusion characteristics;
s5: and (3) judging the quality of the fusion image through the designed loss function, and calculating the loss of the fusion image obtained in the step S4, wherein the encoder and the decoder are continuously trained to obtain model parameters which minimize the loss function.
2. The method for fusing infrared and visible light images of claim 1, wherein said step S2 comprises:
S21: projecting the input infrared image I_ir and visible light image I_vi each through three 1x1 convolutions to obtain three groups of rich intermediate features, wherein the three groups of intermediate features are joined and used as the input of a convolution module, and are also used, respectively, as the query, key and value inputs of a self-attention module;
S22: designing a combined convolution and self-attention module (ACmod) to extract local and global features, wherein f_c and f_a denote the outputs of the convolution module and the self-attention module respectively, and the final output f_out can be expressed as: f_out = α·f_a + β·f_c;
S23: inputting the output into a depth feature extraction module F_2conv composed of two convolution layers, whose output feature F_out can be expressed as: F_out = F_2conv(f_out);
S24: passing the paired infrared image and visible light image each through the encoder module, finally obtaining the output features of the infrared image f_ir and the output features of the visible light image f_vi;
wherein H represents the height of the image, W represents the width of the image, C_in represents the number of channels of the input image, C_out represents the number of channels of the output feature, and α and β are learnable weight factors used to balance the self-attention and convolution outputs.
3. The method for fusing infrared and visible light images of claim 1, wherein said step S3 comprises:
S31: fusing the output features of step S2, wherein the process can be expressed as: I_fuse = C_fuse(f_ir, f_vi);
wherein I_fuse represents the fused image and C_fuse(·) represents the fusion strategy, i.e. concatenation in the channel dimension.
4. The method for fusing infrared and visible light images of claim 1, wherein said step S4 comprises:
S41: designing a decoder structure for image reconstruction, consisting of 4 cascaded convolutions and denoted F_4r(·);
S42: taking the fused image I_fuse obtained in step S3 as the input of the decoder to obtain the final output F_fuse ∈ R^(H×W×3), i.e. the fused image, wherein the process can be expressed as: F_fuse = F_4r(I_fuse).
5. The method for fusing infrared and visible light images of claim 1, wherein said step S5 comprises:
S51: the loss designed for the fused image comprises a texture loss L_tex and an intensity loss L_int, and the overall loss L_a of the fused image can be expressed as: L_a = L_int + γ·L_tex;
where γ is a weight factor, and the final goal is to obtain the model parameters of the encoder and decoder that minimize the overall loss L_a;
the texture loss is computed from the gradients of the fused image and the source images, and the intensity loss is computed from their pixel intensities, where ∇ represents the gradient operation and ||·||_1 represents the L1 norm;
S52: the final network model parameters are obtained by training with the infrared and visible image pairs of the training set so that the overall loss is minimized.
CN202310397002.8A 2023-04-14 2023-04-14 Method for fusing infrared and visible light images Pending CN116452480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310397002.8A CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310397002.8A CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Publications (1)

Publication Number Publication Date
CN116452480A (en) 2023-07-18

Family

ID=87126794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310397002.8A Pending CN116452480A (en) 2023-04-14 2023-04-14 Method for fusing infrared and visible light images

Country Status (1)

Country Link
CN (1) CN116452480A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391983A (en) * 2023-10-26 2024-01-12 安徽大学 Infrared image and visible light image fusion method
CN117783780A (en) * 2023-12-26 2024-03-29 阳谷质上特种电缆有限公司 Cable fault detection method based on imaging technology


Similar Documents

Publication Publication Date Title
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN111145131B (en) Infrared and visible light image fusion method based on multiscale generation type countermeasure network
CN116452480A (en) Method for fusing infrared and visible light images
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN112419212B (en) Infrared and visible light image fusion method based on side window guide filtering
CN111274921A (en) Method for recognizing human body behaviors by utilizing attitude mask
CN114972748A (en) Infrared semantic segmentation method capable of explaining edge attention and gray level quantization network
CN116740419A (en) Target detection method based on graph regulation network
Deng et al. DRD-Net: Detail-recovery image deraining via context aggregation networks
CN110751271B (en) Image traceability feature characterization method based on deep neural network
Chen et al. Colorization of infrared images based on feature fusion and contrastive learning
Xie et al. YOLO-MS: Multispectral object detection via feature interaction and self-attention guided fusion
Liu et al. Multi-exposure image fusion via multi-scale and context-aware feature learning
CN117690161B (en) Pedestrian detection method, device and medium based on image fusion
Bajpai et al. Underwater U-Net: Deep learning with U-Net for visual underwater moving object detection
CN116137023B (en) Low-illumination image enhancement method based on background modeling and detail enhancement
Yue et al. Low-illumination traffic object detection using the saliency region of infrared image masking on infrared-visible fusion image
Patel et al. ThermISRnet: an efficient thermal image super-resolution network
Goncalves et al. Guidednet: Single image dehazing using an end-to-end convolutional neural network
CN113449552A (en) Pedestrian re-identification method based on blocking indirect coupling GAN network
CN113869151B (en) Cross-view gait recognition method and system based on feature fusion
CN115661451A (en) Deep learning single-frame infrared small target high-resolution segmentation method
CN114549958A (en) Night and disguised target detection method based on context information perception mechanism
Yuyao et al. The infrared-visible complementary recognition network based on context information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination