CN116757988B - Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks


Info

Publication number
CN116757988B
CN116757988B
Authority
CN
China
Prior art keywords
image
layer
convolution
fusion
network
Prior art date
Legal status
Active
Application number
CN202311033639.5A
Other languages
Chinese (zh)
Other versions
CN116757988A (en)
Inventor
吕国华
宋文廓
高翔
王西艳
司马超群
张曾彬
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202311033639.5A priority Critical patent/CN116757988B/en
Publication of CN116757988A publication Critical patent/CN116757988A/en
Application granted granted Critical
Publication of CN116757988B publication Critical patent/CN116757988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/10024: Color image
    • G06T 2207/10048: Infrared image
    • G06T 2207/20221: Image fusion; image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an infrared and visible light image fusion method based on semantic enrichment and segmentation tasks, relating to the technical field of new-generation information technology. The method comprises the following steps: S1: acquiring a training set and a test set for the fusion network and a test set for generalization experiments; S2: constructing an image fusion network; S3: training the constructed image fusion network with the training set of the fusion network to obtain an image fusion network model. When the fused image obtained by this image fusion method is used in a subsequent panoramic segmentation task, a result closer to that of the original images can be segmented.

Description

Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
Technical Field
The invention belongs to the technical field of new generation information, and particularly relates to an infrared and visible light image fusion method based on semantic enrichment and segmentation tasks.
Background
Image fusion is a common information fusion technique. Depending on the imaging equipment, image fusion can be divided into three categories: multimodal image fusion, digital photography image fusion, and remote sensing image fusion. Typical multimodal image fusion mainly comprises medical image fusion and the fusion of infrared and visible light images; typical digital photography image fusion mainly comprises multi-exposure image fusion and multi-focus image fusion; and remote sensing image fusion mainly comprises the fusion of multi-spectral and panchromatic images. At present, the fusion of infrared and visible light images is a focus of researchers, mainly because the fused image obtained from an infrared image and a visible light image generally has salient targets and rich textures, offers high image quality, and has very good application prospects in target detection and military surveillance.
At present, fusing infrared and visible light images with deep learning methods is an important research direction. However, most existing deep-learning-based fusion methods focus only on fusion metrics and subjective visual effects, and rarely consider high-level tasks such as panoramic segmentation performed after image fusion. Although some fusion methods for infrared and visible light images have been proposed with the subsequent panoramic segmentation task in mind, for example the SeAFusion method published in Information Fusion, which combines a semantic loss with the fusion framework, the fused images obtained by such methods in low-light scenes do not display targets clearly and their texture details are not fine enough. As a result, the segmentation of the fused image in the subsequent panoramic segmentation task differs considerably from the segmentation of the source images, and some targets may not be segmented at all. Therefore, the present application provides an infrared and visible light image fusion method oriented to the panoramic segmentation task.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides an infrared and visible light image fusion method based on semantic enrichment and segmentation tasks.
The technical scheme of the invention is as follows:
an infrared and visible light image fusion method based on semantic enrichment and segmentation tasks comprises the following steps:
step S1: acquiring a training set and a testing set of a fusion network and a testing set of a generalization experiment;
step S2: constructing an image fusion network;
step S3: training the constructed image fusion network based on semantic segmentation by utilizing a training set of the fusion network to obtain an image fusion network model; the method specifically comprises the following steps:
step S3-1: converting the visible light image in the training set of the fusion network into a YCbCr image, and then respectively separating a Y channel image of the visible light image, a Cb channel image of the visible light image and a Cr channel image of the visible light image;
S3-2: inputting the infrared image in the training set of the fusion network and the Y-channel image obtained in step S3-1 into the image fusion network to obtain a single-channel fused image; then combining the single-channel fused image with the Cb-channel image and the Cr-channel image to obtain a YCbCr image, and converting it into a color RGB fused image; performing semantic segmentation on the color RGB fused image with the semantic segmentation network BiSeNet, calculating the total loss of the image fusion network and the semantic segmentation network BiSeNet, and using the total loss to iteratively update the network parameters for 4 iterations to obtain the image fusion network model.
Preferably, the specific manner of step S1 is: selecting a training set in the MSRS data set as a training set of the fusion network; randomly selecting 20 pairs of registered infrared images and visible light images under a normal light scene and 20 pairs of registered infrared images and visible light images under a low light scene from a test set in an MSRS data set to form a test set of the fusion network; and randomly selecting 20 pairs of registered infrared images and visible light images from the TNO data set to form a test set of generalization experiments.
Preferably, in step S2, the image fusion network includes five convolution blocks I, two CISM modules, four fine-grained feature extraction modules, four Concat layers I, a channel splicing layer I, and a convolution block II. The first convolution block I and the second convolution block I are connected to the first CISM module and, respectively, to the first Concat layer I and the second Concat layer I; the first CISM module is also connected to the first Concat layer I and the second Concat layer I; the first Concat layer I and the second Concat layer I are respectively connected to the first fine-grained feature extraction module and the second fine-grained feature extraction module; the first and second fine-grained feature extraction modules are connected to the second CISM module and, respectively, to the third Concat layer I and the fourth Concat layer I; the second CISM module is also connected to the third Concat layer I and the fourth Concat layer I; the third Concat layer I and the fourth Concat layer I are respectively connected to the third fine-grained feature extraction module and the fourth fine-grained feature extraction module; the third and fourth fine-grained feature extraction modules are connected to the channel splicing layer I, and the channel splicing layer I is connected in sequence to the third convolution block I, the fourth convolution block I, the fifth convolution block I and the convolution block II.
The first convolution block I extracts features from the infrared image input to the image fusion network to obtain a feature map A with 16 channels; the second convolution block I extracts features from the Y-channel image input to the image fusion network to obtain a feature map B with 16 channels; the first CISM module performs information sharing and feature fusion on feature maps A and B to obtain a feature map C and a feature map D with 16 channels; the first Concat layer I performs element-wise addition of feature maps A and C to obtain a feature map E with 16 channels; the second Concat layer I performs element-wise addition of feature maps B and D to obtain a feature map F with 16 channels; the first fine-grained feature extraction module performs fine-grained feature extraction on feature map E to obtain a feature map G; the second fine-grained feature extraction module performs fine-grained feature extraction on feature map F to obtain a feature map H; the second CISM module performs information sharing and feature fusion on feature maps G and H to obtain a feature map J and a feature map K with 32 channels; the third Concat layer I performs element-wise addition of feature maps G and J to obtain a feature map L with 32 channels; the fourth Concat layer I performs element-wise addition of feature maps H and K to obtain a feature map M with 32 channels; the third fine-grained feature extraction module performs fine-grained feature extraction on feature map L to obtain a feature map N with 48 channels; the fourth fine-grained feature extraction module performs fine-grained feature extraction on feature map M to obtain a feature map P with 48 channels; the channel splicing layer I concatenates feature maps N and P in the channel dimension to obtain a feature map Q with 96 channels; the third convolution block I changes the number of channels of feature map Q from 96 to 48 to obtain a feature map R; the fourth convolution block I changes the number of channels of feature map R from 48 to 32 to obtain a feature map S; the fifth convolution block I changes the number of channels of feature map S from 32 to 16 to obtain a feature map T; and the convolution block II changes the number of channels of feature map T from 16 to 1 while mapping the pixel values to a specific range with its Tanh activation layer, obtaining the fused image of the infrared image and the visible light Y-channel image, namely the single-channel fused image.
Preferably, in step S2, the convolution block I includes a convolution layer with a 3×3 convolution kernel and an LReLU activation layer connected to the convolution layer with the 3×3 convolution kernel.
Preferably, in step S2, the convolution block II includes a convolution layer with a 1×1 convolution kernel and a Tanh activation layer connected to the convolution layer with the 1×1 convolution kernel.
preferably, in step S2, the CISM module includes a feature layer, a maximum pooling layer, an average pooling layer, a channel splicing layer ii, a convolution module, and a Sigmoid layer; the first convolution block I and the second convolution block I are connected with a characteristic layer, the characteristic layer is respectively connected with a maximum pooling layer and an average pooling layer, the maximum pooling layer and the average pooling layer are both connected with a channel splicing layer II, and the channel splicing layer II is sequentially connected with a convolution module and a Sigmoid layer; the Sigmoid layer of the first CISM module is respectively connected with the first Concat layer I and the second Concat layer I; the Sigmoid layer of the second CISM module is connected with the third Concat layer i and the fourth Concat layer i respectively.
Preferably, in step S2, the convolution module includes a convolution layer with a 3×3 convolution kernel and an LReLU activation layer connected to the convolution layer with the 3×3 convolution kernel.
Preferably, in step S2, the fine-grained feature extraction module includes a backbone network, a residual network and a fifth Concat layer II, wherein the backbone network includes a first convolution unit II, a first Concat layer II, a first convolution unit I, a second Concat layer II, a second convolution unit I, a third Concat layer II, a third convolution unit I, a fourth Concat layer II and a second convolution unit II that are sequentially and densely connected; the residual network includes a third convolution unit II, a Sobel edge detection module and a fourth convolution unit II arranged in sequence; and the second convolution unit II of the backbone network and the fourth convolution unit II of the residual network are both connected to the fifth Concat layer II.
Preferably, in step S2, the convolution unit I includes a convolution layer with a 3×3 convolution kernel and an LReLU activation layer connected to the convolution layer with the 3×3 convolution kernel.
Preferably, in step S2, the convolution unit II includes a convolution layer with a 1×1 convolution kernel and an LReLU activation layer connected to the convolution layer with the 1×1 convolution kernel.
Preferably, in step S3, the total loss $L_{total}$ of the present application includes a content loss $L_{content}$ and a semantic loss $L_{semantic}$; the content loss $L_{content}$ further includes an intensity loss $L_{int}$ and a texture loss $L_{texture}$; and the semantic loss $L_{semantic}$ includes a main semantic loss $L_{main}$ and an auxiliary semantic loss $L_{aux}$.
Compared with the prior art, the invention has the following beneficial effects:
Aiming at the disadvantage that image fusion methods in the prior art only focus on the fusion effect and ignore the requirements of the subsequent high-level panoramic segmentation task, the invention provides an infrared and visible light image fusion method based on semantic enrichment and segmentation tasks. The method fully considers the semantic requirements of the panoramic segmentation task during image fusion. A CISM module is designed so that the two modalities, the infrared image and the Y-channel image of the visible light image, can exchange more information with each other; as a result, the color RGB fused image contains more source-image information, and the visible light image in a low-illumination scene gains better illumination intensity after fusion, making the targets in the fused image clearer and more prominent. A fine-grained feature extraction module is designed to extract finer-grained features, which provide more semantic information and thereby meet the semantic requirements of the subsequent panoramic segmentation. The fused image generated by the image fusion network is then segmented with the semantic segmentation network BiSeNet, the total loss of the image fusion network and the semantic segmentation network BiSeNet is calculated, and the calculated total loss is used to iteratively update the parameters of the image fusion network and the semantic segmentation network BiSeNet; after 4 iterations, the image fusion network model is obtained.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a schematic diagram of an image fusion network training process of the present invention;
FIG. 3 is a schematic diagram of the network structure of the image fusion network of the present invention; the symbol marked in FIG. 3 represents the Concat layer I;
FIG. 4 is a schematic diagram of the network structure of the fine-grained feature extraction module of the present invention; the symbol marked in FIG. 4 represents the Concat layer II;
FIG. 5 is a schematic diagram of a network structure of a CISM module according to the present invention;
FIG. 6 shows the fused images obtained by the image fusion method of the present application on the test set of the fusion network; in FIG. 6, (a) Infrared represents the infrared image used to test the visual effect, (b) Visible represents the visible light image used to test the visual effect, and (c) to (j) represent the fused images obtained on the test set of the fusion network by the FusionGAN, DenseFuse, MDLatLRR, NestFuse, PIAFusion, SeAFusion and U2Fusion image fusion methods and by the image fusion method described herein, respectively;
FIG. 7 shows the segmentation results obtained by performing the panoramic segmentation task on the fused images produced, on the test set of the fusion network, by the image fusion method of the present application and by the seven existing image fusion methods; in FIG. 7, (a) is the infrared image used to test the panoramic segmentation effect, (b) is the visible light image used to test the panoramic segmentation effect, (c) is the segmentation result obtained by performing the panoramic segmentation task on the infrared image, (d) is the segmentation result obtained by performing the panoramic segmentation task on the visible light image, and (e) to (l) represent the segmentation results of the fused images obtained by the FusionGAN, DenseFuse, MDLatLRR, NestFuse, PIAFusion, SeAFusion and U2Fusion image fusion methods and by the image fusion method described herein, respectively;
FIG. 8 shows the fused images obtained by the image fusion method of the present application and by the seven existing image fusion methods on the test set of the generalization experiments; in FIG. 8, (a) Infrared represents the infrared image used to test the generalization of the method of the present application, (b) Visible represents the visible light image used to test the generalization of the method of the present application, and (c) to (j) represent the fused images obtained on the test set of the generalization experiments by the FusionGAN, DenseFuse, MDLatLRR, NestFuse, PIAFusion, SeAFusion and U2Fusion image fusion methods and by the image fusion method described herein, respectively.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
An infrared and visible light image fusion method based on semantic enrichment and segmentation tasks, the general flow chart of which is shown in figure 1, specifically comprises the following steps:
s1, acquiring a training set and a testing set of a fusion network and a testing set of a generalization experiment:
the training set in the MSRS data set is selected as the training set of the fusion network; randomly selecting 20 pairs of registered infrared images and visible light images under a normal light scene and 20 pairs of registered infrared images and visible light images under a low light scene from a test set in an MSRS data set to form a test set of the fusion network;
the MSRS data set selected in the application is an existing data set, the existing MSRS data set is divided into a test set and a training set and is provided with corresponding labels, and the acquired website of the MSRS data set is as follows: github-Linfeng-Tang/MSRS: multi-Spectral Road Scenarios for Practical Infrared and VisibleImageFusion; the training set of the fusion network comprises 1083 pairs of registered infrared images and visible light images, the test set of the fusion network comprises 361 pairs of registered infrared images and visible light images, and the training set of the fusion network and the visible light images in the test set of the fusion network both comprise visible light images in a normal light scene and visible light images in a low light scene;
Randomly selecting 20 pairs of registered infrared and visible light images from the TNO data set to form the test set of the generalization experiments; the TNO data set is an existing data set and can be obtained from: https://figshare.com/fingers/Dataset/tno_image_fusion_dataset/1008029 (file = 1475454).
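For illustration, the sampling of the test sets described in step S1 could be scripted as follows; this is a hypothetical Python sketch, and the directory names, file pattern and random seed are assumptions rather than part of the MSRS or TNO data sets.

```python
import random
from pathlib import Path

random.seed(0)  # illustrative seed
msrs_test = Path("MSRS/test")
# assumed layout: infrared images under ir_normal / ir_low, visible images under vi_normal / vi_low
normal_ir = sorted((msrs_test / "ir_normal").glob("*.png"))
low_ir = sorted((msrs_test / "ir_low").glob("*.png"))
test_ir = random.sample(normal_ir, 20) + random.sample(low_ir, 20)
# pair each infrared image with the visible image of the same name
test_pairs = [(p, Path(str(p).replace("ir_", "vi_"))) for p in test_ir]
```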
S2: and constructing an image fusion network. The method comprises the following specific steps:
The image fusion network, as shown in FIG. 3, comprises five convolution blocks I, two CISM modules, four fine-grained feature extraction modules, four Concat layers I, a channel splicing layer I and a convolution block II. The first convolution block I and the second convolution block I are connected to the first CISM module and, respectively, to the first Concat layer I and the second Concat layer I; the first CISM module is also connected to the first Concat layer I and the second Concat layer I; the first Concat layer I and the second Concat layer I are respectively connected to the first fine-grained feature extraction module and the second fine-grained feature extraction module; the first and second fine-grained feature extraction modules are connected to the second CISM module and, respectively, to the third Concat layer I and the fourth Concat layer I; the second CISM module is also connected to the third Concat layer I and the fourth Concat layer I; the third Concat layer I and the fourth Concat layer I are respectively connected to the third fine-grained feature extraction module and the fourth fine-grained feature extraction module; the third and fourth fine-grained feature extraction modules are connected to the channel splicing layer I, and the channel splicing layer I is connected in sequence to the third convolution block I, the fourth convolution block I, the fifth convolution block I and the convolution block II. The convolution block I comprises a convolution layer with a 3×3 convolution kernel and an LReLU activation layer connected to it; the convolution block II comprises a convolution layer with a 1×1 convolution kernel and a Tanh activation layer connected to it;
The structure of the CISM module in the application, as shown in fig. 5, comprises a feature layer, a maximum pooling layer, an average pooling layer, a channel splicing layer II, a convolution module and a Sigmoid layer; the first convolution block I and the second convolution block I are connected with a characteristic layer, the characteristic layer is respectively connected with a maximum pooling layer and an average pooling layer, the maximum pooling layer and the average pooling layer are both connected with a channel splicing layer II, and the channel splicing layer II is sequentially connected with a convolution module and a Sigmoid layer; the convolution module comprises a convolution layer with a convolution kernel of 3×3 and an LReLu activation layer connected with the convolution layer with the convolution kernel of 3×3;
the Sigmoid layer of the first CISM module is respectively connected with the first Concat layer I and the second Concat layer I; the Sigmoid layer of the second CISM module is connected with the third Concat layer i and the fourth Concat layer i respectively.
The structure of the fine-grained feature extraction module in the present application is shown in FIG. 4. The fine-grained feature extraction module comprises a backbone network and a residual network arranged in parallel, together with a fifth Concat layer II. The backbone network comprises a first convolution unit II, a first Concat layer II, a first convolution unit I, a second Concat layer II, a second convolution unit I, a third Concat layer II, a third convolution unit I, a fourth Concat layer II and a second convolution unit II that are sequentially and densely connected; the residual network comprises a third convolution unit II, a Sobel edge detection module and a fourth convolution unit II arranged in sequence; the second convolution unit II of the backbone network and the fourth convolution unit II of the residual network are connected to the fifth Concat layer II. In the present application, the convolution unit I comprises a convolution layer with a 3×3 convolution kernel and an LReLU activation layer connected to it; the convolution unit II comprises a convolution layer with a 1×1 convolution kernel and an LReLU activation layer connected to it;
The first convolution block I is used for extracting characteristics of an infrared image input to the image fusion network to obtain a characteristic diagram A with the channel number of 16; the second convolution block I is used for extracting features of a Y-channel image of the visible light image input to the image fusion network to obtain a feature map B with the channel number of 16; the first CISM module is used for carrying out information sharing and feature fusion on the feature images A and B to obtain a feature image C and a feature image D with the channel number of 16;
the first Concat layer I is used for carrying out element addition on the feature map A and the feature map C to obtain a feature map E with the channel number of 16; the second Concat layer I is used for carrying out element addition on the feature map B and the feature map D to obtain a feature map F with the channel number of 16; the first fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map E to obtain a feature map G; the second fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map F to obtain a feature map H; the second CISM module is used for carrying out information sharing and feature fusion on the feature images G and H to obtain a feature image J and a feature image K with the channel number of 32; the third Concat layer I is used for carrying out element addition on the feature map G and the feature map J to obtain a feature map L with the channel number of 32; the fourth Concat layer I is used for carrying out element addition on the feature images H and K to obtain a feature image M with the channel number of 32; the third fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map L to obtain a feature map N, wherein the number of channels is 48; the fourth fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map M to obtain a feature map P, and the number of channels is 48; the channel splicing layer I is used for splicing the feature images N and P in the channel dimension to obtain a feature image Q with the channel number of 96; the third convolution block I is used for changing the channel from 96 to 48 for the characteristic diagram Q to obtain a characteristic diagram R; the fourth convolution block I is used for changing the channel from 48 to 32 for the characteristic diagram R to obtain a characteristic diagram S; the fifth convolution block I is used for changing the channel from 32 to 16 for the feature map S to obtain a feature map T; the convolution block II is used for changing the channel from 16 to 1 for the feature map T, and simultaneously mapping the pixel value to a specific range by utilizing the Tanh activation layer of the convolution block II to obtain a fusion image of the infrared image and the visible light Y single-channel image, namely the single-channel fusion image.
Wherein: the first CISM module performs information sharing and feature fusion on feature maps A and B to obtain a feature map C and a feature map D with 16 channels, specifically as follows: the feature layer performs element-wise subtraction between the feature map A output by the first convolution block I and the feature map B output by the second convolution block I, so that the pixels at corresponding positions of feature maps A and B are subtracted and the absolute value is taken, giving a feature map AB1 with 16 channels; the maximum pooling layer and the average pooling layer each pool the feature map AB1, giving two feature maps AB2 and AB3 with 1 channel each; the channel splicing layer II concatenates AB2 and AB3 along the channel dimension to obtain a feature map AB4 with 2 channels; the convolution module performs feature extraction and dimension reduction on the feature map AB4 to obtain a feature map AB5 with 1 channel; the Sigmoid layer normalizes the feature map AB5 to obtain a factor in [0,1], which is multiplied with each pixel of feature map A and each pixel of feature map B to obtain feature map C and feature map D; feature maps A and C are added element-wise in the first Concat layer I to obtain feature map E, and feature maps B and D are added element-wise in the second Concat layer I to obtain feature map F.
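To make the data flow of the CISM module concrete, the following is a minimal PyTorch sketch that follows the description above; the padding, the LReLU negative slope and other layer hyperparameters are assumptions that the description does not specify.

```python
import torch
import torch.nn as nn

class CISM(nn.Module):
    """Cross-modal information sharing module, sketched from the description above:
    an attention factor computed from |A - B| modulates both input feature maps."""
    def __init__(self):
        super().__init__()
        # convolution module: 3x3 convolution followed by an LReLU activation layer
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),  # negative slope is an assumption
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, feat_a, feat_b):
        # feature layer: element-wise subtraction and absolute value -> AB1
        ab1 = torch.abs(feat_a - feat_b)
        # maximum pooling and average pooling along the channel dimension -> AB2, AB3
        ab2, _ = torch.max(ab1, dim=1, keepdim=True)
        ab3 = torch.mean(ab1, dim=1, keepdim=True)
        # channel splicing layer II -> AB4 (2 channels)
        ab4 = torch.cat([ab2, ab3], dim=1)
        # convolution module -> AB5 (1 channel), then Sigmoid -> factor in [0, 1]
        factor = self.sigmoid(self.conv(ab4))
        # multiply the factor with A and B to obtain C and D
        return feat_a * factor, feat_b * factor
```

In this sketch the factor derived from the inter-modal difference is what lets each branch absorb information from the other modality before the element-wise additions in the Concat layers I.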
In the present application, the first CISM module and the second CISM module have the same network structure; the second CISM module performs information sharing and feature fusion on feature maps G and H to obtain a feature map J and a feature map K with 32 channels, and its working principle is the same as that of the first CISM module;
The first fine-grained feature extraction module performs fine-grained feature extraction on feature map E to obtain feature map G, specifically as follows: the first convolution unit II of the backbone network fuses the features of different channels of the input feature map E to obtain a feature map E1 with 16 channels; the first Concat layer II adds feature map E and feature map E1 to obtain a feature map E2; the first convolution unit I performs a rough extraction of the target-contour and target-texture features of feature map E2 to obtain a feature map E3; the second Concat layer II adds feature map E3, feature map E1 and feature map E to obtain a feature map E4; the second convolution unit I performs a detailed extraction of the target-contour and target-distribution features of feature map E4 to obtain a feature map E5; the third Concat layer II adds feature maps E, E1, E3 and E5 to obtain a feature map E6; the third convolution unit I extracts the semantic information contained in the target contour and target distribution of feature map E6 to obtain a feature map E7; the fourth Concat layer II adds feature maps E, E1, E3, E5 and E7 to obtain a feature map E8; and the second convolution unit II performs a dimension-raising operation on feature map E8 to obtain a feature map E9 with 32 channels.
The third convolution unit II of the residual network fuses the features of different channels of the input feature map E to obtain a feature map E10; the Sobel edge detection module extracts the edge features of feature map E10 to obtain a feature map E11; the fourth convolution unit II performs a dimension-raising operation on feature map E11 to obtain a feature map E12 with 32 channels; and the fifth Concat layer II adds feature maps E12 and E9 element-wise to obtain the feature map G with 32 channels.
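The following PyTorch sketch mirrors the backbone and residual branches described above; the Sobel implementation, the padding and the activation slope are assumptions, and the input and output channel counts are parameters so that the same block covers both the 16-to-32 and the 32-to-48 cases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Sobel(nn.Module):
    """Channel-wise Sobel edge detection (an assumed implementation)."""
    def __init__(self):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", kx.reshape(1, 1, 3, 3))
        self.register_buffer("ky", kx.t().reshape(1, 1, 3, 3))

    def forward(self, x):
        c = x.shape[1]
        gx = F.conv2d(x, self.kx.expand(c, 1, 3, 3), padding=1, groups=c)
        gy = F.conv2d(x, self.ky.expand(c, 1, 3, 3), padding=1, groups=c)
        return torch.abs(gx) + torch.abs(gy)

def conv_unit_i(ch):            # convolution unit I: 3x3 convolution + LReLU
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True))

def conv_unit_ii(c_in, c_out):  # convolution unit II: 1x1 convolution + LReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1), nn.LeakyReLU(0.2, inplace=True))

class FineGrainedBlock(nn.Module):
    """Fine-grained feature extraction module: densely connected backbone plus a
    Sobel residual branch; the two branch outputs are added element-wise."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.cu2_1 = conv_unit_ii(c_in, c_in)    # backbone -> E1
        self.cu1_1 = conv_unit_i(c_in)           # -> E3
        self.cu1_2 = conv_unit_i(c_in)           # -> E5
        self.cu1_3 = conv_unit_i(c_in)           # -> E7
        self.cu2_2 = conv_unit_ii(c_in, c_out)   # dimension raising -> E9
        self.cu2_3 = conv_unit_ii(c_in, c_in)    # residual branch -> E10
        self.sobel = Sobel()                     # -> E11
        self.cu2_4 = conv_unit_ii(c_in, c_out)   # dimension raising -> E12

    def forward(self, e):
        e1 = self.cu2_1(e)
        e3 = self.cu1_1(e + e1)                  # first Concat layer II (element add)
        e5 = self.cu1_2(e + e1 + e3)             # second Concat layer II
        e7 = self.cu1_3(e + e1 + e3 + e5)        # third Concat layer II
        e9 = self.cu2_2(e + e1 + e3 + e5 + e7)   # fourth Concat layer II
        e12 = self.cu2_4(self.sobel(self.cu2_3(e)))
        return e9 + e12                          # fifth Concat layer II
```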
In the application, a first fine-grained feature extraction module, a second fine-grained feature extraction module, a third fine-grained feature extraction module and a fourth fine-grained feature extraction module have the same network structure, and the second fine-grained feature extraction module is used for carrying out fine-grained feature extraction on a feature map F to obtain a feature map H; the second fine-grained feature extraction module operates on the same principle as the first fine-grained feature extraction module.
The third fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map L to obtain a feature map N; the third fine-grained feature extraction module operates on the same principle as the first fine-grained feature extraction module.
The fourth fine-granularity feature extraction module is used for carrying out fine-granularity feature extraction on the feature map M to obtain a feature map P; the working principle of the fourth fine-grained feature extraction module is the same as that of the first fine-grained feature extraction module.
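Putting these pieces together, the overall fusion network can be sketched as follows, reusing the CISM and FineGrainedBlock sketches above; the kernel sizes of the convolution blocks follow the description, while padding and activation settings remain assumptions. The channel counts follow the description: 16 for the two encoders, 32 after the first pair of fine-grained modules, 48 after the second pair, 96 after channel splicing, then 48, 32, 16 and finally 1.

```python
import torch
import torch.nn as nn

class SemanticFusionNet(nn.Module):
    """Overall image fusion network assembled from the component sketches above
    (a hedged sketch, not the authors' reference implementation)."""
    def __init__(self):
        super().__init__()
        def conv_block_i(c_in, c_out):  # convolution block I: 3x3 conv + LReLU
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                 nn.LeakyReLU(0.2, inplace=True))
        self.enc_ir = conv_block_i(1, 16)    # first convolution block I (infrared)
        self.enc_y = conv_block_i(1, 16)     # second convolution block I (Y channel)
        self.cism1, self.cism2 = CISM(), CISM()
        self.fg1, self.fg2 = FineGrainedBlock(16, 32), FineGrainedBlock(16, 32)
        self.fg3, self.fg4 = FineGrainedBlock(32, 48), FineGrainedBlock(32, 48)
        self.dec3 = conv_block_i(96, 48)     # third convolution block I
        self.dec4 = conv_block_i(48, 32)     # fourth convolution block I
        self.dec5 = conv_block_i(32, 16)     # fifth convolution block I
        self.out = nn.Sequential(nn.Conv2d(16, 1, 1), nn.Tanh())  # convolution block II

    def forward(self, ir, y):
        a, b = self.enc_ir(ir), self.enc_y(y)          # feature maps A, B
        c, d = self.cism1(a, b)                        # feature maps C, D
        e, f = a + c, b + d                            # Concat layers I: E, F
        g, h = self.fg1(e), self.fg2(f)                # feature maps G, H
        j, k = self.cism2(g, h)                        # feature maps J, K
        l, m = g + j, h + k                            # Concat layers I: L, M
        n, p = self.fg3(l), self.fg4(m)                # feature maps N, P
        q = torch.cat([n, p], dim=1)                   # channel splicing layer I: Q
        t = self.dec5(self.dec4(self.dec3(q)))         # feature maps R, S, T
        return self.out(t)                             # single-channel fused image
```

For example, `SemanticFusionNet()(torch.rand(1, 1, 480, 640), torch.rand(1, 1, 480, 640))` returns a tensor of shape (1, 1, 480, 640), matching the stated behavior that image sizes are preserved.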
S3, training the constructed image fusion network by utilizing a training set of the fusion network and based on semantic segmentation to obtain an image fusion network model, wherein the training process is shown in FIG. 2; the method comprises the following specific steps:
S3-1: converting the visible light images in the training set of the fusion network into YCbCr images with an image space conversion module, and then separating each YCbCr image into the Y-channel image, the Cb-channel image and the Cr-channel image of the visible light image;
S3-2: inputting the infrared images in the training set of the fusion network and the Y-channel images obtained in step S3-1 into the image fusion network to obtain single-channel fused images; then combining each single-channel fused image with the corresponding Cb-channel and Cr-channel images with a format conversion module to obtain a YCbCr image, and converting it into the RGB space to obtain a color RGB fused image (the principle of this recombination and conversion is the same as that disclosed at https://zhuanlan.zhihu.com/p/88933905); then segmenting the color RGB fused image with the semantic segmentation network BiSeNet, calculating the total loss of the image fusion network and the semantic segmentation network BiSeNet, and using the calculated total loss to iteratively update the parameters of the image fusion network and the semantic segmentation network BiSeNet, repeating for 4 iterations to obtain the image fusion network model. During the iterative training of the image fusion network, an Adam optimizer is used to optimize the image fusion network, with the learning rate set to 0.001 and the batch size set to 4; meanwhile, mini-batch stochastic gradient descent is used to optimize the segmentation network, with a batch size of 8 and an initial learning rate of 0.01. The image fusion network described herein runs on the PyTorch platform.
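The training procedure of steps S3-1 and S3-2 can be summarized by the following hedged PyTorch sketch; `fusion_net`, `bisenet`, `total_loss` and `train_loader` are assumed to be available (for example as in the other sketches of this description), the BT.601 conversion coefficients and the SGD momentum are assumptions, and the rescaling of the Tanh output range is omitted.

```python
import torch

def rgb_to_ycbcr(rgb):
    """Split an RGB tensor (B, 3, H, W) in [0, 1] into Y, Cb, Cr channels
    (ITU-R BT.601 full-range coefficients, assumed here)."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Recombine the fused Y channel with Cb/Cr and convert back to RGB."""
    r = y + 1.402 * (cr - 0.5)
    g = y - 0.344136 * (cb - 0.5) - 0.714136 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    return torch.cat([r, g, b], dim=1).clamp(0.0, 1.0)

# Adam (lr 0.001, batch size 4) for the fusion network, mini-batch SGD
# (initial lr 0.01, batch size 8) for the segmentation network, as stated above.
fusion_opt = torch.optim.Adam(fusion_net.parameters(), lr=1e-3)
seg_opt = torch.optim.SGD(bisenet.parameters(), lr=0.01, momentum=0.9)  # momentum assumed

for ir, vis, label in train_loader:              # registered IR / visible pairs + labels
    y, cb, cr = rgb_to_ycbcr(vis)                # step S3-1
    fused_y = fusion_net(ir, y)                  # single-channel fused image
    fused_rgb = ycbcr_to_rgb(fused_y, cb, cr)    # color RGB fused image
    seg_main, seg_aux = bisenet(fused_rgb)       # BiSeNet main and auxiliary outputs
    loss = total_loss(fused_y, ir, y, seg_main, seg_aux, label)
    fusion_opt.zero_grad(); seg_opt.zero_grad()
    loss.backward()
    fusion_opt.step(); seg_opt.step()
```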
The total loss characterizes the overall loss of the image fusion network and the semantic segmentation network BiSeNet, and is used to constrain both networks so as to obtain a fused image with smaller loss and higher quality. The total loss $L_{total}$ includes a content loss $L_{content}$ and a semantic loss $L_{semantic}$, and is calculated as shown in formula (1):

$L_{total} = L_{content} + \lambda L_{semantic}$    (1)

In formula (1), $\lambda$ is a balance factor used to balance the content loss and the semantic loss; in the four iterations of training the image fusion network, from the first to the fourth iteration, $\lambda$ is set to 0, 1, 2 and 3, respectively.
The content loss $L_{content}$ further includes an intensity loss $L_{int}$ and a texture loss $L_{texture}$; the relationship between the content loss, the intensity loss and the texture loss is shown in formula (2):

$L_{content} = L_{int} + \alpha L_{texture}$    (2)

In formula (2), $\alpha$ is a balance factor used to balance the intensity loss and the texture loss; in the present application $\alpha$ takes the value 10.
The intensity loss $L_{int}$ in formula (2) is calculated as shown in formula (3):

$L_{int} = \frac{1}{HW}\left\| I_f - \max\left(I_{ir}, I_{y}\right) \right\|_1$    (3)

In formula (3), $H$ and $W$ denote the height and width of the color RGB fused image, $I_f$ denotes the color RGB fused image, $I_{ir}$ denotes the infrared image input to the image fusion network, $I_{y}$ denotes the Y-channel image of the visible light image input to the fusion network, $\max(\cdot)$ denotes element-wise maximum selection, and $\|\cdot\|_1$ denotes the $\ell_1$ norm. The image sizes are not changed during fusion, so the infrared image and the Y-channel image of the visible light image input to the image fusion network have the same size as the color RGB fused image output by the network; the color RGB fused image output by the image fusion network here refers to the fused image obtained after the parameters of the image fusion network and the semantic segmentation network BiSeNet have been updated with the calculated total loss over 4 iterations.
The texture loss $L_{texture}$ in formula (2) is shown in formula (4):

$L_{texture} = \frac{1}{HW}\left\| \left|\nabla I_f\right| - \max\left(\left|\nabla I_{ir}\right|, \left|\nabla I_{y}\right|\right) \right\|_1$    (4)

In formula (4), $\nabla$ denotes the Sobel gradient operator used to obtain the gradient values of an image, $|\cdot|$ denotes the absolute value, $\max(\cdot)$ denotes element-wise maximum selection, $\|\cdot\|_1$ denotes the $\ell_1$ norm, $H$ and $W$ denote the height and width of the final fused image generated by the image fusion network, $\nabla I_f$ denotes the gradient values obtained by applying the gradient operation to the color RGB fused image, $\nabla I_{ir}$ denotes the gradient values obtained by applying the gradient operation to the infrared image input to the image fusion network, and $\nabla I_{y}$ denotes the gradient values obtained by applying the gradient operation to the Y-channel image of the visible light image input to the image fusion network.
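As an illustration of formulas (2) to (4), the following is a hedged PyTorch sketch of the content loss; for simplicity it is written for single-channel tensors (the formulas above are stated on the color RGB fused image), and the exact Sobel formulation is an assumption.

```python
import torch
import torch.nn.functional as F

def sobel_grad(x):
    """Sobel gradient magnitude of a single-channel batch (B, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).reshape(1, 1, 3, 3)
    ky = kx.transpose(2, 3).contiguous()
    return torch.abs(F.conv2d(x, kx, padding=1)) + torch.abs(F.conv2d(x, ky, padding=1))

def content_loss(fused, ir, vis_y, alpha=10.0):
    """L_content = L_int + alpha * L_texture, formulas (2)-(4); the mean over all
    pixels implements the 1/(HW)-normalized l1 norm."""
    l_int = F.l1_loss(fused, torch.max(ir, vis_y))                       # formula (3)
    l_texture = F.l1_loss(sobel_grad(fused),
                          torch.max(sobel_grad(ir), sobel_grad(vis_y)))  # formula (4)
    return l_int + alpha * l_texture                                     # formula (2)
```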
In the present application, the semantic segmentation network BiSeNet produces two outputs when segmenting the fused image, namely a main segmentation result $I_{s}$ and an auxiliary segmentation result $I_{sa}$. Therefore, the semantic loss $L_{semantic}$ introduced in this application includes a main semantic loss $L_{main}$ and an auxiliary semantic loss $L_{aux}$; the relationship between them is shown in formula (6):

$L_{semantic} = L_{main} + \beta L_{aux}$    (6)

In formula (6), $\beta$ is a balance factor used to balance the main semantic loss and the auxiliary semantic loss; in the present application $\beta$ takes the value 0.1.
The main semantic loss $L_{main}$ in formula (6) is calculated as shown in formula (7), and the auxiliary semantic loss $L_{aux}$ is calculated as shown in formula (8); the main semantic loss and the auxiliary semantic loss reflect, from different angles, the semantic information contained in the fused image:

$L_{main} = -\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C} L_{so}^{(h,w,c)} \log\left( I_{s}^{(h,w,c)} \right)$    (7)

$L_{aux} = -\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C} L_{so}^{(h,w,c)} \log\left( I_{sa}^{(h,w,c)} \right)$    (8)

In formulas (7) and (8), $H$ and $W$ denote the height and width of the color RGB fused image and $C$ denotes the number of channels; $L_{so}$ denotes the one-hot vector of height $H$, width $W$ and channel $C$ transformed from the segmentation label; $I_{s}$ denotes the segmentation result, of height $H$, width $W$ and channel $C$, obtained by segmenting the color RGB fused image with the semantic segmentation network BiSeNet; and $I_{sa}$ denotes the corresponding auxiliary segmentation result of height $H$, width $W$ and channel $C$.
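A matching sketch of formulas (6) to (8) and of the total loss of formula (1) is given below; it reuses the `content_loss` sketch above and assumes that the BiSeNet outputs are raw logits, so that `F.cross_entropy` on integer class labels is equivalent to the one-hot cross-entropy written above.

```python
import torch.nn.functional as F

def semantic_loss(seg_main, seg_aux, label, beta=0.1):
    """L_semantic = L_main + beta * L_aux; seg_main / seg_aux are (B, C, H, W)
    logits and label is (B, H, W) with integer class indices."""
    l_main = F.cross_entropy(seg_main, label)   # formula (7)
    l_aux = F.cross_entropy(seg_aux, label)     # formula (8)
    return l_main + beta * l_aux                # formula (6)

def total_loss(fused, ir, vis_y, seg_main, seg_aux, label, lam=1.0):
    """L_total = L_content + lambda * L_semantic, formula (1); lam is raised
    from 0 to 3 over the four training iterations as stated above."""
    return content_loss(fused, ir, vis_y) + lam * semantic_loss(seg_main, seg_aux, label)
```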
Testing:
Inputting the infrared images and visible light images in the test set of the fusion network into the image fusion network for a single forward pass yields the color RGB fused images, which are then evaluated with an existing Matlab test module (available at https://github.com/Linfeng-Tang/Evaluation-for-Image-Fusion), giving the test results shown in Table 1 and FIG. 6. In addition, to demonstrate the superiority of the present application over other existing image fusion methods, the same test strategy was applied to seven existing methods: NestFuse (from IEEE Transactions on Instrumentation and Measurement), MDLatLRR (from IEEE Transactions on Image Processing), FusionGAN (from Information Fusion), DenseFuse (from IEEE Transactions on Image Processing), PIAFusion (from Information Fusion), SeAFusion (from Information Fusion), and U2Fusion (from IEEE Transactions on Pattern Analysis and Machine Intelligence); the test results are shown in Table 1 and FIG. 6:
In Table 1, EN is the amount of information contained in the fused image computed on the basis of information theory; a higher value means the fused image contains richer information. Qabf measures the edge information transferred from the source images to the fused image; a higher value means more edge information is transferred. VIF quantifies the information shared between the fused image and the source images on the basis of natural scene statistics and the human visual system (HVS); a higher VIF means the fusion result is more consistent with human visual perception. SD reflects the contrast and distribution of the fused image; since the human visual system is often more attracted by regions of high contrast, a fusion result with higher SD has better contrast. SF reveals the details and texture information of the fused image by measuring its gradient distribution; a higher SF means richer edge and texture details. AG measures the gradient information of the fused image and characterizes its texture details; a higher AG means the fused image contains richer gradient information. In Table 1, the image fusion method of the present application is denoted by Ours.
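For reference, some of these metrics can be computed directly from a grayscale fused image; the following Python sketch uses common definitions of EN, SD and AG, which may differ in detail from the formulations used by the Matlab test module.

```python
import numpy as np

def entropy(img):
    """EN: Shannon entropy of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def std_dev(img):
    """SD: standard deviation of the image, reflecting contrast."""
    return float(np.std(img.astype(np.float64)))

def avg_gradient(img):
    """AG: average gradient, characterizing texture detail (one common definition)."""
    img = img.astype(np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]   # horizontal differences
    gy = img[1:, :-1] - img[:-1, :-1]   # vertical differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```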
As can be seen from FIG. 6, the method of the present application retains more detailed information and produces brighter results than the other methods, better conforming to human subjective vision.
TABLE 1
| Metric | NestFuse | MDLatLRR | FusionGAN | DenseFuse | PIAFusion | SeAFusion | U2Fusion | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EN | 5.0401 | 6.0138 | 5.2417 | 5.6527 | 6.0381 | 6.2061 | 4.0655 | 6.2239 |
| VIF | 0.7456 | 0.7751 | 0.7732 | 0.7746 | 1.0211 | 1.0457 | 0.4291 | 1.0831 |
| AG | 2.0501 | 2.3241 | 1.3108 | 1.8081 | 2.8461 | 2.8204 | 1.4139 | 2.8239 |
| SD | 31.1481 | 29.2763 | 17.4636 | 24.9197 | 38.6515 | 38.3165 | 16.5399 | 38.6664 |
| Qabf | 0.4775 | 0.5064 | 0.4976 | 0.4388 | 0.6555 | 0.6537 | 0.2541 | 0.6817 |
| SF | 7.5791 | 7.4311 | 4.1852 | 5.8038 | 9.7444 | 9.4031 | 5.4621 | 9.6592 |
As can be seen from table 1:
1) The image fusion method of the present application obtains a higher EN value; the EN value obtained by the present method is 0.29% higher than that of the existing SeAFusion method (the highest EN among the existing methods), which means that when fusing infrared and visible light images in low-light scenes, the fused image obtained by the present method contains more information;
2) The image fusion method of the present application obtains a higher VIF value; the VIF value obtained by the present method is 3.82% higher than that of the existing SeAFusion method (the highest VIF among the existing methods), which shows that the fused image obtained by the present method effectively improves the overall brightness and enhances the visual effect;
3) The image fusion method of the present application obtains a higher Qabf value; the Qabf value obtained by the present method is 3.996% higher than that of the existing PIAFusion method (the highest Qabf among the existing methods), which indicates that the fused image obtained after fusing the infrared and visible light images contains richer edge information;
4) The image fusion method of the present application obtains a higher SD value; the SD value obtained by the present method is 0.038% higher than that of the existing PIAFusion method (the highest SD among the existing methods), which shows that the fused image obtained by the present method has better contrast;
in summary, the fusion image obtained by the image fusion method described in the application includes more source image information, which is very helpful to the subsequent panoramic segmentation task, and meanwhile, the fusion image obtained by the image fusion method described in the application also has higher contrast and richer edge and texture information, and is more in line with the visual perception of human.
In order to verify that the image fusion method of the present application performs better on the subsequent high-level panoramic segmentation task than the other existing image fusion methods mentioned above, panoramic segmentation was performed during testing on the fused images obtained by the present method and by those methods; the results are shown in FIG. 7. As can be seen from FIG. 7, compared with the fused images obtained by the other existing methods, the fused image obtained by the present method can be segmented into results that are closer to those of the original images, especially for the two targets of sky (the upper boxes in (a) to (i) of FIG. 7) and topography (the lower boxes in (a) to (i) of FIG. 7). This also proves that the fused image obtained by the present application preserves richer semantic information, so the proposed image fusion method is more conducive to the panoramic segmentation task.
In addition, in order to verify the generalization of the image fusion method, the present application further uses the test set of the generalization experiments to perform qualitative and quantitative generalization experiments on the present image fusion method and on the NestFuse, MDLatLRR, FusionGAN, DenseFuse, PIAFusion, SeAFusion and U2Fusion image fusion methods; the experimental results are shown in Table 2 and FIG. 8.
TABLE 2

Metric  NestFuse  MDLatLRR  FusionGAN  DenseFuse  PIAFusion  SeAFusion  U2Fusion  Ours
EN      7.0466    6.0731    6.5581     6.8192     6.8887     7.1335     6.9366    7.1166
VIF     0.7818    0.6263    0.4105     0.6583     0.7428     0.7042     0.6061    0.7843
AG      3.8485    2.6918    2.421      3.5611     4.4005     4.9802     4.8069    4.9273
SD      41.8759   27.8343   30.6636    34.8251    34.825     44.2436    36.4151   43.781
Qabf    0.5270    0.4389    0.2327     0.4663     0.4463     0.4879     0.4267    0.5712
SF      10.0471   7.9978    6.2753     8.9853     8.8953     12.2525    11.6061   12.3076
The image fusion method described in the present application and the seven existing image fusion methods described above were subjected to generalization experiments using the test set of the generalization experiment; the following can be seen from Table 2:
1) The image fusion method described in the present application obtains a higher VIF value: it is 5.44% higher than that of the prior-art PIAFusion image fusion method. This shows that, when fusing an infrared image and a visible light image, the fused image obtained by the present method effectively improves the overall brightness and enhances the visual effect;
2) The image fusion method described in the present application obtains a higher Qabf value on the TNO data set: it is 8.38% higher than that of the prior-art NestFuse image fusion method, which achieves the highest Qabf value among the existing image fusion methods compared. This shows that, when fusing an infrared image and a visible light image, the fused image obtained by the present method retains richer edge information;
3) The image fusion method described in the present application obtains a higher SF value on the TNO data set: it is 0.45% higher than that of the prior-art SeAFusion image fusion method, which achieves the highest SF value among the existing image fusion methods compared. This shows that, when fusing an infrared image and a visible light image, the fused image obtained by the present method retains richer edge and texture details.
In summary, the image fusion method disclosed in the present application shows good generalization performance on the test set of the generalization experiment: it highlights targets better and produces a better visual effect; in addition, it retains richer edge and texture details, so that more edge information is transferred from the source images to the fused image. The relative gains quoted above can be reproduced directly from the entries of Table 2, as illustrated in the sketch below.
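A minimal sketch of that arithmetic (relative gain = (ours − best baseline) / best baseline) for the Qabf and SF rows follows; the last digit can differ slightly from the figures quoted above because the table entries are already rounded.

```python
# Values copied from Table 2 (generalization experiment).
table2 = {
    "Qabf": {"baseline": ("NestFuse", 0.5270), "ours": 0.5712},
    "SF":   {"baseline": ("SeAFusion", 12.2525), "ours": 12.3076},
}

def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of 'ours' over a baseline, in percent."""
    return (ours - baseline) / baseline * 100.0

for metric, row in table2.items():
    name, value = row["baseline"]
    print(f"{metric}: +{relative_gain(row['ours'], value):.2f}% over {name}")
# Prints approximately: Qabf: +8.39% over NestFuse, SF: +0.45% over SeAFusion
```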

Claims (7)

1. An infrared and visible light image fusion method based on semantic enrichment and segmentation tasks is characterized in that: the method comprises the following steps:
step S1: acquiring a training set and a testing set of a fusion network and a testing set of a generalization experiment;
Step S2: constructing an image fusion network;
the image fusion network comprises five convolution blocks I, two CISM modules, four fine-grained feature extraction modules, four Concat layers I, a channel splicing layer I and a convolution block II; the first and second convolution blocks I are respectively connected with the first and second Concat layers I, and both are also connected with the first CISM module; the first CISM module is also respectively connected with the first Concat layer I and the second Concat layer I; the first and second Concat layers I are respectively connected with the first and second fine-grained feature extraction modules; the first and second fine-grained feature extraction modules are respectively connected with the third and fourth Concat layers I, and both are also connected with the second CISM module; the second CISM module is respectively connected with the third Concat layer I and the fourth Concat layer I; the third and fourth Concat layers I are respectively connected with the third and fourth fine-grained feature extraction modules; the third and fourth fine-grained feature extraction modules are connected with the channel splicing layer I, and the channel splicing layer I is sequentially connected with the third convolution block I, the fourth convolution block I, the fifth convolution block I and the convolution block II;
The CISM module comprises a feature layer, a maximum pooling layer, an average pooling layer, a channel splicing layer II, a convolution module and a Sigmoid layer; the first convolution block I and the second convolution block I are connected with the feature layer, the feature layer is respectively connected with the maximum pooling layer and the average pooling layer, the maximum pooling layer and the average pooling layer are both connected with the channel splicing layer II, and the channel splicing layer II is sequentially connected with the convolution module and the Sigmoid layer; the Sigmoid layer of the first CISM module is respectively connected with the first Concat layer I and the second Concat layer I, and the Sigmoid layer of the second CISM module is respectively connected with the third Concat layer I and the fourth Concat layer I;
the fine-grained feature extraction module comprises a main network and a residual network arranged in parallel, together with a fifth Concat layer II; the main network comprises a first convolution unit II, a first Concat layer II, a first convolution unit I, a second Concat layer II, a second convolution unit I, a third Concat layer II, a third convolution unit I, a fourth Concat layer II and a second convolution unit II which are arranged in sequence and densely connected; the residual network comprises a third convolution unit II, a Sobel edge detection module and a fourth convolution unit II which are arranged in sequence; the second convolution unit II of the main network and the fourth convolution unit II of the residual network are both connected with the fifth Concat layer II;
Step S3: training the constructed image fusion network based on semantic segmentation by utilizing a training set of the fusion network to obtain an image fusion network model; the method specifically comprises the following steps:
step S3-1: converting the visible light image in the training set of the fusion network into a YCbCr image, and then respectively separating a Y channel image of the visible light image, a Cb channel image of the visible light image and a Cr channel image of the visible light image;
s3-2: inputting the infrared image in the training set of the fusion network and the Y-channel image obtained in the step S3-1 into the image fusion network to obtain a single-channel fusion image; then, fusing the single-channel fused image, the Cb channel image and the Cr channel image together to obtain a YCbCr color space, and converting the YCbCr color space into a color RGB fused image; then, carrying out semantic segmentation on the color RGB fusion image by using a semantic segmentation network BiSeNet, calculating the total loss of the image fusion network and the semantic segmentation network BiSeNet, and iteratively updating the parameters of the network by using the total loss, and repeatedly iterating for 4 times to obtain an image fusion network model;
and then, inputting the infrared image and the visible light image in the test set of the fusion network into the image fusion network to obtain a fusion image.
2. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: the specific mode of the step S1 is as follows: selecting a training set in the MSRS data set as a training set of the fusion network; randomly selecting 20 pairs of registered infrared images and visible light images under a normal light scene and 20 pairs of registered infrared images and visible light images under a low light scene from a test set in an MSRS data set to form a test set of a fusion network; and randomly selecting 20 pairs of registered infrared images and visible light images from the TNO data set to form a test set of generalization experiments.
3. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: in step S2, the convolution block I includes a convolution layer with a convolution kernel of 3×3 and an LReLU activation layer.
4. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: in step S2, the convolution block ii includes a convolution layer with a convolution kernel of 1×1 and a Tanh activation layer.
5. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: in step S2, the convolution module includes a convolution layer with a convolution kernel of 3×3 and an LReLU activation layer.
6. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: in step S2, the convolution unit I includes a convolution layer with a convolution kernel of 3×3 and an LReLU activation layer.
7. The semantic enrichment and segmentation task-based infrared and visible light image fusion method according to claim 1, wherein the method is characterized by comprising the following steps of: in step S3, the total loss includes a content loss and a semantic loss; the content loss includes an intensity loss and a texture loss; and the semantic loss includes a main semantic loss and an auxiliary semantic loss.
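Claim 7 only recites which terms make up the total loss. Written out, that composition can be expressed as below; the weighting coefficients λ, γ and β are illustrative assumptions and are not specified in the claim.

```latex
% Loss composition recited in claim 7; \lambda, \gamma, \beta are assumed weights for illustration.
\begin{aligned}
\mathcal{L}_{\mathrm{total}}    &= \mathcal{L}_{\mathrm{content}} + \lambda\,\mathcal{L}_{\mathrm{semantic}},\\
\mathcal{L}_{\mathrm{content}}  &= \mathcal{L}_{\mathrm{int}} + \gamma\,\mathcal{L}_{\mathrm{texture}},\\
\mathcal{L}_{\mathrm{semantic}} &= \mathcal{L}_{\mathrm{main}} + \beta\,\mathcal{L}_{\mathrm{aux}}.
\end{aligned}
```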
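Claim 1 describes the CISM module as a feature layer followed by parallel maximum and average pooling, channel concatenation, a 3×3 convolution module with LReLU (claim 5) and a Sigmoid layer. The following PyTorch-style sketch shows one way such a block could be realized; the element-wise addition of the two incoming feature maps, the channel-wise pooling, and the use of the Sigmoid output as a single-channel spatial attention map are assumptions made for illustration, since the claim does not fix these details.

```python
import torch
import torch.nn as nn

class CISM(nn.Module):
    """Sketch of the cross-modal attention block described in claim 1 (details assumed)."""

    def __init__(self):
        super().__init__()
        # "Convolution module": 3x3 convolution + LReLU activation (claim 5).
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        feat = feat_a + feat_b                                  # feature layer (assumed fusion by addition)
        max_pool = torch.max(feat, dim=1, keepdim=True).values  # maximum pooling over the channel axis
        avg_pool = torch.mean(feat, dim=1, keepdim=True)        # average pooling over the channel axis
        x = torch.cat([max_pool, avg_pool], dim=1)              # channel splicing layer II
        return self.sigmoid(self.conv(x))                       # spatial attention map in [0, 1]

# Usage: attention map for two 16-channel feature maps of size 64x64.
ir_feat, vis_feat = torch.randn(1, 16, 64, 64), torch.randn(1, 16, 64, 64)
attn = CISM()(ir_feat, vis_feat)   # shape: (1, 1, 64, 64)
```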
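Claim 1 also recites a fine-grained feature extraction module consisting of a densely connected main branch (convolution units interleaved with Concat layers II) in parallel with a residual branch containing a Sobel edge detection module, the two branches being merged by a fifth Concat layer II. The sketch below illustrates that general structure; the channel widths, the form of the convolution units II, and the way the Sobel responses are combined are assumptions, not taken from the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdge(nn.Module):
    """Fixed (non-learnable) Sobel edge detection applied channel-wise."""
    def __init__(self, channels: int):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ky = kx.t()
        # One depthwise kernel per channel and per gradient direction: shape (2*C, 1, 3, 3).
        weight = torch.stack([kx, ky]).unsqueeze(1).repeat(channels, 1, 1, 1)
        self.register_buffer("weight", weight)
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = F.conv2d(x, self.weight, padding=1, groups=self.channels)  # (N, 2*C, H, W)
        gx, gy = edges[:, ::2], edges[:, 1::2]
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)                        # gradient magnitude, (N, C, H, W)

class FineGrainedFeatureExtraction(nn.Module):
    """Sketch of the fine-grained feature extraction module of claim 1 (channel widths assumed)."""
    def __init__(self, in_ch: int = 32, growth: int = 16):
        super().__init__()
        def conv_unit(cin, cout):  # "convolution unit I": 3x3 conv + LReLU (claim 6); unit II assumed same form
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.LeakyReLU(inplace=True))
        # Main branch: densely connected convolution units interleaved with Concat layers II.
        self.main_in  = conv_unit(in_ch, growth)                  # first convolution unit II
        self.dense1   = conv_unit(in_ch + growth, growth)
        self.dense2   = conv_unit(in_ch + 2 * growth, growth)
        self.dense3   = conv_unit(in_ch + 3 * growth, growth)
        self.main_out = conv_unit(in_ch + 4 * growth, in_ch)      # second convolution unit II
        # Residual branch: convolution unit II -> Sobel edge detection -> convolution unit II.
        self.res_in, self.res_out = conv_unit(in_ch, in_ch), conv_unit(in_ch, in_ch)
        self.sobel = SobelEdge(in_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x, self.main_in(x)]                    # dense connections via channel concatenation
        for block in (self.dense1, self.dense2, self.dense3):
            feats.append(block(torch.cat(feats, dim=1)))
        main = self.main_out(torch.cat(feats, dim=1))
        residual = self.res_out(self.sobel(self.res_in(x)))
        return torch.cat([main, residual], dim=1)       # fifth Concat layer II merges the two branches

# Usage: a 32-channel feature map yields a 64-channel output (main + residual branches).
out = FineGrainedFeatureExtraction()(torch.randn(1, 32, 64, 64))   # shape: (1, 64, 64, 64)
```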
CN202311033639.5A 2023-08-17 2023-08-17 Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks Active CN116757988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311033639.5A CN116757988B (en) 2023-08-17 2023-08-17 Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks


Publications (2)

Publication Number Publication Date
CN116757988A CN116757988A (en) 2023-09-15
CN116757988B (en) 2023-12-22

Family

ID=87951801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311033639.5A Active CN116757988B (en) 2023-08-17 2023-08-17 Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks

Country Status (1)

Country Link
CN (1) CN116757988B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558414B (en) * 2023-11-23 2024-05-24 之江实验室 System, electronic device and medium for predicting early recurrence of multi-tasking hepatocellular carcinoma
CN117876836B (en) * 2024-03-11 2024-05-24 齐鲁工业大学(山东省科学院) Image fusion method based on multi-scale feature extraction and target reconstruction


Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934793A (en) * 2019-01-30 2019-06-25 绵阳慧视光电技术有限责任公司 A kind of Real-time image fusion method based on Integer DCT Transform
CN110097528A (en) * 2019-04-11 2019-08-06 江南大学 A kind of image interfusion method based on joint convolution autoencoder network
KR20220016614A (en) * 2020-08-03 2022-02-10 주식회사 인포웍스 OBJECT RECOGNITION SYSTEM FOR COMBINING EO/IR RADAR LiDAR SENSOR BASED ON DEEP NEURAL NETWORK ALGORITHM
WO2022042049A1 (en) * 2020-08-31 2022-03-03 华为技术有限公司 Image fusion method, and training method and apparatus for image fusion model
CN112598675A (en) * 2020-12-25 2021-04-02 浙江科技学院 Indoor scene semantic segmentation method based on improved full convolution neural network
CN112884688A (en) * 2021-02-03 2021-06-01 浙江大华技术股份有限公司 Image fusion method, device, equipment and medium
CN113298192A (en) * 2021-07-07 2021-08-24 思特威(上海)电子科技股份有限公司 Fusion method and device of infrared light image and visible light image and storage medium
CN113592018A (en) * 2021-08-10 2021-11-02 大连大学 Infrared light and visible light image fusion method based on residual dense network and gradient loss
CN113705453A (en) * 2021-08-30 2021-11-26 北京理工大学 Driving scene segmentation method based on thermal infrared attention mechanism neural network
US11521377B1 (en) * 2021-10-26 2022-12-06 Nanjing University Of Information Sci. & Tech. Landslide recognition method based on laplacian pyramid remote sensing image fusion
CN113781377A (en) * 2021-11-03 2021-12-10 南京理工大学 Infrared and visible light image fusion method based on antagonism semantic guidance and perception
CN114283104A (en) * 2021-12-29 2022-04-05 中国科学院西安光学精密机械研究所 Multi-spectral-segment image fusion method based on Y-shaped pyramid network
CN115063329A (en) * 2022-06-10 2022-09-16 中国人民解放军国防科技大学 Visible light and infrared image fusion enhancement method and system under low-illumination environment
CN115205651A (en) * 2022-09-16 2022-10-18 南京工业大学 Low visibility road target detection method based on bimodal fusion
CN115620010A (en) * 2022-09-20 2023-01-17 长春理工大学 Semantic segmentation method for RGB-T bimodal feature fusion
CN115713481A (en) * 2022-09-20 2023-02-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Complex scene target detection method based on multi-mode fusion and storage medium
CN115471723A (en) * 2022-09-23 2022-12-13 安徽优航遥感信息技术有限公司 Substation unmanned aerial vehicle inspection method based on infrared and visible light image fusion
CN116091372A (en) * 2023-01-03 2023-05-09 江南大学 Infrared and visible light image fusion method based on layer separation and heavy parameters
CN116363036A (en) * 2023-05-12 2023-06-30 齐鲁工业大学(山东省科学院) Infrared and visible light image fusion method based on visual enhancement
CN116596822A (en) * 2023-05-25 2023-08-15 西安交通大学 Pixel-level real-time multispectral image fusion method based on self-adaptive weight and target perception

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity; Linfeng Tang et al.; Information Fusion; 1-16 *
Infrared and visible image fusion based on semantic segmentation; Zhou Huabing et al.; Journal of Computer Research and Development; Vol. 58, No. 2; 436-443 *
A dual-modal semantic segmentation network with an independent fusion branch; Tian Le et al.; Computer Engineering; Vol. 48, No. 8; 240-248, 257 *


Similar Documents

Publication Publication Date Title
CN116757988B (en) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
WO2021164234A1 (en) Image processing method and image processing device
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN109034184B (en) Grading ring detection and identification method based on deep learning
WO2021238420A1 (en) Image defogging method, terminal, and computer storage medium
CN112465727A (en) Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN116681636B (en) Light infrared and visible light image fusion method based on convolutional neural network
CN116363036B (en) Infrared and visible light image fusion method based on visual enhancement
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN117391981A (en) Infrared and visible light image fusion method based on low-light illumination and self-adaptive constraint
CN113284061A (en) Underwater image enhancement method based on gradient network
CN115019340A (en) Night pedestrian detection algorithm based on deep learning
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
CN112712481A (en) Structure-texture sensing method aiming at low-light image enhancement
CN110415816B (en) Skin disease clinical image multi-classification method based on transfer learning
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
Moghimi et al. A joint adaptive evolutionary model towards optical image contrast enhancement and geometrical reconstruction approach in underwater remote sensing
CN116468625A (en) Single image defogging method and system based on pyramid efficient channel attention mechanism
CN113506230B (en) Photovoltaic power station aerial image dodging processing method based on machine vision
Chen et al. GADO-Net: an improved AOD-Net single image dehazing algorithm
CN111932469A (en) Significance weight quick exposure image fusion method, device, equipment and medium
CN110796716A (en) Image coloring method based on multiple residual error networks and regularized transfer learning
CN111832433B (en) Device for extracting object characteristics from image and working method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant