CN115331112A - Infrared and visible light image fusion method and system based on multi-granularity tokens - Google Patents

Infrared and visible light image fusion method and system based on multi-granularity tokens

Info

Publication number
CN115331112A
CN115331112A (application CN202211054722.6A)
Authority
CN
China
Prior art keywords
image
visible light
infrared
fusion
granularity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211054722.6A
Other languages
Chinese (zh)
Inventor
窦浩
伍政华
代泽洋
倪向东
李静
王潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN202211054722.6A priority Critical patent/CN115331112A/en
Publication of CN115331112A publication Critical patent/CN115331112A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Radiation Pyrometers (AREA)
  • Photometry And Measurement Of Optical Pulse Characteristics (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an infrared and visible light image fusion method and system based on multi-granularity tokens, comprising the following steps: S1, acquiring an infrared image and a visible light image and decomposing each of them at no fewer than 2 different scales; S2, extracting multi-granularity token global features by computing the long-range dependencies of the infrared image and the visible light image respectively through no fewer than 2 independent Transformer branches; S3, designing a loss function with preset logic to supervise the training of a preset multi-granularity token fusion model; S4, fusing the infrared image and the visible light image through a multi-granularity token fusion module to obtain a multi-granularity token fusion output image. The invention solves the technical problems of poor fusion-model performance, high complexity, and limited feature extraction and representation.

Description

Infrared and visible light image fusion method and system based on multi-granularity tokens
Technical Field
The invention relates to the field of image fusion, in particular to the technical field of multi-type image fusion.
Background
To describe real-world scenes comprehensively, combining multi-source images acquired by different sensors is key for many applications. Infrared and visible image fusion has therefore been widely used for information acquisition and analysis in fields such as military affairs, public security and smart cities. An infrared sensor captures the thermal radiation emitted by heat sources and thus enhances thermal infrared targets; however, it cannot reproduce detail or texture information in the background, because objects in the background have almost the same thermal signature. A visible light sensor, by contrast, generates an image from reflected light and therefore preserves abundant background texture and detail. Image fusion can thus synthesize a single informative image that combines the advantages of multi-source images, and many fusion methods have been proposed, including traditional methods and deep-learning-based methods. Traditional methods mainly extract features through mathematical transformations and then combine them with a designed fusion strategy; they include multi-scale transformation methods, sparse-representation-based methods, saliency-based methods, subspace-based methods and other hybrid methods. Traditional fusion methods aim to obtain a satisfactory fused image and, to some extent, satisfy certain applications, but they still face bottlenecks.
For example, the prior invention patent application document CN110332934A, "a system and method for robot trajectory tracking under hybrid network fusion", comprises a visual image sensor, a wireless transmitter, a wireless receiver, a visible light source, an optical communication device, an inertial navigation device and a hybrid data fusion unit. The visual image sensor, the wireless transmitter and the visible light source are installed in the robot's movement area; the wireless receiver, the optical communication device and the inertial navigation device are installed on the robot body; and a hybrid data processing unit in the control center resolves and evaluates the trajectory-tracking data. This prior art fuses the heterogeneous multi-source data obtained by multiple sensors, but as can be seen from its specific implementation, its visible light image is only used coarsely and is then repeatedly corrected with data gathered by the other multi-source sensors. Such a single representation ignores the differences between multi-source images and limits the performance of the fusion model, and because the fusion is mainly a combination of various data-processing procedures, the algorithmic complexity of this prior art is high. Furthermore, traditional fusion methods also require elaborately designed fusion strategies. Deep learning (DL) has therefore been introduced to address these tasks.
Deep-learning-based methods have nonlinear fitting capability and can model the complex correlations of the source images. According to the fusion framework, DL-based fusion methods can be classified into CNN-based methods and generative adversarial network (GAN)-based methods. CNN-based methods use parallel convolution kernels to extract multiple features and a carefully designed loss function to reconstruct the fusion result. The GAN framework can also perform the image fusion task by designing an adversarial game to model the distribution of the source images. Typically, Ma et al. first fused infrared and visible images using a GAN and then proposed various GAN-based methods, such as a dual-discriminator GAN and a multi-classification-constrained GAN.
For example, the prior invention patent application publication No. CN114240736A, "method for simultaneously generating and editing arbitrary face attributes based on VAE and cGAN", focuses on an encoder-decoder architecture based on a variational autoencoder (VAE) and a conditional generative adversarial network (cGAN), and develops a bidirectional feedback generation network that simultaneously generates new faces and performs attribute editing. Attribute-classification constraints are applied to the generated images to ensure that the specified attributes change correctly, and face images with multiple attributes are generated by sampling attribute codes from a latent space; attribute strengths are modeled to support attribute interpolation and flexible handling of multiple face attributes. Such CNN- or GAN-based prior art uses convolution operations to extract image features within a small receptive field, and the uniform convolution operation also limits feature extraction and representation. In addition, CNN- or GAN-based methods aim to extract local features through convolution kernels, so these existing schemes cannot extract long-range dependency information. Transformers can therefore be introduced to solve these problems; for example, Li et al. and Vibashan et al. combine Transformer and CNN to extract the local features and long-range dependency information of images.
However, existing Transformer-based methods ignore the difference in attention weights between infrared and visible tokens at the same position, which harms fusion performance because the importance of infrared and visible tokens at the same position differs.
In summary, the prior art suffers from the technical problems of poor fusion-model performance, high complexity, and limited feature extraction and representation.
Disclosure of Invention
The invention aims to solve the technical problems of poor performance, high complexity and limited feature extraction and representation of a fusion model in the prior art.
The invention adopts the following technical scheme to solve the above technical problems: an infrared and visible light image fusion method based on multi-granularity tokens, comprising the following steps:
S1, acquiring an infrared image and a visible light image, and decomposing each of them at no fewer than 2 different scales;
S2, multi-granularity token global feature extraction: computing the long-range dependencies of the infrared image and the visible light image respectively through no fewer than 2 independent Transformer branches, wherein no fewer than 2 Transformer models are designed in each independent Transformer branch to extract comprehensive multi-scale long-range dependencies, and step S2 comprises the following steps:
S21, dividing the infrared image and the visible light image into multi-scale patches;
S22, converting the infrared image and the visible light image into an infrared sequence T_I^s and a visible light sequence T_V^s;
S23, embedding the infrared sequence T_I^s and the visible light sequence T_V^s with a preset linear projection E, and adding encoded position information to each sequence to obtain an encoded infrared sequence Z_I^s and an encoded visible light sequence Z_V^s;
S24, performing an embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s with preset embedding logic using a fully connected layer, to obtain relation extraction parameters;
S25, processing the relation extraction parameters with preset logic using a multi-head self-attention mechanism (MSA) to extract the long-range dependencies of the infrared image and the visible light image, and obtaining multi-head self-attention fusion parameters from those dependencies, wherein the multi-head self-attention fusion parameters comprise the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j;
S3, designing a loss function with preset logic, and supervising the training of a preset multi-granularity token fusion model according to the loss function;
S4, fusing the infrared image and the visible light image through the multi-granularity token fusion module to obtain a multi-granularity token fusion output image, wherein step S4 comprises:
S41, obtaining learnable attention weights with preset weight-definition logic, and capturing the multi-granularity token correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j with preset relation-capture logic;
S42, processing the multi-granularity token correlations and the multi-scale features with preset reconstruction logic to obtain the multi-granularity token fusion output image.
The method captures the correlation of corresponding tokens by introducing learnable attention weights, so that the infrared and visible light images are fused in the multi-granularity token dimension and the differences in multi-modal information between infrared and visible tokens at the same position can be perceived. The invention extracts the long-range dependencies of each image at multiple scales, embeds multi-granularity tokens from locally partitioned sub-images at multiple scales, reconstructs multi-scale features through multi-granularity fusion, and computes the fused image with a dimensionality-reduction mapping operation. The method achieves fusion of infrared and visible light images in the multi-scale token dimension with a pure Transformer and offers better fusion performance than other schemes.
In a more specific embodiment, step S21 comprises: dividing the infrared image at each scale s into N = (H × W)/P_s^2 multi-scale patches of size P_s × P_s, where s is the scale applied to the divided image and s is set to 1, 2 and 3, respectively.
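As a rough illustration of the patch partition in step S21, the following PyTorch sketch splits each scale branch of an image into non-overlapping patches; it is not the patent's code, and the scale factors, the patch size and the helper name split_into_patches are assumptions.

```python
import torch
import torch.nn.functional as F

def split_into_patches(img: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split a (B, C, H, W) image into N = (H*W)/patch_size**2 non-overlapping patches,
    returned as a sequence of shape (B, N, C * patch_size**2)."""
    patches = F.unfold(img, kernel_size=patch_size, stride=patch_size)  # (B, C*P*P, N)
    return patches.transpose(1, 2)

patch_size = 16                    # P_1 = P_2 = P_3, matching the constraint stated in the embodiment
scale_factors = [1.0, 0.5, 0.25]   # three scale branches; the downsampling ratios are an assumption

infrared = torch.rand(1, 1, 256, 256)
visible = torch.rand(1, 1, 256, 256)

for s, ratio in enumerate(scale_factors, start=1):
    ir_s = F.interpolate(infrared, scale_factor=ratio, mode="bilinear", align_corners=False)
    vi_s = F.interpolate(visible, scale_factor=ratio, mode="bilinear", align_corners=False)
    t_i = split_into_patches(ir_s, patch_size)   # infrared sequence T_I^s
    t_v = split_into_patches(vi_s, patch_size)   # visible light sequence T_V^s
    print(f"scale {s}: infrared tokens {tuple(t_i.shape)}, visible tokens {tuple(t_v.shape)}")
```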
In a more specific embodiment, in step S23, the infrared sequence T_I^s and the visible light sequence T_V^s are embedded with the preset linear projection E, and encoded position information is added to each sequence:
Z_I^s = [T_I^(s,1) E; T_I^(s,2) E; …; T_I^(s,N) E] + E_pos, Z_V^s = [T_V^(s,1) E; T_V^(s,2) E; …; T_V^(s,N) E] + E_pos,
where Z_I^s and Z_V^s denote the encoded sequences of the two source images at scale s.
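A minimal sketch of the linear projection E with added position information, in the spirit of a standard ViT embedding; the class name, the embedding dimension and the use of a learnable position parameter are assumptions, since the text only states that the sequences are embedded by E and that encoded position information is added.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Linear projection E plus a learnable position code for one scale branch.

    Maps a patch sequence of shape (B, N, patch_dim) to an encoded sequence Z of
    shape (B, N, embed_dim)."""
    def __init__(self, patch_dim: int, embed_dim: int, num_patches: int):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)                       # projection E
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))   # position information

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.proj(tokens) + self.pos

embed = PatchEmbedding(patch_dim=16 * 16, embed_dim=128, num_patches=256)
t_i = torch.rand(1, 256, 256)     # infrared patch sequence T_I^s at one scale
z_i = embed(t_i)                  # encoded infrared sequence Z_I^s
print(z_i.shape)                  # torch.Size([1, 256, 128])
```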
In a more specific embodiment, in step S24, the fully connected layer is used to perform the embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s:
(Q_I^s, K_I^s, V_I^s) = LN(Z_I^s), (Q_V^s, K_V^s, V_V^s) = LN(Z_V^s),
where Q_I^s, K_I^s, V_I^s and Q_V^s, K_V^s, V_V^s denote the query, key and value of the infrared and visible image sequences, and LN denotes the fully connected layers.
In a more specific technical solution, in step S25, the relation extraction parameters are processed with the following logic to extract the long-range dependencies of the infrared image and the visible light image:
TokenI_s^j = MSA(Q_I^s, K_I^s, V_I^s), TokenV_s^j = MSA(Q_V^s, K_V^s, V_V^s),
where TokenI_s^j and TokenV_s^j are the outputs of the MSA.
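Steps S24 and S25 can be sketched together as one Transformer branch: a fully connected layer produces query, key and value, and multi-head self-attention models the long-range dependencies. The arrangement below (LayerNorm, a single fused Q/K/V linear layer, a residual connection) is an assumption built on torch.nn.MultiheadAttention, not the patent's exact model.

```python
import torch
import torch.nn as nn

class TransformerBranch(nn.Module):
    """One branch at one scale: map the encoded sequence Z to query/key/value with a
    fully connected layer and apply multi-head self-attention (MSA) over all patches."""
    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.to_qkv = nn.Linear(embed_dim, 3 * embed_dim)   # fully connected Q/K/V embedding
        self.msa = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        q, k, v = self.to_qkv(self.norm(z)).chunk(3, dim=-1)
        tokens, _ = self.msa(q, k, v)    # long-range dependencies between all patches
        return tokens + z                # residual connection (an assumption)

branch_ir, branch_vi = TransformerBranch(), TransformerBranch()   # two independent branches
z_i, z_v = torch.rand(1, 256, 128), torch.rand(1, 256, 128)
token_i, token_v = branch_ir(z_i), branch_vi(z_v)                  # infrared and visible tokens
print(token_i.shape, token_v.shape)
```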
The method extends fusion to multi-granularity token-based fusion, so as to extract the multi-scale long-range dependencies of each source image and capture the attention correlation of corresponding tokens at different scales. In addition, token-based fusion can extract local-region features: the tokens embedded from the locally partitioned sub-images contain local-region features, which optimizes the fusion performance of the model and improves how well the fused image features are represented.
In a more specific technical solution, step S3 comprises:
S31, computing the loss between the infrared image and the fused image in the intensity domain: L_int = (1/M)·||f − I||_F^2;
S32, computing the L_1 loss, L_1 = (1/M)·||f − V||_1, and the total loss, L = L_int + λ·L_1, so as to preserve the detail and brightness information of the visible light image,
where L denotes the total loss value, M denotes the total number of pixels, f, I and V denote the fusion result, the infrared image and the visible light image respectively, ||·||_F denotes the matrix Frobenius norm, and λ is the balance parameter.
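A hedged sketch of the loss described above: an intensity term toward the infrared image (Frobenius norm) plus an L_1 term toward the visible image, balanced by λ. The squaring, the normalization by M and the default value of λ are assumptions.

```python
import torch

def fusion_loss(fused: torch.Tensor, infrared: torch.Tensor, visible: torch.Tensor,
                lam: float = 10.0) -> torch.Tensor:
    """Total loss L = L_int + lam * L_1 over M pixels (the value of lam is an assumption).

    L_int pulls the fused image toward the pixel intensities of the infrared image
    (Frobenius norm); L_1 keeps the detail and brightness of the visible image (L1 norm)."""
    m = fused.numel()                                       # total number of pixels M
    l_int = torch.norm(fused - infrared, p="fro") ** 2 / m
    l_1 = torch.norm(fused - visible, p=1) / m
    return l_int + lam * l_1

f = torch.rand(1, 1, 256, 256, requires_grad=True)
i, v = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
loss = fusion_loss(f, i, v)
loss.backward()                # usable directly for supervised training
print(loss.item())
```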
The invention extracts the long-range dependencies of the infrared and visible light images through two independent Transformer branches, and the fusion module searches for the optimal fusion result through the intensity loss on the infrared image and the L_1 loss on the visible light image, retaining more detail and brightness information of the visible light image; it finally realizes the fusion of the infrared and visible light images and optimizes the fusion precision and effect.
In a more specific technical solution, step S41 comprises:
S411, capturing the multi-granularity token correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j with the learnable attention weights, using the following logic:
f_s = R(w_I^s · TokenI_s^j + w_V^s · TokenV_s^j),
where f_s denotes the features computed by token fusion at scale s, R denotes the reshaping operation, and w_I^s and w_V^s denote the learnable attention weights of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j;
S412, balancing the importance of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j at the same position.
In a more specific embodiment, in step S412, the importance of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j at the same position is balanced using the following logic:
w_I^s + w_V^s = 1.
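A small sketch of the learnable attention weights of step S41: one weight per token position and per modality, normalized so that the infrared and visible weights at the same position sum to 1. The softmax normalization and the weight shape are assumptions.

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Fuse corresponding infrared and visible tokens at one scale with learnable
    attention weights, normalized so the two weights at each position sum to 1."""
    def __init__(self, num_patches: int):
        super().__init__()
        # one learnable weight per token position for each modality
        self.w = nn.Parameter(torch.zeros(2, num_patches, 1))

    def forward(self, token_i: torch.Tensor, token_v: torch.Tensor) -> torch.Tensor:
        w_i, w_v = torch.softmax(self.w, dim=0)   # balances importance: w_i + w_v = 1
        return w_i * token_i + w_v * token_v      # per-position weighted token fusion

fuse = TokenFusion(num_patches=256)
token_i, token_v = torch.rand(1, 256, 128), torch.rand(1, 256, 128)
fused_tokens = fuse(token_i, token_v)
print(fused_tokens.shape)    # torch.Size([1, 256, 128])
```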
in a more specific technical solution, in step S42, the following logic is used to process the correlation between multiple-granularity lemmas and the difference scale features to obtain a multi-granularity lemma fusion output image:
f=g(f 1 ,…f s ),
where f denotes the fused image, g denotes the dimensionality reduction mapping operation of the convolutional layer, and the kernel is 1 × 1.
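The reconstruction of step S42 can be sketched as a channel-wise concatenation of the per-scale features followed by the 1 × 1 convolution g; that each per-scale feature map is first reshaped and brought to a common resolution is an assumption.

```python
import torch
import torch.nn as nn
from typing import List

class MultiScaleReconstruction(nn.Module):
    """Stack the per-scale fused feature maps f_1 ... f_s along the channel axis and map
    them to the fused image f with a 1x1 convolution g."""
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.g = nn.Conv2d(num_scales, 1, kernel_size=1)   # dimensionality-reduction mapping g

    def forward(self, feature_maps: List[torch.Tensor]) -> torch.Tensor:
        # each f_s is assumed to have been reshaped and upsampled to (B, 1, H, W) beforehand
        return self.g(torch.cat(feature_maps, dim=1))

recon = MultiScaleReconstruction()
f_scales = [torch.rand(1, 1, 256, 256) for _ in range(3)]   # f_1, f_2, f_3
fused_image = recon(f_scales)
print(fused_image.shape)    # torch.Size([1, 1, 256, 256])
```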
In a more specific technical scheme, the infrared and visible light image fusion system based on multi-granularity tokens comprises:
a different-scale decomposition module, configured to acquire the infrared image and the visible light image and to decompose each of them at no fewer than 2 different scales;
a multi-granularity token global feature extraction module, configured to compute the long-range dependencies of the infrared image and the visible light image respectively through no fewer than 2 independent Transformer branches, wherein no fewer than 2 Transformer models are designed in each independent Transformer branch to extract comprehensive multi-scale long-range dependencies; the multi-granularity token global feature extraction module is connected with the different-scale decomposition module and comprises:
an image segmentation module, configured to divide the infrared image and the visible light image into multi-scale patches;
an image conversion module, configured to convert the infrared image and the visible light image into an infrared sequence T_I^s and a visible light sequence T_V^s, the image conversion module being connected with the image segmentation module;
a linear projection embedding module, configured to embed the infrared sequence T_I^s and the visible light sequence T_V^s with the preset linear projection E and to add encoded position information to each sequence, obtaining the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s, the linear projection embedding module being connected with the image conversion module;
a relation extraction module, configured to perform the embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s with preset embedding logic using the fully connected layer, obtaining the relation extraction parameters, the relation extraction module being connected with the linear projection embedding module;
a multi-head self-attention fusion module, configured to process the relation extraction parameters with preset logic using the multi-head self-attention mechanism MSA, so as to extract the long-range dependencies of the infrared image and the visible light image and obtain the multi-head self-attention fusion parameters, which comprise the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j, the multi-head self-attention fusion module being connected with the relation extraction module;
a token fusion model training module, configured to supervise and train the preset multi-granularity token fusion model with a loss function designed by preset logic, the token fusion model training module being connected with the multi-head self-attention fusion module;
a multi-granularity token fusion output module, configured to fuse the infrared image and the visible light image through the multi-granularity token fusion module to obtain the multi-granularity token fusion output image, the multi-granularity token fusion output module being connected with the token fusion model training module and comprising:
a token correlation module, configured to obtain the learnable attention weights with preset weight-definition logic and to capture the multi-granularity token correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j with preset relation-capture logic;
an image reconstruction module, configured to process the multi-granularity token correlations and the multi-scale features with preset reconstruction logic to obtain the multi-granularity token fusion output image, the image reconstruction module being connected with the token correlation module.
Compared with the prior art, the invention has the following advantages: the method captures the correlation of corresponding tokens by introducing learnable attention weights, so that the infrared and visible light images are fused in the multi-granularity token dimension and the differences in multi-modal information between infrared and visible tokens at the same position can be perceived. The invention extracts the long-range dependencies of each image at multiple scales, embeds multi-granularity tokens from locally partitioned sub-images at multiple scales, reconstructs multi-scale features through multi-granularity fusion, and computes the fused image with a dimensionality-reduction mapping operation. The method achieves fusion of infrared and visible light images in the multi-scale token dimension with a pure Transformer and offers better fusion performance than other schemes.
The method extends fusion to multi-granularity token-based fusion so as to extract the multi-scale long-range dependencies of each source image and capture the attention correlation of corresponding tokens at different scales. In addition, token-based fusion can extract local-region features: the tokens embedded from the locally partitioned sub-images contain local-region features, which optimizes the fusion performance of the model and improves how well the fused image features are represented.
The invention extracts the long-range dependencies of the infrared and visible light images through two independent Transformer branches, and the fusion module searches for the optimal fusion result through the intensity loss on the infrared image and the L_1 loss on the visible light image while retaining more detail and brightness information of the visible light image, finally realizing the fusion of the infrared and visible light images. Meanwhile, the image fusion precision and the fusion effect are optimized. The method solves the technical problems of poor fusion-model performance, high complexity and limited feature extraction and representation in the prior art.
Drawings
Fig. 1 is a schematic diagram of the Transformer-based infrared and visible light image fusion framework in the multi-granularity token-based infrared and visible light image fusion method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the basic steps of the multi-granularity token-based infrared and visible light image fusion method according to embodiment 1 of the present invention;
fig. 3 is a schematic diagram of multi-granularity token global feature extraction in embodiment 1 of the present invention;
FIG. 4 is a diagram illustrating the specific steps of multi-granularity token fusion in embodiment 1 of the present invention;
FIG. 5 is a comparison of the results of the ablation experiment on the learnable attention module in embodiment 2 of the present invention;
FIG. 6 is a comparison chart of the results of the ablation experiment on the multi-granularity token module in embodiment 2 of the present invention;
FIG. 7 is a schematic diagram showing the comparison of the processing effects of the methods on the TNO data set of example 2 of the present invention;
fig. 8a is a schematic diagram of a first index analysis of a fused image obtained by each method on the TNO data set according to embodiment 2 of the present invention;
FIG. 8b is a schematic diagram of a second index analysis of the fused image of each method on the TNO data set according to embodiment 2 of the present invention;
fig. 8c is a schematic diagram of a third index analysis of the fused image obtained by the methods in the TNO data set according to embodiment 2 of the present invention;
fig. 8d is a diagram illustrating a fourth index analysis of an image fused by each method on the TNO data set according to embodiment 2 of the present invention;
fig. 8e is a schematic diagram of a fifth index analysis of the fused image obtained by the methods in the TNO data set according to embodiment 2 of the present invention;
fig. 8f is a schematic diagram of a sixth index analysis of fused images of each method on the TNO data set according to embodiment 2 of the present invention;
fig. 9 is a comparison diagram of the image fusion effects of the methods on night scenes of the Roadscene data set in embodiment 2 of the present invention;
fig. 10 is a comparison diagram of the image fusion effects of the methods on night scenes of the LLVIP data set in embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the embodiments. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art without inventive effort on the basis of the embodiments of the present invention fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the Transformer-based infrared and visible light image fusion framework used by the multi-granularity token-based infrared and visible light image fusion method provided by the present invention extracts the long-range dependencies of the infrared and visible light images through two independent Transformer branches, searches for the optimal fusion result in the fusion module through the intensity loss on the infrared image and the L_1 loss on the visible light image, and finally realizes the fusion of the infrared and visible light images.
In this embodiment, the method for fusing infrared and visible light images based on multi-granularity tokens comprises the following steps:
S1, image multi-scale decomposition: the infrared image and the visible light image are each decomposed at three different scales to obtain three scale branches.
S2, multi-granularity token global feature extraction;
As shown in fig. 3, in this embodiment, step S2 further comprises the following specific steps:
S21, dividing the infrared image and the visible light image into multi-scale patches;
S22, converting the infrared image and the visible light image into an infrared sequence T_I^s and a visible light sequence T_V^s;
S23, embedding the infrared sequence T_I^s and the visible light sequence T_V^s with the preset linear projection E, and adding encoded position information to each sequence to obtain the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s;
S24, performing the embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s with preset embedding logic using the fully connected layer, to obtain the relation extraction parameters;
S25, processing the relation extraction parameters with preset logic using the multi-head self-attention mechanism MSA to extract the long-range dependencies of the infrared image and the visible light image, and obtaining the multi-head self-attention fusion parameters, which comprise the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j.
In this embodiment, the long-range dependencies of the infrared and visible light images are computed by two independent Transformer branches, respectively. In order to extract the multi-scale long-range dependencies comprehensively, 3 Transformer models are designed in each branch. Given an infrared image I and a visible light image V, where I, V ∈ R^(H×W×C) and H, W and C denote the height, width and channel size of the source images, the invention first divides the two source images into multi-scale patches, dividing the infrared image at each scale into N = (H × W)/P_s^2 patches of size P_s × P_s, where s is the scale applied to the segmented image and s is set to 1, 2 and 3, respectively. In addition, in the operation of the present invention, P_s also satisfies P_1 = P_2 = P_3. On this basis, the infrared and visible light images are converted into the sequences T_I^s and T_V^s, which are then embedded by the linear projection E, and encoded position information is incorporated into each sequence, as shown in formula (1):
Z_I^s = [T_I^(s,1) E; T_I^(s,2) E; …; T_I^(s,N) E] + E_pos, Z_V^s = [T_V^(s,1) E; T_V^(s,2) E; …; T_V^(s,N) E] + E_pos, (1)
where Z_I^s and Z_V^s denote the encoded sequences of the two source images at scale s.
In addition, the invention uses fully connected layers to embed Z_I^s and Z_V^s into query, key and value, as shown in formula (2):
(Q_I^s, K_I^s, V_I^s) = LN(Z_I^s), (Q_V^s, K_V^s, V_V^s) = LN(Z_V^s), (2)
where Q_I^s, K_I^s, V_I^s and Q_V^s, K_V^s, V_V^s denote the query, key and value of the infrared and visible image sequences, and LN denotes the fully connected layers. Furthermore, the invention uses a multi-head self-attention mechanism (MSA) to extract the long-range dependencies of the infrared and visible light images, as shown in formula (3):
TokenI_s^j = MSA(Q_I^s, K_I^s, V_I^s), TokenV_s^j = MSA(Q_V^s, K_V^s, V_V^s), (3)
where TokenI_s^j and TokenV_s^j are the outputs of the MSA and are applied to the next fusion stage.
S3, setting the loss function: a loss function is designed to supervise the training of the proposed method so that it models the data distribution of the source images. Since the infrared image is obtained by capturing thermal radiation, the content of the infrared image is characterized by pixel intensity, so the loss between the infrared image and the fused image is computed in the intensity domain as L_int = (1/M)·||f − I||_F^2. Meanwhile, the visible light sensor describes the scene by capturing reflected light; in order to retain more detail and brightness information of the visible light image, an L_1 loss is used to constrain the fused image to have a data distribution similar to that of the visible light image, defined as L_1 = (1/M)·||f − V||_1. The total loss function is shown in formula (4):
L = L_int + λ·L_1, (4)
where L denotes the total loss value, M denotes the total number of pixels, f, I and V denote the fusion result, the infrared image and the visible light image respectively, ||·||_F denotes the matrix Frobenius norm, and λ is designed to balance the two terms.
As shown in fig. 4, step S4 performs multi-granularity token fusion: the infrared image and the visible light image are fused through the multi-granularity token fusion module. In this embodiment, learnable attention weights are introduced to capture the correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j, defined as shown in formula (5):
f_s = R(w_I^s · TokenI_s^j + w_V^s · TokenV_s^j), (5)
where f_s denotes the features computed by token fusion at scale s, R denotes the reshaping operation, and w_I^s and w_V^s denote the learnable attention weights of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j. The invention defines w_I^s + w_V^s = 1 to balance the importance of the infrared and visible tokens at the same position.
The invention reconstructs the fused image from the features of the different scales, as defined in formula (6):
f = g(f_1, …, f_s), (6)
where f denotes the fused image and g denotes the dimensionality-reduction mapping operation of a convolutional layer with a 1 × 1 kernel.
Example 2
The effectiveness of the fusion strategy of the invention is proved by ablation experiments, and qualitative and quantitative comparison and generalization experiments are carried out on three public data sets.
A. Ablation experiment
1) Ablation analysis of the learnable attention weights: the invention introduces learnable attention to estimate the importance of corresponding tokens in the token-based fusion process. A model was therefore trained with the learnable attention weights discarded (no weights) and compared against the full model to verify their validity.
As shown in fig. 5, the learnable attention weights play a key role in token-based fusion; for example, inside the red box of fig. 5, the fusion model with learnable attention weights contains more detail and edge information, which demonstrates the effectiveness of the learnable attention weights.
2) Ablation analysis of multi-granularity token fusion: in this work, the invention fuses the infrared and visible light images by extracting multi-scale long-range dependencies and capturing the attention correlation of corresponding tokens at different granularities. A fusion model trained with the multi-granularity operation removed (no multi-granularity) is therefore compared to verify its effectiveness.
As shown in fig. 6, after the multi-granularity module is introduced the thermal infrared details are richer and the visual effect is better, for example in the region inside the red box of fig. 6. In addition, more background texture information is captured than without multi-granularity tokens, which shows the rationality and necessity of multi-granularity token fusion.
B. Comparative experiment
In the experiments, the method of the invention is evaluated both qualitatively and quantitatively. In the qualitative analysis, the fusion results are evaluated with the human visual system, mainly in terms of brightness, sharpness and contrast. In the quantitative analysis, the methods are evaluated with 6 metrics: mutual information (MI), standard deviation (SD), average gradient (AG), spatial frequency (SF), edge intensity (EI) and peak signal-to-noise ratio (PSNR). MI measures how much information the fused image retains from the source images; SD reflects the contrast of the fused image; AG mainly measures its texture information; SF measures the rate of gray-level change and reflects image sharpness; EI evaluates the edge information of the fused image. For all of these metrics, larger values indicate better fusion performance.
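For reference, common NumPy definitions of several of these metrics (SD, AG, SF and PSNR) are sketched below; MI and EI are omitted for brevity, and the exact formulas used in the experiments may differ from these standard forms.

```python
import numpy as np

def standard_deviation(img: np.ndarray) -> float:
    """SD: reflects the contrast of the fused image."""
    return float(img.std())

def average_gradient(img: np.ndarray) -> float:
    """AG: mean gradient magnitude, a proxy for texture information."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

def spatial_frequency(img: np.ndarray) -> float:
    """SF: row/column frequency, reflecting the sharpness of the image."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def psnr(fused: np.ndarray, reference: np.ndarray, peak: float = 255.0) -> float:
    """PSNR of the fused image against one source image."""
    mse = np.mean((fused.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))

fused = np.random.randint(0, 256, (256, 256)).astype(np.float64)
source = np.random.randint(0, 256, (256, 256)).astype(np.float64)
print(standard_deviation(fused), average_gradient(fused), spatial_frequency(fused), psnr(fused, source))
```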
1) Qualitative analysis: in the comparative experiments, the method of the invention is compared with 8 fusion methods on 35 TNO image pairs.
As shown in fig. 7, the traditional, CNN-based and GAN-based methods cannot maintain clear background texture information, while the Transformer-based methods perform better at preserving background details; the method of the invention preserves more detail and texture information than the method that combines CNN and Transformer (CGTF). For example, the region enlarged in the red box shows that the results of the invention have the clearest edges and details. In addition, compared with the other methods, the invention also preserves more salient thermal infrared information: a region selected from each image and enlarged in the green box shows that the face region in the result of the invention is brighter than in the other methods.
2) Quantitative analysis: in the quantitative experiment, 35 TNO image pairs are selected for comparison, and the six metrics above are used to evaluate the methods objectively. As can be seen from Table 1, the method of the invention performs well on all 6 metrics.
As shown in fig. 8, which gives more detail of the quantitative analysis on the 35 image pairs, the largest MI value indicates that the method retains rich source-image information, and the largest SD value indicates higher contrast than the other methods. In addition, the values of the model on AG, SF, EI and PSNR are also the largest, which indicates that the method retains more texture and detail information with less noise.
Table 1: indicators of different scenarios on the TNO dataset
Figure BDA0003825080530000121
C. Generalization experiment
In order to verify the generalization ability of the proposed model, in addition to the TNO data set, 100 infrared and visible light image pairs are selected from each of the Roadscene and LLVIP data sets for qualitative and quantitative experiments.
As shown in fig. 9, the qualitative experiments on the Roadscene data set show that the method of the invention not only retains more background texture in night scenes but also produces more salient thermal infrared characteristics and details; in daytime scenes it retains more thermal infrared characteristics and details, and the night-scene comparison results are shown in fig. 9. The quantitative experiments in Table 2 show that the results of the invention are the largest on MI, SD, AG, EI and SF and still acceptable on PSNR.
Table 2: indicators of different schemes on the Roadscene dataset
As shown in fig. 10, the qualitative experiments on the LLVIP data set show that the method of the invention contains more details and higher contrast in night scenes and provides clearer contours and edges than the other methods; in daytime scenes it preserves a certain amount of texture together with prominent infrared targets, and the night-scene comparison results are shown in fig. 10. Meanwhile, the quantitative experiments in Table 3 show that the results of the invention are the largest on MI, SD, AG, EI, SF and PSNR, which demonstrates the superiority of the method.
Table 3: indicators of different scenarios on LLVIP dataset
Figure BDA0003825080530000132
D. Efficiency comparison
In this work, the invention also compares efficiency by reporting the average running time of each method on the three data sets. The traditional methods are implemented on a CPU and the other methods on a GPU. As can be seen from Table 4, the traditional MSVD and wavelet-based methods are less time-consuming than most DL-based methods, and the Transformer-based methods are more time-consuming than the CNN- and GAN-based methods.
Table 4: average run time of different methods on three datasets
Taking the above experiments together, the ablation experiments show that introducing the learnable attention weights and the multi-granularity token fusion module is reasonable and necessary. The comparison and generalization experiments show that the proposed method has clear advantages both quantitatively and qualitatively. In computational efficiency the method has a certain advantage over CGTF, which is likewise based on a Transformer framework, but it remains far slower than the CNN- and GAN-based methods; given its excellent fusion results, the method nevertheless has broad application prospects.
The method captures the correlation of corresponding tokens by introducing learnable attention weights, so that the infrared and visible light images are fused in the multi-granularity token dimension and the differences in multi-modal information between infrared and visible tokens at the same position can be perceived. The invention extracts the long-range dependencies of each image at multiple scales, embeds multi-granularity tokens from locally partitioned sub-images at multiple scales, reconstructs multi-scale features through multi-granularity fusion, and computes the fused image with a dimensionality-reduction mapping operation. The method achieves fusion of infrared and visible light images in the multi-scale token dimension with a pure Transformer and offers better fusion performance than other schemes.
The method extends fusion to multi-granularity token-based fusion so as to extract the multi-scale long-range dependencies of each source image and capture the attention correlation of corresponding tokens at different scales. In addition, token-based fusion can extract local-region features: the tokens embedded from the locally partitioned sub-images contain local-region features, which optimizes the fusion performance of the model and improves how well the fused image features are represented.
The invention extracts the long-range dependencies of the infrared and visible light images through two independent Transformer branches, finds the optimal fusion result in the fusion module through the intensity loss on the infrared image and the L_1 loss on the visible light image, and retains more detail and brightness information of the visible light image, finally realizing the fusion of the infrared and visible light images.
Meanwhile, the image fusion precision and the fusion effect are optimized. The method solves the technical problems of poor fusion-model performance, high complexity and limited feature extraction and representation in the prior art.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An infrared and visible light image fusion method based on multi-granularity tokens, characterized by comprising the following steps:
S1, acquiring an infrared image and a visible light image, and decomposing each of them at no fewer than 2 different scales;
S2, multi-granularity token global feature extraction: computing the long-range dependencies of the infrared image and the visible light image respectively through no fewer than 2 independent Transformer branches, wherein no fewer than 2 Transformer models are designed in each independent Transformer branch to extract comprehensive multi-scale long-range dependencies, and step S2 comprises:
S21, dividing the infrared image and the visible light image into multi-scale patches;
S22, converting the infrared image and the visible light image into an infrared sequence T_I^s and a visible light sequence T_V^s;
S23, embedding the infrared sequence T_I^s and the visible light sequence T_V^s with a preset linear projection E, and adding encoded position information to each sequence to obtain an encoded infrared sequence Z_I^s and an encoded visible light sequence Z_V^s;
S24, performing an embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s with preset embedding logic using a fully connected layer, to obtain relation extraction parameters;
S25, processing the relation extraction parameters with preset logic using a multi-head self-attention mechanism MSA to extract the long-range dependencies of the infrared image and the visible light image, so as to obtain multi-head self-attention fusion parameters, wherein the multi-head self-attention fusion parameters comprise the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j;
S3, designing a loss function with preset logic, and supervising the training of a preset multi-granularity token fusion model according to the loss function;
S4, fusing the infrared image and the visible light image through a multi-granularity token fusion module to obtain a multi-granularity token fusion output image, wherein step S4 comprises:
S41, obtaining learnable attention weights with preset weight-definition logic, and capturing the multi-granularity token correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j with preset relation-capture logic;
S42, processing the multi-granularity token correlations and the multi-scale features with preset reconstruction logic to obtain the multi-granularity token fusion output image.
2. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that step S21 comprises: dividing the infrared image at each scale s into N = (H × W)/P_s^2 multi-scale patches of size P_s × P_s, where s is the scale applied to the divided image and s is set to 1, 2 and 3, respectively.
3. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that in step S23, the infrared sequence T_I^s and the visible light sequence T_V^s are embedded with the preset linear projection E, and encoded position information is added to each sequence:
Z_I^s = [T_I^(s,1) E; T_I^(s,2) E; …; T_I^(s,N) E] + E_pos, Z_V^s = [T_V^(s,1) E; T_V^(s,2) E; …; T_V^(s,N) E] + E_pos,
where Z_I^s and Z_V^s denote the encoded sequences of the two source images at scale s.
4. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that in step S24, the fully connected layer is used to perform the embedding operation on the encoded infrared sequence Z_I^s and the encoded visible light sequence Z_V^s:
(Q_I^s, K_I^s, V_I^s) = LN(Z_I^s), (Q_V^s, K_V^s, V_V^s) = LN(Z_V^s),
where Q_I^s, K_I^s, V_I^s and Q_V^s, K_V^s, V_V^s denote the query, key and value of the infrared and visible image sequences, and LN denotes the fully connected layers.
5. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that in step S25, the relation extraction parameters are processed with the following logic to extract the long-range dependencies of the infrared image and the visible light image:
TokenI_s^j = MSA(Q_I^s, K_I^s, V_I^s), TokenV_s^j = MSA(Q_V^s, K_V^s, V_V^s),
where TokenI_s^j and TokenV_s^j are the outputs of the MSA.
6. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that step S3 comprises:
S31, computing the loss between the infrared image and the fused image in the intensity domain: L_int = (1/M)·||f − I||_F^2;
S32, computing the L_1 loss, L_1 = (1/M)·||f − V||_1, and the total loss, L = L_int + λ·L_1, so as to retain the detail and brightness information of the visible light image,
where L denotes the total loss value, M denotes the total number of pixels, f, I and V denote the fusion result, the infrared image and the visible light image respectively, ||·||_F denotes the matrix Frobenius norm, and λ is the balance parameter.
7. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that step S41 comprises:
S411, capturing the multi-granularity token correlation between the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j with the learnable attention weights:
f_s = R(w_I^s · TokenI_s^j + w_V^s · TokenV_s^j),
where f_s denotes the features computed by token fusion at scale s, R denotes the reshaping operation, and w_I^s and w_V^s denote the learnable attention weights of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j;
S412, balancing the importance of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j at the same position.
8. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that in step S412, the importance of the infrared tokens TokenI_s^j and the visible light tokens TokenV_s^j at the same position is balanced using the following logic:
w_I^s + w_V^s = 1.
9. The infrared and visible light image fusion method based on multi-granularity tokens according to claim 1, characterized in that in step S42, the multi-granularity token correlations and the multi-scale features are processed with the following logic to obtain the multi-granularity token fusion output image:
f = g(f_1, …, f_s),
where f denotes the fused image and g denotes the dimensionality-reduction mapping operation of a convolutional layer with a 1 × 1 kernel.
10. An infrared and visible light image fusion system based on multi-granularity tokens, characterized by comprising:
a difference-scale decomposition module, configured to acquire an infrared image and a visible light image and to decompose the infrared image and the visible light image at not less than 2 difference scales respectively;
a multi-granularity token global feature extraction module, configured to calculate the long-range dependency relationships of the infrared image and the visible light image respectively through not less than 2 independent Transformer branches, wherein not less than 2 Transformer models are designed in each independent branch to extract comprehensive multi-scale long-range dependency relationships, the multi-granularity token global feature extraction module being connected to the difference-scale decomposition module and comprising:
an image segmentation module, configured to split the infrared image and the visible light image into multi-scale sub-region patches;
an image conversion module, configured to convert the infrared image and the visible light image into an infrared sequence [Figure FDA0003825080520000041] and a visible light sequence [Figure FDA0003825080520000042], the image conversion module being connected to the image segmentation module;
a linear projection embedding module, configured to embed the infrared sequence [Figure FDA0003825080520000043] and the visible light sequence [Figure FDA0003825080520000044] by means of a preset linear projection E and to add encoded position information to each sequence, so as to obtain an encoded infrared sequence [Figure FDA0003825080520000045] and an encoded visible light sequence [Figure FDA0003825080520000046], the linear projection embedding module being connected to the image conversion module (a hedged sketch of this tokenization path is given after claim 10);
a relation extraction module, configured to perform an embedding operation on the encoded infrared sequence [Figure FDA0003825080520000047] and the encoded visible light sequence [Figure FDA0003825080520000048] with preset embedding logic using a fully connected layer, so as to obtain the relationship extraction parameters, the relation extraction module being connected to the linear projection embedding module;
a multi-head self-attention fusion module, configured to process the relationship extraction parameters with preset logic using a multi-head self-attention mechanism (MSA), so as to extract the long-range dependency relationship from the infrared image and the visible light image and obtain multi-head self-attention fusion parameters, the multi-head self-attention fusion parameters comprising the infrared token [Figure FDA0003825080520000049] and the visible light token TokenV_j^s, the multi-head self-attention fusion module being connected to the relation extraction module;
a token fusion model training module, configured to train, under the supervision of a loss function designed with preset logic, a preset multi-granularity token fusion model, the token fusion model training module being connected to the multi-head self-attention fusion module;
a multi-granularity token fusion output module, configured to fuse the infrared image and the visible light image through a multi-granularity token fusion module so as to obtain a multi-granularity token fusion output image, the multi-granularity token fusion output module being connected to the token fusion model training module and comprising:
a token correlation module, configured to obtain learnable attention weights by preset weight-definition logic and to capture, with preset relation-capture logic, the multi-granularity token correlation between the infrared token [Figure FDA00038250805200000410] and the visible light token TokenV_j^s;
and an image reconstruction module, configured to process the multi-granularity token correlations and the difference-scale features with preset reconstruction logic so as to obtain the multi-granularity token fusion output image, the image reconstruction module being connected to the token correlation module.
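For the image segmentation, image conversion and linear projection embedding modules of claim 10 (the tokenization path referenced above), a hedged sketch of splitting an image into patches, projecting them with a linear map E and adding positional encodings follows; the class name, patch size and embedding dimension are assumptions, not values from the filing.

# Hedged sketch of the tokenization path of claim 10: patch splitting, flattening to a
# sequence, linear projection E and added positional encoding (ViT-style; dims assumed).
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size: int = 224, patch_size: int = 16,
                 in_channels: int = 1, embed_dim: int = 256):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to splitting into patches and applying E.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))  # learned positions

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.proj(img)                    # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)      # (B, num_patches, embed_dim): the token sequence
        return x + self.pos_embed             # encoded sequence with position information

# One such block per modality and per scale (different patch sizes) would feed the
# Transformer branches of the multi-granularity token global feature extraction module.
embed = PatchEmbedding()
tokens_ir = embed(torch.randn(1, 1, 224, 224))   # encoded infrared sequence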
CN202211054722.6A 2022-08-30 2022-08-30 Infrared and visible light image fusion method and system based on multi-granularity word elements Pending CN115331112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211054722.6A CN115331112A (en) 2022-08-30 2022-08-30 Infrared and visible light image fusion method and system based on multi-granularity word elements

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211054722.6A CN115331112A (en) 2022-08-30 2022-08-30 Infrared and visible light image fusion method and system based on multi-granularity word elements

Publications (1)

Publication Number Publication Date
CN115331112A true CN115331112A (en) 2022-11-11

Family

ID=83928840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211054722.6A Pending CN115331112A (en) 2022-08-30 2022-08-30 Infrared and visible light image fusion method and system based on multi-granularity word elements

Country Status (1)

Country Link
CN (1) CN115331112A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523969A (en) * 2023-06-28 2023-08-01 云南联合视觉科技有限公司 MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method
CN116523969B (en) * 2023-06-28 2023-10-03 云南联合视觉科技有限公司 MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method

Similar Documents

Publication Publication Date Title
Tang et al. YDTR: Infrared and visible image fusion via Y-shape dynamic transformer
CN110097528B (en) Image fusion method based on joint convolution self-coding network
CN112347859A (en) Optical remote sensing image saliency target detection method
CN111126202A (en) Optical remote sensing image target detection method based on void feature pyramid network
Komorowski et al. Minkloc++: lidar and monocular image fusion for place recognition
CN115713679A (en) Target detection method based on multi-source information fusion, thermal infrared and three-dimensional depth map
CN113762201A (en) Mask detection method based on yolov4
CN113870160B (en) Point cloud data processing method based on transformer neural network
CN113792641A (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
CN115188066A (en) Moving target detection system and method based on cooperative attention and multi-scale fusion
CN117830788B (en) Image target detection method for multi-source information fusion
CN115331112A (en) Infrared and visible light image fusion method and system based on multi-granularity word elements
CN117238034A (en) Human body posture estimation method based on space-time transducer
CN115393404A (en) Double-light image registration method, device and equipment and storage medium
Yuan et al. STransUNet: A siamese TransUNet-based remote sensing image change detection network
Wang et al. PACCDU: Pyramid attention cross-convolutional dual UNet for infrared and visible image fusion
Baoyuan et al. Research on object detection method based on FF-YOLO for complex scenes
CN114639002A (en) Infrared and visible light image fusion method based on multi-mode characteristics
Xu et al. JCa2Co: A joint cascade convolution coding network based on fuzzy regional characteristics for infrared and visible image fusion
Li et al. TFIV: Multi-grained Token Fusion for Infrared and Visible Image via Transformer
CN116778346A (en) Pipeline identification method and system based on improved self-attention mechanism
CN115984714A (en) Cloud detection method based on double-branch network model
CN115393735A (en) Remote sensing image building extraction method based on improved U-Net
Zhu et al. PD-SegNet: Semantic Segmentation of Small Agricultural Targets in Complex Environments
CN113449611B (en) Helmet recognition intelligent monitoring system based on YOLO network compression algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination