CN116757986A - Infrared and visible light image fusion method and device - Google Patents

Infrared and visible light image fusion method and device

Info

Publication number
CN116757986A
CN116757986A
Authority
CN
China
Prior art keywords
infrared
visible light
image
feature
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310817924.XA
Other languages
Chinese (zh)
Inventor
陆成
刘雪明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202310817924.XA
Publication of CN116757986A
Legal status: Pending

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06T2207/10048 Infrared image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging


Abstract

The invention discloses an infrared and visible light image fusion method and device, comprising the following steps: acquiring an infrared image and a visible light image of a target object; carrying out graying and data-enhancement preprocessing on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image; and inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on feature interaction and a self-encoder to obtain a fused image. The infrared and visible light image fusion model based on feature interaction and a self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder. The advantages are that the method retains the salient thermal radiation information, rich texture detail information and background features of the source images, the fused image has high contrast, and the result better matches human visual perception.

Description

Infrared and visible light image fusion method and device
Technical Field
The invention relates to an infrared and visible light image fusion method and device, and belongs to the technical field of image processing.
Background
Because of the imaging characteristics of the infrared sensor, infrared images have strong anti-interference capability, but their structural textures are unclear, their spatial resolution is low, and they do not match human visual perception; in contrast, the vision sensor captures light reflected from objects and forms visible light images with rich texture details and higher resolution, but it is strongly affected by illumination conditions, and in low-illumination environments in particular, important targets cannot be accurately identified. Fusing the visible light image and the infrared image therefore yields an image with rich texture details and obvious contrast, overcoming the limitation of single-sensor imaging. The technique has wide application prospects in fields such as target identification, video monitoring, military applications and medical diagnosis.
The existing infrared and visible light image fusion technologies are mainly divided into traditional fusion methods and deep learning fusion methods. Traditional fusion methods mainly include multi-scale transformation, sparse representation, subspace, saliency-based and hybrid-model approaches. These methods are only suitable for fusion between specific modalities, have limited ability to mine complex image features, generally require manually designed fusion strategies, are computationally inefficient, and produce fused images with poor visual quality.
In recent years, deep learning has developed rapidly in the field of computer vision, and deep-learning-based methods have achieved good results in image fusion. However, conventional deep learning methods generally use convolutions to extract the individual features of each image, without fully considering the complementary features and information redundancy between different modalities, which leads to problems such as loss of background detail and inconspicuous thermal radiation targets. In addition, the fusion strategies of these methods usually rely on feature-map addition, averaging or concatenation, which makes it difficult to fuse semantic features adaptively, limits the extraction of fine-grained image information, causes unclear edges and loss of local detail, and degrades fusion performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an infrared and visible light image fusion method and device.
In order to solve the technical problems, the invention provides an infrared and visible light image fusion method, which comprises the following steps:
acquiring an infrared image and a visible light image of a target object;
carrying out gray-scale and data-enhanced preprocessing operation on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image;
inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on feature interaction and a self-encoder to obtain a fusion image;
the infrared and visible light image fusion model based on feature interaction and a self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder;
the encoder is used for extracting the infrared characteristics, the visible light characteristics and the complementary characteristics of the preprocessed infrared images and the preprocessed visible light images;
the stepwise fusion layer is used for obtaining information-quantity weight maps of the extracted infrared features, visible light features and complementary features through a sigmoid activation function, and for adaptively fusing the infrared features, the visible light features and the complementary features through the corresponding weight coefficients to obtain the fused image features;
the one-path cascade decoder is used for reconstructing the fused image from the fused image features through four residual blocks.
Further, the encoder includes an infrared path, a visible path, and an interactive compensation path;
the infrared path and the visible light path comprise a first convolution group, a second convolution group, a first gradient residual block, a second gradient residual block, a third gradient residual block and a fourth gradient residual block;
the first convolution group comprises four convolution blocks connected in sequence, denoted Conv1, Conv2, Conv3 and Conv4; the input of Conv1 is the visible light image I_v, and Conv4 outputs shallow visible light image features, which pass through the first gradient residual block and the second gradient residual block in sequence to output the visible light image features;
the second convolution group comprises four convolution blocks connected in sequence, denoted Conv5, Conv6, Conv7 and Conv8; the input of Conv5 is the infrared image I_i, and Conv8 outputs shallow infrared image features, which pass through the third gradient residual block and the fourth gradient residual block in sequence to output the infrared image features;
the interaction compensation path comprises four interaction characteristic perfecting modules and three cross-level characteristic aggregation modules, which are respectively expressed as: the system comprises a first interactive feature perfecting module, a second interactive feature perfecting module, a third interactive feature perfecting module, a fourth interactive feature perfecting module, a first cross-level feature aggregation module, a second cross-level feature aggregation module and a third cross-level feature aggregation module;
the first interactive feature perfecting module, the second interactive feature perfecting module, the first cross-level feature aggregation module, the third interactive feature perfecting module, the second cross-level feature aggregation module, the fourth interactive feature perfecting module and the third cross-level feature aggregation module are connected in sequence; the output of the first interactive feature perfecting module is also connected with the first cross-level feature aggregation module, the output of the first cross-level feature aggregation module is also connected with the second cross-level feature aggregation module, and the output of the second cross-level feature aggregation module is also connected with the third cross-level feature aggregation module;
the input of the first interactive feature perfecting module is respectively connected with the output of Conv2 and Conv6, the input of the second interactive feature perfecting module is respectively connected with the output of Conv4 and Conv8, the input of the third interactive feature perfecting module is respectively connected with the first gradient residual block and the third gradient residual block, and the input of the fourth interactive feature perfecting module is respectively connected with the second gradient residual block and the fourth gradient residual block.
Further, Conv1 and Conv3 in the first convolution group and Conv5 and Conv7 in the second convolution group have input channel numbers of 1, 16, 1, 16 and output channel numbers of 1, 16, 1, 16 respectively; each of these convolution blocks comprises a convolution with a 1×1 kernel, BN and a LeakyReLU activation function, and the input features pass through the 1×1 convolution, BN and LeakyReLU operations in sequence before being output;
Conv2 and Conv4 in the first convolution group and Conv6 and Conv8 in the second convolution group have input channel numbers of 1, 16, 1, 16 and output channel numbers of 16, 16, 16, 16 respectively; each of these convolution blocks comprises a convolution with a 3×3 kernel, BN and a LeakyReLU activation function, and the input features pass through the 3×3 convolution, BN and LeakyReLU operations before being output;
the first, second, third and fourth gradient residual blocks each comprise a main path and a residual path; the numbers of input and output channels of the first gradient residual block and the third gradient residual block are 16 and 32 respectively, and the numbers of input and output channels of the second gradient residual block and the fourth gradient residual block are 32 and 64 respectively; the main path comprises a convolution with a 1×1 kernel, BN, a LeakyReLU activation function and a convolution with a 3×3 kernel, and the residual path comprises a Scharr gradient operator and DSConv; first, the input features pass through the 1×1 convolution, BN, the LeakyReLU activation function, the 3×3 convolution and BN in sequence to obtain the learnable convolution features; second, the input features pass through the Scharr gradient operator and the DSConv operation in sequence to obtain the gradient amplitude information features; the two sets of features are added and passed through a LeakyReLU to obtain the output features.
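For reference, the following is a minimal PyTorch-style sketch of one way the gradient residual block described above could be realized; the class names (ScharrGradient, DSConv, GradientResidualBlock), the LeakyReLU slope and the padding choices are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScharrGradient(nn.Module):
    """Fixed depthwise Scharr operator returning a gradient-magnitude map."""
    def __init__(self, channels):
        super().__init__()
        kx = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]])
        ky = kx.t()
        # one (kx, ky) pair per channel, applied depthwise, weights frozen
        self.register_buffer("weight_x", kx.expand(channels, 1, 3, 3).clone())
        self.register_buffer("weight_y", ky.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        gx = F.conv2d(x, self.weight_x, padding=1, groups=self.channels)
        gy = F.conv2d(x, self.weight_y, padding=1, groups=self.channels)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

class DSConv(nn.Module):
    """Depthwise separable convolution (DSConv)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class GradientResidualBlock(nn.Module):
    """Main path: 1x1 conv -> BN -> LeakyReLU -> 3x3 conv -> BN.
    Residual path: Scharr gradient operator -> DSConv.
    The two outputs are summed and passed through LeakyReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.grad = ScharrGradient(in_ch)
        self.res = DSConv(in_ch, out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.main(x) + self.res(self.grad(x)))

# e.g. the first/third gradient residual blocks map 16 -> 32 channels
blk = GradientResidualBlock(16, 32)
out = blk(torch.randn(1, 16, 128, 128))   # -> (1, 32, 128, 128)
```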
Further, the interactive feature perfecting module comprises two stages of channel attention feature correction and spatial attention feature aggregation, wherein the channel attention feature correction comprises element-by-element multiplication operation, splicing operation, MLP, average pooling operation, maximum pooling operation, sigmoid operation and element-by-element addition operation, and the spatial attention feature aggregation comprises element-by-element multiplication operation, splicing operation, average pooling operation, maximum pooling operation and element-by-element addition operation;
in the channel attention feature correction stage, the input infrared feature F_ir and visible light feature F_vis are first made to interact by element-wise multiplication, and the result is concatenated with F_ir and F_vis to obtain the feature F_c; the feature F_c passes through global max pooling, global average pooling and MLP operations in the channel dimension, the two output features are added element by element and passed through a sigmoid activation function to obtain the channel-calibration weighting coefficients, and the weighting coefficients are finally multiplied with the concatenated feature F_c to obtain the channel correction attention map F_c^m, expressed as:
F_c^m = sigmoid(MLP(avg(F_c)) + MLP(max(F_c))) ⊗ F_c,  with F_c = concat(F_ir ⊗ F_vis, F_ir, F_vis)
wherein concat(·) denotes the concatenation operation, sigmoid(·) denotes the activation function, avg(·) denotes global average pooling, max(·) denotes global max pooling, and ⊗ denotes element-wise multiplication;
in the spatial attention feature aggregation stage, max pooling and average pooling in the spatial dimension are applied and the pooled maps are concatenated; the concatenated maps are multiplied with the visible light feature and the infrared feature respectively, giving F_vis^s, the visible light feature after spatial attention aggregation, and F_ir^s, the infrared feature after spatial attention aggregation;
the two spatial attention maps, the channel correction attention map and the initial input features are added to obtain the cross-modal feature perfection attention map F_if, expressed as:
F_if = F_vis^s + F_ir^s + F_c^m + F_vis + F_ir
complementary information of the different modalities is thus captured through channel-wise feature correction and spatial feature aggregation, yielding the cross-modal feature perfection attention map.
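A sketch of the two-stage interactive feature perfecting module under stated assumptions: the patent does not spell out the MLP shape or how the concatenated channel-corrected feature and the pooled spatial maps are brought back to the per-modality channel count, so 1×1 convolutions are assumed for both; all names are illustrative.

```python
import torch
import torch.nn as nn

class InteractiveFeatureRefinement(nn.Module):
    """Channel attention feature correction followed by spatial attention feature
    aggregation, with assumed 1x1 reductions where the patent leaves details open."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        cat_ch = 3 * channels                            # concat(ir*vis, ir, vis) -- assumed layout
        self.mlp = nn.Sequential(
            nn.Linear(cat_ch, cat_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(cat_ch // reduction, cat_ch))
        self.reduce_c = nn.Conv2d(cat_ch, channels, 1)   # assumption: bring F_c^m back to C channels
        self.reduce_s = nn.Conv2d(4, 1, 1)               # assumption: 4 pooled maps -> 1 spatial map

    def forward(self, f_ir, f_vis):
        b, c, h, w = f_ir.shape
        # ----- channel attention feature correction -----
        f_c = torch.cat([f_ir * f_vis, f_ir, f_vis], dim=1)
        avg = torch.mean(f_c, dim=(2, 3))                # global average pooling
        mx = torch.amax(f_c, dim=(2, 3))                 # global max pooling
        w_ch = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, -1, 1, 1)
        f_cm = self.reduce_c(w_ch * f_c)                 # channel correction attention map
        # ----- spatial attention feature aggregation -----
        pools = torch.cat([f_ir.amax(1, keepdim=True), f_ir.mean(1, keepdim=True),
                           f_vis.amax(1, keepdim=True), f_vis.mean(1, keepdim=True)], dim=1)
        s = self.reduce_s(pools)                         # shared spatial map
        f_ir_s, f_vis_s = s * f_ir, s * f_vis
        # ----- fuse channel and spatial attention with the inputs -----
        return f_ir_s + f_vis_s + f_cm + f_ir + f_vis    # cross-modal refined feature

ifr = InteractiveFeatureRefinement(16)
out = ifr(torch.randn(1, 16, 64, 64), torch.randn(1, 16, 64, 64))  # -> (1, 16, 64, 64)
```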
Further, the cross-level feature aggregation module comprises two convolutions with 1×1 kernels, a concatenation operation, an element-by-element multiplication operation and an element-by-element addition operation; features of two different scales are processed by the 1×1 convolutions to obtain a correlation feature map, the correlation feature map is multiplied with the input features and the results are added element by element, finally obtaining the complementary feature F_3; one convolution is used to adjust the number of input channels and the other convolution is used to obtain the correlation feature map.
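A possible sketch of the cross-level feature aggregation module; the handling of mismatched spatial sizes between adjacent levels is an added assumption, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelFeatureAggregation(nn.Module):
    """One 1x1 conv adjusts the channel count of the lower-level feature, the other
    1x1 conv produces a correlation feature map from the concatenated features; the
    correlation map is multiplied with both inputs and the results are summed."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.adjust = nn.Conv2d(low_ch, high_ch, 1)       # adjust input channel number
        self.corr = nn.Conv2d(2 * high_ch, high_ch, 1)    # produce correlation feature map

    def forward(self, f_low, f_high):
        # assumption: match spatial sizes if the two levels differ
        if f_low.shape[-2:] != f_high.shape[-2:]:
            f_low = F.adaptive_avg_pool2d(f_low, f_high.shape[-2:])
        f_low = self.adjust(f_low)
        corr = self.corr(torch.cat([f_low, f_high], dim=1))
        return corr * f_low + corr * f_high               # complementary feature F_3

cfa = CrossLevelFeatureAggregation(16, 32)
f3 = cfa(torch.randn(1, 16, 128, 128), torch.randn(1, 32, 128, 128))  # -> (1, 32, 128, 128)
```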
Further, the stepwise fusion layer comprises a sigmoid operation, an element-wise multiplication operation and an element-wise addition operation;
the inputs of the stepwise fusion layer are the infrared path feature, the visible light path feature and the complementary feature; the infrared path feature and the visible light path feature are first passed through the sigmoid operation to obtain weight maps, and the weight coefficients ω_1 of the infrared path and ω_2 of the visible light path are then obtained from these weight maps, expressed as:
ω_1 = sigmoid(F_1) / (sigmoid(F_1) + sigmoid(F_2) + λ),  ω_2 = sigmoid(F_2) / (sigmoid(F_1) + sigmoid(F_2) + λ)
wherein λ is a hyperparameter that prevents the denominator from being 0, F_1 denotes the infrared feature and F_2 denotes the visible light feature;
the weight coefficients are multiplied with the corresponding path features to obtain the pre-fusion feature F'_out, expressed as:
F'_out = ω_1 ⊗ F_1 + ω_2 ⊗ F_2
wherein ⊗ denotes element-wise multiplication;
the complementary feature F_3 output by the cross-level feature aggregation module and the pre-fusion feature F'_out are fused to obtain the fused image features.
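A sketch of the stepwise fusion rule implied by the description above; the exact normalization of the sigmoid weight maps is an assumption consistent with λ preventing a zero denominator, and the second step reuses the same rule for the complementary feature F_3, as stated later in S31 of the detailed description.

```python
import torch

def stepwise_fusion(f_ir, f_vis, f_comp, lam=1e-8):
    """Sigmoid weight maps for the infrared and visible paths, normalised into
    weight coefficients, then pre-fused and combined with the complementary
    feature F_3 using the same (assumed) weighting scheme."""
    w_ir = torch.sigmoid(f_ir)
    w_vis = torch.sigmoid(f_vis)
    denom = w_ir + w_vis + lam                               # lam keeps the denominator away from 0
    pre = (w_ir / denom) * f_ir + (w_vis / denom) * f_vis    # pre-fusion feature F'_out
    # second step: fuse the complementary feature with the pre-fused feature
    w_pre = torch.sigmoid(pre)
    w_comp = torch.sigmoid(f_comp)
    denom2 = w_pre + w_comp + lam
    return (w_pre / denom2) * pre + (w_comp / denom2) * f_comp

fused = stepwise_fusion(torch.randn(1, 64, 128, 128),
                        torch.randn(1, 64, 128, 128),
                        torch.randn(1, 64, 128, 128))        # -> (1, 64, 128, 128)
```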
Further, the one-path cascade decoder comprises four residual modules of identical structure connected in sequence; the numbers of input channels of the four residual modules are 128, 64, 32, 16 in sequence, and the numbers of output channels are 64, 32, 16, 1 in sequence.
The residual module comprises a main path and a residual path, wherein the main path comprises a convolution with a 1×1 kernel, BN, a LeakyReLU activation function and a convolution with a 3×3 kernel, and the residual path comprises DSConv;
firstly, the input features pass through the 1×1 convolution, BN, LeakyReLU, 3×3 convolution and BN operations in sequence to obtain the learnable convolution features; secondly, the input features pass through the DSConv operation to obtain the residual-path features; the learnable convolution features and the residual-path features are added and passed through the LeakyReLU operation to obtain the output features;
the fused image features are input to the first-layer residual module, and the fourth-layer residual module outputs the fused image.
Further, the activation function of the fourth-layer residual module is a hyperbolic tangent function.
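A sketch of the one-path cascade decoder with the stated channel counts; module names and the LeakyReLU slope are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecoderResidualBlock(nn.Module):
    """Decoder residual module: 1x1 conv -> BN -> LeakyReLU -> 3x3 conv -> BN on the
    main path, a depthwise separable convolution on the residual path, an element-wise
    sum, and a final activation (LeakyReLU, or Tanh for the last block)."""
    def __init__(self, in_ch, out_ch, last=False):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.res = nn.Sequential(                        # DSConv: depthwise + pointwise
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.Conv2d(in_ch, out_ch, 1))
        self.act = nn.Tanh() if last else nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.main(x) + self.res(x))

# one-path cascade decoder: 128 -> 64 -> 32 -> 16 -> 1 channels
decoder = nn.Sequential(
    DecoderResidualBlock(128, 64),
    DecoderResidualBlock(64, 32),
    DecoderResidualBlock(32, 16),
    DecoderResidualBlock(16, 1, last=True))
fused_image = decoder(torch.randn(1, 128, 128, 128))    # -> (1, 1, 128, 128)
```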
Further, the training of the infrared and visible light image fusion model based on the characteristic interaction and the self-encoder comprises the following steps:
selecting 32 pairs of images from the TNO dataset as the data set, normalizing the gray values of the images to [-1, 1], cropping the images with a 128×128 window with the stride set to 32, and finally obtaining 6184 pairs of image patches as the training set;
setting a loss function L_total, expressed as:
L_total = λ_1 L_SSIM + λ_2 L_patchNCE + λ_3 L_detail
wherein λ_1, λ_2 and λ_3 are hyperparameters, L_SSIM is the structural similarity loss, L_patchNCE is the contrastive loss, and L_detail is the texture detail loss;
training an initial infrared and visible light image fusion model based on feature interaction and a self-encoder according to the training set and the loss function L_total, wherein an Adam optimizer is used during training to update the network model parameters until training is complete, obtaining the trained infrared and visible light image fusion model based on feature interaction and a self-encoder.
An infrared and visible light image fusion apparatus comprising:
the acquisition module is used for acquiring an infrared image and a visible light image of the target object;
the preprocessing module is used for carrying out grayscale and data enhancement preprocessing operation on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image;
the fusion module is used for inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on the feature interaction and the self-encoder to obtain a fusion image;
the infrared and visible light image fusion model based on feature interaction and a self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder;
the encoder is used for extracting the infrared characteristics, the visible light characteristics and the complementary characteristics of the preprocessed infrared images and the preprocessed visible light images;
the stepwise fusion layer is used for obtaining information-quantity weight maps of the extracted infrared features, visible light features and complementary features through a sigmoid activation function, and for adaptively fusing the infrared features, the visible light features and the complementary features through the corresponding weight coefficients to obtain the fused image features;
the one-path cascade decoder is used for reconstructing the fused image from the fused image features through four residual blocks.
The invention has the beneficial effects that:
1. according to the invention, a gradient residual module is introduced into the infrared path and the visible light path of the encoder, the learnable convolution features and the gradient amplitude information are integrated, spatial fine-grained detail information is extracted, and the representation capability for texture detail information is further improved.
2. The invention develops the interactive feature perfecting module, corrects the information of different modes from the channel dimension, aggregates the interactive information from the space dimension, and utilizes the relationship between the channel and the space to better pay attention to the complementary information of the other party, thereby inhibiting the influence of noise generated by introducing multiple modes and better realizing the cross-mode feature interaction.
3. The invention develops a cross-level feature aggregation module, gradually extracts the frequency correlation between the image features of adjacent network layers, enhances the image features by utilizing a correlation feature map, and aggregates the complementary information on the interaction compensation path.
4. According to the invention, a stepwise fusion module is developed, and pixel intensity information of an infrared image and texture detail information of visible light are adaptively fused through weight coefficients on different paths, so that the fusion performance of a model is improved.
5. The method has the advantages of remarkably improved fusion effect and better model generalization capability, can be applied to multi-mode image and medical image fusion, and has high application value in the field of image fusion.
Drawings
FIG. 1 is a schematic diagram of a converged network of the method of the present invention;
FIG. 2 is a schematic diagram of a residual module in a decoder;
FIG. 3 is a schematic diagram of an interactive feature refinement module in an encoder;
FIG. 4 is a schematic diagram of a cross-level feature aggregation module in an encoder;
FIG. 5 is a schematic diagram of the stepwise fusion module according to the present invention;
FIG. 6 is a comparison of fusion results under a first set of normal lighting conditions;
FIG. 7 is a graph comparing fusion results under a second set of low light conditions.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
An infrared and visible light image fusion method comprises the following steps:
s1, as shown in FIG. 1, constructing an infrared and visible light image fusion model frame based on feature interaction and a self-encoder; the model framework mainly comprises an encoder, a step-type fusion layer and a path of cascade decoder; the encoder comprises three different branches, namely an infrared path, a visible path and an interaction compensation path, and infrared features, visible light features and complementary features required by the fused image are extracted through the three different branches of the encoder, so that the fused image comprises richer source image information.
The encoder includes an infrared path, a visible path, and an interactive compensation path;
the infrared path and the visible light path comprise a first convolution group, a second convolution group, a first gradient residual block, a second gradient residual block, a third gradient residual block and a fourth gradient residual block;
the first convolution group comprises four convolution blocks connected in sequence, denoted Conv1, Conv2, Conv3 and Conv4; the input of Conv1 is the visible light image I_v, and Conv4 outputs shallow visible light image features, which pass through the first gradient residual block and the second gradient residual block in sequence to output the visible light image features;
the second convolution group comprises four convolution blocks connected in sequence, denoted Conv5, Conv6, Conv7 and Conv8; the input of Conv5 is the infrared image I_i, and Conv8 outputs shallow infrared image features, which pass through the third gradient residual block and the fourth gradient residual block in sequence to output the infrared image features;
the interaction compensation path comprises four interaction characteristic perfecting modules and three cross-level characteristic aggregation modules, which are respectively expressed as: the system comprises a first interactive feature perfecting module, a second interactive feature perfecting module, a third interactive feature perfecting module, a fourth interactive feature perfecting module, a first cross-level feature aggregation module, a second cross-level feature aggregation module and a third cross-level feature aggregation module;
the first interactive feature perfecting module, the second interactive feature perfecting module, the first cross-level feature converging module, the third interactive feature perfecting module, the second cross-level feature converging module, the fourth interactive feature perfecting module and the third cross-level feature converging module are connected in sequence; the output of the first cross-level feature aggregation module is also connected with the first cross-level feature aggregation module, the output of the first cross-level feature aggregation module is also connected with the second cross-level feature aggregation module, and the output of the second cross-level feature aggregation module is also connected with the third cross-level feature aggregation module;
the input of the first interactive feature perfecting module is respectively connected with the output of Conv2 and Conv6, the input of the second interactive feature perfecting module is respectively connected with the output of Conv4 and Conv8, the input of the third interactive feature perfecting module is respectively connected with the first gradient residual block and the third gradient residual block, and the input of the fourth interactive feature perfecting module is respectively connected with the second gradient residual block and the fourth gradient residual block.
The input channel numbers of Conv1 and Conv3 in the first convolution group and Conv5 and Conv7 in the second convolution group are 1, 16, 1, 16 and the output channel numbers are 1, 16, 1, 16 respectively; each of these convolution blocks comprises a convolution with a 1×1 kernel, BN (Batch Normalization) and a LeakyReLU activation function, and the input features pass through the 1×1 convolution, BN and LeakyReLU operations in sequence before being output;
the input channel numbers of Conv2 and Conv4 in the first convolution group and Conv6 and Conv8 in the second convolution group are 1, 16, 1, 16 and the output channel numbers are 16, 16, 16, 16 respectively; each of these convolution blocks comprises a convolution with a 3×3 kernel, BN and a LeakyReLU activation function, and the input features pass through the 3×3 convolution, BN and LeakyReLU operations before being output;
the first, second, third and fourth gradient residual blocks each comprise a main path and a residual path; the numbers of input and output channels of the first gradient residual block and the third gradient residual block are 16 and 32 respectively, and the numbers of input and output channels of the second gradient residual block and the fourth gradient residual block are 32 and 64 respectively; the main path comprises a convolution with a 1×1 kernel, BN, a LeakyReLU activation function and a convolution with a 3×3 kernel, and the residual path comprises a Scharr gradient operator and DSConv (Depthwise Separable Convolution); first, the input features pass through the 1×1 convolution, BN, the LeakyReLU activation function, the 3×3 convolution and BN in sequence to obtain the learnable convolution features; second, the input features pass through the Scharr gradient operator and the DSConv operation in sequence to obtain the gradient amplitude information features; the two sets of features are added and passed through a LeakyReLU to obtain the output features. The main branch of the gradient residual block thus consists of a 1×1 convolution and a 3×3 convolution, and the residual branch consists of the Scharr gradient operator and a depthwise separable convolution; the learnable convolution features and the gradient amplitude information are integrated, and the ability to characterize spatial fine-grained information is improved.
S2, as shown in FIG. 3, the interactive feature perfecting module and the cross-level feature aggregation module on the interaction compensation path are designed. The interaction compensation path is composed of 4 interactive feature perfecting modules and 3 cross-level feature aggregation modules. The interactive feature perfecting module comprises two stages, channel attention feature correction and spatial attention feature aggregation, and thereby obtains the complementary information of the two modality images. For the low-level to high-level features of the infrared and visible light images, channel correction attention maps of the two modality images are obtained after channel correction attention, spatial aggregation attention maps are obtained through spatial aggregation attention, and the channel and spatial attention features are fused to obtain the complementary features; finally, the interaction compensation features are obtained after adjacent interaction features pass through a cross-level feature aggregation module.
S21, channel attention feature correction. The infrared feature F_i and the visible light feature F_v are input; the two features are multiplied element by element and then concatenated with the inputs, the concatenated feature is passed through max pooling, average pooling and a multi-layer perceptron in the channel dimension, the two output features are added element by element and fed into the sigmoid activation function to obtain the channel-correction weighting coefficient, and the weighting coefficient is finally multiplied with the concatenated feature to obtain the channel correction attention map F_c^m, expressed as
F_c^m = sigmoid(MLP(avg(F_c)) + MLP(max(F_c))) ⊗ F_c,  with F_c = concat(F_i ⊗ F_v, F_i, F_v)
where avg(·) denotes global average pooling, max(·) denotes global max pooling, and ⊗ denotes element-wise multiplication.
S22, spatial attention feature aggregation. Max pooling and average pooling in the spatial dimension are applied to F_i and the results are concatenated; the same is done for F_v; the pooled features of the two modalities are then concatenated, and the concatenated feature is multiplied with F_i and F_v respectively to obtain the corresponding spatial aggregation attention maps F_i^s and F_v^s, expressed as
F_i^s = concat(max(F_i), avg(F_i), max(F_v), avg(F_v)) ⊗ F_i,  F_v^s = concat(max(F_i), avg(F_i), max(F_v), avg(F_v)) ⊗ F_v
where max(·) and avg(·) here denote max pooling and average pooling in the spatial dimension.
S23, fusing the channel and spatial attention features. The two spatial attention maps and the channel attention map are added to the initial input features, obtaining the cross-modal feature perfection attention map F_f, expressed as
F_f = F_i^s + F_v^s + F_c^m + F_i + F_v
Through channel-wise correction and spatial aggregation, the complementary information of the different modalities is captured and the ability to characterize the individual features of each modality is enhanced.
S24, as shown in FIG. 4, designing the cross-level feature aggregation module. The cross-level feature aggregation module comprises two ordinary convolutions: the correlation feature map is obtained by a 1×1 convolution operation, multiplied with the input features and then added element by element, finally obtaining the aggregated features. One convolution is used to adjust the number of input channels and the other convolution is used to obtain the correlation feature map. Through the cross-level feature aggregation module and the interactive feature perfecting module, all intermediate-level features are used for feature interaction and refinement, and the ability to exploit complementary information and suppress redundant information is improved.
S3, as shown in FIG. 5, designing the stepwise fusion module. The fusion layer adaptively fuses the semantic information of the three input branches F_n (n = 1, 2, 3) to obtain the fused features.
S31, stepwise fusion of the branches. First, the features of the infrared branch and the visible light branch are fused, where the weight coefficients ω_1 of the infrared branch and ω_2 of the visible light branch can be expressed as
ω_1 = sigmoid(F_1) / (sigmoid(F_1) + sigmoid(F_2) + λ),  ω_2 = sigmoid(F_2) / (sigmoid(F_1) + sigmoid(F_2) + λ)
where λ is set to 1e-8.
The weight coefficients are then multiplied with the corresponding branch features to obtain the pre-fusion branch feature F'_out, which can be expressed as
F'_out = ω_1 ⊗ F_1 + ω_2 ⊗ F_2
where ⊗ denotes element-wise multiplication; the interaction compensation branch F_3 is fused in the same way as the fusion branch, and the fused feature map is finally obtained.
S4, as shown in FIGS. 1 and 2, designing the cascade decoder. The decoder comprises 4 residual modules; the numbers of input channels of the 4 residual blocks are 128, 64, 32, 16 in sequence, and the numbers of output channels are 64, 32, 16, 1. The main branch of the residual block consists of a 1×1 convolution and a 3×3 convolution, the residual branch consists of a depthwise separable convolution, and the outputs of the two branches are added element by element and passed through the LeakyReLU activation function. The fused features are input to the decoder, and the reconstruction of the fused image is completed by the 4 residual modules in sequence. The activation function of the last residual block is a hyperbolic tangent function (Tanh).
S5, training the network model. The TNO images are taken as the training dataset, graying and data enhancement are applied to the source images of the two input modalities, and the network model is trained with the structural similarity, fine-grained detail and contrastive losses to obtain the parameters of the network model.
S51, preprocessing the dataset. 32 pairs of images are selected from the TNO dataset as the data set, the gray values of the images are normalized to [-1, 1], the images are cropped with a 128×128 window with the stride set to 32, and 6184 pairs of image patches are finally obtained as the training set.
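A small NumPy sketch of the preprocessing described in S51 (normalization to [-1, 1] and 128×128 patches with stride 32); the function and argument names are illustrative.

```python
import numpy as np

def make_training_patches(ir_images, vis_images, patch=128, stride=32):
    """Normalize aligned uint8 grayscale image pairs to [-1, 1] and cut them into
    patch x patch blocks with the given stride."""
    pairs = []
    for ir, vis in zip(ir_images, vis_images):
        ir = ir.astype(np.float32) / 127.5 - 1.0     # [0, 255] -> [-1, 1]
        vis = vis.astype(np.float32) / 127.5 - 1.0
        h, w = ir.shape
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                pairs.append((ir[y:y + patch, x:x + patch],
                              vis[y:y + patch, x:x + patch]))
    return pairs
```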
S52, setting the loss function. The loss function can be expressed as L_total = λ_1 L_SSIM + λ_2 L_patchNCE + λ_3 L_detail, where λ_1, λ_2 and λ_3 are hyperparameters.
The structural similarity loss L_SSIM evaluates the quality of the fused image in terms of luminance, contrast and structure. The structural similarity is expressed as
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))
where μ denotes the pixel mean, σ_xy denotes the covariance and σ^2 denotes the variance; C_1 and C_2 are small constants set near zero to avoid a zero denominator, usually taken as 0.01^2 and 0.03^2. L_SSIM is computed window by window between the fused image and the source images, where I_x, I_y and I_f denote the visible light image, the infrared image and the fused image respectively, W denotes a sliding window with stride 1, P_i denotes the value of pixel i, and m, n denote the sliding window size; the invention sets the window to 16×16.
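Since the exact formula of L_SSIM is not reproduced above, the following sketch computes windowed SSIM with 16×16 windows and stride 1 and, as an assumption, compares the fused image in each window against the source with the larger local mean; it is one plausible reading, not the patent's definitive formulation.

```python
import torch
import torch.nn.functional as F

def _local_stats(a, b, win=16):
    # local means, variances and covariance over win x win sliding windows (stride 1)
    mu_a = F.avg_pool2d(a, win, stride=1)
    mu_b = F.avg_pool2d(b, win, stride=1)
    var_a = F.avg_pool2d(a * a, win, stride=1) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, win, stride=1) - mu_b ** 2
    cov = F.avg_pool2d(a * b, win, stride=1) - mu_a * mu_b
    return mu_a, mu_b, var_a, var_b, cov

def ssim_map(a, b, win=16, c1=0.01 ** 2, c2=0.03 ** 2):
    mu_a, mu_b, var_a, var_b, cov = _local_stats(a, b, win)
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_loss(i_vis, i_ir, i_fused, win=16):
    """1 - mean window-wise SSIM between the fused image and, per window, the source
    image with the larger local mean (the selection rule is an assumption)."""
    s_vis = ssim_map(i_vis, i_fused, win)
    s_ir = ssim_map(i_ir, i_fused, win)
    mu_vis = F.avg_pool2d(i_vis, win, stride=1)
    mu_ir = F.avg_pool2d(i_ir, win, stride=1)
    s = torch.where(mu_vis >= mu_ir, s_vis, s_ir)
    return 1.0 - s.mean()
```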
The contrastive loss L_patchNCE is computed from the similarity between an encoded feature sample k, positive samples k+ similar to k and negative samples k- dissimilar to k, with a temperature coefficient τ usually taken as τ = 0.07. S denotes the number of locations sampled in the image feature layer, where s ∈ {1, 2, ..., S}, and D_s denotes an arbitrary feature sequence of the channels in space. By computing the similarity between a spatial region and the positive and negative samples, the parts of the fused image most similar to the source images are effectively retained.
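The contrastive term can be sketched with the standard patchNCE/InfoNCE form; how the S locations are sampled and how positives and negatives are built from the encoder features is not restated here, so the function below only shows the core loss.

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(query, positive, negatives, tau=0.07):
    """Core patchNCE / InfoNCE term. query, positive: (S, D) encoded feature samples
    at S sampled locations; negatives: (S, N, D). Each query is pulled towards its
    positive and pushed away from its negatives with temperature tau."""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (query * positive).sum(dim=-1, keepdim=True) / tau              # (S, 1)
    neg = torch.bmm(negatives, query.unsqueeze(-1)).squeeze(-1) / tau     # (S, N)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```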
The texture detail loss L_detail makes the fused image retain more fine-grained detail information; it is built from the Scharr gradient operator ∇, the image height H and width W, and the L1 norm ||·||_1.
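A sketch of a texture detail loss built from the Scharr operator and an L1 norm; comparing the fused gradient with the element-wise maximum of the two source gradients is an assumed, commonly used choice rather than the patent's stated formula.

```python
import torch
import torch.nn.functional as F

SCHARR_X = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]]).view(1, 1, 3, 3)
SCHARR_Y = SCHARR_X.transpose(-1, -2)

def scharr_magnitude(img):
    # img: (B, 1, H, W) grayscale tensor
    gx = F.conv2d(img, SCHARR_X.to(img), padding=1)
    gy = F.conv2d(img, SCHARR_Y.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def detail_loss(i_vis, i_ir, i_fused):
    """L1 difference, normalised by H*W, between the fused gradient magnitude and
    the element-wise maximum of the two source gradient magnitudes (assumed target)."""
    h, w = i_fused.shape[-2:]
    target = torch.maximum(scharr_magnitude(i_vis), scharr_magnitude(i_ir))
    return torch.abs(scharr_magnitude(i_fused) - target).sum() / (h * w)
```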
The Adam optimizer is used during training to update the parameters of the network model; the learning rate is set to 0.01, the number of epochs to 20, and the batch size to 4.
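A minimal training-loop sketch matching the stated optimizer settings; it reuses the ssim_loss and detail_loss sketches above, treats the contrastive term as a caller-supplied hook (it needs encoder features), and uses placeholder loss weights since the values of λ_1 to λ_3 are not given.

```python
import torch

def train(model, loader, contrastive_fn=None, epochs=20, lr=0.01, lambdas=(1.0, 1.0, 1.0)):
    """Train the fusion network with Adam (lr 0.01, 20 epochs); batch size 4 is set
    when building `loader`. `model` is assumed to map (ir, vis) patches to a fused image."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    l1, l2, l3 = lambdas
    for _ in range(epochs):
        for i_ir, i_vis in loader:            # paired 128x128 patches, values in [-1, 1]
            i_fused = model(i_ir, i_vis)
            loss = l1 * ssim_loss(i_vis, i_ir, i_fused) + l3 * detail_loss(i_vis, i_ir, i_fused)
            if contrastive_fn is not None:    # optional patchNCE term
                loss = loss + l2 * contrastive_fn(i_ir, i_vis, i_fused)
            opt.zero_grad()
            loss.backward()
            opt.step()
```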
Furthermore, in order to verify the image fusion effect of the self-encoder obtained through training by the method, the embodiment of the invention also verifies the trained self-encoder.
Specifically, in the test phase, 20 sets of images were selected from the M3FD dataset for test verification, and 7 typical methods were selected for comparison, including DenseFUSE, U2Fusion, RFN-Nest, SEDRFuse, IFCNN, GANMcC, CSF. In addition, the quantitative evaluation index adopts 6 indexes such as information Entropy (EN), average Gradient (AG), spatial Frequency (SF), mutual Information (MI), standard Deviation (SD), visual fidelity (VIF) and the like, and the verification result comprises two aspects of qualitative evaluation and quantitative evaluation.
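Two of the reported indices, information entropy (EN) and spatial frequency (SF), have standard definitions; a small NumPy sketch is given below for reference (AG, MI, SD and VIF follow their usual definitions and are omitted).

```python
import numpy as np

def entropy(img):
    """Information entropy (EN) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img):
    """Spatial frequency (SF): root of the summed row- and column-gradient energies."""
    img = img.astype(np.float64)
    rf = np.mean(np.diff(img, axis=1) ** 2)
    cf = np.mean(np.diff(img, axis=0) ** 2)
    return float(np.sqrt(rf + cf))
```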
(1) Qualitative evaluation. Figs. 6 and 7 show qualitative comparison results for two sets of images (scenes under normal and low-light conditions). By comparison, the fusion method of the present invention is found to have three advantages. First, the fusion result preserves, to the greatest extent, the brightness information of the thermal radiation targets in the infrared image. For a typical infrared target, such as the person in Figs. 6 and 7, the fusion results of the present invention show brighter targets with sharper edge contours than other methods. Second, the fusion result keeps the texture details and background information of the visible light image. For the license plate in Fig. 6 and the trash can in Fig. 7, the fusion method of the present invention retains clearer detail information and more obvious background information than other methods. Finally, the fusion result has better contrast and a better visual effect. Compared with the source images and the other fusion results, the method better keeps the prominent target features and the rich scene detail information, has high image contrast, and better matches human visual perception.
(2) Quantitative evaluation. Table 1 gives the objective comparison results on the 20 images of the M3FD dataset. The optimal average and the suboptimal average are marked in bold and underlined respectively. It can be seen that the invention obtains the optimal average values of EN, MI, SD and VIF, while the index SF is suboptimal; the objective experiments show that the method has better fusion performance than the other methods. The maximum EN and MI indicate that the fused image of the invention effectively acquires a large amount of infrared thermal radiation information and texture detail features from the source images, and that the interactive feature perfecting module constructed by the fusion method can acquire complementary information from different modalities. The maximum SD indicates that the stepwise fusion strategy constructed by the fusion method can adaptively fuse the features of different modalities. The maximum VIF indicates that the fused image contains rich fine-grained information and better contrast and visual effect, because the fusion method constructs gradient residual blocks, establishes long-distance dependencies of local features, and acquires fine-grained detail information from the images.
Table 1 quantitative comparison of 20 images of M3FD dataset
The invention also provides an infrared and visible light image fusion device, which comprises:
the acquisition module is used for acquiring an infrared image and a visible light image of the target object;
the preprocessing module is used for carrying out grayscale and data enhancement preprocessing operation on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image;
the fusion module is used for inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on the feature interaction and the self-encoder to obtain a fusion image;
the infrared and visible light image fusion model based on feature interaction and a self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder;
the encoder is used for extracting the infrared characteristics, the visible light characteristics and the complementary characteristics of the preprocessed infrared images and the preprocessed visible light images;
the stepwise fusion layer is used for obtaining information-quantity weight maps of the extracted infrared features, visible light features and complementary features through a sigmoid activation function, and for adaptively fusing the infrared features, the visible light features and the complementary features through the corresponding weight coefficients to obtain the fused image features;
the one-path cascade decoder is used for reconstructing the fused image from the fused image features through four residual blocks.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. An infrared and visible light image fusion method, comprising:
acquiring an infrared image and a visible light image of a target object;
carrying out gray-scale and data-enhanced preprocessing operation on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image;
inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on feature interaction and a self-encoder to obtain a fusion image;
the infrared and visible light image fusion model based on feature interaction and a self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder;
the encoder is used for extracting the infrared characteristics, the visible light characteristics and the complementary characteristics of the preprocessed infrared images and the preprocessed visible light images;
the stepwise fusion layer is used for obtaining information-quantity weight maps of the extracted infrared features, visible light features and complementary features through a sigmoid activation function, and for adaptively fusing the infrared features, the visible light features and the complementary features through the corresponding weight coefficients to obtain the fused image features;
the one-path cascade decoder is used for reconstructing the fused image from the fused image features through four residual blocks.
2. The infrared and visible light image fusion method of claim 1, wherein the encoder comprises an infrared path, a visible path, and an interactive compensation path;
the infrared path and the visible light path comprise a first convolution group, a second convolution group, a first gradient residual block, a second gradient residual block, a third gradient residual block and a fourth gradient residual block;
the first convolution group comprises four convolution blocks connected in sequence, denoted Conv1, Conv2, Conv3 and Conv4; the input of Conv1 is the visible light image I_v, and Conv4 outputs shallow visible light image features, which pass through the first gradient residual block and the second gradient residual block in sequence to output the visible light image features;
the second convolution group comprises four convolution blocks connected in sequence, denoted Conv5, Conv6, Conv7 and Conv8; the input of Conv5 is the infrared image I_i, and Conv8 outputs shallow infrared image features, which pass through the third gradient residual block and the fourth gradient residual block in sequence to output the infrared image features;
the interaction compensation path comprises four interaction characteristic perfecting modules and three cross-level characteristic aggregation modules, which are respectively expressed as: the system comprises a first interactive feature perfecting module, a second interactive feature perfecting module, a third interactive feature perfecting module, a fourth interactive feature perfecting module, a first cross-level feature aggregation module, a second cross-level feature aggregation module and a third cross-level feature aggregation module;
the first interactive feature perfecting module, the second interactive feature perfecting module, the first cross-level feature aggregation module, the third interactive feature perfecting module, the second cross-level feature aggregation module, the fourth interactive feature perfecting module and the third cross-level feature aggregation module are connected in sequence; the output of the first interactive feature perfecting module is also connected with the first cross-level feature aggregation module, the output of the first cross-level feature aggregation module is also connected with the second cross-level feature aggregation module, and the output of the second cross-level feature aggregation module is also connected with the third cross-level feature aggregation module;
the input of the first interactive feature perfecting module is respectively connected with the output of Conv2 and Conv6, the input of the second interactive feature perfecting module is respectively connected with the output of Conv4 and Conv8, the input of the third interactive feature perfecting module is respectively connected with the first gradient residual block and the third gradient residual block, and the input of the fourth interactive feature perfecting module is respectively connected with the second gradient residual block and the fourth gradient residual block.
3. The method for fusing infrared and visible light images according to claim 2, wherein Conv1 and Conv3 in the first convolution group and Conv5 and Conv7 in the second convolution group have input channel numbers of 1, 16, 1, 16 and output channel numbers of 1, 16, 1, 16 respectively; each of these convolution blocks comprises a convolution with a 1×1 kernel, BN and a LeakyReLU activation function, and the input features pass through the 1×1 convolution, BN and LeakyReLU operations in sequence before being output;
Conv2 and Conv4 in the first convolution group and Conv6 and Conv8 in the second convolution group have input channel numbers of 1, 16, 1, 16 and output channel numbers of 16, 16, 16, 16 respectively; each of these convolution blocks comprises a convolution with a 3×3 kernel, BN and a LeakyReLU activation function, and the input features pass through the 3×3 convolution, BN and LeakyReLU operations before being output;
the first, second, third and fourth gradient residual blocks each comprise a main path and a residual path; the numbers of input and output channels of the first gradient residual block and the third gradient residual block are 16 and 32 respectively, and the numbers of input and output channels of the second gradient residual block and the fourth gradient residual block are 32 and 64 respectively; the main path comprises a convolution with a 1×1 kernel, BN, a LeakyReLU activation function and a convolution with a 3×3 kernel, and the residual path comprises a Scharr gradient operator and DSConv; first, the input features pass through the 1×1 convolution, BN, the LeakyReLU activation function, the 3×3 convolution and BN in sequence to obtain the learnable convolution features; second, the input features pass through the Scharr gradient operator and the DSConv operation in sequence to obtain the gradient amplitude information features; the two sets of features are added and passed through a LeakyReLU to obtain the output features.
4. The infrared and visible light image fusion method of claim 2, wherein the interactive feature refinement module comprises two stages of channel attention feature correction and spatial attention feature aggregation, the channel attention feature correction comprises element-by-element multiplication operation, stitching operation, MLP, average pooling operation, maximum pooling operation, sigmoid operation, element-by-element addition operation, and the spatial attention feature aggregation comprises element-by-element multiplication operation, stitching operation, average pooling operation, maximum pooling operation, element-by-element addition operation;
the channel attention characteristic correction stage adopts element-by-element multiplication to input infrared characteristic F ir And visible light feature F vis Interact with the infrared feature F ir And visible light feature F vis Splicing to obtain feature F c The method comprises the steps of carrying out a first treatment on the surface of the Feature F c Obtaining output characteristics after global maximization pooling, global tie pooling and MLP operation in channel dimension, adding the output characteristics element by element, obtaining weighting coefficients for channel calibration after sigmoid activation function operation, and finally adding the weighting coefficients with the spliced characteristics F c Multiplying to obtain channel correction attention mapExpressed as:
wherein concat(·) represents the splicing operation, sigmoid(·) represents the activation function, avg(·) represents global average pooling, max(·) represents global maximum pooling, ⊕ represents element-by-element addition, and ⊗ represents element-by-element multiplication;
wherein F_vis_sa represents the visible light features after aggregation by spatial attention, and F_ir_sa represents the infrared features after aggregation by spatial attention;
the two spatial attention maps, the channel attention map and the initial input features are added to obtain the cross-modal feature refinement attention map F_if, expressed as:
F_if = F_ca ⊕ F_vis_sa ⊕ F_ir_sa ⊕ F_ir ⊕ F_vis
and complementary information of the different modalities is captured through channel-wise feature correction and spatial feature aggregation to obtain the cross-modal feature refinement attention map.
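By way of illustration only, a minimal PyTorch-style sketch of the channel attention feature correction stage follows; the exact splicing layout of F_c and the MLP width are assumptions, and the spatial attention feature aggregation stage would follow the same pattern using per-pixel average and maximum statistics instead of global pooling.

import torch
import torch.nn as nn

class ChannelAttentionCorrection(nn.Module):
    # Channel attention feature correction: splice interacted and original features, pool globally,
    # pass through a shared MLP, and rescale the spliced feature with the resulting channel weights.
    def __init__(self, ch, reduction=4):
        super().__init__()
        c = 3 * ch  # channels of F_c under the assumed layout [F_ir*F_vis, F_ir, F_vis]
        self.mlp = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(inplace=True), nn.Linear(c // reduction, c))

    def forward(self, f_ir, f_vis):
        f_c = torch.cat([f_ir * f_vis, f_ir, f_vis], dim=1)   # interaction + splicing
        b, c, _, _ = f_c.shape
        avg = self.mlp(f_c.mean(dim=(2, 3)))                   # global average pooling + MLP
        mx = self.mlp(f_c.amax(dim=(2, 3)))                    # global maximum pooling + MLP
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)           # channel calibration weights
        return w * f_c                                          # channel correction attention map F_ca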
5. The infrared and visible light image fusion method according to claim 4, wherein the cross-level feature aggregation module comprises two convolutions with 1×1 kernels, a splicing operation, an element-by-element multiplication operation and an element-by-element addition operation; features of two different scales are processed by the 1×1 convolutions to obtain a correlation feature map, and the correlation feature map is multiplied by the input features and then added element by element to finally obtain the complementary feature F_3; one convolution is used to adjust the number of input channels and the other convolution is used to obtain the correlation feature map.
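A minimal sketch of the cross-level feature aggregation module is given below, assuming the coarser-scale feature is first channel-aligned and upsampled; the final combination of the correlation map with the input feature is one plausible reading of the claim, and the names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelFeatureAggregation(nn.Module):
    def __init__(self, ch_fine, ch_coarse):
        super().__init__()
        self.align = nn.Conv2d(ch_coarse, ch_fine, 1)   # 1x1 conv adjusting the number of input channels
        self.corr = nn.Conv2d(2 * ch_fine, ch_fine, 1)  # 1x1 conv producing the correlation feature map

    def forward(self, f_fine, f_coarse):
        f_coarse = F.interpolate(self.align(f_coarse), size=f_fine.shape[2:],
                                 mode='bilinear', align_corners=False)
        corr = self.corr(torch.cat([f_fine, f_coarse], dim=1))  # correlation feature map
        return corr * f_fine + f_fine                            # multiply with the input feature, then add -> F_3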
6. The infrared and visible light image fusion method according to claim 5, wherein the stepwise fusion layer comprises a sigmoid operation, an element-by-element multiplication operation and an element-by-element addition operation;
the inputs of the stepwise fusion layer are the infrared path features, the visible light path features and the complementary features; the infrared path features and the visible light path features are first passed through the sigmoid operation to obtain respective weight maps, and the weight coefficients w_1 of the infrared path and w_2 of the visible light path are then obtained from the two weight maps, expressed as:
w_1 = sigmoid(F_1) / (sigmoid(F_1) + sigmoid(F_2) + λ), w_2 = sigmoid(F_2) / (sigmoid(F_1) + sigmoid(F_2) + λ)
wherein λ represents a hyper-parameter that prevents the denominator from being 0, F_1 represents the infrared path features, and F_2 represents the visible light path features;
the weight coefficients are multiplied with the corresponding path features to obtain the pre-fusion feature F'_out, expressed as:
F'_out = w_1 ⊗ F_1 ⊕ w_2 ⊗ F_2
wherein ⊗ represents element-by-element multiplication and ⊕ represents element-by-element addition;
the complementary feature F_3 output by the cross-level feature aggregation module and the pre-fusion feature F'_out are fused to obtain the fused image features.
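Under the normalized-weight reading above, the stepwise fusion layer can be sketched as the short function below; the final combination of the pre-fusion feature with the complementary feature F_3 (plain addition here) is an assumption.

import torch

def stepwise_fusion(f_ir, f_vis, f_comp, lam=1e-4):
    # sigmoid weight maps -> normalized weight coefficients -> weighted pre-fusion -> add F_3
    s_ir, s_vis = torch.sigmoid(f_ir), torch.sigmoid(f_vis)
    w_ir = s_ir / (s_ir + s_vis + lam)        # lam plays the role of the hyper-parameter lambda
    w_vis = s_vis / (s_ir + s_vis + lam)
    pre_fused = w_ir * f_ir + w_vis * f_vis   # pre-fusion feature F'_out
    return pre_fused + f_comp                 # fused image features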
7. The infrared and visible light image fusion method according to claim 6, wherein the one-path cascade decoder comprises four layers of residual modules with the same structure, the input channel numbers of the four residual modules being 128, 64, 32, 16 in sequence and the output channel numbers being 64, 32, 16, 1 in sequence;
the residual module comprises a main path and a residual path, wherein the main path comprises a convolution with a 1×1 kernel, BN, a LeakyReLU activation function and a convolution with a 3×3 kernel, and the residual path comprises DSConv;
firstly, the input features sequentially pass through the 1×1 convolution, BN, LeakyReLU, 3×3 convolution and BN operations to obtain learnable convolution features; secondly, the input features pass through the DSConv operation to obtain gradient amplitude information features; the learnable convolution features and the gradient amplitude information features are added and passed through a LeakyReLU operation to obtain the output features;
and the first-layer residual module takes the fused image features as input, and the fourth-layer residual module outputs the fused image.
8. The infrared and visible light image fusion method according to claim 7, wherein the activation function of the fourth-layer residual module is a hyperbolic tangent function.
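For concreteness, a minimal sketch of one decoder residual module and the four-layer one-path cascade decoder follows; DSConv is again assumed to be a depthwise-plus-pointwise convolution, and Tanh replaces LeakyReLU only in the fourth layer, as stated in claim 8.

import torch.nn as nn

class DecoderResidualBlock(nn.Module):
    # Main path: 1x1 conv -> BN -> LeakyReLU -> 3x3 conv -> BN.  Residual path: DSConv.
    def __init__(self, in_ch, out_ch, last=False):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.dsconv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.Conv2d(in_ch, out_ch, 1))
        self.act = nn.Tanh() if last else nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.main(x) + self.dsconv(x))

# one-path cascade decoder with channel numbers 128 -> 64 -> 32 -> 16 -> 1
decoder = nn.Sequential(
    DecoderResidualBlock(128, 64),
    DecoderResidualBlock(64, 32),
    DecoderResidualBlock(32, 16),
    DecoderResidualBlock(16, 1, last=True))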
9. The infrared and visible light image fusion method according to claim 1, wherein the training of the infrared and visible light image fusion model based on feature interaction and self-encoder comprises:
selecting 32 pairs of images from the TNO dataset as the data source, normalizing the gray values of the images to [-1, 1], cropping the images with a 128×128 window at a stride of 32, and finally obtaining 6184 pairs of image blocks as the training set;
setting the total loss function L_total, expressed as:
L_total = λ_1·L_SSIM + λ_2·L_patchNCE + λ_3·L_detail
wherein λ_1, λ_2 and λ_3 are all hyper-parameters, L_SSIM is the structural similarity loss, L_patchNCE is the contrastive loss, and L_detail is the texture detail loss;
training the initial infrared and visible light image fusion model based on feature interaction and self-encoder according to the training set and the loss function L_total, wherein an Adam optimizer is used during training to update the network model parameters until training is completed, obtaining the trained infrared and visible light image fusion model based on feature interaction and self-encoder.
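The following sketch shows one way the three loss terms could be combined; the weight values, the gradient operator in the detail term and the SSIM / patchNCE implementations (passed in here as callables) are assumptions rather than values fixed by the patent.

import torch
import torch.nn.functional as F

def detail_loss(fused, ir, vis):
    # texture detail loss: match the fused gradients to the stronger of the two source gradients
    def grad(x):
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device).view(1, 1, 3, 3)
        return (F.conv2d(x, kx, padding=1).abs()
                + F.conv2d(x, kx.transpose(2, 3).contiguous(), padding=1).abs())
    return F.l1_loss(grad(fused), torch.maximum(grad(ir), grad(vis)))

def total_loss(fused, ir, vis, ssim_loss, patchnce_loss, lambdas=(1.0, 1.0, 10.0)):
    l1, l2, l3 = lambdas  # hyper-parameters lambda_1, lambda_2, lambda_3 (illustrative values)
    return (l1 * ssim_loss(fused, ir, vis)
            + l2 * patchnce_loss(fused, ir, vis)
            + l3 * detail_loss(fused, ir, vis))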
10. An infrared and visible light image fusion apparatus, comprising:
the acquisition module is used for acquiring an infrared image and a visible light image of the target object;
the preprocessing module is used for performing grayscale conversion and data enhancement preprocessing operations on the infrared image and the visible light image to obtain a preprocessed infrared image and a preprocessed visible light image;
the fusion module is used for inputting the preprocessed infrared image and the preprocessed visible light image into a pre-trained infrared and visible light image fusion model based on the feature interaction and the self-encoder to obtain a fusion image;
the infrared and visible light image fusion model based on feature interaction and self-encoder comprises an encoder, a stepwise fusion layer and a one-path cascade decoder;
the encoder is used for extracting the infrared characteristics, the visible light characteristics and the complementary characteristics of the preprocessed infrared images and the preprocessed visible light images;
the stepwise fusion layer is used for obtaining information quantity weight maps of the extracted infrared features, visible light features and complementary features respectively through a sigmoid activation function, and adaptively fusing the infrared features, the visible light features and the complementary features with the corresponding weight coefficients to obtain the fused image features;
and the one-path cascade decoder is used for passing the fused image features through four residual blocks to obtain the fused image.
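Putting the claimed components together, the fusion module of the apparatus could be wired as in the sketch below; the encoder, stepwise fusion layer and decoder are passed in abstractly, and all names are illustrative rather than taken from the patent.

import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, encoder, fusion_layer, decoder):
        super().__init__()
        self.encoder, self.fusion_layer, self.decoder = encoder, fusion_layer, decoder

    def forward(self, ir, vis):
        f_ir, f_vis, f_comp = self.encoder(ir, vis)           # infrared, visible and complementary features
        fused_feat = self.fusion_layer(f_ir, f_vis, f_comp)   # stepwise fusion layer
        return self.decoder(fused_feat)                        # one-path cascade decoder -> fused image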
CN202310817924.XA 2023-07-05 2023-07-05 Infrared and visible light image fusion method and device Pending CN116757986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310817924.XA CN116757986A (en) 2023-07-05 2023-07-05 Infrared and visible light image fusion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310817924.XA CN116757986A (en) 2023-07-05 2023-07-05 Infrared and visible light image fusion method and device

Publications (1)

Publication Number Publication Date
CN116757986A true CN116757986A (en) 2023-09-15

Family

ID=87947801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310817924.XA Pending CN116757986A (en) 2023-07-05 2023-07-05 Infrared and visible light image fusion method and device

Country Status (1)

Country Link
CN (1) CN116757986A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994295A (en) * 2023-09-27 2023-11-03 华侨大学 Wild animal category identification method based on gray sample self-adaptive selection gate
CN116994295B (en) * 2023-09-27 2024-02-02 华侨大学 Wild animal category identification method based on gray sample self-adaptive selection gate
CN117036893A (en) * 2023-10-08 2023-11-10 南京航空航天大学 Image fusion method based on local cross-stage and rapid downsampling
CN117036893B (en) * 2023-10-08 2023-12-15 南京航空航天大学 Image fusion method based on local cross-stage and rapid downsampling

Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN116757986A (en) Infrared and visible light image fusion method and device
CN112819910A (en) Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN114219719A (en) CNN medical CT image denoising method based on dual attention and multi-scale features
CN114708615B (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN116012255A (en) Low-light image enhancement method for generating countermeasure network based on cyclic consistency
CN115546171A (en) Shadow detection method and device based on attention shadow boundary and feature correction
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
CN116205962A (en) Monocular depth estimation method and system based on complete context information
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Zhang et al. Deep joint neural model for single image haze removal and color correction
CN113810683A (en) No-reference evaluation method for objectively evaluating underwater video quality
CN116596792B (en) Inland river foggy scene recovery method, system and equipment for intelligent ship
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN117391981A (en) Infrared and visible light image fusion method based on low-light illumination and self-adaptive constraint
CN116883303A (en) Infrared and visible light image fusion method based on characteristic difference compensation and fusion
Kumar et al. Underwater Image Enhancement using deep learning
CN116309170A (en) Defogging method and device for inspection images of power transmission line
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN116310394A (en) Saliency target detection method and device
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN115346091A (en) Method and device for generating Mura defect image data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination