CN116309215A - Image fusion method based on double decoders - Google Patents

Image fusion method based on double decoders

Info

Publication number
CN116309215A
CN116309215A
Authority
CN
China
Prior art keywords
image
fusion
decoding
fusion method
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310165488.2A
Other languages
Chinese (zh)
Inventor
邱怀彬
刘晓宋
邸江磊
秦玉文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202310165488.2A
Publication of CN116309215A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the field of image fusion and discloses an image fusion method based on double decoders, which is used to solve the problem that deep-learning-based image fusion has poor feature extraction capability and poor fusion quality when processing complex multi-modal images captured by cameras with different imaging modes. The method comprises the following steps: the multi-modal images A₁ and A₂ are passed through a large receptive field feature extraction module to extract features and then through two interactive decoding modules respectively; during decoding, the decoding information of the two modality-specific decoding modules is concatenated on the channel dimension and interactively fused, and a fused image C is reconstructed; the loss between the fused image C and the multi-modal images A₁ and A₂ is calculated to update the network model parameters. The invention can effectively fuse complex multi-modal images and is characterized by strong feature information extraction, a small parameter count, high reconstruction accuracy, and strong robustness.

Description

Image fusion method based on double decoders
Technical field:
the invention relates to an image fusion method, in particular to an image fusion method based on double decoders.
Background art:
With the progress of technology, the information provided by a single source image can no longer satisfy the requirements of human vision or of target recognition and detection. Cameras with different imaging modes are therefore used to capture multi-modal images, and fused images with richer detail information are obtained by means of image fusion.
Image fusion technology integrates the information of two or more images of the same scene, acquired by different sensors or at different positions, times, or brightness levels, into a single fused image through superposition and complementation, so as to characterize the imaging scene comprehensively and support subsequent vision tasks. Compared with a single source image, the fused image presents the scene and target information more clearly, and the quality and clarity of the image are markedly improved.
Traditional image fusion methods are relatively mature, but they require manually designed and often complex fusion rules, so the labor and computational costs of image fusion are high. For complex multi-modal images, designing a general feature extraction method is very difficult and depends heavily on hand-crafted features. With the rise of deep learning in recent years, deep-learning-based image fusion methods have also emerged and provided new ideas for image fusion. However, current deep-learning-based image fusion methods have high network complexity and a large computational burden, and may still suffer from inaccurate feature extraction and poor fusion quality on complex multi-modal images.
Summary of the invention:
The invention aims to overcome the shortcomings of the prior art and provides an image fusion method based on double decoders, which can fuse complex multi-modal images and is characterized by strong feature information extraction, a small parameter count, high reconstruction accuracy, and strong robustness.
The technical scheme for solving the technical problems is as follows:
an image fusion method based on double decoders, comprising the following steps:
(S1) capturing multi-modal images with cameras of different imaging modes, and denoting them as images A₁ and A₂;
(S2) taking the multi-modal images A₁ and A₂ as the input of the network, and obtaining multi-modal feature maps through a convolution layer followed by N large receptive field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively; during decoding, the decoding information of the two modality-specific decoding modules is concatenated on the channel dimension for interactive fusion; this is repeated N times for gradual fusion, and a fused image C is then obtained through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the network and the input images, and back-propagating the gradient of the loss function value to update the network model parameters; when the loss function value converges, parameter updating stops and the trained neural network is obtained.
Preferably, in step (S1), the multi-modal images include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
Preferably, in step (S1), the multi-modal image A₁ is a visible light image and A₂ is one of a short-wave, medium-wave, or long-wave infrared image, or a polarized image.
Preferably, in step (S2), the number N of module repetitions is in the range 4 ≤ N ≤ 6.
Preferably, in step (S2), the large receptive field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian error nonlinear activation function (GELU), a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
Preferably, in step (S3), the interactive decoding module takes the feature information of each level and the fused decoding information of the previous level, performs pixel-wise superposition, and carries out interactive decoding.
Preferably, in step (S3), the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
Preferably, in step (S4), the loss function of the neural network measures the degree of similarity between the fusion result image and the images before fusion. The loss Loss is a combination of an SSIM loss, a background content loss, and a saliency target loss, and is expressed as follows:

L_SSIM = 1 - k·SSIM(A₁, C) - (1 - k)·SSIM(A₂, C)    (1)

[Equation (2): background content loss L_back, given only as an image in the source and defined in terms of the gradient operator ∇ and the image height h and width w]

[Equation (3): saliency target loss L_salient, given only as an image in the source]

Loss = δ₁·L_SSIM + δ₂·L_back + δ₃·L_salient    (4)

where ∇ denotes the gradient operator, h and w are the height and width of the image, k may take different values for different input modality images within the range 0 < k < 1, and δ₁ + δ₂ + δ₃ = 1.
Compared with the prior art, the invention has the following beneficial effects:
1. The image fusion method based on double decoders adopts a large receptive field feature extraction module that uses depthwise-separable large-kernel convolution to reduce the model size while enlarging the receptive field. A large convolution kernel gathers information from a wide area and therefore extracts semantic information more effectively, and the designed depthwise-separable form removes most of the computational burden and parameters that a large kernel would otherwise introduce, yielding better feature extraction performance (a rough parameter comparison is sketched after this list).
2. The image fusion method based on double decoders uses two decoders and performs multi-modal fusion in the decoding stage. Existing methods typically fuse modalities in the encoding stage, but such fusion strategies are harder to optimize than fusion in the decoding stage: during back-propagation, the gradient path through the decoder is shorter than the path through the encoder, so the decoder's optimization is less affected by vanishing or exploding gradients and the decoder is easier to optimize than the encoder.
3. The dual decoder of the image fusion method based on double decoders adopts interactive decoding modules in order to exploit the complementarity of different modalities and the multiple kinds of information in the image content, including modality fusion information and context information. Instead of using each kind of information separately to improve the decoded features, as in existing work, the interactive decoding module is used as the basic block of the decoder and combines multiple kinds of information: channel attention adaptively selects useful information, and pixel-wise addition merges the different kinds of information for feature reconstruction. In addition, the two decoders interact with each other; the output of each decoding step contains both fusion information and the specific information of the two modalities and is passed to the two interactive decoding modules of the next level, so the decoded feature information is progressively refined, which further improves the quality of multi-modal image fusion.
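As a rough illustration of the parameter saving described in point 1 above, the following sketch compares an ordinary 5×5 convolution with the depthwise-separable form; the 64-channel width used here is an assumption chosen for illustration and is not taken from the patent text.

```python
# Hypothetical parameter count for a 64-channel feature map (channel width assumed).
channels = 64
standard_5x5 = 5 * 5 * channels * channels             # ordinary 5x5 convolution: 102,400 weights
depthwise_5x5 = 5 * 5 * channels                       # depthwise 5x5 convolution:   1,600 weights
pointwise_1x1 = channels * channels                    # 1x1 pointwise convolution:   4,096 weights
print(standard_5x5 / (depthwise_5x5 + pointwise_1x1))  # roughly 18x fewer parameters

# Receptive field: the module stacks a 5x5 depthwise convolution and a 5x5 depthwise
# convolution with dilation 3 (effective extent 3*(5-1)+1 = 13), so the pair covers
# 5 + 13 - 1 = 17 pixels per side, versus 5 for a single standard 5x5 convolution.
```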
Description of the drawings:
Fig. 1 is a block flow diagram of the dual decoder-based image fusion method of the present invention.
Fig. 2 is a schematic diagram of a large receptive field feature extraction module of the image fusion method based on dual decoders of the invention.
Fig. 3 is a block diagram of an interactive decoding module of the image fusion method based on the dual decoders of the present invention.
Detailed description of embodiments:
the present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Referring to fig. 1 to 3, the dual decoder-based image fusion method of the present invention includes the steps of:
(S1) capturing multi-modal images with cameras of different imaging modes, and denoting them as images A₁ and A₂;
(S2) taking the multi-modal images A₁ and A₂ as the input of the network, and obtaining multi-modal feature maps through a convolution layer followed by N large receptive field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively; during decoding, the decoding information of the two modality-specific decoding modules is concatenated on the channel dimension for interactive fusion; this is repeated N times for gradual fusion, and a fused image C is then obtained through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the network and the input images, and back-propagating the gradient of the loss function value to update the network model parameters; when the loss function value converges, parameter updating stops and the trained neural network is obtained.
Referring to Fig. 2, in step (S2), the large receptive field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian error nonlinear activation function (GELU), a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
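A minimal PyTorch sketch of such a module is given below for illustration. The operation list (1×1 convolution, GELU, 5×5 depthwise convolution, dilated 5×5 depthwise convolution, pixel normalization, residual connection) comes from the text; the ordering of the operations, the channel width, and the exact form of the pixel normalization are assumptions.

```python
# A minimal sketch of the large receptive field feature extraction module (assumptions
# noted above): per-pixel channel normalization, 1x1 convolution, GELU, 5x5 depthwise
# convolution, dilated 5x5 depthwise convolution, wrapped in a residual connection.
import torch
import torch.nn as nn


class PixelNorm(nn.Module):
    """Normalize each pixel's feature vector across the channel dimension (assumed form)."""
    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + 1e-6)


class LargeReceptiveFieldBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            PixelNorm(),
            nn.Conv2d(channels, channels, kernel_size=1),       # 1x1 convolution
            nn.GELU(),                                          # Gaussian error nonlinear activation
            nn.Conv2d(channels, channels, kernel_size=5,
                      padding=2, groups=channels),              # 5x5 depthwise convolution
            nn.Conv2d(channels, channels, kernel_size=5,
                      dilation=3, padding=6, groups=channels),  # 5x5 depthwise convolution, dilation 3
        )

    def forward(self, x):
        return x + self.body(x)                                 # residual connection
```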
Referring to Fig. 3, in step (S3), the interactive decoding module takes the feature information of each level and the fused decoding information of the previous level, performs pixel-wise superposition, and carries out interactive decoding.
Referring to Fig. 3, in step (S3), the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
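The sketch below illustrates one possible reading of this module. The 3×3 convolution, channel attention, interpolation upsampling, pixel-wise superposition with the previous level's fused decoding information, and channel concatenation of the two modality branches come from the text; the squeeze-and-excitation form of the channel attention, the wiring of the inputs, and the optional upsampling flag are assumptions.

```python
# A minimal sketch of the interactive decoding module (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(F.adaptive_avg_pool2d(x, 1))


class InteractiveDecodingBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Decoding information of the two modality branches is concatenated on the channel axis.
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)  # 3x3 convolution
        self.attn = ChannelAttention(channels)

    def forward(self, own_feat, other_feat, prev_fused=None, upsample=False):
        # Pixel-wise superposition of this level's feature information with the fused
        # decoding information of the previous level.
        if prev_fused is not None:
            own_feat = own_feat + prev_fused
        x = torch.cat([own_feat, other_feat], dim=1)   # cross-branch channel concatenation
        x = self.attn(self.conv(x))
        if upsample:                                   # interpolation upsampling
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return x
```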
In addition, the loss function of the neural network in this embodiment measures the degree of similarity between the fusion result image and the images before fusion. The loss Loss is a combination of an SSIM loss, a background content loss, and a saliency target loss, and is expressed as follows:

L_SSIM = 1 - k·SSIM(A₁, C) - (1 - k)·SSIM(A₂, C)    (1)

[Equation (2): background content loss L_back, given only as an image in the source and defined in terms of the gradient operator ∇ and the image height h and width w]

[Equation (3): saliency target loss L_salient, given only as an image in the source]

Loss = δ₁·L_SSIM + δ₂·L_back + δ₃·L_salient    (4)

where ∇ denotes the gradient operator, h and w are the height and width of the image, k may take different values for different input modality images within the range 0 < k < 1, and δ₁ + δ₂ + δ₃ = 1.
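A sketch of the training loss follows. Equation (1) and the weighted combination (4) are taken from the text; because equations (2) and (3) are given only as images, the background content loss and the saliency target loss below are assumed to be an L1 loss on image gradients and an L1 loss on pixel intensities, each normalized by h·w, and the SSIM implementation is assumed to come from the pytorch_msssim package. The default values of k and the δ weights are placeholders.

```python
# A minimal sketch of the training loss under the assumptions stated above, for
# single-channel inputs scaled to [0, 1].
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed external SSIM implementation


def sobel_gradient(x: torch.Tensor) -> torch.Tensor:
    """Approximate the gradient operator with Sobel filters (absolute magnitude)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return F.conv2d(x, kx, padding=1).abs() + F.conv2d(x, ky, padding=1).abs()


def fusion_loss(a1, a2, c, k=0.5, deltas=(0.4, 0.3, 0.3)):
    d1, d2, d3 = deltas                      # delta_1 + delta_2 + delta_3 = 1 (example values)
    hw = c.shape[-2] * c.shape[-1]           # normalization by image height x width

    # Equation (1): SSIM loss weighted by k (0 < k < 1).
    l_ssim = 1 - k * ssim(a1, c, data_range=1.0) - (1 - k) * ssim(a2, c, data_range=1.0)

    # Assumed form of equation (2): background content loss on image gradients.
    l_back = torch.norm(sobel_gradient(c)
                        - torch.max(sobel_gradient(a1), sobel_gradient(a2)), p=1) / hw

    # Assumed form of equation (3): saliency target loss on pixel intensities.
    l_salient = torch.norm(c - torch.max(a1, a2), p=1) / hw

    # Equation (4): weighted combination.
    return d1 * l_ssim + d2 * l_back + d3 * l_salient
```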
In addition, the multi-modal images described in this embodiment include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
In addition, in this embodiment the multi-modal image A₁ is a visible light image and A₂ is a medium-wave or long-wave infrared image, with an image resolution of 640 × 512.
In addition, the number N of module repetitions in this embodiment may be 4.
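Putting the pieces together, the sketch below assembles the LargeReceptiveFieldBlock and InteractiveDecodingBlock classes from the sketches above into an end-to-end model with N = 4 and single-channel 640 × 512 inputs, as in this embodiment. Whether the two modalities share one encoder, the channel width, and the way the final convolution combines the two decoder outputs are all assumptions.

```python
# An end-to-end sketch (assumptions noted above); reuses LargeReceptiveFieldBlock and
# InteractiveDecodingBlock from the sketches in the preceding sections.
import torch
import torch.nn as nn


class DualDecoderFusionNet(nn.Module):
    def __init__(self, channels: int = 32, n_blocks: int = 4):
        super().__init__()
        # One convolution layer followed by N large receptive field modules per modality.
        self.head1 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.head2 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.encoder1 = nn.Sequential(*[LargeReceptiveFieldBlock(channels) for _ in range(n_blocks)])
        self.encoder2 = nn.Sequential(*[LargeReceptiveFieldBlock(channels) for _ in range(n_blocks)])
        # N interactive decoding modules per branch.
        self.decoder1 = nn.ModuleList([InteractiveDecodingBlock(channels) for _ in range(n_blocks)])
        self.decoder2 = nn.ModuleList([InteractiveDecodingBlock(channels) for _ in range(n_blocks)])
        # Final convolution layer reconstructing the fused image C.
        self.tail = nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1)

    def forward(self, a1, a2):
        f1 = self.encoder1(self.head1(a1))
        f2 = self.encoder2(self.head2(a2))
        d1, d2 = f1, f2
        prev1 = prev2 = None
        for blk1, blk2 in zip(self.decoder1, self.decoder2):
            # Each decoding level exchanges information between the two branches and
            # passes its output to the next level (gradual fusion).
            n1 = blk1(f1, d2, prev1)
            n2 = blk2(f2, d1, prev2)
            prev1, prev2, d1, d2 = n1, n2, n1, n2
        return torch.sigmoid(self.tail(torch.cat([d1, d2], dim=1)))


# Usage at the resolution quoted in this embodiment (640 x 512), in (B, C, H, W) layout:
model = DualDecoderFusionNet(n_blocks=4)
a1 = torch.rand(1, 1, 512, 640)   # visible light image A1
a2 = torch.rand(1, 1, 512, 640)   # infrared image A2
c = model(a1, a2)                 # fused image C with the same spatial size
```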
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above example; all technical solutions falling under the concept of the present invention fall within its scope of protection. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention are also regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A dual decoder-based image fusion method, comprising the steps of:
(S1) capturing multi-modal images with cameras of different imaging modes, and denoting them as images A₁ and A₂;
(S2) taking the multi-modal images A₁ and A₂ as the input of the network, and obtaining multi-modal feature maps through a convolution layer followed by N large receptive field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively; during decoding, the decoding information of the two modality-specific decoding modules is concatenated on the channel dimension for interactive fusion; this is repeated N times for gradual fusion, and a fused image C is then obtained through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the network and the input images, and back-propagating the gradient of the loss function value to update the network model parameters; when the loss function value converges, parameter updating stops and the trained neural network is obtained.
2. The dual decoder-based image fusion method of claim 1, wherein in step (S1), the multi-modal images include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
3. The dual decoder-based image fusion method according to claim 1, wherein in step (S1), the multi-modal image A₁ is a visible light image and A₂ is one of a short-wave, medium-wave, or long-wave infrared image, or a polarized image.
4. The dual decoder-based image fusion method of claim 1, wherein in step (S2), the number N of module repetitions is preferably in the range 4 ≤ N ≤ 6.
5. The dual decoder-based image fusion method of claim 1, wherein in step (S2), the large receptive field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian error nonlinear activation function, a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
6. The dual decoder-based image fusion method according to claim 1, wherein in step (S3), the interactive decoding module takes the feature information of each level and the fused decoding information of the previous level, performs pixel-wise superposition, and carries out interactive decoding.
7. The dual decoder-based image fusion method of claim 1, wherein in step (S3), the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
8. The dual decoder-based image fusion method according to claim 1, wherein in step (S4) the loss function of the neural network measures the degree of similarity between the fusion result image and the images before fusion, the loss Loss being a combination of an SSIM loss, a background content loss, and a saliency target loss, expressed as follows:

L_SSIM = 1 - k·SSIM(A₁, C) - (1 - k)·SSIM(A₂, C)    (1)

[Equation (2): background content loss L_back, given only as an image in the source and defined in terms of the gradient operator ∇ and the image height h and width w]

[Equation (3): saliency target loss L_salient, given only as an image in the source]

Loss = δ₁·L_SSIM + δ₂·L_back + δ₃·L_salient    (4)

where ∇ denotes the gradient operator, h and w are the height and width of the image, k may take different values for different input modality images within the range 0 < k < 1, and δ₁ + δ₂ + δ₃ = 1.
CN202310165488.2A 2023-02-24 2023-02-24 Image fusion method based on double decoders Pending CN116309215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165488.2A CN116309215A (en) 2023-02-24 2023-02-24 Image fusion method based on double decoders

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310165488.2A CN116309215A (en) 2023-02-24 2023-02-24 Image fusion method based on double decoders

Publications (1)

Publication Number Publication Date
CN116309215A true CN116309215A (en) 2023-06-23

Family

ID=86817870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310165488.2A Pending CN116309215A (en) 2023-02-24 2023-02-24 Image fusion method based on double decoders

Country Status (1)

Country Link
CN (1) CN116309215A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721112A (en) * 2023-08-10 2023-09-08 南开大学 Underwater camouflage object image segmentation method based on double-branch decoder network
CN116721112B (en) * 2023-08-10 2023-10-24 南开大学 Underwater camouflage object image segmentation method based on double-branch decoder network

Similar Documents

Publication Publication Date Title
Guo et al. Dense scene information estimation network for dehazing
CN111915619A (en) Full convolution network semantic segmentation method for dual-feature extraction and fusion
CN110717868B (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN110675328A (en) Low-illumination image enhancement method and device based on condition generation countermeasure network
CN102982520B (en) Robustness face super-resolution processing method based on contour inspection
CN111709900A (en) High dynamic range image reconstruction method based on global feature guidance
CN111008608B (en) Night vehicle detection method based on deep learning
CN110189268A (en) Underwater picture color correcting method based on GAN network
CN116309215A (en) Image fusion method based on double decoders
WO2024017093A1 (en) Image generation method, model training method, related apparatus, and electronic device
CN115330620A (en) Image defogging method based on cyclic generation countermeasure network
CN116757986A (en) Infrared and visible light image fusion method and device
KS et al. Deep multi-stage learning for hdr with large object motions
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
Song et al. Real-scene reflection removal with raw-rgb image pairs
US11783454B2 (en) Saliency map generation method and image processing system using the same
CN112712481B (en) Structure-texture sensing method aiming at low-light image enhancement
CN116109538A (en) Image fusion method based on simple gate unit feature extraction
KR20230036343A (en) Apparatus for Image Fusion High Quality
Zhang et al. Single image dehazing via reinforcement learning
CN116071281A (en) Multi-mode image fusion method based on characteristic information interaction
CN115829868A (en) Underwater dim light image enhancement method based on illumination and noise residual error image
CN111476721B (en) Wasserstein distance-based image rapid enhancement method
CN110189370B (en) Monocular image depth estimation method based on full convolution dense connection neural network
CN114463192A (en) Infrared video distortion correction method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination