CN116309215A - Image fusion method based on double decoders - Google Patents
Image fusion method based on double decoders
- Publication number: CN116309215A
- Application number: CN202310165488.2A
- Authority
- CN
- China
- Prior art keywords
- image
- fusion
- decoding
- fusion method
- loss
- Prior art date: 2023-02-24
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the field of image fusion and discloses an image fusion method based on dual decoders. It is intended to solve the poor feature-extraction capability and poor fusion effect of deep-learning-based image fusion when processing complex multi-modal images captured by cameras with different imaging modes, and comprises the following steps: the multi-modal images A1 and A2 are passed through a large-receptive-field feature extraction module to extract features and then through two interactive decoding modules respectively; during decoding, the decoding information of the two decoding modules for the different modalities is spliced on the channel dimension and interactively fused, and a fused image C is reconstructed; the loss between the fused image C and the multi-modal images A1 and A2 is calculated to update the network model parameters. The invention can effectively fuse complex multi-modal images and is characterized by strong feature-information extraction, a small parameter count, high reconstruction accuracy, and strong robustness.
Description
Technical field:
The invention relates to an image fusion method, and in particular to an image fusion method based on dual decoders.
Background art:
As technology advances, the information provided by a single source image can no longer meet the needs of human vision or of target recognition and detection. Cameras with different imaging modes are therefore used to capture multi-modal images, and image fusion is applied to obtain fused images with richer detail information.
Image fusion technology integrates the information of two or more images of the same scene, taken by different sensors or at different positions, times, or brightness levels, into a single fused image through superposition and complementation, so as to characterize the imaging scene comprehensively and support subsequent vision tasks. Compared with a single source image, the fused image presents the scene and target information more clearly, and the quality and clarity of the image are markedly improved.
Traditional image fusion methods are relatively mature, but they require complex, manually designed fusion rules, so both the labor and the computational cost of image fusion are high. For complex multi-modal images, designing a general feature extraction method is very difficult, and such methods depend heavily on hand-crafted features. With the rise of deep learning in recent years, deep-learning-based image fusion methods have emerged, providing new ideas for image fusion. However, current deep-learning-based image fusion methods suffer from high network complexity and heavy computation, and for complex multi-modal images they may still exhibit inaccurate feature extraction and poor fusion results.
Summary of the invention:
The invention aims to overcome the shortcomings of the prior art and provides an image fusion method based on dual decoders, which can fuse complex multi-modal images and features good feature-information extraction, a small parameter count, high reconstruction accuracy, and strong robustness.
The technical scheme for solving the technical problems is as follows:
An image fusion method based on dual decoders comprises the following steps:
(S1) capturing multi-modal images with cameras of different imaging modes, and recording them as images A1 and A2;
(S2) taking the multi-modal images A1 and A2 as the network input, and obtaining multi-modal feature maps through a convolution layer followed by N large-receptive-field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively, wherein during decoding the decoding information of the two decoding modules for the different modalities is spliced on the channel dimension and interactively fused; this is repeated N times for gradual fusion, and a fused image C is then obtained through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the neural network and the input images, and back-propagating its gradient to update the network model parameters until the loss function value converges, whereupon updating stops and the trained neural network is obtained.
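For illustration only, the following is a minimal PyTorch sketch of the forward pass described in steps (S2) and (S3). The class names, channel width, and single-channel inputs are assumptions made for the sketch; the LargeReceptiveFieldBlock and InteractiveDecoderBlock modules are sketched after the corresponding preferred embodiments below.

```python
# Hypothetical sketch of the dual-decoder pipeline of steps (S2)-(S3).
# Channel width, N, and single-channel inputs are illustrative assumptions.
import torch
import torch.nn as nn

class DualDecoderFusionNet(nn.Module):
    def __init__(self, n_blocks: int = 4, channels: int = 32):
        super().__init__()
        self.stem1 = nn.Conv2d(1, channels, 3, padding=1)  # input convolution for A1
        self.stem2 = nn.Conv2d(1, channels, 3, padding=1)  # input convolution for A2
        self.enc1 = nn.Sequential(*[LargeReceptiveFieldBlock(channels) for _ in range(n_blocks)])
        self.enc2 = nn.Sequential(*[LargeReceptiveFieldBlock(channels) for _ in range(n_blocks)])
        self.dec1 = nn.ModuleList([InteractiveDecoderBlock(channels) for _ in range(n_blocks)])
        self.dec2 = nn.ModuleList([InteractiveDecoderBlock(channels) for _ in range(n_blocks)])
        self.head = nn.Conv2d(2 * channels, 1, 3, padding=1)  # reconstructs fused image C

    def forward(self, a1: torch.Tensor, a2: torch.Tensor) -> torch.Tensor:
        f1 = self.enc1(self.stem1(a1))  # multi-modal feature map for A1
        f2 = self.enc2(self.stem2(a2))  # multi-modal feature map for A2
        d1, d2 = f1, f2
        for blk1, blk2 in zip(self.dec1, self.dec2):
            # each decoding step splices in the other decoder's information
            d1, d2 = blk1(d1, d2), blk2(d2, d1)
        return self.head(torch.cat([d1, d2], dim=1))
```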
Preferably, in step (S1), the multi-modal images include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
Preferably, in step (S1), the multi-modal image A1 is a visible light image, and A2 is one of a short-wave, medium-wave, or long-wave infrared image or a polarized image.
Preferably, in step (S2), the number of module repetitions N is in the range 4 ≤ N ≤ 6.
Preferably, in step (S2), the large-receptive-field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian-error nonlinear activation function, a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
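As a sketch only, one plausible PyTorch reading of this module is shown below. It assumes the listed layers are composed in the given order inside a single residual branch and uses GroupNorm with one group as a stand-in for the pixel normalization; the patent does not fix these details.

```python
import torch
import torch.nn as nn

class LargeReceptiveFieldBlock(nn.Module):
    """Residual block: norm -> 1x1 conv -> GELU -> 5x5 depthwise -> 5x5 depthwise (dilation 3)."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)  # stand-in for "pixel normalization"
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)          # 1x1 convolution
        self.act = nn.GELU()                                            # Gaussian error activation
        self.dw = nn.Conv2d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)                 # 5x5 depthwise conv
        self.dw_dilated = nn.Conv2d(channels, channels, kernel_size=5, dilation=3,
                                    padding=6, groups=channels)         # 5x5 depthwise, dilation 3

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.act(self.pw(self.norm(x)))
        y = self.dw_dilated(self.dw(y))
        return x + y  # residual connection
```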
Preferably, in step (S3), the interactive decoding module takes the feature information of each level together with the fused decoding information of the previous level, and performs pixel-wise superposition and interactive decoding.
Preferably, in step (S3), the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
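A hedged sketch of the interactive decoding module follows. It assumes channel attention in squeeze-and-excitation style, pixel superposition as element-wise addition, and optional bilinear 2x interpolation upsampling; none of these realizations are fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (one plausible realization)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(F.adaptive_avg_pool2d(x, 1))  # re-weight channels

class InteractiveDecoderBlock(nn.Module):
    """Splices own features with the other decoder's output on the channel
    dimension, then applies a 3x3 convolution, channel attention, and
    pixel-wise superposition, with optional interpolation upsampling."""
    def __init__(self, channels: int, upsample: bool = False):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.attn = ChannelAttention(channels)
        self.upsample = upsample

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        y = self.attn(self.conv(torch.cat([own, other], dim=1)))  # splice and select
        y = y + own                                               # pixel-wise superposition
        if self.upsample:
            y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
        return y
```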
Preferably, in step (S4), the loss function of the neural network compares the similarity between the fusion result image and the images before fusion. The loss function Loss is a combination of the SSIM loss, a background content loss, and a salient-target loss, and is expressed as follows:
L_SSIM = 1 - k·SSIM(A1, C) - (1 - k)·SSIM(A2, C)   (1)

Loss = δ1·L_SSIM + δ2·L_back + δ3·L_salient   (4)

In the above, ∇ denotes the gradient operator, h and w are respectively the height and width of the image, k can take different values for different input modality images within the range 0 < k < 1, and the weights should satisfy δ1 + δ2 + δ3 = 1.
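As a sketch, the SSIM term of equation (1) and the weighted combination of equation (4) can be written as below. Equations (2) and (3), which define L_back and L_salient, are not reproduced in this text, so the sketch takes those two terms as precomputed values; the third-party pytorch_msssim package is assumed for the SSIM computation, and the weight values shown are illustrative.

```python
# Sketch of eq. (1) and eq. (4). L_back and L_salient (eqs. (2)-(3), not
# reproduced here) are taken as precomputed scalars supplied by the caller.
import torch
from pytorch_msssim import ssim  # pip install pytorch-msssim

def ssim_loss(a1: torch.Tensor, a2: torch.Tensor, c: torch.Tensor, k: float = 0.5) -> torch.Tensor:
    # Eq. (1): L_SSIM = 1 - k*SSIM(A1, C) - (1 - k)*SSIM(A2, C), with 0 < k < 1
    return 1.0 - k * ssim(a1, c, data_range=1.0) - (1.0 - k) * ssim(a2, c, data_range=1.0)

def total_loss(l_ssim: torch.Tensor, l_back: torch.Tensor, l_salient: torch.Tensor,
               deltas=(0.4, 0.3, 0.3)) -> torch.Tensor:
    # Eq. (4): Loss = d1*L_SSIM + d2*L_back + d3*L_salient, with d1 + d2 + d3 = 1
    d1, d2, d3 = deltas
    return d1 * l_ssim + d2 * l_back + d3 * l_salient
```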
Compared with the prior art, the invention has the following beneficial effects:
1. The dual-decoder image fusion method adopts a large-receptive-field feature extraction module and uses depthwise-separable large-kernel convolutions to reduce the model size while enlarging the receptive field. A large convolution kernel collects information from a wide area and has a stronger capacity for extracting semantic information; making the large kernel depthwise-separable reduces the heavy computational burden a large kernel would otherwise impose and cuts the parameter count, thereby achieving better feature extraction performance.
2. The dual-decoder image fusion method uses two decoders and performs multi-modal fusion in the decoding stage. Existing methods typically fuse modalities in the encoding stage, but such fusion strategies are harder to optimize than fusing during decoding: in back-propagation, the gradient paths through the decoder are shorter than those through the encoder, so the decoder's optimization is less affected by vanishing or exploding gradients. The decoder is therefore easier to optimize than the encoder.
3. The dual decoder adopts an interactive decoding module designed to exploit the complementarity of different modalities and multiple kinds of information about the image content, including modal fusion information and context information. Instead of using the different kinds of information separately to improve the decoded features, as in existing work, the interactive decoding module serves as the basic unit of each decoder and combines multiple sources of information: channel attention adaptively selects the useful information, and pixel-wise addition merges the different types of information for feature reconstruction. In addition, the two decoders interact with each other; the output of each decoding step contains both the fusion information and the modality-specific information of the two branches and is passed to the two interactive decoding modules of the next level, so the decoded feature information is progressively refined, which further benefits multi-modal image fusion quality.
Description of the drawings:
fig. 1 is a block flow diagram of a dual decoder-based image fusion method of the present invention.
Fig. 2 is a schematic diagram of a large receptive field feature extraction module of the image fusion method based on dual decoders of the invention.
Fig. 3 is a block diagram of an interactive decoding module of the image fusion method based on the dual decoders of the present invention.
Detailed description of the embodiments:
The present invention will be described in further detail below with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Referring to Figs. 1 to 3, the dual-decoder-based image fusion method of the present invention includes the following steps:
(S1) capturing multi-modal images with cameras of different imaging modes, and recording them as images A1 and A2;
(S2) taking the multi-modal images A1 and A2 as the network input, and obtaining multi-modal feature maps through a convolution layer followed by N large-receptive-field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively, wherein during decoding the decoding information of the two decoding modules for the different modalities is spliced on the channel dimension and interactively fused; this is repeated N times for gradual fusion, and a fused image C is then obtained through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the neural network and the input images, and back-propagating its gradient to update the network model parameters until the loss function value converges, whereupon updating stops and the trained neural network is obtained.
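A minimal training-loop sketch for step (S4) is given below. It assumes the network and SSIM loss sketched earlier, an Adam optimizer, and a simple change-in-loss convergence test; none of these choices are specified by the patent, and the full loss would also include the background-content and salient-target terms of equation (4).

```python
# Hypothetical training loop for step (S4). Optimizer, learning rate, and the
# convergence criterion are illustrative assumptions.
import torch

def train(model, loader, epochs: int = 100, lr: float = 1e-4, tol: float = 1e-5,
          device: str = "cuda"):
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for a1, a2 in loader:                 # batches of paired multi-modal images
            a1, a2 = a1.to(device), a2.to(device)
            c = model(a1, a2)                 # fused image C
            loss = ssim_loss(a1, a2, c)       # SSIM term only, for brevity
            opt.zero_grad()
            loss.backward()                   # back-propagate the loss gradient
            opt.step()                        # update the network model parameters
            total += loss.item()
        if abs(prev - total) < tol:           # stop once the loss value converges
            break
        prev = total
    return model
```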
Referring to Fig. 2, in step (S2), the large-receptive-field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian-error nonlinear activation function, a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
Referring to Fig. 3, in step (S3), the interactive decoding module takes the feature information of each level together with the fused decoding information of the previous level, and performs pixel-wise superposition and interactive decoding.
Referring to Fig. 3, in step (S3), the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
In addition, the neural network loss function in this embodiment compares the similarity between the fusion result image and the images before fusion. The loss function Loss is a combination of the SSIM loss, a background content loss, and a salient-target loss, and is expressed as follows:
L_SSIM = 1 - k·SSIM(A1, C) - (1 - k)·SSIM(A2, C)   (1)

Loss = δ1·L_SSIM + δ2·L_back + δ3·L_salient   (4)

In the above, ∇ denotes the gradient operator, h and w are respectively the height and width of the image, k can take different values for different input modality images within the range 0 < k < 1, and the weights should satisfy δ1 + δ2 + δ3 = 1.
In addition, the multi-modal images described in this embodiment include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
In addition, in this embodiment the multi-modal image A1 is a visible light image, A2 is a medium-wave or long-wave infrared image, and the image resolution is 640×512.
In addition, the number of module repetitions N in this embodiment may be 4.
The foregoing is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above example; all technical solutions falling within the concept of the present invention belong to its protection scope. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention are also regarded as within the protection scope of the present invention.
Claims (8)
1. An image fusion method based on dual decoders, characterized by comprising the following steps:
(S1) capturing multi-modal images with cameras of different imaging modes, and recording them as images A1 and A2;
(S2) taking the multi-modal images A1 and A2 as the network input, and obtaining multi-modal feature maps through a convolution layer followed by N large-receptive-field feature extraction modules;
(S3) passing the two multi-modal feature maps through two interactive decoding modules respectively, wherein during decoding the decoding information of the two decoding modules for the different modalities is spliced on the channel dimension and interactively fused; repeating this N times for gradual fusion, and then obtaining a fused image C through a convolution layer;
(S4) constructing the neural network through the above process, calculating the loss function value between the fused image output by the neural network and the input images, and back-propagating its gradient to update the network model parameters until the loss function value converges, whereupon updating stops and the trained neural network is obtained.
2. The dual-decoder-based image fusion method of claim 1, wherein in step (S1) the multi-modal images include, but are not limited to, visible light images, short-wave infrared images, medium-wave infrared images, long-wave infrared images, and polarized images.
3. The dual-decoder-based image fusion method of claim 1, wherein in step (S1) the multi-modal image A1 is a visible light image and A2 is one of a short-wave, medium-wave, or long-wave infrared image or a polarized image.
4. The dual-decoder-based image fusion method of claim 1, wherein in step (S2) the number of module repetitions N is preferably in the range 4 ≤ N ≤ 6.
5. The dual-decoder-based image fusion method of claim 1, wherein in step (S2) the large-receptive-field feature extraction module uses a residual connection and comprises a convolution layer with a 1×1 kernel, a Gaussian-error nonlinear activation function, a depthwise convolution layer with a 5×5 kernel, a depthwise convolution layer with a 5×5 kernel and a dilation value of 3, and pixel normalization.
6. The dual-decoder-based image fusion method of claim 1, wherein in step (S3) the interactive decoding module takes the feature information of each level together with the fused decoding information of the previous level, and performs pixel-wise superposition and interactive decoding.
7. The dual-decoder-based image fusion method of claim 1, wherein in step (S3) the interactive decoding module comprises a convolution layer with a 3×3 kernel, channel attention, and interpolation upsampling.
8. The dual-decoder-based image fusion method of claim 1, wherein in step (S4) the loss function of the neural network compares the similarity between the fusion result image and the images before fusion; the loss function Loss is a combination of the SSIM loss, a background content loss, and a salient-target loss, and is expressed as follows:

L_SSIM = 1 - k·SSIM(A1, C) - (1 - k)·SSIM(A2, C)   (1)

Loss = δ1·L_SSIM + δ2·L_back + δ3·L_salient   (4)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310165488.2A | 2023-02-24 | 2023-02-24 | Image fusion method based on double decoders |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN116309215A | 2023-06-23 |

Family

ID=86817870

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310165488.2A (Pending) | Image fusion method based on double decoders | 2023-02-24 | 2023-02-24 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN116309215A |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116721112A | 2023-08-10 | 2023-09-08 | 南开大学 (Nankai University) | Underwater camouflage object image segmentation method based on double-branch decoder network |
| CN116721112B | 2023-08-10 | 2023-10-24 | 南开大学 (Nankai University) | Underwater camouflage object image segmentation method based on double-branch decoder network |
Similar Documents
| Publication | Title |
|---|---|
| Guo et al. | Dense scene information estimation network for dehazing |
| CN111915619A | Full convolution network semantic segmentation method for dual-feature extraction and fusion |
| CN110717868B | Video high dynamic range inverse tone mapping model construction and mapping method and device |
| CN110675328A | Low-illumination image enhancement method and device based on condition generation countermeasure network |
| CN102982520B | Robustness face super-resolution processing method based on contour inspection |
| CN111709900A | High dynamic range image reconstruction method based on global feature guidance |
| CN111008608B | Night vehicle detection method based on deep learning |
| CN110189268A | Underwater picture color correcting method based on GAN network |
| CN116309215A | Image fusion method based on double decoders |
| WO2024017093A1 | Image generation method, model training method, related apparatus, and electronic device |
| CN115330620A | Image defogging method based on cyclic generation countermeasure network |
| CN116757986A | Infrared and visible light image fusion method and device |
| KS et al. | Deep multi-stage learning for hdr with large object motions |
| CN115731597A | Automatic segmentation and restoration management platform and method for mask image of face mask |
| Song et al. | Real-scene reflection removal with raw-rgb image pairs |
| US11783454B2 | Saliency map generation method and image processing system using the same |
| CN112712481B | Structure-texture sensing method aiming at low-light image enhancement |
| CN116109538A | Image fusion method based on simple gate unit feature extraction |
| KR20230036343A | Apparatus for Image Fusion High Quality |
| Zhang et al. | Single image dehazing via reinforcement learning |
| CN116071281A | Multi-mode image fusion method based on characteristic information interaction |
| CN115829868A | Underwater dim light image enhancement method based on illumination and noise residual error image |
| CN111476721B | Wasserstein distance-based image rapid enhancement method |
| CN110189370B | Monocular image depth estimation method based on full convolution dense connection neural network |
| CN114463192A | Infrared video distortion correction method based on deep learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |