CN116152128A - High dynamic range multi-exposure image fusion model and method based on attention mechanism - Google Patents
- Publication number: CN116152128A
- Application number: CN202211489946.XA
- Authority: CN (China)
- Prior art keywords: image, dynamic range, attention mechanism, feature, high dynamic
- Legal status: Pending (assumed; Google has not performed a legal analysis)
Classifications
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/02 — Neural networks; G06N3/08 — Learning methods
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20208 — High dynamic range [HDR] image processing
- G06T2207/20221 — Image fusion; image merging
- Y02T10/40 — Engine management systems
Abstract
The invention relates to a high dynamic range multi-exposure image fusion method based on an attention mechanism, and belongs to the technical field of image processing. First, two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules, which highlight the image features favourable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the pure high-dimensional features needed to reconstruct the fused image. Finally, the feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image. The method improves the quality and robustness of high dynamic range multi-exposure image fusion.
Description
Technical Field
The invention relates to a multi-exposure image fusion model and a method for high dynamic range imaging, and belongs to the technical field of image processing.
Background
Natural scenes span a very wide dynamic range: the luminance of faint starlight is about 10^-4 cd/m^2, while bright sunlight reaches 10^5 to 10^9 cd/m^2. When shooting with a digital single-lens reflex camera, the limited dynamic range of the sensor causes over-exposed and under-exposed regions in the captured photos. High dynamic range multi-exposure image fusion aims to expand the dynamic range of an image and thus overcome the inability of a limited-dynamic-range digital camera to capture a high dynamic range image. In recent years, as computing power has increased, high dynamic range multi-exposure image fusion methods have increasingly moved from traditional transform-based methods to deep-learning-based methods. Traditional transform-based methods typically use an image transform (Laplacian pyramid, wavelet transform, sparse representation, etc.) to convert the input images into feature maps, then fuse the features according to a manually defined fusion strategy to obtain a high dynamic range image containing rich information. Deep-learning-based methods overcome the inability of traditional high dynamic range multi-exposure image fusion methods to learn image features adaptively, and generate high dynamic range images with richer detail. However, because the exposure times of the multiple exposures differ, images of the same scene exhibit complementary information and complex correspondences in brightness, chromaticity and structure. Existing deep-learning-based high dynamic range multi-exposure image fusion methods therefore still suffer from image distortion, loss of detail, and an inability to highlight and fuse the image features favourable to fusion.
Disclosure of Invention
Technical problem to be solved
Aiming at the problems of image distortion, loss of detail and insufficient use of the complementary information of the source image sequence in existing high dynamic range multi-exposure image fusion methods, the invention provides a high dynamic range multi-exposure image fusion model and method based on an attention mechanism, which further improves the quality and robustness of high dynamic range multi-exposure image fusion.
Technical Solution
The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module. Two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules to obtain the pure high-dimensional features needed to reconstruct the fused image. The feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image.
A high dynamic range multi-exposure image fusion method based on an attention mechanism is characterized by comprising the following steps:
step 1: reading a training underexposed image U and overexposed image O; cutting the read U and O into several sub-images M of size w×h×c, where w and h are the width and height of M and c is the number of channels of M; then performing data enhancement on the cropped sub-images M;
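The cropping and augmentation of step 1 can be sketched in NumPy. This is an illustrative sketch, not the patent's implementation; the helper names `crop_patches` and `augment` are hypothetical:

```python
import numpy as np

def crop_patches(img, w=256, h=256):
    """Crop an exposure image (H x W x 3) into non-overlapping w x h x 3 sub-images M."""
    H, W, _ = img.shape
    return [img[i:i + h, j:j + w]
            for i in range(0, H - h + 1, h)
            for j in range(0, W - w + 1, w)]

def augment(patch):
    """Data enhancement: 90-degree rotation, horizontal flip, vertical flip."""
    return [patch,
            np.rot90(patch),   # rotation
            patch[:, ::-1],    # horizontal flip
            patch[::-1, :]]    # vertical flip

# toy array standing in for the underexposed image U (values in [0, 1])
U = np.random.rand(512, 768, 3)
patches = crop_patches(U)                               # 2 x 3 = 6 sub-images
augmented = [a for p in patches for a in augment(p)]    # 4 samples per patch
```

Each w×h×c patch yields four training samples (the original plus three enhanced variants); the same transforms would be applied in lock-step to U and O so the exposure pair stays aligned.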
step 2: taking the sub-image M as the input source, constructing a single-layer convolutional neural network layer in the feature extraction module with convolution kernel size W×H; through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, calculated as:

f_1 = C_1(M)   (1)

wherein C_1 denotes the corresponding convolution operation;
step 3: taking f_1 as the input source, inputting it into the Unet network for multi-scale extraction of the image features, obtaining a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64, calculated as:

f_2 = U(f_1)   (2)

wherein U denotes the convolution operations of the Unet network;
step 4: constructing two attention mechanism modules A with identical structure; module A applies a Squeeze operation to the feature maps f_2 of the different exposure images output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling:

f_c = F_sq(u_c) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} u_c(i,j),  f_c ∈ R^C   (3)

wherein F_sq(·) denotes the Squeeze operation, u_c is the c-th channel of f_2, (i,j) indexes a pixel, R^C denotes the C-dimensional space, and f_c is the result of the Squeeze operation; an Excitation operation is then applied to the global feature, using two fully connected layers (to reduce model complexity and improve generalization) with a ReLU activation between them for nonlinearity, and finally outputting the weight vector through the Sigmoid normalization function; the Excitation operation lets the network learn the relationships among channels and obtains the per-channel weights f_3, calculated as:

f_3 = F_ex(f_c, W) = σ(g(f_c, W)) = σ(W_2 ReLU(W_1 f_c))   (4)

wherein W_1 ∈ R^{(C/r)×C}, W_2 ∈ R^{C×(C/r)}, σ denotes the Sigmoid function, and r is a scaling factor;
step 5: multiplying the image feature f_2 output by the Unet network by the per-channel weights f_3 learned by the attention mechanism to obtain the final image feature f_4, calculated as:

f_4 = F_scale(f_2, f_3) = f_2 · f_3   (5)

wherein · denotes the multiplication operation, scaling each channel of f_2 by its learned weight;
step 6: splicing the high-dimensional image features f_{u,4} and f_{o,4} of the underexposed and overexposed images to obtain the feature map F_0 of size w×h×128, calculated as:

F_0 = concat(f_{u,4}, f_{o,4})   (6)

wherein f_{u,4} and f_{o,4} denote the image features of the underexposed and overexposed images after the attention mechanism, and concat denotes the feature splicing operation;
step 7: taking F_0 as the input source, obtaining the high dynamic range image through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the spliced feature map F_0 into the 64-channel feature map F_1 of size w×h×64, then feeds F_1 to a DRDB unit, which outputs the feature map F_2, wherein the DRDB unit is a residual dense block improved with dilated convolution; finally, two convolution layers are applied in turn, convolving F_2 into the feature map F_3 of size w×h×16 and then producing the high dynamic range image, calculated as:

F_1 = C_1(F_0)   (7)
F_2 = DRDB(F_1)   (8)
F_3 = C_2(F_2)   (9)
HDR = C_3(F_3)   (10)

wherein DRDB denotes the dilated residual dense block convolution operation, C_1, C_2, C_3 denote single convolution layers, and HDR denotes the high dynamic range image;
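The tensor-shape bookkeeping of steps 6 and 7 can be traced with a toy NumPy sketch in which every convolution layer is replaced by a random 1×1 convolution and the DRDB by a crude residual projection. It only illustrates the shapes flowing through equations (6)-(10), not the actual network:

```python
import numpy as np

rng = np.random.default_rng(1)
w, h = 16, 16

def conv1x1(x, out_ch):
    """Stand-in for a conv layer: a 1x1 convolution (per-pixel linear map), random weights."""
    W = rng.standard_normal((x.shape[-1], out_ch)) * 0.1
    return x @ W

# step 6: splice the attention-weighted features of both exposures (eq. 6)
f_u4 = rng.random((w, h, 64))                  # underexposed branch, f_{u,4}
f_o4 = rng.random((w, h, 64))                  # overexposed branch,  f_{o,4}
F0 = np.concatenate([f_u4, f_o4], axis=-1)     # w x h x 128

# step 7: reconstruction chain (eqs. 7-10); DRDB approximated by a residual projection
F1 = conv1x1(F0, 64)        # eq. (7): back to 64 channels
F2 = F1 + conv1x1(F1, 64)   # eq. (8): crude residual stand-in for the DRDB
F3 = conv1x1(F2, 16)        # eq. (9): w x h x 16
HDR = conv1x1(F3, 3)        # eq. (10): 3-channel high dynamic range image
```

The channel counts (128 → 64 → 64 → 16 → 3) follow the sizes stated in the text; the 3-channel output is an assumption, since the patent only calls the final map the HDR image.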
step 8: designing a loss function, iterating, and updating the model, wherein the loss function is:

Loss = λ L_SSIM + L_content   (16)
L_SSIM = α_o SSIM_{O,F} + α_u SSIM_{U,F}   (12)
L_content = β_O L_{O,F} + β_U L_{U,F}   (15)

wherein SSIM_{O,F} and SSIM_{U,F} denote the structural similarity of the overexposed image O and the underexposed image U, respectively, with the fused image F; λ is a hyperparameter; α_o and α_u are the weight coefficients of O and U in the structural loss; β_O and β_U are the weight coefficients of O and U in the content loss; and L_{O,F} and L_{U,F} denote the content similarity of O and U, respectively, with the fused image F;
step 9: reading the underexposed image U and overexposed image O to be processed, and obtaining the high dynamic range image HDR through the fully trained model.
Preferably: w = 256, h = 256, c = 3 in step 1.
Preferably: in step 1, data enhancement is realized by rotation, horizontal flipping and vertical flipping.
Preferably: W = 3 and H = 3 in step 2.
Preferably: in step 8, the model update is realized with an Adam optimizer.
Advantageous effects
The invention provides a novel end-to-end high dynamic range multi-exposure image fusion model and method based on an attention mechanism, improving fusion quality and robustness.
1. A weight-separated dual-channel feature extraction module extracts features of the target scene from the underexposed and overexposed images, obtaining feature maps with stronger texture characterization capability;
2. An attention mechanism is introduced into the multi-exposure image task, attending to the local details and global features of the underexposed and overexposed images from local to global, and highlighting the image features favourable to fusion;
3. To reconstruct the fused image more accurately, a loss function is designed using the L2 norm and the structural similarity SSIM as constraint criteria for the neural network, yielding a smaller similarity difference between the source image sequence and the fused image and more accurate convergence of the network model.
These operations enable the network to capture more detail and generate a higher-quality high dynamic range multi-exposure fused image.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1: high dynamic range multi-exposure image fusion method flow chart based on attention mechanism;
fig. 2: extracting a network structure diagram of the characteristics;
fig. 3: attention mechanism network structure diagram;
fig. 4: reconstructing a network structure diagram by the characteristics;
fig. 5: the flow chart of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
The invention designs a high dynamic range multi-exposure image fusion network framework based on an attention mechanism, consisting of three core modules: a feature extraction module, an attention mechanism module and a feature reconstruction module. First, two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules, which highlight the image features favourable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the pure high-dimensional features needed to reconstruct the fused image. Finally, the feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image.
A high dynamic range multi-exposure image fusion method based on an attention mechanism comprises the following 5 aspects:
(1) Because the exposure times of a multi-exposure sequence differ, the brightness, contrast, texture and contour information of the different exposures of the same scene also differ. If the multi-exposure images were fused directly, with features extracted by a single shared network, the shared weights would destroy the inherent characteristics of the target scene under each exposure. The feature extraction module therefore adopts a dual-channel architecture: two differently exposed images from the multi-exposure sequence are selected as input and sent to the feature extraction module, where each passes through a feature extraction network of identical structure that shares no learned parameters, so that features are extracted simultaneously but independently.
(2) The Unet network is used as the basic feature extraction network structure. The feature extraction module consists of one independent convolution layer and one Unet network, where the Unet network comprises convolution, downsampling, pooling, upsampling and splicing operations. A 3×3 convolution kernel first extracts low-level image features and converts the 256×256 input image into 64-channel high-dimensional features; the Unet network then performs multi-scale feature extraction, stacking shallow image features and deep semantic features by feature splicing, which provides an effective way to preserve image structure and texture. After the Unet network completes this fine feature extraction, it outputs a 64-channel high-dimensional multi-scale feature map, which serves as the input to the subsequent attention mechanism module.
(3) The attention mechanism is utilized to keep rich detail information of the multi-exposure image, and image characteristics favorable for fusion are highlighted so as to correct local distortion and information loss of the fusion image. Similar to the feature extraction module, the attention mechanism module adopts a dual channel design with the same structure.
(4) The high dynamic range image is reconstructed by the feature reconstruction module. The dilated residual dense block used in the module is a residual dense block improved with dilated convolution; it fully exploits image features from different network levels and uses a larger receptive field to infer the details lost in under-saturated and saturated regions while retaining the details of the low dynamic range images.
(5) A multi-loss function is designed, combining an L2-norm-based content loss with an SSIM-based structural loss, further constraining the network model and improving its generalization capability.
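The enlarged receptive field that motivates the dilated residual dense block in aspect (4) can be seen with a naive single-channel dilated convolution. This is an illustrative sketch only; the patent's DRDB applies dilated 3×3 convolutions inside a residual dense block rather than this bare loop:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Naive 'same'-padded 2-D convolution with a dilated, odd-sized square kernel."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)                  # zero padding keeps the output the same size
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    # dilation spreads the kernel taps apart without adding parameters
                    out[i, j] += kernel[a, b] * xp[i + a * dilation, j + b * dilation]
    return out

# a single bright pixel reveals the receptive field of the kernel
img = np.zeros((9, 9)); img[4, 4] = 1.0
ones = np.ones((3, 3))
plain = dilated_conv2d(img, ones, dilation=1)    # 9 taps in a 3x3 neighbourhood
dilated = dilated_conv2d(img, ones, dilation=2)  # same 9 taps spread over a 5x5 area
```

With dilation 2 the same nine weights cover a 5×5 area, which is why dilated convolutions help infer content for larger saturated regions at no extra parameter cost.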
The characteristic extraction module is specifically as follows:
The multi-exposure images are passed into the feature extraction module to obtain high-dimensional multi-scale feature maps; the basic framework is shown in fig. 2, and the steps are as follows:
Step 1: read the training underexposed image U and overexposed image O. Cut the read U and O into several sub-images M of size w×h×c, where w and h are the width and height of M, c is the number of channels, and w = 256, h = 256, c = 3; then perform data enhancement on the cropped sub-images M by rotation, horizontal flipping and vertical flipping.

Step 2: take the sub-image M as the input source. The feature extraction module constructs a single-layer convolutional neural network layer with convolution kernel size W×H, where W = 3 and H = 3. Through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, calculated as:

f_1 = C_1(M)   (1)

wherein C_1 denotes the corresponding convolution operation.

Step 3: take f_1 as the input source; multi-scale extraction of the image features is realized through the Unet network, whose structure is shown in fig. 2. Shallow image features and deep semantic features are stacked by feature splicing, and the Unet network comprises upsampling, pooling, convolution and activation function operations. After the Unet network completes the fine feature extraction, a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64 is obtained. Analogously to the image feature f_1, the high-dimensional multi-scale feature f_2 is calculated as:

f_2 = U(f_1)   (2)

wherein U denotes the convolution operations of the Unet network. The image features f_1 of the underexposed image U and the overexposed image O are trained simultaneously through feature extraction networks of identical structure that share no learned parameters.
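The "identical structure, no shared parameters" design can be illustrated with a toy NumPy convolution: both branches run the same code path but draw their weights independently, so the same input produces different feature maps. All names and random weights here are illustrative, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv_layer_params(in_ch=3, out_ch=64, k=3):
    """One single-layer CNN's parameters; each branch draws its own, so nothing is shared."""
    return rng.standard_normal((k, k, in_ch, out_ch)) * 0.1

def extract(img, W):
    """Minimal valid 3x3 convolution turning an H x W x 3 image into (H-2) x (W-2) x 64 features."""
    k = W.shape[0]
    H, Wd, _ = img.shape
    patches = np.stack([img[i:i + k, j:j + k].ravel()
                        for i in range(H - k + 1)
                        for j in range(Wd - k + 1)])
    out = patches @ W.reshape(-1, W.shape[-1])
    return out.reshape(H - k + 1, Wd - k + 1, -1)

U = rng.random((10, 10, 3))                                   # toy input image
W_under, W_over = conv_layer_params(), conv_layer_params()    # same structure, separate weights
fU, fO = extract(U, W_under), extract(U, W_over)              # different features, same input
```

In the real model each branch would receive its own exposure; the point here is only that the two branches, being unshared, respond differently even to identical input.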
The attention mechanism module is specifically as follows:
The high-dimensional image features output by the feature extraction module are passed into the attention mechanism module, which highlights and fuses the favourable channel features of interest and suppresses the uninteresting channel features; the network structure is shown in fig. 3, and the steps are as follows:
Step 1: construct two attention mechanism modules A with identical structure, whose network structure is shown in fig. 3; the attention modules A retain the rich detail of the multi-exposure images and highlight the image features favourable to fusion, so as to correct local distortion and information loss in the fused image.
Step 2: the attention mechanism module A outputs characteristic graphs f of different exposure images to the Unet network respectively 2 And performing Squeeze operation, and coding the whole spatial feature on the channel into a global feature by adopting a global average pooling mode, wherein the calculation formula is as follows:
f in the formula sq (. Cndot.) represents a Squeeze operation, i, j represents a pixel, R C Represents the C dimension, f c Is the result of the Squeeze operation. Then adopting an accounting operation on global features, adopting two full-connection operations for reducing model complexity and improving generalization capability, adopting a ReLU activation function to perform nonlinear processing between full-connection, and finally outputting weight vectors through a normalization function Sigmoid, wherein the accounting operation enables a network to learn the relation among all channels and also obtains the weights f of different channels 3 The calculation method is as follows:
f 3 =F ex (f c ,W)=σ(g(f c ,W))=σ(W 2 ReLU(W 1 f c )) (4)
wherein the method comprises the steps ofRepresents W 1 Dimension is-> Represents W 2 Dimension is->r is the scaling factor.
Step 3: image feature f output by utilizing multiplication operation 2 Weight f of each channel learned by attention mechanism 3 Multiplying to obtain final image feature f 4 The calculation method is as follows:
f 4 =F scale (f 2 ,f 3 )=f 2 ·f 3 (5)
in the formula, the expression matrix multiplication operation. The whole operation can be seen as learning the weight coefficient of each channel, so that the model has more distinguishing ability on the characteristics of each channel.
The characteristic reconstruction module is specifically as follows:
The feature reconstruction module reconstructs the high-dimensional pure features of the different exposure images obtained in the previous step to generate a high dynamic range image; the network structure is shown in fig. 4, and the steps are as follows:
Step 1: splice the high-dimensional image features f_{u,4} and f_{o,4} of the underexposed and overexposed images to obtain the feature map F_0 of size w×h×128, calculated as:

F_0 = concat(f_{u,4}, f_{o,4})   (6)

wherein f_{u,4} and f_{o,4} denote the image features of the underexposed and overexposed images after the attention mechanism, and concat denotes the feature splicing operation.
Step 2: by F 0 As an input source, a high dynamic range image is obtained through a characteristic reconstruction module, the network structure is shown as figure 4, and the characteristic reconstruction module firstly uses a single-layer convolutional neural network layer to splice a characteristic diagram F 0 Map F converted to 64 channels 1 ,F 1 The size is w multiplied by h multiplied by 64, and then the characteristic diagram F is obtained 1 Output feature map F provided to DRDB unit 2 Wherein the DRDB unit is obtained by improving residual dense units (Residual Dense Block, RDB) based on dilation convolution, and finally sequentially convolving the feature map F with 2 convolution layers 2 Obtaining a characteristic diagram F 3 And a high dynamic range image, wherein F 3 The size of (2) is w×h×16, and the calculation method is as follows:
F 1 =C 1 (F 0 )(7)
F 2 =DRDB(F 1 )(8)
F 3 =C 2 (F 2 )(9)
HDR=C 3 (F 3 )(10)
wherein DRDB represents a dilated residual dense unit convolution operation, C 1 ,C 2 ,C 3 Representing a single convolution layer, HDR represents a high dynamic range image.
The loss function is specifically as follows:
The loss function determines which types of image features are extracted and the proportional relationship between them. The fused image must contain the detail of the bright regions of the underexposed image and the dark regions of the overexposed image as well as the brightness information of the different exposures, while matching the visual perception characteristics of the human eye. The invention therefore designs a multi-loss function combining an L2-norm-based content loss with an SSIM-based structural loss, constructed in the following steps:
step 1: the structural similarity metric SSIM models the loss and distortion of similarity of the source image sequence and the fused image based on the similarity of the luminance features, contrast, and structural information. Let x be the input image and y be the output image, the mathematical expression is:
wherein μ and σ represent the mean and standard deviation, respectively, σ xy Representing the covariance of x, y, C 1 ,C 2 And C 3 Is a constant coefficient. The distortion of the source image sequence and the fusion image in three aspects of brightness, contrast and structure is fully considered, and the structural loss L is designed for the multi-exposure image fusion task SSIM . F represents the fused image, L SSIM The mathematical expression of (2) is:
L SSIM =α o SSIM O,F +α u SSIM U,F (12)
where SSIM O,F and SSIM U,F respectively represent the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, and α o and α u are the weight coefficients of the over-exposed image O and the under-exposed image U. In the multi-exposure image fusion task, the over-exposed and under-exposed images share the same texture details, but their luminance intensity is respectively too high or too low. The weight coefficients α o and α u are therefore set to the same weight to balance luminance intensity and texture detail, which can be expressed as:
α o =α u (13)
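A minimal NumPy sketch of the global (single-window) SSIM of Eq. (11) and the equally weighted structural loss of Eqs. (12)-(13); the constant values c1, c2, c3 and the use of one global window instead of sliding local windows are simplifying assumptions for illustration:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4, c3=4.5e-4):
    """Global (single-window) SSIM per Eq. (11):
    luminance * contrast * structure."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()  # covariance of x and y
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (sxy + c3) / (sx * sy + c3)
    return luminance * contrast * structure

def structural_loss(O, U, F, alpha=0.5):
    # Eqs. (12)-(13): equal weights alpha_o = alpha_u for both exposures
    return alpha * ssim(O, F) + alpha * ssim(U, F)
```

Note that SSIM of an image with itself is 1, so with equal weights of 0.5 the structural term of two identical sources against their own fusion is also 1.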
content loss L content The texture detail information distortion of the multi-exposure image sequence and the fusion image is guaranteed to be minimum, meanwhile, the interference of noise is avoided, and the content loss is calculated as follows:
L x,y =||x-y|| 2 (14)
calculating Euclidean distance between pixel points of an input image x and an output image y in the formula, wherein I I.I.I.I 2 Is L 2 Norms. Content loss may be defined as:
L content =β O L O,F +β U L U,F (15)
where β O and β U are the weight coefficients of the over-exposed image O and the under-exposed image U respectively; similarly to the structural loss, β O and β U are given the same weight. To balance the structural loss and the content loss, the hyper-parameter λ assigns a corresponding weight to the structural loss so as to improve the learning capacity of the model. In summary, the overall AMEFNet loss function can be expressed as:
Loss=λL SSIM +L content (16)
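The content loss of Eqs. (14)-(15) and the combined loss of Eq. (16) can be sketched as follows; the equal β weights follow the text, while the λ value and the externally supplied structural-loss value are placeholders for illustration:

```python
import numpy as np

def content_loss(O, U, F, beta=0.5):
    """Eqs. (14)-(15): L2 (Euclidean) distance to each source,
    with equal weights beta_O = beta_U."""
    return beta * np.linalg.norm(O - F) + beta * np.linalg.norm(U - F)

def total_loss(O, U, F, lam=0.5, ssim_loss=0.0):
    """Eq. (16): Loss = lambda * L_SSIM + L_content.

    lam and ssim_loss are illustrative placeholders; lambda is a tuned
    hyper-parameter in the text and L_SSIM comes from Eq. (12).
    """
    return lam * ssim_loss + content_loss(O, U, F)
```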
step 2: constraint by loss function and selecting Adam optimizer to obtain parameter beta 1 =0.9,β 2 =0.999, initial learning rate of 10 -4 The learning rate is attenuated by 0.5 times for every 50 iterations, so that the Loss weight Loss is reduced, and model updating is realized.
Step 3: Judge whether all image pairs in the training set have been processed and the set number of iterations epoch has been completed, where epoch is set to 1000. If so, the algorithm ends and the attention-based high dynamic range multi-exposure image fusion model AMEFNet is obtained; otherwise, step 2 is executed.
High dynamic range multi-exposure image generation
Step 1: Read the under-exposed image U and over-exposed image O to be processed, and obtain the high dynamic range image HDR through the fully trained model AMEFNet; the calculation method is as follows:
HDR = AMEFNet (U,O) (17)
while the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.
Claims (6)
1. The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module, wherein two different exposure images of a target scene are respectively input into two groups of feature extraction modules with the same structure, and two groups of high-dimensional feature images corresponding to the two different exposure images of the target scene are obtained; then, the two groups of high-dimensional feature images are used as input and respectively sent to corresponding attention mechanism modules, so that pure high-dimensional features required by reconstructing the fusion image are obtained; the feature reconstruction module performs fusion reconstruction on the high-dimensional features of the two groups of different exposure images output by the attention mechanism module to obtain a high dynamic range image.
2. A high dynamic range multi-exposure image fusion method based on an attention mechanism realized by the model of claim 1, which is characterized by comprising the following steps:
step 1: reading a training under-exposed image U and an over-exposed image O; cutting the read U and O into a plurality of sub-images M, wherein the size of M is w×h×c, w and h represent the width and height of M, and c represents the number of channels of M; then performing data enhancement on the cut sub-images M;
step 2: taking the sub-image M as an input source, constructing a single-layer convolutional neural network layer with convolution kernel size W×H through the feature extraction module, and converting U and O each into a 64-dimensional feature f 1 through this convolutional layer, the size of f 1 being w×h×64; the calculation method is as follows:
f 1 = C 1 (M) (1)
where C 1 represents the corresponding convolution operation;
step 3: taking f 1 as the input source, it is input to the Unet network for multi-scale feature extraction of the image features, obtaining a 64-channel high-dimensional multi-scale feature f 2 , the size of f 2 being w×h×64; the calculation method is as follows:
f 2 = U(f 1 ) (2)
where U represents the convolution operation of the Unet network;
step 4: constructing two attention mechanism modules A with the same structure, and using them to perform the Squeeze operation on the feature maps f 2 of the different exposure images output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling; the calculation formula is as follows:
f c = F sq (f 2 ) = (1/(w×h)) Σ i=1..w Σ j=1..h f 2 (i,j) (3)
where F sq (·) represents the Squeeze operation, i, j index the pixels, R C represents the C-dimensional space, and f c ∈ R C is the result of the Squeeze operation; the Excitation operation is then applied to the global feature, adopting two fully-connected operations to reduce model complexity and improve generalization capability, with a ReLU activation function performing nonlinear processing between the fully-connected layers and the weight vector finally being output through the normalization function Sigmoid; the Excitation operation enables the network to learn the relationship among the channels and obtain the weights f 3 of the different channels, and the calculation method is as follows:
f 3 =F ex (f c ,W)=σ(g(f c ,W))=σ(W 2 ReLU(W 1 f c )) (4)
where W 1 ∈ R (C/r)×C and W 2 ∈ R C×(C/r) represent the weights of the two fully-connected layers, and r is a scaling factor;
step 5: multiplying the image feature f 2 output by the Unet network by the per-channel weights f 3 learned by the attention mechanism to obtain the final image feature f 4 ; the calculation method is as follows:
f 4 =F scale (f 2 ,f 3 )=f 2 ·f 3 (5)
where · represents a matrix multiplication operation;
step 6: splicing the high-dimensional image features f u,4 and f o,4 of the under-exposed and over-exposed images to obtain the feature map F 0 , the size of F 0 being w×h×128; the calculation is as follows:
F 0 = concat(f u,4 , f o,4 ) (6)
where f u,4 and f o,4 respectively represent the image features obtained after the under-exposed image and the over-exposed image pass through the attention mechanism, and concat represents the feature splicing operation;
step 7: taking F 0 as the input source, a high dynamic range image is obtained through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the spliced feature map F 0 into a 64-channel feature map F 1 of size w×h×64, then feeds the feature map F 1 to the DRDB unit, which outputs the feature map F 2 , wherein the DRDB unit is obtained by improving the residual dense unit with dilated convolution; finally, two convolution layers are applied in sequence to the feature map F 2 to obtain a feature map F 3 of size w×h×16 and the high dynamic range image; the calculation method is as follows:
F 1 = C 1 (F 0 ) (7)
F 2 = DRDB(F 1 ) (8)
F 3 = C 2 (F 2 ) (9)
HDR = C 3 (F 3 ) (10)
wherein DRDB represents a dilated residual dense unit convolution operation, C 1 ,C 2 ,C 3 Representing a single convolution layer, HDR representing a high dynamic range image;
step 8: designing a loss function, iterating, and updating a model, wherein the loss function is as follows:
Loss=λL SSIM +L content (16)
L SSIM =α o SSIM O,F +α u SSIM U,F (12)
L content =β O L O,F +β U L U,F (15)
where SSIM O,F and SSIM U,F respectively represent the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, λ represents the hyper-parameter, α o and α u are respectively the weight coefficients of the over-exposed image O and the under-exposed image U in the structural loss, β O and β U are respectively the weight coefficients of the over-exposed image O and the under-exposed image U in the content loss, and L O,F and L U,F respectively represent the content similarity of the over-exposed image O and the under-exposed image U with the fused image F;
step 9: reading the under-exposed image U and over-exposed image O to be processed, and obtaining the high dynamic range image HDR through the fully trained model.
3. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: w=256, h=256, c=3 as described in step 1.
4. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: in the step 1, data enhancement is realized by rotating, horizontally overturning and vertically overturning.
5. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: w=3 and h=3 as described in step 2.
6. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: and 8, realizing model updating by adopting an Adam optimizer.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111428200.3A CN114331931A (en) | 2021-11-26 | 2021-11-26 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN2021114282003 | 2021-11-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116152128A true CN116152128A (en) | 2023-05-23 |
Family
ID=81047436
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111428200.3A Withdrawn CN114331931A (en) | 2021-11-26 | 2021-11-26 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN202211489946.XA Pending CN116152128A (en) | 2021-11-26 | 2022-11-25 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN114331931A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342455B (en) * | 2023-05-29 | 2023-08-08 | 湖南大学 | Efficient multi-source image fusion method, system and medium |
Also Published As
Publication number | Publication date |
---|---|
CN114331931A (en) | 2022-04-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||