CN116152128A - High dynamic range multi-exposure image fusion model and method based on attention mechanism - Google Patents

High dynamic range multi-exposure image fusion model and method based on attention mechanism

Info

Publication number
CN116152128A
CN116152128A CN202211489946.XA
Authority
CN
China
Prior art keywords
image
dynamic range
attention mechanism
feature
high dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211489946.XA
Other languages
Chinese (zh)
Inventor
白本督
李俊鹏
孙爱晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications filed Critical Xian University of Posts and Telecommunications
Publication of CN116152128A publication Critical patent/CN116152128A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a high dynamic range multi-exposure image fusion method based on an attention mechanism, and belongs to the technical field of image processing. First, two differently exposed images of a target scene are fed into two structurally identical feature extraction modules, yielding two sets of high-dimensional feature maps corresponding to the two exposures. The two sets of high-dimensional feature maps are then passed as input to the corresponding attention mechanism modules, which highlight the image features favorable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the clean high-dimensional features required to reconstruct the fused image. The feature reconstruction module fuses and reconstructs the high-dimensional features of the two differently exposed images output by the attention mechanism modules to obtain a high dynamic range image. The method improves the quality and robustness of high dynamic range multi-exposure image fusion.

Description

High dynamic range multi-exposure image fusion model and method based on attention mechanism
Technical Field
The invention relates to a multi-exposure image fusion model and a method for high dynamic range imaging, and belongs to the technical field of image processing.
Background
Natural scenes have a very wide dynamic range: faint starlight has a luminance of about 10^-4 cd/m^2, while the luminance of bright sunlight reaches roughly 10^5 to 10^9 cd/m^2. When a digital single-lens reflex camera is used for shooting, its limited dynamic range causes the captured photographs to be over-exposed or under-exposed. High dynamic range multi-exposure image fusion aims to expand the dynamic range of an image and to overcome the fact that a digital camera, with its limited dynamic range, cannot capture a high dynamic range image directly. In recent years, as computing power has increased, high dynamic range multi-exposure image fusion has steadily moved from traditional transform-based methods to deep learning-based methods. Traditional transform-based methods generally convert the input images into feature maps with some image transform (Laplacian pyramid, wavelet transform, sparse representation, etc.) and fuse the features according to a manually defined fusion strategy to obtain a high dynamic range image containing rich information. Deep learning-based methods overcome the inability of traditional high dynamic range multi-exposure image fusion methods to learn image features adaptively and generate high dynamic range images with richer details than traditional methods. However, because the exposure times of the multi-exposure images differ, objects in different exposures of the same scene exhibit complementary information and complex correspondences of brightness, chromaticity and structure. Existing deep learning-based high dynamic range multi-exposure image fusion methods therefore still suffer from image distortion, detail loss, and an inability to highlight and fuse favorable image features.
Disclosure of Invention
Technical problem to be solved
To address the problems of image distortion, detail loss and insufficient use of the complementary information in the source image sequence in existing high dynamic range multi-exposure image fusion methods, the invention provides a high dynamic range multi-exposure image fusion model and method based on an attention mechanism, which further improve the quality and robustness of high dynamic range multi-exposure image fusion.
Technical proposal
The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module, wherein two different exposure images of a target scene are respectively input into two groups of feature extraction modules with the same structure, and two groups of high-dimensional feature images corresponding to the two different exposure images of the target scene are obtained; then, the two groups of high-dimensional feature images are used as input and respectively sent to corresponding attention mechanism modules, so that pure high-dimensional features required by reconstructing the fusion image are obtained; the feature reconstruction module performs fusion reconstruction on the high-dimensional features of the two groups of different exposure images output by the attention mechanism module to obtain a high dynamic range image.
A high dynamic range multi-exposure image fusion method based on an attention mechanism is characterized by comprising the following steps:
Step 1: read the training under-exposed image U and over-exposed image O; crop the read U and O into several sub-images M of size w×h×c, where w and h denote the width and height of M and c denotes the number of channels of M; then apply data enhancement to the cropped sub-images M;
Step 2: taking the sub-images M as the input source, build a single-layer convolutional neural network layer through the feature extraction module, with convolution kernel size W×H; through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, computed as follows:
f_1 = C_1(M)    (1)
where C_1 denotes the corresponding convolution operation;
Step 3: taking f_1 as the input source, feed it into the Unet network for multi-scale extraction of image features, obtaining a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64, computed as follows:
f_2 = U(f_1)    (2)
where U denotes the convolution operations of the Unet network;
Step 4: construct two attention mechanism modules A with identical structure; each module A applies a Squeeze operation to the feature map f_2 of the corresponding exposure image output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling, computed as follows:
f_c = F_sq(f_2) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} f_2(i, j), with f_c ∈ R^C    (3)
where F_sq(·) denotes the Squeeze operation, i, j index a pixel, R^C denotes the C-dimensional space, and f_c is the result of the Squeeze operation; an Excitation operation is then applied to the global feature: two fully connected layers are used to reduce model complexity and improve generalization, a ReLU activation performs the non-linear processing between them, and the weight vector is finally output through the normalization function Sigmoid; the Excitation operation lets the network learn the relationships among channels and yields the weights f_3 of the different channels, computed as follows:
f_3 = F_ex(f_c, W) = σ(g(f_c, W)) = σ(W_2 ReLU(W_1 f_c))    (4)
where W_1 ∈ R^{(C/r)×C}, W_2 ∈ R^{C×(C/r)}, and r is a scaling factor;
Step 5: multiply the image feature f_2 by the per-channel weights f_3 learned by the attention mechanism to obtain the final image feature f_4, computed as follows:
f_4 = F_scale(f_2, f_3) = f_2 · f_3    (5)
where · denotes the multiplication operation;
Step 6: concatenate the high-dimensional image features f_{u,4} and f_{o,4} of the under-exposed and over-exposed images to obtain the feature map F_0 of size w×h×128, computed as follows:
F_0 = concat(f_{u,4}, f_{o,4})    (6)
where f_{u,4} and f_{o,4} denote the image features obtained from the under-exposed and over-exposed images after the attention mechanism, and concat denotes the feature concatenation operation;
Step 7: taking F_0 as the input source, obtain the high dynamic range image through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the concatenated feature map F_0 into a 64-channel feature map F_1 of size w×h×64, then feeds F_1 to a DRDB unit, which outputs the feature map F_2, where the DRDB unit is a residual dense unit improved with dilated convolution; finally, two convolution layers are applied in sequence to F_2 to obtain the feature map F_3 of size w×h×16 and then the high dynamic range image, computed as follows:
F_1 = C_1(F_0)    (7)
F_2 = DRDB(F_1)    (8)
F_3 = C_2(F_2)    (9)
HDR = C_3(F_3)    (10)
where DRDB denotes the dilated residual dense unit convolution operation, C_1, C_2, C_3 denote single convolution layers, and HDR denotes the high dynamic range image;
Step 8: design the loss function, iterate, and update the model, where the loss function is:
Loss = λ L_SSIM + L_content    (16)
L_SSIM = α_o SSIM_{O,F} + α_u SSIM_{U,F}    (12)
L_content = β_O L_{O,F} + β_U L_{U,F}    (15)
where SSIM_{O,F} and SSIM_{U,F} denote the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, λ is a hyperparameter, α_o and α_u are the weight coefficients of the over-exposed image O and the under-exposed image U in the structural loss, β_O and β_U are the weight coefficients of the over-exposed image O and the under-exposed image U in the content loss, and L_{O,F}, L_{U,F} denote the content similarity of the over-exposed image O and the under-exposed image U with the fused image F;
Step 9: read the under-exposed image U and over-exposed image O to be processed, and obtain the high dynamic range image HDR through the fully trained model.
Preferably: in step 1, w = 256, h = 256 and c = 3.
Preferably: in step 1, data enhancement is performed by rotation, horizontal flipping and vertical flipping.
Preferably: in step 2, W = 3 and H = 3.
Preferably: in step 8, model updating is performed with an Adam optimizer.
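As a worked illustration of the loss in step 8, the following sketch combines the SSIM-based structural term of equation (12) with the L_2 content term of equation (15), balanced by the hyperparameter λ as in equation (16). It is a minimal sketch under stated assumptions: the SSIM implementation is taken from the third-party pytorch-msssim package, the equal weights α = β = 0.5 are an assumed choice (the detailed description later sets the over- and under-exposure weights equal), and the sketch uses (1 - SSIM) so that minimizing the loss increases structural similarity, whereas the text writes L_SSIM directly in terms of the SSIM values.

```python
import torch
from pytorch_msssim import ssim   # assumed third-party SSIM implementation

def fusion_loss(fused, over, under, lam=1.0, alpha=0.5, beta=0.5):
    # structural term, eq. (12): SSIM of the fused image against both source images
    l_ssim = alpha * (1 - ssim(over, fused, data_range=1.0)) \
           + alpha * (1 - ssim(under, fused, data_range=1.0))
    # content term, eqs. (14)-(15): L2 distance between the fused image and each source image
    l_content = beta * torch.norm(over - fused, p=2) \
              + beta * torch.norm(under - fused, p=2)
    return lam * l_ssim + l_content   # eq. (16): Loss = lambda * L_SSIM + L_content
```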
Advantageous effects
The invention provides a novel end-to-end high dynamic range multi-exposure image fusion model and method based on an attention mechanism, improving fusion quality and robustness.
1. A weight-separated dual-channel feature extraction module extracts features of the target scene from the under-exposed and over-exposed images, obtaining feature maps with stronger texture-characterization capability;
2. an attention mechanism is introduced into the multi-exposure image task to attend to the local details and global features of the under-exposed and over-exposed images, from local to global, highlighting the image features favorable to fusion;
3. to reconstruct the fused image more accurately, the L_2 norm and the structural similarity SSIM are used as constraint criteria of the neural network when designing the loss function, yielding a smaller similarity difference between the source image sequence and the fused image and more accurate convergence of the neural network model.
These designs enable the network to capture more detail information and generate higher-quality high dynamic range multi-exposure fusion images.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1: high dynamic range multi-exposure image fusion method flow chart based on attention mechanism;
fig. 2: feature extraction network structure diagram;
fig. 3: attention mechanism network structure diagram;
fig. 4: feature reconstruction network structure diagram;
fig. 5: flow chart of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention designs a high dynamic range multi-exposure image fusion network framework based on an attention mechanism, consisting of three core modules: a feature extraction module, an attention mechanism module and a feature reconstruction module. First, two differently exposed images of a target scene are fed into two structurally identical feature extraction modules, yielding two sets of high-dimensional feature maps corresponding to the two exposures. The two sets of feature maps are then passed as input to the corresponding attention mechanism modules, which highlight the image features favorable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the clean high-dimensional features required to reconstruct the fused image. The feature reconstruction module fuses and reconstructs the high-dimensional features of the two differently exposed images output by the attention mechanism modules to obtain a high dynamic range image.
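The following is a minimal PyTorch sketch of this three-module pipeline, provided only as an illustration. The class name AMEFNet is taken from the description below; the constructor arguments (the two feature extractors, the two attention modules and the reconstructor, sketched in the following sections) are assumed placeholders rather than the exact networks of the invention.

```python
import torch
import torch.nn as nn

class AMEFNet(nn.Module):
    """Dual-branch fusion sketch: two weight-separated feature extractors,
    two channel-attention modules, and one shared reconstruction module."""
    def __init__(self, extractor_u, extractor_o, attention_u, attention_o, reconstructor):
        super().__init__()
        self.extractor_u = extractor_u      # under-exposure branch (no weight sharing)
        self.extractor_o = extractor_o      # over-exposure branch
        self.attention_u = attention_u
        self.attention_o = attention_o
        self.reconstructor = reconstructor

    def forward(self, under, over):
        f_u = self.attention_u(self.extractor_u(under))   # clean 64-channel features
        f_o = self.attention_o(self.extractor_o(over))
        fused = torch.cat([f_u, f_o], dim=1)               # 128-channel concatenation
        return self.reconstructor(fused)                    # high dynamic range image
```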
A high dynamic range multi-exposure image fusion method based on an attention mechanism comprises the following 5 aspects:
(1) Because the exposure times of the multi-exposure images differ, information such as brightness, contrast, texture and contour differs across exposures of the same scene. If the multi-exposure images were fused directly, with features extracted by the same network, the resulting shared weights would destroy the intrinsic characteristics of the target scene under the different exposures. The feature extraction module therefore adopts a dual-channel architecture: two differently exposed images selected from the multi-exposure sequence are fed to the feature extraction module and pass simultaneously through feature extraction networks that are identical in structure but share no learning parameters.
(2) The Unet network is used as the basic feature extraction structure. The feature extraction module consists of 1 independent convolution layer and 1 Unet network, where the Unet network comprises convolution, downsampling, pooling, upsampling and concatenation operations. A convolution kernel of size 3×3 first extracts low-level image features and converts the 256×256 input image into 64-channel high-dimensional features; the Unet network then performs multi-scale extraction of the image features, stacking shallow image features and deep semantic features by feature concatenation, which provides an effective way to preserve image structure and texture. After the Unet network completes the fine feature extraction, it outputs a 64-channel high-dimensional multi-scale feature map, which serves as the input of the subsequent attention mechanism module.
(3) The attention mechanism preserves the rich detail information of the multi-exposure images and highlights the image features favorable to fusion, so as to correct local distortion and information loss in the fused image. Similar to the feature extraction module, the attention mechanism module adopts a dual-channel design with identical structure.
(4) The high dynamic range image is reconstructed by the feature reconstruction module. The dilated residual dense unit used by the feature reconstruction module is a residual dense unit improved with dilated convolution; it fully exploits image features from different network levels and, with its larger receptive field, infers the details lost in under-saturated and saturated regions while retaining the details of the low dynamic range images.
(5) A multi-term loss function combining an L_2-norm-based content loss and an SSIM-based structural loss is designed to further constrain the network model and improve its generalization ability.
The feature extraction module is as follows:
The multi-exposure images are fed into the feature extraction module for feature extraction to obtain high-dimensional multi-scale feature maps; the basic framework is shown in fig. 2, and the steps are as follows:
Step 1: the training under-exposed image U and over-exposed image O are read. The read U and O are cropped into several sub-images M of size w×h×c, where w and h denote the width and height of M and c denotes the number of channels, with w = 256, h = 256 and c = 3; the cropped sub-images M then undergo data enhancement by rotation, horizontal flipping and vertical flipping.
Step 2: the sub-images M are taken as the input source. A single-layer convolutional neural network layer is then built by the feature extraction module, with kernel size W×H, where W = 3 and H = 3. Through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, computed as follows:
f_1 = C_1(M)    (1)
where C_1 denotes the corresponding convolution operation.
Step 3: taking f_1 as the input source, multi-scale extraction of the image features is performed by the Unet network, whose structure is shown in fig. 2; shallow image features and deep semantic features are stacked by feature concatenation, and the Unet network comprises upsampling, pooling, convolution and activation-function operations. After the Unet network completes the fine feature extraction, it yields a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64. Analogously to the construction of the image feature f_1, the high-dimensional multi-scale feature f_2 is computed as:
f_2 = U(f_1)    (2)
where U denotes the convolution operations of the Unet network. The image features f_1 of the under-exposed image U and the over-exposed image O are trained simultaneously through feature extraction networks that are identical in structure but share no learning parameters.
The attention mechanism module is as follows:
The high-dimensional image features output by the feature extraction module are fed into the attention mechanism module, which highlights the channel features favorable to fusion and suppresses the uninteresting channel features; the network structure is shown in fig. 3, and the steps are as follows:
Step 1: two attention mechanism modules A with identical structure are constructed, with the network structure shown in fig. 3; the attention mechanism modules A preserve the rich detail information of the multi-exposure images and highlight the image features favorable to fusion, so as to correct local distortion and information loss in the fused image.
Step 2: each attention mechanism module A applies a Squeeze operation to the feature map f_2 of the corresponding exposure image output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling, computed as follows:
f_c = F_sq(f_2) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} f_2(i, j), with f_c ∈ R^C    (3)
where F_sq(·) denotes the Squeeze operation, i, j index a pixel, R^C denotes the C-dimensional space, and f_c is the result of the Squeeze operation. An Excitation operation is then applied to the global feature: two fully connected layers are used to reduce model complexity and improve generalization, a ReLU activation performs the non-linear processing between them, and the weight vector is finally output through the normalization function Sigmoid; the Excitation operation lets the network learn the relationships among channels and yields the weights f_3 of the different channels, computed as follows:
f_3 = F_ex(f_c, W) = σ(g(f_c, W)) = σ(W_2 ReLU(W_1 f_c))    (4)
where W_1 ∈ R^{(C/r)×C}, W_2 ∈ R^{C×(C/r)}, and r is a scaling factor.
Step 3: the image feature f_2 is multiplied by the per-channel weights f_3 learned by the attention mechanism to obtain the final image feature f_4, computed as follows:
f_4 = F_scale(f_2, f_3) = f_2 · f_3    (5)
where · denotes the multiplication operation. The whole operation can be seen as learning a weight coefficient for each channel, which makes the model more discriminative with respect to the features of each channel.
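The channel attention of equations (3)-(5) follows the familiar Squeeze-and-Excitation pattern, and a minimal sketch is given below: global average pooling (Squeeze), two fully connected layers with a ReLU in between and a Sigmoid at the end (Excitation), then channel-wise rescaling of f_2. The reduction ratio r = 16 is an assumed value.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                   # F_sq, eq. (3): global average pooling
        self.excite = nn.Sequential(                             # F_ex, eq. (4): FC -> ReLU -> FC -> Sigmoid
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid())

    def forward(self, f2):
        b, c, _, _ = f2.shape
        fc = self.squeeze(f2).view(b, c)                         # per-channel global feature f_c
        f3 = self.excite(fc).view(b, c, 1, 1)                    # channel weights f_3 in (0, 1)
        return f2 * f3                                           # F_scale, eq. (5): channel-wise rescaling
```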
The feature reconstruction module is as follows:
The feature reconstruction module reconstructs the clean high-dimensional features of the different exposure images obtained in the previous steps to generate the high dynamic range image; the network structure is shown in fig. 4, and the steps are as follows:
Step 1: the high-dimensional image features f_{u,4} and f_{o,4} of the under-exposed and over-exposed images are concatenated to obtain the feature map F_0 of size w×h×128, computed as follows:
F_0 = concat(f_{u,4}, f_{o,4})    (6)
where f_{u,4} and f_{o,4} denote the image features obtained from the under-exposed and over-exposed images after the attention mechanism, and concat denotes the feature concatenation operation.
Step 2: taking F_0 as the input source, the high dynamic range image is obtained through the feature reconstruction module, whose network structure is shown in fig. 4. The feature reconstruction module first uses a single-layer convolutional neural network layer to convert the concatenated feature map F_0 into a 64-channel feature map F_1 of size w×h×64; F_1 is then fed to a DRDB unit, which outputs the feature map F_2, where the DRDB unit is obtained by improving the residual dense block (Residual Dense Block, RDB) with dilated convolution; finally, two convolution layers are applied in sequence to F_2 to obtain the feature map F_3 of size w×h×16 and the high dynamic range image, computed as follows:
F_1 = C_1(F_0)    (7)
F_2 = DRDB(F_1)    (8)
F_3 = C_2(F_2)    (9)
HDR = C_3(F_3)    (10)
where DRDB denotes the dilated residual dense unit convolution operation, C_1, C_2, C_3 denote single convolution layers, and HDR denotes the high dynamic range image.
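A hedged sketch of this reconstruction path is given below: a convolution compressing the 128-channel concatenation to 64 channels (eq. (7)), a dilated residual dense block with dense connections and dilation 2 (eq. (8)), and two further convolutions producing the 16-channel map and the HDR output (eqs. (9)-(10)). The growth rate, number of dense layers and dilation of the DRDB are assumed values, not figures from the patent.

```python
import torch
import torch.nn as nn

class DRDB(nn.Module):
    """Residual dense block built from dilated 3x3 convolutions (sketch)."""
    def __init__(self, channels=64, growth=32, layers=3, dilation=2):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = channels
        for _ in range(layers):
            self.blocks.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=dilation, dilation=dilation),
                nn.ReLU(inplace=True)))
            ch += growth                                 # dense connectivity grows the input width
        self.fuse = nn.Conv2d(ch, channels, 1)           # 1x1 local feature fusion

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))    # local residual learning

class Reconstructor(nn.Module):
    def __init__(self, in_ch=128, feat=64, out_ch=3):
        super().__init__()
        self.c1 = nn.Conv2d(in_ch, feat, 3, padding=1)                                      # eq. (7): F_1
        self.drdb = DRDB(feat)                                                              # eq. (8): F_2
        self.c2 = nn.Sequential(nn.Conv2d(feat, 16, 3, padding=1), nn.ReLU(inplace=True))   # eq. (9): F_3
        self.c3 = nn.Conv2d(16, out_ch, 3, padding=1)                                       # eq. (10): HDR

    def forward(self, f0):
        return self.c3(self.c2(self.drdb(self.c1(f0))))
```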
The loss function is as follows:
The loss function determines which types of image features are extracted and the proportional relationship between the different types. The fused image should contain the detail information of the bright regions of the under-exposed image and of the dark regions of the over-exposed image as well as the brightness information of the different exposures, while matching the visual perception characteristics of the human eye. The invention therefore designs a multi-term loss function consisting of an L_2-norm-based content loss and an SSIM-based structural loss, with the following steps:
Step 1: the structural similarity metric SSIM models the similarity loss and distortion between the source image sequence and the fused image in terms of luminance, contrast and structural information. With x the input image and y the output image, its mathematical expression is:
SSIM(x, y) = [(2 μ_x μ_y + C_1) / (μ_x² + μ_y² + C_1)] · [(2 σ_x σ_y + C_2) / (σ_x² + σ_y² + C_2)] · [(σ_xy + C_3) / (σ_x σ_y + C_3)]    (11)
where μ and σ denote the mean and standard deviation respectively, σ_xy denotes the covariance of x and y, and C_1, C_2, C_3 are constant coefficients. Taking full account of the distortion between the source image sequence and the fused image in luminance, contrast and structure, the structural loss L_SSIM is designed for the multi-exposure image fusion task. With F the fused image, the mathematical expression of L_SSIM is:
L_SSIM = α_o SSIM_{O,F} + α_u SSIM_{U,F}    (12)
where SSIM_{O,F} and SSIM_{U,F} denote the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, and α_o and α_u are the weight coefficients of the over-exposed image O and the under-exposed image U. In the multi-exposure image fusion task the over-exposed and under-exposed images share the same texture details but their brightness is too high or too low, so the weight coefficients α_o and α_u are given the same weight to balance brightness intensity and texture detail, which can be expressed as:
α_o = α_u    (13)
The content loss L_content ensures that the texture-detail distortion between the multi-exposure image sequence and the fused image is minimal while avoiding interference from noise; the content loss is computed as:
L_{x,y} = ||x - y||_2    (14)
which computes the Euclidean distance between the pixels of the input image x and the output image y, where ||·||_2 is the L_2 norm. The content loss can then be defined as:
L_content = β_O L_{O,F} + β_U L_{U,F}    (15)
where β_O and β_U are the weight coefficients of the over-exposed image O and the under-exposed image U; as with the structural loss, β_O and β_U share the same weight. To balance the structural loss and the content loss, the hyperparameter λ assigns a corresponding weight to the structural loss so as to improve the learning capacity of the model. In summary, the overall AMEFNet loss function can be expressed as:
Loss = λ L_SSIM + L_content    (16)
Step 2: the model is constrained by the loss function, and an Adam optimizer is selected with parameters β_1 = 0.9, β_2 = 0.999 and an initial learning rate of 10^-4, decayed by a factor of 0.5 every 50 iterations, so that the loss decreases and the model is updated.
Step 3: judge whether all image pairs in the training set have been processed and the set number of iterations (epoch, set to 1000) has been completed. If so, the algorithm ends and the attention-based high dynamic range multi-exposure image fusion model AMEFNet is obtained; otherwise, return to Step 2.
High dynamic range multi-exposure image generation
Step 1: the under-exposed image U and over-exposed image O to be processed are read, and the high dynamic range image HDR is obtained through the fully trained model AMEFNet, computed as follows:
HDR = AMEFNet(U, O)    (17)
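To tie the pieces together, the sketch below shows a training loop using the settings stated above (Adam with β_1 = 0.9, β_2 = 0.999, initial learning rate 10^-4 halved every 50 epochs, 1000 epochs) followed by the inference step of equation (17). It is a sketch under stated assumptions: the data loader yielding cropped under/over-exposed pairs, the AMEFNet model and the fusion_loss function are assumed to be defined as in the earlier sketches.

```python
import torch

def train(model, loader, device="cuda"):
    optim = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    sched = torch.optim.lr_scheduler.StepLR(optim, step_size=50, gamma=0.5)
    model.to(device).train()
    for epoch in range(1000):                              # epoch is set to 1000
        for under, over in loader:                         # cropped 256x256 sub-image pairs
            under, over = under.to(device), over.to(device)
            fused = model(under, over)
            loss = fusion_loss(fused, over, under)         # loss sketch defined earlier
            optim.zero_grad()
            loss.backward()
            optim.step()
        sched.step()                                       # halve the learning rate every 50 epochs
    return model

def generate_hdr(model, under_tensor, over_tensor):
    # eq. (17): HDR = AMEFNet(U, O) on a pre-processed under/over exposed pair
    model.eval()
    with torch.no_grad():
        return model(under_tensor, over_tensor)
```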
while the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (6)

1. The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module, wherein two different exposure images of a target scene are respectively input into two groups of feature extraction modules with the same structure, and two groups of high-dimensional feature images corresponding to the two different exposure images of the target scene are obtained; then, the two groups of high-dimensional feature images are used as input and respectively sent to corresponding attention mechanism modules, so that pure high-dimensional features required by reconstructing the fusion image are obtained; the feature reconstruction module performs fusion reconstruction on the high-dimensional features of the two groups of different exposure images output by the attention mechanism module to obtain a high dynamic range image.
2. A high dynamic range multi-exposure image fusion method based on an attention mechanism implemented by the model of claim 1, characterized by comprising the following steps:
Step 1: reading the training under-exposed image U and over-exposed image O; cropping the read U and O into several sub-images M of size w×h×c, where w and h denote the width and height of M and c denotes the number of channels of M; then applying data enhancement to the cropped sub-images M;
Step 2: taking the sub-images M as the input source, building a single-layer convolutional neural network layer through the feature extraction module, with convolution kernel size W×H; through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, computed as follows:
f_1 = C_1(M)    (1)
where C_1 denotes the corresponding convolution operation;
Step 3: taking f_1 as the input source, feeding it into the Unet network for multi-scale extraction of image features to obtain a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64, computed as follows:
f_2 = U(f_1)    (2)
where U denotes the convolution operations of the Unet network;
Step 4: constructing two attention mechanism modules A with identical structure; each module A applies a Squeeze operation to the feature map f_2 of the corresponding exposure image output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling, computed as follows:
f_c = F_sq(f_2) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} f_2(i, j), with f_c ∈ R^C    (3)
where F_sq(·) denotes the Squeeze operation, i, j index a pixel, R^C denotes the C-dimensional space, and f_c is the result of the Squeeze operation; an Excitation operation is then applied to the global feature: two fully connected layers are used to reduce model complexity and improve generalization, a ReLU activation performs the non-linear processing between them, and the weight vector is finally output through the normalization function Sigmoid; the Excitation operation lets the network learn the relationships among channels and yields the weights f_3 of the different channels, computed as follows:
f_3 = F_ex(f_c, W) = σ(g(f_c, W)) = σ(W_2 ReLU(W_1 f_c))    (4)
where W_1 ∈ R^{(C/r)×C}, W_2 ∈ R^{C×(C/r)}, and r is a scaling factor;
Step 5: multiplying the image feature f_2 by the per-channel weights f_3 learned by the attention mechanism to obtain the final image feature f_4, computed as follows:
f_4 = F_scale(f_2, f_3) = f_2 · f_3    (5)
where · denotes the multiplication operation;
Step 6: concatenating the high-dimensional image features f_{u,4} and f_{o,4} of the under-exposed and over-exposed images to obtain the feature map F_0 of size w×h×128, computed as follows:
F_0 = concat(f_{u,4}, f_{o,4})    (6)
where f_{u,4} and f_{o,4} denote the image features obtained from the under-exposed and over-exposed images after the attention mechanism, and concat denotes the feature concatenation operation;
Step 7: taking F_0 as the input source, obtaining the high dynamic range image through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the concatenated feature map F_0 into a 64-channel feature map F_1 of size w×h×64, then feeds F_1 to a DRDB unit, which outputs the feature map F_2, where the DRDB unit is a residual dense unit improved with dilated convolution; finally, two convolution layers are applied in sequence to F_2 to obtain the feature map F_3 of size w×h×16 and then the high dynamic range image, computed as follows:
F_1 = C_1(F_0)    (7)
F_2 = DRDB(F_1)    (8)
F_3 = C_2(F_2)    (9)
HDR = C_3(F_3)    (10)
where DRDB denotes the dilated residual dense unit convolution operation, C_1, C_2, C_3 denote single convolution layers, and HDR denotes the high dynamic range image;
Step 8: designing the loss function, iterating, and updating the model, where the loss function is:
Loss = λ L_SSIM + L_content    (16)
L_SSIM = α_o SSIM_{O,F} + α_u SSIM_{U,F}    (12)
L_content = β_O L_{O,F} + β_U L_{U,F}    (15)
where SSIM_{O,F} and SSIM_{U,F} denote the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, λ is a hyperparameter, α_o and α_u are the weight coefficients of the over-exposed image O and the under-exposed image U in the structural loss, β_O and β_U are the weight coefficients of the over-exposed image O and the under-exposed image U in the content loss, and L_{O,F}, L_{U,F} denote the content similarity of the over-exposed image O and the under-exposed image U with the fused image F;
Step 9: reading the under-exposed image U and over-exposed image O to be processed, and obtaining the high dynamic range image HDR through the fully trained model.
3. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: w=256, h=256, c=3 as described in step 1.
4. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: in the step 1, data enhancement is realized by rotating, horizontally overturning and vertically overturning.
5. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: W = 3 and H = 3 as described in step 2.
6. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: and 8, realizing model updating by adopting an Adam optimizer.
CN202211489946.XA 2021-11-26 2022-11-25 High dynamic range multi-exposure image fusion model and method based on attention mechanism Pending CN116152128A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111428200.3A CN114331931A (en) 2021-11-26 2021-11-26 High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN2021114282003 2021-11-26

Publications (1)

Publication Number Publication Date
CN116152128A true CN116152128A (en) 2023-05-23

Family

ID=81047436

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111428200.3A Withdrawn CN114331931A (en) 2021-11-26 2021-11-26 High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN202211489946.XA Pending CN116152128A (en) 2021-11-26 2022-11-25 High dynamic range multi-exposure image fusion model and method based on attention mechanism

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111428200.3A Withdrawn CN114331931A (en) 2021-11-26 2021-11-26 High dynamic range multi-exposure image fusion model and method based on attention mechanism

Country Status (1)

Country Link
CN (2) CN114331931A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342455B (en) * 2023-05-29 2023-08-08 湖南大学 Efficient multi-source image fusion method, system and medium

Also Published As

Publication number Publication date
CN114331931A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
Lv et al. Attention guided low-light image enhancement with a large scale low-light simulation dataset
CN111311629B (en) Image processing method, image processing device and equipment
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
Yue et al. Supervised raw video denoising with a benchmark dataset on dynamic scenes
CN112183637B (en) Single-light-source scene illumination re-rendering method and system based on neural network
CN110570377A (en) group normalization-based rapid image style migration method
CN112541877B (en) Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
CN112837245B (en) Dynamic scene deblurring method based on multi-mode fusion
Wang et al. Joint iterative color correction and dehazing for underwater image enhancement
CN111986084A (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN110225260B (en) Three-dimensional high dynamic range imaging method based on generation countermeasure network
Ma et al. Meta PID attention network for flexible and efficient real-world noisy image denoising
Prabhakar et al. Labeled from unlabeled: Exploiting unlabeled data for few-shot deep hdr deghosting
CN113344773B (en) Single picture reconstruction HDR method based on multi-level dual feedback
CN114862698B (en) Channel-guided real overexposure image correction method and device
WO2022100490A1 (en) Methods and systems for deblurring blurry images
CN112419191A (en) Image motion blur removing method based on convolution neural network
CN116152128A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism
Zhang et al. MFFE: multi-scale feature fusion enhanced net for image dehazing
CN113810597B (en) Rapid image and scene rendering method based on semi-predictive filtering
CN109002802A (en) Video foreground separation method and system based on adaptive robust principal component analysis
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
Yang et al. An end‐to‐end perceptual enhancement method for UHD portrait images
Peng et al. RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination