CN116152128A - High dynamic range multi-exposure image fusion model and method based on attention mechanism - Google Patents
- Publication number: CN116152128A
- Application number: CN202211489946.XA
- Authority: CN (China)
- Prior art keywords: image, dynamic range, attention mechanism, feature, high dynamic
- Legal status: Pending (assumed; Google has not performed a legal analysis)
Classifications
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/02 — Neural networks; G06N3/08 — Learning methods
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20208 — High dynamic range [HDR] image processing
- G06T2207/20221 — Image fusion; image merging
- Y02T10/40 — Engine management systems
Abstract
The invention relates to a high dynamic range multi-exposure image fusion method based on an attention mechanism, and belongs to the technical field of image processing. First, two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules, which highlight the image features favourable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the pure high-dimensional features needed to reconstruct the fused image. Finally, the feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image. The method improves the quality and robustness of high dynamic range multi-exposure image fusion.
Description
Technical Field
The invention relates to a multi-exposure image fusion model and a method for high dynamic range imaging, and belongs to the technical field of image processing.
Background
Natural scenes span a very wide dynamic range: the luminance of faint starlight is about 10^-4 cd/m^2, while bright sunlight reaches 10^5 to 10^9 cd/m^2. When shooting with a digital single-lens reflex camera, the limited dynamic range of the sensor causes over-exposed and under-exposed regions in the captured photos. High dynamic range multi-exposure image fusion aims to expand the dynamic range of an image and thus overcome the inability of a limited-dynamic-range digital camera to capture a high dynamic range image. In recent years, as computing power has increased, high dynamic range multi-exposure image fusion methods have increasingly moved from traditional transform-based methods to deep-learning-based methods. Traditional transform-based methods typically use an image transform (Laplacian pyramid, wavelet transform, sparse representation, etc.) to convert the input images into feature maps, then fuse the features according to a manually defined fusion strategy to obtain a high dynamic range image containing rich information. Deep-learning-based methods overcome the inability of traditional high dynamic range multi-exposure image fusion methods to learn image features adaptively, and generate high dynamic range images with richer detail. However, because the exposure times of the multiple exposures differ, images of the same scene exhibit complementary information and complex correspondences in brightness, chromaticity and structure. Existing deep-learning-based high dynamic range multi-exposure image fusion methods therefore still suffer from image distortion, loss of detail, and an inability to highlight and fuse the image features favourable to fusion.
Disclosure of Invention
Technical problem to be solved
Aiming at the problems of image distortion, loss of detail and insufficient use of the complementary information of the source image sequence in existing high dynamic range multi-exposure image fusion methods, the invention provides a high dynamic range multi-exposure image fusion model and method based on an attention mechanism, which further improves the quality and robustness of high dynamic range multi-exposure image fusion.
Technical Solution
The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module. Two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules to obtain the pure high-dimensional features needed to reconstruct the fused image. The feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image.
A high dynamic range multi-exposure image fusion method based on an attention mechanism is characterized by comprising the following steps:
step 1: reading a training underexposed image U and overexposed image O; cutting the read U and O into several sub-images M of size w×h×c, where w and h are the width and height of M and c is the number of channels of M; then performing data enhancement on the cropped sub-images M;
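The cropping and augmentation of step 1 can be sketched in NumPy. This is an illustrative sketch, not the patent's implementation; the helper names `crop_patches` and `augment` are hypothetical:

```python
import numpy as np

def crop_patches(img, w=256, h=256):
    """Crop an exposure image (H x W x 3) into non-overlapping w x h x 3 sub-images M."""
    H, W, _ = img.shape
    return [img[i:i + h, j:j + w]
            for i in range(0, H - h + 1, h)
            for j in range(0, W - w + 1, w)]

def augment(patch):
    """Data enhancement: 90-degree rotation, horizontal flip, vertical flip."""
    return [patch,
            np.rot90(patch),   # rotation
            patch[:, ::-1],    # horizontal flip
            patch[::-1, :]]    # vertical flip

# toy array standing in for the underexposed image U (values in [0, 1])
U = np.random.rand(512, 768, 3)
patches = crop_patches(U)                               # 2 x 3 = 6 sub-images
augmented = [a for p in patches for a in augment(p)]    # 4 samples per patch
```

Each w×h×c patch yields four training samples (the original plus three enhanced variants); the same transforms would be applied in lock-step to U and O so the exposure pair stays aligned.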
step 2: taking the sub-image M as the input source, constructing a single-layer convolutional neural network layer in the feature extraction module with convolution kernel size W×H; through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, calculated as:

f_1 = C_1(M)   (1)

wherein C_1 denotes the corresponding convolution operation;
step 3: taking f_1 as the input source, inputting it into the Unet network for multi-scale extraction of the image features, obtaining a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64, calculated as:

f_2 = U(f_1)   (2)

wherein U denotes the convolution operations of the Unet network;
step 4: constructing two attention mechanism modules A with identical structure; module A applies a Squeeze operation to the feature maps f_2 of the different exposure images output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling:

f_c = F_sq(u_c) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} u_c(i,j),  f_c ∈ R^C   (3)

wherein F_sq(·) denotes the Squeeze operation, u_c is the c-th channel of f_2, (i,j) indexes a pixel, R^C denotes the C-dimensional space, and f_c is the result of the Squeeze operation; an Excitation operation is then applied to the global feature, using two fully connected layers (to reduce model complexity and improve generalization) with a ReLU activation between them for nonlinearity, and finally outputting the weight vector through the Sigmoid normalization function; the Excitation operation lets the network learn the relationships among channels and obtains the per-channel weights f_3, calculated as:

f_3 = F_ex(f_c, W) = σ(g(f_c, W)) = σ(W_2 ReLU(W_1 f_c))   (4)

wherein W_1 ∈ R^{(C/r)×C}, W_2 ∈ R^{C×(C/r)}, σ denotes the Sigmoid function, and r is a scaling factor;
step 5: multiplying the image feature f_2 output by the Unet network by the per-channel weights f_3 learned by the attention mechanism to obtain the final image feature f_4, calculated as:

f_4 = F_scale(f_2, f_3) = f_2 · f_3   (5)

wherein · denotes the multiplication operation, scaling each channel of f_2 by its learned weight;
step 6: splicing the high-dimensional image features f_{u,4} and f_{o,4} of the underexposed and overexposed images to obtain the feature map F_0 of size w×h×128, calculated as:

F_0 = concat(f_{u,4}, f_{o,4})   (6)

wherein f_{u,4} and f_{o,4} denote the image features of the underexposed and overexposed images after the attention mechanism, and concat denotes the feature splicing operation;
step 7: taking F_0 as the input source, obtaining the high dynamic range image through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the spliced feature map F_0 into the 64-channel feature map F_1 of size w×h×64, then feeds F_1 to a DRDB unit, which outputs the feature map F_2, wherein the DRDB unit is a residual dense block improved with dilated convolution; finally, two convolution layers are applied in turn, convolving F_2 into the feature map F_3 of size w×h×16 and then producing the high dynamic range image, calculated as:

F_1 = C_1(F_0)   (7)
F_2 = DRDB(F_1)   (8)
F_3 = C_2(F_2)   (9)
HDR = C_3(F_3)   (10)

wherein DRDB denotes the dilated residual dense block convolution operation, C_1, C_2, C_3 denote single convolution layers, and HDR denotes the high dynamic range image;
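The tensor-shape bookkeeping of steps 6 and 7 can be traced with a toy NumPy sketch in which every convolution layer is replaced by a random 1×1 convolution and the DRDB by a crude residual projection. It only illustrates the shapes flowing through equations (6)-(10), not the actual network:

```python
import numpy as np

rng = np.random.default_rng(1)
w, h = 16, 16

def conv1x1(x, out_ch):
    """Stand-in for a conv layer: a 1x1 convolution (per-pixel linear map), random weights."""
    W = rng.standard_normal((x.shape[-1], out_ch)) * 0.1
    return x @ W

# step 6: splice the attention-weighted features of both exposures (eq. 6)
f_u4 = rng.random((w, h, 64))                  # underexposed branch, f_{u,4}
f_o4 = rng.random((w, h, 64))                  # overexposed branch,  f_{o,4}
F0 = np.concatenate([f_u4, f_o4], axis=-1)     # w x h x 128

# step 7: reconstruction chain (eqs. 7-10); DRDB approximated by a residual projection
F1 = conv1x1(F0, 64)        # eq. (7): back to 64 channels
F2 = F1 + conv1x1(F1, 64)   # eq. (8): crude residual stand-in for the DRDB
F3 = conv1x1(F2, 16)        # eq. (9): w x h x 16
HDR = conv1x1(F3, 3)        # eq. (10): 3-channel high dynamic range image
```

The channel counts (128 → 64 → 64 → 16 → 3) follow the sizes stated in the text; the 3-channel output is an assumption, since the patent only calls the final map the HDR image.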
step 8: designing a loss function, iterating, and updating the model, wherein the loss function is:

Loss = λ L_SSIM + L_content   (16)
L_SSIM = α_o SSIM_{O,F} + α_u SSIM_{U,F}   (12)
L_content = β_O L_{O,F} + β_U L_{U,F}   (15)

wherein SSIM_{O,F} and SSIM_{U,F} denote the structural similarity of the overexposed image O and the underexposed image U, respectively, with the fused image F; λ is a hyperparameter; α_o and α_u are the weight coefficients of O and U in the structural loss; β_O and β_U are the weight coefficients of O and U in the content loss; and L_{O,F} and L_{U,F} denote the content similarity of O and U, respectively, with the fused image F;
step 9: reading the underexposed image U and overexposed image O to be processed, and obtaining the high dynamic range image HDR through the fully trained model.
Preferably: w = 256, h = 256, c = 3 in step 1.
Preferably: in step 1, data enhancement is realized by rotation, horizontal flipping and vertical flipping.
Preferably: W = 3 and H = 3 in step 2.
Preferably: in step 8, the model update is realized with an Adam optimizer.
Advantageous effects
The invention provides a novel end-to-end high dynamic range multi-exposure image fusion model and method based on an attention mechanism, improving fusion quality and robustness.
1. A weight-separated dual-channel feature extraction module extracts features of the target scene from the underexposed and overexposed images, obtaining feature maps with stronger texture characterization capability;
2. An attention mechanism is introduced into the multi-exposure image task, attending to the local details and global features of the underexposed and overexposed images from local to global, and highlighting the image features favourable to fusion;
3. To reconstruct the fused image more accurately, a loss function is designed using the L2 norm and the structural similarity SSIM as constraint criteria for the neural network, yielding a smaller similarity difference between the source image sequence and the fused image and more accurate convergence of the network model.
These operations enable the network to capture more detail and generate a higher-quality high dynamic range multi-exposure fused image.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1: high dynamic range multi-exposure image fusion method flow chart based on attention mechanism;
fig. 2: extracting a network structure diagram of the characteristics;
fig. 3: attention mechanism network structure diagram;
fig. 4: reconstructing a network structure diagram by the characteristics;
fig. 5: the flow chart of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
The invention designs a high dynamic range multi-exposure image fusion network framework based on an attention mechanism, consisting of three core modules: a feature extraction module, an attention mechanism module and a feature reconstruction module. First, two differently exposed images of a target scene are input into two structurally identical feature extraction modules, yielding two groups of high-dimensional feature maps corresponding to the two exposures. The two groups of feature maps are then fed into corresponding attention mechanism modules, which highlight the image features favourable to fusion and suppress the features of low-quality regions such as under-saturated and over-saturated areas, producing the pure high-dimensional features needed to reconstruct the fused image. Finally, the feature reconstruction module fuses and reconstructs the high-dimensional features of the two exposures output by the attention mechanism modules into a high dynamic range image.
A high dynamic range multi-exposure image fusion method based on an attention mechanism comprises the following 5 aspects:
(1) Because the exposure times of a multi-exposure sequence differ, the brightness, contrast, texture and contour information of the different exposures of the same scene also differ. If the multi-exposure images were fused directly, with features extracted by a single shared network, the shared weights would destroy the inherent characteristics of the target scene under each exposure. The feature extraction module therefore adopts a dual-channel architecture: two differently exposed images from the multi-exposure sequence are selected as input and sent to the feature extraction module, where each passes through a feature extraction network of identical structure that shares no learned parameters, so that features are extracted simultaneously but independently.
(2) The Unet network is used as the basic feature extraction network structure. The feature extraction module consists of one independent convolution layer and one Unet network, where the Unet network comprises convolution, downsampling, pooling, upsampling and splicing operations. A 3×3 convolution kernel first extracts low-level image features and converts the 256×256 input image into 64-channel high-dimensional features; the Unet network then performs multi-scale feature extraction, stacking shallow image features and deep semantic features by feature splicing, which provides an effective way to preserve image structure and texture. After the Unet network completes this fine feature extraction, it outputs a 64-channel high-dimensional multi-scale feature map, which serves as the input to the subsequent attention mechanism module.
(3) The attention mechanism is utilized to keep rich detail information of the multi-exposure image, and image characteristics favorable for fusion are highlighted so as to correct local distortion and information loss of the fusion image. Similar to the feature extraction module, the attention mechanism module adopts a dual channel design with the same structure.
(4) The high dynamic range image is reconstructed by the feature reconstruction module. The dilated residual dense block used in the module is a residual dense block improved with dilated convolution; it fully exploits image features from different network levels and uses a larger receptive field to infer the details lost in under-saturated and saturated regions while retaining the details of the low dynamic range images.
(5) A multi-loss function is designed, combining an L2-norm-based content loss with an SSIM-based structural loss, further constraining the network model and improving its generalization capability.
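The enlarged receptive field that motivates the dilated residual dense block in aspect (4) can be seen with a naive single-channel dilated convolution. This is an illustrative sketch only; the patent's DRDB applies dilated 3×3 convolutions inside a residual dense block rather than this bare loop:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Naive 'same'-padded 2-D convolution with a dilated, odd-sized square kernel."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)                  # zero padding keeps the output the same size
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    # dilation spreads the kernel taps apart without adding parameters
                    out[i, j] += kernel[a, b] * xp[i + a * dilation, j + b * dilation]
    return out

# a single bright pixel reveals the receptive field of the kernel
img = np.zeros((9, 9)); img[4, 4] = 1.0
ones = np.ones((3, 3))
plain = dilated_conv2d(img, ones, dilation=1)    # 9 taps in a 3x3 neighbourhood
dilated = dilated_conv2d(img, ones, dilation=2)  # same 9 taps spread over a 5x5 area
```

With dilation 2 the same nine weights cover a 5×5 area, which is why dilated convolutions help infer content for larger saturated regions at no extra parameter cost.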
The characteristic extraction module is specifically as follows:
The multi-exposure images are passed into the feature extraction module to obtain high-dimensional multi-scale feature maps; the basic framework is shown in fig. 2, and the steps are as follows:
Step 1: read the training underexposed image U and overexposed image O. Cut the read U and O into several sub-images M of size w×h×c, where w and h are the width and height of M, c is the number of channels, and w = 256, h = 256, c = 3; then perform data enhancement on the cropped sub-images M by rotation, horizontal flipping and vertical flipping.

Step 2: take the sub-image M as the input source. The feature extraction module constructs a single-layer convolutional neural network layer with convolution kernel size W×H, where W = 3 and H = 3. Through this convolutional layer, U and O are each converted into 64-dimensional features f_1 of size w×h×64, calculated as:

f_1 = C_1(M)   (1)

wherein C_1 denotes the corresponding convolution operation.

Step 3: take f_1 as the input source; multi-scale extraction of the image features is realized through the Unet network, whose structure is shown in fig. 2. Shallow image features and deep semantic features are stacked by feature splicing, and the Unet network comprises upsampling, pooling, convolution and activation function operations. After the Unet network completes the fine feature extraction, a 64-channel high-dimensional multi-scale feature f_2 of size w×h×64 is obtained. Analogously to the image feature f_1, the high-dimensional multi-scale feature f_2 is calculated as:

f_2 = U(f_1)   (2)

wherein U denotes the convolution operations of the Unet network. The image features f_1 of the underexposed image U and the overexposed image O are trained simultaneously through feature extraction networks of identical structure that share no learned parameters.
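The "identical structure, no shared parameters" design can be illustrated with a toy NumPy convolution: both branches run the same code path but draw their weights independently, so the same input produces different feature maps. All names and random weights here are illustrative, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv_layer_params(in_ch=3, out_ch=64, k=3):
    """One single-layer CNN's parameters; each branch draws its own, so nothing is shared."""
    return rng.standard_normal((k, k, in_ch, out_ch)) * 0.1

def extract(img, W):
    """Minimal valid 3x3 convolution turning an H x W x 3 image into (H-2) x (W-2) x 64 features."""
    k = W.shape[0]
    H, Wd, _ = img.shape
    patches = np.stack([img[i:i + k, j:j + k].ravel()
                        for i in range(H - k + 1)
                        for j in range(Wd - k + 1)])
    out = patches @ W.reshape(-1, W.shape[-1])
    return out.reshape(H - k + 1, Wd - k + 1, -1)

U = rng.random((10, 10, 3))                                   # toy input image
W_under, W_over = conv_layer_params(), conv_layer_params()    # same structure, separate weights
fU, fO = extract(U, W_under), extract(U, W_over)              # different features, same input
```

In the real model each branch would receive its own exposure; the point here is only that the two branches, being unshared, respond differently even to identical input.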
The attention mechanism module is specifically as follows:
The high-dimensional image features output by the feature extraction module are passed into the attention mechanism module, which highlights and fuses the favourable channel features of interest and suppresses the uninteresting channel features; the network structure is shown in fig. 3, and the steps are as follows:
Step 1: construct two attention mechanism modules A with identical structure, whose network structure is shown in fig. 3; the attention modules A retain the rich detail of the multi-exposure images and highlight the image features favourable to fusion, so as to correct local distortion and information loss in the fused image.
Step 2: the attention mechanism module A outputs characteristic graphs f of different exposure images to the Unet network respectively 2 And performing Squeeze operation, and coding the whole spatial feature on the channel into a global feature by adopting a global average pooling mode, wherein the calculation formula is as follows:
f in the formula sq (. Cndot.) represents a Squeeze operation, i, j represents a pixel, R C Represents the C dimension, f c Is the result of the Squeeze operation. Then adopting an accounting operation on global features, adopting two full-connection operations for reducing model complexity and improving generalization capability, adopting a ReLU activation function to perform nonlinear processing between full-connection, and finally outputting weight vectors through a normalization function Sigmoid, wherein the accounting operation enables a network to learn the relation among all channels and also obtains the weights f of different channels 3 The calculation method is as follows:
f 3 =F ex (f c ,W)=σ(g(f c ,W))=σ(W 2 ReLU(W 1 f c )) (4)
wherein the method comprises the steps ofRepresents W 1 Dimension is-> Represents W 2 Dimension is->r is the scaling factor.
Step 3: image feature f output by utilizing multiplication operation 2 Weight f of each channel learned by attention mechanism 3 Multiplying to obtain final image feature f 4 The calculation method is as follows:
f 4 =F scale (f 2 ,f 3 )=f 2 ·f 3 (5)
in the formula, the expression matrix multiplication operation. The whole operation can be seen as learning the weight coefficient of each channel, so that the model has more distinguishing ability on the characteristics of each channel.
The characteristic reconstruction module is specifically as follows:
The feature reconstruction module reconstructs the high-dimensional pure features of the different exposure images obtained in the previous step to generate a high dynamic range image; the network structure is shown in fig. 4, and the steps are as follows:
Step 1: splice the high-dimensional image features f_{u,4} and f_{o,4} of the underexposed and overexposed images to obtain the feature map F_0 of size w×h×128, calculated as:

F_0 = concat(f_{u,4}, f_{o,4})   (6)

wherein f_{u,4} and f_{o,4} denote the image features of the underexposed and overexposed images after the attention mechanism, and concat denotes the feature splicing operation.
Step 2: by F 0 As an input source, a high dynamic range image is obtained through a characteristic reconstruction module, the network structure is shown as figure 4, and the characteristic reconstruction module firstly uses a single-layer convolutional neural network layer to splice a characteristic diagram F 0 Map F converted to 64 channels 1 ,F 1 The size is w multiplied by h multiplied by 64, and then the characteristic diagram F is obtained 1 Output feature map F provided to DRDB unit 2 Wherein the DRDB unit is obtained by improving residual dense units (Residual Dense Block, RDB) based on dilation convolution, and finally sequentially convolving the feature map F with 2 convolution layers 2 Obtaining a characteristic diagram F 3 And a high dynamic range image, wherein F 3 The size of (2) is w×h×16, and the calculation method is as follows:
F 1 =C 1 (F 0 )(7)
F 2 =DRDB(F 1 )(8)
F 3 =C 2 (F 2 )(9)
HDR=C 3 (F 3 )(10)
wherein DRDB represents a dilated residual dense unit convolution operation, C 1 ,C 2 ,C 3 Representing a single convolution layer, HDR represents a high dynamic range image.
The loss function is specifically as follows:
The loss function determines which types of image features are extracted and the proportional relationship between them. The fused image must contain the detail of the bright regions of the underexposed image and the dark regions of the overexposed image as well as the brightness information of the different exposures, while matching the visual perception characteristics of the human eye. The invention therefore designs a multi-loss function combining an L2-norm-based content loss with an SSIM-based structural loss, constructed in the following steps:
step 1: the structural similarity metric SSIM models the loss and distortion of similarity of the source image sequence and the fused image based on the similarity of the luminance features, contrast, and structural information. Let x be the input image and y be the output image, the mathematical expression is:
wherein μ and σ represent the mean and standard deviation, respectively, σ xy Representing the covariance of x, y, C 1 ,C 2 And C 3 Is a constant coefficient. The distortion of the source image sequence and the fusion image in three aspects of brightness, contrast and structure is fully considered, and the structural loss L is designed for the multi-exposure image fusion task SSIM . F represents the fused image, L SSIM The mathematical expression of (2) is:
L SSIM =α o SSIM O,F +α u SSIM U,F (12)
where SSIM O,F and SSIM U,F respectively represent the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, and α o and α u are the weight coefficients of the over-exposed image O and the under-exposed image U. In the multi-exposure image fusion task, the over-exposed and under-exposed images share the same texture details, but their luminance intensity is respectively too high or too low. The weight coefficients α o and α u are therefore set to the same weight to balance luminance intensity and texture detail, which can be expressed as:
α o =α u (13)
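A minimal NumPy sketch of the global (single-window) SSIM of Eq. (11) and the equally weighted structural loss of Eqs. (12)-(13); the constant values c1, c2, c3 and the use of one global window instead of sliding local windows are simplifying assumptions for illustration:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4, c3=4.5e-4):
    """Global (single-window) SSIM per Eq. (11):
    luminance * contrast * structure."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()  # covariance of x and y
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (sxy + c3) / (sx * sy + c3)
    return luminance * contrast * structure

def structural_loss(O, U, F, alpha=0.5):
    # Eqs. (12)-(13): equal weights alpha_o = alpha_u for both exposures
    return alpha * ssim(O, F) + alpha * ssim(U, F)
```

Note that SSIM of an image with itself is 1, so with equal weights of 0.5 the structural term of two identical sources against their own fusion is also 1.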
content loss L content The texture detail information distortion of the multi-exposure image sequence and the fusion image is guaranteed to be minimum, meanwhile, the interference of noise is avoided, and the content loss is calculated as follows:
L x,y =||x-y|| 2 (14)
calculating Euclidean distance between pixel points of an input image x and an output image y in the formula, wherein I I.I.I.I 2 Is L 2 Norms. Content loss may be defined as:
L content =β O L O,F +β U L U,F (15)
where β O and β U are the weight coefficients of the over-exposed image O and the under-exposed image U respectively; similarly to the structural loss, β O and β U are given the same weight. To balance the structural loss and the content loss, the hyper-parameter λ assigns a corresponding weight to the structural loss so as to improve the learning capacity of the model. In summary, the overall AMEFNet loss function can be expressed as:
Loss=λL SSIM +L content (16)
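The content loss of Eqs. (14)-(15) and the combined loss of Eq. (16) can be sketched as follows; the equal β weights follow the text, while the λ value and the externally supplied structural-loss value are placeholders for illustration:

```python
import numpy as np

def content_loss(O, U, F, beta=0.5):
    """Eqs. (14)-(15): L2 (Euclidean) distance to each source,
    with equal weights beta_O = beta_U."""
    return beta * np.linalg.norm(O - F) + beta * np.linalg.norm(U - F)

def total_loss(O, U, F, lam=0.5, ssim_loss=0.0):
    """Eq. (16): Loss = lambda * L_SSIM + L_content.

    lam and ssim_loss are illustrative placeholders; lambda is a tuned
    hyper-parameter in the text and L_SSIM comes from Eq. (12).
    """
    return lam * ssim_loss + content_loss(O, U, F)
```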
step 2: constraint by loss function and selecting Adam optimizer to obtain parameter beta 1 =0.9,β 2 =0.999, initial learning rate of 10 -4 The learning rate is attenuated by 0.5 times for every 50 iterations, so that the Loss weight Loss is reduced, and model updating is realized.
Step 3: Judge whether all image pairs in the training set have been processed and the set number of iterations epoch has been completed, where epoch is set to 1000. If so, the algorithm ends and the attention-based high dynamic range multi-exposure image fusion model AMEFNet is obtained; otherwise, step 2 is executed.
High dynamic range multi-exposure image generation
Step 1: Read the under-exposed image U and over-exposed image O to be processed, and obtain the high dynamic range image HDR through the fully trained model AMEFNet; the calculation method is as follows:
HDR = AMEFNet (U,O) (17)
while the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.
Claims (6)
1. The high dynamic range multi-exposure image fusion model based on the attention mechanism is characterized by comprising a feature extraction module, an attention mechanism module and a feature reconstruction module, wherein two different exposure images of a target scene are respectively input into two groups of feature extraction modules with the same structure, and two groups of high-dimensional feature images corresponding to the two different exposure images of the target scene are obtained; then, the two groups of high-dimensional feature images are used as input and respectively sent to corresponding attention mechanism modules, so that pure high-dimensional features required by reconstructing the fusion image are obtained; the feature reconstruction module performs fusion reconstruction on the high-dimensional features of the two groups of different exposure images output by the attention mechanism module to obtain a high dynamic range image.
2. A high dynamic range multi-exposure image fusion method based on an attention mechanism realized by the model of claim 1, which is characterized by comprising the following steps:
step 1: reading a training under-exposed image U and an over-exposed image O; cutting the read U and O into a plurality of sub-images M, wherein the size of M is w×h×c, w and h represent the width and height of M, and c represents the number of channels of M; then performing data enhancement on the cut sub-images M;
step 2: taking the sub-image M as an input source, constructing a single-layer convolutional neural network layer with convolution kernel size W×H through the feature extraction module, and converting U and O each into a 64-dimensional feature f 1 through this convolutional layer, the size of f 1 being w×h×64; the calculation method is as follows:
f 1 = C 1 (M) (1)
where C 1 represents the corresponding convolution operation;
step 3: taking f 1 as the input source, it is input to the Unet network for multi-scale feature extraction of the image features, obtaining a 64-channel high-dimensional multi-scale feature f 2 , the size of f 2 being w×h×64; the calculation method is as follows:
f 2 = U(f 1 ) (2)
where U represents the convolution operation of the Unet network;
step 4: constructing two attention mechanism modules A with the same structure, and using them to perform the Squeeze operation on the feature maps f 2 of the different exposure images output by the Unet network, encoding the entire spatial feature of each channel into a global feature by global average pooling; the calculation formula is as follows:
f c = F sq (f 2 ) = (1/(w×h)) Σ i=1..w Σ j=1..h f 2 (i,j) (3)
where F sq (·) represents the Squeeze operation, i, j index the pixels, R C represents the C-dimensional space, and f c ∈ R C is the result of the Squeeze operation; the Excitation operation is then applied to the global feature, adopting two fully-connected operations to reduce model complexity and improve generalization capability, with a ReLU activation function performing nonlinear processing between the fully-connected layers and the weight vector finally being output through the normalization function Sigmoid; the Excitation operation enables the network to learn the relationship among the channels and obtain the weights f 3 of the different channels, and the calculation method is as follows:
f 3 =F ex (f c ,W)=σ(g(f c ,W))=σ(W 2 ReLU(W 1 f c )) (4)
where W 1 ∈ R (C/r)×C and W 2 ∈ R C×(C/r) represent the weights of the two fully-connected layers, and r is a scaling factor;
step 5: multiplying the image feature f 2 output by the Unet network by the per-channel weights f 3 learned by the attention mechanism to obtain the final image feature f 4 ; the calculation method is as follows:
f 4 =F scale (f 2 ,f 3 )=f 2 ·f 3 (5)
where · represents a matrix multiplication operation;
step 6: splicing the high-dimensional image features f u,4 and f o,4 of the under-exposed and over-exposed images to obtain the feature map F 0 , the size of F 0 being w×h×128; the calculation is as follows:
F 0 = concat(f u,4 , f o,4 ) (6)
where f u,4 and f o,4 respectively represent the image features obtained after the under-exposed image and the over-exposed image pass through the attention mechanism, and concat represents the feature splicing operation;
step 7: taking F 0 as the input source, a high dynamic range image is obtained through the feature reconstruction module; the feature reconstruction module first uses a single-layer convolutional neural network layer to convert the spliced feature map F 0 into a 64-channel feature map F 1 of size w×h×64, then feeds the feature map F 1 to the DRDB unit, which outputs the feature map F 2 , wherein the DRDB unit is obtained by improving the residual dense unit with dilated convolution; finally, two convolution layers are applied in sequence to the feature map F 2 to obtain a feature map F 3 of size w×h×16 and the high dynamic range image; the calculation method is as follows:
F 1 = C 1 (F 0 ) (7)
F 2 = DRDB(F 1 ) (8)
F 3 = C 2 (F 2 ) (9)
HDR = C 3 (F 3 ) (10)
wherein DRDB represents a dilated residual dense unit convolution operation, C 1 ,C 2 ,C 3 Representing a single convolution layer, HDR representing a high dynamic range image;
step 8: designing a loss function, iterating, and updating a model, wherein the loss function is as follows:
Loss=λL SSIM +L content (16)
L SSIM =α o SSIM O,F +α u SSIM U,F (12)
L content =β O L O,F +β U L U,F (15)
where SSIM O,F and SSIM U,F respectively represent the structural similarity of the over-exposed image O and the under-exposed image U with the fused image F, λ represents the hyper-parameter, α o and α u are respectively the weight coefficients of the over-exposed image O and the under-exposed image U in the structural loss, β O and β U are respectively the weight coefficients of the over-exposed image O and the under-exposed image U in the content loss, and L O,F and L U,F respectively represent the content similarity of the over-exposed image O and the under-exposed image U with the fused image F;
step 9: reading the under-exposed image U and over-exposed image O to be processed, and obtaining the high dynamic range image HDR through the fully trained model.
3. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: w=256, h=256, c=3 as described in step 1.
4. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: in the step 1, data enhancement is realized by rotating, horizontally overturning and vertically overturning.
5. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: w=3 and h=3 as described in step 2.
6. The high dynamic range multi-exposure image fusion method based on the attention mechanism of claim 2, wherein: and 8, realizing model updating by adopting an Adam optimizer.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111428200.3A CN114331931A (en) | 2021-11-26 | 2021-11-26 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN2021114282003 | 2021-11-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116152128A true CN116152128A (en) | 2023-05-23 |
Family
ID=81047436
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111428200.3A Withdrawn CN114331931A (en) | 2021-11-26 | 2021-11-26 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN202211489946.XA Pending CN116152128A (en) | 2021-11-26 | 2022-11-25 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN114331931A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342455B (en) * | 2023-05-29 | 2023-08-08 | 湖南大学 | Efficient multi-source image fusion method, system and medium |
Also Published As
Publication number | Publication date |
---|---|
CN114331931A (en) | 2022-04-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||