CN115689962A - Multi-exposure image fusion method based on multi-scale self-encoder

Multi-exposure image fusion method based on multi-scale self-encoder

Info

Publication number
CN115689962A
Authority
CN
China
Prior art keywords
image
convolution
scale
output
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211424921.1A
Other languages
Chinese (zh)
Inventor
刘羽
杨智刚
成娟
李畅
宋仁成
陈勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202211424921.1A priority Critical patent/CN115689962A/en
Publication of CN115689962A publication Critical patent/CN115689962A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a multi-exposure image fusion method based on a multi-scale self-encoder, which comprises the following steps: 1, preparing and preprocessing the data, and constructing a multi-scale self-encoder network that mainly comprises a multi-scale encoder and a decoder, wherein the encoder consists of convolution layers with activation functions and vision transformers and is mainly used for multi-scale feature extraction, while the decoder consists of densely cross-connected, multi-scale-fused convolution layers with activation functions and is mainly used for image reconstruction; and 2, fusing the input multi-exposure image pair, including network training and multi-exposure image fusion. The invention can fully utilize the complementary and redundant information in the overexposed and underexposed images to fuse images of higher quality, provides better-quality images for human observation, and at the same time supports computer vision tasks such as segmentation and classification of the images, thereby assisting human visual recognition, computer analysis and processing, and related research.

Description

Multi-exposure image fusion method based on multi-scale self-encoder
Technical Field
The invention relates to the technical field of multi-exposure image fusion, in particular to a multi-exposure image fusion method based on a multi-scale self-encoder.
Background
The brightness in natural scenes can vary greatly, and the dynamic range of a single image is much lower than that of a natural scene due to the limitations of the imaging device. The photographed scene may be affected by light, weather, sun altitude and other factors, so overexposure and underexposure often occur. A single image therefore cannot completely reflect the light and dark levels of the scene, some information is lost, and the imaging result is unsatisfactory. Bridging the mismatch between the dynamic range of real natural scenes and the dynamic response of existing imaging devices, display monitors, and the human eye remains challenging. Multi-exposure image fusion (MEF) technology provides a simple, economical and efficient way to overcome the contradiction between high dynamic range (HDR) imaging and low dynamic range (LDR) display. It avoids the complexity of imaging hardware circuit design, reduces the weight and power consumption of the whole device, improves image quality, and has important applicability in the field of digital photography. MEF is the process of fusing multiple images with different exposures to produce a single visually pleasing and high-quality fused image. MEF is similar to other image fusion tasks, such as medical image fusion and remote sensing image fusion, in that they all combine important information from multiple source images to produce a high-quality fused image. The main difference between these image fusion tasks lies in the source images, which contain the different information to be fused; for MEF, the source images are images with different exposures. MEF has attracted considerable attention because of its effectiveness in producing high-quality images.
Multi-exposure image fusion is an important branch of image fusion; its main task is to fuse several images of the same scene with different exposure levels into one image with a high dynamic range and high quality. Existing methods have the following problems. First, methods based on traditional techniques rely on manually designed feature extractors and fusion rules, so their robustness is poor, they perform poorly in different scenes, and they can produce uneven brightness and artifacts. Second, deep-learning-based methods rely on training with multi-exposure datasets, which are much smaller than natural image datasets, so networks with more layers and more parameters cannot be trained adequately. In addition, most previous methods are based purely on convolution and neglect global information, and they also lack multi-scale feature fusion and interaction.
Through multi-exposure image fusion, all of the important feature information can be obtained from a single image, which facilitates human visual perception and subsequent image processing such as object detection, object segmentation and edge extraction. Realizing multi-exposure image fusion technology is therefore of great significance.
Disclosure of Invention
To overcome the problems of existing image fusion methods in multi-exposure image fusion, the invention provides a multi-exposure image fusion method based on a multi-scale self-encoder, so as to provide better image feature representation by fully utilizing the complementary and redundant information of images with different exposure levels and to reconstruct an image of higher quality, thereby providing a better-quality image for human observation and at the same time supporting computer vision tasks such as segmentation and classification of the image.
The invention adopts the following technical scheme for solving the problems:
the invention discloses a multi-exposure image fusion method based on a multi-scale self-encoder, which is characterized by comprising the following steps:
Step 1: obtaining P RGB natural images and converting them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, as the training set, where I_p denotes the p-th grayscale image;
Step 2: constructing a multi-scale self-encoder network comprising a multi-scale encoder and a decoder;
Step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, where A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2;
Step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, where I_Ty denotes the y-th primary deep feature map;
Step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, where F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1);
Step 2.1.3: w-1 th comprehensive characteristic diagram
Figure BDA00039417437000000212
Through the Xth convolution block N X After the treatment, obtaining a plurality of X-1 multi-scale characteristic graphs
Figure BDA00039417437000000213
Step 2.1.4: for the W-1 comprehensive characteristic diagram
Figure BDA0003941743700000031
Obtaining the W-2 th up-sampling characteristic after up-sampling
Figure BDA0003941743700000032
The up-sampling feature
Figure BDA0003941743700000033
And then with the W-2 comprehensive characteristic diagram
Figure BDA0003941743700000034
Performing element-by-element addition to obtain W-2 intermediate characteristic diagram
Figure BDA0003941743700000035
The W-2 intermediate feature map
Figure BDA0003941743700000036
Through the X-1 th convolution block N X-1 Then obtaining the X-2 characteristic
Figure BDA0003941743700000037
Step 2.1.5: to pair
Figure BDA0003941743700000038
Obtaining W-3 th up-sampling characteristic after up-sampling
Figure BDA0003941743700000039
Figure BDA00039417437000000310
And then combined with the W-3 th feature
Figure BDA00039417437000000311
Performing element-by-element addition to obtain W-3 intermediate characteristic diagram
Figure BDA00039417437000000312
The W-3 intermediate feature map
Figure BDA00039417437000000313
Through the X-2 th convolution block N X-2 Then obtaining the X-3 characteristic
Figure BDA00039417437000000314
Step 2.1.6: the processes according to step 2.1.5 are sequentially performed
Figure BDA00039417437000000315
After processing, obtaining X-1 multi-scale characteristic graphs
Figure BDA00039417437000000316
Figure BDA00039417437000000317
Representing an x-1 th multi-scale feature map;
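The top-down extraction in steps 2.1.3 to 2.1.6 can be summarised as a single loop: the deepest comprehensive feature map is refined by N_X, and every shallower level is obtained by upsampling the level below it, adding it element-wise to the corresponding comprehensive feature map, and applying the matching convolution block. The following minimal PyTorch sketch illustrates this; the module names, the 3×3 kernel size and the nearest-neighbour upsampling are assumptions made for illustration, not details taken from the publication.

```python
# Minimal sketch of steps 2.1.3-2.1.6 (illustrative only; kernel size and
# nearest-neighbour upsampling are assumptions).
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """One N_x block: a single convolution followed by ReLU."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                                  nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

def top_down_multiscale(comprehensive_maps, n_blocks):
    """comprehensive_maps: [F_1, ..., F_(W-1)] from shallow to deep, all with C channels.
    n_blocks: [N_2, ..., N_X]. Returns the multi-scale maps [Phi_1, ..., Phi_(X-1)]."""
    phis = [None] * len(comprehensive_maps)
    phis[-1] = n_blocks[-1](comprehensive_maps[-1])              # Phi_(X-1) = N_X(F_(W-1))
    running = comprehensive_maps[-1]
    for k in range(len(comprehensive_maps) - 2, -1, -1):
        up = F.interpolate(running, scale_factor=2, mode="nearest")  # upsampling feature
        running = up + comprehensive_maps[k]                     # element-wise addition -> intermediate map
        phis[k] = n_blocks[k](running)                           # multi-scale map of this level
    return phis
```

In the embodiment described later (W = X = 5) this loop produces exactly the four multi-scale maps Φ_1 to Φ_4.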
Step 2.2: the decoder comprises P convolution blocks and an output block Conv, where the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function;
Step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row;
Step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j);
Step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j); in this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain the output result O_p; a code sketch of this decoding scheme is given below;
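The upper-triangular dense connection of steps 2.2.2 and 2.2.3 can likewise be written as two nested loops. The sketch below is an assumption-level illustration: decoders is taken to be a dictionary of convolution blocks keyed by (column, row), channel bookkeeping is left to those blocks, and nearest-neighbour upsampling is assumed.

```python
# Sketch of the upper-triangular densely connected decoder (illustrative only).
import torch
import torch.nn.functional as F

def decode(phis, decoders, out_block):
    """phis: [Phi_1, ..., Phi_(X-1)], Phi_1 being the largest scale; J = len(phis) - 1."""
    J = len(phis) - 1
    feat = {}
    for j in range(J, 0, -1):                                # first column (step 2.2.2)
        up = F.interpolate(phis[j], scale_factor=2, mode="nearest")
        feat[(1, j)] = decoders[(1, j)](torch.cat([phis[j - 1], up], dim=1))
    for i in range(2, J + 1):                                # remaining columns (step 2.2.3)
        for j in range(1, J - i + 2):                        # blocks exist while i + j <= J + 1
            up = F.interpolate(feat[(i - 1, j + 1)], scale_factor=2, mode="nearest")
            dense = [feat[(c, j)] for c in range(1, i)]      # I_(1,j) ... I_(i-1,j)
            feat[(i, j)] = decoders[(i, j)](torch.cat([phis[j - 1], up] + dense, dim=1))
    return out_block(feat[(J, 1)])                           # O_p = Conv(I_(I,1))
```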
And step 3: the overall loss function L of the multi-scale self-encoder network is constructed using equation (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity; a code sketch of this training objective is given below;
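A sketch of the training objective in formulas (1) and (2) follows. Because formula (3) is reproduced only as an image, the pixel loss is assumed here to be a mean-squared-error term, and ssim() is assumed to come from an external differentiable SSIM implementation such as the pytorch-msssim package; both are assumptions rather than details confirmed by the publication.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed third-party differentiable SSIM

def total_loss(I_p, O_p, lam=0.1):                  # lam: weight of the pixel loss (value assumed)
    l_ssim = 1.0 - ssim(O_p, I_p, data_range=1.0)   # formula (2): L_ssim = 1 - SSIM(I_p, O_p)
    l_pixel = F.mse_loss(O_p, I_p)                  # formula (3), assumed MSE-type pixel loss
    return l_ssim + lam * l_pixel                   # formula (1): L = L_ssim + lambda * L_pixel
```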
Step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network;
Step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, where (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel;
Step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, where Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature;
the s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, where Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel;
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, where Result_b denotes the b-th color image result; a code sketch of this fusion procedure is given below.
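The fusion rule of step 6 reduces to averaging the encoder features of the two exposures scale by scale and decoding the result. The sketch below assumes that the trained encoder returns one feature map per scale and that the inputs are Y-channel tensors; it illustrates the averaging rule and is not the authors' code.

```python
import torch

@torch.no_grad()
def fuse_pair(I_over, I_under, encoder, decoder):
    """I_over, I_under: Y-channel tensors of shape (B, 1, H, W)."""
    feats_over = encoder(I_over)       # {Io_f1, ..., Io_fS}
    feats_under = encoder(I_under)     # {Iu_f1, ..., Iu_fS}
    fused = [(fo + fu) / 2.0 for fo, fu in zip(feats_over, feats_under)]   # f_s = (Io_fs + Iu_fs) / 2
    return decoder(fused)              # Output_b, the fused Y channel
```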
The electronic device of the invention comprises a memory and a processor, and is characterized in that the memory is used for storing a program that supports the processor in executing the multi-exposure image fusion method, and the processor is configured to execute the program stored in the memory.
The computer-readable storage medium of the invention stores a computer program, and is characterized in that the computer program, when executed by a processor, performs the steps of the multi-exposure image fusion method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a unified network framework that accomplishes the fusion of the overexposed and underexposed images in a single pass, makes full use of the redundant and complementary information between images of different modalities, and fuses them into a high-quality image. Compared with existing methods that must train the learning network on multi-exposure images, the method achieves high-quality fusion of multi-exposure images while being trained only on an ordinary natural image dataset (such as the 2014 MS-COCO dataset), thereby avoiding the dependence on multi-exposure datasets, which are too small to train networks with more layers and more parameters adequately.
2. The invention designs a top-down and bottom-up encoder that combines the multi-scale characteristics of CNNs and transformers to extract local and global features effectively; the pyramidal multi-scale extraction handles features of pictures with multi-scale variation well, better ensures that features at different scales carry strong semantic information, and integrates low-level details with high-level semantic information, bringing better detail representation to the fusion result.
3. The invention designs a decoder consisting of upper-triangular dense connections and upsampling, which can effectively fuse multi-scale features, makes full use of deep features, retains more of the information at different scales extracted by the encoder network, and prevents the network from losing shallow features while extracting deeper ones, so that the feature information extracted by the network is more comprehensive; the multi-scale features obtained by the encoder can thus be fully exploited, enhancing the quality of the fused image.
Drawings
FIG. 1 is a flow chart of a multi-exposure image fusion method based on a multi-scale self-encoder according to the present invention;
FIG. 2 is a schematic diagram of a network architecture according to the present invention;
FIG. 3 is a schematic diagram of the fusion structure of the present invention;
FIG. 4 is a schematic diagram of an encoder of the present invention;
FIG. 5 is a block diagram of a decoder according to the present invention.
Detailed Description
In this embodiment, the general flow of the multi-exposure image fusion method based on a multi-scale self-encoder is shown in FIG. 1 and includes the following steps:
Step 1: obtain P RGB natural images and convert them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, where I_p denotes the p-th grayscale image.
Step 2: construct a multi-scale self-encoder network, which adopts the structure shown in FIG. 2 and comprises a multi-scale encoder and a decoder.
Step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, where A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2. In the present embodiment, as shown in FIG. 4, W = 5, X = 5, Y = 3.
Step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, where I_Ty denotes the y-th primary deep feature map.
Step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, where F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1).
Step 2.1.3: the (W-1)-th comprehensive feature map F_(W-1) is processed by the X-th convolution block N_X to obtain the (X-1)-th multi-scale feature map Φ_(X-1).
Step 2.1.4: the (W-1)-th comprehensive feature map F_(W-1) is upsampled to obtain the (W-2)-th upsampling feature U_(W-2); the upsampling feature U_(W-2) is then added element-wise to the (W-2)-th comprehensive feature map F_(W-2) to obtain the (W-2)-th intermediate feature map M_(W-2); the (W-2)-th intermediate feature map M_(W-2) is processed by the (X-1)-th convolution block N_(X-1) to obtain the (X-2)-th multi-scale feature map Φ_(X-2).
Step 2.1.5: M_(W-2) is upsampled to obtain the (W-3)-th upsampling feature U_(W-3); U_(W-3) is then added element-wise to the (W-3)-th comprehensive feature map F_(W-3) to obtain the (W-3)-th intermediate feature map M_(W-3); the (W-3)-th intermediate feature map M_(W-3) is processed by the (X-2)-th convolution block N_(X-2) to obtain the (X-3)-th multi-scale feature map Φ_(X-3).
Step 2.1.6: the remaining comprehensive feature maps F_(W-4), …, F_1 are processed in turn according to the procedure of step 2.1.5, finally yielding the X-1 multi-scale feature maps Φ_1, Φ_2, …, Φ_(x-1), …, Φ_(X-1), where Φ_(x-1) denotes the (x-1)-th multi-scale feature map.
As shown in FIG. 4, the encoder input I_p is a 256 × 256 × 1 image. After A_1 the output is a 256 × 256 × 16 feature map, and after N_1 the primary shallow feature map I_N1 of size 256 × 256 × 32 is obtained. I_N1 then passes successively through Trans_1, Trans_2, Trans_3, giving primary deep feature maps of sizes 128 × 128 × 64, 64 × 64 × 128 and 32 × 32 × 256, denoted I_T1, I_T2, I_T3, respectively; the vision transformers are standard vision transformers. Next, I_N1 is passed through A_2 to obtain the shallow feature map F_1 of size 256 × 256 × 128, and the primary deep feature maps I_T1, I_T2, I_T3 are passed through the convolution blocks A_3, A_4, A_5 to obtain deep feature maps of sizes 128 × 128 × 128, 64 × 64 × 128 and 32 × 32 × 128, denoted F_2, F_3, F_4, respectively. The shallow feature map F_1 and the deep feature maps F_2, F_3, F_4 form the 4 comprehensive feature maps F_1, F_2, F_3, F_4.
The 4th comprehensive feature map F_4 is processed by the 5th convolution block N_5 to obtain the 4th multi-scale feature map of size 32 × 32 × 256, denoted Φ_4. The feature map F_4 is upsampled to obtain the upsampled feature map U_3; U_3 is added element-wise to the 3rd comprehensive feature map F_3 to obtain an intermediate feature map of size 64 × 64 × 128, denoted M_3; M_3 passes through the fourth convolution block N_4 to give the 3rd multi-scale feature of size 64 × 64 × 128, denoted Φ_3. The intermediate feature map M_3 is upsampled to obtain the upsampled feature map U_2; U_2 is added element-wise to the 2nd comprehensive feature map F_2 to obtain an intermediate feature map of size 128 × 128 × 128, denoted M_2; M_2 passes through the third convolution block N_3 to give the 2nd multi-scale feature of size 128 × 128 × 64, denoted Φ_2. The intermediate feature map M_2 is upsampled to obtain the upsampled feature map U_1; U_1 is added element-wise to the 1st comprehensive feature map F_1 to obtain an intermediate feature map of size 256 × 256 × 128, denoted M_1; M_1 passes through the second convolution block N_2 to give the 1st multi-scale feature of size 256 × 256 × 32, denoted Φ_1 (a shape-level code sketch of this encoder follows).
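The shape bookkeeping of this embodiment (W = 5, X = 5, Y = 3) can be checked with the PyTorch sketch below. The publication only states that the transformers are standard vision transformers, so the TransStage module (a strided patch embedding followed by one nn.TransformerEncoderLayer) is an assumed stand-in, and the 3×3 kernels, the number of attention heads and the nearest-neighbour upsampling are likewise assumptions; only the channel and resolution figures are taken from the text.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Convolution + ReLU (used for both the A_w and N_x blocks)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                                  nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class TransStage(nn.Module):
    """Assumed stand-in for a standard ViT stage: halves the resolution, doubles the channels."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.block = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads, batch_first=True)
    def forward(self, x):
        x = self.embed(x)                              # B x C x H/2 x W/2
        b, c, h, w = x.shape
        tokens = self.block(x.flatten(2).transpose(1, 2))
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class MultiScaleEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.A1, self.N1 = ConvBlock(1, 16), ConvBlock(16, 32)            # 256x256x16, 256x256x32
        self.trans = nn.ModuleList([TransStage(32, 64), TransStage(64, 128), TransStage(128, 256)])
        self.A2 = ConvBlock(32, 128)                                      # F_1: 256x256x128
        self.A_deep = nn.ModuleList([ConvBlock(c, 128) for c in (64, 128, 256)])    # A_3..A_5
        self.N_ms = nn.ModuleList([ConvBlock(128, c) for c in (32, 64, 128, 256)])  # N_2..N_5
    def forward(self, x):                                                 # x: B x 1 x 256 x 256
        s = self.N1(self.A1(x))                                           # I_N1
        deep, t = [], s
        for stage in self.trans:
            t = stage(t)
            deep.append(t)                                                # I_T1, I_T2, I_T3
        F_maps = [self.A2(s)] + [a(d) for a, d in zip(self.A_deep, deep)]  # F_1..F_4 (128 channels)
        phis = [None] * 4
        phis[3] = self.N_ms[3](F_maps[3])                                 # Phi_4: 32x32x256
        running = F_maps[3]
        for k in (2, 1, 0):                                               # top-down path
            running = F.interpolate(running, scale_factor=2, mode="nearest") + F_maps[k]
            phis[k] = self.N_ms[k](running)                               # Phi_3, Phi_2, Phi_1
        return phis
```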
The four multi-scale features Φ_1, Φ_2, Φ_3, Φ_4 are thereby obtained.
Step 2.2: as shown in FIG. 5, the decoder consists of P convolution blocks and an output block Conv, where the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function.
Step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row.
In the specific design, the multi-scale features Φ_1, Φ_2, Φ_3, Φ_4 are referred to in the description of the decoder by these row-wise labels so that they correspond to the rows.
Step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j).
In this embodiment, Φ_4 is upsampled by a factor of two and concatenated with Φ_3 as the input of Decoder_(1,3), giving the feature map I_(1,3); at the same time, Φ_3 is upsampled by a factor of two and concatenated with Φ_2 as the input of Decoder_(1,2), giving the feature map I_(1,2); and Φ_2 is upsampled by a factor of two and concatenated with Φ_1 as the input of Decoder_(1,1), giving the feature map I_(1,1).
Step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j). In this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain the output result O_p.
In this embodiment, the feature map I_(1,3) is upsampled by a factor of two and concatenated with Φ_2 and I_(1,2); the result is input to Decoder_(2,2) to obtain the feature map I_(2,2).
The feature map I_(1,2) is upsampled by a factor of two and concatenated with Φ_1 and I_(1,1); the result is input to Decoder_(2,1) to obtain the feature map I_(2,1).
The feature map I_(2,2) is upsampled by a factor of two and concatenated with Φ_1, I_(1,1) and I_(2,1); the result is input to Decoder_(3,1) to obtain the feature map I_(3,1).
The numbers of input and output channels of Decoder_(1,1), Decoder_(1,2), Decoder_(1,3), Decoder_(2,1), Decoder_(2,2), Decoder_(3,1) are (96, 32), (192, 64), (384, 128), (128, 32), (256, 64), (160, 32), respectively.
The feature map I_(3,1) is input to the output block Conv to obtain the output result O_p (a code sketch of this decoder follows).
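The decoder of this embodiment can be wired directly from the channel pairs listed above. In the sketch below the 3×3 kernels, the ReLU on the output block, the single-channel output and the nearest-neighbour upsampling are assumptions; the input/output channel numbers and the concatenation pattern follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpperTriangularDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        chans = {(1, 1): (96, 32), (1, 2): (192, 64), (1, 3): (384, 128),
                 (2, 1): (128, 32), (2, 2): (256, 64), (3, 1): (160, 32)}
        self.blocks = nn.ModuleDict({
            f"{i}_{j}": nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
            for (i, j), (cin, cout) in chans.items()})
        self.out = nn.Sequential(nn.Conv2d(32, 1, 3, padding=1), nn.ReLU(inplace=True))  # output block Conv

    def forward(self, phis):                                  # phis = [Phi_1, Phi_2, Phi_3, Phi_4]
        up = lambda t: F.interpolate(t, scale_factor=2, mode="nearest")
        cat = lambda ts: torch.cat(ts, dim=1)
        I = {}
        I[(1, 3)] = self.blocks["1_3"](cat([phis[2], up(phis[3])]))                    # 128 + 256 = 384 in
        I[(1, 2)] = self.blocks["1_2"](cat([phis[1], up(phis[2])]))                    # 64 + 128 = 192 in
        I[(1, 1)] = self.blocks["1_1"](cat([phis[0], up(phis[1])]))                    # 32 + 64 = 96 in
        I[(2, 2)] = self.blocks["2_2"](cat([phis[1], up(I[(1, 3)]), I[(1, 2)]]))       # 64 + 128 + 64 = 256
        I[(2, 1)] = self.blocks["2_1"](cat([phis[0], up(I[(1, 2)]), I[(1, 1)]]))       # 32 + 64 + 32 = 128
        I[(3, 1)] = self.blocks["3_1"](cat([phis[0], up(I[(2, 2)]), I[(1, 1)], I[(2, 1)]]))  # 160 in
        return self.out(I[(3, 1)])                            # O_p, same resolution as the input
```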
Step 3: the overall loss function L of the multi-scale self-encoder network is constructed using formula (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity.
Step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network.
Step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, where (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel.
Step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, where Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature.
The s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, where Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel; the fusion process is shown in FIG. 3.
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, where Result_b denotes the b-th color image result (a sketch of the color handling follows).
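The color handling of steps 5 and 6 can be sketched with OpenCV as below. The publication fuses only the Y channel with the network and converts the result back to RGB through the YCbCr domain; how the two chrominance channels are combined is not stated, so a plain average of the Cb/Cr channels is assumed here, and fuse_y is assumed to wrap the network fusion of the two Y channels (tensor conversion omitted).

```python
import cv2
import numpy as np

def fuse_color_pair(rgb_over, rgb_under, fuse_y):
    """rgb_over, rgb_under: uint8 RGB images; fuse_y: maps two HxW Y channels in [0, 1] to the fused Y."""
    ycc_o = cv2.cvtColor(rgb_over, cv2.COLOR_RGB2YCrCb)       # note: OpenCV orders the channels Y, Cr, Cb
    ycc_u = cv2.cvtColor(rgb_under, cv2.COLOR_RGB2YCrCb)
    y_fused = fuse_y(ycc_o[..., 0].astype(np.float32) / 255.0,
                     ycc_u[..., 0].astype(np.float32) / 255.0)
    chroma = (ycc_o[..., 1:].astype(np.float32) + ycc_u[..., 1:].astype(np.float32)) / 2.0  # assumed rule
    fused = np.dstack([np.clip(y_fused * 255.0, 0, 255), chroma]).astype(np.uint8)
    return cv2.cvtColor(fused, cv2.COLOR_YCrCb2RGB)           # uniformly exposed color result
```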
In this embodiment, an electronic device includes a memory for storing a program that supports a processor to execute the above-described multi-exposure image fusion method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program, and the computer program is executed by a processor to execute the steps of the multi-exposure image fusion method.

Claims (3)

1. A multi-exposure image fusion method based on a multi-scale self-encoder is characterized by comprising the following steps:
step 1: obtaining P RGB natural images and converting them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, as a training set, wherein I_p denotes the p-th grayscale image;
step 2: constructing a multi-scale self-encoder network comprising a multi-scale encoder and a decoder;
step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, wherein A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2;
step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, wherein I_Ty denotes the y-th primary deep feature map;
step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, wherein F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1);
Step 2.1.3: w-1 th comprehensive characteristic diagram
Figure FDA00039417436900000111
Through the Xth convolution block N X After the treatment, obtaining an X-1 th multi-scale characteristic diagram
Figure FDA00039417436900000112
Step 2.1.4: for the W-1 comprehensive characteristic diagram
Figure FDA00039417436900000113
Obtaining the W-2 th up-sampling characteristic after up-sampling
Figure FDA00039417436900000114
The up-sampling feature
Figure FDA00039417436900000115
And then with the W-2 comprehensive characteristic diagram
Figure FDA00039417436900000116
Carry out element-by-element additionObtaining a W-2 intermediate characteristic diagram after the method operation
Figure FDA00039417436900000117
The W-2 intermediate feature map
Figure FDA00039417436900000118
Through the X-1 th convolution block N X-1 Then obtaining the X-2 characteristics
Figure FDA0003941743690000021
Step 2.1.5: to pair
Figure FDA0003941743690000022
Obtaining W-3 th up-sampling characteristic after up-sampling
Figure FDA0003941743690000023
Figure FDA0003941743690000024
Combined with the W-3 th feature
Figure FDA0003941743690000025
Performing element-by-element addition to obtain W-3 intermediate characteristic diagram
Figure FDA0003941743690000026
The W-3 intermediate feature map
Figure FDA0003941743690000027
Through the X-2 th convolution block N X-2 Then obtaining the X-3 characteristic
Figure FDA0003941743690000028
Step 2.1.6: according to the process of step 2.1.5
Figure FDA0003941743690000029
After processing, X-1 multi-scale characteristic graphs are obtained
Figure FDA00039417436900000210
Figure FDA00039417436900000211
Representing an x-1 th multi-scale feature map;
step 2.2: the decoder consists of P convolution blocks and an output block Conv, wherein the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function;
step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row;
step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j);
step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j); in this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain an output result O_p;
And step 3: the overall loss function L of the multi-scale self-encoder network is constructed using equation (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity;
step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network;
step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, wherein (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel;
step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, wherein Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature;
the s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, wherein Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel;
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, wherein Result_b denotes the b-th color image result.
2. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a program that supports the processor in performing the multi-exposure image fusion method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the multi-exposure image fusion method of claim 1.
CN202211424921.1A 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder Pending CN115689962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211424921.1A CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211424921.1A CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Publications (1)

Publication Number Publication Date
CN115689962A true CN115689962A (en) 2023-02-03

Family

ID=85051690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211424921.1A Pending CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Country Status (1)

Country Link
CN (1) CN115689962A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173525A (en) * 2023-09-05 2023-12-05 北京交通大学 Universal multi-mode image fusion method and device
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117115926B (en) * 2023-10-25 2024-02-06 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing

Similar Documents

Publication Publication Date Title
Anwar et al. Diving deeper into underwater image enhancement: A survey
Yang et al. Underwater image enhancement based on conditional generative adversarial network
Wang et al. An experiment-based review of low-light image enhancement methods
Rana et al. Deep tone mapping operator for high dynamic range images
Liu et al. HoLoCo: Holistic and local contrastive learning network for multi-exposure image fusion
CN111242883B (en) Dynamic scene HDR reconstruction method based on deep learning
Zamir et al. Learning digital camera pipeline for extreme low-light imaging
Zhu et al. Stacked U-shape networks with channel-wise attention for image super-resolution
Yan et al. High dynamic range imaging via gradient-aware context aggregation network
US20220076459A1 (en) Image optimization method, apparatus, device and storage medium
Li et al. Hdrnet: Single-image-based hdr reconstruction using channel attention cnn
Lv et al. BacklitNet: A dataset and network for backlit image enhancement
Lv et al. Low-light image enhancement via deep Retinex decomposition and bilateral learning
CN115689962A (en) Multi-exposure image fusion method based on multi-scale self-encoder
Chen et al. End-to-end single image enhancement based on a dual network cascade model
Yang et al. Low‐light image enhancement based on Retinex decomposition and adaptive gamma correction
Zhang et al. Multi-branch and progressive network for low-light image enhancement
Li et al. Low-light hyperspectral image enhancement
Wang et al. Low-light image enhancement by deep learning network for improved illumination map
Chen et al. Improving dynamic hdr imaging with fusion transformer
Li et al. AMBCR: Low‐light image enhancement via attention guided multi‐branch construction and Retinex theory
Liu et al. Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs
CN104123707B (en) Local rank priori based single-image super-resolution reconstruction method
Cao et al. A deep thermal-guided approach for effective low-light visible image enhancement
Zhang et al. Invertible network for unpaired low-light image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination