CN115689962A - Multi-exposure image fusion method based on multi-scale self-encoder

Multi-exposure image fusion method based on multi-scale self-encoder

Info

Publication number
CN115689962A
Authority
CN
China
Prior art keywords
image
convolution
scale
output
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211424921.1A
Other languages
Chinese (zh)
Inventor
刘羽
杨智刚
成娟
李畅
宋仁成
陈勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202211424921.1A priority Critical patent/CN115689962A/en
Publication of CN115689962A publication Critical patent/CN115689962A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a multi-exposure image fusion method based on a multi-scale self-encoder, which comprises the following steps: 1, preparing and preprocessing the data, and constructing a multi-scale self-encoder network that mainly comprises a multi-scale encoder and a decoder, wherein the encoder consists of convolution layers with activation functions and vision transformers and is mainly used for multi-scale feature extraction, while the decoder consists of densely cross-connected, multi-scale-fused convolution layers with activation functions and is mainly used for image reconstruction; and 2, fusing the input multi-exposure image pair, including network training and multi-exposure image fusion. The invention can fully utilize the complementary and redundant information in the overexposed and underexposed images to fuse images of higher quality, provides better-quality images for human observation, and at the same time supports computer vision tasks such as segmentation and classification of the images, thereby assisting human visual recognition, computer analysis and processing, and related research.

Description

Multi-exposure image fusion method based on multi-scale self-encoder
Technical Field
The invention relates to the technical field of multi-exposure image fusion, in particular to a multi-exposure image fusion method based on a multi-scale self-encoder.
Background
The brightness in natural scenes can vary greatly, and the dynamic range of a single image is much lower than that of a natural scene due to the limitations of the imaging device. The photographed scene may be affected by light, weather, sun altitude and other factors, so overexposure and underexposure often occur. A single image therefore cannot completely reflect the light and dark levels of the scene, some information is lost, and the imaging result is unsatisfactory. Bridging the mismatch between the dynamic range of real natural scenes and the dynamic response of existing imaging devices, display monitors, and the human eye remains challenging. Multi-exposure image fusion (MEF) technology provides a simple, economical and efficient way to overcome the contradiction between high dynamic range (HDR) imaging and low dynamic range (LDR) display. It avoids the complexity of imaging hardware circuit design, reduces the weight and power consumption of the whole device, improves image quality, and has important applicability in the field of digital photography. MEF is the process of fusing multiple images with different exposures to produce a single visually pleasing and high-quality fused image. MEF is similar to other image fusion tasks, such as medical image fusion and remote sensing image fusion, in that they all combine important information from multiple source images to produce a high-quality fused image. The main difference between these image fusion tasks lies in the source images, which contain the different information to be fused; for MEF, the source images are images with different exposures. MEF has attracted considerable attention because of its effectiveness in producing high-quality images.
Multi-exposure image fusion is an important branch of image fusion; its main task is to fuse several images of the same scene with different exposure levels into one image with a high dynamic range and high quality. Existing methods have the following problems. First, methods based on traditional techniques rely on manually designed feature extractors and fusion rules, so their robustness is poor, they perform poorly in different scenes, and they can produce uneven brightness and artifacts. Second, deep-learning-based methods rely on training with multi-exposure datasets, which are much smaller than natural image datasets, so networks with more layers and more parameters cannot be trained adequately. In addition, most previous methods are based purely on convolution and neglect global information, and they also lack multi-scale feature fusion and interaction.
Through multi-exposure image fusion, all of the important feature information can be obtained from a single image, which facilitates human visual perception and subsequent image processing such as object detection, object segmentation and edge extraction. Realizing multi-exposure image fusion technology is therefore of great significance.
Disclosure of Invention
To overcome the problems of existing image fusion methods in multi-exposure image fusion, the invention provides a multi-exposure image fusion method based on a multi-scale self-encoder, so as to provide better image feature representation by fully utilizing the complementary and redundant information of images with different exposure levels and to reconstruct an image of higher quality, thereby providing a better-quality image for human observation and at the same time supporting computer vision tasks such as segmentation and classification of the image.
The invention adopts the following technical scheme for solving the problems:
the invention discloses a multi-exposure image fusion method based on a multi-scale self-encoder, which is characterized by comprising the following steps:
Step 1: obtaining P RGB natural images and converting them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, as the training set, where I_p denotes the p-th grayscale image;
Step 2: constructing a multi-scale self-encoder network comprising a multi-scale encoder and a decoder;
Step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, where A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2;
Step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, where I_Ty denotes the y-th primary deep feature map;
Step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, where F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1);
Step 2.1.3: w-1 th comprehensive characteristic diagram
Figure BDA00039417437000000212
Through the Xth convolution block N X After the treatment, obtaining a plurality of X-1 multi-scale characteristic graphs
Figure BDA00039417437000000213
Step 2.1.4: for the W-1 comprehensive characteristic diagram
Figure BDA0003941743700000031
Obtaining the W-2 th up-sampling characteristic after up-sampling
Figure BDA0003941743700000032
The up-sampling feature
Figure BDA0003941743700000033
And then with the W-2 comprehensive characteristic diagram
Figure BDA0003941743700000034
Performing element-by-element addition to obtain W-2 intermediate characteristic diagram
Figure BDA0003941743700000035
The W-2 intermediate feature map
Figure BDA0003941743700000036
Through the X-1 th convolution block N X-1 Then obtaining the X-2 characteristic
Figure BDA0003941743700000037
Step 2.1.5: to pair
Figure BDA0003941743700000038
Obtaining W-3 th up-sampling characteristic after up-sampling
Figure BDA0003941743700000039
Figure BDA00039417437000000310
And then combined with the W-3 th feature
Figure BDA00039417437000000311
Performing element-by-element addition to obtain W-3 intermediate characteristic diagram
Figure BDA00039417437000000312
The W-3 intermediate feature map
Figure BDA00039417437000000313
Through the X-2 th convolution block N X-2 Then obtaining the X-3 characteristic
Figure BDA00039417437000000314
Step 2.1.6: the processes according to step 2.1.5 are sequentially performed
Figure BDA00039417437000000315
After processing, obtaining X-1 multi-scale characteristic graphs
Figure BDA00039417437000000316
Figure BDA00039417437000000317
Representing an x-1 th multi-scale feature map;
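The top-down extraction in steps 2.1.3 to 2.1.6 can be summarised as a single loop: the deepest comprehensive feature map is refined by N_X, and every shallower level is obtained by upsampling the level below it, adding it element-wise to the corresponding comprehensive feature map, and applying the matching convolution block. The following minimal PyTorch sketch illustrates this; the module names, the 3×3 kernel size and the nearest-neighbour upsampling are assumptions made for illustration, not details taken from the publication.

```python
# Minimal sketch of steps 2.1.3-2.1.6 (illustrative only; kernel size and
# nearest-neighbour upsampling are assumptions).
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """One N_x block: a single convolution followed by ReLU."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                                  nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

def top_down_multiscale(comprehensive_maps, n_blocks):
    """comprehensive_maps: [F_1, ..., F_(W-1)] from shallow to deep, all with C channels.
    n_blocks: [N_2, ..., N_X]. Returns the multi-scale maps [Phi_1, ..., Phi_(X-1)]."""
    phis = [None] * len(comprehensive_maps)
    phis[-1] = n_blocks[-1](comprehensive_maps[-1])              # Phi_(X-1) = N_X(F_(W-1))
    running = comprehensive_maps[-1]
    for k in range(len(comprehensive_maps) - 2, -1, -1):
        up = F.interpolate(running, scale_factor=2, mode="nearest")  # upsampling feature
        running = up + comprehensive_maps[k]                     # element-wise addition -> intermediate map
        phis[k] = n_blocks[k](running)                           # multi-scale map of this level
    return phis
```

In the embodiment described later (W = X = 5) this loop produces exactly the four multi-scale maps Φ_1 to Φ_4.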
Step 2.2: the decoder comprises P convolution blocks and an output block Conv, where the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function;
Step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row;
Step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j);
Step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j); in this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain the output result O_p; a code sketch of this decoding scheme is given below;
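The upper-triangular dense connection of steps 2.2.2 and 2.2.3 can likewise be written as two nested loops. The sketch below is an assumption-level illustration: decoders is taken to be a dictionary of convolution blocks keyed by (column, row), channel bookkeeping is left to those blocks, and nearest-neighbour upsampling is assumed.

```python
# Sketch of the upper-triangular densely connected decoder (illustrative only).
import torch
import torch.nn.functional as F

def decode(phis, decoders, out_block):
    """phis: [Phi_1, ..., Phi_(X-1)], Phi_1 being the largest scale; J = len(phis) - 1."""
    J = len(phis) - 1
    feat = {}
    for j in range(J, 0, -1):                                # first column (step 2.2.2)
        up = F.interpolate(phis[j], scale_factor=2, mode="nearest")
        feat[(1, j)] = decoders[(1, j)](torch.cat([phis[j - 1], up], dim=1))
    for i in range(2, J + 1):                                # remaining columns (step 2.2.3)
        for j in range(1, J - i + 2):                        # blocks exist while i + j <= J + 1
            up = F.interpolate(feat[(i - 1, j + 1)], scale_factor=2, mode="nearest")
            dense = [feat[(c, j)] for c in range(1, i)]      # I_(1,j) ... I_(i-1,j)
            feat[(i, j)] = decoders[(i, j)](torch.cat([phis[j - 1], up] + dense, dim=1))
    return out_block(feat[(J, 1)])                           # O_p = Conv(I_(I,1))
```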
And step 3: the overall loss function L of the multi-scale self-encoder network is constructed using equation (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity; a code sketch of this training objective is given below;
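A sketch of the training objective in formulas (1) and (2) follows. Because formula (3) is reproduced only as an image, the pixel loss is assumed here to be a mean-squared-error term, and ssim() is assumed to come from an external differentiable SSIM implementation such as the pytorch-msssim package; both are assumptions rather than details confirmed by the publication.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed third-party differentiable SSIM

def total_loss(I_p, O_p, lam=0.1):                  # lam: weight of the pixel loss (value assumed)
    l_ssim = 1.0 - ssim(O_p, I_p, data_range=1.0)   # formula (2): L_ssim = 1 - SSIM(I_p, O_p)
    l_pixel = F.mse_loss(O_p, I_p)                  # formula (3), assumed MSE-type pixel loss
    return l_ssim + lam * l_pixel                   # formula (1): L = L_ssim + lambda * L_pixel
```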
Step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network;
Step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, where (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel;
Step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, where Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature;
the s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, where Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel;
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, where Result_b denotes the b-th color image result; a code sketch of this fusion procedure is given below.
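The fusion rule of step 6 reduces to averaging the encoder features of the two exposures scale by scale and decoding the result. The sketch below assumes that the trained encoder returns one feature map per scale and that the inputs are Y-channel tensors; it illustrates the averaging rule and is not the authors' code.

```python
import torch

@torch.no_grad()
def fuse_pair(I_over, I_under, encoder, decoder):
    """I_over, I_under: Y-channel tensors of shape (B, 1, H, W)."""
    feats_over = encoder(I_over)       # {Io_f1, ..., Io_fS}
    feats_under = encoder(I_under)     # {Iu_f1, ..., Iu_fS}
    fused = [(fo + fu) / 2.0 for fo, fu in zip(feats_over, feats_under)]   # f_s = (Io_fs + Iu_fs) / 2
    return decoder(fused)              # Output_b, the fused Y channel
```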
The electronic device of the invention comprises a memory and a processor, and is characterized in that the memory is used for storing a program that supports the processor in executing the multi-exposure image fusion method, and the processor is configured to execute the program stored in the memory.
The computer-readable storage medium of the invention stores a computer program, and is characterized in that the computer program, when executed by a processor, performs the steps of the multi-exposure image fusion method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a unified network framework that accomplishes the fusion of the overexposed and underexposed images in a single pass, makes full use of the redundant and complementary information between images of different modalities, and fuses them into a high-quality image. Compared with existing methods that must train the learning network on multi-exposure images, the method achieves high-quality fusion of multi-exposure images while being trained only on an ordinary natural image dataset (such as the 2014 MS-COCO dataset), thereby avoiding the dependence on multi-exposure datasets, which are too small to train networks with more layers and more parameters adequately.
2. The invention designs a top-down and bottom-up encoder that combines the multi-scale characteristics of CNNs and transformers to extract local and global features effectively; the pyramidal multi-scale extraction handles features of pictures with multi-scale variation well, better ensures that features at different scales carry strong semantic information, and integrates low-level details with high-level semantic information, bringing better detail representation to the fusion result.
3. The invention designs a decoder consisting of upper-triangular dense connections and upsampling, which can effectively fuse multi-scale features, makes full use of deep features, retains more of the information at different scales extracted by the encoder network, and prevents the network from losing shallow features while extracting deeper ones, so that the feature information extracted by the network is more comprehensive; the multi-scale features obtained by the encoder can thus be fully exploited, enhancing the quality of the fused image.
Drawings
FIG. 1 is a flow chart of a multi-exposure image fusion method based on a multi-scale self-encoder according to the present invention;
FIG. 2 is a schematic diagram of a network architecture according to the present invention;
FIG. 3 is a schematic diagram of the fusion structure of the present invention;
FIG. 4 is a schematic diagram of an encoder of the present invention;
FIG. 5 is a block diagram of a decoder according to the present invention.
Detailed Description
In this embodiment, the general flow of the multi-exposure image fusion method based on a multi-scale self-encoder is shown in FIG. 1 and includes the following steps:
Step 1: obtain P RGB natural images and convert them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, where I_p denotes the p-th grayscale image.
Step 2: construct a multi-scale self-encoder network, which adopts the structure shown in FIG. 2 and comprises a multi-scale encoder and a decoder.
Step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, where A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2. In the present embodiment, as shown in FIG. 4, W = 5, X = 5, Y = 3.
Step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, where I_Ty denotes the y-th primary deep feature map.
Step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, where F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1).
Step 2.1.3: the (W-1)-th comprehensive feature map F_(W-1) is processed by the X-th convolution block N_X to obtain the (X-1)-th multi-scale feature map Φ_(X-1).
Step 2.1.4: the (W-1)-th comprehensive feature map F_(W-1) is upsampled to obtain the (W-2)-th upsampling feature U_(W-2); the upsampling feature U_(W-2) is then added element-wise to the (W-2)-th comprehensive feature map F_(W-2) to obtain the (W-2)-th intermediate feature map M_(W-2); the (W-2)-th intermediate feature map M_(W-2) is processed by the (X-1)-th convolution block N_(X-1) to obtain the (X-2)-th multi-scale feature map Φ_(X-2).
Step 2.1.5: M_(W-2) is upsampled to obtain the (W-3)-th upsampling feature U_(W-3); U_(W-3) is then added element-wise to the (W-3)-th comprehensive feature map F_(W-3) to obtain the (W-3)-th intermediate feature map M_(W-3); the (W-3)-th intermediate feature map M_(W-3) is processed by the (X-2)-th convolution block N_(X-2) to obtain the (X-3)-th multi-scale feature map Φ_(X-3).
Step 2.1.6: the remaining comprehensive feature maps F_(W-4), …, F_1 are processed in turn according to the procedure of step 2.1.5, finally yielding the X-1 multi-scale feature maps Φ_1, Φ_2, …, Φ_(x-1), …, Φ_(X-1), where Φ_(x-1) denotes the (x-1)-th multi-scale feature map.
As shown in FIG. 4, the encoder input I_p is a 256 × 256 × 1 image. After A_1 the output is a 256 × 256 × 16 feature map, and after N_1 the primary shallow feature map I_N1 of size 256 × 256 × 32 is obtained. I_N1 then passes successively through Trans_1, Trans_2, Trans_3, giving primary deep feature maps of sizes 128 × 128 × 64, 64 × 64 × 128 and 32 × 32 × 256, denoted I_T1, I_T2, I_T3, respectively; the vision transformers are standard vision transformers. Next, I_N1 is passed through A_2 to obtain the shallow feature map F_1 of size 256 × 256 × 128, and the primary deep feature maps I_T1, I_T2, I_T3 are passed through the convolution blocks A_3, A_4, A_5 to obtain deep feature maps of sizes 128 × 128 × 128, 64 × 64 × 128 and 32 × 32 × 128, denoted F_2, F_3, F_4, respectively. The shallow feature map F_1 and the deep feature maps F_2, F_3, F_4 form the 4 comprehensive feature maps F_1, F_2, F_3, F_4.
The 4th comprehensive feature map F_4 is processed by the 5th convolution block N_5 to obtain the 4th multi-scale feature map of size 32 × 32 × 256, denoted Φ_4. The feature map F_4 is upsampled to obtain the upsampled feature map U_3; U_3 is added element-wise to the 3rd comprehensive feature map F_3 to obtain an intermediate feature map of size 64 × 64 × 128, denoted M_3; M_3 passes through the fourth convolution block N_4 to give the 3rd multi-scale feature of size 64 × 64 × 128, denoted Φ_3. The intermediate feature map M_3 is upsampled to obtain the upsampled feature map U_2; U_2 is added element-wise to the 2nd comprehensive feature map F_2 to obtain an intermediate feature map of size 128 × 128 × 128, denoted M_2; M_2 passes through the third convolution block N_3 to give the 2nd multi-scale feature of size 128 × 128 × 64, denoted Φ_2. The intermediate feature map M_2 is upsampled to obtain the upsampled feature map U_1; U_1 is added element-wise to the 1st comprehensive feature map F_1 to obtain an intermediate feature map of size 256 × 256 × 128, denoted M_1; M_1 passes through the second convolution block N_2 to give the 1st multi-scale feature of size 256 × 256 × 32, denoted Φ_1 (a shape-level code sketch of this encoder follows).
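The shape bookkeeping of this embodiment (W = 5, X = 5, Y = 3) can be checked with the PyTorch sketch below. The publication only states that the transformers are standard vision transformers, so the TransStage module (a strided patch embedding followed by one nn.TransformerEncoderLayer) is an assumed stand-in, and the 3×3 kernels, the number of attention heads and the nearest-neighbour upsampling are likewise assumptions; only the channel and resolution figures are taken from the text.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Convolution + ReLU (used for both the A_w and N_x blocks)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                                  nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class TransStage(nn.Module):
    """Assumed stand-in for a standard ViT stage: halves the resolution, doubles the channels."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.block = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads, batch_first=True)
    def forward(self, x):
        x = self.embed(x)                              # B x C x H/2 x W/2
        b, c, h, w = x.shape
        tokens = self.block(x.flatten(2).transpose(1, 2))
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class MultiScaleEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.A1, self.N1 = ConvBlock(1, 16), ConvBlock(16, 32)            # 256x256x16, 256x256x32
        self.trans = nn.ModuleList([TransStage(32, 64), TransStage(64, 128), TransStage(128, 256)])
        self.A2 = ConvBlock(32, 128)                                      # F_1: 256x256x128
        self.A_deep = nn.ModuleList([ConvBlock(c, 128) for c in (64, 128, 256)])    # A_3..A_5
        self.N_ms = nn.ModuleList([ConvBlock(128, c) for c in (32, 64, 128, 256)])  # N_2..N_5
    def forward(self, x):                                                 # x: B x 1 x 256 x 256
        s = self.N1(self.A1(x))                                           # I_N1
        deep, t = [], s
        for stage in self.trans:
            t = stage(t)
            deep.append(t)                                                # I_T1, I_T2, I_T3
        F_maps = [self.A2(s)] + [a(d) for a, d in zip(self.A_deep, deep)]  # F_1..F_4 (128 channels)
        phis = [None] * 4
        phis[3] = self.N_ms[3](F_maps[3])                                 # Phi_4: 32x32x256
        running = F_maps[3]
        for k in (2, 1, 0):                                               # top-down path
            running = F.interpolate(running, scale_factor=2, mode="nearest") + F_maps[k]
            phis[k] = self.N_ms[k](running)                               # Phi_3, Phi_2, Phi_1
        return phis
```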
The four multi-scale features Φ_1, Φ_2, Φ_3, Φ_4 are thereby obtained.
Step 2.2: as shown in FIG. 5, the decoder consists of P convolution blocks and an output block Conv, where the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function.
Step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row.
In the specific design, the multi-scale features Φ_1, Φ_2, Φ_3, Φ_4 are referred to in the description of the decoder by these row-wise labels so that they correspond to the rows.
Step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j).
In this embodiment, Φ_4 is upsampled by a factor of two and concatenated with Φ_3 as the input of Decoder_(1,3), giving the feature map I_(1,3); at the same time, Φ_3 is upsampled by a factor of two and concatenated with Φ_2 as the input of Decoder_(1,2), giving the feature map I_(1,2); and Φ_2 is upsampled by a factor of two and concatenated with Φ_1 as the input of Decoder_(1,1), giving the feature map I_(1,1).
Step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j). In this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain the output result O_p.
In this embodiment, the feature map I_(1,3) is upsampled by a factor of two and concatenated with Φ_2 and I_(1,2); the result is input to Decoder_(2,2) to obtain the feature map I_(2,2).
The feature map I_(1,2) is upsampled by a factor of two and concatenated with Φ_1 and I_(1,1); the result is input to Decoder_(2,1) to obtain the feature map I_(2,1).
The feature map I_(2,2) is upsampled by a factor of two and concatenated with Φ_1, I_(1,1) and I_(2,1); the result is input to Decoder_(3,1) to obtain the feature map I_(3,1).
The numbers of input and output channels of Decoder_(1,1), Decoder_(1,2), Decoder_(1,3), Decoder_(2,1), Decoder_(2,2), Decoder_(3,1) are (96, 32), (192, 64), (384, 128), (128, 32), (256, 64), (160, 32), respectively.
The feature map I_(3,1) is input to the output block Conv to obtain the output result O_p (a code sketch of this decoder follows).
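The decoder of this embodiment can be wired directly from the channel pairs listed above. In the sketch below the 3×3 kernels, the ReLU on the output block, the single-channel output and the nearest-neighbour upsampling are assumptions; the input/output channel numbers and the concatenation pattern follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpperTriangularDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        chans = {(1, 1): (96, 32), (1, 2): (192, 64), (1, 3): (384, 128),
                 (2, 1): (128, 32), (2, 2): (256, 64), (3, 1): (160, 32)}
        self.blocks = nn.ModuleDict({
            f"{i}_{j}": nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
            for (i, j), (cin, cout) in chans.items()})
        self.out = nn.Sequential(nn.Conv2d(32, 1, 3, padding=1), nn.ReLU(inplace=True))  # output block Conv

    def forward(self, phis):                                  # phis = [Phi_1, Phi_2, Phi_3, Phi_4]
        up = lambda t: F.interpolate(t, scale_factor=2, mode="nearest")
        cat = lambda ts: torch.cat(ts, dim=1)
        I = {}
        I[(1, 3)] = self.blocks["1_3"](cat([phis[2], up(phis[3])]))                    # 128 + 256 = 384 in
        I[(1, 2)] = self.blocks["1_2"](cat([phis[1], up(phis[2])]))                    # 64 + 128 = 192 in
        I[(1, 1)] = self.blocks["1_1"](cat([phis[0], up(phis[1])]))                    # 32 + 64 = 96 in
        I[(2, 2)] = self.blocks["2_2"](cat([phis[1], up(I[(1, 3)]), I[(1, 2)]]))       # 64 + 128 + 64 = 256
        I[(2, 1)] = self.blocks["2_1"](cat([phis[0], up(I[(1, 2)]), I[(1, 1)]]))       # 32 + 64 + 32 = 128
        I[(3, 1)] = self.blocks["3_1"](cat([phis[0], up(I[(2, 2)]), I[(1, 1)], I[(2, 1)]]))  # 160 in
        return self.out(I[(3, 1)])                            # O_p, same resolution as the input
```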
Step 3: the overall loss function L of the multi-scale self-encoder network is constructed using formula (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity.
Step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network.
Step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, where (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel.
Step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, where Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature.
The s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, where Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel; the fusion process is shown in FIG. 3.
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, where Result_b denotes the b-th color image result (a sketch of the color handling follows).
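The color handling of steps 5 and 6 can be sketched with OpenCV as below. The publication fuses only the Y channel with the network and converts the result back to RGB through the YCbCr domain; how the two chrominance channels are combined is not stated, so a plain average of the Cb/Cr channels is assumed here, and fuse_y is assumed to wrap the network fusion of the two Y channels (tensor conversion omitted).

```python
import cv2
import numpy as np

def fuse_color_pair(rgb_over, rgb_under, fuse_y):
    """rgb_over, rgb_under: uint8 RGB images; fuse_y: maps two HxW Y channels in [0, 1] to the fused Y."""
    ycc_o = cv2.cvtColor(rgb_over, cv2.COLOR_RGB2YCrCb)       # note: OpenCV orders the channels Y, Cr, Cb
    ycc_u = cv2.cvtColor(rgb_under, cv2.COLOR_RGB2YCrCb)
    y_fused = fuse_y(ycc_o[..., 0].astype(np.float32) / 255.0,
                     ycc_u[..., 0].astype(np.float32) / 255.0)
    chroma = (ycc_o[..., 1:].astype(np.float32) + ycc_u[..., 1:].astype(np.float32)) / 2.0  # assumed rule
    fused = np.dstack([np.clip(y_fused * 255.0, 0, 255), chroma]).astype(np.uint8)
    return cv2.cvtColor(fused, cv2.COLOR_YCrCb2RGB)           # uniformly exposed color result
```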
In this embodiment, an electronic device includes a memory for storing a program that supports a processor to execute the above-described multi-exposure image fusion method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program, and the computer program is executed by a processor to execute the steps of the multi-exposure image fusion method.

Claims (3)

1. A multi-exposure image fusion method based on a multi-scale self-encoder is characterized by comprising the following steps:
step 1: obtaining P RGB natural images and converting them into grayscale images, denoted {I_1, I_2, …, I_p, …, I_P}, as a training set, wherein I_p denotes the p-th grayscale image;
step 2: constructing a multi-scale self-encoder network comprising a multi-scale encoder and a decoder;
step 2.1: the multi-scale encoder comprises W convolution blocks A_1, A_2, …, A_w, …, A_W, X convolution blocks N_1, N_2, …, N_x, …, N_X, and Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, wherein A_w denotes the w-th convolution block A and consists of a convolution layer with an A×A kernel and a ReLU activation function; N_x denotes the x-th convolution block N and consists of a convolution layer with an N×N kernel and a ReLU activation function; Trans_y denotes the y-th vision transformer; and Y = W-2;
step 2.1.1: the p-th grayscale image I_p is input into the multi-scale encoder and processed successively by the 1st convolution block A_1 and the 1st convolution block N_1 to obtain a primary shallow feature map I_N1; I_N1 then passes successively through the Y vision transformers Trans_1, Trans_2, …, Trans_y, …, Trans_Y, which correspondingly produce Y primary deep feature maps I_T1, I_T2, …, I_Ty, …, I_TY, wherein I_Ty denotes the y-th primary deep feature map;
step 2.1.2: I_N1 is input into the 2nd convolution block A_2 to obtain a shallow feature map F_1 with C channels; the primary deep feature maps I_T1, …, I_TY are input into the corresponding convolution blocks A_3, …, A_w, …, A_W to obtain deep feature maps F_2, …, F_(W-1) with C channels, wherein F_(w-1) denotes the (w-2)-th deep feature map output by A_w; the shallow feature map F_1 and the deep feature maps F_2, …, F_(W-1) together form the W-1 comprehensive feature maps F_1, F_2, …, F_(W-1);
Step 2.1.3: w-1 th comprehensive characteristic diagram
Figure FDA00039417436900000111
Through the Xth convolution block N X After the treatment, obtaining an X-1 th multi-scale characteristic diagram
Figure FDA00039417436900000112
Step 2.1.4: for the W-1 comprehensive characteristic diagram
Figure FDA00039417436900000113
Obtaining the W-2 th up-sampling characteristic after up-sampling
Figure FDA00039417436900000114
The up-sampling feature
Figure FDA00039417436900000115
And then with the W-2 comprehensive characteristic diagram
Figure FDA00039417436900000116
Carry out element-by-element additionObtaining a W-2 intermediate characteristic diagram after the method operation
Figure FDA00039417436900000117
The W-2 intermediate feature map
Figure FDA00039417436900000118
Through the X-1 th convolution block N X-1 Then obtaining the X-2 characteristics
Figure FDA0003941743690000021
Step 2.1.5: to pair
Figure FDA0003941743690000022
Obtaining W-3 th up-sampling characteristic after up-sampling
Figure FDA0003941743690000023
Figure FDA0003941743690000024
Combined with the W-3 th feature
Figure FDA0003941743690000025
Performing element-by-element addition to obtain W-3 intermediate characteristic diagram
Figure FDA0003941743690000026
The W-3 intermediate feature map
Figure FDA0003941743690000027
Through the X-2 th convolution block N X-2 Then obtaining the X-3 characteristic
Figure FDA0003941743690000028
Step 2.1.6: according to the process of step 2.1.5
Figure FDA0003941743690000029
After processing, X-1 multi-scale characteristic graphs are obtained
Figure FDA00039417436900000210
Figure FDA00039417436900000211
Representing an x-1 th multi-scale feature map;
step 2.2: the decoder consists of P convolution blocks and an output block Conv, wherein the P convolution blocks are densely connected in an upper-triangular pattern and are denoted, in order, Decoder_(1,1), …, Decoder_(i,j), …, Decoder_(I,J); Decoder_(i,j) denotes the convolution block in the i-th column and j-th row and consists of a convolution layer with an N×N kernel and a ReLU activation function; I = J = X-2, and a block Decoder_(i,j) exists only when i + j ≤ J + 1, so that P = J(J+1)/2; the output block Conv consists of a convolution layer with an A×A kernel and a ReLU activation function;
step 2.2.1: the X-1 multi-scale features Φ_1, …, Φ_(X-1) output by the encoder are assigned to the decoder rows, whose subscripts are denoted 1, 2, …, j, …, J, and are relabeled accordingly, so that Φ_j denotes the multi-scale feature supplied to the j-th row;
step 2.2.2: the input of the convolution block Decoder_(1,j) in the 1st column and j-th row is the j-th multi-scale feature Φ_j together with the upsampled feature map of Φ_(j+1); the output of Decoder_(1,j) is the feature map I_(1,j);
step 2.2.3: for the convolution block Decoder_(i,j) in the j-th row of each remaining column (i > 1), the input is the concatenation of the j-th multi-scale feature map Φ_j, the upsampled feature map of the output I_(i-1,j+1) of the decoder block in the (j+1)-th row and (i-1)-th column, and the feature maps I_(i-1,j), …, I_(1,j) output by the decoder blocks in the (i-1)-th down to the 1st columns of the j-th row; the output of Decoder_(i,j) is the feature map I_(i,j); in this way, the feature I_(I,1) is obtained from Φ_1, …, Φ_(X-1) after the P convolution blocks of the decoder, and the feature I_(I,1) is processed by the output block Conv to obtain an output result O_p;
And step 3: the overall loss function L of the multi-scale self-encoder network is constructed using equation (1):
L = L_ssim + λ·L_pixel (1)
In formula (1), λ denotes the weight coefficient of the pixel loss, L_ssim denotes the structural similarity loss function obtained from formula (2), and L_pixel denotes the pixel loss function obtained from formula (3);
L_ssim = 1 - SSIM(I_p, O_p) (2)
[Formula (3), which defines the pixel loss L_pixel between the reconstruction O_p and the input I_p, is reproduced only as an image in the original publication.]
In formula (2), SSIM denotes structural similarity;
step 4: based on the training set, the multi-scale self-encoder network is trained with the back-propagation algorithm, and the overall loss function L is computed to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale self-encoder network;
step 5: B pairs of multi-exposure images are obtained and converted into the YCbCr color space, and only the Y-channel image pairs are kept, thereby obtaining B preprocessed multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)}, wherein (I_ob, I_ub) denotes the b-th multi-exposure image pair, I_ob denotes the overexposed image of the b-th Y channel, and I_ub denotes the underexposed image of the b-th Y channel;
step 6: the multi-exposure image pairs {(I_o1, I_u1), (I_o2, I_u2), …, (I_ob, I_ub), …, (I_oB, I_uB)} are input into the trained multi-scale encoder for processing, thereby obtaining overexposed image features {Io_f1, Io_f2, …, Io_fs, …, Io_fS} and underexposed image features {Iu_f1, Iu_f2, …, Iu_fs, …, Iu_fS} at S scales, wherein Io_fs denotes the s-th overexposed image feature and Iu_fs denotes the s-th underexposed image feature;
the s-th overexposed image feature Io_fs and the s-th underexposed image feature Iu_fs are added and averaged to obtain the s-th fused feature f_s, yielding the fused feature set {f_1, f_2, …, f_s, …, f_S}, which is input into the trained decoder to obtain the fusion results {Output_1, Output_2, …, Output_b, …, Output_B}, wherein Output_b denotes the fusion result of the overexposed image I_ob and the underexposed image I_ub of the b-th Y channel;
{Output_1, Output_2, …, Output_b, …, Output_B} are converted from the YCbCr domain back to the RGB domain, finally obtaining the uniformly exposed color images {Result_1, Result_2, …, Result_b, …, Result_B}, wherein Result_b denotes the b-th color image result.
2. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a program that supports the processor in performing the multi-exposure image fusion method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the multi-exposure image fusion method of claim 1.
CN202211424921.1A 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder Pending CN115689962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211424921.1A CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211424921.1A CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Publications (1)

Publication Number Publication Date
CN115689962A true CN115689962A (en) 2023-02-03

Family

ID=85051690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211424921.1A Pending CN115689962A (en) 2022-11-14 2022-11-14 Multi-exposure image fusion method based on multi-scale self-encoder

Country Status (1)

Country Link
CN (1) CN115689962A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173525A (en) * 2023-09-05 2023-12-05 北京交通大学 Universal multi-mode image fusion method and device
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117115926B (en) * 2023-10-25 2024-02-06 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing

Similar Documents

Publication Publication Date Title
Anwar et al. Diving deeper into underwater image enhancement: A survey
Yang et al. Underwater image enhancement based on conditional generative adversarial network
Wang et al. An experiment-based review of low-light image enhancement methods
Rana et al. Deep tone mapping operator for high dynamic range images
Liu et al. HoLoCo: Holistic and local contrastive learning network for multi-exposure image fusion
CN111242883B (en) Dynamic scene HDR reconstruction method based on deep learning
Zamir et al. Learning digital camera pipeline for extreme low-light imaging
Zhu et al. Stacked U-shape networks with channel-wise attention for image super-resolution
Yan et al. High dynamic range imaging via gradient-aware context aggregation network
US20220076459A1 (en) Image optimization method, apparatus, device and storage medium
Li et al. Hdrnet: Single-image-based hdr reconstruction using channel attention cnn
Lv et al. BacklitNet: A dataset and network for backlit image enhancement
Lv et al. Low-light image enhancement via deep Retinex decomposition and bilateral learning
CN115689962A (en) Multi-exposure image fusion method based on multi-scale self-encoder
Chen et al. End-to-end single image enhancement based on a dual network cascade model
Yang et al. Low‐light image enhancement based on Retinex decomposition and adaptive gamma correction
Zhang et al. Multi-branch and progressive network for low-light image enhancement
Li et al. Low-light hyperspectral image enhancement
Wang et al. Low-light image enhancement by deep learning network for improved illumination map
Chen et al. Improving dynamic hdr imaging with fusion transformer
Li et al. AMBCR: Low‐light image enhancement via attention guided multi‐branch construction and Retinex theory
Liu et al. Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs
CN104123707B (en) Local rank priori based single-image super-resolution reconstruction method
Cao et al. A deep thermal-guided approach for effective low-light visible image enhancement
Zhang et al. Invertible network for unpaired low-light image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination