CN111242883A - Dynamic scene HDR reconstruction method based on deep learning - Google Patents
Dynamic scene HDR reconstruction method based on deep learning
- Publication number
- CN111242883A (application number CN202010026179.3A)
- Authority
- CN
- China
- Prior art keywords
- images
- image
- network
- hdr
- exposure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 63
- 238000013135 deep learning Methods 0.000 title claims abstract description 21
- 230000003287 optical effect Effects 0.000 claims abstract description 14
- 238000012549 training Methods 0.000 claims abstract description 9
- 230000003068 static effect Effects 0.000 claims abstract description 5
- 230000001131 transforming effect Effects 0.000 claims abstract description 4
- 238000010586 diagram Methods 0.000 claims description 29
- 238000004422 calculation algorithm Methods 0.000 claims description 26
- 238000012360 testing method Methods 0.000 claims description 23
- 230000006870 function Effects 0.000 claims description 22
- 230000004927 fusion Effects 0.000 claims description 18
- 238000013507 mapping Methods 0.000 claims description 18
- 230000004913 activation Effects 0.000 claims description 12
- 238000005070 sampling Methods 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 9
- 230000004044 response Effects 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 4
- 238000001514 detection method Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000036211 photosensitivity Effects 0.000 claims description 3
- 238000005096 rolling process Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 230000000694 effects Effects 0.000 abstract description 7
- 238000012545 processing Methods 0.000 abstract description 5
- 238000004088 simulation Methods 0.000 description 20
- 238000007500 overflow downdraw method Methods 0.000 description 6
- 238000013527 convolutional neural network Methods 0.000 description 3
- 238000003384 imaging method Methods 0.000 description 3
- 230000033001 locomotion Effects 0.000 description 3
- 238000011084 recovery Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 2
- 238000005316 response function Methods 0.000 description 2
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/97—Determining parameters from multiple pictures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a dynamic scene HDR reconstruction method based on deep learning, which addresses the need to improve image processing quality in the prior art. The method comprises the following steps: acquiring three images (underexposed, normally exposed and overexposed) of the same static scene with a fixed camera; in a dynamic scene, acquiring the same three exposures with a handheld camera, denoted D1, D2 and D3; registering D1, S2 and D3 using the LK optical flow method, and pairing the registered image sequences R1, R2 and R3 with the Ground Truth obtained in step 1 to form a training set; transforming R1, R2 and R3 into the linear domain using the camera response curve, denoted H1, H2 and H3; extracting brightness information from the H1, H2 and H3 images with a contrast operator, denoted M1, M2 and M3; extracting detail information from the R1, R2 and R3 images with a gradient operator, denoted L1, L2 and L3; and designing an Attention module based on ResNet. The HDR images generated by this technique have rich detail, high contrast, a wide color gamut and a high dynamic range.
Description
Technical Field
The invention relates to the field of digital video and computational photography image processing, in particular to a dynamic scene HDR reconstruction method based on deep learning.
Background
Dynamic range is the ratio of the maximum to the minimum luminance in a scene. In a real scene, the dynamic range from the brightest sunlight to the darkest starlight can reach 10^8, and the luminance range distinguishable by the human eye is as high as 10^5. However, the dynamic range captured by a common sensor does not exceed 10^3, and that of a typical display is only about 10^2. Because the dynamic ranges of real scenes and common digital devices do not match, images captured by imaging devices often suffer from overexposure, underexposure and loss of detail. In practice, HDR images are difficult to obtain, which limits their usefulness in applications such as digital television, computational photography and game rendering. Research on HDR image reconstruction algorithms therefore has strong practical significance.
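For illustration, those luminance ratios can be expressed in photographic stops via a base-2 logarithm (function and variable names are ours, not the patent's):

```python
import math

def dynamic_range_stops(l_max, l_min):
    """Express a luminance ratio as photographic stops (log base 2)."""
    return math.log2(l_max / l_min)

# Orders of magnitude quoted in the text:
scene_stops = dynamic_range_stops(1e8, 1.0)    # real scene, ~10^8 : 1
sensor_stops = dynamic_range_stops(1e3, 1.0)   # common sensor, ~10^3 : 1
```

A real scene thus spans roughly 26-27 stops while a common sensor captures about 10, which is the gap the reconstruction method aims to bridge.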
Currently, HDR image reconstruction algorithms fall into two main directions: traditional image fusion and deep learning. Traditional fusion methods for acquiring HDR images comprise direct fusion, block-based fusion and hierarchical fusion. Direct fusion mainly follows the paper "Recovering High Dynamic Range Radiance Maps from Photographs", which observes that the brightness values and exposure times of differently exposed images are related to the illuminance at the corresponding pixel positions, establishes a camera response curve model from brightness and exposure time, solves the camera response function, and obtains the illuminance of the real scene by inverse operation. After the real-scene illuminance is obtained, multiple images are fused into a high dynamic range image, which is finally displayed on a common screen via tone mapping. This method is computationally complex, and the resulting high dynamic range image cannot be displayed directly on a common screen. Block-based fusion divides the image into blocks and, using information-entropy theory, selects the blocks with the most information for fusion. However, it handles block boundaries poorly, and the fused image is prone to obvious blocking artifacts.
The hierarchical fusion approach is represented by Exposure Fusion. That paper proposes Laplacian pyramid fusion: multiple multi-exposure images are decomposed by scale, three quality measures (contrast, saturation and well-exposedness) are combined into a weight map for each image, the weight maps are averaged into a composite pyramid coefficient, and finally the Laplacian pyramid is reconstructed to obtain the fused image. This is currently the most effective fusion method. It has one notable drawback, however: detail is seriously lost in very bright and very dark areas.
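The three Exposure Fusion quality measures described above can be sketched as follows; `naive_fusion` blends at a single scale, whereas the paper uses a Laplacian pyramid to avoid seams (function names are illustrative):

```python
import numpy as np

def fusion_weight(img, sigma=0.2):
    """Per-pixel Mertens-style weight for one LDR image in [0, 1].

    Combines the three quality measures named in the text:
    contrast (Laplacian magnitude), saturation (channel std-dev)
    and well-exposedness (closeness of each channel to 0.5).
    """
    gray = img.mean(axis=2)
    # Contrast: magnitude of a discrete Laplacian of the grayscale image.
    lap = np.abs(
        -4 * gray
        + np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
        + np.roll(gray, 1, 1) + np.roll(gray, -1, 1)
    )
    saturation = img.std(axis=2)
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=2)
    return lap * saturation * well_exposed + 1e-12  # avoid all-zero weights

def naive_fusion(images):
    """Weighted per-pixel average of the exposure stack (the paper blends
    pyramid levels instead of this single-scale average to avoid seams)."""
    weights = np.stack([fusion_weight(im) for im in images])
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * np.stack(images)).sum(axis=0)
```

Because the normalized weights form a convex combination at every pixel, the fused result stays within the input range; the blocking and halo issues discussed above arise only at the blending stage, which the pyramid variant addresses.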
Deep-learning-based methods for acquiring high dynamic range images mainly include the following. The paper "HDR image reconstruction from a single exposure using deep CNNs" proposes an autoencoder that takes a single LDR image as input, first down-sampling to extract features and then up-sampling to reconstruct the HDR image. The paper "Robust Patch-Based HDR Reconstruction of Dynamic Scenes" uses a group of differently exposed images comprising N overexposed images, N underexposed images and one normally exposed image. It first uses the camera response function to adjust the over- and underexposed images so that they have the same exposure as the normally exposed image, then uses MBDS (from "Summarizing Visual Data Using Bidirectional Similarity") to select, from the exposure-adjusted under- and overexposed images, the two images closest in content to the normally exposed image, and converts the normally exposed image and the two selected images to 10 bits for fusion, yielding the final high dynamic range image. The contrast of images produced by this algorithm is clearly improved and highlight and shadow details are effectively enhanced, but when there is slight motion across the frames or the camera shakes, the resulting high dynamic range image exhibits ghosting.
Disclosure of Invention
The invention overcomes the problem in the prior art that image processing quality needs improvement, and provides a dynamic scene HDR reconstruction method based on deep learning that yields rich detail and high definition.
The technical solution of the present invention is a dynamic scene HDR reconstruction method based on deep learning, comprising the following steps:
Step 1: in the same static scene, acquire three images (underexposed, normally exposed and overexposed) with identical detail and range using a tripod-mounted camera, denoted S1, S2 and S3; record the exposure time of each image, and fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, denoted D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method; the registered image sequences are denoted R1, R2 and R3 and, together with the Ground Truth obtained in step 1, form a paired training set;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve; the transformed images are denoted H1, H2 and H3;
Step 5: extract brightness information from images H1, H2 and H3 with a contrast operator; the resulting brightness maps are denoted M1, M2 and M3;
Step 6: extract detail information from images R1, R2 and R3 with a gradient operator; the resulting detail maps are denoted L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of images R1, R2 and R3 obtained in step 3 with those of images H1, H2 and H3 obtained in step 4 as the input of the network of step 8; concatenate the channels of images M1, M2 and M3 obtained in step 5 with those of images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image with the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
Preferably, the specific steps of step 1 are:
step 1-1: exposure adjustment was performed on the resulting images S1, S2, and S3, which are denoted as L1, L2, and L3 as shown in the following formula:
step 1-2: and (3) fusing the L1, the L2 and the L3 obtained in the step 1-1 according to a simple fusion algorithm to generate an HDR image as a group Truth, wherein a specific formula is as follows:
preferably, the specific steps of step 3 are:
step 3-1: exposure adjustment is carried out on the three images D1, S2 and D3 obtained in the step 2, the exposure amount of S2 is adjusted to be the same as the exposure amount of D1 by using the exposure response curve of the camera, which is recorded as D2-1, and the exposure response curve of the camera is Ev-f (Bv, Sv), wherein E isVThe exposure of the image is determined by the exposure F and the exposure time T of the camera, and the calculation method is shown as the following formula:the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:Bvis the brightness value of the image, i.e. the pixel value, SvIs the ISO photosensitivity coefficient of the camera, which is a constant, where the value is 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
Preferably, step 4 uses a gamma curve to transform the image from the non-linear domain to the linear domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
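The step-4 linearization is a one-liner; γ = 2 follows the value given in the text, though a calibrated per-device camera response curve would replace it in practice:

```python
import numpy as np

def to_linear(ldr, gamma=2.0):
    """Map a nonlinear LDR image in [0, 1] to the linear (HDR) domain.

    gamma = 2 is the constant stated in the text; a real camera response
    curve would be calibrated per device.
    """
    return np.clip(ldr, 0.0, 1.0) ** gamma
```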
Preferably, in the step 5, the luminance information of the images H1, H2 and H3 obtained in the step 4 is extracted by using a contrast operator, which is specifically expressed as follows:
preferably, the step 6 uses a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, as shown in the following formula:
preferably, the specific steps of step 7 are:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
and 7-3: the output obtained in the step 7-2 passes through three layers of convolution layers, the size of a convolution kernel is 3 x 3, the convolution step size is 2, the used activation function is Relu, and the expression of the activation function is as follows: max (0, x);
and 7-4: and (4) outputting the characteristic diagram obtained in the step (7-3) as f _ A through a Sigmoid activation function, wherein the output characteristic diagram is specifically shown as the following formula:
preferably, the specific steps of step 8 are:
step 8-1: constructing an encoding network, namely a down-sampling network, wherein the network consists of four layers of convolution blocks, and the structure of each convolution block comprises a convolution layer, a batch normalization BN layer and an activation function Relu layer;
step 8-2: merging, namely concat, the images H1, H2 and H3 obtained in the step 4 and the corresponding channels of the images R1, R2 and R3 obtained in the step 3 respectively as the input of a coding network, merging the output channels of the three encoders after down-sampling of two layers of rolling blocks respectively after 3 groups of images, and recording the output characteristic image as f _ U;
step 8-3: and performing dot multiplication on the output characteristic diagram of the step 8-2 and the output characteristic diagram of the step 7-4, wherein the output characteristic diagram is recorded as F: f — a · F _ U;
step 8-4: adding the output characteristic diagram obtained in the step 8-3 and the output characteristic diagram obtained in the step 8-2, and recording the obtained characteristic diagram as F _ R, wherein F _ R is F + F _ u;
and 8-5: constructing a fusion network, wherein the network consists of a residual block, and the input is the output characteristic diagram obtained in the step 8-4;
and 8-6: constructing a decoding network, namely an up-sampling network, wherein the network consists of four layers of convolution blocks and is symmetrical to a coding network, each convolution block has a BN layer, a Relu layer and a deconvolution layer, and jump connection is established between corresponding layers with the same size as the image size of the coding network;
and 8-7: the loss function of the network consists of two parts, including MSE loss and VGG loss, as follows: MSE loss calculation is carried out on an image obtained by carrying out tone mapping on a HDR image generated by a networkAnd the mean square error between the groups Truth obtained in the step 1 is as follows:the perceptual loss function is a VGG-16 network pre-trained on the ImageNet dataset and is denoted as φ, and is calculated as follows:
and 8-8: and (4) training the network by using the network input and the label obtained in the step (9) to complete the reconstruction process of the HDR.
Preferably, step 11 performs tone mapping on the test image obtained in step 10 by using a Reinhard tone mapping algorithm.
Compared with the prior art, the deep-learning-based dynamic scene HDR reconstruction method has the following advantages. The method reconstructs a high dynamic range (HDR) image from low dynamic range (LDR) images containing small moving objects. To counter the ghosting and halo artifacts that occur when traditional fusion-based methods and existing deep learning methods process dynamic scenes, the images are first registered with an optical flow method; at the same time, brightness and detail information are extracted from the LDR images, a deep learning model based on U-Net and ResNet is constructed, and the extracted brightness and detail information assists in training the model, so that the HDR image generated after multi-exposure fusion contains rich detail and higher contrast. To address the loss of highlight and shadow detail in existing HDR reconstruction algorithms, a mixed-structure loss function is designed to ensure detail reconstruction, thereby fulfilling the aim of the invention.
The HDR reconstruction algorithm combines dynamic and static scenes to produce a usable data set and real HDR images without relying on special hardware; a CNN based on a U-ResNet framework is designed, and multiple LDR frames with different exposures are fused by deep learning to reconstruct an HDR image. An attention module is also designed: the detail and brightness information of the LDR images, extracted by traditional image algorithms, serves as the input of the attention module to assist in training the reconstruction network. The designed algorithm improves the detail and brightness of the image while expanding its dynamic range. The method can handle scenes with larger motion and images with more saturated regions, and the generated HDR images have rich detail, high contrast, a wide color gamut and a high dynamic range.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the present invention;
FIG. 2 is a schematic diagram of a network structure of the Attention module in the present invention;
FIG. 3 is an LDR image with an exposure of-2 EV according to a first simulation test of the present invention;
FIG. 4 is an LDR image with an exposure of 0EV according to a first simulation test of the present invention;
FIG. 5 is an LDR image with an exposure of +2EV according to a first simulation test of the present invention;
FIG. 6 is a tone-mapped HDR image obtained with the Deep-HDR method in simulation test one of the present invention;
FIG. 7 is a tone-mapped HDR image obtained with the Expand-HDR method in simulation test one of the present invention;
FIG. 8 is a tone-mapped HDR image obtained with the Sen method in simulation test one of the present invention;
FIG. 9 is a tone-mapped HDR image obtained with the method of the present invention in simulation test one;
FIG. 10 is an image of the Ground Truth in simulation test one of the present invention;
FIG. 11 is an LDR image with an exposure of-2 EV according to a second simulation test of the present invention;
FIG. 12 is an LDR image with an exposure of 0EV according to a second simulation test of the present invention;
FIG. 13 is an LDR image with an exposure of +2EV according to a second simulation test of the present invention;
FIG. 14 is a tone-mapped HDR image obtained with the Deep-HDR method in simulation test two of the present invention;
FIG. 15 is a tone-mapped HDR image obtained with the Expand-HDR method in simulation test two of the present invention;
FIG. 16 is a tone-mapped HDR image obtained with the Sen method in simulation test two of the present invention;
FIG. 17 is a tone-mapped HDR image obtained with the method of the present invention in simulation test two;
FIG. 18 is an image of the Ground Truth in simulation test two of the present invention;
wherein Deep-HDR refers to the method proposed in the paper "Deep high dynamic range imaging with large foreground motions";
Expand-HDR refers to the method proposed in the paper "ExpandNet: a deep convolutional neural network for high dynamic range expansion from low dynamic range content", Computer Graphics Forum;
Sen refers to the method proposed in the paper "Robust patch-based HDR reconstruction of dynamic scenes";
Ours refers to the method set forth herein;
Ground Truth refers to the real image.
Detailed Description
The deep-learning-based dynamic scene HDR reconstruction method of the present invention is further described below with reference to the accompanying drawings and the detailed description. As shown in the figures, this embodiment comprises the following steps:
Step 1: in the same static scene, acquire three images (underexposed, normally exposed and overexposed) with identical detail and range using a tripod-mounted camera, denoted S1, S2 and S3; record the exposure time of each image, and fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, denoted D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method; the registered image sequences are denoted R1, R2 and R3 and, together with the Ground Truth obtained in step 1, form a paired training set;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve; the transformed images are denoted H1, H2 and H3;
Step 5: extract brightness information from images H1, H2 and H3 with a contrast operator; the resulting brightness maps are denoted M1, M2 and M3;
Step 6: extract detail information from images R1, R2 and R3 with a gradient operator; the resulting detail maps are denoted L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of images R1, R2 and R3 obtained in step 3 with those of images H1, H2 and H3 obtained in step 4 as the input of the network of step 8; concatenate the channels of images M1, M2 and M3 obtained in step 5 with those of images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image with the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
The specific steps of the step 1 are as follows:
step 1-1: exposure adjustment was performed on the resulting images S1, S2, and S3, which are denoted as L1, L2, and L3 as shown in the following formula:
step 1-2: and (3) fusing the L1, the L2 and the L3 obtained in the step 1-1 according to a simple fusion algorithm to generate an HDR image as a group Truth, wherein a specific formula is as follows:
the specific steps of the step 3 are as follows:
step 3-1: exposure adjustment is performed on the three images D1, S2, and D3 obtained in step 2, and a camera is usedThe exposure response curve of (a) adjusts the exposure amount of S2 to be the same as the exposure amount of D1, and is denoted as D2-1, and the exposure response curve of the camera is Ev-f (Bv, Sv), where E isVThe exposure of the image is determined by the exposure F and the exposure time T of the camera, and the calculation method is shown as the following formula:the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:Bvis the brightness value of the image, i.e. the pixel value, SvIs the ISO photosensitivity coefficient of the camera, which is a constant, where the value is 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
Step 4 uses a gamma curve to transform the image from the non-linear domain to the linear domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
In the step 5, the contrast operator is used to extract the brightness information of the images H1, H2, and H3 obtained in the step 4, which is specifically shown as follows:
the step 6 utilizes a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, and the following formula is shown as follows:
the specific steps of the step 7 are as follows:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
and 7-3: the output obtained in the step 7-2 passes through three layers of convolution layers, the size of a convolution kernel is 3 x 3, the convolution step size is 2, the used activation function is Relu, and the expression of the activation function is as follows: max (0, x);
and 7-4: and (4) outputting the characteristic diagram obtained in the step (7-3) as f _ A through a Sigmoid activation function, wherein the output characteristic diagram is specifically shown as the following formula:
the specific steps of the step 8 are as follows:
step 8-1: constructing an encoding network, namely a down-sampling network, wherein the network consists of four layers of convolution blocks, and the structure of each convolution block comprises a convolution layer, a batch normalization BN layer and an activation function Relu layer;
step 8-2: merging, namely concat, the images H1, H2 and H3 obtained in the step 4 and the corresponding channels of the images R1, R2 and R3 obtained in the step 3 respectively as the input of a coding network, merging the output channels of the three encoders after down-sampling of two layers of rolling blocks respectively after 3 groups of images, and recording the output characteristic image as f _ U;
step 8-3: and performing dot multiplication on the output characteristic diagram of the step 8-2 and the output characteristic diagram of the step 7-4, wherein the output characteristic diagram is recorded as F: f — a · F _ U;
step 8-4: adding the output characteristic diagram obtained in the step 8-3 and the output characteristic diagram obtained in the step 8-2, and recording the obtained characteristic diagram as F _ R, wherein F _ R is F + F _ u;
step 8-5: construct the fusion network, which consists of residual blocks; its input is the feature map obtained in step 8-4;
step 8-6: construct the decoding (up-sampling) network, which consists of four convolution blocks and is symmetric to the encoding network; each convolution block contains a BN layer, a ReLU layer and a deconvolution layer, and skip connections are established to the encoding-network layers whose feature maps have the same spatial size;
step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG perceptual loss. Let H denote the image obtained by tone mapping the HDR image generated by the network and T the Ground Truth obtained in step 1. The MSE loss is the mean square error between them, L_MSE = (1/N) Σ_i (H_i − T_i)^2, where N is the number of pixels. The perceptual loss uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as L_VGG = ‖φ(H) − φ(T)‖_2^2;
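The hybrid loss of step 8-7 can be sketched as follows. The text does not state which tone-mapping operator is applied inside the loss, so the μ-law compressor below (and the weight `lam`) are assumptions for illustration, and the VGG-16 feature extractor φ is stood in by a placeholder function:

```python
import numpy as np

MU = 5000.0  # assumed mu-law compression constant

def tonemap(h):
    # mu-law range compression of a linear HDR image into [0, 1]
    return np.log(1.0 + MU * h) / np.log(1.0 + MU)

def mse_loss(h_pred, t_gt):
    # L_MSE: mean square error between the tone-mapped network output
    # and the Ground Truth from step 1 (tone-mapped the same way here)
    return np.mean((tonemap(h_pred) - tonemap(t_gt)) ** 2)

def phi(img):
    # Placeholder for the ImageNet-pretrained VGG-16 feature extractor;
    # a real implementation would return deep feature maps.
    return img.ravel()

def vgg_loss(h_pred, t_gt):
    # L_VGG: squared L2 distance between the phi features
    d = phi(tonemap(h_pred)) - phi(tonemap(t_gt))
    return float(d @ d)

def total_loss(h_pred, t_gt, lam=0.01):  # lam is an assumed weight
    return mse_loss(h_pred, t_gt) + lam * vgg_loss(h_pred, t_gt)

h = np.full((4, 4), 0.5)
assert total_loss(h, h) == 0.0  # identical images give zero loss
```

In training, φ would be swapped for actual VGG-16 activations and the whole expression evaluated inside the deep-learning framework so gradients flow to the reconstruction network.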
step 8-8: train the network with the network inputs and labels obtained in step 9 to complete the HDR reconstruction process.
Step 11: perform tone mapping on the test image obtained in step 10 using the Reinhard tone mapping algorithm.
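The Reinhard operator referenced in step 11 can be sketched in its simple global form; the key value a = 0.18 is a conventional choice, not a parameter stated in the text:

```python
import numpy as np

def reinhard_tonemap(lum, a=0.18, eps=1e-6):
    """Global Reinhard operator: scale luminance by the key relative to
    the log-average, then compress with L_d = L / (1 + L)."""
    log_avg = np.exp(np.mean(np.log(lum + eps)))  # log-average luminance
    L = (a / log_avg) * lum                       # key-scaled luminance
    return L / (1.0 + L)                          # maps [0, inf) -> [0, 1)

hdr = np.array([0.01, 0.18, 1.0, 50.0])  # hypothetical HDR luminances
ldr = reinhard_tonemap(hdr)
assert (ldr >= 0.0).all() and (ldr < 1.0).all()
assert (np.diff(ldr) > 0).all()  # brightness ordering is preserved
```

The compressed result lies in [0, 1) and can be quantized directly to the 8-bit range of an ordinary display, which is the purpose of step 11.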
The effect of the present invention is further described below with a simulation experiment:
1. Simulation experiment conditions:
The hardware environment of the simulation is: Intel Core(TM) i5-4570 CPU @ 3.20 GHz x 8; GPU: NVIDIA GeForce GTX 1080 with 8 GB memory. Software environment: Ubuntu 16.04, Python 3.6. Experiment framework: TensorFlow.
2. Simulation content and result analysis
The invention selects the test set of a public HDR dataset as experimental samples and inputs them into the trained network. The inputs are three differently exposed LDR images of a dynamic scene, with exposures of -2 EV, 0 EV and +2 EV; the first group of images is shown in Figs. 3-5 and the second group in Figs. 11-13. The network outputs the tone-mapped HDR images:
The green boxes in the two sets of comparison figures show that our algorithm recovers highlight and shadow details better than the existing algorithms. As can be seen from the first set of comparison results in Figs. 6-10, the Deep-HDR [1], Expand-HDR [2] and Sen [3] methods all exhibit overflow in the marked highlight region, while our algorithm fully recovers the highlight details and achieves the same effect as the real image. The second set of comparison results, Figs. 14-18, shows that the existing algorithms recover the highlights incorrectly, especially the Sen method, while the details recovered by our method are closest to the real scene. Meanwhile, compared with the existing algorithms, the objective index PSNR is improved by 0.1 dB on average.
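The PSNR figure quoted above is the standard peak signal-to-noise ratio; a minimal sketch for 8-bit tone-mapped outputs:

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 16.0  # uniform error of 16 grey levels -> MSE = 256
assert abs(psnr(noisy, ref) - 10.0 * np.log10(255.0 ** 2 / 256.0)) < 1e-9
```

A 0.1 dB average gain therefore corresponds to a measurable but modest reduction in mean square error against the Ground Truth.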
References cited in the practice of the invention:
[1] Wu, S., Xu, J., Tai, Y.-W., & Tang, C.-K. (2018). Deep high dynamic range imaging with large foreground motions. ECCV 2018.
[2] Marnerides, D., Bashford-Rogers, T., Hatchett, J., & Debattista, K. (2018). ExpandNet: a deep convolutional neural network for high dynamic range expansion from low dynamic range content. Computer Graphics Forum, 37(2), 37-49.
[3] Sen, P., Kalantari, N.K., Yaesoubi, M., Darabi, S., & Shechtman, E. (2012). Robust patch-based HDR reconstruction of dynamic scenes. ACM Transactions on Graphics, 31(6).
Claims (9)
1. A dynamic scene HDR reconstruction method based on deep learning, characterized by comprising the following steps:
step 1: in the same static scene, capture three images (underexposed, normally exposed and overexposed) with identical details and range using a tripod-mounted camera, denote them S1, S2 and S3, and record the exposure time of each image; fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
step 2: in a dynamic scene, capture three images (underexposed, normally exposed and overexposed) with a handheld camera, denote them D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
step 3: register D1, S2 and D3 using the LK optical flow method, denote the registered image sequence R1, R2 and R3, and form a paired training set with the Ground Truth obtained in step 1;
step 4: transform R1, R2 and R3 into the linear domain using the camera response curve, and denote the transformed images H1, H2 and H3;
step 5: extract the brightness information of the H1, H2 and H3 images using a contrast operator, and denote the resulting brightness maps M1, M2 and M3;
step 6: extract the detail information of the R1, R2 and R3 images using a gradient operator, and denote the resulting detail maps L1, L2 and L3;
step 7: design an Attention module based on ResNet;
step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a hybrid loss function;
step 9: concatenate the channels of the images R1, R2 and R3 obtained in step 3 with those of the images H1, H2 and H3 obtained in step 4 as the input of step 8; concatenate the channels of the images M1, M2 and M3 obtained in step 5 with those of the images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
step 11: perform tone mapping on the generated HDR image using the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
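The weighted fusion of step 1 above is not specified further in the claim; a Mertens-style well-exposedness weighting is one plausible sketch, where the Gaussian weight centred at mid-grey 0.5 with sigma = 0.2 is an assumption for illustration:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    # Assumed weighting: favour pixels close to mid-grey 0.5
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def weighted_fusion(images):
    """Fuse aligned LDR exposures (values in [0, 1]) into one image T."""
    imgs = np.stack(images)                          # (3, H, W)
    w = well_exposedness(imgs)
    w = w / (w.sum(axis=0, keepdims=True) + 1e-12)   # normalise per pixel
    return (w * imgs).sum(axis=0)

s1 = np.full((2, 2), 0.1)  # underexposed
s2 = np.full((2, 2), 0.5)  # normal exposure
s3 = np.full((2, 2), 0.9)  # overexposed
t = weighted_fusion([s1, s2, s3])
# The normally exposed image dominates, so T lies near 0.5
assert (np.abs(t - 0.5) < 1e-6).all()
```

In practice each well-exposed region of S1, S2 and S3 contributes most where the others are clipped, which is what makes the fused T usable as a Ground Truth label.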
2. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 1 are as follows:
step 1-1: perform exposure adjustment on the images S1, S2 and S3; the adjusted images are denoted L1, L2 and L3 as shown in the following formula:
3. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 3 are as follows:
step 3-1: perform exposure adjustment on the three images D1, S2 and D3 obtained in step 2; using the exposure response curve of the camera, E_v = f(B_v, S_v), adjust the exposure of S2 to be the same as that of D1, and denote the result D2-1. Here E_v is the exposure value of the image, determined by the aperture number F and the exposure time T of the camera as E_v = log2(F^2 / T); the aperture number F is determined by the focal length f and the aperture diameter D of the camera as F = f / D; B_v is the brightness value of the image, i.e. the pixel value; and S_v is the ISO photosensitivity coefficient of the camera, a constant set to 100 here;
step 3-2: detect feature points in D1 and D2-1 using the Harris corner detection method;
step 3-3: calculate the optical flow vectors between D1 and D2-1 using the LK optical flow method;
step 3-4: align D1 with D2-1 using bicubic interpolation and the optical flow vectors obtained in step 3-3;
step 3-5: repeat step 3-1, adjusting the exposure of S2 to be the same as that of D3 using the camera response curve, and denote the result D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 to align D3 with D2-3.
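One Lucas-Kanade step of the kind used in step 3-3 can be sketched as follows. This toy version recovers a single constant translation for a whole patch from the brightness-constancy equation Ix·u + Iy·v + It = 0, whereas a real implementation (e.g. a pyramidal LK over the Harris corners of step 3-2) estimates a flow vector per feature point:

```python
import numpy as np

def lk_flow(prev, curr):
    """One Lucas-Kanade least-squares step for a single patch.

    Solves A v = b in the least-squares sense, where the rows of A are
    the spatial gradients (Ix, Iy) and b = -It, giving one (u, v)."""
    Ix = np.gradient(prev, axis=1)   # horizontal image gradient
    Iy = np.gradient(prev, axis=0)   # vertical image gradient
    It = curr - prev                 # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (u, v) displacement in pixels

# Synthetic check: a smooth horizontal ramp shifted right by one pixel
x = np.arange(16, dtype=float)
prev = np.tile(x, (16, 1))
curr = np.tile(x - 1.0, (16, 1))  # same content moved +1 px in x
u, v = lk_flow(prev, curr)
assert abs(u - 1.0) < 1e-6 and abs(v) < 1e-6
```

The recovered flow then drives the bicubic warping of step 3-4 so that D1 is resampled into alignment with D2-1.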
4. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that step 4 converts the images from the nonlinear domain to the linear domain using a gamma curve, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
7. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 7 are as follows:
step 7-1: concatenate the brightness feature maps M1, M2 and M3 obtained in step 5 with the detail feature maps L1, L2 and L3 obtained in step 6 as the input of the Attention module;
step 7-2: construct the Attention module and pass the input obtained in step 7-1 through a ResNet block;
step 7-3: pass the output obtained in step 7-2 through three convolution layers with kernel size 3 x 3 and convolution stride 2; the activation function used is ReLU, whose expression is f(x) = max(0, x);
8. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 8 are as follows:
step 8-1: construct the encoding (down-sampling) network, which consists of four convolution blocks; each convolution block contains a convolution layer, a batch normalization (BN) layer and a ReLU activation layer;
step 8-2: concatenate (concat) the images H1, H2 and H3 obtained in step 4 with the corresponding images R1, R2 and R3 obtained in step 3 along the channel dimension as the input of the encoding network; after each of the 3 image groups is down-sampled by two convolution blocks, concatenate the output channels of the three encoders, and denote the output feature map as f_U;
step 8-3: multiply the output feature map of step 8-2 element-wise by the output feature map of step 7-4, and denote the result as F: F = f_A · f_U;
step 8-4: add the feature map obtained in step 8-3 to the output feature map of step 8-2, and denote the result as F_R: F_R = F + f_U;
step 8-5: construct the fusion network, which consists of residual blocks; its input is the feature map obtained in step 8-4;
step 8-6: construct the decoding (up-sampling) network, which consists of four convolution blocks and is symmetric to the encoding network; each convolution block contains a BN layer, a ReLU layer and a deconvolution layer, and skip connections are established to the encoding-network layers whose feature maps have the same spatial size;
step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG perceptual loss: letting H denote the image obtained by tone mapping the HDR image generated by the network and T the Ground Truth obtained in step 1, the MSE loss is the mean square error between them, L_MSE = (1/N) Σ_i (H_i − T_i)^2; the perceptual loss uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as L_VGG = ‖φ(H) − φ(T)‖_2^2;
step 8-8: train the network with the network inputs and labels obtained in step 9 to complete the HDR reconstruction process.
9. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that step 11 performs tone mapping on the test image obtained in step 10 using the Reinhard tone mapping algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010026179.3A CN111242883B (en) | 2020-01-10 | 2020-01-10 | Dynamic scene HDR reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111242883A true CN111242883A (en) | 2020-06-05 |
CN111242883B CN111242883B (en) | 2023-03-28 |
Family
ID=70872293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010026179.3A Active CN111242883B (en) | 2020-01-10 | 2020-01-10 | Dynamic scene HDR reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242883B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111709896A (en) * | 2020-06-18 | 2020-09-25 | 三星电子(中国)研发中心 | Method and equipment for mapping LDR video into HDR video |
CN111986134A (en) * | 2020-08-26 | 2020-11-24 | 中国空间技术研究院 | Remote sensing imaging method and device for area-array camera |
CN112435306A (en) * | 2020-11-20 | 2021-03-02 | 上海北昂医药科技股份有限公司 | G banding chromosome HDR image reconstruction method |
CN113132655A (en) * | 2021-03-09 | 2021-07-16 | 浙江工业大学 | HDR video synthesis method based on deep learning |
CN113379698A (en) * | 2021-06-08 | 2021-09-10 | 武汉大学 | Illumination estimation method based on step-by-step joint supervision |
CN113971639A (en) * | 2021-08-27 | 2022-01-25 | 天津大学 | Depth estimation based under-exposed LDR image reconstruction HDR image |
CN114189633A (en) * | 2021-12-22 | 2022-03-15 | 北京紫光展锐通信技术有限公司 | HDR image imaging method and device and electronic equipment |
WO2023178610A1 (en) * | 2022-03-24 | 2023-09-28 | 京东方科技集团股份有限公司 | Image processing method, computing system, device and readable storage medium |
WO2023246392A1 (en) * | 2022-06-22 | 2023-12-28 | 京东方科技集团股份有限公司 | Image acquisition method, apparatus and device, and non-transient computer storage medium |
CN117745603A (en) * | 2024-02-20 | 2024-03-22 | 湖南科洛德科技有限公司 | Product image correction method and device based on linear array scanning device and storage medium |
CN117876282A (en) * | 2024-03-08 | 2024-04-12 | 昆明理工大学 | High dynamic range imaging method based on multi-task interaction promotion |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120201456A1 (en) * | 2009-10-08 | 2012-08-09 | International Business Machines Corporation | Transforming a digital image from a low dynamic range (ldr) image to a high dynamic range (hdr) image |
CN108805836A (en) * | 2018-05-31 | 2018-11-13 | 大连理工大学 | Method for correcting image based on the reciprocating HDR transformation of depth |
US20190096046A1 (en) * | 2017-09-25 | 2019-03-28 | The Regents Of The University Of California | Generation of high dynamic range visual media |
Non-Patent Citations (2)
Title |
---|
Zhang Shufang et al., "High dynamic range image generation method using principal component analysis and gradient pyramid", Journal of Xi'an Jiaotong University * |
Du Lin et al., "Research on high dynamic range image fusion algorithms for dynamic targets", Acta Optica Sinica * |
Also Published As
Publication number | Publication date |
---|---|
CN111242883B (en) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242883B (en) | Dynamic scene HDR reconstruction method based on deep learning | |
Fan et al. | Integrating semantic segmentation and retinex model for low-light image enhancement | |
Lee et al. | Deep chain hdri: Reconstructing a high dynamic range image from a single low dynamic range image | |
Cai et al. | Learning a deep single image contrast enhancer from multi-exposure images | |
Zhou et al. | Cross-view enhancement network for underwater images | |
Pan et al. | Multi-exposure high dynamic range imaging with informative content enhanced network | |
CN113592726A (en) | High dynamic range imaging method, device, electronic equipment and storage medium | |
Rasheed et al. | LSR: Lightening super-resolution deep network for low-light image enhancement | |
Lv et al. | Low-light image enhancement via deep Retinex decomposition and bilateral learning | |
Yin et al. | Two exposure fusion using prior-aware generative adversarial network | |
Chen et al. | End-to-end single image enhancement based on a dual network cascade model | |
Zhang et al. | Multi-branch and progressive network for low-light image enhancement | |
CN115035011A (en) | Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy | |
Cao et al. | A brightness-adaptive kernel prediction network for inverse tone mapping | |
Tan et al. | High dynamic range imaging for dynamic scenes with large-scale motions and severe saturation | |
Chen et al. | Improving dynamic hdr imaging with fusion transformer | |
CN117237207A (en) | Ghost-free high dynamic range light field imaging method for dynamic scene | |
Tian et al. | Deformable convolutional network constrained by contrastive learning for underwater image enhancement | |
Ye et al. | Single exposure high dynamic range image reconstruction based on deep dual-branch network | |
Hu et al. | High dynamic range imaging with short-and long-exposures based on artificial remapping using multiscale exposure fusion | |
Van Vo et al. | High dynamic range video synthesis using superpixel-based illuminance-invariant motion estimation | |
Ma et al. | Image Dehazing Based on Improved Color Channel Transfer and Multiexposure Fusion | |
Singh et al. | Variational approach for intensity domain multi-exposure image fusion | |
Kinoshita et al. | Deep inverse tone mapping using LDR based learning for estimating HDR images with absolute luminance | |
Yang et al. | Multi-scale extreme exposure images fusion based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||